Tutorial: Results Analysis #
Grex returns raw results. Rules can overlap and be sub-rules matching subsets of others. The postprocessing filters are under development (WIP 🚧)
Rules capture strong correlations between the features in the sample space of the scope and the conclusion. Rules could be:
- pertinent linguistic patterns
- corpus properties
- irrelevant patterns
The results are encoded in a JSON file. This file contains information on the scope and conclusion, the given file path, the intercept values for each model, and the predicted patterns.
Grex returns patterns for
- positive rules: patterns that favor the conclusion
- negative rules: patterns that support the negation (¬) of the conclusion
At the top of the file, you will find some general information about the extraction.
- s_occs: number of occurrences matched by the scope.
- q_occs: number of occurrences matched by the scope and the conclusion.
Each pattern is associated with a series of information and statistics to interpret the model’s decision.
- p_occs: number of matches of the selected pattern within the scope.
- pq_occs: number of occurrences matching the pattern p and the conclusion within the scope.
- decision: indicator of a positive or negative rule
- coverage: The proportion of occurrences matching the conclusion that are captured by the pattern P, within the scope.
- precision: the proportion of occurrences matched by the pattern that actually match also the conclusion, whithin the scope.
- delta: the difference in frequency between occurrences matching the pattern and conclusion, and the expected frequency under the independence hypothesis.
Some information from statistical inference:
- g-statistic: value of the g-statistic
- p-value: the probability of observing the value of the g-statistic or a more extreme value under the independence hypothesis (Note that if sample is too large, it becomes uninformative.)
- cramers_phi: effect size measure
And some information regarding the model and the ranking:
- alpha: the value of the penalty parameter when the function is activated. The partial order given by the alpha values can be interpreted as a salient order.
- coef: weight assigned to the feature