
Example: Contrastive Analysis

Two languages can be compared with respect to a target linguistic phenomenon. We call the result a contrastive grammar. Given a treebank that combines sentences from two different sources, we look for linguistic patterns that predict one language over the other, or one sub-treebank over the other.

Scope and conclusion

As an example, let’s look at the difference between Romanian finite verbs and French finite verbs.

scope: 'pattern { X[upos="VERB", VerbForm="Fin"] }'
conclusion: meta.language=Romanian
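
To make the roles of the scope and the conclusion concrete, here is a minimal sketch in plain Python. It does not use the tool's own pattern-matching engine: the token dictionaries, the `in_scope` and `label` helpers, and the two toy sentences are all assumptions made for illustration. The scope selects the occurrences to classify, and the conclusion supplies the binary label attached to each of them.

```python
# Illustrative sketch only: toy data and hypothetical helpers, not the tool's API.

# Each sentence carries metadata (its source language) and a list of tokens.
sentences = [
    {
        "meta": {"language": "Romanian"},
        "tokens": [
            {"form": "Ea", "upos": "PRON", "feats": {}},
            {"form": "cântă", "upos": "VERB", "feats": {"VerbForm": "Fin"}},
        ],
    },
    {
        "meta": {"language": "French"},
        "tokens": [
            {"form": "Elle", "upos": "PRON", "feats": {}},
            {"form": "chante", "upos": "VERB", "feats": {"VerbForm": "Fin"}},
            {"form": "chanter", "upos": "VERB", "feats": {"VerbForm": "Inf"}},
        ],
    },
]


def in_scope(token):
    """Mimic the scope X[upos="VERB", VerbForm="Fin"]."""
    return token["upos"] == "VERB" and token["feats"].get("VerbForm") == "Fin"


def label(sentence):
    """Mimic the conclusion meta.language=Romanian as a boolean label."""
    return sentence["meta"]["language"] == "Romanian"


# One (form, label) pair per occurrence matched by the scope.
occurrences = [
    (tok["form"], label(sent))
    for sent in sentences
    for tok in sent["tokens"]
    if in_scope(tok)
]

print(occurrences)  # [('cântă', True), ('chante', False)]
```

The infinitive is left out because it falls outside the scope; only the two finite verbs become training occurrences, one positive and one negative.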

Features

templates:
    base:
        own:
            method: exclude
            regexp: ["lemma", "textform", "form", "wordform", "SpaceAfter", "xpos", "Typo", "Language", "upos", "VerbForm"]
            lemma_top_k: 0
        parent:
            method: exclude
            regexp: ["lemma", "textform", "form", "wordform", "SpaceAfter", "xpos", "Typo", "Language"]
            lemma_top_k: 0
        child:
            method: exclude
            regexp: ["lemma", "textform", "form", "wordform", "SpaceAfter", "xpos", "Typo", "Language"]
            lemma_top_k: 0
        prev:
            method: exclude
            regexp: ["lemma", "textform", "form", "wordform", "SpaceAfter", "xpos", "Typo", "Language"]
            lemma_top_k: 0
        next:
            method: exclude
            regexp: ["lemma", "textform", "form", "wordform", "SpaceAfter", "xpos", "Typo", "Language"]
            lemma_top_k: 0

features:
    X: base

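We exclude the features declared in the scope and in the conclusion, because they would trivially predict the outcome. This is why upos and VerbForm, which the scope already fixes for X, appear only in the exclusion list of the own template.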
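
The exclusion templates can be pictured as a filter over feature names. The sketch below is an assumption about how `method: exclude` behaves, not the tool's actual implementation: it treats each entry of `regexp` as a regular expression that must match the whole feature name. The `keep_features` helper and the sample token are hypothetical.

```python
import re

# Exclusion lists copied from the `own` and `parent` templates above.
COMMON = ["lemma", "textform", "form", "wordform", "SpaceAfter", "xpos", "Typo", "Language"]
OWN_EXCLUDE = COMMON + ["upos", "VerbForm"]
PARENT_EXCLUDE = COMMON


def keep_features(features, exclude_patterns):
    """Drop every feature whose name fully matches one of the patterns (assumed semantics)."""
    compiled = [re.compile(p) for p in exclude_patterns]
    return {
        name: value
        for name, value in features.items()
        if not any(rx.fullmatch(name) for rx in compiled)
    }


# Hypothetical annotation of a finite verb.
token = {
    "form": "cântă",
    "lemma": "cânta",
    "upos": "VERB",
    "VerbForm": "Fin",
    "Mood": "Ind",
    "Tense": "Pres",
}

own_kept = keep_features(token, OWN_EXCLUDE)
parent_kept = keep_features(token, PARENT_EXCLUDE)

print(own_kept)     # {'Mood': 'Ind', 'Tense': 'Pres'}
print(parent_kept)  # {'upos': 'VERB', 'VerbForm': 'Fin', 'Mood': 'Ind', 'Tense': 'Pres'}
```

Under this reading, X keeps only features the scope does not already fix, while a parent, child, or neighbouring token may still contribute its part of speech and verb form.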

Extraction

python extract_rules_via_lasso.py \
    examples/UD_French-GSD-UD_Romanian-RTT-5kfinite_verbs_train.conllu \
    --output examples/romanian_french_finite_verbs.json \
    --patterns examples/patterns_ro_fr_finite_verbs.yml \
    --max-degree 2
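
This page does not spell out what `--max-degree 2` does. One plausible reading, offered here only as an assumption, is that candidate patterns are conjunctions of at most two atomic features. The sketch below enumerates such conjunctions with the standard library. The `candidate_patterns` helper, the ` & ` notation, and the sample features are hypothetical and are not taken from the script.

```python
from itertools import combinations

# Hypothetical atomic features of one occurrence (feature name = value).
atomic = ["Mood=Ind", "Number=Sing", "Tense=Pres"]


def candidate_patterns(features, max_degree):
    """Enumerate conjunctions of 1..max_degree atomic features (assumed meaning of the flag)."""
    patterns = []
    for degree in range(1, max_degree + 1):
        for combo in combinations(sorted(features), degree):
            patterns.append(" & ".join(combo))
    return patterns


patterns = candidate_patterns(atomic, max_degree=2)
for p in patterns:
    print(p)
```

With three atomic features this yields three single features and three pairs. As the script name suggests, a sparse model such as the lasso would then be expected to keep only the few patterns that help separate the two sources.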

Results

WIP 🚧