Evaluation
The machine translation models will be evaluated using the following metrics:
- chrF++: chrF and chrF++ are MT evaluation metrics based on the F-score over character n-gram matches between the hypothesis and the reference; chrF++ additionally includes word n-grams, which makes it correlate more strongly with human direct assessment.
- BLEU: a metric for automatically evaluating machine-translated text. The BLEU score measures the n-gram overlap between the machine-translated text and a set of high-quality reference translations; it ranges from zero to one (commonly reported on a 0–100 scale).
- TER: Translation Edit Rate (also called Translation Error Rate) measures the number of edit operations (insertions, deletions, substitutions, and shifts) required to change a hypothesis into a reference translation, normalized by the reference length; lower is better.
All three metrics will be reported, but submitted models will be ranked by their chrF++ scores.
To simulate the evaluation locally, the SacreBLEU library can be used (see this link), as shown in the sketch below.
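The following is a minimal sketch of how the three metrics could be computed with SacreBLEU's Python API, assuming `sacrebleu` is installed (e.g. `pip install sacrebleu`) and that the hypotheses and references are plain lists of detokenized sentence strings; setting `word_order=2` on the CHRF metric yields chrF++.

```python
# Minimal sketch: scoring a system output with SacreBLEU (chrF++, BLEU, TER).
# Assumes `hypotheses` and `references` are lists of detokenized sentences,
# aligned line by line (one reference per hypothesis in this example).
from sacrebleu.metrics import BLEU, CHRF, TER

hypotheses = [
    "The cat sat on the mat.",
    "It is raining heavily today.",
]
references = [
    "The cat is sitting on the mat.",
    "It is raining hard today.",
]

# chrF with word n-grams of order 2 is chrF++ (the ranking metric).
chrf_pp = CHRF(word_order=2)
bleu = BLEU()
ter = TER()

# SacreBLEU expects a list of reference streams, hence the extra brackets.
print(chrf_pp.corpus_score(hypotheses, [references]))  # chrF2++ score
print(bleu.corpus_score(hypotheses, [references]))     # BLEU on a 0-100 scale
print(ter.corpus_score(hypotheses, [references]))      # TER: lower is better
```

If the command-line interface is preferred, something like `sacrebleu refs.txt -i hyps.txt -m bleu chrf ter --chrf-word-order 2` should produce the same three scores (exact flags may vary with the SacreBLEU version).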