Thai romanization using an LSTM encoder-decoder model with an attention mechanism


Model Details

Intended Use - Conversion of Thai text to the Roman alphabet (romanization).

Factors - Based on known problems in Thai natural language processing.

Metrics - Evaluation metrics include precision, recall, and F1-score.

Training Data

Thai2Rom trainset

Evaluation Data

Thai2Rom testset

Quantitative Analyses

The model was evaluated on the validation set (20% of the dataset, or 129,642 examples) with three metrics: F1-score, exact match, and character-level exact match.

  • F1 (macro-average): 0.987
  • Exact match: 0.883
  • Exact match (Character-level): 0.949
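
The two exact-match figures above can be illustrated with a minimal sketch. This card does not say how the character-level score was computed, so the definition used here (position-by-position agreement, with positions missing from the shorter string counted as mismatches) is an assumption, and the function names and toy strings are hypothetical rather than taken from the Thai2Rom evaluation code.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of examples whose predicted romanization equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def char_level_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of character positions that agree (assumed definition).

    Each pair is compared position by position. Positions that exist in only
    one of the two strings count as mismatches.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    matched = 0
    total = 0
    for p, r in zip(predictions, references):
        total += max(len(p), len(r))
        matched += sum(a == b for a, b in zip(p, r))
    return matched / total if total else 0.0


# Hypothetical toy data, not drawn from the Thai2Rom test set.
refs = ["sawatdi", "khopkhun", "thai"]
preds = ["sawatdi", "khobkhun", "thai"]

print(f"Exact match: {exact_match(preds, refs):.3f}")
print(f"Exact match (character-level): {char_level_match(preds, refs):.3f}")
```

In this toy case a single wrong character fails the whole word under exact match but costs only one position under the character-level score. That is consistent with the character-level figure being the higher of the two in the results above.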

Ethical Considerations

None identified so far.

Caveats and Recommendations

  • The model supports Thai text input only.