Transliteration

Thai W2P

Model Details

Intended Use

  • Converts a Thai word to Thai phonemes.
  • Not suitable for other languages.

Factors

  • Based on the problem of converting Thai words to Thai phonemes.

Metrics

  • The evaluation metric is phoneme error rate (PER): number of phoneme errors / number of phonemes.
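The ratio above can be sketched in a few lines. The model card does not say how errors are counted, so this sketch assumes the usual convention of Levenshtein edit distance (substitutions, insertions, deletions) between reference and predicted phoneme sequences. The function names and the sample phoneme tokens are hypothetical, for illustration only.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edit errors divided by total reference phonemes (assumed definition)."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference set contains no phonemes")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total


# Hypothetical phoneme tokens, not taken from the Thai W2P corpus.
refs = [["k", "a", "n"], ["m", "aa"]]
hyps = [["k", "a", "m"], ["m", "aa"]]
per = phoneme_error_rate(refs, hyps)  # 1 error over 5 reference phonemes
```

Under this definition, a PER of about 0.04 would correspond to roughly four phoneme errors per hundred reference phonemes.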

Training Data

Thai W2P (80%)

Evaluation Data

Thai W2P (20%)

Quantitative Analyses

epoch: 100
step: 100, loss: 0.03179970383644104
step: 200, loss: 0.04126007482409477
step: 300, loss: 0.01877519115805626
step: 400, loss: 0.03311225399374962
per: 0.0432
per: 0.0419

Ethical Considerations

The corpus is compiled from websites such as Wiktionary and the Royal Institute, among others. It may not reflect the dialect you use in everyday life.

Caveats and Recommendations

  • Input must be a single Thai word.

Thai2Rom

Thai romanization using an LSTM encoder-decoder model with an attention mechanism.

v0.1

Model Details

Intended Use - Conversion of Thai text to the Roman alphabet.

Factors - Based on known problems with Thai natural language processing.

Metrics - Evaluation metrics include precision, recall and F1-score.

Training Data - Thai2Rom trainset

Evaluation Data

Thai2Rom testset

Quantitative Analyses

The model was evaluated with three metrics (F1-score, exact match, and character-level exact match) on the validation set (20% of the dataset, or 129,642 examples).

  • F1 (macro-average): 0.987
  • Exact match: 0.883
  • Exact match (Character-level): 0.949
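To make the two exact-match figures concrete, here is a minimal sketch. The model card does not define "character-level" exact match, so the definition below is an assumption: characters are compared position by position and the count is normalized by the longer of the two strings. The function names and sample romanizations are hypothetical.

```python
from typing import List


def exact_match(refs: List[str], hyps: List[str]) -> float:
    """Fraction of predictions identical to their reference string."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


def char_level_match(refs: List[str], hyps: List[str]) -> float:
    """Fraction of character positions that agree (assumed definition)."""
    matched = 0
    total = 0
    for r, h in zip(refs, hyps):
        matched += sum(a == b for a, b in zip(r, h))
        total += max(len(r), len(h))  # extra or missing characters count as misses
    return matched / total


# Hypothetical romanizations, not taken from the Thai2Rom testset.
refs = ["sawatdi", "thai"]
hyps = ["sawatdi", "thay"]
em = exact_match(refs, hyps)            # 1 of 2 strings match
char_em = char_level_match(refs, hyps)  # 10 of 11 characters match
```

This also illustrates why the character-level score is typically higher than the string-level one: a single wrong character fails the whole string but costs only one position at character level.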

Ethical Considerations

None identified.

Caveats and Recommendations

  • Thai text only

Thai G2P

Thai grapheme-to-phoneme conversion (Thai G2P) based on a deep learning sequence-to-sequence (Seq2Seq) model.

v0.1

Model Details

Intended Use

Grapheme-to-Phoneme conversion tool.

Factors

  • Based on Thai grapheme-to-phoneme conversion problems.

Metrics

F1-score.

Training Data

Wiktionary trainset

Evaluation Data

Wiktionary testset

Quantitative Analyses

F1 (macro-average) =  0.9415941561267093
EM =  0.71
EM (Character-level) =  0.8660247630539959
save best model em score=0.71 at epoch=1148
Save model at epoch  1148
Epoch: 1149 | Time: 2m 55s
    Train Loss: 0.352 | Train PPL:   1.422
     Val. Loss: 0.512 |  Val. PPL:   1.669
epoch=1149, teacher_forcing_ratio=0.4
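
The perplexity (PPL) values in the log above are consistent with the common convention of taking the exponential of the cross-entropy loss. Whether the training script computes PPL exactly this way is an assumption; the short check below only shows that the logged numbers agree with it. The `perplexity` helper is hypothetical.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity as exp(cross-entropy loss); assumed convention."""
    return math.exp(loss)


train_ppl = perplexity(0.352)  # logged Train Loss
val_ppl = perplexity(0.512)    # logged Val. Loss

print(f"Train PPL: {train_ppl:.3f}")  # 1.422
print(f"Val. PPL:  {val_ppl:.3f}")    # 1.669
```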

Ethical Considerations

This model is based on the Thai Wiktionary dump and may include biases from Thai Wiktionary.

Caveats and Recommendations

  • Input must be a single Thai word.