Model Details

Intended Use

A pretrained language model for Thai, intended for text classification and other downstream tasks.


The model is motivated by known problems in Thai natural language processing. It can serve as a language model for many natural language processing tasks, e.g. text classification and text generation.


Evaluation metrics include Perplexity.
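
As a minimal sketch of what this metric measures: perplexity is conventionally the exponential of the average negative log-likelihood the model assigns to each true next token. The function name `perplexity` and the sample probabilities are illustrative assumptions, not taken from this model's evaluation code.

```python
import math


def perplexity(token_probs):
    """Return exp of the mean negative log-probability over the tokens."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    # Average negative log-likelihood per token, in nats.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Hypothetical probabilities assigned to each true next token.
example_probs = [0.25, 0.25, 0.25, 0.25]
print(perplexity(example_probs))
```

A model that guesses uniformly among four tokens has a perplexity of about 4, so lower values indicate a model that is less "surprised" by the text.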

Training Data

Thai Wikipedia dump, last updated February 17, 2019.

Evaluation Data

Thai Wikipedia dump, split into 40M/200k/200k tokens for training, validation, and testing.

Quantitative Analyses

Perplexity is 28.71067, with 60,005 embeddings of 400 dimensions each.
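
To put these figures in context, the short sketch below derives two quantities from them: the number of parameters in the embedding table, and the per-token cross-entropy implied by the perplexity. It assumes the perplexity was computed with the natural logarithm, which the card does not state; the variable names are illustrative.

```python
import math

VOCAB_SIZE = 60_005     # number of embeddings reported in this card
EMBEDDING_DIM = 400     # embedding dimensions reported in this card
PERPLEXITY = 28.71067   # perplexity reported in this card

# One 400-dimensional vector per vocabulary entry.
embedding_params = VOCAB_SIZE * EMBEDDING_DIM

# Assumption: perplexity = exp(cross-entropy in nats).
cross_entropy_nats = math.log(PERPLEXITY)

print(embedding_params)  # 24002000
print(round(cross_entropy_nats, 3))
```

The embedding table alone therefore holds about 24 million parameters.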

Ethical Considerations

This language model is trained on the Thai Wikipedia dump and therefore inherits any biases present in Thai Wikipedia.

Caveats and Recommendations

Using the model requires fastai 1.9; alternatively, it can be used through pythainlp. It supports the Thai language only.