- Developer: Charin Polpanumas
- This report author: Wannaphong Phatthiyaphaibun
- Model date: 2019-06-14
- Model version: 0.32
- Used in PyThaiNLP version: 2.0+
- GitHub: https://github.com/cstorm125/thai2fit
- Notebook for training: https://github.com/cstorm125/thai2fit/blob/96fe40d1a9f270dfe0d3a61d2a93254df4078b0d/thwiki_lm/thwiki_lm.ipynb
- Model type: Language model
- License: MIT License
A pretrained Thai language model, intended for Thai text classification and other downstream tasks.
The model is motivated by known problems in Thai natural language processing. A language model can serve many natural language processing tasks, e.g. text classification, text generation, and more.
Evaluation metrics include perplexity.
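Perplexity is the exponential of the average per-token negative log-likelihood (the cross-entropy). The minimal Python sketch below shows the computation. It assumes you already have the natural-log probability the model assigned to each actual next token. The function names and input format are illustrative, not part of thai2fit or PyThaiNLP.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned to each
    actual next token in the evaluation text (hypothetical input format).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Same quantity when a framework reports mean cross-entropy loss in nats."""
    return math.exp(mean_cross_entropy)


# A model that spreads probability evenly over 4 candidate tokens
# has a perplexity of (approximately) 4.
print(perplexity([math.log(0.25)] * 10))
```

Lower is better: a perplexity of k means the model is, on average, about as uncertain as a uniform choice among k tokens.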
The data comes from a Thai Wikipedia dump last updated February 17, 2019.
The dump is split into 40M/200k/200k tokens for training, validation, and test.
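A token-count split like the one above can be sketched as follows. The sketch assumes a single contiguous token stream where validation and test are taken from the end. The training notebook may split differently, for example by article, and `split_by_tokens` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def split_by_tokens(
    tokens: Sequence[str], n_valid: int, n_test: int
) -> Tuple[List[str], List[str], List[str]]:
    """Contiguous split: the last n_valid + n_test tokens become validation and test."""
    if n_valid < 0 or n_test < 0 or n_valid + n_test > len(tokens):
        raise ValueError("invalid split sizes")
    n_train = len(tokens) - n_valid - n_test
    train = list(tokens[:n_train])
    valid = list(tokens[n_train:n_train + n_valid])
    test = list(tokens[n_train + n_valid:])
    return train, valid, test


# Toy example: 10 tokens, 2 for validation, 2 for test.
train, valid, test = split_by_tokens([str(i) for i in range(10)], 2, 2)
print(len(train), len(valid), len(test))  # 6 2 2
```

With `n_valid = n_test = 200_000` on a stream of roughly 40.4M tokens, this would leave about 40M tokens for training.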
The model achieves a perplexity of 28.71067, with 60,005 embeddings of 400 dimensions.
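These figures can be sanity-checked with simple arithmetic. Treat 28.71067 as the perplexity, which is the metric listed earlier. Expressed in nats, it corresponds to the average per-token cross-entropy $H$ below. The embedding-matrix size $|E|$ follows from the vocabulary size and dimension. The byte count assumes float32 weights, which the card does not state.

```latex
\begin{aligned}
H &= \ln(28.71067) \approx 3.357 \text{ nats per token} \\
|E| &= 60{,}005 \times 400 = 24{,}002{,}000 \text{ parameters} \\
\operatorname{size}(E) &\approx 24{,}002{,}000 \times 4 \text{ bytes} = 96{,}008{,}000 \text{ bytes} \approx 96 \text{ MB}
\end{aligned}
```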
This language model is trained on the Thai Wikipedia dump and therefore inherits any biases present in Thai Wikipedia.
Caveats and Recommendations
To use the model, install fastai 1.9 or load it through PyThaiNLP. It supports the Thai language only.