- Developer: Wannaphong Phatthiyaphaibun
- This report author: Wannaphong Phatthiyaphaibun
- Model date: 2020-10-03
- Model version: 0.2
- Used in PyThaiNLP version: 2.2.4+
- GitHub: https://github.com/PyThaiNLP/pythainlp/pull/479
- CRF Model
- License: CC0
- Segmenting Thai text into clauses (smaller than a sentence but bigger than a word)
- Not suitable for other languages or non-news domains.
- Based on known problems with Thai natural language processing.
- Evaluation metrics include precision, recall, and F1-score.
- Training data: LST20 Corpus train set (news domain)
- Evaluation data: LST20 Corpus test set (news domain)
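
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| B_CLS | 0.90 | 0.94 | 0.92 | 16111 |
| E_CLS | 0.90 | 0.94 | 0.92 | 15947 |
| I_CLS | 0.99 | 0.97 | 0.98 | 169565 |
| micro avg | 0.97 | 0.97 | 0.97 | 201623 |
| macro avg | 0.93 | 0.95 | 0.94 | 201623 |
| weighted avg | 0.97 | 0.97 | 0.97 | 201623 |
| samples avg | 0.94 | 0.94 | 0.94 | 201623 |
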
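
As a quick sanity check on the report above, the per-label F1-scores can be recomputed from the listed precision and recall values. The following is a minimal sketch: the helper name `f1_score` is hypothetical, the inputs are the rounded two-decimal figures from the report, and the macro average is assumed to be the unweighted mean over the three labels.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-label (precision, recall) pairs copied from the report above.
per_label = {
    "B_CLS": (0.90, 0.94),
    "E_CLS": (0.90, 0.94),
    "I_CLS": (0.99, 0.97),
}

f1_by_label = {label: f1_score(p, r) for label, (p, r) in per_label.items()}

# Assumption: macro average = unweighted mean of the per-label F1-scores.
macro_f1 = sum(f1_by_label.values()) / len(f1_by_label)

for label, f1 in f1_by_label.items():
    print(f"{label}: F1 = {f1:.2f}")
print(f"macro avg F1 = {macro_f1:.2f}")
```

Rounded to two decimals, the recomputed values agree with the F1-score column of the report (0.92, 0.92, 0.98, and 0.94 for the macro average).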
- The model is trained on the LST20 Corpus and may inherit biases from that corpus.
Caveats and Recommendations
- The user must perform word segmentation before using this model.
- Thai text only
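
The label names in the evaluation report (`B_CLS`, `I_CLS`, `E_CLS`) suggest that the model tags each pre-segmented word as the beginning, inside, or end of a clause. The sketch below illustrates how such per-word tags could be grouped into clauses. It is an assumption-based illustration, not PyThaiNLP's implementation: the function `tags_to_clauses` is hypothetical, and the example tags are assigned by hand rather than produced by the model.

```python
from typing import List


def tags_to_clauses(words: List[str], tags: List[str]) -> List[List[str]]:
    """Group pre-segmented words into clauses using B_CLS/I_CLS/E_CLS tags."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")

    clauses: List[List[str]] = []
    current: List[str] = []
    for word, tag in zip(words, tags):
        # A begin tag closes any clause that was left open.
        if tag == "B_CLS" and current:
            clauses.append(current)
            current = []
        current.append(word)
        # An end tag closes the current clause.
        if tag == "E_CLS":
            clauses.append(current)
            current = []
    # Keep trailing words that never received an end tag.
    if current:
        clauses.append(current)
    return clauses


# Input must already be word-segmented (see the caveat above).
words = ["ฉัน", "นอน", "และ", "คุณ", "เล่น", "มือถือ"]
# Hand-assigned tags for illustration only; not actual model output.
tags = ["B_CLS", "E_CLS", "B_CLS", "I_CLS", "I_CLS", "E_CLS"]

clauses = tags_to_clauses(words, tags)
print(clauses)
```

With these hand-assigned tags, the six words are grouped into two clauses: the first two words form one clause and the remaining four form the other.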