- Developer: Wannaphong Phatthiyaphaibun
- Report author: Wannaphong Phatthiyaphaibun
- Model date: 2022-10-14
- Model version: 1.0
- Used in PyThaiNLP version: 3.2+
- GitHub: https://github.com/PyThaiNLP/pythainlp/issues/729
- CRF Model
- License: CC0
- Segmenting Thai text into clauses (units smaller than a sentence but larger than a word).
- Not suitable for other languages or non-news domains.
- Based on known problems with Thai natural language processing.
- Evaluation metrics include precision, recall, and F1-score.
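
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| B_CLS | 1.00 | 1.00 | 1.00 | 91698 |
| E_CLS | 1.00 | 1.00 | 1.00 | 91700 |
| I_CLS | 1.00 | 1.00 | 1.00 | 707795 |
| micro avg | 1.00 | 1.00 | 1.00 | 891193 |
| macro avg | 1.00 | 1.00 | 1.00 | 891193 |
| weighted avg | 1.00 | 1.00 | 1.00 | 891193 |
| samples avg | 1.00 | 1.00 | 1.00 | 891193 |
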
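
For readers unfamiliar with how the per-label figures above are defined, the following is a minimal sketch of computing precision, recall, F1-score, and support for each tag. The function name `per_label_scores` and the toy tag sequences are hypothetical and written by hand for illustration; they are not the evaluation script or data behind this model card and do not reproduce its numbers.

```python
from typing import Dict, List


def per_label_scores(gold: List[str], pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, F1 and support for each tag (illustrative helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    scores: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(gold) | set(pred)):
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        scores[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of gold tokens carrying this tag
        }
    return scores


# Hand-written toy tags (assumption: not real model output).
gold = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "E_CLS"]
pred = ["B_CLS", "I_CLS", "I_CLS", "B_CLS", "E_CLS"]
scores = per_label_scores(gold, pred)
```

In this toy case one `E_CLS` token is mislabeled as `I_CLS`, which lowers recall for `E_CLS` and precision for `I_CLS` while leaving `B_CLS` untouched.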
- The model is trained on the Blackboard treebank, so it may inherit biases from that corpus.
Caveats and Recommendations
- The user must perform word segmentation before using this model; the input is a list of words, not raw text.
- Thai text only
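
The word-level input requirement can be illustrated with a small sketch of how per-word tags map to clauses. This is not PyThaiNLP's implementation: the helper `tags_to_clauses` is hypothetical, the tags are written by hand rather than predicted by the CRF model, and the only thing assumed from this card is the `B_CLS` / `I_CLS` / `E_CLS` label set.

```python
from typing import List


def tags_to_clauses(words: List[str], tags: List[str]) -> List[List[str]]:
    """Group pre-segmented words into clauses from B_CLS/I_CLS/E_CLS tags.

    Hypothetical helper for illustration only.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")

    clauses: List[List[str]] = []
    current: List[str] = []
    for word, tag in zip(words, tags):
        if tag == "B_CLS" and current:
            # A new clause begins before the previous one was closed.
            clauses.append(current)
            current = []
        current.append(word)
        if tag == "E_CLS":
            # End of clause: emit it and start collecting the next one.
            clauses.append(current)
            current = []
    if current:
        # Keep any trailing words that never received an E_CLS tag.
        clauses.append(current)
    return clauses


# The words are already segmented; raw text would have to be tokenized first.
words = ["ฉัน", "นอน", "และ", "คุณ", "ก็", "นอน"]
# Hand-written tags (assumption: not real model output).
tags = ["B_CLS", "E_CLS", "B_CLS", "I_CLS", "I_CLS", "E_CLS"]
clauses = tags_to_clauses(words, tags)
```

Here the six words are grouped into two clauses, one per `B_CLS` ... `E_CLS` span.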