Blackboard CLS
V1.0
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- Report author: Wannaphong Phatthiyaphaibun
- Model date: 2022-10-14
- Model version: 1.0
- Used in PyThaiNLP version: 3.2+
- Filename: pythainlp/corpus/blackboard-cls_v1.0.crfsuite
- GitHub: https://github.com/PyThaiNLP/pythainlp/issues/729
- Model type: CRF (conditional random field)
- License: CC0
Intended Use
- Segmenting Thai text into clauses (units smaller than a sentence but larger than a word).
- Not suitable for other languages or for non-news domains.
Factors
- Based on known problems in Thai natural language processing.
Metrics
- Evaluation metrics are precision, recall, and F1-score, reported per label and as averages.
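As an illustration of how these metrics are defined at the token level, here is a minimal sketch in plain Python. The function name `per_label_scores` and the toy tag sequences are hypothetical and are not part of PyThaiNLP or of the evaluation that produced this report; only the label names come from this model card.

```python
from collections import Counter


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for two equal-length tag lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # label p was predicted but is wrong here
            fn[g] += 1  # true label g was missed here

    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]  # times the label was predicted
        actual = tp[label] + fn[label]     # times the label is in the gold data
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data invented for illustration (not taken from the Blackboard treebank).
gold = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "E_CLS"]
pred = ["B_CLS", "I_CLS", "I_CLS", "B_CLS", "E_CLS"]
scores = per_label_scores(gold, pred)

for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case one E_CLS token is mislabeled as I_CLS, which lowers recall for E_CLS and precision for I_CLS while leaving B_CLS untouched.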
Training Data
Blackboard treebank
Evaluation Data
Blackboard treebank
Quantitative Analyses
label         precision  recall  f1-score  support
B_CLS         1.00       1.00    1.00      91698
E_CLS         1.00       1.00    1.00      91700
I_CLS         1.00       1.00    1.00      707795
micro avg     1.00       1.00    1.00      891193
macro avg     1.00       1.00    1.00      891193
weighted avg  1.00       1.00    1.00      891193
samples avg   1.00       1.00    1.00      891193
Ethical Considerations
- The model is trained on the Blackboard treebank, so it may inherit biases present in that corpus.
Caveats and Recommendations
- Input text must be word-segmented before it is passed to this model.
- Thai text only
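To make the word-segmentation requirement concrete, the sketch below shows how a list of already segmented words and one clause tag per word could be grouped into clauses. It assumes that B_CLS marks the first word of a clause, I_CLS a word inside a clause, and E_CLS the last word; this reading of the labels is an assumption, not something this report states. The helper `group_into_clauses` is hypothetical and the tags are written by hand, not produced by the model. In practice the word list would come from a word tokenizer such as PyThaiNLP's `word_tokenize`.

```python
def group_into_clauses(words, tags):
    """Group pre-segmented words into clauses using B_CLS/I_CLS/E_CLS tags.

    Assumed tag meaning: B_CLS = clause start, I_CLS = inside, E_CLS = clause end.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")

    clauses, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B_CLS" and current:
            # A new clause starts before the previous one was closed.
            clauses.append(current)
            current = []
        current.append(word)
        if tag == "E_CLS":
            clauses.append(current)
            current = []
    if current:
        # Keep any trailing words that never received an E_CLS tag.
        clauses.append(current)
    return clauses


# Words are already segmented; tags are hand-written for illustration only.
words = ["ฉัน", "นอน", "และ", "คุณ", "เล่น", "มือถือ"]
tags = ["B_CLS", "E_CLS", "B_CLS", "I_CLS", "I_CLS", "E_CLS"]
clauses = group_into_clauses(words, tags)
print(clauses)
```

The function takes words rather than raw text because the tagger assigns one label per word, which is why raw, unsegmented Thai text cannot be used directly.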