CRFcut

v1.0

Model Details

  • Developer: Chonlapat Patanajirasit
  • This report author: Wannaphong Phatthiyaphaibun
  • Model date: 2020-05-09
  • Model version: 1.0
  • Used in PyThaiNLP version: 2.2 +
  • Filename: pythainlp/corpus/sentenceseg_crfcut.model
  • GitHub: https://github.com/vistec-AI/crfcut
  • CRF Model
  • License: CC0

Intended Use - Segmenting Thai text into sentences.

Factors - Based on known problems with thai natural Language processing.

Metrics - Evaluation metrics include precision, recall and f1-score.

Training Data Ted + Orchid + Fake review

Evaluation Data

Ted + Orchid + Fake review dataset validate

Quantitative Analyses

The result of CRF-Cut is trained by different datasets are as follows:

dataset-train dataset-validate I-precision I-recall I-fscore E-precision E-recall E-fscore space-correct
Ted Ted 0.99 0.99 0.99 0.74 0.70 0.72 0.82
Ted Orchid 0.95 0.99 0.97 0.73 0.24 0.36 0.73
Ted Fake review 0.98 0.99 0.98 0.86 0.70 0.77 0.78
Orchid Ted 0.98 0.98 0.98 0.56 0.59 0.58 0.71
Orchid Orchid 0.98 0.99 0.99 0.85 0.71 0.77 0.87
Orchid Fake review 0.97 0.99 0.98 0.77 0.63 0.69 0.70
Fake review Ted 0.99 0.95 0.97 0.42 0.85 0.56 0.56
Fake review Orchid 0.97 0.96 0.96 0.48 0.59 0.53 0.67
Fake review Fake review 1 1 1 0.98 0.96 0.97 0.97
Ted + Orchid + Fake review Ted 0.99 0.98 0.99 0.66 0.77 0.71 0.78
Ted + Orchid + Fake review Orchid 0.98 0.98 0.98 0.73 0.66 0.69 0.82
Ted + Orchid + Fake review Fake review 1 1 1 0.98 0.95 0.96 0.96

Ethical Considerations

no ideas

Caveats and Recommendations

  • Thai text only