Part-of-speech tagging

orchid perceptron

Model Details

Intended Use

  • Part-of-speech tagging for Thai.
  • Not suitable for other languages, or for domains not covered by the ORCHID corpus.


Factors - Based on known problems with Thai natural language processing.

Metrics

  • Evaluation metrics include precision, recall and F1-score.

Training Data

ORCHID corpus

Evaluation Data

ORCHID corpus

Quantitative Analyses

No data (this corpus does not have a test set).

Ethical Considerations

  • It is trained on the ORCHID corpus, so it may inherit biases present in that corpus.

Caveats and Recommendations

  • Accepts Thai word tokens only (the input must already be word-segmented).
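The caveat above means the tagger consumes a list of already-segmented word tokens rather than raw Thai text. The following is a minimal sketch of a perceptron tagger over such token lists, assuming a plain (non-averaged) perceptron with a small hypothetical feature set (current word, neighbouring words, previous tag). It is not PyThaiNLP's implementation; production perceptron taggers typically average the weights and use richer features. The class name, feature templates and toy sentences are illustrative, and the short labels (NN, VV, PR) are not ORCHID tags.

```python
from typing import Dict, List, Sequence, Tuple


def features(words: Sequence[str], i: int, prev_tag: str) -> List[str]:
    """Hypothetical feature templates: current word, neighbouring words, previous tag."""
    return [
        "bias",
        "w=" + words[i],
        "prev_w=" + (words[i - 1] if i > 0 else "<s>"),
        "next_w=" + (words[i + 1] if i + 1 < len(words) else "</s>"),
        "prev_t=" + prev_tag,
    ]


class PerceptronTagger:
    """Plain (non-averaged) perceptron tagger; an illustrative sketch only."""

    def __init__(self, tags: Sequence[str]) -> None:
        self.tags = sorted(tags)
        self.weights: Dict[Tuple[str, str], float] = {}  # (feature, tag) -> weight

    def _score(self, feats: List[str], tag: str) -> float:
        return sum(self.weights.get((f, tag), 0.0) for f in feats)

    def _predict(self, feats: List[str]) -> str:
        # max() keeps the first maximum, so ties go to the alphabetically first tag.
        return max(self.tags, key=lambda t: self._score(feats, t))

    def train(self, data: List[Tuple[List[str], List[str]]], epochs: int = 5) -> None:
        for _ in range(epochs):
            for words, gold in data:
                prev = "<s>"
                for i, gold_tag in enumerate(gold):
                    feats = features(words, i, prev)
                    guess = self._predict(feats)
                    if guess != gold_tag:
                        # Reward the gold tag's features and penalise the wrong guess's.
                        for f in feats:
                            self.weights[(f, gold_tag)] = self.weights.get((f, gold_tag), 0.0) + 1.0
                            self.weights[(f, guess)] = self.weights.get((f, guess), 0.0) - 1.0
                    prev = gold_tag  # condition on the gold previous tag while training

    def tag(self, words: List[str]) -> List[Tuple[str, str]]:
        if isinstance(words, str):
            raise TypeError("expected a list of pre-segmented word tokens, not a raw string")
        out: List[Tuple[str, str]] = []
        prev = "<s>"
        for i, w in enumerate(words):
            t = self._predict(features(words, i, prev))
            out.append((w, t))
            prev = t  # condition on our own previous prediction at inference time
        return out


# Toy, hand-made training data: "cat eat fish" and "I eat rice".
TRAIN = [
    (["แมว", "กิน", "ปลา"], ["NN", "VV", "NN"]),
    (["ฉัน", "กิน", "ข้าว"], ["PR", "VV", "NN"]),
]
tagger = PerceptronTagger({"NN", "PR", "VV"})
tagger.train(TRAIN)
print(tagger.tag(["แมว", "กิน", "ปลา"]))  # [('แมว', 'NN'), ('กิน', 'VV'), ('ปลา', 'NN')]
```

Passing an unsegmented string such as `"แมวกินปลา"` raises `TypeError` in this sketch, which mirrors the "word tokens only" caveat: segmentation has to happen before tagging.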

LST20 perceptron

Model Details

Intended Use

  • Part-of-speech tagging for Thai.
  • Not suitable for other languages, or for domains not covered by the LST20 corpus.

Factors - Based on known problems with Thai natural language processing.

Metrics

  • Evaluation metrics include precision, recall and F1-score.
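A minimal sketch of how per-tag precision, recall and F1-score can be computed from aligned gold and predicted tag sequences, in pure Python. The function name and the toy sequences are hypothetical; tags that are never predicted (or never occur) are scored 0 here rather than left undefined.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_tag_scores(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float, int]]:
    """Return {tag: (precision, recall, f1, support)} for aligned gold/predicted tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token by token")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for tag g
        else:
            fp[p] += 1  # predicted p where the gold tag was different
            fn[g] += 1  # missed an occurrence of gold tag g
    scores = {}
    for tag in sorted(set(gold) | set(pred)):
        prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[tag] = (prec, rec, f1, tp[tag] + fn[tag])  # support = gold count
    return scores


gold = ["NN", "VV", "NN", "PU", "AJ"]
pred = ["NN", "VV", "VV", "PU", "NN"]
for tag, (p, r, f, s) in per_tag_scores(gold, pred).items():
    print(f"{tag:>4} {p:.2f} {r:.2f} {f:.2f} {s}")
```

Here VV gets precision 0.50 (one of its two predictions is right) but recall 1.00 (its only gold occurrence was found), while AJ scores 0 because its single occurrence was tagged as NN.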

Training Data

LST20 Corpus Train set

Evaluation Data

LST20 Corpus Test set

Quantitative Analyses

              precision    recall  f1-score   support

          AJ       0.90      0.87      0.88      4403
          AV       0.88      0.79      0.83      6722
          AX       0.95      0.94      0.95      7556
          CC       0.94      0.97      0.95     17613
          CL       0.87      0.85      0.86      3739
          FX       0.99      0.99      0.99      6918
          IJ       1.00      0.25      0.40         4
          NG       1.00      1.00      1.00      1694
          NN       0.97      0.98      0.98     58568
          NU       0.98      0.98      0.98      6256
          PA       0.88      0.89      0.88       194
          PR       0.88      0.85      0.86      2139
          PS       0.94      0.93      0.94     10886
          PU       1.00      1.00      1.00     37973
          VV       0.95      0.97      0.96     42586
          XX       0.00      0.00      0.00        27

    accuracy                           0.96    207278
   macro avg       0.88      0.83      0.84    207278
weighted avg       0.96      0.96      0.96    207278
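The report layout appears to follow scikit-learn's `classification_report`, where `macro avg` is the unweighted mean over tags and `weighted avg` weights each tag by its support. The sketch below recomputes the F1 averages from the rounded rows of the LST20 table (so agreement is only to two decimal places) and shows that the gap between the macro F1 of 0.84 and the weighted F1 of 0.96 comes largely from the two rare tags IJ (support 4) and XX (support 27). The helper names are hypothetical.

```python
from typing import List, Tuple

# (tag, f1-score, support) copied from the LST20 report; f1 values are rounded to 2 dp.
LST20_F1: List[Tuple[str, float, int]] = [
    ("AJ", 0.88, 4403), ("AV", 0.83, 6722), ("AX", 0.95, 7556), ("CC", 0.95, 17613),
    ("CL", 0.86, 3739), ("FX", 0.99, 6918), ("IJ", 0.40, 4), ("NG", 1.00, 1694),
    ("NN", 0.98, 58568), ("NU", 0.98, 6256), ("PA", 0.88, 194), ("PR", 0.86, 2139),
    ("PS", 0.94, 10886), ("PU", 1.00, 37973), ("VV", 0.96, 42586), ("XX", 0.00, 27),
]


def macro_avg(rows: List[Tuple[str, float, int]]) -> float:
    """Unweighted mean over tags: every tag counts equally, however rare."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_avg(rows: List[Tuple[str, float, int]]) -> float:
    """Mean weighted by support: frequent tags such as NN and VV dominate."""
    total = sum(s for _, _, s in rows)
    return sum(f1 * s for _, f1, s in rows) / total


frequent = [row for row in LST20_F1 if row[0] not in ("IJ", "XX")]
print(f"macro f1    = {macro_avg(LST20_F1):.2f}")     # 0.84
print(f"weighted f1 = {weighted_avg(LST20_F1):.2f}")  # 0.96
print(f"macro f1 without IJ and XX = {macro_avg(frequent):.2f}")  # 0.93
```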

Ethical Considerations

  • It is trained on the LST20 corpus, so it may inherit biases present in that corpus.

Caveats and Recommendations

  • Accepts Thai word tokens only (the input must already be word-segmented).


UD_Thai-PUD Part-of-speech


Model Details

Intended Use

  • Part-of-speech tagging for Thai.
  • Not suitable for other languages, or for domains not covered by the UD_Thai-PUD corpus.

Factors - Based on known problems with Thai natural language processing.

Metrics - None

Training Data

UD_Thai-PUD v2.2

Evaluation Data


Quantitative Analyses

(This corpus does not have a test set.)

Ethical Considerations - None identified.

Caveats and Recommendations

  • Accepts Thai word tokens only (the input must already be word-segmented).


Model Details

  • Developer: Wannaphong Phatthiyaphaibun
  • This report author: Wannaphong Phatthiyaphaibun
  • Model date: 2021-07-31
  • Model version: 0.2
  • Used in PyThaiNLP version: 2.4+
  • Filename: pythainlp/corpus/pos_ud_unigram-v0.2.json and pythainlp/corpus/pos_ud_unigram-v0.2.json
  • Model type: unigram model and perceptron model
  • License: CC0
  • GitHub:
  • Training notebook:
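The Model Details list names a unigram model alongside the perceptron model. As a rough sketch, assuming "unigram" here means the usual most-frequent-tag-per-word baseline (this is not PyThaiNLP's code), such a tagger can be trained from (word, tag) pairs as below. The function names, the `NOUN` fallback for unseen words, and the toy UPOS-style training sentences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_unigram(tagged_sentences: List[List[Tuple[str, str]]]) -> Dict[str, str]:
    """Map each word to the tag it received most often in the training data."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def unigram_tag(model: Dict[str, str], words: List[str], default: str = "NOUN") -> List[Tuple[str, str]]:
    """Tag pre-segmented words; unseen words fall back to a hypothetical default tag."""
    return [(w, model.get(w, default)) for w in words]


# Toy training data: "cat eat fish" and "I eat rice".
TRAIN = [
    [("แมว", "NOUN"), ("กิน", "VERB"), ("ปลา", "NOUN")],
    [("ฉัน", "PRON"), ("กิน", "VERB"), ("ข้าว", "NOUN")],
]
model = train_unigram(TRAIN)
print(unigram_tag(model, ["ฉัน", "กิน", "ปลา", "หมา"]))
# [('ฉัน', 'PRON'), ('กิน', 'VERB'), ('ปลา', 'NOUN'), ('หมา', 'NOUN')]
```

A unigram tagger ignores context entirely, which is why it is usually paired with (or serves as a fallback for) a context-aware model such as a perceptron.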

Intended Use

  • Part-of-speech tagging for Thai.
  • Not suitable for other languages, or for domains not covered by the UD_Thai-PUD corpus.

Factors - Based on known problems with Thai natural language processing.

Metrics - None

Training Data

UD_Thai-PUD v2.8

Evaluation Data


Quantitative Analyses - None

Ethical Considerations - None identified.

Caveats and Recommendations

  • Accepts Thai word tokens only (the input must already be word-segmented).

blackboard perceptron

Model Details

Intended Use

  • Part-of-speech tagging for Thai.
  • Not suitable for other languages, or for domains not covered by the Blackboard treebank.

Factors - Based on known problems with Thai natural language processing.

Metrics

  • Evaluation metrics include precision, recall and F1-score.

Training Data

Blackboard treebank

Evaluation Data

Blackboard treebank

Quantitative Analyses

              precision    recall  f1-score   support

          AJ       0.90      0.90      0.90     16030
          AV       0.92      0.91      0.91     38078
          AX       0.97      0.96      0.97     44719
          CC       0.98      0.99      0.99    127801
          CL       0.93      0.87      0.90      6738
          FX       1.00      1.00      1.00     28991
          IJ       1.00      0.58      0.74        12
          NG       1.00      1.00      1.00     12121
          NN       0.99      0.99      0.99    283971
          NU       0.98      0.97      0.98     19220
          PA       0.98      0.88      0.93      1916
          PR       0.93      0.89      0.91     12869
          PS       0.96      0.96      0.96     39317
          PU       1.00      1.00      1.00      1576
          VV       0.98      0.98      0.98    257831
          XX       1.00      0.50      0.67         4

    accuracy                           0.98    891194
   macro avg       0.97      0.90      0.93    891194
weighted avg       0.98      0.98      0.98    891194
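Because every token has exactly one gold tag, the number of correctly tagged tokens equals the sum over tags of recall times support, so overall accuracy coincides with the support-weighted recall. The sketch below checks this against the Blackboard table; recall values are rounded in the report, so the recomputed accuracy is approximate. The helper name is hypothetical.

```python
from typing import List, Tuple

# (tag, recall, support) copied from the Blackboard report; recall is rounded to 2 dp.
BLACKBOARD_RECALL: List[Tuple[str, float, int]] = [
    ("AJ", 0.90, 16030), ("AV", 0.91, 38078), ("AX", 0.96, 44719), ("CC", 0.99, 127801),
    ("CL", 0.87, 6738), ("FX", 1.00, 28991), ("IJ", 0.58, 12), ("NG", 1.00, 12121),
    ("NN", 0.99, 283971), ("NU", 0.97, 19220), ("PA", 0.88, 1916), ("PR", 0.89, 12869),
    ("PS", 0.96, 39317), ("PU", 1.00, 1576), ("VV", 0.98, 257831), ("XX", 0.50, 4),
]


def accuracy_from_recall(rows: List[Tuple[str, float, int]]) -> Tuple[float, int]:
    """Accuracy = sum(recall_c * support_c) / total tokens, for single-label tagging."""
    total = sum(s for _, _, s in rows)
    correct = sum(r * s for _, r, s in rows)  # approximate count, since recall is rounded
    return correct / total, total


acc, total = accuracy_from_recall(BLACKBOARD_RECALL)
print(total)         # 891194
print(f"{acc:.2f}")  # 0.98
```

The same identity explains why the `weighted avg` recall row always matches the `accuracy` row in reports of this shape.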

Ethical Considerations

  • It is trained on the Blackboard treebank, so it may inherit biases present in that treebank.

Caveats and Recommendations

  • Accepts Thai word tokens only (the input must already be word-segmented).
