Thai NER v2.0
This version was released at Hugging Face Hub, and the model was trained by WangchanBERTa base model.
Training script and split data: https://zenodo.org/record/7761354
Dataset
Size
- Train: 3,938 docs
- Validation: 1,313 docs
- Test: 1,313 Docs
Some data come from crowdsourcing between Dec 2018 - Nov 2019. https://github.com/wannaphong/thai-ner
Domain
- News (It, politics, economy, social)
- PR (KKU news)
- general
Source
- I use sone data from Nutcha’s theses (http://pioneer.chula.ac.th/~awirote/Data-Nutcha.zip) and improve data by rechecking and adding more tagging.
- Blognone.com - It news
- thaigov.go.th
- kku.ac.th
And more (the lists are lost.)
Tag
- DATA - date
- TIME - time
- EMAIL - email
- LEN - length
- LOCATION - Location
- ORGANIZATION - Company / Organization
- PERSON - Person name
- PHONE - phone number
- TEMPERATURE - temperature
- URL - URL
- ZIP - Zip code
- MONEY - the amount
- LAW - legislation
- PERCENT - PERCENT
Download: HuggingFace Hub
Model
The model was trained by WangchanBERTa base model.
Validation from the Validation set
- Precision: 0.830336794125095
- Recall: 0.873701039168665
- F1: 0.8514671513892494
- Accuracy: 0.9736483416628805
Test from the Test set
- Precision: 0.8199168093956447
- Recall: 0.8781446540880503
- F1: 0.8480323927622422
- Accuracy: 0.9724346779516247
Download: HuggingFace Hub
Cite
Wannaphong Phatthiyaphaibun. (2022). Thai NER 2.0 (2.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7761354
or BibTeX
@dataset{wannaphong_phatthiyaphaibun_2022_7761354,
author = {Wannaphong Phatthiyaphaibun},
title = {Thai NER 2.0},
month = sep,
year = 2022,
publisher = {Zenodo},
version = {2.0},
doi = {10.5281/zenodo.7761354},
url = {https://doi.org/10.5281/zenodo.7761354}
}