Han-solo - Thai syllable segmenter Released!

July 30, 2023

🪿 Han-solo: Thai syllable segmenter

This project aims to build a Thai syllable segmenter that works on Thai social media text. It uses data from the Wisesight Sentiment Corpus.

This work uses two datasets:

  1. Nutcha Dataset (Thai news domain). For details, see data_nutcha/
  2. Han-solo: Thai syllable segmenter dataset (Thai social media domain). For details, see Han-solo: Thai syllable segmenter

The model is a conditional random field (CRF) trained with the same features as ssg.
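
To give a rough idea of how a CRF can be applied to syllable segmentation, here is a minimal sketch that frames the task as per-character labeling. It is an illustration only: the function names, the window size, and the feature names are assumptions made for this example, and it does not reproduce the actual feature set of ssg or Han-solo.

```python
# Hypothetical helpers for illustration; not the ssg or Han-solo implementation.

def syllables_to_labels(syllables):
    """Flatten syllables into characters with labels: "1" starts a syllable, "0" continues it."""
    chars, labels = [], []
    for syllable in syllables:
        for i, ch in enumerate(syllable):
            chars.append(ch)
            labels.append("1" if i == 0 else "0")
    return chars, labels


def char_features(chars, i, window=2):
    """Build a feature dict for position i from the characters in a fixed window around it."""
    feats = {"bias": 1.0}
    for offset in range(-window, window + 1):
        j = i + offset
        # Positions outside the text get a padding symbol.
        feats[f"char[{offset}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


def labels_to_syllables(chars, labels):
    """Rebuild syllables from characters and their predicted labels."""
    syllables = []
    for ch, label in zip(chars, labels):
        if label == "1" or not syllables:
            syllables.append(ch)
        else:
            syllables[-1] += ch
    return syllables


# One training example: a word that is already split into syllables.
chars, labels = syllables_to_labels(["สวัส", "ดี"])
features = [char_features(chars, i) for i in range(len(chars))]
```

Sequences of feature dictionaries and label lists like these are the kind of input that CRF toolkits such as python-crfsuite train on. At prediction time, the labels produced by the model can be turned back into syllables with `labels_to_syllables`.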

This project is developed by 🪿 Wannaphong Phatthiyaphaibun.

GitHub: PyThaiNLP/Han-solo
