List Corpus & Models

A list of PyThaiNLP corpora and models.

blackboard_pt_tagger

Part-of-speech tagging (perceptron) from blackboar ...

blackboard_unigram_tagger

Part-of-speech tagging (unigram) from blackboard t ...

g2p

Thai grapheme to phoneme

lst20-cls

lst20-cls (LST20)

ltw2v

LTW2V: The Large Thai Word2Vec

ltw2v_v1.0_15_window

LTW2V: The Large Thai Word2Vec v1.0 (15 window)

ltw2v_v1.0_5_window

LTW2V: The Large Thai Word2Vec v1.0 (5 window)

onnx_lst20ner

LST20 NER model

oscar_icu

Thai unigram word frequency from OSCAR Corpus (icu ...

pos_lst20_perceptron

Perceptron POS tagger (LST20)

pos_lst20_unigram

Unigram POS tagger (LST20)

scb_1m_en-th_moses

SCB_1M+TBASE_en-th_moses-spm.

scb_1m_en-th_spm

scb_1m_en-th_spm

scb_1m_th-en_newmm

SCB_1M+TBASE_th-en_newmm-moses.

scb_1m_th-en_spm

SCB_1M+TBASE_th-en_spm-spm.

scb_en_th

scb_1m_en-th_spm

scb_th_en

scb_1m_en-th_spm

test_zip

It's a test file.

thai-g2p

Thai grapheme to phoneme (PyTorch)

thai2fit_wv

thai2vec word embeddings

thai2rom-dataset

Thai romanization model

thai2rom-pytorch

Thai romanization model (LSTM)

thai2rom-pytorch-attn

Thai romanization model (LSTM-Attention)

thai_dict

This dataset is collected from Thai Wiktionary.

thai_nner

Thai Nested Named Entity Recognition

thai_synonym

Synonyms for Thai (open source & open data)

thai_w2p

Thai Word-to-Phoneme (W2P) converter

thainer

Thai Named Entity Recognition

thainer-1.4

Thai Named Entity Recognition 1.4 for PyThaiNLP 2. ...

tnc_bigram_word_freqs

Bigram word frequency from Thai National Corpus (T ...

tnc_trigram_word_freqs

Trigram word frequency from Thai National Corpus ( ...

wiki_itos_lstm

ULMFiT index-to-text mapping for LSTM

wiki_lm_lstm

Wikipedia-pretrained ULMFiT language model for LST ...