pythainlp.parse

The pythainlp.parse module provides dependency parsing for Thai.

Modules

pythainlp.parse.dependency_parsing(text: str, model: Optional[str] = None, tag: str = 'str', engine: str = 'esupar') → Union[List[List[str]], str]

Dependency Parsing

Parameters
  • text (str) – Thai text to parse

  • model (str) – model to use with the engine (esupar and transformers_ud engines only)

  • tag (str) – output format: 'str' for a CoNLL-U string or 'list' for a list of lists of strings (see the sketch below)

  • engine (str) – the name of the dependency parser

Returns

str (CoNLL-U) or list

Return type

Union[List[List[str]], str]
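
The tag parameter switches between these two return types. A minimal sketch, assuming the default esupar engine; the example sentence is illustrative and the exact fields produced depend on the engine and model:

from pythainlp.parse import dependency_parsing

text = "ผมเป็นคนดี"

# tag="str" (default): one CoNLL-U formatted string
conllu = dependency_parsing(text, tag="str", engine="esupar")
print(type(conllu))  # <class 'str'>

# tag="list": List[List[str]]; assumed here to be one inner list of CoNLL-U fields per token
tokens = dependency_parsing(text, tag="list", engine="esupar")
print(type(tokens))  # <class 'list'>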

Options for engine
  • esupar (default) - Tokenizer, POS-tagger, and dependency-parser with BERT/RoBERTa/DeBERTa models. GitHub

  • spacy_thai - Tokenizer, POS-tagger, and dependency-parser for Thai language, working on Universal Dependencies. GitHub

  • transformers_ud - TransformersUD GitHub

Options for model (esupar engine)
  • th (default) - KoichiYasuoka/roberta-base-thai-spm-upos model Huggingface

  • KoichiYasuoka/deberta-base-thai-upos - DeBERTa(V2) model pre-trained on Thai Wikipedia texts for POS-tagging and dependency-parsing Huggingface

  • KoichiYasuoka/roberta-base-thai-syllable-upos - RoBERTa model pre-trained on Thai Wikipedia texts for POS-tagging and dependency-parsing. (syllable level) Huggingface

  • KoichiYasuoka/roberta-base-thai-char-upos - RoBERTa model pre-trained on Thai Wikipedia texts for POS-tagging and dependency-parsing. (char level) Huggingface

If you want to train a model for esupar, see Huggingface.
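
A minimal sketch of picking one of the esupar models above by passing its Hugging Face name through the model parameter (the first call typically downloads the model):

from pythainlp.parse import dependency_parsing

text = "ผมเป็นคนดี"

# character-level RoBERTa model instead of the default "th"
print(dependency_parsing(text, model="KoichiYasuoka/roberta-base-thai-char-upos", engine="esupar"))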

Options for model (transformers_ud engine)
  • KoichiYasuoka/deberta-base-thai-ud-head (default) - DeBERTa(V2) model pretrained on Thai Wikipedia texts for dependency-parsing (head-detection on Universal Dependencies) as question-answering, derived from deberta-base-thai and trained on th_blackboard.conll. Huggingface

  • KoichiYasuoka/roberta-base-thai-spm-ud-head - RoBERTa model pretrained on Thai Wikipedia texts for dependency-parsing. Huggingface
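
A minimal sketch of the transformers_ud engine; omitting model uses the default KoichiYasuoka/deberta-base-thai-ud-head listed above:

from pythainlp.parse import dependency_parsing

text = "ผมเป็นคนดี"

# default model (KoichiYasuoka/deberta-base-thai-ud-head)
print(dependency_parsing(text, engine="transformers_ud"))

# explicit model choice
print(dependency_parsing(text, model="KoichiYasuoka/roberta-base-thai-spm-ud-head", engine="transformers_ud"))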

Example

from pythainlp.parse import dependency_parsing

print(dependency_parsing("ผมเป็นคนดี", engine="esupar"))
# output:
# 1       ผม      _       PRON    _       _       3       nsubj   _       SpaceAfter=No
# 2       เป็น     _       VERB    _       _       3       cop     _       SpaceAfter=No
# 3       คน      _       NOUN    _       _       0       root    _       SpaceAfter=No
# 4       ดี       _       VERB    _       _       3       acl     _       SpaceAfter=No

print(dependency_parsing("ผมเป็นคนดี", engine="spacy_thai"))
# output:
# 1       ผม              PRON    PPRS    _       2       nsubj   _       SpaceAfter=No
# 2       เป็น             VERB    VSTA    _       0       ROOT    _       SpaceAfter=No
# 3       คนดี             NOUN    NCMN    _       2       obj     _       SpaceAfter=No