pythainlp.parse

The pythainlp.parse module provides dependency parsing for the Thai language. Dependency parsing is a fundamental task in natural language processing that involves identifying the grammatical relationships between words in a sentence, which helps to analyze sentence structure and meaning.

Modules

dependency_parsing

pythainlp.parse.dependency_parsing(text: str, model: str | None = None, tag: str = 'str', engine: str = 'esupar') → List[List[str]] | str

Dependency Parsing

Parameters:
  • text (str) – text to apply dependency parsing to

  • model (str | None) – model to use with the engine (for esupar, transformers_ud, and ud_goeswith)

  • tag (str) – output type: 'str' (default) or 'list'

  • engine (str) – name of the dependency parser to use

Returns:

a CoNLL-U formatted str (tag='str') or a List of token rows (tag='list')

Return type:

Union[List[List[str]], str]
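Because the return type depends on tag, downstream code often needs to accept both forms. The following is a minimal sketch, not part of pythainlp: to_rows is a hypothetical helper, and it assumes that the string output is tab-separated CoNLL-U with one token per line. The sample text is copied from the esupar example output in this document.

```python
from typing import List, Union

# Sample CoNLL-U text copied from the esupar example in this document.
# Assumption: fields are tab-separated, as in the CoNLL-U format.
SAMPLE = "\n".join([
    "1\tผม\t_\tPRON\t_\t_\t3\tnsubj\t_\tSpaceAfter=No",
    "2\tเป็น\t_\tVERB\t_\t_\t3\tcop\t_\tSpaceAfter=No",
    "3\tคน\t_\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\tดี\t_\tVERB\t_\t_\t3\tacl\t_\tSpaceAfter=No",
])


def to_rows(result: Union[List[List[str]], str]) -> List[List[str]]:
    """Normalize either return type into a list of per-token field lists.

    Hypothetical helper: strings are split into lines and tab-separated
    fields (blank lines and '#' comment lines are skipped); lists are
    copied unchanged.
    """
    if isinstance(result, str):
        return [
            line.split("\t")
            for line in result.splitlines()
            if line.strip() and not line.startswith("#")
        ]
    return [list(row) for row in result]


rows = to_rows(SAMPLE)
for row in rows:
    # Columns 0, 1, 3, 6, 7 are ID, FORM, UPOS, HEAD, DEPREL in CoNLL-U.
    print(row[0], row[1], row[3], row[6], row[7])
```

With pythainlp installed, the same helper could be applied to the value returned by dependency_parsing, whichever tag was requested.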

Options for engine
  • esupar (default) - Tokenizer, POS tagger, and dependency parser using BERT/RoBERTa/DeBERTa models.

  • spacy_thai - Tokenizer, POS tagger, and dependency parser for the Thai language, using Universal Dependencies.

  • transformers_ud - TransformersUD

  • ud_goeswith - POS tagging and dependency parsing using goeswith for subwords

Options for model (esupar engine)
  • th (default) - KoichiYasuoka/roberta-base-thai-spm-upos model

  • KoichiYasuoka/deberta-base-thai-upos - DeBERTa(V2) model pre-trained on Thai Wikipedia texts for POS tagging and dependency parsing

  • KoichiYasuoka/roberta-base-thai-syllable-upos - RoBERTa model pre-trained on Thai Wikipedia texts for POS tagging and dependency parsing (syllable level)

  • KoichiYasuoka/roberta-base-thai-char-upos - RoBERTa model pre-trained on Thai Wikipedia texts for POS tagging and dependency parsing (character level)

If you want to train models for esupar, you can read Huggingface

Options for model (transformers_ud engine)
  • KoichiYasuoka/deberta-base-thai-ud-head (default) - DeBERTa(V2) model pre-trained on Thai Wikipedia texts for dependency parsing (head detection using Universal Dependencies) and question answering, derived from deberta-base-thai and trained on th_blackboard.conll

  • KoichiYasuoka/roberta-base-thai-spm-ud-head - RoBERTa model pre-trained on Thai Wikipedia texts for dependency parsing

Options for model (ud_goeswith engine)
  • KoichiYasuoka/deberta-base-thai-ud-goeswith (default) - DeBERTa(V2) model pre-trained on Thai Wikipedia texts for POS tagging and dependency parsing (using goeswith for subwords)
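The engine and model options above can be summarized as a small lookup table. This is only an illustrative sketch: DEFAULT_MODELS and resolve_model are hypothetical names, not part of pythainlp, and the table simply restates the defaults listed in this document. dependency_parsing already applies a default when model is None, so a helper like this is useful mainly for validating or logging the choice before a model is loaded.

```python
from typing import Dict, Optional

# Default model per engine, as listed in this document.
# None means no model option is documented for that engine.
DEFAULT_MODELS: Dict[str, Optional[str]] = {
    "esupar": "th",
    "spacy_thai": None,
    "transformers_ud": "KoichiYasuoka/deberta-base-thai-ud-head",
    "ud_goeswith": "KoichiYasuoka/deberta-base-thai-ud-goeswith",
}


def resolve_model(engine: str = "esupar", model: Optional[str] = None) -> Optional[str]:
    """Return the model name to use for an engine (hypothetical helper).

    Raises ValueError for an engine name not listed in this document;
    otherwise returns the explicit model, or the documented default.
    """
    if engine not in DEFAULT_MODELS:
        known = ", ".join(sorted(DEFAULT_MODELS))
        raise ValueError(f"unknown engine {engine!r}; expected one of: {known}")
    return model if model is not None else DEFAULT_MODELS[engine]


for name in DEFAULT_MODELS:
    print(name, "->", resolve_model(name))
```

The resolved name could then be passed as the model argument of dependency_parsing together with the matching engine.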

Example:

from pythainlp.parse import dependency_parsing

print(dependency_parsing("ผมเป็นคนดี", engine="esupar"))
# output:
# 1       ผม      _       PRON    _       _       3       nsubj   _       SpaceAfter=No
# 2       เป็น     _       VERB    _       _       3       cop     _       SpaceAfter=No
# 3       คน      _       NOUN    _       _       0       root    _       SpaceAfter=No
# 4       ดี       _       VERB    _       _       3       acl     _       SpaceAfter=No

print(dependency_parsing("ผมเป็นคนดี", engine="spacy_thai"))
# output:
# 1       ผม              PRON    PPRS    _       2       nsubj   _       SpaceAfter=No
# 2       เป็น             VERB    VSTA    _       0       ROOT    _       SpaceAfter=No
# 3       คนดี             NOUN    NCMN    _       2       obj     _       SpaceAfter=No

The dependency_parsing function is the core component of the pythainlp.parse module. Given a Thai sentence, it identifies the grammatical relationships between words and returns them in CoNLL-U form, which encodes the dependency tree that represents the sentence’s structure.
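To make the dependency tree concrete, the sketch below rebuilds it from parsed rows. It is an illustration rather than a pythainlp API: build_tree is a hypothetical helper, the rows are the esupar example output from this document written out as lists, and token IDs are assumed to be plain integers (no multiword-token ranges).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Rows copied from the esupar example output in this document.
ROWS = [
    ["1", "ผม", "_", "PRON", "_", "_", "3", "nsubj", "_", "SpaceAfter=No"],
    ["2", "เป็น", "_", "VERB", "_", "_", "3", "cop", "_", "SpaceAfter=No"],
    ["3", "คน", "_", "NOUN", "_", "_", "0", "root", "_", "SpaceAfter=No"],
    ["4", "ดี", "_", "VERB", "_", "_", "3", "acl", "_", "SpaceAfter=No"],
]


def build_tree(
    rows: List[List[str]],
) -> Tuple[Optional[int], Dict[int, List[Tuple[int, str]]]]:
    """Return (root_id, children) for 10-column CoNLL-U rows.

    children maps a head ID to its (dependent ID, relation) pairs.
    Assumes integer IDs in column 0 and integer heads in column 6.
    """
    children: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    root: Optional[int] = None
    for row in rows:
        token_id, head, deprel = int(row[0]), int(row[6]), row[7]
        if head == 0:
            root = token_id  # HEAD 0 marks the root of the sentence
        else:
            children[head].append((token_id, deprel))
    return root, dict(children)


root, children = build_tree(ROWS)
forms = {int(row[0]): row[1] for row in ROWS}
for dep_id, rel in children[root]:
    print(f"{forms[dep_id]} --{rel}--> {forms[root]}")
# prints:
# ผม --nsubj--> คน
# เป็น --cop--> คน
# ดี --acl--> คน
```

A head-to-dependents map is one of several reasonable representations; it keeps lookups of a word's dependents cheap, which suits top-down traversal from the root.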

Usage

To use the dependency_parsing function for Thai dependency parsing, follow these steps:

  1. Import dependency_parsing from the pythainlp.parse module.

  2. Call dependency_parsing with a Thai sentence as input.

  3. Read the returned result, which describes the grammatical relationships between the words of the sentence.

Example

Here’s a basic example of how to use the dependency_parsing function:

from pythainlp.parse import dependency_parsing

# Input Thai sentence
sentence = "พี่น้องชาวบ้านกำลังเลี้ยงสตางค์ในสวน"

# Perform dependency parsing
parsing_result = dependency_parsing(sentence)

# Print the parsing result
print(parsing_result)