pythainlp.wsd

The pythainlp.wsd module provides Word Sense Disambiguation (WSD) for the Thai language. WSD is the natural language processing task of determining which sense, or meaning, of a word is intended in a given context. The module exposes one function, get_sense, for this purpose.

Modules

pythainlp.wsd.get_sense(sentence: str, word: str, device: str = 'cpu', custom_dict: dict = {}, custom_tokenizer: pythainlp.tokenize.core.Tokenizer = <pythainlp.tokenize.core.Tokenizer object>) → List[Tuple[str, float]]

Get the senses of a word in a sentence. The function returns each dictionary definition of the word together with its distance from the context in the sentence.

Parameters:
  • sentence (str) – Thai sentence that provides the context

  • word (str) – Thai word to disambiguate

  • device (str) – device to run the model on (default: 'cpu')

  • custom_dict (dict) – Thai dictionary in the form {"word": ["definition", ...]}

  • custom_tokenizer (Tokenizer) – tokenizer used to split the sentence into words

Returns:

a list of (definition, distance) tuples, where distance is 1 - cos_sim (cosine similarity), or an empty list if the word is not in the dictionary

Return type:

List[Tuple[str, float]]

The get_sense function is based on ideas from Context-Aware Semantic Similarity Measurement for Unsupervised Word Sense Disambiguation.

The Thai dictionary is taken from Wiktionary. See thai_dict.

The sentence-transformers model sentence-transformers/paraphrase-multilingual-mpnet-base-v2 is used for unsupervised word sense disambiguation.
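The distance in the return value is described as 1 - cos_sim. As a rough illustration of that formula only, the sketch below computes it in plain Python. The helper name cosine_distance and the two-dimensional toy vectors are assumptions made for this example; they stand in for real sentence embeddings and are not part of pythainlp.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of a context and two definitions.
context = [1.0, 0.0]
definition_same_direction = [2.0, 0.0]
definition_orthogonal = [0.0, 3.0]

print(cosine_distance(context, definition_same_direction))
print(cosine_distance(context, definition_orthogonal))
```

A vector pointing the same way as the context gives a distance near 0, and an orthogonal one gives a distance near 1, so a smaller value means a closer match.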

Example:

from pythainlp.wsd import get_sense

# Baking context ("He is baking cookies"): the snack definition gets the smaller distance.
print(get_sense("เขากำลังอบขนมคุกกี้", "คุกกี้"))
# output:
# [('โปรแกรมคอมพิวเตอร์ใช้ในทางอินเทอร์เน็ตสำหรับเก็บข้อมูลของผู้ใช้งาน',
#   0.0974416732788086),
#  ('ชื่อขนมชนิดหนึ่งจำพวกขนมเค้ก แต่ทำเป็นชิ้นเล็ก ๆ แบน ๆ แล้วอบให้กรอบ',
#   0.09319090843200684)]

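# Web context ("This website needs cookies to work"): the computer-program definition gets the smaller distance.
print(get_sense("เว็บนี้ต้องการคุกกี้ในการทำงาน", "คุกกี้"))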
# output:
# [('โปรแกรมคอมพิวเตอร์ใช้ในทางอินเทอร์เน็ตสำหรับเก็บข้อมูลของผู้ใช้งาน',
#   0.1005704402923584),
#  ('ชื่อขนมชนิดหนึ่งจำพวกขนมเค้ก แต่ทำเป็นชิ้นเล็ก ๆ แบน ๆ แล้วอบให้กรอบ',
#   0.12473666667938232)]
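Because get_sense returns every candidate definition with its distance, the caller has to choose one. A minimal sketch of one way to do that is to take the tuple with the smallest distance. The helper pick_best_sense is hypothetical, not part of pythainlp, and the input list is copied from the first example output above so the snippet runs without loading the model.

```python
from typing import List, Optional, Tuple


def pick_best_sense(senses: List[Tuple[str, float]]) -> Optional[str]:
    """Return the definition with the smallest distance, or None for an empty list."""
    if not senses:
        # get_sense returns an empty list when the word is not in the dictionary.
        return None
    definition, _distance = min(senses, key=lambda pair: pair[1])
    return definition


# Output of the first example above ("He is baking cookies").
senses = [
    ("โปรแกรมคอมพิวเตอร์ใช้ในทางอินเทอร์เน็ตสำหรับเก็บข้อมูลของผู้ใช้งาน",
     0.0974416732788086),
    ("ชื่อขนมชนิดหนึ่งจำพวกขนมเค้ก แต่ทำเป็นชิ้นเล็ก ๆ แบน ๆ แล้วอบให้กรอบ",
     0.09319090843200684),
]

best = pick_best_sense(senses)
print(best)
```

For this input the helper selects the snack definition, whose distance (about 0.0932) is smaller than that of the computer-program definition (about 0.0974).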

The get_sense function is the module's main entry point for Word Sense Disambiguation in Thai text. Given a word and the sentence it appears in, it returns the candidate definitions of that word with their distances from the context; the definition with the smallest distance is the closest match. This is useful wherever word sense ambiguity must be resolved, such as text understanding and translation.

Using the pythainlp.wsd module can help NLP applications interpret ambiguous Thai words according to their context.