pythainlp.soundex

The pythainlp.soundex module provides soundex algorithms for the Thai language. Soundex is a phonetic algorithm used to encode words or names into a standardized representation based on their pronunciation, making it useful for tasks like name matching and search.
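For readers new to the technique, here is a minimal sketch of the classic American Soundex for English names. It is not one of the Thai engines in this module, and the name `american_soundex` is purely illustrative. It only shows the general idea: keep the first letter, map similar-sounding consonants to the same digit, collapse repeated digits, and pad or truncate to a fixed length.

```python
# Digit groups of the classic American Soundex (English letters only).
SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def american_soundex(name: str, length: int = 4) -> str:
    """Encode an English name with the classic American Soundex rules."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()              # the first letter is kept as-is
    prev = SOUNDEX_DIGITS.get(letters[0], "")
    for letter in letters[1:]:
        digit = SOUNDEX_DIGITS.get(letter, "")
        if digit:
            if digit != prev:              # collapse runs of the same digit
                code += digit
            prev = digit
        elif letter not in "hw":
            prev = ""                      # a vowel separates repeated digits
        # 'h' and 'w' are transparent: prev is left unchanged
    return (code + "0" * length)[:length]  # pad with zeros, then truncate


print(american_soundex("Robert"), american_soundex("Rupert"))  # R163 R163
```

Names that sound alike, such as Robert and Rupert, receive the same code, which is what makes Soundex useful for name matching and search.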

Modules

soundex

pythainlp.soundex.soundex(text: str, engine: str = 'udom83', length: int = 4) → str

This function converts Thai text into a phonetic code using the selected engine.

Parameters:
  • text (str) – word to encode

  • engine (str) – Soundex engine (see Options for engine below)

  • length (int) – preferred length of the Soundex code (default is 4); used only by the metasound and prayut_and_somchaip engines

Returns:

Soundex code

Return type:

str

Options for engine:
  • udom83 (default) - Thai soundex algorithm proposed by Wannee Udompanich [2]

  • lk82 - Thai soundex algorithm proposed by Vichit Lorchirachoonkul [3]

  • metasound - Thai soundex algorithm based on a combination of Metaphone and Soundex proposed by Snae & Brückner [1]

  • prayut_and_somchaip - Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique [4]

Example:

from pythainlp.soundex import soundex

soundex("ลัก"), soundex("ลัก", engine='lk82'), \
    soundex("ลัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ล100')

soundex("รัก"), soundex("รัก", engine='lk82'), \
    soundex("รัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("รักษ์"), soundex("รักษ์", engine='lk82'), \
    soundex("รักษ์", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("บูรณการ"), soundex("บูรณการ", engine='lk82'), \
    soundex("บูรณการ", engine='metasound')
# output: ('บ931900', 'บE419', 'บ551')

soundex("ปัจจุบัน"), soundex("ปัจจุบัน", engine='lk82'), \
    soundex("ปัจจุบัน", engine='metasound')
# output: ('ป775300', 'ป3E54', 'ป223')

soundex("vp", engine="prayut_and_somchaip")
# output: '11'
soundex("วีพี", engine="prayut_and_somchaip")
# output: '11'

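The soundex function is a single entry point to the Thai Soundex engines listed above. It encodes a word (a Thai word, or for prayut_and_somchaip an English word or its Thai transliteration) into a phonetic code with the selected engine, so that words with similar pronunciations can be matched approximately.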
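Because the engines group sounds differently, it can help to check which of them treat two words as equivalent. The sketch below uses a hypothetical helper, `engines_in_agreement`, that takes any mapping from engine names to encoding functions. The stand-in encoders are lookup tables copied from the example outputs above, so the block runs without PyThaiNLP.

```python
from typing import Callable, Dict, List

Encoder = Callable[[str], str]


def engines_in_agreement(a: str, b: str, encoders: Dict[str, Encoder]) -> List[str]:
    """Return the names of the engines that give a and b the same code."""
    return [name for name, encode in encoders.items() if encode(a) == encode(b)]


# Stand-in encoders: codes copied from the soundex() example outputs above.
udom83_codes = {"ลัก": "ร100000", "รัก": "ร100000"}
lk82_codes = {"ลัก": "ร1000", "รัก": "ร1000"}
metasound_codes = {"ลัก": "ล100", "รัก": "ร100"}

encoders: Dict[str, Encoder] = {
    "udom83": udom83_codes.__getitem__,
    "lk82": lk82_codes.__getitem__,
    "metasound": metasound_codes.__getitem__,
}

print(engines_in_agreement("ลัก", "รัก", encoders))  # ['udom83', 'lk82']
```

With PyThaiNLP installed, each lookup table could be replaced by `functools.partial(soundex, engine=name)`. According to the outputs above, udom83 and lk82 treat ลัก and รัก as equivalent, while metasound keeps ล and ร apart.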

lk82

pythainlp.soundex.lk82(text: str) → str

This function converts Thai text into a phonetic code using the Thai soundex algorithm named LK82 [3].

Parameters:

text (str) – Thai word

Returns:

LK82 soundex of the given Thai word

Return type:

str

Example:

from pythainlp.soundex import lk82

lk82("ลัก")
# output: 'ร1000'

lk82("รัก")
# output: 'ร1000'

lk82("รักษ์")
# output: 'ร1000'

lk82("บูรณการ")
# output: 'บE419'

lk82("ปัจจุบัน")
# output: 'ป3E54'

The lk82 module implements the Thai Soundex algorithm proposed by Vichit Lorchirachoonkul in 1982. This module is suitable for encoding Thai words into Soundex codes for phonetic comparisons.
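A common use of these codes is sound-alike lookup: index a vocabulary by code once, then retrieve every entry that shares the query's code. `build_phonetic_index` and `sound_alike` below are hypothetical helpers, not part of PyThaiNLP. With the library installed, `encode` could simply be `lk82`; here a lookup table of the outputs above stands in for it.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_phonetic_index(words: Iterable[str],
                         encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group words by their phonetic code."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        index[encode(word)].append(word)
    return dict(index)


def sound_alike(query: str, index: Dict[str, List[str]],
                encode: Callable[[str], str]) -> List[str]:
    """Return the indexed words whose code equals the query's code."""
    return index.get(encode(query), [])


# Stand-in for lk82: codes copied from the examples above.
lk82_codes = {"ลัก": "ร1000", "รัก": "ร1000", "รักษ์": "ร1000",
              "บูรณการ": "บE419", "ปัจจุบัน": "ป3E54"}

index = build_phonetic_index(["รัก", "บูรณการ", "ปัจจุบัน"], lk82_codes.__getitem__)
print(sound_alike("ลัก", index, lk82_codes.__getitem__))  # ['รัก']
```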

udom83

pythainlp.soundex.udom83(text: str) → str

This function converts Thai text into a phonetic code using the Thai soundex algorithm named Udom83 [2].

Parameters:

text (str) – Thai word

Returns:

Udom83 soundex

Return type:

str

Example:

from pythainlp.soundex import udom83

udom83("ลัก")
# output: 'ร100000'

udom83("รัก")
# output: 'ร100000'

udom83("รักษ์")
# output: 'ร100000'

udom83("บูรณการ")
# output: 'บ931900'

udom83("ปัจจุบัน")
# output: 'ป775300'

The udom83 module is based on a homonymic approach for sound-alike string search. It encodes Thai words using the Wannee Udompanich Soundex algorithm developed in 1983.

metasound

pythainlp.soundex.metasound(text: str, length: int = 4) → str

This function converts Thai text into a phonetic code with the matching technique called MetaSound [1] (a combination of the Soundex and Metaphone algorithms). The MetaSound algorithm was developed specifically for the Thai language.

Parameters:
  • text (str) – Thai text

  • length (int) – preferred length of the MetaSound code (default is 4)

Returns:

MetaSound code for the given text

Return type:

str

Example:

from pythainlp.soundex.metasound import metasound

metasound("ลัก")
# output: 'ล100'

metasound("รัก")
# output: 'ร100'

metasound("รักษ์")
# output: 'ร100'

metasound("บูรณการ", 5)
# output: 'บ5515'

metasound("บูรณการ", 6)
# output: 'บ55150'

metasound("บูรณการ", 4)
# output: 'บ551'

The metasound function implements the phonetic name matching algorithm of Snae & Brückner [1], which was proposed together with a statistical ontology for analyzing names given according to Thai astrology. It is designed for matching Thai names by pronunciation.
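The three บูรณการ examples suggest how the length parameter behaves: the code appears to be right-padded with 0 and then truncated to exactly length characters. The hypothetical `fit_code` below reproduces those outputs from the 5-character code 'บ5515'. This is an assumption inferred from the examples, not taken from the MetaSound source.

```python
def fit_code(code: str, length: int) -> str:
    """Right-pad a phonetic code with '0' and cut it to exactly `length` characters."""
    if length < 1:
        raise ValueError("length must be positive")
    return (code + "0" * length)[:length]


for n in (4, 5, 6):
    print(n, fit_code("บ5515", n))
# 4 บ551
# 5 บ5515
# 6 บ55150
```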

prayut_and_somchaip

pythainlp.soundex.prayut_and_somchaip(text: str, length: int = 4) → str

This function converts an English-Thai cross-language transliterated word into a phonetic code with the matching technique called Soundex [4].

Parameters:
  • text (str) – English-Thai Cross-Language Transliterated Word

  • length (int) – preferred length of the Soundex code (default is 4)

Returns:

Soundex code for the given text

Return type:

str

Example:

from pythainlp.soundex.prayut_and_somchaip import prayut_and_somchaip

prayut_and_somchaip("king", 2)
# output: '52'

prayut_and_somchaip("คิง", 2)
# output: '52'

The prayut_and_somchaip module is designed for Thai-English cross-language transliterated word retrieval using the Soundex technique. It is particularly useful for matching transliterated words in both languages.
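Since an English word and its Thai transliteration receive the same code, retrieval across the two languages can be done by bucketing one side by code and looking up the other. `match_transliterations` is a hypothetical helper; with PyThaiNLP installed, `encode` could be `lambda w: prayut_and_somchaip(w, 2)`. Here a table of codes copied from the examples in this document stands in for the library.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def match_transliterations(english: Sequence[str], thai: Sequence[str],
                           encode: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each English word with the Thai words that share its phonetic code."""
    by_code: Dict[str, List[str]] = defaultdict(list)
    for word in thai:
        by_code[encode(word)].append(word)
    return [(en, th) for en in english for th in by_code.get(encode(en), [])]


# Stand-in codes copied from the examples in this document
# (king/คิง at length 2, vp/วีพี at the default length).
codes = {"king": "52", "คิง": "52", "vp": "11", "วีพี": "11"}
print(match_transliterations(["king", "vp"], ["วีพี", "คิง"], codes.__getitem__))
# [('king', 'คิง'), ('vp', 'วีพี')]
```

In real use, both sides must be encoded with the same length for their codes to be comparable.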

pythainlp.soundex.sound.word_approximation

pythainlp.soundex.sound.word_approximation(word: str, list_word: List[str])

Thai Word Approximation

Parameters:
  • word (str) – Thai word

  • list_word (List[str]) – list of Thai words to compare against word

Returns:

List of approximation distances, one per word in list_word (the smaller the value, the closer the pronunciation)

Return type:

List[float]

Example:

from pythainlp.soundex.sound import word_approximation

word_approximation("รถ", ["รด", "รส", "รม", "น้ำ"])
# output : [0.0, 0.0, 3.875, 8.375]

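The word_approximation function returns one value per candidate in list_word; the smaller the value, the closer that candidate's pronunciation is to word. In the example above, รด and รส score 0.0 against รถ because all three are pronounced alike. This makes the function useful for finding Thai words that are phonetically similar to a given word.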
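Because word_approximation returns bare numbers in the same order as list_word, a caller usually pairs them back with the candidate words. The hypothetical `rank_candidates` helper below does that and can optionally keep only candidates under a threshold. The distances are the example output above, so the block runs without PyThaiNLP; the threshold of 1.0 is an arbitrary illustrative choice.

```python
from typing import List, Optional, Sequence, Tuple


def rank_candidates(candidates: Sequence[str], distances: Sequence[float],
                    max_distance: Optional[float] = None) -> List[Tuple[str, float]]:
    """Sort candidates from closest to farthest, optionally dropping distant ones."""
    if len(candidates) != len(distances):
        raise ValueError("candidates and distances must have the same length")
    # sorted() is stable, so ties keep their original order
    ranked = sorted(zip(candidates, distances), key=lambda pair: pair[1])
    if max_distance is not None:
        ranked = [pair for pair in ranked if pair[1] <= max_distance]
    return ranked


candidates = ["รด", "รส", "รม", "น้ำ"]
distances = [0.0, 0.0, 3.875, 8.375]  # example output of word_approximation("รถ", candidates)
print(rank_candidates(candidates, distances, max_distance=1.0))
# [('รด', 0.0), ('รส', 0.0)]
```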

pythainlp.soundex.sound.audio_vector

pythainlp.soundex.sound.audio_vector(word: str) → List[List[int]]

Convert a Thai word into a list of phonetic feature vectors

Parameters:

word (str) – Thai word

Returns:

List of features from panphon

Return type:

List[List[int]]

Example:

from pythainlp.soundex.sound import audio_vector

audio_vector("น้ำ")
# output : [[-1, 1, 1, -1, -1, -1, ...]]

The audio_vector function represents the pronunciation of a Thai word as numeric phonetic feature vectors from panphon, which makes it possible to compare pronunciations numerically rather than as strings.
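To illustrate how such feature vectors allow numeric comparison, here is a sketch of a feature-based edit distance: an ordinary Levenshtein distance in which substituting one segment for another costs the fraction of features on which they differ. The helpers are hypothetical and deliberately unweighted; this is not claimed to be the distance word_approximation uses. The toy vectors use the -1/0/1 values seen in the example output.

```python
from typing import Sequence

Vector = Sequence[int]


def substitution_cost(a: Vector, b: Vector) -> float:
    """Fraction of phonetic features on which two segments disagree."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def feature_edit_distance(s: Sequence[Vector], t: Sequence[Vector]) -> float:
    """Levenshtein distance whose substitution cost depends on feature overlap."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            curr[j] = min(
                prev[j] + 1.0,                                        # delete s[i-1]
                curr[j - 1] + 1.0,                                    # insert t[j-1]
                prev[j - 1] + substitution_cost(s[i - 1], t[j - 1]),  # substitute
            )
        prev = curr
    return prev[len(t)]


word_a = [[1, -1, 1, 0], [-1, 1, -1, 0]]  # toy two-segment word
word_b = [[1, -1, 1, 0], [-1, 1, 1, 0]]   # differs in one feature of one segment
print(feature_edit_distance(word_a, word_b))  # 0.25
```

With PyThaiNLP installed, the same function could be applied to two outputs of audio_vector, since each is a list of equal-length numeric feature vectors.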

pythainlp.soundex.sound.word2audio

pythainlp.soundex.sound.word2audio(word: str) → str

Convert a Thai word to IPA

Parameters:

word (str) – Thai word

Returns:

IPA with tones removed from the text

Return type:

str

Example:

from pythainlp.soundex.sound import word2audio

word2audio("น้ำ")
# output : 'n aː m .'

Despite its name, word2audio does not produce audio: it returns the IPA transcription of a Thai word with tones removed, a phonetic representation that can then be compared or turned into feature vectors.
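The example output 'n aː m .' suggests that segments are separated by spaces and that '.' closes a syllable. Under that assumption (inferred from the single example, not from the library's documentation), the hypothetical `split_ipa` helper below turns the string into per-syllable segment lists.

```python
from typing import List


def split_ipa(ipa: str) -> List[List[str]]:
    """Split a space-separated IPA string into syllables at '.' tokens."""
    syllables: List[List[str]] = []
    current: List[str] = []
    for token in ipa.split():
        if token == ".":
            # '.' is treated as a syllable boundary
            if current:
                syllables.append(current)
            current = []
        else:
            current.append(token)
    if current:  # keep a final syllable that lacks a trailing '.'
        syllables.append(current)
    return syllables


print(split_ipa("n aː m ."))  # [['n', 'aː', 'm']]
```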

References