pythainlp.soundex

The pythainlp.soundex module provides soundex (phonetic coding) algorithms for Thai.
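A soundex maps words that sound alike to the same short code, so that they can be matched despite spelling differences. The idea is easiest to see in the classic English (American) Soundex, sketched below for illustration only. It is not one of the Thai engines in this module, and `american_soundex` is a hypothetical name. The Thai engines documented here appear to follow the same general pattern (a leading letter followed by digits, as in 'ร1000'), but their tables and rules are different.

```python
# Classic English (American) Soundex, shown only to illustrate the idea.
# This is NOT one of the Thai engines provided by pythainlp.soundex.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODE = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def american_soundex(name: str) -> str:
    """Return the 4-character American Soundex code of an English name."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    first = name[0]
    digits = []
    prev = _CODE.get(first, "")  # the first letter's digit also blocks repeats
    for ch in name[1:]:
        code = _CODE.get(ch, "")  # vowels, H, W and Y have no digit
        if code and code != prev:
            digits.append(code)
        # H and W do not separate letters with the same digit; vowels do.
        if ch not in "HW":
            prev = code
    # Pad with zeros and cut to a fixed length of 4.
    return (first + "".join(digits) + "000")[:4]


print(american_soundex("Robert"), american_soundex("Rupert"))  # R163 R163
```

"Robert" and "Rupert" are spelled differently but share the code R163, which is exactly the kind of match a soundex is meant to find.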

Modules

pythainlp.soundex.soundex(text: str, engine: str = 'udom83') → str

This function converts Thai text into a phonetic code using the selected soundex engine.

Parameters
  • text (str) – Thai word

  • engine (str) – soundex engine (default: udom83)

Returns

Soundex code

Return type

str

Options for engine
  • udom83 (default) - Thai soundex algorithm proposed by Wannee Udompanich [2]

  • lk82 - Thai soundex algorithm proposed by Vichit Lorchirachoonkul [3]

  • metasound - Thai soundex algorithm based on a combination of Metaphone and Soundex, proposed by Snae & Brückner [1]
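Selecting an algorithm by name, as the engine parameter does, is commonly organized as a lookup table from names to functions. The sketch below shows that pattern under stated assumptions: the three encoders are hypothetical placeholders that only tag their input (so the dispatch can run without PyThaiNLP), and raising ValueError for an unknown name is a choice made for this sketch, not documented behavior of pythainlp.soundex.soundex.

```python
from typing import Callable, Dict


# Hypothetical placeholder encoders: they only tag the input so the dispatch
# logic can be exercised. They are NOT real soundex implementations.
def _udom83(text: str) -> str:
    return f"udom83:{text}"


def _lk82(text: str) -> str:
    return f"lk82:{text}"


def _metasound(text: str) -> str:
    return f"metasound:{text}"


# Table mapping engine names to encoder functions.
_ENGINES: Dict[str, Callable[[str], str]] = {
    "udom83": _udom83,
    "lk82": _lk82,
    "metasound": _metasound,
}


def soundex_dispatch(text: str, engine: str = "udom83") -> str:
    """Look up the encoder registered for `engine` and apply it to `text`."""
    try:
        encode = _ENGINES[engine]
    except KeyError:
        # Design choice of this sketch: fail loudly on an unknown engine name.
        raise ValueError(f"unknown soundex engine: {engine!r}") from None
    return encode(text)


print(soundex_dispatch("รัก"))                 # udom83:รัก
print(soundex_dispatch("รัก", engine="lk82"))  # lk82:รัก
```

A table keeps the list of engines in one place, so adding an engine means registering one more entry rather than extending an if/elif chain.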

Example

from pythainlp.soundex import soundex

soundex("ลัก"), soundex("ลัก", engine='lk82'), \
    soundex("ลัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ล100')

soundex("รัก"), soundex("รัก", engine='lk82'), \
    soundex("รัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("รักษ์"), soundex("รักษ์", engine='lk82'), \
    soundex("รักษ์", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("บูรณการ"), soundex("บูรณการ", engine='lk82'), \
    soundex("บูรณการ", engine='metasound')
# output: ('บ931900', 'บE419', 'บ551')

soundex("ปัจจุบัน"), soundex("ปัจจุบัน", engine='lk82'), \
    soundex("ปัจจุบัน", engine='metasound')
# output: ('ป775300', 'ป3E54', 'ป223')
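The engines differ in how coarsely they group words. Using only the outputs listed above (copied into a table, so PyThaiNLP is not needed to run this), the sketch counts how many distinct codes each engine assigns to the five sample words. udom83 and lk82 give ลัก, รัก and รักษ์ the same code, while metasound keeps ลัก apart from รัก. The table and function names are hypothetical.

```python
# Codes copied from the documented outputs above, per word:
# (udom83 [the default engine], lk82, metasound)
DOCUMENTED = {
    "ลัก": ("ร100000", "ร1000", "ล100"),
    "รัก": ("ร100000", "ร1000", "ร100"),
    "รักษ์": ("ร100000", "ร1000", "ร100"),
    "บูรณการ": ("บ931900", "บE419", "บ551"),
    "ปัจจุบัน": ("ป775300", "ป3E54", "ป223"),
}
ENGINES = ("udom83", "lk82", "metasound")


def distinct_codes(table):
    """Count how many different codes each engine assigns to the same words."""
    return {
        engine: len({codes[i] for codes in table.values()})
        for i, engine in enumerate(ENGINES)
    }


print(distinct_codes(DOCUMENTED))
# {'udom83': 3, 'lk82': 3, 'metasound': 4}
```

Fewer distinct codes means more words are treated as sound-alikes, which raises recall for fuzzy lookup at the cost of more false matches.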

pythainlp.soundex.lk82(text: str) → str

This function converts Thai text into a phonetic code with the Thai soundex algorithm named LK82 [3].

Parameters

text (str) – Thai word

Returns

LK82 soundex of the given Thai word

Return type

str

Example

from pythainlp.soundex import lk82

lk82("ลัก")
# output: 'ร1000'

lk82("รัก")
# output: 'ร1000'

lk82("รักษ์")
# output: 'ร1000'

lk82("บูรณการ")
# output: 'บE419'

lk82("ปัจจุบัน")
# output: 'ป3E54'
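In the examples above, รักษ์ receives the same LK82 code as รัก. A plausible reason is that the final ษ carries the thanthakhat mark (์, U+0E4C), which in Thai orthography marks the letter as silent, so an encoder can discard it before coding. The sketch below shows such a preprocessing step. It illustrates the idea and is not PyThaiNLP's LK82 code; `drop_silent_consonants` is a hypothetical name. It only handles a single consonant directly before the mark, so words where a vowel sign or a longer cluster is silenced would need extra rules.

```python
import re

# One Thai consonant (U+0E01 to U+0E2E) immediately followed by
# U+0E4C THAI CHARACTER THANTHAKHAT, which marks that consonant as silent.
_SILENT = re.compile("[\u0e01-\u0e2e]\u0e4c")


def drop_silent_consonants(text: str) -> str:
    """Remove each consonant that directly carries a thanthakhat, with the mark."""
    return _SILENT.sub("", text)


print(drop_silent_consonants("รักษ์"))  # รัก
```

After this step รักษ์ and รัก are the same string, so any encoder applied afterwards gives them the same code.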

pythainlp.soundex.udom83(text: str) → str

This function converts Thai text into a phonetic code with the Thai soundex algorithm named Udom83 [2].

Parameters

text (str) – Thai word

Returns

Udom83 soundex of the given Thai word

Return type

str

Example

from pythainlp.soundex import udom83

udom83("ลัก")
# output: 'ร100000'

udom83("รัก")
# output: 'ร100000'

udom83("รักษ์")
# output: 'ร100000'

udom83("บูรณการ")
# output: 'บ931900'

udom83("ปัจจุบัน")
# output: 'ป775300'
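A typical use of a soundex code is looking up sound-alike words: encode the query, then keep the candidates that share its code. The sketch below takes the encoder as a parameter. To run without PyThaiNLP it uses a stand-in encoder, a lookup of the udom83 codes returned by the soundex() examples earlier on this page (udom83 is the default engine). With PyThaiNLP installed, pythainlp.soundex.udom83 could be passed instead. `find_sound_alikes` and `UDOM83_CODES` are hypothetical names.

```python
from typing import Callable, Iterable, List


def find_sound_alikes(query: str, candidates: Iterable[str],
                      encode: Callable[[str], str]) -> List[str]:
    """Return the candidates (other than the query) whose code equals the query's."""
    target = encode(query)
    return [w for w in candidates if w != query and encode(w) == target]


# Stand-in encoder: udom83 codes taken from the soundex() examples above.
UDOM83_CODES = {
    "ลัก": "ร100000",
    "รัก": "ร100000",
    "รักษ์": "ร100000",
    "บูรณการ": "บ931900",
    "ปัจจุบัน": "ป775300",
}

print(find_sound_alikes("รัก", UDOM83_CODES, UDOM83_CODES.__getitem__))
# ['ลัก', 'รักษ์']
```

Encoding every candidate per query is fine for small lists; for large vocabularies one would precompute a mapping from code to words once.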
pythainlp.soundex.metasound(text: str, length: int = 4) → str

This function converts Thai text into a phonetic code with the matching technique called MetaSound [1], a combination of the Soundex and Metaphone algorithms. The MetaSound algorithm was developed specifically for the Thai language.

Parameters
  • text (str) – Thai text

  • length (int) – preferred length of the MetaSound code (default is 4)

Returns

MetaSound code of the given text

Return type

str

Example

from pythainlp.soundex import metasound

metasound("ลัก")
# output: 'ล100'

metasound("รัก")
# output: 'ร100'

metasound("รักษ์")
# output: 'ร100'

metasound("บูรณการ", 5)
# output: 'บ5515'

metasound("บูรณการ", 6)
# output: 'บ55150'

metasound("บูรณการ", 4)
# output: 'บ551'
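The three บูรณการ examples are consistent with computing one underlying code and then fitting it to `length`: padding with '0' when it is too short and truncating when it is too long. The sketch below reproduces the documented outputs from 'บ5515' under that assumption. Whether MetaSound works exactly this way internally is an assumption, and `fit_length` is a hypothetical helper, not part of pythainlp.

```python
def fit_length(code: str, length: int) -> str:
    """Right-pad `code` with '0' up to `length` characters, then cut it to `length`."""
    return code.ljust(length, "0")[:length]


raw = "บ5515"  # the documented metasound("บูรณการ", 5) output
print([fit_length(raw, n) for n in (4, 5, 6)])
# ['บ551', 'บ5515', 'บ55150']
```

A fixed length makes codes directly comparable as strings; a shorter length merges more words into the same code.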

References

[1] Snae & Brückner (2009). Novel Phonetic Name Matching Algorithm with a Statistical Ontology for Analysing Names Given in Accordance with Thai Astrology.

[2] Wannee Udompanich (1983). Search Thai sound-alike string using homonymic approach. Master's thesis, Chulalongkorn University, Thailand.

[3] วิชิต หล่อจีระชุณห์กุล (Vichit Lorchirachoonkul) and เจริญ คุวินทร์พันธุ์. โปรแกรมการสืบค้นคำไทยตามเสียงอ่าน (Thai Soundex). In Thai; the title translates as "A program for searching Thai words by pronunciation".