pythainlp.transliterate

The pythainlp.transliterate module converts Thai text into a romanized form (put simply, Thai spelled with Latin letters).

Modules

pythainlp.transliterate.romanize(text: str, engine: str = 'royin') -> str

This function renders Thai words in the Latin alphabet (“romanization”). The default engine follows the Royal Thai General System of Transcription (RTGS) [1], the official system published by the Royal Institute of Thailand. (Thai: ถอดเสียงภาษาไทยเป็นอักษรละติน)

Parameters
  • text (str) – Thai text to be romanized

  • engine (str) – ‘royin’ (default), ‘thai2rom’, or ‘tltk’

Returns

A string of Thai words rendered in the Latin alphabet.

Return type

str

Options for engines
  • royin - (default) based on the Royal Thai General System of Transcription issued by the Royal Institute of Thailand.

  • thai2rom - a deep learning-based Thai romanization engine (requires PyTorch).

  • tltk - romanization provided by TLTK (Thai Language Toolkit).

Example

from pythainlp.transliterate import romanize

romanize("สามารถ", engine="royin")
# output: 'samant'

romanize("สามารถ", engine="thai2rom")
# output: 'samat'

romanize("สามารถ", engine="tltk")
# output: 'samat'

romanize("ภาพยนตร์", engine="royin")
# output: 'phapn'

romanize("ภาพยนตร์", engine="thai2rom")
# output: 'phapphayon'
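
Because thai2rom needs PyTorch while royin does not, a caller may want to prefer one engine and fall back to another. The following is a minimal sketch, not part of PyThaiNLP: the helper name romanize_with_fallback and its romanize_fn parameter are hypothetical, and it assumes that a missing optional dependency surfaces as ImportError.

```python
from typing import Callable, Optional, Sequence


def romanize_with_fallback(
    text: str,
    engines: Sequence[str] = ("thai2rom", "royin"),
    romanize_fn: Optional[Callable[..., str]] = None,
) -> str:
    """Try each engine in order and return the first result that succeeds."""
    if romanize_fn is None:
        # Imported lazily so the helper can be exercised with a stand-in function.
        from pythainlp.transliterate import romanize as romanize_fn

    last_error: Optional[Exception] = None
    for engine in engines:
        try:
            return romanize_fn(text, engine=engine)
        except ImportError as error:
            # Assumption: a missing optional dependency (for example PyTorch)
            # is reported as ImportError; adjust if your install differs.
            last_error = error
    raise RuntimeError(f"No engine in {list(engines)} could be used") from last_error
```

Calling romanize_with_fallback("สามารถ") would try thai2rom first and only use royin if the first engine cannot be loaded. Since the two engines can return different spellings for the same word, the order of the engines tuple matters.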

pythainlp.transliterate.transliterate(text: str, engine: str = 'thaig2p') -> str

This function transliterates Thai text into a phonetic representation. The notation of the output depends on the engine.

Parameters
  • text (str) – Thai text to be transliterated

  • engine (str) – ‘thaig2p’ (default), ‘icu’, ‘ipa’, ‘tltk_g2p’, or ‘tltk_ipa’

Returns

A string of phonetic symbols indicating how the input text is pronounced.

Return type

str

Options for engines
  • thaig2p - (default) Thai Grapheme-to-Phoneme, output is International Phonetic Alphabet (IPA) (requires PyTorch)

  • icu - pyicu, based on International Components for Unicode (ICU)

  • ipa - epitran, output is International Phonetic Alphabet (IPA)

  • tltk_g2p - Thai Grapheme-to-Phoneme from TLTK

  • tltk_ipa - tltk, output is International Phonetic Alphabet (IPA)

Example

from pythainlp.transliterate import transliterate

transliterate("สามารถ", engine="icu")
# output: 's̄āmārt̄h'

transliterate("สามารถ", engine="ipa")
# output: 'saːmaːrot'

transliterate("สามารถ", engine="thaig2p")
# output: 's aː ˩˩˦ . m aː t̚ ˥˩'

transliterate("สามารถ", engine="tltk_ipa")
# output: 'saː5.maːt3'

transliterate("สามารถ", engine="tltk_g2p")
# output: 'saa4~maat2'

transliterate("ภาพยนตร์", engine="icu")
# output: 'p̣hāphyntr̒'

transliterate("ภาพยนตร์", engine="ipa")
# output: 'pʰaːpjanot'

transliterate("ภาพยนตร์", engine="thaig2p")
# output: 'pʰ aː p̚ ˥˩ . pʰ a ˦˥ . j o n ˧'
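
The examples above show that each engine uses its own notation, so it can help to view several outputs side by side. The sketch below is one way to collect them. The name compare_engines and the transliterate_fn parameter are hypothetical, and recording None when an engine raises ImportError is an assumption about how missing backends are reported.

```python
from typing import Callable, Dict, Iterable, Optional


def compare_engines(
    text: str,
    engines: Iterable[str] = ("icu", "ipa", "thaig2p"),
    transliterate_fn: Optional[Callable[..., str]] = None,
) -> Dict[str, Optional[str]]:
    """Return a mapping of engine name to output, or None if the engine is unavailable."""
    if transliterate_fn is None:
        # Imported lazily so the helper can be exercised with a stand-in function.
        from pythainlp.transliterate import transliterate as transliterate_fn

    results: Dict[str, Optional[str]] = {}
    for engine in engines:
        try:
            results[engine] = transliterate_fn(text, engine=engine)
        except ImportError:
            # Assumption: an engine whose backend is not installed raises ImportError.
            results[engine] = None
    return results
```

The returned dictionary keeps the order in which the engines were requested, so it can be printed directly as a small comparison table.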

pythainlp.transliterate.pronunciate(word: str, engine: str = 'w2p') -> str

This function returns the pronunciation of a Thai word, respelled in Thai script.

Parameters
  • word (str) – Thai word to be pronounced

  • engine (str) – ‘w2p’ (default)

Returns

A string of Thai letters indicating how the input word is pronounced.

Return type

str

Options for engines
  • w2p - Thai Word-to-Phoneme

Example

from pythainlp.transliterate import pronunciate

pronunciate("สามารถ", engine="w2p")
# output: 'สา-มาด'

pronunciate("ภาพยนตร์", engine="w2p")
# output: 'พาบ-พะ-ยน'
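
In the examples above, the w2p output separates syllables with hyphens. Assuming that convention holds, the result can be split into a list of syllables with plain string handling. The helper name split_syllables is hypothetical and not part of PyThaiNLP.

```python
from typing import List


def split_syllables(pronunciation: str, separator: str = "-") -> List[str]:
    """Split a hyphen-separated pronunciation string into its syllables."""
    # Empty pieces (for example from a trailing separator) are dropped.
    return [part for part in pronunciation.split(separator) if part]


# Uses the documented output of pronunciate("ภาพยนตร์", engine="w2p").
syllables = split_syllables("พาบ-พะ-ยน")
# syllables == ['พาบ', 'พะ', 'ยน']
```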

pythainlp.transliterate.puan(word: str, show_pronunciation: bool = True) -> str

Thai Spoonerism

This function converts a Thai word into its spoonerism.

Parameters
  • word (str) – Thai word to be spoonerized

  • show_pronunciation (bool) – if True (default), the syllables of the result are separated by hyphens; if False, they are joined

Returns

The spoonerized Thai word.

Return type

str

Example

from pythainlp.transliterate import puan

puan("นาริน")
# output: 'นิน-รา'

puan("นาริน", False)
# output: 'นินรา'
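
To make the idea of a spoonerism concrete, the sketch below works on hand-romanized (onset, rime) pairs rather than Thai script, and exchanges the rimes of the first and last syllables. It is a conceptual illustration only, not how puan is implemented: the names swap_rimes and join_syllables are hypothetical, the romanization of นา-ริน as na-rin is done by hand, and tones are ignored.

```python
from typing import List, Tuple

Syllable = Tuple[str, str]  # (onset, rime), for example ("r", "in")


def swap_rimes(syllables: List[Syllable]) -> List[Syllable]:
    """Exchange the rimes of the first and last syllables."""
    if len(syllables) < 2:
        return list(syllables)
    swapped = list(syllables)
    (first_onset, first_rime), (last_onset, last_rime) = swapped[0], swapped[-1]
    swapped[0] = (first_onset, last_rime)
    swapped[-1] = (last_onset, first_rime)
    return swapped


def join_syllables(syllables: List[Syllable], show_pronunciation: bool = True) -> str:
    """Join syllables, with hyphens if show_pronunciation is True."""
    separator = "-" if show_pronunciation else ""
    return separator.join(onset + rime for onset, rime in syllables)


word = [("n", "a"), ("r", "in")]  # hand-romanized นา-ริน
spoonerized = swap_rimes(word)
# join_syllables(spoonerized) == "nin-ra"
# join_syllables(spoonerized, show_pronunciation=False) == "ninra"
```

The two results mirror the documented outputs 'นิน-รา' and 'นินรา' for puan("นาริน").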

Romanize Engines

thai2rom

royin

Renders Thai words in the Latin alphabet using RTGS.

The Royal Thai General System of Transcription (RTGS) is the official system published by the Royal Institute of Thailand.

Parameters
  • text (str) – Thai text to be romanized

Returns

A string of Thai words rendered in the Latin alphabet.

Return type

str

Transliterate Engines

icu

Uses ICU (International Components for Unicode) for transliteration.

Parameters
  • text (str) – Thai text to be transliterated

Returns

A string of International Phonetic Alphabet symbols indicating how the text should be pronounced.

ipa

thaig2p

References

[1] Nitaya Kanchanawan. (2006). Romanization, Transliteration, and Transcription for the Globalization of the Thai Language. The Journal of the Royal Institute of Thailand.