pythainlp.transliterate

The pythainlp.transliterate module transliterates Thai text into romanized form, that is, it spells Thai words out in the Latin alphabet. This makes Thai text more accessible to non-Thai readers and supports various language processing tasks.

Modules

pythainlp.transliterate.romanize(text: str, engine: str = 'royin', fallback_engine: str = 'royin') -> str

This function renders Thai words in the Latin alphabet (“romanization”), using the Royal Thai General System of Transcription (RTGS) [1]. RTGS is the official system published by the Royal Institute of Thailand. (Thai: ถอดเสียงภาษาไทยเป็นอักษรละติน)

Parameters:
  • text (str) – Thai text to be romanized

  • engine (str) – One of ‘royin’ (default), ‘thai2rom’, ‘thai2rom_onnx’, ‘tltk’, and ‘lookup’. See the options for engines below.

  • fallback_engine (str) – If engine equals ‘lookup’, fallback_engine is used for words that are not in the transliteration dictionary. It has no effect with other engines. Defaults to ‘royin’.

Returns:

A string of Thai words rendered in the Latin alphabet.

Return type:

str

Options for engines:
  • royin - (default) based on the Royal Thai General System of Transcription issued by the Royal Institute of Thailand.

  • thai2rom - a deep learning-based Thai romanization engine (requires PyTorch).

  • thai2rom_onnx - a deep learning-based Thai romanization engine that runs on ONNX Runtime.

  • tltk - TLTK: Thai Language Toolkit.

  • lookup - looks words up in the Thai-English Transliteration Dictionary v1.4 compiled by Wannaphong.

Example:

from pythainlp.transliterate import romanize

romanize("สามารถ", engine="royin")
# output: 'samant'

romanize("สามารถ", engine="thai2rom")
# output: 'samat'

romanize("สามารถ", engine="tltk")
# output: 'samat'

romanize("ภาพยนตร์", engine="royin")
# output: 'phapn'

romanize("ภาพยนตร์", engine="thai2rom")
# output: 'phapphayon'

romanize("ภาพยนตร์", engine="thai2rom_onnx")
# output: 'phapphayon'

romanize("ก็อปปี้", engine="lookup")
# output: 'copy'

The romanize function converts Thai text into a Latin-alphabet rendering of its pronunciation. The engines can return different results for the same word, as the examples above show.
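The fallback_engine parameter only matters when engine is 'lookup'. The following is a minimal sketch of romanizing a word list through the dictionary first. The helper name romanize_words and the injected romanizer argument are illustrative assumptions, not part of PyThaiNLP; in real use, romanizer would be pythainlp.transliterate.romanize.

```python
from typing import Callable, Dict, Iterable


def romanize_words(
    words: Iterable[str],
    romanizer: Callable[..., str],
    fallback_engine: str = "royin",
) -> Dict[str, str]:
    """Romanize each word with the 'lookup' engine (hypothetical helper).

    `romanizer` is assumed to accept the same keyword arguments as
    pythainlp.transliterate.romanize: `engine` and `fallback_engine`.
    Words missing from the lookup dictionary are passed to
    `fallback_engine`, as the parameter description above states.
    """
    return {
        word: romanizer(word, engine="lookup", fallback_engine=fallback_engine)
        for word in words
    }
```

With PyThaiNLP installed, a call such as romanize_words(["ก็อปปี้", "สามารถ"], romanize, fallback_engine="thai2rom") would try the dictionary for each word and hand the rest to thai2rom. Passing the function in as an argument keeps the helper testable without the library.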

pythainlp.transliterate.transliterate(text: str, engine: str = 'thaig2p') -> str

This function transliterates Thai text.

Parameters:
  • text (str) – Thai text to be transliterated

  • engine (str) – ‘thaig2p’ (default), ‘icu’, ‘ipa’, ‘tltk_g2p’, ‘tltk_ipa’, or ‘iso_11940’

Returns:

A string of phonetic symbols or Latin characters indicating how the input text should be pronounced.

Return type:

str

Options for engines:
  • thaig2p - (default) Thai Grapheme-to-Phoneme, output is IPA (requires PyTorch)

  • icu - pyicu, based on International Components for Unicode (ICU)

  • ipa - epitran, output is International Phonetic Alphabet (IPA)

  • tltk_g2p - Thai Grapheme-to-Phoneme from TLTK

  • iso_11940 - transliterates Thai text into Latin characters following ISO 11940

  • tltk_ipa - tltk, output is International Phonetic Alphabet (IPA)

Example:

from pythainlp.transliterate import transliterate

transliterate("สามารถ", engine="icu")
# output: 's̄āmārt̄h'

transliterate("สามารถ", engine="ipa")
# output: 'saːmaːrot'

transliterate("สามารถ", engine="thaig2p")
# output: 's aː ˩˩˦ . m aː t̚ ˥˩'

transliterate("สามารถ", engine="tltk_ipa")
# output: 'saː5.maːt3'

transliterate("สามารถ", engine="tltk_g2p")
# output: 'saa4~maat2'

transliterate("สามารถ", engine="iso_11940")
# output: 's̄āmārt̄h'

transliterate("ภาพยนตร์", engine="icu")
# output: 'p̣hāphyntr̒'

transliterate("ภาพยนตร์", engine="ipa")
# output: 'pʰaːpjanot'

transliterate("ภาพยนตร์", engine="thaig2p")
# output: 'pʰ aː p̚ ˥˩ . pʰ a ˦˥ . j o n ˧'

transliterate("ภาพยนตร์", engine="iso_11940")
# output: 'p̣hāphyntr'

The transliterate function exposes several engines. They differ in output notation (IPA, ICU or ISO 11940 Latin transliteration, TLTK notation) and in the libraries they depend on.
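Since the engines produce different notations, it can help to view them side by side. The sketch below collects one output per engine. The helper compare_engines and its transliterator argument are hypothetical names, and it assumes that an engine whose optional dependency is not installed raises ImportError; that behaviour is not confirmed by this page.

```python
from typing import Callable, Dict, Iterable, Optional


def compare_engines(
    text: str,
    engines: Iterable[str],
    transliterator: Callable[..., str],
) -> Dict[str, Optional[str]]:
    """Run `text` through each engine and collect the outputs.

    `transliterator` is assumed to have the same call shape as
    pythainlp.transliterate.transliterate(text, engine=...).
    """
    results: Dict[str, Optional[str]] = {}
    for engine in engines:
        try:
            results[engine] = transliterator(text, engine=engine)
        except ImportError:
            # Assumption: a missing optional dependency (for example
            # PyTorch for 'thaig2p') surfaces as ImportError.
            results[engine] = None
    return results
```

With PyThaiNLP installed, compare_engines("สามารถ", ["icu", "ipa", "thaig2p"], transliterate) would return a dictionary keyed by engine name, with None for any engine that could not be loaded.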

pythainlp.transliterate.pronunciate(word: str, engine: str = 'w2p') -> str

This function returns the pronunciation of a Thai word, respelled in Thai script.

Parameters:
  • word (str) – Thai word to be pronounced

  • engine (str) – ‘w2p’ (default)

Returns:

A string of Thai letters indicating how the input text should be pronounced.

Return type:

str

Options for engines:
  • w2p - Thai Word-to-Phoneme

Example:

from pythainlp.transliterate import pronunciate

pronunciate("สามารถ", engine="w2p")
# output: 'สา-มาด'

pronunciate("ภาพยนตร์", engine="w2p")
# output: 'พาบ-พะ-ยน'

This function provides assistance in generating phonetic representations of Thai words, which is particularly useful for language learning and pronunciation practice.
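In the examples above, pronunciate returns syllables joined by hyphens. A small sketch of splitting such a string into syllables follows. The helper split_syllables is a hypothetical name, and it assumes the hyphen-separated format shown in the examples holds for other words.

```python
from typing import List


def split_syllables(pronunciation: str, separator: str = "-") -> List[str]:
    """Split a hyphen-separated pronunciation string into syllables.

    Empty pieces (for example from a leading or trailing separator)
    are dropped.
    """
    return [part for part in pronunciation.split(separator) if part]


# Using the outputs quoted in the examples above:
print(split_syllables("สา-มาด"))        # ['สา', 'มาด']
print(len(split_syllables("พาบ-พะ-ยน")))  # 3
```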

pythainlp.transliterate.puan(word: str, show_pronunciation: bool = True) -> str

Thai Spoonerism

This function converts a Thai word into its spoonerism.

Parameters:
  • word (str) – Thai word to be spoonerized

  • show_pronunciation (bool) – True (default) or False. In the example below, True separates the output syllables with hyphens and False joins them.

Returns:

The spoonerized Thai word.

Return type:

str

Example:

from pythainlp.transliterate import puan

puan("นาริน")
# output: 'นิน-รา'

puan("นาริน", False)
# output: 'นินรา'

The puan function produces a Thai spoonerism (“puan”): it swaps sounds between the syllables of a word, as in the example above. Its output is Thai script, not a romanization.

class pythainlp.transliterate.wunsen.WunsenTransliterate

Transliterates romanized Japanese, Korean, Mandarin, or Vietnamese text into Thai text using Wunsen.

The WunsenTransliterate class wraps the Wunsen transliteration engine. The source language is selected with the lang argument of its transliterate method.

__init__() -> None
transliterate(text: str, lang: str, jp_input: str | None = None, zh_sandhi: bool | None = None, system: str | None = None) -> str

Use Wunsen for transliteration

Parameters:
  • text (str) – text to be transliterated to Thai text.

  • lang (str) – source language

  • jp_input (str) – Japanese input method (for Japanese only)

  • zh_sandhi (bool) – Mandarin third tone sandhi option (for Mandarin only)

  • system (str) – transliteration system (for Japanese and Mandarin only)

Returns:

Thai text

Return type:

str

Options for lang:
  • jp - Japanese (from Hepburn romanization)

  • ko - Korean (from Revised Romanization)

  • vi - Vietnamese (Latin script)

  • zh - Mandarin (from Hanyu Pinyin)

Options for jp_input:
  • Hepburn-no diacritic - Hepburn romanization written without diacritics (no macrons)

Options for zh_sandhi:
  • True - apply third tone sandhi rule

  • False - do not apply third tone sandhi rule

Options for system:
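  • ORS61 - for Japanese: หลักเกณฑ์การทับศัพท์ภาษาญี่ปุ่น (Rules for Transliterating Japanese; สำนักงานราชบัณฑิตยสภา, Office of the Royal Society, B.E. 2561)

  • RI35 - for Japanese: หลักเกณฑ์การทับศัพท์ภาษาญี่ปุ่น (Rules for Transliterating Japanese; ราชบัณฑิตยสถาน, Royal Institute, B.E. 2535)

  • RI49 - for Mandarin: หลักเกณฑ์การทับศัพท์ภาษาจีน (Rules for Transliterating Chinese; ราชบัณฑิตยสถาน, Royal Institute, B.E. 2549)

  • THC43 - for Mandarin: เกณฑ์การถ่ายทอดเสียงภาษาจีนแมนดารินด้วยอักขรวิธีไทย (Criteria for Transcribing Mandarin Chinese with Thai Orthography; คณะกรรมการสืบค้นประวัติศาสตร์ไทยในเอกสารภาษาจีน, Committee for Research on Thai History in Chinese Documents, B.E. 2543)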

Example:

from pythainlp.transliterate.wunsen import WunsenTransliterate

wt = WunsenTransliterate()

wt.transliterate("ohayō", lang="jp")
# output: 'โอฮาโย'

wt.transliterate(
    "ohayou",
    lang="jp",
    jp_input="Hepburn-no diacritic"
)
# output: 'โอฮาโย'

wt.transliterate("ohayō", lang="jp", system="RI35")
# output: 'โอะฮะโย'

wt.transliterate("annyeonghaseyo", lang="ko")
# output: 'อันนย็องฮาเซโย'

wt.transliterate("xin chào", lang="vi")
# output: 'ซีน จ่าว'

wt.transliterate("ni3 hao3", lang="zh")
# output: 'หนี เห่า'

wt.transliterate("ni3 hao3", lang="zh", zh_sandhi=False)
# output: 'หนี่ เห่า'

wt.transliterate("ni3 hao3", lang="zh", system="RI49")
# output: 'หนี ห่าว'
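Several options of transliterate apply only to certain languages. The sketch below validates a combination of options against the lists above before the call is made. The helper build_wunsen_kwargs and the two constants are hypothetical names introduced for illustration; they are not part of PyThaiNLP or Wunsen, and the library may perform its own checks.

```python
from typing import Any, Dict, Optional

# Language codes and per-language systems as listed in this section.
SUPPORTED_LANGS = {"jp", "ko", "vi", "zh"}
SYSTEMS_BY_LANG = {"jp": {"ORS61", "RI35"}, "zh": {"RI49", "THC43"}}


def build_wunsen_kwargs(
    lang: str,
    jp_input: Optional[str] = None,
    zh_sandhi: Optional[bool] = None,
    system: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for WunsenTransliterate.transliterate.

    Raises ValueError when an option is combined with a language it
    does not apply to, according to the option lists above.
    """
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    kwargs: Dict[str, Any] = {"lang": lang}
    if jp_input is not None:
        if lang != "jp":
            raise ValueError("jp_input applies to Japanese only")
        kwargs["jp_input"] = jp_input
    if zh_sandhi is not None:
        if lang != "zh":
            raise ValueError("zh_sandhi applies to Mandarin only")
        kwargs["zh_sandhi"] = zh_sandhi
    if system is not None:
        if system not in SYSTEMS_BY_LANG.get(lang, set()):
            raise ValueError(f"system {system!r} is not listed for lang {lang!r}")
        kwargs["system"] = system
    return kwargs
```

A call would then look like wt.transliterate("ohayō", **build_wunsen_kwargs("jp", system="RI35")), which passes the same arguments as the RI35 example above.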

Transliteration Engines

thai2rom

The thai2rom engine is a deep learning-based Thai romanization engine and requires PyTorch. The thai2rom_onnx variant runs on ONNX Runtime instead.

royin

Renders Thai words in the Latin alphabet using RTGS.

The Royal Thai General System of Transcription (RTGS) is the official system published by the Royal Institute of Thailand.

Parameters:
  • text (str) – Thai text to be romanized

Returns:

A string of Thai words rendered in the Latin alphabet.

Return type:

str

The royin engine is the default engine of romanize. Its output is based on RTGS.
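Because thai2rom requires PyTorch while royin is the default, a caller may want to choose between them at run time. The following sketch uses the standard library's importlib.util.find_spec to check whether a module can be imported. The helper pick_engine is a hypothetical name, and the assumption that PyTorch is importable as torch is stated here rather than taken from this page.

```python
import importlib.util


def pick_engine(
    preferred: str = "thai2rom",
    required_module: str = "torch",
    default: str = "royin",
) -> str:
    """Return `preferred` if `required_module` is importable, else `default`.

    `required_module` should be a top-level module name. The check only
    looks the module up; it does not import it.
    """
    if importlib.util.find_spec(required_module) is not None:
        return preferred
    return default
```

The result can be passed straight to romanize, for example romanize("สามารถ", engine=pick_engine()).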

Transliterate Engines

This section lists the engines available to the transliterate function. Each produces a different notation for Thai text:

  • icu: Uses the ICU transliteration rules (via pyicu) to convert Thai script to Latin characters.

  • ipa: Uses epitran to produce an International Phonetic Alphabet (IPA) representation.

  • thaig2p: Uses a Thai grapheme-to-phoneme (G2P) model; output is IPA.

  • tltk: Uses TLTK (Thai Language Toolkit), available as the tltk_g2p and tltk_ipa engines.

  • iso_11940: Transliterates Thai text into Latin characters following ISO 11940.

References

The pythainlp.transliterate module offers tools and engines for romanizing Thai text, transcribing it phonetically, and transliterating romanized text from other languages into Thai script. The module documentation also cites a publication [1] on the role of romanization, transliteration, and transcription in making the Thai language accessible to a global audience.