pythainlp.generate

The pythainlp.generate module provides Thai text generators for PyThaiNLP.

Modules

class pythainlp.generate.Unigram(name: str = 'tnc')

Text generator using Unigram

Parameters:

name (str) – corpus name:
  • tnc - Thai National Corpus (default)

  • ttc - Thai Textbook Corpus (TTC)

  • oscar - OSCAR Corpus

__init__(name: str = 'tnc')
gen_sentence(start_seq: str | None = None, N: int = 3, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) -> List[str] | str
Parameters:
  • start_seq (str) – word to begin the sentence with.

  • N (int) – number of words.

  • output_str (bool) – if True, return the result as a str; otherwise return a list of words.

  • duplicate (bool) – whether to allow duplicate words in the sentence.

Returns:

the generated words, as a list of words or as a single string

Return type:

List[str], str

Example:

from pythainlp.generate import Unigram

gen = Unigram()

gen.gen_sentence("แมว")
# output: 'แมวเวลานะนั้น'
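The gen_sentence parameters above can be pictured with a small, self-contained toy generator. This is only a sketch of the general unigram idea, not PyThaiNLP's implementation: the function name toy_unigram_sentence, the counts argument, and the seed argument are hypothetical, and it assumes that prob acts as a minimum relative-frequency cutoff and that N counts the words generated after the start word, neither of which is stated in the reference above.

```python
import random
from collections import Counter
from typing import List, Optional, Union


def toy_unigram_sentence(
    counts: Counter,
    start_seq: Optional[str] = None,
    N: int = 3,
    prob: float = 0.001,
    output_str: bool = True,
    duplicate: bool = False,
    seed: Optional[int] = None,  # hypothetical: only here to make runs repeatable
) -> Union[List[str], str]:
    """Draw up to N words, each weighted by its relative frequency."""
    rng = random.Random(seed)
    total = sum(counts.values())
    # Assumption: words whose relative frequency is below `prob` are dropped.
    candidates = {w: c / total for w, c in counts.items() if c / total >= prob}

    words = [] if start_seq is None else [start_seq]
    if not duplicate and start_seq is not None:
        # The start word counts as already used.
        candidates.pop(start_seq, None)

    generated = 0
    while generated < N and candidates:
        pool = list(candidates)
        weights = [candidates[w] for w in pool]
        word = rng.choices(pool, weights=weights, k=1)[0]
        words.append(word)
        generated += 1
        if not duplicate:
            # Without duplicates, each word can be drawn at most once.
            del candidates[word]

    # Thai text is written without spaces, so the words are simply concatenated.
    return "".join(words) if output_str else words


toy_counts = Counter({"แมว": 5, "กิน": 3, "ปลา": 2})
print(toy_unigram_sentence(toy_counts, start_seq="แมว", N=2, output_str=False, seed=0))
```

With duplicate=False the toy generator stops early once the candidate pool is exhausted, so the result can be shorter than requested.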
class pythainlp.generate.Bigram(name: str = 'tnc')

Text generator using Bigram

Parameters:

name (str) – corpus name:
  • tnc - Thai National Corpus (default)

__init__(name: str = 'tnc')
prob(t1: str, t2: str) -> float

Return the probability of the word pair (t1, t2).

Parameters:
  • t1 (str) – first word

  • t2 (str) – second word

Returns:

probability value

Return type:

float
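The reference does not say how this probability is estimated. As a rough illustration only, the sketch below uses one common choice, the joint relative frequency of a word pair among all adjacent pairs in a token list. The helper names bigram_counts and bigram_prob are hypothetical and are not part of PyThaiNLP; a conditional estimate such as count(t1, t2) / count(t1) would be an equally plausible alternative.

```python
from collections import Counter
from typing import List


def bigram_counts(tokens: List[str]) -> Counter:
    """Count every pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


def bigram_prob(counts: Counter, t1: str, t2: str) -> float:
    """Joint relative frequency of (t1, t2); 0.0 for unseen pairs or empty counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # A Counter returns 0 for missing keys, so unseen pairs need no special case.
    return counts[(t1, t2)] / total


tokens = ["แมว", "กิน", "ปลา", "แมว", "กิน", "ข้าว"]
counts = bigram_counts(tokens)
print(bigram_prob(counts, "แมว", "กิน"))  # 2 of the 5 adjacent pairs
```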

gen_sentence(start_seq: str | None = None, N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) -> List[str] | str
Parameters:
  • start_seq (str) – word to begin the sentence with.

  • N (int) – number of words.

  • output_str (bool) – if True, return the result as a str; otherwise return a list of words.

  • duplicate (bool) – whether to allow duplicate words in the sentence.

Returns:

the generated words, as a list of words or as a single string

Return type:

List[str], str

Example:

from pythainlp.generate import Bigram

gen = Bigram()

gen.gen_sentence("แมว")
# output: 'แมวไม่ได้รับเชื้อมัน'
class pythainlp.generate.Trigram(name: str = 'tnc')

Text generator using Trigram

Parameters:

name (str) – corpus name:
  • tnc - Thai National Corpus (default)

__init__(name: str = 'tnc')
prob(t1: str, t2: str, t3: str) -> float

Return the probability of the word triple (t1, t2, t3).

Parameters:
  • t1 (str) – first word

  • t2 (str) – second word

  • t3 (str) – third word

Returns:

probability value

Return type:

float

gen_sentence(start_seq: str | None = None, N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) -> List[str] | str
Parameters:
  • start_seq (str) – word to begin the sentence with.

  • N (int) – number of words.

  • output_str (bool) – if True, return the result as a str; otherwise return a list of words.

  • duplicate (bool) – whether to allow duplicate words in the sentence.

Returns:

the generated words, as a list of words or as a single string

Return type:

List[str], str

Example:

from pythainlp.generate import Trigram

gen = Trigram()

gen.gen_sentence()
# output: 'ยังทำตัวเป็นเซิร์ฟเวอร์คือ'
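To give a feel for how a trigram model can extend a sentence, the following toy sketch picks each next word from the words that followed the previous two in a token list. It is an illustration under stated assumptions, not the Trigram class's actual algorithm: build_trigram_table and toy_trigram_sentence are hypothetical names, the seed argument is invented for repeatability, and unlike gen_sentence the toy version takes two seed words and ignores the prob and duplicate options.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def build_trigram_table(tokens: List[str]) -> Dict[Tuple[str, str], Counter]:
    """Map each pair of adjacent words to a count of the words that follow it."""
    table: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        table[(w1, w2)][w3] += 1
    return table


def toy_trigram_sentence(
    table: Dict[Tuple[str, str], Counter],
    w1: str,
    w2: str,
    N: int = 4,
    seed: Optional[int] = None,  # hypothetical: only here to make runs repeatable
) -> List[str]:
    """Append up to N words, each sampled given the two words before it."""
    rng = random.Random(seed)
    words = [w1, w2]
    for _ in range(N):
        followers = table.get((words[-2], words[-1]))
        if not followers:
            break  # this two-word context was never seen, so stop early
        pool = list(followers)
        weights = [followers[w] for w in pool]
        words.append(rng.choices(pool, weights=weights, k=1)[0])
    return words


table = build_trigram_table(["แมว", "กิน", "ปลา", "ทอด"])
print("".join(toy_trigram_sentence(table, "แมว", "กิน")))
```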
pythainlp.generate.thai2fit.gen_sentence(start_seq: str | None = None, N: int = 4, prob: float = 0.001, output_str: bool = True) -> List[str] | str

Text generator using Thai2fit

Parameters:
  • start_seq (str) – word to begin the sentence with.

  • N (int) – number of words.

  • output_str (bool) – if True, return the result as a str; otherwise return a list of words.

Returns:

the generated words, as a list of words or as a single string

Return type:

List[str], str

Example:

from pythainlp.generate.thai2fit import gen_sentence

gen_sentence()
# output: 'แคทรียา อิงลิช  (นักแสดง'

gen_sentence("แมว")
# output: 'แมว คุณหลวง '