pythainlp.corpus

The pythainlp.corpus module provides access to the corpora used by pythainlp.

Modules

pythainlp.corpus.get_corpus(filename: str) -> frozenset

Read a corpus from a file and return its entries as a frozenset

Parameters

filename (string) – name of the corpus file
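Because the result is a frozenset, it can serve directly as a lookup vocabulary. The following is a minimal sketch, not part of pythainlp: the helper names load_corpus and out_of_vocabulary are hypothetical, and pythainlp is imported lazily so the filter can be used on any frozenset.

```python
from typing import FrozenSet, Iterable, List


def load_corpus(filename: str) -> FrozenSet[str]:
    """Hypothetical wrapper: load a corpus file as a frozenset."""
    # Imported lazily so out_of_vocabulary() works without pythainlp installed.
    from pythainlp.corpus import get_corpus

    return get_corpus(filename)


def out_of_vocabulary(tokens: Iterable[str], vocabulary: FrozenSet[str]) -> List[str]:
    """Return the tokens that do not appear in the vocabulary, in order."""
    return [token for token in tokens if token not in vocabulary]
```

With pythainlp installed, out_of_vocabulary(tokens, load_corpus("words_th.txt")) would list the tokens missing from that file. The file name "words_th.txt" is an assumption about which files ship with the package.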

pythainlp.corpus.get_corpus_path(name: str) -> Optional[str]

Get corpus path

Parameters

name (string) – corpus name

pythainlp.corpus.download(name: str, force: bool = False) -> NoReturn

Download corpus

Parameters
  • name (string) – corpus name

  • force (bool) – force the download even if the corpus is already installed

pythainlp.corpus.remove(name: str) -> bool

Remove corpus

Parameters

name (string) – corpus name

Returns

True if the corpus was removed, False otherwise
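The three functions get_corpus_path, download, and remove can be combined into a "download only when missing" pattern. The sketch below assumes get_corpus_path returns None for a corpus that is not installed, which the Optional[str] return type suggests but this page does not state. The helper names ensure_corpus and ensure_pythainlp_corpus are hypothetical.

```python
from typing import Callable, Optional


def ensure_corpus(
    name: str,
    get_path: Callable[[str], Optional[str]],
    fetch: Callable[[str], object],
) -> Optional[str]:
    """Return the corpus path, fetching the corpus first if it is missing."""
    path = get_path(name)
    if path is None:
        # Assumption: a None path means the corpus is not installed yet.
        fetch(name)
        path = get_path(name)
    return path


def ensure_pythainlp_corpus(name: str) -> Optional[str]:
    """Hypothetical wrapper that wires ensure_corpus to pythainlp.corpus."""
    # Imported lazily; requires pythainlp and, for a download, network access.
    from pythainlp.corpus import download, get_corpus_path

    return ensure_corpus(name, get_corpus_path, download)
```

Passing the lookup and fetch functions as arguments keeps the logic testable without network access.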

pythainlp.corpus.common.thai_stopwords() -> frozenset

Return a frozenset of Thai stopwords

pythainlp.corpus.common.thai_words() -> frozenset

Return a frozenset of Thai words

pythainlp.corpus.common.thai_syllables() -> frozenset

Return a frozenset of Thai syllables

pythainlp.corpus.common.thai_negations() -> frozenset

Return a frozenset of Thai negation words

pythainlp.corpus.common.countries() -> frozenset

Return a frozenset of country names in Thai

pythainlp.corpus.common.provinces() -> frozenset

Return a frozenset of Thailand province names in Thai
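Each function in pythainlp.corpus.common returns a frozenset, so a typical use is filtering a token list. This is a minimal sketch of stopword removal, assuming the text has already been tokenized. The helper names remove_stopwords and remove_thai_stopwords are hypothetical and not part of pythainlp.

```python
from typing import FrozenSet, Iterable, List


def remove_stopwords(tokens: Iterable[str], stopwords: FrozenSet[str]) -> List[str]:
    """Keep only the tokens that are not in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def remove_thai_stopwords(tokens: Iterable[str]) -> List[str]:
    """Hypothetical wrapper that filters with pythainlp's Thai stopword list."""
    # Imported lazily so remove_stopwords() works without pythainlp installed.
    from pythainlp.corpus.common import thai_stopwords

    return remove_stopwords(tokens, thai_stopwords())
```

The same pattern applies to thai_negations(), countries(), or provinces() when the goal is to detect, rather than drop, matching tokens.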

pythainlp.corpus.conceptnet.edges(word: str, lang: str = 'th')

Get the edges for a word from the ConceptNet API

Parameters
  • word (str) – word

  • lang (str) – language

TNC

pythainlp.corpus.tnc.word_freq(word: str, domain: str = 'all') -> int

Get the frequency of a word in a given domain. Not officially supported. This function queries the Thai National Corpus server, so an internet connection is required.

IMPORTANT: As of 29 April 2019, this function is likely to return 0 regardless of the word, because the service URL has changed and the code has not yet been updated. The new URL is http://www.arts.chula.ac.th/~ling/tnc3/

Parameters
  • word (string) – word

  • domain (string) – domain

pythainlp.corpus.tnc.word_freqs() -> List[Tuple[str, int]]

Get word frequency from Thai National Corpus (TNC)

TTC

pythainlp.corpus.ttc.word_freqs() -> List[Tuple[str, int]]

Get word frequency from Thai Textbook Corpus (TTC)
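Both word_freqs() functions return a list of (word, count) tuples. This page does not say whether the list is sorted, so the sketch below sorts explicitly. The helper names top_n and to_lookup are hypothetical, and the sample data in the usage note is made up for illustration.

```python
from typing import Dict, List, Tuple


def top_n(freqs: List[Tuple[str, int]], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent entries, breaking ties alphabetically."""
    ranked = sorted(freqs, key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


def to_lookup(freqs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Convert the list of pairs into a dict for direct lookups by word."""
    return dict(freqs)
```

For example, top_n([("ก", 3), ("ข", 10)], 1) returns [("ข", 10)]. With pythainlp installed, the output of pythainlp.corpus.ttc.word_freqs() can be passed in the same way.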

Wordnet

pythainlp.corpus.wordnet.synsets(word: str, pos: Optional[str] = None, lang: str = 'tha')
pythainlp.corpus.wordnet.synset(name_synsets)
pythainlp.corpus.wordnet.all_lemma_names(pos: Optional[str] = None, lang: str = 'tha')
pythainlp.corpus.wordnet.all_synsets(pos: Optional[str] = None)
pythainlp.corpus.wordnet.langs()
pythainlp.corpus.wordnet.lemmas(word: str, pos: Optional[str] = None, lang: str = 'tha')
pythainlp.corpus.wordnet.lemma(name_synsets)
pythainlp.corpus.wordnet.lemma_from_key(key)
pythainlp.corpus.wordnet.path_similarity(synsets1, synsets2)
pythainlp.corpus.wordnet.lch_similarity(synsets1, synsets2)
pythainlp.corpus.wordnet.wup_similarity(synsets1, synsets2)
pythainlp.corpus.wordnet.morphy(form, pos: Optional[str] = None)
pythainlp.corpus.wordnet.custom_lemmas(tab_file, lang: str)
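The WordNet functions above are listed without descriptions, so the following is only a sketch under stated assumptions. It assumes that synsets() returns a list of synset objects, that path_similarity() accepts two of them, and that a similarity may come back as None for an unrelated pair. The helper names best_similarity and thai_word_similarity are hypothetical.

```python
from typing import Callable, Iterable, Optional


def best_similarity(
    synsets_a: Iterable[object],
    synsets_b: Iterable[object],
    similarity: Callable[[object, object], Optional[float]],
) -> Optional[float]:
    """Return the highest pairwise similarity, or None if no pair is comparable."""
    list_b = list(synsets_b)
    scores = [similarity(a, b) for a in synsets_a for b in list_b]
    # Assumption: the similarity function may return None for some pairs.
    comparable = [score for score in scores if score is not None]
    return max(comparable) if comparable else None


def thai_word_similarity(word_a: str, word_b: str) -> Optional[float]:
    """Hypothetical wrapper comparing two Thai words via their synsets."""
    # Imported lazily; requires pythainlp and its WordNet data to be installed.
    from pythainlp.corpus import wordnet

    return best_similarity(
        wordnet.synsets(word_a), wordnet.synsets(word_b), wordnet.path_similarity
    )
```

Taking the maximum over all synset pairs is one design choice among several. Averaging, or comparing only the first synset of each word, would be equally valid starting points.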