pythainlp.tools

The pythainlp.tools module contains miscellaneous functions for PyThaiNLP internal use.

Modules

pythainlp.tools.get_full_data_path(path: str) -> str

Joins the path of the PyThaiNLP data directory with the given path and returns the resulting full path.

Returns:

full path for the given dataset name

Return type:

str

Example:

from pythainlp.tools import get_full_data_path

get_full_data_path('ttc_freq.txt')
# output: '/root/pythainlp-data/ttc_freq.txt'
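The joining step described above can be sketched with the standard library. This is a minimal illustration, not PyThaiNLP's actual implementation: sketch_full_data_path is a hypothetical name, and the data directory is passed in explicitly instead of being looked up.

```python
import os


def sketch_full_data_path(data_dir: str, path: str) -> str:
    """Hypothetical helper: join a data directory and a dataset file name."""
    # os.path.join inserts the platform's path separator between the parts.
    return os.path.join(data_dir, path)


full_path = sketch_full_data_path("/root/pythainlp-data", "ttc_freq.txt")
print(full_path)
# on a POSIX system: /root/pythainlp-data/ttc_freq.txt
```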

pythainlp.tools.get_pythainlp_data_path() -> str

Returns the full path where PyThaiNLP keeps its (downloaded) data. If the directory does not yet exist, it will be created. The path can be specified through the environment variable PYTHAINLP_DATA_DIR. By default, ~/pythainlp-data will be used.

Returns:

full path of directory for pythainlp downloaded data

Return type:

str

Example:

from pythainlp.tools import get_pythainlp_data_path

get_pythainlp_data_path()
# output: '/root/pythainlp-data'
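The lookup order described above (environment variable first, then the ~/pythainlp-data default, creating the directory if needed) might look roughly like the following. This is a sketch under stated assumptions, not the library's code: sketch_data_path and its env parameter are hypothetical, and the mapping is passed in so the example does not touch the real home directory.

```python
import os
import tempfile


def sketch_data_path(env=None) -> str:
    """Hypothetical re-creation of the documented lookup order."""
    env = os.environ if env is None else env
    # Prefer PYTHAINLP_DATA_DIR; otherwise fall back to ~/pythainlp-data.
    path = env.get("PYTHAINLP_DATA_DIR", os.path.join("~", "pythainlp-data"))
    path = os.path.expanduser(path)
    # Create the directory if it does not exist yet.
    os.makedirs(path, exist_ok=True)
    return path


# Usage with a throwaway directory instead of the real environment.
demo_dir = os.path.join(tempfile.mkdtemp(), "pythainlp-data")
resolved = sketch_data_path({"PYTHAINLP_DATA_DIR": demo_dir})
print(resolved == demo_dir, os.path.isdir(resolved))
```

Calling sketch_data_path() with no argument would read the real environment and may create ~/pythainlp-data, which is why the usage above passes an explicit mapping.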

pythainlp.tools.get_pythainlp_path() -> str

Returns the full path of the directory that contains the PyThaiNLP code.

Returns:

full path of pythainlp code

Return type:

str

Example:

from pythainlp.tools import get_pythainlp_path

get_pythainlp_path()
# output: '/usr/local/lib/python3.6/dist-packages/pythainlp'
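A common way to find where an installed package lives is to inspect its __file__ attribute. The sketch below shows that general technique with the standard-library json package as a stand-in; sketch_package_path is a hypothetical helper, and whether PyThaiNLP computes its path this way is an assumption.

```python
import importlib
import os


def sketch_package_path(package_name: str) -> str:
    """Hypothetical helper: directory that holds an importable package."""
    module = importlib.import_module(package_name)
    # For a regular package, __file__ points at its __init__.py.
    return os.path.dirname(os.path.abspath(module.__file__))


# "json" is used only because it ships with Python.
package_dir = sketch_package_path("json")
print(package_dir)
```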

pythainlp.tools.misspell.misspell(sentence: str, ratio: float = 0.05)

Simulates misspellings in the input sentence. The number of misspelled locations is governed by ratio.

Parameters:

sentence (str): sentence to be misspelled

ratio (float): fraction of characters to misspell (for example, 0.05 is about 5 misspellings per 100 characters). Defaults to 0.05.

Returns:

sentence containing some misspellings

Return type:

str

Example:

from pythainlp.tools.misspell import misspell

sentence = "ภาษาไทยปรากฏครั้งแรกในพุทธศักราช 1826"

misspell(sentence, ratio=0.1)
# output:
# ภาษาไทยปรากฏครั้งแรกในกุทธศักราช 1727
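To make the role of ratio concrete, here is a toy simulation. It is not PyThaiNLP's algorithm: sketch_misspell, its seed parameter, and the replacement rule (swap in a different character drawn from the sentence itself) are all hypothetical. It assumes ratio is the fraction of characters to alter, which matches the example above, where 3 of 37 characters change at ratio=0.1.

```python
import math
import random


def sketch_misspell(sentence: str, ratio: float = 0.05, seed=None) -> str:
    """Toy simulation: replace floor(len * ratio) characters at random."""
    rng = random.Random(seed)
    chars = list(sentence)
    # Only non-whitespace characters are candidates for replacement.
    positions = [i for i, ch in enumerate(chars) if not ch.isspace()]
    pool = sorted({ch for ch in chars if not ch.isspace()})
    if len(pool) < 2:
        return sentence  # nothing different to substitute
    count = min(len(positions), math.floor(len(sentence) * ratio))
    for i in rng.sample(positions, count):
        # Pick a replacement that differs from the original character.
        chars[i] = rng.choice([ch for ch in pool if ch != chars[i]])
    return "".join(chars)


sentence = "ภาษาไทยปรากฏครั้งแรกในพุทธศักราช 1826"
print(sketch_misspell(sentence, ratio=0.1, seed=0))
```

Passing a seed makes the toy output repeatable; without one, each call may alter different positions.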