For stable version:

pip install pythainlp

For development version:

pip install --upgrade --pre pythainlp

For some functionalities, like named entity recognition, extra packages may be needed. Install them with these install options:

pip install pythainlp[extra1,extra2,…]

where extras can be
  • attacut (to support attacut, a fast and accurate tokenizer)

  • icu (for ICU, International Components for Unicode, support in transliteration and tokenization)

  • ipa (for IPA, International Phonetic Alphabet, support in transliteration)

  • ml (to support ULMFiT models for classification)

  • ner (for named-entity recognizer)

  • thai2fit (for Thai word vector)

  • thai2rom (for machine-learnt romanization)

  • full (install everything)

For dependency details, look at extras variable in [](

Note for installation on Windows:

  • PyICU libraries may required. You have two options to get them installed on Windows.

  • Option 1 (recommended):
    • Find a pre-built package (“wheel”) from

    • Download a suitable wheel for your Python version (3.5, 3.6, etc.) and CPU architecture (“win32” for 32-bit Windows and “amd64” for 64-bit Windows)

    • Install them with pip. For example: pip install PyICU-xxx‑cp36‑cp36m‑win32.whl

  • Option 2 (advanced):
    • You can also try to install them with a command: pip install pyicu

    • With this, pip will try to build the libraries directly from source files.

    • This will take some time and need a set of build tools to be installed in your system, for example Microsoft Visual C++ Compiler. It also requires some technical skills on how things are getting built on Windows system, as you may need to configure some environment variables to accommodate the build process.

    • For PyICU, before the installation, you have to set ICU_VERSION environment variable to ICU version in your system. For example, set ICU_VERSION=62.1.

    • This approach is obviously take more time and effort, but the good side is the library will be optimized for your system. This could mean a better performance.

Runtime Configurations


This environment variable specifies the location where the downloaded data and the corpus database information are stored. If this directory does not exist, PyThaiNLP will automatically create a new one. By default, it is specified to the directory called pythainlp-data within the home directory.