PyThaiNLP Translate

We use machine translation models from the VISTEC-depa Thailand Artificial Intelligence Research Institute.

Install

[1]:
!pip install fairseq
Collecting fairseq
  Downloading https://files.pythonhosted.org/packages/15/ab/92c6efb05ffdfe16fbdc9e463229d9af8c3b74dc943ed4b4857a87b223c2/fairseq-0.10.2-cp37-cp37m-manylinux1_x86_64.whl (1.7MB)
     |████████████████████████████████| 1.7MB 5.7MB/s
Collecting dataclasses
  Downloading https://files.pythonhosted.org/packages/26/2f/1095cdc2868052dd1e64520f7c0d5c8c550ad297e944e641dbf1ffbb9a5d/dataclasses-0.6-py3-none-any.whl
Requirement already satisfied: cython in /usr/local/lib/python3.7/dist-packages (from fairseq) (0.29.22)
Collecting hydra-core
  Downloading https://files.pythonhosted.org/packages/52/e3/fbd70dd0d3ce4d1d75c22d56c0c9f895cfa7ed6587a9ffb821d6812d6a60/hydra_core-1.0.6-py3-none-any.whl (123kB)
     |████████████████████████████████| 133kB 14.5MB/s
Requirement already satisfied: cffi in /usr/local/lib/python3.7/dist-packages (from fairseq) (1.14.5)
Collecting sacrebleu>=1.4.12
  Downloading https://files.pythonhosted.org/packages/7e/57/0c7ca4e31a126189dab99c19951910bd081dea5bbd25f24b77107750eae7/sacrebleu-1.5.1-py3-none-any.whl (54kB)
     |████████████████████████████████| 61kB 6.3MB/s
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from fairseq) (4.41.1)
Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from fairseq) (1.8.0+cu101)
Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from fairseq) (2019.12.20)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from fairseq) (1.19.5)
Collecting omegaconf<2.1,>=2.0.5
  Downloading https://files.pythonhosted.org/packages/d0/eb/9d63ce09dd8aa85767c65668d5414958ea29648a0eec80a4a7d311ec2684/omegaconf-2.0.6-py3-none-any.whl
Collecting antlr4-python3-runtime==4.8
  Downloading https://files.pythonhosted.org/packages/56/02/789a0bddf9c9b31b14c3e79ec22b9656185a803dc31c15f006f9855ece0d/antlr4-python3-runtime-4.8.tar.gz (112kB)
     |████████████████████████████████| 112kB 18.4MB/s
Requirement already satisfied: importlib-resources; python_version < "3.9" in /usr/local/lib/python3.7/dist-packages (from hydra-core->fairseq) (5.1.2)
Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi->fairseq) (2.20)
Collecting portalocker==2.0.0
  Downloading https://files.pythonhosted.org/packages/89/a6/3814b7107e0788040870e8825eebf214d72166adf656ba7d4bf14759a06a/portalocker-2.0.0-py2.py3-none-any.whl
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch->fairseq) (3.7.4.3)
Collecting PyYAML>=5.1.*
  Downloading https://files.pythonhosted.org/packages/7a/a5/393c087efdc78091afa2af9f1378762f9821c9c1d7a22c5753fb5ac5f97a/PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636kB)
     |████████████████████████████████| 645kB 18.0MB/s
Requirement already satisfied: zipp>=0.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-resources; python_version < "3.9"->hydra-core->fairseq) (3.4.1)
Building wheels for collected packages: antlr4-python3-runtime
  Building wheel for antlr4-python3-runtime (setup.py) ... done
  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.8-cp37-none-any.whl size=141231 sha256=7443fbcc47b93d3b320b897cf91d8b947b6fdc6a0795dcce01ed16fd31c8ab6d
  Stored in directory: /root/.cache/pip/wheels/e3/e2/fa/b78480b448b8579ddf393bebd3f47ee23aa84c89b6a78285c8
Successfully built antlr4-python3-runtime
Installing collected packages: dataclasses, PyYAML, omegaconf, antlr4-python3-runtime, hydra-core, portalocker, sacrebleu, fairseq
  Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Successfully uninstalled PyYAML-3.13
Successfully installed PyYAML-5.4.1 antlr4-python3-runtime-4.8 dataclasses-0.6 fairseq-0.10.2 hydra-core-1.0.6 omegaconf-2.0.6 portalocker-2.0.0 sacrebleu-1.5.1
[3]:
!pip install sacremoses sentencepiece
Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (0.0.43)
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)
     |████████████████████████████████| 1.2MB 4.3MB/s
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from sacremoses) (4.41.1)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses) (7.1.2)
Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from sacremoses) (2019.12.20)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses) (1.0.1)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses) (1.15.0)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.95
[8]:
!pip install https://github.com/PyThaiNLP/pythainlp/archive/dev.zip
Collecting https://github.com/PyThaiNLP/pythainlp/archive/dev.zip
  Using cached https://github.com/PyThaiNLP/pythainlp/archive/dev.zip
Requirement already satisfied (use --upgrade to upgrade): pythainlp==2.3.0.dev0 from https://github.com/PyThaiNLP/pythainlp/archive/dev.zip in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: python-crfsuite>=0.9.6 in /usr/local/lib/python3.7/dist-packages (from pythainlp==2.3.0.dev0) (0.9.7)
Requirement already satisfied: requests>=2.22.0 in /usr/local/lib/python3.7/dist-packages (from pythainlp==2.3.0.dev0) (2.23.0)
Requirement already satisfied: tinydb>=3.0 in /usr/local/lib/python3.7/dist-packages (from pythainlp==2.3.0.dev0) (4.4.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp==2.3.0.dev0) (2020.12.5)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp==2.3.0.dev0) (1.24.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp==2.3.0.dev0) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp==2.3.0.dev0) (2.10)
Building wheels for collected packages: pythainlp
  Building wheel for pythainlp (setup.py) ... done
  Created wheel for pythainlp: filename=pythainlp-2.3.0.dev0-cp37-none-any.whl size=11003566 sha256=b64ebc4010c51f2644c15473edd0c49540644725a367c28baa0d3f3e19edcccb
  Stored in directory: /tmp/pip-ephem-wheel-cache-zkojv2_o/wheels/79/4e/1e/26f3198c6712ecfbee92928ed1dde923a078da3d222401cc78
Successfully built pythainlp
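
Before downloading the models, it can help to confirm that the packages installed above are importable in the current kernel. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` tuple are illustrative and not part of PyThaiNLP.

```python
from importlib.util import find_spec

# Top-level import names of the packages installed in the cells above.
REQUIRED = ("fairseq", "sacremoses", "sentencepiece", "pythainlp")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All translation dependencies are importable.")
```

If anything is reported as missing, rerun the corresponding `pip install` cell and restart the kernel before continuing.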

Download and install all models

[11]:
from pythainlp.translate import download_model_all
[12]:
download_model_all()
Corpus: scb_1m_en-th_moses
- Downloading: scb_1m_en-th_moses 1.0
100%|██████████| 1174648148/1174648148 [00:14<00:00, 81506882.14it/s]
Corpus: scb_1m_th-en_spm
- Downloading: scb_1m_th-en_spm 1.0
100%|██████████| 703780432/703780432 [00:08<00:00, 78234386.81it/s]

Translate

Import

[1]:
from pythainlp.translate import EnThTranslator, ThEnTranslator

EnThTranslator().translate(text) / ThEnTranslator().translate(text)

  • text : the input text to translate (str); the translated text is returned as a str
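
When translating several sentences, one translator object can be created once and reused for every call. This is a minimal sketch that relies only on the `translate(text)` method described above. The helper `translate_all` and the `UppercaseStub` class are hypothetical names introduced here so that the example runs without the downloaded models. It assumes, without confirmation from this notebook, that constructing a translator is the expensive step.

```python
from typing import Iterable, List


def translate_all(translator, texts: Iterable[str]) -> List[str]:
    """Translate each text with one shared translator instance.

    `translator` is any object exposing translate(text) -> str,
    for example EnThTranslator() or ThEnTranslator().
    """
    return [translator.translate(text) for text in texts]


class UppercaseStub:
    """Stand-in translator (not a PyThaiNLP class) used only for this demo."""

    def translate(self, text: str) -> str:
        return text.upper()


print(translate_all(UppercaseStub(), ["hello", "fried chicken"]))
# → ['HELLO', 'FRIED CHICKEN']

# With PyThaiNLP installed and the models downloaded, pass a real translator:
#   from pythainlp.translate import EnThTranslator
#   translate_all(EnThTranslator(), ["I want fried chicken."])
```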

Supported languages

  • th : Thai

  • en : English

English to Thai

One model is available:

  • scb_1m_en-th_moses - bpe tokenizer

[14]:
print(EnThTranslator().translate("I want fried chicken."))
ไก่ทอดค่ะ

Thai to English

The model downloaded above for this direction is:

  • scb_1m_th-en_spm - SentencePiece tokenizer

[4]:
print(ThEnTranslator().translate("ผมอยากกินไก่ทอด"))
I want fried chicken.
[7]:
print(ThEnTranslator().translate("ผมอยากเขียนโปรแกรมคอมพิวเตอร์"))
I want to write a computer program.