
spaCy-PyThaiNLP

PyThaiNLP For spaCy

GitHub: https://github.com/PyThaiNLP/spaCy-PyThaiNLP

[1]:
!pip install -U pip
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: pip in /usr/local/lib/python3.8/dist-packages (21.1.3)
Collecting pip
  Downloading pip-22.3.1-py3-none-any.whl (2.1 MB)
     |████████████████████████████████| 2.1 MB 6.1 MB/s
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
Successfully installed pip-22.3.1
[2]:
!pip install pythainlp[dependency_parsing] esupar
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pythainlp[dependency_parsing]
  Downloading pythainlp-3.1.1-py3-none-any.whl (9.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.6/9.6 MB 64.3 MB/s eta 0:00:00
Collecting esupar
  Downloading esupar-1.4.7-py3-none-any.whl (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 8.8 MB/s eta 0:00:00
Requirement already satisfied: requests>=2.22.0 in /usr/local/lib/python3.8/dist-packages (from pythainlp[dependency_parsing]) (2.23.0)
Collecting ufal.chu-liu-edmonds>=1.0.2
  Downloading ufal.chu_liu_edmonds-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (107 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 107.4/107.4 kB 15.1 MB/s eta 0:00:00
Collecting spacy-thai>=0.7.1
  Downloading spacy_thai-0.7.3-py3-none-any.whl (6.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 48.8 MB/s eta 0:00:00
Collecting transformers>=4.22.1
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.8/5.8 MB 79.1 MB/s eta 0:00:00
Collecting deplacy>=2.0.3
  Downloading deplacy-2.0.3-py3-none-any.whl (22 kB)
Collecting supar>=1.1.4
  Downloading supar-1.1.4-py3-none-any.whl (93 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 93.1/93.1 kB 13.5 MB/s eta 0:00:00
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests>=2.22.0->pythainlp[dependency_parsing]) (2022.12.7)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.8/dist-packages (from requests>=2.22.0->pythainlp[dependency_parsing]) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests>=2.22.0->pythainlp[dependency_parsing]) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests>=2.22.0->pythainlp[dependency_parsing]) (2.10)
Requirement already satisfied: spacy>=2.2.2 in /usr/local/lib/python3.8/dist-packages (from spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (3.4.4)
Collecting ufal.udpipe>=1.2.0
  Downloading ufal.udpipe-1.2.0.3.tar.gz (304 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 304.1/304.1 kB 35.7 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: dill in /usr/local/lib/python3.8/dist-packages (from supar>=1.1.4->esupar) (0.3.6)
Requirement already satisfied: nltk in /usr/local/lib/python3.8/dist-packages (from supar>=1.1.4->esupar) (3.7)
Requirement already satisfied: torch>=1.7.1 in /usr/local/lib/python3.8/dist-packages (from supar>=1.1.4->esupar) (1.13.0+cu116)
Collecting stanza
  Downloading stanza-1.4.2-py3-none-any.whl (691 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 691.3/691.3 kB 46.6 MB/s eta 0:00:00
Requirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from transformers>=4.22.1->pythainlp[dependency_parsing]) (3.8.2)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.6/7.6 MB 63.5 MB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 182.4/182.4 kB 14.6 MB/s eta 0:00:00
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.8/dist-packages (from transformers>=4.22.1->pythainlp[dependency_parsing]) (4.64.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.8/dist-packages (from transformers>=4.22.1->pythainlp[dependency_parsing]) (2022.6.2)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.8/dist-packages (from transformers>=4.22.1->pythainlp[dependency_parsing]) (6.0)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.8/dist-packages (from transformers>=4.22.1->pythainlp[dependency_parsing]) (1.21.6)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.8/dist-packages (from transformers>=4.22.1->pythainlp[dependency_parsing]) (21.3)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.8/dist-packages (from huggingface-hub<1.0,>=0.10.0->transformers>=4.22.1->pythainlp[dependency_parsing]) (4.4.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging>=20.0->transformers>=4.22.1->pythainlp[dependency_parsing]) (3.0.9)
Requirement already satisfied: wasabi<1.1.0,>=0.9.1 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (0.10.1)
Requirement already satisfied: thinc<8.2.0,>=8.1.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (8.1.5)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (2.11.3)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (2.0.8)
Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (6.3.0)
Requirement already satisfied: pathy>=0.3.5 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (0.10.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (57.4.0)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (3.3.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (1.0.9)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (2.0.7)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (1.10.2)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (1.0.4)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (2.4.5)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.10 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (3.0.10)
Requirement already satisfied: typer<0.8.0,>=0.3.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (0.7.0)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.8/dist-packages (from spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (3.0.8)
Requirement already satisfied: joblib in /usr/local/lib/python3.8/dist-packages (from nltk->supar>=1.1.4->esupar) (1.2.0)
Requirement already satisfied: click in /usr/local/lib/python3.8/dist-packages (from nltk->supar>=1.1.4->esupar) (7.1.2)
Requirement already satisfied: six in /usr/local/lib/python3.8/dist-packages (from stanza->supar>=1.1.4->esupar) (1.15.0)
Requirement already satisfied: protobuf in /usr/local/lib/python3.8/dist-packages (from stanza->supar>=1.1.4->esupar) (3.19.6)
Collecting emoji
  Downloading emoji-2.2.0.tar.gz (240 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 240.9/240.9 kB 24.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.8/dist-packages (from thinc<8.2.0,>=8.1.0->spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (0.7.9)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.8/dist-packages (from thinc<8.2.0,>=8.1.0->spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (0.0.3)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.8/dist-packages (from jinja2->spacy>=2.2.2->spacy-thai>=0.7.1->pythainlp[dependency_parsing]) (2.0.1)
Building wheels for collected packages: ufal.udpipe, emoji
  Building wheel for ufal.udpipe (setup.py) ... done
  Created wheel for ufal.udpipe: filename=ufal.udpipe-1.2.0.3-cp38-cp38-linux_x86_64.whl size=5626945 sha256=6613dcb188f57561a00a2e40eca1bbafe6203936b8d9c387facd79de3f06fa62
  Stored in directory: /root/.cache/pip/wheels/20/eb/6f/3475485c7d991ca5698d39603e22a99bd6904dcac7d0a5855a
  Building wheel for emoji (setup.py) ... done
  Created wheel for emoji: filename=emoji-2.2.0-py3-none-any.whl size=234926 sha256=e3b7a3e928e5e81053b9f869cfef5382b49f133284c6abbd718496ff11e8ee67
  Stored in directory: /root/.cache/pip/wheels/39/08/a1/b0bb1f7683d20b75b34ceeb56ee83a585e9b065a5fef0b2cb1
Successfully built ufal.udpipe emoji
Installing collected packages: ufal.udpipe, ufal.chu-liu-edmonds, tokenizers, emoji, deplacy, stanza, pythainlp, huggingface-hub, transformers, supar, spacy-thai, esupar
Successfully installed deplacy-2.0.3 emoji-2.2.0 esupar-1.4.7 huggingface-hub-0.11.1 pythainlp-3.1.1 spacy-thai-0.7.3 stanza-1.4.2 supar-1.1.4 tokenizers-0.13.2 transformers-4.25.1 ufal.chu-liu-edmonds-1.0.2 ufal.udpipe-1.2.0.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[3]:
!pip install spacy-pythainlp
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting spacy-pythainlp
  Downloading spacy_pythainlp-0.1.dev6-py3-none-any.whl (9.2 kB)
Requirement already satisfied: spacy>=3.0 in /usr/local/lib/python3.8/dist-packages (from spacy-pythainlp) (3.4.4)
Requirement already satisfied: pythainlp>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from spacy-pythainlp) (3.1.1)
Collecting python-crfsuite
  Downloading python_crfsuite-0.9.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 20.7 MB/s eta 0:00:00
Requirement already satisfied: requests>=2.22.0 in /usr/local/lib/python3.8/dist-packages (from pythainlp>=3.1.0->spacy-pythainlp) (2.23.0)
Requirement already satisfied: pathy>=0.3.5 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (0.10.1)
Requirement already satisfied: wasabi<1.1.0,>=0.9.1 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (0.10.1)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (1.0.9)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (1.10.2)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (21.3)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.10 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (3.0.10)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (57.4.0)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (4.64.1)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (2.4.5)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (1.0.4)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (2.11.3)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (3.0.8)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (3.3.0)
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (1.21.6)
Requirement already satisfied: typer<0.8.0,>=0.3.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (0.7.0)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (2.0.7)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (2.0.8)
Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (6.3.0)
Requirement already satisfied: thinc<8.2.0,>=8.1.0 in /usr/local/lib/python3.8/dist-packages (from spacy>=3.0->spacy-pythainlp) (8.1.5)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging>=20.0->spacy>=3.0->spacy-pythainlp) (3.0.9)
Requirement already satisfied: typing-extensions>=4.1.0 in /usr/local/lib/python3.8/dist-packages (from pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4->spacy>=3.0->spacy-pythainlp) (4.4.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests>=2.22.0->pythainlp>=3.1.0->spacy-pythainlp) (2022.12.7)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.8/dist-packages (from requests>=2.22.0->pythainlp>=3.1.0->spacy-pythainlp) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests>=2.22.0->pythainlp>=3.1.0->spacy-pythainlp) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests>=2.22.0->pythainlp>=3.1.0->spacy-pythainlp) (1.24.3)
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.8/dist-packages (from thinc<8.2.0,>=8.1.0->spacy>=3.0->spacy-pythainlp) (0.7.9)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.8/dist-packages (from thinc<8.2.0,>=8.1.0->spacy>=3.0->spacy-pythainlp) (0.0.3)
Requirement already satisfied: click<9.0.0,>=7.1.1 in /usr/local/lib/python3.8/dist-packages (from typer<0.8.0,>=0.3.0->spacy>=3.0->spacy-pythainlp) (7.1.2)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.8/dist-packages (from jinja2->spacy>=3.0->spacy-pythainlp) (2.0.1)
Installing collected packages: python-crfsuite, spacy-pythainlp
Successfully installed python-crfsuite-0.9.8 spacy-pythainlp-0.1.dev6
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[4]:
!pip install attacut
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting attacut
  Downloading attacut-1.0.6-py3-none-any.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 24.7 MB/s eta 0:00:00
Collecting nptyping>=0.2.0
  Downloading nptyping-2.4.1-py3-none-any.whl (36 kB)
Collecting ssg>=0.0.4
  Downloading ssg-0.0.8-py3-none-any.whl (473 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 473.8/473.8 kB 43.9 MB/s eta 0:00:00
Requirement already satisfied: pyyaml>=5.1.2 in /usr/local/lib/python3.8/dist-packages (from attacut) (6.0)
Collecting docopt>=0.6.2
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy>=1.17.0 in /usr/local/lib/python3.8/dist-packages (from attacut) (1.21.6)
Requirement already satisfied: six>=1.12.0 in /usr/local/lib/python3.8/dist-packages (from attacut) (1.15.0)
Requirement already satisfied: torch>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from attacut) (1.13.0+cu116)
Collecting fire>=0.1.3
  Downloading fire-0.5.0.tar.gz (88 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.3/88.3 kB 10.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: termcolor in /usr/local/lib/python3.8/dist-packages (from fire>=0.1.3->attacut) (2.1.1)
Requirement already satisfied: typing-extensions<5.0.0,>=4.0.0 in /usr/local/lib/python3.8/dist-packages (from nptyping>=0.2.0->attacut) (4.4.0)
Requirement already satisfied: python-crfsuite>=0.9.6 in /usr/local/lib/python3.8/dist-packages (from ssg>=0.0.4->attacut) (0.9.8)
Requirement already satisfied: tqdm>=4.32.2 in /usr/local/lib/python3.8/dist-packages (from ssg>=0.0.4->attacut) (4.64.1)
Building wheels for collected packages: docopt, fire
  Building wheel for docopt (setup.py) ... done
  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13723 sha256=cd282751c98736c79933ed4265624e65891888bb9fdd01dc5d6fcf978d76431f
  Stored in directory: /root/.cache/pip/wheels/ca/cc/e3/f1e272f628fdb013d969acc99cfe2e031ea15b3efb74ffe842
  Building wheel for fire (setup.py) ... done
  Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116949 sha256=bc82a0082e9931af28c40d49e4494ce66a1f80f929b30ae4e7e1eff347b37c5c
  Stored in directory: /root/.cache/pip/wheels/40/86/45/88e8603bd3b1a9bff9d02d820c7431c47ad032865632657bb9
Successfully built docopt fire
Installing collected packages: docopt, nptyping, fire, ssg, attacut
Successfully installed attacut-1.0.6 docopt-0.6.2 fire-0.5.0 nptyping-2.4.1 ssg-0.0.8
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[1]:
import spacy
import spacy_pythainlp.core
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:497: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
[2]:
from spacy import displacy

You can configure the component through the config argument of nlp.add_pipe.

nlp.add_pipe(
    "pythainlp",
    config={
        "pos_engine": "perceptron",
        "pos": True,
        "pos_corpus": "orchid_ud",
        "sent_engine": "crfcut",
        "sent": True,
        "ner_engine": "thainer",
        "ner": True,
        "tokenize_engine": "newmm",
        "tokenize": False,
        "dependency_parsing": False,
        "dependency_parsing_engine": "esupar",
        "dependency_parsing_model": None,
    }
)
  • tokenize: Bool (True or False). Set to True to change the word tokenizer (by default, spaCy uses PyThaiNLP's newmm).

  • tokenize_engine: The word tokenizer engine. See: Options for engine

  • sent: Bool (True or False). Turns on the sentence tokenizer.

  • sent_engine: The sentence tokenizer engine. See: Options for engine

  • pos: Bool (True or False). Turns on part-of-speech tagging.

  • pos_engine: The part-of-speech tagging engine. See: Options for engine

  • ner: Bool (True or False). Turns on named-entity recognition (NER).

  • ner_engine: The NER engine. See: Options for engine

  • dependency_parsing: Bool (True or False). Turns on dependency parsing.

  • dependency_parsing_engine: The dependency parsing engine. See: Options for engine

  • dependency_parsing_model: The dependency parsing model. See: Options for model

Note: If you turn on dependency parsing, the word segmentation and sentence segmentation settings are turned off, and the word and sentence boundaries produced by the dependency parser are used instead.
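
The options above can be collected in a plain dictionary before they are handed to nlp.add_pipe. The following is a minimal sketch, not part of spaCy-PyThaiNLP: the names LISTED_CONFIG, build_config and segmentation_source are hypothetical, and treating the values from the listing above as the baseline is an assumption. The second helper only restates the note about dependency parsing; it does not reproduce the library's internal logic.

```python
# Hypothetical helpers for illustration; they do not exist in spaCy-PyThaiNLP.
# The values below are copied from the nlp.add_pipe listing in this notebook.
LISTED_CONFIG = {
    "pos_engine": "perceptron",
    "pos": True,
    "pos_corpus": "orchid_ud",
    "sent_engine": "crfcut",
    "sent": True,
    "ner_engine": "thainer",
    "ner": True,
    "tokenize_engine": "newmm",
    "tokenize": False,
    "dependency_parsing": False,
    "dependency_parsing_engine": "esupar",
    "dependency_parsing_model": None,
}


def build_config(**overrides):
    """Return a copy of LISTED_CONFIG with the given overrides applied."""
    unknown = sorted(set(overrides) - set(LISTED_CONFIG))
    if unknown:
        # Catch misspelled option names before they reach the pipeline.
        raise KeyError(f"unknown option(s): {', '.join(unknown)}")
    config = dict(LISTED_CONFIG)
    config.update(overrides)
    return config


def segmentation_source(config):
    """Say where word and sentence boundaries come from, per the note above."""
    if config["dependency_parsing"]:
        return "dependency parser"
    return "tokenize and sent options"


config = build_config(tokenize=True, tokenize_engine="attacut")
print(config["tokenize_engine"])
print(segmentation_source(config))
```

The resulting dictionary could then be passed as nlp.add_pipe("pythainlp", config=config), in the same way as the literal dictionaries used in the cells of this notebook.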

[3]:
nlp = spacy.blank('th')
nlp.add_pipe(
   "pythainlp",
   config={
      "sent":True,
      "tokenize": True,
      "tokenize_engine": "attacut",  # try switching between "attacut" and "newmm"
   }
)
Corpus: thainer
- Downloading: thainer 1.5
[3]:
<spacy_pythainlp.core.PyThaiNLP at 0x7f9c02410a90>
[4]:
# Roughly: "I am a cat. I like to go and play at Nang Rong School, Buriram."
doc=nlp("ผมเป็นแมว ผมชอบไปเล่นที่โรงเรียนนางรอง บุรีรัมย์")
/usr/local/lib/python3.8/dist-packages/pythainlp/tag/perceptron.py:42: UserWarning:
    LST20 corpus are free for research and open source only.

    If you want to use in Commercial use, please contract NECTEC.

    https://www.facebook.com/dancearmy/posts/10157641945708284

  warnings.warn(
Corpus: pos_lst20_perceptron
- Downloading: pos_lst20_perceptron 0.2.4
[5]:
list(doc)
[5]:
[ผม, เป็น, แมว,  , ผมชอบ, ไป, เล่น, ที่, โรง, เรียน, นางรอง,  บุรีรัมย์]
[6]:
list(doc.sents)
[6]:
[ผมเป็นแมว  ผมชอบ, ไปเล่นที่โรงเรียนนางรอง บุรีรัมย์]
[7]:
doc.ents
[7]:
(โรงเรียนนางรอง,)
[8]:
displacy.render(doc, style="ent",jupyter=True)
ผมเป็นแมว ผมชอบไปเล่นที่ โรงเรียนนางรอง LOCATION บุรีรัมย์
[8]:
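
Besides rendering entities with displacy, the spans in doc.ents can be read directly through their text and label_ attributes. The sketch below shows one way to tabulate them. The function name entity_rows is hypothetical, and the stand-in object only mimics the two attributes used here so that the example runs without downloading any model; its single entry is copied from the output above.

```python
from types import SimpleNamespace


def entity_rows(doc):
    """Return (text, label) pairs for every entity span in doc.ents."""
    return [(ent.text, ent.label_) for ent in doc.ents]


# Stand-in for a processed document (an assumption for illustration only).
# With the notebook's real `doc`, call entity_rows(doc) directly.
fake_doc = SimpleNamespace(
    ents=(SimpleNamespace(text="โรงเรียนนางรอง", label_="LOCATION"),)
)

for text, label in entity_rows(fake_doc):
    print(f"{label}\t{text}")
```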

Dependency parsing

[9]:
nlp = spacy.blank('th')
nlp.add_pipe(
   "pythainlp",
   config={
      "sent":True,
      "dependency_parsing": True,
   }
)
[9]:
<spacy_pythainlp.core.PyThaiNLP at 0x7f9c0146e880>
[10]:
doc=nlp("ผมเป็นแมว ผมชอบไปเล่นที่โรงเรียนนางรอง บุรีรัมย์")
Some weights of the model checkpoint at KoichiYasuoka/roberta-base-thai-spm-upos were not used when initializing RobertaModel: ['classifier.weight', 'classifier.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at KoichiYasuoka/roberta-base-thai-spm-upos and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:supar:Loading the data
INFO:supar:
Dataset(n_sentences=1, n_batches=1, n_buckets=1)
INFO:supar:Making predictions on the dataset
  0%|                  | 0/1 00:00<?, ?it/s/usr/local/lib/python3.8/dist-packages/torch/nn/modules/rnn.py:26: UserWarning: apply_permutation is deprecated, please use tensor.index_select(dim, permutation) instead
  warnings.warn("apply_permutation is deprecated, please use tensor.index_select(dim, permutation) instead")
INFO:supar:0:00:00.204603s elapsed, 4.89 Sents/s
/usr/local/lib/python3.8/dist-packages/pythainlp/tag/perceptron.py:42: UserWarning:
    LST20 corpus are free for research and open source only.

    If you want to use in Commercial use, please contract NECTEC.

    https://www.facebook.com/dancearmy/posts/10157641945708284

  warnings.warn(
[11]:
list(doc)
[11]:
[ผม, เป็น, แมว, ผม, ชอบ, ไป, เล่น, ที่, โรงเรียน, นางรอง, บุรีรัมย์]
[12]:
list(doc.sents)
[12]:
[ผมเป็นแมว ผมชอบไปเล่น, ที่โรงเรียนนางรอง บุรีรัมย์]
[13]:
displacy.render(doc, style="ent",jupyter=True)
ผมเป็นแมว ผมชอบไปเล่นที่ โรงเรียนนางรอง LOCATION บุรีรัมย์ LOCATION
[14]:
displacy.render(doc, style="dep",jupyter=True)
Tokens (with POS tags): ผม PRON, เป็น VERB, แมว NOUN, ผม PRON, ชอบ VERB, ไป AUX, เล่น VERB, ที่ SCONJ, โรงเรียน NOUN, นางรอง VERB, บุรีรัมย์ NOUN
Arc labels: nsubj, cop, root, nsubj, acl, xcomp, case, obl, flat:name
[14]:
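
The dependency diagram can also be read programmatically: each spaCy token exposes text, pos_, dep_ and head. The sketch below collects these into rows. The function name dependency_rows is hypothetical, and the two stand-in tokens are invented English placeholders rather than parser output; they exist only so that the example runs without loading the esupar model.

```python
from types import SimpleNamespace


def dependency_rows(doc):
    """Return (token, POS tag, relation, head token) for each token in doc."""
    return [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in doc]


# Invented stand-in tokens (an assumption for illustration only).
sleep = SimpleNamespace(text="sleep", pos_="VERB", dep_="root")
sleep.head = sleep  # in spaCy, the root token is its own head
cats = SimpleNamespace(text="cats", pos_="NOUN", dep_="nsubj", head=sleep)
fake_doc = [cats, sleep]

for row in dependency_rows(fake_doc):
    print("\t".join(row))
```

With the notebook's parsed doc, dependency_rows(doc) would list one row per Thai token instead.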