
Thai Chunk Parser

In PyThaiNLP, we use chunk data from the ORCHID++ corpus.

Read more: https://github.com/PyThaiNLP/pythainlp/pull/524

[10]:
!pip install pythainlp svgling nltk
Requirement already satisfied: pythainlp in /usr/local/lib/python3.7/dist-packages (2.3.1)
Collecting svgling
  Downloading svgling-0.3.0-py3-none-any.whl (21 kB)
Requirement already satisfied: nltk in /usr/local/lib/python3.7/dist-packages (3.2.5)
Requirement already satisfied: requests>=2.22.0 in /usr/local/lib/python3.7/dist-packages (from pythainlp) (2.23.0)
Requirement already satisfied: tinydb>=3.0 in /usr/local/lib/python3.7/dist-packages (from pythainlp) (4.5.1)
Requirement already satisfied: python-crfsuite>=0.9.6 in /usr/local/lib/python3.7/dist-packages (from pythainlp) (0.9.7)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp) (2021.5.30)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp) (2.10)
Requirement already satisfied: typing-extensions<4.0.0,>=3.10.0 in /usr/local/lib/python3.7/dist-packages (from tinydb>=3.0->pythainlp) (3.10.0.0)
Collecting svgwrite
  Downloading svgwrite-1.4.1-py3-none-any.whl (66 kB)
     |████████████████████████████████| 66 kB 2.6 MB/s
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from nltk) (1.15.0)
Installing collected packages: svgwrite, svgling
Successfully installed svgling-0.3.0 svgwrite-1.4.1
[11]:
from pythainlp.tokenize import word_tokenize
from pythainlp.tag import pos_tag
from pythainlp.tag import chunk_parse
from nltk.chunk import conlltags2tree
import svgling
[3]:
def test(txt):
    # Tokenize, POS-tag with the ORCHID tagset, then chunk-tag each token.
    words_pos = pos_tag(word_tokenize(txt), engine="perceptron", corpus="orchid")
    tags = chunk_parse(words_pos)
    # Return (word, POS, chunk tag) triples in the format expected by conlltags2tree.
    return [(word, pos, tags[i]) for i, (word, pos) in enumerate(words_pos)]
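The helper returns (word, POS, chunk tag) triples in IOB format, which is what NLTK's conlltags2tree expects. A minimal sketch for inspecting the raw triples before drawing a tree; the tag values shown in the comment (B-NP, I-NP, O) are illustrative, and the actual output depends on the chunk model shipped with your PyThaiNLP version:

[ ]:
# Print each (word, POS, chunk tag) triple, e.g. chunk tags like B-NP/I-NP/O.
for word, pos, chunk_tag in test("แมวกินปลา"):  # "The cat eats fish"
    print(word, pos, chunk_tag)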
[12]:
svgling.draw_tree(conlltags2tree(test("แมวกินปลา")))
[12]:
../_images/notebooks_pythainlp_chunk_4_0.svg
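If SVG rendering is not available (for example in a plain terminal), the same tree can be inspected as text through NLTK's Tree API, a small sketch:

[ ]:
# Build the tree once and show it as text instead of an SVG drawing.
tree = conlltags2tree(test("แมวกินปลา"))  # "The cat eats fish"
tree.pretty_print()   # ASCII tree drawing
print(tree)           # bracketed form, e.g. (S (NP ...) ...)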
[13]:
svgling.draw_tree(conlltags2tree(test("คนหนองคายเป็นน่ารัก")))
[13]:
../_images/notebooks_pythainlp_chunk_5_0.svg
[14]:
svgling.draw_tree(conlltags2tree(test("ปลาอะไรอยู่ในน้ำ")))
[14]:
../_images/notebooks_pythainlp_chunk_6_0.svg
[15]:
svgling.draw_tree(conlltags2tree(test("ในน้ำมีอะไรอยู่")))
[15]:
../_images/notebooks_pythainlp_chunk_7_0.svg
[16]:
svgling.draw_tree(conlltags2tree(test("ทำไมเขารักคุณ")))
[16]:
../_images/notebooks_pythainlp_chunk_8_0.svg
[17]:
svgling.draw_tree(conlltags2tree(test("คนอะไรอยู่หลังต้นไม้")))
[17]:
../_images/notebooks_pythainlp_chunk_9_0.svg
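Beyond drawing, the parse tree can be traversed programmatically. A sketch that collects the tokens of every noun-phrase subtree; the label "NP" is an assumption here, so check the labels your model actually produces:

[ ]:
# Collect the tokens of each NP chunk from a parsed sentence.
# "NP" is assumed; inspect the subtree labels if your model differs.
tree = conlltags2tree(test("คนอะไรอยู่หลังต้นไม้"))  # "What person is behind the tree?"
for subtree in tree.subtrees(lambda t: t.label() == "NP"):
    # Leaves are (word, POS) pairs; join the words (Thai uses no spaces).
    print("".join(word for word, pos in subtree.leaves()))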