Thai Chunk Parser
PyThaiNLP uses chunk data from the ORCHID++ corpus.
Read more: https://github.com/PyThaiNLP/pythainlp/pull/524
[10]:
!pip install pythainlp svgling nltk
Requirement already satisfied: pythainlp in /usr/local/lib/python3.7/dist-packages (2.3.1)
Collecting svgling
Downloading svgling-0.3.0-py3-none-any.whl (21 kB)
Requirement already satisfied: nltk in /usr/local/lib/python3.7/dist-packages (3.2.5)
Requirement already satisfied: requests>=2.22.0 in /usr/local/lib/python3.7/dist-packages (from pythainlp) (2.23.0)
Requirement already satisfied: tinydb>=3.0 in /usr/local/lib/python3.7/dist-packages (from pythainlp) (4.5.1)
Requirement already satisfied: python-crfsuite>=0.9.6 in /usr/local/lib/python3.7/dist-packages (from pythainlp) (0.9.7)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp) (2021.5.30)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->pythainlp) (2.10)
Requirement already satisfied: typing-extensions<4.0.0,>=3.10.0 in /usr/local/lib/python3.7/dist-packages (from tinydb>=3.0->pythainlp) (3.10.0.0)
Collecting svgwrite
Downloading svgwrite-1.4.1-py3-none-any.whl (66 kB)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from nltk) (1.15.0)
Installing collected packages: svgwrite, svgling
Successfully installed svgling-0.3.0 svgwrite-1.4.1
[11]:
from pythainlp.tokenize import word_tokenize
from pythainlp.tag import pos_tag
from pythainlp.tag import chunk_parse
from nltk.chunk import conlltags2tree
import svgling
[3]:
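def test(txt):
    # Tokenize the text and POS-tag it with the perceptron tagger (ORCHID tag set)
    m = [(w, t) for w, t in pos_tag(word_tokenize(txt), engine='perceptron', corpus='orchid')]
    # Predict one chunk tag per (word, POS) pair
    tag = chunk_parse(m)
    # Build (word, POS, chunk tag) triples, the input format of conlltags2tree
    p = [(w, t, tag[i]) for i, (w, t) in enumerate(m)]
    return p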
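To make the triples returned by `test` easier to read, here is a minimal sketch of how IOB-style chunk tags (`B-NP`, `I-NP`, `O`) group words into chunks. It is an illustration only, not the implementation used by NLTK's `conlltags2tree`. The helper name `iob_to_chunks` is hypothetical, and the chunk tags in the example are written by hand, not actual `chunk_parse` output.

```python
def iob_to_chunks(triples):
    """Group (word, pos, iob_tag) triples into (chunk_label, [words]) spans.

    Hypothetical helper for illustration; tokens tagged "O" become
    single-word spans with label None.
    """
    chunks = []
    for word, _pos, iob in triples:
        if iob == "O":
            chunks.append((None, [word]))
            continue
        prefix, _, label = iob.partition("-")
        # "I-X" continues the previous chunk only when that chunk is also X;
        # "B-X" (or a stray "I-X") starts a new chunk.
        if prefix == "I" and chunks and chunks[-1][0] == label:
            chunks[-1][1].append(word)
        else:
            chunks.append((label, [word]))
    return chunks


# Hand-written tags for illustration (NOT real parser output).
hand_tagged = [
    ("แมว", "NCMN", "B-NP"),  # "cat"
    ("กิน", "VACT", "B-VP"),  # "eat"
    ("ปลา", "NCMN", "B-NP"),  # "fish"
]
thai_chunks = iob_to_chunks(hand_tagged)

# A multi-word chunk followed by a token outside any chunk.
multi_word = [("a", "X", "B-NP"), ("b", "X", "I-NP"), ("c", "X", "O")]
multi_chunks = iob_to_chunks(multi_word)

print(thai_chunks)
print(multi_chunks)
```

`conlltags2tree` performs a comparable grouping but returns an `nltk.Tree`, which is what `svgling.draw_tree` renders in the cells that follow.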
[12]:
svgling.draw_tree(conlltags2tree(test("แมวกินปลา")))
[13]:
svgling.draw_tree(conlltags2tree(test("คนหนองคายเป็นน่ารัก")))
[14]:
svgling.draw_tree(conlltags2tree(test("ปลาอะไรอยู่ในน้ำ")))
[15]:
svgling.draw_tree(conlltags2tree(test("ในน้ำมีอะไรอยู่")))
[16]:
svgling.draw_tree(conlltags2tree(test("ทำไมเขารักคุณ")))
[17]:
svgling.draw_tree(conlltags2tree(test("คนอะไรอยู่หลังต้นไม้")))