DSpace Repository

Thoughts on word and sentence segmentation in Thai

dc.contributor.author Wirote Aroonmanakun
dc.contributor.other Chulalongkorn University. Faculty of Arts
dc.date.accessioned 2009-02-17T01:29:09Z
dc.date.available 2009-02-17T01:29:09Z
dc.date.issued 2007
dc.identifier.isbn 9789746230629
dc.identifier.uri http://cuir.car.chula.ac.th/handle/123456789/8823
dc.description.abstract This paper discusses problems of word and sentence segmentation in Thai. Disagreements on word segmentation arise mostly from compound words. To establish a standard resource and tool for word segmentation, we suggest that only simple words and true compound words should be segmented in the word segmentation process. Other compounds can be grouped later by the same means used for multiword identification in other languages. Sentence segmentation is also difficult because sentence boundaries in Thai are fuzzy. We suggest that a discourse should be seen as a combination of clauses rather than sentences. Discourse cues can then be used to segment these discourse units. The output of a sentence segmentation module could be a sequence of segments composed of clauses, which can then be assembled into the discourse structure. en
dc.format.extent 373 bytes
dc.format.mimetype text/html
dc.language.iso en es
dc.publisher Chulalongkorn University en
dc.rights Chulalongkorn University en
dc.subject Thai language -- Sentences
dc.subject Thai language -- Phonology
dc.subject Word (Linguistics)
dc.title Thoughts on word and sentence segmentation in Thai en
dc.type Technical Report es
dc.email.author awirote@chula.ac.th
dc.description.publication Aroonmanakun, W. 2007. Thoughts on Word and Sentence Segmentation in Thai. In Proceedings of the Seventh Symposium on Natural Language Processing, Dec 13-15, 2007, Pattaya, Thailand. 85-90. en
dc.subject.keyword word segmentation en
dc.subject.keyword sentence segmentation en

