Creating the Thai National Corpus

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/8824

Title:	Creating the Thai National Corpus
Authors:	Wirote Aroonmanakun
Email:	awirote@chula.ac.th
Other author:	Chulalongkorn University. Faculty of Arts
Subjects:	Corpora (Linguistics); Thai National Corpus; Thai language; Computational linguistics; Translating and interpreting -- Data processing
Issue Date:	2007
Publisher:	Chulalongkorn University
Citation:	Manusya. 13, [Special Issue], 4-17
Abstract:	This paper reports on progress in developing the Thai National Corpus (TNC). The TNC is designed as a general corpus of standard Thai. Only written texts are collected in the first phase. It aims to include at least eighty million words. Texts of various types by various authors are included so that the TNC closely represents written language in general. Texts are word-segmented and tagged following the Text Encoding Initiative (TEI) guidelines on text encoding. The TNC is designed as a resource for general applications such as lexicography, language teaching, and linguistic research. It is also designed to be comparable to the British National Corpus so that comparative studies of the two languages are possible.
Discipline Code:	1016
URI:	http://cuir.car.chula.ac.th/handle/123456789/8824
Type:	Article
Appears in Collections:	Arts - Journal Articles

Files in This Item:

File	Description	Size	Format
default.html		311 B	HTML