Creating the Thai National Corpus

Wirote Aroonmanakun

dc.contributor.author	Wirote Aroonmanakun
dc.contributor.other	Chulalongkorn University. Faculty of Arts
dc.date.accessioned	2009-02-18T02:37:59Z
dc.date.available	2009-02-18T02:37:59Z
dc.date.issued	2007
dc.identifier.citation	Manusya. 13,[Special Issue],4-17
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/8824
dc.description.abstract	This paper reports on the progress of the Thai National Corpus (TNC) development. The TNC is designed as a general corpus of standard Thai. Only written texts are collected in the first phase. It aims to include at least eighty million words. Various text types produced by various authors are included in the TNC so that it closely represents written language in general. Texts are word-segmented and tagged following the Text Encoding Initiative (TEI) guidelines on text encoding. The TNC is designed as a resource for general applications, such as lexicography, language teaching, and linguistic research. In addition, the TNC is designed to be comparable to the British National Corpus so that a comparative study between the two languages is also possible.	en
dc.format.extent	311 bytes
dc.format.mimetype	text/html
dc.language.iso	en	es
dc.publisher	Chulalongkorn University	en
dc.rights	Chulalongkorn University	en
dc.subject	Corpora (Linguistics)
dc.subject	Thai National Corpus
dc.subject	Thai language
dc.subject	Computational linguistics
dc.subject	Translating and interpreting -- Data processing
dc.title	Creating the Thai National Corpus	en
dc.type	Article	es
dc.email.author	awirote@chula.ac.th
dc.description.publication	Aroonmanakun, W. 2007. Creating the Thai National Corpus. Manusya. Special Issue No.13, 4-17.	en
dc.subject.keyword	TNC	en
dc.subject.keyword	corpus linguistics	en
dc.discipline.code	1016	es