Thai Clause Segmentation Using a Support Vector Machine Model

นลินี อินต๊ะซาว

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/42818

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	วิโรจน์ อรุณมานะกุล	en_US
dc.contributor.author	นลินี อินต๊ะซาว	en_US
dc.contributor.other	Chulalongkorn University. Faculty of Arts	en_US
dc.date.accessioned	2015-06-24T06:21:39Z	-
dc.date.available	2015-06-24T06:21:39Z	-
dc.date.issued	2556	en_US
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/42818	-
dc.description	Thesis (M.A.)--Chulalongkorn University, 2556	en_US
dc.description.abstract	The objectives of this thesis are to identify linguistic features for segmenting Thai clauses with a support vector machine model and to compare how those features affect the performance of the clause segmentation system. The corpus used in this study is academic written language of 76,460 words, comprising 8,102 clauses. The support vector machine model used for clause segmentation in this work is the SMO function of the Weka program, and the kernel function used is polynomial. The system segments clauses by taking words as input so that the model decides whether each word is a clause-initial boundary word. The model's decision relies on linguistic features, namely the part of speech of the current word, the part of speech of the preceding word, the part of speech of the following word, a list of clause connectives, the probability that a space is a clause separator, and punctuation marks. The performance of each feature is compared by defining feature sets in various configurations and testing them. The feature configuration that contributes most to system performance is the use of all features together, which achieves an F-measure of 81.17 percent. In addition, raising the parameter of the polynomial kernel is found to improve system performance: the F-measure reaches 84.74 percent when the parameter is set to D=4, but precision drops by 6 percent.	en_US
dc.description.abstractalternative	The purposes of this study are to identify linguistic features for Thai clause segmentation with a support vector machine (SVM) model and to compare the effect of those features on the performance of the clause segmentation system. The corpus used in the study is a 76,460-word collection of Thai academic written language, consisting of 8,102 clauses. SMO, one of the functions in Weka, is used to train the SVM, with a polynomial kernel function. The clause segmentation system takes words as input and decides whether a particular word is the beginning of a clause. The system's decision relies on linguistic features, including the current word's part of speech, the previous word's part of speech, the following word's part of speech, lists of discourse markers, the probability that a white space is a clause separator, and punctuation marks. The performance of the linguistic features is compared by preparing a set of feature patterns and testing those patterns. The feature pattern that performs best is the combination of all linguistic features, which achieves an F-measure of 81.17 percent. In addition, raising the value of the kernel parameter is found to improve the performance of the system: with the exponent D set to 4, the system achieves an F-measure of 84.74 percent, but precision decreases by 6 percent.	en_US
dc.language.iso	th	en_US
dc.publisher	Chulalongkorn University	en_US
dc.relation.uri	http://doi.org/10.14457/CU.the.2013.292	-
dc.rights	Chulalongkorn University	en_US
dc.subject	Support vector machines	-
dc.subject	Thai language -- Sentences	-
dc.title	Thai Clause Segmentation Using a Support Vector Machine Model	en_US
dc.title.alternative	Thai Clause Segmentation Using a Support Vector Machine Model	en_US
dc.type	Thesis	en_US
dc.degree.name	Master of Arts	en_US
dc.degree.level	Master's degree	en_US
dc.degree.discipline	Linguistics	en_US
dc.degree.grantor	Chulalongkorn University	en_US
dc.email.advisor	awirote@chula.ac.th	en_US
dc.identifier.DOI	10.14457/CU.the.2013.292	-
Appears in Collections:	Arts - Theses
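
As an illustration of the approach summarized in the abstract, the following Python sketch shows how per-word features of the kind listed there might be assembled, and how a polynomial kernel with exponent D compares two feature vectors. It is not the thesis's implementation, which used the SMO function in Weka. The tag names, the connective and punctuation lists, the space-probability value, the sparse one-hot encoding, the homogeneous form of the kernel, and every function name are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical resources: the actual tagset, connective list, and space
# statistics used in the thesis are not given in this record.
CONNECTIVES = {"และ", "แต่", "เพราะ"}
PUNCTUATION = {",", ".", ";", ":", "(", ")"}

# (word, part-of-speech tag, probability that a preceding space is a
# clause separator, or None when no space precedes the word)
Token = Tuple[str, str, Optional[float]]


def extract_features(tokens: List[Token], i: int) -> Dict[str, float]:
    """Build a sparse feature dict for the word at position i."""
    word, pos, space_prob = tokens[i]
    prev_pos = tokens[i - 1][1] if i > 0 else "<BOS>"
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else "<EOS>"
    return {
        f"pos={pos}": 1.0,            # current word's part of speech
        f"prev_pos={prev_pos}": 1.0,  # previous word's part of speech
        f"next_pos={next_pos}": 1.0,  # following word's part of speech
        "is_connective": 1.0 if word in CONNECTIVES else 0.0,
        "is_punct": 1.0 if word in PUNCTUATION else 0.0,
        "space_prob": space_prob if space_prob is not None else 0.0,
    }


def polynomial_kernel(x: Dict[str, float], y: Dict[str, float],
                      degree: int = 1) -> float:
    """Homogeneous polynomial kernel <x, y>**degree on sparse vectors.

    The homogeneous form is an assumption; the record does not say
    whether lower-order terms were included.
    """
    dot = sum(value * y.get(key, 0.0) for key, value in x.items())
    return dot ** degree


# Toy sentence with made-up tags and a made-up space probability.
tokens: List[Token] = [
    ("ฝน", "NOUN", None),
    ("ตก", "VERB", None),
    ("แต่", "CONJ", 0.5),
    ("เขา", "PRON", None),
    ("ไป", "VERB", None),
]
a = extract_features(tokens, 2)  # candidate clause-initial word
b = extract_features(tokens, 4)
print(polynomial_kernel(a, a, degree=2))
print(polynomial_kernel(a, b, degree=4))
```

In a setup like the one the abstract describes, each word's feature vector would be passed to a binary classifier that labels it as clause-initial or not; the `degree` argument plays the role of the exponent D.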

Files in This Item:

File	Description	Size	Format
5380139422.pdf		2.84 MB	Adobe PDF
