Thai Real-Word Spelling Error Correction Using a Trigram Model

พลวัฒน์ ไหลมนู

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/58156

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	วิโรจน์ อรุณมานะกุล	-
dc.contributor.author	พลวัฒน์ ไหลมนู	-
dc.contributor.other	Chulalongkorn University. Faculty of Arts	-
dc.date.accessioned	2018-04-11T01:32:12Z	-
dc.date.available	2018-04-11T01:32:12Z	-
dc.date.issued	2559	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/58156	-
dc.description	Thesis (M.A.)--Chulalongkorn University, 2016 (B.E. 2559)	-
dc.description.abstract	This research aims to collect and analyze Thai real-word spelling errors found on the internet, to develop a Thai real-word spelling error detection and correction system based on a trigram model, and to evaluate the performance of that system. The research has two parts. The first part analyzes 1,674 Thai real-word spelling errors drawn from books of commonly misspelled Thai words compiled by Thai language experts; every one of these words was segmented successfully and was attested in actual use on the internet. The analysis shows that most of these misspellings, 80 percent, contain an error at a single position, most often the initial consonant, while the remaining 20 percent contain errors at several positions, most of which leave the pronunciation unchanged. The second part develops the trigram-based system and evaluates it. The test data are 1,000 texts, each containing at least one misspelled word that is a real-word error. The system detects all misspelled words in a text by comparing each word trigram of the text against a trigram corpus; a trigram that is not found is suspected of containing a misspelling. All suspect trigrams are then revised using a minimum edit method, the revised trigrams are substituted for the original misspelling, and the probability of the text is computed. The system selects the trigram that gives the highest probability of the text and uses it to correct the misspelling. The system was evaluated in three respects. For processing time, the trigram system took 128 seconds in total. For detection of Thai real-word spelling errors, precision and recall were equal at 0.47. For correction of Thai real-word spelling errors, recall and precision were both 0.85.	-
dc.description.abstractalternative	This research aims to collect and analyze Thai real-word spelling errors found on the internet, to develop a Thai real-word spelling error correction program using a trigram model, and to evaluate its performance. The research consists of two parts. The first is an analysis of 1,674 Thai real-word spelling errors found in books of commonly misspelled Thai words. It is found that 80 percent of the analyzed errors contain only one spelling error, which mostly occurs at word-initial position. The other 20 percent each contain more than one spelling error, and most of these are still pronounced the same as the intended word. The second part develops a Thai real-word spelling error correction program using a trigram model and evaluates its performance. The test data are 1,000 Thai strings of words, each containing at least one real-word spelling error. To detect a real-word error, the word trigrams of each string are checked against the trigram corpus, and those that do not occur in the corpus are treated as misspelling suspects. All misspelling suspects are then edited to generate possible candidates. The single candidate that gives the highest probability of the observed string when it replaces the detected error is chosen as the correction. The program's efficiency is measured in three respects. The first is processing time: the program takes 128 seconds to finish. The second is error detection: precision and recall are equal, at 0.47. The last is error correction: precision and recall are also equal, at 0.85.	-
dc.language.iso	th	-
dc.publisher	Chulalongkorn University	-
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2016.718	-
dc.rights	Chulalongkorn University	-
dc.title	การตรวจแก้การสะกดผิดแบบเป็นคำจริงในภาษาไทยโดยใช้แบบจำลองไตรแกรม	-
dc.title.alternative	THAI REAL-WORD SPELLING ERROR CORRECTION USING A TRIGRAM MODEL	-
dc.type	Thesis	-
dc.degree.name	Master of Arts	-
dc.degree.level	Master's Degree	-
dc.degree.discipline	Linguistics	-
dc.degree.grantor	Chulalongkorn University	-
dc.email.advisor	Wirote.A@Chula.ac.th,awirote@gmail.com	-
dc.identifier.DOI	10.58837/CHULA.THE.2016.718	-
Appears in Collections:	Arts - Theses

Files in This Item:

File	Description	Size	Format
5680132222.pdf		2.54 MB	Adobe PDF
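
The detect-then-correct procedure summarized in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the toy English corpus, the hand-written confusion sets that stand in for edit-based candidate generation, the add-one smoothing, and every function name below are assumptions made only for illustration.

```python
from collections import Counter
from math import log

# Toy tokenized corpus (assumption; the thesis uses a Thai trigram corpus).
CORPUS = [
    "i want a piece of cake".split(),
    "we hope for peace on earth".split(),
    "a piece of paper".split(),
]

# Hypothetical confusion sets standing in for edit-based candidate generation.
CONFUSIONS = {"peace": ["piece"], "piece": ["peace"]}


def trigrams(tokens):
    """Return the word trigrams of a sentence, padded with boundary markers."""
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


TRI = Counter(t for sent in CORPUS for t in trigrams(sent))
BI = Counter(t[:2] for sent in CORPUS for t in trigrams(sent))
VOCAB = {w for sent in CORPUS for w in sent} | {"</s>"}


def log_prob(tokens):
    """Add-one smoothed trigram log-probability of a sentence (assumed smoothing)."""
    return sum(
        log((TRI[t] + 1) / (BI[t[:2]] + len(VOCAB))) for t in trigrams(tokens)
    )


def suspects(tokens):
    """Indices of tokens covered by at least one trigram absent from the corpus."""
    found = set()
    for i, tri in enumerate(trigrams(tokens)):
        if TRI[tri] == 0:
            # Trigram i spans token positions i-2, i-1, i (padding excluded).
            found.update(j for j in (i - 2, i - 1, i) if 0 <= j < len(tokens))
    return found


def correct(tokens):
    """Replace one suspect word with the candidate that maximizes sentence probability."""
    best, best_lp = list(tokens), log_prob(tokens)
    for j in sorted(suspects(tokens)):
        for cand in CONFUSIONS.get(tokens[j], []):
            trial = list(tokens[:j]) + [cand] + list(tokens[j + 1:])
            lp = log_prob(trial)
            if lp > best_lp:
                best, best_lp = trial, lp
    return best
```

Calling `correct("i want a peace of cake".split())` flags the words covered by unseen trigrams, tries `piece` in place of `peace`, and keeps that substitution because it raises the sentence's trigram probability. A sentence whose trigrams all occur in the corpus yields no suspects and is returned unchanged.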