Thai spelling correction and word normalization on social text using a two-stage pipeline with neural contextual attention

Anuruth Lertpiya

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/70327

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Ekapol Chuangsuwanich	-
dc.contributor.author	Anuruth Lertpiya	-
dc.contributor.other	Chulalongkorn University. Faculty of Engineering	-
dc.date.accessioned	2020-11-11T13:53:59Z	-
dc.date.available	2020-11-11T13:53:59Z	-
dc.date.issued	2019	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/70327	-
dc.description	Thesis (M.Eng.)--Chulalongkorn University, 2019	-
dc.description.abstract	Text correction systems (e.g., spell checkers) have been used to improve the quality of computerized text by detecting and correcting errors. However, the task of performing spelling correction and word normalization (text correction) for Thai social media text has remained largely unexplored. In this thesis, we investigated how current text correction systems perform at correcting errors and word variants in Thai social media text, and we proposed a method designed for this task. We found that currently available Thai text correction systems are not robust enough to correct spelling errors and word variants, while text correctors designed for English grammatical error correction suffer from overcorrection (text rewrites). We therefore proposed a neural text corrector with a two-stage structure that alleviates overcorrection while retaining the benefits of a neural Seq2Seq corrector. Our method consists of a neural error detector and a neural Seq2Seq error corrector with contextual attention. This novel architecture allows the Seq2Seq network to produce corrections based on both the erroneous text and its context without requiring an end-to-end structure. Our method outperformed all the other text correction systems we evaluated.	-
dc.description.abstractalternative	Text correction systems (e.g., spell checkers) are used to improve the quality of text data on computer systems by detecting and correcting errors. Prior research has not yet explored the task of spelling correction and text normalization (text correction) for Thai social media text. In this thesis, we studied how well current text correction systems perform at spelling correction and text normalization on Thai social media and proposed a method designed for this task. We found that currently available Thai text correction systems are not effective enough at correcting misspellings and non-standard text, while English grammatical error correction systems suffer from overcorrection (text rewriting). We therefore proposed a text correction system that uses a two-stage neural network to alleviate overcorrection while benefiting from neural sequence-to-sequence models. Our system consists of a neural error detector and a neural sequence-to-sequence error corrector with a contextual attention mechanism. This new architecture allows the sequence-to-sequence network to generate corrections that take the context into account without requiring a single-stage structure. Our method outperformed all the other text correction systems we evaluated.	-
dc.language.iso	en	-
dc.publisher	Chulalongkorn University	-
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2019.155	-
dc.rights	Chulalongkorn University	-
dc.subject	Text editors (Computer programs)	-
dc.subject	Text processing (Computer science)	-
dc.subject	Text editor programs	-
dc.subject	Text processing	-
dc.subject.classification	Computer Science	-
dc.title	Thai spelling correction and word normalization on social text using a two-stage pipeline with neural contextual attention	-
dc.title.alternative	Spelling correction and normalization of Thai social media text using a two-stage neural network pipeline with a contextual attention mechanism	-
dc.type	Thesis	-
dc.degree.name	Master of Engineering	-
dc.degree.level	Master's Degree	-
dc.degree.discipline	Computer Engineering	-
dc.degree.grantor	Chulalongkorn University	-
dc.email.advisor	Ekapol.C@Chula.ac.th	-
dc.identifier.DOI	10.58837/CHULA.THE.2019.155	-
Appears in Collections:	Eng - Theses
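The abstract describes the two-stage pipeline only at a high level. As a minimal sketch under stated assumptions, the control flow can be written as follows: a detector flags tokens, and only the flagged spans, together with their left and right context, are passed to the corrector, so unflagged text is copied through unchanged. The `Detector`/`Corrector` interfaces, the span grouping in `flagged_spans`, and the `VARIANTS` lookup that stands in for both neural models are hypothetical illustrations, not the thesis's implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Stage 1 (assumed interface): one boolean "needs correction" flag per token.
Detector = Callable[[Sequence[str]], List[bool]]
# Stage 2 (assumed interface): rewrite one flagged span, given its left and right context.
Corrector = Callable[[Sequence[str], Sequence[str], Sequence[str]], List[str]]


def flagged_spans(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group consecutive flagged positions into half-open (start, end) spans."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def two_stage_correct(tokens: Sequence[str], detect: Detector, correct: Corrector) -> List[str]:
    """Copy unflagged tokens verbatim; send only flagged spans (plus context) to the corrector."""
    flags = detect(tokens)
    if len(flags) != len(tokens):
        raise ValueError("detector must return exactly one flag per token")
    output: List[str] = []
    prev = 0
    for start, end in flagged_spans(flags):
        output.extend(tokens[prev:start])  # unflagged tokens are never rewritten
        left, span, right = tokens[:start], tokens[start:end], tokens[end:]
        output.extend(correct(span, left, right))
        prev = end
    output.extend(tokens[prev:])
    return output


# Hypothetical toy stand-ins for the neural detector and corrector.
VARIANTS = {"เปง": "เป็น", "มากกกก": "มาก"}


def toy_detect(tokens: Sequence[str]) -> List[bool]:
    return [t in VARIANTS for t in tokens]


def toy_correct(span: Sequence[str], left: Sequence[str], right: Sequence[str]) -> List[str]:
    # A neural Seq2Seq corrector would attend over `left` and `right`; this lookup ignores them.
    return [VARIANTS.get(t, t) for t in span]


if __name__ == "__main__":
    sentence = ["มัน", "เปง", "หนัง", "ที่", "ดี", "มากกกก"]
    print(two_stage_correct(sentence, toy_detect, toy_correct))
```

Because unflagged tokens bypass the corrector entirely, even a corrector prone to rewriting text can only change the spans the detector flags, which is the overcorrection argument the abstract makes for the two-stage structure. The context arguments are where a contextual-attention corrector would draw on the rest of the sentence.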

Files in This Item:

File	Description	Size	Format
6170322321.pdf		2.36 MB	Adobe PDF
