Classification of abusive Thai messages in social networks using deep learning

Ruangsung Wanasukapunt

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/79896

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Suphakant Phimoltares	-
dc.contributor.author	Ruangsung Wanasukapunt	-
dc.contributor.other	Chulalongkorn University. Faculty of Science	-
dc.date.accessioned	2022-07-23T04:52:20Z	-
dc.date.available	2022-07-23T04:52:20Z	-
dc.date.issued	2021	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/79896	-
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2021	-
dc.description.abstract	Social media has improved on traditional news sources by widening access to information. However, the anonymity it provides allows individuals with malicious intentions to post abusive and hateful speech without detection or repercussion. This research develops a binomial and a multinomial classification model for Thai social media text, covering five categories for abusive content detection: Rude, Figurative, Dirty, Offensive and Non-Abusive. In the experiments, DistilBERT achieved the highest F1 scores, 0.8510 for the binomial model and 0.9067 for the multinomial model. BiLSTM performed second best, with F1 scores of 0.8403 and 0.8969 for the binomial and multinomial models, respectively. Both deep learning models outperformed the traditional machine learning classifiers, whose highest F1 scores were 0.7452 and 0.8090 for the binomial and multinomial models, respectively. The deep learning architectures allow for better contextual representations of words, with DistilBERT enabling better modeling of long-range dependencies between words.	-
dc.description.abstractalternative	สื่อสังคมมีการปรับปรุงแหล่งข่าวแบบดั้งเดิมโดยอนุญาตให้มีการเข้าถึงข่าวสารเพิ่มขึ้น อย่างไรก็ตามการยอมไม่ให้เปิดเผยชื่อในสื่อสังคมก่อให้เกิดข้อความที่ใช้ไม่เหมาะสมและมีเจตนาร้ายโดยปราศจากการตรวจหาหรือผลที่ตามมาจากบุคคลด้วยความตั้งใจมุ่งร้าย งานวิจัยนี้พัฒนาตัวแบบการจำแนกแบบทวินามและอเนกนามสำหรับจำแนกข้อความบนสื่อสังคมไทยออกเป็นห้าประเภทสำหรับการตรวจหาเนื้อหาที่ไม่เหมาะสมในสื่อสังคม อันได้แก่ข้อความหยาบคาย ข้อความอุปมาอุปไมย ข้อความลามก ข้อความก้าวร้าว และข้อความที่ใช้ได้เหมาะสม การทดลองได้แสดงให้เห็นว่าดิสทิลเบิร์ทได้ให้คะแนนเอฟวันสูงสุดที่ 0.8510 สำหรับตัวแบบทวินามและ 0.9067 สำหรับตัวแบบอเนกนาม แอลเอสทีเอ็มแบบสองทิศทางได้ให้ผลดีที่สุดเป็นอันดับสองด้วยคะแนนเอฟวัน 0.8403 และ 0.8969 สำหรับตัวแบบทวินามและอเนกนามตามลำดับ ตัวแบบการเรียนรู้เชิงลึกทั้งสองได้ผลที่ดีกว่าตัวแบบการเรียนรู้ของเครื่องแบบดั้งเดิมที่มีคะแนนเอฟวันสูงสุดอยู่ที่ 0.7452 และ 0.8090 สำหรับตัวแบบทวินามและอเนกนามตามลำดับ สถาปัตยกรรมการเรียนรู้เชิงลึกได้ยอมให้การแทนเชิงบริบทของกลุ่มคำดีขึ้น โดยดิสทิลเบิร์ทได้ทำให้การสร้างตัวแบบของความเกี่ยวข้องกันระหว่างกลุ่มคำในช่วงที่ยาวดีขึ้น	-
dc.language.iso	en	-
dc.publisher	Chulalongkorn University	-
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2021.116	-
dc.rights	Chulalongkorn University	-
dc.title	Classification of abusive Thai messages in social networks using deep learning	-
dc.title.alternative	การจำแนกข้อความไทยที่ใช้ไม่เหมาะสมในเครือข่ายสังคมโดยใช้การเรียนรู้เชิงลึก	-
dc.type	Thesis	-
dc.degree.name	Master of Science	-
dc.degree.level	Master's Degree	-
dc.degree.discipline	Computer Science and Information Technology	-
dc.degree.grantor	Chulalongkorn University	-
dc.identifier.DOI	10.58837/CHULA.THE.2021.116	-
Appears in Collections:	Sci - Theses

Files in This Item:

File	Description	Size	Format
6172627123.pdf		2.11 MB	Adobe PDF
