Enhancing spam email filter system for Thai using statistical method

Chalermpol Na Songkhla (เฉลิมพล ณ สงขลา)

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/15861

Title:	การปรับปรุงระบบกรองอีเมลสแปมสำหรับภาษาไทยด้วยวิธีการทางสถิติ
Other Titles:	Enhancing spam email filter system for Thai using statistical method
Authors:	Chalermpol Na Songkhla (เฉลิมพล ณ สงขลา)
Advisors:	Krerk Piromsopa (เกริก ภิรมย์โสภา)
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	Krerk@cp.eng.chula.ac.th
Subjects:	Spam filtering (Electronic mail); Spam (Electronic mail); Electronic mail
Issue Date:	2009 (B.E. 2552)
Publisher:	Chulalongkorn University
Abstract:	This thesis studies the spam email problem and approaches to solving it, focusing on approaches for the Thai language. General anti-spam approaches based on Bayesian learning filter Thai spam email ineffectively because Thai has distinctive characteristics, for example, the lack of definite word boundaries. A Thai word segmentation program is therefore needed to help process Thai words. One part of this thesis presents a method for improving Bayesian-learning email filters for Thai. Test results show that a Bayesian-learning email filter that uses a Thai word segmentation program is more effective. However, the knowledge acquired by the learning system cannot be shared among mail servers. The aim of this thesis is to present a method for generating rules by statistical means, which combines the advantages of rule-based anti-spam approaches and learning-based anti-spam approaches. The generated rules can be shared among mail servers and can cope with diverse forms of spam email. Test results show that the proposed method can be adapted to filter Thai spam email.
Other Abstract:	This thesis studies the spam email problem and anti-spam solutions, focusing on solutions for Thai spam email. General Bayesian-learning anti-spam solutions filter Thai spam email ineffectively. Because the Thai language has specific characteristics (e.g., no explicit word boundaries), word segmentation should be applied in order to process Thai words correctly. One part of this thesis enhances Bayesian learning for Thai spam detection. The results of this part show that Bayesian-learning spam detection with a Thai word segmentation program filters Thai spam more effectively. However, the learned knowledge cannot be shared among mail servers. The goal of this thesis is to generate rules with a statistical method that combines the advantages of the rule-based method and the learning method. The generated rules can be shared among mail servers and can keep up with variations in spam email. The results show that the proposed method can adaptively filter Thai spam email.
Description:	Thesis (M.Eng.)--Chulalongkorn University, 2009
Degree Name:	Master of Engineering
Degree Level:	Master's Degree
Degree Discipline:	Computer Engineering
URI:	http://cuir.car.chula.ac.th/handle/123456789/15861
URI:	http://doi.org/10.14457/CU.the.2009.1080
metadata.dc.identifier.DOI:	10.14457/CU.the.2009.1080
Type:	Thesis
Appears in Collections:	Eng - Theses
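
The abstract's first finding, that a Bayesian filter works better on Thai once the text is segmented into words, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the tiny vocabulary `VOCAB`, the greedy `longest_match_segment` function (a stand-in for a real Thai word segmentation program), the `NaiveBayesFilter` class, and the four training messages.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary; a real segmenter would use a full dictionary.
VOCAB = {"ลด", "ราคา", "พิเศษ", "ประชุม", "พรุ่งนี้", "รายงาน"}


def longest_match_segment(text, vocab):
    """Greedy longest-match segmentation (illustrative stand-in only)."""
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one smoothing over word tokens."""

    def __init__(self, tokenize):
        self.tokenize = tokenize
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text); > 0 leans spam."""
        seen = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        totals = {c: sum(self.word_counts[c].values()) for c in ("spam", "ham")}
        score = math.log(self.doc_counts["spam"] / self.doc_counts["ham"])
        for tok in self.tokenize(text):
            p_spam = (self.word_counts["spam"][tok] + 1) / (totals["spam"] + len(seen))
            p_ham = (self.word_counts["ham"][tok] + 1) / (totals["ham"] + len(seen))
            score += math.log(p_spam / p_ham)
        return score


nb = NaiveBayesFilter(lambda text: longest_match_segment(text, VOCAB))
# Made-up training messages: two promotional (spam), two work-related (ham).
nb.train("ลดราคาพิเศษ", "spam")
nb.train("ราคาพิเศษ", "spam")
nb.train("ประชุมพรุ่งนี้", "ham")
nb.train("รายงานประชุม", "ham")

for message in ("ลดราคา", "รายงานพรุ่งนี้"):
    label = "spam" if nb.spam_log_odds(message) > 0 else "ham"
    print(message, "->", label)
```

Without the segmentation step, each unspaced Thai message would reach the classifier as a single token, so word statistics learned from one message could not transfer to another. That is the kind of limitation the abstract attributes to general Bayesian filters on Thai text.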

Files in This Item:

File	Description	Size	Format
Chalermpol_Na.pdf		2.9 MB	Adobe PDF
