การคัดแยกไวรัสคอมพิวเตอร์จากรหัสฐานสอง

ประสิทธิ์ อุษาฟ้าพนัส

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/59622

Title:	การคัดแยกไวรัสคอมพิวเตอร์จากรหัสฐานสอง
Other Titles:	Classification of Computer Viruses from binary code
Authors:	ประสิทธิ์ อุษาฟ้าพนัส
Advisors:	เกริก ภิรมย์โสภา
Other author:	จุฬาลงกรณ์มหาวิทยาลัย. คณะวิศวกรรมศาสตร์
Advisor's Email:	Krerk.P@Chula.ac.th,Krerk.P@chula.ac.th
Subjects:	ไวรัสคอมพิวเตอร์ Computer viruses
Issue Date:	2560
Publisher:	จุฬาลงกรณ์มหาวิทยาลัย
Abstract:	งานวิจัยนี้นำเสนอการใช้การเรียนรู้แบบมีผู้สอนเพื่อตรวจจับไฟล์ไวรัสคอมพิวเตอร์ที่ไม่เคยพบมาก่อนแบบ static ผู้วิจัยได้ทดสอบกับตัวแยกประเภทจำนวน 3 แบบ คือ random forest, multilayer perceptron และ extreme gradient boosting ชุดข้อมูลประกอบด้วย 6319 ไฟล์ executable แต่ละไฟล์ถูกสกัดด้วย objdump แล้วจัดเรียงตามคะแนน TF-IDF เพื่อหา feature ที่เหมาะสม ผลลัพธ์เปรียบเทียบด้วย F1-score คือ สามารถใช้ตัวแยกประเภทแบบ random forest ร่วมกับข้อมูลที่มี 20 attribute ได้ 0.937 F1-score ซึ่งมากกว่าบรรทัดฐานอยู่ 0.031 F1-score และ สามารถใช้ตัวแยกประเภทแบบ extreme gradient boosting ร่วมกับข้อมูลที่มี 500 attribute ได้ 0.962 F1-score ซึ่งมากกว่าบรรทัดฐานอยู่ 0.041 F1-score จึงสรุปได้ว่าวิธีการในงานวิจัยนี้สามารถเพิ่ม precision และ recall ของการแยกประเภทได้
Other Abstract:	This thesis proposes a supervised machine learning model for detecting (unseen) viruses files. Our main focus is on static analysis approach. To find the best method, we experiment with difference types of feature extraction and three classifier algorithms including extreme gradient boosting, random forest and multilayer perceptron. Our data set contains 6,319 executable files. Each file is extracted with objdump and sorted with TF-IDF score to find best features. The F1-score shows slightly better performance than those of the baselines. Random forest with 20 attributes yields 0.937 F1 score which is 0.031 more than that of the baseline . The extreme gradient boosting method with 500 attributes achieve 0.962 F1 score, 0.041 more than that of the baseline. We conclude that our approach can improve the precision and recall of the classification.
Description:	วิทยานิพนธ์ (วศ.ม.)--จุฬาลงกรณ์มหาวิทยาลัย, 2560
Degree Name:	วิศวกรรมศาสตรมหาบัณฑิต
Degree Level:	ปริญญาโท
Degree Discipline:	วิศวกรรมคอมพิวเตอร์
URI:	http://cuir.car.chula.ac.th/handle/123456789/59622
URI:	http://doi.org/10.58837/CHULA.THE.2017.1374
metadata.dc.identifier.DOI:	10.58837/CHULA.THE.2017.1374
Type:	Thesis
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
5870411921.pdf		2.96 MB	Adobe PDF	View/Open

Show full item record