Multi-Label Classification Using Active Learning on Large Scale and Imbalanced Data Sets

ไพโรจน์ ตันติวชิรฐากูร

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/55510

Title:	การจำแนกแบบหลายฉลากโดยใช้การเรียนรู้เชิงรุกบนชุดข้อมูลขนาดใหญ่และไม่สมดุล
Other Titles:	Multi-Label Classification Using Active Learning on Large Scale and Imbalanced Data Sets
Authors:	ไพโรจน์ ตันติวชิรฐากูร
Advisors:	พีรพล เวทีกูล
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	Peerapon.V@chula.ac.th
Issue Date:	2016 (B.E. 2559)
Publisher:	Chulalongkorn University
Abstract:	Data today are increasingly complex: a single instance can belong to more than one class, and datasets are typically large and growing rapidly. This research therefore proposes active learning, which learns gradually from large data. It starts from a selected representative sample and then learns incrementally from examples the model previously misclassified, which makes classification of large data tractable. In addition, multi-label data often suffer from class imbalance, so the imbalance is handled with a technique suited to each classifier. The experiments use Support Vector Machine (SVM) and neural network classifiers both to build the initial model and for incremental learning. Comparing the two techniques, active learning achieved higher classification performance than passive learning when the models were built with a neural network, on both binary and multi-label data. However, its performance was much lower than that of passive learning built with an SVM classifier on multi-label datasets, which motivated active learning with an SVM classifier. Experiments comparing active and passive learning with SVM on large multi-label data show that both methods achieve similar micro-averaged and macro-averaged performance, while active learning uses less training data than passive learning, provided that a data selection strategy suited to incremental learning is used. This thesis therefore proposes two active learning strategies: unbiased active learning (UAL) and active learning with SVM that selects uncertain examples on the support vectors with data reuse (AL-SVM-SV-R).
Other Abstract:	Data are becoming more complex: a single instance in a dataset can belong to multiple classes, and data volumes are typically large and still growing. This thesis proposes active learning, which learns gradually from the whole dataset by first learning from a selected sample and then learning incrementally from misclassified examples, and can therefore cope with large data volumes. Multi-label data also usually suffer from class imbalance, so the thesis applies an appropriate technique to address it. The experiments used Support Vector Machine (SVM) and neural network classifiers to build the initial and incremental models, and compared both techniques on binary and multi-label data. Active learning outperformed passive learning when both were built with a neural network. However, its performance was lower than that of passive learning built with an SVM classifier on multi-label data. Accordingly, active learning with SVM was proposed instead. Experiments comparing both techniques with the SVM classifier on large multi-label data show that they obtain similar results as measured by micro-average and macro-average, while active learning uses less data than passive learning for training, given a selection strategy suited to incremental learning. Two strategies are proposed in this thesis: UAL and AL-SVM-SV-R.
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2016
Degree Name:	Master of Science
Degree Level:	Master's degree
Degree Discipline:	Computer Science
URI:	http://cuir.car.chula.ac.th/handle/123456789/55510
URI:	http://doi.org/10.58837/CHULA.THE.2016.818
DOI:	10.58837/CHULA.THE.2016.818
Type:	Thesis
Appears in Collections:	Eng - Theses
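
As an illustration of the pool-based active learning described in the abstract, the following Python sketch starts from a small labeled seed set and repeatedly adds the pool examples whose SVM scores lie closest to a decision boundary. It is not the thesis's UAL or AL-SVM-SV-R strategy: the toy data, the function names (`train_linear_svm`, `active_learn`), the hyperparameters, and the margin-based selection rule are all assumptions made for this example, and the small linear SVM trained by subgradient descent only stands in for whatever SVM implementation the thesis used.

```python
import random


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM (hinge loss + L2 penalty) fitted by subgradient descent.

    Labels y must be in {-1, +1}. Hyperparameters are illustrative assumptions.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            s = sum(wj * xj for wj, xj in zip(w, xi)) + b
            w = [wj * (1.0 - lr * lam) for wj in w]  # shrink weights (L2 penalty)
            if yi * s < 1.0:  # example is inside the margin or misclassified
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def decision_score(model, x):
    """Signed score of x under a (w, b) model; its magnitude grows with distance from the boundary."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def active_learn(X, Y, seed_idx, rounds=5, batch=5):
    """Pool-based active learning with one binary SVM per label (binary relevance).

    Each round queries the `batch` unlabeled examples that are least certain,
    i.e. whose smallest |score| over all labels is closest to zero.
    """
    labeled = list(seed_idx)
    n_labels = len(Y[0])

    def fit_all():
        Xl = [X[i] for i in labeled]
        return [train_linear_svm(Xl, [Y[i][k] for i in labeled])
                for k in range(n_labels)]

    for _ in range(rounds):
        models = fit_all()
        known = set(labeled)
        pool = [i for i in range(len(X)) if i not in known]
        pool.sort(key=lambda i: min(abs(decision_score(m, X[i])) for m in models))
        labeled.extend(pool[:batch])  # "ask the oracle": reveal Y for these examples
    return fit_all(), labeled


# Toy multi-label data (an assumption): label 0 is the sign of the first
# feature, label 1 is the sign of the second feature.
rng = random.Random(0)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
Y = [(1 if x0 > 0 else -1, 1 if x1 > 0 else -1) for x0, x1 in X]

models, labeled = active_learn(X, Y, seed_idx=range(10))
print(len(labeled), "of", len(X), "examples were labeled")
```

With the defaults above, the learner labels the 10 seed examples plus 5 queried examples in each of 5 rounds, so only a fraction of the pool is ever labeled; a passive learner would train on all 200 examples at once.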

Files in This Item:

File	Description	Size	Format
5770949121.pdf		2.86 MB	Adobe PDF
