Extreme anomalous clustering algorithm

Panuruk Lisuwan

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/72794

Title:	Extreme anomalous clustering algorithm
Other Titles:	ขั้นตอนวิธีเกาะกลุ่มข้อมูลแบบผิดปกติสุดขีด
Authors:	Panuruk Lisuwan
Advisors:	Petarpa Boonserm; Krung Sinapiromsaran
Other author:	Chulalongkorn University. Faculty of Science
Advisor's Email:	Petarpa.B@Chula.ac.th, krung.s@chula.ac.th
Issue Date:	2017
Publisher:	Chulalongkorn University
Abstract:	A clustering algorithm divides data points into disjoint clusters according to their similarity. Many researchers have proposed and refined clustering algorithms to suit datasets with different characteristics. This thesis adopts the concept of identifying anomalous data points in a dataset to describe the similarity between data points. The main idea is to calculate the minimum distance between data points and use it to assign an extreme anomalous score to every data point. The thesis proposes a novel clustering algorithm called the Extreme Anomalous Clustering (EAC) algorithm. The algorithm assigns extreme anomalous scores to all data points in the dataset and merges the two data points with the smallest extreme anomalous score. It then selects one data point within the resulting cluster as the representative point, which is used when considering the next merge. The experiments compare the performance of the EAC algorithm with AGNES, k-means, and DBSCAN on simulated datasets and UCI datasets. The experimental results show that the EAC algorithm outperforms all three algorithms as measured by the Silhouette and Rand indices.
Other Abstract:	ขั้นตอนวิธีเกาะกลุ่มข้อมูลคือขั้นตอนการแบ่งจุดข้อมูลออกเป็นกลุ่มที่แยกออกจากกันตามความคล้ายคลึงกันของจุดข้อมูล นักวิจัยจำนวนมากได้นำเสนอและพัฒนาขั้นตอนวิธีเกาะกลุ่มข้อมูลเพื่อให้เหมาะสมกับข้อมูลที่มีลักษณะแตกต่างกัน ดังนั้นแนวคิดเรื่องการระบุจุดข้อมูลที่ผิดปกติในชุดข้อมูลจึงถูกนำมาใช้ในวิทยานิพนธ์นี้เพื่ออธิบายความคล้ายคลึงกันของข้อมูล โดยแนวคิดหลักคือการคำนวณระยะทางที่สั้นที่สุดระหว่างจุดข้อมูลเพื่อหาคะแนนความผิดปกติสุดขีดของจุดข้อมูลทั้งหมด ในวิทยานิพนธ์นี้เรานำเสนอขั้นตอนวิธีเกาะกลุ่มข้อมูลแบบใหม่เรียกว่าขั้นตอนวิธีเกาะกลุ่มข้อมูลแบบผิดปกติสุดขีด หรือ อีเอซี ขั้นตอนวิธีนี้ทำการระบุคะแนนความผิดปกติสุดขีดให้กับข้อมูลทั้งหมด และรวมจุดข้อมูลสองจุดที่มีคะแนนความผิดปกติสุดขีดน้อยที่สุด จากนั้นทำการเลือกจุดข้อมูลภายในกลุ่มเป็นจุดตัวแทนซึ่งใช้เพื่อพิจารณาการรวมกลุ่มในครั้งต่อไป การทดลองของวิทยานิพนธ์นี้สร้างขึ้นเพื่อทดสอบประสิทธิภาพของขั้นตอนวิธีเกาะกลุ่มข้อมูลแบบผิดปกติสุดขีดบนชุดข้อมูลจำลองและชุดข้อมูลจริงจากยูซีไอ เปรียบเทียบกับขั้นตอนวิธีเกาะกลุ่มข้อมูลแบบรวมกัน ขั้นตอนวิธีเกาะกลุ่มข้อมูลแบบเคมีนและขั้นตอนวิธีเกาะกลุ่มข้อมูลแบบดีบีสแกน ผลการทดลองแสดงให้เห็นว่าขั้นตอนวิธีเกาะกลุ่มข้อมูลแบบผิดปกติสุดขีดดีกว่าขั้นตอนวิธีเกาะกลุ่มข้อมูลทั้งสามแบบโดยการประเมินด้วยการวัดแบบซิลูเอ็ตและการวัดดัชนีของแรนด์
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2017
Degree Name:	Master of Science
Degree Level:	Master's Degree
Degree Discipline:	Mathematics
URI:	http://cuir.car.chula.ac.th/handle/123456789/72794
URI:	http://doi.org/10.58837/CHULA.THE.2017.329
DOI:	10.58837/CHULA.THE.2017.329
Type:	Thesis
Appears in Collections:	Sci - Theses
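
Illustrative sketch (not taken from the thesis): the abstract describes an agglomerative procedure that scores points by minimum distance, merges the pair with the smallest score, and carries one representative point per cluster into the next round. The abstract does not define the extreme anomalous score or the rule for choosing the representative, so the code below makes three loudly stated assumptions: the score of a pair is the Euclidean distance between the two cluster representatives, the representative of a merged cluster is its medoid, and merging stops at a requested number of clusters. The names eac_sketch and n_clusters are hypothetical.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def eac_sketch(points, n_clusters):
    """Merge clusters pairwise until n_clusters remain; return one label per point.

    ASSUMPTIONS (not from the thesis): pair score = distance between
    representatives; representative = medoid; stop at n_clusters.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Every point starts as its own cluster and its own representative.
    clusters = [[i] for i in range(len(points))]
    reps = list(range(len(points)))

    while len(clusters) > n_clusters:
        # Find the pair of representatives with the smallest distance
        # (stand-in for the "smallest extreme anomalous score").
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(points[reps[a]], points[reps[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best

        # Merge cluster b into cluster a.
        merged = clusters[a] + clusters[b]

        # Assumed representative rule: the member with the smallest
        # total distance to all other members (the medoid).
        rep = min(
            merged,
            key=lambda i: sum(dist(points[i], points[j]) for j in merged),
        )

        clusters[a] = merged
        reps[a] = rep
        del clusters[b]
        del reps[b]

    # Convert the cluster membership lists into a flat label list.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(eac_sketch(data, 2))  # [0, 0, 0, 1, 1, 1]
```

Comparing only representatives keeps each merge step to one distance per cluster pair, which matches the abstract's statement that the representative point is used to consider the next combination. The scoring and selection rules in the thesis itself may differ from the ones assumed here.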

Files in This Item:

File	Description	Size	Format
5772100023_Sc_2017.pdf		1.09 MB	Adobe PDF
