Abstract:
Knowledge discovery has been widely adopted in many fields. Clustering is a step that filters or partitions data into manageable sizes. B. Kaveelerdpotjana, et al. proposed the half-orbital extreme pole clustering algorithm (HOEP), a simple and efficient algorithm with only a single input parameter. HOEP uses extreme poles and the core-vector to partition a dataset into bins along this vector. Because it splits only along the core-vector, some characteristics of the data might be lost during the clustering process. This thesis proposes the bi-orbital extreme pole clustering algorithm (BOEP), which extracts the secondary information along the core-vector. BOEP applies the mean-shift smoothing algorithm within each bin to group instances and then links the groups based on the distances between them. Connected groups are considered to belong to the same cluster. This process continues until all instances in the dataset are clustered. Two types of datasets are used to measure the performance of BOEP. The first type consists of simulated multivariate normal distribution datasets of one, two, and three clusters with assigned target values. According to paired t-tests, BOEP classifies instances statistically better than HOEP, especially in the cases of two and three clusters. The second type consists of the UCI datasets IRIS, WINE, and E-COLI. BOEP finds a better separation between groups than HOEP, k-means, and DBSCAN, using H_ave and S_ave as the performance measures.
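The pipeline summarized above (bins along a vector, mean-shift grouping within each bin, distance-based linking of groups) can be illustrated with a minimal sketch. It is not the BOEP implementation. The function names, the `axis` argument standing in for the core-vector, the equal-width binning, the flat-kernel mean shift, and the centroid-distance linking rule with `link_dist` are all assumptions made for illustration; the abstract does not specify how the extreme poles or the core-vector are computed, so the direction is simply passed in.

```python
import math

def _dist(a, b):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_shift_modes(points, bandwidth, iters=50):
    """Flat-kernel mean shift: move each point to the mean of its neighbours."""
    modes = [list(p) for p in points]
    for _ in range(iters):
        shifted = []
        for m in modes:
            near = [p for p in points if _dist(m, p) <= bandwidth]
            # Keep the mode unchanged if rounding leaves no neighbour in range.
            shifted.append([sum(c) / len(near) for c in zip(*near)] if near else m)
        modes = shifted
    return modes

def group_by_mode(modes, tol):
    """Give points the same label when their modes lie within tol of each other."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if _dist(m, c) <= tol:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def cluster_along_axis(points, axis, n_bins, bandwidth, link_dist):
    """Hypothetical sketch: bin along `axis`, mean-shift per bin, link groups."""
    norm = math.sqrt(sum(a * a for a in axis))
    proj = [sum(x * a for x, a in zip(p, axis)) / norm for p in points]
    lo, hi = min(proj), max(proj)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for degenerate data
    bins = {}
    for i, t in enumerate(proj):
        b = min(int((t - lo) / width), n_bins - 1)
        bins.setdefault(b, []).append(i)

    # Group the instances inside each bin (assumed tolerance: bandwidth / 2).
    groups = []
    for b in sorted(bins):
        idx = bins[b]
        pts = [points[i] for i in idx]
        labels = group_by_mode(mean_shift_modes(pts, bandwidth), bandwidth / 2)
        for k in range(max(labels) + 1):
            groups.append([i for i, lab in zip(idx, labels) if lab == k])

    # Link groups whose centroids are close (assumed rule), using union-find.
    cents = [[sum(c) / len(g) for c in zip(*(points[i] for i in g))] for g in groups]
    parent = list(range(len(groups)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            if _dist(cents[a], cents[b]) <= link_dist:
                parent[find(a)] = find(b)

    result = [None] * len(points)
    for g_id, g in enumerate(groups):
        for i in g:
            result[i] = find(g_id)
    return result
```

Calling `cluster_along_axis(points, axis, n_bins, bandwidth, link_dist)` returns one integer label per instance; connected groups share a label. Unlike HOEP, which the abstract describes as needing a single input parameter, this sketch exposes several parameters purely to keep each assumed step visible.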