Abstract:
Outlier detection is a widely studied topic in data mining with applications to real-world problems. An active line of research in this field is the development of outlier scoring algorithms, which generate a score representing the degree of outlierness of each instance. The Local Outlier Factor (LOF) scores every instance in a dataset based on the local deviation of that instance with respect to its k nearest neighbors. The LOF algorithm therefore depends on the crucial parameter k. To avoid setting any parameter, this thesis proposes a new outlier score called the Ordered distance difference Outlier Factor (OOF). The OOF algorithm uses the ordered distance difference concept to compute outlier scores for all instances without any parameters. To compare the effectiveness of the scores, we apply various outlier scores to five UCI datasets and a generated multivariate Gaussian distribution dataset. For each score, we report the instances in the top-10 ranks and count the number of outlier instances among them. We compare results across six outlier scoring techniques: LOF, OOF, Connectivity-based Outlier Factor (COF), LOcal Correlation Integral score (LOCI), Local Outlier Probability (LoOP), and INFLuenced Outlierness (INFLO).
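
To make the dependence of LOF on k concrete, the following is a minimal brute-force sketch of the standard LOF computation together with a top-n ranking step like the one used in the comparison. It is an illustration, not the implementation used in this thesis. The function names `lof_scores` and `top_n` and the toy dataset are assumptions introduced here. The sketch takes exactly k neighbors per instance (ties at the k-distance are not expanded) and assumes the data contain no duplicate points. OOF is not shown because its definition is not given in this abstract.

```python
import math

def lof_scores(points, k):
    """Return the Local Outlier Factor of every point (brute force, O(n^2))."""
    n = len(points)
    # Pairwise Euclidean distances between all instances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # For each point: its k nearest neighbors (excluding itself) and the
    # distance to the k-th of them (the k-distance).
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

def top_n(scores, n=10):
    """Indices of the n highest-scoring instances, most outlying first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]

# Toy data (hypothetical): a 3x3 grid cluster plus one distant point at index 9.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = lof_scores(points, k=3)
ranking = top_n(scores, n=3)
print(ranking[0])  # the distant point, index 9, ranks first
```

Rerunning the sketch with a different value of k changes the neighborhoods and therefore the scores, which is the parameter sensitivity that motivates a parameter-free score.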