Abstract:
In this dissertation, we propose a double clustering technique for intrusion detection in large-scale logs. Log files are lists of the actions, events, and activities that have occurred in a system. The volume of this data is enormous, and in raw form it is of little use; log analysis is therefore one way to enhance the security of the system. The K-Means algorithm and Parallel FP-Growth, based on Apache Mahout, are applied to cluster the log files and to discover frequent patterns, respectively, from which the normal profiles are generated. Once the normal patterns have been generated, the normal records are removed from the data set, so the remaining records are the suspected intrusion records. These remaining records are partitioned and analyzed once again. Finally, the characteristics of the suspected intrusion records are generated. These characteristics constitute new knowledge that is useful for enhancing the security of the system.
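The profile-and-filter step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it omits the K-Means clustering stage, replaces Parallel FP-Growth with brute-force itemset counting that is only practical for tiny inputs, and does not use Apache Mahout. The record attributes (`proto`, `port`, `flag`), the function names, and the `min_count` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(records, min_count, size):
    """Return every `size`-item combination seen in at least `min_count` records.

    Brute-force stand-in for FP-Growth (assumption: tiny input only).
    """
    counts = Counter()
    for rec in records:
        # Sort so the same set of items always yields the same tuple key.
        for combo in combinations(sorted(rec), size):
            counts[combo] += 1
    return {frozenset(c) for c, n in counts.items() if n >= min_count}


def split_by_profile(records, profile):
    """Separate records that contain a normal pattern from those that do not."""
    normal, suspect = [], []
    for rec in records:
        if any(pattern <= rec for pattern in profile):
            normal.append(rec)
        else:
            suspect.append(rec)
    return normal, suspect


# Hypothetical log records, each reduced to a set of attribute=value items.
records = [
    frozenset({"proto=tcp", "port=80", "flag=SF"}),
    frozenset({"proto=tcp", "port=80", "flag=SF"}),
    frozenset({"proto=tcp", "port=80", "flag=SF"}),
    frozenset({"proto=tcp", "port=22", "flag=S0"}),
    frozenset({"proto=udp", "port=53", "flag=SF"}),
]

# Normal profile: full three-item patterns occurring in at least 3 records.
profile = frequent_itemsets(records, min_count=3, size=3)
normal, suspect = split_by_profile(records, profile)

print(len(normal), "normal records;", len(suspect), "suspected intrusion records")
```

In the technique summarized above, the records left in `suspect` would then be partitioned and analyzed a second time to characterize the suspected intrusions.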