Distance Measures for Mixed Data with Application in Cluster Analysis

พิชญา บุตรขุนทอง

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/56444

Title:	Distance Measures for Mixed Data with Application in Cluster Analysis
Other Titles:	DISTANCE MEASURES FOR MIXED DATA WITH APPLICATION IN CLUSTER ANALYSIS
Authors:	พิชญา บุตรขุนทอง
Advisors:	อัครินทร์ ไพบูลย์พานิช
Other author:	Chulalongkorn University. Faculty of Commerce and Accountancy
Advisor's Email:	Akarin.P@chula.ac.th, akarin@cbs.chula.ac.th
Issue Date:	2558 B.E. (2015)
Publisher:	Chulalongkorn University
Abstract:	This study compares the performance of cluster analysis of mixed data, consisting of nominal, ordinal, and quantitative variables, using the Partitioning Around Medoids algorithm with several distance measures: the Kaufman and Rousseeuw distance (KR) and the Podani distance (P), both developed from Gower's similarity, and new distance measures proposed by combining the Noorbehbahani et al. distance for nominal variables (N) with the KR and P distances, namely the KR&N and P&N distances. Mixed data and other data sets composed of different variable types were simulated with known group memberships. The study covers correlation coefficients between variables of 0.2 and 0.8, group sizes of 20 and 100 observations, 3 and 5 groups, and 5 categories for nominal variables and 5 levels for ordinal variables, and it considers both unequal and equal frequencies across categories or levels. The results show that when the frequencies of the categories or levels are unequal, clustering of mixed data containing all three variable types performs best with the KR&N distance, followed by the P&N distance. In addition, the KR&N distance is suitable for clustering data containing both nominal and quantitative variables, while the KR distance is suitable for data containing both ordinal and quantitative variables. However, when the frequencies of the categories or levels are equal, the distance measures mostly perform similarly.
Other Abstract:	This study compares the performance of cluster analysis with the Partitioning Around Medoids (PAM) algorithm on mixed data containing nominal, ordinal, and numerical variables, using four distance measures: the Kaufman and Rousseeuw distance (KR) and the Podani distance (P), both adapted from Gower’s similarity, and two newly proposed measures, one combining KR with the Noorbehbahani et al. distance (KR&N) and the other combining P with the Noorbehbahani et al. distance (P&N). Mixed data and other types of data were simulated with known group memberships and with equal and unequal frequencies of the nominal and ordinal categories. The study sets the correlation between variables at 0.2 and 0.8, uses 20 and 100 instances per group and 3 and 5 groups, and gives each nominal and ordinal variable 5 values. For unequal-frequency data, clustering with the KR&N distance gives the best results for mixed data. Moreover, the KR&N distance is suitable for data containing both nominal and numerical variables, while the KR distance is suitable for data containing only ordinal and numerical variables. However, for equal-frequency data, the four distances mostly show similar efficiency.
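The abstracts name the techniques but give no formulas. As a rough, non-authoritative illustration of the general approach only, the sketch below computes a Gower-type dissimilarity in the Kaufman and Rousseeuw (KR) style (nominal: 0 if equal, else 1; ordinal and numeric: absolute difference divided by the observed range; averaged over variables) and then clusters with a basic PAM (BUILD then SWAP). Assumptions: no missing values, equal variable weights, ordinal values coded as integer ranks; the function names `kr_distance_matrix` and `pam` are hypothetical. It does not implement Podani's ordinal treatment, the Noorbehbahani et al. nominal distance, or the proposed KR&N and P&N measures, which are defined only in the thesis itself.

```python
def kr_distance_matrix(data, kinds):
    """Gower-type dissimilarity matrix, KR style (illustrative sketch).

    data  : list of rows, one value per variable
    kinds : 'nominal', 'ordinal' or 'numeric' for each variable
            (ordinal values are assumed to be integer ranks 1..M)
    Assumes no missing values and equal weights for all variables.
    """
    n, p = len(data), len(kinds)
    # Observed range of each ordinal/numeric column, used to scale |x - y| to [0, 1].
    # Mapping ranks r to (r - 1) / (M - 1) and then range-scaling gives the same
    # result as range-scaling the ranks directly, so ranks are used as-is here.
    ranges = []
    for j, kind in enumerate(kinds):
        if kind == "nominal":
            ranges.append(None)
        else:
            col = [row[j] for row in data]
            ranges.append(max(col) - min(col))

    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j, kind in enumerate(kinds):
                xa, xb = data[a][j], data[b][j]
                if kind == "nominal":
                    total += 0.0 if xa == xb else 1.0
                elif ranges[j] > 0:
                    # A constant column contributes 0 but still counts in the average.
                    total += abs(xa - xb) / ranges[j]
            dist[a][b] = dist[b][a] = total / p
    return dist


def total_cost(dist, medoids):
    """Sum of each object's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Basic Partitioning Around Medoids on a precomputed distance matrix."""
    n = len(dist)
    # BUILD: start with the most central object, then greedily add the object
    # that lowers the total cost the most.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))

    # SWAP: apply the best cost-reducing medoid/non-medoid swap until none helps.
    cost = total_cost(dist, medoids)
    while True:
        best_trial, best_cost = None, cost
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                c = total_cost(dist, trial)
                if c < best_cost - 1e-12:
                    best_trial, best_cost = trial, c
        if best_trial is None:
            break
        medoids, cost = best_trial, best_cost

    # Assign every object to its nearest medoid (labels are medoid indices 0..k-1).
    labels = [min(range(k), key=lambda m: dist[i][medoids[m]]) for i in range(n)]
    return medoids, labels


# Toy mixed data (hypothetical): nominal colour, ordinal rating 1..5, numeric value.
data = [
    ["red", 1, 1.0], ["red", 2, 1.2], ["red", 1, 0.9],
    ["blue", 5, 9.0], ["blue", 4, 8.8], ["blue", 5, 9.3],
]
dist = kr_distance_matrix(data, ["nominal", "ordinal", "numeric"])
medoids, labels = pam(dist, k=2)
print(medoids, labels)
```

PAM only needs pairwise dissimilarities, which is why it pairs naturally with mixed-data distances such as these. For production use in R, the `cluster` package provides `daisy()` (with `metric = "gower"`) and `pam()`.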
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2558 B.E. (2015)
Degree Name:	Master of Science
Degree Level:	Master's degree
Degree Discipline:	Statistics
URI:	http://cuir.car.chula.ac.th/handle/123456789/56444
Type:	Thesis
Appears in Collections:	Acctn - Theses

Files in This Item:

File	Description	Size	Format
5681567826.pdf		7.34 MB	Adobe PDF