Distance Measures for Mixed Data with Application in Cluster Analysis

พิชญา บุตรขุนทอง

dc.contributor.advisor	อัครินทร์ ไพบูลย์พานิช
dc.contributor.author	พิชญา บุตรขุนทอง
dc.contributor.other	Chulalongkorn University. Faculty of Commerce and Accountancy
dc.date.accessioned	2017-11-27T10:18:51Z
dc.date.available	2017-11-27T10:18:51Z
dc.date.issued	2558 B.E. (2015 C.E.)
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/56444
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2558
dc.description.abstract	This study compared the performance of cluster analysis on mixed data, consisting of nominal, ordinal, and quantitative variables, using the Partitioning Around Medoids algorithm with several distance measures: the Kaufman and Rousseeuw distance (KR) and the Podani distance (P), both of which were developed from Gower's similarity, and newly proposed distance measures that combine the Noorbehbahani et al. distance for nominal variables (N) with the KR and P distances, namely the KR&N and P&N distances. Mixed data and other data configurations made up of different variable types were simulated with known group membership. The study covered correlation coefficients between variables of 0.2 and 0.8, group sizes of 20 and 100, 3 and 5 groups, and 5 categories for nominal variables and 5 levels for ordinal variables, and it considered cases in which the frequencies of the categories or levels were unequal and cases in which they were equal. The results showed that, when the frequencies of the categories or levels were unequal, clustering with the KR&N distance performed best for mixed data containing all three variable types, and clustering with the P&N distance performed second best. In addition, the KR&N distance was suitable for clustering data containing both nominal and quantitative variables, while the KR distance was suitable for clustering data containing both ordinal and quantitative variables. However, when the frequencies of the categories or levels were equal, the distance measures mostly performed no differently from one another.
dc.description.abstractalternative	This study compares the performance of cluster analysis using the Partitioning Around Medoids algorithm on mixed data containing nominal, ordinal, and numerical variables under different distance measures: the Kaufman and Rousseeuw distance (KR) and the Podani distance (P), both adapted from Gower’s similarity, and two newly proposed distance measures, one combining KR with the Noorbehbahani et al. distance (KR&N) and the other combining P with the Noorbehbahani et al. distance (P&N). Mixed data and other types of data were simulated with equal and unequal category frequencies for the nominal and ordinal variables. The simulations set the correlation between variables at 0.2 and 0.8, the number of instances per group at 20 and 100, the number of groups at 3 and 5, and the number of values of each nominal and ordinal variable at 5. With unequal frequencies, clustering with the KR&N distance gives the best result for mixed data. Moreover, clustering with the KR&N distance is suitable for data containing both nominal and numerical variables, while clustering with the KR distance is suitable for data containing only ordinal and numerical variables. With equal frequencies, however, the four distances show similar performance.
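
As an illustration of the kind of distance the abstract compares, the following is a minimal sketch of a Gower-type dissimilarity for mixed data, in the spirit of the Kaufman and Rousseeuw adaptation. It is not the thesis's implementation. The function name `gower_kr_distance`, the three-array input layout, and the toy data are assumptions made for this example; it assumes no missing values and normalises ordinal ranks by their observed range. The Noorbehbahani et al. component and Podani's treatment of ordinal variables are not shown.

```python
import numpy as np

def gower_kr_distance(nominal, ordinal, numeric):
    """Pairwise Gower-type dissimilarity (hypothetical helper, no missing values).

    nominal: (n, p1) array of category labels
    ordinal: (n, p2) array of integer ranks
    numeric: (n, p3) array of real values
    """
    nominal = np.asarray(nominal)
    ordinal = np.asarray(ordinal, dtype=float)
    numeric = np.asarray(numeric, dtype=float)
    n = nominal.shape[0]
    total = np.zeros((n, n))
    n_vars = nominal.shape[1] + ordinal.shape[1] + numeric.shape[1]
    if n_vars == 0:
        raise ValueError("at least one variable is required")

    # Nominal variables: simple matching (0 if equal, 1 otherwise).
    for f in range(nominal.shape[1]):
        col = nominal[:, f]
        total += (col[:, None] != col[None, :]).astype(float)

    # Ordinal ranks and numeric values: absolute difference divided by
    # the observed range, so each variable contributes a value in [0, 1].
    for block in (ordinal, numeric):
        for f in range(block.shape[1]):
            col = block[:, f]
            span = col.max() - col.min()
            if span > 0:
                total += np.abs(col[:, None] - col[None, :]) / span

    # Average over all variables (equal weights).
    return total / n_vars

# Toy data: three observations, one variable of each type.
D = gower_kr_distance(
    nominal=[["a"], ["b"], ["a"]],
    ordinal=[[1], [3], [2]],
    numeric=[[0.0], [10.0], [5.0]],
)
```

The resulting matrix `D` is symmetric with a zero diagonal and could be passed to any k-medoids routine that accepts a precomputed dissimilarity matrix.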
dc.language.iso	th
dc.publisher	Chulalongkorn University
dc.rights	Chulalongkorn University
dc.title	Distance measures for mixed data with application in cluster analysis
dc.title.alternative	DISTANCE MEASURES FOR MIXED DATA WITH APPLICATION IN CLUSTER ANALYSIS
dc.type	Thesis
dc.degree.name	Master of Science
dc.degree.level	Master's Degree
dc.degree.discipline	Statistics
dc.degree.grantor	Chulalongkorn University
dc.email.advisor	Akarin.P@chula.ac.th,akarin@cbs.chula.ac.th