A comparison of missing values estimation methods in multivariate analysis

พรศิริ หมื่นไชยศรี

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/17408

Title:	การเปรียบเทียบวิธีการประมาณค่าสูญหายในการวิเคราะห์ตัวแปรพหุ
Other Titles:	A comparison of missing values estimation methods in multivariate analysis
Authors:	พรศิริ หมื่นไชยศรี
Advisors:	สรชัย พิศาลบุตร
Other author:	Chulalongkorn University. Graduate School
Advisor's Email:	No information provided
Subjects:	Multivariate analysis
Issue Date:	1986 (B.E. 2529)
Publisher:	Chulalongkorn University
Abstract:	In multivariate analysis, missing data values can make the analysis impossible to carry out. One remedy is to discard the observations that contain them, but this reduces the number of observations and loses some of the detail in the data. Another remedy is to estimate the missing values. Because there are several estimation methods, each with its own advantages and disadvantages, this research compares four commonly used methods: the mean method, multiple linear regression, modified multiple linear regression, and principal component analysis. The mean square error is the criterion for comparison across situations simulated with the Monte Carlo technique. The situations differ in sample size n = 30, 50, 70, 100, 200; number of variables p = 3, 5, 7, 10; and size of the correlation between variables ρ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. In some ranges of the experiment, different values of ρ give the same ranking of mean square errors, so those situations are not presented in the thesis; only the 106 situations that differ are presented. At the 0.05 significance level, the four methods give mean square errors that do not differ significantly. In any of the specified situations, therefore, any one of the four methods may be chosen when missing data occur. The simplest method, and the one that needs the least processing time, is the mean method, whose mean square error does not differ significantly from those of the other three methods. A closer look at each situation nevertheless shows that the mean square errors do differ, even if not significantly, and in any estimation the researcher should try to make the mean square error as small as possible. On that basis: if p = 3, the mean method is best when ρ = 0.1, modified multiple linear regression is best when ρ = 0.2-0.7, and principal component analysis is best when ρ = 0.9. If p = 5, the mean method is best when ρ = 0.1-0.2, modified multiple linear regression is best when ρ = 0.3, and principal component analysis is best when ρ = 0.5-0.9. If p = 7, the mean method is best when ρ = 0.1-0.2, modified multiple linear regression is best when ρ = 0.3-0.4, and principal component analysis is best when ρ = 0.5-0.8. If p = 10, principal component analysis is best when ρ = 0.2-0.5.
Other Abstract:	The purpose of this study is to investigate four well-known methods for estimating missing values in multivariate analysis, namely 1) Mean, 2) Multiple Linear Regression, 3) Modified Multiple Linear Regression, and 4) Principal Component, using the mean square error as the criterion for comparison. The data for each experiment were obtained by simulation using the Monte Carlo technique. The computer program was designed to calculate the mean square error of each method in different situations, with sample size n = 30, 50, 70, 100, 200; number of variables p = 3, 5, 7, 10; and correlation coefficient ρ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. However, some intervals of ρ give the same ranking of mean square errors. These situations are omitted, so only 106 different situations are presented in the thesis. The results show that, at the 5% level of significance, the mean square errors of the four methods are not significantly different. If a missing value problem exists, any one of these methods can therefore be used to estimate the missing value. Nevertheless, the easiest method, which also uses the least processing time, is the first method, Mean. In addition, an attempt is made to obtain the smallest mean square error by considering, for each situation, which method has the smallest mean square error. The results depend on n, p, and ρ. In conclusion, if the number of variables is three, Mean is the best when ρ = 0.1, Modified Multiple Linear Regression is the best when ρ = 0.2-0.7, and Principal Component is the best when ρ = 0.9. If the number of variables is five, Mean is the best when ρ = 0.1-0.2, Modified Multiple Linear Regression is the best when ρ = 0.3, and Principal Component is the best when ρ = 0.5-0.9. If the number of variables is seven, Mean is the best when ρ = 0.1-0.2, Modified Multiple Linear Regression is the best when ρ = 0.3-0.4, and Principal Component is the best when ρ = 0.5-0.8. If the number of variables is ten, Principal Component is the best when ρ = 0.2-0.5.
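The kind of Monte Carlo comparison described in the abstract can be sketched as follows. This is only an illustrative sketch, not the thesis program: it assumes standard normal variables with a common pairwise correlation (`rho` in the code) and exactly one missing cell per simulated data set, neither of which is stated in the abstract. Only the mean and multiple linear regression methods are shown, because the abstract does not define the modified regression or principal component procedures. The function name `simulate_mse` and its parameters are hypothetical.

```python
import numpy as np


def simulate_mse(n, p, rho, reps=500, seed=0):
    """Monte Carlo estimate of the squared imputation error for two methods.

    Assumptions (not taken from the thesis): standard normal variables with
    a common pairwise correlation rho, and exactly one missing cell per
    simulated data set, chosen uniformly at random.
    """
    rng = np.random.default_rng(seed)
    # Equicorrelation matrix: 1 on the diagonal, rho everywhere else.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    sq_err = {"mean": 0.0, "regression": 0.0}

    for _ in range(reps):
        x = rng.multivariate_normal(np.zeros(p), cov, size=n)
        i, j = rng.integers(n), rng.integers(p)  # cell treated as missing
        truth = x[i, j]

        rows = np.arange(n) != i    # observations with no missing value
        others = np.arange(p) != j  # predictor variables

        # Method 1: mean of the observed values of variable j.
        mean_est = x[rows, j].mean()

        # Method 2: least-squares regression of variable j on the other
        # variables, fitted on the complete rows, then predicted for row i.
        design = np.column_stack([np.ones(rows.sum()), x[rows][:, others]])
        beta, *_ = np.linalg.lstsq(design, x[rows, j], rcond=None)
        reg_est = beta[0] + x[i, others] @ beta[1:]

        sq_err["mean"] += (mean_est - truth) ** 2
        sq_err["regression"] += (reg_est - truth) ** 2

    # Average squared error over all replicates.
    return {name: total / reps for name, total in sq_err.items()}
```

Calling, for example, `simulate_mse(n=30, p=3, rho=0.5)` returns a dictionary with one estimated mean square error per method; looping over the n, p, and correlation grids listed in the abstract would give a table of situations to compare.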
Description:	Thesis (Master of Statistics)--Chulalongkorn University, 1986 (B.E. 2529)
Degree Name:	Master of Statistics
Degree Level:	Master's degree
Degree Discipline:	Statistics
URI:	http://cuir.car.chula.ac.th/handle/123456789/17408
ISBN:	9745664758
Type:	Thesis
Appears in Collections:	Grad - Theses

Files in This Item:

File	Size	Format
Pornsiri_Mu_front.pdf	298.53 kB	Adobe PDF
Pornsiri_Mu_ch1.pdf	248.69 kB	Adobe PDF
Pornsiri_Mu_ch2.pdf	291.48 kB	Adobe PDF
Pornsiri_Mu_ch3.pdf	289.85 kB	Adobe PDF
Pornsiri_Mu_ch4.pdf	548.13 kB	Adobe PDF
Pornsiri_Mu_ch5.pdf	238.97 kB	Adobe PDF
Pornsiri_Mu_back.pdf	499.23 kB	Adobe PDF
