A Comparison of Forecasting Methods in Multiple Regression Analysis Using Ridge Regression and Artificial Neural Network Based Methods When Multicollinearity Exists among the Independent Variables

พัชรี คุณะสารพันธ์

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/9978

Title:	A Comparison of Forecasting Methods in Multiple Regression Analysis Using Ridge Regression and Artificial Neural Network Based Methods When Multicollinearity Exists among the Independent Variables
Other Titles:	A Comparison on forecasting methods between ridge regression and artificial neural network methods in multiple regression analysis with multicollinearity
Authors:	พัชรี คุณะสารพันธ์
Advisors:	ธีระพร วีระถาวร
Other author:	Chulalongkorn University. Graduate School
Advisor's Email:	fcomtvr@acc.chula.ac.th
Subjects:	Regression analysis; Ridge regression; Neural networks (Computer science); Multicollinearity; Back propagation (Artificial intelligence)
Issue Date:	2541 B.E. (1998 CE)
Publisher:	Chulalongkorn University
Abstract:	This study compares the forecasting accuracy of the ridge regression (RR) method and an artificial neural network (ANN) based method in multiple regression analysis when multicollinearity exists among the independent variables. The comparison criterion is the percentage ratio of the difference in the average mean squared error. The scenarios studied cover three error distributions: normal, contaminated normal and lognormal. The normal distribution uses a mean of 1 and standard deviations of 0.1, 0.3 and 0.5; the contaminated normal distribution uses scale factors of 3 and 10 and contamination percentages of 5 and 10; and the lognormal distribution uses a mean of 1 and standard deviations of 0.2264, 0.5915 and 1.0069. Sample sizes are 30, 50 and 100. With 3 independent variables, the correlation level of each pair of independent variables is 0.1, 0.3, 0.5, 0.7, 0.9 or 0.99; when the number of independent variables is increased to 5, the correlation levels of the added variables (x4, x5) are likewise 0.1, 0.3, 0.5, 0.7, 0.9 and 0.99. The data were generated by Monte Carlo simulation with 400 replications per scenario. The results are as follows. When the errors have a normal or lognormal distribution, the forecasting accuracy of ANN improves as the sample size, the number of independent variables and the correlation level among the independent variables increase, but declines as the coefficient of variation of the errors increases. The forecasting accuracy of RR improves as the sample size increases, but declines as the correlation level among the independent variables, the coefficient of variation of the errors and the number of independent variables increase (factors listed in order of decreasing influence).
When the errors have a contaminated normal distribution, the forecasting accuracy of ANN improves as the sample size, the number of independent variables and the correlation level among the independent variables increase, but declines as the coefficient of variation of the errors, the scale factor and the contamination percentage increase. The forecasting accuracy of RR improves as the sample size increases, but declines as the correlation level among the independent variables, the coefficient of variation of the errors, the number of independent variables, the scale factor and the contamination percentage increase (factors listed in order of decreasing influence). ANN forecasts better than RR when the errors are lognormal or contaminated normal (in decreasing order of advantage), and when the sample size, the coefficient of variation of the errors, the number of independent variables, the correlation level among the independent variables, the scale factor and the contamination percentage are larger (in decreasing order of influence). RR forecasts better than ANN when the errors are normally distributed.
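The data-generation side of the simulation design above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's code: the function names are hypothetical; every pair of predictors is given the same correlation `rho` (the record does not spell out the full correlation matrix for the five-variable case); the contaminated normal is taken as the usual mixture in which a fraction `contam` of draws has its standard deviation multiplied by the scale factor; and the lognormal log-scale parameters are chosen so that the draws themselves have the stated mean and standard deviation.

```python
import math
import random


def equicorrelated_predictors(n, p, rho, rng):
    """Return n rows of p standard-normal predictors with common pairwise correlation rho.

    Assumed one-factor construction: x_j = sqrt(rho) * z0 + sqrt(1 - rho) * z_j,
    so Var(x_j) = 1 and Corr(x_j, x_k) = rho for j != k. Requires 0 <= rho <= 1.
    """
    rows = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)  # shared factor that induces the correlation
        rows.append([math.sqrt(rho) * z0 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                     for _ in range(p)])
    return rows


def draw_error(dist, rng, mean=1.0, sd=0.1, scale=3.0, contam=0.05):
    """Draw one error term from 'normal', 'contaminated' or 'lognormal' (hypothetical helper)."""
    if dist == "normal":
        return rng.gauss(mean, sd)
    if dist == "contaminated":
        # Assumed mixture: with probability `contam` the SD is inflated by the scale factor.
        s = sd * scale if rng.random() < contam else sd
        return rng.gauss(mean, s)
    if dist == "lognormal":
        # Moment matching: pick log-scale parameters so the draw has the requested mean and SD.
        sigma2 = math.log(1.0 + (sd / mean) ** 2)
        mu = math.log(mean) - sigma2 / 2.0
        return rng.lognormvariate(mu, math.sqrt(sigma2))
    raise ValueError(f"unknown distribution: {dist!r}")


# Usage: one hypothetical replication with n = 30, three predictors and rho = 0.9,
# errors contaminated at 10 percent with a scale factor of 10.
rng = random.Random(2541)
X = equicorrelated_predictors(30, 3, 0.9, rng)
errors = [draw_error("contaminated", rng, sd=0.3, scale=10.0, contam=0.10) for _ in X]
```

A full replication would add these errors to a linear predictor built from `X` (the true coefficients used in the thesis are not stated in this record) and then fit both models; repeating that 400 times per scenario mirrors the described design.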
Other Abstract:	This study compares the forecasting accuracy of the ridge regression (RR) method and the artificial neural network (ANN) method in multiple regression analysis when multicollinearity exists among the independent variables. The comparison criterion is the percentage ratio of the difference in the average mean squared error. Three residual distributions are used: normal, contaminated normal and lognormal. For the normal distribution, a mean of 1 and standard deviations of 0.1, 0.3 and 0.5 are considered. For the contaminated normal distribution, scale factors of 3 and 10 and contamination percentages of 5 and 10 are studied. For the lognormal distribution, a mean of 1 and standard deviations of 0.2264, 0.5915 and 1.0069 are tested. The sample sizes are 30, 50 and 100. When the number of independent variables is 3, the correlation level of each pair of independent variables is 0.1, 0.3, 0.5, 0.7, 0.9 or 0.99; when the number of independent variables is increased to 5, the correlation levels of the added variables (x4, x5) are 0.1, 0.3, 0.5, 0.7, 0.9 and 0.99. For each case, 400 data sets are randomly generated by Monte Carlo simulation. The results can be summarized as follows. When the residuals have a normal or lognormal distribution, ranking the effects on accuracy from strongest to weakest, the forecasting accuracy of the ANN method improves as the sample size, the number of independent variables and the correlation level among the independent variables increase, but decreases as the coefficient of variation increases. The forecasting accuracy of the RR method improves as the sample size increases but decreases as the correlation level among the independent variables, the coefficient of variation and the number of independent variables increase.
When the residuals have a contaminated normal distribution, ranking the effects on accuracy from strongest to weakest, the forecasting accuracy of the ANN method improves as the sample size, the number of independent variables and the correlation level among the independent variables increase, but decreases as the coefficient of variation, the scale factor and the contamination percentage increase. The forecasting accuracy of the RR method improves as the sample size increases but decreases as the correlation level among the independent variables, the coefficient of variation, the number of independent variables, the scale factor and the contamination percentage increase. The ANN method performs better than the RR method when the residuals have a lognormal or contaminated normal distribution (in that order, from largest to smallest advantage), and when the sample size, the coefficient of variation, the number of independent variables, the correlation level among the independent variables, the scale factor and the contamination percentage (ranked from strongest to weakest effect) are larger. The RR method performs better than the ANN method when the residuals have a normal distribution.
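The ridge regression side of the comparison and the evaluation criterion can be sketched as follows, again as a hypothetical plain-Python illustration rather than the thesis's implementation. The ridge estimate solves (Z'Z + kI)b = Z'(y - ybar) on standardized predictors Z. Several details are not given in this record, so they are treated as assumptions: how the thesis chose `k` (left as an argument here), whether MSE was computed in-sample or on new observations, and the exact formula behind the "percentage ratio of the difference", for which `percent_difference_ratio` is only one plausible reading (positive when ANN has the lower average MSE). The backpropagation network is not shown.

```python
import math


def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting (small systems only)."""
    m = len(c)
    M = [list(row) + [c[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for j in range(col, m + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        x[r] = (M[r][m] - sum(M[r][j] * x[j] for j in range(r + 1, m))) / M[r][r]
    return x


def ridge_fit(X, y, k):
    """Fit ridge regression with penalty k on standardized predictors; return a predict function."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    ybar = sum(y) / n
    # Ridge normal equations: (Z'Z + k I) b = Z'(y - ybar)
    A = [[sum(z[a] * z[b] for z in Z) + (k if a == b else 0.0) for b in range(p)]
         for a in range(p)]
    c = [sum(z[a] * (t - ybar) for z, t in zip(Z, y)) for a in range(p)]
    coef = solve(A, c)

    def predict(row):
        return ybar + sum(coef[j] * (row[j] - means[j]) / sds[j] for j in range(p))

    return predict


def mse(predict, X, y):
    """Mean squared error of a fitted predictor on (X, y)."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def percent_difference_ratio(mse_rr, mse_ann):
    """Assumed criterion: 100 * (avg MSE_RR - avg MSE_ANN) / avg MSE_RR over the replications."""
    avg_rr = sum(mse_rr) / len(mse_rr)
    avg_ann = sum(mse_ann) / len(mse_ann)
    return 100.0 * (avg_rr - avg_ann) / avg_rr


# Usage on a tiny noise-free example: y = 2 + 3*x1 - x2 exactly.
X = [[0.0, 1.0], [1.0, 0.5], [2.0, 2.5], [3.0, 1.0], [4.0, 3.0]]
y = [2 + 3 * a - b for a, b in X]
ols_like = ridge_fit(X, y, k=0.0)  # k = 0 reduces ridge to least squares
print(round(ols_like([1.0, 2.0]), 6))  # 3.0
```

With `k = 0` the fit reduces to ordinary least squares; increasing `k` shrinks the coefficients toward zero, which stabilizes the estimates when the predictors are highly correlated at the cost of some bias.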
Description:	Thesis (Master of Statistics)--Chulalongkorn University, 2541 B.E. (1998 CE)
Degree Name:	Master of Statistics
Degree Level:	Master's degree
Degree Discipline:	Statistics
URI:	http://cuir.car.chula.ac.th/handle/123456789/9978
ISBN:	9743311114
Type:	Thesis
Appears in Collections:	Grad - Theses

Files in This Item:

File	Size	Format
Phatcharee_Ku_front.pdf	1.2 MB	Adobe PDF
Phatcharee_Ku_ch1.pdf	823.52 kB	Adobe PDF
Phatcharee_Ku_ch2.pdf	950.26 kB	Adobe PDF
Phatcharee_Ku_ch3.pdf	815.54 kB	Adobe PDF
Phatcharee_Ku_ch4.pdf	2.73 MB	Adobe PDF
Phatcharee_Ku_ch5.pdf	996.85 kB	Adobe PDF
Phatcharee_Ku_back.pdf	943.79 kB	Adobe PDF
