Data classification by modified regression tree algorithm

พรพิมล อุดมมาลัย

dc.contributor.advisor	สุพล ดุรงค์วัฒนา	en_US
dc.contributor.author	พรพิมล อุดมมาลัย	en_US
dc.contributor.other	Chulalongkorn University. Faculty of Commerce and Accountancy	en_US
dc.date.accessioned	2016-12-02T02:06:22Z
dc.date.available	2016-12-02T02:06:22Z
dc.date.issued	2558 (B.E.; 2015 C.E.)	en_US
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/50912
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2015	en_US
dc.description.abstract	This research aims to study how data classification works under the MODIFIED REGRESSION TREE (MRT) algorithm, which is adapted from simple linear regression analysis and multiple linear regression analysis. Data for each case are simulated in R with sample sizes of 200, 600, and 1,800; with 2, 3, and 4 independent variables; and with error variances of 500, 10,000, and 40,000, at significance levels of 0.05 and 0.10. The algorithm resembles forward selection and works in 2 steps: independent variable selection and splitting. It selects the independent variable with the smallest p-value among all independent variables and compares that p-value with the specified significance level. If the p-value is smaller, that variable is used to split the data into groups at its arithmetic mean; if the p-value is larger, the process stops. The next independent variable is then selected within each group, and this continues until no independent variable can split the data any further, at which point the process stops. Performance is then measured by the percentage of correct classification. The study finds that sample size, significance level, and the number of independent variables all affect the percentage of correct classification. It tends to increase as the sample size increases, and tends to decrease as the significance level and the number of independent variables increase. The error variance does not affect the percentage of correct classification.	en_US
dc.description.abstractalternative	This research studies a classification algorithm named MODIFIED REGRESSION TREE (MRT). The algorithm can be applied to either a simple or a multiple regression model. The data are simulated under several scenarios using the free program R. Each scenario is defined by the sample size, the number of independent variables, the variance of the random error in the regression model, and the significance level. The MRT procedure resembles forward selection and has 2 steps, independent variable selection and splitting, which together form one level of the hierarchy. At each level, the independent variable with the smallest p-value from the simple regression F-test is selected. If that p-value is statistically significant, the arithmetic mean of the selected variable is used to split the data into 2 groups; otherwise the algorithm stops. Within each resulting group, the next level is carried out separately and independently on the remaining independent variables, and so on until no independent variable is left to split on or the algorithm stops. The percentage of correct classification is used to measure how well the algorithm performs. The results show that the percentage of correct classification increases as the sample size increases, decreases as the significance level increases, decreases as the number of independent variables increases, and is unaffected by an increase in the variance of the random error.	en_US
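The selection and splitting steps described in the abstract can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the thesis code (which was written in R). The function names (`betainc`, `f_test_pvalue`, `mrt_split`), the rule that values equal to the mean go to the left group, the strict `p < alpha` comparison, and the minimum of 3 observations per test are all assumptions introduced for the example.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_test_pvalue(x, y):
    """p-value of the F-test (df = 1, n - 2) for the simple regression of y on x."""
    n = len(x)
    if n < 3:                      # assumption: too few points to test
        return 1.0
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:   # constant x or y: nothing to test
        return 1.0
    ssr = sxy ** 2 / sxx           # regression sum of squares
    sse = syy - ssr                # residual sum of squares
    if sse <= 0.0:                 # perfect fit
        return 0.0
    f = ssr / (sse / (n - 2))
    d2 = n - 2
    # Upper tail of F(1, d2) expressed through the incomplete beta function.
    return betainc(d2 / 2.0, 0.5, d2 / (d2 + f))

def mrt_split(X, y, idx, remaining, alpha=0.05, path=()):
    """Recursively split rows `idx`; X is a list of columns, `remaining` the unused variables.

    Returns a list of (path, row_indices) pairs, one per final group.
    """
    if not remaining:
        return [(path, idx)]
    ys = [y[i] for i in idx]
    pvals = {j: f_test_pvalue([X[j][i] for i in idx], ys) for j in remaining}
    best = min(pvals, key=pvals.get)        # variable with the smallest p-value
    if pvals[best] >= alpha:                # not significant: stop in this group
        return [(path, idx)]
    cut = sum(X[best][i] for i in idx) / len(idx)   # arithmetic mean as cut point
    left = [i for i in idx if X[best][i] <= cut]
    right = [i for i in idx if X[best][i] > cut]
    rest = [j for j in remaining if j != best]
    return (mrt_split(X, y, left, rest, alpha, path + ((best, "<=", cut),))
            + mrt_split(X, y, right, rest, alpha, path + ((best, ">", cut),)))
```

A call such as `mrt_split(X, y, list(range(len(y))), list(range(len(X))), alpha=0.05)` returns the final groups together with the sequence of splits that produced each one. The p-value is computed with the standard library only, so the sketch runs without SciPy.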
dc.language.iso	th	en_US
dc.publisher	Chulalongkorn University	en_US
dc.relation.uri	http://doi.org/10.14457/CU.the.2015.966
dc.rights	Chulalongkorn University	en_US
dc.subject	ข้อมูล -- การจำแนก
dc.subject	อัลกอริทึม
dc.subject	การวิเคราะห์การถดถอย
dc.subject	Classification
dc.subject	Algorithms
dc.subject	Regression analysis
dc.title	การจำแนกกลุ่มข้อมูลโดยอัลกอริทึม modified regression tree	en_US
dc.title.alternative	Data classification by modified regression tree algorithm	en_US
dc.type	Thesis	en_US
dc.degree.name	Master of Science	en_US
dc.degree.level	Master's degree	en_US
dc.degree.discipline	Statistics	en_US
dc.degree.grantor	Chulalongkorn University	en_US
dc.email.advisor	Supol.D@Chula.ac.th,supol@cbs.chula.ac.th	en_US
dc.identifier.DOI	10.14457/CU.the.2015.966