Data classification by the modified regression tree algorithm

พรพิมล อุดมมาลัย

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/50912

Title:	Data classification by the modified regression tree algorithm
Other Titles:	Data classification by modified regression tree algorithm
Authors:	พรพิมล อุดมมาลัย
Advisors:	สุพล ดุรงค์วัฒนา
Other author:	Chulalongkorn University. Faculty of Commerce and Accountancy
Advisor's Email:	Supol.D@Chula.ac.th, supol@cbs.chula.ac.th
Subjects:	Data -- Classification; Algorithms; Regression analysis
Issue Date:	2558 B.E. (2015)
Publisher:	Chulalongkorn University
Abstract:	This research studies how data are classified by the MODIFIED REGRESSION TREE (MRT) algorithm, which is adapted from simple linear regression analysis (Simple Regression Analysis) and multiple linear regression analysis (Multiple Regression Analysis). Data are simulated for each case in R, with sample sizes of 200, 600, and 1,800; 2, 3, and 4 independent variables; error variances of 500, 10,000, and 40,000; and significance levels of 0.05 and 0.10. The algorithm resembles forward selection and has two steps: independent-variable selection and splitting. Among all independent variables, the one with the smallest p-value is selected and its p-value is compared with the chosen significance level. If the p-value is smaller, that variable is used to split the data into groups at its arithmetic mean; if it is larger, the process stops. The next independent variable is then selected within each resulting group, and the process continues until no independent variable can split the data further. Performance is measured by the percentage of correct classification. The study found that sample size, significance level, and number of independent variables all affect the percentage of correct classification: it tends to increase as the sample size grows, but tends to decrease as the significance level or the number of independent variables increases. The error variance does not affect the percentage of correct classification.
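The simulation design in the abstract can be enumerated as a small factorial grid. This is a minimal Python sketch (the thesis itself used R) that assumes every combination of the four listed factors was simulated; the abstract gives the levels but does not say they were fully crossed. Under that assumption there are 3 × 3 × 3 × 2 = 54 scenarios. The names `SAMPLE_SIZES`, `scenarios`, and the dictionary keys are hypothetical. The sketch also converts each error variance to a standard deviation, because normal generators such as R's `rnorm` and Python's `random.gauss` take a standard deviation rather than a variance.

```python
import math
from itertools import product

# Factor levels listed in the abstract; full crossing is an assumption.
SAMPLE_SIZES = (200, 600, 1_800)
NUM_PREDICTORS = (2, 3, 4)
ERROR_VARIANCES = (500, 10_000, 40_000)
SIGNIFICANCE_LEVELS = (0.05, 0.10)

scenarios = [
    {
        "n": n,
        "predictors": k,
        "error_variance": var,
        "error_sd": math.sqrt(var),  # normal generators take an SD, not a variance
        "alpha": alpha,
    }
    for n, k, var, alpha in product(
        SAMPLE_SIZES, NUM_PREDICTORS, ERROR_VARIANCES, SIGNIFICANCE_LEVELS
    )
]

print(len(scenarios))  # 54
print(scenarios[0])
```

Each dictionary would drive one simulated data set; the data-generating model itself (regression coefficients, predictor distributions) is not given in the abstract and is left out here.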
Other Abstract:	This research studies a classification algorithm named MODIFIED REGRESSION TREE (MRT). The algorithm can be applied to either a simple regression model or a multiple regression model. The data are simulated under several scenarios using the free program R. Each scenario is defined by the sample size of the data set, the number of independent variables, the variance of the random error in the regression model, and the level of significance. The MRT procedure is similar to forward selection. It has two steps, independent-variable selection and splitting, which together form one level of the algorithm's hierarchy. At each level, the independent variable with the smallest p-value from the simple-regression F-test is selected. If that p-value is statistically significant, the arithmetic mean of the selected independent variable is used to split the data into two groups; otherwise the algorithm stops. Within each resulting group, the next level of the hierarchy is built separately and independently from the remaining independent variables, and so on until no independent variable is left to classify or the algorithm stops. The percentage of correct classification is used to measure how well the algorithm performs. The results show that as the sample size increases, the percentage of correct classification increases; as the significance level increases, it decreases; as the number of independent variables increases, it decreases; and changes in the random-error variance have no effect on it.
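The two-step procedure described above (select the predictor with the smallest simple-regression F-test p-value, split at its arithmetic mean if significant, then repeat inside each group) can be sketched as follows. This is a minimal, standard-library-only Python sketch, not the author's R implementation. Assumptions: observations equal to the mean go to the `<=` group; a selected variable is not reused further down the same branch; the F-test p-value is computed with a hand-written regularized incomplete beta function; and the names `mrt`, `f_test_pvalue`, `betainc_reg`, and `percent_correct` are hypothetical. The abstract does not say how leaf groups are matched to the true simulated groups, so `percent_correct` simply compares two label sequences supplied by the caller.

```python
import math
import random
from statistics import fmean

TINY = 1e-300


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta I_x(a, b), standard library only."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_test_pvalue(xs, ys):
    """p-value of the simple-regression F-test of y on x, with df = (1, n - 2)."""
    n = len(xs)
    if n < 3:
        return 1.0
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return 1.0  # constant predictor: it cannot explain anything
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    ssr = sxy * sxy / sxx            # regression sum of squares
    sse = max(syy - ssr, 0.0)        # error sum of squares
    if sse == 0.0:
        return 0.0 if ssr > 0.0 else 1.0
    df2 = n - 2
    f = ssr / (sse / df2)
    # Upper tail of F(1, df2): P(F > f) = I_{df2 / (df2 + f)}(df2 / 2, 1 / 2)
    return betainc_reg(df2 / 2.0, 0.5, df2 / (df2 + f))


def mrt(X, y, rows, variables, alpha, path=()):
    """Grow a tree; return {path: row indices}, one entry per leaf group."""
    best_p, best_j = 1.0, None
    for j in variables:
        p = f_test_pvalue([X[i][j] for i in rows], [y[i] for i in rows])
        if p < best_p:
            best_p, best_j = p, j
    if best_j is None or best_p >= alpha:
        return {path: rows}  # stop: no remaining variable is significant here
    cut = fmean(X[i][best_j] for i in rows)  # split at the arithmetic mean
    rest = [j for j in variables if j != best_j]  # each variable used once per branch
    leaves = {}
    for side in ("<=", ">"):
        part = [i for i in rows if (X[i][best_j] <= cut) == (side == "<=")]
        leaves.update(mrt(X, y, part, rest, alpha, path + ((best_j, side, cut),)))
    return leaves


def percent_correct(true_groups, predicted_groups):
    """Percentage of observations whose predicted group matches the true group."""
    hits = sum(t == p for t, p in zip(true_groups, predicted_groups))
    return 100.0 * hits / len(true_groups)


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.uniform(0, 100), rng.uniform(0, 100)] for _ in range(200)]
    # Hypothetical data: y jumps by 100 when x0 > 50; x1 is pure noise.
    y = [100.0 * (row[0] > 50) + rng.gauss(0, 10) for row in X]
    for leaf, members in mrt(X, y, list(range(200)), [0, 1], alpha=0.05).items():
        print(leaf, len(members))
```

In this sketch the F-test is rerun on every shrinking subgroup, so small groups need stronger evidence to split, while a larger `alpha` accepts more splits and yields more leaf groups.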
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2558 B.E. (2015)
Degree Name:	Master of Science
Degree Level:	Master's degree
Degree Discipline:	Statistics
URI:	http://cuir.car.chula.ac.th/handle/123456789/50912
URI:	http://doi.org/10.14457/CU.the.2015.966
metadata.dc.identifier.DOI:	10.14457/CU.the.2015.966
Type:	Thesis
Appears in Collections:	Acctn - Theses

Files in This Item:

File	Description	Size	Format
5681563226.pdf		2.11 MB	Adobe PDF
