Data classification by ANOVAID algorithm

นวทิพย์ ไมตรี

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/50911

Title:	การจำแนกกลุ่มข้อมูลโดยอัลกอริทึม ANOVAID
Other Titles:	Data classification by ANOVAID algorithm
Authors:	นวทิพย์ ไมตรี
Advisors:	สุพล ดุรงค์วัฒนา
Other author:	Chulalongkorn University. Faculty of Commerce and Accountancy
Advisor's Email:	Supol.D@Chula.ac.th,supol@cbs.chula.ac.th
Subjects:	Statistics -- Data processing; Data -- Classification; Algorithms
Issue Date:	2558 BE (2015 CE)
Publisher:	Chulalongkorn University
Abstract:	This research studies the process of classifying data with the ANOVAID algorithm, which combines one-way analysis of variance with the t-test for two independent samples. The dependent variable is quantitative and the independent variables are qualitative. The algorithm works in two steps: selecting an independent variable, and merging the groups of that independent variable. In the selection step, the independent variable with the smallest one-way ANOVA p-value among all independent variables is chosen, and that p-value must also be significant for the variable to enter the process. The t-test for two independent samples is then used to merge groups of the selected independent variable whose p-values are not significant. If these conditions are not met, the algorithm stops. Within each resulting group, the remaining independent variables are classified separately and independently until no independent variable remains or the algorithm stops. The data used in the study are simulated with the number of factor levels equal to 2, 3 and 4; data sizes of 6,000, 12,000 and 24,000; variances of 10,000 and 40,000; and ratios of means of 0.5, 1 and 2. Tests are carried out at the 0.05 significance level, and the percentage of misclassification is the criterion for judging how well the algorithm classifies. The results show that as the variance increases, the percentage of misclassification tends to increase; as the data size increases, it tends to decrease; as the ratio of means increases, it tends to decrease; and as the number of factor levels increases, the percentage of misclassification does not differ.
Other Abstract:	The aim of this study is to examine the classification process of the ANOVAID algorithm, which combines one-way ANOVA with the independent-samples t-test. The dependent variable is quantitative and the independent variables are fixed qualitative variables. The algorithm has two steps: independent-variable selection and merging. At each stage, the independent variable with the smallest one-way ANOVA p-value is selected, provided that this p-value is statistically significant. The independent-samples t-test is then used to merge groups of the selected variable whose p-values are not significant; if these conditions are not met, the algorithm stops. Within each merged group, the remaining independent variables are classified separately and independently at the next level of the hierarchy, and so on until no independent variable remains or the algorithm stops. The data are simulated under several scenarios defined by the number of factor levels (2, 3 and 4), the sample size of each data set (6,000, 12,000 and 24,000), the variance of the random error in the one-way ANOVA model (10,000 and 40,000), and the ratio of means (0.5, 1 and 2), with hypothesis tests at the 0.05 significance level. The percentage of misclassification is used to measure how well the algorithm performs. The results show that the percentage of misclassification increases as the random-error variance increases, decreases as the sample size increases, decreases as the ratio of means increases, and does not differ as the number of factor levels increases.
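The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It assumes NumPy and SciPy are available. The function names (`select_variable`, `merge_levels`, `anovaid`) are hypothetical, as is the greedy merge rule (repeatedly merge the pair of groups with the largest non-significant t-test p-value), because the abstract does not specify the order of merging. The toy data are deterministic and are not the simulation design used in the study.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level stated in the abstract


def select_variable(y, X, alpha=ALPHA):
    """Name of the predictor with the smallest significant one-way ANOVA
    p-value, or None if no predictor is significant."""
    best_name, best_p = None, alpha
    for name, col in X.items():
        levels = np.unique(col)
        if len(levels) < 2:
            continue  # nothing to compare for a single-level predictor
        p = stats.f_oneway(*(y[col == lv] for lv in levels)).pvalue
        if p < best_p:
            best_name, best_p = name, p
    return best_name


def merge_levels(y, col, alpha=ALPHA):
    """Merge levels whose means do not differ significantly.
    ASSUMPTION: greedy rule, merging the pair with the largest
    non-significant p-value until every remaining pair differs."""
    groups = [[lv] for lv in np.unique(col)]
    while len(groups) > 1:
        best_pair, best_p = None, alpha
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                a = y[np.isin(col, groups[i])]
                b = y[np.isin(col, groups[j])]
                p = stats.ttest_ind(a, b).pvalue
                if p >= best_p:
                    best_pair, best_p = (i, j), p
        if best_pair is None:
            break  # all remaining pairs differ significantly
        i, j = best_pair
        groups[i] = groups[i] + groups.pop(j)
    return groups


def anovaid(y, X, alpha=ALPHA):
    """Build a nested dict: select a variable, merge its levels, then
    classify each merged group separately with the remaining variables."""
    name = select_variable(y, X, alpha)
    if name is None:
        return {"n": len(y), "mean": float(np.mean(y))}
    children = {}
    for g in merge_levels(y, X[name], alpha):
        mask = np.isin(X[name], g)
        sub_X = {k: v[mask] for k, v in X.items() if k != name}
        children[tuple(np.asarray(g).tolist())] = anovaid(y[mask], sub_X, alpha)
    return {"split": name, "children": children}


# Toy data (hypothetical): factor A has three levels, of which 0 and 1
# share a mean; factor B is unrelated to y by construction.
A = np.repeat([0, 1, 2], 30)
B = np.tile([0, 0, 0, 1, 1, 1], 15)
y = np.tile([-1.0, 0.0, 1.0], 30) + np.where(A == 2, 10.0, 0.0)

tree = anovaid(y, {"A": A, "B": B})
print(tree["split"], sorted(tree["children"]))  # A [(0, 1), (2,)]
```

In this toy case, levels 0 and 1 of `A` are merged because their means are identical, and `B` is never selected because its ANOVA p-value is not significant in any node. Computing the percentage of misclassification used in the study would additionally require the true group structure of each simulated data set, which the sketch does not cover.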
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2015
Degree Name:	Master of Science
Degree Level:	Master's degree
Degree Discipline:	Statistics
URI:	http://cuir.car.chula.ac.th/handle/123456789/50911
URI:	http://doi.org/10.14457/CU.the.2015.965
metadata.dc.identifier.DOI:	10.14457/CU.the.2015.965
Type:	Thesis
Appears in Collections:	Acctn - Theses

Files in This Item:

File	Description	Size	Format
5681546626.pdf		2.01 MB	Adobe PDF