การเปรียบเทียบคุณภาพของข้อสอบและแบบสอบหลายตัวเลือกที่มีรูปแบบตัวเลือกต่างกัน

รณิดา เชยชุ่ม

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/28965

Title:	การเปรียบเทียบคุณภาพของข้อสอบและแบบสอบหลายตัวเลือกที่มีรูปแบบตัวเลือกต่างกัน
Other Titles:	The comparison of qualities of multiple-choice items and tests with different types of choice
Authors:	รณิดา เชยชุ่ม
Advisors:	ศิริชัย กาญจนวาสี; เอมอร จังศิริพรปกรณ์
Other author:	Chulalongkorn University. Faculty of Education
Advisor's Email:	Sirichai.K@Chula.ac.th; Aimorn.J@Chula.ac.th
Subjects:	Doctoral degree; Multiple-choice tests; Item response theory
Issue Date:	2008 (B.E. 2551)
Publisher:	Chulalongkorn University
Abstract:	This research aimed to compare the quality of multiple-choice items and tests with different types of choice, under testing conditions that differed in responding method and number of choices. The types of choice studied were "diagnostic", "close", and "incomplete" alternatives, each in single and compound form. Three responding methods were studied: traditional choosing, right option set choosing, and distracter deleting. Three numbers of choices were studied: 3, 4, and 5. The comparison covered difficulty, discrimination power, the choice-agreement index, reliability, construct validity, the Item Information Function (IIF), the Test Information Function (TIF), and the Ratio of Average Information of the test (RAI). The instruments were 36 parallel multiple-choice tests on systems of linear equations, each with 30 items. Data were collected in academic year 2008 (B.E. 2551) from grade-9 (Mathayom 3) students at schools under Bangkok Educational Service Area Offices 1, 2, and 3. The sample comprised 42 schools and 12,021 students, and each student took one test. The effects of the final characteristics of choice, sources of choice, types of choice, responding method, and number of choices on item and test quality are summarized as follows.
1. Single-alternative items were easier, more discriminating, and had more attractive distracters than compound-alternative items. Incomplete alternative items were the most difficult and had the lowest discrimination power and the lowest distracter attractiveness. Single diagnostic alternative items were the easiest, had the highest discrimination power, and had more attractive distracters than all compound-alternative items. The distracter deleting method was the easiest; its discrimination power did not differ from that of the other methods, but its distracter attractiveness was the highest. Three-choice items were the easiest and had the lowest discrimination power.
2. The final characteristics of choice and the responding method had no effect on reliability. Incomplete alternative items had the lowest reliability, and single diagnostic alternative items had the highest. Three-choice items were less reliable than five-choice items. Construct validity did not differ when judged by the AGFI, RMSEA, and ECVI indices, but the PGFI index was higher for incomplete alternative items than for close alternative items, and higher for single incomplete alternative items than for single diagnostic and single close alternative items.
3. In the low-ability examinee group, single-alternative items had a higher IIF than compound-alternative items. Incomplete alternative items had the lowest IIF, as did compound incomplete alternative items and the traditional choosing method. Three-choice items had the lowest IIF and five-choice items the highest. In the medium-ability group, the IIF of single- and compound-alternative items did not differ. Incomplete alternative items had the highest IIF, and both single and compound incomplete alternative items had a higher IIF than the other types. The distracter deleting method had a lower IIF than the traditional choosing method. Three-choice items had the lowest IIF and five-choice items the highest. In the high-ability group, compound-alternative items had a higher IIF than single-alternative items. Close alternative items had the lowest IIF, and compound incomplete alternative items had the highest. The traditional choosing method had the highest IIF and the distracter deleting method the lowest. Three-choice items had the lowest IIF.
4. In the low-ability examinee group, single-alternative items had a higher TIF than compound-alternative items. The TIF of diagnostic, close, and incomplete alternative items did not differ. Single close and single diagnostic alternative items had a higher TIF than every compound type. The traditional choosing method had the lowest TIF, as did three-choice items. In the medium-ability group, compound-alternative items had a higher TIF than single-alternative items. The TIF of diagnostic, close, and incomplete alternative items did not differ. Compound diagnostic alternative items had a higher TIF than the close alternative item and the single incomplete alternative item. The traditional choosing method had the highest TIF. Three-choice items had a lower TIF than five-choice items. In the high-ability group, the TIF of single- and compound-alternative items did not differ. Incomplete alternative items had the highest TIF. Differences in type of choice and in responding method had no effect on TIF. Three-choice items had a lower TIF than five-choice items.
5. The tests with the highest RAI were those using single alternatives, diagnostic alternatives, single diagnostic alternatives, the distracter deleting method, and five choices.
Other Abstract:	The objective of this research was to compare the quality of multiple-choice items and tests that had different types of choice, in tests that differed in responding method and number of choices. The types of choice studied were "diagnostic alternative items", "close alternative items", and "incomplete alternative items", in both single and compound form. Three responding methods (traditional choosing, right option set choosing, and distracter deleting) and three numbers of choices (3, 4, and 5) were investigated in order to compare the level of difficulty, discrimination power, choice-agreement index, reliability, construct validity, Item Information Function (IIF), Test Information Function (TIF), and Ratio of Average Information of the test (RAI). The instruments used in this research were 36 parallel multiple-choice tests on systems of linear equations. Each test consisted of 30 items. The data were collected from grade-9 students studying in schools under Bangkok Educational Service Area Offices 1, 2, and 3 in academic year 2008. The total sample was 42 schools and 12,021 students. Each student took one test. The effects of the final characteristics of choice, sources of choice, types of choice, responding method, and number of choices on test and item quality can be summarized as follows:
1. The single item was easier, had higher discrimination power, and had more attractive distracters than the compound item. The incomplete alternative item was the most difficult and had the lowest discrimination power and distracter attraction. The single diagnostic alternative item was the easiest, had the highest discrimination power, and had higher distracter attraction than all compound items. The distracter deleting method was the easiest. Its discrimination power was not different from that of the other methods, but it had the highest distracter attraction. Three-choice items were the easiest and had the lowest discrimination power.
2. The final characteristics of choice and the responding method did not have any effect on reliability. The incomplete alternative item had the lowest reliability. The single diagnostic alternative item had the highest reliability. Three-choice items had lower reliability than five-choice items. Construct validity, judged by the AGFI, RMSEA, and ECVI indices, was not different, but the PGFI index of the incomplete alternative item was higher than that of the close alternative item, and the single incomplete alternative item had a higher value than the single diagnostic alternative item and the single close alternative item.
3. In the low-ability examinee group, the single item had a higher IIF than the compound item. The incomplete alternative item, the compound incomplete alternative item, and the traditional choosing method had the lowest IIF. The three-choice item had the lowest IIF, whereas the five-choice item had the highest. In the medium-ability examinee group, the IIF values of single and compound items were not different. The incomplete alternative item had the highest IIF. Both single and compound incomplete alternative items had a higher IIF than the other types. The distracter deleting method had a lower IIF than the traditional choosing method. The three-choice item gave the lowest IIF, whereas the five-choice item gave the highest. In the high-ability examinee group, the compound item had a higher IIF than the single item. The close alternative item had the lowest IIF. The compound incomplete alternative item and the traditional choosing method had the highest IIF. The distracter deleting method and the three-choice item had the lowest IIF.
4. In the low-ability examinee group, the single item had a higher TIF than the compound item. The TIF values of the diagnostic, close, and incomplete alternative items were not different. The single close alternative item and the single diagnostic alternative item had a higher TIF than all compound items. The traditional choosing method and the three-choice item had the lowest TIF. In the medium-ability examinee group, the compound item had a higher TIF than the single item. The TIF values of the diagnostic, close, and incomplete alternative items were not different. The compound diagnostic alternative item had a higher TIF than the close alternative item and the single incomplete alternative item. The traditional choosing method gave the highest TIF. The three-choice item had a lower TIF than the five-choice item. In the high-ability examinee group, the TIF values of single and compound items were not different. The incomplete alternative item had the highest TIF. Different types of choice and responding methods did not have any effect on TIF values. The three-choice item had a lower TIF than the five-choice item.
5. The tests that gave the highest RAI value were those with the single item, the diagnostic alternative item, the single diagnostic alternative item, the distracter deleting method, and the five-choice item.
Description:	Doctoral thesis (ค.ด.)--Chulalongkorn University, 2008 (B.E. 2551)
Degree Name:	Doctor of Education (ครุศาสตรดุษฎีบัณฑิต)
Degree Level:	Doctoral degree
Degree Discipline:	Educational Measurement and Evaluation
URI:	http://cuir.car.chula.ac.th/handle/123456789/28965
URI:	http://doi.org/10.14457/CU.the.2008.312
DOI:	10.14457/CU.the.2008.312
Type:	Thesis
Appears in Collections:	Edu - Theses

Files in This Item:

File	Description	Size	Format
ranida_ch.pdf		7.84 MB	Adobe PDF
