A comparison of gene set enrichment analysis and binary logistic regression for investigating the relationship between gene sets and a binary phenotype

สุธิภาส สิงห์เรือง

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/50580

Title:	การเปรียบเทียบวิธีการวิเคราะห์ความสำคัญของกลุ่มยีนและวิธีการถดถอยโลจิสติกทวิภาคในการหาความสัมพันธ์ระหว่างเซตของยีนและฟีโนไทป์แบบทวิภาค
Other Titles:	A comparison of gene set enrichment analysis and binary logistic regression for investigating the relationship between gene sets and a binary phenotype
Authors:	สุธิภาส สิงห์เรือง
Advisors:	วิฐรา พึ่งพาพงศ์
Other author:	Chulalongkorn University. Faculty of Commerce and Accountancy
Advisor's Email:	Vitara.P@Chula.ac.th, vitara@cbs.chula.ac.th
Subjects:	Genes; Logistic regression analysis
Issue Date:	2015 (B.E. 2558)
Publisher:	Chulalongkorn University
Abstract:	This research aims to study and compare gene set enrichment analysis and binary logistic regression for obtaining the p-value of each gene set, focusing on the relationships among genes and how they work together as a set. The study compares performance on simulated data in two settings, under a range of study conditions: when the sample size exceeds the number of genes (independent variables), and when the sample size is smaller than the number of independent variables, known as “high-dimensional data”. The two methods are evaluated by family-wise error rate and power of the test. Under these conditions, binary logistic regression had high (average) power when the sample size exceeded the number of independent variables, whereas gene set enrichment analysis had high (average) power when the sample size was smaller than the number of independent variables. In terms of family-wise error rate, however, gene set enrichment analysis had a low rate when the sample size exceeded the number of independent variables, whereas binary logistic regression had a low rate when the sample size was smaller than the number of independent variables.
Other Abstract:	This research aims to study and compare the Gene Set Enrichment Analysis method and binary logistic regression for computing the p-value of each gene set, taking into account the relationships and joint activity of the genes within each set. The performance of the two methods is compared using simulated data in two cases: (i) the sample size is larger than the number of genes (independent variables), and (ii) the sample size is smaller than the number of independent variables, which is called “high-dimensional data”. Performance is measured by the family-wise error rate and the power of the test. The simulation results suggest that binary logistic regression has greater power than Gene Set Enrichment Analysis when the sample size is larger than the number of independent variables, while Gene Set Enrichment Analysis has greater power when the data are high-dimensional. In terms of the family-wise error rate, however, Gene Set Enrichment Analysis performs better than binary logistic regression on low-dimensional data, while binary logistic regression performs better on high-dimensional data.
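The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes that each simulation replicate yields one p-value per gene set, that the truly associated gene sets are known from the simulation design, and that a gene set is declared significant when its p-value falls below a fixed level `alpha`. The function name `fwer_and_power`, the toy p-values, and the threshold rule are illustrative assumptions; the abstract does not state how the thesis computed these quantities or whether any multiplicity correction was applied.

```python
from typing import List, Sequence, Tuple


def fwer_and_power(
    p_values: Sequence[Sequence[float]],
    is_associated: Sequence[bool],
    alpha: float = 0.05,
) -> Tuple[float, float]:
    """Estimate family-wise error rate and average power from replicates.

    p_values[r][j] is the p-value of gene set j in simulation replicate r.
    is_associated[j] is True when gene set j truly affects the phenotype.
    (Hypothetical helper; names and threshold rule are assumptions.)
    """
    null_sets = [j for j, assoc in enumerate(is_associated) if not assoc]
    true_sets = [j for j, assoc in enumerate(is_associated) if assoc]

    replicates_with_false_rejection = 0
    power_per_replicate: List[float] = []
    for rep in p_values:
        # FWER counts replicates with at least one truly null set rejected.
        if any(rep[j] < alpha for j in null_sets):
            replicates_with_false_rejection += 1
        # Power is the share of truly associated sets that are rejected.
        if true_sets:
            hits = sum(rep[j] < alpha for j in true_sets)
            power_per_replicate.append(hits / len(true_sets))

    n = len(p_values)
    fwer = replicates_with_false_rejection / n if n else float("nan")
    power = (
        sum(power_per_replicate) / len(power_per_replicate)
        if power_per_replicate
        else float("nan")
    )
    return fwer, power


# Toy input: 4 replicates, 3 gene sets; only the first set is truly associated.
toy_p = [
    [0.01, 0.20, 0.60],
    [0.03, 0.04, 0.70],  # the second gene set is a false rejection here
    [0.30, 0.50, 0.90],
    [0.02, 0.10, 0.08],
]
toy_truth = [True, False, False]
fwer, power = fwer_and_power(toy_p, toy_truth)
print(f"FWER = {fwer:.2f}, average power = {power:.2f}")
```

In a comparison like the one described, the same function would be applied once to the p-values from each method, so that both are scored on identical simulated data.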
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2015
Degree Name:	Master of Science
Degree Level:	Master's Degree
Degree Discipline:	Statistics
URI:	http://cuir.car.chula.ac.th/handle/123456789/50580
URI:	http://doi.org/10.14457/CU.the.2015.974
metadata.dc.identifier.DOI:	10.14457/CU.the.2015.974
Type:	Thesis
Appears in Collections:	Acctn - Theses

Files in This Item:

File	Description	Size	Format
5781591926.pdf		2.16 MB	Adobe PDF
