Acoustic-phonetic approaches to improving segment-based speech recognition

เกริกศักดิ์ ลิขิตสุภิณ

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/15716

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	อติวงศ์ สุชาโต	-
dc.contributor.advisor	โปรดปราน บุณยพุกกณะ	-
dc.contributor.advisor	ชัย วุฒิวิวัฒน์ชัย	-
dc.contributor.author	เกริกศักดิ์ ลิขิตสุภิณ	-
dc.contributor.other	Chulalongkorn University. Faculty of Engineering	-
dc.date.accessioned	2011-08-17T13:20:46Z	-
dc.date.available	2011-08-17T13:20:46Z	-
dc.date.issued	2552 BE (2009 CE)	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/15716	-
dc.description	Thesis (D.Eng.)--Chulalongkorn University, 2552 BE (2009 CE)	en
dc.description.abstract	Many methods are currently used for speech recognition. The most popular ones extract feature vectors from fixed-length time frames, as in speech recognition based on the hidden Markov model (HMM). It has been shown that adding acoustic-phonetic knowledge to HMM-based speech recognition is difficult. Segment-based speech recognition was therefore proposed; it extracts feature vectors from segments of varying length instead of from frames of equal length. This dissertation shows that segment-based speech recognition is more accurate than frame-based speech recognition in Thai phoneme-level recognition experiments. However, segment-based recognition searches for its answer within a segment graph, so its accuracy depends on the quality of that graph. Increasing the accuracy of segment-based recognition therefore requires improving the quality of the segment graph by adding more correct segments to it. This dissertation accordingly focuses on improving segment graph quality by correcting errors in the graph that arise from inserted phoneme boundaries and from deleted phoneme boundaries, both produced by the probabilistic segmentation step. In addition, to further increase the accuracy of segment-based recognition, the dissertation uses scores derived from the probability that a segment belongs to a broad phoneme class in the scoring and search stage. The experimental results show that segment-based recognition with the proposed processes reaches a phoneme-level accuracy of 58.26%, whereas segment-based recognition without the proposed processes and HMM-based recognition both have phoneme-level accuracies below 50%.	en
dc.description.abstractalternative	Today, there are many approaches to automatic speech recognition. However, most of them represent the observation space as a temporal sequence of measurements extracted from fixed-length frames, as in Hidden Markov Model (HMM)-based speech recognition. Incorporating acoustic-phonetic knowledge into HMM-based approaches has proved difficult. A segment-based approach, in which acoustic feature vectors represent underlying speech segments instead of speech frames, was therefore introduced. Segment-based approaches have been shown to be competitive alternatives to HMM-based techniques. In this dissertation, we show that a segment-based approach can yield better accuracy than HMM-based ones in Thai phoneme recognition tasks. Still, its accuracy relies heavily on the quality of the segment graph in which the recognizer searches for the most likely recognition hypotheses. To increase the inclusion rate of actual segments in the graph, we recover segments that may be missing because of boundary insertion and deletion errors in segment graphs produced by a typical frame-based segmentation, using acoustic discontinuities together with manner distinctive features. Scores based on how likely a segment is to belong to each broad phoneme class are also incorporated into the probabilistic framework used for scoring segments. The best phoneme recognition accuracy achieved is 58.26%, whereas the baseline HMM-based and the traditional segment-based recognizers achieve less than 50%.	en
dc.format.extent	2129745 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	th	es
dc.publisher	Chulalongkorn University	en
dc.relation.uri	http://doi.org/10.14457/CU.the.2009.864	-
dc.rights	Chulalongkorn University	en
dc.subject	Automatic speech recognition	en
dc.subject	Thai language -- Phonetics	en
dc.subject	Thai language -- Pronunciation	en
dc.title	วิธีการทางสวนสัทศาสตร์สำหรับการปรับปรุงการรู้จำเสียงพูดแบบอาศัยเซกเมนต์	en
dc.title.alternative	Acoustic-phonetic approaches to improving segment-based speech recognition	en
dc.type	Thesis	es
dc.degree.name	Doctor of Engineering	es
dc.degree.level	Doctoral degree	es
dc.degree.discipline	Computer Engineering	es
dc.degree.grantor	Chulalongkorn University	en
dc.email.advisor	Atiwong.S@Chula.ac.th	-
dc.email.advisor	proadpran.p@chula.ac.th	-
dc.email.advisor	No information available	-
dc.identifier.DOI	10.14457/CU.the.2009.864	-
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
Krerksak_Li.pdf		2.08 MB	Adobe PDF
