Vowel detection in Thai continuous speech

เพียงจิต ดารีเย๊าะ

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/14782

Title:	Vowel detection in Thai continuous speech (การตรวจหาสระในเสียงพูดต่อเนื่องภาษาไทย)
Other Titles:	Vowel landmark detection in Thai continuous speech
Authors:	เพียงจิต ดารีเย๊าะ
Advisors:	อติวงศ์ สุชาโต
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	Atiwong.S@Chula.ac.th
Subjects:	Automatic speech recognition; Speech; Thai language -- Vowels
Issue Date:	2006 (B.E. 2549)
Publisher:	Chulalongkorn University
Abstract:	Speech recognition systems have been developed continuously to make their recognition as effective as possible. They can be improved in several ways: one is refining the recognition process itself for higher accuracy, and another is adding parameters that give the recognizer extra knowledge for each recognition pass. Vowels are the phonemes that serve as syllable nuclei, and knowing where vowels occur in an utterance can make phoneme segmentation more accurate. The full set of vowel locations also gives the number of syllables in the utterance, which various kinds of speech recognizers can use as additional knowledge to recognize speech more accurately. This thesis therefore proposes a method for detecting vowel locations from acoustic features: the maximum autocorrelation value in the 60 to 320 Hz frequency range, used to decide whether a speech frame is voiced or unvoiced, and the energy of the speech signal above 300 Hz, used to select candidate vowel locations with the convex hull algorithm. The method was evaluated on the LOTUS speech corpus, where vowel detection accuracy was 84.98% for continuous speech recorded in a quiet room and 85.33% for continuous speech recorded in a normal environment. It was also evaluated on a spoken digit corpus, where accuracy was 95.80% for digits recorded through a microphone in a quiet room and 84.22% for digits recorded over the telephone in a normal room.
Other Abstract:	Speech recognition systems have been developed continuously to achieve the highest possible accuracy. They can be improved in many ways, such as refining the recognition process or adding parameters to it. Vowels are the nuclei of syllables, and their locations in an utterance can improve segmentation performance. Furthermore, the number of vowels in an utterance can serve as an additional constraint for speech recognition. This thesis proposes a method of vowel landmark detection based on two acoustic measurements. The first is the maximal autocorrelation value of the speech signal in the equivalent frequency range of 60 to 320 Hz, which is used to classify speech frames as voiced or unvoiced. The second is the low-frequency-removed energy. The convex hull algorithm picks the peaks of the low-frequency-removed energy profile to mark the locations of vowel landmarks. The method was evaluated on three corpora. On the Large Vocabulary Thai Continuous Speech Recognition corpus (LOTUS), it achieves 84.98% accuracy on the clean speech data set and 85.33% on the office-environment speech data set. On the Spoken Digit corpus, it achieves 95.80% accuracy on the clean number data set and 84.22% on the telephone number data set. On the TIMIT corpus, it achieves 86.16% accuracy.
Description:	Thesis (M.Eng.)--Chulalongkorn University, 2006 (B.E. 2549)
Degree Name:	Master of Engineering
Degree Level:	Master's Degree
Degree Discipline:	Computer Engineering
URI:	http://cuir.car.chula.ac.th/handle/123456789/14782
URI:	http://doi.org/10.14457/CU.the.2006.1435
metadata.dc.identifier.DOI:	10.14457/CU.the.2006.1435
Type:	Thesis
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
piengjit.pdf		8.05 MB	Adobe PDF
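
The abstract names two acoustic measurements and a convex hull peak-picking step. The following is a minimal pure-Python sketch of those three ideas, not the thesis implementation. The function names, the frame handling, the DFT-based energy estimate, the `min_dip` threshold, and the recursive splitting rule are all assumptions made for illustration; the record itself gives only the 60 to 320 Hz range and the 300 Hz cutoff.

```python
import math


def max_autocorr(frame, sample_rate, f_lo=60.0, f_hi=320.0):
    """Largest normalized autocorrelation over lags equivalent to f_lo..f_hi Hz."""
    n = len(frame)
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame: no periodicity to measure
    lag_min = max(1, int(sample_rate / f_hi))
    lag_max = min(int(sample_rate / f_lo), n - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best


def energy_above(frame, sample_rate, cutoff_hz=300.0):
    """Energy in DFT bins at or above cutoff_hz (one-sided spectrum, direct DFT)."""
    n = len(frame)
    k_min = int(math.ceil(cutoff_hz * n / sample_rate))
    total = 0.0
    for k in range(k_min, n // 2 + 1):
        w = 2.0 * math.pi * k / n
        re = sum(x * math.cos(w * i) for i, x in enumerate(frame))
        im = sum(x * math.sin(w * i) for i, x in enumerate(frame))
        total += re * re + im * im
    return total / n


def unimodal_hull(segment):
    """Smallest curve >= segment that rises to the global peak, then falls."""
    peak = max(range(len(segment)), key=segment.__getitem__)
    hull = list(segment)
    for i in range(1, peak + 1):
        hull[i] = max(hull[i], hull[i - 1])
    for i in range(len(segment) - 2, peak - 1, -1):
        hull[i] = max(hull[i], hull[i + 1])
    return hull


def pick_landmarks(profile, min_dip, lo=0, hi=None):
    """Split the profile at dips deeper than min_dip; return one peak index per piece."""
    if min_dip <= 0:
        raise ValueError("min_dip must be positive")
    hi = len(profile) if hi is None else hi
    if hi - lo < 1:
        return []
    segment = profile[lo:hi]
    hull = unimodal_hull(segment)
    dips = [h - e for h, e in zip(hull, segment)]
    split = max(range(len(dips)), key=dips.__getitem__)
    if dips[split] < min_dip:
        # No significant valley: the whole piece holds a single peak.
        return [lo + max(range(len(segment)), key=segment.__getitem__)]
    return (pick_landmarks(profile, min_dip, lo, lo + split)
            + pick_landmarks(profile, min_dip, lo + split, hi))


# Toy energy profile (one value per frame) with two peaks and a deep valley.
profile = [0, 1, 5, 1, 0.2, 1, 6, 1, 0]
print(pick_landmarks(profile, min_dip=2.0))  # -> [2, 6]
```

In a fuller pipeline, one would presumably compute `energy_above` per frame to build the profile and keep only the landmarks whose frames pass a voicing threshold on `max_autocorr`; the threshold value would have to be tuned, since the record does not state one.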
