A study of prosodic features for Indonesian speech recognition

Nazrul Effendy

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/13649

Title:	A study of prosodic features for Indonesian speech recognition
Other Titles:	การศึกษาคุณลักษณะสัทสัมพันธ์สำหรับการรู้จำเสียงพูดอินโดนีเซีย
Authors:	Nazrul Effendy
Advisors:	Somchai Jitapunkul
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	Somchai.J@chula.ac.th
Subjects:	Automatic speech recognition; Indonesian language
Issue Date:	2006
Publisher:	Chulalongkorn University
Abstract:	Utterance-type information has been used in spoken dialogue systems, speech recognition systems, and machine translation. In a typical spoken dialogue system, a user can ask a question or give information to the system. The system, in turn, should be capable of recognizing the user's intention in order to respond correctly. In this dissertation, an automatic utterance-type recognizer is proposed to distinguish declarative questions from statements in Indonesian speech. Since utterances of these two types contain the same words in the same order and differ only in their intonation, classifying them requires not only a word recognizer but also an intonation recognizer. At first, the utterance-type recognizer is designed based on the Fujisaki model. It uses a combination of Fujisaki model parameters as the features to recognize the two utterance types. The best performance of the Fujisaki-model-based utterance-type recognizer is achieved using a combination of a fraction value of F_b (F_b/100), the amplitude of the last accent command, and the magnitude of the last phrase command as the input to the neural networks. However, the Fujisaki parameter extractor is too complicated to be implemented in an automatic recognition system. Therefore, the utterance-type recognizer is developed using the polynomial coefficients of the pitch contour of the sentence's final word. The automatic utterance-type recognizer using polynomial expansion consists of a pitch contour extractor, a normalizer, a feature extractor, a classifier, and an automatic utterance segmentation module. The pitch contour of each utterance type is analyzed to investigate the final word of the two utterance types. To create the automatic utterance segmentation module, an Indonesian acoustic model is designed. 
The evaluation confirms that the method using the final word and polynomial expansion is effective at distinguishing declarative questions from statements in Indonesian speech.
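The polynomial-expansion feature step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name, the polynomial degree, the log-F0 mean normalization, and the convention that unvoiced frames are marked with 0 are all assumptions made for illustration. It takes the pitch (F0) values of a final word and returns the coefficients of a fitted polynomial, which a classifier could then use as features.

```python
import numpy as np


def final_word_pitch_features(f0_hz, degree=3):
    """Fit a polynomial to the pitch contour of an utterance's final word.

    Hypothetical helper, not taken from the thesis.
    f0_hz  : F0 values in Hz, one per frame; 0 marks an unvoiced frame (assumed).
    degree : polynomial degree (assumed; the abstract does not state it).
    Returns the coefficients ordered from the constant term upward.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]  # keep voiced frames only (simplification: gaps are closed up)
    if f0.size <= degree:
        raise ValueError("not enough voiced frames for the requested degree")

    # Assumed normalization: log scale with the mean removed, so the
    # coefficients describe contour shape rather than speaker pitch level.
    log_f0 = np.log(f0)
    contour = log_f0 - log_f0.mean()

    # Map the word duration onto [-1, 1] so words of different length compare.
    t = np.linspace(-1.0, 1.0, f0.size)
    return np.polynomial.polynomial.polyfit(t, contour, degree)


# Synthetic contours, not data from the thesis.
rising = final_word_pitch_features(np.linspace(120.0, 220.0, 30))
falling = final_word_pitch_features(np.linspace(200.0, 110.0, 30))
print(rising[1] > 0, falling[1] < 0)
```

In this sketch the linear coefficient is positive for the rising synthetic contour and negative for the falling one. Whether a final rise marks declarative questions in Indonesian is the thesis's finding to state, so the sketch stops at feature extraction and leaves the classifier out.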
Other Abstract:	Utterance-type information is used in spoken dialogue systems, speech recognition systems, and machine translation. In a typical spoken dialogue system, the user can ask questions or give information to the machine, so the system should be able to recognize the user's intention in order to respond correctly. This dissertation proposes an automatic utterance-type recognizer that distinguishes declarative questions from statements in Indonesian. Because utterances of the two types are identical in their words and word order and differ only in intonation, distinguishing them requires both a word recognizer and an intonation recognizer. The utterance-type recognizer is first designed on the basis of the Fujisaki model; it uses a combination of Fujisaki model parameters to recognize the two utterance types. The best performance of the Fujisaki-model-based recognizer is obtained with a combination of a fraction value equal to F_b/100, the amplitude of the accent command, and the magnitude of the last phrase command as the input to the neural network. However, extracting the Fujisaki model parameters is too complex to be used in an automatic speech recognition system. The utterance-type recognizer is therefore developed using the polynomial coefficients of the pitch contour of the final word of each sentence. The automatic utterance-type recognizer using polynomial expansion consists of a pitch contour extractor and an automatic utterance segmentation module. The pitch contours of each utterance type are analyzed to study the final word of each utterance type. In addition, an Indonesian acoustic model is designed so that the automatic utterance segmentation module can be built. The evaluation shows that using the final word and polynomial expansion can effectively distinguish declarative questions from statements in Indonesian.
Description:	Thesis (D.Eng.)--Chulalongkorn University, 2006
Degree Name:	Doctor of Engineering
Degree Level:	Doctoral Degree
Degree Discipline:	Electrical Engineering
URI:	http://cuir.car.chula.ac.th/handle/123456789/13649
URI:	http://doi.org/10.14457/CU.the.2006.1769
metadata.dc.identifier.DOI:	10.14457/CU.the.2006.1769
Type:	Thesis
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
Nazrul_Ef.pdf		1.9 MB	Adobe PDF
