Direct Recognition of Thai Speech from G.729 Code

สิริ วงศ์วรชาติกาล

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/72271

Title:	การรู้จำเสียงพูดไทยโดยตรงจากการเข้ารหัส G.729
Other Titles:	Direct recognition of Thai speech from G.729 code
Authors:	สิริ วงศ์วรชาติกาล
Advisors:	สุวิทย์ นาคพีระยุทธ
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	Suvit.N@Chula.ac.th
Subjects:	Automatic speech recognition
Issue Date:	2543 (Buddhist Era; 2000 CE)
Publisher:	Chulalongkorn University
Abstract:	The ITU-T G.729 standard is a widely used speech compression standard. If the speech features needed for recognition can be extracted directly from the compressed speech code, a simple speech recognition system can be built that works directly on G.729 code. Speech energy, pitch period, and LSP (Line Spectral Pair) coefficients are parameters transmitted with the G.729 code, and they can be used for speech recognition. This thesis applies hidden Markov models and vector quantization to speaker-independent Thai speech recognition. The vocabulary of 30 words is divided into two sets: a digit set (zero to nine) and a set of 20 monosyllabic words. The speech used for training and for testing comes from both male and female speakers aged 18 to 25 years. In the speaker-independent test, the recognition rate on the test speech set averaged 90.75%, with 88.50% for the monosyllabic word set and 93.00% for the digit set.
Other Abstract:	The ITU-T Recommendation G.729 is a versatile and widely accepted speech compression standard. If speech features can be extracted easily and directly from the code, a simple speech recognition system can work directly on G.729 codes. Energy, pitch period, and LSP (line spectral pair) coefficients are parameters obtained from G.729 codes that can be used in speech recognition. This thesis uses hidden Markov models (HMM) and vector quantization to recognize Thai speech in a speaker-independent manner. The 30-word vocabulary is subdivided into two sets: 20 single-syllable words and 10 Thai numeric words, zero to nine. The separate training and testing speech sets are composed of both male and female speakers aged 18 to 25 years. The average recognition rate of this speaker-independent recognition system is 90.75%. The recognition rate of the single-syllable words is 88.50%, and the recognition rate of the numeric words is 93.00%.
Description:	Thesis (M.Eng.)--Chulalongkorn University, 2543
Degree Name:	Master of Engineering
Degree Level:	Master's degree
Degree Discipline:	Electrical Engineering
URI:	http://cuir.car.chula.ac.th/handle/123456789/72271
ISBN:	9741301111
Type:	Thesis
Appears in Collections:	Eng - Theses
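The abstracts describe the recognition back end only at a high level: vector quantization followed by hidden Markov models. The following is a minimal Python sketch of that general technique, a discrete-HMM isolated-word recognizer scored with the forward algorithm, assuming per-frame feature vectors (for example energy, pitch period and LSP values, one vector per 10 ms G.729 frame) have already been recovered from the bitstream. The codebook, the toy model parameters, and the names `quantize`, `log_forward` and `recognize` are hypothetical illustrations; this record does not state the thesis's codebook size, feature weighting, or HMM topology.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature frame, e.g. [energy, pitch period, LSP_1..LSP_10] per G.729 frame.
Vector = Sequence[float]
# Discrete HMM as (initial probabilities pi, transition matrix A, emission matrix B).
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def quantize(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Vector quantization: map each frame to the index of its nearest codeword (squared Euclidean)."""
    def dist(frame: Vector, k: int) -> float:
        return sum((f - c) ** 2 for f, c in zip(frame, codebook[k]))
    return [min(range(len(codebook)), key=lambda k: dist(frame, k)) for frame in frames]


def _log(p: float) -> float:
    # log(0) becomes -inf so forbidden transitions and emissions stay impossible.
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: List[float]) -> float:
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(symbols: List[int], model: Model) -> float:
    """Forward algorithm in the log domain: returns log P(symbols | model)."""
    pi, A, B = model
    n = len(pi)
    alpha = [_log(pi[i]) + _log(B[i][symbols[0]]) for i in range(n)]
    for o in symbols[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(A[i][j]) for i in range(n)]) + _log(B[j][o])
            for j in range(n)
        ]
    return _logsumexp(alpha)


def recognize(frames: List[Vector], codebook: List[Vector], models: Dict[str, Model]) -> str:
    """Isolated-word recognition: choose the word model with the highest log-likelihood."""
    symbols = quantize(frames, codebook)
    return max(models, key=lambda word: log_forward(symbols, models[word]))


# Toy data (hypothetical): 2-D features, 2 codewords, two 2-state left-to-right word models.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0]]
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS: Dict[str, Model] = {
    "word_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),  # favors symbol 0, then 1
    "word_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),  # favors symbol 1, then 0
}
FRAMES = [[0.1, -0.1], [0.2, 0.1], [0.9, 1.1], [1.0, 0.8]]

if __name__ == "__main__":
    print(quantize(FRAMES, CODEBOOK))           # [0, 0, 1, 1]
    print(recognize(FRAMES, CODEBOOK, MODELS))  # word_a
```

Scoring in the log domain avoids numerical underflow on longer utterances. A practical system would presumably also normalize or weight the features before quantization, since energy, pitch period and LSP values lie on very different scales.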

Files in This Item:

File	Description	Size	Format
Siri_wo_front_p.pdf	Cover, table of contents, and abstract	779.89 kB	Adobe PDF
Siri_wo_ch1_p.pdf	Chapter 1	665.27 kB	Adobe PDF
Siri_wo_ch2_p.pdf	Chapter 2	1.3 MB	Adobe PDF
Siri_wo_ch3_p.pdf	Chapter 3	1.38 MB	Adobe PDF
Siri_wo_ch4_p.pdf	Chapter 4	760.17 kB	Adobe PDF
Siri_wo_ch5_p.pdf	Chapter 5	628.45 kB	Adobe PDF
Siri_wo_back_p.pdf	Bibliography and appendices	1.19 MB	Adobe PDF
