Spoken Language Recognition Using Phonological Features

ศิรินุช บุญสุข

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/42552

Title:	การรู้จำภาษาจากเสียงพูดโดยใช้ลักษณ์ทางสัทวิทยา
Other Titles:	SPOKEN LANGUAGE RECOGNITION USING PHONOLOGICAL FEATURES
Authors:	ศิรินุช บุญสุข
Advisors:	อติวงศ์ สุชาโต โปรดปราน บุณยพุกกณะ
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	atiwong.s@chula.ac.th Proadpran.P@Chula.ac.th
Subjects:	Phonetics; Automatic speech recognition -- Computer programs; Doctoral degree
Issue Date:	2013 (B.E. 2556)
Publisher:	Chulalongkorn University
Abstract:	Spoken language recognition has attracted interest for use in multilingual speech recognition as a front-end step that identifies the language of a speech signal. Most existing language recognition approaches apply statistical modeling techniques together with acoustic, phonotactic, and prosodic features. Following studies of the relationship between phonological features (PFs) and language, PFs serve as linguistic information that captures acoustic characteristics and represents phonotactic information through the patterns of PF transitions in different languages. The language recognition systems accepted today fuse sub-systems built on different approaches. This thesis proposes a language recognition system that fuses four sub-systems: 1) a system that models phone sequences with a vector space model, 2) a phone lattice SVM system, 3) a phonotactic language recognition system using PFs, and 4) a language recognition system that classifies with a latent-dynamic conditional random field (LDCRF) model using PFs. In the phonotactic language recognition systems, phone sequences are modeled with support vector machines (SVMs). Term weighting on the supervector of n-gram probabilities is essential to recognition performance, because the weighting prevents the SVM kernel function from being dominated by entries with large probability values. This thesis focuses on improving language recognition performance by applying term weighting functions to the supervector entries. The combination of term redundancy (rd) and the logarithm of term frequency (logtf) is proposed as an effective weighting function that combines local and global weighting. This weighting can reduce the redundant unit frequencies that co-occur across languages. The phonotactic language recognition system using PFs relies on statistics of PF co-occurrence patterns across different languages. In the language recognition system based on the LDCRF model using PFs, the model captures the dynamic changes of PF sequences in order to build language models.
Baseline systems were set up to evaluate each individual language recognition approach and the fusion of all sub-systems. The experimental results show an improvement when the outputs of the sub-systems are combined. Integrating PFs into the language recognition system improves recognition performance.
Other Abstract:	Spoken language recognition (SLR) has attracted increasing interest in multilingual speech recognition as a pre-processing step that identifies the language of a speech utterance. Most existing SLR approaches apply statistical modeling techniques to acoustic, phonotactic, and prosodic features. Building on studies of the relationship between phonological features (PFs) and language, this thesis uses PFs as linguistic information to capture acoustic characteristics and to represent phonotactic information through the patterns of PF transitions in different languages. Current state-of-the-art systems fuse several different sub-systems. The proposed SLR system combines four sub-systems: 1) phone sequence modeling followed by a vector space model (PRVSM), 2) a lattice-SVM system, 3) a phonotactic SLR approach using co-occurrence of PFs, and 4) an SLR sub-system based on the latent-dynamic conditional random field (LDCRF) model using PFs. In phonotactic SLR systems based on support vector machine (SVM) modeling, term weighting on the supervector of n-gram probabilities is critical to recognition performance because the weighting prevents the SVM kernel from being dominated by a few large probabilities. This thesis focuses on enhancing SLR performance by applying a term weighting function to the supervector entries. The combination of redundancy of term frequency (rd) and logarithm of term frequency (logtf) is proposed as an effective term weighting function that combines local and global weighting. It can effectively eliminate the redundancy of unit frequencies that co-occur across languages. For the phonotactic approach using PFs, the statistics of PF co-occurrence across different languages are captured. For the SLR system based on LDCRF using PFs, the LDCRF model is employed to capture the dynamics of the PF attribute sequences for constructing language models.
Baseline systems were built to evaluate the individual sub-systems and the fused SLR system. The results show improvements when the sub-systems are combined, and integrating PFs into the SLR system achieves better performance.
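Illustrative note: the term weighting idea described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the method from the thesis. The local weight is a log-scaled term frequency, and the global weight is a plain inverse document frequency used only as a stand-in, because the record does not give the definition of the rd function. The function names `ngram_counts` and `weighted_supervector` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(phones, n=2):
    """Count the n-grams in one decoded phone sequence."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def weighted_supervector(utterances, n=2):
    """Return (vocabulary, vectors) using local log-tf times a global weight.

    Assumption: the global weight is an inverse document frequency.
    It is a stand-in and is NOT the rd function proposed in the thesis.
    """
    counts = [ngram_counts(u, n) for u in utterances]
    vocab = sorted(set().union(*counts))
    num_utts = len(utterances)

    # Global weight: an n-gram seen in every utterance gets weight 0,
    # so units shared by all inputs stop contributing to the kernel.
    doc_freq = Counter(g for c in counts for g in c)
    global_w = {g: math.log(num_utts / doc_freq[g]) for g in vocab}

    vectors = []
    for c in counts:
        # Local weight: log(1 + tf) damps very frequent n-grams so that a
        # few large entries do not dominate a dot-product kernel.
        vectors.append([math.log1p(c[g]) * global_w[g] for g in vocab])
    return vocab, vectors


utterances = [["a", "b", "a", "b"], ["a", "b", "c"]]
vocab, vectors = weighted_supervector(utterances)
```

In this toy input the bigram ("a", "b") occurs in both sequences, so its weighted entry is zero in both vectors, while bigrams unique to one sequence keep a positive weight. The resulting vectors could then be passed to any SVM implementation.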
Description:	Thesis (D.Eng.)--Chulalongkorn University, 2013 (B.E. 2556)
Degree Name:	Doctor of Engineering
Degree Level:	Doctoral Degree
Degree Discipline:	Computer Engineering
URI:	http://cuir.car.chula.ac.th/handle/123456789/42552
URI:	http://doi.org/10.14457/CU.the.2013.27
DOI:	10.14457/CU.the.2013.27
Type:	Thesis
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
5171831321.pdf		10.84 MB	Adobe PDF
