Spoken Language Recognition Using Phonological Features

ศิรินุช บุญสุข

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/42552

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	อติวงศ์ สุชาโต	en_US
dc.contributor.advisor	โปรดปราน บุณยพุกกณะ	en_US
dc.contributor.author	ศิรินุช บุญสุข	en_US
dc.contributor.other	Chulalongkorn University. Faculty of Engineering	en_US
dc.date.accessioned	2015-06-24T06:10:45Z
dc.date.available	2015-06-24T06:10:45Z
dc.date.issued	2556	en_US
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/42552
dc.description	Thesis (D.Eng.)--Chulalongkorn University, 2556 B.E. (2013)	en_US
dc.description.abstract	Spoken language recognition has attracted interest for use in multilingual speech recognition, as a front-end step that identifies the language of a speech signal. Most existing language recognition approaches use statistical modeling techniques together with acoustic, phonotactic, and prosodic features. Following studies of the relationship between phonological features (PFs) and language, PFs serve as language-related information that captures acoustic characteristics and represents phonotactic information through the patterns of PF transitions in different languages. The language recognition systems accepted today are fusions of sub-systems that use different approaches. This thesis proposes a language recognition system that combines four sub-systems: 1) a system that models phone sequences with a vector space model, 2) a system that uses a phone lattice SVM, 3) a phonotactic language recognition system using PFs, and 4) a language recognition system that classifies with a latent conditional random field model together with PFs. In the phonotactic language recognition system, phone sequences are modeled with a support vector machine. Term weighting on the supervector of n-gram model probabilities is essential to recognition performance, because the weighting prevents the kernel function of the support vector machine from being dominated by members with large probability values. This thesis focuses on improving language recognition performance by incorporating a term weighting function into the supervector members. The combination of term redundancy (rd) and the log of term frequency (logtf) is proposed as an effective weighting function that combines local and global weights. This weighting can reduce redundant unit frequencies that co-occur across languages. The phonotactic language recognition system using PFs relies on statistics of PF co-occurrence patterns in different languages. In the language recognition system that classifies with a latent conditional random field model together with PFs, the model captures the dynamic changes of PF sequences in order to build language models. 
Baseline systems were set up to evaluate each individual language recognition approach and the fusion of all sub-systems. The experimental results show an improvement when the outputs of the sub-systems are combined. Integrating PFs into the language recognition system improves recognition performance.	en_US
dc.description.abstractalternative	Spoken language recognition (SLR) has attracted increasing interest in multilingual speech recognition as a pre-processing step that identifies the language of a speech utterance. Most existing SLR approaches apply statistical modeling techniques to acoustic, phonotactic, and prosodic features. Building on studies of the relationship between phonological features (PFs) and language, this thesis uses PFs as linguistic information to capture acoustic characteristics and to represent phonotactic information through the patterns of PF transitions in different languages. Current state-of-the-art systems are fusions of different sub-systems. The proposed SLR system combines four sub-systems: 1) phone sequence modeling followed by a vector space model (PRVSM), 2) a lattice-SVM system, 3) a phonotactic SLR approach using co-occurrence of PFs, and 4) an SLR sub-system based on the latent-dynamic conditional random field (LDCRF) model using PFs. In phonotactic SLR systems based on support vector machine (SVM) modeling, term weighting on the supervector of n-gram probabilities is critical to recognition performance because the weighting prevents the SVM kernel from being dominated by a few large probabilities. This thesis focuses on enhancing SLR performance by applying a term weighting function to the supervector entries. The combination of redundancy of term frequency (rd) and logarithm of term frequency (logtf) is proposed as an effective term weighting function that combines local and global weighting. It can effectively eliminate the redundancy of unit frequencies that co-occur across languages. For the phonotactic approach using PFs, the statistics of PF co-occurrence across different languages are captured. For the SLR system based on LDCRF using PFs, the LDCRF model is employed to capture the dynamics of the PF attribute sequences for constructing language models. 
Baseline systems were built to evaluate the individual and fused SLR systems. The results show improvements when the sub-systems are combined, and integrating PFs into the SLR system achieves better performance.	en_US
dc.language.iso	th	en_US
dc.publisher	Chulalongkorn University	en_US
dc.relation.uri	http://doi.org/10.14457/CU.the.2013.27	-
dc.rights	Chulalongkorn University	en_US
dc.subject	Phonetics
dc.subject	Automatic speech recognition -- Computer programs
dc.subject	Doctoral degree
dc.title	Spoken language recognition using phonological features	en_US
dc.title.alternative	SPOKEN LANGUAGE RECOGNITION USING PHONOLOGICAL FEATURES	en_US
dc.type	Thesis	en_US
dc.degree.name	Doctor of Engineering	en_US
dc.degree.level	Doctoral degree	en_US
dc.degree.discipline	Computer Engineering	en_US
dc.degree.grantor	Chulalongkorn University	en_US
dc.email.advisor	atiwong.s@chula.ac.th	en_US
dc.email.advisor	Proadpran.P@Chula.ac.th
dc.identifier.DOI	10.14457/CU.the.2013.27	-
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
5171831321.pdf		10.84 MB	Adobe PDF
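
Illustrative note on the term weighting described in the abstract: the abstract combines a local weight (log of term frequency) with a global weight (redundancy) on n-gram supervector entries, so that units spread evenly across languages contribute little. This record does not give the thesis's formulas, so the following is only a minimal sketch under stated assumptions: the redundancy weight uses one common entropy-based definition from text categorization, raw counts stand in for n-gram probabilities, and the function name `weight_supervectors` and the toy data are hypothetical. The thesis may define rd and logtf differently.

```python
import math

def weight_supervectors(counts):
    """counts: list of per-utterance n-gram count vectors of equal length.

    Returns vectors weighted by log(1 + tf) * redundancy.
    Both definitions are assumptions, not taken from the thesis.
    """
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Global weight (assumed): r_k = log N + sum_i p_ik * log p_ik,
    # where p_ik is the share of term k's total count found in utterance i.
    # r_k is 0 for an evenly spread term and log N for a term seen in one utterance.
    redundancy = []
    for k in range(n_terms):
        total = sum(doc[k] for doc in counts)
        r = math.log(n_docs) if total > 0 else 0.0
        for doc in counts:
            if total > 0 and doc[k] > 0:
                p = doc[k] / total
                r += p * math.log(p)
        redundancy.append(r)
    # Local weight (assumed): log(1 + tf) damps very frequent n-grams.
    return [[math.log(1 + doc[k]) * redundancy[k] for k in range(n_terms)]
            for doc in counts]

# Toy data: 3 utterances, 2 n-gram types.
# Term 0 is spread evenly; term 1 occurs in only one utterance.
toy = [[4, 6], [4, 0], [4, 0]]
weighted = weight_supervectors(toy)
```

In this toy case the evenly spread n-gram is weighted to (numerically) zero, while the n-gram concentrated in one utterance keeps a large weight, which matches the abstract's stated goal of suppressing units that co-occur across languages.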
