A study of various linguistic effects on tone recognition in Thai continuous speech

Nuttakorn Thubthong

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/11193

Title:	A study of various linguistic effects on tone recognition in Thai continuous speech
Other Titles:	การศึกษาผลกระทบต่างๆ ทางภาษาศาสตร์ ต่อการรู้จำวรรณยุกต์ในคำพูดต่อเนื่องภาษาไทย
Authors:	Nuttakorn Thubthong
Advisors:	Boonserm Kijsirikul; Sudaporn Luksaneeyanawin
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	boonserm@cp.eng.chula.ac.th, Boonserm.K@Chula.ac.th; Sudaporn.L@chula.ac.th
Subjects:	Automatic speech recognition; Thai language -- Tone; Linguistics
Issue Date:	2001
Publisher:	Chulalongkorn University
Abstract:	This thesis studies the effects of several linguistic factors, namely syllable structure, coarticulation, intonation, and stress, on tone recognition in Thai continuous speech, and develops tone models that compensate for these effects. We first studied the effect of initial consonants, vowels, and final consonants on tone recognition in isolation. Based on observations of F0 contours, we proposed a novel tone feature set, which achieved better recognition rates than conventional tone feature sets. We also explored several schemes for combining classifiers and found that combined classifiers were superior to a single classifier. Next, we developed a basic tone recognition framework for Thai continuous speech. The framework consisted of tone models, used to parameterize the F0 contours of tones, and a classifier, used to evaluate the performance of the tone models. We conducted experiments to construct the tone models, concentrating on tone features, frequency scales, normalization techniques, and tone critical segments. The classifier was a feed-forward neural network. Next, we focused on the tone coarticulation effect. We proposed a feature set called "contextual tone features" that captures the F0 realizations of the neighboring syllables. These features provided the best tone error reduction rates: 56.17%, 42.47%, and 42.42% for the Thai Proverb Corpus (TPC), the Potisuk-1999 Corpus (PC-99), and the Thai Animal Story Corpus (TASC), respectively. Furthermore, we explored the context-dependent tone model (CD-T-175) and developed a novel model, the half-tone model (H-T-30). Both models increased recognition rates, but the training time of H-T-30 was one-fourth that of CD-T-175. Next, we studied the effect of intonation on tone recognition. We proposed two methods, beginning-point intonation normalization and center-point intonation normalization, to compensate for the intonation effect. Both methods significantly increased recognition rates.
The best error reduction rates of 22.20% and 16.84% were achieved for TASC and TPC, respectively. Next, we concentrated on the stress effect. We first performed two empirical experiments on stress detection, using pairs of ambiguous words and polysyllabic words. We explored acoustic features (duration, energy, and F0) extracted from several linguistic units (vowel, syllable, and rhyme). The rhyme unit outperformed the other units for stress detection. We then performed an empirical study of tone recognition and proposed two methods, the separated stress method (SSM) and the incorporated stress feature method (ISFM). Both methods increased tone recognition rates. We additionally incorporated ISFM into the tone model and found that it improved the recognition rates. The highest error reduction rates of 32.43% and 27.16% were reported for TPC and TASC, respectively. Finally, we integrated several refined tone models into a syllable-based speech recognition system to enhance recognition performance. We achieved the best error reduction rates of 85.16% and 75.06% for TPC and TASC, respectively.
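The abstract reports its gains as error reduction rates but does not define the metric. The following is a minimal sketch, assuming the conventional relative definition (the share of baseline errors removed by the refined model). The function name and the accuracy figures are hypothetical illustrations, not values or code from the thesis.

```python
def error_reduction_rate(baseline_accuracy: float, improved_accuracy: float) -> float:
    """Relative error reduction in percent.

    Assumed definition (not confirmed by the abstract):
    100 * (baseline_error - improved_error) / baseline_error,
    where each error is 100 minus the recognition rate in percent.
    """
    baseline_error = 100.0 - baseline_accuracy
    improved_error = 100.0 - improved_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return 100.0 * (baseline_error - improved_error) / baseline_error


# Hypothetical recognition rates chosen only to illustrate the arithmetic.
print(error_reduction_rate(80.0, 90.0))  # 50.0
```

Under this assumed definition, a reported reduction of 56.17% would mean that slightly more than half of the baseline's tone errors were eliminated. The figure alone does not reveal the absolute recognition rate.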
Other Abstract:	(Translated from Thai) This thesis studies the effects of linguistic factors, namely syllable structure, context, intonation, and stress, on tone recognition in Thai continuous speech, and develops tone models to address these effects. The research began by studying the effects of initial consonants, vowels, and final consonants on tone recognition in isolated speech. The researcher proposed a new set of tone features that gave better recognition results than the features conventionally used. In addition, the researcher studied combinations of classifiers, in place of a single classifier, to increase the recognition rate. The researcher developed a basic tone recognition framework for Thai continuous speech, consisting of tone models and a classifier. The tone models take into account the components that matter for tone classification: tone features, fundamental frequency scales, normalization techniques, and the tone-bearing part of the syllable. The classifier is a neural network. Next, the researcher studied the effect of context on tones and proposed a feature set called contextual tone features to compensate for it. The highest error reduction rates were 56.17, 42.47, and 42.42 percent for the TPC, PC-99, and TASC corpora, respectively. Furthermore, the researcher experimented with a context-dependent tone model and proposed a half-tone model. Both models improved recognition rates, but the training time of the half-tone model was only one-fourth that of the context-dependent tone model. Next, the researcher studied the effect of intonation and proposed intonation normalization methods to reduce it. The normalization yielded highest error reduction rates of 22.20 and 16.84 percent for the TASC and TPC corpora, respectively. Next, the researcher studied the effect of stress, beginning with the design of a stress detection method that uses acoustic features (duration, energy, and fundamental frequency) measured over units of different sizes (vowel, rhyme, and syllable). The experiments showed that the rhyme unit gave the best detection results.
The researcher also proposed the separated stress method and the incorporated stress feature method to reduce the effect of stress on tone recognition. The experiments showed that both methods increased the recognition rate, with highest error reduction rates of 32.43 and 27.16 percent for the TPC and TASC corpora, respectively. Finally, the researcher applied the various tone models to a syllable-level speech recognition system. The experiments showed highest error reduction rates of 85.16 and 75.06 percent for the TPC and TASC corpora, respectively.
Description:	Thesis (Ph.D.)--Chulalongkorn University, 2001
Degree Name:	Doctor of Philosophy
Degree Level:	Doctoral Degree
Degree Discipline:	Computer Engineering
URI:	http://cuir.car.chula.ac.th/handle/123456789/11193
ISBN:	9740311512
Type:	Thesis
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
nuttakorn.pdf		2.85 MB	Adobe PDF
