Transliterated word encoding for Thai-English cross-language retrieval

ประยุทธ สุวรรณวิสารท

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/9581

Title:	การเข้ารหัสคำทับศัพท์เพื่อการค้นคืนข้ามภาษาไทย-อังกฤษ
Other Titles:	Transliterated word encoding for Thai-English cross-language retrieval
Authors:	ประยุทธ สุวรรณวิสารท
Advisors:	สมชาย ประสิทธิ์จูตระกูล
Other author:	Chulalongkorn University. Graduate School
Advisor's Email:	Somchai.P@Chula.ac.th
Subjects:	Transliteration, Information retrieval, Phonetic index
Issue Date:	2541 B.E. (1998)
Publisher:	Chulalongkorn University
Abstract:	This thesis presents transliterated word encoding algorithms for Thai-English cross-language retrieval, which allow a query made of English or Thai transliterated words to retrieve documents whose keywords match in the other language. The underlying hypothesis is that Thai-English cross-language retrieval can be performed without a dictionary. The proposed method has two parts: (1) a transliterated word encoding algorithm for cross-language retrieval of English words transliterated into Thai, and (2) a transliterated word encoding algorithm for cross-language retrieval of Thai words transliterated into English. Retrieval works by encoding each query word and comparing the resulting code with the codes in the keyword index. For English words transliterated into Thai, codes are compared by exact matching; for Thai words transliterated into English, codes are compared by approximate matching, with the consonant and vowel parts compared separately using dynamic programming. Experimental results show that retrieval of English words transliterated into Thai reaches up to 90 percent recall and 78 percent precision when the transliterated word is longer than 7 characters, and retrieval of Thai words transliterated into English reaches up to 73 percent recall and 69 percent precision.
Other Abstract:	This thesis presents two algorithms for transliterated word encoding for Thai-English cross-language retrieval. The algorithms enable retrieval of documents containing either English-to-Thai or Thai-to-English transliterated keywords. We hypothesize that Thai-English cross-language retrieval can be performed without a dictionary. The proposed algorithms are (1) an English-to-Thai transliterated word encoding algorithm for cross-language retrieval and (2) a Thai-to-English transliterated word encoding algorithm for cross-language retrieval. Retrieval is performed by encoding each word in the query and matching the query code against the codes of the keywords in the index. English-to-Thai retrieval uses exact code matching, whereas Thai-to-English retrieval uses approximate code matching, computed separately for the consonant and vowel parts with a dynamic programming technique. Experimental results show that, for keywords longer than seven characters, English-to-Thai transliterated word retrieval achieves 90% recall and 78% precision. Thai-to-English transliterated word retrieval achieves around 73% recall and 69% precision.
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2541 (1998)
Degree Name:	Master of Science
Degree Level:	Master's degree
Degree Discipline:	Computer Science
URI:	http://cuir.car.chula.ac.th/handle/123456789/9581
ISBN:	9743321233
Type:	Thesis
Appears in Collections:	Grad - Theses
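The abstracts describe two matching modes over encoded keywords: exact code matching for English-to-Thai transliterations, and approximate matching, done separately on the consonant and vowel parts with dynamic programming, for Thai-to-English transliterations. The Python sketch below illustrates only that matching step, under stated assumptions: each keyword is assumed to be already encoded as a hypothetical (consonant code, vowel code) pair, the code strings and the `INDEX` contents are invented for illustration, plain Levenshtein edit distance stands in for the thesis's dynamic-programming comparison, and the thresholds `cons_min` and `vowel_min` are arbitrary placeholders. It does not reproduce the thesis's actual encoding rules.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a keyword code split into (consonant part, vowel part).
Code = Tuple[str, str]

# Invented example index mapping codes to document IDs (not data from the thesis).
INDEX: Dict[Code, List[str]] = {
    ("KMPTR", "OIUE"): ["doc_computer"],
    ("PRNTR", "IE"): ["doc_printer"],
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalise edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def exact_match(query: Code, index: Dict[Code, List[str]]) -> List[str]:
    """Exact code matching: documents whose keyword code equals the query code."""
    return list(index.get(query, []))


def approximate_match(query: Code, index: Dict[Code, List[str]],
                      cons_min: float = 0.7, vowel_min: float = 0.5) -> List[str]:
    """Approximate matching with consonant and vowel parts compared separately.

    A keyword matches only if both parts clear their (assumed) similarity thresholds.
    """
    q_cons, q_vowel = query
    hits: List[str] = []
    for (cons, vowel), docs in index.items():
        if (similarity(q_cons, cons) >= cons_min
                and similarity(q_vowel, vowel) >= vowel_min):
            hits.extend(docs)
    return hits


if __name__ == "__main__":
    query = ("KMPTR", "OIUA")  # vowel part differs by one symbol from an indexed code
    print(exact_match(query, INDEX))        # [] because the codes are not identical
    print(approximate_match(query, INDEX))  # ['doc_computer']
```

Keeping separate thresholds lets the vowel part tolerate more variation than the consonant part; the values used here are placeholders, not figures from the thesis.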

Files in This Item:

File	Size	Format
Prayut_Su_front.pdf	772.51 kB	Adobe PDF
Prayut_Su_ch1.pdf	717.53 kB	Adobe PDF
Prayut_Su_ch2.pdf	857.28 kB	Adobe PDF
Prayut_Su_ch3.pdf	746.43 kB	Adobe PDF
Prayut_Su_ch4.pdf	941.63 kB	Adobe PDF
Prayut_Su_ch5.pdf	700.89 kB	Adobe PDF
Prayut_Su_back.pdf	1.01 MB	Adobe PDF
