Sentence alignment in parallel text corpora using time series

ศิรินันท์ สินธุวาทิน

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/41130

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	โชติรัตน์ รัตนามหัทธนะ	-
dc.contributor.author	ศิรินันท์ สินธุวาทิน	-
dc.contributor.other	Chulalongkorn University. Faculty of Engineering	-
dc.date.accessioned	2014-03-18T02:47:04Z	-
dc.date.available	2014-03-18T02:47:04Z	-
dc.date.issued	2550	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/41130	-
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2007 (B.E. 2550)	en_US
dc.description.abstract	Applications built on parallel corpora are steadily increasing, especially in cross-language retrieval, machine and human translation, and natural language processing, so parallel corpus processing has drawn growing interest from researchers. This research presents a technique for aligning corresponding sentences in a parallel corpus using time series, which store the frequency and position of the words that appear in a parallel corpus of any two languages; words are aligned by measuring the similarity of their time series. The advantage of this method is that it requires no linguistic knowledge such as grammar, syntax, sentence structure, or dictionary translation. However, although the same word in a multilingual parallel corpus tends to have a similar frequency and similar positions of occurrence, which allows sentences to be aligned using such words as indicators, many words still cannot be aligned by this method. Experiments show that the method is useful and works better on short parallel texts of about 1 page than on long texts. When tested on short texts with the Manhattan distance function, the average accuracy is 58 percent.	en_US
dc.description.abstractalternative	As applications based on parallel corpora (parallel text) have expanded, especially in cross-language information retrieval, machine and human translation, natural language processing, and multilingual lexicography, parallel-text processing has become central to their development. In this research, we propose a novel sentence alignment technique. We use a time series representation that records the position and frequency of word occurrences and requires no linguistic knowledge, e.g. grammar or syntax, sentence structure, or dictionary lookup. We align words by measuring the similarity of their time series, and the word alignment results are then used for sentence alignment. Our intuition is that corresponding words in any multilingual parallel text should have similar frequencies and positions of occurrence. However, the experimental results reveal several limitations of the method: it works better on short parallel texts of about 1 page. On short parallel texts, using the Manhattan distance gives an accuracy of 58 percent.	en_US
dc.language.iso	th	en_US
dc.publisher	Chulalongkorn University	en_US
dc.relation.uri	http://doi.org/10.14457/CU.the.2007.539	-
dc.rights	Chulalongkorn University	en_US
dc.subject	ภาษาอังกฤษ -- การแปลภาษาด้วยเครื่อง	en_US
dc.subject	ภาษาอังกฤษ -- การแปลเป็นภาษาไทย	en_US
dc.subject	การแปล -- โปรแกรมคอมพิวเตอร์	en_US
dc.subject	English language -- Machine translating	en_US
dc.subject	English language -- Translating into Thai	en_US
dc.subject	Translating -- Computer programs	en_US
dc.title	การจับคู่ประโยคที่ตรงกันในคลังข้อความขนานด้วยอนุกรมเวลา	en_US
dc.title.alternative	Sentence alignment in parallel text corpora using time series	en_US
dc.type	Thesis	en_US
dc.degree.name	Master of Science	en_US
dc.degree.level	Master's degree	en_US
dc.degree.discipline	Computer Science	en_US
dc.degree.grantor	Chulalongkorn University	en_US
dc.email.advisor	Chotirat.R@Chula.ac.th	-
dc.identifier.DOI	10.14457/CU.the.2007.539	-
Appears in Collections:	Eng - Theses
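
The abstract describes representing each word as a time series of its positions and frequencies, then aligning words across languages by a distance between series. The following is only a minimal sketch of that general idea, not the thesis's implementation: the fixed-bin encoding, the greedy nearest-neighbor matching, the function names (`word_series`, `manhattan`, `align_words`), and the toy tokens are all assumptions made for illustration.

```python
from collections import defaultdict


def word_series(tokens, bins):
    """Count each word's occurrences in `bins` equal-width segments of the text.

    Assumed encoding: the text is cut into `bins` segments by token index,
    so a word's vector captures both where and how often it occurs.
    """
    n = len(tokens)
    series = defaultdict(lambda: [0] * bins)
    for i, word in enumerate(tokens):
        series[word][i * bins // n] += 1
    return dict(series)


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def align_words(src_series, tgt_series):
    """Greedily pair each source word with its nearest target word.

    Ties are broken alphabetically, which is an arbitrary choice.
    """
    return {
        word: min(tgt_series, key=lambda t: (manhattan(vec, tgt_series[t]), t))
        for word, vec in src_series.items()
    }


# Hypothetical toy texts whose tokens happen to be in the same order.
src = ["cat", "sat", "dog", "ran", "cat", "ate"]
tgt = ["maeo", "nang", "ma", "wing", "maeo", "kin"]

pairs = align_words(word_series(src, bins=3), word_series(tgt, bins=3))
print(pairs["cat"])
```

In this toy input only the repeated word has a distinctive series. Words that occur once in the same segment (here "dog" and "ran") produce identical vectors and cannot be told apart, which is consistent with the abstract's remark that many words cannot be aligned this way.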

Files in This Item:

File	Description	Size	Format
Sirinun_Si.pdf		2.31 MB	Adobe PDF
