Indexing Algorithm for Thai Text with Errors

วรวัฒน์ วรศิลป์

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/4136

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	สมชาย ประสิทธิ์จูตระกูล	-
dc.contributor.author	วรวัฒน์ วรศิลป์	-
dc.contributor.other	Chulalongkorn University. Faculty of Engineering	-
dc.date.accessioned	2007-09-18T02:25:26Z	-
dc.date.available	2007-09-18T02:25:26Z	-
dc.date.issued	2542 (B.E.; 1999 CE)	-
dc.identifier.isbn	9743346309	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/4136	-
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2542 (1999)	en
dc.description.abstract	This thesis presents an indexing algorithm for Thai text that contains errors. Its aim is to make the index more complete by adding the correct words to it when the text being indexed contains errors. The proposed indexing relies on the "uniqueness" property of a string, defined as the number of times the string appears as part of a word in the dictionary. The algorithm has three steps: (1) find a list of substrings of the text that can be reassembled into the original text and that minimizes the sum of a function whose value varies with uniqueness; (2) among the substrings from the first step, identify those likely to have been caused by errors in the text, namely those whose uniqueness exceeds a preset threshold; and (3) find dictionary words that approximately match the strings obtained by concatenating the substrings from the second step with their neighboring strings in the text, and add them as extra index terms. Experimental results show that the algorithm raises the completeness of the original index, which does not take errors into account, from 87% to 97%, while lowering its precision from 83% to 60%.	en
dc.description.abstractalternative	This thesis presents an indexing algorithm for Thai text with errors. The algorithm uses a string's "uniqueness" property, defined as the number of times that the string appears as part of words in a dictionary. The algorithm has three steps. First, we find a list of substrings that can be reassembled into the original text and that minimizes a function of the substrings' uniqueness values. Second, substrings in the list that were potentially caused by errors are identified by comparing a function of substring uniqueness to a preset threshold. Last, dictionary words that approximately match the strings obtained by concatenating the potentially error-caused substrings with their adjacent substrings are added to the index list. Experimental results showed that this algorithm improves index completeness from 87% to 94% while decreasing index precision from 83% to 60%.	en
dc.format.extent	6216110 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	th	en
dc.publisher	Chulalongkorn University	en
dc.rights	Chulalongkorn University	en
dc.subject	Thai language	en
dc.subject	Indexing	en
dc.subject	Information retrieval	en
dc.title	Indexing algorithm for Thai text with errors	en
dc.title.alternative	Indexing algorithm for Thai text with errors	en
dc.type	Thesis	en
dc.degree.name	Master of Science	en
dc.degree.level	Master's degree	en
dc.degree.discipline	Computer Science	en
dc.degree.grantor	Chulalongkorn University	en
dc.email.advisor	Somchai.P@Chula.ac.th	-
Appears in Collections:	Eng - Theses
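The three steps described in the abstract can be illustrated with a small sketch. This is a minimal, hypothetical Python version, not the thesis's implementation, and it rests on several assumptions. Uniqueness is taken to be the number of dictionary words that contain a string. The cost f(u) = 1 + ln u is invented for illustration, with a fixed penalty for a lone character that never appears in the dictionary. Raw uniqueness is compared directly against the threshold. Plain Levenshtein distance stands in for whatever approximate matching the thesis used. All function and parameter names (uniqueness, piece_cost, segment, flag_suspicious, edit_distance, build_index, threshold, max_dist) are illustrative.

```python
import math
from typing import List, Set


def uniqueness(s: str, dictionary: List[str]) -> int:
    # Assumption: "uniqueness" = number of dictionary words that contain s.
    return sum(1 for w in dictionary if s in w)


def piece_cost(s: str, dictionary: List[str], unknown_char_cost: float = 10.0) -> float:
    # Hypothetical f(u): rises with uniqueness, so long, distinctive pieces are cheap.
    u = uniqueness(s, dictionary)
    if u == 0:
        # A lone character never seen in the dictionary must still be coverable;
        # longer unseen strings are forbidden.
        return unknown_char_cost if len(s) == 1 else math.inf
    return 1.0 + math.log(u)


def segment(text: str, dictionary: List[str], max_len: int = 20) -> List[str]:
    # Step 1: dynamic programming over split points, minimising the total piece cost.
    n = len(text)
    best = [math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            cost = best[start] + piece_cost(text[start:end], dictionary)
            if cost < best[end]:
                best[end], back[end] = cost, start
    pieces: List[str] = []
    end = n
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]


def flag_suspicious(pieces: List[str], dictionary: List[str], threshold: int) -> List[int]:
    # Step 2: flag pieces that are unseen, or so common (uniqueness above the
    # threshold) that they are probably fragments left behind by an error.
    flagged = []
    for i, p in enumerate(pieces):
        u = uniqueness(p, dictionary)
        if u == 0 or u > threshold:
            flagged.append(i)
    return flagged


def edit_distance(a: str, b: str) -> int:
    # Standard Levenshtein distance (insert, delete, substitute each cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(text: str, dictionary: List[str], threshold: int = 1, max_dist: int = 1) -> Set[str]:
    pieces = segment(text, dictionary)
    words = set(dictionary)
    # Baseline index that ignores errors: pieces that are whole dictionary words.
    index = {p for p in pieces if p in words}
    # Step 3: join each flagged piece with its neighbours; add close dictionary words.
    for i in flag_suspicious(pieces, dictionary, threshold):
        left = pieces[i - 1] if i > 0 else ""
        right = pieces[i + 1] if i + 1 < len(pieces) else ""
        for cand in {left + pieces[i], pieces[i] + right, left + pieces[i] + right}:
            index.update(w for w in dictionary if edit_distance(cand, w) <= max_dist)
    return index


print(sorted(build_index("thaitxtindex", ["thai", "text", "index"])))  # ['index', 'text', 'thai']
```

The toy example uses the dictionary ["thai", "text", "index"] and the unspaced input "thaitxtindex", in which the "e" of "text" has been dropped. The input segments into ["thai", "t", "xt", "index"]. The fragment "t" is flagged because it occurs in two words. Joining it with its right neighbour gives "txt", one edit away from "text", so the index grows from {"thai", "index"} to {"thai", "text", "index"}. The extra candidates this step generates are also the kind of addition that can lower precision.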

Files in This Item:

File	Description	Size	Format
voravat.pdf		4.32 MB	Adobe PDF
