Thai Character Recognition Using Multi-Class Principal Component Analysis and Neural Networks

อุดม สถาพรชัยสิทธิ์


Title:	Thai character recognition using multi-class principal component analysis and neural networks
Other Titles:	Thai optical character recognition using multi-class principal components analysis and neural networks
Authors:	อุดม สถาพรชัยสิทธิ์
Advisors:	บุญเสริม กิจศิริกุล
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	boonserm@cp.eng.chula.ac.th, Boonserm.K@chula.ac.th
Subjects:	Neural networks (Computer); Character recognition (Computer)
Issue Date:	2549 B.E. (2006)
Publisher:	Chulalongkorn University
Abstract:	Presents a new method for Thai character recognition, called multi-class principal component analysis. The method is based on the original principal component analysis (PCA), which has the advantage of reducing the amount of data to a compact representation by means of a linear transformation and the removal of unimportant features. However, the original PCA lacks the power to discriminate between data from many classes, such as Thai characters. The proposed multi-class PCA discriminates effectively by constructing sets of principal components, each built from the data of one class. Constructing many sets of principal components, however, has the weakness of requiring excessive resources and computation time, which is wasteful, so the number of principal components must be reduced. This thesis therefore proposes four different methods for determining an appropriate number of principal components for each class. Experimental results show that the proposed methods achieve higher character-recognition accuracy than the original PCA.
Other Abstract:	Presents a novel method, called multi-class principal component analysis (MCPCA), for Thai optical character recognition (Thai OCR). The method is based on the original principal component analysis (PCA) for Thai OCR. The original PCA reduces the original data to a smaller representation; it has the advantage of representing data compactly through a linear transformation and the removal of unimportant features. However, PCA lacks discriminative power. The proposed MCPCA has strong discriminative power because it constructs several sets of principal components, one for each class of data, so that each set represents its corresponding class. However, constructing several sets of principal components consumes a lot of resources and computation time. We therefore propose four methods for determining the appropriate number of principal components for each class. The experimental results show that our proposed methods provide higher accuracy than the original PCA for Thai OCR.
Description:	Thesis (M.Eng.)--Chulalongkorn University, 2549 B.E. (2006)
Degree Name:	Master of Engineering
Degree Level:	Master's degree
Degree Discipline:	Computer Engineering
URI:	http://cuir.car.chula.ac.th/handle/123456789/7918
Type:	Thesis
Appears in Collections:	Eng - Theses
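The abstract describes MCPCA only at a high level: one set of principal components per class, with a per-class choice of how many components to keep. The following is a minimal sketch of that idea in Python with NumPy, built on assumptions the record does not state. It assumes that a character is assigned to the class whose subspace reconstructs it with the smallest squared error (the classic subspace-method rule), and it chooses the number of components per class with a cumulative explained-variance threshold. That threshold rule is a hypothetical stand-in, not one of the thesis's four methods, which the abstract does not describe. The neural-network stage named in the title is omitted. The names fit_class_subspace, fit_mcpca, reconstruction_error, predict and var_threshold are illustrative only.

```python
import numpy as np


def fit_class_subspace(X, var_threshold=0.95):
    """Fit a PCA subspace to the samples of one class (one sample per row of X).

    Returns the class mean and a (k, d) matrix whose rows are the top-k
    principal directions. k is the smallest number of components whose
    cumulative explained variance reaches var_threshold (a hypothetical rule,
    not taken from the thesis).
    """
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2
    total = variances.sum()
    if total == 0:
        k = 1  # all samples identical: keep one (arbitrary) direction
    else:
        cumulative = np.cumsum(variances) / total
        k = int(np.searchsorted(cumulative, var_threshold)) + 1
        k = min(k, len(s))  # guard against round-off when the threshold is 1.0
    return mean, vt[:k]


def fit_mcpca(X, y, var_threshold=0.95):
    """Build one PCA subspace per class label in y."""
    return {label: fit_class_subspace(X[y == label], var_threshold)
            for label in np.unique(y)}


def reconstruction_error(x, mean, components):
    """Squared distance between x and its projection onto a class subspace."""
    d = x - mean
    projection = components.T @ (components @ d)
    return float(np.sum((d - projection) ** 2))


def predict(model, x):
    """Assign x to the class whose subspace reconstructs it best (assumed rule)."""
    return min(model, key=lambda label: reconstruction_error(x, *model[label]))


# Toy data: class 0 varies along the x-axis, class 1 along the y-axis (offset in z).
t = np.arange(-2.0, 3.0)
X = np.vstack([np.column_stack([t, 0 * t, 0 * t]),
               np.column_stack([0 * t, t, 5 + 0 * t])])
y = np.array([0] * 5 + [1] * 5)

model = fit_mcpca(X, y)
print(predict(model, np.array([3.0, 0.0, 0.0])))  # lies on class 0's subspace
print(predict(model, np.array([0.0, 3.0, 5.0])))  # lies on class 1's subspace
```

Because each class keeps only as many components as its own data require, a class whose images vary along few directions ends up with a small basis. This per-class truncation is one plausible way to obtain the resource savings the abstract mentions. Real character images would first be flattened into pixel vectors, one row per sample.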

Files in This Item:

File	Description	Size	Format
Udom_Sa.pdf		915.49 kB	Adobe PDF
