Imputing incomplete multi-dimensional data using neural network and clustering similarity comparison

Sathit Prasomphan

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/35707

Title:	Imputing incomplete multi-dimensional data using neural network and clustering similarity comparison
Other Titles:	Imputing incomplete multi-dimensional data using artificial neural network techniques and comparison of data-group similarity
Authors:	Sathit Prasomphan
Advisors:	Chidchanok Lursinsap; Sirapat Chiewchanwattana
Other author:	Chulalongkorn University. Faculty of Science
Advisor's Email:	Chidchanok.L@Chula.ac.th; Sirapat.C@Student.chula.ac.th
Subjects:	Neural networks (Computer science); Cluster analysis; Electronic data processing
Issue Date:	2011
Publisher:	Chulalongkorn University
Abstract:	This dissertation presents a method for filling in missing values in multi-dimensional data. Two categories of data are considered. The first is incomplete time-series data. The algorithm for imputing missing time-series values is based on the gradient of the data surrounding the missing region. The gradient of the missing data falls into one of three categories: positive, negative, or zero. Once a group of missing values has been assigned to one of these categories, the values are imputed with a bootstrapping method. The second category is incomplete multi-dimensional data in an image. Here the imputation method is chosen according to the characteristics of the missing region. If the missing data are randomly and finely scattered, an artificial neural network model is used to create an approximated surface that covers the missing data. If the missing data are clustered into an empty shape, a similarity pattern search and fill is performed: the missing area is divided into a set of equal-sized windows, and each window is compared with every non-missing area of the image to find the area most similar to it. The experimental results show that the proposed algorithms outperform other traditional methods in several cases.
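The time-series idea in the abstract (classify the gradient around a gap as positive, negative, or zero, then impute by bootstrapping) can be illustrated with a minimal Python sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function names `classify` and `impute_gap`, the tolerance `tol`, estimating the gap's gradient from its two nearest observed neighbours, and resampling observed first differences of the same category. It is not the dissertation's algorithm.

```python
import random

def classify(g, tol=1e-9):
    """Label a gradient as 'pos', 'neg', or 'zero' (tol is an assumed threshold)."""
    if g > tol:
        return "pos"
    if g < -tol:
        return "neg"
    return "zero"

def impute_gap(series, start, end, rng, tol=1e-9):
    """Fill series[start:end] (all None) in place.

    Assumes an observed value exists immediately before and after the gap.
    """
    left, right = series[start - 1], series[end]
    # Assumed estimate of the gradient across the gap, from its observed neighbours.
    gap_grad = (right - left) / (end - start + 1)
    label = classify(gap_grad, tol)
    # Pool of observed first differences that share the gap's gradient category.
    pool = [b - a for a, b in zip(series, series[1:])
            if a is not None and b is not None and classify(b - a, tol) == label]
    if not pool:
        pool = [gap_grad]  # fallback when no observed step falls in that category
    value = left
    for i in range(start, end):
        value += rng.choice(pool)  # bootstrap draw: sampling with replacement
        series[i] = value
    return label

# Hypothetical series with a two-point gap at indices 3 and 4.
series = [1.0, 2.0, 3.5, None, None, 7.0, 8.0]
label = impute_gap(series, 3, 5, random.Random(0))
```

Because the surrounding data rise, the gap is labelled positive and only positive observed steps are resampled, so the imputed values continue the upward trend.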
Other Abstract:	This dissertation presents a method for filling in missing data in multi-dimensional data. The data are divided into two groups. The first group is incomplete time-series data, which are imputed using gradient information from the data surrounding the missing region. The main idea is that the gradient of the missing data lies in one of three gradient types: positive, negative, or zero. Once the type of the missing data has been determined, bootstrap sampling is used to impute the values. The second group is incomplete multi-dimensional data, studied through experiments on image data, where the pattern of the data loss determines how the data are imputed. When the data are missing at random and are uniformly scattered, an artificial neural network model is used, taking only the data surrounding the missing region within a specified radius to construct a surface for that region. When the missing data form shapes of various kinds, the missing area is divided into windows, and each is compared with every region of the image to find the region most similar to the missing one. The experimental results show that, on images with various patterns of missing data, the proposed methods improve imputation accuracy compared with other methods.
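The window-based similarity search described in the abstract for clustered missing regions can be sketched as follows. The abstract does not name the similarity measure or the window size, so this sketch assumes a sum of squared differences over the pixels still known in the target window, and it represents missing pixels as `None`. The helper names (`window`, `ssd_on_known`, `fill_window`) are hypothetical.

```python
def window(img, r, c, w):
    """Return the w x w block whose top-left corner is (r, c)."""
    return [row[c:c + w] for row in img[r:r + w]]

def ssd_on_known(target, cand):
    """Sum of squared differences over pixels that are known in the target."""
    return sum((t - s) ** 2
               for trow, srow in zip(target, cand)
               for t, s in zip(trow, srow) if t is not None)

def fill_window(img, r, c, w):
    """Fill missing pixels of the window at (r, c) from the most similar complete window."""
    target = window(img, r, c, w)
    best, best_cost = None, float("inf")
    for i in range(len(img) - w + 1):
        for j in range(len(img[0]) - w + 1):
            cand = window(img, i, j, w)
            if any(v is None for row in cand for v in row):
                continue  # candidate windows must contain no missing data
            cost = ssd_on_known(target, cand)
            if cost < best_cost:
                best, best_cost = cand, cost
    if best is None:
        return False  # no complete window available to copy from
    for di in range(w):
        for dj in range(w):
            if img[r + di][c + dj] is None:
                img[r + di][c + dj] = best[di][dj]
    return True

# Hypothetical 4 x 4 image with a repeating pattern and one missing pixel.
img = [
    [1, 2, 1, 2],
    [3, 4, 3, 4],
    [1, 2, 1, 2],
    [3, 4, 3, None],
]
ok = fill_window(img, 2, 2, 2)
```

In this toy image the window at the top-left corner matches the known pixels of the damaged window exactly, so its bottom-right value is copied into the missing position.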
Description:	Thesis (Ph.D.)--Chulalongkorn University, 2011
Degree Name:	Doctor of Philosophy
Degree Level:	Doctoral Degree
Degree Discipline:	Computer Science
URI:	http://cuir.car.chula.ac.th/handle/123456789/35707
URI:	http://doi.org/10.14457/CU.the.2011.1401
metadata.dc.identifier.DOI:	10.14457/CU.the.2011.1401
Type:	Thesis
Appears in Collections:	Sci - Theses

Files in This Item:

File	Description	Size	Format
sathit_pr.pdf		3.19 MB	Adobe PDF
