Incomplete time-series data forecasting based on clustering fill-in technique and ensembling neural network model

Sirapat Chiewchanwattana

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/67494

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Chidchanok Lursinsap	-
dc.contributor.author	Sirapat Chiewchanwattana	-
dc.contributor.other	Chulalongkorn University. Faculty of Science	-
dc.date.accessioned	2020-08-14T07:44:19Z	-
dc.date.available	2020-08-14T07:44:19Z	-
dc.date.issued	2005	-
dc.identifier.isbn	9741767501	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/67494	-
dc.description	Thesis (Ph.D.)--Chulalongkorn University, 2005	en_US
dc.description.abstract	This dissertation addresses the problem of incomplete time-series prediction by modelling the forecasting of several natural and social phenomena. The modelling consists of two main steps. The first step is to estimate the missing values in the collected incomplete data. The second step is to predict new data based on the nature of the data obtained from the first step. Our solution is to develop a new neural network model for forecasting incomplete time-series data that improves the accuracy of prediction. Two neural network models are proposed. In the first, various versions of the EM-based algorithm and smoothing spline interpolation are used to preprocess the incomplete data sets. The individual networks are multilayer perceptrons (MLP) trained under supervision with extended Kalman filtering. Ensemble construction is used to combine the individual networks. We name this type of network the Fill-in - Generalized Ensemble Method (FI-GEM) network. In the second, each individual network uses a Finite Impulse Response model to perform the prediction. The outputs of all individual neural networks are combined by the genetic algorithm-based selective neural network ensemble method (GASEN). We denote this network as a reconstructed missing data-finite impulse response selective ensemble (RMD-FSE) network. Moreover, we propose a new fill-in technique that improves the estimation of missing values by using a clustering technique to characterize the pattern of incomplete time-series data. The main idea is that the time-series data are divided into separate subsequences of different sizes, so each subsequence can be viewed as a window. The imputation of missing samples is achieved by finding a complete subsequence similar to the subsequence containing the missing samples and imputing the missing samples from this complete subsequence.
The imputation accuracy of the proposed algorithm, namely the varied window clustering (WDC) algorithm, is comparable to or better than that of other traditional methods, such as spline interpolation, multiple imputation (MI), and the optimal completion strategy fuzzy c-means algorithm (OCSFCM), in the case of non-stationary time-series data.	en_US
dc.description.abstractalternative	This dissertation presents the forecasting of incomplete time-series data by means of neural network modelling. The modelling can be divided into two steps. The first step fills in the incomplete time-series data to make it complete. The second step forecasts from the time-series data obtained in the first step. The solution in this work is to develop a new neural network model for forecasting incomplete time-series data that also yields higher forecasting accuracy. Two neural network models are proposed. The first uses several variants of EM fill-in together with spline fill-in; the data sets filled in by these different methods are used to train MLP neural networks with extended Kalman filtering, and the outputs of all networks are then combined. This network model is named the F-GEM network. The second switches to FIR neural networks for forecasting; the outputs of all networks are then combined using a genetic algorithm-based network selection method. This network model is named the RMD-FSE network. In addition, a new fill-in method is proposed to estimate the missing values more accurately, using a clustering technique based on the characteristics of the patterns in the actual data. The main idea is to cut the time-series data into many pieces of different sizes. The missing values are computed from the piece that is most similar to the piece containing the missing data. This method is named the WDC algorithm, and it gives results equal to or better than other methods such as EM, MI, OCSFCM, and spline in the case of non-stationary time-series data.	en_US
dc.language.iso	en	en_US
dc.publisher	Chulalongkorn University	en_US
dc.rights	Chulalongkorn University	en_US
dc.subject	Time-series analysis	en_US
dc.subject	Evolutionary computation	en_US
dc.subject	Neural networks (Computer science)	en_US
dc.subject	การวิเคราะห์อนุกรมเวลา	en_US
dc.subject	การคำนวณเชิงวิวัฒนาการ	en_US
dc.subject	นิวรัลเน็ตเวิร์ค (วิทยาการคอมพิวเตอร์)	en_US
dc.title	Incomplete time-series data forecasting based on clustering fill-in technique and ensembling neural network model	en_US
dc.title.alternative	Forecasting incomplete time-series data using a clustering fill-in method to complete the data and a method of combining the results of neural network models	en_US
dc.type	Thesis	en_US
dc.degree.name	Doctor of Philosophy	en_US
dc.degree.level	Doctoral Degree	en_US
dc.degree.discipline	Computer Science	en_US
dc.degree.grantor	Chulalongkorn University	en_US
dc.email.advisor	Chidchanok.L@Chula.ac.th	-
Appears in Collections:	Sci - Theses

Files in This Item:

File	Description	Size	Format
Sirapat_ch_front_p.pdf	Cover and abstract	1.08 MB	Adobe PDF
Sirapat_ch_ch1_p.pdf	Chapter 1	817.41 kB	Adobe PDF
Sirapat_ch_ch2_p.pdf	Chapter 2	759.73 kB	Adobe PDF
Sirapat_ch_ch3_p.pdf	Chapter 3	1.5 MB	Adobe PDF
Sirapat_ch_ch4_p.pdf	Chapter 4	4.33 MB	Adobe PDF
Sirapat_ch_ch5_p.pdf	Chapter 5	671.03 kB	Adobe PDF
Sirapat_ch_back_p.pdf	Bibliography and appendices	975.99 kB	Adobe PDF
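
Illustrative note: the abstract describes imputing missing samples by finding a complete subsequence (window) similar to the one that contains the gap. The sketch below shows that general idea only. It is not the WDC algorithm from the dissertation: it assumes a single fixed window length, uses plain nearest-neighbour matching on the observed positions instead of clustering, and the function name `impute_by_similar_window` and its parameters are hypothetical.

```python
import math


def impute_by_similar_window(series, window):
    """Fill None entries by copying values from the most similar complete window.

    Assumption: one fixed window length and Euclidean distance over the
    observed positions; the dissertation's method uses varied window sizes
    and clustering, which this sketch does not reproduce.
    """
    n = len(series)
    filled = list(series)

    # Collect every fully observed window of the requested length.
    complete = [
        series[s:s + window]
        for s in range(n - window + 1)
        if all(v is not None for v in series[s:s + window])
    ]
    if not complete:
        raise ValueError("no complete window of this length")

    for start in range(0, n, window):
        # Align a trailing partial window to the end of the series.
        if start + window > n:
            start = n - window
        target = filled[start:start + window]
        missing = [i for i, v in enumerate(target) if v is None]
        if not missing:
            continue
        observed = [i for i, v in enumerate(target) if v is not None]

        def distance(candidate):
            # Compare only the positions that are observed in the target.
            return math.sqrt(
                sum((candidate[i] - target[i]) ** 2 for i in observed)
            )

        best = min(complete, key=distance)
        for i in missing:
            filled[start + i] = best[i]
    return filled


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, None, 4.0, 1.0, 2.0, 3.0, 4.0]
    print(impute_by_similar_window(data, window=4))
```

In the usage example the gap sits in a repeating pattern, so the closest complete window supplies the missing value directly. On non-stationary data the choice of window length matters much more, which is presumably why the abstract emphasises subsequences of different sizes.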
