การทำดัชนีที่แม่นยำสำหรับการค้นข้อมูลอนุกรมเวลาตามความคล้ายภายใต้ไทม์วอร์ปปิงด้วยการเข้าถึงข้อมูลแบบลำดับโดยใช้ดัชนี

พงศกร เรืองรองหิรัญญา

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/31241

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	โชติรัตน์ รัตนามหัทธนะ	-
dc.contributor.author	พงศกร เรืองรองหิรัญญา	-
dc.contributor.other	จุฬาลงกรณ์มหาวิทยาลัย. คณะวิศวกรรมศาสตร์	-
dc.date.accessioned	2013-05-23T13:08:47Z	-
dc.date.available	2013-05-23T13:08:47Z	-
dc.date.issued	2551	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/31241	-
dc.description	วิทยนิพนธ์ (วศ.ม.)--จุฬาลงกรณ์มหาวิทยาลัย, 2551	en
dc.description.abstract	การค้นคืนข้อมูลอนุกรมเวลาตามความคล้ายเป็นสิ่งที่สำคัญสำหรับงานประยุกต์มากมาย เช่น การค้นคืนข้อมูลมัลติมีเดียและการจำแนกข้อมูลเหล่านั้น สำหรับงานประยุกต์ดังกล่าวนั้นไดนามิกไทม์วอร์ปปิงจัดเป็นมาตรวัดที่ใช้สำหรับการค้นคืนข้อมูลอนุกรมเวลาที่มีความแม่นยำมากที่สุดวิธีหนึ่ง อย่างไรก็ตามการใช้ไดนามิกไทม์วอร์ปปิงสำหรับการค้นคืนข้อมูลนั้นต้องใช้เวลาในการคำนวณสูง ซึ่งปัญหาดังกล่าวเป็นประเด็นที่มีนักวิจัยมากมายสนใจที่จะพัฒนาให้มีความรวดเร็วมากยิ่งขึ้น จนเมื่อไม่นานมานี้ได้มีงานวิจัยที่ได้เสนอวิธีการแก้ปัญหาดังกล่าวด้วยฟังก์ชันขอบเขตล่างซึ่งสามารถลดทอนการคำนวณไดนามิกไทม์วอร์ปปิงและสามารถเพิ่มความเร็วในการค้นคืนข้อมูลได้อย่างมีประสิทธิภาพ แต่สำหรับในงานด้านฐานข้อมูล ประเด็นสำคัญนั้นไม่ได้อยู่เพียงแค่การลดเวลาในการคำนวณสำหรับการค้นคืนข้อมูลเท่านั้น แต่อยู่ที่จะทำอย่างไรเพื่อที่จะลดการเข้าถึงข้อมูลหรืออินพุต / เอาต์พุตใหได้มากที่สุด ซึ่งวิธีที่ใช้กันทั่วไปก็คือวิธีการทำดัชนี แต่จนถึงปัจจุบันยังไม่มีวิธีการทำดัชนีข้อมูลอนุกรมเวลาที่มีประสิทธิภาพเนื่องจากต้องเผชิญกับปัญหาสำคัญอยู่สองประการ ประการแรกเกิดจากการที่ข้อมูลอนุกรมเวลานั้นมีจำนวนมิติที่สูงมาก และในประการที่สองเนื่องจากการคำนวณไดนามิกไทม์วอร์ปปิงนั้นจุดข้อมูลหนึ่งสามารถมีความสัมพันธ์กับจุดข้อมูลที่อยู่คนละมิติได้ ด้วยเหตุนี้วิธีการทำดัชนีข้อมูลอนุกรมเวลาที่มีอยู่ในปัจจุบันยังคงไม่สามารถลดทอนการเข้าถึงข้อมูลให้ได้มากเพียงพอกับเวลาที่ต้องเสียเพิ่มเติมจากการเข้าถึงข้อมูลแบบสุ่ม กล่าวคือการกราดตรวจข้อมูลตามลำดับนั้นสามารถค้นคืนข้อมูลได้เร็วกว่าการทำดัชนีทุกวิธีที่มีอยู่ในปัจจุบัน ดังนั้นงานวิจัยนี้จึงนำเสนอวิธีการทำดัชนีข้อมูลอนุกรมเวลาที่มีประสิทธิภาพด้วยการนำวิธีการจับกลุ่มข้อมูลมาประยุกต์ใช้ โดยใช้แนวคิดในการจัดลำดับการเข้าถึงข้อมูลในแต่ละกลุ่มข้อมูลโดยเรียงตามค่าระยะทางขอบเขตล่างที่ได้นำเสนอ ซึ่งเป็นผลทำให้การค้นคืนข้อมูลสามารถเข้าถึงข้อมูลที่มีความคล้ายได้ในช่วงต้นของกระบวนการค้น นอกจากนี้ยังสามารถลดทอนการเข้ากลุ่มข้อมูลหลายกลุ่มได้ด้วยค่าระยะทางขอบเขตล่างดังกล่าว ในการทดลองนั้นวิธีการทำดัชนีที่ได้นำเสนอสามารถค้นคืนข้อมูลได้เร็วกว่าวิธีกราดตรวจ โดยลดการเข้าถึงข้อมูลได้หลายสิบเท่า	en
dc.description.abstractalternative	As time series has become one of the most prevalent types of data in this digital age, similarity search occurs to be crucial for these applications, including multimedia data retrieval and classification. Dynamic time warping (DTW) is one of the most accurate distance measure exploited in the search. However, it is known to suffer from such a high computational cost, raising great amount of interest in trying to speed it up. Recently, this obstacle has been alleviated using efficient existing lower bounding functions, e.g., FTW and LB_Keogh which can greatly speed up the distance calculation for the similarity search. However, in database field, the crux of the matter is how to minimize the data access or I/O cost. The most typical approach is through indexing. Unlike other types of data, time series indexing is almost impractical due to their explosively high number of dimensions. Moreover, unlike typical Euclidean distance measure, in DTW distance measure, a data point can be in relation to data points from other dimensions. As a result, time series indexing is beyond challenging since sequential scan unexpectedly outperforms all existing indexing methods. This research proposes an efficient time series indexing structure. With clustering technique on data preprocessing, the order of the data access on each data cluster is sorted by the proposed lower bounding distance which indicates the similarity between query and data cluster. Therefore, the similar candidates can be accessed early in the search process. Moreover, many data clusters can be pruned by the proposed lower bounding distance. In the experiments, the proposed indexing method outperforms the sequential scan by over an order of magnitude in terms of data access reduction.	en
dc.format.extent	3351715 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	th	es
dc.publisher	จุฬาลงกรณ์มหาวิทยาลัย	en
dc.relation.uri	http://doi.org/10.14457/CU.the.2008.779	-
dc.rights	จุฬาลงกรณ์มหาวิทยาลัย	en
dc.subject	การวิเคราะห์อนุกรมเวลา	en
dc.subject	การทำดัชนี	en
dc.title	การทำดัชนีที่แม่นยำสำหรับการค้นข้อมูลอนุกรมเวลาตามความคล้ายภายใต้ไทม์วอร์ปปิงด้วยการเข้าถึงข้อมูลแบบลำดับโดยใช้ดัชนี	en
dc.title.alternative	Exact indexing for similarity search on time series data under time warping using indexed sequential access	en
dc.type	Thesis	es
dc.degree.name	วิศวกรรมศาสตรมหาบัณฑิต	es
dc.degree.level	ปริญญาโท	es
dc.degree.discipline	วิศวกรรมคอมพิวเตอร์	es
dc.degree.grantor	จุฬาลงกรณ์มหาวิทยาลัย	en
dc.email.advisor	Chotirat.R@Chula.ac.th	-
dc.identifier.DOI	10.14457/CU.the.2008.779	-
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
Pongsakorn_Ru.pdf		3.27 MB	Adobe PDF	View/Open

Show simple item record