Using clustered frames to classify videos from television programs

อิทธิศักดิ์ เผือกศรี

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/77213

Title:	การใช้กลุ่มของภาพฉากเพื่อจำแนกวิดีโอจากรายการโทรทัศน์
Other Titles:	Using clustered frames to classify videos from television programs
Authors:	อิทธิศักดิ์ เผือกศรี
Advisors:	สุกรี สินธุภิญโญ
Other author:	Chulalongkorn University. Faculty of Engineering
Issue Date:	2019 (B.E. 2562)
Publisher:	Chulalongkorn University
Abstract:	This research presents a video classification method based on a two-dimensional convolutional model and semi-supervised learning. High-performing video classification is generally achieved with deep learning. However, with the growing number of videos today, training a model to classify videos requires substantial computing power. This research therefore proposes training a two-dimensional convolutional model on stacked frames and clustering the frames with a self-organizing map before building the program-type classification model. The program-type model is presented in four forms: voting, entropy calculation, a neural network, and long short-term memory. The number of frames processed during clustering is also evaluated by comparing training time against accuracy. The method is evaluated on a dataset of 912 videos in 18 categories from television programs. Under 5-fold cross-validation, the method achieves an average accuracy of 71.98% with an average training time of about 40 minutes. It is also compared with other models, such as a three-dimensional convolutional model and a convolutional model combined with long short-term memory, and is evaluated on the Hollywood2 benchmark dataset, where it achieves an average accuracy of 93.72%.
Other Abstract:	This research presents techniques for classifying video clips, including Convolutional Neural Networks and semi-supervised learning. Video classification is usually done with deep learning techniques. However, given the number of online videos today, the task requires high computing power. We present a traditional technique that uses a two-dimensional Convolutional Neural Network on stacked frames, and we propose using the Self-Organizing Map (SOM) to cluster video frames. We then classify the clustered frames using simple voting, entropy calculation, neural networks, and Long Short-Term Memory (LSTM). We also determine the number of frames to use for clustering by comparing accuracy against training time. The approach is tested on a real-world dataset of 912 videos from TV programs in 18 classes. Evaluated with five-fold cross-validation, our method achieved an average accuracy of 71.98% with an average computing time of approximately 40 minutes. Moreover, we compared the proposed method to other baseline models, including C3D and CNN-LSTM, and also evaluated it on Hollywood2, where it achieved an average accuracy of 93.72%.
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2020 (B.E. 2563)
Degree Name:	Master of Science
Degree Level:	Master's degree
Degree Discipline:	Computer Science
URI:	http://cuir.car.chula.ac.th/handle/123456789/77213
URI:	http://doi.org/10.58837/CHULA.THE.2019.1147
DOI:	10.58837/CHULA.THE.2019.1147
Type:	Thesis
Appears in Collections:	Eng - Theses
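
The abstract lists simple voting and entropy calculation among the ways per-frame results are combined into a single program class. The Python sketch below illustrates only the generic idea and is not the thesis implementation: the function names `majority_vote` and `label_entropy`, the string labels, and the assumption that each frame already carries one predicted class label are all hypothetical. How the thesis actually uses the entropy value is not stated in this record.

```python
import math
from collections import Counter


def majority_vote(frame_labels):
    """Return the label predicted for the most frames.

    Ties are resolved in favor of the label that appears first,
    which is how Counter.most_common orders equal counts.
    """
    if not frame_labels:
        raise ValueError("frame_labels must not be empty")
    return Counter(frame_labels).most_common(1)[0][0]


def label_entropy(frame_labels):
    """Shannon entropy, in bits, of the per-frame label distribution.

    A value of 0 means every frame agrees; larger values mean the
    frames of one video are spread across more classes.
    """
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical per-frame predictions for one video.
frames = ["news", "news", "sports", "news"]
video_label = majority_vote(frames)
disagreement = label_entropy(frames)
```

Here the entropy serves as a rough measure of how much the frames of one video disagree; a low value suggests the majority vote is well supported.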

Files in This Item:

File	Description	Size	Format
6170982521.pdf		1.7 MB	Adobe PDF
