การใช้กลุ่มของภาพฉากเพื่อจำแนกวิดีโอจากรายการโทรทัศน์

อิทธิศักดิ์ เผือกศรี

dc.contributor.advisor	สุกรี สินธุภิญโญ
dc.contributor.author	อิทธิศักดิ์ เผือกศรี
dc.contributor.other	จุฬาลงกรณ์มหาวิทยาลัย. คณะวิศวกรรมศาสตร์
dc.date.accessioned	2021-09-22T23:36:44Z
dc.date.available	2021-09-22T23:36:44Z
dc.date.issued	2562
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/77213
dc.description	วิทยานิพนธ์ (วท.ม.)--จุฬาลงกรณ์มหาวิทยาลัย, 2563
dc.description.abstract	งานวิจัยนี้นำเสนอวิธีการจำแนกวิดีโอ ด้วยเทคนิคแบบจำลองคอนโวลูชันสองมิติ และการเรียนรู้แบบกึ่งกำกับ โดยทั่วไปการจำแนกวิดีโอที่มีประสิทธิภาพสูง ถูกนำเสนอโดยใช้วิธีการเรียนรู้แบบลึก อย่างไรก็ตามจากการเพิ่มขึ้นของจำนวนวิดีโอในปัจจุบัน การเรียนรู้ของแบบจำลองเพื่อจำแนกวิดีโอจำเป็นต้องใช้ประสิทธิภาพในการประมวลผลสูง งานวิจัยนี้จึงนำเสนอวิธีการเรียนรู้ด้วยแบบจำลองคอนโวลูชันสองมิติโดยใช้การซ้อนทับกันของภาพฉาก และการจัดกลุ่มของภาพฉากด้วยแผนที่จัดระเบียบด้วยตนเองก่อนนำไปสร้างแบบจำลองจำแนกประเภทรายการ โดยการสร้างแบบจำลองประเภทรายการถูกนำเสนอใน 4 รูปแบบ ประกอบด้วย การออกเสียง การคำนวณค่าความวุ่นวาย การเรียนรู้ด้วยแบบจำลองโครงข่ายประสาทเทียม การเรียนรู้ด้วยหน่วยความจำระยะสั้นแบบยาว อีกทั้งยังประเมินจำนวนภาพฉากสำหรับการประมวลผลในการจัดกลุ่มโดยเปรียบเทียบระหว่างระยะเวลาการเรียนรู้และความแม่นยำ วิธีการในงานวิจัยนี้ถูกนำเสนอด้วยประเมินจากการเรียนรู้ด้วยชุดข้อมูลวิดีโอจำนวน 18 ประเภท 912 วิดีโอ จากรายการโทรทัศน์ ในการประเมินด้วยการประเมินผลแบบไขว้ จำนวน 5 โฟลด์ วิธีการในงานวิจัยนี้มีความแม่นยำเฉลี่ยร้อยละ 71.98 และใช้เวลาในการเรียนรู้โดยเฉลี่ยประมาณ 40 นาที นอกจากนี้ยังเปรียบเทียบกับการเรียนรู้ด้วยแบบจำลองอื่นๆ อาทิ แบบจำลองคอนโวลูชันสามมิติ และแบบจำลองคอนโวลูชันร่วมกับหน่วยความจำระยะสั้นแบบยาว รวมถึงประเมินผลกับชุดข้อมูลพื้นฐาน Hollywood2 ซึ่งการเรียนรู้มีความแม่นยำเฉลี่ยร้อยละ 93.72
dc.description.abstractalternative	This research presents techniques, including Convolutional Neural Network and Semi-Supervised Learning, to classify video clips. Usually, many tasks are done by categorizing video clips using deep learning techniques. However, based on the number of online videos today, it is necessary to use high computing power to accomplish this task. We present a traditional technique using a two-dimensional Convolutional Neural Network by stacking frames and propose using the Self-Organizing Map (SOM) to cluster video frames. We then classified them using simple voting, calculating entropy, neural networks, and Long-Short Term Memory (LSTM). We also show finding frame numbers that are used to cluster video frames according to accuracy and training time. The results of this approach are presented based on testing 18 specific classes of real-world datasets from TV-programs containing 912 videos. The authors evaluated the techniques using five-fold cross-validation that our method archived 71.98% of average accuracy. Their computing time was then assessed, which achieved approximately 40 minutes of average computing time. Moreover, we also compared the present proposal to other baseline models, including C3D and CNN-LSTM, and also evaluate the technique with Hollywood2 that archived 93.72% of average accuracy.
dc.language.iso	th
dc.publisher	จุฬาลงกรณ์มหาวิทยาลัย
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2019.1147
dc.rights	จุฬาลงกรณ์มหาวิทยาลัย
dc.subject.classification	Computer Science
dc.subject.classification	Computer Science
dc.subject.classification	Computer Science
dc.subject.classification	Computer Science
dc.title	การใช้กลุ่มของภาพฉากเพื่อจำแนกวิดีโอจากรายการโทรทัศน์
dc.title.alternative	Using clustered frames to classify videos from television programs
dc.type	Thesis
dc.degree.name	วิทยาศาสตรมหาบัณฑิต
dc.degree.level	ปริญญาโท
dc.degree.discipline	วิทยาศาสตร์คอมพิวเตอร์
dc.degree.grantor	จุฬาลงกรณ์มหาวิทยาลัย
dc.identifier.DOI	10.58837/CHULA.THE.2019.1147