Extractive Text Summarization using Ontology and Graph-based Method

ชุลีพร ยงเกียรติพานิช

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/63627

Title:	การสรุปใจความสำคัญแบบสกัดจากบทความโดยใช้ออนโทโลยีและวิธีการทางกราฟ
Other Titles:	Extractive Text Summarization using Ontology and Graph-based Method
Authors:	ชุลีพร ยงเกียรติพานิช
Advisors:	ดวงดาว วิชาดากุล
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	Duangdao.W@chula.ac.th
Issue Date:	2561 B.E. (2018)
Publisher:	Chulalongkorn University
Abstract:	People today are paying more attention to their physical health. Biomedical review articles, which gather published research and present it from a new perspective, are therefore attracting growing interest, and new scholarly works appear almost daily. Because health concerns everyone, the readership includes both readers with domain knowledge and general readers interested in staying healthy. This research therefore aims to develop an automatic summarization system for biomedical review articles that reduces the time readers need to understand and extract the substance of these articles, so that they can grasp what the authors intend to convey. The approach uses a graph-based method together with the UMLS (Unified Medical Language System) ontology, with Word Mover's Distance (WMD) forming part of the rules for representing an article as a graph; key sentences are then extracted into a summary using graph-based methods. The model was developed and tested on biomedical review articles from PubMed covering 5 diseases, with 400 articles per disease. Performance was measured with the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) toolkit: the proposed model outperformed previously proposed methods by up to 22 percent. Several graph-based methods were also compared: PageRank, Degree Centrality, Closeness Centrality, and Betweenness Centrality. The experiments showed that Closeness Centrality produced the most comprehensive and effective summaries. The proposed method can be applied to articles in other domains, depending on the ontology selected.
Other Abstract:	In recent years, many people have started to pay more attention to their physical health. Biomedical review articles have become a popular topic, and health information is now being generated rapidly and in large volumes. In this research, we propose a new automatic extractive text summarization technique based on a graph representation generated with the Unified Medical Language System (UMLS). We combined the graph-building rules with Word Mover's Distance, a distance function between text documents. To prioritize the core sentences, we extracted the summary using various graph-based methods. We compared our results with other text summarization software on 5 datasets, each containing 400 biomedical review papers randomly sampled from PubMed Central (PMC). Our approach outperformed the baseline comparators by up to 22 percent in terms of Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores. Furthermore, we compared several graph-based methods: PageRank, Degree Centrality, Closeness Centrality, and Betweenness Centrality. Closeness Centrality achieved the best performance in all experiments. This approach could be applied to other domains, depending on the selected ontology.
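The abstract describes ranking sentences by graph centrality and extracting the most central ones. The following is a minimal sketch of that general idea under simplifying assumptions; it is not the thesis's implementation. Each sentence becomes a node. As a stand-in for UMLS concept mapping plus Word Mover's Distance, two sentences are linked when their word-level Jaccard similarity reaches a hypothetical `threshold`, and `1 - similarity` is used as the edge distance. Sentences are scored with closeness centrality (scaled by the fraction of reachable nodes, similar to how NetworkX handles disconnected graphs), and the top `k` are returned in document order. All function names and parameters here are hypothetical.

```python
import heapq
import re


def tokenize(sentence):
    """Lowercase word tokens (assumption: stands in for UMLS concept mapping)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity in [0, 1] (assumption: stands in for Word Mover's Distance)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_graph(sentences, threshold=0.1):
    """Undirected graph as adjacency dict; edge weight is a distance, 1 - similarity."""
    tokens = [tokenize(s) for s in sentences]
    graph = {i: {} for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            sim = jaccard(tokens[i], tokens[j])
            if sim >= threshold:  # hypothetical edge rule
                graph[i][j] = graph[j][i] = 1.0 - sim
    return graph


def shortest_paths(graph, source):
    """Dijkstra from source; returns {node: distance} for every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(graph):
    """Closeness centrality, scaled by the reachable fraction so isolated nodes score 0."""
    n = len(graph)
    scores = {}
    for u in graph:
        dist = shortest_paths(graph, u)
        total = sum(dist.values())
        reachable = len(dist) - 1  # exclude the node itself
        if total > 0 and n > 1:
            scores[u] = (reachable / total) * (reachable / (n - 1))
        else:
            scores[u] = 0.0
    return scores


def summarize(sentences, k=2, threshold=0.1):
    """Select the k most central sentences and return them in original order."""
    scores = closeness(build_graph(sentences, threshold))
    top = sorted(scores, key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

To approximate the comparison reported in the abstract, `jaccard` could be replaced by an embedding-based distance such as Word Mover's Distance (Gensim offers `KeyedVectors.wmdistance`), and `closeness` by PageRank, degree, or betweenness centrality; those variants are not shown here.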
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2561 B.E. (2018)
Degree Name:	Master of Science
Degree Level:	Master's Degree
Degree Discipline:	Computer Science
URI:	http://cuir.car.chula.ac.th/handle/123456789/63627
URI:	https://doi.org/10.58837/CHULA.THE.2018.1137
metadata.dc.identifier.DOI:	10.58837/CHULA.THE.2018.1137
Type:	Thesis
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
5970919521.pdf		3.85 MB	Adobe PDF
