A Comparison of the Thompson Sampling and Upper Confidence Bound Algorithms for Reinforcement Learning in the Game of Rock-Paper-Scissors

ธันยวุฒิ อักขระสมชีพ

dc.contributor.advisor	เสกสรร เกียรติสุไพบูลย์
dc.contributor.author	ธันยวุฒิ อักขระสมชีพ
dc.contributor.other	Chulalongkorn University. Faculty of Commerce and Accountancy
dc.date.accessioned	2023-02-03T04:31:28Z
dc.date.available	2023-02-03T04:31:28Z
dc.date.issued	2565 BE (2022 CE)
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/81681
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2565 BE (2022 CE)
dc.description.abstract	This research aims to compare the performance of the Thompson sampling algorithm and the upper confidence bound algorithm in reinforcement learning models of human behavioral decision making. Both algorithms are effective at solving multi-armed bandit problems, but it is unclear how they perform on human decision problems that are behaviorally complex. This research simulates the game of rock-paper-scissors to represent a human decision problem with two behavioral components: a mixed clockwise strategy and a stop-loss strategy. The rock-paper-scissors model is simulated as a Markov decision process. Agents built on each of the two algorithms solve this problem, and their performance is measured by cumulative reward under various simulation settings. The comparison shows that the upper confidence bound agent outperforms the Thompson sampling agent in most simulations. The exception is simulations in which the human behavioral pattern stays clear over a long period, where the Thompson sampling agent outperforms the upper confidence bound agent.
dc.description.abstractalternative	The purpose of this study is to compare the efficiency of the Thompson sampling algorithm and the upper confidence bound algorithm in reinforcement learning models for human behavioral decision making. Both algorithms are known to be efficient in solving multi-armed bandit problems. However, little is known about how well the two algorithms perform when they encounter a behaviorally complex human decision problem. In this study, simulated rock-paper-scissors games represent human decision problems with two human behavioral traits: a mixed clockwise strategy and a stop-loss strategy. The simulated rock-paper-scissors game is modeled as a Markov decision process. The two reinforcement learning agents are then applied to solve the decision process, with their cumulative rewards as the performance measure. The performance of the two agents is measured under various simulation settings. The comparison results show that the upper confidence bound agent outperforms the Thompson sampling agent in most cases. The only exception arises when a strong behavioral pattern persists over a long decision horizon; in that case the Thompson sampling agent outperforms the upper confidence bound agent.
dc.language.iso	th
dc.publisher	Chulalongkorn University
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2022.955
dc.rights	Chulalongkorn University
dc.subject	Algorithms
dc.subject	Sampling
dc.subject	Markov processes
dc.subject.classification	Computer Science
dc.title	A Comparison of the Thompson Sampling and Upper Confidence Bound Algorithms for Reinforcement Learning in the Game of Rock-Paper-Scissors
dc.title.alternative	A Comparison between Thompson Sampling and Upper Confidence Bound Algorithms for Reinforcement Learning in the Game of Rock-Paper-Scissors
dc.type	Thesis
dc.degree.name	Master of Science
dc.degree.level	Master's Degree
dc.degree.discipline	Statistics
dc.degree.grantor	Chulalongkorn University
dc.identifier.DOI	10.58837/CHULA.THE.2022.955
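
The abstract names two bandit algorithms without showing how they choose actions. As a minimal sketch, and not the thesis's own implementation, the Python below runs UCB1 and Beta-Bernoulli Thompson sampling against a hypothetical stationary opponent that favors rock. The binary win reward, the opponent weights, and all function names are assumptions made for illustration. The thesis models the game as a Markov decision process with mixed clockwise and stop-loss behaviors, which this plain three-armed bandit does not reproduce.

```python
import math
import random

ACTIONS = ["rock", "paper", "scissors"]
# BEATS[a] is the action that a defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def reward(agent, opponent):
    """Assumed binary reward: 1 for a win, 0 for a draw or a loss."""
    return 1 if BEATS[agent] == opponent else 0


def biased_opponent(rng, weights=(0.6, 0.2, 0.2)):
    """Hypothetical stationary opponent (assumption, not the thesis model)."""
    return rng.choices(ACTIONS, weights=weights)[0]


def run_ucb1(rounds, rng):
    """Play `rounds` games with UCB1 and return the cumulative reward."""
    counts = [0, 0, 0]
    sums = [0.0, 0.0, 0.0]
    total = 0
    for t in range(1, rounds + 1):
        if t <= len(ACTIONS):
            arm = t - 1  # try every action once before using the bound
        else:
            # Empirical mean plus an exploration bonus that shrinks
            # as an action is played more often.
            arm = max(
                range(len(ACTIONS)),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        r = reward(ACTIONS[arm], biased_opponent(rng))
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total


def run_thompson(rounds, rng):
    """Play `rounds` games with Beta-Bernoulli Thompson sampling."""
    alpha = [1, 1, 1]  # Beta(1, 1) prior: one pseudo-win per action
    beta = [1, 1, 1]   # and one pseudo-loss per action
    total = 0
    for _ in range(rounds):
        # Sample a plausible win rate per action, then act greedily on it.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(len(ACTIONS))]
        arm = samples.index(max(samples))
        r = reward(ACTIONS[arm], biased_opponent(rng))
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total


if __name__ == "__main__":
    rounds = 2000
    print("UCB1 cumulative reward:", run_ucb1(rounds, random.Random(0)))
    print("Thompson cumulative reward:", run_thompson(rounds, random.Random(1)))
```

Cumulative reward is the same performance measure the abstract describes. Against this fixed opponent both agents should settle on paper, so the sketch says nothing about which algorithm does better under the behavioral settings studied in the thesis.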