การประยุกต์ใช้การเรียนรู้แบบเสริมกำลังกับการวางแผนทางการเงิน

ภัควัลย์ จันทรศิริภาส

DSpace Home
→
Faculty and Institute
→
Faculty of Commerce and Accountancy - Acctn
→
Acctn - Theses
→
View Item

dc.contributor.advisor	เสกสรร เกียรติสุไพบูลย์
dc.contributor.author	ภัควัลย์ จันทรศิริภาส
dc.contributor.other	จุฬาลงกรณ์มหาวิทยาลัย. คณะพาณิชยศาสตร์และการบัญชี
dc.date.accessioned	2020-04-05T07:10:31Z
dc.date.available	2020-04-05T07:10:31Z
dc.date.issued	2562
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/64782
dc.description	วิทยานิพนธ์ (วท.ม.)--จุฬาลงกรณ์มหาวิทยาลัย, 2562
dc.description.abstract	งานวิจัยนี้มีวัตถุประสงค์ที่จะนำการเรียนรู้แบบเสริมกำลังมาประยุกต์กับการวางแผนทางการเงินเพื่อตัดสินใจเลือกอัตราส่วนของสินทรัพย์ที่ใช้ในการบริโภคและการลงทุนในสินทรัพย์ที่มีความเสี่ยงที่ดีที่สุดในแต่ละช่วงเวลาตลอดช่วงอายุของครัวเรือน ผลลัพธ์ที่ได้จากการเรียนรู้แบบเสริมกำลังซึ่งเป็นค่าประมาณ จะถูกนำมาเปรียบเทียบกับคำตอบที่ถูกต้องจากวิธี MDP สำหรับการเรียนรู้แบบเสริมกำลังในงานวิจัยนี้เป็นอัลกอริธึม SARSA โดยการเลือกการกระทำใช้วิธี ε-greedy ส่วนการประมาณค่าใช้ตัวแบบถดถอยที่มีตัวแปรต้นเป็นฟีเจอร์จากเคอร์เนล Radial Basis Function (RBF) จากการศึกษาพบว่าความผิดพลาดระหว่างค่าประมาณผลลัพธ์ที่ดีที่สุดเทียบกับคำตอบจาก MDP มีแนวโน้มลู่เข้าสู่ศูนย์ แสดงว่าการเรียนรู้แบบเสริมกำลังสามารถประยุกต์กับการวางแผนทางการเงินได้ อย่างไรก็ตาม SARSA แบบดั้งเดิมใช้เวลานานในการเรียนรู้ เมื่อปรับปรุงให้การเลือกการกระทำในช่วงแรกเน้นสำรวจมากขึ้น พบว่า ความผิดพลาดลดลง แสดงให้เห็นว่า SARSA ที่ปรับปรุงให้เน้นการสำรวจในช่วงแรกมีประสิทธิภาพดีขึ้นกว่าแบบดั้งเดิม นอกจากนี้เมื่อพิจารณาผลของการปรับเปลี่ยนปัจจัยต่างๆ สำหรับ SARSA แบบเน้นการสำรวจในช่วงแรก พบว่า ความผิดพลาดระหว่างค่าประมาณผลลัพธ์ที่ดีที่สุดเทียบกับ MDP มีค่าน้อยสุดเมื่อใช้ค่าน้ำหนักเริ่มต้นจากตัวแบบการถดถอยเชิงเส้น, จำนวนฟีเจอร์ 200 ลักษณะ, อัตราการเรียนรู้และความน่าจะเป็นในการเลือกการกระทำแบบสำรวจแบบลดลงตามเวลาที่มีค่าเริ่มต้น 0.1 และ 0.9 ตามลำดับ ในขณะที่การนำคำตอบที่ดีที่สุดไปจำลองใช้จริง ผลของการวางแผนทางการเงินที่ได้มีความแตกต่างกับคำตอบจาก MDP มาก โดยการใช้ค่าน้ำหนักเริ่มต้นจากตัวแบบการถดถอยเชิงเส้น, จำนวนฟีเจอร์ 300 ลักษณะ, อัตราการเรียนรู้และความน่าจะเป็นในการเลือกการกระทำแบบสำรวจแบบลดลงตามเวลาที่มีค่าเริ่มต้น 0.1 และ 0.9 ตามลำดับให้ผลลัพธ์ที่ใกล้เคียงกับ MDP มากสุด แสดงว่าถึงแม้ความผิดพลาดของผลลัพธ์ที่ดีที่สุดจะมีค่าต่ำสุด คำตอบจากวิธีการเรียนรู้แบบเสริมกำลังยังมีความผิดพลาดสูงเมื่อเทียบกับคำตอบจาก MDP
dc.description.abstractalternative	In this study a reinforcement learning is applied to a financial planning problem to find an optimal consumption proportion and an optimal investment proportion in risky assets. The solutions from the reinforcement approach are compared with the exact solutions from an MDP approach. The algorithm used in this study is SARSA with ε-greedy action selection where the value approximation employs a regression method with Radial Basis Function (RBF) features. From the experiments, the errors between the optimal value estimated from the reinforcement learning and the exact solution from the MDP have a tendency to converge, indicating the effectiveness of the reinforcement learning in solving a financial planning problem. The algorithm is then adjusted to emphasize more on exploration. The errors from the adjusted algorithm are lower than those from the original algorithm, showing that the adjusted algorithm is more efficient than the original algorithm. In addition, considering the effects of factor adjustment of the SARSA algorithm focused on exploration in the first stage, it is found that the error between the optimal value of the reinforcement learning and the MDP is lowest when the initial weights from the linear regression model are used with 200 features and the initial decreased learning rate and epsilon are 0.1 and 0.9, respectively. When the optimal actions are used in the simulation, the obtained results of financial planning are very different compared to those from the MDP. The simulation in which 300 features are used instead gives the most similar result to the MDP. This shows that even though the error of the optimal value is lowest, the difference of the result from the reinforcement learning is still high compared to the result from the MDP.
dc.language.iso	th
dc.publisher	จุฬาลงกรณ์มหาวิทยาลัย
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2019.1395
dc.rights	จุฬาลงกรณ์มหาวิทยาลัย
dc.subject.classification	Computer Science
dc.subject.classification	Decision Sciences
dc.title	การประยุกต์ใช้การเรียนรู้แบบเสริมกำลังกับการวางแผนทางการเงิน
dc.title.alternative	An application of reinforcement learning to financial planning
dc.type	Thesis
dc.degree.name	วิทยาศาสตรมหาบัณฑิต
dc.degree.level	ปริญญาโท
dc.degree.discipline	สถิติ
dc.degree.grantor	จุฬาลงกรณ์มหาวิทยาลัย
dc.email.advisor	Seksan.K@Chula.ac.th
dc.identifier.DOI	10.58837/CHULA.THE.2019.1395