Multi-agent deep reinforcement learning for cryptocurrency trading

Kittiwin Kumlungmak

dc.contributor.advisor	Peerapon Vateekul
dc.contributor.author	Kittiwin Kumlungmak
dc.contributor.other	Chulalongkorn University. Faculty of Engineering
dc.date.accessioned	2023-08-04T07:37:29Z
dc.date.available	2023-08-04T07:37:29Z
dc.date.issued	2022
dc.identifier.uri	https://cuir.car.chula.ac.th/handle/123456789/83147
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2022
dc.description.abstract	Reinforcement learning has emerged as a promising approach for improving profitability in cryptocurrency trading. However, the market's inherent volatility, especially during bearish periods, poses significant challenges. Existing work addresses this issue with single-agent techniques such as deep Q-network (DQN), advantage actor-critic (A2C), and proximal policy optimization (PPO), or ensembles of them. Despite these efforts, the mechanisms used to mitigate losses in bearish cryptocurrency markets lack robustness, so the performance of reinforcement learning methods for cryptocurrency trading remains constrained. To overcome this limitation, we present a novel cryptocurrency trading method based on multi-agent proximal policy optimization (MAPPO). Our approach incorporates a collaborative multi-agent scheme and a local-global reward function to optimize both individual and collective agent performance. Using a multi-objective optimization technique and a multi-scale continuous loss (MSCL) reward, we train the agents with a progressive penalty mechanism to prevent consecutive losses of portfolio value. We evaluate our method against multiple baselines and find that it achieves higher cumulative returns. Its strength is clearest on the bearish test set, where only our approach yields a profit: it achieves a cumulative return of 2.36%, while the baseline methods produce negative cumulative returns. Compared with FinRL-Ensemble, a reinforcement learning-based method, our approach achieves a 46.05% greater cumulative return on the bullish test set.
dc.description.abstractalternative	Reinforcement learning is an approach that has been used to increase profits in cryptocurrency trading. However, market volatility, especially when the market is bearish, has become a major obstacle in this field. Existing research has attempted to solve this problem using Deep Q-Network (DQN), Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO), or combinations of these techniques (ensembles). Nevertheless, the mechanisms used to reduce losses during bearish cryptocurrency markets are not as effective as they should be, so the performance of reinforcement learning methods for cryptocurrency trading remains limited. To overcome this limitation, we propose a new technique for cryptocurrency trading that uses multi-agent learning and a local-global reward function to improve the collective performance of all agents together with the performance of each individual agent. In addition, we use a multi-objective optimization technique and a penalty for consecutive losses, which we call the Multi-Scale Continuous Loss (MSCL) reward, adapted from a progressive penalty, to prevent consecutive losses in portfolio value. To evaluate the proposed method, we compared it with other popular techniques and found that its cumulative return is higher. In particular, during the bearish period only our method produced a profit: it generated a cumulative return of 2.36%, while all the other methods compared incurred losses. Compared with FinRL-Ensemble, a reinforcement learning-based method, our method obtained a cumulative return 46.05% higher during the bullish period.
dc.language.iso	en
dc.publisher	Chulalongkorn University
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2022.95
dc.rights	Chulalongkorn University
dc.title	Multi-agent deep reinforcement learning for cryptocurrency trading
dc.title.alternative	การเรียนรู้แบบเสริมกำลังเชิงลึกแบบหลายตัวกระทำสำหรับการซื้อขายคริปโทเคอร์เรนซี
dc.type	Thesis
dc.degree.name	Master of Science
dc.degree.level	Master's Degree
dc.degree.discipline	Computer Science
dc.degree.grantor	Chulalongkorn University
dc.identifier.DOI	10.58837/CHULA.THE.2022.95
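
The abstract mentions a cumulative-return metric, a local-global reward, and a progressive penalty on consecutive portfolio losses, but this record does not give their formulas. The following is only a minimal illustrative sketch of those three ideas under assumed definitions. It is not the thesis's MSCL reward or MAPPO training code, and the function names, the `penalty_step` parameter, and the `weight` parameter are all hypothetical.

```python
from typing import List, Sequence


def cumulative_return(values: Sequence[float]) -> float:
    """Fractional change from the first to the last portfolio value."""
    return values[-1] / values[0] - 1.0


def progressive_penalty_rewards(
    values: Sequence[float], penalty_step: float = 0.001
) -> List[float]:
    """Per-step return minus a penalty that grows with the loss streak.

    Assumption: the penalty is linear in the number of consecutive losing
    steps. The thesis's actual MSCL reward may be defined differently.
    """
    rewards = []
    streak = 0  # number of consecutive losing steps so far
    for prev, curr in zip(values, values[1:]):
        step_return = curr / prev - 1.0
        streak = streak + 1 if step_return < 0 else 0
        rewards.append(step_return - penalty_step * streak)
    return rewards


def local_global_reward(local: float, shared: float, weight: float = 0.5) -> float:
    """Blend one agent's own reward with a reward shared by all agents.

    Assumption: a simple convex combination; `weight` is hypothetical.
    """
    return weight * local + (1.0 - weight) * shared


if __name__ == "__main__":
    portfolio = [100.0, 99.0, 98.0, 99.0]
    print(cumulative_return(portfolio))
    print(progressive_penalty_rewards(portfolio))
    print(local_global_reward(0.02, -0.01))
```

In this sketch a second losing step in a row is penalized twice as heavily as the first, and the streak resets as soon as the portfolio value stops falling.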