Abstract:
Intrinsic motivation is a promising approach for improving the performance of reinforcement learning algorithms in complex environments. The method enhances exploration capability without explicit guidance from the designer and is applicable to any environment. This makes it well suited to multi-agent reinforcement learning, where environment complexity is higher than usual. This research presents an exploration model based on intrinsic motivation, built from the random network distillation algorithm, to improve the performance of multi-agent reinforcement learning, and compares it with a benchmark across different scenarios. The concept of a clipping ratio is introduced to enforce a limit on the optimization magnitude. Defined relative to the extrinsic reward, this limit truncates excessive intrinsic-reward magnitudes that may destabilize optimization. The experiments were carried out on two different multi-agent architectures: 1) Individual Intrinsic Motivation Architecture, and 2) Centralized Intrinsic Motivation Architecture. The experimental results showed that, in very complex environments, the Centralized Intrinsic Motivation Architecture combined with a small clipping ratio improved performance. Both architectures achieved win rates of up to 70%, exceeding the benchmark's best of 43% in the 2s3z environment.
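
The clipping mechanism described above can be illustrated with a minimal sketch; this is not the authors' implementation, and the `ClippedRND` class, network sizes, and `clip_ratio` parameter are illustrative assumptions based on the abstract's description of truncating the intrinsic reward relative to the extrinsic reward:

```python
import torch
import torch.nn as nn


def make_net(obs_dim: int, out_dim: int) -> nn.Module:
    # Simple MLP used for both the fixed target and the trained predictor.
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))


class ClippedRND:
    """Hypothetical sketch of random network distillation with an
    extrinsic-reward-based clipping ratio: the predictor's error against a
    fixed random target network serves as the intrinsic reward, and its
    magnitude is truncated to at most `clip_ratio` times the extrinsic reward.
    """

    def __init__(self, obs_dim: int, out_dim: int = 32, clip_ratio: float = 0.1):
        self.target = make_net(obs_dim, out_dim)     # fixed random network
        self.predictor = make_net(obs_dim, out_dim)  # trained to match target
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.clip_ratio = clip_ratio
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def intrinsic_reward(self, obs: torch.Tensor, extrinsic: torch.Tensor) -> torch.Tensor:
        # Novelty signal: prediction error on the fixed target's features.
        with torch.no_grad():
            error = (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
        # Truncate the intrinsic reward relative to the extrinsic magnitude,
        # so a small clip_ratio keeps exploration bonuses from dominating.
        limit = self.clip_ratio * extrinsic.abs()
        return torch.minimum(error, limit)

    def update(self, obs: torch.Tensor) -> float:
        # Train the predictor; frequently visited states become less novel.
        loss = (self.predictor(obs) - self.target(obs)).pow(2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()
```

In a centralized variant, one such module would score the joint observation for all agents, whereas the individual variant would give each agent its own predictor.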