Decentralized Multi-Agent Reinforcement Learning with Global State Prediction. (arXiv:2306.12926v1 [cs.RO])


By: Joshua Bloom, Pranjal Paliwal, Apratim Mukherjee, Carlo Pinciroli
Posted: June 23, 2023

Deep reinforcement learning (DRL) has seen remarkable success in the control
of single robots. However, applying DRL to robot swarms presents significant
challenges. A critical challenge is non-stationarity, which occurs when two or
more robots update individual or shared policies concurrently, thereby engaging
in an interdependent training process with no guarantees of convergence.
Circumventing non-stationarity typically involves training the robots with
global information about other agents’ states and/or actions. In contrast, in
this paper we explore how to remove the need for global information. We pose
our problem as a Partially Observable Markov Decision Process, due to the
absence of global knowledge on other agents. Using collective transport as a
testbed scenario, we study two approaches to multi-agent training. In the
first, the robots exchange no messages, and are trained to rely on implicit
communication through push-and-pull on the object to transport. In the second
approach, we introduce Global State Prediction (GSP), a network trained to
form a belief over the swarm as a whole and predict its future states. We
provide a comprehensive study over four well-known deep reinforcement learning
algorithms in environments with obstacles, measuring performance as the
successful transport of the object to the goal within a desired time-frame.
Through an ablation study, we show that including GSP boosts performance and
increases robustness when compared with methods that use global knowledge.
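
The performance measure described above (the object reaching the goal within a desired time-frame) could be computed along the following lines. This is only a minimal sketch, not the authors' evaluation code: the `Episode` record, the `goal_radius` tolerance, and the `max_steps` budget are hypothetical names and parameters introduced here for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from math import hypot
from typing import Iterable, Tuple


@dataclass
class Episode:
    """Hypothetical record of one transport trial (structure assumed, not from the paper)."""
    final_object_xy: Tuple[float, float]  # where the object ended up
    goal_xy: Tuple[float, float]          # where it was supposed to go
    steps_taken: int                      # time steps used in the trial


def is_success(ep: Episode, max_steps: int, goal_radius: float) -> bool:
    """A trial counts as successful if the object ends within goal_radius
    of the goal and the trial used no more than max_steps time steps."""
    dx = ep.final_object_xy[0] - ep.goal_xy[0]
    dy = ep.final_object_xy[1] - ep.goal_xy[1]
    return hypot(dx, dy) <= goal_radius and ep.steps_taken <= max_steps


def success_rate(episodes: Iterable[Episode], max_steps: int, goal_radius: float) -> float:
    """Fraction of trials that are successful; 0.0 for an empty batch."""
    episodes = list(episodes)
    if not episodes:
        return 0.0
    wins = sum(is_success(ep, max_steps, goal_radius) for ep in episodes)
    return wins / len(episodes)
```

Comparing this rate between runs with and without a component such as GSP is one simple way an ablation like the one mentioned above might be tabulated, again under the assumed success criterion.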

