MP3: Movement Primitive-Based (Re-)Planning Policy. (arXiv:2306.12729v1 [cs.LG])
By: Fabian Otto, Hongyi Zhou, Onur Celik, Ge Li, Rudolf Lioutikov, Gerhard Neumann. Posted: June 23, 2023

We introduce a novel deep reinforcement learning (RL) approach called
Movement Primitive-based Planning Policy (MP3). By integrating movement
primitives (MPs) into the deep RL framework, MP3 enables the generation of
smooth trajectories throughout the whole learning process while effectively
learning from sparse and non-Markovian rewards. Additionally, MP3 maintains the
capability to adapt to changes in the environment during execution. Although
many early successes in robot RL have been achieved by combining RL with MPs,
these approaches are often limited to learning single stroke-based motions,
lacking the ability to adapt to task variations or adjust motions during
execution. Building upon our previous work, which introduced an episode-based
RL method for the non-linear adaptation of MP parameters to different task
variations, this paper extends the approach to incorporate replanning
strategies. Replanning allows the MP parameters to be adapted throughout
motion execution, addressing the lack of online motion adaptation in
stochastic domains that require feedback. We compare our approach against
state-of-the-art deep RL methods and RL methods that use MPs. The results
demonstrate improved performance in sophisticated sparse-reward settings and
in domains that require replanning.
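To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of MP-based replanning: a stand-in policy maps the current observation to the weights of a ProMP-style linear basis-function trajectory, and the remaining trajectory segment is regenerated from freshly predicted weights at fixed intervals. The function names, basis model, and toy environment are illustrative assumptions, and the sketch omits the smoothness-preserving conditioning at replanning points that the paper's movement primitives provide.

```python
# Hypothetical sketch of MP-based replanning; assumes a ProMP-style
# linear basis-function trajectory model and a toy stand-in policy.
import numpy as np

def rbf_basis(t, n_basis=10, width=0.05):
    """Normalized radial basis functions over phase t in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)

def mp_trajectory(weights, n_steps):
    """Desired positions for one DoF: linear combination of basis functions."""
    t = np.linspace(0.0, 1.0, n_steps)
    return rbf_basis(t, n_basis=len(weights)) @ weights

def policy(obs, n_basis=10, rng=None):
    """Stand-in for a learned policy mapping an observation to MP weights.
    In MP3 this would be a neural network trained with deep RL."""
    rng = rng or np.random.default_rng(0)
    return rng.normal(scale=0.5, size=n_basis) + obs[0]  # toy obs dependence

def run_episode(horizon=100, replan_every=25):
    """Execute one episode, regenerating the remaining trajectory from fresh
    MP weights every `replan_every` steps (the replanning mechanism)."""
    obs = np.zeros(1)  # toy 1-D observation, e.g., current position
    executed, step = [], 0
    while step < horizon:
        weights = policy(obs)                      # (re-)plan from current obs
        segment = mp_trajectory(weights, horizon - step)
        for q_des in segment[:replan_every]:       # execute until next replan
            obs = np.array([q_des])                # toy environment: tracking
            executed.append(q_des)
            step += 1
            if step >= horizon:
                break
    return np.array(executed)

if __name__ == "__main__":
    traj = run_episode()
    print(f"executed {len(traj)} steps; final position {traj[-1]:.3f}")
```

In the full method, a single predicted weight vector specifies an entire trajectory segment rather than a per-step action, which is what lets the policy learn from sparse, non-Markovian episodic rewards while replanning keeps the motion adaptable during execution.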
