Missions for autonomous systems often require agents to visit multiple
targets in complex operating conditions. This work considers the problem of
visiting a set of targets in minimum time by a team of non-communicating agents
in a Markov decision process (MDP). The single-agent problem is at least
NP-complete by reducing it to a Hamiltonian path problem. We first discuss an
optimal algorithm based on Bellman’s optimality equation that is exponential in
the number of target states. Then, we trade-off optimality for time complexity
by presenting a suboptimal algorithm that is polynomial at each time step. We
prove that the proposed algorithm generates optimal policies for certain
classes of MDPs. Extending our procedure to the multi-agent case, we propose a
target partitioning algorithm that approximately minimizes the expected time to
visit the targets. We prove that our algorithm generates optimal partitions for
clustered target scenarios. We present the performance of our algorithms on
random MDPs and gridworld environments inspired by ocean dynamics. We show that
our algorithms are much faster than the optimal procedure and more optimal than
the currently available heuristic.