A computer-implemented method for exploring, by a table-based parallel reinforcement learning (PRL) algorithm, an unexplored domain comprising a plurality of agents and states, the unexplored domain being represented by a state-action space. The method includes the following steps, performed by one or more of the plurality of agents: receiving an assigned partition of the state-action space represented by a table; executing, during a plurality of episodes, actions for states within the partition, wherein an action transits a state; granting a reward to a transited state; exchanging state-action values with other agents of the plurality of agents in the domain; and updating the table.
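The steps above can be illustrated with a minimal sketch. It is not the claimed implementation: the one-dimensional chain domain, the Q-learning update rule, the uniform exploration policy, the two-agent split, the rule that each agent is authoritative for the rows of its own partition, and all names and parameter values (`Agent`, `exchange`, `ALPHA`, `GAMMA`, and so on) are assumptions chosen only to make the example runnable.

```python
import random

# Hypothetical 1-D chain domain: states 0..7, goal at state 7.
N_STATES, GOAL = 8, 7
ACTIONS = (-1, 1)            # move left / move right
ALPHA, GAMMA = 0.5, 0.9      # assumed learning rate and discount factor


def step(state, action):
    """An action transits a state; the transited state is granted a reward."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


class Agent:
    def __init__(self, partition, seed):
        self.partition = set(partition)            # assigned partition
        self.starts = sorted(self.partition - {GOAL})
        self.rng = random.Random(seed)
        # Full-size table; the agent only writes rows of its own partition.
        self.q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def run_episode(self, max_steps=20):
        s = self.rng.choice(self.starts)
        for _ in range(max_steps):
            if s == GOAL or s not in self.partition:
                break                              # finished or left partition
            a = self.rng.choice(ACTIONS)           # uniform exploration
            nxt, r = step(s, a)
            best_next = 0.0 if nxt == GOAL else max(
                self.q[(nxt, b)] for b in ACTIONS)
            # Table update (standard Q-learning rule, assumed here).
            self.q[(s, a)] += ALPHA * (r + GAMMA * best_next - self.q[(s, a)])
            s = nxt


def exchange(agents):
    """Each agent publishes the state-action values of its own partition."""
    for owner in agents:
        for peer in agents:
            if peer is not owner:
                for s in owner.partition:
                    for a in ACTIONS:
                        peer.q[(s, a)] = owner.q[(s, a)]


def train(rounds=1000):
    agents = [Agent(range(0, 4), seed=0), Agent(range(4, 8), seed=1)]
    for _ in range(rounds):
        for agent in agents:
            agent.run_episode()
        exchange(agents)
    return agents


def greedy_policy(agent):
    return {s: max(ACTIONS, key=lambda a: agent.q[(s, a)])
            for s in range(N_STATES) if s != GOAL}


agents = train()
policy = greedy_policy(agents[0])
```

In this sketch an agent bootstraps from a neighbouring partition only through the values it received in the last exchange, so the exchange step is what lets reward information propagate across partition boundaries.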