A system and method for multi-agent reinforcement learning in a multi-agent environment that include receiving data associated with the multi-agent environment in which an ego agent and a target agent are traveling, and learning a single-agent policy that is based on the data associated with the multi-agent environment and that accounts for the operation of at least one of the ego agent and the target agent individually. The system and method also include learning a multi-agent policy that accounts for the operation of the ego agent and the target agent with respect to one another within the multi-agent environment. The system and method further include controlling at least one of the ego agent and the target agent to operate within the multi-agent environment based on the multi-agent policy.
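The abstract does not specify a learning algorithm, so the following is only a toy sketch of the two-stage idea under stated assumptions. It uses a hypothetical two-road intersection where the ego and target agents each drive cells 0..4 and share cell 2. Stage 1 learns single-agent Q-values for one agent driving alone. Stage 2 learns a joint (multi-agent) Q-table that is warm-started from the summed single-agent Q-values (an assumed way of passing the single-agent policy into the multi-agent stage) and that penalizes collisions. A rollout then controls both agents. To keep the result deterministic, the sketch applies Q-learning-style backups as exhaustive sweeps over a known model instead of sampling experience. All names, rewards, and constants here are illustrative, not taken from the patent.

```python
from itertools import product

GOAL = 4            # hypothetical: each road has cells 0..4, goal at cell 4
CONFLICT = 2        # hypothetical: cell 2 of both roads is the shared intersection
GAMMA = 0.95        # assumed discount factor
COLLISION_PENALTY = -10.0
ACTIONS = (0, 1)    # 0 = wait, 1 = advance one cell


def step_single(pos, action):
    """Deterministic move for one agent; an agent at its goal stays put."""
    return pos if pos == GOAL else min(pos + action, GOAL)


def learn_single_agent_q(sweeps=200):
    """Stage 1: Q-value sweeps for one agent driving alone (other agent ignored)."""
    q = {p: [0.0, 0.0] for p in range(GOAL + 1)}  # goal is terminal, Q stays 0
    for _ in range(sweeps):
        for p in range(GOAL):
            for a in ACTIONS:
                q[p][a] = -1.0 + GAMMA * max(q[step_single(p, a)])
    return q


def step_joint(state, joint_action):
    """Joint transition for (ego, target); returns (next_state, reward, done)."""
    (pe, pt), (ae, at) = state, joint_action
    # Each agent pays a cost of 1 per step until it reaches its goal.
    reward = -float(pe != GOAL) - float(pt != GOAL)
    nxt = (step_single(pe, ae), step_single(pt, at))
    if nxt == (CONFLICT, CONFLICT):  # both enter the shared cell: collision
        return nxt, reward + COLLISION_PENALTY, True
    return nxt, reward, nxt == (GOAL, GOAL)


def learn_multi_agent_q(q_single, sweeps=200):
    """Stage 2: joint Q-table, warm-started from summed single-agent Q-values."""
    states = list(product(range(GOAL + 1), repeat=2))
    joint_actions = list(product(ACTIONS, repeat=2))
    q = {s: {ja: q_single[s[0]][ja[0]] + q_single[s[1]][ja[1]]
             for ja in joint_actions} for s in states}
    for _ in range(sweeps):
        for s in states:
            if s == (GOAL, GOAL):
                continue  # terminal state
            for ja in joint_actions:
                nxt, r, done = step_joint(s, ja)
                q[s][ja] = r + (0.0 if done else GAMMA * max(q[nxt].values()))
    return q


def make_independent_policy(q_single):
    """Each agent acts greedily on its own single-agent Q-values."""
    def policy(state):
        return tuple(max(ACTIONS, key=q_single[p].__getitem__) for p in state)
    return policy


def make_joint_policy(q_multi):
    """Both agents follow the greedy joint action of the multi-agent Q-table."""
    def policy(state):
        return max(q_multi[state], key=q_multi[state].get)
    return policy


def rollout(policy, start=(0, 0), max_steps=20):
    """Control loop: returns (reached_goals, collided, visited_states)."""
    state, visited = start, [start]
    for _ in range(max_steps):
        state, _, done = step_joint(state, policy(state))
        visited.append(state)
        if done:
            return state == (GOAL, GOAL), state == (CONFLICT, CONFLICT), visited
    return False, False, visited


if __name__ == "__main__":
    q_single = learn_single_agent_q()
    q_multi = learn_multi_agent_q(q_single)
    print(rollout(make_independent_policy(q_single))[:2])  # (False, True)
    print(rollout(make_joint_policy(q_multi))[:2])         # (True, False)
```

In this toy, the independent single-agent policies both advance every step and collide in the shared cell, while the joint policy has one agent yield for a single step and both agents reach their goals. This mirrors, in miniature, why a policy that accounts for the agents "with respect to one another" is used for control.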