A reinforcement learning architecture for facilitating reinforcement learning in connection with the operation of an external real-time system that includes a plurality of devices operating in a real-world environment. The reinforcement learning architecture includes a plurality of communicators, a task manager, and a reinforcement learning agent that interact with each other to effectuate a policy for achieving a defined objective in the real-world environment. Each of the communicators receives sensory data from a corresponding device, and the task manager generates a joint state vector based on the sensory data. The reinforcement learning agent generates, based on the joint state vector, a joint action vector, which the task manager parses into a plurality of actuation commands. The communicators transmit the actuation commands to the plurality of devices in the real-world environment.
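
The data flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class names (`Communicator`, `TaskManager`), the function `placeholder_agent`, the two example devices, and the choice to build the joint state vector by simple concatenation are all assumptions made for illustration. The agent here is a fixed stand-in function, not a learned policy, and the communicators record commands in memory instead of talking to real hardware.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Communicator:
    """Hypothetical link to one device: reads its sensors, sends it commands."""
    obs_dim: int
    act_dim: int
    read_sensor: Callable[[], List[float]]
    sent: List[List[float]] = field(default_factory=list)

    def receive(self) -> List[float]:
        data = self.read_sensor()
        if len(data) != self.obs_dim:
            raise ValueError("unexpected sensor reading size")
        return data

    def transmit(self, command: Sequence[float]) -> None:
        # A real communicator would write to the device; this one only records.
        self.sent.append(list(command))


class TaskManager:
    """Hypothetical task manager: builds the joint state, parses the joint action."""

    def __init__(self, communicators: Sequence[Communicator]) -> None:
        self.communicators = list(communicators)

    def joint_state(self) -> List[float]:
        # Assumption: the joint state vector is the concatenation of all readings.
        state: List[float] = []
        for comm in self.communicators:
            state.extend(comm.receive())
        return state

    def dispatch(self, joint_action: Sequence[float]) -> None:
        expected = sum(comm.act_dim for comm in self.communicators)
        if len(joint_action) != expected:
            raise ValueError("joint action has the wrong length")
        offset = 0
        for comm in self.communicators:
            # Slice out this device's actuation command and hand it over.
            comm.transmit(joint_action[offset:offset + comm.act_dim])
            offset += comm.act_dim


def placeholder_agent(joint_state: Sequence[float], action_dim: int) -> List[float]:
    """Stand-in for the reinforcement learning agent (no learning happens here)."""
    mean = sum(joint_state) / len(joint_state)
    return [mean + i for i in range(action_dim)]


def run_step():
    """Run one sense -> joint state -> joint action -> actuate cycle."""
    arm = Communicator(obs_dim=2, act_dim=2, read_sensor=lambda: [1.0, 2.0])
    camera = Communicator(obs_dim=1, act_dim=1, read_sensor=lambda: [3.0])
    manager = TaskManager([arm, camera])

    state = manager.joint_state()
    action = placeholder_agent(state, action_dim=3)
    manager.dispatch(action)
    return state, action, arm.sent, camera.sent
```

Calling `run_step()` performs a single cycle: both communicators are read, the readings are combined into one joint state vector, the placeholder agent returns a joint action vector, and the task manager splits that vector by each device's action dimension before the communicators transmit their share. In a real-time system this cycle would presumably repeat continuously, with the agent updating its policy from experience.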