Systems, devices, and methods can control one or more end effectors by: generating a semantically labeled image based on image data, wherein the semantically labeled image identifies a shape of an object and a semantic label of the object; associating a first set of actions with the object; associating a second set of actions with a command from a user; and generating a plan, based on an intersection of the first set of actions and the second set of actions, to fulfill the command through actuation of the one or more end effectors.
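The planning step can be illustrated with a minimal sketch, assuming the semantically labeled image has already been reduced to a label and a shape for one object. The names `LabeledObject`, `OBJECT_ACTIONS`, `COMMAND_ACTIONS`, and `plan_actions`, as well as the example labels, commands, and actions, are hypothetical and do not come from the source; the sketch only shows how a plan might be drawn from the intersection of the two action sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledObject:
    """One object taken from a semantically labeled image (hypothetical type)."""
    label: str    # semantic label, e.g. "cup"
    shape: tuple  # outline of the object as (x, y) vertices


# First set of actions: what can be done with each kind of object (assumed values).
OBJECT_ACTIONS = {
    "cup": {"grasp", "lift", "pour", "place"},
    "door": {"push", "pull"},
}

# Second set of actions: what each user command calls for, in order (assumed values).
COMMAND_ACTIONS = {
    "bring me the cup": ["grasp", "lift", "carry", "place"],
}


def plan_actions(obj: LabeledObject, command: str) -> list:
    """Return the command's actions that the object also supports.

    The result is the intersection of the two action sets, kept in the
    order the command lists them so it can be executed step by step.
    """
    object_actions = OBJECT_ACTIONS.get(obj.label, set())
    command_actions = COMMAND_ACTIONS.get(command, [])
    return [action for action in command_actions if action in object_actions]


cup = LabeledObject(label="cup", shape=((0, 0), (1, 0), (1, 1), (0, 1)))
plan = plan_actions(cup, "bring me the cup")
print(plan)
```

In this sketch the plan is an ordered list that a controller could pass to the end effectors one action at a time; an unknown label or command yields an empty plan.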