Training and/or using both a high-level policy model and a low-level policy model for mobile robot navigation. At each iteration, high-level output generated using the high-level policy model indicates a corresponding high-level action for robot movement in navigating to a navigation target. The low-level output generated at each iteration is based on the corresponding high-level action determined for that iteration and on observation(s) for that iteration. The low-level policy model is trained to generate low-level output that defines low-level action(s) that specify robot movement more granularly than the high-level action, and that avoid obstacles and/or are efficient (e.g., in distance and/or time).
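The per-iteration flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two policy functions are hand-written stand-ins for trained models, the three-action high-level vocabulary, the `Observation` fields, the velocity limits, and the toy kinematics in `step` are hypothetical, and training is not shown. The low-level stand-in only slows to a stop near an obstacle, which is a crude placeholder for learned obstacle avoidance.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-iteration observation (not specified by the source)."""
    x: float            # robot position, metres
    y: float
    heading: float      # radians
    range_ahead: float  # distance to nearest obstacle ahead, metres


def high_level_policy(obs, target):
    """Stand-in for the high-level policy model: returns a coarse action."""
    bearing = math.atan2(target[1] - obs.y, target[0] - obs.x)
    # Wrap the heading error into [-pi, pi].
    error = math.atan2(math.sin(bearing - obs.heading),
                       math.cos(bearing - obs.heading))
    if abs(error) < 0.2:
        return "forward"
    return "turn_left" if error > 0 else "turn_right"


def low_level_policy(high_level_action, obs):
    """Stand-in for the low-level policy model.

    Conditions on the high-level action and the current observation and
    returns a finer-grained command: (linear m/s, angular rad/s).
    """
    if high_level_action == "forward":
        # Scale speed down as an obstacle gets closer; stop inside 0.3 m.
        linear = min(0.5, max(0.0, 0.5 * (obs.range_ahead - 0.3)))
        return linear, 0.0
    angular = 0.8 if high_level_action == "turn_left" else -0.8
    return 0.0, angular


def step(obs, command, dt=0.1):
    """Toy kinematics used only to close the loop in this sketch."""
    linear, angular = command
    heading = obs.heading + angular * dt
    return Observation(obs.x + linear * math.cos(heading) * dt,
                       obs.y + linear * math.sin(heading) * dt,
                       heading,
                       obs.range_ahead)


def navigate(obs, target, max_iters=500, tol=0.1):
    """Run the two-level loop until the target is reached or iterations run out."""
    for i in range(max_iters):
        if math.hypot(target[0] - obs.x, target[1] - obs.y) < tol:
            return obs, i
        action = high_level_policy(obs, target)   # coarse decision
        command = low_level_policy(action, obs)   # granular command
        obs = step(obs, command)
    return obs, max_iters
```

In this sketch the high-level action is recomputed at every iteration and passed to the low-level function together with the same iteration's observation, mirroring the dependency described above. A call such as `navigate(Observation(0.0, 0.0, 0.0, 5.0), (1.0, 1.0))` returns the final observation and the number of iterations used.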