Vision-and-Language Navigation (VLN), where agents are guided by natural language instructions, is one of the most intuitive yet challenging embodied AI tasks.
However, in practice, instructions given by humans can be incomplete or incorrect.
New benchmarks (Taioli et al., 2024) and techniques (Taioli et al., 2024) are needed to make VLN systems robust to such errors in the real world.
References
2024
- Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation
  Francesco Taioli, Stefano Rosa, Alberto Castellini, and 5 more authors
  IROS, 2024
  We propose a benchmark that introduces instruction errors in VLN datasets and a baseline method for detection and localization of errors.
- I2EDL: Interactive Instruction Error Detection and Localization
  Francesco Taioli, Stefano Rosa, Alberto Castellini, and 5 more authors
  RO-MAN, 2024
  We extend IEDL to programmatically interact with the user, balancing the number of interactions against the gain in success rate.