Vision-and-Language Navigation

Vision-and-Language Navigation (VLN), in which an agent follows natural language instructions to navigate an environment, is one of the most intuitive yet challenging embodied AI tasks. In practice, however, instructions given by humans can be incomplete or incorrect. New benchmarks (Taioli et al., 2024) and techniques (Taioli et al., 2024) are needed to make VLN systems robust in the real world.

References

2024

Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation

Francesco Taioli, Stefano Rosa, Alberto Castellini, and 5 more authors

IROS, 2024

We propose a benchmark that introduces instruction errors into VLN datasets, together with a baseline method for detecting and localizing those errors.

I2EDL: Interactive Instruction Error Detection and Localization

Francesco Taioli, Stefano Rosa, Alberto Castellini, and 5 more authors

RO-MAN, 2024

We extend Instruction Error Detection and Localization (IEDL) to interact with the user programmatically, balancing the number of interactions against the gain in success rate.