Vision-and-Language Navigation

Vision-and-Language Navigation (VLN), in which an agent follows natural language instructions to reach a goal, is one of the most intuitive yet challenging embodied AI tasks. In practice, however, instructions given by humans can be incomplete or incorrect. New benchmarks (Taioli et al., 2024) and techniques (Taioli et al., 2024) are needed to make VLN systems robust in the real world.

References

2024

  1. Mind the error! Detection and localization of instruction errors in vision-and-language navigation
    Francesco Taioli, Stefano Rosa, Alberto Castellini, and 5 more authors
    IROS, 2024
  2. I2EDL: Interactive Instruction Error Detection and Localization
    Francesco Taioli, Stefano Rosa, Alberto Castellini, and 5 more authors
    RO-MAN, 2024