vision-and-language-navigation-for-uavs-progress-c

name: vision-and-language-navigation-for-uavs-progress-c
description: 'Research paper: Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Ro'
metadata:
  source: arXiv
  arxiv_id: 2604.13654
  published: 2026-04-15
  utility_score: 1.0
  keywords: multi-agent, agentic, long-horizon, reasoning, benchmark, evaluation, coordination

Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap

arXiv ID: 2604.13654
Published: 2026-04-15
Utility Score: 1.0
URL: http://arxiv.org/abs/2604.13654

Authors

Hanxuan Chen, Jie Zheng, Siqi Yang

Abstract

Vision-and-Language Navigation for Unmanned Aerial Vehicles (UAV-VLN) represents a pivotal challenge in embodied artificial intelligence, focused on enabling UAVs to interpret high-level human commands and execute long-horizon tasks in complex 3D environments. This paper provides a comprehensive and structured survey of the field, from its formal task definition to the current state of the art. We establish a methodological taxonomy that charts the technological evolution from early modular and deep learning approaches to contemporary agentic systems driven by large foundation models, including Vision-Language Models (VLMs), Vision-Language-Action (VLA) models, and the emerging integration of generative world models with VLA architectures for physically-grounded reasoning. The survey systematically reviews the ecosystem of essential resources (simulators, datasets, and evaluation metrics) that facilitates standardized research. Furthermore, we conduct a critical analysis of the primary challenges impeding real-world deployment: the simulation-to-reality gap, robust perception in dynamic outdoor settings, reasoning with linguistic ambiguity, and the efficient deployment of large models on resource-constrained hardware. By synthesizing current benchmarks and limitations, this survey concludes by proposing a forward-looking research roadmap to guide future inquiry into key frontiers such as multi-agent swarm coordination and air-ground collaborative robotics.

Matched Keywords

multi-agent, agentic, long-horizon, reasoning, benchmark, evaluation, coordination

Relevance to AI Agents

This paper is relevant to research on AI agent systems, with a focus on:

multi-agent, agentic, long-horizon, reasoning, benchmark

Quick Reference

# View paper
open http://arxiv.org/abs/2604.13654

# Download PDF
open http://arxiv.org/pdf/2604.13654.pdf
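
The `open` command used above is the macOS opener; most Linux desktops provide `xdg-open` instead. The following is a minimal sketch, assuming a POSIX shell, of how the same two links could be built from the arXiv ID and opened portably. The helper names `arxiv_abs_url`, `arxiv_pdf_url`, and `open_url` are hypothetical and introduced only for illustration; the URL patterns simply mirror the two links listed above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original skill file).
# The URL patterns mirror the two Quick Reference links.
ARXIV_ID="2604.13654"

# Print the abstract-page URL for an arXiv ID.
arxiv_abs_url() {
    printf 'http://arxiv.org/abs/%s\n' "$1"
}

# Print the PDF URL for an arXiv ID.
arxiv_pdf_url() {
    printf 'http://arxiv.org/pdf/%s.pdf\n' "$1"
}

# Open a URL with whichever opener is installed:
# `open` (macOS) or `xdg-open` (most Linux desktops).
open_url() {
    if command -v open >/dev/null 2>&1; then
        open "$1"
    elif command -v xdg-open >/dev/null 2>&1; then
        xdg-open "$1"
    else
        # No opener found: print the URL so it can be pasted manually.
        printf '%s\n' "$1"
    fi
}

# Usage (commented out so that sourcing this file has no side effects):
# open_url "$(arxiv_abs_url "$ARXIV_ID")"
# open_url "$(arxiv_pdf_url "$ARXIV_ID")"
```

Keeping the ID in one variable means the abstract and PDF links cannot drift apart if the entry is regenerated for a different paper.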

Auto-generated from arXiv on 2026-04-17

Activation Keywords

"vision-and-language-navigation-for-uavs-progress-c"
"vision and language navigation for uavs progress c"
"use vision and language navigation for uavs progress c"
"vision and language navigation for uavs progress c help"
"vision and language navigation for uavs progress c tool"

Tools Used

Read - Read existing files and documentation
Write - Create new files and documentation
Bash - Execute commands when needed

Instructions for Agents

Identify the user's intent and specific requirements
Gather the necessary context from files or user input
Execute the appropriate actions using the available tools
Provide clear results and suggest next steps

Examples

Basic usage

User: "Help me with vision and language navigation for uavs progress c"
→ Understand requirements → Execute actions → Provide results

Advanced usage

User: "I need detailed vision and language navigation for uavs progress c assistance"
→ Clarify scope → Provide comprehensive solution → Follow up