name: vision-and-language-navigation-for-uavs-progress-c
description: 'Research paper: Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Ro'
metadata:
  source: arXiv
  arxiv_id: 2604.13654
  published: 2026-04-15
  utility_score: 1.0
  keywords: multi-agent, agentic, long-horizon, reasoning, benchmark, evaluation, coordination
Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap
arXiv ID: 2604.13654
Published: 2026-04-15
Utility Score: 1.0
URL: http://arxiv.org/abs/2604.13654
Authors
Hanxuan Chen, Jie Zheng, Siqi Yang
Categories
cs.RO
Abstract
Vision-and-Language Navigation for Unmanned Aerial Vehicles (UAV-VLN) represents a pivotal challenge in embodied artificial intelligence, focused on enabling UAVs to interpret high-level human commands and execute long-horizon tasks in complex 3D environments. This paper provides a comprehensive and structured survey of the field, from its formal task definition to the current state of the art. We establish a methodological taxonomy that charts the technological evolution from early modular and deep learning approaches to contemporary agentic systems driven by large foundation models, including Vision-Language Models (VLMs), Vision-Language-Action (VLA) models, and the emerging integration of generative world models with VLA architectures for physically-grounded reasoning. The survey systematically reviews the ecosystem of essential resources (simulators, datasets, and evaluation metrics) that facilitates standardized research. Furthermore, we conduct a critical analysis of the primary challenges impeding real-world deployment: the simulation-to-reality gap, robust perception in dynamic outdoor settings, reasoning with linguistic ambiguity, and the efficient deployment of large models on resource-constrained hardware. By synthesizing current benchmarks and limitations, this survey concludes by proposing a forward-looking research roadmap to guide future inquiry into key frontiers such as multi-agent swarm coordination and air-ground collaborative robotics.
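The abstract mentions evaluation metrics without naming them. As an illustration only, the sketch below assumes two metrics that are common across VLN benchmarks: Success Rate (SR) and Success weighted by Path Length (SPL, Anderson et al., 2018), where SPL = (1/N) Σ S_i · ℓ_i / max(p_i, ℓ_i), with S_i the success indicator, ℓ_i the shortest-path length, and p_i the agent's path length for episode i. The function name `vln_metrics`, the three-column input format, and the success-threshold argument are hypothetical choices, not taken from the survey; individual UAV-VLN benchmarks may define success and path length differently.

```shell
# Hypothetical helper (not from the survey): compute Success Rate (SR) and
# Success weighted by Path Length (SPL) from an episode log on stdin.
# Input: one episode per line, whitespace-separated, lengths in metres:
#   <final_dist_to_goal> <shortest_path_len> <agent_path_len>
# Usage: vln_metrics <success_threshold_m> < episodes.txt
vln_metrics() {
  local thresh="$1"
  awk -v t="$thresh" '
    BEGIN { t += 0 }                       # force numeric threshold
    NF == 3 {
      n++
      d = $1 + 0; l = $2 + 0; p = $3 + 0
      s = (d <= t) ? 1 : 0                 # success: stopped within t metres of the goal
      succ += s
      denom = (p > l) ? p : l              # max(p_i, l_i)
      if (denom > 0) spl += s * l / denom  # per-episode SPL term
    }
    END {
      if (n == 0) { print "no episodes" > "/dev/stderr"; exit 1 }
      printf "SR=%.3f SPL=%.3f\n", succ / n, spl / n
    }'
}
```

For example, `printf '5 100 120\n30 80 90\n' | vln_metrics 20` counts the first episode as a success (stopped 5 m from the goal, path 120 m against a 100 m shortest path) and the second as a failure, printing `SR=0.500 SPL=0.417`. SPL penalizes the successful episode for its detour, which is why it sits below SR.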
Matched Keywords
multi-agent, agentic, long-horizon, reasoning, benchmark, evaluation, coordination
Relevance to AI Agents
This paper is highly relevant to AI agent systems research, with a focus on:
- multi-agent, agentic, long-horizon, reasoning, benchmark
Quick Reference
# View paper
open http://arxiv.org/abs/2604.13654
# Download PDF
open http://arxiv.org/pdf/2604.13654.pdf
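The `open` command above is macOS-specific. A minimal bash sketch, assuming a Linux desktop may provide `xdg-open` instead, builds the abstract and PDF URLs from an arXiv ID using the same URL patterns listed above and picks whichever opener is available. The function names `arxiv_url` and `open_url` are hypothetical and not part of any tool associated with the paper.

```shell
# Hypothetical helper: print the abstract or PDF URL for an arXiv ID,
# following the URL patterns in the Quick Reference.
arxiv_url() {
  local kind="$1" id="$2"
  case "$kind" in
    abs) printf 'http://arxiv.org/abs/%s\n' "$id" ;;
    pdf) printf 'http://arxiv.org/pdf/%s.pdf\n' "$id" ;;
    *)   echo "usage: arxiv_url abs|pdf <arxiv_id>" >&2; return 2 ;;
  esac
}

# Hypothetical helper: open a URL with xdg-open (common on Linux) or
# open (macOS). xdg-open is checked first because some Linux systems
# ship an unrelated `open` command.
open_url() {
  if command -v xdg-open >/dev/null 2>&1; then
    xdg-open "$1"
  elif command -v open >/dev/null 2>&1; then
    open "$1"
  else
    echo "$1"   # no opener found: print the URL instead
  fi
}
```

Usage: `open_url "$(arxiv_url abs 2604.13654)"` to view the paper, or `open_url "$(arxiv_url pdf 2604.13654)"` for the PDF.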
Auto-generated from arXiv on 2026-04-17
Activation Keywords
- "vision-and-language-navigation-for-uavs-progress-c"
- "vision and language navigation for uavs progress c"
- "use vision and language navigation for uavs progress c"
- "vision and language navigation for uavs progress c help"
- "vision and language navigation for uavs progress c tool"
Tools Used
- Read: Read existing files and documentation
- Write: Create new files and documentation
- Bash: Execute commands when needed
Instructions for Agents
- Identify user's intent and specific requirements
- Gather necessary context from files or user input
- Execute appropriate actions using available tools
- Provide clear results and suggest next steps
Examples
Basic usage
User: "Help me with vision and language navigation for uavs progress c"
→ Understand requirements → Execute actions → Provide results
Advanced usage
User: "I need detailed vision and language navigation for uavs progress c assistance"
→ Clarify scope → Provide comprehensive solution → Follow up