arXiv Papers on Unmanned Aerial Vehicles
Authors: Wei Sun, Weixia Zhang, Hongjian Zhan, Mingkai Lu, Yixuan Gao, Guangtao Zhai
Abstract: We present DroneIQA‑VLE, our solution to the ICME 2026 Drone‑IQA Grand Challenge on Target‑aware Image Quality Assessment for Low‑altitude UAV Images. The framework jointly predicts global, target, and background quality scores by ensembling two complementary pipelines: (1) SigLIP2 vision encoders with multi‑task regression heads, and (2) a LoRA‑adapted Qwen3.5‑9B multimodal large language model for quality score regression. The final global quality prediction is obtained by arithmetically averaging the outputs of both pipelines. Our method achieves 2nd place in the challenge, demonstrating its effectiveness. The code is available at https://github.com/sunwei925/DroneIQA‑VLE.
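The ensembling step described above (arithmetic averaging of the two pipelines' global quality predictions) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example scores are hypothetical, and the abstract does not say how the target and background scores are combined, so only the global score is shown.

```python
from statistics import fmean


def ensemble_global_score(siglip_score: float, mllm_score: float) -> float:
    """Arithmetic mean of the two pipelines' global-quality predictions."""
    return fmean([siglip_score, mllm_score])


def ensemble_batch(siglip_scores, mllm_scores):
    """Average the two pipelines image by image (hypothetical helper)."""
    if len(siglip_scores) != len(mllm_scores):
        raise ValueError("both pipelines must score the same images")
    return [ensemble_global_score(a, b) for a, b in zip(siglip_scores, mllm_scores)]


if __name__ == "__main__":
    # Made-up scores for two images, one list per pipeline.
    print(ensemble_batch([3.0, 4.2], [4.0, 4.0]))
```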
Authors: Zhongqiang Song, Guanying Chen, Yuqi Zhang, Yin Zou, Chuanyu Fu, Zhiyuan Yuan, Chuan Huang, Shuguang Cui, Xiaochun Cao
Abstract: This paper addresses the problem of monocular metric depth estimation in aerial UAV imagery. Although recent data‑driven methods have achieved remarkable progress in ground‑level scenarios, models trained primarily on street‑view and indoor datasets exhibit significant domain gaps when applied to aerial viewpoints. To tackle these challenges, we introduce AerialMetric, a benchmark dataset designed to evaluate and facilitate the adaptation of monocular metric depth estimation under UAV aerial viewpoints. The dataset consists of four complementary subsets collected from different sources, jointly covering real‑world photogrammetry data, controlled aerial acquisition settings, photorealistic synthetic scenes, and in‑the‑wild Internet imagery. In total, AerialMetric provides 52K real‑world and 16K synthetic image‑depth pairs with reliable metric ground truth. Based on this dataset, we conduct systematic evaluations of existing state‑of‑the‑art models under aerial settings and investigate the impact of viewpoint, altitude, and camera parameters on metric depth prediction. In addition, by fine‑tuning a representative metric depth model on our dataset, we establish a comprehensive aerial benchmark and achieve state‑of‑the‑art performance across diverse aerial imagery. Our dataset, code, and model weights are publicly available at https://kuieless.github.io/AerialMetric‑ECCV2026‑page/.
Authors: Haoyu Zhang, Meng Liu, Qianlong Xiang, Kun Wang, Yaowei Wang, Liqiang Nie
Abstract: Spatial intelligence is essential for low‑altitude unmanned aerial vehicle (UAV) perception, collaboration, and navigation. However, existing UAV benchmarks often emphasize image‑level recognition, single‑view understanding, or narrow answer formats, leaving 3D spatial inference, multi‑view collaboration, scene dynamics, and diverse task formulations insufficiently evaluated. To address these gaps, we introduce SpatialUAV, a real low‑altitude UAV benchmark comprising 4,331 curated instances across 14 fine‑grained task types, covering semantic discrimination, spatial relation, aerial–aerial collaboration, aerial–ground collaboration, and motion understanding. SpatialUAV organizes all samples into a unified visual‑input–question–answer schema, while supporting seven input configurations and nine answer formats, including option labels, region identifiers, geometric values, cross‑view correspondences, and free‑form motion descriptions. To ensure reliable and grounded evaluation, our data construction pipeline integrates detector‑assisted regions, depth supervision, metadata‑derived rules, extensive manual annotation, blind filtering, and multi‑turn human validation, together with task‑specific metrics for heterogeneous outputs. Evaluating representative vision‑language models across three categories, we show that current models remain far from human‑level performance, with pronounced bottlenecks in cross‑view association, structured grounding, geometric reasoning, and temporal viewpoint understanding. These results offer empirical guidance for advancing low‑altitude UAV spatial intelligence. Code and data are available at https://github.com/Hyu‑Zhang/SpatialUAV.
Authors: Jingfeng Mao, Xuyang Chen, Qilin Zhang, Oussema Dhaouadi, Guangming Wang, Brian Sheil, Daniel Cremers, Yan Xia, Olaf Wysocki
Abstract: Aerial 6DoF localization typically relies on precise GNSS signals or radiometrically rich 3D reconstructions, limiting scalability and on‑board deployment. We propose SemCityLoc, a semantic‑geometric alignment system that reframes aerial pose estimation as structured surface registration between foundation‑model‑derived visual priors and standardized LoD‑compliant 3D city models. Instead of matching sparse contours or dense texture, our method aligns semantic surfaces and monocular depth with lightweight semantic 3D building models, increasing pose discriminability in repetitive and occluded urban environments. To enable accurate evaluation, we introduce SemCityLockeD, the first real‑world benchmark combining centimeter‑accurate UAV poses with standardized LoD1–LoD3 semantic city models and challenging low‑altitude imagery. Experiments demonstrate substantial improvements over existing map‑based approaches, improving recall by up to 36% and reducing mean positional error from 9.89 m to 2.62 m in challenging urban canyons. Our results indicate that semantically structured geometry provides sufficient and scalable constraints for high‑precision aerial localization without radiometric scene reconstructions. The code and data are available at https://albertchen98.github.io/SemCityLoc.
Authors: Fenghe Guo, Runjie Shen, Chenyang Sun, Junrui Zhang, Quanxi Zhan, Yongchun Wang, Junjie Zhang
Abstract: Hydropower tunnel inspection is critical for infrastructure integrity yet remains inefficient and hazardous using manual methods. We propose FLISP (Fast LiDAR‑IMU Synchronized Path Planner), a mapless planning framework for cooperative UGV‑UAV inspection. Unlike traditional map‑based paradigms, FLISP features three core contributions: (1) a unified architecture where a single UGV‑mounted LiDAR‑IMU suite drives synchronized path generation for both platforms; (2) platform‑specific solvers utilizing an enhanced Firefly Algorithm for UGV obstacle avoidance and a dynamic iterative optimizer for UAV flight; and (3) a hierarchical refinement strategy ensuring kinematic feasibility without state estimation drift. Benchmarks in a 1.2 km operational tunnel demonstrate that FLISP circumvents structural bottlenecks of map‑based methods, eliminating map rasterization overhead (Fast‑LIO2 + A*) and sampling instability (LIO‑SAM + RRT). FLISP achieves a 100% success rate with 7 ms latency, representing a 7‑fold speedup over grid‑based and a three‑order‑of‑magnitude improvement over sampling‑based baselines. Validated in operational hydropower tunnels, this approach offers a scalable solution for robotic inspection in feature‑degraded linear infrastructure. A demonstration video is available at https://youtu.be/Y_ezs1PfLJ4, and the code at https://github.com/ArchibaldGuo/FLISP.git.
Authors: Abhishek Phadke, Karthik Kumar Vasudeva, Abhishek Joshi
Abstract: The initial development phase of UAV swarms largely depends on simulation for experimental design and validation, yet existing open‑source tools are often unmaintained, have steep learning curves, or are built around a single fixed scenario. The need for a comprehensive, modular simulation platform is a recognized research gap. This paper presents SwarmFly, a MATLAB‑based simulation and test platform for multi‑UAV swarms that addresses these gaps. SwarmFly combines a real‑time operational map, four swarm coordination modes (leader‑follower, decentralized, heterogeneous relay, and heterogeneous speed), simulated IMU telemetry, and IP‑based geolocation with a plugin architecture that lets researchers add behaviors, fault models, and analysis tools without touching the core code. Eight bundled plugins extend the base simulator into a full test harness. The SwarmFly platform exposes multi‑agent aerial swarms to a wide range of internal and external disruptions, enabling observation and quantification of underlying swarm control and behavioral mechanisms. This study verifies and characterizes each subsystem through eight experiments that measure formation accuracy, wind tolerance, fault recovery, energy endurance, and airspace compliance. The platform runs entirely in MATLAB. Its modular design supports straightforward extension toward hardware‑in‑the‑loop testing, larger swarms, and higher‑fidelity dynamics. An open‑source release is available at https://github.com/abhishekphadke/SwarmFly.git.
Authors: Fanfu Xue, En Yu, Yantian Shen, Zhikun Hu, Hongjun Wang, Yang Yang, Xindi Wang, Jiande Sun
Abstract: UAV Vision‑Language Navigation (UAV‑VLN) is typically formulated as a holistic search‑and‑reach problem, where long‑range target discovery and final target approach are optimized and evaluated jointly. This formulation makes it difficult to assess a critical capability of aerial embodied agents, namely whether a UAV can accurately ground a visible target and translate vision‑language evidence into precise 3D motion once the target enters its field of view. To address this limitation, we introduce UAV‑VLN‑FOV, a target‑visible navigation task that isolates the see‑and‑reach stage and enables a more diagnostic evaluation of terminal reaching ability. We further propose 3DG‑VLN, a vision‑language waypoint prediction framework guided by dynamic 3D direction cues to enhance fine‑grained visual grounding and spatial direction alignment for precise target reaching. Specifically, 3DG‑VLN adaptively processes high‑resolution front‑view and downward‑view observations to preserve fine‑grained visual and geometric details for target grounding. It also updates the target‑relative direction online during closed‑loop navigation, allowing the agent to maintain spatial alignment with the target and reduce accumulated direction drift. To support this task, we construct a dedicated high‑resolution benchmark which contains 2,717 trajectories with target‑oriented high‑level instructions, high‑resolution front‑view and downward‑view egocentric observations, and continuous 3D waypoint annotations. Experiments show that 3DG‑VLN outperforms competitive UAV‑VLN baselines, achieving a 13.82% improvement in success rate. Real‑world trials further demonstrate the potential of 3DG‑VLN for practical see‑and‑reach navigation. The source code and benchmark are available at https://github.com/xuefanfu/3DG‑VLN.
Authors: Guo Pu, Yixuan Han, Haofeng Li, Yao Zhang, Hui Zhou, Zhouhui Lian
Abstract: Online 3D reconstruction from monocular image sequences is a challenging and ongoing research topic. 3D Gaussian Splatting (3DGS), leveraging its high‑quality real‑time rendering capability, empowers online 3D reconstruction to represent dense scenes with enhanced expressiveness, and thus holds great promise for a wide range of applications such as robotics and AR/VR. However, existing online 3DGS methods still suffer from some key challenges: fragile camera pose estimation due to the lack of global optimization, and low optimization efficiency in large‑scale or long‑sequence scenarios. To address these issues, we propose a robust and efficient online voxelized 3DGS reconstruction framework integrated with global Sim(3) optimization, which enables reliable camera tracking and efficient global loop closure for both camera poses and voxelized 3DGS. To accelerate the convergence of the voxelized 3DGS, we further introduce a color residual learning strategy, which not only boosts optimization speed but also enhances rendering quality. Extensive experiments on diverse indoor and outdoor datasets demonstrate that our method achieves state‑of‑the‑art performance in both camera pose estimation accuracy and rendering quality, while retaining real‑time efficiency. Additionally, we develop and deploy a real‑world UAV‑based active reconstruction system grounded on our proposed method, validating its robustness and generalizability for practical online 3D reconstruction tasks. Our code and data are available at https://github.com/TrickyGo/MoonSplat.
Authors: Sripath Mishra, Bharat Bhargava, Zizheng Liu, Shafkat Islam
Abstract: Intrusion detection systems (IDS) trained on wired‑network benchmarks degrade sharply in real‑world unmanned aerial vehicle (UAV) swarms, where mobility, fluctuating link quality, and decentralized routing reshape traffic distributions. Existing UAV‑specific datasets also do not systematically vary these conditions, leaving no way to train or test an IDS against the very shift that defeats it. We present UAV‑CAS, a large‑scale labeled flow dataset for UAV‑network intrusion detection, generated by a Containernet digital twin that is systematically calibrated against AERPAW testbed measurements. The calibration pipeline has four layers, spanning altitude‑dependent path loss, mission‑specific mobility, the link‑level performance chain, and end‑to‑end trace fidelity. UAV‑CAS comprises 99,492 flows drawn from 1,024 configurations that span five attack families (DoS, DDoS, blackhole, wormhole, replay) and nine collaborative attack compositions. A diversity analysis shows that high‑rate attacks separate from benign traffic up to an order of magnitude more strongly than in any prior benchmark, while stealth attacks deliberately blend with benign traffic. Across ten baseline IDS, binary attack detection saturates above 0.98, confirming the dataset is learnable, whereas full attack‑class identification remains hard: per‑class F1 ranges from near zero to 0.82 and falls into the single digits for stealth attacks. We release the dataset, simulator, and calibration data to support reproducible UAV intrusion‑detection research.
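The per‑class F1 metric mentioned above can be computed as in the following sketch. It uses the standard definition F1 = 2TP / (2TP + FP + FN) for each class; the function name and the toy labels are illustrative assumptions and are not taken from the UAV‑CAS release.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {class: F1} using F1 = 2*TP / (2*TP + FP + FN) per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        scores[cls] = 2 * tp[cls] / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Toy labels: one replay flow is mistaken for benign traffic.
    truth = ["benign", "dos", "replay", "replay"]
    preds = ["benign", "dos", "benign", "replay"]
    print(per_class_f1(truth, preds))
```

In this toy case every attack flow but one is still flagged as non‑benign, so a binary detector would look strong while the per‑class scores for `benign` and `replay` drop, which is the kind of gap the abstract describes.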
Authors: Isai Daniel Chacón, Zhongqi Miao, Bruno Demuro, Caleb Robinson, Rahul Dodhia, Lasha Otarashvili, Jason Holmberg, Kirk Larsen, Howard Frederick, Nathan J. Pamperin, Pablo Arbeláez, Juan M. Lavista Ferres
Abstract: Automated aerial wildlife surveys increasingly rely on deep learning, yet standard object detectors require bounding‑box annotations, reported to be up to seven times slower and three times more expensive to produce than point‑level labels. To address this bottleneck, we introduce the Overhead Wildlife Locator (OWL), a weakly supervised density‑estimation framework with three variants: OWL‑C, a fully convolutional model for high‑throughput screening; OWL‑T, a Swin‑augmented hybrid for heterogeneous, cluttered scenes; and OWL‑D, built on a frozen DINOv3 ViT‑H+/16 encoder with a DPT‑style fusion decoder. We benchmark all three against POLO, YOLOv11n, and YOLOv11l across five public aerial datasets, from sparse fixed‑wing savanna surveys to dense UAV paddock imagery, and against the published HerdNet baseline on its native Delplanque split. OWL‑D sets a new state of the art on Delplanque (0.934 AP vs. HerdNet's 0.840) and records the highest AP on four of the five datasets. Performance is regime‑dependent: on the extreme‑density SheepCounter UAV dataset the hybrid OWL‑T leads (0.978 AP) and the convolutional variants attain the lowest counting error, whereas the foundation‑based OWL‑D degrades, indicating which variant suits which survey type. We further validate operational readiness on the Alaska Department of Fish and Game's 2022 Central Arctic Caribou census: under cross‑herd and cross‑temporal transfer, OWL‑C fine‑tuned on the 2017 Porcupine Caribou Herd split attains F1 = 0.965 on a held‑out patch test set, with a signed count error of +3.1% aggregated across the released test patches. We release the OWL code, model weights, and the annotated Porcupine Caribou Herd 2017 (PCH) and Central Arctic Herd 2022 (CAH) patches, the first open patch‑level datasets for large‑scale caribou aerial surveys, at https://github.com/microsoft/MegaDetector‑Overhead.
Authors: Marius Bayizere
Abstract: Unmanned Aerial Vehicle (UAV) threats have emerged as a defining security challenge of the 21st century. This paper presents DroneShield‑AI, a unified open framework integrating six processing layers: RF signal classification, acoustic motor‑signature detection, YOLOv8‑based visual detection, evidence‑weighted sensor fusion, a Behavioral Intent Classification Engine (BICE), and a Graph Neural Network Swarm Intelligence Module (GNN‑SIM). BICE introduces the first systematic six‑class threat taxonomy for drone flight patterns, enabling predictive operator alerts with a 30‑second advance‑warning horizon. GNN‑SIM is the first open framework for adversarial multi‑drone formation analysis using Graph Attention Networks. Evaluated on three publicly available real‑world datasets, the fused pipeline achieves 96.1% detection accuracy, a 3.2% false alarm rate, an AUC‑ROC of 0.981, and 142 ms end‑to‑end latency on commodity CPU‑class hardware at approximately 500‑780 USD total system cost. All code, model weights, and simulation datasets are publicly released at submission.
Authors: Yanyan Chen, Ruigang Fu, Yu Song, Ping Zhong
Abstract: Severe image degradation under low‑light nighttime conditions constitutes a core bottleneck preventing all‑day applications for UAV‑based single object tracking. Existing image enhancement methods often struggle to distinguish between target and background regions, which can easily lead to amplified background noise or compromise target features. To overcome this limitation, we propose TAE, a target‑aware low‑light enhancement framework tailored for nighttime object tracking. Guided explicitly by weak supervisory signals from tracking bounding boxes, the framework performs region‑aware enhancement to ensure operations focus on the target area. It further adopts an adaptive RGB multi‑curve fusion mechanism to achieve refined modeling and adaptive adjustment across different regions. To facilitate research in this domain, we also contribute DarkSOT, a new benchmark for nighttime UAV tracking, comprising 268 sequences across 9 target categories. Experimental results on DarkSOT and UAVDark135 demonstrate that TAE significantly improves tracking performance in low‑light nighttime scenarios, exhibiting strong robustness and generalization. The DarkSOT dataset is available at https://github.com/Fu0511/DarkSOT‑Dataset.
Authors: Hongtao Yang, Bineng Zhong, Qihua Liang, Yaozong Zheng, Xiantao Hu, Yuanliang Xue, Shuxiang Song
Abstract: Given the real‑time demands of UAV tracking, many methods simplify the backbone to reduce computation, but this often weakens feature representation and degrades performance in complex scenarios. To alleviate this issue, we propose EATrack, an efficient and asymmetric UAV tracking framework centered around a teacher‑guided dual‑branch distillation strategy that enhances the feature expressiveness of the lightweight student model. Specifically, EATrack investigates two complementary perspectives of knowledge transfer: spatially focused feature‑level distillation that compensates for weakened representations by guiding the student to learn strong target representations, and prediction‑level distillation that enhances spatial localization by learning the teacher's capability for accurate target localization. Furthermore, to enhance robustness against appearance variations, we introduce a fine‑grained target‑aware distillation strategy that selectively transfers the teacher's target modeling capacity to the student. A temporal adaptation module is incorporated at inference to enhance robustness over time. Experiments on five UAV benchmarks demonstrate that EATrack achieves a favorable balance between accuracy and speed. Code: https://github.com/GXNU‑ZhongLab/EATrack
Authors: Hongyu Ding, Sizhuo Zhang, Ziming Xu, Jinwen Guo, Hongxiu Liu, Xingzhi Cheng, Zixuan Chen, Haifei Qi, Duo Wang, Hao Xu, Jieqi Shi, Yifan Zhang, Jing Huo, Jian Cheng, Yang Gao, Jiebo Luo
Abstract: Embodied navigation requires an agent to map language and visual observations to a stream of spatial actions that drive a real robot through environments it has never seen. The dominant approach has been to scale vision‑language‑action (VLA) foundation models on ever‑larger collections of robot trajectories. This paper argues that, for navigation specifically, generality can be obtained structurally, not only through data scale. The underlying decision structure of navigation reduces to a single Language‑Vision‑Robot Actions Translation. The language action emits a semantic‑level directional command and the vision action emits a pixel‑level visual target. Both outputs lie inside the natural output manifold of pretrained multimodal large language models (MLLMs), so the task can be reasoned about by an agent rather than learned from robot data. Therefore, we present Uni‑LaViRA, a unified agentic architecture that extends the same insight to four task families (VLN‑CE, ObjectNav, EQA, and Aerial‑VLN) and to four heterogeneous real robots (a wheeled robot, a quadruped, a humanoid, and a self‑built UAV) in a zero‑shot manner. Two agent‑loop mechanisms make this unification practical. TODO List Memory (TDM) rewrites a structured checklist of pending sub‑goals at every step, reciting the unfinished items back into the agent's most recent attention window. Second Chance Backtrack (SCB) rolls the robot back to the pre‑error state and conditions the agent's next plan on the failed sub‑trajectory, turning single‑pass navigation into a self‑correcting process. With zero training effort, Uni‑LaViRA reaches 60.7% SR on VLN‑CE R2R, 51.3% on VLN‑CE RxR, 77.7% on HM3D‑v2, 60.0% on HM3D‑OVON, 54.7% on MP3D‑EQA, and 40.0% on OpenUAV, matching or even surpassing recent trained navigation foundation models that consume millions of samples and thousands of GPU‑hours.
Authors: Deyi Zhu, Yuji Wang, Yong Liu, Yansong Tang, Bingyao Yu, Jiwen Lu, Jie Zhou
Abstract: Traditional visual object tracking (VOT) methods typically rely on task‑specific supervised training, limiting their generalization to unseen objects and challenging scenarios with distractors, occlusion, and nonlinear motion. Recent vision foundation models, exemplified by SAM 2, learn strong video understanding priors from large‑scale pretraining and offer a promising foundation for building more robust and generalizable trackers. However, directly applying SAM 2 to VOT remains suboptimal, as it does not explicitly model target motion dynamics or enforce geometric and semantic consistency across frames, both of which are essential for reliable tracking. To address this issue, we propose SAMOSA, a new tracking framework that adapts SAM 2 to complex VOT scenarios by explicitly leveraging motion, geometry, and semantic cues. Specifically, we introduce a lightweight nonlinear motion predictor to model target dynamics and guide mask selection as well as memory filtering. We further exploit semantic cues to detect target shifts and recover from tracking failures, while geometric cues are incorporated as structural constraints to improve tracking stability. In this way, SAMOSA bridges the gap between the implicit video understanding prior of SAM 2 and explicit tracking‑oriented modeling. Extensive experiments show that SAMOSA consistently outperforms state‑of‑the‑art SAM 2‑‑based approaches on general benchmarks, demonstrates stronger generalization than supervised VOT methods, and achieves substantial gains on anti‑UAV datasets, which typify complex nonlinear motion scenarios. Our code is available at https://github.com/DurYi/SAMOSA.
Authors: Xiang Yang, Yongli Wang, HaiFeng Li, Yunsheng Zhang
Abstract: Feed‑forward 3D reconstruction has advanced rapidly, but current models remain unreliable in UAV photogrammetric acquisition. We argue that this failure is caused not only by appearance‑domain shift, but also by UAV‑specific camera‑geometry variations, especially oblique views and HFOV‑height ambiguity. Existing UAV datasets mainly emphasize scene diversity and provide limited coverage of camera configurations, which restricts robustness evaluation and UAV‑domain adaptation. To address this gap, we introduce UAVFF3D, a geometry‑aware real‑synthetic benchmark for feed‑forward UAV 3D reconstruction. UAVFF3D contains more than 170k real UAV images and more than 370k synthetic images rendered from high‑quality textured 3D models, covering diverse HFOVs, flight altitudes, viewing directions, and acquisition patterns. It also includes a controlled HFOV‑height test subset for diagnosing projection‑geometry ambiguity. We further propose an evaluation protocol that jointly assesses camera‑geometry estimation and dense scene reconstruction under a shared global alignment, avoiding the bias caused by separate camera and geometry alignments. Experiments on representative feed‑forward reconstruction models show that UAVFF3D‑based domain adaptation consistently improves camera and geometry estimation, reducing Ray Error by up to 84.2%, Pose ATE by up to 76.0%, and Chamfer Distance by up to 41.1%. In oblique scenes, adaptation reduces the oblique‑nadir rotation gap by up to 90.7%. Under HFOV‑height ambiguity, it improves robustness across HFOV‑height configurations and yields more stable performance across HFOV settings. Incorporating camera priors further improves reconstruction under UAV‑specific acquisition geometries. The dataset and evaluation code are available at https://github.com/yanxian‑ll/UAVFF3D.
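For readers unfamiliar with the Chamfer Distance metric cited above, the sketch below shows one common symmetric form: the average of the mean nearest‑neighbour Euclidean distances in both directions. This is an assumption for illustration only; the exact variant used in the UAVFF3D protocol (for example squared distances, or a sum instead of an average) is not stated in the abstract, and the brute‑force search here is only suitable for small point sets.

```python
import math


def _mean_nn_distance(src, dst):
    """Mean distance from each point in src to its nearest neighbour in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance between two non-empty 3D point lists."""
    if not cloud_a or not cloud_b:
        raise ValueError("both point clouds must be non-empty")
    return 0.5 * (_mean_nn_distance(cloud_a, cloud_b)
                  + _mean_nn_distance(cloud_b, cloud_a))


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(chamfer_distance(predicted, reference))
```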
Authors: Jianlin Ye, Christos Kyrkou, Panayiotis Kolios
Abstract: The integration of Unmanned Aerial Vehicles (UAVs) into Intelligent Transportation Systems (ITS) offers synoptic visibility for traffic monitoring, yet scalable deployment is hindered by trajectory fragmentation, where vehicle identity persistence is lost across multi‑UAV Fields of View (FOV). While state‑of‑the‑art frameworks excel in optimizing local trajectory extraction and stability for single‑drone imagery, they often function as isolated data silos that generate disjointed trajectories, thereby precluding network‑level analysis such as Origin‑Destination estimation. This paper presents a real‑time Multi‑Camera Multi‑Vehicle Tracking (MCMT) system designed to handle global identity persistence. Addressing the visual ambiguity and computational cost of appearance‑based Re‑Identification (Re‑ID) in nadir views, we introduce a lightweight Topology‑Based Spatiotemporal Handover mechanism. We implement a high‑throughput parallel pipeline leveraging YOLO11 and ByteTrack to process concurrent 4K streams. Our core contribution is a deterministic queue‑based matching algorithm that utilizes geometric overlaps and virtual lane discretization to predictively manage identity handover via FIFO queues. Experimental results on complex urban environments, including intersections and merging traffic, demonstrate a Handover Success Rate (HOSR) of 99.8% in continuous traffic flows, significantly outperforming Re‑ID baselines (74.1%) while validating edge deployment feasibility. The source code is available at https://github.com/JYe9/multi‑camera‑multi‑vehicle‑tracking‑system.
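The idea of lane‑wise FIFO identity handover can be illustrated with the sketch below. It is a simplified reading of the abstract, not the authors' algorithm: the class and method names are hypothetical, lane assignment is assumed to be given, and vehicles are assumed to keep their order and lane while crossing the overlap between two camera views.

```python
from collections import deque
from itertools import count


class LaneHandover:
    """One FIFO queue of global IDs per virtual lane (illustrative only)."""

    def __init__(self, num_lanes: int):
        self._queues = [deque() for _ in range(num_lanes)]
        self._next_id = count(1)

    def new_global_id(self) -> int:
        """Allocate an ID for a vehicle first seen by any camera."""
        return next(self._next_id)

    def on_exit_upstream(self, lane: int, global_id: int) -> None:
        """A tracked vehicle leaves the upstream view through this lane."""
        self._queues[lane].append(global_id)

    def on_enter_downstream(self, lane: int) -> int:
        """A new track appears downstream: reuse the oldest pending ID."""
        queue = self._queues[lane]
        return queue.popleft() if queue else self.new_global_id()


if __name__ == "__main__":
    handover = LaneHandover(num_lanes=2)
    first, second = handover.new_global_id(), handover.new_global_id()
    handover.on_exit_upstream(0, first)
    handover.on_exit_upstream(0, second)
    print(handover.on_enter_downstream(0), handover.on_enter_downstream(0))
```

Because matching is a queue pop rather than an appearance comparison, the cost per handover is constant, which is consistent with the abstract's emphasis on a lightweight, deterministic mechanism.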
Authors: Yijun Lu, Zilei Yang, Yuyin Ma
Abstract: While Isaac Lab provides massive parallel UAV simulation, OmniSafe and safe‑control‑gym provide constrained‑RL benchmarks, and CBFKit provides control‑barrier‑function synthesis tooling, no existing framework unifies these capabilities for end‑to‑end safety‑constrained training. ParallelCBF is the first framework to unify (i) tensor‑parallel UAV environments, (ii) hard‑gate CBF safety filters, (iii) sharded BC‑to‑RL pipelines, and (iv) first‑class operational auditability: pre‑registration, watchdog registries, failure forensics, and dataset audits as composable APIs rather than user‑implemented scripts. We release ParallelCBF v0.1.0 under Apache 2.0 with a four‑layer composable API, a CPU PyTorch reference implementation of a dual‑barrier (squared / linear‑predictive) CBF, property‑based safety invariance tests across vectorized batch sizes that complete in 1.67 s for the full 39‑test suite, and a 31,415‑episode behavior‑cloning collection campaign whose curriculum mix, per‑bucket yields, and dataset SHA‑256 are auditable through the framework's own ops primitives. We report a representative end‑to‑end pipeline execution in which the framework's auditability layer halted a downstream training stage that did not meet pre‑registered convergence criteria, preventing silent propagation of a degraded checkpoint, an architectural property we argue is necessary, not merely useful, for reproducible empirical robotics research. The framework is installable via pip install parallelcbf; source and release artifacts are available at https://github.com/xiaoyang‑123‑cell/ParallelCBF.
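As background on what a CBF safety filter does, the sketch below implements the textbook minimum‑norm filter for a single‑integrator model (velocity command u, dynamics x' = u) with one squared‑distance barrier h(x) = ||x - c||^2 - r^2. It does not use or reproduce the ParallelCBF API: the function name, the parameters, and the choice of a single barrier with a linear class‑K term alpha * h are assumptions made for illustration.

```python
def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that grad_h . u + alpha * h >= 0 holds.

    h(x) = ||x - center||^2 - radius^2 is positive outside the obstacle.
    With a single affine constraint the min-norm solution is a closed-form
    projection, so no QP solver is needed in this sketch.
    """
    offset = [xi - ci for xi, ci in zip(x, center)]
    h = sum(o * o for o in offset) - radius ** 2
    grad_h = [2.0 * o for o in offset]
    margin = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    if margin >= 0.0:
        return list(u_nom)  # nominal command already satisfies the barrier
    norm_sq = sum(g * g for g in grad_h)
    if norm_sq == 0.0:
        raise ValueError("state is at the obstacle center; filter undefined")
    scale = margin / norm_sq
    return [u - scale * g for u, g in zip(u_nom, grad_h)]


if __name__ == "__main__":
    # Flying straight at a unit-radius obstacle from 2 m away at 5 m/s.
    print(cbf_filter(x=(2.0, 0.0), u_nom=(-5.0, 0.0),
                     center=(0.0, 0.0), radius=1.0))
```

In the example the approach speed is reduced just enough to make the barrier condition hold with equality, while a command that already moves away from the obstacle passes through unchanged.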
Authors: Jianping Li, Pengfei Wan, Zhongyuan Liu, Yi Wang, Yiheng Chen, Xinhang Xu, Rui Jin, Boyu Zhou, Lihua Xie
Abstract: Efficient UAV exploration in unknown environments requires rapid coverage expansion while maintaining accurate and reliable localization, since safe navigation in complex scenes depends on consistent mapping and pose estimation. However, for conventional LiDAR‑equipped UAVs, the observable region is tightly coupled with the UAV pose and motion. Expanding coverage often requires additional translational or rotational maneuvers, which can reduce exploration efficiency and increase the risk of localization degradation in geometrically challenging environments. Motorized rotating LiDARs provide a promising solution by actively adjusting the sensor viewing direction without changing the UAV motion, thereby introducing an additional sensing degree of freedom. Nevertheless, existing exploration systems rarely exploit this scanning freedom as an explicit decision variable linked to both exploration progress and localization quality. To address this gap, we develop a UAV platform equipped with an independently actuated rotating LiDAR and propose a hierarchical exploration framework. The global planner organizes frontiers into representative viewpoints and sequences them using topology‑aware transition costs. Built upon this planner, FU‑MPC serves as a local receding‑horizon scan controller that optimizes LiDAR rotation along the predicted flight trajectory. The controller jointly considers frontier‑aware exploration utility and direction‑dependent localization uncertainty, while lightweight surrogate evaluation enables real‑time onboard execution. Experiments in complex environments demonstrate that the proposed system improves exploration efficiency while maintaining robust localization performance compared with fixed‑pattern scanning and uncertainty‑only baselines. The project page can be found at https://kafeiyin00.github.io/FU‑MPC/.
Authors: Weiqi Yan, Lixin Chen, Xiangrui Hou, Zhipeng Cai, Youbiao Wang, Yangyang Shi, Yu Zang, Cheng Wang
Abstract: Tiny UAV detection from an onboard event camera is difficult when the observer and target move at the same time. In this motion‑on‑motion regime, ego‑motion activates background edges across buildings, vegetation, and horizon structures, while the UAV may appear as a sparse event cluster. Unlike static‑ or ground‑observer event‑based UAV detection, onboard UAV‑view detection breaks the clean‑background assumption because sensor ego‑motion can activate dense background events over the entire field of view. To explore this practical problem, we present M^2E‑UAV, to the best of our knowledge, the first onboard UAV‑view motion‑on‑motion event‑based dataset and benchmark for tiny UAV detection, where both the sensing platform and the target UAV are moving. M^2E‑UAV provides synchronized event streams and IMU measurements collected from an onboard sensing platform, together with event‑level UAV foreground labels derived from temporally propagated 10 Hz bounding‑box annotations. The processed benchmark contains 87,223 training samples and 21,395 validation samples across four scene families: sunny building‑forest, sunny farm‑village, sunset building‑forest, and sunset farm‑village. We define a train/validation split and an evaluation protocol for comparing representative existing baselines across event‑frame, voxel‑grid, and point‑set representations, with optional IMU input. The benchmark results show that existing baselines remain limited under sparse tiny‑target evidence and dense ego‑motion‑induced background events. Code and benchmark files will be released at https://github.com/Wickyan/M2E‑UAV.
Authors: Gabriel Jeanson, David-Alexandre Duclos, William Larrivée-Hardy, Noé Cochet, Matěj Boxan, Anthony Deschênes, François Pomerleau, Philippe Giguère
Abstract: Sustainable forest management relies on precise species composition mapping, yet traditional ground surveys are labour‑intensive and geographically constrained. While Uncrewed Aerial Vehicles (UAVs) offer scalable data collection, the transition to deep learning‑based interpretation is bottlenecked by the severe scarcity of expert‑annotated imagery, particularly in complex, visually heterogeneous regeneration zones. This paper addresses the dual challenges of data scarcity and extreme class imbalance in the semantic segmentation of fine‑grained forest regeneration species by providing a scalable framework that reduces reliance on manual photo‑interpretation for high‑resolution, millimetre‑level aerial imagery. Importantly, we leverage the large‑scale vision‑language Nano Banana Pro model to simultaneously generate high‑fidelity images and their corresponding pixel‑aligned semantic masks from prompts. We introduce WilDReF‑Q‑V2, an expansion of a natural forest dataset with 13 977 new unlabelled and 50 labelled real images, as well as the Gen4Regen dataset, featuring 2101 pairs of synthetic images and semantic masks. Our methodology integrates real‑world data with AI‑generated images, highlighting that AI‑generated data is highly complementary to real‑world data, with unified training yielding an F1 score improvement of over 15 %pt compared to purely supervised baselines. Furthermore, we demonstrate that even small quantities of prompt‑generated data significantly improve performance for underrepresented species, some of which saw per‑species F1 score gains of up to 30 %pt. We conclude that vision‑language models can serve as agile data generators, effectively bootstrapping perception tasks for niche AI domains where expert labels are scarce or unavailable. Our datasets, source code, and models will be available at https://norlab‑ulaval.github.io/gen4regen.
Authors: Hadi Hajieghrary, Paul Schmitt
Abstract: Abrupt cable severance in multi‑UAV slung‑load transport redistributes load and changes the active constraint set, leaving limited time for fault diagnosis and reconfiguration. Existing controllers rely on coordinated force allocation, peer‑state exchange, or fixed cable topology, and therefore lack a certified decentralized recovery mechanism for unannounced severance. We present a passive architecture that routes each vehicle's measured cable tension directly into its altitude thrust command, $T_i^{\mathrm{ff}} = T_i$, while a surrounding proportional‑derivative, anti‑swing, and projection cascade preserves local tracking feasibility. The main contribution is a conditional hybrid practical input‑to‑state‑stability certificate that composes a slack‑excursion‑bounded taut‑cable reduction, bounded post‑severance Lyapunov jumps, inter‑fault decay, and per‑fault‑cycle contraction $\rho \in (0,1)$ into an explicit recovery envelope under stated actuator, slack, and dwell assumptions. We validate the controller in Drake multibody simulation with five vehicles, a 10 kg payload, Kelvin‑Voigt cables, Dryden wind, and single‑ and dual‑severance schedules: the closed loop attains 0.312‑0.328 m RMSE, 76.1‑95.2 mm peak sag, and recovery within one payload‑pendulum period. Disabling the identity inflates cruise error by 34‑39% and peak sag by 3.6x‑4.0x, identifying local tension feed‑forward as the dominant passive recovery mechanism in the tested decentralized cascade.
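The tension feed‑forward idea in the abstract above can be illustrated with a minimal sketch. It assumes a point‑mass vehicle, a vertical cable, and equal load sharing among the remaining cables. The function name, the gains kp and kd, and the 1.5 kg vehicle mass are hypothetical and not taken from the paper. The anti‑swing and projection stages are omitted.

```python
G = 9.81  # gravitational acceleration in m/s^2


def altitude_thrust(mass, z_ref, z, vz_ref, vz, cable_tension,
                    kp=6.0, kd=4.0, use_feedforward=True):
    """Thrust command (N): gravity compensation + PD term + tension feed-forward.

    Hypothetical helper for illustration; gains and structure are assumptions.
    """
    pd_term = mass * (kp * (z_ref - z) + kd * (vz_ref - vz))
    # Feed-forward identity: the measured tension is added directly to the command.
    feedforward = cable_tension if use_feedforward else 0.0
    return mass * G + pd_term + feedforward


# Five vehicles share a 10 kg payload (assumed equal sharing); one cable is then severed.
tension_before = 10.0 * G / 5
tension_after = 10.0 * G / 4

# The vehicle is exactly on its altitude reference, so the PD term is zero.
thrust_before = altitude_thrust(1.5, 2.0, 2.0, 0.0, 0.0, tension_before)
thrust_after = altitude_thrust(1.5, 2.0, 2.0, 0.0, 0.0, tension_after)
```

In this toy setting the thrust command rises by exactly the tension increase as soon as it is measured, with no peer communication and without waiting for an altitude error to build up.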
Authors: Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu
Abstract: UAV‑ground visual tracking (UGVT) aims to simultaneously track the same object from both the UAV and the ground view. However, existing two‑stream methods suffer from isolated feature extraction and rely heavily on implicit appearance matching, which struggles to establish reliable correspondence under drastic view differences, leading to tracking unreliability. To address these limitations, we propose VL‑UniTrack, a fully unified framework enhanced by visual‑language prompts. By encoding features from both views within a single shared encoder, our method breaks the barrier of feature isolation to facilitate sufficient cross‑view interaction. To overcome the ambiguity caused by relying solely on appearance matching, we design a visual‑language geometric prompting module, which fuses language descriptions with visual features to generate learnable prompts. These prompts are then fed into our prompt‑guided cross‑view adapter module to enable sufficient cross‑view feature interaction and to guide the learning of view‑specific feature representations. Furthermore, a confidence‑modulated mutual distillation loss is proposed to regularize the training by mitigating noise propagation. Extensive experiments demonstrate that our method achieves state‑of‑the‑art performance on the latest benchmark. The code is available at https://github.com/xuboyue1999/VL‑UniTrack.git
Authors: Yupeng Gao, Tianyu Li, Guoqing Wang, Yang Yang
Abstract: Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi‑temporal imagery, moving beyond binary change masks toward semantic‑level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high‑resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype‑guided task‑adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross‑temporal interaction, disentangles task‑specific representations via multi‑head gating, and injects detection‑derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine‑grained spatial sensitivity. Furthermore, we construct UCCD, a large‑scale UAV‑based benchmark comprising 9,000 high‑resolution image pairs and 45,000 annotated sentences for urban construction monitoring. Extensive experiments on UCCD and WHU‑CDC demonstrate that PTNet consistently outperforms existing methods. The dataset and source code are publicly available at https://github.com/G124556/ptnet.
Authors: Lorenzo Beltrame, Jules Salzinger, Filip Svoboda, Phillipp Fanta-Jende, Jasmin Lampert, Radu Timofte, Marco Körner
Abstract: Shadows cast by terrain and tall structures remain a major obstacle for high‑resolution satellite image analysis, degrading classification, detection, and 3D reconstruction performance. Public resources offering geometry‑consistent paired shadow/shadow‑free satellite imagery are essentially missing, and most Earth‑observation datasets are designed for shadow detection or 3D modelling rather than removal. Existing deep shadow‑removal datasets either target ground‑level or aerial scenes or rely on unpaired and weakly supervised formulations rather than explicit satellite pairs. We address this gap with deSEO, a geometry‑aware and physics‑informed methodology that, to the best of our knowledge, is the first to derive paired supervision for satellite shadow removal from the S‑EO shadow detection dataset through a fully replicable pipeline. For each tile, deSEO selects a minimally shadowed acquisition as a weak reference and pairs it with shadowed counterparts using temporal and geometric filtering, Jacobian‑based orientation normalisation, and LoFTR‑RANSAC registration. A per‑pixel validity mask restricts learning to reliably aligned regions, enabling supervision despite residual off‑nadir parallax. In addition to this paired dataset, we develop a DSM‑aware deshadowing model that combines residual translation, perceptual objectives, and mask‑constrained adversarial learning. In contrast, a direct adaptation of a UAV‑based SRNet/pix2pix architecture fails to converge under satellite viewpoint variability. Our model consistently reduces the visual impact of cast shadows across diverse illumination and viewing conditions, achieving improved structural and perceptual fidelity on held‑out scenes. deSEO therefore provides the first reproducible, geometry‑aware paired dataset and baseline for shadow removal in satellite Earth observation.
Authors: Daoxuan Zhang, Ping Chen, Jianyi Zhou, Shuo Yang
Abstract: The rapid advancement of Multimodal Large Language Models (MLLMs) has empowered Unmanned Aerial Vehicles (UAVs) with exceptional capabilities in spatial reasoning, semantic understanding, and complex decision‑making, making them inherently suited for UAV Search and Rescue (SAR). However, existing UAV SAR research is dominated by traditional vision and path‑planning methods and lacks a comprehensive and unified benchmark for embodied agents. To bridge this gap, we first propose the novel task of Embodied Search and Rescue (ESAR), which requires aerial agents to autonomously explore complex environments, identify rescue clues, and reason about victim locations to execute informed decision‑making. Additionally, we present ESARBench, the first comprehensive benchmark designed to evaluate MLLM‑driven UAV agents in highly realistic SAR scenarios. Leveraging Unreal Engine 5 and AirSim, we construct four high‑fidelity, large‑scale open environments mapped directly from real‑world Geographic Information System (GIS) data to ensure photorealistic landscapes. To rigorously simulate actual rescue operations, our benchmark incorporates dynamic variables including weather conditions, time of day, and stochastic clue placement. Furthermore, we create a dataset of 600 tasks modeled after real‑world rescue cases and propose a robust set of evaluation metrics. We evaluate diverse baselines, ranging from traditional heuristics to advanced ground and aerial MLLM‑based ObjectNav agents. Experimental results highlight the challenges in ESAR, revealing critical bottlenecks in spatial memory, aerial adaptation, and the trade‑off between search efficiency and flight safety. We hope ESARBench serves as a valuable resource to advance research in the Embodied Search and Rescue domain. Source code and project page: https://4amgodvzx.github.io/ESAR.github.io.
Authors: Fangqiang Fan, Zhicheng Zhao, Xiaoliang Ma, Chenglong Li, Jin Tang
Abstract: Fine‑grained RGBT image semantic segmentation is crucial for all‑weather unmanned aerial vehicle (UAV) scene understanding. However, UAV RGBT image semantic segmentation faces two coupled challenges: cross‑modal spatial misalignment caused by sensor parallax and platform vibration, and severe semantic confusion among fine‑grained ground objects under top‑down aerial views. To address these issues, we propose a Graph‑based Semantic Calibration Network (GSCNet) for unaligned UAV RGBT image semantic segmentation. Specifically, we design a Feature Decoupling and Alignment Module (FDAM) that decouples each modality into shared structural and private perceptual components and performs deformable alignment in the shared subspace, enabling robust spatial correction with reduced modality appearance interference. Moreover, we propose a Semantic Graph Calibration Module (SGCM) that explicitly encodes the hierarchical taxonomy and co‑occurrence regularities among ground‑object categories in UAV scenes into a structured category graph, and incorporates these priors into graph‑attention reasoning to calibrate predictions of visually similar and rare categories. In addition, we construct the Unaligned RGB‑Thermal Fine‑grained (URTF) benchmark, to the best of our knowledge, the largest and most fine‑grained benchmark for unaligned UAV RGBT image semantic segmentation, containing over 25,000 image pairs across 61 semantic categories with realistic cross‑modal misalignment. Extensive experiments on URTF demonstrate that GSCNet significantly outperforms state‑of‑the‑art methods, with notable gains on fine‑grained categories. The dataset is available at https://github.com/mmic‑lcl/Datasets‑and‑benchmark‑code.
Authors: Mingbo Hong, Feng Liu, Caroline Gevaert, George Vosselman, Hao Cheng
Abstract: Detectors often suffer from degraded performance, primarily due to the distributional gap between the source and target domains. This issue is especially evident in single‑source domains with limited data, as models tend to rely on confounders (e.g., illumination, co‑occurrence, and style) from the source domain, leading to spurious correlations that hinder generalization. To this end, this paper proposes a novel Basis‑driven framework for domain generalization, namely Bridge, that incorporates causal inference into object detection. By learning the low‑rank bases for front‑door adjustment, Bridge blocks confounders' effects to mitigate spurious correlations, while simultaneously refining representations by filtering redundant and task‑irrelevant components. Bridge can be seamlessly integrated with both discriminative (e.g., DINOv2/3, SAM) and generative (e.g., Stable Diffusion) Vision Foundation Models (VFMs). Extensive experiments across multiple domain generalization object detection datasets, i.e., Cross‑Camera, Adverse Weather, Real‑to‑Artistic, Diverse Weather Datasets, and Diverse Weather DroneVehicle (our newly augmented real‑world UAV‑based benchmark), underscore the superiority of our proposed method over previous state‑of‑the‑art approaches. The project page is available at: https://mingbohong.github.io/Bridge/.
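For reference, the front‑door adjustment mentioned in the abstract above is usually written in the following textbook form. Here z denotes a mediator variable between the input x and the prediction y, a symbol introduced only for this illustration. How Bridge realizes the mediator through low‑rank bases is not stated in the abstract, so this is the generic causal‑inference identity rather than the paper's exact formulation.

```latex
P\bigl(y \mid \mathrm{do}(x)\bigr) \;=\; \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```

The inner sum averages over all inputs x', which is what removes the influence of confounders that affect both x and y.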
Authors: Zhicheng Song, Yongjian Li, Kai Chen, Yulin Li, Fan Shi, Jun Ma
Abstract: Convex free regions provide a structured and optimization‑friendly representation of collision‑free space for robot navigation in unknown and cluttered environments. However, existing methods typically enlarge local collision‑free regions mainly according to surrounding obstacle geometry. In cluttered environments, such strategies may fail to generate regions that both accommodate robot geometry and preserve traversable extension along candidate motion directions, thereby limiting downstream traversal, especially in narrow passages. Even when such a region is available, safe motion generation remains challenging, because safety checking at discretized trajectory samples does not guarantee continuously collision‑free motion when robot geometry is modeled explicitly. To address these issues, we propose a navigation framework that jointly incorporates candidate motion directions and robot geometry into convex free‑region generation, and achieves continuously collision‑free motion through continuous‑safe trajectory generation. Within each region, the framework performs geometry‑aware target pose selection and trajectory generation, together with Lipschitz‑based continuous safety certification and local refinement. The resulting free regions and candidate motions are maintained in a region‑based graph to support incremental planning. Quantitative results in cluttered 2D navigation scenarios show that the proposed method generates free regions better aligned with downstream traversal and enables reliable collision‑free navigation, while additional 3D and real‑world experiments on a quadrupedal robot and a UAV demonstrate the extensibility and practical applicability of the framework. The open‑source project can be found at https://github.com/ZhichengSong6/FRGraph.
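The gap between sampled and continuous safety described in the abstract above can be closed with a Lipschitz bound. The sketch below is a minimal illustration of that general technique, not the paper's certification procedure. It assumes the clearance d(t) between robot and obstacles is L‑Lipschitz in time, for example because no robot point moves faster than L. Then d(t) on a segment is at least (d_start + d_end - L*dt)/2, so the segment is certified when d_start + d_end > L*dt. All function names are hypothetical.

```python
def segment_is_certified(d_start, d_end, lipschitz, dt):
    """True if clearance provably stays positive between two samples.

    Assumes |d(t1) - d(t2)| <= lipschitz * |t1 - t2| on the segment.
    """
    return d_start + d_end > lipschitz * dt


def certify_trajectory(clearances, times, lipschitz):
    """Certify each segment between consecutive samples; returns one flag per segment."""
    return [
        segment_is_certified(clearances[i], clearances[i + 1],
                             lipschitz, times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]


# Clearance samples (m) at three time stamps (s); all samples are collision-free.
clearances = [0.5, 0.4, 0.1]
times = [0.0, 0.2, 0.4]
slow = certify_trajectory(clearances, times, lipschitz=2.0)
fast = certify_trajectory(clearances, times, lipschitz=3.0)
```

A segment that fails the check is not necessarily in collision. It only means the sampling is too coarse to prove safety, so a planner can resample that segment more densely or refine it locally.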
Authors: Mohammed Q. Alkhatib
Abstract: Hyperspectral image (HSI) classification remains challenging due to high spectral dimensionality, redundancy, and limited labeled data. Although convolutional neural networks (CNNs) and Vision Transformers (ViTs) achieve strong performance by exploiting spectral‑spatial information and long‑range dependencies, they often incur high computational cost and large model size, limiting practical use. To address these limitations, a unified hybrid framework, termed ConvVitMamba, is proposed for efficient HSI classification. The architecture integrates three components: a multiscale convolutional feature extractor to capture local spectral, spatial, and joint patterns; a Vision Transformer‑based tokenization and encoding stage to model global contextual relationships; and a lightweight Mamba‑inspired gated sequence mixing module for efficient content‑aware refinement without quadratic self‑attention. Principal Component Analysis (PCA) is used as preprocessing to reduce redundancy and improve efficiency. Experiments on four benchmark datasets, including Houston and three UAV‑borne QUH datasets (Pingan, Qingyun, and Tangdaowan), demonstrate that ConvVitMamba consistently outperforms CNN‑, Transformer‑, and Mamba‑based methods while maintaining a favorable balance between accuracy, model size, and inference efficiency. Ablation studies confirm the complementary contributions of all components. The results indicate that the proposed framework provides an effective and efficient solution for HSI classification in diverse scenarios. The source code is publicly available at https://github.com/mqalkhatib/ConvVitMamba
Authors: Peiwen Yang, Shiyu Bai, Weisong Wen, Yixin Gao, Jiahao Hu
Abstract: Safe and agile trajectory planning is essential for autonomous systems, especially during complex aerobatic maneuvers. Motivated by the recent success of diffusion models in generative tasks, this paper introduces AeroTrajGen, a novel framework for diffusion‑based trajectory generation that incorporates control barrier function (CBF)‑guided sampling during inference, specifically designed for unmanned aerial vehicles (UAVs). The proposed CBF‑guided sampling addresses two critical challenges: (1) mitigating the inherent unpredictability and potential safety violations of diffusion models, and (2) reducing reliance on extensively safety‑verified training data. During the reverse diffusion process, CBF‑based guidance ensures collision‑free trajectories by seamlessly integrating safety constraint gradients with the diffusion model's score function. The model features an obstacle‑aware diffusion transformer architecture with multi‑modal conditioning, including trajectory history, obstacles, maneuver styles, and goal, enabling the generation of smooth, highly agile trajectories across 14 distinct aerobatic maneuvers. Trained on a dataset of 2,000 expert demonstrations, AeroTrajGen is rigorously evaluated in simulation under multi‑obstacle environments. Simulation results demonstrate that CBF‑guided sampling reduces collision rates by 94.7% compared to unguided diffusion baselines, while preserving trajectory agility and diversity. Our code is open‑sourced at https://github.com/RoboticsPolyu/CBF‑DMP.
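The way a barrier‑function gradient can be blended into an update direction, as described in the abstract above, is sketched below on a toy 2D problem. This is not AeroTrajGen's sampler. The "score" is a stand‑in that simply points at a goal, the obstacle is a single circle, there is no noise, and all names and parameters are hypothetical. The correction is the closed‑form minimal change to the direction u that enforces grad_h . u + alpha * h >= 0 for the barrier h(p) = |p - c|^2 - r^2.

```python
def barrier(p, center, radius):
    """h(p) >= 0 means p lies outside the circular obstacle."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    return dx * dx + dy * dy - radius * radius


def safe_direction(p, u, center, radius, alpha=1.0):
    """Add the smallest multiple of grad h to u so that grad_h . u + alpha * h >= 0."""
    gx, gy = 2.0 * (p[0] - center[0]), 2.0 * (p[1] - center[1])
    slack = gx * u[0] + gy * u[1] + alpha * barrier(p, center, radius)
    if slack >= 0.0:
        return u  # constraint already satisfied, leave the direction untouched
    # The gradient is nonzero here: it vanishes only at the obstacle center,
    # which a trajectory starting outside the obstacle never reaches.
    scale = slack / (gx * gx + gy * gy)
    return (u[0] - scale * gx, u[1] - scale * gy)


def rollout(start, goal, center, radius, steps=60, dt=0.1, guided=True):
    """Deterministic toy rollout; the goal-seeking direction stands in for a learned score."""
    p, path = start, [start]
    for _ in range(steps):
        u = (goal[0] - p[0], goal[1] - p[1])
        if guided:
            u = safe_direction(p, u, center, radius)
        p = (p[0] + dt * u[0], p[1] + dt * u[1])
        path.append(p)
    return path


center, radius = (2.0, 0.0), 1.0
guided_path = rollout((0.0, 0.1), (4.0, 0.0), center, radius, guided=True)
unguided_path = rollout((0.0, 0.1), (4.0, 0.0), center, radius, guided=False)
```

Because h is convex for a circular obstacle and alpha * dt <= 1 here, every guided step keeps h non‑negative, while the unguided rollout cuts straight through the obstacle. A real diffusion sampler adds noise and uses a learned score, so such a guarantee does not carry over automatically.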
Authors: Dian Shao, Zhengzheng Xu, Peiyang Wang, Like Liu, Yule Wang, Jieqi Shi, Jing Huo
Abstract: UAV vision‑language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi‑step instructions over long horizons. Existing zero‑shot methods remain limited, as they often rely on large base models, generic prompts, and loosely coordinated modules. In this work, we propose FineCog‑Nav, a top‑down framework inspired by human cognition that organizes navigation into fine‑grained modules for language processing, perception, attention, memory, imagination, reasoning, and decision‑making. Each module is driven by a moderate‑sized foundation model with role‑specific prompts and structured input‑output protocols, enabling effective collaboration and improved interpretability. To support fine‑grained evaluation, we construct AerialVLN‑Fine, a curated benchmark of 300 trajectories derived from AerialVLN, with sentence‑level instruction‑trajectory alignment and refined instructions containing explicit visual endpoints and landmark references. Experiments show that FineCog‑Nav consistently outperforms zero‑shot baselines in instruction adherence, long‑horizon planning, and generalization to unseen environments. These results suggest the effectiveness of fine‑grained cognitive modularization for zero‑shot aerial navigation. Project page: https://smartdianlab.github.io/projects‑FineCogNav.
Authors: Lorenzo Beltrame, Jules Salzinger, Filip Svoboda, Jasmin Lampert, Phillipp Fanta-Jende, Radu Timofte, Marco Körner
Abstract: We present a three‑stage progressive shadow‑removal pipeline for the CVPR2026 NTIRE WSRD+ challenge. Built on OmniSR, our method treats deshadowing as iterative direct refinement, where later stages correct residual artefacts left by earlier predictions. The model combines RGB appearance with frozen DINOv2 semantic guidance and geometric cues from monocular depth and surface normals, reused across all stages. To stabilise multi‑stage optimisation, we introduce a contraction‑constrained objective that encourages non‑increasing reconstruction error across the cascade. A staged training pipeline transfers from earlier WSRD pretraining to WSRD+ supervision and final WSRD+ 2026 adaptation with cosine‑annealed checkpoint ensembling. On the official WSRD+ 2026 hidden test set, our final ensemble achieved 26.680 PSNR, 0.8740 SSIM, 0.0578 LPIPS, and 26.135 FID, ranked first overall, and won the NTIRE 2026 Image Shadow Removal Challenge. The strong performance of the proposed model is further validated on the ISTD+ and UAV‑SC+ datasets.
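One simple way to encourage non‑increasing error across a cascade, as the contraction‑constrained objective above aims to do, is a hinge penalty on consecutive stage errors. The sketch below assumes that form. The abstract does not give the exact objective, so the function names, the hinge formulation, and the weight are illustrative assumptions only.

```python
def contraction_penalty(stage_errors):
    """Sum of max(0, e_next - e_prev); zero exactly when errors never increase."""
    return sum(
        max(0.0, later - earlier)
        for earlier, later in zip(stage_errors, stage_errors[1:])
    )


def cascade_loss(stage_errors, weight=1.0):
    """Per-stage reconstruction errors plus the weighted contraction penalty."""
    return sum(stage_errors) + weight * contraction_penalty(stage_errors)


# Reconstruction error after each of three stages (made-up numbers).
improving = [0.30, 0.20, 0.10]   # each stage refines the previous one
regressing = [0.30, 0.20, 0.25]  # the last stage makes things worse
```

With these numbers the penalty is zero for the improving cascade and equals the 0.05 regression of the last stage in the second case, so only stages that undo earlier progress are penalized.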
Authors: Zile Guo, Zhan Chen, Enze Zhu, Kan Wei, Yongkang Zou, Xiaoxuan Liu, Lei Wang
Abstract: Recent advances in world models have demonstrated strong capabilities in simulating physical reality, making them an increasingly important foundation for embodied intelligence. For UAV agents in particular, accurate prediction of complex 3D dynamics is essential for autonomous navigation and robust decision‑making in unconstrained environments. However, under the highly dynamic camera trajectories typical of UAV views, existing world models often struggle to maintain spatiotemporal physical consistency. A key reason lies in the distribution bias of current training data: most existing datasets exhibit restricted 2.5D motion patterns, such as ground‑constrained autonomous driving scenes or relatively smooth human‑centric egocentric videos, and therefore lack realistic high‑dynamic 6‑DoF UAV motion priors. To address this gap, we present MotionScape, a large‑scale real‑world UAV‑view video dataset with highly dynamic motion for world modeling. MotionScape contains over 30 hours of 4K UAV‑view videos, totaling more than 4.5M frames. This novel dataset features semantically and geometrically aligned training samples, where diverse real‑world UAV videos are tightly coupled with accurate 6‑DoF camera trajectories and fine‑grained natural language descriptions. To build the dataset, we develop an automated multi‑stage processing pipeline that integrates CLIP‑based relevance filtering, temporal segmentation, robust visual SLAM for trajectory recovery, and large‑language‑model‑driven semantic annotation. Extensive experiments show that incorporating such semantically and geometrically aligned annotations effectively improves the ability of existing world models to simulate complex 3D dynamics and handle large viewpoint shifts, thereby benefiting decision‑making and planning for UAV agents in complex environments. The dataset is publicly available at https://github.com/Thelegendzz/MotionScape
Authors: Xiang Zhang, Tengfei Wang, Fang Xu, Xin Wang, Zongqian Zhan
Abstract: Visual localization in large‑scale UAV scenarios is a critical capability for autonomous systems, yet it remains challenging due to geometric complexity and environmental variations. While 3D Gaussian Splatting (3DGS) has emerged as a promising scene representation, existing 3DGS‑based visual localization methods struggle with robust pose initialization and sensitivity to rendering artifacts in large‑scale settings. To address these limitations, we propose LSGS‑Loc, a novel visual localization pipeline tailored for large‑scale 3DGS scenes. Specifically, we introduce a scale‑aware pose initialization strategy that combines scene‑agnostic relative pose estimation with explicit 3DGS scale constraints, enabling geometrically grounded localization without scene‑specific training. Furthermore, in the pose refinement, to mitigate the impact of reconstruction artifacts such as blur and floaters, we develop a Laplacian‑based reliability masking mechanism that guides photometric refinement toward high‑quality regions. Extensive experiments on large‑scale UAV benchmarks demonstrate that our method achieves state‑of‑the‑art accuracy and robustness for unordered image queries, significantly outperforming existing 3DGS‑based approaches. Code is available at: https://github.com/xzhang‑z/LSGS‑Loc
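A Laplacian‑based reliability mask of the kind mentioned in the abstract above could look like the following minimal sketch. It assumes that a low Laplacian response in the rendered image indicates blur, and that the photometric error should be averaged only over pixels whose response exceeds a threshold. The abstract does not state the actual criterion or how floaters are handled, so the functions, the threshold, and the 4‑neighbour kernel are hypothetical.

```python
def laplacian(img):
    """4-neighbour Laplacian of a 2D grayscale image (list of rows); borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x]
                         + img[y][x - 1] + img[y][x + 1] - 4.0 * img[y][x])
    return out


def reliability_mask(rendered, threshold):
    """True where the rendering has enough high-frequency detail to be trusted."""
    return [[abs(v) >= threshold for v in row] for row in laplacian(rendered)]


def masked_photometric_error(rendered, query, mask):
    """Mean absolute difference over reliable pixels only (0.0 if none are reliable)."""
    total, count = 0.0, 0
    for y, row in enumerate(mask):
        for x, keep in enumerate(row):
            if keep:
                total += abs(rendered[y][x] - query[y][x])
                count += 1
    return total / count if count else 0.0


# 5x5 rendering with a vertical step edge between columns 1 and 2.
rendered = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
query = [[v + 0.2 for v in row] for row in rendered]  # uniformly brighter query image
mask = reliability_mask(rendered, threshold=0.5)
error = masked_photometric_error(rendered, query, mask)
```

Only the pixels next to the edge pass the mask in this example; the flat regions, which carry no texture to constrain the pose, are ignored in the error.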
Authors: Wenfeng Zhang, Jun Ni, Yue Meng, Xiaodong Pei, Wei Hu, Qibing Qin, Lei Huang
Abstract: Object detection in unmanned aerial vehicle (UAV) images remains a highly challenging task, primarily because of complex background noise and imbalanced target scales. Traditional methods often struggle to separate objects from intricate backgrounds and fail to fully leverage the rich multi‑scale information contained within images. To address these issues, we have developed a synergistic feature fusion network (SFFNet) with dual‑domain edge enhancement specifically tailored for object detection in UAV images. Firstly, the multi‑scale dynamic dual‑domain coupling (MDDC) module is designed. This component introduces a dual‑driven edge extraction architecture that operates in both the frequency and spatial domains, enabling effective decoupling of multi‑scale object edges from background noise. Secondly, to further enhance the representation capability of the model's neck in terms of both geometric and semantic information, a synergistic feature pyramid network (SFPN) is proposed. SFPN leverages linear deformable convolutions to adaptively capture irregular object shapes and establishes long‑range contextual associations around targets through the designed wide‑area perception module (WPM). Moreover, to adapt to various applications and resource‑constrained scenarios, six detectors of different scales (N/S/M/B/L/X) are designed. Experiments on two challenging aerial datasets (VisDrone and UAVDT) demonstrate the outstanding performance of SFFNet‑X, achieving 36.8 AP and 20.6 AP, respectively. The lightweight models (N/S) also maintain a balance between detection accuracy and parameter efficiency. The code will be available at https://github.com/CQNU‑ZhangLab/SFFNet.
Authors: Xiaoran Zhang, Yu Liu, Jinyu Liang, Kangqiushi Li, Zhiwei Huang, Huaxin Xiao
Abstract: Cross‑modal Thermal Geo‑localization (TG) provides a robust, all‑weather solution for Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)‑denied environments. However, profound thermal‑visible modality gaps introduce severe feature ambiguity, systematically corrupting conventional coarse‑to‑fine registration. To dismantle this bottleneck, we propose SCC‑Loc, a unified Semantic‑Cascade‑Consensus localization framework. By sharing a single DINOv2 backbone across global retrieval and MINIMA$_{\text{RoMa}}$ matching, it minimizes memory footprint and achieves zero‑shot, highly accurate absolute position estimation. Specifically, we tackle modality ambiguity by introducing three cohesive components. First, we design the Semantic‑Guided Viewport Alignment (SGVA) module to adaptively optimize satellite crop regions, effectively correcting initial spatial deviations. Second, we develop the Cascaded Spatial‑Adaptive Texture‑Structure Filtering (C‑SATSF) mechanism to explicitly enforce geometric consistency, thereby eradicating dense cross‑modal outliers. Finally, we propose the Consensus‑Driven Reliability‑Aware Position Selection (CD‑RAPS) strategy to derive the optimal solution through a synergy of physically constrained pose optimization. To address data scarcity, we construct Thermal‑UAV, a comprehensive dataset providing 11,890 diverse thermal queries referenced against a large‑scale satellite ortho‑photo and corresponding spatially aligned Digital Surface Model (DSM). Extensive experiments demonstrate that SCC‑Loc establishes a new state‑of‑the‑art, suppressing the mean localization error to 9.37 m and providing a 7.6‑fold accuracy improvement within a strict 5‑m threshold over the strongest baseline. Code and dataset are available at https://github.com/FloralHercules/SCC‑Loc.
Authors: Wenhao Li, Zimeng Wu, Yu Wu, Zehua Fu, Jiaxin Chen
Abstract: Unmanned aerial vehicle (UAV)‑based object detection is a critical but challenging task when applied in dynamically changing scenarios with limited annotated training data. Layout‑to‑image generation approaches have proved effective in improving detection accuracy by synthesizing labeled images with diffusion models. However, they frequently produce artifacts, especially near the layout boundaries of tiny objects, which substantially limits their performance. To address these issues, we propose UAVGen, a novel layout‑to‑image generation framework tailored for UAV‑based object detection. Specifically, UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC‑DM) that constructs representative instances for each class and integrates them into latent embeddings for high‑fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE‑DP) is introduced to emphasize object‑concentrated foreground regions in synthesis, combined with a label refinement step to correct missing, extra, and misaligned generations. Extensive experimental results demonstrate that our method significantly outperforms state‑of‑the‑art approaches and consistently improves accuracy when integrated with different detectors. The source code is available at https://github.com/Sirius‑Li/UAVGen.
Authors: Qiyao Zhang, Shuhua Zheng, Jianli Sun, Chengxiang Li, Xianke Wu, Zihan Song, Zhiyong Cui, Yisheng Lv, Yonglin Tian
Abstract: Embodied visual tracking is crucial for Unmanned Aerial Vehicles (UAVs) executing complex real‑world tasks. In dynamic urban scenarios with complex semantic requirements, Vision‑Language‑Action (VLA) models show great promise due to their cross‑modal fusion and continuous action generation capabilities. To benchmark multimodal tracking in such environments, we construct a dedicated evaluation benchmark and a large‑scale dataset encompassing over 890K frames, 176 tasks, and 85 diverse objects. Furthermore, to address temporal feature redundancy and the lack of spatial geometric priors in existing VLA models, we propose an improved VLA tracking model, UAV‑Track VLA. Built upon the $\pi_{0.5}$ architecture, our model introduces a temporal compression net to efficiently capture inter‑frame dynamics. Additionally, a parallel dual‑branch decoder comprising a spatial‑aware auxiliary grounding head and a flow matching action expert is designed to decouple cross‑modal features and generate fine‑grained continuous actions. Systematic experiments in the CARLA simulator validate the superior end‑to‑end performance of our method. Notably, in challenging long‑distance pedestrian tracking tasks, UAV‑Track VLA achieves a 61.76% success rate and 269.65 average tracking frames, significantly outperforming existing baselines. Furthermore, it demonstrates robust zero‑shot generalization in unseen environments and reduces single‑step inference latency by 33.4% (to 0.0571 s) compared to the original $\pi_{0.5}$, enabling highly efficient, real‑time UAV control. Data samples and demonstration videos are available at: https://github.com/Hub‑Tian/UAV‑Track_VLA.
Authors: Da Zhang, Gao Junyu, Zhao Zhiyuan
Abstract: Semantic segmentation of low‑altitude UAV imagery presents unique challenges due to extreme scale variations, complex object boundaries, and limited computational resources on edge devices. Existing transformer‑based segmentation methods achieve remarkable performance but incur high computational overhead, while lightweight approaches struggle to capture fine‑grained details in high‑resolution aerial scenes. To address these limitations, we propose PBSeg, an efficient prototype‑based segmentation framework tailored for UAV applications. PBSeg introduces a novel prototype‑based cross‑attention (PBCA) that exploits feature redundancy to reduce computational complexity while maintaining segmentation quality. The framework incorporates an efficient multi‑scale feature extraction module that combines deformable convolutions (DConv) with context‑aware modulation (CAM) to capture both local details and global semantics. Experiments on two challenging UAV datasets demonstrate the effectiveness of the proposed approach. PBSeg achieves 71.86% mIoU on UAVid and 80.92% mIoU on UDD6, establishing competitive performance while maintaining computational efficiency. Code is available at https://github.com/zhangda1018/PBSeg.
Authors: Tianle Zeng, Yanci Wen, Hong Zhang
Abstract: The convergence of low‑altitude economies, embodied intelligence, and air‑ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open‑source platforms remain domain‑segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge‑based co‑simulation introduces synchronization overhead and cannot guarantee strict spatial‑temporal consistency.
We present CARLA‑Air, an open‑source infrastructure that unifies high‑fidelity urban driving and physics‑accurate multirotor flight within a single Unreal Engine process. The platform preserves both CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero‑modification code reuse. Within a shared physics tick and rendering pipeline, CARLA‑Air delivers photorealistic environments with rule‑compliant traffic, socially‑aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air‑ground embodied intelligence workloads spanning cooperation, embodied navigation and vision‑language action, multi‑modal perception and dataset construction, and reinforcement‑learning‑based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. By inheriting AirSim's aerial capabilities ‑‑ whose upstream development has been archived ‑‑ CARLA‑Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure.
Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir
Authors: Weijia Li, Haoen Xiang, Tianxu Wang, Shuaibing Wu, Qiming Xia, Cheng Wang, Chenglu Wen
Abstract: Modern autonomous vehicle perception systems are often constrained by occlusions, blind spots, and limited sensing range. While existing cooperative perception paradigms, such as Vehicle‑to‑Vehicle (V2V) and Vehicle‑to‑Infrastructure (V2I), have demonstrated their effectiveness in mitigating these challenges, they remain limited to ground‑level collaboration and cannot fully address large‑scale occlusions or long‑range perception in complex environments. To advance research in cross‑view cooperative perception, we present V2U4Real, the first large‑scale real‑world multi‑modal dataset for Vehicle‑to‑UAV (V2U) cooperative object perception. V2U4Real is collected by a ground vehicle and a UAV equipped with multi‑view LiDARs and RGB cameras. The dataset covers urban streets, university campuses, and rural roads under diverse traffic scenarios, comprising over 56K LiDAR frames, 56K multi‑view camera images, and 700K annotated 3D bounding boxes across four classes. To support a wide range of research tasks, we establish benchmarks for single‑agent 3D object detection, cooperative 3D object detection, and object tracking. Comprehensive evaluations of several state‑of‑the‑art models demonstrate the effectiveness of V2U cooperation in enhancing perception robustness and long‑range awareness. The V2U4Real dataset and codebase are available at https://github.com/VjiaLi/V2U4Real.
Authors: Deyan Deng, Rongjun Qin
Abstract: 3D Gaussian Splatting (3DGS) has revolutionized real‑time rendering with its state‑of‑the‑art novel view synthesis, but its utility for accurate geometric measurement remains underutilized. Compared to multi‑view stereo (MVS) point clouds or meshes, 3DGS rendered views present superior visual quality and completeness. However, current point measurement methods still rely on demanding stereoscopic workstations or direct picking on often‑incomplete and inaccurate 3D meshes. As a novel view synthesizer, 3DGS renders exact source views and smoothly interpolates in‑between views. This allows users to intuitively pick congruent points across different views while operating 3DGS models. By triangulating these congruent points, one can precisely generate 3D point measurements. This approach mimics traditional stereoscopic measurement but is significantly less demanding: it requires neither a stereo workstation nor specialized operator stereoscopic capability. Furthermore, it enables multi‑view intersection (more than two views) for higher measurement accuracy. We implemented a web‑based application to demonstrate this proof‑of‑concept (PoC). Using several UAV aerial datasets, we show this PoC allows users to successfully perform highly accurate point measurements, achieving accuracy matching or exceeding traditional stereoscopic methods on standard hardware. Specifically, our approach significantly outperforms direct mesh‑based measurements. Quantitatively, our method achieves RMSEs in the 1‑2 cm range on well‑defined points. More critically, on challenging thin structures where mesh‑based RMSE was 0.062 m, our method achieved 0.037 m. On sharp corners poorly reconstructed in the mesh, our method successfully measured all points with a 0.013 m RMSE, whereas the mesh method failed entirely. Code is available at: https://github.com/GDAOSU/3dgs_measurement_tool.
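The multi‑view intersection mentioned in the abstract above can be illustrated with a least‑squares ray intersection. This is only a minimal sketch, not the authors' implementation: it assumes each picked pixel has already been converted into a ray (camera center plus viewing direction) in a common world frame, and the function names `solve3` and `intersect_rays` are hypothetical.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; intersection is ill-posed")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        s = m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / m[i][i]
    return x


def intersect_rays(centers, directions):
    """Return the 3D point minimizing the summed squared distance to all rays.

    Each ray is center + t * direction. With unit direction d, the projector
    P = I - d d^T removes the component along the ray, so the normal
    equations are (sum P) x = sum (P c).
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0, 0.0, 0.0]
    for c, d in zip(centers, directions):
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p
                b[i] += p * c[j]
    return solve3(a, b)
```

Two rays already determine a point; every additional view adds a projector term to the same 3x3 system, which is why intersecting more than two views averages out picking noise.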
Authors: Jannik Endres, Etienne Laliberté, David Rolnick, Arthur Ouaknine
Abstract: Accurate estimation of forest biomass, a major carbon sink, relies heavily on tree‑level traits such as height and species. Unoccupied Aerial Vehicles (UAVs) capturing high‑resolution imagery from a single RGB camera offer a cost‑effective and scalable approach for mapping and measuring individual trees. We introduce BIRCH‑Trees, the first benchmark for individual tree height and species estimation from tree‑centered UAV images, spanning three datasets: temperate forests, tropical forests, and boreal plantations. We also present DINOvTree, a unified approach using a Vision Foundation Model (VFM) backbone with task‑specific heads for simultaneous height and species prediction. Through extensive evaluations on BIRCH‑Trees, we compare DINOvTree against commonly used vision methods, including VFMs, as well as biological allometric equations. We find that DINOvTree achieves top overall results with accurate height predictions and competitive classification accuracy while using only 54% to 58% of the parameters of the second‑best approach.
Authors: Yangjie Cui, Xin Dong, Boyang Gao, Jinwu Xiang, Daochun Li, Zhan Tu
Abstract: As spatial intelligence continues to evolve, heterogeneous multi‑agent systems, particularly the collaboration between Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs), have demonstrated strong potential in complex applications such as search and rescue, urban surveillance, and environmental monitoring. However, existing simulation platforms are primarily designed for single‑agent dynamics and lack dedicated frameworks for interactive air‑ground collaborative simulation. In this paper, we present AirsimAG, a high‑fidelity air‑ground collaborative simulation platform built upon an extensively customized AirSim framework. The platform enables synchronized multi‑agent simulation and supports heterogeneous sensing and control interfaces for UAV‑UGV systems. To demonstrate its capabilities, we design a set of representative air‑ground collaborative tasks, including mapping, planning, tracking, formation, and exploration. We further provide quantitative analyses based on these tasks to illustrate the platform's effectiveness in supporting multi‑agent coordination and cross‑modal data consistency. The AirsimAG simulation platform is publicly available at https://github.com/BIULab‑BUAA/AirSimAG.
Authors: Jun Yang, Dong Wang, Hongxu Yin, Hongpeng Li, Jianxiong Yu
Abstract: Drone detection is pivotal in numerous security and counter‑UAV applications. However, existing deep learning‑based methods typically struggle to balance robust feature representation with computational efficiency. This challenge is particularly acute when detecting miniature drones against complex backgrounds under severe environmental interference. To address these issues, we introduce UAV‑DETR, a novel framework that integrates a small‑target‑friendly architecture with real‑time detection capabilities. Specifically, UAV‑DETR features a WTConv‑enhanced backbone and a Sliding Window Self‑Attention (SWSA‑IFI) encoder, capturing the high‑frequency structural details of tiny targets while drastically reducing parameter overhead. Furthermore, we propose an Efficient Cross‑Scale Feature Recalibration and Fusion Network (ECFRFN) to suppress background noise and aggregate multi‑scale semantics. To further enhance accuracy, UAV‑DETR incorporates a hybrid Inner‑CIoU and NWD loss strategy, mitigating the extreme sensitivity of standard IoU metrics to minor positional deviations in small objects. Extensive experiments demonstrate that UAV‑DETR significantly outperforms the baseline RT‑DETR on our custom UAV dataset (+6.61% in mAP50:95, with a 39.8% reduction in parameters) and the public DUT‑ANTI‑UAV benchmark (+1.4% in Precision, +1.0% in F1‑Score). These results establish UAV‑DETR as a superior trade‑off between efficiency and precision in counter‑UAV object detection. The code is available at https://github.com/wd‑sir/UAVDETR.
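The sensitivity of IoU to small positional shifts on tiny boxes, which motivates the NWD term mentioned in the abstract above, can be shown with a short comparison. This is a hedged sketch, not the authors' loss: it implements plain IoU and the commonly used Normalized Wasserstein Distance (boxes modeled as 2D Gaussians), it does not include Inner‑CIoU, and the normalizing constant `c=12.8` is an illustrative, dataset‑dependent assumption.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (cx, cy, w, h)."""
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay1, ay2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by1, by2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union


def nwd(a, b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is treated as a Gaussian with mean (cx, cy) and standard
    deviations (w/2, h/2); c is an assumed normalizing constant.
    """
    dist = math.sqrt(
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2) ** 2
        + ((a[3] - b[3]) / 2) ** 2
    )
    return math.exp(-dist / c)


# The same 3-pixel shift applied to a tiny box and to a large box.
tiny_gt, tiny_pred = (10, 10, 6, 6), (13, 10, 6, 6)
large_gt, large_pred = (100, 100, 60, 60), (103, 100, 60, 60)
print(iou(tiny_gt, tiny_pred), iou(large_gt, large_pred))
print(nwd(tiny_gt, tiny_pred), nwd(large_gt, large_pred))
```

The 3‑pixel shift cuts the IoU of the 6x6 box to one third while barely affecting the 60x60 box, whereas NWD depends only on the center and size differences and therefore gives both pairs the same score.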
Authors: Xiaoya Cheng, Long Wang, Yan Liu, Xinyi Liu, Hanlin Tan, Yu Liu, Maojun Zhang, Shen Yan
Abstract: We present PiLoT, a unified framework that tackles UAV‑based ego and target geo‑localization. Conventional approaches rely on decoupled pipelines that fuse GNSS and Visual‑Inertial Odometry (VIO) for ego‑pose estimation, and active sensors like laser rangefinders for target localization. However, these methods are susceptible to failure in GNSS‑denied environments and incur substantial hardware costs and complexity. PiLoT breaks this paradigm by directly registering a live video stream against a geo‑referenced 3D map. To achieve robust, accurate, and real‑time performance, we introduce three key contributions: 1) a Dual‑Thread Engine that decouples map rendering from the core localization thread, ensuring both low latency and drift‑free accuracy; 2) a large‑scale synthetic dataset with precise geometric annotations (camera pose, depth maps), which enables the training of a lightweight network that generalizes in a zero‑shot manner from simulation to real data; and 3) a Joint Neural‑Guided Stochastic‑Gradient Optimizer (JNGO) that achieves robust convergence even under aggressive motion. Evaluations on a comprehensive set of public and newly collected benchmarks show that PiLoT outperforms state‑of‑the‑art methods while running at over 25 FPS on the NVIDIA Jetson Orin platform. Our code and dataset are available at: https://github.com/Choyaa/PiLoT.
Authors: Junhao Wei, Yanxiao Li, Seyedali Mirjalili, Dexing Yao, Yifu Zhao, Haochen Li, Xudong Ye, Zikun Li, Qingyang Xu, Baili Lu, Ngai Cheong, Dengcheng Yang, Sio-Kei Im, Yapeng Wang, Xu Yang
Abstract: The Whale Optimization Algorithm (WOA) has shown strong optimization ability but still suffers from premature convergence and weak search diversity. To address these issues, this paper proposes an enhanced WOA variant called CICDWOA. The proposed algorithm introduces a Good Nodes Set (GNS) method for uniform population initialization, a Collective Cognitive Sharing (CCS) mechanism to enhance group collaboration, and an Enhanced Spiral Updating strategy based on the Cauchy Inverse Cumulative Distribution (CICD) to strengthen the balance between global exploration and local exploitation. In addition, a nonlinear convergence factor and a Hybrid Gaussian‑Cauchy mutation based on Differential Evolution (DE) further improve convergence efficiency and population diversity. CICDWOA was evaluated on 23 benchmark functions, 2D robot path planning problems, 3D UAV path planning tasks, and 10 engineering design problems. Statistical experiment results show that CICDWOA achieves faster convergence, higher accuracy, and better robustness than classical WOA and other advanced metaheuristic algorithms. CICDWOA attained an average Friedman value of 1.6790, ranking first among the SOTA algorithms. The results of engineering simulations confirm that CICDWOA provides an effective and general framework for solving complex optimization and engineering problems. The code of CICDWOA is available at https://github.com/JunhaoWei‑mpu/ROBIS‑Lab/tree/CICDWOA.
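For readers unfamiliar with the Cauchy inverse cumulative distribution named in the abstract above, the sketch below shows the standard closed form, F^{-1}(u) = loc + scale * tan(pi * (u - 1/2)), and how it turns uniform random numbers into heavy‑tailed steps. How CICDWOA embeds this in its spiral update is not described in the abstract, so the function names and parameters here are illustrative assumptions only.

```python
import math
import random


def cauchy_icdf(u, loc=0.0, scale=1.0):
    """Inverse CDF (quantile function) of the Cauchy distribution for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return loc + scale * math.tan(math.pi * (u - 0.5))


def sample_cauchy(rng, loc=0.0, scale=1.0):
    """Draw one Cauchy sample by inverse-transform sampling."""
    u = rng.random()
    while u == 0.0:  # random() may return exactly 0.0, which is outside the open interval
        u = rng.random()
    return cauchy_icdf(u, loc, scale)


rng = random.Random(0)
steps = [sample_cauchy(rng) for _ in range(5)]  # occasional very large steps are expected
```

The heavy tails are the usual reason for choosing Cauchy perturbations in metaheuristics: most steps are small, but rare long jumps help a population leave a local optimum.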
Authors: Yiheng Wang, Changhong Fu, Liangliang Yao, Haobo Zuo, Zijie Zhang
Abstract: Robust feature encoding constitutes the foundation of UAV tracking by enabling the nuanced perception of target appearance and motion, thereby playing a pivotal role in ensuring reliable tracking. However, existing feature encoding methods often overlook critical illumination and viewpoint cues, which are essential for robust perception under challenging nighttime conditions, leading to degraded tracking performance. To overcome the above limitation, this work proposes a dual prompt‑driven feature encoding method that integrates prompt‑conditioned feature adaptation and context‑aware prompt evolution to promote domain‑invariant feature encoding. Specifically, the pyramid illumination prompter is proposed to extract multi‑scale frequency‑aware illumination prompts. The dynamic viewpoint prompter modulates deformable convolution offsets to accommodate viewpoint variations, enabling the tracker to learn view‑invariant features. Extensive experiments validate the effectiveness of the proposed dual prompt‑driven tracker (DPTracker) in tackling nighttime UAV tracking. Ablation studies highlight the contribution of each component in DPTracker. Real‑world tests under diverse nighttime UAV tracking scenarios further demonstrate the robustness and practical utility. The code and demo videos are available at https://github.com/yiheng‑wang‑duke/DPTracker.
Authors: Markus Gross, Sai Bharadhwaj Matha, Rui Song, Viswanathan Muthuveerappan, Conrad Christoph, Julius Huber, Daniel Cremers
Abstract: Semantic segmentation for uncrewed aerial vehicles (UAVs) is fundamental for aerial scene understanding, yet existing RGB and RGB‑T datasets remain limited in scale, diversity, and annotation efficiency due to the high cost of manual labeling and the difficulties of accurate RGB‑T alignment on off‑the‑shelf UAVs. To address these challenges, we propose a scalable geometry‑driven 2D‑3D‑2D paradigm that leverages multi‑view redundancy in high‑overlap aerial imagery to automatically propagate labels from a small subset of manually annotated RGB images to both RGB and thermal modalities within a unified framework. By lifting less than 3% of RGB images into a semantic 3D point cloud and reprojecting it into all views, our approach enables dense pseudo ground‑truth generation across large image collections, automatically producing 97% of RGB labels and 100% of thermal labels while achieving 91% and 88% annotation accuracy without any 2D manual refinement. We further extend this 2D‑3D‑2D paradigm to cross‑modal image registration, using 3D geometry as an intermediate alignment space to obtain fully automatic, strong pixel‑level RGB‑T alignment with 87% registration accuracy and no hardware‑level synchronization. Applying our framework to existing geo‑referenced aerial imagery, we construct SegFly, a large‑scale benchmark with over 20,000 high‑resolution RGB images and more than 15,000 geometrically aligned RGB‑T pairs spanning diverse urban, industrial, and rural environments across multiple altitudes and seasons. On SegFly, we establish the Firefly baseline for RGB and thermal semantic segmentation and show that both conventional architectures and vision foundation models benefit substantially from SegFly supervision, highlighting the potential of geometry‑driven 2D‑3D‑2D pipelines for scalable multi‑modal scene understanding. Data and Code available at https://github.com/markus‑42/SegFly.
Authors: Markus Gross, Andreas Greiner, Sai Bharadhwaj Matha, Felix Soest, Daniel Cremers, Henri Meeß
Abstract: Autonomous landing of uncrewed aerial vehicles (UAVs) in unknown, dynamic environments poses significant safety challenges, particularly near people and infrastructure, as UAVs transition to routine urban and rural operations. Existing methods often rely on prior maps, heavy sensors like LiDAR, static markers, or fail to handle non‑cooperative dynamic obstacles like humans, limiting generalization and real‑time performance. To address these challenges, we introduce SafeLand, a lean, vision‑based system for safe autonomous landing (SAL) that requires no prior information and operates only with a camera and a lightweight height sensor. Our approach constructs an online semantic ground map via deep learning‑based semantic segmentation, optimized for embedded deployment and trained on a consolidation of seven curated public aerial datasets (achieving 70.22% mIoU across 20 classes), which is further refined through Bayesian probabilistic filtering with temporal semantic decay to robustly identify metric‑scale landing spots. A behavior tree then governs adaptive landing, iteratively validates the spot, and reacts in real time to dynamic obstacles by pausing, climbing, or rerouting to alternative spots, maximizing human safety. We extensively evaluate our method in 200 simulations and 60 end‑to‑end field tests across industrial, urban, and rural environments at altitudes up to 100m, demonstrating zero false negatives for human detection. Compared to the state of the art, SafeLand achieves sub‑second response latency, substantially lower than previous methods, while maintaining a superior success rate of 95%. To facilitate further research in aerial robotics, we release SafeLand's segmentation model as a plug‑and‑play ROS package, available at https://github.com/markus‑42/SafeLand.
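The idea of Bayesian filtering with temporal decay mentioned in the abstract above can be illustrated for a single map cell. This is a simplified sketch under stated assumptions, not SafeLand's filter: it uses a binary "safe to land" state instead of multiple semantic classes, a log‑odds update, and a hypothetical `decay` factor that pulls old evidence back toward the uninformed prior.

```python
import math


def logit(p):
    """Log-odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))


class DecayingCell:
    """One ground-map cell holding the log-odds that the cell is safe to land on."""

    def __init__(self, decay=0.9):
        self.decay = decay    # assumed per-step forgetting factor in (0, 1]
        self.log_odds = 0.0   # log-odds 0 corresponds to a prior of 0.5

    def update(self, p_safe):
        """Fuse one per-frame segmentation probability after decaying old evidence."""
        p = min(max(p_safe, 1e-6), 1.0 - 1e-6)  # keep the logit finite
        self.log_odds = self.decay * self.log_odds + logit(p)

    def step_without_observation(self):
        """Let the belief drift back toward 0.5 when the cell is not observed."""
        self.log_odds *= self.decay

    @property
    def probability(self):
        return 1.0 / (1.0 + math.exp(-self.log_odds))


cell = DecayingCell()
for p in (0.8, 0.8, 0.8):   # three frames say the cell looks safe
    cell.update(p)
confident = cell.probability
for p in (0.1, 0.1):        # a person walks in: two frames say it is unsafe
    cell.update(p)
after_intrusion = cell.probability
```

Because old evidence is discounted at every step, a few recent contradicting observations are enough to overturn an earlier confident belief, which is the behavior one would want when dynamic obstacles enter a previously clear landing spot.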
Authors: Hailiang Tang, Tisheng Zhang, Liqiang Wang, Xin Ding, Man Yuan, Xiaoji Niu
Abstract: Real‑time LiDAR‑visual‑inertial odometry and mapping is crucial for navigation and planning tasks in intelligent transportation systems. This study presents a pose‑only bundle adjustment (PA) LiDAR‑visual‑inertial odometry (LVIO), named PA‑LVIO, to meet the urgent need for real‑time navigation and mapping. The proposed PA framework for LiDAR and visual measurements is highly accurate and efficient, and it can derive reliable frame‑to‑frame constraints within multiple frames. A marginalization‑free and frame‑to‑map (F2M) LiDAR measurement model is integrated into the state estimator to eliminate odometry drifts. Meanwhile, an IMU‑centric online spatial‑temporal calibration is employed to obtain a pixel‑wise LiDAR‑camera alignment. With accurate estimated odometry and extrinsics, a high‑quality and RGB‑rendered point‑cloud map can be built. Comprehensive experiments are conducted on both public and private datasets collected by wheeled robot, unmanned aerial vehicle (UAV), and handheld devices with 28 sequences and more than 50 km trajectories. Sufficient results demonstrate that the proposed PA‑LVIO yields superior or comparable performance to state‑of‑the‑art LVIO methods, in terms of the odometry accuracy and mapping quality. Besides, PA‑LVIO can run in real‑time on both the desktop PC and the onboard ARM computer. The codes and datasets are open sourced on GitHub (https://github.com/i2Nav‑WHU/PA‑LVIO) to benefit the community.
Authors: Peng Xu, Zhengnan Deng, Jiayan Deng, Zonghua Gu, Shaohua Wan
Abstract: Vision‑Language Navigation (VLN) for Unmanned Aerial Vehicles (UAVs) demands complex visual interpretation and continuous control in dynamic 3D environments. Existing hierarchical approaches rely on dense oracle guidance or auxiliary object detectors, creating semantic gaps and limiting genuine autonomy. We propose AerialVLA, a minimalist end‑to‑end Vision‑Language‑Action framework mapping raw visual observations and fuzzy linguistic instructions directly to continuous physical control signals. First, we introduce a streamlined dual‑view perception strategy that reduces visual redundancy while preserving essential cues for forward navigation and precise grounding, which additionally facilitates future simulation‑to‑reality transfer. To reclaim genuine autonomy, we deploy a fuzzy directional prompting mechanism derived solely from onboard sensors, completely eliminating the dependency on dense oracle guidance. Ultimately, we formulate a unified control space that integrates continuous 3‑Degree‑of‑Freedom (3‑DoF) kinematic commands with an intrinsic landing signal, freeing the agent from external object detectors for precision landing. Extensive experiments on the TravelUAV benchmark demonstrate that AerialVLA achieves state‑of‑the‑art performance in seen environments. Furthermore, it exhibits superior generalization in unseen scenarios by achieving nearly three times the success rate of leading baselines, validating that a minimalist, autonomy‑centric paradigm captures more robust visual‑motor representations than complex modular systems.
Authors: Yu Zhang, Zhicheng Zhao, Ze Luo, Chenglong Li, Jin Tang
Abstract: Traffic scene understanding from unmanned aerial vehicle (UAV) platforms is crucial for intelligent transportation systems due to its flexible deployment and wide‑area monitoring capabilities. However, existing methods face significant challenges in real‑world surveillance, as their heavy reliance on optical imagery leads to severe performance degradation under adverse illumination conditions like nighttime and fog. Furthermore, current Visual Question Answering (VQA) models are restricted to elementary perception tasks, lacking the domain‑specific regulatory knowledge required to assess complex traffic behaviors. To address these limitations, we propose a novel Multi‑modal Traffic Cognition Network (MTCNet) for robust UAV traffic scene understanding. Specifically, we design a Prototype‑Guided Knowledge Embedding (PGKE) module that leverages high‑level semantic prototypes from an external Traffic Regulation Memory (TRM) to anchor domain‑specific knowledge into visual representations, enabling the model to comprehend complex behaviors and distinguish fine‑grained traffic violations. Moreover, we develop a Quality‑Aware Spectral Compensation (QASC) module that exploits the complementary characteristics of optical and thermal modalities to perform bidirectional context exchange, effectively compensating for degraded features to ensure robust representation in complex environments. In addition, we construct Traffic‑VQA, the first large‑scale optical‑thermal infrared benchmark for cognitive UAV traffic understanding, comprising 8,180 aligned image pairs and 1.3 million question‑answer pairs across 31 diverse types. Extensive experiments demonstrate that MTCNet significantly outperforms state‑of‑the‑art methods in both cognition and perception scenarios. The dataset is available at https://github.com/YuZhang‑2004/UAV‑traffic‑scene‑understanding.
Authors: Guiyong Zheng, Yueting Ban, Mingjie Zhang, Juepeng Zheng, Boyu Zhou
Abstract: Aerial vision‑language navigation (AVLN) enables UAVs to follow natural‑language instructions in complex 3D environments. However, existing zero‑shot AVLN methods often suffer from unstable single‑stream Vision‑Language Model decision‑making, unreliable long‑horizon progress monitoring, and a trade‑off between safety and efficiency. We propose OnFly, a fully onboard, real‑time framework for zero‑shot AVLN. OnFly adopts a shared‑perception dual‑agent architecture that decouples high‑frequency target generation from low‑frequency progress monitoring, thereby stabilizing decision‑making. It further employs a hybrid keyframe‑recent‑frame memory to preserve global trajectory context while maintaining KV‑cache prefix stability, enabling reliable long‑horizon monitoring with termination and recovery signals. In addition, a semantic‑geometric verifier refines VLM‑predicted targets for instruction consistency and geometric safety using VLM features and depth cues, while a receding‑horizon planner generates optimized collision‑free trajectories under geometric safety constraints, improving both safety and efficiency. In simulation, OnFly improves task success from 26.4% to 67.8%, compared with the strongest state‑of‑the‑art baseline, while fully onboard real‑world flights validate its feasibility for real‑time deployment. The code will be released at https://github.com/Robotics‑STAR‑Lab/OnFly
Authors: Zhe Yang, Guoqiang Zhao, Sheng Wu, Kai Luo, Kailun Yang
Abstract: Omnidirectional images are increasingly used in robotics and vision due to their wide field of view. However, extending 3D Gaussian Splatting (3DGS) to panoramic camera models remains challenging, as existing formulations are designed for perspective projections and naive adaptations often introduce distortion and geometric inconsistencies. We present Spherical‑GOF, an omnidirectional Gaussian rendering framework built upon Gaussian Opacity Fields (GOF). Unlike projection‑based rasterization, Spherical‑GOF performs GOF ray sampling directly on the unit sphere in spherical ray space, enabling consistent ray‑Gaussian interactions for panoramic rendering. To make the spherical ray casting efficient and robust, we derive a conservative spherical bounding rule for fast ray‑Gaussian culling and introduce a spherical filtering scheme that adapts Gaussian footprints to distortion‑varying panoramic pixel sampling. Extensive experiments on standard panoramic benchmarks (OmniBlender and OmniPhotos) demonstrate competitive photometric quality and substantially improved geometric consistency. Compared with the strongest baseline, Spherical‑GOF reduces depth reprojection error by 57% and improves cycle inlier ratio by 21%. Qualitative results show cleaner depth and more coherent normal maps, with strong robustness to global panorama rotations. We further validate generalization on OmniRob, a real‑world robotic omnidirectional dataset introduced in this work, featuring UAV and quadruped platforms. The source code and the OmniRob dataset will be released at https://github.com/1170632760/Spherical‑GOF.
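Sampling rays "directly on the unit sphere", as the abstract above puts it, starts from a mapping between panorama pixels and unit directions. The sketch below shows one such mapping for an equirectangular image, together with the approximate solid angle per pixel, which shrinks toward the poles and illustrates why pixel sampling is distortion‑varying. The axis convention (y up, z forward) and the function names are assumptions for illustration and are not taken from Spherical‑GOF.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map equirectangular pixel coordinates (u, v) to a unit direction.

    Assumed convention: longitude spans [-pi, pi) left to right, latitude spans
    [pi/2, -pi/2] top to bottom, y is up and z points at the image center.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def pixel_solid_angle(v, width, height):
    """Approximate solid angle (steradians) covered by a pixel in row v."""
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return (2.0 * math.pi / width) * (math.pi / height) * math.cos(lat)


W, H = 64, 32
center_ray = pixel_to_ray(W / 2 - 0.5, H / 2 - 0.5, W, H)   # points along +z
equator_area = pixel_solid_angle(H // 2, W, H)
polar_area = pixel_solid_angle(0, W, H)                     # much smaller near the pole
```

A filter sized for an equatorial pixel is therefore too wide, in angular terms, for a pixel near the pole, which is the kind of mismatch a latitude‑aware footprint adaptation has to compensate for.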
Authors: Şebnem Sarıözkan, Hürkan Şahin, Olaya Álvarez-Tuñón, Erdal Kayacan
Abstract: Conventional visual simultaneous localization and mapping (SLAM) algorithms often fail under rapid motion, low illumination, or abrupt lighting transitions due to motion blur and limited dynamic range. Event cameras mitigate these issues with high temporal resolution and high dynamic range (HDR), but their sparse, asynchronous outputs complicate feature extraction and integration with other sensors, e.g., inertial measurement units (IMUs) and standard cameras. We present Edged USLAM, a hybrid visual‑inertial system that extends Ultimate SLAM (USLAM) with an edge‑aware front‑end and a lightweight depth module. The front‑end enhances event frames for robust feature tracking and nonlinear motion compensation, while the depth module provides coarse, region‑of‑interest (ROI)‑based scene depth to improve motion compensation and scale consistency. Evaluations across public benchmarks and real‑world unmanned aerial vehicle (UAV) flights demonstrate that performance varies significantly by scenario. For instance, event‑only methods like point‑line event‑based visual‑inertial odometry (PL‑EVIO) or learning‑based pipelines such as deep event‑based visual odometry (DEVO) excel in highly aggressive or extreme HDR conditions. In contrast, Edged USLAM provides superior stability and minimal drift in slow or structured trajectories, ensuring consistently accurate localization on real flights under challenging illumination. These findings highlight the complementary strengths of event‑only, learning‑based, and hybrid approaches, while positioning Edged USLAM as a robust solution for diverse aerial navigation tasks.
Authors: Xinlu Yan, Mingjie Zhang, Yuhao Fang, Yanke Sun, Jun Ma, Youmin Gong, Boyu Zhou, Jie Mei
Abstract: Efficient multi‑UAV exploration under limited communication is severely bottlenecked by inadequate task representation and allocation. Previous task representations either impose heavy communication requirements for coordination or lack the flexibility to handle complex environments, often leading to inefficient traversal. Furthermore, short‑horizon allocation strategies neglect spatiotemporal contiguity, causing non‑contiguous assignments and frequent cross‑region detours. To address this, we propose C^2‑Explorer, a decentralized framework that constructs a connectivity graph to decompose disconnected unknown components into independent task units. We then introduce a contiguity‑driven allocation formulation with a graph‑based neighborhood penalty to discourage non‑adjacent assignments, promoting more contiguous task sequences over time. Extensive simulation experiments show that C^2‑Explorer consistently outperforms state‑of‑the‑art (SOTA) baselines, reducing average exploration time by 43.1% and path length by 33.3%. Real‑world flights further demonstrate the system's feasibility. The code will be released at https://github.com/Robotics‑STAR‑Lab/C2‑Explorer
Authors: Xiangkai Zhang, Dizhe Zhang, WenZhuo Cao, Zhaoliang Wan, Yingjie Niu, Lu Qi, Xu Yang, Zhiyong Liu
Abstract: Obstacle avoidance in unmanned aerial vehicles (UAVs), as a fundamental capability, has gained increasing attention with the growing focus on spatial intelligence. However, current obstacle‑avoidance methods mainly depend on limited field‑of‑view sensors and are ill‑suited for UAV scenarios that require full‑spatial awareness when the movement direction differs from the UAV's heading. This limitation motivates us to explore omnidirectional obstacle avoidance for panoramic drones with full‑view perception. We first study an underexplored problem setting in which a UAV must generate collision‑free motion in environments with obstacles from arbitrary directions, and then construct a benchmark that consists of three representative flight tasks. Based on this setting, we propose Fly360, a two‑stage perception‑decision pipeline with a fixed random‑yaw training strategy. At the perception stage, panoramic RGB observations are converted into depth maps as a robust intermediate representation. At the decision stage, a lightweight policy network outputs body‑frame velocity commands from the depth inputs. Extensive simulation and real‑world experiments demonstrate that Fly360 achieves stable omnidirectional obstacle avoidance and outperforms forward‑view baselines across all tasks. Our model is available at https://zxkai.github.io/fly360/
Authors: Xuecheng Bai, Yuxiang Wang, Chuanzhi Xu, Boyu Hu, Kang Han, Ruijie Pan, Xiaowei Niu, Xiaotian Guan, Liqiang Fu, Pengfei Ye
Abstract: Small object detection in unmanned aerial vehicle (UAV) imagery is challenging, mainly due to scale variation, structural detail degradation, and limited computational resources. In high‑altitude scenarios, fine‑grained features are further weakened during hierarchical downsampling and cross‑scale fusion, resulting in unstable localization and reduced robustness. To address this issue, we propose CollabOD, a lightweight collaborative detection framework that explicitly preserves structural details and aligns heterogeneous feature streams before multi‑scale fusion. The framework integrates Structural Detail Preservation, Cross‑Path Feature Alignment, and Localization‑Aware Lightweight Design strategies. From the perspectives of image processing, channel structure, and lightweight design, it optimizes the architecture of conventional UAV perception models. The proposed design enhances representation stability while maintaining efficient inference. A unified detail‑aware detection head further improves regression robustness without introducing additional deployment overhead. The code is available at: https://github.com/Bai‑Xuecheng/CollabOD.
Authors: Ziyang Gong, Zehang Luo, Anke Tang, Zhe Liu, Shi Fu, Zhi Hou, Ganlin Yang, Weiyun Wang, Xiaofeng Wang, Jianbo Liu, Gen Luo, Haolan Kang, Shuang Luo, Yue Zhou, Yong Luo, Li Shen, Xiaosong Jia, Yao Mu, Xue Yang, Chunxiao Liu, Junchi Yan, Hengshuang Zhao, Dacheng Tao, Xiaogang Wang
Abstract: Universal embodied intelligence demands robust generalization across heterogeneous embodiments, such as autonomous driving, robotics, and unmanned aerial vehicles (UAVs). However, training a unified embodied brain over diverse embodiments frequently suffers from long‑tail data, gradient interference, and catastrophic forgetting, making it notoriously difficult to balance universal generalization with domain‑specific proficiency. In this report, we introduce ACE‑Brain‑0, a generalist foundation brain that unifies spatial reasoning, autonomous driving, and embodied manipulation within a single multimodal large language model (MLLM). Our key insight is that spatial intelligence serves as a universal scaffold across diverse physical embodiments: although vehicles, robots, and UAVs differ drastically in morphology, they share a common need for modeling 3D mental space, making spatial cognition a natural, domain‑agnostic foundation for cross‑embodiment transfer. Building on this insight, we propose the Scaffold‑Specialize‑Reconcile (SSR) paradigm, which first establishes a shared spatial foundation, then cultivates domain‑specialized experts, and finally harmonizes them through data‑free model merging. Furthermore, we adopt Group Relative Policy Optimization (GRPO) to strengthen the model's comprehensive capability. Extensive experiments demonstrate that ACE‑Brain‑0 achieves competitive and even state‑of‑the‑art performance across 24 spatial and embodiment‑related benchmarks.
Authors: Huichun Liu, Xiaosong Li, Zhuangfan Huang, Tao Ye, Yang Liu, Haishu Tan
Abstract: Multimodal Image Fusion (MMIF) integrates complementary information from various modalities to produce clearer and more informative fused images. MMIF under adverse weather is particularly crucial in autonomous driving and UAV monitoring applications. However, existing adverse weather fusion methods typically handle only a single type of degradation, such as haze, rain, or snow, and fail when multiple degradations coexist (e.g., haze+rain, rain+snow). To address this challenge, we propose Compound Adverse Weather Mamba (CAWM‑Mamba), the first end‑to‑end framework that jointly performs image fusion and compound weather restoration with unified shared weights. Our network contains three key components: (1) a Weather‑Aware Preprocess Module (WAPM) that enhances degraded visible features and extracts global weather embeddings; (2) a Cross‑modal Feature Interaction Module (CFIM) that facilitates the alignment of heterogeneous modalities and the exchange of complementary features across modalities; and (3) a Wavelet Space State Block (WSSB) that leverages wavelet‑domain decomposition to decouple multi‑frequency degradations. WSSB includes Freq‑SSM, a module that models anisotropic high‑frequency degradation without redundancy, and a unified degradation representation mechanism to further improve generalization across complex compound weather conditions. Extensive experiments on the AWMM‑100K benchmark and three standard fusion datasets demonstrate that CAWM‑Mamba consistently outperforms state‑of‑the‑art methods in both compound and single‑weather scenarios. In addition, our fusion results excel in downstream tasks covering semantic segmentation and object detection, confirming the practical value in real‑world adverse weather perception. The source code will be available at https://github.com/Feecuin/CAWM‑Mamba.
Authors: Stefan Fabian, Aljoscha Schmidt, Jonas Süß, Dishant, Aum Oza, Oskar von Stryk
Abstract: In disaster response and situation assessment, robots have great potential in reducing the risks to the safety and health of first responders. As the situations encountered and the required capabilities of the robots deployed in such missions differ wildly and are often not known in advance, heterogeneous fleets of robots are needed to cover a wide range of mission requirements. While UAVs can quickly survey the mission environment, their ability to carry heavy payloads such as sensors and manipulators is limited. UGVs can carry required payloads to assess and manipulate the mission environment, but need to be able to deal with difficult and unstructured terrain such as rubble and stairs. The ability of tracked platforms with articulated arms (flippers) to reconfigure their geometry makes them particularly effective for navigating challenging terrain. In this paper, we present Athena, an open‑hardware rescue ground robot research platform with four individually reconfigurable flippers and a reliable low‑cost remote emergency stop (E‑Stop) solution. A novel mounting solution using an industrial PU belt and tooth inserts allows the replacement and testing of different track profiles. The manipulator, with a maximum reach of 1.54 m, can be used to operate doors, valves, and other objects of interest. Full CAD and PCB files, as well as all low‑level software, are released as open‑source contributions.
Authors: Kordel K. France, Ovidiu Daescu, Latifur Khan, Rohith Peddi
Abstract: Autonomous odor source localization remains a challenging problem for aerial robots due to turbulent airflow, sparse and delayed sensory signals, and strict payload and compute constraints. While prior unmanned aerial vehicle (UAV)‑based olfaction systems have demonstrated gas distribution mapping or reactive plume tracing, they rely on predefined coverage patterns, external infrastructure, or extensive sensing and coordination. In this work, we present a complete, open‑source UAV system for online odor source localization using a minimal sensor suite. The system integrates custom olfaction hardware, onboard sensing, and a learning‑based navigation policy trained in simulation and deployed on a real quadrotor. Through our minimal framework, the UAV is able to navigate directly toward an odor source without constructing an explicit gas distribution map or relying on external positioning systems. Vision is incorporated as an optional complementary modality to accelerate navigation under certain conditions. We validate the proposed system through real‑world flight experiments in a large indoor environment using an ethanol source, demonstrating consistent source‑finding behavior under realistic airflow conditions. The primary contribution of this work is a reproducible system and methodological framework for UAV‑based olfactory navigation and source finding under minimal sensing assumptions. We elaborate on our hardware design and open source our UAV firmware, simulation code, olfaction‑vision dataset, and circuit board to the community. Code, data, and designs will be made available at https://github.com/KordelFranceTech/ChasingGhosts.
Authors: Yang Zhou, Derui Ding, Ran Sun, Ying Sun, Haohua Zhang
Abstract: Visual object tracking (VOT) plays a pivotal role in unmanned aerial vehicle (UAV) applications. Addressing the trade‑off between accuracy and efficiency, especially under challenging conditions like unpredictable occlusion, remains a significant challenge. This paper introduces LGTrack, a unified UAV tracking framework that integrates dynamic layer selection, efficient feature enhancement, and robust representation learning for occlusions. By employing a novel lightweight Global‑Grouped Coordinate Attention (GGCA) module, LGTrack captures long‑range dependencies and global contexts, enhancing feature discriminability with minimal computational overhead. Additionally, a lightweight Similarity‑Guided Layer Adaptation (SGLA) module replaces knowledge distillation, achieving an optimal balance between tracking precision and inference efficiency. Experiments on three datasets demonstrate LGTrack's state‑of‑the‑art real‑time speed (258.7 FPS on UAVDT) while maintaining competitive tracking accuracy (82.8% precision). Code is available at https://github.com/XiaoMoc/LGTrack
Authors: Sebastian-Ion Nae, Mihai-Eugen Barbu, Sebastian Mocanu, Marius Leordeanu
Abstract: Autonomous agents such as indoor drones must learn new object classes in real‑time while limiting catastrophic forgetting, motivating Class‑Incremental Learning (CIL). However, most unmanned aerial vehicle (UAV) datasets focus on outdoor scenes and offer limited temporally coherent indoor videos. We introduce an indoor dataset of 14,400 frames capturing inter‑drone and ground vehicle footage, annotated via a semi‑automatic workflow with a 98.6% first‑pass labeling agreement before final manual verification. Using this dataset, we benchmark 3 replay‑based CIL strategies: Experience Replay (ER), Maximally Interfered Retrieval (MIR), and Forgetting‑Aware Replay (FAR), using YOLOv11‑nano as a resource‑efficient detector for deployment‑constrained UAV platforms. Under tight memory budgets (5‑10% replay), FAR performs better than the rest, achieving an average accuracy (ACC, mAP_50‑95 across increments) of 82.96% with 5% replay. Gradient‑weighted class activation mapping (Grad‑CAM) analysis shows attention shifts across classes in mixed scenes, which is associated with reduced localization quality for drones. The experiments further demonstrate that replay‑based continual learning can be effectively applied to edge aerial systems. Overall, this work contributes an indoor UAV video dataset with preserved temporal coherence and an evaluation of replay‑based CIL under limited replay budgets. Project page: https://spacetime‑vision‑robotics‑laboratory.github.io/learning‑on‑the‑fly‑cl
Authors: Wanhao Liu, Junhong Dai, Yixuan Zhang, Shengyun Yin, Panshuo Li
Abstract: Cooperative path planning for heterogeneous UAV swarms poses significant challenges for Multi‑Agent Reinforcement Learning (MARL), particularly in handling asymmetric inter‑agent dependencies and addressing the risks of sparse rewards and catastrophic forgetting during training. To address these issues, this paper proposes an attentive curriculum learning framework (AC‑MASAC). The framework introduces a role‑aware heterogeneous attention mechanism to explicitly model asymmetric dependencies. Moreover, a structured curriculum strategy is designed, integrating hierarchical knowledge transfer and stage‑proportional experience replay to address the issues of sparse rewards and catastrophic forgetting. The proposed framework is validated on a custom multi‑agent simulation platform, and the results show that our method has significant advantages over other advanced methods in terms of Success Rate, Formation Keeping Rate, and Success‑weighted Mission Time. The code is available at https://github.com/Wanhao‑Liu/AC‑MASAC.
Authors: Minglei Li, Mengfan He, Chunyu Li, Chao Chen, Xingyu Shao, Ziyang Meng
Abstract: Cross‑view geo‑localization (CVGL) is pivotal for GNSS‑denied UAV navigation but remains brittle under the drastic geometric misalignment between oblique aerial views and orthographic satellite references. Existing methods predominantly operate within a 2D manifold, neglecting the underlying 3D geometry where view‑dependent vertical facades (macro‑structure) and scale variations (micro‑scale) severely corrupt feature alignment. To bridge this gap, we propose (MGS)^2, a geometry‑grounded framework. The core of our innovation is the Macro‑Geometric Structure Filtering (MGSF) module. Unlike pixel‑wise matching sensitive to noise, MGSF leverages dilated geometric gradients to physically filter out high‑frequency facade artifacts while enhancing the view‑invariant horizontal plane, directly addressing the domain shift. To guarantee robust input for this structural filtering, we explicitly incorporate a Micro‑Geometric Scale Adaptation (MGSA) module. MGSA utilizes depth priors to dynamically rectify scale discrepancies via multi‑branch feature fusion. Furthermore, a Geometric‑Appearance Contrastive Distillation (GACD) loss is designed to strictly discriminate against oblique occlusions. Extensive experiments demonstrate that (MGS)^2 achieves state‑of‑the‑art performance, recording a Recall@1 of 97.5% on University‑1652 and 97.02% on SUES‑200. Furthermore, the framework exhibits superior cross‑dataset generalization against geometric ambiguity. The code is available at: https://github.com/GabrielLi1473/MGS‑Net.
Authors: Tao Wang, Chenyu Lin, Chenwei Tang, Jizhe Zhou, Deng Xiong, Jianan Li, Jian Zhao, Jiancheng Lv
Abstract: Detecting objects from UAV‑captured images is challenging due to the small object size. In this work, a simple and efficient adaptive zoom‑in framework is explored for object detection on UAV images. The main motivation is that the foreground objects are generally smaller and sparser than those in common scene images, which hinders the optimization of effective object detectors. We thus aim to zoom in adaptively on the objects to better capture object features for the detection task. To achieve this goal, two core design questions must be answered: i) How to conduct non‑uniform zooming on each image efficiently? ii) How to enable object detection training and inference with the zoomed image space? Correspondingly, a lightweight offset prediction scheme coupled with a novel box‑based zooming objective is introduced to learn non‑uniform zooming on the input image. Based on the learned zooming transformation, a corner‑aligned bounding box transformation method is proposed. The method warps the ground‑truth bounding boxes to the zoomed space to learn object detection, and warps the predicted bounding boxes back to the original space during inference. We conduct extensive experiments on three representative UAV object detection datasets: VisDrone, UAVDT, and SeaDronesSee. The proposed ZoomDet is architecture‑independent and can be applied to an arbitrary object detection architecture. Remarkably, on the SeaDronesSee dataset, ZoomDet offers an absolute mAP gain of more than 8.4 with a Faster R‑CNN model, with only about 3 ms additional latency. The code is available at https://github.com/twangnh/zoomdet_code.
Authors: Daoxuan Zhang, Ping Chen, Xiaobo Xia, Xiu Su, Ruichen Zhen, Jianqiang Xiao, Shuo Yang
Abstract: Aerial Object Goal Navigation, a challenging frontier in Embodied AI, requires an Unmanned Aerial Vehicle (UAV) agent to autonomously explore, reason, and identify a specific target using only visual perception and language description. However, existing methods struggle with the memorization of complex spatial representations in aerial environments, reliable and interpretable action decision‑making, and inefficient exploration and information gathering. To address these challenges, we introduce APEX (Aerial Parallel Explorer), a novel hierarchical agent designed for efficient exploration and target acquisition in complex aerial settings. APEX is built upon a modular, three‑part architecture: 1) Dynamic Spatio‑Semantic Mapping Memory, which leverages the zero‑shot capability of a Vision‑Language Model (VLM) to dynamically construct high‑resolution 3D Attraction, Exploration, and Obstacle maps, serving as an interpretable memory mechanism. 2) Action Decision Module, trained with reinforcement learning, which translates this rich spatial understanding into a fine‑grained and robust control policy. 3) Target Grounding Module, which employs an open‑vocabulary detector to achieve definitive and generalizable target identification. All these components are integrated into a hierarchical, asynchronous, and parallel framework, effectively bypassing the VLM's inference latency and boosting the agent's proactivity in exploration. Extensive experiments show that APEX outperforms the previous state of the art by +4.2% SR and +2.8% SPL on challenging UAV‑ON benchmarks, demonstrating its superior efficiency and the effectiveness of its hierarchical asynchronous design. Our source code is available on GitHub: https://github.com/4amGodvzx/apex
Authors: Jianli Sun, Bin Tian, Qiyao Zhang, Chengxiang Li, Zihan Song, Zhiyong Cui, Yisheng Lv, Yonglin Tian
Abstract: While Vision‑Language‑Action (VLA) models have achieved remarkable success in ground‑based embodied intelligence, their application to Aerial Manipulation Systems (AMS) remains a largely unexplored frontier. The inherent characteristics of AMS, including floating‑base dynamics, strong coupling between the UAV and the manipulator, and the multi‑step, long‑horizon nature of operational tasks, pose severe challenges to existing VLA paradigms designed for static or 2D mobile bases. To bridge this gap, we propose AIR‑VLA, the first VLA benchmark specifically tailored for aerial manipulation. We construct a physics‑based simulation environment and release a high‑quality multimodal dataset comprising 3000 manually teleoperated demonstrations, covering base manipulation, object & spatial understanding, semantic reasoning, and long‑horizon planning. Leveraging this platform, we systematically evaluate mainstream VLA models and state‑of‑the‑art VLM models. Our experiments not only validate the feasibility of transferring VLA paradigms to aerial systems but also, through multi‑dimensional metrics tailored to aerial tasks, reveal the capabilities and boundaries of current models regarding UAV mobility, manipulator control, and high‑level planning. AIR‑VLA establishes a standardized testbed and data foundation for future research in general‑purpose aerial robotics. The resource of AIR‑VLA will be available at https://github.com/SpencerSon2001/AIR‑VLA.
Authors: Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah
Abstract: Autonomous unmanned aerial vehicle (UAV) systems are increasingly deployed in safety‑critical, networked environments where they must operate reliably in the presence of malicious adversaries. While recent benchmarks have evaluated large language model (LLM)‑based UAV agents in reasoning, navigation, and efficiency, systematic assessment of security, resilience, and trust under adversarial conditions remains largely unexplored, particularly in emerging 6G‑enabled settings.
We introduce α^3‑SecBench, the first large‑scale evaluation suite for assessing the security‑aware autonomy of LLM‑based UAV agents under realistic adversarial interference. Building on multi‑turn conversational UAV missions from α^3‑Bench, the framework augments benign episodes with 20,000 validated security overlay attack scenarios targeting seven autonomy layers, including sensing, perception, planning, control, communication, edge/cloud infrastructure, and LLM reasoning. α^3‑SecBench evaluates agents across three orthogonal dimensions: security (attack detection and vulnerability attribution), resilience (safe degradation behavior), and trust (policy‑compliant tool usage).
We evaluate 23 state‑of‑the‑art LLMs from major industrial providers and leading AI labs using thousands of adversarially augmented UAV episodes sampled from a corpus of 113,475 missions spanning 175 threat types. While many models reliably detect anomalous behavior, effective mitigation, vulnerability attribution, and trustworthy control actions remain inconsistent. Normalized overall scores range from 12.9% to 57.1%, highlighting a significant gap between anomaly detection and security‑aware autonomous decision‑making. We release α^3‑SecBench on GitHub: https://github.com/maferrag/AlphaSecBench
Authors: Gnankan Landry Regis N'guessan
Abstract: Tropical algebra, including max‑plus, min‑plus, and related idempotent semirings, provides a unifying framework in which many optimization problems that are nonlinear in classical algebra become linear. This property makes tropical methods particularly well suited for shortest paths, scheduling, throughput analysis, and discrete event systems. Despite their theoretical maturity and practical relevance, existing tropical algebra implementations primarily target desktop or server environments and remain largely inaccessible on resource‑constrained embedded platforms, where such optimization problems are most acute. We present PALMA (Parallel Algebra Library for Max‑plus Applications), a lightweight, dependency‑free C library that brings tropical linear algebra to ARM‑based embedded systems. PALMA implements a generic semiring abstraction with SIMD‑accelerated kernels, enabling a single computational framework to support shortest paths, bottleneck paths, reachability, scheduling, and throughput analysis. The library supports five tropical semirings, dense and sparse (CSR) representations, tropical closure, and spectral analysis via maximum cycle mean computation. We evaluate PALMA on a Raspberry Pi 4 and demonstrate peak performance of 2,274 MOPS, speedups of up to 11.9 times over classical Bellman‑Ford for single‑source shortest paths, and sub‑10 microsecond scheduling solves for real‑time control workloads. Case studies in UAV control, IoT routing, and manufacturing systems show that tropical algebra enables efficient, predictable, and unified optimization directly on embedded hardware. PALMA is released as open‑source software under the MIT license.
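The claim that shortest-path problems become linear in a tropical semiring can be illustrated with a minimal sketch. It is written in Python for readability and is not PALMA's C API; the names `min_plus_matmul` and `min_plus_closure` are hypothetical, and the sketch assumes a small dense graph without negative cycles. In the min-plus semiring, "addition" is `min` and "multiplication" is `+`, so repeatedly squaring the weight matrix (with zeros on the diagonal) yields all-pairs shortest distances.

```python
import math

INF = math.inf  # tropical "zero" of the min-plus semiring (no edge)


def min_plus_matmul(a, b):
    """Min-plus matrix product: c[i][j] = min over k of (a[i][k] + b[k][j])."""
    inner = len(b)
    cols = len(b[0])
    return [
        [min(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def min_plus_closure(weights):
    """All-pairs shortest distances via repeated tropical squaring.

    Assumes a square matrix with INF for missing edges and no negative cycles.
    """
    n = len(weights)
    # Combine with the tropical identity: a path of length 0 from a node to itself.
    dist = [
        [min(0, weights[i][j]) if i == j else weights[i][j] for j in range(n)]
        for i in range(n)
    ]
    covered = 1  # dist currently accounts for paths of at most `covered` edges
    while covered < n - 1:
        dist = min_plus_matmul(dist, dist)
        covered *= 2
    return dist


# Hypothetical 4-node graph: 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (5).
graph = [
    [INF, 4, 1, INF],
    [INF, INF, INF, 5],
    [INF, INF, INF, INF],
    [INF, INF, INF, INF],
]
graph[2][1] = 2
dist = min_plus_closure(graph)
print(dist[0][1], dist[0][3])
```

Swapping `min` for `max` (and `INF` for `-INF`) gives the max-plus variant used in scheduling and throughput analysis, which is the sense in which one semiring abstraction can cover several problem classes.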
Authors: Muhayy Ud Din, Waseem Akram, Ahsan B. Bakht, Irfan Hussain
Abstract: Maritime port inspection plays a critical role in ensuring safety, regulatory compliance, and operational efficiency in complex maritime environments. However, existing inspection methods often rely on manual operations and conventional computer vision techniques that lack scalability and contextual understanding. This study introduces a novel integrated engineering framework that utilizes the synergy between Large Language Models (LLMs) and Vision Language Models (VLMs) to enable autonomous maritime port inspection using cooperative aerial and surface robotic platforms. The proposed framework replaces traditional state‑machine mission planners with LLM‑driven symbolic planning and improves perception pipelines through VLM‑based semantic inspection, enabling context‑aware and adaptive monitoring. The LLM module translates natural language mission instructions into executable symbolic plans with dependency graphs that encode operational constraints and ensure safe UAV‑USV coordination. Meanwhile, the VLM module performs real‑time semantic inspection and compliance assessment, generating structured reports with contextual reasoning. The framework was validated using the extended MBZIRC Maritime Simulator with realistic port infrastructure and further assessed through real‑world robotic inspection trials. The lightweight on‑board design ensures suitability for resource‑constrained maritime platforms, advancing the development of intelligent, autonomous inspection systems. Project resources (code and videos) can be found here: https://github.com/Muhayyuddin/llm‑vlm‑fusion‑port‑inspection
Authors: Cheng-Zhuang Liu, Si-Bao Chen, Qing-Ling Shu, Chris Ding, Jin Tang, Bin Luo
Abstract: Recent advances in video anomaly detection (VAD) mainly focus on ground‑based surveillance or unmanned aerial vehicle (UAV) videos with static backgrounds, whereas research on UAV videos with dynamic backgrounds remains limited. Unlike static scenarios, dynamically captured UAV videos exhibit multi‑source motion coupling, where the motion of objects and UAV‑induced global motion are intricately intertwined. Consequently, existing methods may misclassify normal UAV movements as anomalies or fail to capture true anomalies concealed within dynamic backgrounds. Moreover, many approaches do not adequately address the joint modeling of inter‑frame continuity and local spatial correlations across diverse temporal scales. To overcome these limitations, we propose the Frequency‑Assisted Temporal Dilation Mamba (FTDMamba) network for UAV VAD, including two core components: (1) a Frequency Decoupled Spatiotemporal Correlation Module, which disentangles coupled motion patterns and models global spatiotemporal dependencies through frequency analysis; and (2) a Temporal Dilation Mamba Module, which leverages Mamba's sequence modeling capability to jointly learn fine‑grained temporal dynamics and local spatial structures across multiple temporal receptive fields. Additionally, unlike existing UAV VAD datasets which focus on static backgrounds, we construct a large‑scale Moving UAV VAD dataset (MUVAD), comprising 222,736 frames with 240 anomaly events across 12 anomaly types. Extensive experiments demonstrate that FTDMamba achieves state‑of‑the‑art (SOTA) performance on two public static benchmarks and the new MUVAD dataset. The code and MUVAD dataset will be available at: https://github.com/uavano/FTDMamba.
Authors: Hassaan Farooq, Marvin Brenner, Peter Stütz
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response and infrastructure inspections. Ensuring safe and reliable operation in these human‑populated environments demands accurate perception of human poses and actions from an aerial viewpoint. This perspective challenges existing methods with low resolution, steep viewing angles and (self‑)occlusion, especially if the application demands real‑time feasible models. We train and deploy FlyPose, a lightweight top‑down human pose estimation pipeline for aerial imagery. Through multi‑dataset training, we achieve an average improvement of 6.8 mAP in person detection across the test sets of Manipal‑UAV, VisDrone, HIT‑UAV as well as our custom dataset. For 2D human pose estimation we report an improvement of 16.3 mAP on the challenging UAV‑Human dataset. FlyPose runs with an inference latency of ~20 milliseconds including preprocessing on a Jetson Orin AGX Developer Kit and is deployed onboard a quadrotor UAV during flight experiments. We also publish FlyPose‑104, a small but challenging aerial human pose estimation dataset that includes manual annotations from difficult aerial perspectives: https://github.com/farooqhassaan/FlyPose.
Authors: Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah
Abstract: Large Language Models (LLMs) are increasingly used as high level controllers for autonomous Unmanned Aerial Vehicle (UAV) missions. However, existing evaluations rarely assess whether such agents remain safe, protocol compliant, and effective under realistic next generation networking constraints. This paper introduces α^3‑Bench, a benchmark for evaluating LLM driven UAV autonomy as a multi turn conversational reasoning and control problem operating under dynamic 6G conditions. Each mission is formulated as a language mediated control loop between an LLM based UAV agent and a human operator, where decisions must satisfy strict schema validity, mission policies, speaker alternation, and safety constraints while adapting to fluctuating network slices, latency, jitter, packet loss, throughput, and edge load variations.
To reflect modern agentic workflows, α^3‑Bench integrates a dual action layer supporting both tool calls and agent to agent coordination, enabling evaluation of tool use consistency and multi agent interactions. We construct a large scale corpus of 113k conversational UAV episodes grounded in UAVBench scenarios and evaluate 17 state of the art LLMs using a fixed subset of 50 episodes per scenario under deterministic decoding. We propose a composite α^3 metric that unifies six pillars: Task Outcome, Safety Policy, Tool Consistency, Interaction Quality, Network Robustness, and Communication Cost, with efficiency normalized scores per second and per thousand tokens. Results show that while several models achieve high mission success and safety compliance, robustness and efficiency vary significantly under degraded 6G conditions, highlighting the need for network aware and resource efficient LLM based UAV agents. The dataset is publicly available on GitHub: https://github.com/maferrag/AlphaBench
Authors: Matthias Bartolo, Dylan Seychell, Gabriel Hili, Matthew Montebello, Carl James Debono, Saviour Formosa, Konstantinos Makantasis
Abstract: This paper investigates the integration of the Learning Using Privileged Information (LUPI) paradigm in object detection to exploit fine‑grained, descriptive information available during training but not at inference. We introduce a general, model‑agnostic methodology for injecting privileged information (such as bounding box masks, saliency maps, and depth cues) into deep learning‑based object detectors through a teacher‑student architecture. Experiments are conducted across five state‑of‑the‑art object detection models and multiple public benchmarks, including UAV‑based litter detection datasets and Pascal VOC 2012, to assess the impact on accuracy, generalization, and computational efficiency. Our results demonstrate that LUPI‑trained students consistently outperform their baseline counterparts, achieving significant boosts in detection accuracy with no increase in inference complexity or model size. Performance improvements are especially marked for medium and large objects, while ablation studies reveal that intermediate weighting of teacher guidance optimally balances learning from privileged and standard inputs. The findings affirm that the LUPI framework provides an effective and practical strategy for advancing object detection systems in both resource‑constrained and real‑world settings.
Authors: Yue Zhou, Jue Chen, Zilun Zhang, Penghui Huang, Ran Ding, Zhentao Zou, PengFei Gao, Yuchen Wei, Ke Li, Xue Yang, Xue Jiang, Hongxin Yang, Jonathan Li
Abstract: Remote sensing (RS) large vision‑language models (LVLMs) have shown strong promise across visual grounding (VG) tasks. However, existing RS VG datasets predominantly rely on explicit referring expressions, such as relative position, relative size, and color cues, thereby constraining performance on implicit VG tasks that require scenario‑specific domain knowledge. This article introduces DVGBench, a high‑quality implicit VG benchmark for drones, covering six major application scenarios: traffic, disaster, security, sport, social activity, and productive activity. Each object provides both explicit and implicit queries. Based on the dataset, we design DroneVG‑R1, an LVLM that integrates the novel Implicit‑to‑Explicit Chain‑of‑Thought (I2E‑CoT) within a reinforcement learning paradigm. This enables the model to take advantage of scene‑specific expertise, converting implicit references into explicit ones and thus reducing grounding difficulty. Finally, an evaluation of mainstream models on both explicit and implicit VG tasks reveals substantial limitations in their reasoning capabilities. These findings provide actionable insights for advancing the reasoning capacity of LVLMs for drone‑based agents. The code and datasets will be released at https://github.com/zytx121/DVGBench
Authors: Socratis Gkelios, Savvas D. Apostolidis, Pavlos Ch. Kapoutsis, Elias B. Kosmatopoulos, Athanasios Ch. Kapoutsis
Abstract: Unmanned Aerial Vehicles (UAVs) have revolutionized inspection tasks by offering a safer, more efficient, and flexible alternative to traditional methods. However, battery limitations often constrain their effectiveness, necessitating the development of optimized flight paths and data collection techniques. While existing approaches like coverage path planning (CPP) ensure comprehensive data collection, they can be inefficient, especially when inspecting multiple non‑connected Regions of Interest (ROIs). This paper introduces the Fast Inspection of Scattered Regions (FISR) problem and proposes a novel solution, the multi‑UAV Disjoint Areas Inspection (mUDAI) method. The introduced approach implements a two‑fold optimization procedure that calculates the best image‑capturing positions and the most efficient UAV trajectories, balancing data resolution and operational time while minimizing redundant data collection and resource consumption. The mUDAI method is designed to enable rapid, efficient inspections of scattered ROIs, making it ideal for applications such as security infrastructure assessments, agricultural inspections, and emergency site evaluations. A combination of simulated evaluations and real‑world deployments is used to validate and quantify the method's ability to improve operational efficiency while preserving high‑quality data capture, demonstrating its effectiveness in real‑world operations. An open‑source Python implementation of the mUDAI method can be found on GitHub (https://github.com/soc12/mUDAI), and the collected and processed data from the real‑world experiments are hosted on Zenodo (https://zenodo.org/records/13866483). Finally, an online platform (https://sites.google.com/view/mudai‑platform/) allows interested readers to interact with the mUDAI method and generate their own multi‑UAV FISR missions.
Authors: Zhuoyu Wu, Wenhui Ou, Qiawei Zheng, Jiayan Yang, Quanjun Wang, Wenqi Fang, Zheng Wang, Yongkui Yang, Heshan Li
Abstract: Motion blur caused by camera or object movement severely degrades image quality and poses challenges for real‑time applications such as autonomous driving, UAV perception, and medical imaging. In this paper, a lightweight U‑shaped network tailored for real‑time deblurring is presented and named RT‑Focuser. To balance speed and accuracy, we design three key components: Lightweight Deblurring Block (LD) for edge‑aware feature extraction, Multi‑Level Integrated Aggregation module (MLIA) for encoder integration, and Cross‑source Fusion Block (X‑Fuse) for progressive decoder refinement. Trained on a single blurred input, RT‑Focuser achieves 30.67 dB PSNR with only 5.85M parameters and 15.76 GMACs. It runs at 6 ms per frame on both GPU and mobile, exceeding 140 FPS on each, which shows strong potential for deployment on the edge. The official code and usage instructions are available at https://github.com/ReaganWu/RT‑Focuser.
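The two headline figures in this abstract, PSNR and per‑frame latency, can be related to their standard definitions with a short sketch. This is not taken from the RT‑Focuser repository: the function names are hypothetical, images are assumed to be flat sequences of pixel values, and the peak value of 255 assumes 8‑bit data.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def fps_from_latency_ms(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms
```

Under these definitions, a latency of 6 ms per frame corresponds to roughly 167 frames per second, which is consistent with the abstract's statement that throughput exceeds 140 FPS.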
Authors: Markus Gross, Sai B. Matha, Aya Fahmy, Rui Song, Daniel Cremers, Henri Meess
Abstract: Semantic Scene Completion (SSC) is essential for 3D perception in mobile robotics, as it enables holistic scene understanding by jointly estimating dense volumetric occupancy and per‑voxel semantics. Although SSC has been widely studied in terrestrial domains such as autonomous driving, aerial settings like autonomous flying remain largely unexplored, thereby limiting progress on downstream applications. Furthermore, LiDAR sensors are the primary modality for SSC data generation, which poses challenges for most uncrewed aerial vehicles (UAVs) due to flight regulations, mass and energy constraints, and the sparsity of LiDAR point clouds from elevated viewpoints. To address these limitations, we propose a LiDAR‑free, camera‑based data generation framework. By leveraging classical 3D reconstruction, our framework automates semantic label transfer by lifting <10% of annotated images into the reconstructed point cloud, substantially minimizing manual 3D annotation effort. Based on this framework, we introduce OccuFly, the first real‑world, camera‑based aerial SSC benchmark, captured across multiple altitudes and all seasons. OccuFly provides over 20,000 samples of images, semantic voxel grids, and metric depth maps across 21 semantic classes in urban, industrial, and rural environments, and follows established data organization for seamless integration. We benchmark both SSC and metric monocular depth estimation on OccuFly, revealing fundamental limitations of current vision foundation models in aerial settings and establishing new challenges for robust 3D scene understanding in the aerial domain. Visit https://github.com/markus‑42/occufly.
Authors: Yunkai Dang, Meiyi Zhu, Donghao Wang, Yizhuo Zhang, Jiacheng Yang, Qi Fan, Yuekun Yang, Wenbin Li, Feng Miao, Yang Gao
Abstract: Multimodal large language models (MLLMs) demonstrate strong perception and reasoning performance on existing remote sensing (RS) benchmarks. However, most prior benchmarks rely on low‑resolution imagery, and some high‑resolution benchmarks suffer from flawed reasoning‑task designs. We show that text‑only LLMs can perform competitively with multimodal vision‑language models on RS reasoning tasks without access to images, revealing a critical mismatch between current benchmarks and the intended evaluation of visual understanding. To enable faithful assessment, we introduce RSHR‑Bench, a super‑high‑resolution benchmark for RS visual understanding and reasoning. RSHR‑Bench contains 5,329 full‑scene images with a long side of at least 4,000 pixels, with up to about 3 x 10^8 pixels per image, sourced from widely used RS corpora and UAV collections. We design four task families: multiple‑choice VQA, open‑ended VQA, image captioning, and single‑image evaluation. These tasks cover nine perception categories and four reasoning types, supporting multi‑turn and multi‑image dialog. To reduce reliance on language priors, we apply adversarial filtering with strong LLMs followed by rigorous human verification. Overall, we construct 3,864 VQA tasks, 3,913 image captioning tasks, and 500 fully human‑written or verified single‑image evaluation VQA pairs. Evaluations across open‑source, closed‑source, and RS‑specific VLMs reveal persistent performance gaps in super‑high‑resolution scenarios. Code: https://github.com/Yunkaidang/RSHR
Authors: Wenda Li, Meng Wu, Liangzhao Chen, Sungmin Eum, Heesung Kwon, Qing Qu
Abstract: Training object detectors demands extensive, task‑specific annotations, yet this requirement becomes impractical in UAV‑based human detection due to constantly shifting target distributions and the scarcity of labeled images. As a remedy, synthetic simulators are adopted to generate annotated data at a low annotation cost. However, the domain gap between synthetic and real images hinders the model from being effectively applied to the target domain. Accordingly, we introduce Coarse‑to‑Fine Hierarchical Alignment (CFHA), a three‑stage diffusion‑based framework designed to transform synthetic data for UAV‑based human detection, narrowing the domain gap while preserving the original synthetic labels. CFHA explicitly decouples global style and local content domain discrepancies and bridges those gaps using three modules: (1) Global Style Transfer: a diffusion model aligns color, illumination, and texture statistics of synthetic images to the realistic style, using only a small real reference set; (2) Local Refinement: a super‑resolution diffusion model is used to produce fine‑grained and photorealistic details for small objects, such as human instances, preserving shape and boundary integrity; (3) Hallucination Removal: a module that filters out human instances whose visual attributes do not align with real‑world data, bringing the human appearance closer to the target distribution. Extensive experiments on public UAV Sim2Real detection benchmarks demonstrate that our method significantly improves detection accuracy compared to the non‑transformed baselines. Specifically, our method achieves an improvement of up to +14.1 mAP50 on the Semantic‑Drone benchmark. Ablation studies confirm the complementary roles of the global and local stages and highlight the importance of hierarchical alignment. The code is released at https://github.com/liwd190019/CFHA.
Authors: Lihuang Chen, Xiangyu Luo, Jun Meng
Abstract: We propose LEO‑RobotAgent, a general‑purpose language‑driven intelligent agent framework for robots. Under this framework, LLMs can operate different types of robots to complete unpredictable complex tasks across various scenarios. This framework features strong generalization, robustness, and efficiency. The application‑level system built around it can fully enhance bidirectional human‑robot intent understanding and lower the threshold for human‑robot interaction. Regarding robot task planning, the vast majority of existing studies focus on the application of large models in single‑task scenarios and for single robot types. These algorithms often have complex structures and lack generalizability. Thus, the proposed LEO‑RobotAgent framework is designed with as streamlined a structure as possible, enabling large models to independently think, plan, and act within this clear framework. We provide a modular and easily registrable toolset, allowing large models to flexibly call various tools to meet different requirements. Meanwhile, the framework incorporates a human‑robot interaction mechanism, enabling the algorithm to collaborate with humans like a partner. Experiments have verified that this framework can be easily adapted to mainstream robot platforms including unmanned aerial vehicles (UAVs), robotic arms, and wheeled robots, and can efficiently execute a variety of carefully designed tasks with different complexity levels. Our code is available at https://github.com/LegendLeoChen/LEO‑RobotAgent.
Authors: Reza Ahmari, Ahmad Mohammadi, Vahid Hemmati, Mohammed Mynuddin, Parham Kebria, Mahmoud Nabil Mahmoud, Xiaohong Yuan, Abdollah Homaifar
Abstract: The integration of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is increasingly central to the development of intelligent autonomous systems for applications such as search and rescue, environmental monitoring, and logistics. However, precise coordination between these platforms in real‑time scenarios presents major challenges, particularly when external localization infrastructure such as GPS or GNSS is unavailable or degraded [1]. This paper proposes a vision‑based, data‑driven framework for real‑time UAV‑UGV integration, with a focus on robust UGV detection and heading angle prediction for navigation and coordination. The system employs a fine‑tuned YOLOv5 model to detect UGVs and extract bounding box features, which are then used by a lightweight artificial neural network (ANN) to estimate the UAV's required heading angle. A VICON motion capture system was used to generate ground‑truth data during training, resulting in a dataset of over 13,000 annotated images collected in a controlled lab environment. The trained ANN achieves a mean absolute error of 0.1506° and a root mean squared error of 0.1957°, offering accurate heading angle predictions using only monocular camera inputs. Experimental evaluations achieve 95% accuracy in UGV detection. This work contributes a vision‑based, infrastructure‑independent solution that demonstrates strong potential for deployment in GPS/GNSS‑denied environments, supporting reliable multi‑agent coordination under realistic dynamic conditions. A demonstration video showcasing the system's real‑time performance, including UGV detection, heading angle prediction, and UAV alignment under dynamic conditions, is available at: https://github.com/Kooroshraf/UAV‑UGV‑Integration
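The mean absolute error and root mean squared error quoted for heading angles follow standard definitions, sketched below. The wrap‑around handling (treating 359° and 1° as 2° apart) is an assumption added for illustration; the abstract does not say how the authors treat angle periodicity, and the function names are hypothetical.

```python
import math


def angle_error_deg(predicted, target):
    """Signed smallest difference between two headings, in degrees, in [-180, 180)."""
    return (predicted - target + 180.0) % 360.0 - 180.0


def mae_rmse_deg(predicted, target):
    """Mean absolute error and root mean squared error over paired headings."""
    errors = [angle_error_deg(p, t) for p, t in zip(predicted, target)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

RMSE is never smaller than MAE on the same errors, which matches the ordering of the two reported values (0.1957° versus 0.1506°).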
Authors: Mika Persson, Jonas Lidman, Jacob Ljungberg, Samuel Sandelius, Adam Andersson
Abstract: This work studies the application of Multi‑Agent Reinforcement Learning (MARL) to decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for MARL scaling studies. A robust baseline policy is proposed which restricts agent motion and applies Dijkstra's shortest path algorithm. Computational experiment results show that two off‑the‑shelf MARL algorithms perform competitively with the baseline for a small number of agents, but face scalability issues as the number of agents increases. Source code and animations are available online at https://github.com/mikapersson/Information‑Relaying.
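The baseline policy above is described as applying Dijkstra's shortest path algorithm. A minimal sketch of that algorithm on a weighted graph follows; the adjacency‑list representation and the example graph are assumptions for illustration, not the environment or code from the linked repository.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source.

    graph maps each node to a list of (neighbor, nonnegative_weight) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical relay graph: nodes are waypoints, weights are travel costs.
example_graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 2.0)],
    "C": [("D", 1.0)],
    "D": [],
}
example_dist = dijkstra(example_graph, "A")
```

In the example, the cheapest route from A to D goes through B and C rather than using the direct A to C edge.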
Authors: Huilin Xu, Zhuoyang Liu, Yixiang Luomei, Feng Xu
Abstract: Aerial Vision‑and‑Language Navigation (VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and navigate complex urban environments using onboard visual observation. This task holds promise for real‑world applications such as low‑altitude inspection, search‑and‑rescue, and autonomous aerial delivery. Existing methods often rely on panoramic images, depth inputs, or odometry to support spatial reasoning and action planning. These requirements increase system cost and integration complexity, thus hindering practical deployment for lightweight UAVs. We present a unified aerial VLN framework that operates solely on egocentric monocular RGB observations and natural language instructions. The model formulates navigation as a next‑token prediction problem, jointly optimizing spatial perception, trajectory reasoning, and action prediction through prompt‑guided multi‑task learning. Moreover, we propose a keyframe selection strategy to reduce visual redundancy by retaining semantically informative frames, along with an action merging and label reweighting mechanism that mitigates long‑tailed supervision imbalance and facilitates stable multi‑task co‑training. Extensive experiments on the AerialVLN and OpenFly benchmarks validate the effectiveness of our method. Under the challenging monocular RGB‑only setting, our model achieves strong results across both seen and unseen environments. It significantly outperforms existing RGB‑only baselines and narrows the performance gap with state‑of‑the‑art panoramic RGB‑D counterparts. Comprehensive ablation studies further demonstrate the contribution of our task design and architectural choices. Our code is publicly available at https://github.com/return‑sleep/AeroAct.
Authors: Đorđe Nedeljković
Abstract: Convolutional Neural Networks (CNNs) have proven highly effective for edge and mobile vision tasks due to their computational efficiency. While many recent works seek to enhance CNNs with global contextual understanding via self‑attention‑based Vision Transformers, these approaches often introduce significant computational overhead. In this work, we demonstrate that it is possible to retain strong global perception without relying on computationally expensive components. We present GlimmerNet, an ultra‑lightweight convolutional network built on the principle of separating receptive field diversity from feature recombination. GlimmerNet introduces Grouped Dilated Depthwise Convolutions (GDBlocks), which partition channels into groups with distinct dilation rates, enabling multi‑scale feature extraction at no additional parameter cost. To fuse these features efficiently, we design a novel Aggregator module that recombines cross‑group representations using grouped pointwise convolution, significantly lowering parameter overhead. With just 31K parameters and 29% fewer FLOPs than the most recent baseline, GlimmerNet achieves a new state‑of‑the‑art weighted F1‑score of 0.966 on the UAV‑focused AIDERv2 dataset. These results establish a new accuracy‑efficiency trade‑off frontier for real‑time emergency monitoring on resource‑constrained UAV platforms. Our implementation is publicly available at https://github.com/djordjened92/gdd‑cnn.
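The claim that distinct dilation rates add multi‑scale coverage "at no additional parameter cost" can be checked with simple arithmetic: a dilated kernel covers a larger spatial extent but holds the same number of weights. The sketch below assumes an even split of channels across groups, square kernels, and no bias terms; the configuration values and function names are hypothetical and not taken from GlimmerNet.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def depthwise_params(channels, kernel_size):
    """Weights in a depthwise convolution (one k x k filter per channel, no bias)."""
    return channels * kernel_size * kernel_size


def grouped_dilated_summary(channels, kernel_size, dilations):
    """Split channels evenly into groups, one dilation rate per group."""
    if channels % len(dilations) != 0:
        raise ValueError("channels must divide evenly into the groups")
    per_group = channels // len(dilations)
    extents = [effective_kernel_size(kernel_size, d) for d in dilations]
    total = sum(depthwise_params(per_group, kernel_size) for _ in dilations)
    return extents, total


# Hypothetical configuration: 32 channels, 3x3 kernels, four dilation rates.
extents, total = grouped_dilated_summary(32, 3, (1, 2, 3, 4))
```

Here the four groups cover extents of 3, 5, 7, and 9 pixels, while the total weight count equals that of a plain 3x3 depthwise convolution over all 32 channels.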
Authors: Chunhui Zhang, Li Liu, Zhipeng Zhang, Yong Wang, Hao Wen, Xi Zhou, Shiming Ge, Yanfeng Wang
Abstract: Unmanned Aerial Vehicles (UAVs) offer wide‑ranging applications but also pose significant safety and privacy violation risks in areas like airport and infrastructure inspection, spurring the rapid development of Anti‑UAV technologies in recent years. However, current Anti‑UAV research primarily focuses on RGB, infrared (IR), or RGB‑IR videos captured by fixed ground cameras, with little attention to tracking target UAVs from another moving UAV platform. To fill this gap, we propose a new multi‑modal visual tracking task termed UAV‑Anti‑UAV, which involves a pursuer UAV tracking a target adversarial UAV in the video stream. Compared to existing Anti‑UAV tasks, UAV‑Anti‑UAV is more challenging due to severe dual‑dynamic disturbances caused by the rapid motion of both the capturing platform and the target. To advance research in this domain, we construct a million‑scale dataset consisting of 1,810 videos, each manually annotated with bounding boxes, a language prompt, and 15 tracking attributes. Furthermore, we propose MambaSTS, a Mamba‑based baseline method for UAV‑Anti‑UAV tracking, which enables integrated spatial‑temporal‑semantic learning. Specifically, we employ Mamba and Transformer models to learn global semantic and spatial features, respectively, and leverage the state space model's strength in long‑sequence modeling to establish video‑level long‑term context via a temporal token propagation mechanism. We conduct experiments on the UAV‑Anti‑UAV dataset to validate the effectiveness of our method. A thorough experimental evaluation of 50 modern deep tracking algorithms demonstrates that there is still significant room for improvement in the UAV‑Anti‑UAV domain. The dataset and codes will be available at https://github.com/983632847/Awesome‑Multimodal‑Object‑Tracking.
Authors: Mingning Guo, Mengwei Wu, Shaoxian Li, Haifeng Li, Chao Tao
Abstract: Existing image perception methods based on VLMs generally follow a paradigm wherein models extract and analyze image content based on user‑provided textual task prompts. However, such methods face limitations when applied to UAV imagery, which presents challenges like target confusion, scale variations, and complex backgrounds. These challenges arise because VLMs' understanding of image content depends on the semantic alignment between visual and textual tokens. When the task prompt is simplistic and the image content is complex, achieving effective alignment becomes difficult, limiting the model's ability to focus on task‑relevant information. To address this issue, we introduce AerialVP, the first agent framework for task prompt enhancement in UAV image perception. AerialVP proactively extracts multi‑dimensional auxiliary information from UAV images to enhance task prompts, overcoming the limitations of traditional VLM‑based approaches. Specifically, the enhancement process includes three stages: (1) analyzing the task prompt to identify the task type and enhancement needs, (2) selecting appropriate tools from the tool repository, and (3) generating enhanced task prompts based on the analysis and selected tools. To evaluate AerialVP, we introduce AerialSense, a comprehensive benchmark for UAV image perception that includes Aerial Visual Reasoning, Aerial Visual Question Answering, and Aerial Visual Grounding tasks. AerialSense provides a standardized basis for evaluating model generalization and performance across diverse resolutions, lighting conditions, and both urban and natural scenes. Experimental results demonstrate that AerialVP significantly enhances task prompt guidance, leading to stable and substantial performance improvements in both open‑source and proprietary VLMs. Our work will be available at https://github.com/lostwolves/AerialVP.
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Abstract: UAV‑based autonomous forestry operations require rapid and precise tree branch segmentation for safe navigation and automated pruning across varying pixel resolutions and operational conditions. We evaluate different deep learning methods at three resolutions (256x256, 512x512, 1024x1024) using the Urban Street Tree Dataset, employing standard metrics (IoU, Dice) and specialized measures including Thin Structure IoU (TS‑IoU) and Connectivity Preservation Rate (CPR). Among 22 configurations tested, U‑Net with MiT‑B4 backbone achieves strong performance at 256x256. At 512x512, MiT‑B4 leads in IoU, Dice, TS‑IoU, and Boundary‑F1. At 1024x1024, U‑Net+MiT‑B3 shows the best validation performance for IoU/Dice and precision, while U‑Net++ excels in boundary quality. PSPNet provides the most efficient option (2.36/9.43/37.74 GFLOPs) with 25.7/19.6/11.8 percentage point IoU reductions compared to top performers at respective resolutions. These results establish multi‑resolution benchmarks for accuracy‑efficiency trade‑offs in embedded forestry systems. Implementation is available at https://github.com/BennyLinntu/PerformanceTreeBranchSegmentation.
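The two standard metrics named above, IoU and Dice, can be written directly from their definitions for binary masks. This sketch assumes masks are flat sequences of 0/1 values and returns 1.0 for both metrics when both masks are empty, which is a convention chosen here rather than something stated in the abstract. The specialized measures (TS‑IoU, CPR) are not defined in the abstract and are not reproduced.

```python
def iou_and_dice(pred, target):
    """Intersection over Union and Dice coefficient for two binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as a perfect match
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank any pair of predictions on a single image identically, although averages over a dataset can still differ.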
Authors: Jiawen Wen, Yu Hu, Suixuan Qiu, Jinshan Huang, Xiaowen Chu
Abstract: Real‑time tracking of small unmanned aerial vehicles (UAVs) on edge devices faces a fundamental resolution‑speed conflict. Downsampling high‑resolution imagery to standard detector input sizes causes small target features to collapse below detectable thresholds. Yet processing native 1080p frames on resource‑constrained platforms yields insufficient throughput for smooth gimbal control. We propose SDG‑Track, a Sparse Detection‑Guided Tracker that adopts an Observer‑Follower architecture to reconcile this conflict. The Observer stream runs a high‑capacity detector at low frequency on the GPU to provide accurate position anchors from 1920x1080 frames. The Follower stream performs high‑frequency trajectory interpolation via ROI‑constrained sparse optical flow on the CPU. To handle tracking failures from occlusion or model drift caused by spectrally similar distractors, we introduce Dual‑Space Recovery, a training‑free re‑acquisition mechanism combining color histogram matching with geometric consistency constraints. Experiments on a ground‑to‑air tracking station demonstrate that SDG‑Track achieves 35.1 FPS system throughput while retaining 97.2% of the frame‑by‑frame detection precision. The system successfully tracks agile FPV drones under real‑world operational conditions on an NVIDIA Jetson Orin Nano. Our paper code is publicly available at https://github.com/Jeffry‑wen/SDG‑Track
Authors: Mengyuan Liu, Jinfu Liu, Yongkang Jiang, Bin He
Abstract: Human action recognition (HAR) in videos has garnered widespread attention due to the rich information in RGB videos. Nevertheless, existing methods for extracting deep features from RGB videos face challenges such as information redundancy, susceptibility to noise and high storage costs. To address these issues and fully harness the useful information in videos, we propose a novel heatmap pooling network (HP‑Net) for action recognition from videos, which extracts information‑rich, robust and concise pooled features of the human body in videos through a feedback pooling module. The extracted pooled features demonstrate obvious performance advantages over the previously obtained pose data and heatmap features from videos. In addition, we design a spatial‑motion co‑learning module and a text refinement modulation module to integrate the extracted pooled features with other multimodal data, enabling more robust action recognition. Extensive experiments on several benchmarks namely NTU RGB+D 60, NTU RGB+D 120, Toyota‑Smarthome and UAV‑Human consistently verify the effectiveness of our HP‑Net, which outperforms the existing human action recognition methods. Our code is publicly available at: https://github.com/liujf69/HPNet‑Action.
Authors: Nan Zhou, Huandong Wang, Jiahao Li, Han Li, Yali Song, Qiuhua Wang, Yong Li, Xinlei Chen
Abstract: Fine‑grained wildfire spread prediction is crucial for enhancing emergency response efficacy and decision‑making precision. However, existing research predominantly focuses on coarse spatiotemporal scales and relies on low‑resolution satellite data, capturing only macroscopic fire states while fundamentally constraining high‑precision localized fire dynamics modeling capabilities. To bridge this gap, we present FireSentry, a provincial‑scale multi‑modal wildfire dataset characterized by sub‑meter spatial and sub‑second temporal resolution. Collected using synchronized UAV platforms, FireSentry provides visible and infrared video streams, in‑situ environmental measurements, and manually validated fire masks. Building on FireSentry, we establish a comprehensive benchmark encompassing physics‑based, data‑driven, and generative models, revealing the limitations of existing mask‑only approaches. Our analysis proposes FiReDiff, a novel dual‑modality paradigm that first predicts future video sequences in the infrared modality, and then precisely segments fire masks in the mask modality based on the generated dynamics. FiReDiff achieves state‑of‑the‑art performance, with video quality gains of 39.2% in PSNR, 36.1% in SSIM, 50.0% in LPIPS, 29.4% in FVD, and mask accuracy gains of 3.3% in AUPRC, 59.1% in F1 score, 42.9% in IoU, and 62.5% in MSE when applied to generative models. The FireSentry benchmark dataset and FiReDiff paradigm collectively advance fine‑grained wildfire forecasting and dynamic disaster simulation. The processed benchmark dataset is publicly available at: https://github.com/Munan222/FireSentry‑Benchmark‑Dataset.
Authors: Liyuan Lou, Wanyun Li, Wentian Gan, Yifei Yu, Tengfei Wang, Xin Wang, Zongqian Zhan
Abstract: Compared with conventional offline UAV photogrammetry, real‑time UAV photogrammetry is essential for time‑critical geospatial applications such as disaster response and active digital‑twin maintenance. However, most existing methods focus on processing captured images or sequential frames in real time, without explicitly evaluating the quality of the on‑the‑go 3D reconstruction or providing guided feedback to enhance image acquisition in the target area. This work presents On‑the‑fly Feedback SfM, an explore‑and‑exploit framework for real‑time UAV photogrammetry, enabling iterative exploration of unseen regions and exploitation of already observed and reconstructed areas in near real time. Built upon SfM on‑the‑fly, the proposed method integrates three modules: (1) online incremental coarse‑mesh generation for the dynamically expanding sparse 3D point cloud; (2) online mesh quality assessment with actionable indicators; and (3) predictive path planning for on‑the‑fly trajectory refinement. Comprehensive experiments demonstrate that our method achieves in‑situ reconstruction and evaluation in near real time while providing actionable feedback that markedly reduces coverage gaps and re‑flight costs. By integrating data collection, processing, 3D reconstruction and assessment, and online feedback, On‑the‑fly Feedback SfM offers a path from the traditional passive working mode to a more intelligent and adaptive exploration workflow. Code is now available at https://github.com/IRIS‑LAB‑whu/OntheflySfMFeedback.
Authors: Zhihao Zhan, Yuhang Ming, Shaobin Li, Jie Yuan
Abstract: Multi‑sensor Simultaneous Localization and Mapping (SLAM) is essential for Unmanned Aerial Vehicles (UAVs) performing agricultural tasks such as spraying, surveying, and inspection. However, real‑world, multi‑modal agricultural UAV datasets that enable research on robust operation remain scarce. To address this gap, we present AgriLiRa4D, a multi‑modal UAV dataset designed for challenging outdoor agricultural environments. AgriLiRa4D spans three representative farmland types (flat, hilly, and terraced) and includes both boundary and coverage operation modes, resulting in six flight sequence groups. The dataset provides high‑accuracy ground‑truth trajectories from a Fiber Optic Inertial Navigation System with Real‑Time Kinematic capability (FINS_RTK), along with synchronized measurements from a 3D LiDAR, a 4D Radar, and an Inertial Measurement Unit (IMU), accompanied by complete intrinsic and extrinsic calibrations. Leveraging its comprehensive sensor suite and diverse real‑world scenarios, AgriLiRa4D supports diverse SLAM and localization studies and enables rigorous robustness evaluation against low‑texture crops, repetitive patterns, dynamic vegetation, and other challenges of real agricultural environments. To further demonstrate its utility, we benchmark four state‑of‑the‑art multi‑sensor SLAM algorithms across different sensor combinations, highlighting the difficulty of the proposed sequences and the necessity of multi‑modal approaches for reliable UAV localization. By filling a critical gap in agricultural SLAM datasets, AgriLiRa4D provides a valuable benchmark for the research community and contributes to advancing autonomous navigation technologies for agricultural UAVs. The dataset can be downloaded from: https://zhan994.github.io/AgriLiRa4D.
Authors: Hongda Liu, Yunfan Liu, Changlu Wang, Yunlong Wang, Zhenan Sun
Abstract: Recent advances in skeleton‑based action recognition increasingly leverage semantic priors from Large Language Models (LLMs) to enrich skeletal representations. However, the LLM is typically queried in isolation from the recognition model and receives no performance feedback. As a result, it often fails to deliver the targeted discriminative cues critical to distinguish similar actions. To overcome these limitations, we propose SkeletonAgent, a novel framework that bridges the recognition model and the LLM through two cooperative agents, i.e., Questioner and Selector. Specifically, the Questioner identifies the most frequently confused classes and supplies them to the LLM as context for more targeted guidance. Conversely, the Selector parses the LLM's response to extract precise joint‑level constraints and feeds them back to the recognizer, enabling finer‑grained cross‑modal alignment. Comprehensive evaluations on five benchmarks, including NTU RGB+D, NTU RGB+D 120, Kinetics‑Skeleton, FineGYM, and UAV‑Human, demonstrate that SkeletonAgent consistently outperforms state‑of‑the‑art benchmark methods. The code is available at https://github.com/firework8/SkeletonAgent.
Authors: Pascal Goldschmid, Aamir Ahmad
Abstract: Multi‑rotor UAVs face limited flight time due to battery constraints. Autonomous docking on blimps with onboard battery recharging and data offloading offers a promising solution for extended UAV missions. However, the vulnerability of blimps to wind gusts causes trajectory deviations, requiring precise, obstacle‑aware docking strategies. To this end, this work introduces two key novelties: (i) a temporal convolutional network that predicts blimp responses to wind gusts, enabling rapid gust detection and estimation of points where the wind gust effect has subsided; (ii) a model predictive controller (MPC) that leverages these predictions to compute collision‑free trajectories for docking, enabled by a novel obstacle avoidance method for close‑range manoeuvres near the blimp. Simulation results show our method outperforms a baseline constant‑velocity model of the blimp significantly across different scenarios. We further validate the approach in real‑world experiments, demonstrating the first autonomous multi‑rotor docking control strategy on blimps shown outside simulation. Source code is available here https://github.com/robot‑perception‑group/multi_rotor_airship_docking.
Authors: Tianyang Xu, Jinjie Gu, Xuefeng Zhu, XiaoJun Wu, Josef Kittler
Abstract: With the proliferation of low‑altitude unmanned aerial vehicles (UAVs), visual multi‑object tracking is becoming a critical security technology, demanding significant robustness even in complex environmental conditions. However, tracking UAVs using a single visual modality often fails in challenging scenarios, such as low illumination, cluttered backgrounds, and rapid motion. Although multi‑modal multi‑object UAV tracking is more resilient, the development of effective solutions has been hindered by the absence of dedicated public datasets. To bridge this gap, we release MM‑UAV, the first large‑scale benchmark for Multi‑Modal UAV Tracking, integrating three key sensing modalities, namely RGB, infrared (IR), and event signals. The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi‑modal sequences, and more than 2.8 million annotated frames. Accompanying the dataset, we provide a novel multi‑modal multi‑UAV tracking framework, designed specifically for UAV tracking applications and serving as a baseline for future research. Our framework incorporates two key technical innovations, namely an offset‑guided adaptive alignment module to resolve spatial mismatches across sensors, and an adaptive dynamic fusion module to balance complementary information conveyed by different modalities. Furthermore, to overcome the limitations of conventional appearance modelling in multi‑object tracking, we introduce an event‑enhanced association mechanism that leverages motion cues from the event modality for more reliable identity maintenance. Comprehensive experiments demonstrate that the proposed framework consistently outperforms state‑of‑the‑art methods. To foster further research in multi‑modal UAV tracking, both the dataset and source code will be made publicly available at https://xuefeng‑zhu5.github.io/MM‑UAV/.
Authors: Xianwei Lv, Debin Tang, Zhecheng Shi, Wang Wang, Yujiao Zheng, Xiatian Zhu
Abstract: Meeting real‑time constraints for high‑performance Approximate Nearest Neighbor (ANN) search remains a critical challenge in remote sensing edge devices, which are essentially fusion systems like micro‑satellites and UAVs, largely due to stringent limitations in primary (RAM) and secondary (disk) storage. To address this challenge, we propose Edge‑ANN, an innovative ANN framework specifically engineered for storage efficiency. The core innovation of Edge‑ANN lies in its departure from traditional tree‑based methods that store high‑dimensional hyperplanes. Instead, it leverages pairs of existing data items, termed "anchors," to implicitly define spatial partitions. To ensure these partitions are both balanced and effective, we have developed a novel Binary Anchor Optimization algorithm. This architectural shift eliminates the dimension‑dependence of the space complexity. Rigorous experiments on three multi‑source datasets, MillionAID, High‑resolution Urban Complex Dataset, and GlobalUrbanNet Dataset, demonstrate that under simulated edge environments with dual storage constraints, Edge‑ANN achieves a 30‑40% reduction in secondary storage compared to the baseline, at the cost of a minor 3‑5% drop in retrieval accuracy. Furthermore, its overall retrieval performance surpasses that of other mainstream methods in these constrained scenarios. Collectively, these results establish Edge‑ANN as a state‑of‑the‑art solution for enabling large‑scale, high‑performance, real‑time remote sensing feature retrieval on edge devices with exceptionally constrained storage. The codes of Edge‑ANN are available at https://github.com/huaijiao666/Edge‑ANN.
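One way a pair of existing data items could implicitly define a partition, without storing a hyperplane, is to assign each point to whichever anchor it is closer to. The sketch below illustrates that idea only. The nearer‑anchor rule is an assumption made for illustration; the abstract does not specify the splitting rule, and the Binary Anchor Optimization algorithm is not reproduced here. Function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_by_anchors(points, anchor_i, anchor_j):
    """Split point indices by which of two anchor points is nearer.

    Only the two anchor indices need to be stored for the split, regardless
    of the dimensionality of the points (assumed rule, see lead-in).
    """
    anchor_a, anchor_b = points[anchor_i], points[anchor_j]
    left, right = [], []
    for index, point in enumerate(points):
        if squared_distance(point, anchor_a) <= squared_distance(point, anchor_b):
            left.append(index)
        else:
            right.append(index)
    return left, right


# Hypothetical 2-D data; anchors are the first and last items.
example_points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
example_left, example_right = partition_by_anchors(example_points, 0, 3)
```

Under this rule the split is geometrically the perpendicular bisector between the two anchors, yet the per‑node storage is two integers rather than a vector whose length grows with the feature dimension, which is the kind of dimension‑independence the abstract refers to.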
Authors: Lingfeng Zhang, Yuchen Zhang, Hongsheng Li, Haoxiang Fu, Yingbo Tang, Hangjun Ye, Long Chen, Xiaojun Liang, Xiaoshuai Hao, Wenbo Ding
Abstract: Vision‑Language Models (VLMs), leveraging their powerful visual perception and reasoning capabilities, have been widely applied in Unmanned Aerial Vehicle (UAV) tasks. However, the spatial intelligence capabilities of existing VLMs in UAV scenarios remain largely unexplored, raising concerns about their effectiveness in navigating and interpreting dynamic environments. To bridge this gap, we introduce SpatialSky‑Bench, a comprehensive benchmark specifically designed to evaluate the spatial intelligence capabilities of VLMs in UAV navigation. Our benchmark comprises two categories, Environmental Perception and Scene Understanding, divided into 13 subcategories, including bounding boxes, color, distance, height, and landing safety analysis, among others. Extensive evaluations of various mainstream open‑source and closed‑source VLMs reveal unsatisfactory performance in complex UAV navigation scenarios, highlighting significant gaps in their spatial capabilities. To address this challenge, we developed the SpatialSky‑Dataset, a comprehensive dataset containing 1M samples with diverse annotations across various scenarios. Leveraging this dataset, we introduce Sky‑VLM, a specialized VLM designed for UAV spatial reasoning across multiple granularities and contexts. Extensive experimental results demonstrate that Sky‑VLM achieves state‑of‑the‑art performance across all benchmark tasks, paving the way for the development of VLMs suitable for UAV scenarios. The source code is available at https://github.com/linglingxiansen/SpatialSKy.
Authors: Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah
Abstract: Autonomous aerial systems increasingly rely on large language models (LLMs) for mission planning, perception, and decision‑making, yet the lack of standardized and physically grounded benchmarks limits systematic evaluation of their reasoning capabilities. To address this gap, we introduce UAVBench, an open benchmark dataset comprising 50,000 validated UAV flight scenarios generated through taxonomy‑guided LLM prompting and multi‑stage safety validation. Each scenario is encoded in a structured JSON schema that includes mission objectives, vehicle configuration, environmental conditions, and quantitative risk labels, providing a unified representation of UAV operations across diverse domains. Building on this foundation, we present UAVBench_MCQ, a reasoning‑oriented extension containing 50,000 multiple‑choice questions spanning ten cognitive and ethical reasoning styles, ranging from aerodynamics and navigation to multi‑agent coordination and integrated reasoning. This framework enables interpretable and machine‑checkable assessment of UAV‑specific cognition under realistic operational contexts. We evaluate 32 state‑of‑the‑art LLMs, including GPT‑5, ChatGPT‑4o, Gemini 2.5 Flash, DeepSeek V3, Qwen3 235B, and ERNIE 4.5 300B, and find strong performance in perception and policy reasoning but persistent challenges in ethics‑aware and resource‑constrained decision‑making. UAVBench establishes a reproducible and physically grounded foundation for benchmarking agentic AI in autonomous aerial systems and advancing next‑generation UAV reasoning intelligence. To support open science and reproducibility, we release the UAVBench dataset, the UAVBench_MCQ benchmark, evaluation scripts, and all related materials on GitHub at https://github.com/maferrag/UAVBench
Authors: Jeongho Min, Dongyoung Kim, Jaehyup Lee
Abstract: Cross‑view image retrieval, particularly street‑to‑satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS‑denied environments. However, existing approaches often require supervised training on curated datasets and rely on panoramic or UAV‑based images, which limits real‑world deployment. In this paper, we present a simple yet effective cross‑view image retrieval framework that leverages a pretrained vision encoder and a large language model (LLM), requiring no additional training. Given a monocular street‑view image, our method extracts geographic cues through web‑based image search and LLM‑based location inference, generates a satellite query via a geocoding API, and retrieves matching tiles using a pretrained vision encoder (e.g., DINOv2) with PCA‑based whitening feature refinement. Despite using no ground‑truth supervision or finetuning, our proposed method outperforms prior learning‑based approaches on the benchmark dataset under zero‑shot settings. Moreover, our pipeline enables automatic construction of semantically aligned street‑to‑satellite datasets, offering a scalable and cost‑efficient alternative to manual annotation. All source code will be made publicly available at https://jeonghomin.github.io/street2orbit.github.io/.
Authors: Weining Lu, Deer Bin, Lian Ma, Ming Ma, Zhihao Ma, Xiangyang Chen, Longfei Wang, Yixiao Feng, Zhouxian Jiang, Yongliang Shi, Bin Liang
Abstract: Efficient, accurate, and flexible relative localization is crucial in air‑ground collaborative tasks. However, current approaches for robot relative localization are primarily realized in the form of distributed multi‑robot SLAM systems with the same sensor configuration, which are tightly coupled with the state estimation of all robots, limiting both flexibility and accuracy. To this end, we fully leverage the high capacity of the Unmanned Ground Vehicle (UGV) to integrate multiple sensors, enabling a semi‑distributed cross‑modal air‑ground relative localization framework. In this work, both the UGV and the Unmanned Aerial Vehicle (UAV) independently perform SLAM while extracting deep learning‑based keypoints and global descriptors, which decouples the relative localization from the state estimation of all agents. The UGV employs a local Bundle Adjustment (BA) with LiDAR, camera, and an IMU to rapidly obtain accurate relative pose estimates. The BA process adopts sparse keypoint optimization and is divided into two stages: first, optimizing camera poses interpolated from LiDAR‑Inertial Odometry (LIO), and then estimating the relative camera poses between the UGV and UAV. Additionally, we implement an incremental loop closure detection algorithm using deep learning‑based descriptors to maintain and retrieve keyframes efficiently. Experimental results demonstrate that our method achieves outstanding performance in both accuracy and efficiency. Unlike traditional multi‑robot SLAM approaches that transmit images or point clouds, our method only transmits keypoint pixels and their descriptors, effectively constraining the communication bandwidth under 0.3 Mbps. Code and data will be publicly available at https://github.com/Ascbpiac/cross‑model‑relative‑localization.git.
Authors: Tao Liu, Kan Ren, Qian Chen
Abstract: With the rapid growth of the low‑altitude economy, unmanned aerial vehicles (UAVs) have become key platforms for measurement and tracking in intelligent patrol systems. However, in GNSS‑denied environments, localization schemes that rely solely on satellite signals are prone to failure. Cross‑view image retrieval‑based localization is a promising alternative, yet substantial geometric and appearance domain gaps exist between oblique UAV views and nadir satellite orthophotos. Moreover, conventional approaches often depend on complex network architectures, text prompts, or large amounts of annotation, which hinders generalization. To address these issues, we propose DiffusionUavLoc, a cross‑view localization framework that is image‑prompted, text‑free, diffusion‑centric, and employs a VAE for unified representation. We first use training‑free geometric rendering to synthesize pseudo‑satellite images from UAV imagery as structural prompts. We then design a text‑free conditional diffusion model that fuses multimodal structural cues to learn features robust to viewpoint changes. At inference, descriptors are computed at a fixed time step t and compared using cosine similarity. On University‑1652 and SUES‑200, the method performs competitively for cross‑view localization, especially for satellite‑to‑drone in University‑1652. Our data and code will be published at the following URL: https://github.com/liutao23/DiffusionUavLoc.git.
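The retrieval step described above, comparing descriptors by cosine similarity, can be sketched as follows. This is a minimal illustration on hand‑written toy vectors; the descriptor extraction from the diffusion model is not shown, and the function names `cosine_similarity` and `rank_gallery` are assumptions made for the example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query: Sequence[float],
                 gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy descriptors standing in for one UAV query and three satellite tiles.
query = [1.0, 0.0, 1.0]
gallery = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_gallery(query, gallery)
```

Because cosine similarity ignores vector magnitude, the second gallery item (a scaled copy of the query) ranks first even though its norm differs from the query's.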
Authors: Xin Zuo, Chenyu Qu, Haibo Zhan, Jifeng Shen, Wankou Yang
Abstract: Recent multispectral object detection methods have primarily focused on spatial‑domain feature fusion based on CNNs or Transformers, while the potential of frequency‑domain features remains underexplored. In this work, we propose a novel Spatial and Frequency Feature Reconstruction (SFFR) method, which leverages the spatial‑frequency feature representation mechanisms of the Kolmogorov‑Arnold Network (KAN) to reconstruct complementary representations in both spatial and frequency domains prior to feature fusion. The core components of SFFR are the proposed Frequency Component Exchange KAN (FCEKAN) module and Multi‑Scale Gaussian KAN (MSGKAN) module. The FCEKAN introduces an innovative selective frequency component exchange strategy that effectively enhances the complementarity and consistency of cross‑modal features based on the frequency features of RGB and IR images. The MSGKAN module demonstrates excellent nonlinear feature modeling capability in the spatial domain. By leveraging multi‑scale Gaussian basis functions, it effectively captures the feature variations caused by scale changes at different UAV flight altitudes, significantly enhancing the model's adaptability and robustness to scale variations. It is experimentally validated that our proposed FCEKAN and MSGKAN modules are complementary and can effectively capture the frequency and spatial semantic features respectively for better feature fusion. Extensive experiments on the SeaDroneSee, DroneVehicle and DVTOD datasets demonstrate the superior performance and significant advantages of the proposed method in UAV multispectral object perception tasks. Code will be available at https://github.com/qchenyu1027/SFFR.
Authors: Yong Huang, Ruihao Li, Mingyang Chen, Feiyang Zhao, Dalong Zhang, Wanqing Tu
Abstract: The open nature of wireless communications renders unmanned aerial vehicle (UAV) communications vulnerable to impersonation attacks, under which malicious UAVs can impersonate authorized ones with stolen digital certificates. Traditional fingerprint‑based UAV authentication approaches rely on a single modality of sensory data gathered from a single layer of the network model, resulting in unreliable authentication experiences, particularly when UAVs are mobile and in an open‑world environment. To transcend these limitations, this paper proposes SecureLink, a UAV authentication system that is among the first to employ cross‑layer information for enhancing the efficiency and reliability of UAV authentication. Instead of using single modalities, SecureLink fuses physical‑layer radio frequency (RF) fingerprints and application‑layer micro‑electromechanical system (MEMS) fingerprints into reliable UAV identifiers via multimodal fusion. SecureLink first aligns fingerprints from channel state information measurements and telemetry data, such as feedback readings of onboard accelerometers, gyroscopes, and barometers. Then, an attention‑based neural network is devised for in‑depth feature fusion. Next, the fused features are trained by a multi‑similarity loss and fed into a one‑class support vector machine for open‑world authentication. We extensively implement our SecureLink using three different types of UAVs and evaluate it in different environments. With only six additional data frames, SecureLink achieves a closed‑world accuracy of 98.61% and an open‑world accuracy of 97.54% with two impersonating UAVs, outperforming the existing approaches in authentication robustness and communication overheads. Finally, our datasets collected from these experiments are available on GitHub: https://github.com/PhyGroup/SecureLink_data.
Authors: Tao Liu, Kan Ren, Qian Chen
Abstract: With the rapid growth of the low‑altitude economy, UAVs have become crucial for measurement and tracking in patrol systems. However, in GNSS‑denied areas, satellite‑based localization methods are prone to failure. This paper presents a cross‑view UAV localization framework that performs map matching via object detection, aimed at effectively addressing cross‑temporal, cross‑view, heterogeneous aerial image matching. In typical pipelines, UAV visual localization is formulated as an image‑retrieval problem: features are extracted to build a localization map, and the pose of a query image is estimated by matching it to a reference database with known poses. Because publicly available UAV localization datasets are limited, many approaches recast localization as a classification task and rely on scene labels in these datasets to ensure accuracy. Other methods seek to reduce cross‑domain differences using polar‑coordinate reprojection, perspective transformations, or generative adversarial networks; however, they can suffer from misalignment, content loss, and limited realism. In contrast, we leverage modern object detection to accurately extract salient instances from UAV and satellite images, and integrate a graph neural network to reason about inter‑image and intra‑image node relationships. Using a fine‑grained, graph‑based node‑similarity metric, our method achieves strong retrieval and localization performance. Extensive experiments on public and real‑world datasets show that our approach handles heterogeneous appearance differences effectively and generalizes well, making it applicable to scenarios with larger modality gaps, such as infrared‑visible image matching. Our dataset will be publicly available at the following URL: https://github.com/liutao23/ODGNNLoc.git.
Authors: Zachary Ravichandran, Fernando Cladera, Ankit Prabhu, Jason Hughes, Varun Murali, Camillo Taylor, George J. Pappas, Vijay Kumar
Abstract: Heterogeneous robot teams operating in realistic settings often must accomplish complex missions requiring collaboration and adaptation to information acquired online. Because robot teams frequently operate in unstructured environments ‑‑ uncertain, open‑world settings without prior maps ‑‑ subtasks must be grounded in robot capabilities and the physical world. While heterogeneous teams have typically been designed for fixed specifications, generative intelligence opens the possibility of teams that can accomplish a wide range of missions described in natural language. However, current large language model (LLM)‑enabled teaming methods typically assume well‑structured and known environments, limiting deployment in unstructured environments. We present SPINE‑HT, a framework that addresses these limitations by grounding the reasoning abilities of LLMs in the context of a heterogeneous robot team through a three‑stage process. Given language specifications describing mission goals and team capabilities, an LLM generates grounded subtasks which are validated for feasibility. Subtasks are then assigned to robots based on capabilities such as traversability or perception and refined given feedback collected during online operation. In simulation experiments with closed‑loop perception and control, our framework achieves nearly twice the success rate compared to prior LLM‑enabled heterogeneous teaming approaches. In real‑world experiments with a Clearpath Jackal, a Clearpath Husky, a Boston Dynamics Spot, and a high‑altitude UAV, our method achieves an 87% success rate in missions requiring reasoning about robot capabilities and refining subtasks with online feedback. More information is provided at https://zacravichandran.github.io/SPINE‑HT.
Authors: Runxi Huang, Mingxuan Yu, Mingyu Tsoi, Xiaomin Ouyang
Abstract: Real‑time multimodal inference on resource‑constrained edge devices is essential for applications such as autonomous driving, human‑computer interaction, and mobile health. However, prior work often overlooks the tight coupling between sensing dynamics and model execution, as well as the complex inter‑modality dependencies. In this paper, we propose MMEdge, a new on‑device multimodal inference framework based on pipelined sensing and encoding. Instead of waiting for complete sensor inputs, MMEdge decomposes the entire inference process into a sequence of fine‑grained sensing and encoding units, allowing computation to proceed incrementally as data arrive. MMEdge also introduces a lightweight but effective temporal aggregation module that captures rich temporal dynamics across different pipelined units to maintain accuracy. Such a pipelined design also opens up opportunities for fine‑grained cross‑modal optimization and early decision‑making during inference. To further enhance system performance under resource variability and input data complexity, MMEdge incorporates an adaptive multimodal configuration optimizer that dynamically selects optimal sensing and model configurations for each modality under latency constraints, and a cross‑modal speculative skipping mechanism that bypasses future units of slower modalities when early predictions reach sufficient confidence. We evaluate MMEdge using two public multimodal datasets and deploy it on a real‑world unmanned aerial vehicle (UAV)‑based multimodal testbed. The results show that MMEdge significantly reduces end‑to‑end latency while maintaining high task accuracy across various system and data dynamics. A video demonstration of MMEdge's performance in the real world is available at https://youtu.be/qRew7sT‑iWw.
Authors: Xinyu Zhou, Tongxin Pan, Lingyi Hong, Pinxue Guo, Haijing Guo, Zhaoyu Chen, Kaixun Jiang, Wenqiang Zhang
Abstract: UAV tracking can be widely applied in scenarios such as disaster rescue, environmental monitoring, and logistics transportation. However, existing UAV tracking methods predominantly emphasize speed and lack exploration in semantic awareness, which hinders the search region from extracting accurate localization information from the template. The limitation results in suboptimal performance under typical UAV tracking challenges such as camera motion, fast motion, and low resolution. To address this issue, we propose a dynamic semantic aware correlation modeling tracking framework. The core of our framework is a Dynamic Semantic Relevance Generator, which, in combination with the correlation map from the Transformer, explores semantic relevance. The approach enhances the search region's ability to extract important information from the template, improving accuracy and robustness under the aforementioned challenges. Additionally, to enhance the tracking speed, we design a pruning method for the proposed framework. Therefore, we present multiple model variants that achieve trade‑offs between speed and accuracy, enabling flexible deployment according to the available computational resources. Experimental results validate the effectiveness of our method, achieving competitive performance on multiple UAV tracking datasets. The code is available at https://github.com/zxyyxzz/DSATrack.
Authors: Giovanni Bologni, Martin Bo Møller, Richard Heusdens, Richard C. Hendriks
Abstract: Conventional acoustic beamformers typically assume short‑time stationarity and process frequency bins independently, ignoring inter‑frequency correlations. This is suboptimal for almost‑periodic noise sources such as engines, fans, and musical instruments: these signals are better modeled as (almost) cyclostationary (ACS) processes with statistically correlated spectral components. This paper introduces the cyclic minimum power distortionless response (cMPDR) beamformer, which extends the conventional MPDR to jointly exploit spatial and spectral correlations. Building on frequency‑shifted (FRESH) filtering, it suppresses noise components that are coherent across harmonically related frequencies, reducing residual noise beyond what spatial filtering alone achieves. To address inharmonicity, where partials deviate from exact integer multiples of a fundamental frequency, we estimate resonant frequencies from a periodogram and derive frequency shifts from their pairwise spacing. Theoretical analysis yields closed‑form expressions for residual noise and proves that output power decreases monotonically with the number of cyclic components. Experiments on synthetic harmonic noise and real UAV motor recordings confirm these findings: in low‑SNR scenarios, the cMPDR achieves up to 5 dB improvement in SI‑SDR over the MPDR, yields consistent STOI gains, and remains effective with a single microphone. When spectral correlation is absent, the method reduces to conventional MPDR and does not degrade performance. These results suggest that cyclic processing is a viable direction for acoustic noise reduction that deserves further investigation. Code is available at https://github.com/Screeen/cMPDR.
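For reference, the conventional MPDR baseline mentioned above has a standard textbook form per frequency bin. The symbols below are assumptions introduced for this note and are not taken from the abstract: $\mathbf{R}_y$ denotes the covariance matrix of the microphone signals and $\mathbf{d}$ the steering vector toward the target. The cyclic extension itself is not specified in the abstract and is not reproduced here.

```latex
\begin{aligned}
\mathbf{w}_{\mathrm{MPDR}}
  &= \arg\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}_y\,\mathbf{w}
     \quad \text{subject to} \quad \mathbf{w}^{H}\mathbf{d} = 1, \\
\mathbf{w}_{\mathrm{MPDR}}
  &= \frac{\mathbf{R}_y^{-1}\mathbf{d}}{\mathbf{d}^{H}\mathbf{R}_y^{-1}\mathbf{d}} .
\end{aligned}
```

The constraint keeps the target undistorted while the objective minimizes total output power. Each bin is solved independently in this form, which is the limitation the abstract says the cMPDR addresses.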
Authors: Tianhao Li, Tingfa Xu, Ying Wang, Haolin Qin, Xu Lin, Jianan Li
Abstract: Drone‑based multi‑object tracking is essential yet highly challenging due to small targets, severe occlusions, and cluttered backgrounds. Existing RGB‑based tracking algorithms heavily depend on spatial appearance cues such as color and texture, which often degrade in aerial views, compromising reliability. Multispectral imagery, capturing pixel‑level spectral reflectance, provides crucial cues that enhance object discriminability under degraded spatial conditions. However, the lack of dedicated multispectral UAV datasets has hindered progress in this domain. To bridge this gap, we introduce MMOT, the first challenging benchmark for drone‑based multispectral multi‑object tracking. It features three key characteristics: (i) Large Scale ‑ 125 video sequences with over 488.8K annotations across eight categories; (ii) Comprehensive Challenges ‑ covering diverse conditions such as extreme small targets, high‑density scenarios, severe occlusions, and complex motion; and (iii) Precise Oriented Annotations ‑ enabling accurate localization and reduced ambiguity under aerial perspectives. To better extract spectral features and leverage oriented annotations, we further present a multispectral and orientation‑aware MOT scheme adapting existing methods, featuring: (i) a lightweight Spectral 3D‑Stem integrating spectral features while preserving compatibility with RGB pretraining; (ii) an orientation‑aware Kalman filter for precise state estimation; and (iii) an end‑to‑end orientation‑adaptive transformer. Extensive experiments across representative trackers consistently show that multispectral input markedly improves tracking performance over RGB baselines, particularly for small and densely packed objects. We believe our work will advance drone‑based multispectral multi‑object tracking research. Our MMOT, code, and benchmarks are publicly available at https://github.com/Annzstbl/MMOT.
Authors: Tien-Dat Nguyen, Thien-Minh Nguyen, Vinh-Hao Nguyen
Abstract: Onboard simultaneous localization and mapping (SLAM) methods are commonly used to provide accurate localization information for autonomous robots. However, the coordinate origin of SLAM estimate often resets for each run. On the other hand, UWB‑based localization with fixed anchors can ensure a consistent coordinate reference across sessions; however, it requires an accurate assignment of the anchor nodes' coordinates. To this end, we propose a two‑stage approach that calibrates and fuses UWB data and SLAM data to achieve coordinate‑wise consistent and accurate localization in the same environment. In the first stage, we solve a continuous‑time batch optimization problem by using the range and odometry data from one full run, incorporating height priors and anchor‑to‑anchor distance factors to recover the anchors' 3D positions. For the subsequent runs in the second stage, a sliding‑window optimization scheme fuses the UWB and SLAM data, which facilitates accurate localization in the same coordinate system. Experiments are carried out on the NTU VIRAL dataset with six scenarios of UAV flight, and we show that calibration using data in one run is sufficient to enable accurate localization in the remaining runs. We release our source code to benefit the community at https://github.com/ntdathp/slam‑uwb‑calibration.
Authors: Hongyang Zhang, Yinhao Liu, Zhenyu Kuang
Abstract: Cross‑view geo‑localization aims at establishing location correspondences between different viewpoints. Existing approaches typically learn cross‑view correlations through direct feature similarity matching, often overlooking semantic degradation caused by extreme viewpoint disparities. To address this unique problem, we focus on robust feature retrieval under viewpoint variation and propose the novel SkyLink method. We first utilize the Google Retrieval Enhancement Module to perform data enhancement on street images, which mitigates the occlusion of the key target due to restricted street viewpoints. The Patch‑Aware Feature Aggregation module is further adopted to emphasize multiple local feature aggregations to ensure consistent feature extraction across viewpoints. Meanwhile, we integrate the 3D scene information constructed from multi‑scale UAV images as a bridge between street and satellite viewpoints, and perform feature alignment through self‑supervised and cross‑view contrastive learning. Experimental results demonstrate robustness and generalization across diverse urban scenarios, achieving 25.75% Recall@1 accuracy on University‑1652 in the UAVM2025 Challenge. Code will be released at https://github.com/HRT00/CVGL‑3D.
Authors: Ben Liang, Hongguang Wei, Yuan Liu, Bingwen Qiu, Yihong Wang, Xiubao Sui, Qian Chen
Abstract: Remote sensing object detection is a critical technology for real‑world applications such as natural resource monitoring, traffic management, and UAV‑based rescue. Detecting tiny objects in high‑resolution aerial imagery remains challenging due to weak visual cues and insufficient global context modeling in complex scenes. Existing methods often suffer from delayed contextual interaction and limited nonlinear reasoning, which restrict their ability to effectively refine shallow representations and ultimately lead to suboptimal performance. To address these challenges, we propose FMC‑DETR, a frequency‑decoupled fusion framework for aerial‑view object detection. First, we propose the Wavelet Kolmogorov‑Arnold Transformer (WeKat) backbone, which employs cascaded wavelet transforms to enhance global low‑frequency structure perception in shallow features while preserving fine‑grained details, and further leverages Kolmogorov‑Arnold networks for adaptive nonlinear modeling of multi‑scale dependencies. Second, we introduce the Multi‑Domain Feature Coordination (MDFC) module, which refines cross‑scale fused representations through partial‑channel spatial, spectral, and structural coordination, thereby strengthening small‑object‑related feature responses in cluttered scenes. Finally, we design the Compact Partial Fusion (CPF) module, which performs compact multi‑branch aggregation with progressive partial refinement to improve feature diversity and multi‑scale interaction while preserving stable information flow and reducing redundant perturbation. Extensive experiments across multiple remote sensing benchmarks demonstrate that FMC‑DETR achieves state‑of‑the‑art performance and significantly outperforms the baseline detector. Code is available at https://github.com/bloomingvision/FMC‑DETR.
Authors: Babak Salamat, Dominik Mattern, Sebastian-Sven Olzem, Gerhard Elsbacher, Christian Seidel, Andrea M. Tonello
Abstract: We propose $\text{GMP}^3$, a multiphase global path planning framework that generates dynamically feasible three‑dimensional trajectories for unmanned aerial vehicles (UAVs) operating in cluttered environments. The framework extends traditional path planning from Euclidean position spaces to the Lie group $\mathrm{SE}(3)$, allowing joint learning of translational motion and rotational dynamics. A modified Bellman‑based operator is introduced to support reinforcement learning (RL) policy updates while leveraging prior trajectory information for improved convergence. $\text{GMP}^3$ is designed as a distributed framework in which agents influence each other and share policy information along the trajectory: each agent refines its assigned segment and shares with its neighbors via a consensus‑based scheme, enabling cooperative policy updates and convergence toward a path shaped globally even under kinematic constraints. We also propose DroneManager, a modular ground control software that interfaces the planner with real UAV platforms via the MAVLink protocol, supporting real‑time deployment and feedback. Simulation studies and indoor flight experiments validate the effectiveness of the proposed method in constrained 3D environments, demonstrating reliable obstacle avoidance and smooth, feasible trajectories across both position and orientation. The open‑source implementation is available at https://github.com/Domattee/DroneManager
Authors: Oussema Dhaouadi, Riccardo Marin, Johannes Meier, Jacques Kaiser, Daniel Cremers
Abstract: Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large‑area inspection, and search‑and‑rescue operations. In many scenarios, these systems require high‑precision localization while operating with limited resources (e.g., no internet connection or GNSS/GPS support), making large image databases or heavy 3D models impractical. Surprisingly, little attention has been given to leveraging orthographic geodata as an alternative paradigm, which is lightweight and increasingly available through free releases by governmental authorities (e.g., the European Union). To fill this gap, we propose OrthoLoC, the first large‑scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities. The dataset addresses domain shifts between UAV imagery and geospatial data. Its paired structure enables fair benchmarking of existing solutions by decoupling image retrieval from feature matching, allowing isolated evaluation of localization and calibration performance. Through comprehensive evaluation, we examine the impact of domain shifts, data resolutions, and covisibility on localization accuracy. Finally, we introduce a refinement technique called AdHoP, which can be integrated with any feature matcher, improving matching by up to 95% and reducing translation error by up to 63%. The dataset and code are available at: https://deepscenario.github.io/OrthoLoC.
Authors: Jiayu Yuan, Ming Dai, Enhui Zheng, Chao Su, Nanxing Chen, Qiming Hu, Shibo Zhu, Yibin Cao
Abstract: Vision‑based Unmanned Aerial Vehicle (UAV) localization systems have been extensively investigated for Global Navigation Satellite System (GNSS)‑denied environments. However, existing retrieval‑based approaches face limitations in dataset availability and persistent challenges including suboptimal real‑time performance, environmental sensitivity, and limited generalization capability, particularly in dynamic or temporally varying environments. To overcome these limitations, we present a large‑scale Multi‑Altitude Flight Segments dataset (MAFS) for variable altitude scenarios and propose a novel Semantic‑Weighted Adaptive Particle Filter (SWA‑PF) method. This approach integrates robust semantic features from both UAV‑captured images and satellite imagery through two key innovations: a semantic weighting mechanism and an optimized particle filtering architecture. Evaluated using our dataset, the proposed method achieves a 10x computational efficiency gain over feature extraction methods, maintains global positioning errors below 10 meters, and enables rapid 4‑degree‑of‑freedom (4‑DoF) pose estimation within seconds using accessible low‑resolution satellite maps. Code and dataset will be available at https://github.com/YuanJiayuuu/SWA‑PF.
Authors: Hojat Ardi, Amir Jahanshahi, Ali Diba
Abstract: Aerial object tracking remains a challenging task due to scale variations, dynamic backgrounds, clutter, and frequent occlusions. While most existing trackers emphasize spatial cues, they often overlook temporal dependencies, resulting in limited robustness in long‑term tracking and under occlusion. Furthermore, correlation‑based Siamese trackers are inherently constrained by the linear nature of correlation operations, making them ineffective against complex, non‑linear appearance changes. To address these limitations, we introduce T‑SiamTPN, a temporal‑aware Siamese tracking framework that extends the SiamTPN architecture with explicit temporal modeling. Our approach incorporates temporal feature fusion and attention‑based interactions, strengthening temporal consistency and enabling richer feature representations. These enhancements yield significant improvements over the baseline and achieve performance competitive with state‑of‑the‑art trackers. Crucially, despite the added temporal modules, T‑SiamTPN preserves computational efficiency. Deployed on the resource‑constrained Jetson Nano, the tracker runs in real time at 7.1 FPS, demonstrating its suitability for real‑world embedded applications without notable runtime overhead. Experimental results highlight substantial gains: compared to the baseline, T‑SiamTPN improves success rate by 13.7% and precision by 14.7%. These findings underscore the importance of temporal modeling in Siamese tracking frameworks and establish T‑SiamTPN as a strong and efficient solution for aerial object tracking. Code is available at: https://github.com/to/be/released
Authors: Yechen Zhang, Bin Gao, Gang Wang, Jian Sun, Zhuo Li
Abstract: Reinforcement learning (RL) has shown promise in a large number of robotic control tasks. Nevertheless, its deployment on unmanned aerial vehicles (UAVs) remains challenging, mainly because of reliance on accurate dynamic models and platform‑specific sensing, which hinders cross‑platform transfer. This paper presents the CORB‑Planner (Corridor‑as‑Observations for RL B‑spline planner), a real‑time, RL‑based trajectory planning framework for high‑speed autonomous UAV flight across heterogeneous platforms. The key idea is to combine B‑spline trajectory generation, in which the RL policy produces successive control points, with a compact safe flight corridor (SFC) representation obtained via heuristic search. The SFC abstracts obstacle information in a low‑dimensional form, mitigating overfitting to platform‑specific details and reducing sensitivity to model inaccuracies. To narrow the sim‑to‑real gap, we adopt an easy‑to‑hard progressive training pipeline in simulation. A value‑based soft decomposed‑critic Q (SDCQ) algorithm is used to learn effective policies within approximately ten minutes of training. Benchmarks in simulation and real‑world tests demonstrate real‑time planning on lightweight onboard hardware and support maximum flight speeds of up to 8.2 m/s in dense, cluttered environments without external positioning. Compatibility with various UAV configurations (quadrotors, hexarotors) and modest onboard compute underlines the generality and robustness of CORB‑Planner for practical deployment.
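Illustrative note: the abstract describes a policy that emits successive B‑spline control points. The abstract does not state the spline degree or knot layout, so the sketch below assumes a uniform cubic B‑spline; the function names `cubic_bspline_point` and `sample_trajectory` are hypothetical and this is not the authors' implementation. It shows the local‑support property that makes incremental control‑point generation convenient: appending a control point adds one new segment and leaves earlier segments unchanged.

```python
def cubic_bspline_point(p0, p1, p2, p3, u):
    """Evaluate one uniform cubic B-spline segment at u in [0, 1]."""
    # Uniform cubic B-spline basis functions; they sum to 1 for every u.
    b0 = (1 - u) ** 3 / 6
    b1 = (3 * u**3 - 6 * u**2 + 4) / 6
    b2 = (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6
    b3 = u**3 / 6
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_trajectory(control_points, samples_per_segment=10):
    """Sample every segment defined by a sliding window of 4 control points."""
    points = []
    for i in range(len(control_points) - 3):
        window = control_points[i:i + 4]
        for k in range(samples_per_segment):
            points.append(cubic_bspline_point(*window, k / samples_per_segment))
    return points


# Hypothetical 2-D control points, as a policy might emit one at a time.
control_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
before = sample_trajectory(control_points)
after = sample_trajectory(control_points + [(4.0, 0.0)])

# Local support: the earlier samples are untouched by the new control point.
print(after[:len(before)] == before)
```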
Authors: Yue Zhou, Litong Feng, Mengcheng Lan, Xue Yang, Qingyun Li, Yiping Ke, Xue Jiang, Wayne Zhang
Abstract: Mathematical reasoning is critical for tasks such as precise distance and area computations, trajectory estimations, and spatial analysis in unmanned aerial vehicle (UAV) based remote sensing, yet current vision‑language models (VLMs) have not been adequately tested in this domain. To address this gap, we introduce AVI‑Math, the first benchmark to rigorously evaluate multimodal mathematical reasoning in aerial vehicle imagery, moving beyond simple counting tasks to include domain‑specific knowledge in areas such as geometry, logic, and algebra. The dataset comprises 3,773 high‑quality vehicle‑related questions captured from UAV views, covering 6 mathematical subjects and 20 topics. The data, collected at varying altitudes and from multiple UAV angles, reflects real‑world UAV scenarios, ensuring the diversity and complexity of the constructed mathematical problems. In this paper, we benchmark 14 prominent VLMs through a comprehensive evaluation and demonstrate that, despite their success on previous multimodal benchmarks, these models struggle with the reasoning tasks in AVI‑Math. Our detailed analysis highlights significant limitations in the mathematical reasoning capabilities of current VLMs and suggests avenues for future research. Furthermore, we explore the use of Chain‑of‑Thought prompting and fine‑tuning techniques, which show promise in addressing the reasoning challenges in AVI‑Math. Our findings not only expose the limitations of VLMs in mathematical reasoning but also offer valuable insights for advancing UAV‑based trustworthy VLMs in real‑world applications. The code and datasets will be released at https://github.com/VisionXLab/avi‑math
Authors: Jianping Li, Xinhang Xu, Zhongyuan Liu, Shenghai Yuan, Muqing Cao, Lihua Xie
Abstract: LiDAR‑based 3D perception and localization on unmanned aerial vehicles (UAVs) are fundamentally limited by the narrow field of view (FoV) of compact LiDAR sensors and the payload constraints that preclude multi‑sensor configurations. Traditional motorized scanning systems with fixed‑speed rotations lack scene awareness and task‑level adaptability, leading to degraded odometry and mapping performance in complex, occluded environments. Inspired by the active sensing behavior of owls, we propose AEOS (Active Environment‑aware Optimal Scanning), a biologically inspired and computationally efficient framework for adaptive LiDAR control in UAV‑based LiDAR‑Inertial Odometry (LIO). AEOS combines model predictive control (MPC) and reinforcement learning (RL) in a hybrid architecture: an analytical uncertainty model predicts future pose observability for exploitation, while a lightweight neural network learns an implicit cost map from panoramic depth representations to guide exploration. To support scalable training and generalization, we develop a point cloud‑based simulation environment with real‑world LiDAR maps across diverse scenes, enabling sim‑to‑real transfer. Extensive experiments in both simulation and real‑world environments demonstrate that AEOS significantly improves odometry accuracy compared to fixed‑rate, optimization‑only, and fully learned baselines, while maintaining real‑time performance under onboard computational constraints. The project page can be found at https://kafeiyin00.github.io/AEOS/.
Authors: Wei Lu, Lingyu Zhu, Si-Bao Chen
Abstract: Low‑light conditions significantly degrade Unmanned Aerial Vehicle (UAV) performance in critical applications. Existing Low‑light Image Enhancement (LIE) methods struggle with the unique challenges of aerial imagery, including Ultra‑High Resolution (UHR), lack of paired data, severe non‑uniform illumination, and deployment constraints. To address these issues, we propose three key contributions. First, we present U3D, the first unsupervised UHR UAV dataset for LIE, with a unified evaluation toolkit. Second, we introduce the Edge Efficiency Index (EEI), a novel metric balancing perceptual quality with key deployment factors: speed, resolution, model complexity, and memory footprint. Third, we develop U3LIE, an efficient framework with two training‑only designs: Adaptive Pre‑enhancement Augmentation (APA) for input normalization and a Luminance Interval Loss (L_int) for exposure control. U3LIE achieves SOTA results, processing 4K images at 23.8 FPS on a single GPU, making it ideal for real‑time on‑board deployment. In summary, these contributions provide a holistic solution (dataset, metric, and method) for advancing robust 24/7 UAV vision. The code and datasets are available at https://github.com/lwCVer/U3D_Toolkit.
Authors: Yong Su, Yiyi Chen, Shenghong Yi, Hui Feng, Yuedong Xu, Wang Xiang, Bo Hu
Abstract: Cellular‑connected UAV systems have enabled a wide range of low‑altitude aerial services. However, these systems still face many challenges, such as frequent handovers and the inefficiency of traditional transport protocols. To better study these issues, we develop a modular and scalable simulation platform specifically designed for UAV communication, leveraging MATLAB's wireless communication research ecosystem. The platform supports flexible 5G NR node deployment, customizable UAV mobility models, and multi‑network‑interface extensions. It also supports multiple transport protocols, including TCP, UDP, and QUIC, among others, making it possible to investigate how different transport protocols affect UAV communication performance. In addition, the platform includes a handover management module, enabling the evaluation of both traditional and learning‑based handover strategies. Our platform can serve as a testbed for the development and evaluation of advanced transmission strategies in cellular‑connected UAV systems.
Authors: Tongtong Feng, Xin Wang, Feilin Han, Leping Zhang, Wenwu Zhu
Abstract: Swarm UAV autonomous flight for Embodied Long‑Horizon (ELH) tasks is crucial for advancing the low‑altitude economy. However, existing methods focus only on specific basic tasks due to dataset limitations, failing in real‑world deployment for ELH tasks. ELH tasks are not mere concatenations of basic tasks, requiring handling long‑term dependencies, maintaining embodied persistent states, and adapting to dynamic goal shifts. This paper presents U2UData+, the first large‑scale swarm UAV autonomous flight dataset for ELH tasks and the first scalable swarm UAV data online collection and algorithm closed‑loop verification platform. The dataset is captured by 15 UAVs in autonomous collaborative flights for ELH tasks, comprising 12 scenes, 720 traces, 120 hours, 600 seconds per trajectory, 4.32M LiDAR frames, and 12.96M RGB frames. This dataset also includes brightness, temperature, humidity, smoke, and airflow values covering all flight routes. The platform supports the customization of simulators, UAVs, sensors, flight algorithms, formation modes, and ELH tasks. Through a visual control window, this platform allows users to collect customized datasets through one‑click deployment online and to verify algorithms by closed‑loop simulation. U2UData+ also introduces an ELH task for wildlife conservation and provides comprehensive benchmarks with 9 SOTA models. U2UData+ can be found at https://fengtt42.github.io/U2UData‑2/.
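Illustrative note: the dataset figures quoted in the abstract above can be cross‑checked with simple arithmetic. The sketch below uses only the numbers stated in the abstract; the per‑second frame rates it prints are derived values (an assumption that frames are spread evenly over the recorded time), not figures reported by the authors.

```python
# Figures quoted in the abstract.
traces = 720
seconds_per_trace = 600
lidar_frames = 4_320_000
rgb_frames = 12_960_000

# Total recording time implied by trace count and trace length.
total_seconds = traces * seconds_per_trace
total_hours = total_seconds / 3600

# Implied average frame rates (derived here, not stated in the abstract).
lidar_hz = lidar_frames / total_seconds
rgb_hz = rgb_frames / total_seconds

print(total_hours, lidar_hz, rgb_hz)  # 120.0 10.0 30.0
```

The 120 hours matches the stated total, so the trace count, trace length, and duration are mutually consistent.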
Authors: Tao Huang, Hongbo Pan, Nanxi Zhou, Siyuan Zou, Shun Zhou
Abstract: Sub‑pixel matching of multimodal optical images is a critical step in the combined application of multiple sensors. However, structural noise and inconsistencies arising from variations in multimodal image responses usually limit the accuracy of matching. Phase congruency mutual‑structure weighted least absolute deviation (PCWLAD) is developed as a coarse‑to‑fine framework. In the coarse matching stage, we preserve the complete structure and use an enhanced cross‑modal similarity criterion to mitigate structural information loss by PC noise filtering. In the fine matching stage, a method based on mutual‑structure filtering and weighted least absolute deviation is introduced to enhance inter‑modal structural consistency and accurately estimate sub‑pixel displacements adaptively. Experiments on three multimodal datasets (Landsat visible‑infrared, short‑range visible‑near‑infrared, and UAV optical image pairs) demonstrate that PCWLAD consistently outperforms eight state‑of‑the‑art methods, achieving an average matching accuracy of approximately 0.4 pixels. The software and datasets are publicly available at https://github.com/huangtaocsu/PCWLAD.
Authors: Baorun Li, Chengrui Zhu, Siyi Du, Bingran Chen, Jie Ren, Wenfei Wang, Yong Liu, Jiajun Lv
Abstract: Extrinsic calibration is essential for multi‑sensor fusion, but existing methods rely on structured targets or fully‑excited data, limiting real‑world applicability. Online calibration further suffers from weak excitation, leading to unreliable estimates. To address these limitations, we propose a reinforcement learning (RL)‑based extrinsic calibration framework that formulates extrinsic calibration as a decision‑making problem and directly optimizes SE(3) extrinsics to enhance odometry accuracy. Our approach leverages a probabilistic Bingham distribution to model 3D rotations, ensuring stable optimization while inherently retaining quaternion symmetry. A trajectory alignment reward mechanism enables robust calibration without structured targets by quantitatively evaluating the estimated tightly‑coupled trajectory against a reference trajectory. Additionally, an automated data selection module filters uninformative samples, significantly improving efficiency and scalability for large‑scale datasets. Extensive experiments on UAVs, UGVs, and handheld platforms demonstrate that our method outperforms traditional optimization‑based approaches, achieving high‑precision calibration even under weak excitation conditions. Our framework simplifies deployment on diverse robotic platforms by eliminating the need for high‑quality initial extrinsics and enabling calibration from routine operating data. The code is available at https://github.com/APRIL‑ZJU/learn‑to‑calibrate.
Authors: Sijie Wang, Siqi Li, Yawei Zhang, Shangshu Yu, Shenghai Yuan, Rui She, Quanjiang Guo, JinXuan Zheng, Ong Kang Howe, Leonrich Chandra, Shrivarshann Srijeyan, Aditya Sivadas, Toshan Aggarwal, Heyuan Liu, Hongming Zhang, Chujie Chen, Junyu Jiang, Lihua Xie, Wee Peng Tay
Abstract: Multi‑modal perception is essential for unmanned aerial vehicle (UAV) operations, as it enables a comprehensive understanding of the UAVs' surrounding environment. However, most existing multi‑modal UAV datasets are primarily biased toward localization and 3D reconstruction tasks, or only support map‑level semantic segmentation due to the lack of frame‑wise annotations for both camera images and LiDAR point clouds. This limitation prevents them from being used for high‑level scene understanding tasks. To address this gap and advance multi‑modal UAV perception, we introduce UAVScenes, a large‑scale dataset designed to benchmark various tasks across both 2D and 3D modalities. Our benchmark dataset is built upon the well‑calibrated multi‑modal UAV dataset MARS‑LVIG, originally developed only for simultaneous localization and mapping (SLAM). We enhance this dataset by providing manually labeled semantic annotations for both frame‑wise images and LiDAR point clouds, along with accurate 6‑degree‑of‑freedom (6‑DoF) poses. These additions enable a wide range of UAV perception tasks, including segmentation, depth estimation, 6‑DoF localization, place recognition, and novel view synthesis (NVS). Our dataset is available at https://github.com/sijieaaa/UAVScenes
Authors: Van Chung Nguyen, Pratik Walunj, Chuong Le, An Duy Nguyen, Hung Manh La
Abstract: Nonlinear Model Predictive Control (NMPC) is a powerful approach for controlling highly dynamic robotic systems, as it accounts for system dynamics and optimizes control inputs at each step. However, its high computational complexity makes implementation on resource‑constrained microcontrollers impractical. While recent studies have demonstrated the feasibility of Model Predictive Control (MPC) with linearized dynamics on microcontrollers, applying full NMPC remains a significant challenge. This work presents an efficient solution for generating and deploying NMPC on microcontrollers (NMPCM) to control quadrotor UAVs. The proposed method optimizes computational efficiency while maintaining high control accuracy. Simulations in Gazebo/ROS and real‑world experiments validate the effectiveness of the approach, demonstrating its capability to achieve high‑frequency NMPC execution in real‑time systems. The code is available at: https://github.com/aralab‑unr/NMPCM.
Authors: Christopher Indris, Raiyan Rahman, Goetz Bramesfeld, Guanghui Wang
Abstract: Aerial wildlife tracking is critical for conservation efforts and relies on detecting small objects on the ground below the aircraft. It presents technical challenges: crewed aircraft are expensive, risky and disruptive; autonomous drones have limited computational capacity for onboard AI systems. Since the objects of interest may appear only a few pixels wide, small object detection is an inherently challenging computer vision subfield compounded by computational efficiency needs. This paper applies a patching augmentation to datasets to study model performance under various settings. A comparative study of three common yet architecturally diverse object detectors is conducted using the data, varying the patching method's hyperparameters against detection accuracy. Each model achieved at least 93% mAP@IoU=0.5 on at least one patching configuration. Statistical analyses provide an in‑depth commentary on the effects of various factors. Analysis also shows that faster, simpler models are about as effective as models that require more computational power for this task and perform well given limited patch scales, encouraging UAV deployment. Datasets and models will be made available via https://github.com/chrisindris/Moose.
Authors: Kangcheng Bin, Chen Chen, Ting Hu, Jiahao Qi, Ping Zhong
Abstract: Multimodal fusion has become a key enabler for UAV‑based object detection, as each modality provides complementary cues for robust feature extraction. However, due to significant differences in resolution, field of view, and sensing characteristics across modalities, accurate registration is a prerequisite before fusion. Despite its importance, there is currently no publicly available benchmark specifically designed for multimodal registration in UAV‑based aerial scenarios, which severely limits the development and evaluation of advanced registration methods under real‑world conditions. To bridge this gap, we present ATR‑UMMIM, the first benchmark dataset specifically tailored for multimodal image registration in UAV‑based applications. This dataset includes 7,969 triplets of raw visible, infrared, and precisely registered visible images, captured across diverse scenarios including flight altitudes from 80 m to 300 m, camera angles from 0° to 75°, and all‑day, all‑year temporal variations under rich weather and illumination conditions. To ensure high registration quality, we design a semi‑automated annotation pipeline to introduce reliable pixel‑level ground truth to each triplet. In addition, each triplet is annotated with six imaging condition attributes, enabling benchmarking of registration robustness under real‑world deployment settings. To further support downstream tasks, we provide object‑level annotations on all registered images, covering 11 object categories with 77,753 visible and 78,409 infrared bounding boxes. We believe ATR‑UMMIM will serve as a foundational benchmark for advancing multimodal registration, fusion, and perception in real‑world UAV scenarios. The dataset can be downloaded from https://github.com/supercpy/ATR‑UMMIM
Authors: Huy Nguyen, Kien Nguyen, Akila Pemasiri, Akmal Jahan, Clinton Fookes, Sridha Sridharan
Abstract: Person re‑identification (Re‑ID) across visible and infrared modalities is crucial for 24‑hour surveillance systems, but existing datasets primarily focus on ground‑level perspectives. While ground‑based IR systems offer nighttime capabilities, they suffer from occlusions, limited coverage, and vulnerability to obstructions, problems that aerial perspectives uniquely solve. To address these limitations, we introduce AG‑VPReID.VIR, the first aerial‑ground cross‑modality video‑based person Re‑ID dataset. This dataset captures 1,837 identities across 4,861 tracklets (124,855 frames) using both UAV‑mounted and fixed CCTV cameras in RGB and infrared modalities. AG‑VPReID.VIR presents unique challenges including cross‑viewpoint variations, modality discrepancies, and temporal dynamics. Additionally, we propose TCC‑VPReID, a novel three‑stream architecture designed to address the joint challenges of cross‑platform and cross‑modality person Re‑ID. Our approach bridges the domain gaps between aerial‑ground perspectives and RGB‑IR modalities through style‑robust feature learning, memory‑based cross‑view adaptation, and intermediary‑guided temporal modeling. Experiments show that AG‑VPReID.VIR presents distinctive challenges compared to existing datasets, with our TCC‑VPReID framework achieving significant performance gains across multiple evaluation protocols. Dataset and code are available at https://github.com/agvpreid25/AG‑VPReID.VIR.
Authors: Kostas Karakontis, Thanos Petsanis, Athanasios Ch. Kapoutsis, Pavlos Ch. Kapoutsis, Elias B. Kosmatopoulos
Abstract: Multi‑UAV Coverage Path Planning (mCPP) algorithms in popular commercial software typically treat a Region of Interest (RoI) only as a 2D plane, ignoring important 3D structure characteristics. This leads to incomplete 3D reconstructions, especially around occluded or vertical surfaces. In this paper, we propose a modular algorithm that can extend commercial two‑dimensional path planners to facilitate terrain‑aware planning by adjusting altitude and camera orientations. To demonstrate it, we extend the well‑known DARP (Divide Areas for Optimal Multi‑Robot Coverage Path Planning) algorithm and produce DARP‑3D. We present simulation results in multiple 3D environments and a real‑world flight test using DJI hardware. Compared to the baseline, our approach consistently captures improved 3D reconstructions, particularly in areas with significant vertical features. An open‑source implementation of the algorithm is available here: https://github.com/konskara/TerraPlan
Authors: Peiqi Chen, Lei Yu, Yi Wan, Yingying Pei, Xinyi Liu, Yongxiang Yao, Yingying Zhang, Lixiang Ru, Liheng Zhong, Jingdong Chen, Ming Yang, Yongjun Zhang
Abstract: Semi‑dense feature matching methods have shown strong performance in challenging scenarios. However, the existing pipeline relies on a global search across the entire feature map to establish coarse matches, limiting further improvements in accuracy and efficiency. Motivated by this limitation, we propose a novel pipeline, CasP, which leverages cascaded correspondence priors for guidance. Specifically, the matching stage is decomposed into two progressive phases, bridged by a region‑based selective cross‑attention mechanism designed to enhance feature discriminability. In the second phase, one‑to‑one matches are determined by restricting the search range to the one‑to‑many prior areas identified in the first phase. Additionally, this pipeline benefits from incorporating high‑level features, which helps reduce the computational costs of low‑level feature extraction. The acceleration gains of CasP increase with higher resolution, and our lite model achieves a speedup of ~2.2× at a resolution of 1152 compared to the most efficient method, ELoFTR. Furthermore, extensive experiments demonstrate its superiority in geometric estimation, particularly with impressive cross‑domain generalization. These advantages highlight its potential for latency‑sensitive and high‑robustness applications, such as SLAM and UAV systems. Code is available at https://github.com/pq‑chen/CasP.
Authors: Yang Zhou, Junjie Li, CongYang Ou, Dawei Yan, Haokui Zhang, Xizhe Xue
Abstract: Due to its extensive applications, aerial image object detection has long been a hot topic in computer vision. In recent years, advancements in Unmanned Aerial Vehicle (UAV) technology have further propelled this field to new heights, giving rise to a broader range of application requirements. However, traditional UAV aerial object detection methods primarily focus on detecting predefined categories, which significantly limits their applicability. The advent of cross‑modal text‑image alignment (e.g., CLIP) has overcome this limitation, enabling open‑vocabulary object detection (OVOD), which can identify previously unseen objects through natural language descriptions. This breakthrough significantly enhances the intelligence and autonomy of UAVs in aerial scene understanding. This paper presents a comprehensive survey of OVOD in the context of UAV aerial scenes. We begin by aligning the core principles of OVOD with the unique characteristics of UAV vision, setting the stage for a specialized discussion. Building on this foundation, we construct a systematic taxonomy that categorizes existing OVOD methods for aerial imagery and provides a comprehensive overview of the relevant datasets. This structured review enables us to critically dissect the key challenges and open problems at the intersection of these fields. Finally, based on this analysis, we outline promising future research directions and application prospects. This survey aims to provide a clear road map and a valuable reference for both newcomers and seasoned researchers, fostering innovation in this rapidly evolving domain. We keep tracking related work at https://github.com/zhouyang2002/OVOD‑in‑UVA‑imagery
Authors: Peijun Wang, Jinhua Zhao
Abstract: Small object detection remains a challenging problem in the field of object detection. To address this challenge, we propose an enhanced YOLOv8‑based model, SOD‑YOLO. This model integrates an ASF mechanism in the neck to enhance multi‑scale feature fusion, adds a Small Object Detection Layer (named P2) to provide higher‑resolution feature maps for better small object detection, and employs Soft‑NMS to refine confidence scores and retain true positives. Experimental results demonstrate that SOD‑YOLO significantly improves detection performance, achieving a 36.1% increase in mAP_50:95 and 20.6% increase in mAP_50 on the VisDrone2019‑DET dataset compared to the baseline model. These enhancements make SOD‑YOLO a practical and efficient solution for small object detection in UAV imagery. Our source code, hyper‑parameters, and model weights are available at https://github.com/iamwangxiaobai/SOD‑YOLO.
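Illustrative note: the abstract above says Soft‑NMS is used to refine confidence scores and retain true positives. The sketch below shows the general idea with the Gaussian decay variant; the abstract does not say which variant or which parameter values SOD‑YOLO uses, so `sigma`, `score_thresh`, and the function names are assumptions, and this is not the authors' code. Instead of deleting boxes that overlap a higher‑scoring box, their scores are decayed in proportion to the overlap.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS; returns (index, rescored) pairs in selection order."""
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the highest-scoring remaining box.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))
        # Decay the others by their overlap with it; drop near-zero scores.
        decayed = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:
                decayed.append((idx, score))
        remaining = decayed
    return kept


# Hypothetical detections: boxes 0 and 1 overlap heavily, box 2 is isolated.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
print(result)
```

In this example the overlapping box 1 is kept with a reduced score rather than suppressed outright, which is the behaviour that can help in crowded scenes with many small objects.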
Authors: Xiang Yu, Xinyao Liu, Guang Liang
Abstract: Tracking small, agile multi‑objects (SMOT), such as birds, from an Unmanned Aerial Vehicle (UAV) perspective is a highly challenging computer vision task. The difficulty stems from three main sources: the extreme scarcity of target appearance features, the complex motion entanglement caused by the combined dynamics of the camera and the targets themselves, and the frequent occlusions and identity ambiguity arising from dense flocking behavior. This paper details our championship‑winning solution in the MVA 2025 "Finding Birds" Small Multi‑Object Tracking Challenge (SMOT4SB), which adopts the tracking‑by‑detection paradigm with targeted innovations at both the detection and association levels. On the detection side, we propose a systematic training enhancement framework named SliceTrain. This framework, through the synergy of 'deterministic full‑coverage slicing' and 'slice‑level stochastic augmentation', effectively addresses the problem of insufficient learning for small objects in high‑resolution image training. On the tracking side, we designed a robust tracker that is completely independent of appearance information. By integrating a motion direction maintenance (EMA) mechanism and an adaptive similarity metric combining bounding box expansion and distance penalty into the OC‑SORT framework, our tracker can stably handle irregular motion and maintain target identities. Our method achieves state‑of‑the‑art performance on the SMOT4SB public test set, reaching an SO‑HOTA score of 55.205, which validates the effectiveness of our framework in solving complex real‑world SMOT problems. The source code will be made available at https://github.com/Salvatore‑Love/YOLOv8‑SMOT.
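Illustrative note: one plausible reading of "deterministic full‑coverage slicing" is a fixed tiling in which every pixel of a high‑resolution frame falls inside at least one training tile. The sketch below implements that reading only; the tile size, overlap, and the function names `slice_origins` and `slice_grid` are hypothetical, and the abstract does not describe SliceTrain at this level of detail.

```python
def slice_origins(length, tile, overlap):
    """Start offsets along one axis so that tiles cover [0, length)."""
    assert 0 <= overlap < tile, "overlap must be smaller than the tile size"
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    # Final tile sits flush with the far border so no pixels are missed.
    starts.append(length - tile)
    return starts


def slice_grid(width, height, tile=640, overlap=64):
    """Tile boxes (x1, y1, x2, y2), clamped to the image bounds."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in slice_origins(height, tile, overlap)
        for x in slice_origins(width, tile, overlap)
    ]


# Hypothetical 1920x1080 frame.
tiles = slice_grid(1920, 1080)
print(len(tiles), tiles[0], tiles[-1])
```

Because the grid depends only on the image size, the same tiles are produced on every epoch; any randomness would then come from augmentation applied per tile.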
Authors: Kongwu Huang, Shiyi Mu, Jun Jiang, Yuan Gao, Shugong Xu
Abstract: Scaling laws have achieved success in LLMs and foundation models. To explore their potential in ISAC research, we propose Great‑X. This single‑engine multimodal data twin platform reconstructs the ray‑tracing computation of Sionna within Unreal Engine and is deeply integrated with autonomous driving tools. This enables efficient and synchronized simulation of multimodal data, including CSI, RGB, Radar, and LiDAR. Based on this platform, we construct an open‑source, large‑scale, low‑altitude UAV multimodal synaesthesia dataset named Great‑MSD, and propose a baseline CSI‑based UAV 3D localization algorithm, demonstrating its feasibility and generalizability across different CSI simulation engines. The related code and dataset will be made available at: https://github.com/hkw‑xg/Great‑MCD.
Authors: Antonella Barisic Kulas, Andreja Jurasovic, Stjepan Bogdan
Abstract: Thermal imaging from unmanned aerial vehicles (UAVs) holds significant potential for applications in search and rescue, wildlife monitoring, and emergency response, especially under low‑light or obscured conditions. However, the scarcity of large‑scale, diverse thermal aerial datasets limits the advancement of deep learning models in this domain, primarily due to the high cost and logistical challenges of collecting thermal data. In this work, we introduce a novel procedural pipeline for generating synthetic thermal images from an aerial perspective. Our method integrates arbitrary object classes into existing thermal backgrounds by providing control over the position, scale, and orientation of the new objects, while aligning them with the viewpoints of the background. We enhance existing thermal datasets by introducing new object categories, specifically adding a drone class in urban environments to the HIT‑UAV dataset and an animal category to the MONET dataset. In evaluating these datasets for the object detection task, we showcase strong performance across both new and existing classes, validating the successful expansion into new applications. Through comparative analysis, we show that thermal detectors outperform their visible‑light‑trained counterparts and highlight the importance of replicating aerial viewing angles. Project page: https://github.com/larics/thermal_aerial_synthetic.
Authors: Ziqin Wang, Jinyu Chen, Xiangyi Zheng, Qinan Liao, Linjiang Huang, Si Liu
Abstract: Unmanned Aerial Vehicles, operating in environments with relatively few obstacles, offer high maneuverability and full three‑dimensional mobility. This allows them to rapidly approach objects and perform a wide range of tasks often challenging for ground robots, making them ideal for exploration, inspection, aerial imaging, and everyday assistance. In this paper, we introduce AirStar, a UAV‑centric embodied platform that turns a UAV into an intelligent aerial assistant: a large language model acts as the cognitive core for environmental understanding, contextual reasoning, and task planning. AirStar accepts natural interaction through voice commands and gestures, removing the need for a remote controller and significantly broadening its user base. It combines geospatial knowledge‑driven long‑distance navigation with contextual reasoning for fine‑grained short‑range control, resulting in an efficient and accurate vision‑and‑language navigation (VLN) capability. Furthermore, the system also offers built‑in capabilities such as cross‑modal question answering, intelligent filming, and target tracking. With a highly extensible framework, it supports seamless integration of new functionalities, paving the way toward a general‑purpose, instruction‑driven intelligent UAV agent. The supplementary PPT is available at https://buaa‑colalab.github.io/airstar.github.io.
Authors: Wei Li, Jiaman Tang, Yang Li, Beihao Xia, Ligang Tan, Hongmao Qin
Abstract: Unmanned Aerial Vehicle (UAV) object detection has been widely used in traffic management, agriculture, emergency rescue, etc. However, it faces significant challenges, including occlusions, small object sizes, and irregular shapes. These challenges highlight the necessity for a robust and efficient multimodal UAV object detection method. Mamba has demonstrated considerable potential in multimodal image fusion. Leveraging this, we propose UAVD‑Mamba, a multimodal UAV object detection framework based on Mamba architectures. To improve geometric adaptability, we propose the Deformable Token Mamba Block (DTMB) to generate deformable tokens by incorporating adaptive patches from deformable convolutions alongside normal patches from normal convolutions, which serve as the inputs to the Mamba Block. To optimize the multimodal feature complementarity, we design two separate DTMBs for the RGB and infrared (IR) modalities, with the outputs from both DTMBs integrated into the Mamba Block for feature extraction and into the Fusion Mamba Block for feature fusion. Additionally, to improve multiscale object detection, especially for small objects, we stack four DTMBs at different scales to produce multiscale feature representations, which are then sent to the Detection Neck for Mamba (DNM). The DNM module, inspired by the YOLO series, includes modifications to the SPPF and C3K2 of YOLOv11 to better handle the multiscale features. In particular, we employ cross‑enhanced spatial attention before the DTMB and cross‑channel attention after the Fusion Mamba Block to extract more discriminative features. Experimental results on the DroneVehicle dataset show that our method outperforms the baseline OAFA method by 3.6% in the mAP metric. Codes will be released at https://github.com/GreatPlum‑hnu/UAVD‑Mamba.git.
Authors: Nuo Chen, Chao Xiao, Yimian Dai, Shiman He, Miao Li, Wei An
Abstract: Small object detection (SOD) in anti‑UAV tasks is a challenging problem due to the small size of UAVs and complex backgrounds. Traditional frame‑based cameras struggle to detect small objects in complex environments due to their low frame rates, limited dynamic range, and data redundancy. Event cameras, with microsecond temporal resolution and high dynamic range, provide a more effective solution for SOD. However, existing event‑based object detection datasets are limited in scale, feature large target sizes, and lack diverse backgrounds, making them unsuitable for SOD benchmarks. In this paper, we introduce an Event‑based Small object detection (EVSOD) dataset (namely EV‑UAV), the first large‑scale, highly diverse benchmark for anti‑UAV tasks. It includes 147 sequences with over 2.3 million event‑level annotations, featuring extremely small targets (averaging 6.8 × 5.4 pixels) and diverse scenarios such as urban clutter and extreme lighting conditions. Furthermore, based on the observation that small moving targets form continuous curves in spatiotemporal event point clouds, we propose the Event‑based Sparse Segmentation Network (EV‑SpSegNet), a novel baseline for event segmentation in point cloud space, along with a Spatiotemporal Correlation (STC) loss that leverages motion continuity to guide the network in retaining target events. Extensive experiments on the EV‑UAV dataset demonstrate the superiority of our method and provide a benchmark for future research in EVSOD. The dataset and code are at https://github.com/ChenYichen9527/Ev‑UAV.
Authors: Haiping Yang, Huaxing Liu, Wei Wu, Zuohui Chen, Ning Wu
Abstract: Unmanned aerial vehicles (UAVs) are increasingly employed in diverse applications such as land surveying, material transport, and environmental monitoring. Following missions like data collection or inspection, UAVs must land safely at docking stations for storage or recharging, which is an essential requirement for ensuring operational continuity. However, accurate landing remains challenging due to factors like GPS signal interference. To address this issue, we propose a deviation warning system for UAV landings, powered by a novel vision‑based model called AeroLite‑MDNet. This model integrates a multiscale fusion module for robust cross‑scale object detection and incorporates a segmentation branch for efficient orientation estimation. We introduce a new evaluation metric, Average Warning Delay (AWD), to quantify the system's sensitivity to landing deviations. Furthermore, we contribute a new dataset, UAVLandData, which captures real‑world landing deviation scenarios to support training and evaluation. Experimental results show that our system achieves an AWD of 0.7 seconds with a deviation detection accuracy of 98.6%, demonstrating its effectiveness in enhancing UAV landing reliability. Code will be available at https://github.com/ITTTTTI/Maskyolo.git
Authors: Xiangbo Gao, Yuheng Wu, Fengze Yang, Xuewen Luo, Keshu Wu, Xinghao Chen, Yuping Wang, Chenxi Liu, Yang Zhou, Zhengzhong Tu
Abstract: While multi‑vehicular collaborative driving demonstrates clear advantages over single‑vehicle autonomy, traditional infrastructure‑based V2X systems remain constrained by substantial deployment costs and the creation of "uncovered danger zones" in rural and suburban areas. We present AirV2X‑Perception, a large‑scale dataset that leverages Unmanned Aerial Vehicles (UAVs) as a flexible alternative or complement to fixed Road‑Side Units (RSUs). Drones offer unique advantages over ground‑based perception: complementary bird's‑eye‑views that reduce occlusions, dynamic positioning capabilities that enable hovering, patrolling, and escorting navigation rules, and significantly lower deployment costs compared to fixed infrastructure. Our dataset comprises 6.73 hours of drone‑assisted driving scenarios across urban, suburban, and rural environments with varied weather and lighting conditions. The AirV2X‑Perception dataset facilitates the development and standardized evaluation of Vehicle‑to‑Drone (V2D) algorithms, addressing a critical gap in the rapidly expanding field of aerial‑assisted autonomous driving systems. The dataset and development kits are open‑sourced at https://github.com/taco‑group/AirV2X‑Perception.
Authors: Jan Michalczyk, Stephan Weiss, Jan Steinbrener
Abstract: Using 3D point clouds in odometry estimation in robotics often requires finding a set of correspondences between points in subsequent scans. While there are established methods for point clouds of sufficient quality, state‑of‑the‑art still struggles when this quality drops. Thus, this paper presents a novel learning‑based framework for predicting robust point correspondences between pairs of noisy, sparse and unstructured 3D point clouds from a light‑weight, low‑power, inexpensive, consumer‑grade System‑on‑Chip (SoC) Frequency Modulated Continuous Wave (FMCW) radar sensor. Our network is based on the transformer architecture which allows leveraging the attention mechanism to discover pairs of points in consecutive scans with the greatest mutual affinity. The proposed network is trained in a self‑supervised way using set‑based multi‑label classification cross‑entropy loss, where the ground‑truth set of matches is found by solving the Linear Sum Assignment (LSA) optimization problem, which avoids tedious hand annotation of the training data. Additionally, posing the loss calculation as multi‑label classification permits supervising on point correspondences directly instead of on odometry error, which is not feasible for sparse and noisy data from the SoC radar we use. We evaluate our method with an open‑source state‑of‑the‑art Radar‑Inertial Odometry (RIO) framework in real‑world Unmanned Aerial Vehicle (UAV) flights and with the widely used public Coloradar dataset. Evaluation shows that the proposed method improves the position estimation accuracy by over 14 % and 19 % on average, respectively. The open source code and datasets can be found here: https://github.com/aau‑cns/radar_transformer.
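The Linear Sum Assignment step described in this abstract can be illustrated with a minimal sketch. It assumes the matching cost between two points is their Euclidean distance (the abstract does not state the cost that is actually used) and solves the assignment by brute force over permutations, which is exact but only practical for a handful of points. The function name `lsa_matches` and the sample coordinates are hypothetical.

```python
from itertools import permutations
from math import dist


def lsa_matches(scan_a, scan_b):
    """Return the one-to-one matching (i, j) that minimises total distance.

    Brute force over all permutations: exact, but only feasible for tiny
    inputs. Assumes len(scan_a) <= len(scan_b).
    """
    n = len(scan_a)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(scan_b)), n):
        # Total cost of assigning point i in scan_a to point perm[i] in scan_b.
        cost = sum(dist(scan_a[i], scan_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


# Hypothetical 3D points from two consecutive scans (metres).
scan_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
scan_b = [(0.1, 2.0, 0.0), (0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]
matches, total = lsa_matches(scan_a, scan_b)
print(matches, round(total, 3))
```

For realistic scan sizes, a polynomial-time solver such as the Hungarian algorithm would replace the brute-force loop; the resulting index pairs would then serve as classification targets.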
Authors: Fei Zhou
Abstract: Remote sensing change detection is used in urban planning, terrain analysis, and environmental monitoring by analyzing feature changes in the same area over time. In this paper, we propose a large language model (LLM) augmented inference approach (SegChange‑R1), which enhances the detection capability by integrating textual descriptive information and guides the model to focus on relevant change regions, accelerating convergence. We design a linear attention‑based spatial transformation module (BEV) to address modal misalignment by unifying features from different times into a BEV space. Furthermore, we introduce DVCD, a novel dataset for building change detection from UAV viewpoints. Experiments on four widely‑used datasets demonstrate significant improvements over existing methods. The code and pre‑trained models are available at https://github.com/Yu‑Zhouz/SegChange‑R1.
Authors: Yunhao Hou, Bochao Zou, Min Zhang, Ran Chen, Shangdong Yang, Yanmei Zhang, Junbao Zhuo, Siheng Chen, Jiansheng Chen, Huimin Ma
Abstract: By sharing information across multiple agents, collaborative perception helps autonomous vehicles mitigate occlusions and improve overall perception accuracy. Most previous work focuses on vehicle‑to‑vehicle and vehicle‑to‑infrastructure collaboration, with limited attention to the aerial perspectives provided by UAVs, which uniquely offer dynamic, top‑down views to alleviate occlusions and monitor large‑scale interactive environments. A major reason for this is the lack of high‑quality datasets for aerial‑ground collaborative scenarios. To bridge this gap, we present AGC‑Drive, the first large‑scale real‑world dataset for Aerial‑Ground Cooperative 3D perception. The data collection platform consists of two vehicles, each equipped with five cameras and one LiDAR sensor, and one UAV carrying a forward‑facing camera and a LiDAR sensor, enabling comprehensive multi‑view and multi‑agent perception. Consisting of approximately 80K LiDAR frames and 360K images, the dataset covers 14 diverse real‑world driving scenarios, including urban roundabouts, highway tunnels, and on/off ramps. Notably, 17% of the data comprises dynamic interaction events, including vehicle cut‑ins, cut‑outs, and frequent lane changes. AGC‑Drive contains 350 scenes, each with approximately 100 frames and fully annotated 3D bounding boxes covering 13 object categories. We provide benchmarks for two 3D perception tasks: vehicle‑to‑vehicle collaborative perception and vehicle‑to‑UAV collaborative perception. Additionally, we release an open‑source toolkit, including spatiotemporal alignment verification tools, multi‑agent visualization systems, and collaborative annotation utilities. The dataset and code are available at https://github.com/PercepX/AGC‑Drive.
Authors: Worasit Sangjan, Piyush Pandey, Norman B. Best, Jacob D. Washburn
Abstract: Accurate identification of individual plants from unmanned aerial vehicle (UAV) images is essential for advancing high‑throughput phenotyping and supporting data‑driven decision‑making in plant breeding. This study presents MatchPlant, a modular, graphical user interface‑supported, open‑source Python pipeline for UAV‑based single‑plant detection and geospatial trait extraction. MatchPlant enables end‑to‑end workflows by integrating UAV image processing, user‑guided annotation, Convolutional Neural Network model training for object detection, forward projection of bounding boxes onto an orthomosaic, and shapefile generation for spatial phenotypic analysis. In an early‑season maize case study, MatchPlant achieved reliable detection performance (validation AP: 89.6%, test AP: 85.9%) and effectively projected bounding boxes, covering 89.8% of manually annotated boxes with 87.5% of projections achieving an Intersection over Union (IoU) greater than 0.5. Trait values extracted from predicted bounding instances showed high agreement with manual annotations (r = 0.87‑0.97, IoU >= 0.4). Detection outputs were reused across time points to extract plant height and Normalized Difference Vegetation Index with minimal additional annotation, facilitating efficient temporal phenotyping. By combining modular design, reproducibility, and geospatial precision, MatchPlant offers a scalable framework for UAV‑based plant‑level analysis with broad applicability in agricultural and environmental monitoring.
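The Intersection over Union (IoU) figures quoted in this abstract compare a projected bounding box with a manually annotated one. A minimal sketch of the standard IoU computation for axis-aligned boxes follows; the `(xmin, ymin, xmax, ymax)` box layout and the sample coordinates are assumptions for illustration and are not taken from MatchPlant.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes in orthomosaic coordinates.
manual = (0.0, 0.0, 2.0, 2.0)
projected = (1.0, 0.0, 3.0, 2.0)
print(f"IoU = {iou(manual, projected):.3f}")
```

Here the two boxes share half of their area, giving an IoU of one third, which would fall below both the 0.5 and 0.4 thresholds mentioned above.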
Authors: Hongyu Chen, Jiping Liu, Yong Wang, Jun Zhu, Dejun Feng, Yakun Xie
Abstract: Unsupervised Domain Adaptation (UDA) has shown promise in effectively alleviating the performance degradation caused by domain gaps between source and target domains, and it can potentially be generalized to UAV object detection in adverse scenes. However, existing UDA studies are based on natural images or clear UAV imagery, and research focused on UAV imagery in adverse conditions is still in its infancy. Moreover, due to the unique perspective of UAVs and the interference from adverse conditions, these methods often fail to accurately align features and are influenced by limited or noisy pseudo‑labels. To address this, we propose the first benchmark for UAV object detection in adverse scenes, the Statistical Feedback‑Driven Threshold and Mask Adjustment Teacher‑Student Framework (SF‑TMAT). Specifically, SF‑TMAT introduces a design called Dynamic Step Feedback Mask Adjustment Autoencoder (DSFMA), which dynamically adjusts the mask ratio and reconstructs feature maps by integrating training progress and loss feedback. This approach dynamically adjusts the learning focus at different training stages to meet the model's needs for learning features at varying levels of granularity. Additionally, we propose a unique Variance Feedback Smoothing Threshold (VFST) strategy, which statistically computes the mean confidence of each class and dynamically adjusts the selection threshold by incorporating a variance penalty term. This strategy improves the quality of pseudo‑labels and uncovers potentially valid labels, thus mitigating domain bias. Extensive experiments demonstrate the superiority and generalization capability of the proposed SF‑TMAT in UAV object detection under adverse scene conditions. The code is released at https://github.com/ChenHuyoo.
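The idea of a per-class pseudo-label threshold built from the mean confidence and a variance penalty can be sketched as below. This is not the VFST formulation from the paper: the rule `mean - penalty * variance`, the `penalty` and `floor` parameters, and the function name `class_thresholds` are all assumptions chosen only to show the general shape of such a strategy.

```python
from statistics import fmean, pvariance


def class_thresholds(confidences_by_class, penalty=1.0, floor=0.0):
    """Per-class selection threshold from mean confidence and variance.

    Assumed rule (illustrative only): threshold = mean - penalty * variance,
    clipped below at `floor`. Classes with unstable confidences get a lower
    threshold, so more of their candidate pseudo-labels are kept.
    """
    thresholds = {}
    for cls, confs in confidences_by_class.items():
        mean = fmean(confs)
        var = pvariance(confs)
        thresholds[cls] = max(floor, mean - penalty * var)
    return thresholds


# Hypothetical teacher confidences for two classes.
confidences = {"car": [0.9, 0.7], "person": [0.5, 0.5]}
thresholds = class_thresholds(confidences)
for cls, thr in thresholds.items():
    print(cls, round(thr, 3))
```

Whether a larger variance should lower or raise the threshold, and how the threshold is smoothed over training, depends on the actual method; the sign used here is only one plausible choice.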
Authors: Yixuan Huang, Jie Yang, Shuqiang Xia, Chao-Kai Wen, Shi Jin
Abstract: The low‑altitude economy is emerging as a key driver of future economic growth, necessitating effective flight activity surveillance using existing mobile cellular network sensing capabilities. However, traditional monostatic and localization‑based sensing methods face challenges in fusing sensing results and matching channel parameters. To address these challenges, we model low‑altitude surveillance as a compressed sensing (CS)‑based imaging problem by leveraging the cooperation of multiple base stations and the inherent sparsity of aerial images. Additionally, we derive the point spread function to analyze the influences of different antenna, subcarrier, and resolution settings on the imaging performance. Given the random spatial distribution of unmanned aerial vehicles (UAVs), we propose a physics‑embedded learning method to mitigate off‑grid errors in traditional CS‑based approaches. Furthermore, to enhance rare UAV detection in vast low‑altitude airspace, we integrate an online hard example mining scheme into the loss function design, enabling the network to adaptively focus on samples with significant discrepancies from the ground truth during training. Simulation results demonstrate the effectiveness of the proposed low‑altitude surveillance framework. The proposed physics‑embedded learning algorithm achieves a 97.55% detection rate, significantly outperforming traditional CS‑based methods under off‑grid conditions. Part of the source code for this paper will soon be accessible at https://github.com/kiwi1944/LAEImager.
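Online hard example mining, as mentioned in this abstract, can be sketched in its simplest form: only the samples with the largest loss contribute to the training objective. The fixed `keep_ratio`, the function name `ohem_loss`, and the sample loss values are assumptions; the abstract does not specify how hard examples are selected or weighted.

```python
def ohem_loss(per_sample_losses, keep_ratio=0.25):
    """Average only the hardest (largest-loss) fraction of samples.

    A generic top-k form of online hard example mining; `keep_ratio` is an
    assumed hyperparameter.
    """
    if not per_sample_losses:
        raise ValueError("per_sample_losses must not be empty")
    k = max(1, int(len(per_sample_losses) * keep_ratio))
    hardest = sorted(per_sample_losses, reverse=True)[:k]
    return sum(hardest) / k


# Hypothetical per-sample losses: most samples are easy, two are hard.
losses = [0.1, 0.9, 0.2, 0.05, 0.7, 0.03, 0.01, 0.02]
print(round(ohem_loss(losses), 3))
```

With eight samples and a ratio of 0.25, only the two largest losses are averaged, so the many easy samples do not dilute the gradient signal from the rare hard ones.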
Authors: Shoon Kit Lim, Melissa Jia Ying Chong, Jing Huey Khor, Ting Yang Ling
Abstract: Recent advances in agentic and physical artificial intelligence (AI) have largely focused on ground‑based platforms such as humanoid and wheeled robots, leaving aerial robots relatively underexplored. Meanwhile, state‑of‑the‑art unmanned aerial vehicle (UAV) multimodal vision‑language systems typically rely on closed‑source models accessible only to well‑resourced organizations. To democratize natural language control of autonomous drones, we present an open‑source agentic framework that integrates PX4‑based flight control, Robot Operating System 2 (ROS 2) middleware, and locally hosted models using Ollama. We evaluate performance both in simulation and on a custom quadcopter platform, benchmarking four large language model (LLM) families for command generation and three vision‑language model (VLM) families for scene understanding.
Authors: Cédric Léonard, Dirk Stober, Martin Schulz
Abstract: New UAV technologies and the NewSpace era are transforming Earth Observation missions and data acquisition. Numerous small platforms generate large data volumes, straining bandwidth and requiring onboard decision‑making to transmit high‑quality information in time. While Machine Learning allows real‑time autonomous processing, FPGAs balance performance with adaptability to mission‑specific requirements, enabling onboard deployment. This review systematically analyzes 68 experiments deploying ML models on FPGAs for Remote Sensing applications. We introduce two distinct taxonomies to capture both efficient model architectures and FPGA implementation strategies. For transparency and reproducibility, we follow PRISMA 2020 guidelines and share all data and code at https://github.com/CedricLeon/Survey_RS‑ML‑FPGA.
Authors: Zhaoying Wang, Xingxing Zuo, Wei Dong
Abstract: Lightweight long‑range mapping is critical for safe navigation of UAV swarms in large‑scale unknown environments. Traditional stereo vision systems with fixed short baselines face limited perception ranges. To address this, we propose Flying Co‑Stereo, a cross‑agent collaborative stereo vision system that leverages the wide‑baseline spatial configuration of two UAVs for long‑range dense mapping. Key innovations include: (1) a dual‑spectrum visual‑inertial‑ranging estimator for robust baseline estimation; (2) a hybrid feature association strategy combining deep learning‑based cross‑agent matching and optical‑flow‑based intra‑agent tracking; (3) a sparse‑to‑dense depth recovery scheme, refining dense monocular depth predictions using exponential fitting of long‑range triangulated sparse landmarks for precise metric‑scale mapping. Experiments demonstrate that the Flying Co‑Stereo system achieves dense 3D mapping up to 70 meters with 2.3%‑9.7% relative error, outperforming conventional systems by up to 350% in depth range and 450% in coverage area. The project webpage is available at https://xingxingzuo.github.io/flying_co_stereo
Authors: Junhuan Liu, San Jiang, Wei Ge, Wei Huang, Bingxuan Guo, Qingquan Li
Abstract: The primary contribution of this paper is a challenging benchmark dataset, UAVPairs, and a training pipeline designed for match pair retrieval of large‑scale UAV images. First, the UAVPairs dataset, comprising 21,622 high‑resolution images across 30 diverse scenes, is constructed; the 3D points and tracks generated by SfM‑based 3D reconstruction are employed to define the geometric similarity of image pairs, ensuring genuinely matchable image pairs are used for training. Second, to reduce the high cost of global hard negative mining, a batched nontrivial sample mining strategy is proposed, leveraging the geometric similarity and multi‑scene structure of UAVPairs to generate training samples so as to accelerate training. Third, recognizing the limitation of pair‑based losses, the ranked list loss is designed to improve the discrimination of image retrieval models, which optimizes the global similarity structure constructed from the positive set and negative set. Finally, the effectiveness of the UAVPairs dataset and training pipeline is validated through comprehensive experiments on three distinct large‑scale UAV datasets. The experimental results demonstrate that models trained with the UAVPairs dataset and the ranked list loss achieve significantly improved retrieval accuracy compared to models trained on existing datasets or with conventional losses. Furthermore, these improvements translate to enhanced view graph connectivity and higher quality of reconstructed 3D models. The models trained by the proposed approach perform more robustly compared with hand‑crafted global features, particularly in challenging repetitively textured scenes and weakly textured scenes. For match pair retrieval of large‑scale UAV images, the trained image retrieval models offer an effective solution. The dataset will be made publicly available at https://github.com/json87/UAVPairs.
Authors: Mengjingcheng Mo, Xinyang Tong, Mingpi Tan, Jiaxu Leng, Jiankang Zheng, Yiran Liu, Haosheng Chen, Ji Gan, Weisheng Li, Xinbo Gao
Abstract: While unmanned aerial vehicles (UAVs) offer wide‑area, high‑altitude coverage for anomaly detection, they face challenges such as dynamic viewpoints, scale variations, and complex scenes. Existing datasets and methods, mainly designed for fixed ground‑level views, struggle to adapt to these conditions, leading to significant performance drops in drone‑view scenarios. To bridge this gap, we introduce A2Seek (Aerial Anomaly Seek), a large‑scale, reasoning‑centric benchmark dataset for aerial anomaly understanding. This dataset covers various scenarios and environmental conditions, providing high‑resolution real‑world aerial videos with detailed annotations, including anomaly categories, frame‑level timestamps, region‑level bounding boxes, and natural language explanations for causal reasoning. Building on this dataset, we propose A2Seek‑R1, a novel reasoning framework that generalizes R1‑style strategies to aerial anomaly understanding, enabling a deeper understanding of "Where" anomalies occur and "Why" they happen in aerial frames. To this end, A2Seek‑R1 first employs a graph‑of‑thought (GoT)‑guided supervised fine‑tuning approach to activate the model's latent reasoning capabilities on A2Seek. Then, we introduce Aerial Group Relative Policy Optimization (A‑GRPO) to design rule‑based reward functions tailored to aerial scenarios. Furthermore, we propose a novel "seeking" mechanism that simulates UAV flight behavior by directing the model's attention to informative regions. Extensive experiments demonstrate that A2Seek‑R1 achieves up to a 22.04% improvement in AP for prediction accuracy and a 13.9% gain in mIoU for anomaly localization, exhibiting strong generalization across complex environments and out‑of‑distribution scenarios. Our dataset and code are released at https://2‑mo.github.io/A2Seek/.
Authors: Mingning Guo, Mengwei Wu, Jiarun He, Shaoxian Li, Haifeng Li, Chao Tao
Abstract: With the rapid advancement of low‑altitude remote sensing and Vision‑Language Models (VLMs), Embodied Agents based on Unmanned Aerial Vehicles (UAVs) have shown significant potential in autonomous tasks. However, current evaluation methods for UAV‑Embodied Agents (UAV‑EAs) remain constrained by the lack of standardized benchmarks, diverse testing scenarios and open system interfaces. To address these challenges, we propose BEDI (Benchmark for Embodied Drone Intelligence), a systematic and standardized benchmark designed for evaluating UAV‑EAs. Specifically, we introduce a novel Dynamic Chain‑of‑Embodied‑Task paradigm based on the perception‑decision‑action loop, which decomposes complex UAV tasks into standardized, measurable subtasks. Building on this paradigm, we design a unified evaluation framework encompassing six core sub‑skills: semantic perception, spatial perception, motion control, tool utilization, task planning and action generation. Furthermore, we develop a hybrid testing platform that incorporates a wide range of both virtual and real‑world scenarios, enabling a comprehensive evaluation of UAV‑EAs across diverse contexts. The platform also offers open and standardized interfaces, allowing researchers to customize tasks and extend scenarios, thereby enhancing flexibility and scalability in the evaluation process. Finally, through empirical evaluations of several state‑of‑the‑art (SOTA) VLMs, we reveal their limitations in embodied UAV tasks, underscoring the critical role of the BEDI benchmark in advancing embodied intelligence research and model optimization. By filling the gap in systematic and standardized evaluation within this field, BEDI facilitates objective model comparison and lays a robust foundation for future development in this field. Our benchmark is now publicly available at https://github.com/lostwolves/BEDI.
Authors: Hongshu Guo, Zeyuan Ma, Yining Ma, Xinglin Zhang, Wei-Neng Chen, Yue-Jiao Gong
Abstract: Designing effective black‑box optimizers is hampered by limited problem‑specific knowledge and manual control that spans months for almost every detail. In this paper, we present DesignX, the first automated algorithm design framework that generates an effective optimizer specific to a given black‑box optimization problem within seconds. Rooted in the first principles, we identify two key sub‑tasks: 1) algorithm structure generation and 2) hyperparameter control. To enable systematic construction, a comprehensive modular algorithmic space is first built, embracing hundreds of algorithm components collected from decades of research. We then introduce a dual‑agent reinforcement learning system that collaborates on structural and parametric design through a novel cooperative training objective, enabling large‑scale meta‑training across 10k diverse instances. Remarkably, through days of autonomous learning, the DesignX‑generated optimizers continuously surpass human‑crafted optimizers by orders of magnitude, either on synthetic testbed or on realistic optimization scenarios such as Protein‑docking, AutoML and UAV path planning. Further in‑depth analysis reveals DesignX's capability to discover non‑trivial algorithm patterns beyond expert intuition, which, conversely, provides valuable design insights for the optimization community. We provide DesignX's Python project at https://github.com/MetaEvo/DesignX.
Authors: Vendi Ardianto Nugroho, Byung Moo Lee
Abstract: Millimeter‑wave (mmWave) communication enables high data rates for cellular‑connected Unmanned Aerial Vehicles (UAVs). However, robust beam management remains challenging due to significant path loss and the dynamic mobility of UAVs, which can destabilize the UAV‑base station (BS) link. This research presents a GPS‑aided deep learning (DL) model that simultaneously predicts current and future optimal beams for UAV mmWave communications, maintaining a Top‑1 prediction accuracy exceeding 70% and an average power loss below 0.6 dB across all prediction steps. These outcomes stem from a proposed dataset splitting method ensuring balanced label distribution, paired with a GPS preprocessing technique that extracts key positional features, and a DL architecture that maps sequential position data to beam index predictions. The model reduces overhead by approximately 93% (requiring the training of 2 ~ 3 beams instead of 32 beams) with 95% beam prediction accuracy guarantees, and ensures 94% to 96% of predictions exhibit mean power loss not exceeding 1 dB.
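The overhead and power-loss figures in this abstract can be related to simple arithmetic. The sketch below assumes that overhead reduction is the fraction of beams that no longer need to be trained, and that power loss is the ratio of the best beam's received power to the predicted beam's received power, expressed in dB; both definitions and both function names are assumptions, since the abstract does not define the metrics.

```python
from math import log10


def overhead_reduction(trained_beams, codebook_size=32):
    """Fraction of beam training avoided (assumed definition)."""
    return 1.0 - trained_beams / codebook_size


def power_loss_db(best_power, predicted_power):
    """Loss in dB from using the predicted beam instead of the best beam
    (assumed definition; powers are in linear units)."""
    return 10.0 * log10(best_power / predicted_power)


# Training 2 or 3 beams out of a 32-beam codebook.
for k in (2, 3):
    print(f"{k} beams: {overhead_reduction(k):.2%} less training")

# A predicted beam that receives about 87% of the best beam's power.
print(f"{power_loss_db(1.0, 0.87):.2f} dB")
```

Under this definition, training 2 of 32 beams avoids 93.75% of the sweep and training 3 avoids 90.625%, which is in line with the roughly 93% reduction reported above.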
Authors: Fei Zhou, Yi Li, Mingqing Zhu
Abstract: In this paper, the dual‑optical attention fusion crowd head point counting model (TAPNet) is proposed to address the difficulty of accurate counting in complex scenes, such as dense crowd occlusion and low light, in crowd counting tasks under the UAV view. The model designs a dual‑optical attention fusion module (DAFP) that introduces complementary information from infrared images to improve the accuracy and robustness of all‑day crowd counting. In order to fully utilize different modal information and solve the problem of inaccurate localization caused by systematic misalignment between image pairs, this paper also proposes an adaptive two‑optical feature decomposition fusion module (AFDF). In addition, we optimize the training strategy to improve the model robustness through spatial random offset data augmentation. Experiments on two challenging public datasets, DroneRGBT and GAIIC2, show that the proposed method outperforms existing techniques, especially in challenging dense low‑light scenes. Code is available at https://github.com/zz‑zik/TAPNet
Authors: Weihong Li, Xiaoqiong Liu, Heng Fan, Libo Zhang
Abstract: Recent advancements in visual object tracking have markedly improved the capabilities of unmanned aerial vehicle (UAV) tracking, which is a critical component in real‑world robotics applications. While the integration of hierarchical lightweight networks has become a prevalent strategy for enhancing efficiency in UAV tracking, it often results in a significant drop in network capacity, which further exacerbates challenges in UAV scenarios, such as frequent occlusions and extreme changes in viewing angles. To address these issues, we introduce a novel family of UAV trackers, termed CGTrack, which combines explicit and implicit techniques to expand network capacity within a coarse‑to‑fine framework. Specifically, we first introduce a Hierarchical Feature Cascade (HFC) module that leverages the spirit of feature reuse to increase network capacity by integrating the deep semantic cues with the rich spatial information, incurring minimal computational costs while enhancing feature representation. Based on this, we design a novel Lightweight Gated Center Head (LGCH) that utilizes gating mechanisms to decouple target‑oriented coordinates from previously expanded features, which contain dense local discriminative information. Extensive experiments on three challenging UAV tracking benchmarks demonstrate that CGTrack achieves state‑of‑the‑art performance while running fast. Code will be available at https://github.com/Nightwatch‑Fox11/CGTrack.
Authors: Jiuwu Hao, Liguo Sun, Yuting Wan, Yueyang Wu, Ti Xiang, Haolin Song, Pin Lv
Abstract: Collaborative perception enhances environmental awareness through inter‑agent communication and is regarded as a promising solution to intelligent transportation systems. However, existing collaborative methods for Unmanned Aerial Vehicles (UAVs) overlook the unique characteristics of the UAV perspective, resulting in substantial communication overhead. To address this issue, we propose a novel communication‑efficient collaborative perception framework based on late‑intermediate fusion, dubbed LIF. The core concept is to exchange informative and compact detection results and shift the fusion stage to the feature representation level. In particular, we leverage vision‑guided positional embedding (VPE) and box‑based virtual augmented feature (BoBEV) to effectively integrate complementary information from various agents. Additionally, we innovatively introduce an uncertainty‑driven communication mechanism that uses uncertainty evaluation to select high‑quality and reliable shared areas. Experimental results demonstrate that our LIF achieves superior performance with minimal communication bandwidth, proving its effectiveness and practicality. Code and models are available at https://github.com/uestchjw/LIF.
Authors: Mohammed Ayman Shalaby, Syed Shabbir Ahmed, Nicholas Dahdah, Charles Champagne Cossette, Jerome Le Ny, James Richard Forbes
Abstract: This paper introduces MILUV, a Multi‑UAV Indoor Localization dataset with UWB and Vision measurements. This dataset comprises 217 minutes of flight time over 36 experiments using three quadcopters, collecting ultra‑wideband (UWB) ranging data such as the raw timestamps and channel‑impulse response data, vision data from a stereo camera and a bottom‑facing monocular camera, inertial measurement unit data, height measurements from a laser rangefinder, magnetometer data, and ground‑truth poses from a motion‑capture system. The UWB data is collected from up to 12 transceivers affixed to mobile robots and static tripods in both line‑of‑sight and non‑line‑of‑sight conditions. The UAVs fly at a maximum speed of 4.418 m/s in an indoor environment with visual fiducial markers as features. MILUV is versatile and can be used for a wide range of applications beyond localization, but the primary purpose of MILUV is for testing and validating multi‑robot UWB‑ and vision‑based localization algorithms. The dataset can be downloaded at https://doi.org/10.25452/figshare.plus.28386041.v1. A development kit is presented alongside the MILUV dataset, which includes benchmarking algorithms such as visual‑inertial odometry, UWB‑based localization using an extended Kalman filter, and classification of CIR data using machine learning approaches. The development kit can be found at https://github.com/decargroup/miluv, and is supplemented with a website available at https://decargroup.github.io/miluv/.
Authors: Mengyuan Li, Changhong Fu, Ziyu Lu, Zijie Zhang, Haobo Zuo, Liangliang Yao
Abstract: Thermal imaging can greatly enhance the application of intelligent unmanned aerial vehicles (UAVs) in challenging environments. However, the inherent low resolution of thermal sensors leads to insufficient details and blurred boundaries. Super‑resolution (SR) offers a promising solution to this issue, but most existing SR methods are designed for fixed‑scale SR. They are computationally expensive and inflexible in practical applications. To address the above issues, this work proposes a novel any‑scale thermal SR method (AnyTSR) for UAVs within a single model. Specifically, a new image encoder is proposed to explicitly assign specific feature codes to enable more accurate and flexible representation. Additionally, by effectively embedding coordinate offset information into the local feature ensemble, an innovative any‑scale upsampler is proposed to better understand spatial relationships and reduce artifacts. Moreover, a novel dataset (UAV‑TSR), covering both land and water scenes, is constructed for thermal SR tasks. Experimental results demonstrate that the proposed method consistently outperforms state‑of‑the‑art methods across all scaling factors and generates more accurate and detailed high‑resolution images. The code is located at https://github.com/vision4robotics/AnyTSR.
Authors: Thu Hang Khuat, Duy-Nam Bui, Hoa TT. Nguyen, Mien L. Trinh, Minh T. Nguyen, Manh Duong Phung
Abstract: Cooperative path planning is gaining importance due to the increasing demand for using multiple unmanned aerial vehicles (UAVs) in complex missions. This work addresses the problem by introducing a new algorithm named MultiRRT that extends the rapidly exploring random tree (RRT) to generate paths for a group of UAVs to reach multiple goal locations at the same time. We first derive the dynamics constraint of the UAV and include it in the problem formulation. MultiRRT is then developed, taking into account the cooperative requirements and safety constraints during its path‑searching process. The algorithm features two new mechanisms, node reduction and Bezier interpolation, to ensure the feasibility and optimality of the paths generated. Importantly, the interpolated paths are proven to meet the safety and dynamics constraints imposed by obstacles and the UAVs. A number of simulations, comparisons, and experiments have been conducted to evaluate the performance of the proposed approach. The results show that MultiRRT can generate collision‑free paths for multiple UAVs to reach their goals with better scores in path length and smoothness metrics than state‑of‑the‑art RRT variants including Theta‑RRT, FN‑RRT, RRT, and RRT‑Smart. The generated paths are also tested in practical flights with real UAVs to evaluate their validity for cooperative tasks. The source code of the algorithm is available at https://github.com/duynamrcv/multi‑target_RRT.
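Note: The abstract above names Bezier interpolation as one of its smoothing mechanisms. As a minimal sketch only, and not the authors' implementation, the snippet below samples a quadratic Bezier curve that rounds off a single waypoint corner. The function name `quadratic_bezier`, the 2D point type, and the sample count are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quadratic_bezier(p0: Point, p1: Point, p2: Point, n: int = 5) -> List[Point]:
    """Sample n + 1 points on the quadratic Bezier curve with control points p0, p1, p2."""
    points = []
    for i in range(n + 1):
        t = i / n
        # Bernstein basis weights for a degree-2 curve; they sum to 1 for every t.
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * p0[0] + b * p1[0] + c * p2[0],
                       a * p0[1] + b * p1[1] + c * p2[1]))
    return points


# Hypothetical corner: the path turns 90 degrees at waypoint (1, 0).
curve = quadratic_bezier((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), n=4)
```

The curve starts and ends at the outer control points and only approaches the corner waypoint, which is why a separate safety check on the interpolated path, as the abstract describes, is still needed.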
Authors: Nguyen Ngoc Dat, Tom Richardson, Matthew Watson, Kilian Meier, Jenna Kline, Sid Reid, Guy Maalouf, Duncan Hine, Majid Mirmehdi, Tilo Burghardt
Abstract: Live tracking of wildlife via high‑resolution video processing directly onboard drones is largely unexplored, and most existing solutions rely on streaming video to ground stations to support navigation. Yet, both autonomous animal‑reactive flight control beyond visual line of sight and/or mission‑specific individual and behaviour recognition tasks rely to some degree on this capability. In response, we introduce WildLive, a near real‑time animal detection and tracking framework for high‑resolution imagery running directly onboard uncrewed aerial vehicles (UAVs). The system performs multi‑animal detection and tracking at 17.81 fps on HD and 7.53 fps on 4K video streams, suitable for operation during higher altitude flights to minimise animal disturbance. Our system is optimised for Jetson Orin AGX onboard hardware. It integrates the efficiency of sparse optical flow tracking and mission‑specific sampling with device‑optimised and proven YOLO‑driven object detection and segmentation techniques. Essentially, computational resource is focused onto spatio‑temporal regions of high uncertainty to significantly improve UAV processing speeds. Alongside, we introduce our WildLive dataset, which comprises 200K+ annotated animal instances across 19K+ frames from 4K UAV videos collected at the Ol Pejeta Conservancy in Kenya. All frames contain ground truth bounding boxes, segmentation masks, as well as individual tracklets and tracking point trajectories. We compare our system against current object tracking approaches including OC‑SORT, ByteTrack, and SORT. Our multi‑animal tracking experiments with onboard hardware confirm that near real‑time high‑resolution wildlife tracking is possible on UAVs whilst maintaining high accuracy levels as needed for future navigational and mission‑specific animal‑centric operational autonomy. Our materials are available at: https://dat‑nguyenvn.github.io/WildLive/
Authors: You Wu, Xucheng Wang, Xiangyang Yang, Mengyuan Liu, Dan Zeng, Hengzhou Ye, Shuiwang Li
Abstract: Single‑stream architectures using Vision Transformer (ViT) backbones have recently shown great potential for real‑time UAV tracking. However, frequent occlusions from obstacles like buildings and trees expose a major drawback: these models often lack strategies to handle occlusions effectively. New methods are needed to enhance the occlusion resilience of single‑stream ViT models in aerial tracking. In this work, we propose to learn Occlusion‑Robust Representations (ORR) based on ViTs for UAV tracking by enforcing an invariance of the feature representation of a target with respect to random masking operations modeled by a spatial Cox process. Hopefully, this random masking approximately simulates target occlusions, thereby enabling us to learn ViTs that are robust to target occlusion for UAV tracking. This framework is termed ORTrack. Additionally, to facilitate real‑time applications, we propose an Adaptive Feature‑Based Knowledge Distillation (AFKD) method to create a more compact tracker, which adaptively mimics the behavior of the teacher model ORTrack according to the task's difficulty. This student model, dubbed ORTrack‑D, retains much of ORTrack's performance while offering higher efficiency. Extensive experiments on multiple benchmarks validate the effectiveness of our method, demonstrating its state‑of‑the‑art performance. Code is available at https://github.com/wuyou3474/ORTrack.
Authors: Houzhang Fang, Xiaolin Wang, Zengyang Li, Lu Wang, Qingshan Li, Yi Chang, Luxin Yan
Abstract: Infrared unmanned aerial vehicle (UAV) images captured using thermal detectors are often affected by temperature‑dependent low‑frequency nonuniformity, which significantly reduces the contrast of the images. Detecting UAV targets under nonuniform conditions is crucial in UAV surveillance applications. Existing methods typically treat infrared nonuniformity correction (NUC) as a preprocessing step for detection, which leads to suboptimal performance. Balancing the two tasks while enhancing detection‑beneficial information remains challenging. In this paper, we present a detection‑friendly union framework, termed UniCD, that simultaneously addresses both infrared NUC and UAV target detection tasks in an end‑to‑end manner. We first model NUC as an estimation problem with a small number of parameters, jointly driven by priors and data, to generate detection‑conducive images. Then, we incorporate a new auxiliary loss with target mask supervision into the backbone of the infrared UAV target detection network to strengthen target features while suppressing the background. To better balance correction and detection, we introduce a detection‑guided self‑supervised loss to reduce feature discrepancies between the two tasks, thereby enhancing detection robustness to varying nonuniformity levels. Additionally, we construct a new benchmark, called IRBFD, composed of 50,000 infrared images with various nonuniformity types, multi‑scale UAV targets, and rich backgrounds with target annotations. Extensive experiments on IRBFD demonstrate that our UniCD is a robust union framework for NUC and UAV target detection while achieving real‑time processing capabilities. The dataset is available at https://github.com/IVPLaboratory/UniCD.
Authors: Daniel M. Cherenson, Devansh R. Agrawal, Dimitra Panagou
Abstract: Mission planning can often be formulated as a constrained control problem under multiple path constraints (i.e., safety constraints) and budget constraints (i.e., resource expenditure constraints). In a priori unknown environments, verifying that an offline solution will satisfy the constraints for all time can be difficult, if not impossible. We present ReRoot, a novel sampling‑based framework that enforces safety and budget constraints for nonlinear systems in unknown environments. The main idea is that ReRoot grows multiple reverse RRT trees online, starting from renewal sets, i.e., sets where the budget constraints are renewed. The dynamically feasible backup trajectories guarantee safety and reduce resource expenditure, which provides a principled backup policy when integrated into the gatekeeper safety verification architecture. We demonstrate our approach in simulation with a fixed‑wing UAV in a GNSS‑denied environment with a budget constraint on localization error that can be renewed at visual landmarks.
Authors: Rick van Essen, Eldert van Henten, Lammert Kooistra, Gert Kootstra
Abstract: This paper presents an adaptive path planner for object search in agricultural fields using UAVs. The path planner uses a high‑altitude coverage flight path and plans additional low‑altitude inspections when the detection network is uncertain. The path planner was evaluated in an offline simulation environment containing real‑world images. We trained a YOLOv8 detection network to detect artificial plants placed in grass fields to showcase the potential of our path planner. We evaluated the effect of different detection certainty measures, optimized the path planning parameters, and investigated the effects of localization errors and of different numbers of objects in the field. The YOLOv8 detection confidence worked best to differentiate between true and false positive detections and was therefore used in the adaptive planner. The optimal parameters of the path planner depended on the distribution of objects in the field. When the objects were uniformly distributed, more low‑altitude inspections were needed compared to a non‑uniform distribution of objects, resulting in a longer path length. The adaptive planner proved to be robust against localization uncertainty. When increasing the number of objects, the flight path length increased, especially when the objects were uniformly distributed. When the objects were non‑uniformly distributed, the adaptive path planner yielded a shorter path than a low‑altitude coverage path, even with a high number of objects. Overall, the presented adaptive path planner allowed finding non‑uniformly distributed objects in a field faster than a coverage path planner and resulted in a comparable detection accuracy. The path planner is made available at https://github.com/wur‑abe/uav_adaptive_planner.
Authors: Taufiq Ahmed, Abhishek Kumar, Constantino Álvarez Casado, Anlan Zhang, Tuomo Hänninen, Lauri Loven, Miguel Bordallo López, Sasu Tarkoma
Abstract: Object detection models often struggle with class imbalance, where rare categories appear significantly less frequently than common ones. Existing sampling‑based rebalancing strategies, such as Repeat Factor Sampling (RFS) and Instance‑Aware Repeat Factor Sampling (IRFS), mitigate this issue by adjusting sample frequencies based on image and instance counts. However, these methods are based on linear adjustments, which limit their effectiveness in long‑tailed distributions. This work introduces Exponentially Weighted Instance‑Aware Repeat Factor Sampling (E‑IRFS), an extension of IRFS that applies exponential scaling to better differentiate between rare and frequent classes. E‑IRFS adjusts sampling probabilities using an exponential function applied to the geometric mean of image and instance frequencies, ensuring a more adaptive rebalancing strategy. We evaluate E‑IRFS on a dataset derived from the Fireman‑UAV‑RGBT Dataset and four additional public datasets, using YOLOv11 object detection models to identify fire, smoke, people and lakes in emergency scenarios. The results show that E‑IRFS improves detection performance by 22% over the baseline and outperforms RFS and IRFS, particularly for rare categories. The analysis also highlights that E‑IRFS has a stronger effect on lightweight models with limited capacity, as these models rely more on data sampling strategies to address class imbalance. The findings demonstrate that E‑IRFS improves rare object detection in resource‑constrained environments, making it a suitable solution for real‑time applications such as UAV‑based emergency monitoring. The code is available at: https://github.com/futurians/E‑IRFS.
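Note: The abstract above builds on Repeat Factor Sampling (RFS) and its instance-aware variant (IRFS). The sketch below shows the commonly used RFS repeat factor, max(1, sqrt(t / f)), and a geometric mean of image and instance frequencies in the spirit of IRFS. The exact E‑IRFS formula is not given in the abstract, so `exponential_repeat_factor` and its `alpha` parameter are a hypothetical stand-in for "exponential scaling", not the authors' definition. The threshold value and the example frequencies are also assumptions.

```python
import math


def repeat_factor(freq: float, threshold: float = 0.001) -> float:
    """RFS-style repeat factor: categories rarer than the threshold are oversampled."""
    return max(1.0, math.sqrt(threshold / freq))


def instance_aware_freq(image_freq: float, instance_freq: float) -> float:
    """Geometric mean of image-level and instance-level frequencies (IRFS-style)."""
    return math.sqrt(image_freq * instance_freq)


def exponential_repeat_factor(freq: float, threshold: float = 0.001,
                              alpha: float = 1.0) -> float:
    """Hypothetical exponential variant: exp(alpha * (sqrt(t / f) - 1)), floored at 1."""
    return max(1.0, math.exp(alpha * (math.sqrt(threshold / freq) - 1.0)))


# Made-up frequencies for one rare and one common category.
rare = instance_aware_freq(0.0001, 0.0004)
common = instance_aware_freq(0.5, 0.5)

linear_rare = repeat_factor(rare)
exp_rare = exponential_repeat_factor(rare)
```

With `alpha = 1`, the exponential form is never smaller than the square-root form, so under this assumed definition rare categories are repeated more often while common categories keep a factor of 1.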
Authors: Yara AlaaEldin, Francesca Odone
Abstract: Understanding the geometric and semantic properties of the scene is crucial in autonomous navigation and particularly challenging in the case of Unmanned Aerial Vehicle (UAV) navigation. Such information may be obtained by estimating depth and semantic segmentation maps of the surrounding environment. For practical use in autonomous navigation, the procedure must be performed as close to real‑time as possible. In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low‑altitude unstructured environments. We propose a joint deep‑learning architecture that can perform the two tasks accurately and rapidly, and validate its effectiveness on MidAir and Aeroscapes benchmark datasets. Our joint architecture proves to be competitive with or superior to other single and joint architecture methods while running fast, at 20.2 FPS on a single NVIDIA Quadro P5000 GPU, with a low memory footprint. All code for training and prediction can be found at https://github.com/Malga‑Vision/Co‑SemDepth
Authors: Haolin Qin, Tingfa Xu, Tianhao Li, Zhenxiang Chen, Tao Feng, Jianan Li
Abstract: UAV tracking faces significant challenges in real‑world scenarios, such as small‑size targets and occlusions, which limit the performance of RGB‑based trackers. Multispectral images (MSI), which capture additional spectral information, offer a promising solution to these challenges. However, progress in this field has been hindered by the lack of relevant datasets. To address this gap, we introduce the first large‑scale Multispectral UAV Single Object Tracking dataset (MUST), which includes 250 video sequences spanning diverse environments and challenges, providing a comprehensive data foundation for multispectral UAV tracking. We also propose a novel tracking framework, UNTrack, which encodes unified spectral, spatial, and temporal features from spectrum prompts, initial templates, and sequential searches. UNTrack employs an asymmetric transformer with a spectral background eliminate mechanism for optimal relationship modeling and an encoder that continuously updates the spectrum prompt to refine tracking, improving both accuracy and efficiency. Extensive experiments show that our proposed UNTrack outperforms state‑of‑the‑art UAV trackers. We believe our dataset and framework will drive future research in this area. The dataset is available on https://github.com/q2479036243/MUST‑Multispectral‑UAV‑Single‑Object‑Tracking.
Authors: Yu-Hsi Chen
Abstract: Detecting and tracking multiple unmanned aerial vehicles (UAVs) in thermal infrared video is inherently challenging due to low contrast, environmental noise, and small target sizes. This paper provides a straightforward approach to address multi‑UAV tracking in thermal infrared video, leveraging recent advances in detection and tracking. Instead of relying on the well‑established YOLOv5 with DeepSORT combination, we present a tracking framework built on YOLOv12 and BoT‑SORT, enhanced with tailored training and inference strategies. We evaluate our approach following the 4th Anti‑UAV Challenge metrics and reach competitive performance. Notably, we achieved strong results without using contrast enhancement or temporal information fusion to enrich UAV features, highlighting our approach as a "Strong Baseline" for multi‑UAV tracking tasks. We provide implementation details, in‑depth experimental analysis, and a discussion of potential improvements. The code is available at https://github.com/wish44165/YOLOv12‑BoT‑SORT‑ReID.
Authors: Ruiyang Ha, Songyi Jiang, Bin Li, Bikang Pan, Yihang Zhu, Junjie Zhang, Xiatian Zhu, Shaogang Gong, Jingya Wang
Abstract: Conventional person re‑identification (ReID) research is often limited to single‑modality sensor data from static cameras, which fails to address the complexities of real‑world scenarios where multi‑modal signals are increasingly prevalent. For instance, consider an urban ReID system integrating stationary RGB cameras, nighttime infrared sensors, and UAVs equipped with dynamic tracking capabilities. Such systems face significant challenges due to variations in camera perspectives, lighting conditions, and sensor modalities, hindering effective person ReID. To address these challenges, we introduce the MP‑ReID benchmark, a novel dataset designed specifically for multi‑modality and multi‑platform ReID. This benchmark uniquely compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging, captured by both UAVs and ground‑based cameras in indoor and outdoor environments. Building on this benchmark, we introduce Uni‑Prompt ReID, a framework with specifically designed prompts, tailored for cross‑modality and cross‑platform scenarios. Our method consistently outperforms state‑of‑the‑art approaches, establishing a robust foundation for future research in complex and dynamic ReID environments. Our dataset is available at: https://mp‑reid.github.io/.
Authors: Yibin Ye, Xichao Teng, Shuo Chen, Leqi Liu, Kun Wang, Xiaokai Song, Zhang Li
Abstract: Absolute Visual Localization (AVL) enables an Unmanned Aerial Vehicle (UAV) to determine its position in GNSS‑denied environments by establishing geometric relationships between UAV images and geo‑tagged reference maps. While many previous works have achieved AVL with image retrieval and matching techniques, research in low‑altitude multi‑view scenarios still remains limited. Low‑altitude multi‑view conditions present greater challenges due to extreme viewpoint changes. To investigate effective UAV AVL approaches under such conditions, we present this benchmark. Firstly, a large‑scale low‑altitude multi‑view dataset called AnyVisLoc was constructed. This dataset includes 18,000 images captured at multiple scenes and altitudes, along with 2.5D reference maps containing aerial photogrammetry maps and historical satellite maps. Secondly, a unified framework was proposed to integrate the state‑of‑the‑art AVL approaches and comprehensively test their performance. The best combined method was chosen as the baseline, and the key factors influencing localization accuracy are thoroughly analyzed based on it. This baseline achieved a 74.1% localization accuracy within 5 m under low‑altitude, multi‑view conditions. In addition, a novel retrieval metric called PDM@K was introduced to better align with the characteristics of the UAV AVL task. Overall, this benchmark revealed the challenges of low‑altitude, multi‑view UAV AVL and provided valuable guidance for future research. The dataset and code are available at https://github.com/UAV‑AVL/Benchmark
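Note: The abstract above reports localization accuracy within 5 m. As a minimal sketch of how such a threshold metric can be computed, the snippet below counts the fraction of predictions whose planar distance to the ground truth is at most a given radius. It assumes positions are already expressed as (x, y) coordinates in metres in a common local frame. The function name and the sample points are illustrative and not taken from the benchmark code.

```python
import math
from typing import Sequence, Tuple

Position = Tuple[float, float]


def accuracy_within(predicted: Sequence[Position],
                    ground_truth: Sequence[Position],
                    radius_m: float = 5.0) -> float:
    """Fraction of predictions lying within radius_m of their ground-truth position."""
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, g in zip(predicted, ground_truth)
               if math.dist(p, g) <= radius_m)
    return hits / len(ground_truth)


# Made-up example: position errors of 3 m, 10 m, and exactly 5 m.
pred = [(0.0, 0.0), (10.0, 0.0), (3.0, 4.0)]
truth = [(0.0, 3.0), (0.0, 0.0), (0.0, 0.0)]
score = accuracy_within(pred, truth)
```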
Authors: Rui Shi, Xiaodong Yu, Shengming Wang, Yijia Zhang, Lu Xu, Peng Pan, Chunlai Ma
Abstract: In this paper, we propose RFUAV as a new benchmark dataset for radio‑frequency based (RF‑based) unmanned aerial vehicle (UAV) identification and address the following challenges: Firstly, many existing datasets feature a restricted variety of drone types and insufficient volumes of raw data, which fail to meet the demands of practical applications. Secondly, existing datasets often lack raw data covering a broad range of signal‑to‑noise ratios (SNR), or do not provide tools for transforming raw data to different SNR levels. This limitation undermines the validity of model training and evaluation. Lastly, many existing datasets do not offer open‑access evaluation tools, leading to a lack of unified evaluation standards in current research within this field. RFUAV comprises approximately 1.3 TB of raw frequency data collected from 37 distinct UAVs using the Universal Software Radio Peripheral (USRP) device in real‑world environments. Through in‑depth analysis of the RF data in RFUAV, we define a drone feature sequence called RF drone fingerprint, which aids in distinguishing drone signals. In addition to the dataset, RFUAV provides a baseline preprocessing method and model evaluation tools. Rigorous experiments demonstrate that these preprocessing methods achieve state‑of‑the‑art (SOTA) performance using the provided evaluation tools. The RFUAV dataset and baseline implementation are publicly available at https://github.com/kitoweeknd/RFUAV/.
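Note: The abstract above mentions tools for transforming raw data to different SNR levels. One common way to lower the SNR of a recording is to add white Gaussian noise scaled to a target SNR. The sketch below illustrates that idea on complex IQ samples. It treats the recorded samples as the clean signal, which is an assumption, and the function names are hypothetical rather than part of the RFUAV toolkit.

```python
import math
import random
from typing import List, Sequence


def mean_power(samples: Sequence[complex]) -> float:
    """Average power of the samples, i.e. the mean of |s|^2."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def noise_power_for_snr(signal_power: float, snr_db: float) -> float:
    """Noise power that yields the requested SNR in dB for the given signal power."""
    return signal_power / (10 ** (snr_db / 10))


def add_awgn(samples: Sequence[complex], snr_db: float,
             rng: random.Random) -> List[complex]:
    """Add complex white Gaussian noise so the expected SNR equals snr_db."""
    # The noise power is split evenly between the real and imaginary parts.
    sigma = math.sqrt(noise_power_for_snr(mean_power(samples), snr_db) / 2)
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for s in samples]


# Toy unit-power IQ sequence degraded to an expected SNR of 10 dB.
iq = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
noisy = add_awgn(iq, snr_db=10.0, rng=random.Random(0))
```

Because real recordings already contain noise, the SNR reached in practice is lower than the nominal target unless the original noise floor is estimated and accounted for.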
Authors: Yu Tang Liu, Afonso Vale, Aamir Ahmad, Rodrigo Ventura, Meysam Basiri
Abstract: Quadcopter attitude control involves two tasks: smooth attitude tracking and aggressive stabilization from arbitrary states. Although both can be formulated as tracking problems, their distinct state spaces and control strategies complicate a unified reward function. We propose a multitask deep reinforcement learning framework that leverages parallel simulation with IsaacGym and a Graph Convolutional Network (GCN) policy to address both tasks effectively. Our multitask Soft Actor‑Critic (SAC) approach achieves faster, more reliable learning and higher sample efficiency than single‑task methods. We validate its real‑world applicability by deploying the learned policy ‑ a compact two‑layer network with 24 neurons per layer ‑ on a Pixhawk flight controller, achieving 400 Hz control without extra computational resources. We provide our code at https://github.com/robot‑perception‑group/GraphMTSAC_UAV/.
Authors: Chaocan Xue, Bineng Zhong, Qihua Liang, Yaozong Zheng, Ning Li, Yuanliang Xue, Shuxiang Song
Abstract: Vision transformers (ViTs) have emerged as a popular backbone for visual tracking. However, complete ViT architectures are too cumbersome to deploy for unmanned aerial vehicle (UAV) tracking, which places a strong emphasis on efficiency. In this study, we discover that many layers within lightweight ViT‑based trackers tend to learn relatively redundant and repetitive target representations. Based on this observation, we propose a similarity‑guided layer adaptation approach to optimize the structure of ViTs. Our approach dynamically disables a large number of representation‑similar layers and selectively retains only a single optimal layer among them, aiming to achieve a better accuracy‑speed trade‑off. By incorporating this approach into existing ViTs, we tailor previously complete ViT architectures into an efficient similarity‑guided layer‑adaptive framework, namely SGLATrack, for real‑time UAV tracking. Extensive experiments on six tracking benchmarks verify the effectiveness of the proposed approach, and show that our SGLATrack achieves a state‑of‑the‑art real‑time speed while maintaining competitive tracking precision. Codes and models are available at https://github.com/GXNU‑ZhongLab/SGLATrack.
Authors: Jinhao Zhang, Zhexuan Zhou, Wenlong Xia, Youmin Gong, Jie Mei
Abstract: Efficient and safe trajectory planning plays a critical role in the application of quadrotor unmanned aerial vehicles. Currently, the inherent trade‑off between constraint compliance and computational efficiency enhancement in UAV trajectory optimization problems has not been sufficiently addressed. To enhance the performance of UAV trajectory optimization, we propose a spatial‑temporal iterative optimization framework. Firstly, B‑splines are utilized to represent UAV trajectories, with rigorous safety assurance achieved through strict enforcement of constraints on control points. Subsequently, a set of QP‑LP subproblems via spatial‑temporal decoupling and constraint linearization is derived. Finally, an iterative optimization strategy incorporating guidance gradients is employed to obtain high‑performance UAV trajectories in different scenarios. Both simulation and real‑world experimental results validate the efficiency and high‑performance of the proposed optimization framework in generating safe and fast trajectories. Our source codes will be released for community reference at https://hitsz‑mas.github.io/STORM
Authors: Yifei Wang, Jacky Keung, Haohan Xu, Yuchen Cao, Zhenyu Mao
Abstract: Autonomous navigation is reshaping various domains in people's lives by enabling efficient and safe movement in complex environments. Reliable navigation requires algorithmic approaches that compute optimal or near‑optimal trajectories while satisfying task‑specific constraints and ensuring obstacle avoidance. However, existing methods struggle with slow convergence and suboptimal solutions, particularly in complex environments, limiting their real‑world applicability. To address these limitations, this paper presents the Multi‑Strategy Enhanced Crayfish Optimization Algorithm (MCOA), a novel approach integrating three key strategies: 1) Refractive Opposition Learning, enhancing population diversity and global exploration, 2) Stochastic Centroid‑Guided Exploration, balancing global and local search to prevent premature convergence, and 3) Adaptive Competition‑Based Selection, dynamically adjusting selection pressure for faster convergence and improved solution quality. Empirical evaluations underscore the planning speed and solution quality of MCOA in both 3D Unmanned Aerial Vehicle (UAV) and 2D mobile robot path planning. Against 11 baseline algorithms, MCOA achieved a 69.2% reduction in computational time and a 16.7% improvement in minimizing overall path cost in 3D UAV scenarios. Furthermore, in 2D path planning, MCOA outperformed baseline approaches by 44% on average, with a 75.6% advantage in the largest 60×60 grid setting. These findings validate MCOA as a powerful tool for optimizing autonomous navigation in complex environments. The source code is available at: https://github.com/coedv‑hub/MCOA.
Authors: Mingjie Wu, Chenggui Yang, Huihua Wang, Chen Xue, Yibo Wang, Haoyu Wang, Yansong Wang, Can Peng, Yuqi Han, Ruoyu Li, Lijun Yun, Zaiqing Chen, Yuelong Xia
Abstract: UAV technology is gradually maturing and can provide extremely powerful support for smart agriculture and precise monitoring. Currently, there is no dataset related to green walnuts in the field of agricultural computer vision. Thus, in order to promote algorithm design in the field of agricultural computer vision, we used a UAV to collect remote‑sensing data from 8 walnut sample plots. Considering that green walnuts are subject to various lighting conditions and occlusion, we constructed WalnutData, a large‑scale dataset with a higher granularity of target features. This dataset contains a total of 30,240 images and 706,208 instances, and there are 4 target categories: being illuminated by frontal light and unoccluded (A1), being backlit and unoccluded (A2), being illuminated by frontal light and occluded (B1), and being backlit and occluded (B2). Subsequently, we evaluated many mainstream algorithms on WalnutData and used these evaluation results as the baseline standard. The dataset and all evaluation results can be obtained at https://github.com/1wuming/WalnutData.
Authors: Rui Li, Xiaowei Zhao
Abstract: As a novel and challenging task, referring segmentation combines computer vision and natural language processing to localize and segment objects based on textual descriptions. While referring image segmentation (RIS) has been extensively studied in natural images, little attention has been given to aerial imagery, particularly from unmanned aerial vehicles (UAVs). The unique challenges of UAV imagery, including complex spatial scales, occlusions, and varying object orientations, render existing RIS approaches ineffective. A key limitation has been the lack of UAV‑specific datasets, as manually annotating pixel‑level masks and generating textual descriptions is labour‑intensive and time‑consuming. To address this gap, we design an automatic labelling pipeline that leverages pre‑existing UAV segmentation datasets and Multimodal Large Language Models (MLLM) for generating textual descriptions. Furthermore, we propose Aerial Referring Transformer (AeroReformer), a novel framework for UAV referring image segmentation (UAV‑RIS), featuring a Vision‑Language Cross‑Attention Module (VLCAM) for effective cross‑modal understanding and a Rotation‑Aware Multi‑Scale Fusion (RAMSF) decoder to enhance segmentation accuracy in aerial scenes. Extensive experiments on two newly developed datasets demonstrate the superiority of AeroReformer over existing methods, establishing a new benchmark for UAV‑RIS. The datasets and code will be publicly available at: https://github.com/lironui/AeroReformer.
Authors: Jiahao Qi, Chuanhong Zhou, Xingyue Liu, Chen Chen, Dehui Zhu, Kangcheng Bin, Ping Zhong
Abstract: UAV‑borne hyperspectral remote sensing has emerged as a promising approach for underwater target detection (UTD). However, its effectiveness is hindered by spectral distortions in nearshore environments, which compromise the accuracy of traditional hyperspectral UTD (HUTD) methods that rely on bathymetric models. These distortions lead to significant uncertainty in target and background spectra, challenging the detection process. To address this, we propose the Hyperspectral Underwater Contrastive Learning Network (HUCLNet), a novel framework that integrates contrastive learning with a self‑paced learning paradigm for robust HUTD in nearshore regions. HUCLNet extracts discriminative features from distorted hyperspectral data through contrastive learning, while the self‑paced learning strategy selectively prioritizes the most informative samples. Additionally, a reliability‑guided clustering strategy enhances the robustness of learned representations. To evaluate the method's effectiveness, we construct a novel nearshore HUTD benchmark dataset, ATR2‑HUTD, covering three diverse scenarios with varying water types, turbidity, and target types. Extensive experiments demonstrate that HUCLNet significantly outperforms state‑of‑the‑art methods. The dataset and code will be publicly available at: https://github.com/qjh1996/HUTD
Authors: Ekin Celikkan, Timo Kunzmann, Yertay Yeskaliyev, Sibylle Itzerott, Nadja Klein, Martin Herold
Abstract: Weeds are one of the major reasons for crop yield loss but current weeding practices fail to manage weeds in an efficient and targeted manner. Effective weed management is especially important for crops with high worldwide production such as maize, to maximize crop yield for meeting increasing global demands. Advances in near‑sensing and computer vision enable the development of new tools for weed management. Specifically, state‑of‑the‑art segmentation models, coupled with novel sensing technologies, can facilitate timely and accurate weeding and monitoring systems. However, learning‑based approaches require annotated data and show a lack of generalization to aerial imaging for different crops. We present a novel dataset for semantic and instance segmentation of crops and weeds in agricultural maize fields. The multispectral UAV‑based dataset contains images with RGB, red‑edge, and near‑infrared bands, a large number of plant instances, dense annotations for maize and four weed classes, and is multitemporal. We provide extensive baseline results for both tasks, including probabilistic methods to quantify prediction uncertainty, improve model calibration, and demonstrate the approach's applicability to out‑of‑distribution data. The results show the effectiveness of the two additional bands compared to RGB only, and better performance in our target domain than models trained on existing datasets. We hope our dataset advances research on methods and operational systems for fine‑grained weed identification, enhancing the robustness and applicability of UAV‑based weed management. The dataset and code are available at https://github.com/GFZ/weedsgalore
Authors: Shiao Wang, Xiao Wang, Chao Wang, Liye Jin, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang
Abstract: We then introduce a novel hierarchical knowledge distillation strategy that incorporates the similarity matrix, feature representation, and response map‑based distillation to guide the learning of the student Transformer network. We also enhance the model's ability to capture temporal dependencies by applying the temporal Fourier transform to establish temporal relationships between video frames. We adapt the network model to specific target objects during testing via a newly proposed test‑time tuning strategy to achieve high performance and flexibility in target tracking. Recognizing the limitations of existing event‑based tracking datasets, which are predominantly low‑resolution, we propose EventVOT, the first large‑scale high‑resolution event‑based tracking dataset. It comprises 1141 videos spanning diverse categories such as pedestrians, vehicles, UAVs, ping pong, etc. Extensive experiments on both low‑resolution (FE240hz, VisEvent, FELT), and our newly proposed high‑resolution EventVOT dataset fully validated the effectiveness of our proposed method. Both the benchmark dataset and source code have been released on https://github.com/Event‑AHU/EventVOT_Benchmark
Authors: Eslam Eldeeb, Hirley Alves
Abstract: Mission‑critical applications, such as UAV‑assisted IoT networks, require risk‑aware decision‑making under dynamic topologies and uncertain channels. We propose meta‑conservative quantile regression (M‑CQR), a meta‑offline distributional MARL algorithm that integrates conservative Q‑learning (CQL) for safe offline learning, quantile regression DQN (QR‑DQN) for risk‑sensitive value estimation, and model‑agnostic meta‑learning (MAML) for rapid adaptation. Two variants are developed: meta‑independent CQR (M‑I‑CQR) and meta‑CTDE‑CQR (M‑CTDE‑CQR). In a UAV‑based communication scenario, M‑CTDE‑CQR achieves up to 50% faster convergence and outperforms baseline MARL methods, offering improved scalability, robustness, and adaptability for risk‑sensitive decision‑making. Code is available at https://github.com/Eslam211/MA_Meta_ODRL
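As a rough illustration of the quantile-regression idea behind risk-sensitive value estimation, the sketch below implements the plain pinball (quantile) loss and a lower-tail average over a set of quantile estimates. It is not the authors' implementation: QR-DQN itself uses a Huber-smoothed variant of this loss, and the use of a CVaR-style tail average as the risk measure, along with the function names `pinball_loss` and `cvar_from_quantiles`, are assumptions made here for illustration.

```python
def pinball_loss(prediction, target, tau):
    """Quantile (pinball) loss for one predicted quantile at level tau in (0, 1)."""
    error = target - prediction
    # Under-estimates are weighted by tau, over-estimates by (1 - tau).
    return tau * error if error >= 0 else (tau - 1.0) * error


def cvar_from_quantiles(quantiles, alpha):
    """Mean of the lowest alpha-fraction of quantile estimates (assumed risk measure)."""
    ordered = sorted(quantiles)
    k = max(1, int(round(alpha * len(ordered))))
    return sum(ordered[:k]) / k


if __name__ == "__main__":
    # Hypothetical quantile estimates of the return for one action.
    return_quantiles = [-4.0, -1.0, 0.5, 2.0, 3.0, 5.0, 6.0, 8.0]
    print(pinball_loss(1.0, 3.0, 0.9))
    print(cvar_from_quantiles(return_quantiles, 0.25))
```

A risk-averse agent could rank actions by the tail average rather than by the mean of the quantiles, which penalizes actions with poor worst-case returns.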
Authors: Muqing Cao, Thien-Minh Nguyen, Shenghai Yuan, Andreas Anastasiou, Angelos Zacharia, Savvas Papaioannou, Panayiotis Kolios, Christos G. Panayiotou, Marios M. Polycarpou, Xinhang Xu, Mingjie Zhang, Fei Gao, Boyu Zhou, Ben M. Chen, Lihua Xie
Abstract: We propose the Cooperative Aerial Robot Inspection Challenge (CARIC), a simulation‑based benchmark for motion planning algorithms in heterogeneous multi‑UAV systems. CARIC features UAV teams with complementary sensors, realistic constraints, and evaluation metrics prioritizing inspection quality and efficiency. It offers a ready‑to‑use perception‑control software stack and diverse scenarios to support the development and evaluation of task allocation and motion planning algorithms. Competitions using CARIC were held at IEEE CDC 2023 and the IROS 2024 Workshop on Multi‑Robot Perception and Navigation, attracting innovative solutions from research teams worldwide. This paper examines the top three teams from CDC 2023, analyzing their exploration, inspection, and task allocation strategies while drawing insights into their performance across scenarios. The results highlight the task's complexity and suggest promising directions for future research in cooperative multi‑UAV systems.
Authors: Zhifan Song, Yuan Zhang, Abd Al Rahman M. Abu Ebayyeh
Abstract: Detecting small targets in drone imagery is challenging due to low resolution, complex backgrounds, and dynamic scenes. We propose EDNet, a novel edge‑target detection framework built on an enhanced YOLOv10 architecture, optimized for real‑time applications without post‑processing. EDNet incorporates an XSmall detection head and a Cross Concat strategy to improve feature fusion and multi‑scale context awareness for detecting tiny targets in diverse environments. Our unique C2f‑FCA block employs Faster Context Attention to enhance feature extraction while reducing computational complexity. The WIoU loss function is employed for improved bounding box regression. With seven model sizes ranging from Tiny to XL, EDNet accommodates various deployment environments, enabling local real‑time inference and ensuring data privacy. Notably, EDNet achieves up to a 5.6% gain in mAP@50 with significantly fewer parameters. On an iPhone 12, EDNet variants operate at speeds ranging from 16 to 55 FPS, providing a scalable and efficient solution for edge‑based object detection in challenging drone imagery. The source code and pre‑trained models are available at: https://github.com/zsniko/EDNet.
Authors: Thi Thuy Ngan Duong, Duy-Nam Bui, Manh Duong Phung
Abstract: Path planning is essential for unmanned aerial vehicles (UAVs) as it determines the path that the UAV needs to follow to complete a task. This work addresses this problem by introducing a new algorithm called navigation variable‑based multi‑objective particle swarm optimization (NMOPSO). It first models path planning as an optimization problem via the definition of a set of objective functions that include optimality and safety requirements for UAV operation. The NMOPSO is then used to minimize those functions through Pareto optimal solutions. The algorithm features a new path representation based on navigation variables to include kinematic constraints and exploit the maneuverable characteristics of the UAV. It also includes an adaptive mutation mechanism to enhance the diversity of the swarm for better solutions. Comparisons with various algorithms have been carried out to benchmark the proposed approach. The results indicate that the NMOPSO performs better than not only other particle swarm optimization variants but also other state‑of‑the‑art multi‑objective and metaheuristic optimization algorithms. Experiments have also been conducted with real UAVs to confirm the validity of the approach for practical flights. The source code of the algorithm is available at https://github.com/ngandng/NMOPSO.
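To make the notion of Pareto optimal solutions concrete, here is a minimal sketch of a non-dominated filter over candidate paths, each scored by a vector of costs to be minimized. It illustrates only the dominance test, not NMOPSO itself; the two objectives (path length and risk) and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if cost vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs):
    """Return the non-dominated cost vectors, preserving input order."""
    return [c for c in costs if not any(dominates(other, c) for other in costs)]


if __name__ == "__main__":
    # Hypothetical (path_length, risk) costs for five candidate paths.
    candidates = [(10.0, 0.8), (12.0, 0.3), (11.0, 0.9), (15.0, 0.2), (12.0, 0.5)]
    print(pareto_front(candidates))
```

A multi-objective swarm optimizer typically keeps such a front as an archive and draws its leaders from it, rather than collapsing the objectives into a single weighted sum.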
Authors: Yonglin Tian, Fei Lin, Yiduo Li, Tengchao Zhang, Qiyao Zhang, Xuan Fu, Jun Huang, Xingyuan Dai, Yutong Wang, Chunwei Tian, Bai Li, Yisheng Lv, Levente Kovács, Fei-Yue Wang
Abstract: Low‑altitude mobility, exemplified by unmanned aerial vehicles (UAVs), has introduced transformative advancements across various domains, like transportation, logistics, and agriculture. Leveraging flexible perspectives and rapid maneuverability, UAVs extend traditional systems' perception and action capabilities, garnering widespread attention from academia and industry. However, current UAV operations primarily depend on human control, with only limited autonomy in simple scenarios, and lack the intelligence and adaptability needed for more complex environments and tasks. The emergence of large language models (LLMs) demonstrates remarkable problem‑solving and generalization capabilities, offering a promising pathway for advancing UAV intelligence. This paper explores the integration of LLMs and UAVs, beginning with an overview of UAV systems' fundamental components and functionalities, followed by an overview of the state‑of‑the‑art in LLM technology. Subsequently, it systematically highlights the multimodal data resources available for UAVs, which provide critical support for training and evaluation. Furthermore, it categorizes and analyzes key tasks and application scenarios where UAVs and LLMs converge. Finally, a reference roadmap towards agentic UAVs is proposed, aiming to enable UAVs to achieve agentic intelligence through autonomous perception, memory, reasoning, and tool utilization. Related resources are available at https://github.com/Hub‑Tian/UAVs_Meet_LLMs.
Authors: Huaxiang Zhang, Kai Liu, Zhongxue Gan, Guo-Niu Zhu
Abstract: Unmanned aerial vehicle object detection (UAV‑OD) has been widely used in various scenarios. However, most existing UAV‑OD algorithms rely on manually designed components, which require extensive tuning. End‑to‑end models that do not depend on such manually designed components are mainly designed for natural images, which are less effective for UAV imagery. To address such challenges, this paper proposes an efficient detection transformer (DETR) framework tailored for UAV imagery, i.e., UAV‑DETR. The framework includes a multi‑scale feature fusion with frequency enhancement module, which captures both spatial and frequency information at different scales. In addition, a frequency‑focused down‑sampling module is presented to retain critical spatial details during down‑sampling. A semantic alignment and calibration module is developed to align and fuse features from different fusion paths. Experimental results demonstrate the effectiveness and generalization of our approach across various UAV imagery datasets. On the VisDrone dataset, our method improves AP by 3.1% and AP_50 by 4.2% over the baseline. Similar enhancements are observed on the UAVVaste dataset. The project page: https://github.com/ValiantDiligent/UAV‑DETR
Authors: You Wu, Yongxin Li, Mengyuan Liu, Xucheng Wang, Xiangyang Yang, Hengzhou Ye, Dan Zeng, Qijun Zhao, Shuiwang Li
Abstract: Transformer‑based models have improved visual tracking, but most still cannot run in real time on resource‑limited devices, especially for unmanned aerial vehicle (UAV) tracking. To achieve a better balance between performance and efficiency, we propose AVTrack, an adaptive computation tracking framework that adaptively activates transformer blocks through an Activation Module (AM), which dynamically optimizes the ViT architecture by selectively engaging relevant components. To address extreme viewpoint variations, we propose to learn view‑invariant representations via mutual information (MI) maximization. In addition, we propose AVTrack‑MD, an enhanced tracker incorporating a novel MI maximization‑based multi‑teacher knowledge distillation framework. Leveraging multiple off‑the‑shelf AVTrack models as teachers, we maximize the MI between their aggregated softened features and the corresponding softened feature of the student model, improving the generalization and performance of the student, especially under noisy conditions. Extensive experiments show that AVTrack‑MD achieves performance comparable to that of AVTrack while reducing model complexity and boosting average tracking speed by over 17%. Code is available at: https://github.com/wuyou3474/AVTrack.
Authors: Risal Shahriar Shefin, Md Asifur Rahman, Thai Le, Sarra Alqahtani
Abstract: Reinforcement learning (RL) has shown great promise in simulated environments, such as games, where failures have minimal consequences. However, the deployment of RL agents in real‑world systems such as autonomous vehicles, robotics, UAVs, and medical devices demands a higher level of safety and transparency, particularly when facing adversarial threats. Safe RL algorithms have been developed to address these concerns by optimizing both task performance and safety constraints. However, errors are inevitable, and when they occur, it is essential that the RL agents can also explain their actions to human operators. This makes trust in the safety mechanisms of RL systems crucial for effective deployment. Explainability plays a key role in building this trust by providing clear, actionable insights into the agent's decision‑making process, ensuring that safety‑critical decisions are well understood. While machine learning (ML) has seen significant advances in interpretability and visualization, explainability methods for RL remain limited. Current tools fail to address the dynamic, sequential nature of RL and its needs to balance task performance with safety constraints over time. The re‑purposing of traditional ML methods, such as saliency maps, is inadequate for safety‑critical RL applications where mistakes can result in severe consequences. To bridge this gap, we propose xSRL, a framework that integrates both local and global explanations to provide a comprehensive understanding of RL agents' behavior. xSRL also enables developers to identify policy vulnerabilities through adversarial attacks, offering tools to debug and patch agents without retraining. Our experiments and user studies demonstrate xSRL's effectiveness in increasing safety in RL systems, making them more reliable and trustworthy for real‑world deployment. Code is available at https://github.com/risal‑shefin/xSRL.
Authors: Zhenyuan Xiao, Yizhuo Yang, Guili Xu, Xianglong Zeng, Shenghai Yuan
Abstract: The increasing use of compact UAVs has created significant threats to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we propose AV‑DTEC, a lightweight self‑supervised audio‑visual fusion‑based anti‑UAV system. AV‑DTEC is trained using self‑supervised learning with labels generated by LiDAR, and it simultaneously learns audio and visual features through a parallel selective state‑space model. With the learned features, a specially designed plug‑and‑play primary‑auxiliary feature enhancement module integrates visual features into audio features for better robustness in cross‑lighting conditions. To reduce reliance on auxiliary features and align modalities, we propose a teacher‑student model that adaptively adjusts the weighting of visual features. AV‑DTEC demonstrates exceptional accuracy and effectiveness in real‑world multi‑modality data. The code and trained models are publicly accessible on GitHub
https://github.com/AmazingDay1/AV‑DETC.
Authors: Henry Cording, Yves Plancherel, Pablo Brito-Parada
Abstract: Very high resolution (VHR) mapping through remote sensing (RS) imagery presents a new opportunity to inform decision‑making and sustainable practices in countless domains. Efficient processing of big VHR data requires automated tools applicable to numerous geographic regions and features. Contemporary RS studies address this challenge by employing deep learning (DL) models for specific datasets or features, which limits their applicability across contexts.
The present research aims to overcome this limitation by introducing EcoMapper, a scalable solution to segment arbitrary features in VHR RS imagery. EcoMapper fully automates processing of geospatial data, DL model training, and inference. Models trained with EcoMapper successfully segmented two distinct features in a real‑world UAV dataset, achieving scores competitive with prior studies which employed context‑specific models.
To evaluate EcoMapper, many additional models were trained on permutations of principal field survey characteristics (FSCs). A relationship was discovered allowing derivation of optimal ground sampling distance from feature size, termed Cording Index (CI). A comprehensive methodology for field surveys was developed to ensure DL methods can be applied effectively to collected data.
The EcoMapper code accompanying this work is available at https://github.com/hcording/ecomapper.
Authors: Zhenyuan Xiao, Huanran Hu, Guili Xu, Junwei He
Abstract: The increasing prevalence of compact UAVs has introduced significant risks to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we present TAME, the Temporal Audio‑based Mamba for Enhanced Drone Trajectory Estimation and Classification. This innovative anti‑UAV detection model leverages a parallel selective state‑space model to simultaneously capture and learn both the temporal and spectral features of audio, effectively analyzing propagation of sound. To further enhance temporal features, we introduce a Temporal Feature Enhancement Module, which integrates spectral features into temporal data using residual cross‑attention. This enhanced temporal information is then employed for precise 3D trajectory estimation and classification. Our model sets a new standard of performance on the MMUAD benchmarks, demonstrating superior accuracy and effectiveness. The code and trained models are publicly available on GitHub https://github.com/AmazingDay1/TAME.
Authors: Han Liu, Tian Liu, Kai Huang
Abstract: As urban logistics demand continues to grow, UAV delivery has become a key solution to improve delivery efficiency, reduce traffic congestion, and lower logistics costs. However, to fully leverage the potential of UAV delivery networks, efficient swarm scheduling and management are crucial. In this paper, we propose a real‑time scheduling and management system based on the "Airport‑Unloading Station" model, aiming to bridge the gap between high‑level scheduling algorithms and low‑level execution systems. This system, acting as middleware, accurately translates the requirements from the scheduling layer into specific execution instructions, ensuring that the scheduling algorithms perform effectively in real‑world environments. Additionally, we implement three collaborative scheduling schemes involving autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and ground staff to further optimize overall delivery efficiency. Through extensive experiments, this study demonstrates the rationality and feasibility of the proposed management system, providing a practical solution for the commercial application of UAV delivery in urban areas.
Code: https://github.com/chengji253/UAVDeliverySystem
Authors: Samuel Folorunsho, Maggie Ni, William Norris
Abstract: This paper presents the development of a comprehensive dynamics and stabilizing control architecture for Tethered Unmanned Aerial Vehicle (TUAV) systems. The proposed architecture integrates both onboard and ground‑based controllers, employing nonlinear backstepping control techniques to achieve asymptotic stability of the TUAV's equilibrium. The onboard controllers are responsible for the position and attitude control of the TUAV, while the ground controllers regulate the winder mechanism to maintain the desired tether length, ensuring it retains its catenary form. Simulation results demonstrate the ability of the TUAV system to accurately track linear and circular trajectories, ensuring robust performance under various operational scenarios. The code and movies demonstrating the performance of the system can be found at https://github.com/sof‑danny/TUAV_system_control.
Authors: Ruihuai Liang, Bo Yang, Pengyu Chen, Xuelin Cao, Zhiwen Yu, Mérouane Debbah, Dusit Niyato, H. Vincent Poor, Chau Yuen
Abstract: Optimization is crucial for MEC networks to function efficiently and reliably, but most of the underlying problems are NP‑hard and lack efficient approximation algorithms. This leads to a paucity of optimal solutions, constraining the effectiveness of conventional deep learning approaches. Most existing learning‑based methods necessitate extensive optimal data and fail to exploit the potential benefits of suboptimal data that can be obtained with greater efficiency and effectiveness. Taking the multi‑server multi‑user computation offloading (MSCO) problem, which is widely observed in systems like Internet‑of‑Vehicles (IoV) and Unmanned Aerial Vehicle (UAV) networks, as a concrete scenario, we present a Graph Diffusion‑based Solution Generation (GDSG) method. This approach is designed to work with suboptimal datasets while converging to the optimal solution with high probability. We transform the optimization issue into distribution‑learning and offer a clear explanation of learning from suboptimal training datasets. We build GDSG as a multi‑task diffusion model utilizing a Graph Neural Network (GNN) to acquire the distribution of high‑quality solutions. We use a simple and efficient heuristic approach to obtain a sufficient amount of training data composed entirely of suboptimal solutions. In our implementation, we enhance the backbone GNN and achieve improved generalization. GDSG also reaches nearly 100% task orthogonality, ensuring no interference between the discrete and continuous generation tasks. We further reveal that this orthogonality arises from the diffusion‑related training loss, rather than the neural network architecture itself. The experiments demonstrate that GDSG surpasses other benchmark methods on both the optimal and suboptimal training datasets. The MSCO datasets have been open‑sourced at this http URL, and the GDSG algorithm code is available at https://github.com/qiyu3816/GDSG.
Authors: Jun Dong, Jintao Cheng, Jin Wu, Chengxi Zhang, Shunyi Zhao, Xiaoyu Tang
Abstract: In the fifth‑generation (5G) era, eliminating communication interference sources is crucial for maintaining network performance. Interference often originates from unauthorized or malfunctioning antennas, and radio monitoring agencies must address numerous sources of such antennas annually. Unmanned aerial vehicles (UAVs) can improve inspection efficiency. However, the data transmission delay in the existing cloud‑only (CO) artificial intelligence (AI) mode fails to meet the low latency requirements for real‑time performance. Therefore, we propose a computer vision‑based AI of Things (AIoT) system to detect antenna interference sources for UAVs. The system adopts an optimized edge‑cloud collaboration (ECC+) mode combined with a keyframe selection algorithm (KSA), focusing on reducing end‑to‑end latency (E2EL) and ensuring reliable data transmission, which aligns with the core principles of ultra‑reliable low‑latency communication (URLLC). At the core of our approach is an end‑to‑end antenna localization scheme based on the tracking‑by‑detection (TBD) paradigm, including a detector (EdgeAnt) and a tracker (AntSort). EdgeAnt achieves state‑of‑the‑art (SOTA) performance with a mean average precision (mAP) of 42.1% on our custom antenna interference source dataset, requiring only 3 million parameters and 14.7 GFLOPs. On the COCO dataset, EdgeAnt achieves 38.9% mAP with 5.4 GFLOPs. We deployed EdgeAnt on Jetson Xavier NX (TRT) and Raspberry Pi 4B (NCNN), achieving real‑time inference speeds of 21.1 (1088) and 4.8 (640) frames per second (FPS), respectively. Compared with CO mode, the ECC+ mode reduces E2EL by 88.9% and increases accuracy by 28.2%. Additionally, the system offers excellent scalability for coordinated multi‑UAV inspections. The detector code is publicly available at https://github.com/SCNU‑RISLAB/EdgeAnt.
Authors: You Wu, Xiangyang Yang, Xucheng Wang, Hengzhou Ye, Dan Zeng, Shuiwang Li
Abstract: Harnessing low‑light enhancement and domain adaptation, nighttime UAV tracking has made substantial strides. However, over‑reliance on image enhancement, limited high‑quality nighttime data, and a lack of integration between daytime and nighttime trackers hinder the development of an end‑to‑end trainable framework. Additionally, current ViT‑based trackers demand heavy computational resources due to their reliance on the self‑attention mechanism. In this paper, we propose a novel pure Mamba‑based tracking framework (MambaNUT) that employs a state space model with linear complexity as its backbone, incorporating a single‑stream architecture that integrates feature learning and template‑search coupling within Vision Mamba. We introduce an adaptive curriculum learning (ACL) approach that dynamically adjusts sampling strategies and loss weights, thereby improving the model's generalization ability. Our ACL is composed of two levels of curriculum schedulers: (1) a sampling scheduler that transforms the data distribution from imbalanced to balanced, as well as from easier (daytime) to harder (nighttime) samples; (2) a loss scheduler that dynamically assigns weights based on the size of the training set and the IoU of individual instances. Exhaustive experiments on multiple nighttime UAV tracking benchmarks demonstrate that the proposed MambaNUT achieves state‑of‑the‑art performance while requiring lower computational costs. The code will be available at https://github.com/wuyou3474/MambaNUT.
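The sampling scheduler described above can be pictured with a small sketch in which the share of nighttime samples per batch grows over training, moving from an imbalanced, daytime-heavy mix to a balanced one. This is only one plausible reading of the idea: the linear schedule, the start and end fractions, and the names `night_fraction` and `sample_batch` are assumptions for illustration and are not taken from the MambaNUT code.

```python
import random


def night_fraction(epoch, total_epochs, start=0.1, end=0.5):
    """Linearly raise the share of nighttime samples from `start` to `end`."""
    if total_epochs <= 1:
        return end
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * progress


def sample_batch(day, night, batch_size, epoch, total_epochs, rng):
    """Draw a batch whose expected nighttime share follows the schedule."""
    p_night = night_fraction(epoch, total_epochs)
    return [
        rng.choice(night) if rng.random() < p_night else rng.choice(day)
        for _ in range(batch_size)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    day = [f"day_{i}" for i in range(100)]      # hypothetical easy samples
    night = [f"night_{i}" for i in range(10)]   # hypothetical hard samples
    for epoch in (0, 5, 10):
        batch = sample_batch(day, night, 16, epoch, 11, rng)
        share = sum(s.startswith("night") for s in batch) / len(batch)
        print(epoch, round(night_fraction(epoch, 11), 2), share)
```

Sampling by probability rather than by dataset size is what lets the small nighttime set reach parity with the much larger daytime set late in training.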
Authors: Haochen Chai, Meimei Su, Yang Lyu, Zhunga Liu, Chunhui Zhao, Quan Pan
Abstract: Fixed‑wing Unmanned Aerial Vehicles (UAVs) are one of the most commonly used platforms for the burgeoning Low‑altitude Economy (LAE) and Urban Air Mobility (UAM), due to their long endurance and high‑speed capabilities. Classical obstacle avoidance systems, which rely on prior maps or sophisticated sensors, face limitations in unknown low‑altitude environments and small UAV platforms. In response, this paper proposes a lightweight deep reinforcement learning (DRL) based UAV collision avoidance system that enables a fixed‑wing UAV to avoid unknown obstacles at cruise speed over 30 m/s, with only onboard visual sensors. The proposed system employs a single‑frame image depth inference module with a streamlined network architecture to ensure real‑time obstacle detection, optimized for edge computing devices. After that, a reinforcement learning controller with a novel reward function is designed to balance the target approach and flight trajectory smoothness, satisfying the specific dynamic constraints and stability requirements of a fixed‑wing UAV platform. An adaptive entropy adjustment mechanism is introduced to mitigate the exploration‑exploitation trade‑off inherent in DRL, improving training convergence and obstacle avoidance success rates. Extensive software‑in‑the‑loop and hardware‑in‑the‑loop experiments demonstrate that the proposed framework outperforms other methods in obstacle avoidance efficiency and flight trajectory smoothness and confirm the feasibility of implementing the algorithm on edge devices. The source code is publicly available at https://github.com/ch9397/FixedWing‑MonoPPO.
Authors: Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang
Abstract: Night unmanned aerial vehicle (UAV) tracking is impeded by the challenges of poor illumination, with previous daylight‑optimized methods demonstrating suboptimal performance in low‑light conditions, limiting the utility of UAV applications. To this end, we propose an efficient mamba‑based tracker, leveraging dual enhancement techniques to boost night UAV tracking. The mamba‑based low‑light enhancer, equipped with an illumination estimator and a damage restorer, achieves global image enhancement while preserving the details and structure of low‑light images. Additionally, we advance a cross‑modal mamba network to achieve efficient interactive learning between vision and language modalities. Extensive experiments show that our method achieves advanced performance and exhibits significantly improved computation and memory efficiency. For instance, our method is 2.8× faster than CiteTracker and reduces GPU memory usage by 50.2%. Our code is available at https://github.com/983632847/Awesome‑Multimodal‑Object‑Tracking.
Authors: Guozheng Lu, Yunfan Ren, Fangcheng Zhu, Haotian Li, Ruize Xue, Yixi Cai, Ximin Lyu, Fu Zhang
Abstract: Trajectory generation for fully autonomous flights of tail‑sitter unmanned aerial vehicles (UAVs) presents substantial challenges due to their highly nonlinear aerodynamics. In this paper, we introduce, to the best of our knowledge, the world's first fully autonomous tail‑sitter UAV capable of high‑speed navigation in unknown, cluttered environments. The UAV autonomy is enabled by cutting‑edge technologies including LiDAR‑based sensing, differential‑flatness‑based trajectory planning and control with purely onboard computation. In particular, we propose an optimization‑based tail‑sitter trajectory planning framework that generates high‑speed, collision‑free, and dynamically‑feasible trajectories. To efficiently and reliably solve this nonlinear, constrained problem, we develop an efficient feasibility‑assured solver, EFOPT, tailored for the online planning of tail‑sitter UAVs. We conduct extensive simulation studies to benchmark EFOPT's superiority in planning tasks against conventional NLP solvers. We also demonstrate exhaustive experiments of aggressive autonomous flights with speeds up to 15 m/s in various real‑world environments, including indoor laboratories, underground parking lots, and outdoor parks. A video demonstration is available at https://youtu.be/OvqhlB2h3k8, and the EFOPT solver is open‑sourced at https://github.com/hku‑mars/EFOPT.
Authors: Feng Gao, Chao Yu, Yu Wang, Yi Wu
Abstract: Accurate motion control in the face of disturbances within complex environments remains a major challenge in robotics. Classical model‑based approaches often struggle with nonlinearities and unstructured disturbances, while RL‑based methods can be fragile when encountering unseen scenarios. In this paper, we propose a novel framework, Neural Internal Model Control, which integrates model‑based control with RL‑based control to enhance robustness. Our framework streamlines the predictive model by applying Newton‑Euler equations for rigid‑body dynamics, eliminating the need to capture complex high‑dimensional nonlinearities. This internal model combines model‑free RL algorithms with predictive error feedback. Such a design enables a closed‑loop control structure to enhance the robustness and generalizability of the control system. We demonstrate the effectiveness of our framework on both quadrotors and quadrupedal robots, achieving superior performance compared to state‑of‑the‑art methods. Furthermore, real‑world deployment on a quadrotor with rope‑suspended payloads highlights the framework's robustness in sim‑to‑real transfer. Our code is released at https://github.com/thu‑uav/NeuralIMC.
Authors: Xinhua Jiang, Tianpeng Liu, Li Liu, Zhen Liu, Yongxiang Liu
Abstract: Occlusion is a longstanding challenge for UAV‑based object detection. Many works address this problem by adapting the detection model. However, few of them exploit the fact that the UAV could fundamentally improve detection performance by changing its viewpoint. Active Object Detection (AOD) offers an effective way to achieve this purpose. Through Deep Reinforcement Learning (DRL), AOD endows the UAV with the ability of autonomous path planning to search for the observation that is more conducive to target identification. Unfortunately, no dataset is currently available for developing UAV AOD methods. To fill this gap, we release a UAV's eye view active vision dataset named UEVAVD and hope it can facilitate research on the UAV AOD problem. Additionally, we improve the existing DRL‑based AOD method by incorporating inductive bias when learning the state representation. First, due to the partial observability, we use a gated recurrent unit to extract state representations from the observation sequence instead of the single‑view observation. Second, we pre‑decompose the scene with the Segment Anything Model (SAM) and filter out the irrelevant information with the derived masks. With these practices, the agent could learn an active viewing policy with better generalization capability. The effectiveness of our innovations is validated by the experiments on the UEVAVD dataset. Our dataset will soon be available at https://github.com/Leo000ooo/UEVAVD_dataset.
Authors: Zhuoran Li, Zhen Gao, Kuiyu Wang, Yikun Mei, Chunli Zhu, Lei Chen, Xiaomei Wu, Dusit Niyato
Abstract: To ensure the thriving development of low‑altitude economy, countering unauthorized unmanned aerial vehicles (UAVs) is an essential task. The existing widely deployed base stations hold great potential for joint communication and jamming. In light of this, this paper investigates the joint design of beamforming to simultaneously support communication with legitimate users and countermeasure against unauthorized UAVs based on dual‑functional multiple‑input multiple‑output (MIMO) cellular systems. We first formulate a joint communication and jamming (JCJ) problem, relaxing it through semi‑definite relaxation (SDR) to obtain a tractable semi‑definite programming (SDP) problem, with SDR providing an essential step toward simplifying the complex JCJ design. Although the solution to the relaxed SDP problem cannot directly solve the original problem, it offers valuable insights for further refinement. Therefore, we design a novel constraint specifically tailored to the structure of the SDP problem, ensuring that the solution adheres to the rank‑1 constraint of the original problem. Finally, we validate effectiveness of the proposed JCJ scheme through extensive simulations. Simulation codes are provided to reproduce the results in this paper: https://github.com/LiZhuoRan0. The results confirm that the proposed JCJ scheme can operate effectively when the total number of legitimate users and unauthorized UAVs exceeds the number of antennas.
Authors: Zhicheng Zhao, Juanjuan Gu, Chenglong Li, Chun Wang, Zhongling Huang, Jin Tang
Abstract: Optics‑guided Thermal UAV image Super‑Resolution (OTUAV‑SR) has attracted significant research interest due to its potential applications in security inspection, agricultural measurement, and object detection. Existing methods often employ a single guidance model to generate the guidance features from optical images to assist thermal UAV image super‑resolution. However, a single guidance model has difficulty generating effective guidance features under both favorable and adverse conditions in UAV scenarios, thus limiting the performance of OTUAV‑SR. To address this issue, we propose a novel Guidance Disentanglement network (GDNet), which disentangles the optical image representation according to typical UAV scenario attributes to form guidance features under both favorable and adverse conditions, for robust OTUAV‑SR. Moreover, we design an attribute‑aware fusion module to combine all attribute‑based optical guidance features, which could form a more discriminative representation and fit the attribute‑agnostic guidance process. To facilitate OTUAV‑SR research in complex UAV scenarios, we introduce VGTSR2.0, a large‑scale benchmark dataset containing 3,500 aligned optical‑thermal image pairs captured under diverse conditions and scenes. Extensive experiments on VGTSR2.0 demonstrate that GDNet significantly improves OTUAV‑SR performance over state‑of‑the‑art methods, especially in the challenging low‑light and foggy environments commonly encountered in UAV scenarios. The dataset and code will be publicly available at https://github.com/Jocelyney/GDNet.
Authors: Qingpeng Li, Yuxin Zhang, Leyuan Fang, Yuhan Kang, Shutao Li, Xiao Xiang Zhu
Abstract: Object detection algorithms are pivotal components of unmanned aerial vehicle (UAV) imaging systems, extensively employed in complex fields. However, images captured by high‑mobility UAVs often suffer from motion blur, which significantly impedes the performance of advanced object detection algorithms. To address these challenges, we propose an innovative object detection algorithm specifically designed for blurry images, named DREB‑Net (Dual‑stream Restoration Embedding Blur‑feature Fusion Network). First, DREB‑Net addresses the particularities of the blurry image object detection problem by incorporating a Blurry image Restoration Auxiliary Branch (BRAB) during the training phase. Second, it fuses the extracted shallow features via a Multi‑level Attention‑Guided Feature Fusion (MAGFF) module to extract richer features. Here, the MAGFF module comprises local attention modules and global attention modules, which assign different weights to the branches. Then, during the inference phase, the deep feature extraction of the BRAB can be removed to reduce computational complexity and improve detection speed. In the loss function, a combined loss of MSE and SSIM is added to the BRAB to restore blurry images. Finally, DREB‑Net introduces the Fast Fourier Transform in the early stages of feature extraction, via a Learnable Frequency domain Amplitude Modulation Module (LFAMM), to adjust feature amplitude and enhance feature processing capability. Experimental results indicate that DREB‑Net can still effectively perform object detection tasks under motion blur in captured images, showcasing excellent performance and broad application prospects. Our source code will be available at https://github.com/EEIC‑Lab/DREB‑Net.git.
Authors: Oleg Sautenkov, Selamawit Asfaw, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Aleksey Fedoseev, Daria Trinitatova, Dzmitry Tsetserukou
Abstract: The swift advancement of unmanned aerial vehicle (UAV) technologies necessitates new standards for developing human‑drone interaction (HDI) interfaces. Most interfaces for HDI, especially first‑person view (FPV) goggles, limit the operator's ability to obtain information from the environment. This paper presents a novel interface, FlightAR, that integrates augmented reality (AR) overlays of UAV first‑person view (FPV) and bottom camera feeds with head‑mounted display (HMD) to enhance the pilot's situational awareness. Using FlightAR, the system provides pilots not only with a video stream from several UAV cameras simultaneously, but also the ability to observe their surroundings in real time. User evaluation with NASA‑TLX and UEQ surveys showed low physical demand (μ=1.8, SD = 0.8) and good performance (μ=3.4, SD = 0.8), proving better user assessments in comparison with baseline FPV goggles. Participants also rated the system highly for stimulation (μ=2.35, SD = 0.9), novelty (μ=2.1, SD = 0.9) and attractiveness (μ=1.97, SD = 1), indicating positive user experiences. These results demonstrate the potential of the system to improve UAV piloting experience through enhanced situational awareness and intuitive control. The code is available here: https://github.com/Sautenich/FlightAR
Authors: Shuang Geng, Zelin Ning, Fu Zhang, Boyu Zhou
Abstract: Autonomous exploration is a fundamental problem for various applications of unmanned aerial vehicles (UAVs). Recently, LiDAR‑based exploration has gained significant attention due to its ability to generate high‑precision point cloud maps of large‑scale environments. While the point clouds are inherently informative for navigation, many existing exploration methods still rely on additional, often expensive, environmental representations. This reliance stems from two main reasons: the need for frontier detection or information gain computation, which typically depends on memory‑intensive occupancy grid maps, and the high computational complexity of path planning directly on point clouds, primarily due to costly collision checking. To address these limitations, we present EPIC, a lightweight LiDAR‑based UAV exploration framework that directly exploits point cloud data to explore large‑scale environments. EPIC introduces a novel observation map derived directly from the quality of point clouds, eliminating the need for global occupancy grid maps while preserving comprehensive exploration capabilities. We also propose an incremental topological graph construction method operating directly on point clouds, enabling real‑time path planning in large‑scale environments. Leveraging these components, we build a hierarchical planning framework that generates agile and energy‑efficient trajectories, achieving significantly reduced memory consumption and computation time compared to most existing methods. Extensive simulations and real‑world experiments demonstrate that EPIC achieves faster exploration while significantly reducing memory consumption compared to state‑of‑the‑art methods.
Authors: Haobo Zuo, Changhong Fu, Guangze Zheng, Liangliang Yao, Kunhan Lu, Jia Pan
Abstract: Domain adaptation is an inspiring solution to the misalignment issue of day/night image features for nighttime UAV tracking. However, the one‑step adaptation paradigm is inadequate in addressing the prevalent difficulties posed by low‑resolution (LR) objects when viewed from the UAVs at night, owing to the blurry edge contour and limited detail information. Moreover, these approaches struggle to perceive LR objects disturbed by nighttime noise. To address these challenges, this work proposes a novel progressive alignment paradigm, named domain‑aware diffusion model (DaDiff), aligning nighttime LR object features to the daytime by virtue of progressive and stable generations. The proposed DaDiff includes an alignment encoder to enhance the detail information of nighttime LR objects, a tracking‑oriented layer designed to achieve close collaboration with tracking tasks, and a successive distribution discriminator presented to distinguish different feature distributions at each diffusion timestep successively. Furthermore, an elaborate nighttime UAV tracking benchmark is constructed for LR objects, namely NUT‑LR, consisting of 100 annotated sequences. Exhaustive experiments have demonstrated the robustness and feature alignment ability of the proposed DaDiff. The source code and video demo are available at https://github.com/vision4robotics/DaDiff.
Authors: Juelin Zhu, Shen Yan, Long Wang, Shengyue Zhang, Yu Liu, Maojun Zhang
Abstract: We propose a new method named LoD‑Loc for visual localization in the air. Unlike existing localization algorithms, LoD‑Loc does not rely on complex 3D representations and can estimate the pose of an Unmanned Aerial Vehicle (UAV) using a Level‑of‑Detail (LoD) 3D map. LoD‑Loc mainly achieves this goal by aligning the wireframe derived from the LoD projected model with that predicted by the neural network. Specifically, given a coarse pose provided by the UAV sensor, LoD‑Loc hierarchically builds a cost volume for uniformly sampled pose hypotheses to describe pose probability distribution and select a pose with maximum probability. Each cost within this volume measures the degree of line alignment between projected and predicted wireframes. LoD‑Loc also devises a 6‑DoF pose optimization algorithm to refine the previous result with a differentiable Gaussian‑Newton method. As no public dataset exists for the studied problem, we collect two datasets with map levels of LoD3.0 and LoD2.0, along with real RGB queries and ground‑truth pose annotations. We benchmark our method and demonstrate that LoD‑Loc achieves excellent performance, even surpassing current state‑of‑the‑art methods that use textured 3D models for localization. The code and dataset are available at https://victorzoo.github.io/LoD‑Loc.github.io/.
Authors: Hui Ye, Rajshekhar Sunderraman, Shihao Ji
Abstract: Unmanned Aerial Vehicles (UAVs), equipped with cameras, are employed in numerous applications, including aerial photography, surveillance, and agriculture. In these applications, robust object detection and tracking are essential for the effective deployment of UAVs. However, existing benchmarks for UAV applications are mainly designed for traditional 2D perception tasks, restricting the development of real‑world applications that require a 3D understanding of the environment. Furthermore, despite recent advancements in single‑UAV perception, limited views of a single UAV platform significantly constrain its perception capabilities over long distances or in occluded areas. To address these challenges, we introduce UAV3D, a benchmark designed to advance research in both 3D and collaborative 3D perception tasks with UAVs. UAV3D comprises 1,000 scenes, each of which has 20 frames with fully annotated 3D bounding boxes on vehicles. We provide the benchmark for four 3D perception tasks: single‑UAV 3D object detection, single‑UAV object tracking, collaborative‑UAV 3D object detection, and collaborative‑UAV object tracking. Our dataset and code are available at https://huiyegit.github.io/UAV3D_Benchmark/.
Authors: Khaled Gabr, Mohamed Abdelkader, Imen Jarraya, Abdullah AlMusalami, Anis Koubaa
Abstract: In the field of sensor fusion and state estimation for object detection and localization, ensuring accurate tracking in dynamic environments poses significant challenges. Traditional methods like the Kalman Filter (KF) often fail when measurements are intermittent, leading to rapid divergence in state estimations. To address this, we introduce SMART (Sensor Measurement Augmentation and Reacquisition Tracker), a novel approach that leverages high‑frequency state estimates from the KF to guide the search for new measurements, maintaining tracking continuity even when direct measurements falter. This is crucial for dynamic environments where traditional methods struggle. Our contributions include: 1) Versatile Measurement Augmentation Using KF Feedback: We implement a versatile measurement augmentation system that serves as a backup when primary object detectors fail intermittently. This system is adaptable to various sensors, demonstrated using depth cameras where KF's 3D predictions are projected into 2D depth image coordinates, integrating nonlinear covariance propagation techniques simplified to first‑order approximations. 2) Open‑source ROS2 Implementation: We provide an open‑source ROS2 implementation of the SMART‑TRACK framework, validated in a realistic simulation environment using Gazebo and ROS2, fostering broader adaptation and further research. Our results showcase significant enhancements in tracking stability, with estimation RMSE as low as 0.04 m during measurement disruptions, advancing the robustness of UAV tracking and expanding the potential for reliable autonomous UAV operations in complex scenarios. The implementation is available at https://github.com/mzahana/SMART‑TRACK.
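The projection step described in the abstract above (a 3D KF prediction mapped into 2D image coordinates, with the covariance propagated to first order) can be sketched as follows. This is a minimal illustration and not the SMART‑TRACK implementation: it assumes an ideal pinhole camera, the point is already expressed in the camera frame, and the function name and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
def project_with_covariance(p, P, fx, fy, cx, cy):
    """Project a 3D point (camera frame) to pixels and propagate its covariance.

    p  : (X, Y, Z) predicted position, Z along the optical axis (assumed).
    P  : 3x3 position covariance as nested lists.
    fx, fy, cx, cy : pinhole intrinsics (hypothetical values in the example).
    Returns ((u, v), S) where S is the 2x2 pixel covariance J P J^T.
    """
    X, Y, Z = p
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")

    # Pinhole projection.
    u = fx * X / Z + cx
    v = fy * Y / Z + cy

    # Jacobian of (u, v) with respect to (X, Y, Z): the first-order approximation.
    J = [
        [fx / Z, 0.0, -fx * X / Z**2],
        [0.0, fy / Z, -fy * Y / Z**2],
    ]

    # S = J P J^T, written out with plain loops to avoid external dependencies.
    JP = [[sum(J[i][k] * P[k][j] for k in range(3)) for j in range(3)]
          for i in range(2)]
    S = [[sum(JP[i][k] * J[j][k] for k in range(3)) for j in range(2)]
         for i in range(2)]
    return (u, v), S


if __name__ == "__main__":
    cov = [[0.04, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 0.04]]
    pixel, pixel_cov = project_with_covariance((0.0, 0.0, 2.0), cov,
                                               500.0, 500.0, 320.0, 240.0)
    print(pixel, pixel_cov)
```

The resulting 2x2 covariance could be used to size a search window around the projected pixel when the primary detector drops out; how the window is actually chosen in the paper is not stated in the abstract.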
Authors: Lara Laban, Mariusz Wzorek, Piotr Rudol, Tommy Persson
Abstract: Navigating complex environments requires Unmanned Aerial Vehicles (UAVs) and autonomous systems to perform trajectory tracking and obstacle avoidance in real‑time. While many control strategies have effectively utilized linear approximations, addressing the non‑linear dynamics of UAVs, especially in obstacle‑dense environments, remains a key challenge that requires further research. This paper introduces a Non‑linear Model Predictive Control (NMPC) framework for the DJI Matrice 100, addressing these challenges by using a dynamic model and B‑spline interpolation for smooth reference trajectories, ensuring minimal deviation while respecting safety constraints. The framework supports various trajectory types and employs a penalty‑based cost function for control accuracy in tight maneuvers. The framework utilizes CasADi for efficient real‑time optimization, enabling the UAV to maintain robust operation even under tight computational constraints. Simulation and real‑world indoor and outdoor experiments demonstrated the NMPC framework's ability to adapt to disturbances, resulting in smooth, collision‑free navigation.
Authors: Mohssen E. Elshaar, Zeyad M. Manaa, Mohammed R. Elbalshy, Abdul Jabbar Siddiqui, Ayman M. Abdallah
Abstract: Unmanned Aerial Vehicles (UAVs) are becoming more popular in various sectors, offering many benefits, yet introducing significant challenges to privacy and safety. This paper investigates state‑of‑the‑art solutions for detecting and tracking quadrotor UAVs to address these concerns. Cutting‑edge deep learning models, specifically the YOLOv5 and YOLOv8 series, are evaluated for their performance in identifying UAVs accurately and quickly. Additionally, robust tracking systems, BoT‑SORT and Byte Track, are integrated to ensure reliable monitoring even under challenging conditions. Our tests on the DUT dataset reveal that while YOLOv5 models generally outperform YOLOv8 in detection accuracy, the YOLOv8 models excel in recognizing less distinct objects, demonstrating their adaptability and advanced capabilities. Furthermore, BoT‑SORT demonstrated superior performance over Byte Track, achieving higher IoU and lower center error in most cases, indicating more accurate and stable tracking.
Code: https://github.com/zmanaa/UAV_detection_and_tracking Tracking demo: https://drive.google.com/file/d/1pe6HC5kQrgTbA2QrjvMN‑yjaZyWeAvDT/view?usp=sharing
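The two tracking metrics named in the abstract above, IoU and center error, can be illustrated with a short sketch. It assumes axis‑aligned boxes in `(x_min, y_min, x_max, y_max)` pixel format and uses hypothetical function names; the paper's exact evaluation protocol is not given in the abstract, so this only shows the standard definitions.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance (in pixels) between the centers of two boxes."""
    acx, acy = (box_a[0] + box_a[2]) / 2.0, (box_a[1] + box_a[3]) / 2.0
    bcx, bcy = (box_b[0] + box_b[2]) / 2.0, (box_b[1] + box_b[3]) / 2.0
    return math.hypot(acx - bcx, acy - bcy)


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth), center_error(predicted, ground_truth))
```

A higher IoU and a lower center error both indicate that a tracker's box follows the ground truth more closely, which is the sense in which the abstract compares BoT‑SORT and Byte Track.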
Authors: Changhong Fu, Yiheng Wang, Liangliang Yao, Guangze Zheng, Haobo Zuo, Jia Pan
Abstract: Nighttime UAV tracking under low‑illuminated scenarios has achieved great progress by domain adaptation (DA). However, previous DA training‑based works are deficient in narrowing the discrepancy of temporal contexts for UAV trackers. To address the issue, this work proposes a prompt‑driven temporal domain adaptation training framework to fully utilize temporal contexts for challenging nighttime UAV tracking, i.e., TDA. Specifically, the proposed framework aligns the distribution of temporal contexts from daytime and nighttime domains by training the temporal feature generator against the discriminator. The temporal‑consistent discriminator progressively extracts shared domain‑specific features to generate coherent domain discrimination results in the time series. Additionally, to obtain high‑quality training samples, a prompt‑driven object miner is employed to precisely locate objects in unannotated nighttime videos. Moreover, a new benchmark for long‑term nighttime UAV tracking is constructed. Exhaustive evaluations on both public and self‑constructed nighttime benchmarks demonstrate the remarkable performance of the tracker trained in TDA framework, i.e., TDA‑Track. Real‑world tests at nighttime also show its practicality. The code and demo videos are available at https://github.com/vision4robotics/TDA‑Track.
Authors: Yuxiang Ji, Boyong He, Zhuoyue Tan, Liaoni Wu
Abstract: Vision‑based geo‑localization technology for UAVs, serving as a secondary source of GPS information in addition to the global navigation satellite systems (GNSS), can still operate independently in GPS‑denied environments. Recent deep learning based methods treat this as an image matching and retrieval task. By retrieving drone‑view images in a geo‑tagged satellite image database, approximate localization information can be obtained. However, due to high costs and privacy concerns, it is usually difficult to obtain large quantities of drone‑view images from a continuous area. Existing drone‑view datasets are mostly composed of small‑scale aerial photography with a strong assumption that there exists a perfect one‑to‑one aligned reference image for any query, leaving a significant gap from the practical localization scenario. In this work, we construct a large‑range contiguous area UAV geo‑localization dataset named GTA‑UAV, featuring multiple flight altitudes, attitudes, scenes, and targets using modern computer games. Based on this dataset, we introduce a more practical UAV geo‑localization task including partial matches of cross‑view paired data, and expand the image‑level retrieval to the actual localization in terms of distance (meters). For the construction of drone‑view and satellite‑view pairs, we adopt a weight‑based contrastive learning approach, which allows for effective learning while avoiding additional post‑processing matching steps. Experiments demonstrate the effectiveness of our data and training method for UAV geo‑localization, as well as the generalization capabilities to real‑world scenarios.
Authors: Yucheng Wang, Changhong Fu, Kunhan Lu, Liangliang Yao, Haobo Zuo
Abstract: State‑of‑the‑art (SOTA) visual object tracking methods have significantly enhanced the autonomy of unmanned aerial vehicles (UAVs). However, in low‑light conditions, the presence of irregular real noise from the environments severely degrades the performance of these SOTA methods. Moreover, existing SOTA denoising techniques often fail to meet the real‑time processing requirements when deployed as plug‑and‑play denoisers for UAV tracking. To address this challenge, this work proposes a novel conditional generative denoiser (CGDenoiser), which breaks free from the limitations of traditional deterministic paradigms and generates the noise conditioning on the input, subsequently removing it. To better align the input dimensions and accelerate inference, a novel nested residual Transformer conditionalizer is developed. Furthermore, an innovative multi‑kernel conditional refiner is designed to pertinently refine the denoised output. Extensive experiments show that CGDenoiser promotes the tracking precision of the SOTA tracker by 18.18% on DarkTrack2021 while running 5.8 times faster than the second‑best‑performing denoiser. Real‑world tests with complex challenges also prove the effectiveness and practicality of CGDenoiser. Code, video demo and supplementary proof for CGDenoiser are now available at: https://github.com/vision4robotics/CGDenoiser.
Authors: Changhong Fu, Xiang Lei, Haobo Zuo, Liangliang Yao, Guangze Zheng, Jia Pan
Abstract: Visual object tracking has significantly promoted autonomous applications for unmanned aerial vehicles (UAVs). However, learning robust object representations for UAV tracking is especially challenging in complex dynamic environments, when confronted with aspect ratio change and occlusion. These challenges severely alter the original information of the object. To handle the above issues, this work proposes a novel progressive representation learning framework for UAV tracking, i.e., PRL‑Track. Specifically, PRL‑Track is divided into coarse representation learning and fine representation learning. For coarse representation learning, two innovative regulators, which rely on appearance and semantic information, are designed to mitigate appearance interference and capture semantic information. Furthermore, for fine representation learning, a new hierarchical modeling generator is developed to intertwine coarse object representations. Exhaustive experiments demonstrate that the proposed PRL‑Track delivers exceptional performance on three authoritative UAV tracking benchmarks. Real‑world tests indicate that the proposed PRL‑Track realizes superior tracking performance with 42.6 frames per second on the typical UAV platform equipped with an edge smart camera. The code, model, and demo videos are available at https://github.com/vision4robotics/PRL‑Track.
Authors: Liangliang Yao, Changhong Fu, Yiheng Wang, Haobo Zuo, Kunhan Lu
Abstract: Visual object tracking has boosted extensive intelligent applications for unmanned aerial vehicles (UAVs). However, the state‑of‑the‑art (SOTA) enhancers for nighttime UAV tracking always neglect the uneven light distribution in low‑light images, inevitably leading to excessive enhancement in scenarios with complex illumination. To address these issues, this work proposes a novel enhancer, i.e., LDEnhancer, enhancing nighttime UAV tracking with light distribution suppression. Specifically, a novel image content refinement module is developed to decompose the light distribution information and image content information in the feature space, allowing for the targeted enhancement of the image content information. Then this work designs a new light distribution generation module to capture light distribution effectively. The features with light distribution information and image content information are fed into the different parameter estimation modules, respectively, for the parameter map prediction. Finally, leveraging two parameter maps, an innovative interweave iteration adjustment is proposed for the collaborative pixel‑wise adjustment of low‑light images. Additionally, a challenging nighttime UAV tracking dataset with uneven light distribution, namely NAT2024‑2, is constructed to provide a comprehensive evaluation, which contains 40 challenging sequences with over 74K frames in total. Experimental results on the authoritative UAV benchmarks and the proposed NAT2024‑2 demonstrate that LDEnhancer outperforms other SOTA low‑light enhancers for nighttime UAV tracking. Furthermore, real‑world tests on a typical UAV platform with an NVIDIA Orin NX confirm the practicality and efficiency of LDEnhancer. The code is available at https://github.com/vision4robotics/LDEnhancer.
Authors: Mingle Zhou, Rui Xing, Delong Han, Zhiyong Qi, Gang Li
Abstract: UAVs emerge as the optimal carriers for visual weed identification and integrated pest and disease management in crops. However, the absence of specialized datasets impedes the advancement of model development in this domain. To address this, we have developed the Pests and Diseases Tree dataset (PDT dataset). PDT dataset represents the first high‑precision UAV‑based dataset for targeted detection of tree pests and diseases, which is collected in real‑world operational environments and aims to fill the gap in available datasets for this field. Moreover, by aggregating public datasets and network data, we further introduced the Common Weed and Crop dataset (CWC dataset) to address the challenge of inadequate classification capabilities of test models within datasets for this field. Finally, we propose the YOLO‑Dense Pest (YOLO‑DP) model for high‑precision object detection of weed, pest, and disease crop images. We re‑evaluate the state‑of‑the‑art detection models with our proposed PDT dataset and CWC dataset, showing the completeness of the dataset and the effectiveness of the YOLO‑DP. The proposed PDT dataset, CWC dataset, and YOLO‑DP model are presented at https://github.com/RuiXing123/PDT_CWC_YOLO‑DP.
Authors: Zhefan Xu, Xinming Han, Haoyu Shen, Hanyu Jin, Kenji Shimada
Abstract: Safe flight in dynamic environments requires unmanned aerial vehicles (UAVs) to make effective decisions when navigating cluttered spaces with moving obstacles. Traditional approaches often decompose decision‑making into hierarchical modules for prediction and planning. Although these handcrafted systems can perform well in specific settings, they might fail if environmental conditions change and often require careful parameter tuning. Additionally, their solutions could be suboptimal due to the use of inaccurate mathematical model assumptions and simplifications aimed at achieving computational efficiency. To overcome these limitations, this paper introduces the NavRL framework, a deep reinforcement learning‑based navigation method built on the Proximal Policy Optimization (PPO) algorithm. NavRL utilizes our carefully designed state and action representations, allowing the learned policy to make safe decisions in the presence of both static and dynamic obstacles, with zero‑shot transfer from simulation to real‑world flight. Furthermore, the proposed method adopts a simple but effective safety shield for the trained policy, inspired by the concept of velocity obstacles, to mitigate potential failures associated with the black‑box nature of neural networks. To accelerate the convergence, we implement the training pipeline using NVIDIA Isaac Sim, enabling parallel training with thousands of quadcopters. Simulation and physical experiments show that our method ensures safe navigation in dynamic environments and results in the fewest collisions compared to benchmarks.
Authors: Zhefan Xu, Hanyu Jin, Xinming Han, Haoyu Shen, Kenji Shimada
Abstract: Aerial robots can enhance construction site productivity by autonomously handling inspection and mapping tasks. However, ensuring safe navigation near human workers remains challenging. While navigation in static environments has been well studied, navigating dynamic environments remains open due to challenges in perception and planning. Payload limitations restrict the robots to using cameras with limited fields of view, resulting in unreliable perception and tracking during collision avoidance. Moreover, the rapidly changing conditions of dynamic environments can quickly make the generated optimal trajectory outdated. To address these challenges, this paper presents a comprehensive navigation framework that integrates perception, intent prediction, and planning. Our perception module detects and tracks dynamic obstacles efficiently and handles tracking loss and occlusion during collision avoidance. The proposed intent prediction module employs a Markov Decision Process (MDP) to forecast potential actions of dynamic obstacles with the possible future trajectories. Finally, a novel intent‑based planning algorithm, leveraging model predictive control (MPC), is applied to generate navigation trajectories. Simulation and physical experiments demonstrate that our method improves the safety of navigation by achieving the fewest collisions compared to benchmarks.
Authors: Jianbo Ma, Chuanming Tang, Fei Wu, Can Zhao, Jianlin Zhang, Zhiyong Xu
Abstract: Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. Current MOT trackers rely on accurate object detection results and precise matching of target reidentification (ReID). These methods focus on optimizing target spatial attributes while overlooking temporal cues in modelling object relationships, especially for challenging tracking conditions such as object deformation and blurring. To address the above‑mentioned issues, we propose a novel Spatio‑Temporal Cohesion Multiple Object Tracking framework (STCMOT), which utilizes historical embedding features to model the representation of ReID and detection features in a sequential order. Concretely, a temporal embedding boosting module is introduced to enhance the discriminability of individual embedding based on adjacent frame cooperation. The trajectory embedding is then propagated by a temporal detection refinement module to mine salient target locations in the temporal field. Extensive experiments on the VisDrone2019 and UAVDT datasets demonstrate that STCMOT sets a new state‑of‑the‑art performance in MOTA and IDF1 metrics. The source codes are released at https://github.com/ydhcg‑BoBo/STCMOT.
Authors: Qian Chen, Shihao Shu, Xiangzhi Bai
Abstract: Novel‑view synthesis based on visible light has been extensively studied. In comparison to visible light imaging, thermal infrared imaging offers the advantage of all‑weather imaging and strong penetration, providing increased possibilities for reconstruction in nighttime and adverse weather scenarios. However, thermal infrared imaging is influenced by physical characteristics such as atmospheric transmission effects and thermal conduction, hindering the precise reconstruction of intricate details in thermal infrared scenes, manifesting as issues of floaters and indistinct edge features in synthesized images. To address these limitations, this paper introduces a physics‑induced 3D Gaussian splatting method named Thermal3D‑GS. Thermal3D‑GS begins by modeling atmospheric transmission effects and thermal conduction in three‑dimensional media using neural networks. Additionally, a temperature consistency constraint is incorporated into the optimization objective to enhance the reconstruction accuracy of thermal infrared images. Furthermore, to validate the effectiveness of our method, the first large‑scale benchmark dataset for this field named Thermal Infrared Novel‑view Synthesis Dataset (TI‑NSD) is created. This dataset comprises 20 authentic thermal infrared video scenes, covering indoor, outdoor, and UAV (Unmanned Aerial Vehicle) scenarios, totaling 6,664 frames of thermal infrared image data. Based on this dataset, this paper experimentally verifies the effectiveness of Thermal3D‑GS. The results indicate that our method outperforms the baseline method with a 3.03 dB improvement in PSNR and significantly addresses the issues of floaters and indistinct edge features present in the baseline method. Our dataset and codebase will be released at https://github.com/mzzcdf/Thermal3DGS.
Authors: Ang He, Xiaobo Li, Ximei Wu, Chengyue Su, Jing Chen, Sheng Xu, Xiaobin Guo
Abstract: Unmanned aerial vehicles (UAVs) equipped with thermal infrared (TIR) cameras play a crucial role in combating nocturnal wildlife poaching. However, TIR images often face challenges such as jitter and wildlife overlap, necessitating UAVs to possess the capability to identify blurred and overlapping small targets. Current traditional lightweight networks deployed on UAVs struggle to extract features from blurry small targets. To address this issue, we developed ALSS‑YOLO, an efficient and lightweight detector optimized for TIR aerial images. Firstly, we propose a novel Adaptive Lightweight Channel Split and Shuffling (ALSS) module. This module employs an adaptive channel split strategy to optimize feature extraction and integrates a channel shuffling mechanism to enhance information exchange between channels. This improves the extraction of blurry features, crucial for handling jitter‑induced blur and overlapping targets. Secondly, we developed a Lightweight Coordinate Attention (LCA) module that employs adaptive pooling and grouped convolution to integrate feature information across dimensions. This module ensures lightweight operation while maintaining high detection precision and robustness against jitter and target overlap. Additionally, we developed a single‑channel focus module to aggregate the width and height information of each channel into four‑dimensional channel fusion, which improves the feature representation efficiency of infrared images. Finally, we modify the localization loss function to emphasize the loss value associated with small objects to improve localization accuracy. Extensive experiments on the BIRDSAI and ISOD TIR UAV wildlife datasets show that ALSS‑YOLO achieves state‑of‑the‑art performance. Our code is openly available at https://github.com/helloworlder8/computer_vision.
Authors: Sai Yang, Bin Hu, Bojun Zhou, Fan Liu, Xiaoxin Wu, Xinsong Zhang, Juping Gu, Jun Zhou
Abstract: Power Line Autonomous Inspection (PLAI) plays a crucial role in the construction of smart grids due to its great advantages of low cost, high efficiency, and safe operation. PLAI is completed by accurately detecting the electrical components and defects in the aerial images captured by Unmanned Aerial Vehicles (UAVs). However, the visible quality of aerial images is inevitably degraded by adverse weather like haze, rain, or snow, which are found to drastically decrease the detection accuracy in our research. To circumvent this problem, we propose a new task of Power Line Aerial Image Restoration under Adverse Weather (PLAIR‑AW), which aims to recover clean and high‑quality images from degraded images with bad weather thus improving detection performance for PLAI. In this context, we are the first to release numerous corresponding datasets, namely, HazeCPLID, HazeTTPLA, HazeInsPLAD for power line aerial image dehazing, RainCPLID, RainTTPLA, RainInsPLAD for power line aerial image deraining, SnowCPLID, SnowInsPLAD for power line aerial image desnowing, which are synthesized upon the public power line aerial image datasets of CPLID, TTPLA, InsPLAD following the mathematical models. Meanwhile, we select numerous state‑of‑the‑art methods from image restoration community as the baseline methods for PLAIR‑AW. At last, we conduct large‑scale empirical experiments to evaluate the performance of baseline methods on the proposed datasets. The proposed datasets and trained models are available at https://github.com/ntuhubin/PLAIR‑AW.
Authors: Mingjie Zhang, Chen Feng, Zengzhi Li, Guiyong Zheng, Yiming Luo, Zhu Wang, Jinni Zhou, Shaojie Shen, Boyu Zhou
Abstract: Unmanned Aerial Vehicles (UAVs) have gained significant popularity in scene reconstruction. This paper presents SOAR, a LiDAR‑Visual heterogeneous multi‑UAV system specifically designed for fast autonomous reconstruction of complex environments. Our system comprises a LiDAR‑equipped explorer with a large field‑of‑view (FoV), alongside photographers equipped with cameras. To ensure rapid acquisition of the scene's surface geometry, we employ a surface frontier‑based exploration strategy for the explorer. As the surface is progressively explored, we identify the uncovered areas and generate viewpoints incrementally. These viewpoints are then assigned to photographers through solving a Consistent Multiple Depot Multiple Traveling Salesman Problem (Consistent‑MDMTSP), which optimizes scanning efficiency while ensuring task consistency. Finally, photographers utilize the assigned viewpoints to determine optimal coverage paths for acquiring images. We present extensive benchmarks in a realistic simulator, which validate the performance of SOAR compared with classical and state‑of‑the‑art methods. For more details, please see our project page at https://sysu‑star.github.io/SOAR.
Authors: Zhiwei Wei, Chenran Huang, Bing Li, Yiting Zhao, Xiang Cheng, Liuqing Yang, Rongqing Zhang
Abstract: Vehicular Fog Computing (VFC) is significantly enhancing the efficiency, safety, and computational capabilities of Intelligent Transportation Systems (ITS), and the integration of Unmanned Aerial Vehicles (UAVs) further elevates these advantages by incorporating flexible and auxiliary services. This evolving UAV‑integrated VFC paradigm opens new doors while presenting unique complexities within the cooperative computation framework. Foremost among the challenges, modeling the intricate dynamics of aerial‑ground interactive computing networks is a significant endeavor, and the absence of a comprehensive and flexible simulation platform may impede the exploration of this field. Inspired by the pressing need for a versatile tool, this paper provides a lightweight and modular aerial‑ground collaborative simulation platform, termed AirFogSim. We present the design and implementation of AirFogSim, and demonstrate its versatility with five key missions in the domain of UAV‑integrated VFC. A multifaceted use case is carried out to validate AirFogSim's effectiveness, encompassing several integral aspects of the proposed AirFogSim, including UAV trajectory, task offloading, resource allocation, and blockchain. In general, AirFogSim is envisioned to set a new precedent in the UAV‑integrated VFC simulation, bridge the gap between theoretical design and practical validation, and pave the way for future intelligent transportation domains. Our code will be available at https://github.com/ZhiweiWei‑NAMI/AirFogSim.
Authors: Yuhao Pan, Xiucheng Wang, Zhiyao Xu, Nan Cheng, Wenchao Xu, Jun-jie Zhang
Abstract: Unmanned Aerial Vehicles (UAVs), due to their low cost and high flexibility, have been widely used in various scenarios to enhance network performance. However, the optimization of UAV trajectories in unknown areas or areas without sufficient prior information still faces challenges related to poor planning performance and low distributed execution. These challenges arise when UAVs rely solely on their own observation information and the information from other UAVs within their communicable range, without access to global information. To address these challenges, this paper proposes the Qedgix framework, which combines graph neural networks (GNNs) and the QMIX algorithm to achieve distributed optimization of the Age of Information (AoI) for users in unknown scenarios. The framework utilizes GNNs to extract information from UAVs, users within the observable range, and other UAVs within the communicable range, thereby enabling effective UAV trajectory planning. Due to the discretization and temporal features of AoI indicators, the Qedgix framework employs QMIX to optimize distributed partially observable Markov decision processes (Dec‑POMDP) based on centralized training and distributed execution (CTDE) with respect to mean AoI values of users. By modeling the UAV network optimization problem in terms of AoI and applying the Kolmogorov‑Arnold representation theorem, the Qedgix framework achieves efficient neural network training through parameter sharing based on permutation invariance. Simulation results demonstrate that the proposed algorithm significantly improves convergence speed while reducing the mean AoI values of users. The code is available at https://github.com/UNIC‑Lab/Qedgix.
Authors: Andrei-Marian Ungureanu, Stelian Spînu
Abstract: Autonomous drones are rapidly transforming modern warfare and civil applications alike. This paper presents the development of an integrated intelligent drone system designed to serve as a personal assistant. Leveraging the DJI Tello drone platform, we implemented a modular architecture that integrates three core artificial intelligence functionalities: facial detection, facial recognition, and depth estimation from monocular vision. A web‑based interface enables seamless drone control and real‑time video monitoring, while a Python‑based server processes visual data and executes inference pipelines using lightweight neural models optimized for embedded systems. Unlike existing commercial solutions, this system emphasizes accessibility, low‑cost hardware, and open‑source technologies. The system demonstrates robust performance in real‑world conditions, including person tracking, indoor scanning, and autonomous line following using virtual sensors. This project validates the applicability of advanced AI techniques in real‑time robotic systems and illustrates the feasibility of deploying them on constrained hardware, providing a foundation for future research in autonomous UAVs for military, rescue, and surveillance missions.
Authors: Weixian Qian, Tianyi Yang, Sebastian Schroder, Yao Deng, Jiaohong Yao, Xiao Cheng, Richard Han, Xi Zheng
Abstract: Safe landing‑site assessment in unstructured environments remains a key challenge for autonomous UAV deployment, as vision‑only learning approaches often degrade under terrain variability and provide limited transparency in safety decisions. We present NEUROSYMLAND, a neuro‑symbolic landing‑site assessment system that integrates lightweight perception with explicit safety reasoning. The framework constructs a probabilistic semantic scene graph (PSSG) from onboard visual input and evaluates candidate landing regions using symbolic constraints capturing terrain flatness, obstacle clearance, and spatial consistency, enabling structured reasoning under perceptual uncertainty while maintaining edge‑feasible execution. Across 72 simulated landing scenarios spanning diverse terrains, NEUROSYMLAND achieves 61 successful assessments, outperforming four competitive baselines (37‑57 successes). To evaluate deployability, we further conduct 100 hardware‑in‑the‑loop trials with randomized initial poses, profiling end‑to‑end latency, stage‑wise execution time, and system‑level metrics including CPU/GPU utilization, memory footprint, and power consumption. Results demonstrate improved robustness and interpretability with bounded edge‑resource usage. Profiling shows that symbolic reasoning contributes only a small fraction of end‑to‑end latency, while the main computational cost arises from perception and PSSG construction. These results demonstrate the feasibility of deploying the landing‑site assessment stack on edge‑constrained UAV hardware, and all source code, datasets, prompts, and symbolic rule refinement examples are released in an open‑source repository.
Authors: Shenghui Zhang, YuXuan Gao, Songwei Zhao, Jifeng Hu, Zijing Zhang, Hechang Chen
Abstract: With the rapid development of autonomous aerial systems, Unmanned Aerial Vehicles (UAVs) are increasingly deployed in applications such as inspection, environmental monitoring, and rescue, creating growing demand for reliable autonomous navigation. However, autonomous UAV navigation in dense environments remains challenging under sparse perception and dynamic constraints. Most reinforcement learning (RL) methods lack explicit safety mechanisms, leading to unsafe exploration, unstable training, and risky behaviors, especially during high‑speed flight. Even in safe RL approaches, safety is often enforced by projecting policy outputs onto a safe action set, which may introduce instability. Meanwhile, many learning‑based methods rely on dense inputs or large networks, increasing computational burden and limiting lightweight onboard deployment. Facing the above challenges, we propose a safety‑constrained perception‑control integrated framework for UAV navigation. A lightweight network encodes sparse observations into collision‑risk‑aware features using asymmetric and depthwise separable convolutions. We formulate the task as a constrained Markov decision process within a hierarchical control architecture and solve it using a Lagrangian‑based safe PPO algorithm. Curriculum learning further improves training stability. Experiments with varying obstacle densities and flight speeds demonstrate higher success rates, improved safety, and better efficiency than existing reinforcement learning baselines.
Authors: Chun-Kit Li, Iok Long Sit, Ming Fung Siu, Ka Yu Kui, Hin Wang Lin, Pengyu Wang, Ling Shi
Abstract: Autonomous landing of unmanned aerial vehicles (UAVs) on wave‑disturbed marine platforms remains challenging due to stochastic platform motion, time‑varying platform attitude, and uncertain touchdown conditions. Existing model‑based methods often require accurate motion prediction and online optimization, while end‑to‑end learning approaches may suffer from high training complexity and limited interpretability. This paper presents WaveLander, a hierarchical control framework via reinforcement learning (RL) that decouples vertical landing decision‑making from low‑level flight stabilization. The RL policy maps a compact platform‑relative observation to a scalar vertical velocity reference, while a conventional low‑level flight controller maintains attitude stability and lateral tracking. This formulation reduces dynamic platform landing to a low‑dimensional, timing‑aware control problem and enables smooth landing behavior without explicit switching rules. Simulation results under randomized wave‑induced platform motions show that WaveLander achieves robust landing performance and generalizes to unseen disturbance conditions, demonstrating the potential of hierarchical learning‑based control for marine UAV recovery.
Authors: Luigi Petruzziello, Camilla Fioravanti, Gabriele Oliva
Abstract: UAV swarms and cyber‑physical multi‑agent systems are increasingly deployed in safety‑critical missions that require coordinated motion, distributed decision making, and autonomy. A major security risk arises when a legitimate agent is hijacked and driven by adversarial high‑level commands. Rather than focusing on detection and isolation of malicious agents, we exploit a structural property common in autonomous platforms: low‑level collision‑avoidance modules are typically implemented as independent safety layers and may remain active even under high‑level compromise. Building on this property, we propose a distributed containment framework that uses the compromised agent's uncompromised avoidance response as an indirect actuation channel. Defender agents select their geometric configuration to shape the repulsive field experienced by the target, with the goal of keeping it inside a prescribed admissible region and, when required, steering it toward a desired destination. The interaction is modeled as an online Stackelberg game in which defenders act as leaders and the adversary reacts by choosing the target command. Using support‑function and normal‑cone arguments, we derive an exact geometric characterization of robust one‑step containment and introduce the notion of a repulsive cage. These results define a centralized Stackelberg oracle and motivate a fully distributed online approximation based on local communication and dynamic field estimation. We prove sublinear dynamic‑regret bounds with respect to the centralized benchmark, quantifying the effect of network‑induced estimation errors and temporal variability of the stage‑wise optimum. Simulations validate the approach and corroborate the theory.
Authors: David Shulman
Abstract: Radio‑frequency (RF) sensing is a central modality for counter‑unmanned‑aerial‑system (counter‑UAS) defence because it exploits the control, telemetry, and video links between a drone and its operator. Reported accuracies for RF‑based drone detection and identification are often very high, but many are obtained using cross‑validation that splits a small number of continuous recordings into short segments. This can place near‑duplicate slices of the same recording in both training and test partitions, creating data leakage. We study this leakage problem through theory and measurement. We formalise the optimism of segment‑level cross‑validation and show, using Cover's function‑counting theorem, that a classifier can exactly memorise the recording‑to‑label map when the number of independent recordings, R, is small relative to the feature dimension, d. In particular, this can occur when 2R is less than or approximately equal to d. Under these conditions, naive accuracy approaches 1, and the inflation gap approaches 1 ‑ ACC, where ACC is the Bayes accuracy. The inflation eases only once R grows beyond this separability threshold. A controlled synthetic experiment with 10 seeds confirms the predicted curves: naive balanced accuracy rises from the Bayes level toward 1.0 as recording‑specific nuisance variation grows, while honest recording‑grouped evaluation declines to chance, with a gap reaching about 0.5. On the public DroneRF dataset, pooled leave‑one‑recording‑out cross‑validation shows drone type identification, AR versus Bebop, collapsing from a naive macro‑F1 of 0.74 to 0.46, the two‑class chance level. A leakage‑pathway ablation attributes essentially all of the inflation to segment‑level leakage.
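Illustration: the leakage mechanism described above can be seen by comparing a segment-level fold with a recording-grouped fold. This is a minimal sketch with made-up recording IDs and labels. It is not the paper's experimental protocol, and all names in it are hypothetical.

```python
from collections import namedtuple

Segment = namedtuple("Segment", ["recording_id", "label", "index"])


def make_segments(recordings, segments_per_recording):
    """Cut each recording (id -> label) into equally many short segments."""
    return [Segment(rec, label, i)
            for rec, label in sorted(recordings.items())
            for i in range(segments_per_recording)]


def segment_level_folds(segments, n_folds):
    """Naive k-fold over segments; ignores the source recording."""
    for fold in range(n_folds):
        test = [s for j, s in enumerate(segments) if j % n_folds == fold]
        train = [s for j, s in enumerate(segments) if j % n_folds != fold]
        yield train, test


def leave_one_recording_out(segments):
    """Grouped folds: every segment of one recording is held out together."""
    for held_out in sorted({s.recording_id for s in segments}):
        train = [s for s in segments if s.recording_id != held_out]
        test = [s for s in segments if s.recording_id == held_out]
        yield train, test


def leaked_recordings(train, test):
    """Recordings that contribute segments to both partitions."""
    return {s.recording_id for s in train} & {s.recording_id for s in test}


if __name__ == "__main__":
    # Illustrative recordings only; not taken from DroneRF.
    recs = {"r0": "AR", "r1": "AR", "r2": "Bebop", "r3": "Bebop"}
    segs = make_segments(recs, 10)
    for train, test in segment_level_folds(segs, 5):
        print("segment-level leak:", sorted(leaked_recordings(train, test)))
    for train, test in leave_one_recording_out(segs):
        print("grouped leak:", sorted(leaked_recordings(train, test)))
```

In the segment-level folds, every recording appears on both sides of the split, so a classifier can score well by recognising the recording rather than the drone type. The grouped folds rule this out by construction.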
Authors: Minxing Sun, Yao Mao
Abstract: Short‑horizon prediction is essential for electro‑optical UAV tracking, especially when the target is small, maneuvering, or intermittently observed. Image center, line‑of‑sight, and range measurements provide direct constraints on target position, but their constraints on acceleration are weak. As a result, prediction can lag during aggressive maneuvers. This paper proposes an image‑domain tilt constrained distributed fusion method for maneuvering UAV tracking. The method uses the apparent roll and pitch of a rotorcraft target in the image as low‑level maneuver cues. A weak‑prior auto‑labeling pipeline first generates oriented bounding box and image‑domain tilt labels from synchronized video, gimbal IMU, and UAV IMU data. A YOLO‑OBB detector is then trained to provide online target position and tilt measurements. The front‑end Python implementation is publicly available at github.com/ShineMinxing/PythonYOLO. In the fusion stage, the UAV state is modeled by position, velocity, and acceleration. Image‑domain roll and pitch are introduced as acceleration‑related pseudo‑observations. For distributed tracking, one mobile gimbal camera and two fixed ground cameras are fused asynchronously. Camera attitude error states are augmented into the filter to absorb extrinsic drift and cross‑camera systematic inconsistency. A Mahalanobis gate with time‑since‑last‑valid covariance widening is used to reject false detections and handle dropouts. In simulation, adding roll/pitch observations reduces the prediction RMSE from 1.991 m to 0.821 m and decreases the cumulative prediction error by 60.75%. In real distributed experiments, a self‑consistency evaluation shows an 18.10% reduction in cumulative prediction error. The results show that image‑domain tilt can provide useful acceleration constraints for robust short‑horizon UAV prediction.
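Illustration: a Mahalanobis gate with time-since-last-valid covariance widening could look like the sketch below for a 2D image residual. The abstract does not give the widening law or the gate threshold. Here the widening is assumed to be linear in time (`growth_per_second`), and the gate is assumed to be 9.21, the 99% chi-square quantile for two degrees of freedom. All function names are hypothetical.

```python
def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance r^T S^-1 r for a 2x2 covariance S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # S^-1 = (1/det) * [[d, -b], [-c, a]]
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def widened_cov(cov, seconds_since_valid, growth_per_second):
    """Inflate the diagonal in proportion to the dropout time (assumed law)."""
    extra = growth_per_second * max(0.0, seconds_since_valid)
    (a, b), (c, d) = cov
    return ((a + extra, b), (c, d + extra))


def accept(residual, cov, seconds_since_valid,
           growth_per_second=1.0, gate=9.21):
    """Return True if the measurement passes the widened gate."""
    wide = widened_cov(cov, seconds_since_valid, growth_per_second)
    return mahalanobis_sq_2d(residual, wide) <= gate


if __name__ == "__main__":
    identity = ((1.0, 0.0), (0.0, 1.0))
    print(accept((4.0, 0.0), identity, 0.0))  # False: 16 > 9.21
    print(accept((4.0, 0.0), identity, 1.0))  # True: 16 / 2 = 8 <= 9.21
```

Widening the gate after a dropout lets the filter reacquire a target that has drifted from its prediction, while a tight gate right after a valid update still rejects false detections.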
Authors: Zhihan Zeng, Amir Hussain, Yue Xiu, Phee Lep Yeoh, Lu Chen, Zhongpei Zhang, Guan Gui
Abstract: Low‑altitude Unmanned Aerial Vehicles (UAVs) often need to infer channel knowledge across a range of heights from only sparse observations collected at a few altitude layers. To address this challenge, this paper studies height‑conditioned cross‑height channel knowledge map (CKM) prediction for UAV‑assisted communications in geometry‑rich urban environments. We develop a geometry‑aware conditional prediction framework that combines urban scene priors, sparse multi‑altitude observations, and target‑height descriptors to reconstruct dense CKMs at unobserved target heights. An uncertainty head is further introduced to characterize prediction confidence and to support cost‑aware online UAV sensing under motion and safety constraints. Experiments on a layered aerial CKM benchmark show that the proposed Feature Pyramid Network (FPN)‑Transformer achieves the best overall performance under both unseen‑scene zero‑shot and legacy patch‑random protocols, reducing the Root Mean Square Error (RMSE) to 5.347 dB and 1.111 dB, respectively, compared with 6.937 dB and 1.221 dB for the strongest baseline 3D‑RadioDiff. Moreover, after applying our unseen‑scene few‑shot adaptation, the RMSE further decreases from 5.347 dB in zero‑shot prediction to 3.518 dB with 10‑shot two‑height support, while the uncertainty‑guided cost‑aware sensing policy improves active reconstruction from 6.94 dB at initialization to 4.79 dB at sensing budget 40, outperforming uncertainty‑only sensing at 5.08 dB and random aerial sampling at 5.84 dB.
Authors: Wen-Yu Dong, Weiwei Jiang, Song Zhao, Qi Bi, Sheng Chen
Abstract: The fundamental limits of information flow in spatial networks are usually characterized under stationary spatial point processes, but this assumption cannot capture non‑stationary regimes where the node intensity field evolves continuously in space and time. This paper develops Fluid‑Spatiotemporal Stochastic Geometry (F‑STSG), treating dynamic network topology as a hydrodynamic limit of the discrete node constellation. We formulate the identification of latent network dynamics as an inverse boundary value problem and, using the minimum kinetic energy principle from optimal transport, establish the existence and uniqueness of a scalar potential field governing the compressive evolution of network load. The resulting field‑theoretic formulation couples continuous Lagrangian transport with discrete Eulerian interference geometry. Based on this model, we derive the information flux vector as a sufficient statistic for macroscopic advection and the material derivative as a kinematic predictor of topological divergence. We further characterize non‑stationary network limits through energy‑density scaling and source‑channel interpretation, showing how coordination overhead, topology deformation, and control signaling requirements are linked to the kinematic entropy of the evolving network topology.
Authors: Ke Wu, Yanan Zhang, Yingjie Gao, Wenhao Li, Chenyu Zhou, XinZhu Ma, Jiaxin Chen, Di Huang
Abstract: Object detection for Unmanned Aerial Vehicles (UAVs) working in open and dynamic environments is a highly challenging task. While Vision‑Language Models (VLMs) have offered a powerful solution for universal object detection, adapting them to UAV scenarios remains non‑trivial due to a substantial domain gap between VLM pre‑training data and aerial imagery. The prevailing Parameter‑Efficient Fine‑Tuning (PEFT) methods prove ineffective in bridging this gap, as VLMs' "natural‑scene, foreground‑dominant" visual priors misalign with the "bird's‑eye‑view, background‑dominant, small‑object" characteristics of UAV data. To address this issue, we propose DroneFINE, a novel PEFT paradigm comprising two domain‑aware complementary modules tailored for VLM‑based drone image detectors. Specifically, a data‑dependent, foreground‑aware, and multi‑path adaptation mechanism named HyperAdapter is designed, which overcomes the static structural constraints of PEFT. In addition, a background suppression algorithm named SemanticGate is developed. It is a text‑conditioned guidance strategy that employs background vocabulary to actively guide the model in suppressing responses from irrelevant regions. Extensive experiments on VisDrone and UAVDT demonstrate that DroneFINE significantly outperforms existing PEFT methods and achieves performance comparable to full fine‑tuning while substantially reducing the number of trainable parameters.
Authors: Bohan Li, Min Ye, Haochen Liu, Yongkang Gong, Ning Gao, Jie Nie, Pei Xiao, Xiuzhen Cheng
Abstract: This paper studies high‑altitude platform (HAP)‑assisted sparse cooperative integrated sensing and communication (ISAC) for UAV‑enabled ocean monitoring. A fleet of rotary‑wing UAVs senses drifting buoys, collects their monitoring data, and reports local posterior estimates to a HAP that performs fusion and sparse cooperation control. The model explicitly accounts for a spatially correlated sea‑patch field, patch‑aware buoy dynamics, RCS‑ and clutter‑aware echo sensing, fused posterior Cramér‑Rao bounds (PCRBs), and propulsion‑energy‑limited UAV mobility. The long‑horizon objective is cast as a queue‑weighted buffered‑collection Markov decision process rather than instantaneous throughput, where each buoy maintains a backlog of buffered observations. The resulting long‑horizon design is formulated as a mixed discrete‑continuous problem with sensing, communication, mobility, safety, buffered‑collection, and onboard‑energy constraints. To address the combinatorial association component without replacing learning by a deterministic optimizer, we propose a structured feasible‑association graph‑MARL framework. A heterogeneous graph encoder produces candidate‑edge logits, and a masked sequential b‑matching policy samples legal UAV‑buoy associations while exactly satisfying UAV‑load and buoy‑cluster constraints. A MAPPO‑style training procedure, an independent queue‑state value critic, and a consistency‑verification protocol are then specified to support reproducible training. Simulation results on congested maritime scenarios show that the proposed policy improves the cumulative queue‑weighted collection utility by about 106% over the rate‑driven deterministic decoder, maintains a large margin across sea‑state sweeps and medium‑to‑heavy traffic loads, and transfers to larger networks without fine‑tuning.
Authors: Hamid Shiri, Mehdi Bennis
Abstract: This article presents a communication‑aware and risk‑aware predictive latent control (CRPL) framework for unmanned aerial vehicle (UAV) systems operating under partial observability and uncertain environment dynamics. CRPL integrates a joint‑embedding predictive architecture (JEPA) with probabilistic communication and safety constraints to jointly optimize UAV motion and transmission power. The learned latent model generates recursive multi‑step rollouts, enabling the controller to anticipate future motion, channel degradation, and collision risk. These predictions are incorporated into a unified safety‑aware optimization framework for proactive, energy‑aware trajectory and communication adaptation. Simulation results show that CRPL closely approaches the performance of an oracle analytical predictive controller and outperforms reactive constrained and unconstrained baselines under limited bandwidth and dynamic uncertainty. In the bandwidth‑limited regime, CRPL reduces terminal error, i.e., the final UAV‑to‑goal distance, by up to a factor of approximately 3 and outage duration by up to approximately 18, while also lowering communication energy and collision risk. These improvements are achieved with only a moderate motion‑energy overhead, demonstrating a favorable trade‑off among mobility effort, communication reliability, and operational safety.
Authors: Feibo Jiang, Li Dong, Lei Mao, Kezhi Wang, Xianbin Wang, Abbas Jamalipour
Abstract: Unmanned Aerial Vehicles (UAVs) have become key enabling platforms for low‑altitude economic networks, yet achieving efficient and adaptive optimization under resource‑constrained and dynamic environments remains challenging. This paper investigates language models for UAV‑enabled Wireless Power Transfer (WPT) systems. First, a lightweight Small Language Model (SLM)‑based solution is developed using a pre‑trained BERT backbone, enhanced UAV embeddings and contextual features, a geometry‑aware path decoder, and ensemble inference to achieve low complexity, low latency, and high energy efficiency. Second, an Agentic AI‑based framework is designed to exploit the reasoning and interactive capabilities of Large Language Models (LLMs). It integrates four collaborative agents (Initializer, Actor, Critic, and Reflector) to form a closed loop of generation, optimization, evaluation, and reflection for iterative UAV path and energy optimization. Finally, simulations compare the SLM‑, LLM‑, and Agentic AI‑based approaches.
Authors: Louis Petit, Alexis Lussier Desbiens
Abstract: Many path planning algorithms have been introduced so far, but most are costly, in path cost and in processing time, in large‑scale uncluttered 3D environments such as underground mining stopes explored by an unmanned aerial vehicle (UAV). Rapidly‑exploring Random Tree (RRT) algorithms are popular because of their probabilistic completeness and rapidity in finding a feasible path in single‑query problems. Many of the algorithms (e.g. Informed RRT, RRT#) developed to improve RRT need considerable time to converge in large environments. Shortcutting an RRT is an old idea that has been proven to outperform RRT variants. This paper introduces a new method, RRT‑Rope, that aims at finding a near‑optimal solution in a drastically shorter amount of time. The proposed approach benefits from fast computation of a feasible path with an altered version of RRT‑connect, and post‑processes it quickly with a deterministic shortcutting technique, taking advantage of intermediate nodes added to each branch of the tree. This paper presents simulations and statistics carried out to show the efficiency of RRT‑Rope, which gives better results in terms of path cost and computation time than other popular RRT variations and shortening techniques in all our simulation environments, and is up to 70% faster than the next best algorithm in a representative stope.
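Illustration: deterministic shortcutting of a feasible path can be sketched as a greedy pass that, from each node, jumps to the farthest later node reachable in a straight line. This is a generic shortcutting sketch, not the RRT‑Rope algorithm. The 2D setting, the single circular obstacle, and all function names are assumptions made for the example.

```python
import math


def segment_hits_circle(p, q, center, radius):
    """True if segment p-q passes within `radius` of `center`."""
    px, py = p
    qx, qy = q
    cx, cy = center
    dx, dy = qx - px, qy - py
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0
    else:
        # Parameter of the closest point on the segment, clamped to [0, 1].
        t = max(0.0, min(1.0, ((cx - px) * dx + (cy - py) * dy) / length_sq))
    nearest = (px + t * dx, py + t * dy)
    return math.dist(nearest, center) < radius


def shortcut(path, is_free):
    """Greedy shortcut: connect each kept node to the farthest visible node.

    Assumes every consecutive edge of `path` is already collision-free.
    """
    result = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not is_free(path[i], path[j]):
            j -= 1
        result.append(path[j])
        i = j
    return result


def path_length(path):
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    obstacle = ((2.0, 0.0), 1.0)  # hypothetical circular obstacle
    free = lambda a, b: not segment_hits_circle(a, b, *obstacle)
    raw = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0), (3.0, 2.0), (4.0, 0.0)]
    short = shortcut(raw, free)
    print(short)  # [(0.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
    print(path_length(raw) > path_length(short))  # True
```

A path with more intermediate nodes gives the pass more candidate endpoints to connect, which is one plausible reading of why the abstract mentions adding intermediate nodes to each branch.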
Authors: Claire Sun, Tanya Berger-Wolf, Jenna Kline
Abstract: Reliable individual re‑identification (re‑ID) of wildlife is essential for population monitoring, behavioral tracking, and conservation policy evaluation, yet large‑scale data collection remains labor‑intensive, relying on manual efforts by ecologists or citizen scientists. We propose an autonomous drone navigation system that actively optimizes image capture for downstream re‑ID, moving beyond passive aerial sensing. The system combines YOLOv11 object detection with a DINOv2‑based pose classifier to guide real‑time flight decisions: detecting animals, orienting to expose the lateral flank (the surface of interest for pattern‑based re‑ID), and approaching until the subject meets a minimum bounding‑box threshold. Unlike prior drone systems that optimize for group‑level behavioral video, ours targets the specific image‑quality requirements of individual‑identification models. We demonstrate feasibility through a case study on zebra using footage collected in Kenya, and show the approach generalizes to other species with diagnostic surface patterns, including giraffes, tigers, and elephants. Our work establishes a framework for task‑aware embodied AI for ecological data collection, in which downstream re‑ID requirements drive real‑time perception and control.
Authors: Wen Jiang, Hanfang Liang, Li Wang, Kangyao Huang, Wang Xu, Wei Fan, Jinyuan Liu, Shaoyu Liu, Hongwei Duan, Bin Xu, Xiangyang Ji, Huaping Liu
Abstract: Recent advances in multimodal large models have significantly improved UAV vision‑language navigation (UAV‑VLN) by enhancing high‑level perception and reasoning. However, existing methods mainly focus on predicting discrete actions, local targets, or sparse waypoints, while the continuous transition from navigation intent to executable UAV motion remains weakly modeled. This motion‑interface gap limits the continuity, stability, and executability of generated UAV trajectories. To address this gap, we propose DynFly, a dynamic‑aware continuous trajectory generation framework that bridges high‑level navigation reasoning and executable UAV motion through a lightweight trajectory generation layer. Specifically, it represents expert trajectories in B‑spline control‑point space and employs a Spline‑DiT generator to learn conditional trajectory generation via flow matching. Furthermore, we introduce UAV‑oriented dynamic‑aware supervision over position, finite‑difference velocity, finite‑difference acceleration, heading consistency, and local target alignment, enabling the generated trajectories to better satisfy UAV motion characteristics. Our trajectory generation framework can also be integrated with an existing UAV‑VLN framework while preserving its original visual‑language reasoning pipeline. Extensive experiments on the OpenUAV UAV‑VLN benchmark show that DynFly improves both navigation performance and trajectory quality. On the Test Unseen Full split, DynFly improves the strongest baseline by 4.69 NDTW, 2.40 SDTW, 2.14 SR points and 4.87 OSR points, while reducing NE by 4.51 m.
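Illustration: the finite-difference velocity and acceleration terms mentioned above can be computed from uniformly sampled trajectory points as in the sketch below. It shows only the differencing step. The sampling interval `dt`, the forward-difference scheme, and the function names are assumptions, and no loss weighting from the paper is reproduced.

```python
import math


def finite_difference(samples, dt):
    """Forward differences of a sequence of equal-length tuples."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    return [tuple((b_k - a_k) / dt for a_k, b_k in zip(a, b))
            for a, b in zip(samples, samples[1:])]


def max_norm(vectors):
    """Largest Euclidean norm in a list of vectors."""
    return max(math.hypot(*v) for v in vectors)


if __name__ == "__main__":
    dt = 0.5
    # Hypothetical trajectory: x = t^2, constant y and z.
    positions = [(t * t, 0.0, 1.0) for t in (0.0, 0.5, 1.0, 1.5, 2.0)]
    velocities = finite_difference(positions, dt)
    accelerations = finite_difference(velocities, dt)
    print(velocities[0])            # (0.5, 0.0, 0.0)
    print(accelerations[0])         # (2.0, 0.0, 0.0)
    print(max_norm(accelerations))  # 2.0
```

Comparing such differenced quantities between generated and expert trajectories is one way to penalise motion that matches positions but implies unrealistic speed or acceleration.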
Authors: Francisco S. Neves, Pedro N. Pereira, Raul D. S. G. Campilho, Andry M. Pinto
Abstract: Autonomous aerial inspection of marine infrastructure is frequently compromised by stochastic sea states, introducing risks of high‑kinetic impacts, post‑landing toppling, and sensory occlusion. This paper proposes a decoupled, multi‑vehicle landing framework synchronizing an Unmanned Surface Vehicle (USV) equipped with a 3‑RPU stabilized platform with a robust Unmanned Aerial Vehicle (UAV). The architecture utilizes two independent Deep Reinforcement Learning (DRL) agents: a Soft Actor‑Critic (SAC) agent providing high‑frequency wave‑motion compensation for the landing deck, and a multimodal RL agent for the UAV's final approach. Evaluated in high‑fidelity maritime simulations, the system achieved a 100% landing success rate across 15 trials in wave states varying from calm to rough. Results show a mean stabilization efficacy of 87.8%, maintaining the landing surface within 1 degree of the horizontal plane for 96% of the mission duration in rough conditions, effectively contributing to safer landings.
Authors: Chen Min, Shuli Lv, Pengda Mao, Huixin Cao, Li Hong, Quan Quan
Abstract: Due to the limited endurance of embedded energy sources such as lithium‑polymer (LiPo) batteries, the flight duration and operational range of unmanned aerial vehicles (UAVs) are severely constrained. Although energy‑efficient trajectory planning and control have been widely studied, most existing approaches rely on accurate system models and computationally expensive optimization procedures. This paper proposes a model‑free online iterative learning (IL) framework to minimize energy consumption. Without requiring explicit models of UAV dynamics or energy consumption, the proposed method improves energy efficiency while maintaining a low computational cost. The per‑iteration computational complexity is O(n), where n denotes the number of path points. In the tested cases, the proposed method is approximately 50 to 60 times faster than the model‑based IPOPT benchmark. Simulation results and real‑world flight experiments across multiple UAV platforms validate the effectiveness, computational efficiency, and practical applicability of the proposed approach.
Authors: Wenyi Zhang, Fanglong Yao, Youzhi Liu, Peng Hu, Zhengqiu Zhu, Chen Gao, Xian Sun, Kun Fu
Abstract: With the rapid advancement of aerospace embodied intelligence, enabling Unmanned Aerial Vehicles (UAVs) to autonomously understand and reason about complex environments has become increasingly important. However, existing UAV‑based spatial reasoning approaches face critical limitations: single‑view perception renders them vulnerable to occlusions and perspective distortions, while most VLMs lack explicit geometric modeling, relying on semantic cues and yielding inconsistent reasoning under viewpoint and scale variations. To address these challenges, we propose SatAgent, a UAV‑Satellite collaborative spatial reasoning model inspired by the dual‑pathway mechanism of the human visual system. By jointly leveraging satellite and UAV perspectives, SatAgent enables robust, accurate reasoning in complex urban environments. We first introduce a Geometric‑Aware 3D Reconstruction Encoder that elevates 2D UAV features into explicit 3D spatial representations. Next, we design a multi‑view topology‑semantic alignment module integrating cross‑view features within a unified BEV coordinate system. We further introduce a multi‑view consistency loss encouraging viewpoint‑invariant representations. Finally, we construct SatAgent‑SR130K, the first large‑scale UAV‑Satellite collaborative multi‑view spatial reasoning dataset. Experiments show SatAgent outperforms state‑of‑the‑art general‑purpose foundation models and specialized spatial reasoning models by 25.91% and 11.69%, respectively, across diverse tasks, achieving particularly high accuracy in complex geometric relationship reasoning.
Authors: Tianshun Li, Hongliang Lu, Haoang Li, Xinhu Zheng
Abstract: Accurate 4D trajectory prediction and closed‑loop tracking are essential for Unmanned Aerial Vehicle (UAV) swarms to achieve safe and efficient operations in complex low‑altitude environments such as urban airspaces, industrial sites, and indoor facilities. However, this task remains challenging due to the intrinsic nonlinearity of UAV swarm dynamics and the strict real‑time constraints of swarm formation control. To address these challenges, we propose a unified framework that couples coarse‑to‑fine trajectory forecasting with uncertainty‑aware Distributed Nonlinear Model Predictive Control (DNMPC). Our approach features two key innovations: 1) a dimension‑decoupled trajectory prediction module that reduces computational complexity by forecasting axis‑wise motion, and 2) a diffusion‑based residual dynamics refinement module that captures temporally correlated dynamic uncertainties. These refined predictions are then integrated into a DNMPC loop to ensure formation stability. We also introduce a synchronized multi‑scenario 4D UAV swarm dataset spanning six representative airspace scenarios. The dataset contains over 7,900 frames of synchronized three‑UAV trajectories with frame‑level annotations of speed intention and target sector. Extensive experiments demonstrate that our approach outperforms state‑of‑the‑art baselines, reducing trajectory tracking error by up to 10‑15% and achieving sub‑0.07 m average tracking error in complex urban and industrial environments, while maintaining real‑time inference speeds of 34 FPS (sub‑30 ms latency) suitable for agile flight.
Authors: Xinyi Liu, Xiaoya Cheng, Rouwan Wu, Zhaochen Wang, Shen Yan, Maojun Zhang, Yu Liu
Abstract: Real‑time, drift‑free UAV geo‑localization is essential for autonomous missions in GNSS‑denied environments. The pioneering system, PiLoT, achieves high precision via Neural Pixel‑to‑3D Registration, aligning UAV video streams with a single rendered reference view from 3D meshes. However, its reliance on heavy 3D meshes incurs massive storage overheads, complex map acquisition, and significant computational rendering costs, severely hindering deployment on embedded platforms. To address these bottlenecks, we propose PiLoT v2, a lightweight yet robust evolution that shifts the paradigm to direct pixel‑to‑orthogonal map registration for free‑view UAV geo‑localization. By leveraging True Digital Orthophoto Maps (TDOMs) and Digital Surface Models (DSMs) as the reference substrate, PiLoT v2 replaces GPU‑intensive 3D rendering with a highly efficient, CPU‑friendly map cropping operation. To bridge the severe geometric discrepancy between these 2.5D orthogonal crops and free‑view oblique UAV imagery, we train a cross‑view feature registration network using a novel, large‑scale geometrically annotated dataset. Furthermore, we integrate onboard sensor priors, specifically gravity direction and single‑point laser range, directly into the pose optimization manifold to enhance robustness against cross‑view visual degradation. Experimental results demonstrate that PiLoT v2 achieves performance comparable to, or even exceeding, its Pixel‑to‑3D predecessor, while offering drastically lower storage and computational costs.
Authors: Sheng Zhang, Qinglin Li, Yuechao Zang, Xueqin Huang, Yijia Fu, Cheng Zhu
Abstract: Large language models (LLMs) provide a promising interface for high‑level robotic task planning, but their use in multi‑UAV collaboration remains difficult to evaluate systematically. Existing UAV simulators mainly emphasize dynamics, perception, or low‑level control, while existing LLM‑agent benchmarks rarely capture aerial‑robotics constraints such as partial observability, spatial coverage, UAV assignment, and multi‑vehicle coordination. To bridge this gap, we present MultiUAV‑Plat, a lightweight, easy‑to‑use, LLM‑agent‑oriented simulation platform for multi‑UAV collaborative task planning. The platform exposes concise RESTful APIs, agent‑facing observations, role‑based information access, hidden validation logic, and optional 2D/3D visualization, allowing agents to solve missions through realistic tool interaction rather than privileged simulator access. Built on this platform, the MultiUAV‑Plat Benchmark contains 75 mission sessions, 1500 natural‑language tasks, and 9396 validation checks across target assignment, area search, and area assignment and patrol scenarios. We further propose Agent4Drone, a task‑specific LLM agent framework that structures multi‑UAV behavior into memory, observation, task understanding, planning, execution, and verification. In a full paired benchmark comparison, Agent4Drone achieves a 57.9% task pass rate, a 74.6% average task check pass rate, and a 72.0% global check pass rate, substantially outperforming a ReAct baseline at 30.6%, 47.9%, and 43.1%, respectively. Agent4Drone also reduces the total failed task rate from 32.4% to 12.9%. These results demonstrate that MultiUAV‑Plat and MultiUAV‑Plat Benchmark provide a reproducible foundation for studying LLM‑driven multi‑UAV autonomy under realistic information and execution constraints.
Authors: Ilyar Asl Sabbaghian Hokmabadi, Mahdis Bisheban
Abstract: Accurate extrinsic calibration between inertial sensors, such as Inertial Measurement Units (IMUs), and cameras is crucial for trajectory estimation of Uncrewed Aerial Vehicles (UAVs). While numerous calibration methods have been proposed, these techniques often rely on specialized equipment, planar targets, and an initial estimate of the calibration parameters. In this research, we propose a targetless calibration method designed for UAVs equipped with IMUs and RGB‑Depth (RGB‑D) cameras. Our approach leverages deep‑learning‑based floor segmentation to extract ground points from the depth channel of RGB‑D images. Subsequently, the normal vector to these points is estimated. The known orientation of the normal to the floor segment and the gravity vector sensed in the accelerometer's frame are utilized in a robust estimation approach to estimate the extrinsic calibration parameters. We show that the developed method outperforms MATLAB's toolboxes and exhibits similar performance to Kalibr without the use of specialized checkerboard targets.
Authors: Hiranya Udagedara, Adam Bigsby, Mahdis Bisheban
Abstract: The use of quadrotor UAVs for wind velocity estimation is gaining popularity in recent studies owing to their maneuverability, compact size, and low cost. Among available approaches, model‑based wind velocity estimation is most commonly used, since it relies only on onboard sensors. However, the quadrotor is a highly nonlinear system, which makes this task challenging. This study evaluates the use of both discrete and continuous dynamic equations of the quadrotor UAV for wind velocity estimation on SE(3), rather than the commonly adopted continuous or discretized form. A Lie Group Variational Integrator, developed from the discrete Lagrangian, is used as the discrete model without any approximation or discretization. The study assesses both the discrete and continuous forms of the quadrotor dynamics on SE(3) using an Extended Kalman filter (EKF) and an Unscented Kalman filter (UKF). The quadrotor UAV performance is evaluated in both MATLAB‑based numerical simulations and free outdoor flight. The numerical simulations are conducted during both hovering and trajectory‑tracking flights. Results demonstrate that, by using discrete SE(3) dynamics coupled with UKF, the quadrotor achieves higher estimation accuracy while maintaining trajectory tracking, even with low‑cost sensors. These findings highlight the potential of discrete quadrotor models with UKF not only for wind velocity estimation but also for other high‑accuracy tasks, even when relying on low‑cost onboard sensors.
Authors: Juanqin Liu, Leonardo Plotegher, Eloy Roura, Shaoming He
Abstract: For uncrewed aerial vehicles (UAVs), estimating six‑degree‑of‑freedom (6‑DoF) poses is essential for airspace situational awareness, target tracking, and counter‑UAV operations. However, non‑cooperative targets usually lack computer‑aided design (CAD) models and keypoint priors, making existing model‑based or keypoint‑matching methods difficult to apply reliably. To address these challenges, this paper proposes MF‑UAVPose6D, a model‑free monocular 6‑DoF pose estimation framework for fixed‑wing UAVs. During inference, the method takes only a single red‑green‑blue (RGB) image and camera intrinsics as input. It first obtains a stable target anchor through heatmap‑guided center localization, introduces a Perspective‑Aware Module (PAM) to model observation‑ray priors, exploits Dynamic Topological Sampling (DTS) to complement weak structural cues from the wings, fuselage, and tail, and adopts a decoupled translation‑rotation pose decoding mechanism to estimate the 6‑DoF pose. In addition, we construct the FW‑UAV6DPose synthetic dataset, which covers fixed‑wing UAV observations across diverse distances, viewpoints, and poses. Experimental results show that MF‑UAVPose6D achieves accurate and efficient monocular 6‑DoF pose estimation without requiring CAD models, and demonstrates strong robustness in long‑range rotation estimation, depth recovery, and joint pose evaluation.
Authors: Cong Hoang Quach, Chi Thanh Vo, Dong LT. Tran, Truong Son Nguyen, Manh Duong Phung, Thuan Hoang Tran
Abstract: Accurate localization of unmanned aerial vehicles (UAVs) is essential for applications such as structural health monitoring, especially in environments where Global Positioning System (GPS) signals are denied or unreliable, such as indoor spaces, tunnels, urban canyons, or areas beneath large structures. To address this challenge, we propose Cross‑Fusion, a novel method for real‑time UAV localization that integrates data from a 3D Light Detection and Ranging (LiDAR) sensor and a monocular camera. A key contribution is its cross‑session fusion strategy, which integrates visual and geometric information collected from multiple agents during routine baseline surveys to improve localization consistency and map completeness. The system employs LiDAR‑based odometry for motion tracking and image‑based feature matching via a single red‑green‑blue (RGB) camera to correct drift and improve accuracy. Unlike visual‑inertial systems, Cross‑Fusion maintains a simple sensor setup and avoids the complexity of stereo or global shutter configurations. Experimental results demonstrate that Cross‑Fusion achieves localization accuracy comparable to GPS‑based methods and performs reliably in challenging feature‑sparse environments.
Authors: Kaushlendra Pandey, Nithin V Sabu, Abhishek K. Gupta
Abstract: Urban vehicular networks (VNs) demand seamless connectivity and situational awareness within road‑constrained environments, motivating the deployment of unmanned aerial vehicle (UAV) platforms capable of simultaneously sensing vehicles and establishing communication with them. In this paper, we present a sensing‑assisted UAV network that provides connectivity to the vehicles in an urban area. The road network of the urban area is modeled as a Manhattan Poisson line process (MPLP), and the random locations of vehicles on each road are modeled as one‑dimensional Poisson point processes (PPPs). UAVs are distributed in the urban area at a fixed altitude and provide connectivity after sensing the vehicles. Their locations are modeled as a two‑dimensional homogeneous PPP. Combined with the fixed altitude, this results in a three‑dimensional spatial configuration. We incorporate an elevation‑dependent blockage model and define the sensing radius based on detection probability (DP), showing that it is jointly limited by signal strength and blockage effects. We derive the DP and characterize the typical UAV's sensing region within the reliability requirements. We also derive the Laplace transform (LT) of aggregate interference accounting for directional patterns and sensing‑driven activity, and analyze the resulting coverage probability (CP). Finally, we obtain the rate coverage (RC) of sensed vehicles falling within the UAV's sensing zone. Numerical results show that increasing altitude degrades sensing and coverage performance, whereas RC exhibits a non‑monotonic trend, first decreasing and then increasing with altitude.
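The spatial model in the abstract above places UAVs according to a two‑dimensional homogeneous PPP at a fixed altitude. A minimal sketch of how such a deployment could be simulated on a square window is given below. The function names, the square window, and the parameter values in the usage note are assumptions for illustration only; the road (MPLP) and vehicle processes from the paper are not reproduced.

```python
import random


def sample_poisson(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential arrivals in [0, mean)."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_uav_ppp(density, side, altitude, rng):
    """Sample UAV positions from a homogeneous 2D PPP on a side x side square.

    density  : expected number of UAVs per unit area (hypothetical parameter name)
    altitude : common fixed height, appended as the z coordinate
    Returns a list of (x, y, z) tuples.
    """
    n = sample_poisson(density * side * side, rng)
    # Conditioned on the count, PPP points are independent and uniform in the window.
    return [(rng.uniform(0.0, side), rng.uniform(0.0, side), altitude)
            for _ in range(n)]
```

For example, `sample_uav_ppp(0.05, 20.0, 100.0, random.Random(0))` draws one realization with an expected 20 UAVs; averaging a statistic over many realizations gives a Monte Carlo check against analytical expressions.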
Authors: Yulin Huang, Shaojie Chen, Di Feng, Jiahao Wang, Ping Liu, Jianxiao Zou
Abstract: End‑to‑end unmanned aerial vehicle (UAV) navigation can achieve impressive agility in simulation, yet its obstacle‑avoidance behavior often degrades after deployment because the policy must tolerate simulator mismatch, sensing irregularity, and variable‑rate control. These effects are especially dangerous in cluttered environments, where stale observations or short control irregularities can directly lead to collisions. We present LNN‑Fly, a deployment‑oriented continuous‑time navigation policy for LiDAR‑based UAV obstacle avoidance. The policy combines a dynamic‑programming‑inspired structured recurrent update, explicit conditioning on the elapsed control interval Δt, and an input‑driven adaptive forgetting gate that refreshes stale latent state near hazards while preserving consistency during sustained maneuvers. It is trained with differentiable rollouts that incorporate deployment‑relevant sensing and timing perturbations. In simulation, LNN‑Fly improves obstacle‑avoidance performance in the tested settings and shows better tolerance to reduced control frequency, sparse observations, and control‑period jitter. It also transfers zero‑shot from a simplified differentiable simulator to a physical quadrotor. In indoor cross‑frequency real‑world tests, the system achieves 100% success over 20 flights, while policy inference has a median latency of 0.514 ms on a desktop graphics processing unit (GPU) and about 2.5 ms on the onboard central processing unit (CPU), with onboard P95 latency below 30 ms.
Authors: Zhenyu Liang, Xiao Zhang, Boyu Wang, Zhaolun Liang, Ang Li, Jeff Chak Fu Chan, Mingzhu Wang, Jack C. P. Cheng
Abstract: Existing digitization of buildings with reflective glass facades suffers from geometric reconstruction distortion, unrealistic view‑dependent texture rendering, and difficulties in object‑based semantic enhancement. Therefore, we propose RefGlass‑GS, a fusion framework that enables end‑to‑end UAV‑based photorealistic, semantic, and interactive digitization of reflective glass facades. The contributions include: (1) proposing an individual glass panel segmentation method based on maximum a posteriori estimation with structural regularities, robust to severe reflection and background interference; (2) formulating a UAV viewpoint planning optimization function that maximizes the coverage of view‑dependent appearance for sufficient data capture; (3) developing an optimized Gaussian Splatting framework with a Reflection MLP, a novel deferred shading function, and two enhanced regularization terms for effective modeling of high‑frequency near‑field reflections; (4) introducing a standardized data organization paradigm for structuring GS‑based representations into object‑based models, facilitating interactive facility management on digital twin platforms. Experiments on real‑world reflective glass facade scenes validate the effectiveness and superiority of the proposed method. Specifically, the glass panel segmentation achieves an improvement of 0.1927 in mIoU over SOTA methods, and only our method enables instance‑level panel extraction. The UAV view planning improves novel view synthesis for reflective facades by 13.15 dB in PSNR compared to commercially used nap‑of‑the‑object planning methods. The RefGlass‑GS modeling outperforms SOTA Gaussian Splatting approaches for reflective scenes with an average improvement of 5.08 dB in PSNR.
Authors: Marwan Dhuheir, Thang X. Vu, Symeon Chatzinotas
Abstract: Industrial 6G networks require ultra‑reliable, low‑latency, and energy‑efficient connectivity in dynamic and blockage‑prone environments, where conventional terrestrial deployments often fail to ensure stable coverage. Hence, in this paper, we propose a RIS‑enabled Open‑RAN framework for integrated terrestrial/non‑terrestrial (TN/NTN) industrial 6G networks, in which UAV‑mounted reconfigurable intelligent surfaces (RISs) cooperate with ground radio units and a high‑altitude platform (HAP) to enhance connectivity for dense industrial IoT devices. Owing to the high dimensionality and strong coupling among decision variables, conventional optimization techniques become computationally intractable. To overcome this limitation, the joint optimization problem of data rate, latency, and energy consumption is formulated as a decentralized partially observable Markov decision process (Dec‑POMDP) and solved using a multi‑agent deep reinforcement learning framework. Simulation results show improvements of up to 75% in data rate, a 25% reduction in latency, and 16% energy savings compared with state‑of‑the‑art learning‑based and non‑RIS baselines, demonstrating the effectiveness of RIS‑assisted Open‑RAN intelligence for industrial 6G networks.
Authors: Haotian Li, Yida Wang, Leyuan Wang, Jinshan Lai, Keyang Wang, Zonghao Guo, Qiang Ma, Liuyu Xiang, Jianwei Hu, Zhaofeng He
Abstract: In recent years, multimodal large language models (MLLMs) have shown strong potential for embodied intelligence, yet their ability to maintain geometrically consistent spatial understanding across heterogeneous views remains under‑evaluated. Existing benchmarks largely focus on single‑agent, single‑view perception, leaving a gap in the systematic assessment of collaborative air‑ground settings, where multi‑scale observations are complementary but introduce scale mismatch, asymmetric occlusion, and reference‑frame inconsistencies. We present AirGroundBench, a diagnostic benchmark for evaluating multi‑view spatial intelligence in heterogeneous UAV‑UGV collaboration. AirGroundBench is built from 11 high‑fidelity simulated environments with 1,021 synchronized air‑ground observation pairs, yielding approximately 62,000 dual‑view, four‑option single‑choice visual question answering instances and 115 closed‑loop vision‑language navigation episodes. It covers 10 task types organized into four progressively demanding capability dimensions: spatial perception, cross‑view alignment, spatial transformation and reasoning, and embodied decision‑making. To support geometry‑grounded evaluation and analysis, we provide structured spatial annotations, including cross‑view object identities and metric 2D and 3D bounding boxes. Evaluations of 13 representative MLLMs under UAV‑only, UGV‑only, and dual‑view input settings reveal consistent bottlenecks: models perform relatively well on spatial perception but struggle with cross‑view alignment and transformation‑intensive reasoning, and these deficits propagate to sequential decision‑making in vision‑language navigation. Although dual‑view inputs provide measurable gains over single‑view variants, a persistent gap from human performance remains, highlighting geometric consistency as a key limitation of current embodied MLLMs.
Authors: Mobin Habibpour, John Spodnik, Niloufar Alipour Talemi, Fatemeh Afghah
Abstract: Wildfire monitoring from UAVs requires reliable reasoning over complex aerial scenes, where smoke, scale variation, and occlusions often limit RGB‑only interpretation. We introduce FlameVQA, a multiple‑choice visual question answering benchmark for UAV‑based wildfire intelligence built on FLAME 3, leveraging paired RGB imagery and radiometric thermal TIFFs for temperature‑grounded, safety‑critical reasoning. FlameVQA includes 34 multiple‑choice questions per image spanning six operational capability groups, covering tasks such as detection, localization, distribution/coverage estimation, cross‑modal reasoning, and flight planning. To ensure label reliability, we combine MLLM‑assisted annotation with deterministic thermal rules and cross‑question consistency checks, followed by human auditing. We also evaluate representative MLLMs on FlameVQA to provide baselines for future work. Results show strong performance when explicit cross‑modal cues are available, but notable failures on presence detection under heavy smoke and on coverage estimation. These findings suggest that current MLLMs require domain‑specific adaptation to better support disaster and wildfire monitoring. The dataset and benchmark code are open‑source at github.com/mobiiin/WildFire_VQA
Authors: Bhavya Dixit, Aayushi Rajgor, Subham Kumar, Rushikesh Patil, Ananthapadmanabhan A., Gaurav S. Kasbekar, Arnab Maity
Abstract: Unmanned aerial vehicle (UAV) swarms rely on distributed coordination and cooperative communication to support scalable operations, extended coverage, and applications such as surveillance and real‑time data exchange. Wireless technologies such as radio frequency (RF) and WiFi are widely used for UAV‑to‑UAV and UAV‑to‑ground control station (GCS) communication but introduce significant security challenges. MAVLink, the predominant communication protocol in UAV systems, provides message integrity and authentication but lacks built‑in encryption, leaving telemetry traffic vulnerable to eavesdropping. In our previous work, we proposed MAVShield, a lightweight encryption framework for MAVLink communications. In this paper, MAVShield, AES‑CTR, Speck‑CTR, ChaCha20, and Rabbit are integrated into four custom‑built UAVs to establish secure communication links over RF and WiFi channels. Their performance is evaluated through flight experiments using a UAV swarm testbed. Encrypted telemetry data enable autonomous formation control and collision avoidance during flight. For collision avoidance, we develop a modified artificial potential field (APF) algorithm that computes attractive and repulsive forces directly in geodetic coordinates, eliminating Cartesian transformations and reducing trajectory oscillations while avoiding local‑minimum trapping. CPU utilization, memory consumption, and packet delivery ratio (PDR) are measured for each encryption scheme. Results show that MAVShield achieves performance comparable to unencrypted communication while outperforming AES‑CTR, Speck‑CTR, ChaCha20, and Rabbit in overall efficiency. Algebraic cryptanalysis and Wireshark‑based traffic analysis demonstrate resistance to key‑recovery attacks and protection of telemetry confidentiality. The results indicate that MAVShield is an efficient and secure solution for UAV swarm communication.
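For readers unfamiliar with artificial potential fields, the sketch below shows the textbook APF force computation in a local Cartesian plane. It is only a baseline illustration: the abstract above describes a modified algorithm that works directly in geodetic coordinates, which is not reproduced here. The function name `apf_force`, the gains, and the influence radius are all assumed values.

```python
import math


def apf_force(position, goal, obstacles, k_att=1.0, k_rep=100.0, influence_radius=10.0):
    """Return the (fx, fy) force from a classic artificial potential field.

    Attractive potential:  0.5 * k_att * |goal - position|^2
    Repulsive potential:   0.5 * k_rep * (1/d - 1/d0)^2 for d < d0, else 0,
    where d is the distance to an obstacle and d0 is the influence radius.
    The force is the negative gradient of the summed potentials.
    """
    fx = k_att * (goal[0] - position[0])
    fy = k_att * (goal[1] - position[1])
    for ox, oy in obstacles:
        dx, dy = position[0] - ox, position[1] - oy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < influence_radius:
            # Magnitude of the repulsive gradient; direction points away from the obstacle.
            magnitude = k_rep * (1.0 / dist - 1.0 / influence_radius) / dist ** 2
            fx += magnitude * dx / dist
            fy += magnitude * dy / dist
    return fx, fy
```

Note that with an obstacle placed directly between the vehicle and the goal, the attractive and repulsive terms of this basic form can cancel, which is the local‑minimum problem the abstract refers to.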
Authors: Feng Pan, Chunran Zheng, Bing Xue, Yukang Cui, Jiayu Wen, Zhiyu Chen, Wei Wang
Abstract: Large‑scale point cloud maps are essential for robotics and spatial intelligence tasks. UAVs provide an efficient means for large‑scale map acquisition; however, due to limited flight endurance and onboard storage, mapping a large‑scale scene within a single flight remains difficult. Existing multi‑session map merging methods can extend the mapping range, yet in UAV scenarios they still struggle to simultaneously suppress long‑range drift and preserve local geometric accuracy. To address this issue, an uncertainty‑aware multi‑session point cloud map merging and coarse‑to‑fine optimization system is proposed. The proposed method first performs initial multi‑session map merging based on a scene graph, and then incorporates RTK observations through an RTK spatiotemporal alignment module, where temporal offsets are estimated using Dynamic Time Warping (DTW), and continuous RTK constraints are recovered using Multi‑Output Gaussian Processes (MOGP) under incomplete sampling and frame dropouts. On this basis, a unified uncertainty‑aware factor graph is constructed, and local geometric accuracy is further improved through iterative plane‑factor refinement. Experiments on real‑world datasets validate the effectiveness and robustness of the proposed method. To facilitate further research and development in the community, our code and dataset will be publicly released.
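The abstract above mentions Dynamic Time Warping for estimating temporal offsets between RTK and mapping data. A minimal sketch of classic DTW on two one‑dimensional sequences follows; it returns the accumulated cost and the alignment path. How the authors turn an alignment into an offset is not stated, so the function name `dtw_path` and the absolute‑difference cost are assumptions for illustration.

```python
def dtw_path(a, b):
    """Classic dynamic time warping between 1D sequences a and b.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning a[i] with b[j], from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

One simple way to read an offset from the result is to inspect the index difference `j - i` at distinctive features of the signal, such as a peak, since flat regions can be matched in many equivalent ways.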
Authors: Soeun Lee, Chanho Kim, Yeji Kang, YoungKi Hong, Byeongkeun Kang
Abstract: Accurate seedling detection during early growth stages is essential for timely replanting and effective crop management in precision agriculture. However, existing studies are mostly evaluated under relatively stable imaging conditions, such as UAV imagery or greenhouse environments, leaving robust detection under severe and spatially heterogeneous illumination in ground‑based outdoor monitoring insufficiently explored. In addition, many illumination‑robust detection methods rely on additional enhancement or feature‑extraction modules, which increase inference‑time overhead and are not tailored to seedling detection and downstream missing seedling localization. To address these gaps, we construct a new garlic seedling dataset captured using a ground‑based monitoring platform under real outdoor field conditions with highly variable illumination. We further propose an illumination‑robust seedling detection framework based on adversarial augmentation policy learning. The proposed method jointly optimizes a stochastic augmentation policy agent and an object detector, enabling the detector to learn robust representations under challenging visual conditions. A structural penalty is introduced to prevent unrealistic distortions while encouraging challenging augmentations during training. Extensive experiments show that the proposed approach achieves an AP_50 of 91.6%, improving the baseline by 0.9 percentage points and outperforming the previous best‑performing method by 0.2 percentage points. For downstream missing seedling localization, it achieves 75.0% precision and a 67.0% F1‑score, improving the baseline by 4.8 and 2.0 percentage points, respectively. These results demonstrate the effectiveness of the proposed framework for practical ground‑based agricultural monitoring under complex outdoor lighting conditions without additional inference‑time computational overhead.
Authors: Rui Zhang, Fuwang Dong, Wei Wang, Zhen Du
Abstract: In sea‑air communication networks composed of an uncrewed aerial vehicle (UAV) and an uncrewed surface vehicle (USV), the extended target characteristics and three‑degree‑of‑freedom motion of the USV under sea‑induced disturbances cause beam misalignment in the UAV's tracking of the USV. To address these issues, this paper proposes a predictive beam tracking scheme based on integrated sensing and communication (ISAC) for sea‑air networks. We develop a wide and narrow beam switching scheme based on sub‑array selection, where a time allocation factor is optimized to balance robust state sensing in the wide beam mode and high‑rate communication in the narrow beam mode. Specifically, a wide beam mode provides full USV coverage and state sensing, while a narrow beam mode exploits the estimated state for high‑gain communication with the communication receiver (CR) mounted on the USV. To characterize the CR motion, a sea‑air state evolution model is derived by jointly considering the surge, sway, yaw, and sea‑induced disturbances of the USV. For the extended target USV, the measurement equation is constructed from multiple scatterer observations, with the measurement noise caused by sea clutter modeled, and an extended Kalman filter (EKF)‑based CR state prediction and estimation method is developed. In addition, the effect of sea clutter on sensing accuracy is incorporated into the time allocation optimization problem to adjust the time of the wide beam mode. Simulation results demonstrate that the proposed scheme achieves higher tracking accuracy than the state‑of‑the‑art benchmark schemes.
Authors: Runfeng Ling
Abstract: Disturbance‑robust UAV position control is easy to demonstrate in benign simulations but much harder to make fast in approach, well behaved near the target, and credible beyond a single benchmark. This letter presents a layered terminal‑control architecture for multi‑waypoint UAV position regulation together with a staged evaluation across PyBullet, PX4/Gazebo, and hardware. Phase I uses a PyBullet benchmark with stochastic wind for rapid structural selection, identifying a controller core that separates smooth approach generation, persistent‑bias compensation, and supervised near‑target terminal regulation. Phase II carries only that main architecture into a more demanding PX4/Gazebo closed loop, where the outer‑loop controller acts through a cascaded flight stack with delay‑sensitive settling and stronger transit‑to‑hover coupling. This step exposes which benchmark gains survive autopilot‑mediated dynamics and which refinements collapse once the loop becomes more deployment‑like. In Phase I, the bare controller attains 0.024 m mean late‑stage wind error. In Phase II, the final controller is selected using a transfer‑oriented rule emphasizing absence of benchmark priors, cross‑scenario balance, and deployable supervisory logic. Strict is used as the primary reporting reference; the supplementary retrospective Grace analysis shows that part of the residual failure set is sensitive to completion semantics rather than gross waypoint‑miss behaviour. The evaluation is completed on one Vicon‑tracked Tello stack through a two‑level hardware study. Taken together, the results suggest that benchmark success becomes more informative when the main controller design is separated from benchmark‑specific refinement and remains defensible under harder closed‑loop evaluation.
Authors: Xuli Cai, Poonam Lohan, Sachin Ravikant Trankatwar, Burak Kantarci
Abstract: In this paper, we investigate delay‑aware task offloading and resource scheduling in a three‑tier space‑air‑ground integrated network (SAGIN) consisting of IoT devices, UAV edge nodes, and a high‑altitude platform station (HAPS). We formulate joint task association and continuous resource control (including bandwidth, transmit power, and CPU frequency allocation) as a non‑convex mixed‑integer nonlinear programming (MINLP) problem, which is inherently NP‑hard. To capture fine‑grained system dynamics, we introduce a macro‑micro slot model that tracks cumulative transmission and computation progress over time. Based on this model, we propose HALO, a hierarchical auction‑assisted learning framework that combines auction‑based task association with hierarchical Proximal Policy Optimization (HPPO) for resource allocation. Simulation results under different traffic loads show that HALO consistently outperforms representative deep reinforcement learning (DRL) baselines. In particular, HALO achieves an average improvement of 8.7 percentage points in task success rate over PPO (corresponding to an 11.4% relative gain) and shows consistently greater robustness than DDPG and SAC, with relative improvements of 32.4% and 89.9%, respectively. These results highlight HALO's ability to maintain stable and efficient performance under varying traffic conditions, making it well‑suited for delay‑sensitive SAGIN environments.
Authors: Samuel Tovey, Stefan Prestel, Hiroshi Yamauchi
Abstract: Locating survivors of building collapses within the first 72 hours is a critical challenge in disaster response, and existing sensing modalities provide only partial information about the structure beneath the rubble. This paper proposes drone‑based quantum magnetometry as a complementary modality and develops a simulation pipeline spanning rubble physics, sensor‑array deployment, and active spatial reconstruction. We use Unreal Engine to generate a steel‑reinforced concrete parking‑garage collapse and compute the induced magnetic field via a per‑triangle dipole approximation, establishing that meaningful magnetic structure is recoverable in the sub‑pT to sub‑nT range from roughly 1 m above the roofline. Then, we feed sparse multi‑sensor samples into a Gaussian Process Regression back‑end driven by Bayesian active sampling and validate the pipeline across multiple independent collapse realizations; a three‑sensor array optimizes the trade‑off between gradient resolution and UAV payload constraints, and active sampling reaches peak structural correlation in roughly 100 samples. Together, these results indicate that quantum‑grade sensing could become a useful tool for drone‑based structural analysis and potentially void detection in collapsed buildings.
Authors: Khaoula Khaled, Muhammad Afaq, Ali Arshad Nasir, Zeeshan Kaleem
Abstract: Unmanned aerial vehicles (UAVs) can provide on‑demand, high‑capacity connectivity in both disaster and normal situations. However, their trajectory optimization suffers from the curse of dimensionality: interference‑limited environments and vast search spaces make real‑time coordination computationally expensive. To overcome this challenge, we propose the Rate‑Aware Quantum‑Annealed Graph Condensation (RA‑QAGC) scheme, which combines rate‑aware graph abstraction with decentralized reinforcement learning to enable scalable, interference‑aware UAV coordination. By identifying high‑throughput locations and guiding UAV trajectory adaptation toward throughput‑optimal regions, RA‑QAGC effectively balances network capacity by maintaining quality‑of‑service (QoS) requirements. Simulation results demonstrate that the proposed scheme outperforms existing schemes, achieving 59.4 Mbps total throughput and 23.9 Mbps priority‑user throughput, which represent gains of approximately 15% and 34%, respectively, over the baseline schemes.
Authors: Oussema Dhaouadi, Zuria Bauer, Johannes Michael Meier, Olaf Wysocki, Marc Pollefeys, Daniel Cremers
Abstract: Continuous 6‑DoF pose estimation is essential for autonomous UAV operations. Yet, existing visual odometry and SLAM methods accumulate drift and yield only relative, up‑to‑scale trajectories. Single‑frame geo‑localization, in turn, discards temporal continuity and remains too slow for real‑time use. We present OrthoTrack, a training‑free system that estimates continuous 6‑DoF UAV trajectories using only publicly available orthophotos and surface models as a map prior. OrthoTrack matches keyframes against the orthophoto and lifts correspondences to metric 3D via the surface model. It then propagates these map‑anchored correspondences to intermediate frames with optical flow, producing absolute, metrically scaled poses at every frame without GPS or post‑hoc alignment. We also introduce the MovingDrone Dataset, a large‑scale benchmark pairing photorealistic UAV sequences with dense 6‑DoF ground truth and co‑registered multi‑modal geodata including multi‑temporal orthophotos. On MovingDrone and real‑world benchmarks, OrthoTrack runs in real time on a single GPU. It outperforms all baselines by a large margin, even those receiving oracle scale and alignment. By relying on publicly available geodata, OrthoTrack enables deployment to new regions without site‑specific adaptation.
Authors: Yang Xiaomeng, Jia Ziye, Zhu Qiuming, Wu Qihui
Abstract: Unmanned aerial vehicles (UAVs) are increasingly employed in urban inspection tasks, where reliable communication is critical but challenging due to severe spatial channel heterogeneity. To address this issue, we focus on communication‑aware path planning for multi‑UAV tasks and propose a channel knowledge map (CKM)‑driven trajectory planning framework that integrates channel modeling and trajectory decision‑making. Specifically, we apply a diffusion model to construct a time‑accumulated CKM, which leverages sparse observation data to reconstruct the high‑fidelity global channel quality distribution and achieves accurate perception with low flight overhead. Based on the CKM, we propose a global‑to‑local graph attention network soft actor‑critic algorithm. The graph attention network optimizes the complex combinatorial node ordering problem, generating an optimal and communication‑aware sequence for the inspection targets. Subsequently, the soft actor‑critic algorithm performs continuous action control to ensure the smoothness of the flight path and dynamically avoid communication attenuation areas. Simulation results demonstrate that the proposed method effectively guides UAVs through high‑quality channel regions without dependence on real‑time channel feedback, significantly improving both trajectory efficiency and communication reliability.
Authors: Allan Henry, Solange Rossato, Christian Graff, Sylvain Huet, Jose-Ernesto Gomez-Balderas
Abstract: Voice control offers an intuitive alternative to manual drone piloting, yet most existing systems rely on rigid command vocabularies that fail to handle the spontaneous, disfluent speech of naive users. This paper addresses this gap by proposing an End‑to‑End Spoken Language Understanding architecture for real‑time human‑drone interaction in French. Our model combines a frozen Self‑Supervised Learning acoustic encoder with a lightweight LSTM‑based classification head, augmented by a cross‑modal knowledge distillation objective that aligns acoustic representations with semantic embeddings from a text teacher, without requiring transcription at inference time. We evaluate our approach on VoiceStick, a novel French corpus of spontaneous speech collected during real teleoperation sessions with 29 nonexpert dyads. On simple voice commands, our best configuration achieves 93% accuracy at 7 ms inference latency, outperforming cascade baselines (79%, 202 ms) with a 29x speedup. On the full spontaneous speech test set, our architecture reaches 82% accuracy, with crossmodal distillation consistently improving robustness across all configurations. These results demonstrate that End‑to‑End architectures are not only feasible but preferable for spontaneous voice‑guided UAV teleoperation, combining semantic robustness, low latency, and calibrated confidence.
Authors: Chenrui Sun, Swarna Bindu Chetty, Gianluca Fontanesi, Mahnaz Arvaneh, Walid Saad, Hamed Ahmadi
Abstract: The deployment of unmanned aerial vehicles (UAVs) as open radio units (O‑RUs) in 6G cellular systems presents a promising opportunity to achieve scalable and adaptive network coverage. However, optimizing UAV trajectories in dynamic and unfamiliar environments remains a critical challenge, particularly due to the need for extensive retraining in each new scenario. In this paper, we introduce a novel UAV trajectory optimization framework that integrates enhanced continual transfer learning within the O‑RAN architecture. The proposed system maintains a library of pre‑trained models and employs a model selection mechanism to identify and transfer knowledge from the most relevant environments, minimizing adaptation time and improving efficiency. When no sufficiently similar model is available, a fallback model empowered by continuous refinements ensures baseline performance. The framework leverages real‑world city maps and ray tracing techniques to enhance learning reliability and improve trajectory planning. Simulation results demonstrate that the proposed model selection‑based transfer learning approach reduces convergence time by 44% to 56% compared to retraining from scratch, and by up to 40% compared to traditional transfer learning without model selection.
Authors: Keegan Kimbrell, Alexis R. Tudor, Peter Van, Trevor Bihl, Doug Slattery, Gopal Gupta
Abstract: Autonomous unmanned aerial vehicles (UAVs) must operate safely in dynamic environments and adapt to changing mission conditions. Although deep learning approaches have shown strong performance for navigation and perception, they are often difficult to explain, verify, and modify for safety‑critical tasks. We propose a symbolic state‑centered UAV agent using the s(CASP) answer set programming system, enabling autonomous task execution with constraint‑based commonsense reasoning in a high‑fidelity Unreal Engine 5 environment. We fully implement prior work on the VECSR‑A system to support multi‑step autonomous behaviors including navigation, search, debris detection, precision spraying, object transport, and inspection. The UAV reasons over environmental and spatial constraints, dynamically revising plans when tasks fail or data is insufficient. Because decisions are based on commonsense reasoning, they are guaranteed to be correct and explainable. We evaluate the feasibility of s(CASP) for UAV control in realistic simulated missions. Results show that our framework enables explainable, adaptive autonomy without retraining, handling complex constraint‑aware decisions and dynamic task reevaluation.
Authors: Hamed Alimohammadzadeh, Shahram Ghandeharizadeh
Abstract: This paper presents the design and implementation of a Flying Light Speck (FLS) to illuminate English letters. The FLS uses its onboard camera and computing to localize and follow a trajectory to illuminate a letter. We evaluate the illuminations quantitatively and qualitatively. The latter is based on an IRB approved human subject study with 20 participants. The obtained results show a 42 to 56 millimeter error that impacts the detection of letters. A key finding is that the order in which the illumination of letters is presented to subjects has a significant effect on detection duration.
Authors: Houzhang Fang, Ruixuan Huang, Qiuhuan Chen, Xiaolin Wang, Yi Chang, Luxin Yan
Abstract: Infrared small target detection (IRSTD) in high‑resolution images is crucial for many practical applications, such as surveillance of unmanned aerial vehicles (UAVs) and UAV‑based ground monitoring. However, IRSTD remains challenging due to the small size and weak features of targets, as well as significant interference from complex dynamic backgrounds. Existing detection methods often suffer from redundant computations on non‑target background regions and insufficient exploitation of target context information, which limits their performance in complex backgrounds. To address these issues, we propose an efficient coarse‑to‑fine infrared small target detection framework with attention prior‑guided knowledge distillation, termed ECFNet. In the coarse stage, we design a region binary classification network (RBCN) on grid‑based multi‑scale feature maps to efficiently recognize target‑containing context region proposals while suppressing complex backgrounds. Moreover, we introduce a novel denoising‑assisted training strategy that incorporates noisy ground‑truth (GT) masks into the feature maps of RBCN and trains the network to reconstruct the GT masks through a denoising task, thereby enhancing its ability to distinguish target proposals from background regions and accelerating convergence. In the fine stage, we customize a lightweight target detector to the coarse stage's region proposals for balancing accuracy and efficiency. Furthermore, we propose a knowledge distillation strategy guided by the teacher‑student cross‑attention prior. This mechanism directs the student to focus on critical target regions, thereby enhancing the discriminative feature representation for infrared small targets. Extensive experiments on three real infrared datasets demonstrate that our method outperforms both existing single‑stage and two‑stage approaches while maintaining high real‑time processing efficiency.
Authors: Cole Dickerson, Wahab Khawaja, Ismail Guvenc
Abstract: Unmanned aerial vehicles (UAVs) enable numerous commercial and public‑safety applications, yet they also create security risks near critical infrastructure, transportation hubs, and restricted airspace. While integrated sensing and communications (ISAC) can leverage existing wireless networks for UAV surveillance, practical deployment must address competition between sensing and communication demands, as well as the challenges associated with tracking highly maneuverable UAVs with low radar cross section (RCS). This paper investigates adaptive multistatic ISAC for load‑aware UAV detection and tracking in 5G wireless networks. A shared‑resource framework is developed to quantify how sensing waveform length, sensing transmission rate, and beam allocation affect communication throughput in a 5G new radio (NR) system. Detection performance is analyzed using Zadoff‑Chu (ZC) sensing waveforms, while tracking continuity is evaluated through an M‑of‑N detection model. To improve robustness under congestion, software‑defined sensor (SDS) nodes exploit external signals of opportunity (SoO) to provide supplemental passive sensing opportunities when network resources become limited. Results show that adaptive sensing policies outperform fixed sensing reservations by preserving throughput under dynamic load while maintaining useful sensing capability. Under heavy congestion, SDS assistance substantially reduces tracking outage in the simulated scenarios. Cramer‑Rao lower bound (CRLB) analysis demonstrates that multistatic sensing geometries improve localization accuracy and provide more uniform spatial coverage than monostatic sensing alone. These results highlight coordinated adaptive sensing and distributed multistatic support as a practical path toward resilient UAV surveillance in future wireless networks.
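The M‑of‑N detection model mentioned above can be illustrated with a binomial tail probability. The abstract does not give the exact formulation, so this sketch assumes independent looks with an identical per‑look detection probability; the function name and parameters are hypothetical.

```python
from math import comb


def m_of_n_probability(p_detect: float, m: int, n: int) -> float:
    """Probability of at least m detections in n looks, assuming each look
    detects independently with the same probability p_detect."""
    return sum(
        comb(n, k) * p_detect**k * (1.0 - p_detect) ** (n - k)
        for k in range(m, n + 1)
    )


# Example: track continuity under a 3-of-5 rule for several per-look probabilities.
for p in (0.5, 0.7, 0.9):
    print(f"p_detect={p}: P(at least 3 of 5) = {m_of_n_probability(p, 3, 5):.3f}")
```

Under this simplified model, a modest rise in per‑look detection probability yields a disproportionate rise in track continuity, which is one reason supplemental sensing opportunities can matter under congestion.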
Authors: Di Yang, Mahmoud Ali, Quan Kong, Gianpiero Francesca, Francois Bremond
Abstract: Vision‑language models such as CLIP have recently achieved strong performance on a wide range of visual understanding tasks. However, most existing models rely primarily on appearance‑level supervision from images or videos and do not explicitly model human motion, which is essential for fine‑grained, human‑centric action recognition because actions are defined by temporally structured and physically grounded body movements. To address this problem, we propose Transferable skeleton MOtion Representation (T‑MOR), a motion‑aware framework that learns transferable action representations from skeleton sequences with the aid of video and language supervision during training. T‑MOR adopts a multi‑modal contrastive learning scheme that aligns skeleton motion with visual and textual representations, while performing inference using only lightweight skeleton inputs. To support large‑scale pre‑training, we construct PoseCap‑1M, a new dataset that contains over one million synchronized video, skeleton, and text triplets covering diverse human activities. We evaluate T‑MOR on a range of human‑centric action recognition benchmarks, including action classification and frame‑wise temporal detection. Experimental results show that T‑MOR consistently improves performance across multiple datasets, such as Toyota Smarthome, Penn Action, UAV‑Human, TSU, and Charades. In addition, T‑MOR demonstrates strong generalization ability in few‑shot and zero‑shot settings, highlighting the effectiveness of motion‑centric and embodied representations for transferable action understanding.
Authors: Yihong Tian, Junjie Zhang, Liuyang Li, Deteng Zhang, Yunfei Zuo, Jie Yin
Abstract: Reliable localization is essential for intelligent transportation systems (ITS), including autonomous vehicles, quadruped last‑mile carriers, and infrastructure‑inspection unmanned aerial vehicles (UAVs). Although tightly‑coupled multi‑sensor fusion improves accuracy in favorable conditions, deployed systems remain vulnerable to sensor degradation (poor illumination, LiDAR degeneracy, wheel slippage, and GNSS outage) and to spatiotemporal calibration errors. These failures are common in urban canyons, tunnels, and high‑speed corridors, where localization drift can degrade route tracking, tunnel passage continuity, and local map alignment. This paper presents Ultra‑Fusion, a tightly‑coupled multi‑sensor localization framework based on a unified sliding‑window estimator. Asynchronous measurements are timestamp‑ordered and converted into optional factors within one optimization window, supporting WIO, VIO, LIO, and LVIO with optional wheel and GNSS augmentation. Observability‑aware initialization selects the bootstrap mode, factor‑wise reliability scheduling gates degraded measurements, and online LiDAR‑IMU spatiotemporal calibration refines temporal offsets and rotational extrinsics during operation. We extend the M3DGR benchmark with simulation trajectories and evaluate more than 60 open‑source SLAM systems on M3DGR, M2DGR‑Plus, KAIST, GrandTour, and MARS‑LVIG. The results show competitive accuracy across wheeled, legged, and aerial platforms under long‑duration and high‑speed operation, degradation, and calibration perturbation, improving localization availability for road‑level autonomy, campus and warehouse mobility, and low‑altitude aerial inspection. To benefit the industrial and academic community, we will release source code and datasets upon paper acceptance.
Authors: Jost Wittmann, Ahmed Abdullahi Hassan, Nils Kömpe, Andreas Nüchter
Abstract: Inspection of offshore wind turbine rotor blades is critical for predictive maintenance to maximise efficiency and extend operational lifetime. However, it remains a challenging task due to remote locations, large structural dimensions, and the limitations of current UAV‑compatible sensor systems. While existing approaches can detect certain types of surface anomalies, reliable classification of defect types often remains a manual and error‑prone process. This paper presents the design of a UAV‑mounted multimodal sensor network combining an industrial RGB camera, a passive thermal infrared camera, and an in‑house developed 3D scanner. All sensors are co‑calibrated into a common coordinate frame, enabling spatial superimposition of geometric, colour, and thermal data. The system is designed to operate at close range, addressing three fundamental sensing challenges: platform motion, large field of view, and millimetre‑level measurement accuracy. Preliminary laboratory results demonstrate synchronised multi‑sensor acquisition and initial point cloud reconstructions, forming the basis for future airborne inspection trials.
Authors: Ruixing Ren, Junhui Zhao, He Fang
Abstract: Leveraging their flexible and efficient deployment capabilities, unmanned aerial vehicle (UAV) swarms have been widely applied in various mission scenarios. However, the open communication environment also exposes them to the threat of Byzantine attacks. Most existing studies assume independent decision‑making by each UAV, neglecting that local conformity amplifies false information propagation. This paper constructs an evolutionary game model for UAV swarms under malicious attacks based on graph evolutionary game theory, revealing how local conformity rules govern the spread of deceptive strategies. Using death‑birth updating rules, we derive the macroscopic dynamic equation for the fraction of deceptive strategies and the analytical solutions to its evolutionary stable states. Simulations reveal that observation errors weaken malicious induction, while higher proportions of malicious nodes and greater attack intensity drastically amplify attack impacts. Moreover, the model exhibits strong topological robustness across regular, scale‑free, and random networks.
Authors: Yamil Uchani, Grace Abigail Luna Verdueta, Mauricio Figueroa, Edwin Salcedo
Abstract: UAV‑based pavement inspection can reduce the cost and risk of road‑surface monitoring, but real‑world deployment remains difficult when traffic, pedestrians, and temporary occlusions affect the visibility of defects. This paper presents a Unity‑based digital twin framework for traffic‑aware UAV pavement monitoring without lane closure. The proposed environment integrates procedurally generated road defects, dynamic vehicles and pedestrians, autonomous UAV navigation, and an embedded road‑damage perception pipeline. The perception module uses a two‑stage approach: a lightweight YOLOv8n detector first localises road defects, pedestrians, and vehicles, while a second classifier distinguishes among potholes, single cracks, and crocodile cracks. On the simulator test set, the full pipeline achieved 99.26% overall accuracy across five classes. The digital twin was then used to evaluate three recovery strategies for occluded road segments: hover‑and‑recheck, micro‑repositioning, and skip‑and‑revisit. Experiments were conducted across different traffic densities and flight altitudes using coverage, mission time, energy consumption, and revisit ratio as operational metrics. Results show that flight altitude has a strong influence on inspection coverage and that adaptive recovery improves performance under occlusion. In particular, hover‑and‑recheck achieved the most consistent coverage under medium and high traffic conditions, reaching up to 97.03% coverage, while skip‑and‑revisit was most effective in low‑traffic scenarios, reaching 97.95% coverage at medium altitude. These results demonstrate that digital twins can support the development and evaluation of traffic‑aware UAV inspection strategies before real‑world deployment.
Authors: Jingfeng Zhang, Yi Li, Xianchong Liang, Huan Yang
Abstract: Slope hazards constitute a major safety threat to expressway infrastructure, and their evolution is typically manifested as slow surface deformation. Conventional manual inspection suffers from low efficiency and inadequate operational safety, especially on severely deteriorated slopes. Accordingly, there is an urgent need for an automated, high‑precision solution capable of large‑area slope observation and analysis. This study aims to develop a highly automated workflow for slope hazard detection using Unmanned Aerial Vehicle (UAV)‑borne Light Detection and Ranging (LiDAR). The proposed workflow consists of a shared data‑acquisition and ground‑surface extraction stage, a single‑observation hazard‑screening branch based on RandLA‑Net, and a multi‑epoch deformation‑monitoring branch based on grid‑wise elevation differencing. To validate the effectiveness of the proposed system, we conducted multiple UAV‑borne LiDAR data‑acquisition flights in real expressway slope environments. The results show that the workflow can extract usable ground‑surface point clouds under vegetation cover, identify potential hazard zones from single‑observation point clouds, and quantify centimeter‑level elevation changes using multi‑epoch grid differencing. This study establishes an end‑to‑end UAV‑borne LiDAR‑based workflow for slope inspection and demonstrates its feasibility through controlled experiments, field tests, and simulation‑based validation, thereby providing an implementable solution for automated slope‑hazard monitoring and intelligent early warning.
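The grid‑wise elevation differencing used by the multi‑epoch branch can be sketched as follows. This is a minimal illustration only: it assumes ground points have already been extracted as (x, y, z) tuples in a common coordinate frame, uses the per‑cell mean elevation as the statistic, and all function names and the cell size are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def grid_mean_elevation(points, cell_size):
    """Map (x, y, z) ground points to the mean z of each square grid cell."""
    sums = defaultdict(lambda: [0.0, 0])  # cell index -> [sum of z, point count]
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        sums[key][0] += z
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def elevation_change(epoch_a, epoch_b, cell_size=1.0):
    """Per-cell elevation difference (epoch_b minus epoch_a), restricted to
    cells that contain ground points in both epochs."""
    grid_a = grid_mean_elevation(epoch_a, cell_size)
    grid_b = grid_mean_elevation(epoch_b, cell_size)
    return {key: grid_b[key] - grid_a[key] for key in grid_a.keys() & grid_b.keys()}


# Toy example with two acquisition epochs (coordinates in metres).
epoch_a = [(0.2, 0.3, 10.0), (0.8, 0.6, 10.2), (1.5, 0.5, 12.0)]
epoch_b = [(0.4, 0.4, 10.05), (1.2, 0.7, 11.9), (5.0, 5.0, 3.0)]
change = elevation_change(epoch_a, epoch_b)
for cell, dz in sorted(change.items()):
    print(f"cell {cell}: dz = {dz:+.3f} m")
```

Cells observed in only one epoch are dropped rather than interpolated, which keeps the differences conservative; a real workflow would also need co‑registration between epochs and a noise threshold before flagging deformation.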
Authors: Matěj Petrlík, Filip Novák, Robert Pěnička, Martin Saska
Abstract: A precise state estimate is crucial for a tight feedback control that enables agile and near‑obstacle flights of UAVs. The state‑of‑the‑art methods fuse slow pose measurements with high‑frequency inertial measurements to obtain a precise state estimate. However, the inertial measurements from the IMU onboard the UAV are degraded by vibrations from spinning propellers and the precision of the estimated state suffers. We propose a novel approach based on the preintegration of accelerations obtained from motor speeds. We show that the accelerations obtained in this manner can be used for state propagation on their own to achieve better precision without including the IMU. Further, we propose a factor composed of the preintegrated motor speeds that can be directly employed in factor graph optimization frameworks. We combine our factor with LiDAR measurements into the proposed Motor Angular Speed LiDAR Odometry (MAS‑LO) algorithm for precise state estimation, which we open‑source. Lastly, we evaluate the estimation precision against a state‑of‑the‑art inertial algorithm LIO‑SAM to show 28% improvement in position and 65% in velocity estimation accuracy, 14% lower measurement lag, and high robustness to wrong parameter values.
Authors: Shihui Yan, Hu Liu, Junyu Shi, Zihui Zhu, Ziqi Zhou, Yufei Song, Youming Geng, Minghui Li, Shengshan Hu
Abstract: Adversarial camouflage in the physical world remains highly challenging, particularly under UAV reconnaissance where targets undergo continuous geometric changes and extreme illumination variations. Existing methods either optimize 2D digital perturbations that fail to generalize to dynamic viewpoints or produce visually unnatural textures that cannot be deployed in real scenarios. Therefore, we propose an end‑to‑end framework for adversarial camouflage generation that automatically produces wearable adversarial patterns and maintains stable attack performance in real physical environments with changing viewpoints, poses, and lighting conditions. Our method integrates UV‑volume rendering with a diffusion‑based texture generator, enabling consistent appearance under varying scales, poses, and lighting conditions. To ensure environmental realism, we propose an illumination color consistency estimator that extracts dominant background attributes and guides a natural texture loss to align the generated UV texture with the surrounding environment. A multi‑scale dynamic training strategy further enhances robustness against viewpoint shifts and body deformation. Extensive experiments across multiple mainstream detectors demonstrate that our method achieves strong and stable physical attack performance while maintaining high perceptual naturalness, reducing human detection rates without introducing unnatural artifacts.
Authors: Zhiyu Chen, Chunran Zheng, Jiayu Wen, XiaoLei Zhang, Jiaming Xu, Feng Pan, Yukang Cui
Abstract: Robust state estimation and mapping in long‑term, large‑scale, and highly dynamic environments remains a key challenge in robotics. Existing LiDAR‑Inertial‑Visual Odometry (LIVO) systems achieve strong local accuracy but suffer from accumulated drift over long distances and may fail in geometrically degraded or textureless scenes. Meanwhile, GNSS‑aided fusion frameworks often rely on LiDAR or visual odometry for state prediction and outlier rejection, making them vulnerable when odometry degenerates. To address these limitations, we propose a tightly coupled LiDAR‑Inertial‑Visual‑GNSS fusion framework based on an Error‑State Iterated Kalman Filter. An online spatiotemporal alignment module using Dynamic Time Warping is introduced for highly dynamic conditions. To better exploit GNSS precision, we develop observation models based on Doppler shifts and fixed‑anchor Time‑Differenced Carrier Phase, providing millimeter‑level relative constraints without augmenting historical anchor states. We further design a degeneracy‑aware dual‑mode outlier rejection strategy that switches between LIVO‑prior‑guided rejection and GNSS‑aided recovery according to the LIVO degeneracy level. Experiments on the public M3DGR dataset and a custom 20 m/s fixed‑wing UAV dataset demonstrate that our system reduces accumulated drift and map ghosting, outperforming state‑of‑the‑art methods in accuracy and robustness.
Authors: Maneesha Wickramasuriya, Beomyeol Yu, Jaden Shin, Mason Huslig, Taeyoung Lee, Murray Snyder
Abstract: Autonomous UAV operations on ships require reliable vision‑based relative pose estimation, yet at‑sea validation is costly, weather‑dependent, and risky. This paper presents a hardware‑validated vision‑in‑the‑loop framework that enables fully autonomous indoor flight while emulating photorealistic maritime environments. Rendered maritime views are processed onboard by a deep transformer‑based monocular pose estimator. Delayed vision measurements are fused with high‑rate IMU data using a delayed Kalman filter to provide consistent state estimates for geometric control. The system captures critical embedded effects, including perception latency, asynchronous updates, and computational constraints, that are absent in pure simulation. Autonomous takeoff, trajectory tracking, and landing experiments demonstrate stable closed‑loop flight. The results establish a safe and hardware‑realistic intermediate stage for developing maritime UAV autonomy prior to shipboard deployment.
Authors: Azim Akhtarshenas, German Svistunov, Kuangyu Zheng, David Lopez-Perez
Abstract: Unmanned aerial vehicle (UAV)‑mounted base stations are highly susceptible to wind disturbances such as gusts and turbulence, which induce positional drift and degrade communication link quality, particularly in emergency scenarios. To address this challenge, we propose a DRL‑based framework for wind‑resilient trajectory adjustment and positioning based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The method models wind as a stochastic kinematic perturbation, avoiding complex aerodynamic modeling, thereby enabling the TD3 agent to learn adaptive control policies that maintain optimal coverage footprints. By prioritizing user‑centric performance metrics under turbulent conditions, the proposed architecture ensures continuous service availability despite external disruptions. Simulation results demonstrate that the TD3‑based approach effectively compensates for wind‑induced displacements and outperforms benchmark methods, including Proximal Policy Optimization (PPO), in terms of throughput stability and robustness in windy environments.
Authors: Zijie Meng, Ziwei Li, Yufei Liu, Zhiyu Li, Jiyuan Liu, Wenhua Nie, Bingcai Wei, Miao Zhang
Abstract: Safe coordination in networked cyber‑physical systems forces learning algorithms to simultaneously handle hybrid discrete‑continuous actions, hard training‑time safety constraints, and physics‑governed dynamics. We show that these three features form a directed cycle of biases that defeats any naive composition of off‑the‑shelf modules, and formalize this as a three‑way coupling lemma. We then introduce TRIDENT, the first MARL framework whose three components are co‑designed to cancel each leak: a Richardson‑Romberg gradient correction reducing Gumbel‑Softmax bias from O(tau) to O(tau^2), a Lyapunov‑constrained sequential trust‑region update enforcing per‑iterate feasibility, and a physics‑informed residual critic that decomposes value rather than reward. We prove an O~(1/sqrt(K)) convergence rate to a constrained Nash equilibrium and an O(sqrt(K)) cumulative‑violation bound. On multi‑UAV mobile‑edge computing, autonomous intersection management, and a hybrid SMAC variant, TRIDENT cuts training‑time violations by 95.5% over MADDPG and 76.3% over MACPO, while improving reward by 13.5% over the strongest unconstrained baseline.
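The claimed reduction of bias from O(tau) to O(tau^2) follows the general pattern of Richardson extrapolation: if an estimator has bias a*tau + b*tau^2, then combining evaluations at tau and tau/2 as 2*g(tau/2) - g(tau) cancels the first‑order term. The sketch below demonstrates this on a toy scalar estimator; it is not TRIDENT's Gumbel‑Softmax gradient estimator, and the function names and coefficients are hypothetical.

```python
def richardson(estimator, tau):
    """Combine estimates at tau and tau/2 so that the O(tau) bias term cancels."""
    return 2.0 * estimator(tau / 2.0) - estimator(tau)


def toy_estimator(tau, true_value=1.0, a=0.8, b=0.5):
    # Hypothetical biased estimator: true value plus O(tau) and O(tau^2) bias terms.
    return true_value + a * tau + b * tau**2


for tau in (0.4, 0.2, 0.1):
    raw_bias = abs(toy_estimator(tau) - 1.0)
    corrected_bias = abs(richardson(toy_estimator, tau) - 1.0)
    print(f"tau={tau}: raw bias {raw_bias:.4f}, corrected bias {corrected_bias:.4f}")
```

For this toy estimator the corrected bias equals b*tau^2/2, so halving tau cuts it by a factor of four, whereas the raw bias shrinks only roughly linearly.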
Authors: Mengke Zhang, Sitong Li, Tiancheng Lai, Ruitian Pang, Mingxuan Zhang, Qingcheng Chen, Fei Gao, Chao Xu, Yanjun Cao
Abstract: Safe and efficient trajectory planning in unknown, cluttered 3D environments constitutes a critical bottleneck for deploying Unmanned Aerial Vehicles (UAVs) in real‑world applications. This challenge is further exacerbated by the limited field‑of‑view (FOV) and sensing range of onboard sensors. Many existing methods either make simplistic assumptions about unexplored space or rely on conservative heuristics such as speed limits or fixed perception patterns, reducing efficiency and generalizing poorly across different sensor types. In this work, we propose a novel planning framework that directly integrates active perception into trajectory optimization, thereby improving safety while preserving efficiency. The perception constraints are derived from the UAV's dynamic model and formulated in the sensor coordinate frame, which enables precise handling of FOV geometry. The velocity‑triggered activation mechanism enables the planner to balance perception and motion efficiency. We introduce an active perception sub‑trajectory segment with parametric start‑time optimization, mitigating collision risks from late obstacle detection. Our formulation enables active perception during arbitrary 3D maneuvers, extending beyond prior methods designed mainly for horizontal motion. All constraints and penalties are incorporated into a differentiable optimization problem, so the planner requires only a simple front‑end global path for guidance, rather than a computationally expensive perception‑aware path generator. Extensive simulations and real‑world experiments demonstrate robust performance across diverse unknown environments with varying sensor configurations.
Authors: I-Ling Yen, Akeem Mohammed, Farokh Bastani, San-Yih Hwang
Abstract: Despite rapid advances in UAV technologies, current deployments remain limited due to several gaps in UAV systems research. To address these challenges, we propose OmniDroneX, a unified Drone‑as‑a‑Service ecosystem, in which drones are transitioned from fixed function platforms into dynamically composable entities that can be integrated with external infrastructures to offer omni‑capabilities. OmniDroneX bridges low‑level physical primitives with high‑level mission intent through a unified vendor‑agnostic interface (libUAV) and a formal physical‑service abstraction model (PT‑SOA). A core innovation is the diverse application of large language models (LLMs) across multiple layers of the OmniDroneX architecture. LLMs are used to assist in identifying and formalizing primitive device functions and abstract service definitions, supporting automated service composition and workflow generation, and enabling interactive, natural‑language mission specification and refinement. OmniDroneX also incorporates important categories of composition techniques that are essential in dynamic UAV systems, including physical layer composition for drone capability augmentation, as well as spatiotemporal, functional, collaborative, exception‑aware, and QoS‑based service compositions. Collectively, these features allow OmniDroneX to serve as a foundation for scalable, resilient, and self‑evolving UAV ecosystems operating in complex and dynamic environments.
Authors: Sozan Sulaiman Maghdid, Tarik Ahmed Rashid, Shavan Askar
Abstract: Cyber‑physical systems have introduced new threats and new challenges for detection and immediate response. This study examines how Graph Neural Networks (GNNs) can support cybersecurity and drone management in a cyber‑physical system that involves cyber intrusions and unmanned aerial vehicles (UAVs). Building on the structural understanding that GNNs provide, the work presents an integrated procedure that allows intrusion detection systems to learn the underlying network structure, identify malicious activity, and facilitate drone response measures. In an emulation‑based case study, cyberattack models were created to trigger drone responses, showing that graph‑based learning can assist with situational awareness, swarm coordination, and adaptive maneuvering. In the performance evaluation, the method attains a detection rate of 94.2, an average area under the receiver operating characteristic (ROC) curve of 0.955, and an average response time of 1.4 seconds. Comparative experiments show that the proposed GraphSAGE network is more effective than Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs) under identical conditions. These findings indicate that GNNs can support intrusion prevention and response in dynamic cyber‑physical systems.
Authors: Hongjiang Lei, Xiaqiu Wu, Ki-Hong Park, Gaojie Chen, Gaofeng Pan
Abstract: Uncrewed aerial vehicles (UAVs) are increasingly being employed for data collection tasks, thanks to their high mobility and easy deployment, acting as aerial platforms to collect data from ground devices (GDs). This study considers a secure underlay data collection system assisted by dual UAVs and focuses on the joint design of the UAVs' three‑dimensional (3D) flight paths, the power of the jamming UAV, the power of GDs, and the scheduling of the underlay GDs in the context of an aerial eavesdropper. The highly coupled objective function and non‑convex constraints make the formulated problem difficult to solve. We first utilize an approximate lower bound on the expected spectral efficiency to streamline the solution process. The average secrecy spectral efficiency (ASSE) is maximized by jointly designing the 3D trajectory of the UAVs, the transmit power of GDs, and the user scheduling. The optimization problem is decomposed into four subproblems using block coordinate descent, each of which is transformed into a manageable convex optimization task by incorporating slack variables and employing successive convex approximation methods. The numerical results validate the effectiveness of our proposed approach, demonstrating that the design of UAV 3D trajectories remarkably improves the ASSE of the considered system.
Authors: Bodhisatwa Kundu, Anish Rooj, Sumit Saha, Abhradeep Sarkar, Arghadip Das, Arnab Raha, Mrinal K. Naskar
Abstract: State estimation is the closed‑loop core of every real‑time tracking system, from radar surveillance and counter‑UAV defense to autonomous driving and robotics. These deployments run on edge platforms, where defense systems mount on vehicles and drones, and civilian pipelines live on cars and handheld devices. Here, every additional watt of compute erodes mission duration or operational range. Two hard constraints follow: each new measurement must be fused before the next control cycle, and the total compute must fit within a strict battery and thermal power envelope. The Linear and Extended Kalman Filters (LKF, EKF) are dominant estimators on these systems, but today they execute almost exclusively on CPUs, which serialize multi‑object tracking (MOT) updates, or on custom FPGA/ASIC accelerators that lengthen design cycles. Contemporary AI‑PC SoCs, like the Intel Core Ultra Series 1 and 2, integrate a low‑power, data‑parallel Neural Processing Unit (NPU). We therefore ask whether the Kalman filter can be mapped onto this existing matrix engine to meet real‑time and low‑power budgets simultaneously, avoiding a dedicated accelerator and keeping the CPU and GPU free for primary workloads. We present KATANA, an NPU‑aware optimization framework delivering the first end‑to‑end mapping of the LKF and EKF onto a commercial NPU, alongside a cross‑platform characterization on shipping AI‑PC silicon. KATANA applies three algebraic graph rewrites: subtract‑to‑add reformulation via a precomputed negative‑projection matrix H_neg, static‑shape tensor fusion, and block‑diagonal batched parallelization, ensuring 100% of operations execute on the DPU matrix engine. On the Series 2, the optimized batched EKF reaches 223.35 FPS at 13.43 W active power, and the LKF reaches 408.73 FPS at 14.05 W, delivering up to a 97.9% reduction in dynamic energy versus the CPU implementation.
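Two of the graph rewrites named in the KATANA abstract above can be sketched in a few lines. The sketch assumes that H_neg is simply the negated measurement matrix -H, so the innovation z - H x becomes z + H_neg x and needs only a matrix multiply and an add. It also assumes that batching means stacking per‑track matrices on a block diagonal. The helper names are hypothetical and pure Python stands in for the NPU kernels.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def negate(m):
    """Precompute H_neg = -H once, outside the per-frame update."""
    return [[-a for a in row] for row in m]


def block_diag(blocks):
    """Stack per-track matrices on a block diagonal for one batched multiply."""
    rows = sum(len(b) for b in blocks)
    cols = sum(len(b[0]) for b in blocks)
    out = [[0.0] * cols for _ in range(rows)]
    r0 = c0 = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, a in enumerate(row):
                out[r0 + i][c0 + j] = a
        r0 += len(b)
        c0 += len(b[0])
    return out


def innovation_sub(z, H, x):
    """Textbook innovation: y = z - H x."""
    return [zi - hi for zi, hi in zip(z, matvec(H, x))]


def innovation_add(z, H_neg, x):
    """Add-only form: y = z + H_neg x, with H_neg = -H."""
    return [zi + hi for zi, hi in zip(z, matvec(H_neg, x))]
```

With a position‑only H applied to a state [px, py, vx, vy], calling innovation_add on the block‑diagonal H_neg and the concatenated states of several tracks gives the same values as running innovation_sub track by track.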
Authors: Wenhao Lu, Zhengqiu Zhu, Xiaofeng Wang, Xiaoran Zhang, Yatai Ji, Yong Zhao, Yue Hu, Yingzhen Nie, Jinlong Zhu, Zheng Zhu
Abstract: Aerial Embodied Question Answering (EQA) requires Unmanned Aerial Vehicles (UAVs) to actively perceive the environment and answer natural language questions. Existing outdoor EQA systems usually stop once the target enters the UAV's field of view, leaving the fine‑grained viewpoint adjustment needed for evidence‑seeking questions largely unresolved. To address this issue, we introduce FG‑EQA, a fine‑grained active perception EQA benchmark with more than 40K simulated trajectories and 1K real‑world trajectories. Drawing inspiration from the "waggle dance" of scout bees, which iteratively adjust their flight paths to verify target information, we propose ScoutVLA, an evidence‑driven Vision‑Language‑Action model for outdoor EQA. To emulate this active exploration behavior, ScoutVLA features a decoupled dual‑expert architecture: a vision‑language expert infers the semantic intent to identify missing evidence, while an independent action expert employs high‑DoF flow matching to generate continuous viewpoint‑refinement trajectories. To balance the competing demands of continuous control and semantic reasoning, we devise a decoupled training strategy with a knowledge insulation mechanism that prevents the action gradients from erasing the model's multimodal reasoning ability. Extensive simulated experiments and a qualitative real‑world field study both verify the superiority of ScoutVLA over the state‑of‑the‑art baselines, demonstrating a 10.48× higher average strict success rate and a 7.72× higher average QA correctness.
Authors: Shrikant Banerjee, Reza Faieghi
Abstract: As Uncrewed Aerial Vehicles (UAVs) transition toward higher levels of autonomy, the ability to perform unassisted recovery in non‑cooperative, unstructured environments becomes critical. Achieving safe autonomous landing requires high‑fidelity semantic resolution to distinguish navigable terrain from hazardous obstacles, yet development is often hindered by the scarcity of annotated aerial datasets. This work proposes a comprehensive perception and data generation pipeline designed to bridge the sim‑to‑real gap for autonomous landing tasks. We introduce a procedural synthetic data engine that generates photorealistic urban environments with automated semantic annotations through domain randomization. A Transformer‑based OneFormer architecture is fine‑tuned exclusively on this synthetic data, leveraging multi‑head self‑attention mechanisms for global context resolution. To ensure operational safety, a deterministic landing module utilizes a Euclidean Distance Transform (EDT) and dynamic inference logic to identify the largest inscribed safe landing zones while maintaining strict clearance buffers around obstacles. Quantitative benchmarking against the UAVid dataset demonstrates robust semantic segmentation performance, while qualitative validation on real‑world UAV footage confirms the system's ability to identify collision‑free landing sites in unseen environments. Our results highlight the potential of high‑fidelity procedural simulation to eliminate the need for manual annotation while providing robust, edge‑deployable situational awareness for autonomous UAV recovery.
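The landing‑zone step described in the abstract above (a Euclidean Distance Transform followed by a clearance buffer) can be sketched on a small occupancy grid. This is a brute‑force illustration under stated assumptions, not the authors' module: the grid encoding (1 = obstacle, 0 = navigable), the function names, and the clearance value are hypothetical, and image borders are not treated as obstacles.

```python
import math


def distance_transform(grid):
    """Euclidean distance from every cell to the nearest obstacle cell (value 1)."""
    obstacles = [
        (r, c) for r, cells in enumerate(grid) for c, v in enumerate(cells) if v == 1
    ]
    dist = []
    for r, cells in enumerate(grid):
        dist.append([
            0.0 if v == 1 else min(
                (math.hypot(r - orow, c - ocol) for orow, ocol in obstacles),
                default=math.inf,
            )
            for c, v in enumerate(cells)
        ])
    return dist


def best_landing_zone(grid, clearance=1.0):
    """Return (row, col, radius) of the largest obstacle-free disc left after
    subtracting the clearance buffer, or None if no cell has positive radius."""
    best = None
    for r, cells in enumerate(distance_transform(grid)):
        for c, d in enumerate(cells):
            radius = d - clearance
            if radius > 0 and (best is None or radius > best[2]):
                best = (r, c, radius)
    return best
```

The brute‑force transform costs O(cells × obstacles), which is fine for illustration. A production pipeline would more likely use a linear‑time distance transform from an image‑processing library.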
Authors: Taewoo Park, Kyeonghyun Yoo, Seunghyun Yoo, Hwangnam Kim
Abstract: Agentic AI can support unmanned aerial vehicle (UAV) autonomy by providing high‑level recovery reasoning when local waypoint‑ or setpoint‑based execution encounters blocked passages, repeated no‑progress behavior, or mission‑level ambiguity. On physical UAVs, however, remote reasoning is most useful when it is invoked selectively, since each call introduces latency, resource cost, backend uncertainty, and a need to validate the returned decision. This paper presents Persistent Mission Runtime (PMR), a UAV recovery framework that keeps the mission loop and safety‑critical execution local while using an external agentic reasoner only as an on‑demand recovery module. The reasoner selects from predefined recovery skills, and each returned decision is parsed, verified, safety‑filtered, and mapped to local executor actions before it can affect flight. PMR introduces learned Cognitive Value of Invocation (learned‑CVI), a compact admission gate that estimates when remote agentic reasoning is likely to improve near‑term mission progress enough to justify its operational cost. Across a fixed 400‑run Gazebo/PX4 benchmark with eight scenarios, learned‑CVI raises hard/ambiguous‑regime success from 5.0% under local‑only autonomy to 95.0%, outperforms one‑shot and periodic reasoning baselines by 20.0 and 32.5 percentage points, and reduces remote‑agent calls by 16.7% and logged tokens by 29.2% relative to a manually tuned rule‑based invocation baseline.
Authors: Ivan Valuev, Iana Zhura, Valerii Serpiva, Didar Seyidov, Dzmitry Tsetserukou
Abstract: Autonomous UAV navigation is conventionally solved by pipelines that separate perception, mapping, and planning into distinct stages, which propagates errors, accumulates latency, and requires environment‑specific retuning. End‑to‑end generative models remove these interfaces by mapping raw observations directly to trajectories, but inherit a subtle failure mode: trained on clean data, they cannot recognise when an observation is unreliable, and treat degraded regions such as glass, mirrors, and overexposed surfaces as valid evidence for planning. We present a reliability‑aware diffusion planner for 3D UAV navigation. It conditions trajectory generation on the observation together with a scene‑level reliability heatmap that marks where perception cannot be trusted, produced by a lightweight network that distils the open‑vocabulary reasoning of a vision‑language model within the real‑time planning budget. To generalise to unseen environments without retraining, we steer the denoising process with a differentiable two‑stage ESDF cost that treats physical obstacles from depth and virtual obstacles from highly unreliable regions on equal footing. In simulation and on a real quadrotor, our planner produces markedly safer trajectories than a state‑of‑the‑art diffusion baseline, reducing the obstacle‑violation rate from 40.3% to 9.6% and raising the mean reliability of traversed regions from 0.588 to 0.925. Ablating the reliability term alone drops mean reliability from 0.898 to 0.783, confirming it as the decisive component, while distillation runs the framework up to 2 times faster than the full vision‑language model.
Authors: Zhenhong Peng, Junhao Wei, Baili Lu, Yanxiao Li, Yifu Zhao, Haochen Li, Dexing Yao, Xu Yang, Yapeng Wang
Abstract: Unmanned aerial vehicle (UAV) relays deliver flexible, on‑demand wireless coverage, but jointly tuning the position, altitude, transmit power and bandwidth of the relay is a non‑convex, heavily constrained optimization task that easily traps swarm‑based optimizers in poor local optima. We propose PWOA, a Polynomial‑decay and Pinhole‑imaging Whale Optimization Algorithm with three complementary improvements: (i) a Good Nodes Set (GNS) initialization that spreads the initial population uniformly across the search space; (ii) a polynomial nonlinear schedule for the convergence factor that prolongs early exploration and sharpens late exploitation; and (iii) a stagnation‑triggered pinhole‑imaging opposition‑based learning (POBL) operator paired with an elite Gaussian local search, which together escape local optima while refining the leader. On a five‑dimensional UAV relay deployment problem with five inequality constraints (N=30, T=500, 30 independent runs), PWOA simultaneously attains the lowest Best, Worst, Mean and standard deviation among PWOA, WOA, SCA and IPSO, cutting the mean by 1.4% to 18.5% and the standard deviation by 15% to 87% over the three baselines, and exhibits the fastest average convergence.
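The effect of a polynomial schedule on the convergence factor can be seen by comparing it with the linear decay from 2 to 0 used in the standard Whale Optimization Algorithm. The abstract above does not give PWOA's exact formula, so the form a(t) = a0 * (1 - (t/T)^power) and the exponent below are assumptions chosen only to illustrate the idea.

```python
def linear_factor(t, T, a0=2.0):
    """Standard WOA schedule: decays linearly from a0 to 0 over T iterations."""
    return a0 * (1.0 - t / T)


def polynomial_factor(t, T, a0=2.0, power=2.0):
    """Assumed polynomial schedule: stays near a0 early, then falls off steeply."""
    return a0 * (1.0 - (t / T) ** power)


def compare(T=500, checkpoints=(0, 125, 250, 375, 500)):
    """Return (iteration, linear value, polynomial value) at each checkpoint."""
    return [(t, linear_factor(t, T), polynomial_factor(t, T)) for t in checkpoints]
```

With power greater than 1 the factor stays above the linear schedule at every interior iteration, which keeps search steps large for longer, and the drop toward zero is concentrated in the final iterations.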
Authors: Guangji Chen, Long Shi, Qingqing Wu, Qiaoyan Peng, Caihong Kai
Abstract: Providing reliable communication for unmanned aerial vehicles (UAVs) via existing cellular networks is crucial for enabling the rapid growth of the low‑altitude economy. However, UAV jittering significantly degrades communication quality due to induced beam misalignment. Inspired by recent advances in integrated sensing and communication, we propose a novel two‑stage active sensing‑assisted communication framework tailored for ground‑to‑UAV links with jittering. Specifically, two schemes are conceived to leverage sensing for enhancing communication performance, namely the communication‑oriented scheme and the sensing‑oriented scheme. For the sensing‑oriented scheme, deterministic signals are employed in the first stage to facilitate angle‑of‑arrival (AoA) acquisition at the UAV side, followed by pure communication service in the second stage by using the estimated AoA. In contrast, the communication‑oriented scheme employs Gaussian information‑bearing signals throughout both stages, with AoA estimation relying on Gaussian random signals. For both schemes, we provide maximum likelihood estimators for AoA, along with analytical results characterizing the Cramér‑Rao bound. To capture the performance limit, closed‑form expressions for the achievable rates of the two schemes are derived, unveiling a fundamental tradeoff between sensing and communication quality across the two stages by tuning the time allocated to the first stage. The optimal time allocation that maximizes the overall rate is obtained in semi‑closed‑form. Based on these results, we unveil a sufficient condition under which the communication‑oriented scheme outperforms the sensing‑oriented scheme, which admits an interesting threshold‑based structure. Asymptotic analysis demonstrates that the performance loss of the proposed schemes relative to the jitter‑free upper bound approaches zero in the high transmit power regime.
Authors: Jianli Sun, Bin Tian, Qiyao Zhang, Zijian Liu, Yutong Wang, Zhiyong Cui, Bai Li, Yisheng Lv, Yonglin Tian
Abstract: Aerial manipulation systems have long suffered from representation coupling in end‑to‑end control, as platform‑level Unmanned Aerial Vehicle (UAV) movement and end‑effector‑level arm manipulation differ substantially in action scale, dynamics, and control objectives. In this paper, we propose AIR‑VLA+, a flow matching action generation architecture specifically designed for aerial manipulation, featuring cascaded dual‑action decoders and an asymmetric feature‑level Mixture of Experts (MoE). We construct cascaded manipulation and movement decoders, allowing the UAV to unidirectionally observe the manipulator's intent during movement to achieve workflow coordination, while isolating the impact of UAV movement information backpropagation on arm manipulation stability. Addressing the characteristic that UAV movement is highly dependent on high‑level semantics and responsible for task state transitions in aerial manipulation, we design an input feature enhancement module for the UAV movement decoder. This module introduces an implicit visual grasp projector to perceive the interaction state between the gripper and the object, and injects compressed global semantic features. Within the UAV movement decoder, we deploy an implicit MoE architecture, enabling different movement experts to spontaneously exhibit capacity inclinations for various task stages during training. Through dense soft blending computation on the feature manifold, the UAV movement is endowed with stronger task‑stage adaptability. Experiments on the standardized AIR‑VLA benchmark demonstrate that our method comprehensively surpasses all baselines with an overall average score of 48.0. The overall task completion score improves by 80.2% compared to the single‑head π_0.5 policy, effectively mitigating the heterogeneous coordinated control conflicts of composite robots.
Authors: Lars Oerlemans, Moji Shi, Marija Popovic
Abstract: Safe ground navigation in large, threat‑augmented environments requires aerial support that actively reduces the risks that a ground vehicle faces along its route. Existing aerial reconnaissance systems focus on mapping or covering the environment, but do not direct sensing toward regions that are most relevant for ground vehicle safety. In this paper, we address the problem of coordinating a team of unmanned aerial vehicles (UAVs) to improve the safety of an unmanned ground vehicle (UGV) navigating through unknown threat zones. A key aspect of our approach is a shared exposure belief that is updated online from aerial observations and used jointly by the UAV team and the ground vehicle. This enables us to direct aerial sensing towards route‑relevant regions while allowing the UGV to replan around newly revealed threats. We coordinate the UAV team through spatial region assignment to avoid redundant sensing. Simulation experiments show that our approach reduces cumulative UGV exposure by 38% compared to a system that does not account for hazard levels, and reduces redundant aerial coverage from 38.8% to 3.7% under our multi‑UAV coordination scheme.
Authors: Ke Li, Jianfei Yang, Luyao Zhang, Guo Yu, Chengwei Yan, Yuan Ding, Di Wang, Nan Luo, Gang Liu, Xiao Gao, Quan Wang
Abstract: Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre‑defined command sequences or task‑specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open‑source software framework that enables UAVs to operate as decision‑making aerial agents rather than merely command‑following platforms. Given a natural‑language mission, AerialClaw allows an LLM‑based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain‑skill‑runtime architecture, combining hard skills for atomic UAV operations, Markdown‑based soft skills for reusable task strategies, document‑driven agent state and capability boundaries, memory‑driven reflection, safety‑oriented runtime validation, and platform‑agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim‑based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document‑driven agent state, memory, and closed‑loop LLM decision‑making, AerialClaw provides a reproducible and extensible open‑source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.
Authors: Baoyang Jiang, Fengchun Zhang, Leyuan Wang, Haotian Li, Yida Wang, Zhe Ji, Jinshan Lai, Xi Ren, Jianwei Hu, Qiang Ma
Abstract: Benchmarks are essential for evaluating embodied spatial intelligence, yet their construction is labor‑intensive, hard to reuse, and difficult to maintain. Existing embodied benchmarks are often static and may quickly become saturated as models improve, limiting their ability to distinguish new capabilities. We propose Embodied‑BenchClaw, an autonomous agentic system for constructing embodied spatial intelligence benchmarks. Given a user‑specified evaluation intent, Embodied‑BenchClaw automatically produces a complete and continually updatable benchmark package through a five‑stage pipeline: intent blueprinting, data collection, structuring and cleaning, benchmark synthesis, and evaluation reporting. The pipeline is coordinated by three agents for planning, construction, and evaluation. To improve reusability and reliability, Embodied‑BenchClaw introduces an extensible Skill Library and process quality control, enabling benchmark construction to be composable, verifiable, and repairable. We instantiate multiple benchmarks covering indoor spatial reasoning, outdoor spatial reasoning, robotic manipulation, quadruped robot navigation, UAV/aerial‑view understanding, and static benchmark enhancement. These benchmarks span diverse embodied carriers, data sources, and spatial capabilities. Experiments with human evaluation, judge‑based assessment, consistency checks, cost analysis, and ablations show that Embodied‑BenchClaw can construct verifiable, executable, maintainable, and diagnostically useful embodied spatial benchmarks with reduced manual effort.
Authors: Tiancheng Lai, Yuman Gao, Xiangyu Li, Ruitian Pang, Xingpeng Wang, Siqi Shen, Mengke Zhang, Yin He, Fei Gao, Chao Xu, Yanjun Cao
Abstract: Autonomous exploration with UAVs in large‑scale, topologically complex environments often suffers from low efficiency due to suboptimal scheduling and detours. Prior maps (e.g., construction drawings), although usually imprecise and flawed, are readily available in many scenarios and have the potential to provide global structural guidance. This paper presents a novel exploration framework that leverages sparse, unaligned, and even discrepant 2D prior maps for LiDAR‑based UAV exploration. First, a robust 2D‑3D point cloud registration pipeline is proposed to align LiDAR observations with prior maps. The registration pipeline combines a GeoContext descriptor for single‑frame candidate retrieval, a multi‑frame verification mechanism for coarse transformation estimation with outlier rejection, and a Scale‑ICP algorithm for refinement. The registration module can handle map discrepancies and provide multiple hypotheses when geometric ambiguities arise. To effectively utilize the registration results for exploration planning, we further develop a hierarchical viewpoint planning strategy under localization uncertainties. The hierarchical strategy first spatially attaches local viewpoints to prior guidepoints and adopts a Monte Carlo Tree Search solver to determine their traversal sequence under each registration hypothesis. To mitigate registration uncertainty, a risk‑aware selector evaluates prior sequences using confidence‑weighted travel risk, and a fixed‑endpoint traveling salesman problem is formulated to generate an efficient local coverage path under the selected prior guidance. Benchmark evaluations reveal up to 34.2% improvement in exploration efficiency and 37.9% reduction in flight distance compared to state‑of‑the‑art methods, while extensive simulations and field experiments further demonstrate robustness to prior map incompleteness and deformations.
Authors: Feibo Jiang, Li Dong, Lei Mao, Kezhi Wang, Cunhua Pan, Dong In Kim, Naofal Al-Dhahir
Abstract: Low‑Altitude Wireless Networks (LAWNs), composed of Unmanned Aerial Vehicles (UAVs) and other aerial platforms, provide integrated perception, communication, and computation services in low‑altitude airspace. However, deploying large generative models in this domain faces three major challenges: 1) Limited embodied action mapping; 2) Inadequate physical environment modeling; 3) Insufficient closed‑loop optimization. To address these challenges, this study proposes an Embodied Agentic UAV framework. Centered on a Vision‑Language‑Action (VLA) model as the execution core, the framework establishes an end‑to‑end embodied decision‑making pipeline from multimodal environmental perception to continuous control generation. In addition, a World Model (WM) is introduced to capture the coupling between UAV actions and environmental state evolution, thereby supporting environment prediction, policy verification, and dynamic optimization. Furthermore, memory and reflection mechanisms are incorporated to form an adaptive closed‑loop optimization paradigm of decision, execution, evaluation, and update, thereby enhancing the system's autonomous decision‑making capability and continual evolution ability in complex dynamic environments. Experimental results validate its effectiveness in enabling robust, predictive, and sustainable autonomous control in LAWNs.
Authors: Mohammed Saif, Shahrokh Valaee
Abstract: In this paper, we present a new approach for unmanned aerial vehicle (UAV) positioning and reconfigurable intelligent surface (RIS) partitioning to enhance connectivity of uplink RIS‑assisted UAV networks. To achieve this, our approach optimizes RIS‑aided link selection, RIS partitioning, and UAV positions to maximize network connectivity characterized by its Fiedler value. Meanwhile, it maintains a specific signal‑to‑interference plus noise ratio (SINR) constraint for user equipment (UE), which is influenced by RIS partitioning and UAV reliability. The network connectivity optimization problem is formulated using the Fiedler value subject to RIS element allocation and SINR constraints. This problem is a computationally expensive combinatorial optimization, necessitating an efficient iterative approach. In particular, we propose a perturbation method for RIS‑aided link selection, and derive a closed‑form solution for RIS partitioning, with each partition tailored to optimize SINR for an individual UAV. For the given RIS‑aided links and RIS partitioning, we then show that the problem of UAV positioning can be formulated as a low‑complexity semi‑definite programming (SDP) optimization problem, which can be solved using off‑the‑shelf CVX solvers. Our simulations show the potential gain of UAV positioning and RIS partitioning compared to the benchmark schemes from the literature.
Authors: Zhiting Zhou, Xingchen Liu, Xinglin Yu, Jiashen Chen, Haoyang Wang, Jingao Xu, Yunhao Liu, Xinlei Chen
Abstract: Unauthorized unmanned aerial vehicle (UAV) activity around airports, public venues, and other sensitive sites has made protected‑airspace monitoring increasingly important. A practical sensing system must search a wide angular region, find small long‑range targets, and return both bearing support and UAV‑specific evidence before a restricted perimeter is breached. Existing UAV detection paths often rely on spatially organized evidence, such as body extent, silhouette, or track continuity. At long range, however, these cues become difficult to preserve and verify as the target footprint weakens and its image‑plane support shrinks. EventRadar follows a complementary cue: propeller‑induced temporal periodicity, which recent event‑camera sensing studies have shown can reveal UAV‑specific motion after appearance becomes weak. We extend this cue to kilometer‑scale active sensing with an event‑camera prototype. Scene‑Anchored Geometry Evidence (SAGE) fuses scanning events with IMU pose to maintain a bearing‑indexed scene memory, separating transient candidate support from persistent background clutter. Comb‑guided Harmonic‑Group Learned Iterative Shrinkage and Thresholding Algorithm (CHG) then treats each candidate as a weak high‑rate timing signal and recovers phase‑insensitive harmonic evidence with fixed compute. Compared with related event‑camera baselines on 700‑1500 m UAV event recordings, EventRadar achieves 0.990 mAP_.3 and 0.949 F1_.3, reduces FN_.3 to 0.009, and shows real‑time feasibility in prototype profiling.
Authors: S. Habibi, L. Marques
Abstract: Unmanned aerial vehicles (UAVs) are increasingly used for active sensing and information gathering in spatially distributed environments. Their performance, however, is constrained by limited flight time, sensing uncertainty, and the trade‑off between spatial coverage and observation accuracy. This paper presents a real‑world validation of a multi‑UAV active sensing framework for probabilistic binary terrain mapping, with precision agriculture used as the application case. The environment is represented as a probabilistic belief map, where spatial dependencies are modeled through a factor‑graph formulation. UAV decision making is guided by Information Gain based Informative Path Planning (IGbIPP), and the approach is compared with Random Walk and Sweep coverage path planning baselines using both synthetic terrains and real UAV‑derived agricultural imagery. The study also evaluates spatial correlation weights and several probabilistic belief‑fusion rules for multi‑UAV information sharing. Results show that IGbIPP reduces entropy and mapping error more effectively than the baselines, while a wider field of view improves real‑world coverage and map accuracy. The results further show that simple equal or biased spatial weights can be more robust than adaptive weights, and that Bayesian, log‑odds, and Dempster‑Shafer fusion achieve the best cooperative mapping performance. These findings highlight the importance of uncertainty‑driven planning, sensing geometry, spatial modeling, and probabilistic fusion for real‑world UAV‑based active sensing.
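Of the fusion rules listed in the abstract above, log‑odds fusion is the simplest to illustrate for a single binary map cell. The sketch below is a generic textbook form, not the paper's implementation. It assumes that each UAV reports a posterior for the same cell, that the reports are conditionally independent, and that all UAVs share the same prior. The function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(value):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-value))


def fuse_log_odds(beliefs, prior=0.5):
    """Fuse per-UAV posteriors for one cell by summing their log-odds evidence
    relative to the shared prior."""
    total = logit(prior) + sum(logit(b) - logit(prior) for b in beliefs)
    return sigmoid(total)


def binary_entropy(p):
    """Shannon entropy in bits of a binary belief; 0 for a certain cell."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
```

Two agreeing reports of 0.8 fuse to 16/17 and lower the cell entropy, while conflicting reports of 0.8 and 0.2 cancel back to the 0.5 prior. An information‑gain planner could rank candidate viewpoints by the entropy reduction they are expected to produce.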
Authors: Chanuka A. S. Hewa Kaluannakkage, Rajkumar Buyya
Abstract: Decentralized Federated Learning (DFL) enables collaborative model training across wireless edge nodes, including IoT deployments, autonomous vehicles, UAV swarms, and satellite constellations. Operating over lossy wireless links under constraints, these systems cannot rely on retransmissions, so model parameters must be accepted as partial chunks. This leads to two key failure modes: selection bias, where poor‑quality links are systematically under‑represented in gossip aggregation, and update staleness, where asynchronous nodes contribute outdated models. We prove that classical gossip aggregation introduces irreducible selection bias proportional to the link‑loss rate. We propose DFL‑AA (Decentralized Federated Learning with Adaptive AoI‑weighted Aggregation), which corrects selection bias using Inverse Probability Weighting (IPW) with online channel estimation and mitigates staleness via Age‑of‑Information (AoI) decay without requiring a global clock. We prove that DFL‑AA removes link‑quality distortion in expectation and consistently outperforms state‑of‑the‑art baselines across varying loss rates and heterogeneous channel conditions on fixed directed topologies.
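The selection‑bias argument in the abstract above can be checked numerically on a toy aggregation. The sketch assumes each neighbor's scalar update arrives independently with a known success probability. It computes the exact expectation of the aggregate by enumerating every reception pattern, with and without inverse probability weighting. The exponential staleness discount is likewise an assumed form, since the abstract does not give DFL‑AA's actual weighting.

```python
import itertools
import math


def expected_aggregate(values, probs, use_ipw):
    """Exact expectation of (1/n) * sum_i w_i * r_i * x_i over all reception
    patterns r, where w_i = 1/p_i with IPW and w_i = 1 without."""
    n = len(values)
    total = 0.0
    for pattern in itertools.product([0, 1], repeat=n):
        prob = 1.0
        for received, p in zip(pattern, probs):
            prob *= p if received else (1.0 - p)
        aggregate = sum(
            (received / p if use_ipw else received) * x
            for received, p, x in zip(pattern, probs, values)
        ) / n
        total += prob * aggregate
    return total


def aoi_weight(age, decay=0.5):
    """Assumed exponential discount for an update that is `age` rounds old."""
    return math.exp(-decay * age)
```

With updates 1, 2, 3 and success probabilities 0.9, 0.5, 0.2, the unweighted expectation is pulled toward the well‑connected neighbor, while the IPW expectation equals the true mean of 2. In practice the probabilities would have to be estimated online, which the abstract attributes to channel estimation.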
Authors: Zuan Gu, Tianhan Gao, Langxu Zhao
Abstract: Remote‑sensing and UAV applications need models that generalize across platforms and viewpoints without task‑specific training. Yet training‑free pipelines often falter on oriented geometry, scale/rotation variation, and crowded ports or airfields, and rarely unify detection and segmentation. We introduce ZODS‑RS, a training‑free, closed‑form pipeline that outputs horizontal boxes (HBB) and instance masks. Built on DINOv3 dense features and SAM‑style proposals, ZODS‑RS chains: PP (prototype purification via Tyler covariance), R‑SEM (rotation‑scale equivariant matching with separable kernels and global Hungarian assignment), and UAM (uncertainty‑aware pixelwise merging with adaptive priors and optional negative prototypes). A lightweight CWLA fuses multiple DINOv3 layers. On FAIR1M (HBB) we obtain $\mathrm{mAP}_{0.50:0.95}=13.06$ and $\mathrm{AP}_S=2.93$ (class‑averaged over ship/airplane); on xView (HBB) we report $\mathrm{mAP}=16.69$. On our UAV dataset, ZODS‑RS achieves mask $\mathrm{mIoU}=31.10$ and improves small‑object AP by $+30.70$ over Grounded‑SAM on a single 5090. This work offers a unified, no‑training solution for horizontal‑box detection plus instance segmentation in aerial imagery; provides explicit closed‑form formulations for PP/R‑SEM/UAM tightly coupled with DINOv3; and demonstrates consistent gains on small and crowded targets and under cross‑domain shifts while keeping deployment simple.
Authors: Zhenqiang Qin, Chenguang Dai, Min Wang, Xian Li
Abstract: UAV multispectral imagery naturally contains multi‑angular observations due to low flight altitude and wide field‑of‑view imaging, which may introduce geometry‑driven radiometric variability. This study proposes a geometry‑aware multi‑angular observation extraction workflow to quantify observation‑geometry effects from a BRDF perspective. Specifically, camera intrinsics and extrinsics are refined via structure‑from‑motion (SfM), and homogeneous regions annotated on an orthomosaic are reprojected onto multiple raw sub‑images acquired from different viewpoints. This enables joint extraction of multi‑band reflectance and observation geometry parameters for the same ground targets under varying viewing directions. The extracted observations are further analyzed using band‑wise polar visualization in the (VZA, RAA) domain. Results on a grassland target show clear reflectance anisotropy across ten bands, with red‑edge and near‑infrared bands exhibiting 119‑137% variability between maximum and minimum reflectance, indicating non‑negligible observation‑geometry effects on radiometric consistency.
Authors: Ming Qian, Tianjian Ouyang, Mingchao Sun, Zijian Wang, Jincheng Xiong, Jiarong Han, Yongchang Zhang, Jiawei Zhang, Xu Wang, Yu Liu, Luyang Tang, Fei Yu, Zengye Ge, Mengmeng Du, Yuan Liu, Nianfei Fan, Song Wang, Yingliang Peng, Chunxue Jia, Yang Liu, Shiying Zeng, Haozhe Shi, Junnan Lai, Hongyu Pan, Zheng Wu, Ning Guo, Mu Xu, Hang Zhang
Abstract: We present ABot‑Earth 0.5, a generative 3D framework designed to synthesize vast, seamless 3D environments from ubiquitous, geospatially referenced satellite imagery. To achieve this, we propose a novel generative model formulated directly with the 3D Gaussian Splatting (3DGS) representation. The model is trained on a diverse corpus of existing real‑world urban reconstructions, learning to generate realistic geometry and textures. At inference, it synthesizes novel 3D scenes conditioned solely on satellite imagery at a scalable rate of under 10 minutes per square kilometer, while demonstrating exceptional realism. The framework is designed for accessibility, with integrated hierarchical level‑of‑detail (LOD) structures that permit real‑time, interactive visualization on web‑based map engines. This high‑fidelity simulation sandbox effectively mitigates the sim‑to‑real domain gap, enabling critical downstream Embodied AI applications like closed‑loop UAV navigation. By providing an ultra‑low‑cost and high‑efficiency solution, ABot‑Earth 0.5 significantly lowers the technical and financial barriers to large‑scale 3D reconstruction and empowers the future of global digital earth visualization.
Authors: Yue Zhang, Zizhong Ding, Lin Sun, Haopeng Chen, Yan Jiao, Yongming Xu
Abstract: The proliferation of multi‑dimensional trajectory data, fueled by large‑scale IoT and the emerging low‑altitude economy, particularly UAV operations, drives repositories to jointly support (x,y), (x,y,t), (x,y,z), and (x,y,z,t) queries within a single storage framework. Yet existing HBase‑based systems fall short in three respects: severe row‑key interval fragmentation when altitude is jointly encoded with horizontal coordinates, locality‑unfriendly spatial encodings with workload‑blind shape‑code ordering, and coarse‑grained temporal indexes that leave intra‑slot boundary ambiguity unresolved. We present AeroMesa, an efficient data management system for multi‑dimensional spatio‑temporal trajectories built on Apache HBase and Redis, that natively supports (x,y), (x,y,t), (x,y,z), and (x,y,z,t) queries within a unified storage framework. AeroMesa addresses the above limitations through three designs: a decoupled horizontal‑altitude architecture with a multi‑granularity Height Spatio‑Temporal Index (HTSI) that eliminates joint encoding fragmentation; Hilbert‑BFS with Workload‑Aware Jaccard (WAJ) reordering that improves spatial locality; and TI+, a dual‑offset temporal index that resolves intra‑slot false positives. Evaluations on T‑Drive and an 87,537‑trajectory high‑fidelity UAV simulation demonstrate that AeroMesa reduces 3D/4D query latency by up to 30x over XZ3/TXZ3, lowers 2D latency by up to 17.9% over TMan, and cuts temporal candidates by up to 51.3% over MCTM, with sub‑linear scalability confirmed under 200x data expansion, confirming AeroMesa's efficiency for multi‑dimensional spatio‑temporal trajectory management.
Authors: Lixuan Jin, Bingxuan Lan, Xinyi Bao, Xiangyuan Xie, Chunjie Zhang, Zheng Chen, Tianshuo Liu, Ruijie Tian, Jinyu Ru, Gang Wang, Lei Yuan, Yang Yu
Abstract: Unmanned aerial vehicles (UAVs) are increasingly being deployed in logistics, service robotics, and other real‑world applications, creating a growing demand for autonomous payload acquisition and delivery. Existing approaches typically assume pre‑attached payloads or rely on specialized grippers, leaving versatile end‑to‑end aerial delivery largely unresolved, where different payloads induce highly variable flight dynamics, requiring a single policy to adapt online without manual calibration or explicit system identification. To this end, we study Autonomous Aerial Manipulation via Contextual Contrastive Meta Reinforcement Learning (Aco2), a fully autonomous aerial delivery setting in which a quadrotor equipped with a lightweight hook continuously picks up, transports, and delivers diverse handle‑equipped objects between randomized locations, all without human intervention. First, we design a contextual observation encoder that infers a compact latent context from recent interaction history, enabling the policy to adapt online to payload‑dependent dynamics. To further improve the quality of this context, we introduce a contrastive objective that structures the context embedding around task‑relevant variations, improving generalization across diverse payloads without requiring explicit system identification. Trained entirely in simulation with extensive domain randomization, Aco2 can be directly deployed on a physical quadrotor without real‑world fine‑tuning.
Authors: Xiangyi Zheng, Xiangyu Wang, Qinan Liao, Zimu Tang, Yue Liao, Dongyue Lyu, Guodong Wang, Junjie Liu, Si Liu
Abstract: Language‑guided UAV agents must execute long‑horizon semantic instructions while producing smooth, physically feasible continuous flight commands, yet existing Vision‑Language Navigation (VLN) benchmarks typically use discrete or coarse actions and existing UAV Vision‑Language‑Action (VLA) tasks focus on short, atomic maneuvers. To address this gap in UAV task settings, we introduce FLIGHT, a Fine‑grained Long‑horizon Instruction‑Guided benchmark for Hybrid UAV navigation and reasoning Tasks, which combines multi‑stage instructions with dense 6‑DoF trajectory annotations across two dataset splits: Fine‑grained VLN and Long‑horizon Flow. To endow the UAV agent with the capability of real‑time in‑flight reasoning over task execution status and mission planning, while simultaneously accommodating high‑frequency, real‑time precise control, we further propose FLIGHT VLA, an asynchronous architecture that decouples a low‑frequency Streaming Pilot Vision‑Language Model (VLM) for task‑state reasoning from a high‑frequency diffusion action model for continuous control, supervised by explicit Pilot Reasoning texts that summarize the current flight state and anticipate the next subgoal. In closed‑loop evaluation, FLIGHT VLA consistently surpasses representative VLN and VLA baselines on our FLIGHT benchmarks, achieving stronger multi‑stage completion, subgoal adherence, and terminal control. Its trained Streaming Pilot Reasoning VLM further improves UAV video reasoning, validating the effectiveness of our design.
Authors: Utsav Bhandari, Saroj Burlakoti, Rhonda Miller, Sierra Young, Eric Westra, Aaron Etienne
Abstract: Weed pressure in forage corn production causes yield losses of up to 31.5%, yet site‑specific weed management (SSWM) systems built on UAV imagery and deep learning remain constrained by the scarcity of field‑representative training datasets. We present USU‑Corn‑WeedDB, a publicly available UAV RGB image dataset collected from a commercial forage corn field in Cache Valley, Utah, designed to support multi‑class weed detection under both supervised and semi‑supervised learning frameworks. RGB imagery was acquired on 27 June 2025 using an Autel EVO II Dual 640T V2 drone at ~10 m above ground level, yielding a ground sampling distance of approximately 0.48 cm/pixel. A total of 366 full‑resolution images were tiled into 8,800 patches at 640 x 640‑pixel resolution. Of these, 800 images were manually annotated for three weed species: common lambsquarters (Chenopodium album), redroot pigweed (Amaranthus retroflexus), and green foxtail (Setaria viridis), comprising 10,539 bounding‑box instances, with the remaining 8,000 tiles retained as an unlabeled pool for semi‑supervised experiments. This dataset reflects a natural class imbalance where redroot pigweed constitutes 53.86% of annotated instances, which was preserved intentionally to mirror real field conditions. To validate dataset utility, we trained 28 object detection models spanning five architecture families including YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLO26, and RT‑DETR under identical conditions without hyperparameter tuning. Test set mAP@0.5 ranged from 0.773 to 0.840, with lightweight models achieving competitive performance relevant to edge‑deployed UAV systems. USU‑Corn‑WeedDB is publicly available at https://doi.org/10.5281/zenodo.20044178.
Authors: Malak Allam, Khaled Shaban, Ali Hamdi
Abstract: Automated defect detection in high‑voltage transmission‑line insulators remains challenging due to severe class imbalance, large scale variation, and the small spatial extent of defect instances in Unmanned Aerial Vehicle (UAV) imagery. To address these challenges, this paper proposes AE‑YOLO, an Attention‑Guided AutoEncoder‑Enhanced YOLO framework for robust insulator defect detection. The architecture integrates lightweight bottleneck autoencoders within a Feature Pyramid Network‑Path Aggregation Network (FPN‑PAN) neck. This preserves anomaly‑sensitive information during multi‑scale feature fusion. Convolutional Block Attention Modules (CBAM) are used throughout the backbone, enhancing feature discrimination and suppressing background interference. The framework also introduces a variance‑maximizing autoencoder regularization strategy, which encourages diverse, defect‑discriminative latent representations. The network trains using a unified objective that combines focal loss, Complete IoU (CIoU) loss, and autoencoder regularization to address foreground‑background imbalance and improve localization accuracy. During inference, Weighted Boxes Fusion (WBF) combines predictions from YOLOv8, YOLOv10, and YOLO11. An autoencoder‑guided confidence boosting mechanism improves sensitivity to rare defect categories. Experiments on the Insulator‑Defect Detection dataset show that AE‑YOLO with an EfficientNetV2 backbone achieves 95.10 percent mAP at 0.5, 96.40 percent precision, and 93.80 percent recall. This performance surpasses the strongest YOLO‑family baseline by 5.0 points in mAP at 0.5 and 6.7 points in recall. These results confirm the effectiveness and adaptability of the framework. The model is a practical and scalable solution for UAV‑based transmission‑line inspection and defect monitoring.
Authors: Yadav Raj Ghimire, Jagrati Talreja, Tewodros Syum Gebre, Timothy Agboada, Shikha V. Chandel, Leila Hashemi Beni
Abstract: In this study, UAV multispectral imagery is used to segment the severity of bacterial leaf blight (BLB) in rice using convolutional neural networks (CNNs) and transformer‑based models. The evaluated architectures include U‑Net with a ResNet‑101 encoder, U‑Net++ with EfficientNet‑B3 and EfficientNet‑B7, DeepLabV3+, and SegFormer, all trained under a common pipeline with three input configurations (multispectral only, multispectral+NDVI, and multispectral+NDRE). Experiments are conducted using the publicly available BLB dataset with performance reported using mean IoU (mIoU), mean F1 (mF1), mean accuracy (mAcc), precision, and recall. U‑Net++ with EfficientNet‑B3 achieved the highest performance, with an mIoU of 97.62%. SegFormer obtained lower segmentation accuracy but comparable inference speed. Overall, the results indicate that lightweight CNN backbones remain more reliable for operational BLB monitoring while integration of vegetation indices provides small and consistent improvements. The study also highlights the value of standardised UAV datasets to compare disease mapping methods and encourages the use of CNN architectures for field implementation.
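Editorial sketch: the three input configurations above differ only in whether a vegetation index is appended to the multispectral bands. The snippet below shows the standard NDVI and NDRE definitions for a single pixel and how such a channel could be appended. The band keys (`red`, `red_edge`, `nir`), the configuration strings, and the `build_input` helper are hypothetical and are not taken from the paper's pipeline.

```python
def normalized_difference(a, b, eps=1e-9):
    """(a - b) / (a + b), with a small eps to avoid division by zero."""
    return (a - b) / (a + b + eps)


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return normalized_difference(nir, red_edge)


def build_input(bands, config="ms"):
    """Assemble the per-pixel channel list for one input configuration.

    bands: dict of reflectances; assumed to contain 'red', 'red_edge', 'nir'.
    config: 'ms', 'ms+ndvi', or 'ms+ndre' (assumed labels).
    """
    channels = list(bands.values())
    if config == "ms+ndvi":
        channels.append(ndvi(bands["nir"], bands["red"]))
    elif config == "ms+ndre":
        channels.append(ndre(bands["nir"], bands["red_edge"]))
    elif config != "ms":
        raise ValueError(f"unknown config: {config}")
    return channels


if __name__ == "__main__":
    pixel = {"green": 0.08, "red": 0.1, "red_edge": 0.3, "nir": 0.5}
    print(build_input(pixel, "ms+ndvi"))
```

In a real pipeline the same arithmetic would be applied array‑wise over whole image tiles rather than per pixel.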
Authors: Tan Zhang, Quanyou Li, Lu Zhang, Jun Liu, Xiaofeng Zhu, Ping Hu
Abstract: When a disaster unfolds, responders must answer not only what is happening, but also why it is happening, what will happen next, and what to do now, often from noisy low‑altitude UAV views and under tight on‑site compute constraints. However, most existing multimodal benchmarks emphasize perception (e.g., recognition/description), cover limited disaster types, and provide insufficient support for the multi‑stage reasoning required in practical emergency response. We introduce DisasterBench, a multi‑stage multimodal reasoning benchmark for UAV‑based disaster response in complex environments. DisasterBench spans 14 disaster‑related scene types and 9 response‑critical tasks across pre‑, during‑, and post‑disaster stages, with fine‑grained disaster‑task mappings that explicitly test causal attribution, propagation prediction, damage analysis, and decision‑oriented reasoning. To enable reasoning on the edge, we further propose DisasterVL, a lightweight multimodal model optimized with a three‑stage pipeline combining domain instruction tuning, chain‑of‑thought‑guided multimodal alignment, and reinforcement learning‑based policy optimization. Experiments across 21 popular MLLMs show that our 2B‑parameter DisasterVL outperforms all evaluated open‑source models and substantially narrows the gap to state‑of‑the‑art closed‑source models, achieving GPT‑4o‑comparable reasoning accuracy with superior efficiency. The project page is available at https://github.com/TanmouTT/DisasterBench.
Authors: Shengtao Zheng, Kai Li, Weichen Zhang, Yu Meng, Chen Gao, Xinlei Chen, Yong Li, Xiao-Ping Zhang
Abstract: End‑to‑end Vision‑Language‑Action (VLA) models have shown promise in UAV navigation. However, existing approaches typically rely on historical observations to directly predict actions, often struggling in dense urban environments where severe occlusions and sharp turns result in drastic viewpoint transitions. We argue that the ability to "imagine" future states, which is inherent in World Models, is critical for robust decision‑making under such partial observability. We therefore construct a challenging Urban Canyon Traversal Benchmark, specifically designed to evaluate spatial understanding in scenarios characterized by severe occlusions and drastic viewpoint transitions. We further propose WorldFly, a novel world‑model‑based VLA framework that employs a dual‑branch coupled flow matching mechanism to jointly generate future video predictions and navigation actions, thereby explicitly guiding the agent's policy via spatial imagination. Extensive evaluations on our benchmark demonstrate that WorldFly outperforms other baselines, particularly in unseen environments, validating the effectiveness of integrating world models into embodied aerial agents.
Authors: Zhaolin Li, Jinsong Chen, Shanxin Guo, Tuo Zhang, Xinglong Zhang, Pan Chen
Abstract: Hyperspectral imaging provides rich spectral information for quantitative remote sensing, yet hyperspectral sensors remain costly and thus unavailable in many UAV deployments. Spectral super‑resolution (SSR) seeks to reconstruct hyperspectral images (HSIs) from multispectral images (MSIs). Most existing SSR methods assume a fixed and known spectral response function (SRF) and are therefore limited to single‑sensor settings. In practical cross‑sensor scenarios, the spectral degradation from HSI to MSI is unknown and varies with sensor characteristics and scene content, which renders HSI reconstruction ill‑posed. This paper proposes a physics‑guided deep unfolding network, termed PGU‑Net, to address blind cross‑sensor SSR by jointly estimating the HSI and a learnable spectral transformation function (STF). PGU‑Net unrolls an alternating optimization procedure into an end‑to‑end trainable multi‑stage architecture, where each stage sequentially updates the HSI and the STF. Both modules combine learnable proximal networks with differentiable closed‑form solvers, enabling physical interpretability while retaining strong representation capacity. Experiments on benchmark datasets (CAVE and NTIRE 2022) with multiple SRFs demonstrate accurate recovery of the STF (degradation operator) and improved reconstruction performance over state‑of‑the‑art SSR methods. Furthermore, evaluations on a real UAV cross‑sensor dataset (Headwall Nano HSI and DJI P4 Multispectral MSI) verify the effectiveness and robustness of PGU‑Net under truly blind conditions, and suggest that the estimated STF may exhibit land‑cover‑related differences.
Authors: Hanzhi Chang, Jing Bai, Xin Tang, Xiaomei Liu
Abstract: Unmanned aerial vehicle‑assisted mobile edge computing (UMEC) can execute compute‑intensive and latency‑critical artificial intelligence (AI) services, which can be provided by multiple UAVs collaborating in the air to perform inference tasks. Completing an AI service requires multiple inferences, each of which is implemented by an AI service chain (AISC) consisting of multiple virtual network functions (VNFs). The application of AISC relies on an efficient AISC deployment strategy to determine which UAV each VNF is deployed on. However, the UMEC network topology is highly dynamic due to the high‑speed movement of UAVs or their departure/arrival, which makes AISC deployment in the UMEC network challenging. In addition, the intricate relationships between the UMEC environment and the AISC, as well as between individual VNFs in an AISC, can also affect the effectiveness of the AISC deployment strategy. Moreover, under the constraints of energy consumption and load balancing, it is also difficult to optimize the AISC deployment strategy to minimize AISC completion time and thereby enhance the quality of AI service. To address the above challenges, this paper proposes a double deep attention Q‑network based on heterogeneous graph neural networks, which incorporates a heterogeneous graph to capture diverse relationships in UMEC and utilizes attention mechanisms to adaptively focus on critical nodes and links for intelligent AISC deployment. The experimental results demonstrate that the proposed algorithm performs excellently in AISC completion time, AISC completion rate, load balancing and energy consumption.
Authors: Phillip Jiang
Abstract: Multi‑object tracking (MOT) from UAV imagery presents unique challenges: altitude varies across sequences, objects are small and densely packed, and frequent occlusion causes identity switches. Existing graph‑based trackers assume fixed spatial context and treat all objects uniformly, ignoring the heterogeneous lifecycle states of detections, active tracklets, and lost targets. We propose HDST‑GNN, a Heterogeneous Dynamic Spatiotemporal Graph Neural Network with three novel contributions. First, Altitude‑Adaptive Edge Construction estimates a camera‑altitude proxy from mean object area and adjusts the graph connectivity radius accordingly. Second, Heterogeneous Node Representation models detections (Type‑D), confirmed tracklets (Type‑T), and lost tracklets (Type‑L) as distinct node types with dedicated projections and typed edge relations. Third, Occlusion‑Gated Temporal Aggregation gates each node's attention contribution by its occlusion confidence, preventing occluded nodes from corrupting neighbour embeddings. HDST‑GNN is trained end‑to‑end with a differentiable Sinkhorn head using joint cross‑entropy and triplet loss. On VisDrone2019‑MOT with oracle detections, HDST‑GNN achieves 94.51% MOTA and 97.24% IDF1, outperforming SORT by +5.0 MOTA points and reducing identity switches by 81%. With real YOLOv8n detections, HDST‑GNN reduces identity switches by 49% vs. SORT. Ablation studies confirm the independent contribution of each component.
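Editorial sketch: the first contribution above ties the graph connectivity radius to an altitude proxy derived from mean object area. The snippet below illustrates one plausible form of that idea, assuming the radius scales with the square root of the mean box area, so that smaller objects (higher altitude) yield a smaller pixel radius. The scaling law, the `scale` constant, and the function names are hypothetical and are not taken from HDST‑GNN.

```python
import math


def adaptive_radius(boxes, scale=3.0):
    """Connectivity radius in pixels from the mean box area.

    boxes: list of (x, y, w, h) tuples. sqrt(mean area) acts as a
    characteristic object size, used here as an inverse altitude proxy.
    """
    mean_area = sum(w * h for _, _, w, h in boxes) / len(boxes)
    return scale * math.sqrt(mean_area)


def build_edges(boxes, scale=3.0):
    """Connect every pair of boxes whose centres lie within the radius."""
    radius = adaptive_radius(boxes, scale)
    centres = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    edges = []
    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            if math.dist(centres[i], centres[j]) <= radius:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    detections = [(0, 0, 10, 10), (20, 0, 10, 10), (200, 0, 10, 10)]
    print(adaptive_radius(detections), build_edges(detections))
```

With a fixed pixel radius, the same scene viewed from a higher altitude would connect far more objects; scaling the radius with apparent object size keeps the neighbourhood roughly comparable across sequences.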
Authors: Remon Polus, Soumaya Cherkaoui
Abstract: Uncrewed aerial vehicles (UAVs) are increasingly considered as aerial platforms capable of providing both sensing and communication services, representing a promising paradigm for intelligent transportation systems. This paper investigates the optimal time allocation for a UAV‑enabled integrated sensing and communication (ISaC) system operating in the X‑band for vehicular networks. We analyze the trade‑off between sensing accuracy and communication performance under practical UAV constraints and fading effects, considering both single‑shadowing and double‑shadowing channel models. An optimization framework is developed to allocate time between sensing and communication while guaranteeing minimum communication rates and sufficient sensing reliability. Simulation results demonstrate adaptive time allocation strategies, highlighting how UAV‑to‑ground channel conditions and target distances influence the balance between sensing and communication in smart mobility scenarios.
Authors: Loc X. Nguyen, Avi Deb Raha, Huy Q. Le, Zhu Han, Eui-Nam Huh, Choong Seon Hong
Abstract: Sixth‑generation wireless networks are envisioned to deliver ubiquitous, seamless, and intelligent connectivity that reaches far beyond the limits of terrestrial infrastructure. Non‑terrestrial networks (NTNs) are central to this vision, extending coverage to underserved regions, remote terrain, and disaster zones that terrestrial deployment cannot economically reach. However, NTN architecture faces numerous limitations: severe path loss over long distances, long propagation delays, large and time‑varying Doppler shifts, limited visibility windows, and tight on‑board energy and computing budgets. Semantic communication (SemCom), which conveys the meaning of data rather than its raw bit‑level representation, is unusually well matched to these conditions: extreme task‑oriented compression eases bandwidth scarcity, deep joint source‑channel coding prevents the cliff effect due to low signal‑to‑noise ratio, and generative AI reconstructs content from sparse cues that survive rain‑faded or blocked links. This observation, that each NTN limitation maps onto a SemCom property that addresses it, motivates our survey. We first walk through the NTN limitations one by one, pairing each with the SemCom design choices that complement it. We then organize the literature along three axes (the NTN platform, the semantic methodology, and the supporting techniques) and follow this with platform‑by‑platform deep dives on satellite‑centric, UAV/HAPS‑centric, and integrated SAGIN systems. The survey concludes by identifying open research problems, gaps in existing standards, and future directions, including the application of foundation models, energy‑aware scheduling, and quantum‑assisted SemCom for deep space communication.
Authors: Ritabrata Roy Choudhury, Arkajyoti Karmakar, Rudra Pratap Mitra
Abstract: Mass gathering events are associated with critical safety incidents caused by insufficient crowd monitoring and inadequate emergency response coordination. Traditional surveillance systems lack intelligent analytics, resulting in delayed threat identification, poor resource deployment, and weak support for vulnerable individuals during dense public assemblies. This paper presents Drishti AI‑Event Guardian, an intelligent crowd management framework using deep learning for public safety enhancement. The architecture combines multimodal data from CCTV networks and UAV platforms, processed by models on Google Vertex AI infrastructure. Core methods include real‑time crowd density estimation using YOLOv8, spatiotemporal anomaly detection, and predictive crowd‑flow modeling through gradient‑boosted regression. Drishti also integrates four modules: (i) facial recognition for missing person identification with crowd‑wide notification; (ii) medical emergency reporting with automated dispatch; (iii) a conversational AI chatbot for reports and complaints; and (iv) an intelligent guard reallocation engine that dynamically reassigns personnel in response to crowd density changes. The system is evaluated on two scenarios: the Kumbh Mela gathering and the RCB Victory Parade event, achieving crowd density estimation MAE of 3.2 persons/m², anomaly detection F1‑score of 0.91, facial recognition precision of 0.93, and median alert latency of 111 ms. Predictive congestion modeling provides five‑minute forecasts with MAPE of 8.3%, enabling preemptive intervention. The chatbot resolved 89% of incident filings without human operators, while guard reallocation reduced responder deployment latency by 34% versus manual reassignment. Results demonstrate a shift from passive surveillance toward active crowd intelligence and a scalable foundation for events from local gatherings to mega festivals.
Authors: Ganyu Zou, Linhan Wang, Chen Dai, Siji Chen, Chang-Tien Lu
Abstract: Decentralized rigid formation flocking requires a swarm of autonomous agents to maintain a predetermined geometric configuration while moving, relying solely on local sensing and communication. However, existing decentralized control methods struggle to maintain strict inter‑agent distance constraints in cluttered environments, often suffering from local minima deadlocks, high‑frequency control oscillations, or limited flexibility during obstacle navigation, resulting in a low success rate. To address these limitations, we propose Rigid Swarm Control (RSC), a decentralized control framework for large‑scale rigid formation flocking. To escape local minima via robust long‑term planning while ensuring short‑term safety, RSC integrates finite‑horizon trajectory predictions with a reactive artificial potential field (APF) safety controller within a hybrid architecture. Furthermore, to accelerate formation reassembly after obstacle traversal without interrupting task execution, RSC introduces an online leader‑follower reconfiguration mechanism based on stable role exchange. Extensive evaluations in challenging cluttered environments with 25 UAVs demonstrate that RSC reliably unifies rigid formation maintenance, obstacle avoidance, and target tracking. Under strict success criteria (collision‑free operation with a maximum relative edge‑length error below 10%), RSC achieves an 83% success rate, significantly outperforming existing heuristic and learning‑based baselines that fall below 5%.
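Editorial sketch: the success criterion above is stated in terms of a maximum relative edge‑length error. The snippet below shows one way such a metric could be computed, assuming the error of an edge is the absolute deviation of the current inter‑agent distance from its desired length, divided by the desired length. This definition, the data layout, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math


def max_relative_edge_error(positions, desired_lengths):
    """Largest relative deviation over all formation edges.

    positions: list of agent coordinates, e.g. [(x, y), ...].
    desired_lengths: dict mapping an edge (i, j) to its target length.
    """
    worst = 0.0
    for (i, j), target in desired_lengths.items():
        actual = math.dist(positions[i], positions[j])
        worst = max(worst, abs(actual - target) / target)
    return worst


def formation_ok(positions, desired_lengths, collided, tolerance=0.10):
    """Success check: no collision and edge error below the tolerance."""
    return (not collided) and (
        max_relative_edge_error(positions, desired_lengths) < tolerance
    )


if __name__ == "__main__":
    agents = [(0.0, 0.0), (1.05, 0.0), (0.0, 0.9)]
    targets = {(0, 1): 1.0, (0, 2): 1.0}
    print(max_relative_edge_error(agents, targets))
    print(formation_ok(agents, targets, collided=False))
```

In an evaluation over a whole trajectory, the metric would presumably be tracked at every time step and the worst value compared against the 10% threshold.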
Authors: Faryal Batool, Muhammad Ahsan Mustafa, Fawad Mehboob, Valerii Serpiva, Dzmitry Tsetserukou
Abstract: Indoor UAV navigation requires efficient exploration, scene understanding, and reliable trajectory execution under limited field‑of‑view observations. Existing vision‑based navigation frameworks typically rely on single‑view observations, limiting their ability to reason about occlusions, target visibility, and global scene structure. In this work, we propose AgenticDiffusion, a multi‑view UAV navigation framework that coordinates language‑guided reasoning, open‑vocabulary target grounding, vision‑based diffusion planning, and NMPC within a unified aerial navigation pipeline. Given a natural language instruction and synchronized first‑person‑view (FPV) and top‑view observations, the framework determines the most informative viewpoint for navigation and generates a mission plan prior to trajectory execution. The targets are localized using an open‑vocabulary grounding model, after which viewpoint‑specific diffusion planners generate navigation trajectories for UAV execution. Using complementary viewpoints, the proposed framework reduces repeated target exploration and improves navigation efficiency in cluttered indoor environments. The framework was validated in four real‑world UAV navigation scenarios involving adaptive viewpoint selection, multi‑stage mission execution, long‑horizon navigation, and safe landing‑site selection. The experimental results demonstrated an overall mission success rate of 80% in 40 real‑world trials, while the diffusion planners achieved a trajectory generation success rate of 100%.
Authors: Roohan Ahmed Khan, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Dzmitry Tsetserukou
Abstract: Deep reinforcement learning has shown strong potential for enabling autonomous robots to learn complex navigational tasks. However, its practical use still depends heavily on human‑designed reward functions and repeated manual fine‑tuning, which is time‑consuming and does not guarantee high success in the desired task. This paper presents AgenticRL, an agent‑guided reinforcement learning framework that increases autonomy in reward design, policy refinement, and real‑world deployment for unmanned aerial vehicle (UAV) navigation tasks. AgenticRL uses a multimodal generative pre‑trained transformer (GPT) agent to interpret task information and visual scene observations, generate task‑specific reward functions, train policies using the Proximal Policy Optimization (PPO) algorithm, and then act as a critic by evaluating the trained policy through diagnosis packets to generate feedback. Based on this feedback, the agent identifies failure modes and refines the reward function in a closed‑loop self‑improvement process. To further leverage the multimodal GPT agent during inference, AgenticRL uses real‑world images and natural language task information to automatically identify the active scenario and select the appropriate trained policy for execution. The framework is evaluated on multiple navigational tasks, including gate traversal, obstacle avoidance, wall barrier crossing with landing, trajectory following, and motion behavior learning. Experimental results show that the closed‑loop refinement process improves policy behavior by 71% compared with the initial rewards. We also demonstrate sim‑to‑real transfer of the proposed framework, achieving a real‑world success rate of 91% and a sim‑to‑real accuracy of 94%.
Authors: Hongjiang Lei, Heng Jin, Ki-Hong Park, Jia Ye, Liang Yang, Gaofeng Pan, Yun Li
Abstract: Integrated sensing and communication (ISAC) has emerged as a promising key technology for future wireless networks, enabling the efficient coordination of sensing and communication functions within limited resources. This work investigates a secure ISAC system assisted by an uncrewed aerial vehicle (UAV). By incorporating the extended Kalman filter (EKF), the proposed system is capable of delivering communication services to legitimate users while simultaneously jamming eavesdroppers and performing joint prediction and tracking of the trajectories of both legitimate and illegitimate users. Considering practical constraints such as sensing beamwidth, transmit power, and UAV's propulsion energy consumption, the secrecy rate is maximized through the joint design of transmit beamforming and UAV trajectory. To tackle the resulting highly non‑convex optimization problem, an efficient iterative algorithm is developed by integrating block coordinate descent, successive convex approximation, and EKF, thereby yielding a high‑quality suboptimal solution. Extensive simulation results validate the superior performance of the proposed scheme compared to benchmarks.
Authors: Christian Manasseh, Savana Ammons
Abstract: Detecting coordination among unmanned aerial vehicle (UAV) fleets operating in shared airspace and identifying the route‑lead aircraft whose navigation decisions govern fleet behavior presents a fundamental speed–accuracy trade‑off: fast methods enable real‑time traffic management but sacrifice detection fidelity, while accurate methods may exceed the time budget for actionable airspace deconfliction. This paper presents a game‑theoretic decision framework that resolves this trade‑off by formulating method selection as a two‑player zero‑sum game between a Monitor (selecting computational methods and parameters) and Nature (selecting the unknown traffic scenario). We construct an end‑to‑end pipeline from trajectory surveillance data through eight candidate detection algorithms, a Monte Carlo sensitivity analysis characterizing their stochastic performance, and finally a multi‑objective optimization layer that identifies Pareto‑optimal method portfolios. The minimax solution provides a robust mixed strategy with a probability distribution over methods that guarantees worst‑case performance regardless of scenario uncertainty. Experimental evaluation across 200 randomized configurations spanning 5–50 aircraft demonstrates that the framework recommends distinct method portfolios depending on operational priority: Koopman Phase dominates balanced (70.6%) and speed‑priority (79.7%) profiles, while CRQA emerges as primary (47.4%) when route‑lead identification is prioritized. The framework achieves a guaranteed game value of 0.29–0.53 (normalized utility) across all tested preference profiles, providing the first principled, scenario‑adaptive methodology for computational method selection in UTM fleet monitoring operations.
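The robust mixed strategy described in this abstract can be illustrated with a small sketch. It assumes a utility matrix with entries in [0, 1], where rows are the Monitor's candidate methods and columns are Nature's scenarios. The solver is a generic multiplicative-weights (Hedge) iteration against a best-responding Nature, a standard way to approximate a maximin strategy; it is not taken from the paper, and the function name `maximin_strategy` and the example matrix are hypothetical.

```python
import math
from typing import List, Tuple


def maximin_strategy(utility: List[List[float]],
                     rounds: int = 20000) -> Tuple[List[float], float]:
    """Approximate the Monitor's maximin mixed strategy for a zero-sum game.

    utility[i][j] is the Monitor's utility (assumed in [0, 1]) for method i
    under scenario j. Returns the averaged strategy and the worst-case
    expected utility that this averaged strategy guarantees.
    """
    n, m = len(utility), len(utility[0])
    # Hedge learning rate; an illustrative choice, not a tuned value.
    eta = math.sqrt(math.log(n) / rounds) if n > 1 else 0.0
    weights = [1.0] * n
    average = [0.0] * n
    for _ in range(rounds):
        total = sum(weights)
        p = [w / total for w in weights]
        for i in range(n):
            average[i] += p[i] / rounds
        # Nature picks the scenario that minimizes the Monitor's expected utility.
        expected = [sum(p[i] * utility[i][j] for i in range(n)) for j in range(m)]
        worst = min(range(m), key=expected.__getitem__)
        # Reward methods that did well in that scenario (renormalized each round).
        weights = [p[i] * math.exp(eta * utility[i][worst]) for i in range(n)]
    guaranteed = min(sum(average[i] * utility[i][j] for i in range(n))
                     for j in range(m))
    return average, guaranteed


if __name__ == "__main__":
    # Hypothetical 2x2 game: each method is perfect in one scenario only.
    strategy, value = maximin_strategy([[1.0, 0.0], [0.0, 1.0]])
    print(strategy, value)  # roughly an even mix with a value near 0.5
```

An exact solution would normally come from a linear program; the iterative scheme is used here only to keep the sketch dependency-free.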
Authors: Xuchen Liu, Jiawei Huang, Shihao Xia, Bingxi Liu, Jinqiang Cui, Jiankun Yang
Abstract: Vision‑language navigation (VLN) for UAVs demands grounding free‑form instructions into 6‑DoF flight under partial observability. While Vision‑Language‑Action (VLA) models excel at semantic reasoning, they suffer from brittleness due to geometric inconsistency and dynamics mismatch. To address this, we propose ImagineUAV, an imagination‑driven framework leveraging cascaded world‑action modeling. Instead of direct regression, ImagineUAV employs a latent video diffusion model to generate instruction‑conditioned future observations, explicitly imagining environmental evolution, from which 6‑DoF motions are inferred via an action extractor. A kinodynamic planner then refines these estimates into collision‑free trajectories. Additionally, a step‑distilled inference pipeline ensures real‑time execution. With only 1.3B parameters, ImagineUAV outperforms prior VLN and VLA baselines on benchmarks and real‑world flights, validating the practicality of imagination‑driven aerial navigation.
Authors: Abinav Kiran, Sravan Danda, Aditya Challa, Sougata Sen, Daya Sagar B S
Abstract: Motion blur from high‑speed UAV acquisition degrades semantic segmentation on rare texture‑dependent classes with high agronomic value. Standard CNNs rely on high‑frequency magnitude features that blur destroys, causing statistical erasure of minority signals. We propose Dual Quantile Activation (QAct), a rank‑aware block replacing magnitude gating with instance‑level rank normalization. Evaluated on Agriculture‑Vision 2021 across zero‑shot and blur‑supervised regimes at multiple severities, QAct is the dominant architectural factor: it delivers consistent mIoU gains over ReLU across both regimes and all severities, with strongest gains on rare structural and texture‑dependent classes. Some dominant classes (water, planter skip) show mixed per‑class performance under distillation. At moderate blur, zero‑shot QAct outperforms distillation‑trained ReLU; across all severities, Distill‑QAct achieves best performance, confirming rank‑aware activation and blur‑domain training are complementary robustness sources.
Authors: Beier Hu, Yuanshen Guo, Jialu Cai, Chengwei Li, Yong Wang, Shunan Wu, Zhigang Wu
Abstract: Cross‑view geo‑localization (CVGL) is critical for Unmanned Aerial Vehicle (UAV) self‑positioning and target localization in GNSS‑denied environments. However, acquiring robust semantics while preserving fine‑grained spatial details remains challenging. To address this, we propose DINO‑GFSA, a framework leveraging a LoRA (Low‑Rank Adaptation) adapted DINOv3 (ViT‑L) backbone for parameter‑efficient, high‑capacity representation. Crucially, we introduce a Semantic Gated Residual Fusion module, which utilizes high‑level semantics to selectively calibrate and integrate low‑level spatial cues, effectively bridging the semantic gap. Furthermore, a Mamba‑based Sequential Aggregation Head is designed to capture long‑range spatial dependencies with linear complexity. Experiments demonstrate state‑of‑the‑art performance on University‑1652 and DenseUAV benchmarks, notably surpassing the previous best on DenseUAV by 3.48% on Recall@1. These results validate DINO‑GFSA as a generalized, robust solution for UAV CVGL.
Authors: Jie Gao, Jie Ma, Kaihui Lin, Kai Ye, Miaohui Zhang, Pingyang Dai, Liujuan Cao
Abstract: For low‑altitude Unmanned Aerial Vehicle (UAV) autonomy, 3D spatial understanding is not merely a perception objective, but the safety interface between human instructions and physical flight. In human‑scale urban airspace below 20 meters, thin geometry, occlusions, vegetation, and urban clutter define whether an aerial agent can safely enter the space ahead. However, existing UAV datasets mainly provide 2D annotations or 3D boxes, while driving‑oriented occupancy benchmarks assume stable ground‑level sensor rigs. Both miss the defining regime of low‑altitude flight: a front‑facing monocular camera observing occupied and free space from a moving aerial body with frame‑wise changing 6‑DoF pose and camera extrinsics. To bridge this gap, we introduce SkyShield, to the best of our knowledge the first front‑view monocular semantic occupancy benchmark for urban UAV flight below 20 meters. Built on CARLA, SkyShield contains 36K front‑view UAV samples across diverse urban scenes and weather conditions, pairing each image with frame‑wise 6‑DoF UAV pose, frame‑wise dynamic camera geometry, UAV states, and front‑frustum semantic occupancy labels. We further propose KAR‑mIoU, a UAV‑centric and dynamics‑aware metric that re‑weights voxel‑level evaluation by kinematic reachability and time‑to‑collision, revealing safety‑critical risks hidden by conventional mIoU. To tackle this challenging new setting, we provide SkyOcc, a geometry‑first monocular baseline that integrates frame‑wise UAV attitude into projection, fuses temporal occupancy features, and applies safety‑prior optimization to preserve sparse collision‑critical structures. Together, SkyShield, KAR‑mIoU, and SkyOcc establish occupancy as a safety interface for low‑altitude aerial autonomy. Code and dataset will be released publicly.
Authors: Jingfu Li, Jingjing Cui, Chong Huang, Jing Zhu, Zheng Chu, Mingzhe Chen, Pei Xiao, Rahim Tafazolli
Abstract: Semantic communications, which can significantly reduce spectrum consumption in wireless networks, have recently become a popular research area. When combined with wireless power transfer (WPT), semantic communications can help achieve high spectral efficiency for energy‑limited devices in wireless communications. In energy‑constrained and link budget‑limited scenarios such as UAV networks, the integration of semantic communications and WPT enables highly energy‑efficient transmission mechanisms. In this paper, we investigate semantic communications in UAV‑enabled WPT networks. To achieve adaptability to varying signal‑to‑noise ratio (SNR) and task requirements, we introduce a multi‑layer hybrid bit and semantic communication framework. We adopt a semantic communication efficiency metric and aim to maximize it by jointly optimizing UAV trajectory, energy harvesting base station (EHBS) selection, user association, semantic mode selection, and energy harvesting time allocation. To address this complex long‑term optimization problem, we adopt the distributional soft actor‑critic (DSAC) algorithm and add a decision assistant to further enhance the convergence performance of DSAC. Simulation results validate the effectiveness of the proposed method and framework and demonstrate that our algorithm can achieve superior long‑term optimization performance in dynamic network environments.
Authors: Erdem Uysal, Timo Kehrer, Sebastiano Panichella
Abstract: Foundation models are increasingly used to drive autonomous systems, yet existing approaches either keep the model in a tight control loop, raising latency and hallucination risk, or compile natural language into opaque end‑to‑end policies that are hard to explain and constrain and that require domain‑specific datasets and fine‑tuning. We propose a planner‑executor agent for PX4‑based drones that decouples high‑level mission planning from low‑level control. A large language model performs single‑pass task planning, while execution is handled through a structured ROS 2 tool‑calling interface bridged to MAVLink. The system constructs a world model by combining modular 2D detectors (e.g., YOLO or vision‑language models) with a pinhole depth projection module for 3D object localization. A constraint enforcement layer enforces altitude limits and horizontal geofencing, and bounded replanning enables recovery from execution‑time action failures. We position our approach within three common design patterns for foundation‑model‑based robotics systems and demonstrate its feasibility in PX4 software‑in‑the‑loop simulations in Gazebo. Results highlight improved explainability, constraint enforcement, and reduced LLM calls compared to tightly coupled LLM control. The code, dataset, videos, and other material can be found at the following link: https://github.com/erdemuysalx/PEACE
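Two components named in this abstract, pinhole depth projection and constraint enforcement, lend themselves to a brief illustration. The sketch below back-projects a detection's pixel center into the camera frame with the standard pinhole model, then clamps a position setpoint to an altitude band and a horizontal geofence. The circular geofence shape, the local north/east/altitude convention, and all names (`Intrinsics`, `back_project`, `clamp_setpoint`) are assumptions made for illustration and are not taken from the PEACE repository.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u: float, v: float, depth_m: float,
                 k: Intrinsics) -> Tuple[float, float, float]:
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def clamp_setpoint(north: float, east: float, alt: float, *,
                   min_alt: float, max_alt: float,
                   radius: float) -> Tuple[float, float, float]:
    """Clamp a setpoint to an altitude band and a circular geofence.

    The geofence is assumed to be a circle of the given radius centered on
    the local origin; out-of-fence setpoints are pulled back to the boundary.
    """
    alt = min(max(alt, min_alt), max_alt)
    distance = math.hypot(north, east)
    if distance > radius:
        scale = radius / distance
        north, east = north * scale, east * scale
    return (north, east, alt)


if __name__ == "__main__":
    camera = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(back_project(420.0, 240.0, 5.0, camera))
    print(clamp_setpoint(30.0, 40.0, 200.0, min_alt=2.0, max_alt=120.0, radius=25.0))
```

Transforming the camera-frame point into a world frame would additionally need the vehicle pose and camera extrinsics, which are omitted here.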
Authors: Zhizhen Pan, Hesong Wang, Huan Wang
Abstract: Estimating 3D attributes directly from images has advanced rapidly with the Visual Geometry Grounded Transformer (VGGT), which predicts camera parameters, depth maps, and point clouds in a single forward pass. However, its 1.2B‑parameter scale severely limits deployment on resource‑constrained platforms such as UAVs and mobile AR devices. To address this limitation, we introduce QVGGT, a tailored quantization framework designed to compress VGGT. Our approach starts from the observation that transformer blocks within VGGT exhibit heterogeneous sensitivity to quantization. We thus analyze per‑block quantization sensitivity and propose a selective mixed‑precision strategy that allocates higher precision to the most fragile transformer blocks. To address the amplification of quantization error caused by high‑variance camera and register tokens, we further introduce token filtering with camera information compensation, which removes these outliers from activation calibration and restores their geometric cues using a PCA‑derived global compensation token. Finally, we develop a task‑aware scale search mechanism that evaluates candidate quantization scales not only through layer reconstruction but also through multi‑head supervision and cross‑head geometric consistency among camera poses, depth maps, and point maps. Extensive experiments on multiple geometry perception benchmarks demonstrate that QVGGT achieves near‑lossless W4A16 quantization, preserving the accuracy of all 3D prediction heads while delivering 3~4.9× memory reduction and up to 2.8× real hardware speedup over FP32. Our approach makes high‑fidelity 3D perception feasible on edge devices, enabling practical deployment of feed‑forward 3D reconstruction models in real‑world constrained environments.
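As background for the W4A16 setting mentioned in this abstract (4-bit weights, 16-bit activations), the following minimal sketch shows plain symmetric round-to-nearest 4-bit quantization of one weight row. It is a generic baseline and does not implement QVGGT's mixed-precision allocation, token filtering, or task-aware scale search; the function names and the per-row scaling choice are assumptions.

```python
from typing import List, Tuple


def quantize_row_int4(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one weight row to int4.

    Codes lie in [-8, 7]; the scale maps the largest magnitude to code 7.
    """
    max_abs = max((abs(w) for w in row), default=0.0)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    """Recover approximate weights from int4 codes and the row scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.1, 0.0]
    codes, scale = quantize_row_int4(weights)
    restored = dequantize_row(codes, scale)
    print(codes, scale, restored)
```

With this scheme, the per-weight reconstruction error is bounded by half the scale, which is why outlier-heavy rows (a large maximum magnitude) are the ones most sensitive to low-bit quantization.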
Authors: Tianle Zeng, Yanci Wen, Xueang Yu, Hong Zhang
Abstract: Recent aerial vision‑language‑action (VLA) models show promising single‑UAV capabilities, such as tracking moving objects and navigating to language‑specified landmarks. However, it remains unclear whether these capabilities can transfer to air‑ground cooperation, where a UAV and a UGV must act jointly in a shared, closed‑loop physical world. We study this question with CARLA‑Air, a single‑process air‑ground evaluation environment that unifies CARLA and AirSim inside one Unreal Engine runtime. By sharing the same world state, physics tick, and sensing pipeline, CARLA‑Air enables physically consistent UAV–UGV interaction and precise measurement of simulation‑timestamp alignment and effective coordination latency. Using CARLA‑Air, we evaluate representative aerial VLA and planning baselines on two complementary diagnostic tasks: moving‑platform landing and occlusion‑recovery escort. The results show that current aerial VLA models can often track or follow a ground partner, but struggle to convert this single‑agent competence into stable cooperative behavior. State prompting provides limited benefit, and naive bidirectional interaction fails to consistently improve performance and can amplify errors for most baselines. These findings suggest that, under the tested text‑based cue interfaces, zero‑shot cooperative air‑ground VLA requires three components beyond the current paradigm: explicit partner‑state grounding, low‑latency action coordination, and team‑level objective alignment. Our code is available at https://github.com/louiszengCN/CarlaAir.
Authors: Yifan Cai, Jan Ming Kevin Tan, Xiangqi Li, Chenzhe Jin, Narsimlu Kemsaram, Valerio Modugno
Abstract: Reliable midair docking between small unmanned aerial vehicles (UAVs) is essential for modular aerial cooperation and manipulation, but it requires precise relative‑pose control and a repeatable platform under tight thrust and payload constraints. We present a dual‑drone docking platform where two quadrotors operate in a leader‑follower formation and dock using a lightweight modular frame with passive magnetic latching. A progress‑aware mission supervisor manages phase transitions: approach, alignment, capture, and settle. This platform integrates a complete hardware‑software stack (ROS 2 with Crazyflie/PX4 interfaces) and synchronized logging for benchmark evaluation. We evaluate the platform in simulation and real‑world experiments using quantitative metrics such as formation error, baseline and yaw consistency, docking success rate, time‑to‑dock, and failure‑mode statistics. The platform enables statistically grounded comparison of docking supervision and synchronization strategies and provides a practical testbed for modular aerial cooperation and repeatable midair aerial manipulation.
Authors: Jimin Choi, Grant Stagg, Cameron K. Peterson, Max Z. Li
Abstract: Uncrewed aerial vehicles (UAVs) are increasingly used for exploration‑driven monitoring in hazardous environments such as disaster zones, contaminated sites, wildfire areas, and damaged infrastructure, where limited flight endurance must be allocated between visiting reported locations and gathering new information. In these settings, prior information regarding hazards is often incomplete, spatially imprecise, and subject to change during execution. For example, initial reports may identify a region where a hazard is likely to exist, but the actual hazard may be displaced, partially observed, or entirely unreported. We present an integrated exploration‑aware UAV route optimization and path planning framework for hazard monitoring under uncertain and evolving prior information. The environment is represented as a spatial risk map, where each location has an associated belief of hazardous conditions. Reported hazards are modeled as uncertain regions of interest (ROIs) rather than confirmed target locations, requiring the UAV to inspect reported areas while also using its limited flight endurance to explore informative regions. The proposed method solves a vehicle routing problem over reported ROIs, augments the route with auxiliary pseudo‑nodes to improve spatial coverage, allocates the remaining flight distance budget across route segments, and optimizes dynamically feasible B‑spline trajectories for local exploration. During execution, UAV measurements update a grid‑based belief map, and the remaining trajectory is replanned when new information and the remaining budget justify adaptation. Across 48 scenario configurations, online replanning improves average KL reduction by 15.9% over the offline optimized planner and 48.6% over straight‑line traversal.
Authors: Feng Qiu, Zheng Fang, Shuhang Zhang, Kangjun Liu, Longkun Zou, Jing Liu, Ke Chen
Abstract: Learning‑based radio map estimation (RME) plays a critical role in UAV‑assisted wireless sensing, enabling tasks such as coverage prediction and network optimization. Most current methods assume an independently and identically distributed (i.i.d.) training and testing setting based on random sampling. However, practical UAV measurements are collected sequentially along feasible trajectories, resulting in highly structured and spatially correlated patterns. This mismatch introduces a sampling distribution shift that increases the intrinsic difficulty of spatial field recovery and compromises the generalization of models trained under i.i.d. assumptions. To mitigate this issue, we propose a trajectory‑aware training paradigm based on Stochastic‑Triggered Trajectory‑Based Sampling (ST‑TBS), which preserves trajectory continuity while introducing sampling variability. Moreover, from a statistical perspective, we show that trajectory‑based sampling reduces spatial diversity and increases information redundancy compared to random sampling. Extensive experiments on the RadioMapSeer and SpectrumNet datasets demonstrate that models trained with random sampling suffer significant performance degradation under trajectory‑based observations, with RMSE increasing from 0.0391 to 0.2632 on SpectrumNet. Conversely, our proposed ST‑TBS method effectively reduces the RMSE to 0.0571. These results highlight the necessity of aligning training and deployment sampling distributions for reliable RME.
Authors: Jorge L. Rodriguez, Victor Angulo Morales, Areej Alwahas, Mariana Elias Lara, Fida Mohammad Thoker, Kasper Johansen, Bernard Ghanem, Fernando T. Maestre, Matthew F. McCabe
Abstract: Foundation models offer a promising route to transferable remote sensing representations, but many current approaches depend on very large pretraining datasets and fixed sensor configurations, limiting their suitability for ecological and environmental applications, where observations often vary across platforms, spatial and spectral resolutions, and available modalities. We introduce FLORO, a multimodal geospatial foundation model designed to learn transferable representations from a small but highly diverse remote sensing corpus. FLORO is pretrained using masked autoencoding on a heterogeneous combination of Sentinel‑1, Sentinel‑2, SkySAT imagery, elevation, and UAV‑derived data. To accommodate sensor variability, FLORO incorporates availability‑aware inputs that indicate which spectral bands and auxiliary modalities are present in each sample, enabling a unified input space across heterogeneous sensor configurations. We evaluated FLORO on the PANGAEA benchmark under a frozen‑encoder protocol across scene classification, segmentation, and regression tasks. Despite being pretrained on a smaller corpus than competing foundation models, FLORO achieved strong and stable transfer across optical, optical‑SAR, and optical‑elevation benchmarks spanning medium‑resolution satellite, airborne, and ultra‑high‑resolution UAV imagery. FLORO obtained the second‑best average segmentation performance across six PANGAEA benchmarks, trailing only a recently introduced foundation model pretrained on over two orders of magnitude more images, remained competitive on scene classification, and was robust in regression tasks, while qualitative results showed improved preservation of spatial structure in flood, urban, biomass, and canopy‑height prediction settings. In a separate controlled experiment on EuroSAT‑MS, geo‑positional encoding further improved classification relative to absolute positional encoding.
Authors: Sravan Reddy Chintareddy, Sherwan Jalal Abdullah, Justin D. Clough, Victor S. Frost, Shawn Keshmiri, Morteza Hashemi
Abstract: In this paper, we present an open‑source measurement platform designed to characterize the performance of commercial cellular (Verizon, a major US provider) and LEO satellite (Starlink) networks through real‑world flight tests in rural environments. We implement a comprehensive multi‑layer measurement approach spanning physical layer signal metrics, multi‑cell network topology, and end‑to‑end (E2E) application performance. Through an extensive flight campaign with more than 10 flight tests and 4.5+ hours of flight time, resulting in more than 18K samples, we present the first detailed, open‑source dataset analyzing dual cellular and Starlink performance for low‑altitude UAV operations. Our cellular‑Starlink comparative results, which are collected simultaneously at the same location, demonstrate significant performance differences between the two technologies: the LEO satellite link achieves superior latency performance with 95% of Round‑Trip Time (RTT) measurements below 50 ms compared to 80% under 150 ms for cellular, and exceptional downlink capacity with 95% exceeding 25 Mbps versus only 5 Mbps for cellular. Our analysis of cellular network performance demonstrates that while higher altitudes (e.g., 330+ m above sea level) improve signal power by 15‑20 dB via line‑of‑sight (LOS) propagation, they cause a 3‑4× increase in handover rates, which is due to excessive multi‑cell visibility rather than signal degradation. Furthermore, we observe asymmetric impacts on the RTT performance due to handovers such that 53.5% of handovers improve RTT, but worst‑case degradation (275 ms) is 2× larger than best‑case improvement (137 ms).
Authors: Yuntao Wang, Haojia Yang, Han Liu, Jianle Ba, Zhou Su
Abstract: Unmanned aerial vehicle (UAV) swarms are increasingly deployed in vast low‑altitude applications, owing to their capabilities in distributed sensing, flexible communication, and autonomous coordination. Nevertheless, the open and highly dynamic operating environment of UAV swarms introduces serious security risks, including GPS spoofing, insider threats, and multi‑hop intrusion. These threats are aggravated by limited on‑board resources, frequently changing network topology, and the presence of intelligent adversaries. To tackle these issues, this paper proposes a cloud‑edge‑end collaborative defense framework for UAV swarms. Based on this framework, three complementary mechanisms are developed. First, a cooperative perception scheme is designed to resist GPS spoofing via interactive attack‑defense game modeling. Second, a behavior‑driven authentication method with trust evaluation is developed to mitigate insider threats. Third, a multi‑agent attack forensics framework is devised to intelligently trace the propagation paths of multi‑hop attacks in UAV networks. Experimental results validate the effectiveness of the proposed approaches. Finally, several open research directions are outlined.
Authors: Ungvári Gergő, Ferenc Braun, Attila Ámon, Péter Kackstädter, János Volk, Péter Kovács, Tamás Dózsa
Abstract: The detection of unmanned aerial vehicles (UAVs) is important for the protection of civilian and military infrastructure. In this paper, we propose a cost‑effective UAV detection system using sound signals obtained from microphones. The recorded signals are passed through a signal processing pipeline which employs interpretable adaptive feature extractors using so‑called rational Gaussian wavelets. These adaptive wavelet transformations are embedded into and trained together with an underlying small neural network which detects and classifies UAVs based on the obtained features. This leads to a physically interpretable machine learning algorithm that, in addition to classifying UAVs, is also capable of detecting UAV swarms. We demonstrate our results using data collected in indoor studio and noisy outdoor environments. We conclude that the proposed method outperforms traditional machine learning approaches for detecting and classifying single UAVs as well as drone swarms, while retaining a high degree of interpretability. Our implementation of the proposed methods is made publicly available for reproducibility.
Authors: Knut Peterson, Zaid Mayers, Azmain Yousuf, Priontu Chowdhury, Asher Zaczepinski, Solmaz Arezoomandan, Reihaneh Maarefdoust, David Han
Abstract: Unmanned Aerial Vehicles (UAVs) have quickly become common in various airspaces, representing a wide range of applications from recreational flying to commercial photography and package delivery. With the increasing prevalence of UAVs, it becomes critical that both manned and unmanned aircraft can detect UAVs and other flying objects from long range to effectively track movement and ensure safe operation in shared spaces. While several datasets have been introduced for drone detection, the need for expanded high‑quality data persists, especially in the area of high‑resolution long‑range drone data. To address this, we introduce a high‑resolution dataset of 102,532 long‑range RGB images of drones, sampled at 5 FPS from 128 distinct video clips taken mid‑flight during 17 different data collection days spread over 8 months to ensure a wide variety of lighting scenarios, flight locations, and background elements. The dataset includes comprehensive drone range information, as well as 29,630 IR images, all paired with RGB counterparts from the base dataset. As one of the first drone detection datasets to leverage 4K image resolution and paired 640x512 IR images, our work represents a significant advancement to enable the detection of drones at long range. For access to the complete dataset, please visit https://research.coe.drexel.edu/ece/imaple/lrddv3/
Authors: Haiyan Zhao, Zirui He, Guanchu Wang, Ali Payani, Yingcong Li, Mengnan Du
Abstract: Activation verbalization explains hidden representations in natural language, but existing methods are mostly limited to self‑explanation, where each model explains only its own activations. We introduce Universal Activation Verbalizer (UAV), a framework that uses a shared decoder to explain activations from heterogeneous donor models. UAV learns a lightweight adapter that converts donor activations into soft tokens in the decoder's embedding space, and further supports adapter‑only transfer by reusing a frozen decoder‑side LoRA while training only a new adapter for another donor. Across classification, fact retrieval, and gist summarization, UAV remains competitive with strong self‑explanation baselines while enabling cross‑model verbalization across model families and scales. Ablations show that decoder‑side tuning mainly improves task behavior, whereas the adapter provides the activation‑grounded factual and semantic information needed for faithful explanations.
Authors: Junhao Wei, Haochen Li, Yanxiao Li, Yifu Zhao, Dexing Yao, Baili Lu, Xudong Ye, Sio-Kei Im, Yapeng Wang, Xu Yang
Abstract: Mining anomalies from unmanned aerial vehicle (UAV) state‑estimation logs is challenging because failures are sparse, temporally structured, and distributed across heterogeneous PX4 telemetry streams with variable sensor availability and missing values. We present AeroTSBoost, a temporal‑statistical boosting framework for real‑world UAV telemetry anomaly mining. AeroTSBoost aligns multivariate flight logs, converts each window into deterministic descriptors that capture distributional shifts, quantile structure, endpoint drift, local dynamics, and lag correlation, and trains a class‑balanced LightGBM detector. On UAV‑SEAD, AeroTSBoost achieves the strongest AUPRC among evaluated classical, supervised tabular, neural reconstruction, recurrent, Granger‑causality‑based, and frequency‑domain baselines. Across five seeds, it reaches 0.7516 ± 0.0043 AUPRC and 0.5342 ± 0.0108 threshold‑swept event F1, improving AUPRC by 5.79 absolute points over the strongest non‑AeroTSBoost baseline. Under purged chronological and leave‑log‑out protocols, it remains the best AUPRC method, reaching 0.6066 ± 0.0193 and 0.6388 ± 0.0315, respectively. On related ALFA fixed‑wing UAV fault logs, AeroTSBoost reaches 0.9259 ± 0.0076 leave‑sequence‑out AUPRC, ahead of RandomForest (0.8835 ± 0.0797) and moments‑only (0.8700 ± 0.0481). These results show that deterministic temporal‑statistical representations remain highly competitive for sparse anomaly mining in operational cyber‑physical telemetry.
Authors: Yu Xia, Zhengbo Zhang, Shuaihu Zhang, Zhigang Tu
Abstract: UAV action recognition faces a deployment shift that standard benchmarks often obscure: a model trained on UAV footage captured from low‑depression viewpoints may be required to recognize the same action classes from high‑depression viewpoints. While the action labels remain unchanged, this shift alters body visibility, motion projection, and scene context, encouraging models to rely on viewpoint‑specific shortcuts. We introduce UAV‑OVO, an Out‑of‑Viewpoint generalization benchmark for UAV action recognition. UAV‑OVO derives view scores from uncalibrated videos, uses a view‑isolation band to assign low‑depression videos to the training and in‑distribution test splits while reserving high‑depression videos for out‑of‑distribution testing, and constructs ID/OOD test sets matched by class distribution so that performance differences reflect viewpoint shift rather than label imbalance. Across representative video recognizers, UAV‑OVO reveals a substantial ID/OOD gap: models that fit the low‑depression training distribution well often fail to transfer to held‑out high‑depression views, exposing viewpoint shortcuts hidden by aggregate accuracy. We further propose LATER, LoRA‑Anchored Test‑time Re‑centering, which first adapts the recognizer with Low‑Rank Adaptation (LoRA) and then uses the learned LoRA subspace as a semantic anchor for online feature re‑centering. Specifically, LATER projects target‑domain displacement onto the orthogonal complement of the LoRA subspace before re‑centering features, reducing viewpoint‑induced drift while preserving task‑relevant semantics. Together, UAV‑OVO and LATER provide a controlled testbed and a practical adaptation method for viewpoint‑robust UAV video understanding.
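The re-centering step described for LATER can be read as a small piece of linear algebra: remove from the target-domain displacement whatever lies inside the LoRA subspace, then subtract the remainder from the features. The sketch below follows that reading under stated assumptions: the displacement is taken as the difference between target and source feature means, the LoRA subspace is given as a list of spanning vectors, and all function names are hypothetical. It is an interpretation of the abstract, not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(basis: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt orthonormalization; near-dependent vectors are dropped."""
    ortho: List[Vector] = []
    for v in basis:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            ortho.append([wi / norm for wi in w])
    return ortho


def project_out(d: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Project d onto the orthogonal complement of span(basis)."""
    r = list(d)
    for q in orthonormalize(basis):
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def recenter(feature: Sequence[float], source_mean: Sequence[float],
             target_mean: Sequence[float],
             lora_basis: Sequence[Sequence[float]]) -> Vector:
    """Subtract only the part of the domain shift outside the LoRA subspace."""
    displacement = [t - s for t, s in zip(target_mean, source_mean)]
    drift = project_out(displacement, lora_basis)
    return [f - d for f, d in zip(feature, drift)]


if __name__ == "__main__":
    basis = [[1.0, 0.0, 0.0]]  # toy one-dimensional "LoRA subspace"
    print(recenter([5.0, 5.0, 5.0], [0.0, 0.0, 0.0], [2.0, 3.0, 4.0], basis))
```

In the toy example, the shift component along the first axis is left untouched because it lies in the anchored subspace, while the other two components are removed.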
Authors: Xiang Xie, Xiaonan Liu
Abstract: Building height, the third dimension (3D) of urban spatial data, is absent in over 95% of structures in global geospatial databases. For the emerging low‑altitude economy, this data gap forces each aerial platform to rely on real‑time onboard sensing rather than pre‑computed 3D scene geometry. We present the Location Prior Generation Framework (LPGF), a multi‑source data fusion pipeline that integrates Sentinel‑2 imagery, UAV telemetry, vehicle GPS trajectories, and OpenStreetMap footprints into structured, reusable urban location priors. LPGF assigns building heights through a three‑tier priority hierarchy: (1) explicit OSM height tags where available, (2) floor count multiplied by 3.2 m per story where recorded, and (3) building‑type default heights otherwise, yielding a worst‑case error of approximately 5.5 m. An optional shadow‑based height estimation module (SHEM) is activated only when a four‑criterion quality gate is satisfied; when any criterion fails, the pipeline routes to structured fallback. On the MiTra A50 Milan dataset, the quality gate correctly identified two imaging failure modes: sub‑pixel shadows at 10 m GSD and ground shadow merging at 0.93 m GSD, producing a consistent 27‑building prior in both cases. Tier 3 type‑default heights were validated against manual floor counts (n=15), achieving MAE=3.07 m within the 5.0 m uncertainty bound. The framework demonstrates that structured, quality‑gated fusion of universally available data streams can bootstrap 3D scene coverage for low‑altitude urban operations.
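The three-tier height hierarchy in this abstract maps directly onto a short function. The sketch below assumes building attributes arrive as an OpenStreetMap-style tag dictionary using the `height` and `building:levels` keys. The 3.2 m per story factor comes from the abstract, while the type-default table and the final fallback value are placeholders invented for illustration, since the abstract does not list them.

```python
from typing import Dict, Optional, Tuple

METERS_PER_STORY = 3.2  # per-story height stated in the abstract

# Hypothetical defaults for illustration only; not the values used by LPGF.
TYPE_DEFAULT_HEIGHT_M: Dict[str, float] = {
    "residential": 9.0,
    "commercial": 12.0,
    "industrial": 8.0,
}
FALLBACK_HEIGHT_M = 10.0  # hypothetical value for unknown building types


def _parse_number(value: Optional[str]) -> Optional[float]:
    """Parse values such as '21.5' or '21.5 m'; return None if unusable."""
    if value is None:
        return None
    text = value.strip().lower()
    if text.endswith("m"):
        text = text[:-1].strip()
    try:
        number = float(text)
    except ValueError:
        return None
    return number if number > 0 else None


def assign_height(tags: Dict[str, str]) -> Tuple[float, int]:
    """Return (height in meters, tier used) following the priority order."""
    height = _parse_number(tags.get("height"))
    if height is not None:
        return height, 1  # Tier 1: explicit height tag
    levels = _parse_number(tags.get("building:levels"))
    if levels is not None:
        return levels * METERS_PER_STORY, 2  # Tier 2: floor count
    building_type = tags.get("building", "")
    return TYPE_DEFAULT_HEIGHT_M.get(building_type, FALLBACK_HEIGHT_M), 3  # Tier 3


if __name__ == "__main__":
    print(assign_height({"height": "21.5 m"}))
    print(assign_height({"building:levels": "5"}))
    print(assign_height({"building": "industrial"}))
```

Returning the tier alongside the height keeps the provenance of each estimate, which matters when downstream consumers need to attach a tier-dependent uncertainty to the prior.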
Authors: Phuc Duc Nguyen, Ryosuke Isogai, Keitarou Kondou, Satoshi Yasuda, Nobuyasu Shiga, Yozo Shoji
Abstract: In UAV‑to‑UAV communication, airborne UAVs need to detect the location and direction of ultra‑high‑speed millimeter‑wave (mmWave) and Terahertz (THz) coverage areas, referred to as ultra‑spots. This predictive capability allows UAVs to optimally adjust their flight paths, altitude, and velocity, thereby maximizing the utilization of ultra‑spot services. A space‑time synchronization technique employing multiple Wireless Two‑way Interferometry devices (multi‑Wi‑Wi) is proposed in this paper to detect mmWave/THz ultra‑spot locations during UAV operations. This paper proposes an algorithm that estimates the likelihood of nearby ultra‑spots by considering the UAV flight route and ultra‑spot direction, and by sharing location and pose information among UAVs in the network via a 920 MHz wireless communication link. For the first time, this work addresses the problem of optimizing UAV flight routes to maximize ultra‑spot utilization. To address the inherent challenges of Wi‑Wi, such as phase data unreliability, RSSI attenuation, or packet loss caused by obstructions from the UAV's own body, this study proposes the use of multiple Wi‑Wi devices with antennas mounted at different positions around the arms of the UAV to leverage spatial diversity. The proposed method's effectiveness is confirmed through experimental data derived from real‑world UAV‑to‑UAV communication tests. An error of 37.16 cm was observed experimentally in ultra‑spot location estimation, corresponding to a 186 ms error in temporal prediction of ultra‑spot entry from an in‑flight UAV, demonstrating its effectiveness in addressing ultra‑spot detection challenges in mmWave communication.
Authors: Shengjun Zhang, Tingyi Liu, Heng Zhang, Dong Xie
Abstract: This letter studies distributed stochastic optimization over a peer‑to‑peer network when agents can query only zeroth‑order function values. We propose ZOOM‑PB, a coordinate‑sampling distributed zeroth‑order method equipped with a fractional‑power powerball map. Unlike existing distributed zeroth‑order methods that mainly refine gradient estimation or introduce primal‑dual tracking, the proposed mechanism acts as a nonlinear feedback gain on the estimated gradient: it amplifies weak signals in flat regions and attenuates large stochastic estimates without adding transmitted states. Under standard smoothness, oracle‑variance, and network‑connectivity assumptions, ZOOM‑PB achieves the leading nonconvex stationarity rate $\mathcal{O}(\sqrt{p/(nT)})$, where $p$ is the decision dimension, $n$ is the number of agents, and $T$ is the iteration horizon. Under the Polyak‑Łojasiewicz condition, it further attains the leading objective residual rate $\mathcal{O}(p/(nT))$. Thus the method preserves the known distributed ZO order while changing the finite‑time behavior through a local nonlinear control gain. Simulations on black‑box learning and sensor‑driven UAV source seeking show faster empirical convergence in weak‑signal regimes.
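As a rough illustration of the nonlinear gain, the sketch below uses the standard powerball form sign(x)·|x|^γ with 0 < γ < 1, applied elementwise to a coordinate-sampled finite-difference estimate. It is a single-agent toy under stated assumptions: the exact map, the estimator, the step sizes, and the network averaging used by ZOOM‑PB are not given in the abstract, and the function names here are hypothetical.

```python
import math


def powerball(x, gamma):
    """Standard powerball map sign(x) * |x|**gamma (assumed form)."""
    return math.copysign(abs(x) ** gamma, x)


def coordinate_zo_estimate(f, x, i, delta=1e-4):
    """Forward-difference gradient estimate along coordinate i only.

    The factor p = len(x) compensates for sampling one coordinate
    uniformly at random out of p (the caller picks i).
    """
    p = len(x)
    shifted = list(x)
    shifted[i] += delta
    slope = (f(shifted) - f(x)) / delta
    g = [0.0] * p
    g[i] = p * slope
    return g


def powerball_step(f, x, i, step, gamma, delta=1e-4):
    """One zeroth-order descent step with the powerball gain applied."""
    g = coordinate_zo_estimate(f, x, i, delta)
    return [xi - step * powerball(gi, gamma) for xi, gi in zip(x, g)]
```

With γ = 0.5, an estimate of 0.01 is mapped to 0.1 (amplified) while an estimate of 100 is mapped to 10 (attenuated), which matches the weak-signal and large-estimate behavior the abstract describes.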
Authors: Shiqian Guo, Jianqing Liu, Beatriz Lorenzo
Abstract: Federated learning (FL) is an effective paradigm for enhancing the learning capability of edge devices while preserving data privacy. In geographically dispersed FL systems, such as sensor networks in remote areas, unmanned aerial vehicles (UAVs) can flexibly establish high‑quality communication links to support parameter exchange. However, device heterogeneity and the limited battery capacity of UAVs pose significant challenges. Specifically, data heterogeneity slows convergence, while scheduling all devices for global collaboration incurs excessive communication and energy costs. To overcome these challenges, we adopt a strict separation between a globally shared backbone and permanently local personalization heads, thereby mitigating the impact of data heterogeneity. Furthermore, we propose a gradient‑based scheduling strategy that jointly considers energy efficiency and learning performance. In each communication round, the backbone is updated only by the top‑α devices ranked by gradient $\ell_2$‑norm, ensuring that optimization focuses on the most informative updates. Simulation results demonstrate that the proposed scheme achieves higher learning accuracy than state‑of‑the‑art approaches while significantly reducing UAV energy consumption.
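The scheduling rule can be sketched in a few lines. This assumes α is a fraction of the device population (the abstract does not say whether it is a fraction or a count) and ignores the energy term that the paper's strategy also considers; `select_top_alpha` and the device identifiers are hypothetical names.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(g * g for g in vec))


def select_top_alpha(backbone_grads, alpha):
    """Pick the devices whose backbone gradients have the largest l2 norm.

    backbone_grads: dict mapping device id -> flat gradient list.
    alpha: fraction of devices to schedule (assumed interpretation).
    """
    k = max(1, math.ceil(alpha * len(backbone_grads)))
    norms = {dev: l2_norm(g) for dev, g in backbone_grads.items()}
    return sorted(norms, key=norms.get, reverse=True)[:k]
```

Only the selected devices would upload backbone updates in that round; the personalization heads stay on the devices regardless of the selection.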
Authors: Hichem Cheriet, Badra Khellat Kihel, Samira Chouraqui
Abstract: Integrating artificial intelligence (AI) into sampling‑based motion planning provides new possibilities for improving autonomous navigation efficiency. In this paper, three algorithms, namely RRT, Neural RRT, and Neural Informed RRT, are implemented and evaluated on environments containing convex and concave obstacles with different obstacle densities. The obtained results indicate that neural‑guided planners improve path quality, producing up to 14% shorter paths and 55–75% smoother trajectories compared with the conventional RRT algorithm. Among the evaluated methods, Neural Informed RRT achieves the best overall performance in terms of path length and trajectory smoothness. These results demonstrate the effectiveness of AI‑guided sampling strategies for improving reliability and trajectory efficiency in robotic and UAV navigation, despite a slight increase in computation time. Overall, the study highlights the growing importance of artificial intelligence in real‑time robotic path planning applications.
Authors: Prateek Priyaranjan Pradhan, Ketan Rajawat, Mangal Kothari
Abstract: Communication‑aware trajectory generation for unmanned aerial vehicles (UAVs) operating in urban environments requires simultaneous consideration of vehicle dynamics, wireless communication quality, obstacle avoidance, and onboard energy limitations. In such missions, UAVs must navigate through obstacle‑rich environments while ensuring reliable relay of mission‑critical sensory information to ground infrastructure. This results in a highly nonlinear and nonconvex optimal control problem involving coupled communication and flight‑dynamics constraints. This paper presents a communication‑constrained energy‑optimal trajectory generation framework for quadrotor UAVs operating in urban environments. The proposed formulation incorporates full rigid‑body quadrotor dynamics, urban wireless communication models, cumulative data throughput constraints, and obstacle avoidance requirements within a unified free‑final‑time optimal control framework. Unlike conventional approaches based on simplified kinematic or point‑mass models, the proposed framework generates dynamically feasible trajectories suitable for practical aerial platforms. The resulting nonconvex optimal control problem is solved iteratively using sequential convex programming (SCP). Numerical simulations for multiple urban mission scenarios demonstrate the ability of the proposed framework to generate energy‑efficient and communication‑aware trajectories while adapting mission duration according to data relay requirements. The proposed methodology provides a practical framework for autonomous UAV operations requiring reliable communication in dense urban environments.
Authors: Md Sharif Hossen, Vijay K. Shah, Ismail Guvenc
Abstract: Cellular‑connected unmanned aerial vehicles (UAVs) in 5G NR networks experience propagation and interference conditions that vary significantly with altitude and differ substantially from those experienced by terrestrial users. This is primarily caused by the down‑tilted antenna sectors in 5G NR networks, which cause UAVs to be served (and interfered with) by the sidelobes. In this paper, we develop a 3GPP‑compliant system‑level framework for the consistent characterization of key performance indicators (KPIs) such as reference signal received power (RSRP), reference signal received quality (RSRQ), and signal‑to‑interference‑and‑noise ratio (SINR) in a multi‑site tri‑sector deployment with realistic antenna patterns and probabilistic models for line‑of‑sight (LOS) and non‑LOS (NLOS) conditions. Simulation results demonstrate that a critical transition for aerial users is experienced when going from coverage‑limited to interference‑limited conditions at higher altitudes. Although RSRP is affected by large‑scale propagation characteristics and degrades gradually with increasing altitude and inter‑site distance (ISD), SINR degrades much faster due to increased interference caused by LOS conditions. On the contrary, increasing ISD improves SINR and RSRQ due to lower interference, even as received power is reduced.
Authors: Md Sharif Hossen, Vijay K. Shah, Ismail Guvenc
Abstract: Unmanned aerial vehicles (UAVs) are increasingly integrated into cellular networks to support emerging Internet of Things (IoT) applications. In such settings, reliable communication is critical for electronic conspicuity (EC), enabling UAV detection and tracking in shared airspace. However, UAVs operate at elevated altitudes where enhanced line‑of‑sight (LOS) visibility leads to simultaneous exposure to multiple base stations, resulting in strong inter‑cell interference. This article presents a system‑level analysis of how UAV altitude influences the radio environment and affects EC reliability. Using spatial and network‑level metrics, including serving distance, association behavior, and aggregate received power, we show that increasing altitude leads to stronger multi‑cell interaction, reduced dominance of nearby sectors, and interference‑dominated connectivity. These effects result in fragmented association regions and increased variability in link performance. The analysis is supported by measurement data from a helikite‑based spectrum monitoring campaign and corresponding simulation results. Despite differences in experimental conditions, both approaches exhibit consistent altitude‑dependent trends. These findings provide practical insights for designing altitude‑aware and interference‑aware cellular systems to support reliable UAV operation.
Authors: Jiarong Deng, Liu Chang, Quanshun Yang
Abstract: Ring‑like communication graphs appear in UAV formations, cyclic patrols, perimeter monitoring, and other multi‑agent tasks in which agents exchange information mainly with neighboring vehicles along a closed route. When measurement and actuation noise are persistent, a useful augmentation should improve both the convergence rate of consensus and the steady‑state disagreement level. This paper studies the addition of a single weighted chord to a connected weighted cycle. The central observation is that a chord is not just a generic rank‑one edge update: it splits the cycle into two complementary resistance arcs, and this resistance split governs both the algebraic‑connectivity gain and the Kirchhoff‑index reduction. We first derive exact chord‑induced effective‑resistance and Kirchhoff‑index update formulas, giving a closed‑form coherence objective. We then prove that, under bounded conductances and small resistance discrepancy, near‑antipodal resistance‑balanced chords are near‑optimal for algebraic‑connectivity improvement; an i.i.d. bounded‑conductance model yields the same conclusion with high probability. Finally, because the best convergence‑rate chord and the best coherence chord need not coincide, we formulate the design as a finite Pareto problem and introduce RBAPS and AW‑RBAPS, two resistance‑balanced screening rules that retain only linear or near‑linear candidate sets. Numerical experiments show that AW‑RBAPS remains effective beyond the formal moderate‑heterogeneity regime and approximates the exhaustive Pareto front with mean hypervolume ratio 0.9987 while evaluating about 10.1% of admissible chords.
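The effect of a single chord on the Kirchhoff index can be checked numerically with a short sketch. It does not reproduce the paper's closed-form expressions; it uses two standard facts instead: on a weighted cycle the effective resistance between two nodes is the parallel combination of the two arc resistances, and adding one edge of conductance w between u and v is a rank-one Laplacian update, which changes every pairwise resistance by the Sherman-Morrison correction shown in `add_chord`. Function names and the edge-indexing convention are assumptions.

```python
from itertools import combinations


def cycle_resistances(conductances):
    """Effective-resistance matrix of a weighted cycle.

    conductances[k] is the conductance of the edge (k, (k + 1) % n).
    """
    n = len(conductances)
    r = [1.0 / c for c in conductances]
    total = sum(r)
    prefix = [0.0]
    for x in r:
        prefix.append(prefix[-1] + x)
    R = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        arc = prefix[j] - prefix[i]               # resistance of arc i -> j
        R[i][j] = R[j][i] = arc * (total - arc) / total  # two arcs in parallel
    return R


def add_chord(R, u, v, w):
    """Resistance matrix after adding an edge (u, v) with conductance w."""
    n = len(R)
    denom = 4.0 * (1.0 + w * R[u][v])
    new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = R[i][v] + R[j][u] - R[i][u] - R[j][v]
            new[i][j] = R[i][j] - w * d * d / denom
    return new


def kirchhoff_index(R):
    """Sum of effective resistances over all unordered node pairs."""
    return sum(R[i][j] for i, j in combinations(range(len(R)), 2))
```

For the unit-weight 4-cycle, the Kirchhoff index is 5, and adding a unit-conductance chord between the antipodal nodes 0 and 2 lowers it to 4; sweeping `add_chord` over all admissible (u, v) pairs gives the exhaustive coherence baseline that a screening rule would be compared against.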
Authors: Sebastian Ratto Valderrama, Ahmed N. Sayed, Arien Sligar, Jose R. Rosas-Bustos, Omar M. Ramahi, George Shaker
Abstract: Frequency‑modulated continuous‑wave radar sensing often relies on labeled measurements that are costly, restricted, or difficult to collect at scale. This work evaluates physics‑informed digital twins as controlled testbeds for early‑stage quantum‑classical radar learning. Two synthetic radar benchmarks are considered: unmanned aerial vehicle classification from range‑Doppler maps and human fall detection from Doppler‑time spectrograms. For both tasks, inputs are standardized, reduced using principal component analysis, and classified using either a radial basis function support vector classifier or a quantum support vector classifier. All quantum‑kernel results are obtained using noiseless classical simulation; no quantum hardware is used, and no quantum‑advantage claim is made. Across five random seeds, the quantum support vector classifier improves the UAV benchmark from four principal components onward, reaching an accuracy of 0.941 +/‑ 0.012 at eight components, compared with 0.880 +/‑ 0.029 for the classical baseline. On the fall‑detection benchmark, both classifiers perform similarly, with a small quantum‑kernel improvement at higher feature dimensions. A Gaussian‑noise robustness study shows limited performance degradation across the tested noise levels, while preserving the UAV quantum‑kernel gain. These results support digital twins as useful, controlled environments for radar‑QML benchmarking prior to measured‑data validation and hardware execution.
Authors: Anqi Lu, Yun Cheng, Youbing Hu, Zhiqiang Cao, Jie Liu, Zhijun Li
Abstract: The demand for unmanned aerial vehicle (UAV)‑based image acquisition and analysis has surged, with UAVs increasingly utilized for semantic segmentation tasks. To meet the real‑time analysis requirements of UAV remote sensing missions, performing onboard computation and making decisions based on the results is a natural approach. However, deploying semantic segmentation on resource‑constrained UAV platforms presents two significant challenges: 1) hardware constraints limit the ability of UAVs to perform real‑time semantic segmentation, and 2) environmental variations during flight cause data distribution shifts, deviating from the original training data. To address these issues, this paper introduces SkySeg, a heterogeneous multi‑UAV air‑air cooperation framework that integrates computer vision and flight patterns to enable onboard semantic segmentation using low‑cost sensors. SkySeg employs an efficient information fusion inference method, combining low‑definition, wide‑area images with high‑definition, focused‑area images. Additionally, it incorporates a cross‑device test‑time adaptation (TTA) strategy to enhance segmentation performance in dynamic environments by collaboratively addressing distribution shifts of test data streams across UAVs. Experimental results demonstrate that our SkySeg framework accelerates inference by approximately 3.6x, improves onboard segmentation accuracy by 5.91%, and achieves a 10.91% average accuracy gain in the wild.
Authors: Dimosthenis Angelis, Leonard Bauersfeld, Davide Scaramuzza, Evangelos Boukas
Abstract: Autonomous landing of Unmanned Aerial Vehicles on maritime vessels is challenging due to the coupled motion of the vehicle and landing platform in open‑sea conditions. This paper presents a reinforcement‑learning‑based approach for autonomous multirotor landing on moving maritime platforms without requiring explicit platform‑state information. The proposed method uses multirotor state measurements together with local visual features, consisting of keypoints and associated descriptors extracted from the landing surface, to predict attitude and thrust commands. These commands are tracked by a conventional low‑level controller. The policy is trained in simulation using synthetic keypoints with randomly generated normalized descriptors, enabling zero‑shot deployment with different local feature extractors onboard the UAV. We evaluate the method in a realistic simulator and show that it outperforms a state‑of‑the‑art Model Predictive Control baseline under platform motions corresponding to "Very Rough" sea conditions. Finally, we perform extensive real‑world experiments, demonstrating autonomous onboard landing using two different local feature extractors. To the best of our knowledge, this is the first approach for agile multirotor landing on maritime platforms in turbulent waters that does not rely on an explicit platform‑state representation.
Authors: Jacob Swindell, Michael Lowen, Marija Popovic, Riccardo Polvara
Abstract: Agricultural UAV research requires simulators that integrate realistic 3D scenes, high‑fidelity vehicle dynamics, and robotics middleware, while remaining practical to deploy across heterogeneous development machines. We present Droneulator, a portable UAV simulator architecture that combines RotorPy for multirotor dynamics with Godot 4 for rendering and sensor generation. Droneulator exposes both PX4‑based control and a lightweight WebSocket command path, and publishes synchronised visual and state streams through a Zenoh‑based ROS 2‑compatible pipeline. This integration enables a single stack to support inspection‑oriented data capture, ROS 2/PX4 local planning, and reinforcement learning experiments without modifying the simulator infrastructure. We present quantified validation of the current system across three agricultural UAV workflows: tree‑scale image collection for 3D reconstruction with COLMAP, local planning around canopy obstacles using EGO‑Planner, and closed‑loop reinforcement learning through a custom Gymnasium environment. In the reported setup, the results show that the simulator can sustain low‑latency sensing, support reconstruction‑oriented data collection under varying capture density, execute collision‑free local planning around canopy obstacles, and support stable depth‑sensing‑based policy training for obstacle‑aware navigation. Together, these results show the potential of Droneulator for agricultural UAV inspection, planning, and learning within one deployable stack.
Authors: Liuyang Wang, Feitian Zhang
Abstract: Object detection from Unmanned Aerial Vehicles (UAVs) is challenged by severe ego‑motion, camera jitter, and large scale variations. While modern detectors perform well on static images, their direct application to UAV video often fails, particularly for small objects in dynamic scenes. Existing motion‑based methods either rely on computationally expensive optical flow or use single‑interval differencing, which is sensitive to jitter and limited in capturing diverse motion patterns. We propose a vision‑only motion‑guided detection framework that decouples target motion from camera‑induced disturbances. A homography‑based Global Motion Compensation (GMC) first aligns adjacent frames. We then introduce a Dual‑Interval Motion Extraction strategy that captures both short‑term and long‑term motion cues. To integrate these cues, a lightweight Motion‑Guided Attention (MGA) module enhances feature representations within a Feature Pyramid Network. Experiments on the VisDrone‑VID dataset demonstrate consistent improvements over a strong YOLOv8 baseline under severe ego‑motion. Ablation studies further confirm the effectiveness of the dual‑interval design and the proposed motion‑guided attention mechanism.
Authors: X. Wang, Y. Cao, W. L. W. Leong, Y. R. Tan, S. Huang, S. H. R. Teo, C. Xiang
Abstract: Image‑Based Visual Servoing (IBVS) provides an efficient vision‑guided control paradigm for unmanned aerial vehicles (UAVs) by directly regulating image‑space errors. However, conventional IBVS controllers are vulnerable to two critical issues: loss of closed‑loop stability near the target due to input and state constraints, and control failure caused by intermittent loss of moment‑based visual features under aggressive motion. To address these challenges, this paper proposes a terminal‑constraint model predictive control (TC‑MPC) framework for IBVS, integrated with a Kalman filter (KF)‑based state‑prediction mechanism. The TC‑MPC explicitly incorporates terminal‑state constraints and a terminal cost into the IBVS error dynamics, ensuring recursive feasibility, improved convergence behavior, and closed‑loop stability under control and state constraints. In parallel, the Kalman filter predicts the temporal evolution of image moments during short‑term visual degradation, enabling the controller to preserve control continuity when moment measurements are partially unavailable. The proposed approach is validated through real‑time UAV visual servoing experiments.
Authors: Wenfeng Wu, Luping Xiang, Kun Yang
Abstract: The detection of non‑cooperative unmanned aerial vehicles (UAVs) presents significant challenges for Integrated Sensing and Communication (ISAC) systems due to the inherent limitations of single‑modal perception and the competition for shared communication and sensing resources. To address these challenges, this paper proposes a novel Camera‑Cooperative ISAC (CC‑ISAC) framework that employs multimodal sensing to enable efficient UAV beam steering and tracking. The proposed framework employs cameras for coarse‑grained airspace monitoring and utilizes ISAC for fine‑grained, high‑precision sensing, forming a complementary perception loop that enhances both sensing accuracy and resource efficiency. Within this framework, two key modules are developed: (1) a Vision‑to‑Echo Data Alignment (V2EDA) model that aligns visual and echo‑domain features through cross‑attention mechanisms, and (2) a Multimodal Fusion‑Based Estimation (MMFE) model that integrates historical multimodal data with current observations for robust state estimation. Extensive evaluations conducted on the DeepSense 6G dataset demonstrate that the proposed framework achieves an average reduction of 71% in beam steering overhead and 1.69‑11.15% in tracking overhead while maintaining high angular estimation accuracy. The CC‑ISAC framework effectively mitigates resource contention between sensing and communication, enabling reliable UAV surveillance while freeing substantial system resources for additional communication tasks, thereby representing a practical advancement in ISAC system design.
Authors: Javier Becerril, Maximiliano Vargas, Jennifer Herrera, Joanna Gutierrez, Jorge Rios, Mohsen Amjadian, Constantine Tarawneh, Jinghao Yang, Qi Lu
Abstract: This paper presents a non‑contact approach for vibration‑based structural damage detection using an autonomous and customized cost‑effective unmanned aerial vehicle (UAV). Vibration signals are extracted from video recordings through vision‑based motion tracking to identify shifts in natural frequencies indicative of structural degradation. A laboratory‑scale frame structure is evaluated under healthy and simulated‑damage conditions. The proposed system is validated through an experimental study involving two smartphones, a USB camera, and a custom‑built low‑cost UAV equipped with an onboard camera and an autonomous alignment system for operation in GPS‑denied environments. The displacement time history is extracted, analyzed in the frequency domain, and compared to reference measurements from contact accelerometers and a finite element model. Experimental results show that all platforms successfully capture the fundamental frequency and its shift due to damage. Although the UAV exhibits slightly higher errors (up to 5.7%) due to platform‑induced disturbances and sensing limitations, it reliably detects damage‑induced frequency changes. Compared to commercial UAV systems, the proposed platform achieves comparable inspection performance at significantly lower cost. These results demonstrate that low‑cost autonomous UAVs provide a practical, flexible, and scalable solution for structural health monitoring, particularly in scenarios where contact‑based sensing is impractical. The findings also support the potential for the deployment of multiple cooperative UAVs to further enhance inspection coverage and robustness.
Authors: Yihang Luo, Jun Chen, Chao Xiao, Yingqian Wang, Zhaoxu Li, Qiang Ling, Xu He, Nuo Chen, Gaowei Guo, Hongge Li, Miao Li, Longguang Wang, Yulan Guo, Li Liu, Wei An, Zhijie Chen
Abstract: The proliferation of unmanned aerial vehicles (UAVs) has created an urgent demand for precise UAV monitoring. Existing RGB‑based systems rely on spatial cues that degrade at small scales, particularly with high inter‑type similarity, target‑clutter ambiguity, and low contrast. Multispectral imaging (MSI) encodes material‑aware spectral signatures, yet MSI‑based fine‑grained small‑UAV detection remains underexplored due to a lack of dedicated datasets. We introduce UAVNet‑MS, the first multispectral dataset for fine‑grained small‑UAV detection, comprising 15,618 temporally synchronized RGB‑MSI data cubes (1440×1080) with bounding box annotations. The dataset features challenging small objects (93.7% no larger than 32^2 pixels, average 18^2 pixels, ~0.02% image area) under low contrast. We propose MFDNet, a dual‑stream baseline addressing array‑induced parallax and spatial‑spectral fusion. Extensive evaluation under RGB‑only, MSI‑only, and RGB+MSI protocols against 20 detectors shows that MFDNet achieves a +6.2% AP50 improvement over the best RGB‑only methods, demonstrating that spectral cues provide complementary material evidence beyond spatial cues. This work provides a foundational dataset, a strong baseline, and a benchmark for multispectral UAV monitoring research.
Authors: Liming Hou, Yueping Peng, Hexiang Hao, Ji Wang, Xuekai Zhang, Wei Tang, Zecong Ye, Xin Ying, Yubo He
Abstract: Detecting small unmanned aerial vehicles from RGB‑infrared remote‑sensing pairs remains challenging due to tiny target scale, cluttered backgrounds, and spatial misalignment between heterogeneous sensors. Existing bimodal detectors often align or fuse features without assessing the reliability of local cross‑sensor correspondence, allowing mismatch artifacts to propagate into the detection head. To address this issue, we propose LER‑YOLO, a reliability‑aware sparse mixture‑of‑experts framework for misaligned RGB‑infrared UAV detection. LER‑YOLO first introduces an Uncertainty‑Aware Target Alignment module that resamples visible features toward the infrared reference and estimates a spatial reliability map. This reliability prior is then used by a Reliability‑Guided Sparse MoE Fusion module to adaptively select k experts from RGB‑dominant, infrared‑dominant, and interactive fusion experts, enabling trustworthy cross‑modal interaction while suppressing unreliable fusion. Experiments on the public MBU benchmark under a YOLOv5s‑family protocol show that LER‑YOLO achieves 89.7+/‑0.2% AP50 over three independent seeds, with a best result of 89.9%. Extensive ablations, parameter‑matched comparisons, synthetic‑shift evaluations, and complexity analysis demonstrate that the gains mainly come from reliability‑guided expert routing rather than increased model capacity.
Authors: Bingnan Liu, Chenhang Cui, Rui Huang, Jiani Luo, Zhirong Shen, Tinghao Wang, Xiande Huang, Lingbei Meng, Fei Shen, An Zhang
Abstract: We introduce WildRoadBench, a wild aerial road‑damage grounding benchmark that couples direct visual grounding by vision‑language models with autonomous research‑and‑engineering by LLM‑driven agents on a single professionally annotated UAV corpus. The same image set and the same per‑class AP_50 metric are evaluated under two protocols. The VLM Track measures whether a fixed VLM can localise domain‑specific damage from one image and one short prompt under a unified prompting, decoding and parsing pipeline. The Agent Track measures whether an autonomous agent, given only a written task brief, a small exploratory slice and a fixed interaction budget, can search the public web, adapt pretrained components, write training and inference code, and submit predictions through a scalar‑feedback oracle on a hidden holdout. We benchmark a broad pool of closed‑source frontier models and open‑source VLMs together with several frontier LLM‑driven agents. Both routes remain far from reliable performance in this wild setting: closed‑source frontier models lead the VLM leaderboard but still leave more than half of the metric on the table; open‑source grounders plateau well below them, and newer generations or reasoning‑style variants do not consistently improve grounding; small targets collapse for every open‑source model; agents lag the strongest VLM despite richer affordances, and several fail to land a valid submission within the budget. We release the code and data at https://anonymous.4open.science/r/wildroadbench‑0607 to support reproducible follow‑up research.
Authors: Dongli Wu, Zhuoxiao Li, Tongyan Hua, Yinrui Ren, Xiaobao Wei, Rongjun Qin, Wufan Zhao
Abstract: Reconstructing large‑scale urban scenes from sparse aerial views is a crucial yet challenging task. Due to biased top‑down and shallow‑oblique camera poses, sparse aerial captures exhibit strong evidence imbalance: roofs and open regions are repeatedly observed, while facades, distant buildings, and occluded structures receive little multi‑view support. Existing feed‑forward 3D Gaussian Splatting methods directly regress a deterministic representation from sparse inputs, but this often leads to ghosting, melted facades, and stretched textures. Recent pseudo‑view and video‑based generative reconstruction methods use additional supervision or generative priors. However, they often lack a clear separation between observed geometry and prior‑driven content, which can lead to plausible but inconsistent structures. We propose AnyCity, an observation‑grounded generative reconstruction framework for sparse aerial urban scenes. AnyCity first predicts an observation‑supported geometry latent to anchor reliable structures, and then uses scaffold‑conditioned aerial completion tokens to predict a gated residual update for weakly constrained content before Gaussian decoding. During training, dense‑to‑sparse distillation transfers structural cues from dense‑view reconstruction, while an aerial‑adapted video diffusion prior provides fine‑grained urban appearance cues through gated token conditioning. Observation‑preserving objectives keep the refined representation consistent with input‑supported geometry. At inference time, AnyCity reconstructs the final 3D Gaussian scene from sparse aerial views in a single feed‑forward pass, achieving coherent urban novel‑view synthesis with second‑level inference. Experiments on synthetic, aerial‑domain, UAV‑textured, and real‑world scenes show consistent improvements over feed‑forward baselines.
Authors: Dexing Yao, Haochen Li, Junhao Wei, Yifu Zhao, Yanxiao Li, Jiahui Xu, Jinxuan Hu, Lele Tian, Baili Lu, Zikun Li, Xu Yang, Sio-Kei Im, Dingcheng Yang, Yapeng Wang
Abstract: Autonomous UAV flight in confined, wall‑dense environments requires low‑latency and reliable motion planning under strict safety constraints. Traditional optimization‑based planners suffer from mapping latency and easily fall into local minima when navigating through dense structural obstacles. Meanwhile, existing end‑to‑end learning methods struggle to extract fine‑grained geometric features from raw depth images and lack hard kinodynamic constraints, leading to unpredictable collisions near walls. To address these issues, we propose KIO‑planner, an attention‑guided single‑stage trajectory planning framework. First, we integrate a Convolutional Block Attention Module (CBAM) into the perception backbone to adaptively focus on critical structural edges and traversable space. Second, we introduce a novel Dual Mapping mechanism, comprising physical bounds activation and a deterministic Geometric Safety Shield in the depth‑pixel space, to enforce kinodynamic feasibility and collision‑free flight without global map fusion. Extensive high‑fidelity simulated experiments demonstrate that KIO‑planner enables highly agile navigation at speeds up to 3.0 m/s. Compared to the state‑of‑the‑art baseline, KIO‑planner achieves lower inference latency (approximately 24 ms) and generates significantly smoother trajectories, reducing control cost by 28.4%. Most notably, our Dual Mapping substantially increases the worst‑case safety margin, measured by minimum distance to obstacles, from 0.48 m to 0.76 m, ensuring fast, smooth, and safer navigation in highly constrained environments.
Authors: Jinhan Li, Xijie Huang, Zhaoqi Wang, Yijin Wang, Weiqi Ge, Qiyi He, Mo Zhu, Fei Gao, Yuze Wu, Xin Zhou
Abstract: In the field of Vision‑Language Navigation (VLN), aerial datasets remain limited in their ability to combine scale, diversity, and realism, often relying on either costly real‑world scenes or visually limited simulations. To address these challenges, we introduce FlyMirage, a highly scalable and fully automated data generation pipeline for aerial VLN. Our approach leverages a large language model (LLM) as an environment designer to promote scene diversity, paired with a generative world model that instantiates these designs into high‑fidelity 3D Gaussian Splatting (3DGS) scenes. To substantially reduce human labor and ensure the feasibility of flight data, FlyMirage automates scene exploration and semantic information acquisition, and further integrates a dynamically feasible planner for uncrewed aerial vehicle (UAV) trajectory generation. Using this toolchain, we generate a large‑scale, diverse, and photorealistic aerial VLN dataset with dynamically feasible flight trajectories, designed to support the development of next‑generation embodied navigation models.
Authors: João Pedro Matos-Carvalho, Laio Oriel Seman, Stefano Frizzo Stefenon, Mohammad Khalaf Mohammad Khreasat, Gabriel Villarrubia González
Abstract: The inspection of electrical power line insulators is essential for ensuring grid reliability and preventing failures caused by damaged or degraded insulation components. In recent years, Unmanned Aerial Vehicles (UAVs) combined with deep learning‑based vision systems have emerged as an effective solution for automating this process. However, insulator fault detection remains challenging due to small defect regions, heterogeneous fault patterns, complex backgrounds, and varying imaging conditions. To address these challenges, this paper proposes an optimized YOLO26‑MoE, a novel object detection architecture that integrates a sparse Mixture‑of‑Experts (MoE) module into the high‑resolution branch of the YOLO26 detector. The proposed modification enables adaptive feature refinement for subtle and diverse fault patterns while preserving the efficiency of a one‑stage detection framework. Hyperparameter optimization, final training, and evaluation were coordinated through a tool‑augmented Large Language Model (LLM) agent. The proposed model achieved 0.9900 mAP@0.5 and 0.9515 mAP@0.5:0.95, outperforming the latest YOLO versions. These results demonstrate that the proposed model provides an effective and reliable solution for UAV‑based insulator fault detection.
Authors: Jingshan Chen, Bochen Yu, Henrik Ebel, Peter Eberhard
Abstract: This paper presents a learning‑augmented trajectory planning framework for cooperative unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) handover missions. While centralized trajectory optimization ensures dynamic feasibility and task optimality, its high computational cost limits real‑time applicability. We propose a neural surrogate planner utilizing decoupled encoder‑decoder long short‑term memory (LSTM) networks to generate coordinated handover trajectory predictions from the task specifications. These predictions serve as informed warm starts for the downstream centralized optimizer, thereby accelerating convergence to dynamically feasible solutions. Benchmark evaluations demonstrate that, compared to cold‑start optimization, the learning‑augmented planning framework achieves more than a threefold speedup and a 100% optimization success rate. The results indicate that combining data‑driven inference with model‑based refinement enables fast and reliable trajectory generation for heterogeneous multi‑robot systems.
Authors: Wenhao Zhuang, Yuyi Mao, Ivan Wang-Hei Ho, Xianghao Yu
Abstract: The low‑altitude economy (LAE) is reshaping the industrial landscape by deploying unmanned aerial vehicles (UAVs) to facilitate a wide range of applications demanding flexible aerial mobility. Integrating edge artificial intelligence (AI) into LAE platforms creates a compelling paradigm where UAVs provide real‑time AI‑driven analysis while simultaneously executing their primary aerial mission duties. However, realizing this paradigm remains challenging due to the strict mission constraints imposed by these primary duties and the throughput bottlenecks of wireless links. To bridge this gap, we propose a UAV‑assisted cooperative edge inference framework where UAVs execute mission‑critical LAE duties, quantified by trajectory deviations from reference paths, while concurrently supporting ground devices via intermediate feature offloading. Within this framework, UAV trajectories, inference task offloading decisions, and feature compression ratios are jointly optimized to maximize the system performance. We cast this joint optimization task into a constrained partially observable Markov decision process (POMDP) framework. To efficiently solve it, we propose HDRL‑MoE, a novel hierarchical deep reinforcement learning framework that decouples the optimization of slow‑varying inference decisions from rapidly changing UAV trajectory control. Furthermore, HDRL‑MoE integrates a mixture‑of‑experts (MoE) architecture, where a router network orchestrates discrete offloading decisions while expert networks independently optimize the feature compression ratios. Extensive simulations show that HDRL‑MoE achieves significant inference accuracy gains over baselines and exhibits high scalability and efficiency through its MoE design.
Authors: Carlos A. Durán Paredes, Javier E. León Calderón, Nicolás Sánchez Perea, German Darío Díaz, Camilo Segura Quintero
Abstract: Unmanned aerial vehicles (UAVs) are cyber‑physical systems whose attack surface spans networked avionics and on‑board sensor fusion: a compromised GPS or battery module can mimic a benign mission segment and evade naive anomaly detectors. We present a leakage‑free evaluation of quantum machine learning for UAV anomaly detection on the multi‑sensor TLM:UAV benchmark. Three contributions support the study. (i) A group‑aware temporal protocol (B2) partitions the dataset into ten contiguous TimeUS blocks and evaluates over ten seeds, eliminating the inflation produced by random stratified splits that mix neighbouring samples. (ii) A three‑mode feature audit (full/loose/strict) quantifies how much accuracy stems from instantaneous physical signals versus contextual proxies (cumulative energy, battery state, GPS trajectory). (iii) A hybrid XGBoost + Data Reuploading (DRU) classifier is benchmarked against five paired non‑linear controls (raw, PCA, polynomial‑2, random‑RBF, and an untrained DRU map) under identical budgets. The standalone DRU does not consistently match the strongest classical baseline across seeds; however, the trained‑DRU hybrid is the only model whose mean F1 macro shifts upward from full to strict (+0.05), a directional signal that the per‑seed standard deviations prevent from being interpreted as a statistically established difference. The trained‑DRU hybrid also records the lowest mean false‑alarm rate under proxy‑free evaluation, subject to the inter‑seed variance reported. We frame this as an incremental, reproducible quantum‑enhanced hybrid benefit, and provide an open Qiskit 2.x implementation as a benchmark for cybersecurity analytics in NISQ‑era aerospace systems.
Authors: Hanxuan Chen, Xiangyue Wang, Songsheng Cheng, Ruilong Ren, Jie Zheng, Shuai Yuan, Tianle Zeng, Hanzhong Guo, Binbo Li, Kangli Wang, Ji Pei
Abstract: We present CosFly, a box‑structured planning and multimodal simulation pipeline for aerial tracking, together with CosFly‑Track, a large‑scale UAV dataset for dynamic target tracking across diverse environments including urban centers, highways, rural landscapes, forests, and coastal towns. In our current implementation on CARLA, CosFly provides a modular 7‑step construction pipeline that converts complex 3D worlds into structured obstacle representations for planning, then projects the resulting trajectories back into multi‑modal sensor data ‑‑ including RGB images, high‑precision depth maps, and semantic segmentation masks ‑‑ paired with natural language navigation instructions. A key feature is the support for configurable fixed‑FOV zoom levels (one FOV setting drawn per trajectory and held constant throughout), enabling simulation of various focal lengths through camera‑intrinsic adjustments. The pipeline covers the complete workflow from 3D map export through grid simplification, pedestrian and drone trajectory planning, multi‑modal rendering with 6‑DOF pose annotations, quality inspection, and teacher‑student caption generation. We analyze two trajectory‑planning paradigms for aerial target tracking: a conventional two‑stage pipeline with front‑end candidate generation and backend refinement, and a direct gradient‑based formulation that optimizes multiple tracking constraints in a single objective. The public CosFly‑Track release contains 250 validated trajectories and approximately 100,000 rendered images with complete 6‑DOF drone pose annotations (position x, y, z and orientation yaw, pitch, roll). Together, the pipeline and dataset establish a scalable foundation for aerial‑ground collaborative research, supporting dynamic target tracking, UAV navigation, and multi‑modal perception across diverse environments.
Authors: Kenan Majewski, Marcin Żugaj
Abstract: Unmanned Aerial Vehicles in dynamic environments face telemetry outages, structural vibrations, and regime‑dependent noise that invalidate the stationary covariance assumptions of classical Kalman filters. The Sage‑Husa Kalman Filter (SHKF) estimates noise statistics online, but its reliance on a static, scalar forgetting factor forces a strict compromise between steady‑state stability and transient responsiveness. We introduce the N‑Deep Recurrent Sage‑Husa Filter (NDR‑SHKF), which replaces this scalar parameter with a vector‑valued memory attenuation policy learned by a hierarchical recurrent network operating on whitened innovation sequences. A bifurcated architecture routes shallow recurrent states to capture instantaneous sensor anomalies and deep states to encode sustained dynamic trends, while an auxiliary reconstruction objective prevents feature collapse. The complete filter, including recursive covariance updates, is trained end‑to‑end via backpropagation through time to directly minimize state estimation error. Evaluations on topologically distinct chaotic attractors demonstrate cross‑domain generalization, outperforming purely data‑driven baselines that diverge under out‑of‑distribution dynamics. Furthermore, evaluations on recorded real‑world UAV flight datasets validate the framework's practical viability, demonstrating its capacity to bridge transitions into proprioceptive dead reckoning and outperform classical adaptive estimators during sensor outages.
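As background for the "static, scalar forgetting factor" mentioned above, the following is a minimal sketch of the conventional Sage‑Husa fading weight and a scalar measurement‑noise update. It illustrates the classical baseline only, not the NDR‑SHKF method; the function names, the default b = 0.95, and the positivity clamp are assumptions made for illustration.

```python
def sage_husa_weight(k: int, b: float) -> float:
    """Classical fading weight d_k = (1 - b) / (1 - b**(k + 1)), with 0 < b < 1, k >= 0."""
    return (1.0 - b) / (1.0 - b ** (k + 1))


def update_noise_estimate(r_prev: float, innovation: float, hph: float,
                          k: int, b: float = 0.95) -> float:
    """Scalar measurement-noise update.

    R_k = (1 - d_k) * R_{k-1} + d_k * (e_k**2 - H P H^T),
    where `innovation` is e_k and `hph` is the predicted H P H^T term.
    """
    d = sage_husa_weight(k, b)
    r_new = (1.0 - d) * r_prev + d * (innovation ** 2 - hph)
    # Assumption: clamp to a small positive value so the estimate stays a
    # valid variance; this safeguard is not part of the textbook recursion.
    return max(r_new, 1e-12)
```

With a fixed b, the weight decays from 1 at k = 0 toward the constant 1 - b, so a single scalar governs how quickly old noise statistics are forgotten. That is the compromise between stability and responsiveness the abstract refers to.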
Authors: Z. Jiang
Abstract: This paper proposes a Shared Backbone Proximal Policy Optimization (Shared Backbone PPO) algorithm. By sharing the base module between the Actor and Critic networks, the algorithm achieves efficient training and improved performance. The algorithm is implemented in a connectivity‑preserving multi‑UAV swarm communication coverage task and compared with the standard PPO algorithm. Experimental results demonstrate that the proposed method achieves superior performance. Furthermore, a graph information aggregation module is incorporated into the model architecture to accommodate the communication conditions among agents. With the integration of this module, the algorithm remains effective, and the trained agent swarm exhibits a higher level of cooperation.
Authors: Yiqin Deng, Zihan Fang, Yijie Wang, Qingxiao Huang, Junhui Gao, Qianyao Ren, Yuguang Fang
Abstract: This letter investigates computing‑accessibility‑aware cooperative 3D deployment of multiple UAVs for task completion enhancement, termed CA3D. We first provide a theoretical analysis showing that computing accessibility is the key mechanism linking UAV deployment to delay‑constrained task completion, and that UAV inter‑spacing creates a fundamental tradeoff between computing‑resource accessibility and task completion. We then develop a cooperative 3D deployment design that jointly balances accessible computing capacity, task completion probability, and redundant UAV overlap. Simulation results under heterogeneous computing node capacities show that CA3D consistently outperforms Random, Fixed, and Greedy deployment baselines under both hotspot and random ground user (GU) distributions. Under the hotspot GU distribution, CA3D achieves nearly full task completion, improving the task completion probability by about 3.3x over Random deployment when the number of UAVs is 8. Under a more challenging random GU distribution, CA3D still achieves about 35% higher task completion probability than the best baseline when the number of UAVs is 12. These results demonstrate that computing‑accessibility‑aware cooperative 3D deployment improves not only task completion but also robustness to GU distribution changes.
Authors: Qingyun Luo, Jingqing Wang, Wenchi Cheng
Abstract: Wireless resource allocation in digital‑twin‑enabled unmanned aerial vehicle (UAV) swarms must be both network‑feasible and certifiably safe for closed‑loop control. Existing packet‑level or scalar‑priority schedulers cannot meaningfully compare heterogeneous multi‑hop actions that differ simultaneously in route, retransmission depth, blocklength, bidirectional delay, delivery probability, and TDMA slot cost. This paper introduces a certificate‑guided resource allocation framework for low‑altitude multi‑hop UAV swarms. A digital twin maps predicted topology, channel, route, and controller‑side state into a shared five‑dimensional quality‑of‑service (QoS) certificate comprising uplink/downlink delay bounds, directional delivery guarantees, and a certified upper bound on the interval between successful bidirectional interactions. A state‑conditioned stochastic drift test then admits only certificates whose augmented Lyapunov drift is nonpositive under the current controller state. Admitted actions are reduced to certified supply frontiers by removing dominated route‑slot configurations, and the online scheduler maximizes Lyapunov‑drift reduction under a shared TDMA slot budget via exact dynamic programming. Closed‑loop ns‑3 simulations demonstrate that the proposed framework outperforms fixed‑service, certificate‑filtered fixed‑priority, dynamic‑transmission‑count, and value‑of‑information baselines in both tracking accuracy and high‑risk state suppression under identical communication budgets.
Authors: Xiangyue Wang, Hanxuan Chen, Songsheng Cheng, Ruilong Ren, Jie Zheng, Shuai Yuan, Tianle Zeng, Hanzhong Guo, Kangli Wang, Ji Pei
Abstract: Recent aerial vision‑language navigation (VLN) datasets have grown rapidly, but they primarily address goal‑oriented navigation to static destinations, leaving UAV visual tracking ‑‑ continuously following a moving target while maintaining visibility ‑‑ largely without dedicated training data. We introduce CosFlyTrack, a large‑scale multi‑modal dataset and scalable generation pipeline for UAV visual tracking in urban environments. The dataset provides approximately 12,000 expert and perturbed UAV trajectories generated from 6,000 pedestrian paths, comprising 2.4 million timesteps (approximately 334 hours) with seven aligned data channels: RGB, metric depth, semantic segmentation, six‑degree‑of‑freedom drone pose, target state with visibility flag, bilingual (Chinese‑English) instructions, and trajectory‑pair metadata. To generate high‑quality expert trajectories, we develop MuCO, a multi‑constraint optimizer that plans directly in continuous three‑dimensional space with BVH‑accelerated collision and visibility queries, jointly enforcing target visibility, viewpoint quality, collision avoidance, smoothness, and kinematic feasibility, avoiding the discretization artifacts and post‑hoc smoothing of grid‑based planners. Fine‑tuning experiments on seven vision‑language models show that CosFlyTrack improves tracking performance to 78.3 to 95.6 percent SR@1 meter, a 53 to 69 percentage point gain over zero‑shot baselines, supporting the dataset as a training resource for dynamic target‑following agents. The dataset is publicly available at https://huggingface.co/datasets/AutelRobotics/CosFly; evaluation scripts and pre‑trained checkpoints are hosted at https://huggingface.co/AutelRobotics/CosFly‑Track.
Authors: Immanuel R. Santjoko, Richie R. Suganda, Miao Pan, Bin Hu
Abstract: This letter proposes a distributed 3D leader‑follower formation (3D‑LFF) control framework for multi‑UAV systems that achieves formation tracking while enforcing perception safety constraints. Maintaining safe, vision‑based 3D‑LFF is challenging because onboard cameras impose strict Field‑of‑View (FOV) limitations, and demanding formation commands can drive the leader outside the follower's camera frustum, resulting in loss of visibility. To address this issue, we develop a perception‑aware safe control architecture that guarantees visibility by construction. First, we derive a relative kinematic model in a line‑of‑sight coordinate representation and design a distributed 3D‑LFF tracking controller using only locally available relative states. Next, we embed the nominal formation controller within a Control Barrier Function‑based Quadratic Program (CBF‑QP) safety filter that minimally modifies the commanded velocities to maintain the leader inside the follower's camera frustum while preserving formation tracking whenever feasible. Gazebo simulations and Crazyflie hardware experiments validate the proposed approach, demonstrating accurate formation tracking and effective FOV enforcement, including scenarios in which the nominal desired formation conflicts with visibility constraints.
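To illustrate what "minimally modifies the commanded velocities" means in a CBF‑QP safety filter, here is a minimal sketch for the special case of a single affine constraint a·u >= b, where the QP reduces to a closed‑form projection onto a half‑space. The constraint values, the function name, and the single‑constraint simplification are assumptions for illustration; the paper's frustum constraints are not given in the abstract and may require a general QP solver.

```python
from typing import List, Sequence


def cbf_qp_filter(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Solve min ||u - u_nom||^2 subject to a . u >= b (one affine constraint).

    Returns u_nom unchanged when it already satisfies the constraint;
    otherwise returns its Euclidean projection onto the constraint boundary.
    """
    dot = sum(ai * ui for ai, ui in zip(a, u_nom))
    norm_sq = sum(ai * ai for ai in a)
    # If a is the zero vector, no choice of u can change the constraint value;
    # returning the nominal command in that case is an assumption of this sketch.
    if dot >= b or norm_sq == 0.0:
        return list(u_nom)
    scale = (b - dot) / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]
```

For example, `cbf_qp_filter([1.0, 0.0], [0.0, 1.0], 0.5)` shifts only the second velocity component up to the boundary and leaves the first untouched, while a nominal command that already satisfies the constraint passes through unmodified.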
Authors: Evgenii Vinogradov
Abstract: We introduce UPSim (UxNB Propagation Simulator), a ray tracing‑calibrated, semi‑deterministic solution for spatially consistent FR3 air‑to‑ground propagation modeling in uncrewed aerial vehicle (UAV) networks. Instead of launching rays for every receiver position, UPSim derives deterministic visibility regions from 3D building geometry via shadow projection. It then augments these regions with line‑of‑sight (LOS) state‑specific and altitude‑aware path loss, correlated large‑scale fading, and small‑scale fading. Calibration and validation against FR3 ray tracing data using the global 3D‑GloBFP building dataset demonstrate that UPSim accurately reproduces empirical channel distributions. Furthermore, the resulting maps support route‑based analysis of channel evolution over complex urban layouts, exposing critical trajectory‑level statistics such as outage distances. Consequently, UPSim offers a highly scalable, practical middle ground between computationally expensive full ray tracing and purely stochastic channel generation for mobility‑aware planning and radio‑map construction in aerial access scenarios.
Authors: Lingyi Wang, Tingyu Shui, Walid Saad, Pascal Adjakple
Abstract: Semantic communication has emerged as a promising paradigm for enabling goal‑oriented networking. However, most existing semantic communication solutions are tailored to one‑shot tasks and optimize instantaneous performance. Hence, they cannot be used to support closed‑loop dynamic systems with physical artificial intelligence (AI), in which the transmitted semantics affect not only the current inference outcome but also future control actions, state evolution, and ultimately long‑horizon task performance. To address this gap, this paper investigates goal‑oriented semantic communications for physical AI systems with closed‑loop sensing‑communication‑inference‑control. In particular, the problem of semantic communications is formulated as a long‑term return‑per‑bit maximization under wireless bit‑budget constraints while capturing both control efficiency and communication efficiency. To solve this problem, a novel causal information value (CIV) metric is introduced to evaluate the marginal contribution of each semantic token to the expected long‑term return by transmission interventions. Then, a world‑model‑enabled causal digital twin (WM‑CDT) framework is proposed to capture the dynamics of closed‑loop physical AI systems and enable counterfactual reasoning for long‑horizon imagined rollouts. Based on these imagined rollouts, an actor‑critic policy is trained for long‑horizon agent control with high data efficiency, while the semantic token selector is trained through CIV‑per‑bit evaluation. Extensive simulations on an AirSim‑Sionna‑based unmanned aerial vehicle (UAV) navigation simulator show that the proposed WM‑CDT framework achieves significant improvement in return‑per‑kbit and navigation success rate compared to existing reinforcement learning solutions.
Authors: Mitsutaka Nakada, Takahiko Ikebata, Kengo Ikebata, Yuji Mizuno, Yusuke Onoda, Ryuichi Takeshige, Kyaw Kyaw Htoo, Kanehiro Kitayama, Robert Ong, Masanori Onishi
Abstract: We present a highly detailed instance segmentation model for delineating individual tree crowns in natural broadleaf forests using aerial imagery acquired by unmanned aerial vehicles (UAVs). Tree crown delineation in broadleaf forests is more challenging than in other forest types due to the diversity of crown shapes and the lack of clearly defined treetops. To address this issue, we developed a deep‑learning‑based crown segmentation model trained on high‑quality annotated crown outlines. Skilled annotators manually delineated 18,507 crown polygons from orthomosaic images collected across seven forests in Japan, and we developed a model based on Mask2Former with multiple backbone architectures. The best model achieved high segmentation performance in structurally complex broadleaf forests using only RGB imagery. This performance was maintained when the model was applied to geographically distinct forests within Japan, as well as to biologically distinct tropical rainforests in Borneo. These results demonstrate that a large volume of high‑quality annotated data is critical for achieving detailed and generalizable crown segmentation across diverse forest ecosystems. The developed model has been integrated into DF Scanner Pro, software that supports practical forest monitoring using UAVs, and this implementation is expected to enable a wide range of users to analyze tree‑level information in broadleaf forests from UAV imagery.
Authors: Luca Morando, Nishanth Bobbili, Giuseppe Loianno
Abstract: Gliding offers small fixed‑wing UAVs extended endurance and silent operation but requires accurate energy management, especially under wind disturbances and obstacle constraints. Traditional Total Energy Control System‑based controllers regulate the trade between potential and kinetic energy reactively, often requiring fine‑tuning and knowledge of trim conditions. In this work, we shift the regulation to the planning level and present a nonlinear, multi‑cost trajectory planner for small UAV gliders. The method generates $\mathcal{C}^3$ continuous trajectories based on Bernstein polynomials, mapped into control commands through differential flatness, and re‑planned online to match experimentally derived sink polar curves. A simulated netto variometer is integrated into the optimization to estimate air mass motion, constraining the glide to energy‑balanced states. Consecutive gliding trajectories are linked by cruising segments computed through trajectories initialized on Dubins path‑based waypoints, enabling hybrid missions that combine powered and unpowered flight. The approach is validated in CFD simulations and real‑world experiments with a fixed‑wing platform, showing reliable stabilization of sink rate, airspeed, and glide ratio under wind gusts and in the presence of obstacles.
Authors: Anindya Sarkar, Srikumar Sastry, Aleksis Pirinen, Nathan Jacobs, Yevgeniy Vorobeychik
Abstract: Visual active search (VAS) has been introduced as a modeling framework that leverages visual cues to direct aerial (e.g., UAV‑based) exploration and pinpoint areas of interest within extensive geospatial regions. Potential applications of VAS include detecting hotspots for rare wildlife poaching, aiding search‑and‑rescue missions, and uncovering illegal trafficking of weapons, among other uses. Previous VAS approaches assume that the entire search space is known upfront, which is often unrealistic due to constraints such as a restricted field of view and high acquisition costs, and they typically learn policies tailored to specific target objects, which limits their ability to search for multiple target categories simultaneously. In this work, we propose DiffVAS, a target‑conditioned policy that searches for diverse objects simultaneously according to task requirements in partially observable environments, which advances the deployment of visual active search policies in real‑world applications. DiffVAS leverages a diffusion model to reconstruct the entire geospatial area from sequentially observed partial glimpses, which enables a target‑conditioned reinforcement learning‑based planning module to effectively reason and guide subsequent search steps. Extensive experiments demonstrate that DiffVAS excels in searching diverse objects in partially observable environments, significantly surpassing state‑of‑the‑art methods on several datasets.
Authors: Kangning Cui, Surendra Bohara, Suraj Prasai, Zishan Shao, Wei Tang, Martin Pillaca, Edwin Flores, Zhen Yang, Gregory Larsen, Evan Dethier, David Lutz, Jean-Michel Morel, Miles Silman, Victor Pauca, Fan Yang
Abstract: Illegal gold mining in the Amazon rainforest causes deforestation, water contamination, and long‑term ecosystem disruption, yet remains difficult to monitor at fine spatial scales. Satellite imagery supports large‑scale observation, but often misses small mining‑related structures and subtle land‑cover transitions, especially under frequent cloud cover. We introduce ELDOR, a large‑scale UAV benchmark for monitoring environmental and landscape disturbance from illegal gold mining in the rainforest. ELDOR contains manually annotated orthomosaic imagery covering over 2,500 hectares, with pixel‑level semantic labels for both mining‑related activities and surrounding ecological structures. With this unified annotation source, we establish four benchmark tasks: semantic segmentation, segmentation‑derived recognition, direct multi‑label classification, and class‑presence recognition with vision‑language models. Across these tasks, we compare generic and remote‑sensing‑specific segmentation models, vision foundation model‑related segmentation methods, direct multi‑label classification methods, and vision‑language models under a controlled closed‑set protocol. Results show that current methods still struggle with rare small‑scale mining structures and fine‑grained recovery classes, suggesting the need for context‑aware and multimodal modeling. To support domain analysis and practical use, we further build an interactive explorer for domain experts that provides a unified interface for data exploration and model inference.
Authors: Nitik Jain, Mangal Kothari
Abstract: Reliable detection of humans beneath forest canopy remains a difficult remote‑sensing challenge due to sparse, structured, and viewpoint‑dependent occlusion. This paper presents a multimodal proof‑of‑concept pipeline that integrates three complementary approaches: (i) experimental evaluation of LiDAR returns through vegetation to assess the feasibility of active sensing, (ii) visible‑‑thermal image fusion using a multi‑scale transform and sparse‑representation framework to enhance human saliency, and (iii) synthetic‑aperture image formation via Airborne Optical Sectioning (AOS) to suppress canopy clutter. A YOLOv5 detector is fine‑tuned on the Teledyne FLIR thermal dataset and evaluated on thermal and fused imagery. Results show that the tested terrestrial LiDAR configuration provides limited penetration for object‑level detection, while visible‑‑thermal fusion improves target visibility in low‑contrast scenes and AOS enhances ground‑plane detection in synthetic forest imagery. The fine‑tuned YOLOv5 achieves a mean average precision of ~0.83 on the top three FLIR classes. These findings establish an initial baseline for UAV‑deployable search‑and‑rescue and surveillance systems operating in forested environments, and motivate future work on dedicated forest datasets and real‑time multimodal integration.
Authors: Hong Hong, Feiyu Liao, Yongheng Liang, Boning Zhang, Haitao Wang, Hejun Wu
Abstract: In obstacle avoidance navigation of unmanned aerial vehicles (UAVs), variations in obstacle scale have received surprisingly less attention than obstacle number or density. Existing methods typically extract purely geometric features from single‑frame depth observations. Such representations tend to neglect small obstacles and lose spatial context under occlusions caused by large obstacles, leading to noticeable degradation in environments with multi‑scale obstacles. To address this issue, we propose CaMeRL, a Collision‑aware and Memory‑enhanced Reinforcement Learning framework for UAV navigation. The collision‑aware latent representation encodes risk‑sensitive depth cues to preserve fine‑grained obstacle structures, thereby improving sensitivity to small obstacles. The temporal memory module integrates observations across frames, mitigating partial observability caused by large‑obstacle occlusions. We evaluate CaMeRL with multi‑scale obstacles, including ultra‑small and extra‑large obstacle settings. Results show that CaMeRL outperforms state‑of‑the‑art baselines across all scales, with success rate gains of 0.48 and 0.28 in the ultra‑small and extra‑large settings, respectively. More importantly, CaMeRL achieves reliable navigation in cluttered outdoor environments.
Authors: Junhao Wei, Yanxiao Li, Yifu Zhao, Qibin He, Haochen Li, Dexing Yao, Baili Lu, Zhenhong Peng, Yapeng Wang, Sio-Kei Im, Xu Yang
Abstract: UAV multi‑site inspection often reduces to choosing a high‑quality visiting order after target sites have been extracted from a map. This paper develops LA‑BHH, a landscape‑aware bandit hyper‑heuristic that learns an operator‑selection policy online for this routing layer. LA‑BHH treats 2‑opt, swap, relocate, and Or‑opt moves as low‑level arms, builds context from static landscape descriptors and online search‑state features, and updates a LinUCB controller from improvement rewards during the same run. Experimental results on 45 generated Euclidean TSP instances show that LA‑BHH achieves the best mean final gap and convergence AUC, with 0.0223 and 0.0389 respectively. It reduces final gap by 17.6% over UCB‑HH, 22.6% over Random‑HH, and 68.2% over nearest‑neighbor construction. Ablation results further show that contextual credit assignment, 2‑opt repair, and stagnation‑aware state use are the main contributors.
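The operator‑selection idea above can be sketched with a generic disjoint LinUCB controller, one linear model per low‑level operator. This is a minimal illustration under stated assumptions, not the LA‑BHH implementation: the context features, the reward definition, the exploration weight `alpha`, and all class and function names are hypothetical, since the abstract does not specify them.

```python
import math
from typing import Dict, List, Sequence


class LinUCBArm:
    """One LinUCB arm with ridge matrix A = I + sum(x x^T), stored as A^{-1}."""

    def __init__(self, dim: int, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v: Sequence[float]) -> List[float]:
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def score(self, x: Sequence[float]) -> float:
        """Upper confidence bound: theta . x + alpha * sqrt(x^T A^{-1} x)."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + self.alpha * width

    def update(self, x: Sequence[float], reward: float) -> None:
        """Rank-one Sherman-Morrison update of A^{-1}, then b += reward * x."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_operator(arms: Dict[str, LinUCBArm], x: Sequence[float]) -> str:
    """Pick the operator whose upper confidence bound is largest for context x."""
    return max(arms, key=lambda name: arms[name].score(x))


# The four low-level moves named in the abstract, used here as arm labels.
OPERATORS = ["2-opt", "swap", "relocate", "or-opt"]
```

In a search loop, one would build a context vector each iteration, call `select_operator`, apply the chosen move to the tour, and feed the observed improvement back through `update` on that arm only.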
Authors: Yong Zeng
Abstract: For a multi‑user multiple‑input multiple‑output (MU‑MIMO) wireless communication system, if the locations of the users are fully controllable, what is the maximum sum‑capacity, and what are the corresponding optimal user locations? While these questions are irrelevant in conventional human‑centric communications with random user mobility, they become critically important for emerging applications involving ground or aerial robots. This paper addresses these fundamental questions in the context of MU‑MIMO communications with an unmanned aerial vehicle (UAV) swarm acting as the users. To this end, we first derive closed‑form expressions for the sum‑capacity of MU‑MIMO UAV swarm communications. Our results reveal that, compared to conventional MU‑MIMO systems, the additional degrees of freedom provided by the coordinated mobility of the UAV swarm yield substantial capacity enhancement. Specifically, when the base station (BS) is equipped with an M‑element uniform linear array (ULA), the full spatial multiplexing gain and beamforming gain, both equal to M, can be achieved simultaneously. For a BS with a uniform planar array (UPA), we show that asymptotically $\frac{\pi M}{4}$ users can simultaneously enjoy the full beamforming gain M. Furthermore, we propose a novel framework to optimize UAV swarm formation for maximizing the sum‑capacity achieved by successive interference cancellation (SIC) and maximizing the sum‑rate via treating interference as noise (TIN), taking into account practical considerations such as collision avoidance and swarm cohesion constraints. By exploiting the manifold structure of the array response vectors with respect to UAV directions, we develop an efficient algorithm to solve the resulting non‑convex formation optimization problems. Extensive simulation results demonstrate that the proposed algorithms achieve near‑optimal performance.
Authors: Iakovos-Christos Zarkadis, Christos Douligeris
Abstract: In recent years, mechanistic interpretability, a specific area under the umbrella of explainable artificial intelligence (XAI), has been introduced to explain the decisions made by complex machine learning (ML) models in critical systems such as UAV intrusion detection systems (UAVIDS). In this paper, we apply best practices for data pre‑processing and examine a wide range of tree ensembles, deep neural networks, hybrid stacking models, and the latest ensemble neural networks to detect intrusions in UAVs, using stratified 10‑fold cross‑validation. With our top‑performing model, XGBoost, we apply SHapley Additive exPlanations (SHAP) to analyze global and local feature importances, to understand which features each attack targets in order to mimic normal traffic, and to locate where misclassifications occur. A distribution analysis follows, visually comparing violin plots and kernel density estimate (KDE) curves. Using the Westfall‑Young permutation test for multiple comparisons, bandwidth optimization of the KDEs, and the Jensen‑Shannon distance selected for the test, we discover the true causes of the false predictions observed for Wormhole and Blackhole attacks in UAVIDS‑2025. The findings provide robust, reliable, and explainable models for UAV intrusion detection, along with statistical insights that capture and clarify the masked nature of these attacks, in particular the challenge of density support intersection between them in this dataset.
Authors: Pradeep J, Kedarisetty Siddhardha, Ashwini Ratnoo
Abstract: This paper considers fixed‑wing unmanned aerial vehicle (UAV) corridors comprising a main lane, a circular loiter lane for managing traffic congestion, and transit lanes connecting the two. In particular, we address the problem of conflict‑free reinsertion of UAVs from the loiter lane back into the main lane. The loiter lane contains a fixed number of equidistant virtual slots that UAVs can occupy. Reinsertion of loiter UAVs into the main lane becomes essential either due to reduced traffic in the main lane or due to a loiter UAV needing to reach its destination urgently. Given the total number of loiter slots, UAV speed limits, and the minimum safety distance, a guidance algorithm is developed to compute the required speed of a loiter UAV in the transit lane to ensure safe reinsertion. The proposed guidance and automation strategies are validated through numerical simulations.
Authors: Jonathan A. Diller, Fernando Cladera, Camillo J. Taylor, Vijay Kumar
Abstract: Traditional autonomous UAV search missions rely on geometric coverage patterns that ignore the semantic context of the target, leading to significant time waste in large‑scale environments. In this paper we present LMPath, a pipeline for generating language‑mediated exploration priors for Unmanned Aerial Vehicle (UAV) search missions that leverages semantics. Given a basic geofence and an object of interest prompt, LMPath uses generative language models to determine which regions of the environment should contain that object and a foundation vision model run over satellite imagery to segment sub‑regions that form the exploration prior. This prior can then be used to generate UAV paths with various objectives, such as minimizing the expected time to locate the object of interest, maximizing the probability that the object is found given a limited travel distance, or narrowing down the search space to sub‑regions that are most likely to contain the object. To demonstrate its capabilities, we used LMPath to generate various UAV paths and ran them using a real UAV over large‑scale environments. We also ran simulations to demonstrate how paths generated using LMPath outperform traditional path planning approaches for search missions.
Authors: Hosam Alamleh, Damir Pulatov
Abstract: Reliable real‑time 3D localization is essential for multi‑UAV navigation, collision avoidance, and coordinated flight, yet onboard estimates can degrade under GNSS multipath, non‑line‑of‑sight reception, vertical drift, and intentional interference. This paper presents a decentralized, lightweight 3D position‑refinement layer that improves robustness by fusing each Unmanned Aerial Vehicle (UAV)'s local estimate with neighbor‑shared state summaries and inter‑UAV range or proximity constraints. The method performs uncertainty‑aware neighborhood fusion by weighting each UAV's prior according to its reported covariance and weighting neighbor constraints according to link quality, ranging uncertainty, and a learned trust score. To support practical deployment, the framework explicitly handles cold start and temporary localization loss by inflating or substituting weak priors, allowing trusted neighborhood constraints to bootstrap and stabilize estimates until absolute sensing recovers. To mitigate the impact of faulty or malicious participants, each UAV applies a local range‑consistency check, smoothed over time, to down‑weight or exclude neighbors whose reported positions are incompatible with observed inter‑UAV distances. Simulation experiments with 10 UAVs in a 3D volume show that the proposed refinement substantially reduces mean localization error during cold start, remains competitive after local estimators stabilize, and maintains lower error as the fraction of malicious nodes increases compared with fusion without trust. These results suggest that the approach can serve as a practical resilience layer for swarm operation in challenging environments.
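The range-consistency check described above can be sketched as follows. This is a simplified illustration, not the paper's method: the exponential smoothing factor `alpha`, the scale `sigma`, the exclusion threshold, and the weighting formula are assumptions made here, and the hand-set weight stands in for the learned trust score mentioned in the abstract.

```python
import math

def range_residual(own_pos, reported_pos, measured_range):
    """Mismatch between the distance implied by a neighbor's reported position
    and the inter-UAV range actually observed (same units as the inputs)."""
    return abs(math.dist(own_pos, reported_pos) - measured_range)

class NeighborTrust:
    """Tracks a time-smoothed residual for one neighbor (hypothetical helper)."""

    def __init__(self, alpha=0.3, sigma=1.0, exclude_above=5.0):
        self.alpha = alpha                  # smoothing factor (assumed value)
        self.sigma = sigma                  # residual scale in metres (assumed value)
        self.exclude_above = exclude_above  # hard exclusion threshold (assumed value)
        self.smoothed = 0.0

    def update(self, residual):
        """Fold a new residual into the running average and return the weight."""
        self.smoothed = self.alpha * residual + (1.0 - self.alpha) * self.smoothed
        return self.weight()

    def weight(self):
        """Weight in [0, 1]: 1 for a consistent neighbor, 0 for an excluded one."""
        if self.smoothed > self.exclude_above:
            return 0.0
        return 1.0 / (1.0 + (self.smoothed / self.sigma) ** 2)

# Hypothetical usage: a neighbor whose reported position disagrees with ranging.
trust = NeighborTrust(alpha=0.5)
for measured in (8.0, 8.0, 8.0):
    r = range_residual((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), measured)
    print(round(r, 3), round(trust.update(r), 3))
```

Smoothing keeps a single noisy range from ejecting an honest neighbor, while a persistent mismatch drives the weight toward zero.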
Authors: Giulio Delama, Martin Scheiber, Yixiao Ge, Tarek Hamel, Stephan Weiss, Robert Mahony
Abstract: Many Inertial Navigation Systems (INS) use Global Navigation Satellite System (GNSS) position as the primary measurement to drive filter performance and bound error growth. However, commercial‑grade GNSS receivers introduce unknown measurement delays ranging from 50 ms to 300 ms depending on sensor quality and operating mode. Such time delays can significantly degrade INS performance unless they are explicitly compensated for. Existing algorithms commonly estimate this delay offline, run the filter concurrently with GNSS measurements using buffered Inertial Measurement Unit (IMU) data, and predict the current state by forward‑integrating buffered inertial measurements via IMU preintegration. The state‑of‑the‑art online method is an Extended Kalman Filter (EKF) that explicitly models the time delay as a state parameter, which defines the preintegration duration. This paper introduces a novel geometric framework for modeling time‑delayed INS, in which Galilean symmetry is leveraged to provide a joint representation of space and time for consistent state estimation. An Equivariant Filter (EqF) is derived for the coupled estimation of navigation states and time delay. Validation is performed on two fixed‑wing Uncrewed Aerial Vehicles (UAVs) with GNSS time lags of 90 ms and 120 ms. The test flights last two to three minutes. Simulations further investigate delays up to 500 ms and provide a statistical comparison against the state‑of‑the‑art EKF. Results show that the EqF preserves accuracy and consistency, while the EKF lacks consistency and its performance degrades significantly with increasing measurement delays.
Authors: Hanwen Zhang, Dusit Niyato, Wei Zhang, Xin Lou, Malcolm Yoke Hean Low
Abstract: In cloud manufacturing, unmanned aerial vehicles (UAVs) can support both product collection and mobile edge computing (MEC). This joint operation forms a hybrid scheduling problem, where physical logistics decisions are coupled with computational task scheduling. In this paper, UAVs collect finished products from manufacturing stations and transport them back to a central depot. Meanwhile, computational tasks generated by industrial sensor devices at these stations are processed locally, at UAVs, or offloaded via UAVs to the cloud. This coupling makes the problem challenging. A UAV can provide MEC services only during its service window at a station, so routing decisions directly determine when UAV‑assisted offloading is available. Routing decisions also affect the UAV energy budget and the availability of onboard computing and communication resources for computational task execution under task deadline constraints. To address this, we propose an agentic‑AI‑assisted optimization framework with two components. First, we develop an agentic AI that combines large language models, retrieval‑augmented generation, and chain‑of‑thought reasoning to translate user input into an interpretable mathematical formulation for the hybrid scheduling problem. Second, we design a hierarchical deep reinforcement learning approach based on proximal policy optimization (PPO), where the upper layer learns UAV routing and the lower layer optimizes per‑slot task execution and resource allocation. Simulation results show that the proposed framework yields more consistent formulations, while the hierarchical PPO achieves full product collection in 99.6% of the last 500 episodes and maintains a 100% deadline satisfaction rate, with more stable performance than the advantage actor‑critic approach.
Authors: Ehsan Aghazadeh, Masoud Malekzadeh, Ahmad Ghasemi, Hossein Pishro-Nik
Abstract: Designing continuous trajectories whose time‑averaged occupancy provably matches a prescribed spatial density (the ergodic coverage problem) is central to UAV‑assisted data collection and sensing, robotic exploration, and mobile monitoring. For flying agents in particular, this challenge is acute: trajectories must balance coverage fidelity against tight energy budgets, no‑fly zones, and acceleration limits. Existing methods either re‑optimize each trajectory online (with cost growing in the horizon and re‑running for every target, agent, and realization) or rely on bespoke analytical constructions that must be re‑derived for each new constraint. We propose a pushforward framework that decouples ergodicity from density matching: an analytic latent trajectory provides exact uniform ergodicity on a simple annular domain, and a single map, learned offline by optimal‑transport conditional flow matching, transports this latent occupancy onto the prescribed target density. The composed trajectory is then asymptotically ergodic with respect to the learned pushforward distribution, with deviation from the target controlled by the flow‑matching training loss. Once trained for a given target density and constraint set, the map serves an unbounded number of trajectories and a multi‑agent fleet without per‑agent retraining, and many differentiable operational constraints (no‑fly zones, acceleration ceilings, or fairness penalties) enter as additive soft penalties in the training loss without re‑deriving the design. We prove three results (an acceleration‑energy bound, an O(1/\sqrt{K}) ergodic convergence rate in the number of trajectory cycles K, and an approximation‑error bound) that combine into an end‑to‑end coverage bound estimable from CFM training diagnostics (certified given an architectural Lipschitz bound on v_θ).
Authors: Longchen Niu, Gennaro Notomista
Abstract: This paper presents a safe and energy‑aware optimization‑based control framework for multi‑UAV wildfire suppression under localization and motion uncertainties. We first develop a centralized density‑based controller that couples UAV motion and water deployment in a wildfire‑specific control Lyapunov function. This framework is then extended to a decentralized setting suitable for large‑scale operations using only local information. The controllers use control barrier function constraints to enforce both danger zone avoidance and the ability to reach a charging region. Simulations and real quadcopter experiments demonstrate the controller's effectiveness in fire suppression while preserving safety and energy sufficiency over multiple charge cycles.
Authors: Kexin Zhang, Lixin Li, Yuna Yan, Xin Zhang, Wensheng Lin, Rui Li, Dongwei Zhao, Zhu Han
Abstract: The rapid development of the low‑altitude economy has driven the proliferation of Unmanned Aerial Vehicle (UAV) applications, including logistics, inspection, and emergency response. However, transmitting high‑volume image data from UAVs to ground stations faces significant challenges due to limited bandwidth and stringent privacy requirements. To address these issues, a Semantic Communication (SC) framework based on Federated Learning (FL) is proposed for efficient and privacy‑preserving image transmission. A Swin Transformer‑based Semantic Communication (STSC) architecture is designed to extract multi‑scale semantic features under constrained bandwidth conditions. Dedicated communication and computing nodes are deployed on UAVs to enhance real‑time coverage and flexibility. Meanwhile, an FL mechanism enables global model training across distributed devices without sharing raw data, thus preserving user privacy. Simulation experiments conducted on the CIFAR‑10 dataset demonstrate that the proposed STSC framework achieves at least 5.7 dB improvement in Peak Signal‑to‑Noise Ratio (PSNR) compared to DeepJSCC baselines, while also showing superior convergence and generalization performance. The framework effectively integrates UAV‑assisted deployment with SC and privacy protection, offering a practical solution for bandwidth‑constrained image transmission in low‑altitude networks.
Authors: Ali Sidar Yilmaz, Buday Turan, Lukas Pries, Markus Ryll
Abstract: Fully actuated multirotor platforms decouple translational force generation from vehicle attitude, enabling independent control of position and orientation and shifting performance limitations from attitude authority to actuator dynamics and control effectiveness. This paper compares a model‑based nonlinear dynamic inversion controller (geometric NDI) with a sensor‑based incremental dynamic inversion controller (INDI) on a fixed‑tilt fully actuated hexarotor. Both controllers share an identical outer‑loop structure and are both executed at 500 Hz; therefore, performance differences can be attributed primarily to the inversion strategy. Controller performance is evaluated in five experiments covering attitude step tracking under nominal conditions and under a 50% mismatch in the rotor force coefficient, hover disturbance rejection under an external lateral load, waypoint tracking in the presence of wind gust disturbances, reduced control frequency, and injected sensor degradation. The results show that INDI offers clear advantages under parameter mismatch, gust disturbances, and sensor degradation, and maintains lower position errors across the controller‑frequency sweep. However, its advantages are not universal: geometric NDI yields better attitude tracking at reduced control frequencies. To the authors' best knowledge, this work presents the first experimental validation of a full pose tracking INDI controller with decoupled translational and rotational dynamics. These findings highlight the trade‑off between measurement‑based and model‑based inversion for robust control and rapid deployment of fully actuated UAVs.
Authors: Eni Solomon Laughter
Abstract: Accurate multi‑vehicle trajectory prediction in expressway merge and diverge areas is fundamental to the decision‑making frameworks of autonomous vehicle systems. However, the majority of existing graph‑based prediction models are developed and validated on mainline freeway segments and do not address the geometrically distinct interaction structures that characterize merge zones. Furthermore, standard evaluation protocols rely exclusively on displacement error metrics, leaving the safety consequences of predicted trajectories unquantified. This paper proposes a Lane‑Aware Graph Attention Network (LA‑GAT) that encodes vehicle interaction within dynamic scene graphs, augmented with a trainable lane‑relationship attention bias that prioritizes merge‑conflict interactions from the outset of training. The model is pre‑trained on the raw NGSIM US‑101 and I‑80 datasets and subsequently fine‑tuned on UAV‑captured UTE SQM‑W‑1 trajectory data from a Chinese expressway merge area, with final evaluation on the held‑out SQM‑W‑2 dataset. Evaluation spans both displacement metrics (ADE, FDE at 1s, 3s, 5s horizons) and surrogate safety measures (TTC violation rate, DRAC exceedance rate, collision rate). Fine‑tuned results on SQM‑W‑2 yield ADE of 0.865 m at 1s and 2.518 m at 3s, demonstrating that drone‑informed fine‑tuning substantially reduces the cross‑dataset transfer gap. The deliberate use of unfiltered NGSIM data is shown to characterize raw‑condition generalization limits, with the performance degradation attributed to the well‑documented measurement errors in that dataset.
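The two families of metrics named above can be sketched with their common textbook definitions. This is an illustrative sketch only: the paper's exact formulations are not given in the abstract, and the 1.5 s TTC threshold and the function names are assumptions made for this example.

```python
import math

def ade_fde(pred, truth):
    """Average and final displacement error between two equal-length 2D tracks."""
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(errors) / len(errors), errors[-1]

def time_to_collision(gap_m, follower_speed, leader_speed):
    """Seconds until the follower reaches the leader at constant speeds.
    Returns infinity when the follower is not closing the gap."""
    closing = follower_speed - leader_speed
    if closing <= 0:
        return math.inf
    return gap_m / closing

def ttc_violation_rate(ttc_values, threshold_s=1.5):
    """Fraction of samples whose TTC falls below the threshold (assumed 1.5 s)."""
    return sum(1 for t in ttc_values if t < threshold_s) / len(ttc_values)

# Hypothetical predicted and observed positions in metres, one per time step.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade_fde(pred, truth))
print(time_to_collision(20.0, 15.0, 10.0))
```

A prediction can score well on ADE and FDE while still implying short TTC values against neighboring vehicles, which is the motivation for reporting both kinds of measure.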
Authors: Alexey Popov, Natalia Trukhina, Vadim Vashkelis
Abstract: Unmanned aerial vehicles (UAVs) can provide flexible traffic surveillance where fixed roadside cameras are unavailable, costly, or impractical. However, raw UAV video is difficult to use for traffic analytics because vehicle motion is observed in perspective image coordinates rather than in a stable metric road coordinate system. This paper presents a lightweight pipeline for converting monocular oblique UAV traffic video into a local metric bird's‑eye‑view (BEV) representation. Visible road geometry, including lane markings, road borders, and crosswalks, is used to estimate a road‑plane homography from image coordinates to metric ground‑plane coordinates. Vehicle observations from dataset annotations or detectors are then projected to BEV using estimated ground contact points. The resulting trajectories support estimation of vehicle direction, speed, heading, and dynamic 3D cuboids on the road plane. We evaluate the pipeline on UAVDT using ground‑truth annotations to isolate calibration and geometric reconstruction from detector and tracker errors. For sequence M1401, 40 sampled frames from img000001‑img000196 produce 632 metric cuboid instances across 23 tracks. Results show that road‑geometry calibration can transform monocular UAV footage into interpretable traffic‑camera‑style analytics, including BEV tracks and synchronized 3D cuboid visualizations. They also reveal key limitations: far‑field vehicles are sensitive to homography errors, manual validation is currently more reliable than fully automatic calibration, and the single‑plane assumption limits performance in non‑planar or ambiguous road regions. The proposed pipeline provides a practical foundation for deployable UAV traffic cameras and future real‑time traffic digital‑twin systems.
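The projection step described above can be sketched as follows, assuming the road-plane homography has already been estimated. The matrix `H_EXAMPLE`, the pixel coordinates, and the frame interval are made-up illustrative values, not taken from UAVDT or from the paper.

```python
import math

def project_to_bev(H, u, v):
    """Map an image pixel (u, v) to ground-plane coordinates using a
    3x3 homography H given as nested lists (image -> metric road plane)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to the horizon line (w = 0)")
    return x / w, y / w

def speed_mps(p0, p1, dt):
    """Speed in m/s between two ground-plane points observed dt seconds apart."""
    return math.dist(p0, p1) / dt

# Hypothetical homography for an oblique view (illustrative numbers only).
H_EXAMPLE = [
    [0.05, 0.0, -16.0],
    [0.0, 0.10, -24.0],
    [0.0, 0.001, 1.0],
]

# Ground contact point of one vehicle in two frames, 0.2 s apart (assumed values).
a = project_to_bev(H_EXAMPLE, 640.0, 480.0)
b = project_to_bev(H_EXAMPLE, 650.0, 470.0)
print(a, b, speed_mps(a, b, 0.2))
```

Because the division by `w` grows more sensitive near the horizon, small errors in `H` are amplified for distant vehicles, which is consistent with the far-field limitation noted in the abstract.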
Authors: Zijiang Yan, Hao Zhou, Wael Jaafar, Jianhua Pei, Ping Wang, Halim Yanikomeroglu, Hina Tabassum
Abstract: Uncrewed aerial vehicles (UAVs) are increasingly deployed in complex networked environments, yet the joint optimization of multi‑UAV motion control and connectivity remains a fundamental challenge. In this paper, we study a multi‑UAV system operating in an integrated terrestrial and non‑terrestrial network (ITNTN) comprising terrestrial base stations and high‑altitude platform stations (HAPS). We consider a three‑dimensional (3D) aerial highway scenario where UAVs must adapt their motion to ensure collision avoidance, efficient traffic flow, and reliable communication under dynamic and partially observable conditions. We first model the problem as a hierarchical multi‑objective partially observable Markov decision process (H‑MO‑POMDP), capturing the coupling between control and communication objectives. Based on this formulation, we propose a large language model (LLM)‑driven hierarchical multi‑rate control framework. At the global level, an LLM‑based controller on the HAPS performs long‑term planning for load balancing and handover decisions. At the local level, each UAV employs a hybrid controller that integrates a slow‑timescale LLM for high‑level spatial reasoning with a reinforcement learning agent for faster UAV‑to‑infrastructure (U2I) communication and motion control. We further develop a high‑fidelity 3D simulation platform by integrating the gym‑pybullet‑drones environment with 3GPP‑compliant RF/THz channel models. Numerical results demonstrate that the proposed framework significantly outperforms state‑of‑the‑art baselines, achieving a 14% increase in transportation efficiency and a 25% improvement in telecommunication throughput. Additionally, it achieves a 23% reduction in physical collision rates, demonstrating strong handover stability and zero‑shot generalization in dynamic scenarios.
Authors: Jingxian Wang, Chen Yu, David Matthews, Emma Alexander, Sam Kriegman, Michael Rubenstein
Abstract: We introduce Phantom Twist, a type of single‑propeller UAV designed to achieve low visibility through high‑speed spinning and the exploitation of motion blur. We develop a two‑stage automated design pipeline that optimizes the placement of functional components including batteries, control PCB, motor‑propeller assembly, and counterweights. The pipeline minimizes visibility as measured by a human‑aligned perceptual metric (LPIPS) while strictly satisfying inertial and aerodynamic constraints required for stable flight. We validate this approach through fabrication and flight testing of multiple prototypes. These tests confirm that our pipeline produces stable, controllable designs and that the optimized UAV exhibits significantly reduced visual perceptibility compared to conventional quadcopters.
Authors: Hanyu Jin, Zhefan Xu, Haoyu Shen, Xinming Han, Kanlong Ye, Kenji Shimada
Abstract: Indoor infrastructure inspection, such as tunnels and industrial facilities, requires systematic surface coverage to ensure that all inspection targets are properly observed. Unmanned Aerial Vehicles (UAVs) offer an alternative to manual inspection by conducting map‑guided surface inspection using prior structural models. However, in practice, indoor inspection often relies on floorplan‑derived reference maps that may not reflect unforeseen obstacles, such as temporary structures or equipment, leading to occluded viewpoints and degraded inspection quality. Existing coverage planning methods typically assume a fully known inspection environment and perform deterministic global viewpoint optimization based on accurate prior maps, making them vulnerable to environmental discrepancies during execution. This work presents an adaptive UAV inspection framework for partially known structured indoor environments. The proposed method integrates a segment‑based global coverage planner with an inspection‑oriented local view‑angle adaptation module. The global planner organizes planar inspection targets into surface‑aligned clusters to generate compact viewpoint sequences with improved orientation consistency. The local planner generates collision‑free trajectories and adjusts the viewing direction online to mitigate occlusion‑induced coverage loss while preserving the planned trajectory structure. The simulation results across randomized scene configurations demonstrate that the proposed global planner achieves near‑complete coverage while reducing trajectory length compared to representative baselines. Real‑world flight experiments further validate that the framework produces usable inspection data for downstream analysis. These results indicate that the proposed framework improves inspection efficiency and adaptability in partially known structured indoor environments.
Authors: Sandesh More, Sneha Sudhakaran, Marco Carvalho
Abstract: Consumer unmanned aerial vehicles (UAVs) have evolved into capable computing platforms, yet their embedded firmware remains largely inaccessible to the security community. Entry‑level models, in particular those marketed to first‑time and younger operators, commonly ship with limited protection mechanisms and no public documentation of their software internals. This paper presents a systematic study of firmware extraction and validation applied to three Holy Stone consumer drone models: the HS175D, HS720, and HS360S. Rather than pursuing reverse‑engineering outcomes, the work focuses on obtaining reliable, ground‑truth firmware images across heterogeneous hardware designs using only commercially available, low‑cost tooling. Four acquisition methods are evaluated: SPI flash in‑circuit reading, SWD/JTAG debug‑port access, UART boot‑message capture, and a clip‑based contact approach that avoids chip desoldering. Each is assessed for success rate, image completeness, and operational practicality. Post‑acquisition quality is evaluated through sliding‑window Shannon entropy profiling and structural‑signature analysis using binwalk, together forming a three‑tier validation framework that distinguishes validated images from those that appear successful at the tool level but contain no meaningful firmware content. Static analysis via the EMBA framework confirms that validated images contain identifiable OS components, aging library stacks with known CVE exposure, and no binary‑hardening mechanisms. The resulting corpus and methodology provide a reproducible baseline for firmware rehosting, vulnerability analysis, secure‑boot assessment, and embedded‑systems education within the consumer UAV domain.
Index Terms: consumer UAV, drone firmware, embedded systems security, entropy analysis, firmware extraction, IoT security, SPI flash, SWD/JTAG, UART.
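Sliding-window Shannon entropy profiling, as mentioned in the abstract, can be sketched in a few lines. The window and step sizes and the synthetic image below are assumptions for illustration; the paper's actual parameters and validation tiers are not stated in the abstract.

```python
import math
from collections import Counter

def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())

def entropy_profile(image: bytes, window: int = 256, step: int = 256):
    """Return (offset, entropy) pairs for windows slid across the image."""
    last_start = max(len(image) - window, 0)
    return [(off, shannon_entropy(image[off:off + window]))
            for off in range(0, last_start + 1, step)]

# Synthetic stand-in for a dump: zero fill, a block containing every byte
# value once, then erased flash (0xFF). Real images would be read from a file.
image = bytes(256) + bytes(range(256)) + b"\xff" * 256
for offset, h in entropy_profile(image):
    print(f"0x{offset:04x}  {h:.2f} bits/byte")
```

A read that returns only constant bytes produces a flat profile near zero, which is one way a dump can look successful at the tool level while containing no meaningful firmware; sustained values close to 8 bits per byte usually point to compressed or encrypted regions.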
Authors: Haechan Mark Bong, Giovanni Beltrame
Abstract: Safe autonomous Uncrewed Aerial Vehicle (UAV) navigation in urban environments requires real‑time path planning that avoids obstacles. MaxConvNet is a potential‑field planner that leverages properties of Maxwell's equations to generate a path to the goal without local minima. We extend the 2D MaxConvNet magnetic field planner to 3D, using a convolutional autoencoder to predict obstacle‑aware potential fields from LiDAR‑derived 101^3 voxel grids. Evaluation across 100 randomized closed‑loop trials in two distinct Cosys‑AirSim urban environments (a dense night‑time cityscape and a suburban district) shows a 100% path planning success rate on both maps without retraining. In offline path planning, 3DMaxConvNet produces path lengths comparable to A* on unseen maps while reducing runtime from 0.155–0.17 s to 0.087–0.089 s, or about 1.7–1.95 times faster than A*. Against RRT(3k), 3DMaxConvNet achieves similar path quality while reducing planning runtime from 17.2–17.5 s to about 0.09 s, which is roughly 193–201 times faster than RRT(3k).
Authors: Zhenyu Liang, Jack C. P. Cheng
Abstract: Landslide monitoring and simulation play an important role in urban safety assessment and disaster prevention. Existing landslide simulation pipelines typically rely on digital elevation models and mesh‑based representations, which are suitable for geometric analysis but often lack visual realism. This limitation reduces their effectiveness in interactive applications, hazard communication, and public education. In this paper, we propose a UAV‑based scan‑to‑simulation framework that bridges photorealistic scene capture and physics‑based landslide simulation through 3D Gaussian Splatting (3DGS). Specifically, our pipeline includes four stages: (1) UAV‑based acquisition of slope imagery, (2) reconstruction of a low‑anisotropy 3DGS scene representation, (3) volumetric conversion of the target simulation region by filling the interior of the surface‑based model, and (4) integration with the Material Point Method (MPM) for landslide simulation. We validate the proposed framework on a real landslide site in Hong Kong that experienced a severe landslide event. The results show that our method supports both realistic visual reconstruction and effective simulation.
Authors: Mohammed M. H. Qazzaz, Syed Ali Zaidi, Aubida A. Al-Hameed, Abdelaziz Salama, Des Mclernon
Abstract: This paper introduces a proactive Unmanned Aerial Vehicle (UAV) mobility management xApp for Open Radio Access Network (O‑RAN) Near Real‑Time Radio Intelligent Controller (Near‑RT RIC) environments, employing Double Deep Q‑Network (DDQN) reinforcement learning (RL) enhanced with transfer learning to optimise handover decisions for UAVs operating along predetermined flight trajectories. Unlike reactive approaches that respond to signal degradation, the proposed framework anticipates network conditions and minimises both outage probability and handover frequency through predictive optimisation. The system leverages centralised weight averaging to consolidate knowledge from multiple flight scenarios into a global model capable of generalising to previously unseen operational environments without extensive retraining. A comprehensive evaluation demonstrates that the proposed framework achieves a favourable trade‑off between handover frequency and connectivity reliability, reducing handover events by up to 54.6% compared to greedy approaches while maintaining outage probability at practically negligible levels. The results validate the effectiveness of intelligent learning‑based approaches for UAV mobility management in next‑generation O‑RAN architectures, thereby contributing to seamless integration of aerial user equipment into cellular networks.
Authors: Markus Brezovsky, Anatol Günthner, Frederik Schulte, Lukas Winiwarter, Boris Jutzi, Gottfried Mandlburger
Abstract: Through‑water photogrammetry based on UAV imagery enables shallow‑water bathymetry, but refraction at the air‑water interface violates the straight‑ray assumption of Structure‑from‑Motion and causes systematic depth bias. We present BathyFacto, a refraction‑aware two‑media extension of Nerfacto integrated into Nerfstudio that targets metrically precise underwater point clouds. BathyFacto uses a shared hash‑grid‑based density field with a medium‑conditioned color head that receives a one‑bit medium flag (air or water) and traces each camera ray as two segments: a straight segment in air up to a planar water surface and a refracted segment in water computed via Snell's law with known refractive indices. To allocate samples efficiently across the air‑water boundary, we employ a single proposal‑network sampler that operates on a virtual straight ray spanning both media, combined with a kinked density wrapper that transparently corrects water‑segment positions along the refracted direction before density evaluation. A data adaptation pipeline converts photogrammetric reconstructions to a Nerfstudio‑compatible format, estimates the water plane from boundary markers, and provides per‑pixel medium masks to gate refraction. We also extend the point cloud export with refraction‑corrected backprojection and reversible coordinate transforms to world and global frames. On a simulated two‑media scene with known ground truth, BathyFacto with refraction achieves a Cloud‑to‑Mesh mean distance of 0.06 m and 87% completeness, compared to 0.52 m / 29% for the Nerfacto baseline and 0.36 m / 21% for conventional MVS without refraction correction.
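The refracted water segment can be illustrated with the standard vector form of Snell's law at a planar interface. This sketch is independent of Nerfstudio and of BathyFacto's implementation; the refractive indices 1.0 (air) and 1.33 (water) and the function name `refract` are assumptions made for the example.

```python
import math

def refract(direction, normal, n1=1.0, n2=1.33):
    """Refract a unit ray direction at a planar interface.

    direction: unit vector of the incoming ray (travelling from medium n1 into n2).
    normal:    unit surface normal pointing back into medium n1.
    Returns the unit refracted direction, or None on total internal reflection.
    """
    eta = n1 / n2
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # cannot occur going from air into water, kept for generality
    cos_t = math.sqrt(1.0 - sin2_t)
    k = eta * cos_i - cos_t
    return tuple(eta * d + k * n for d, n in zip(direction, normal))

# Camera ray hitting a horizontal water surface at 45 degrees from vertical.
s = math.sqrt(0.5)
incoming = (s, 0.0, -s)      # heading down and along +x
surface_up = (0.0, 0.0, 1.0) # water plane normal pointing into the air
print(refract(incoming, surface_up))
```

The refracted ray is steeper than the incoming one, so a reconstruction that ignores the bend continues along the straight ray and places the bottom too shallow, which is the systematic depth bias the abstract refers to.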
Authors: Meiqi Tian, Yihan Liu, Bingzhuo Zhong
Abstract: Multi‑sensor integration via error‑state Kalman filter (KF) is widely employed for precise state estimation in cyber‑physical systems (CPSs). However, this integration exposes the system to stealthy deception attacks that render conventional detection mechanisms ineffective. We propose an exposure framework to actively reveal such stealthy attacks without modifying sensor interfaces. The framework introduces a suspect mode in which the defender injects random exposure shakes into the nominal control inputs, thus creating a discrepancy between the defender's true state estimates and the attacker's manipulated state estimates, preventing the attack from remaining stealthy. We further derive an explicit exposure condition that characterizes the minimum shake magnitude to guarantee the finite‑time exposure and a compensable condition that ensures the shakes do not degrade closed‑loop performance. Simulation results based on a GNSS/INS‑integrated UAV system verify the effectiveness of the proposed framework.
Authors: Kevin Zhu, William Tang, Raphael Hay Tene, Zesheng Liu, Nhut Le, Maryam Rahnemoonfar
Abstract: Rapid and accurate damage assessment following natural disasters is critical for effective emergency response. However, identifying fine‑grained damage levels (e.g., distinguishing minor from major roof damage) in UAV imagery remains challenging due to the degradation of texture cues during resizing and extreme class imbalance. We propose DA‑SegFormer, a damage‑aware adaptation of the SegFormer architecture optimized for high‑resolution disaster imagery. Our method introduces a Class‑Aware Sampling strategy to guarantee exposure to rare damage features, and it integrates Online Hard Example Mining (OHEM) with Dice Loss to dynamically focus on underrepresented classes. In addition, we employ a resolution‑preserving inference protocol that maintains native texture details. Evaluated on the RescueNet dataset, DA‑SegFormer achieves 74.61% mIoU, outperforming the baseline by 2.55%. Notably, our improvements yield double‑digit gains in critical damage classes: Minor Damage (+11.7%) and Major Damage (+21.3%).
Authors: Yi Xiao, Qilong Jia, Hang Fan, Pascal Fua, Robert Jenssen, Xiaosong Ma, Wei Xue
Abstract: Many downstream decisions in complex terrain require fast wind estimates at a small number of user‑specified locations and heights for a given forecast valid time, rather than another dense forecast field on a fixed grid. We present WindINR, a latent‑state implicit neural representation framework for continuous high‑resolution local wind query and sparse‑observation correction. WindINR maps static terrain descriptors, a low‑resolution background field, and continuous query coordinates to a high‑resolution wind state through a latent‑conditioned decoder. To enable rapid inference‑time correction, WindINR separates reusable representation learning from sample‑specific latent‑state correction. During training, a privileged encoder infers a reference latent state from high‑resolution supervision, a deployable latent predictor estimates an initial latent state from inference‑time inputs alone, and their discrepancies are summarized into a dataset‑adaptive Gaussian prior over latent corrections. At inference time, within the WindINR module, network weights remain fixed and only the latent state is updated by minimizing a regularized correction objective using sparse observations and their uncertainty. In controlled OSSEs over the Senja region, including a UAV‑aided approach scenario and random‑observation robustness tests, WindINR improves local high‑resolution wind estimates by updating only a compact latent state rather than the full network. The corrected representation remains continuously queryable at arbitrary coordinates and, in our CPU benchmark, yields about a 2.6× online‑correction speedup over full‑network fine‑tuning, suggesting a practical interface between kilometer‑scale background products, sparse local observations, and wind queries in complex terrain.
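The inference-time step in this abstract (weights fixed, latent state updated against sparse observations under a Gaussian prior) has a closed form when the decoder is linear. The sketch below covers only that toy case and is not WindINR's method: the linear observation operator `H`, the function name `correct_latent`, and the diagonal observation-error variances are assumptions made for illustration.

```python
import numpy as np

def correct_latent(z0, prior_cov, H, y, obs_var):
    """Regularized latent correction for a toy linear decoder y ~ H z.

    Minimizes (y - H z)^T R^-1 (y - H z) + (z - z0)^T P^-1 (z - z0),
    where R = diag(obs_var) is the observation-error covariance and
    P = prior_cov is the Gaussian prior covariance over latent corrections.
    """
    R_inv = np.diag(1.0 / np.asarray(obs_var, dtype=float))
    P_inv = np.linalg.inv(prior_cov)
    A = H.T @ R_inv @ H + P_inv          # normal-equations matrix
    b = H.T @ R_inv @ (y - H @ z0)       # weighted innovation
    return z0 + np.linalg.solve(A, b)    # only the latent state changes
```

With very small observation variances the update trusts the observations; with very large ones it stays near the predicted latent state `z0`. A nonlinear decoder would need an iterative optimizer in place of the single linear solve.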
Authors: Jin Liu, Wang Wang, Hongxu Pu, Zhen Cao, Yasong Wang, Hu Wang, Kunming Luo
Abstract: AI‑assisted bridge defect inspection often produces bounding boxes with crude geometry or raster masks that are costly to store, transmit, and reuse. This study investigates how detected defects can be represented as compact, recoverable contour‑level vector records in image space. We propose Frequency‑Supervised Fourier Series Detection (FS‑FSD), which directly regresses Fourier contour descriptors and evaluates boxes, masks, and contours under a unified polygon‑space protocol. On 3,767 UAV‑collected bridge images with 42,346 defect instances, FS‑FSD achieves higher polygon‑space accuracy and better matched‑TP geometric quality than representative detection, segmentation, and contour baselines. These results show that, compared with bounding boxes and raster masks, Fourier contour records preserve defect‑boundary geometry in a more compact, recoverable, and shareable form for engineering review and downstream information workflows. Future work will study the modeling of multi‑region, fragmented, and adjacent bridge‑defect boundaries and extend the framework toward long‑term bridge‑defect tracking and lifecycle‑oriented management.
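To make the idea of a compact, recoverable contour record concrete, here is a minimal sketch of complex Fourier descriptors for a closed contour. It does not reproduce FS‑FSD's parameterization or its regression head: the function names, the use of an FFT over uniformly sampled boundary points, and the truncation rule are assumptions made for illustration.

```python
import numpy as np

def contour_to_fourier(points, order):
    """Encode a closed contour (N x 2 array) as low-order Fourier coefficients.

    Each point is treated as a complex number x + iy. Only coefficients with
    |frequency| <= order are kept, so the record holds at most 2*order + 1
    complex values.
    """
    pts = np.asarray(points, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]
    coeffs = np.fft.fft(z) / len(z)
    freqs = np.fft.fftfreq(len(z), d=1.0 / len(z))  # integer frequencies
    keep = np.abs(freqs) <= order
    return freqs[keep].astype(int), coeffs[keep]

def fourier_to_contour(freqs, coeffs, n_samples=128):
    """Recover contour points by evaluating the truncated Fourier series."""
    t = np.arange(n_samples) / n_samples
    basis = np.exp(2j * np.pi * freqs[None, :] * t[:, None])
    z = (coeffs[None, :] * basis).sum(axis=1)
    return np.stack([z.real, z.imag], axis=1)
```

A circle is represented exactly by the frequency-0 term (its center) and the frequency-1 term (its radius); more irregular defect boundaries need a higher `order`, which trades record size against boundary fidelity.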
Authors: Xindi Wang, Haining Li, Tao Ding, Bolin Cai
Abstract: This paper investigates the multi‑UAV multi‑task coordination problem in infrastructure‑less emergency scenarios, where UAVs are required to jointly perform aerial image acquisition and ground‑user communication. To tackle the challenge of balancing heterogeneous tasks within dynamic environments, we propose a hierarchical dynamic weighting Deep Reinforcement Learning (DRL) framework. Specifically, an episode‑level module is introduced to capture global task preferences, while a step‑level module adaptively adjusts the objective weights according to real‑time system conditions. By integrating global and instantaneous weights, the proposed framework improves decision stability and responsiveness during task execution. Simulation results demonstrate that the proposed method achieves faster convergence, more stable training, and higher task completion efficiency than conventional methods.
Authors: Alain P. Ndigande, Josiah Wiggins, Sedat Ozer
Abstract: Recent natural disasters have highlighted the urgent need for efficient data‑driven approaches to disaster management. Machine learning (ML) and deep learning (DL) techniques have shown considerable promise in enhancing the key phases of disaster management including mitigation, preparedness, detection, response, and recovery. A critical enabler of successful ML or DL based applications in remote sensing, however, is the accessibility and quality of annotated datasets. With the growing availability of high‑resolution imagery from unmanned aerial vehicles (UAVs) and satellites, computer vision and remote sensing algorithms have become essential tools for rapid detection, situational assessment, and decision‑making in disaster scenarios. This survey provides a comprehensive overview of publicly available image‑based datasets relevant to ML/DL‑based disaster management pipelines. Emphasis is placed on datasets that support computer vision and remote sensing tasks across all phases of disaster events including pre‑disaster, during, and post‑disaster. The goal of this work is to serve as a centralized reference for researchers and practitioners seeking high‑quality datasets for rapid development and deployment of remote sensing‑driven disaster response solutions.
Authors: Qiwei Wang, Zhongyao Tuo, Xianghui Ze, Yujiao Shi
Abstract: Cross‑view localization classically asks: where does this ground image lie on the satellite tile? Existing methods are typically limited to 3‑DoF estimates ‑‑ an (x,y) position and a yaw angle ‑‑ because nadir satellite imagery provides no direct cues for roll, pitch, or altitude, forcing a reliance on planar‑motion and zero‑tilt assumptions. These assumptions break on real terrain with slopes, ramps, and tilted camera mounts. To overcome this, we introduce a single UAV image as an intermediate viewpoint: it reveals the 3D structure invisible from nadir, supplies the cues for roll, pitch, and altitude that the satellite alone cannot provide, and needs only spatial overlap with the ground camera ‑‑ no known relative pose is required. Building on this insight, we propose Cross3R, a flexible feed‑forward model that ingests a satellite tile together with a UAV image, a ground image, or both, and, in a single forward pass, recovers a cross‑view 3D point cloud, the 6‑DoF poses of every input camera, and the on‑tile (x,y) position and yaw of each perspective camera. For training and evaluation, we also construct CrossGeo, a 278K‑image tri‑view dataset spanning 85 scenes across every continent except Antarctica. On CrossGeo, Cross3R consistently outperforms feed‑forward 3D baselines in point‑cloud reconstruction, 6‑DoF camera‑pose estimation, and cross‑view localization. On KITTI, it outperforms dedicated cross‑view methods trained on KITTI on most metrics, despite having no KITTI training itself.
Authors: Yannick Burkhardt, Sebastián Barbas Laina, Simon Boche, Leonard Freißmuth, Stefan Leutenegger
Abstract: The robustness of event cameras to high dynamic range and motion blur holds the potential to improve visual odometry systems in challenging environments. Although their high temporal resolution does not require synchronous processing, most event‑based odometry methods still run at fixed rates, which simplifies system design but restricts latency and throughput. In this work, we present AERO‑VIS, a stereo event‑inertial SLAM system with an integrated, data‑driven, robust, and performance‑optimized keypoint detector. By processing the event stream asynchronously, the system dynamically adapts to downstream runtime demands, ensuring low‑latency and real‑time performance. When deploying AERO‑VIS on a UAV, we achieve unprecedented accuracy in onboard event‑based SLAM. These unique characteristics enable us to present the first purely event‑based inertial SLAM system that demonstrates closed‑loop UAV control and large‑scale state estimation while relying solely on onboard compute. A video of the experiments and the source code are available at ethz‑mrl.github.io/AERO‑VIS.
Authors: Xudong Lv, Yuxiang Sun, Shuo Wang, Nanxing Chen, Jun Guan, Jingtian Hu
Abstract: Optical neural networks are emerging as powerful machine learning and information processing tools because of their potential advantages in speed and energy efficiency. The training methods of these physical models, however, remain underexplored compared to their digital counterparts and are leading to suboptimal performance. This paper reports a pre‑training‑driven approach that leads to snapshot image denoising with substantially improved quality. We demonstrated effective free‑space optical denoising by a diffractive network optimized by a two‑step process including (1) pre‑training using a massive dataset of 3.45 million diverse but simple images and (2) fine‑tuning with the corresponding task‑specific datasets. Compared to conventional Fourier‑domain filtering and directly trained diffractive networks, such a transfer learning process exhibited prominent advantages for denoising images degraded by severe noise, peak signal‑to‑noise ratio (PSNR) below 8 dB, while preserving fine image features and improving the PSNR to above 18 dB. Importantly, the same pre‑trained optical network could be consistently fine‑tuned to process degraded images from highly diverse styles ranging from handwritten digits (MNIST) and chest X‑rays (ChestMNIST) to CIFAR‑10 images and human faces (CelebA). We further demonstrated the critical role of our optical denoisers in vision‑based applications, including face detection, plate recognition, and localization of UAVs in noisy conditions.
Authors: Yijin Wang, Yuru Tian, Xijie Huang, Weiqi Gai, Mo Zhu, Xin Zhou, Yuze Wu, Fei Gao
Abstract: Bird's‑eye‑view (BEV) images have been widely demonstrated to provide valuable prior information for navigation. Given the global information provided by such views, two key challenges remain: how to fully exploit this information and how to reliably use it during execution. In this paper, we propose a navigation system that uses BEV images as global priors and is designed for ground and near‑ground robotic platforms. The system employs an image generation model to interpret human intent from natural language, identify the target destination, and generate traversability masks. During execution, we introduce cross‑view localization to align the robot's odometry with the BEV map and mitigate long‑term drift in conventional odometry. We conduct extensive benchmark experiments to evaluate the proposed method and further validate it on a UAV platform. Using only a conventional local motion planner, the UAV successfully completes a 160‑meter outdoor long‑range navigation task. This work demonstrates how the world‑understanding capabilities of foundation models can be transferred to embodied navigation, enabling robots to benefit from the strong generalization ability of existing image generation models.
Authors: Zirui Wang, Xinjia Luo, Haotian Sun, Jun Ma, Jian Guo, Boyu Zhou
Abstract: Classic exploration methods often rely on dense occupancy maps or high‑resolution point clouds for frontier detection and path planning, resulting in substantial memory consumption and computational overhead. Moreover, it is impractical to equip micro UAVs under size, weight, and power (SWaP) constraints with sensors such as LiDAR to obtain accurate environmental geometric measurements. This paper presents a lightweight autonomous exploration system that leverages omnidirectional vision and sparse topological map guidance. Specifically, we utilize a multi‑fisheye camera setup to achieve omnidirectional Field of View (FoV) and perform depth estimation. To address the limited depth estimation accuracy, frontiers are represented as potential unexplored regions characterized by topological nodes instead of explicit boundaries, enabling efficient identification of frontier regions without maintaining occupancy grids or global point clouds. Unlike classic dense representations, our approach abstracts the environment using a sparse topological map composed of key nodes and their descriptors, reducing memory consumption and computational demands. Global path planning is performed directly on the sparse graph. The proposed method is validated both in simulation and in real‑world experiments on a palm‑sized vision‑based UAV with an 11 cm wheelbase and a 400 g weight, demonstrating that our method can achieve efficient exploration with extremely low computational consumption.
Authors: Hongyang Zhang, Maonnan Wang, Ziyao Wang, Hongrui Yin, Man On Pun
Abstract: Cross‑view geo‑localization (CVGL) is fundamental for precise localization and navigation in GPS‑denied environments, aiming to match ground or UAV imagery with satellite views. Existing approaches often rely on global feature alignment, but they suffer from substantial domain shifts induced by varying regional textures and weather conditions. This issue becomes even more pronounced in UAV‑based scenarios, where the broader perspective inevitably introduces dense, fine‑grained objects, creating significant visual clutter. To address this, we draw inspiration from Object‑Centric Learning (OCL) and propose InfoGeo, an information‑theoretic framework designed to enhance robustness and generalization. InfoGeo reformulates the optimization as an information bottleneck process with two core objectives: (i) maximizing view‑invariant information by aligning the object‑centric structural relations across views, and (ii) minimizing view‑specific noisy signals through cross‑view knowledge constraints. Extensive evaluations across diverse benchmarks and challenging scenarios demonstrate that InfoGeo significantly outperforms state‑of‑the‑art methods.
Authors: Jiawei Xu, Longsen Gao, Rafael Fierro, David Saldaña
Abstract: The interaction of robots with bendable objects in midair presents significant challenges in control, often resulting in performance degradation and potential crashes, especially for aerial robots due to their limited actuation capabilities and constant need to remain airborne. This paper presents an adaptive controller that enables two aerial vehicles to collaboratively follow a trajectory while transporting a bendable object without relying on explicit elasticity models. Our method allows on‑the‑fly adaptation to the object's unknown deformable properties, ensuring stability and performance in trajectory‑tracking tasks. We use Lyapunov analysis to demonstrate that our adaptive controller is asymptotically stable. Our method is evaluated through hardware experiments in various scenarios, demonstrating the capabilities of using multirotor aerial vehicles to handle bendable objects.
Authors: Nathan Meraz, Ronan Taneja, Rachel Chan, Alisha Whitehead, Gabriella Mayrend, Megan Birch, Joseph L. Greene
Abstract: Passive 3D sensing is increasingly critical for early detection and tracking of small aerial vehicles (UAVs), where traditional active ranging can be tactically undesirable. We present SCHeimpflug for Optical Ranging TechnologY (SCHORTY), a single‑aperture passive and active ranging architecture that exploits the Scheimpflug principle to encode range along a tilted object space plane by tilting the sensor relative to the imaging optics. SCHORTY requires only a one‑time geometric calibration to map pixel coordinates to range and is inherently sensor and waveband agnostic. We implement SCHORTY using both a visible frame‑based camera and an event‑based camera (EBC) with closely matched pixel sizes for comparable horizontal resolutions and range binning. Controlled flights of an octocopter and a fixed‑wing UAV equipped with GPS provide ground truth distances out to 1.1 km. Experimental results show that SCHORTY achieves deterministic range assignment limited primarily by the projected pixel size, which grows with the square of distance, while avoiding computationally intensive inverse reconstructions common in coded aperture and PSF engineered systems. In the EBC configuration, EBC‑SCHORTY inherently suppresses static background and emphasizes motion, improving UAV detectability in cluttered natural scenes and under turbulence and motion blur. Additionally, we observe an asymmetric defocus blur about the object plane that depends on UAV trajectory, suggesting an extra cue for localization and trajectory inference. These results demonstrate SCHORTY as a practical and Size, Weight, and Power (SWaP) efficient passive ranging solution for medium‑range UAV observation and motivate future integration with 2.5D/3D PSF engineering and event‑based deconvolution to enhance 3D sensing performance.
Authors: Chenzhe Jin, Zhuohang Wu, Yifan Cai, Xiangqi Li, Jan Ming Kevin Tan, Narsimlu Kemsaram, Valerio Modugno
Abstract: The decline of natural pollinators has created a major challenge for crop production in controlled indoor agriculture, particularly in vertical farming environments where natural insect pollination is absent. This motivates the development of robotic systems capable of performing precise flower targeting tasks while minimizing physical interference with delicate floral structures. This paper presents an aerial manipulator platform for perception driven flower detection, localization, and approach in vertical farming environments. The proposed system integrates onboard RGBD based perception, model predictive path integral (MPPI) based unmanned aerial vehicle (UAV) control on a PX4 platform, and a lightweight 2DoF manipulator for precise end effector positioning. The platform is evaluated in both MuJoCo simulation and UAV lab experiments using a flower targeting testbed. The experimental results demonstrate stable UAV flight, reliable flower localization, and centimeter level end effector positioning accuracy. In simulation, the proposed controller achieves consistent trajectory convergence and accurate target alignment. In the real world UAV lab environment, the integrated perception control manipulation framework enables stable flower targeted positioning and end effector alignment under constrained aerial operation. These results validate the proposed aerial manipulator as a robust robotic carrier and positioning framework for future contactless pollination systems. While the current study focuses on perception guided targeting and positioning, the developed platform provides a practical foundation for integrating advanced contactless end effectors, including acoustic based pollen manipulation modules, in future work.
Authors: Siwei Cai, Knut Peterson, Quan Tran, Christian Ricks, Dhanush Parthasarathy, Amir Kaidarov, Neil Deshpande, Sukaina Najm, David Han, Lifeng Zhou
Abstract: Heterogeneous air‑ground robot teams combine complementary sensing modalities, mobility characteristics, and spatial viewpoints that can significantly enhance perception in complex outdoor environments. However, progress in multi‑robot collaborative perception has been constrained by the lack of real‑world datasets featuring overlapping multi‑modal observations from platforms operating in unstructured terrain. We present GA3T (Ground‑Aerial Team for Terrain Traversal), a real‑world multi‑robot collaborative perception dataset collected using a Clearpath Husky UGV and an Autel EVO II UAV across diverse unstructured environments, including forest trails, rocky paths, muddy terrain, snow piles, and grass‑covered fields. The ground platform provides 3D LiDAR, stereo camera, IMU, and GPS data, while the aerial platform contributes RGB imagery, thermal/infrared observations, and GPS from a complementary overhead viewpoint, allowing for rich cross‑modal and cross‑view perception. The dataset is collected in 4 unique environments, with over 13,000 synchronized frames across approximately 29 minutes of operation, and includes both SAM 3‑based zero‑shot segmentation and over 8,000 manually labeled images. A unique aspect of the dataset is its early‑spring collection period, during which sparse tree canopies allow the aerial robot to partially observe the ground robot and terrain through the trees, allowing for occlusion‑aware collaborative perception. Unlike prior multi‑robot datasets that focus on SLAM or simulated cooperative driving, GA3T is specifically designed to support research on cross‑view perception, air‑ground viewpoint fusion, traversability estimation, and collaborative scene understanding in real off‑road environments.
Authors: Donglin Wang, Anjie Qiu, Qiuheng Zhou, Hans D. Schotten
Abstract: Non‑terrestrial networks (NTN) have been standardized by the 3rd generation partnership project (3GPP) as a key component of future 6G systems to enhance coverage and resilience. In particular, NTN technologies such as low‑earth orbit (LEO) satellites, high‑altitude platform stations (HAPS), and unmanned aerial vehicles (UAVs) are expected to support terrestrial networks (TN) during extreme events and disasters. In this paper, we present a lightweight system‑level simulator for evaluating post‑failure fallback behavior in integrated TN‑NTN wireless networks under a partial‑failure disaster model. The simulator follows 3GPP Rel‑17/18 modeling principles, supports probabilistic terrestrial next‑generation node B (gNB) failures, and service migration to NTN. The simulator supports comparative analysis of throughput, packet reception ratio (PRR), and latency under different user loads, disaster severities, and NTN provisioning levels. Results show the expected capacity‑delay tradeoff of terrestrial operation, the reliability and stability of non‑terrestrial service, and the balanced resilience behavior of hybrid TN‑NTN operation. The proposed framework provides a tractable tool for studying wireless network resilience and traffic management in future integrated 6G mobile systems.
Authors: Maoxin Ji, Qiong Wu, Pingyi Fan, Cui Zhang, Nan Cheng, Wen Chen, Khaled B. Letaief
Abstract: This paper investigates a multi‑Unmanned Aerial Vehicle (UAV) joint base station‑assisted Internet of Vehicles (IoV) task offloading system in dense urban environments. To minimize system delay and energy consumption under strict coupling constraints, the complex non‑convex optimization problem is decoupled into a hierarchical execution framework. First, a sequential distributed optimization algorithm based on Second‑Order Cone Programming (SOCP) is proposed to optimize the 3D flight trajectory of each UAV, ensuring adaptive network coverage. Second, a novel hybrid resource scheduling paradigm synergizing Deep Reinforcement Learning (DRL) and Large Language Models (LLMs) is developed. Within this framework, the DRL agent dictates the initial resource allocation, while the LLM acts as a semantic macro‑scheduler to rectify long‑tail allocation imbalances for failed and surplus tasks. Crucially, a reward decoupling mechanism is introduced to isolate DRL training from external LLM interventions, thereby ensuring policy convergence. Finally, the task offloading ratios are precisely determined via Linear Programming (LP) within an alternating optimization loop. Simulation results demonstrate that the proposed method significantly outperforms traditional multi‑agent reinforcement learning baselines in terms of task success rate and system efficiency.
Authors: Andrea Iannoli, Lorenzo Gigli, Luca Sciullo, Angelo Trotta, Marco Di Felice
Abstract: Large Language Models (LLMs) are increasingly explored as high‑level reasoning engines for cyber‑physical systems, yet their application to real‑time UAV swarm management remains challenging due to heterogeneous interfaces, limited grounding, and the need for long‑running closed‑loop execution. This paper presents a mission‑agnostic, agent‑enhanced LLM framework for UAV swarm control, where users express mission objectives in natural language and the system autonomously executes them through grounded, real‑time interactions. The proposed architecture combines an LLM‑based Agent Core with a Model Context Protocol (MCP) gateway and a Web‑of‑Drones abstraction based on W3C Web of Things (WoT) standards. By exposing drones, sensors, and services as standardized WoT Things, the framework enables structured tool‑based interaction, continuous state observation, and safe actuation without relying on code generation. We evaluate the framework using ArduPilot‑based simulation across four swarm missions and six state‑of‑the‑art LLMs. Results show that, despite strong reasoning abilities, current general‑purpose LLMs still struggle to achieve reliable execution ‑ even for simple swarm tasks ‑ when operating without explicit grounding and execution support. Task‑specific planning tools and runtime guardrails substantially improve robustness, while token consumption alone is not indicative of execution quality or reliability.
Authors: Prasoon Kumar, Akshay Deepak, Sandeep Kumar
Abstract: Reliable localization in GPS‑denied, visually degraded environments is critical for autonomous UAV operations. This paper presents a systematic comparative evaluation of five V‑SLAM systems (ORB‑SLAM3, DPVO, DROID‑SLAM, DUSt3R, and MASt3R) spanning classical, deep learning, recurrent, and Vision Transformer (ViT) paradigms. Experiments are conducted on curated sequences from four public benchmarks (TUM RGB‑D, EuRoC MAV, UMA‑VI, SubT‑MRS) and a custom monocular indoor dataset under five controlled degradation conditions (normal, low light, dust haze, motion blur, and combined), with sub‑millimeter Vicon ground truth. Results show that ORB‑SLAM3 fails critically under severe degradation (62.4% overall TSR; 0% under dense haze), while learning‑based methods remain robust: MASt3R achieves the lowest degraded ATE (0.027 m) and DUSt3R the highest tracking success (96.5%). DPVO offers the best efficiency‑robustness trade‑off (18.6 FPS, 3.1 GB GPU memory, 86.1% TSR), making it the preferred choice for memory‑constrained embedded platforms. Embedded deployment analysis across NVIDIA Jetson platforms provides actionable guidelines for SLAM selection under SWaP‑constrained UAV scenarios.
Authors: Ana Maria Nascimento, Augusto Sales, Antonio Marcus Lima, Tiago Nascimento
Abstract: This work proposes a novel control and estimation approach for aerial manipulation of a cable‑suspended load using Unmanned Aerial Vehicles (UAVs). Common approaches in the state of the art have practical limitations, relying on direct load measurements and Lagrangian methods for dynamic modeling. The lack of a straightforward dynamic model of the system led us to propose adopting the Udwadia‑Kalaba method to explicitly incorporate the cable's geometric constraints. This formulation allowed for the consistent derivation of the tension force and its direct integration into the NMPC prediction model. Additionally, we propose a sensorless load state estimation based on the same geometric constraints. Results from real‑robot experiments demonstrated that the explicit inclusion of load dynamics in the optimization problem significantly reduces trajectory‑tracking errors and yields better overall performance compared to strategies based on incomplete models.
Authors: Victor L. Knoop, Serge P. Hoogendoorn
Abstract: Unmanned aerial vehicles (UAVs, or drones) are likely to significantly increase the amount of air traffic. If the skies are full of UAVs, they need to interact with each other, for instance by yielding or other evasive maneuvers. The aggregated movements of drones will create traffic patterns. Just like in current road traffic, the interactions will be very frequent, so a centralized computer managing these interactions is not expected to be feasible. There is a long history of traffic flow theory and modeling for 1 dimensional (road) traffic; this has been expanded to 2 dimensional traffic (pedestrians). It is unclear how traffic flow theory works for 3 dimensional traffic. In this paper we show how drone traffic can interact in a decentralized way. For the microscopic description, we add asymmetric interaction rules. We show that without centralized control, we can have efficient and safe traffic. Moreover, we provide a framework that directly links microscopic interactions to macroscopic properties. For the macroscopic description, we formulate and apply a numerical scheme that integrates the competition for space by UAVs for multiple classes, directions and dimensions. We apply both the microscopic and macroscopic descriptions to analyze (emerging) patterns which may arise in 3D traffic flow. The current paper provides background to develop interaction rules for drone traffic. Currently, drone traffic is taking its first steps, but once the aeronautic technology takes off, the legislation regarding drone interactions should be ready. To support this, and to be able to assess the traffic consequences of decisions, the traffic flow theory framework developed here is essential.
Authors: Steffen Knoblauch, Ram Kumar Muthusamy, Luis M. A. Bettencourt, Costas Velis, Pierre Chrzanowski, Edward Charles Anderson, Pete Masters, Innocent Maholi, Antonio Inguane, Levi Szamek, Alexander Zipf
Abstract: Managing municipal solid waste in rapidly urbanizing Sub‑Saharan Africa remains challenging due to dispersed informal dumping and limited high‑resolution datasets for spatial monitoring. We present an open‑access deep learning model for automated detection of openly dumped dispersed solid waste via crowdsourced UAV imagery, trained and evaluated across 29 regions in 10 countries, encompassing diverse environmental contexts. A deep learning model trained on manually annotated image tiles achieved excellent performance in detecting openly dumped dispersed solid waste across all study regions. Predicted distributions reveal heterogeneous accumulation patterns, ranging from localized hotspots ‑ often along waterways, where waste can exacerbate flood and public health risks ‑ to more dispersed litter across urban areas. Waste accumulation is most strongly associated with population density and indicators of lack of local infrastructure access, whereas its relationship with broader measures of regional development is weaker, highlighting the importance of fine‑scale data for understanding localized waste dynamics. By releasing the model, this study provides a ready‑to‑use tool for UAV imagery collected by municipalities and local mapping communities, enabling openly dumped dispersed solid waste monitoring without extensive technical expertise. This approach empowers local practitioners to convert UAV imagery into actionable insights, supporting targeted interventions and improved municipal solid waste management across Sub‑Saharan Africa.
Authors: Junhao Wei, Yanxiao Li, Dexing Yao, Yifu Zhao, Haochen Li, Qibin He, Baili Lu, Xiaofan Zou, Dingcheng Yang, Sio-Kei Im, Yapeng Wang, Xu Yang
Abstract: Agile unmanned aerial vehicle (UAV) navigation in cluttered environments demands a planning architecture that is both computationally efficient and structurally expressive enough to reason over multiple feasible motions. This paper presents SAGA, a robust self‑attention and goal‑aware anchor‑based planner for safe UAV autonomous navigation. SAGA formulates local planning as a one‑stage joint regression‑and‑ranking problem over a fixed lattice of motion anchors. Given a depth image and a body‑frame motion state, the planner predicts refined terminal states and planning scores for all anchors in a single forward pass, after which the best candidate is decoded into a dynamically feasible trajectory. The key idea of SAGA is to transform anchor‑aligned features into geometry‑aware tokens and perform cross‑anchor global reasoning with self‑attention. To preserve directional structure in the token space, we further introduce a polar positional encoding derived from anchor yaw and pitch. In addition, a goal‑aware modulation module injects velocity, acceleration, and target information into the token representation before final score prediction. Experiments in cluttered pillar‑map environments under maximum speed settings of 2.0, 3.0, and 4.0 m/s show that SAGA consistently achieves a 100% success rate, while YOPO drops from 90.91% to 62.50%, Ego‑planner from 71.43% to 52.63%, and Fast‑planner from 52.63% to 38.46%. Under the 4.0 m/s maximum speed setting, SAGA also improves average safety from 1.9843 m to 2.3888 m and minimum safety from 0.4390 m to 0.7576 m over YOPO, while reducing total flight time from 40.4631 s to 27.4901 s. The comparison with SAGA w/o PPE further shows that explicit polar positional encoding is critical for stable cross‑anchor reasoning and safe passage selection in cluttered scenes.
Authors: Houyi Qi, Minghui Liwang, Yuhan Su, Xianbin Wang
Abstract: Continuous and reliable service support is crucial for emerging latency‑sensitive and computation‑intensive applications in UAV‑assisted edge networks (UENs) due to operational dynamics and environmental uncertainty. Although conventional designs can improve coverage and computing efficiency, they often rely on instantaneous resource optimization or reactive handover, rendering ongoing services vulnerable to non‑negligible interruptions when the serving UAV degrades due to mobility, energy depletion, or channel dynamics. To avoid such post‑failure recovery, a promising approach is to prepare a successor UAV in advance, i.e., a standby UAV that reserves minimal resources and synchronizes service context for possible takeover. Thus, we consider a dynamic UEN architecture where each mobile user carries an ongoing computing mission requiring persistent service support, while UAVs provide wireless access and computing services under time‑varying network dynamics and stringent onboard energy constraints. To facilitate proactive and continuous service provisioning, we propose a forecasting‑driven proactive reservation‑based continuous service scheduling framework, termed Fresco. In Fresco, an LSTM‑based module is first used to predict short‑term disruption risks of ongoing missions from historical network observations. Guided by these predictions, an online risk‑aware successor matching scheme selects suitable standby UAVs for high‑risk missions under delay, resource, and energy constraints, while incorporating minimal communication/computation reservation and lightweight service‑context synchronization for efficient takeover preparation. Experiments show that Fresco significantly reduces service interruptions and improves mission continuity over reactive and non‑predictive baselines, with only modest reservation overhead.
Authors: Qinwei Huang, Rui Zuo, Simon Khan, Qinru Qiu
Abstract: Conventional federated learning assumes that greater learner participation improves training performance, by leveraging abundant, independently generated local data. However, in federated reinforcement learning (FRL) for unmanned aerial vehicle (UAV) teams in hazardous environments where experience generation is severely constrained by safety considerations, energy limitations, and mission duration, this assumption may break. This work introduces Experience‑Constrained Hierarchical Federated Reinforcement Learning (EC‑HFRL), a framework in which clusters act as federated learning agents, while multiple intra‑cluster learners represent parallel learning resources that reuse a shared experience pool. We show that increasing participation does not necessarily improve learning performance. Instead, learning performance is strongly associated with experience reuse strategy and the dominance of key analytically identified gradient transition experiences within a cluster. In particular, minibatch size primarily determines effective replay exposure, while higher intra‑cluster participation increases reuse level. Empirical results demonstrate that the performance regimes are strongly associated with the structure of the learning signal, rather than federated aggregation effects, clarifying the limited and secondary role of learner participation in experience‑constrained FRL.
Authors: Natalia Trukhina, Vadim Vashkelis
Abstract: Bandwidth‑constrained robotic and surveillance systems often rely on a single compressed video stream to support both continuous scene awareness and downstream machine perception. In practice, this creates a mismatch: low‑bitrate video can preserve motion and coarse context, but often loses the fine local detail needed for reliable object recognition and decision‑making. Motivated by a hybrid architecture in which low‑resolution video supports dynamic scene understanding while event‑driven high‑detail regions of interest (ROIs) support close‑up identification and analytics, this paper formalizes a two‑channel visual telemetry scheme in which a continuous low‑bitrate video stream is augmented by selectively transmitted high‑detail still ROIs. This first paper does not attempt to prove the superiority of a new still‑image codec. Instead, it establishes the hybrid transmission paradigm itself using a practical and reproducible codec stack: x265/HEVC for the base video stream and JPEG stills for ROI refinement. We formulate the problem as bitrate‑constrained information selection for robotic vision and define an experimental protocol in which video‑only and hybrid schemes are compared under matched total communication budgets. The study is designed around UAV‑oriented datasets, two practical bitrate regimes, several ROI triggering policies, and object‑level classification refinement on selectively transmitted ROI stills. The resulting paper lays the methodological foundation for a second‑stage investigation of JPEG AI as the semantic still‑image channel within the same hybrid architecture.
Authors: Ashik Abrar Naeem, Mohammad Ariful Haque
Abstract: Autonomous navigation and obstacle avoidance remain a core challenge for modern Unmanned Aerial Vehicles (UAVs). While traditional control methods struggle with the complexity and variability of the environment, reinforcement learning (RL) enables UAVs to learn adaptive behaviors through interaction with the environment. Existing RL research prioritizes mission success at the expense of mission time and UAV safety. This study integrates Potential Based Reward Shaping (PBRS) with Control Lyapunov Functions (CLF) and Control Barrier Functions (CBF) to simultaneously optimize mission time and ensure formal safety guarantees. An RL model is trained in a generalized simple environment, then used in complex scenarios incorporating a CLF‑CBF‑QP filter without further training. Experimental results in simulated environments demonstrate a significant reduction in mission time and outstanding performance in complex environments.
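As a rough illustration of the potential-based reward shaping mentioned in the abstract above, the sketch below adds the standard shaping term gamma * Phi(s') - Phi(s) to an environment reward. The potential used here (negative Euclidean distance to a goal) and the names `potential` and `shaped_reward` are assumptions for illustration only; the abstract does not state which potential the authors use or how it relates to their CLF.

```python
import math


def potential(position, goal):
    """Hypothetical potential Phi(s): negative Euclidean distance to the goal."""
    return -math.dist(position, goal)


def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Standard PBRS form: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


# Usage: moving one metre closer to the goal earns a positive shaping bonus.
goal = (3.0, 4.0)
bonus = shaped_reward(0.0, potential((0.0, 0.0), goal), potential((0.6, 0.8), goal), gamma=1.0)
print(round(bonus, 6))
```

With gamma = 1 the shaping terms telescope along a trajectory, so their sum depends only on the start and end states. This is why this form of shaping is known to leave the optimal policy unchanged.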
Authors: Animesh Kumar Shastry, Mangal Kothari
Abstract: This work develops a unified optimal control framework for a Quadrotor Biplane tailsitter UAV capable of operating seamlessly across hover, transition, and cruise flight regimes. Although the tailsitter configuration enables mechanically simple mode switching, the transition maneuver remains challenging due to strong nonlinearities and rapidly varying aerodynamics. To address this, a trajectory optimization scheme based on nonlinear programming with direct collocation is formulated, incorporating nonlinear dynamics, actuator limits, and angle‑of‑attack constraints. The resulting optimal trajectories are safe, reliable, and time‑efficient. For the cruise‑to‑hover maneuver, optimal trajectories are generated over a range of initial cruise velocities and subsequently learned using feedforward multilayer neural networks. The learned model generalizes across operating conditions and enables real‑time generation of constraint‑satisfying transition trajectories. These trajectories provide both feedforward control inputs and reference state profiles, which are tracked using a Model Predictive Controller (MPC). The MPC eliminates the need for controller switching or gain scheduling across flight envelopes, enabling a single universal controller for hover, transition, and cruise. A nonlinear Dynamic Inversion (DI) controller is also designed for comparison. Two numerical schemes for MPC are implemented and evaluated. Simulation results across all flight modes demonstrate that MPC achieves superior robustness to parameter uncertainties compared to DI. A computational cost analysis further highlights the trade‑off between execution time and performance for the different MPC solvers.
Authors: Sina Sajjadi, Jacopo Panerati, Sina Soleymanpour, Varunkumar Mehta, Farrokh Janabi-Sharifi, Iraj Mantegh
Abstract: Autonomous landing in cluttered or unstructured environments remains a safety‑critical challenge for unmanned aerial vehicles (UAVs), particularly under noisy perception caused by sensor uncertainty and platform‑induced disturbances such as vibration. This paper presents an evidence‑based probabilistic framework for autonomous UAV landing that explicitly separates decision‑making under uncertainty from execution via visual servoing. Landing safety is modeled as a latent variable and inferred through recursive accumulation of frame‑wise visual likelihoods derived from flatness, slope, and obstacle cues, yielding a temporally consistent belief map that is robust to transient perception errors. Physical feasibility is enforced through a hard geometric constraint based on the minimum required landing radius of the UAV, ensuring that undersized but visually appealing regions are rejected. The final landing site is selected using constrained maximum a posteriori estimation. Once selected, the UAV locks onto the target region using ORB feature tracking and performs precise alignment and descent via image‑based visual servoing (IBVS). The proposed approach is validated through both real‑world laboratory experiments and high‑fidelity simulations in Nvidia Isaac Sim, demonstrating consistent, cautious, and stable landing behavior across domains.
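The recursive evidence accumulation and constrained selection described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' method. It assumes a binary Bayes filter in log-odds form with a uniform prior, a single per-frame likelihood per candidate region, and hypothetical field names (`name`, `log_odds`, `radius`).

```python
import math


def update_log_odds(log_odds, frame_likelihood):
    """Accumulate one frame's evidence that a region is safe (log-odds form)."""
    p = min(max(frame_likelihood, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
    return log_odds + math.log(p / (1.0 - p))


def belief(log_odds):
    """Convert accumulated log-odds back to a probability of 'safe'."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def select_site(candidates, min_radius):
    """Constrained MAP choice: highest belief among regions that fit the UAV."""
    feasible = [c for c in candidates if c["radius"] >= min_radius]
    if not feasible:
        return None  # no region satisfies the hard geometric constraint
    return max(feasible, key=lambda c: c["log_odds"])


# Usage: region A looks safer but is too small, so region B is selected.
sites = [
    {"name": "A", "log_odds": 3.0, "radius": 0.5},
    {"name": "B", "log_odds": 1.0, "radius": 1.5},
]
print(select_site(sites, min_radius=1.0)["name"])
```

Because evidence is summed over frames, a single noisy frame shifts the belief but does not overturn it. This matches the robustness to transient perception errors that the abstract describes.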
Authors: Yufei Ye, Shijian Gao, Xinhu Zheng, Liuqing Yang
Abstract: In the context of 6G ubiquitous connectivity, the space‑air‑ground‑sea integrated network (SAGSIN) emerges as a new paradigm to provide critical services for resource‑limited ocean environments. To realize this paradigm efficiently, we propose an innovative dynamic task and resource scheduling approach for green SAGSIN that delivers computing support for vessels while minimizing overall task execution delay. To address the challenge of multi‑layer task scheduling, a layer‑wise task offloading algorithm is developed specifically for SAGSIN. It adapts to real‑time, multi‑dimensional system dynamics and integrates an anticipatory handover strategy that adaptively controls the amount of data offloaded to the satellite, thereby preventing post‑handover congestion while improving satellite resource utilization. Furthermore, the bandwidth allocation of uncrewed aerial vehicles and base station, UAV trajectories, and computing resource allocation are jointly optimized to enhance connectivity among low‑altitude devices and facilitate demand‑driven resource allocation for green network development. Simulation results verify that the proposed method better adapts to dynamic system resources and achieves at least a 23% reduction in average task delay compared with benchmarks.
Authors: Zhihao Zhan, Le Tao, Shaobin Li, Chenxin Fang, Xingrui Yang, Liang Li, Rui Fan, Yuhang Ming
Abstract: Accurate terrain perception is essential for terrain‑following flight of agricultural unmanned aerial vehicles (UAVs), yet remains challenging in real‑world farmland due to occlusions, complex terrain geometry, and environmental disturbances. Millimeter‑wave (mmWave) radar is a promising sensing modality for this task due to its robustness to adverse conditions; however, existing UAV‑mounted radar systems rely on fixed field of view (FoV) and terrain extraction methods designed for dense LiDAR data, leading to incomplete and unreliable terrain estimation. To address these limitations, we present a low‑cost rotating mmWave radar‑enabled terrain perception framework for agricultural UAVs operating in complex farmland environments. Specifically, a mechanically rotating sensing design is introduced to enlarge spatial coverage and improve terrain observability beyond the limitations of fixed‑view radar under dynamic low‑altitude flight. Building upon this sensing capability, we further design a pose‑consistent terrain reconstruction pipeline tailored for sparse, noisy, and partially observable radar data, enabling reliable ground extraction and continuous terrain surface estimation in challenging agricultural scenarios. The complete system is deployed on a real agricultural UAV platform and comprehensively evaluated through extensive field experiments. Experimental results demonstrate improved terrain coverage and estimation accuracy, achieving an F1 score of 94.42 for ground segmentation, while the closest rival only achieves 90.48, thus leading to more robust terrain‑following flight.
Authors: Alan Gomes, Anderson Gonçalves, Samuel Felipe dos Santos, Nathan Felipe Alves, Magna Soelma Beserra de Moura, Bruna de Costa Alberton, Leonor Patricia C. Morellato, Ricardo da Silva Torres, Jurandy Almeida
Abstract: Plant phenology, the study of recurrent life cycle events, is essential for understanding ecosystem dynamics and their responses to climate change impacts. While Unmanned Aerial Vehicles (UAVs) and near‑surface cameras enable high‑resolution monitoring, identifying plant species across time remains computationally challenging. State‑of‑the‑art approaches, specifically Multi‑Temporal Convolutional Networks (CNNs), rely on rigid multi‑branch architectures that scale poorly with longer time series and require large spatial context windows. In this paper, we present an extensive study on optimizing Vision Transformers (ViTs) for efficient spatio‑temporal vegetation pixel classification. We conducted a comprehensive ablation study analyzing seven key design dimensions, including: (i) data normalization; (ii) spectral arrangement; (iii) boundary handling; (iv) spatial context window shape and size; (v) tokenization strategies; (vi) positional encoding; and (vii) feature aggregation strategies. Our method was evaluated on two datasets from the Brazilian Cerrado biome, Serra do Cipó (aerial imagery) and Itirapina (near‑surface imagery). Experimental results demonstrate that our ViT approach offers a substantial improvement in computational efficiency while maintaining competitive classification performance. Notably, our ViT reduces Floating Point Operations (FLOPs) by an order of magnitude and maintains constant parameter complexity regardless of the time series length, whereas the CNN baseline scales linearly. Our findings confirm that ViTs are a robust, scalable solution for resource‑constrained phenological monitoring systems.
Authors: Udayanga G. W. K. N. Gamage, Yan Zeng, Cesar Cadena, Matteo Fumagalli, Silvia Tolu
Abstract: Real‑time object detection on energy‑constrained platforms is critical for applications such as UAV‑based inspection, autonomous navigation, and mobile robotics. Spiking neural networks (SNNs) on neuromorphic hardware are believed to be significantly more energy‑efficient than conventional artificial neural networks (ANNs). In this work, we present a comprehensive methodology for designing general SNN detection architectures targeting neuromorphic platforms, along with the engineering adaptations required to deploy them on the state‑of‑the‑art Neuromorphic processor, Intel Loihi 2. We benchmark SNN‑based object detection on Loihi 2 using both frame‑based and event‑based datasets, comparing performance with ANN‑based detection on the NVIDIA Jetson Orin Nano, NVIDIA Jetson Nano B01, and the Apple M2 CPU. Our results show that SNNs on Loihi 2 can perform real‑time detection while achieving the lowest per‑inference dynamic energy among all platforms. Also, Loihi 2 outperforms the other platforms in terms of power consumption, though ANNs on Jetson Orin Nano achieve higher inference rates. Furthermore, our ANN‑to‑SNN distillation‑aware training enables SNNs to recover 87‑100% of the detection accuracy of their ANN counterparts while maintaining lower inference latency; without distillation, SNNs exhibit an 11‑27% accuracy drop. These results highlight the potential of neuromorphic systems for energy‑efficient, real‑time object detection at the edge.
Authors: Wentao Chen, Jingtang Chen, Mingjian Fu, Tiantian Li, Youfeng Su, Wenxi Liu, Yuanlong Yu
Abstract: Deep reinforcement learning (DRL) finds extensive application in autonomous drone navigation within complex, high‑risk environments. However, its practical deployment faces a safety‑exploration dilemma: soft penalty mechanisms encourage risky trial‑and‑error, while most constraint‑based methods suffer degraded performance under sensor noise and intent uncertainty. We propose Dynamic‑TD3, a physically enhanced framework that enforces strict safety constraints while maintaining maneuverability by modeling navigation as a Constrained Markov Decision Process (CMDP). This framework integrates an Adaptive Trajectory Relational Evolution Mechanism (ATREM) to capture long‑range intentions and employs a Physically Aware Gated Kalman Filter (PAG‑KF) to mitigate non‑stationary observation noise. The resulting state representation drives a dual‑criterion policy that balances mission efficiency against hard safety constraints via Lagrangian relaxation. In experiments with aggressive dynamic threats, this approach demonstrates superior collision avoidance performance, reduced energy consumption, and smoother flight trajectories.
Authors: Akhil Gupta, Erhan Guven
Abstract: Accurate state estimation of nonlinear dynamical systems is fundamental to modern aerospace operations across air, sea, and space domains. Online tracking of adversarial unmanned aerial vehicles (UAVs) is especially challenging due to agile nonlinear motion, noisy and sparse sensor measurements, and unknown control inputs, conditions that violate key assumptions of classical Kalman filter variants and degrade estimation performance. Neural networks (NNs) can learn complex nonlinear relationships from data, but lack principled uncertainty quantification, which is critical for state estimation tasks where confidence bounds drive downstream decisions. We address this with Bayesian Neural Networks (BNNs), which model uncertainty through distributions over network weights and produce predictive means and uncertainties via Monte Carlo sampling. Building on this, we propose the Bayesian Neural Kalman Filter (BNKF): a hybrid framework coupling a trained BNN with a Kalman correction step for robust online UAV state estimation. Unlike related neural Kalman approaches, BNKF produces full state predictions and incorporates Bayesian uncertainty directly into covariance propagation, improving robustness under high noise conditions. We evaluate BNKF under varying radar noise levels and sampling rates using synthetic nonlinear UAV flight data. Five‑fold cross‑validation demonstrates that BNKF outperforms Extended and Unscented Kalman Filters in accuracy, precision, and truth containment under degraded sensing. An ensemble variant (BNKFe) further improves precision in high‑noise edge cases at a slight accuracy tradeoff. Runtime analysis confirms minimal inference overhead, supporting real‑time deployment feasibility.
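To make the coupling of Monte Carlo prediction and Kalman correction in the abstract above more concrete, here is a scalar sketch. It assumes that the spread of stochastic forward passes can stand in for the predicted variance, and that the state is observed directly with measurement noise variance `r`. The function names and the one-dimensional setting are illustrative assumptions, not the BNKF implementation.

```python
from statistics import fmean, variance


def mc_predict(samples):
    """Summarise stochastic forward passes as a predictive mean and variance."""
    return fmean(samples), variance(samples)


def kalman_correct(x_pred, p_pred, z, r):
    """Scalar Kalman correction with direct observation of the state."""
    k = p_pred / (p_pred + r)          # Kalman gain
    x = x_pred + k * (z - x_pred)      # corrected state estimate
    p = (1.0 - k) * p_pred             # corrected variance
    return x, p


# Usage: three hypothetical sampled predictions, then one radar-like measurement.
x_pred, p_pred = mc_predict([9.0, 10.0, 11.0])
x, p = kalman_correct(x_pred, p_pred, z=12.0, r=1.0)
print(x, p)
```

When the sampled predictions disagree more, `p_pred` grows and the gain moves the estimate closer to the measurement. This is one simple way predictive uncertainty can enter the correction step.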
Authors: Kaleem Arshid, Ali Krayani, Lucio Marcenaro, David Martin Gomez, Carlo Regazzoni
Abstract: This paper presents an expert‑guided active‑inference‑inspired framework for adaptive UAV swarm trajectory planning. The proposed method converts multi‑UAV trajectory design from a repeated combinatorial optimization problem into a hierarchical probabilistic inference problem. In the offline phase, a genetic‑algorithm planner with repulsive‑force collision avoidance (GA‑RF) generates expert demonstrations, which are abstracted into Mission, Route, and Motion dictionaries. These dictionaries are used to learn a probabilistic world model that captures how expert mission allocations induce route orders and how route orders induce motion‑level behaviors. During online operation, the UAV swarm evaluates candidate actions by forming posterior beliefs over symbolic states and minimizing KL‑divergence‑based abnormality indicators with respect to expert‑derived reference distributions. This enables mission allocation, route insertion, motion adaptation, and collision‑aware replanning without rerunning the offline optimizer. Bayesian state estimators, including EKF and PF modules, are integrated at the motion level to improve trajectory correction under uncertainty. Simulation results show that the proposed framework preserves expert‑like planning structure while producing smoother and more stable behavior than modified Q‑learning. Additional validation using real‑flight UAV trajectory data demonstrates that the learned world model can correct symbolic predictions under noisy and non‑smooth observations, supporting its applicability to adaptive UAV swarm autonomy.
Authors: Alexandre Anahory Simoes, Leonardo Colombo
Abstract: This paper considers the robust control of a catenary robot composed of two quadrotors connected by an inextensible cable. The system is modeled on SE(3), with the cable treated as a geometric subsystem induced by the UAV configuration rather than as an independent dynamical element. The catenary shape determines configuration‑dependent forces that couple the translational dynamics of the vehicles. We propose a geometric tracking controller for the relative configuration of the agents and analyze its robustness with respect to unstructured uncertainties in the catenary‑induced forces. The main theoretical result establishes local input‑to‑state stability of the closed‑loop tracking errors. In particular, we obtain asymptotic convergence in the nominal case and an explicit ultimate bound for the tracking errors under bounded catenary‑force perturbations.
Authors: Aygun Baltaci, Irshad A. Meer, Mustafa Ozger, Cicek Cavdar, Dominic Schupke
Abstract: Future uncrewed aerial vehicle (UAV) systems increasingly combine heterogeneous communication technologies, such as low‑latency aerial mesh, terrestrial cellular, and satellite links, to improve robustness and coverage. Multipath transport is a natural mechanism for aggregating these links, yet its ability to support real‑time UAV services in highly heterogeneous environments remains insufficiently characterized. We present a measurement‑driven study based on UAV flight experiments in an integrated network comprising UAV‑to‑UAV aerial mesh, private cellular, and low Earth orbit (LEO) satellite connectivity. Using Multipath TCP (MPTCP) as a representative lossless, in‑order multipath transport framework, we find that aggregation can preserve end‑to‑end connectivity under severe link outages. However, large round‑trip time (RTT) heterogeneity amplifies packet reordering, leading to substantial receiver‑side buffering and bursty delivery. In addition, when the available links do not provide sufficient capacity for the offered load, pronounced sender‑side buffering emerges. These effects cause real‑time streaming to violate delay constraints, including cases where aggregate capacity is sufficient. To interpret these results, we formalize the distinction between connectivity continuity and service continuity and show empirically that maintaining connectivity is necessary but not sufficient for timely real‑time delivery in multi‑technology UAV networks. The findings motivate multipath designs that explicitly account for delay constraints, rather than optimizing for connectivity alone.
Authors: Wei Li, Haisheng Li, Weijie Li, Jiandong Wang, Kaichen Ma, Luming Yang
Abstract: With the widespread application of Unmanned Aerial Vehicles (UAVs) in bridge structural health monitoring, deep learning‑based automatic crack detection has become a major research focus. However, practical UAV inspections still face four key challenges: weak crack features, degraded imaging conditions, severe class imbalance, and limited computational resources. To address these issues, this paper proposes a unified lightweight convolutional neural network framework composed of four synergistic components: a lightweight backbone network, a Convolutional Block Attention Module (CBAM) for channel and spatial enhancement, a directed robust augmentation strategy based on inspection‑scene priors, and Focal Loss for hard‑sample learning under class imbalance. Experiments on the SDNET2018 bridge deck dataset show that the proposed method achieves an inference speed of 825 FPS with only 11.21M parameters and 1.82G FLOPs. Compared with the baseline model, the complete framework improves the F1‑score by 2.51% and recall by 3.95%. In addition, Grad‑CAM visualizations indicate that the introduced attention module shifts the model's focus from scattered regions to precise tracking along crack trajectories. Overall, this study achieves a strong balance among accuracy, speed, and robustness, providing a practical solution for ground‑station assisted real‑time deployment in UAV bridge inspections. The source code is available at: https://github.com/skylynf/AttXNet.
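For readers unfamiliar with the Focal Loss named in the abstract above, the following sketch shows its binary form for a single prediction, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults gamma = 2 and alpha = 0.25 are the commonly cited values from the original Focal Loss paper, not settings reported in this abstract, and the function name is illustrative.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted crack probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)             # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Usage: a confidently correct sample contributes far less than a hard one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
print(easy < hard)
```

Setting gamma to 0 recovers alpha-weighted cross-entropy. Larger gamma down-weights easy samples more strongly, which is why the loss is often used when one class dominates.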
Authors: Mahya Ramezani, Holger Voos
Abstract: This paper presents a hierarchical decision‑making framework for unmanned aerial vehicle (UAV) missions motivated by search‑and‑rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule‑based high‑level advisor with an online goal‑conditioned low‑level reinforcement learning (RL) controller. To stress‑test early adaptation, we also consider a strict no‑pretraining deployment regime. The high‑level advisor is defined offline from a structured task specification and compiled into deterministic rules. It provides interpretable mission‑ and safety‑aware guidance through recommended actions, avoided actions, and regime‑dependent arbitration weights. The low‑level controller learns online from task‑defined dense rewards and reuses experience through a mode‑aware prioritized replay mechanism augmented with rule‑derived metadata. We evaluate the framework on two tasks: battery‑aware multi‑goal delivery and moving‑target delivery in obstacle‑rich environments. Across both tasks, the proposed method improves early safety and sample efficiency primarily by reducing collision terminations, while preserving the ability to adapt online to scenario‑specific dynamics.
Authors: Xiaoya Cheng, Rouwan Wu, Xinyi Liu, Zeyu Cui, Yan Liu, Na Zhao, Yu Liu, Maojun Zhang, Shen Yan
Abstract: Despite the rapid progress in data‑driven 3D vision, aerial geometric 3D vision remains a formidable challenge due to the severe scarcity of large‑scale, high‑fidelity training data. Existing benchmarks, predominantly biased toward ground‑level or object‑centric views, do not account for complex viewpoint transformations and diverse environmental conditions in UAV‑based sensing. To bridge this critical gap, we propose AirZoo, a unified large‑scale dataset and benchmark for grounding aerial geometric 3D vision. AirZoo possesses three appealing properties: 1) Scalable Generation Pipeline: Leveraging freely available, world‑scale photogrammetric 3D meshes, it renders vast outdoor environments with customizable UAV flight trajectories and configurable weather/illumination. 2) Comprehensive Scene Diversity: It provides the most extensive coverage of region types to date (spanning 378 regions across 22 countries), systematically encompassing both highly structured urban landscapes and complex unstructured natural environments. 3) Rich Geometric Annotations: Each frame provides synchronized, pixel‑level metric depth and precise 6‑DoF geo‑referenced poses, essential for geometry‑aware learning. Through three rigorous evaluation tracks (aerial image retrieval, cross‑view matching, and multi‑view 3D reconstruction), we demonstrate that AirZoo serves as a powerful pre‑training engine. Extensive experiments on both public and newly collected real‑world benchmarks reveal that fine‑tuning on AirZoo yields substantial performance gains for SoTA models (e.g., MegaLoc, RoMa, VGGT, and Depth Anything 3), establishing a new performance upper bound for aerial spatial intelligence.
Authors: Ryan Allen, Melissa Greeff
Abstract: Reliable backup localization for unmanned aerial vehicles (UAVs) operating in GNSS‑denied nighttime conditions remains an open challenge due to the severe modality gap between daytime RGB maps and nighttime thermal imagery. This work presents a semantic reprojection framework for map‑relative nighttime UAV localization by aligning segmented thermal observations with a globally referenced, semantically labeled 3D map constructed from daytime RGB data. Rather than relying on appearance‑based correspondence, localization is formulated in a shared semantic domain and solved via a symmetric bidirectional reprojection objective with confusion‑aware weighting to improve robustness under segmentation uncertainty. The approach is evaluated offline across 6.5 km of nighttime, real‑world UAV flight trajectories in urban and semi‑structured environments. Relative to RTK GNSS ground truth, the system achieves a bias‑corrected RMSE2D of 2.18 m and a median RMSE2D of 1.52 m. Results show that localization performance is strongly correlated with the availability of semantic edge evidence and that large‑error events are spatially localized to semantically ambiguous areas rather than uniformly distributed. These findings indicate that semantic reprojection offers a promising pathway toward globally referenced nighttime UAV localization using thermal imagery alone.
Authors: Neeraj Varshney, Steve Blandino, Jian Wang, Anuraag Bodi, Camillo Gentile, Nada Golmie
Abstract: Integrated sensing and communication (ISAC) is currently being standardized within the 3GPP New Radio (NR) to enable cellular infrastructure to perform sensing using existing communication waveforms. While standardization is progressing, practical deployment may be limited by scenario‑dependent observability constraints. For example, in UMa‑AV scenarios, sensing with a single transmission‑reception point (TRP) can be affected by restricted angular coverage, partial blockage, and limited field of view, which may degrade detection reliability in three‑dimensional UAV environments. For this reason, multi‑TRP solutions have been suggested to improve spatial diversity and sensing robustness. In this paper, we present a system‑level investigation of multi‑TRP assisted monostatic sensing for UAV detection under standardized 3GPP UMa‑AV channel assumptions and Release 19 evaluation parameters. We propose a spatial diversity fusion framework and evaluate the achievable performance of a 3GPP network by combining the measurements obtained independently at different TRPs. Extensive evaluations demonstrate that multi‑TRP assistance improves target observability, reduces spurious detections, and tightens localization error distributions at the cost of additional sensing overhead due to the need for multiple TRPs to periodically allocate radio resources for sensing measurements. In the evaluated scenario, results show that a voting threshold of two assisting TRPs achieves an optimal trade‑off between miss detection probability and false alarm suppression, meeting 3GPP performance objectives. Furthermore, we quantify the sensing overhead and show that proper system design, tuned to the application requirements, can substantially reduce its impact: for example, extending the sensing refresh interval beyond the 128 ms coherent processing interval to 1 s reduces the effective overhead from 29% to approximately 3.7%, enabling more scalable network deployment.
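The overhead figures quoted in the abstract above are consistent with a simple duty-cycle calculation, sketched below. It assumes that sensing occupies radio resources for one 128 ms coherent processing interval per refresh period, so the effective overhead scales with the ratio of the two. That scaling model and the function name are assumptions, not details given in the abstract.

```python
def effective_overhead(base_overhead, cpi_ms, refresh_ms):
    """Scale the per-CPI sensing overhead by the sensing duty cycle."""
    if refresh_ms < cpi_ms:
        raise ValueError("refresh interval cannot be shorter than the CPI")
    return base_overhead * cpi_ms / refresh_ms


# Usage: 29% overhead when sensing continuously, refreshed once per second.
print(f"{effective_overhead(0.29, 128, 1000):.1%}")
```

Under this assumption, 0.29 × 128 / 1000 = 0.0371, which matches the approximately 3.7% reported for a 1 s refresh interval.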
Authors: Thomas J. Neubert, Laxima Niure Kandel, Berker Peköz
Abstract: Open, unclassified research on secure autonomy is constrained by limited access to operational platforms, contested communications infrastructure, and representative adversarial test conditions. This paper presents a threat‑oriented digital twinning methodology for cybersecurity evaluation of learning‑enabled autonomous platforms. The approach is instantiated as an open‑source, modular twin of a representative autonomy stack with separated sensing, autonomy, and supervisory‑control functions; confidence‑gated multi‑modal perception; explicit command and telemetry trust boundaries; and runtime hold‑safe behavior. The contribution is methodological: a reproducible design pattern that translates threat analysis into observable, controllable tests for spoofing, replay, malformed‑input injection, degraded sensing, and adversarial ML stress. Although the implemented proxy is ground based, the architecture is intentionally framed around stack elements shared with UAV and space systems, including constrained onboard compute, intermittent or high‑latency links, probabilistic perception, and mission‑critical recovery behavior. The result is an implementable research scaffold for dependable and secure autonomy studies across UAV and space domains.
Authors: Khagendra Joshi, Deepak Kumar Sahoo, Kamalesh Kumar K, Debidas Kundu, Vivek A. Bohara, Amalendu Patnaik
Abstract: In this paper, a broadband 1‑bit coding metasurface‑based reconfigurable intelligent surface (RIS) is presented. The unit cell of the metasurface consists of a wide dipole modified with interdigital capacitors and loaded with an SMP 1340‑040LF PIN diode. The proposed element offers cell miniaturization and a stable angular response. A phase difference of 180° ± 30° is achieved for a frequency range of 4.85‑6.05 GHz between the ON and OFF states for the normal incidence of the TE polarized wave, whereas it provides a fairly stable response with reflection loss of less than 3 dB and phase difference of 180° ± 50° for oblique incidence up to 45°. The RF is isolated from the DC on the bias lines using properly designed butterfly‑shaped radial stubs. Using this unit cell, a prototype with an array of 16 × 10 elements is constructed. A low‑cost microcontroller‑based control circuit is designed, which can be plugged in to bias the PIN diodes of such an array. The theoretically calculated and full‑wave simulated radiation patterns of the array are validated using experiments inside an anechoic chamber. Furthermore, the capability of the RIS for non‑line‑of‑sight (NLOS) user equipment (UE) localization and robust uplink communication is demonstrated using an LTE communication framework. This shows the potential of our RIS for applications such as unmanned aerial vehicle (UAV) localization and uplink communication under NLOS conditions or at extended range.
Authors: Fabian Dionys Schrag, Mehmet Ozgur Turkoglu, Konrad Schindler, Ralph Lukas Stoop
Abstract: Domain adaptation (DA) addresses the challenge of transferring a machine learning model trained on a source domain to a target domain with a different data distribution. In this work, we study DA for the task of Rumex obtusifolius (Rumex) image classification. We train models on a published, ground vehicle‑based dataset (source) and evaluate their performance on a custom target dataset acquired by unmanned aerial vehicles (UAVs). We find that Convolutional Neural Network (CNN) models, specifically ResNets, generalize poorly to the target domain, even after fine‑tuning on the source data. Applying moment‑matching and maximum classifier discrepancy, two established DA techniques, substantially improves target‑domain performance. However, Vision Transformer (ViT) models pretrained with self‑supervised objectives (DINOv2, DINOv3) handle domain shifts intrinsically well, surpassing even moment‑matching‑trained ResNets, likely due to the rich, general‑purpose representations acquired during large‑scale pretraining. Using ViTs fine‑tuned on the source dataset, we demonstrate high classification performance, with F1 scores of around 0.8, on our target dataset. To support further research on DA for weed detection in grassland systems, we publicly release our UAV‑based target dataset AGSMultiRumex, comprising data from 15 flights over Swiss meadows.
Authors: Pradeep J, Siddhardha Kedarisetty, Ashwini Ratnoo
Abstract: This paper addresses the problem of traffic congestion management in fixed‑wing unmanned aerial vehicle (UAV) corridors by further developing a recently introduced loiter‑lane framework. A semi‑cooperative guidance strategy is developed for inserting fixed‑wing UAVs into a loiter lane with minimal disruption to the UAVs already operating within it, while enabling a more compact fixed‑wing UAV corridor. Building on the concepts of cooperative and non‑disruptive loiter‑lane insertion, the proposed strategy makes the incoming UAV first attempt, within its speed bounds, to rendezvous with an existing empty loiter slot. If direct insertion is infeasible, a minimal number of loitering UAVs perform coordinated slot hopping to create a suitably positioned empty slot. The feasibility and performance of the method are demonstrated through numerical simulations.
Authors: Ninh Nguyen, Srinivas Akella
Abstract: We study cooperative shortest path planning for an unmanned ground vehicle (UGV) assisted by an unmanned aerial vehicle (UAV) in environments with unknown road blockages that are only discovered when a robot reaches the damaged point. This formulation generalizes the original Canadian Traveller Problem (CTP), which assumes a single ground vehicle and that the traversability status of all incident edges is revealed upon arrival at a vertex. We first analyze the case where the start and the goal are connected by k disjoint paths, and prove that the worst‑case competitive ratio ρ for a single UGV is 2k‑1. With UAV assistance, and under the simplifying assumption of negligible initial transit and deadheading UAV costs, the ratio improves to ρ = 2 \frac{v_G}{v_A + v_G} k - 1, where v_G and v_A denote the UGV and UAV speed, respectively. To address general graphs and non‑negligible UAV initial transit and deadheading costs, we present an optimal path partitioning strategy that assigns path prefix inspection to the UGV and path suffix inspection to the UAV, and prove the optimality of the UAV inspection strategy on general graphs. We evaluate our algorithm by performing experiments on road networks from the world's 50 most populous cities, with randomized blockages, and show that the proposed method reduces UGV travel times by up to 30%.
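To make the two competitive ratios concrete, the sketch below evaluates them numerically. It reads the UAV-assisted ratio as ρ = 2·v_G/(v_A + v_G)·k − 1, which is an interpretation of the formula in the abstract; the function names and the example speeds are hypothetical and chosen only for illustration.

```python
def ratio_ugv_only(k: int) -> float:
    """Worst-case competitive ratio for a single UGV on k disjoint paths."""
    return 2 * k - 1


def ratio_with_uav(k: int, v_g: float, v_a: float) -> float:
    """UAV-assisted ratio, assuming negligible UAV transit and deadheading costs."""
    if v_g <= 0 or v_a < 0:
        raise ValueError("v_g must be positive and v_a non-negative")
    return 2 * (v_g / (v_a + v_g)) * k - 1


k = 4
print(ratio_ugv_only(k))                     # prints 7
print(ratio_with_uav(k, v_g=10.0, v_a=10.0)) # prints 3.0
print(ratio_with_uav(k, v_g=10.0, v_a=0.0))  # prints 7.0
```

With a stationary UAV (v_A = 0) the expression reduces to the single-UGV ratio 2k − 1, which is a useful sanity check on this reading of the formula.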
Authors: Ninh Nguyen, Srinivas Akella
Abstract: This paper addresses the Dynamic UGV‑UAV Cooperative Path Planning (DUCPP) problem involving one unmanned ground vehicle (UGV) assisted by one or more unmanned aerial vehicles (UAVs) operating on an uncertain road network with potentially impassable edges. DUCPP is particularly relevant for scenarios such as disaster response, emergency supply transport, and rescue operations, where a UGV must reach a specified destination in the presence of partially unknown road conditions. To enable the UGV to travel safely and efficiently to its destination, the UAV(s) dynamically inspect edges in the environment to identify and prune damaged or impassable edges from consideration.
We present multiple strategies, including a bidirectional approach, to optimize UGV‑UAV cooperation for finding a safe path in an uncertain road network. Furthermore, we explore the impact of using multiple UAVs on reducing the UGV's travel time, and evaluate the associated computation time. The proposed strategies are implemented and evaluated on 100 urban road networks. The results demonstrate that the bidirectional strategy achieves the best performance in most instances, and using multiple UAVs further reduces UGV travel time at the expense of increased computation time. This paper presents a robust framework for DUCPP to achieve efficient UGV‑UAV cooperation for path planning and inspection, offering practical solutions for navigation in challenging and uncertain conditions.
Authors: Dhrumil Bhatt, Anakha Kurup
Abstract: Reliable and secure communication is essential for mission‑critical aerospace and defence operations involving autonomous platforms such as Unmanned Aerial Vehicles (UAVs), satellites, and ground control systems. In contested or dynamic environments, communication links are frequently exposed to jamming, interference, and cyberattacks, making network resilience a key operational requirement. This paper presents a trust‑aware Software‑Defined Networking (SDN) framework that enables secure, low‑latency failover between heterogeneous communication channels. The proposed architecture integrates a high‑bandwidth primary link (e.g., satellite or tactical LTE) with a low‑power fallback channel (e.g., RF or mesh), managed by an SDN controller that enforces zero‑trust routing policies. A real‑time Intrusion Detection System (IDS) continuously updates node trust scores; when trust or link reliability degrades, the controller autonomously switches traffic to the secondary channel, ensuring uninterrupted connectivity. Simulation results in a Mininet‑based test environment demonstrate sub‑5 ms failover latency, efficient flow installation, and significant reduction in packet loss compared with conventional single‑channel or static routing systems. The proposed framework provides a scalable and resilient communication backbone for next‑generation aerospace networks, enhancing mission reliability, cyber defence, and autonomous coordination across distributed aerial and space assets.
Authors: Jihao Luo, Zesong Fei, Xinyi Wang, Shuntian Tang, Zilong Liu, Yiqing Zhou
Abstract: The rapid growth of the low‑altitude economy drives increasingly autonomous unmanned aerial vehicle (UAV) operations, giving rise to low‑altitude embodied intelligence (LAEI), in which sensing, communication, computation, and control (SC^3) are tightly integrated to enable closed‑loop interaction, ensuring timely, effective, and safe responses in complex or unknown environments. This article systematically explores LAEI networks, from their fundamental architecture to the diverse scenarios that they can support. We examine key enabling techniques that sustain timely information exchange and effective decision feedback within the SC^3 closed loop. A representative low‑altitude UAV mission in an unknown urban area is presented as a case study, where the UAV provides communication services and performs environmental sensing to inform closed‑loop control, illustrating how coordinated SC^3 capabilities enable efficient and responsive operation. By identifying major challenges and outlining future research directions, this work serves as a cornerstone for developing next‑generation low‑altitude intelligent systems.
Authors: Jinbao Li, Jiancheng An, Hao Liu, Lu Gan, Victor C. M. Leung, Mehdi Bennis, Mérouane Debbah
Abstract: Semantic communications (SemCom) is a promising paradigm that prioritizes the transmission of task‑relevant information, thereby enabling superior communication efficiency over traditional bit‑centric systems. However, most existing SemCom systems face critical limitations in computational efficiency and spatial flexibility. To overcome these limitations, we propose a novel unmanned aerial vehicle (UAV)‑enabled distributed electromagnetic neural network (EMNN) for a task‑oriented SemCom system. Specifically, the proposed distributed EMNN is composed of multiple UAV‑mounted stacked intelligent metasurfaces (SIMs) and a ground receiving station (GRS), where multiple SIMs collaboratively encode image semantics in the wave domain, and the GRS performs decoding based on the received power distribution. Moreover, we employ a temperature‑adaptive gradient optimization algorithm to train the distributed EMNN, which mitigates gradient vanishing and enhances learning stability. Finally, the numerical simulation results demonstrate the effectiveness of distributed EMNN in image recognition task‑oriented SemCom, achieving an average 8% accuracy improvement over the single‑SIM baseline across multiple datasets.
Authors: Weiming Huang, Hao Sun, Junting Chen
Abstract: Unified 2D and 3D radio map construction supports network planning, wireless digital twins, and unmanned aerial vehicle (UAV) applications. In urban environments, blockage, reflection, and diffraction make accurate construction expensive for physics‑based solvers. Autoregressive next‑token prediction offers a single sequential formulation that can cover both 2D and 3D generation, but standard raster ordering ignores the spatial structure of radio propagation. When generation follows propagation, each token is predicted from propagation‑relevant history rather than spatially arbitrary context, which provides more causally informative conditioning and lowers conditional uncertainty. We propose PILOT, a pretrained autoregressive framework that replaces raster scan with a wavefront sequence expanding outward from the transmitter. Each prediction step is guided by an environment‑aware instruction that spatially aligns environment features with the queried radio map region. The same framework extends to 3D radio maps through height‑slice stacking while a gradient loss enforces vertical continuity. On standard 2D benchmarks, PILOT achieves the lowest NMSE among all baselines. For volumetric generation, it reduces NMSE by 78% relative to the diffusion baseline at roughly 2500× faster inference. It also outperforms methods that rely on 10% sparse measurements and achieves the best zero‑shot results in the cross‑domain evaluation.
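The difference between raster ordering and a transmitter-centred wavefront ordering can be illustrated on a small grid. The sketch below orders cells by Euclidean distance from the transmitter cell and breaks ties in raster order. This is one plausible reading of "a wavefront sequence expanding outward from the transmitter", not necessarily the ordering PILOT uses, and the function names are hypothetical.

```python
import math


def raster_order(height: int, width: int) -> list[tuple[int, int]]:
    """Row-major (raster) ordering of grid cells."""
    return [(r, c) for r in range(height) for c in range(width)]


def wavefront_order(height: int, width: int, tx: tuple[int, int]) -> list[tuple[int, int]]:
    """Order cells by distance from the transmitter cell; ties fall back to raster order."""
    tr, tc = tx
    return sorted(
        raster_order(height, width),
        key=lambda rc: (math.hypot(rc[0] - tr, rc[1] - tc), rc),
    )


order = wavefront_order(3, 3, tx=(1, 1))
print(order[0])    # prints (1, 1)
print(order[1:5])  # prints [(0, 1), (1, 0), (1, 2), (2, 1)]
```

In this ordering every cell is generated after all cells closer to the transmitter, so an autoregressive model conditions on the region the signal has already passed through, whereas raster order starts in a corner regardless of where the transmitter sits.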
Authors: Linyuan Wang, Haibo Yao, Te-Ming Tseng, Kelvin Betitame, Xin Sun, Hanbo Huang, Dong Chen
Abstract: Weeds compete with crops for light, water, and nutrients, reducing yield and crop quality. Efficient weed detection is essential for site‑specific weed management (SSWM). Although deep learning models have been deployed on UAV‑based edge systems, a systematic understanding of how different model architectures perform under real‑world resource constraints is still lacking. To address this gap, this study proposes a deployment‑oriented framework for real‑time UAV‑based weed detection on resource‑constrained edge platforms. The framework integrates UAV data acquisition, model development, and on‑device inference, with a focus on balancing detection accuracy and computational efficiency. A diverse set of state‑of‑the‑art object detection models is evaluated, including convolution‑based YOLO models (v8‑v12) and transformer‑based RT‑DETR models (v1‑v2). Experiments on three edge devices (Jetson Orin Nano, Jetson AGX Xavier, and Jetson AGX Orin) demonstrate clear trade‑offs between accuracy and inference latency across models and hardware configurations. Results show that high‑capacity models achieve up to 86.9% mAP50 but suffer from high latency, limiting real‑time deployment. In contrast, lightweight models achieve 66%‑71% mAP50 with significantly lower latency, enabling real‑time performance. Among all models, RT‑DETRv2‑R50‑M achieves competitive accuracy (79% mAP50) with improved efficiency, while YOLOv10n provides the fastest inference speed. YOLOv11s and RT‑DETRv2‑R50‑M offer the best balance between accuracy and speed, making them strong candidates for real‑time UAV deployment.
Authors: Zhenjia Xu, Xiaoling Zhang, Nan Qi, Guangxu Zhu, Xiaojie Li, Luliang Jia
Abstract: The low‑altitude Internet of Things (IoT), supported by unmanned aerial vehicles (UAVs), provides ground sensing networks with advanced real‑time monitoring and data collection. To maximize data collection volume from distributed IoT nodes, AI‑powered data collection technology plays a critical role in enabling intelligent decision‑making. Among such technologies, deep reinforcement learning (DRL) has gained particular attention. However, existing DRL‑based work on UAV‑assisted IoT data collection rarely addresses challenges such as interference and dynamic data volume, while also suffering from high computational demands and slow convergence. To address these challenges, a hierarchical DRL (HDRL) approach is designed to optimize UAV trajectories and bandwidth allocation to maximize data collection volume. Firstly, the proposed scenario incorporates interference, dynamic data volume of IoT nodes, and multiple types of obstacles. The entire task is hierarchically structured: the upper level makes flight trajectory decisions at a coarse temporal granularity, while the lower level makes bandwidth allocation decisions at a finer temporal granularity. Secondly, a trajectory and bandwidth allocation optimization algorithm based on hierarchical deep deterministic policy gradients (TBH‑DDPG) is proposed to solve the problem. Finally, simulation results demonstrate that the proposed algorithm improves convergence speed by 44.44% and reduces computational cost by 58.05% compared to a non‑hierarchical algorithm.
Authors: Teighin Nordholt, Melissa Greeff
Abstract: Autonomous multirotor landings on uncrewed surface vessels (USVs) are critical for persistent maritime operations but remain challenging due to wave‑induced tilt, wind disturbances, and limited landing area. Many existing approaches exhibit small pose tolerance for reliable landing. This paper presents a lightweight toggleable adhesion mechanism to improve landing reliability. The system uses a motor‑driven corkscrew that engages hook‑and‑loop material on the landing surface, enabling active adhesion during landing and controlled release during takeoff. We evaluate a prototype using a modified Crazyflie 2.0 and a custom tilting platform at fixed angles representative of extreme wave conditions. Using only a simple vertical PID controller, the proposed approach increases landing success from an average of 40% (baseline) to 80% across platform tilts up to 43 degrees using appropriately selected actuation settings.
Authors: Giulio Delama, Jan Michalczyk, Morten Nissov, Martin Scheiber, Alessandro Fornasier, Kostas Alexis, Stephan Weiss
Abstract: Radar‑Inertial Odometry (RIO) based on the Extended Kalman Filter (EKF) relies on accurate extrinsic calibration between the radar and the Inertial Measurement Unit (IMU) and is sensitive to disturbances, as large linearization errors can degrade performance or even cause divergence. To address these limitations, this letter proposes an Equivariant Filter (EqF) for RIO based on a Lie group symmetry that geometrically couples navigation states and IMU biases, extending it to incorporate radar‑IMU extrinsic calibration and multi‑state constraint updates. This equivariant formulation inherently preserves consistency and enhances robustness, enabling reliable state estimation even under poor or completely wrong initialization of calibration states. Real‑world experiments on two different Uncrewed Aerial Vehicles (UAVs) show that the proposed EqF‑RIO achieves state‑of‑the‑art accuracy under correct extrinsic calibration and offers improved convergence under large calibration errors, where the conventional EKF‑RIO fails. Evaluation code is open‑sourced.
Authors: Sulagna Saha, Arthur Ouaknine, Etienne Laliberté, Carol Altimas, Evan M. Gora, Adriane Esquivel Muelbert, Ian R. McGregor, Cesar Gutierrez, Vanessa E. Rubio, David Rolnick
Abstract: Accurate classification of tropical tree species from unoccupied aerial vehicle (UAV) imagery remains challenging due to high species diversity and strong visual similarity among species at typical image resolutions (centimeters per pixel). In contrast, models trained on close‑up citizen science photographs captured with smartphones achieve strong plant species classification performance. Recent advances in UAV data acquisition now enable the collection of close‑up images that are spatially registered with top‑view aerial imagery and approach the level of visual detail found in smartphone photographs, with the trade‑off that such high‑resolution photos cannot be acquired for many trees. In this work, we evaluate the performance of existing methods using paired top‑view and close‑up UAV imagery collected in a species‑rich tropical forest. Through fine‑tuning experiments, we quantify the performance gap between vision foundation models and in‑domain generalist plant recognition models across both image types (high‑resolution close‑up versus coarser‑resolution top‑view imagery). We show that classification performance is consistently higher on close‑up images than on top‑view aerial imagery, and that this performance gap widens for rare species. Finally, we propose that self‑supervised representation alignment across these two spatial scales offers a promising approach for integrating fine‑grained visual information into canopy‑level species classification models based on top‑view UAV imagery. Leveraging high‑resolution close‑up UAV imagery to enhance canopy‑level species classification could substantially improve large‑scale monitoring of tropical forest biodiversity.
Authors: Liam P. Burns, Dayse M. Cavalcanti, Felipe G. Cabral, Max H. de Queiroz, Melissa Greeff, Publio M. M. Lima, Karen Rudie
Abstract: Discrete‑event systems and supervisory control theory provide a rigorous framework for specifying correct‑by‑construction behavior. However, their practical application to swarm robotics remains largely underexplored. In this paper, we investigate a topological recovery method based on discrete‑event‑systems within a swarm robotics context. We propose a hybrid architecture that combines a high‑level discrete event systems supervisor with a low‑level continuous controller, allowing lost drones to safely recover from fault or attack events and re‑enter a controlled region. The method is demonstrated using ten simulated UAVs in the py‑bullet‑drones framework. We show recovery performance across four distinct scenarios, each with varying initial state estimates. Additionally, we introduce a secondary recovery supervisor that manages the regrouping process for a drone after it has re‑entered the operational region.
Authors: Yongying Liu, Jiaqi Wang, Jian Song, Xinlei Shao, Yijia Chen, Nan Xu, Katsunori Mizuno, Shigeru Tabeta, Fan Zhao
Abstract: Accurate quantification of the physical exposure area of beach litter, rather than simple item counts, is essential for credible ecological risk assessment of marine debris. However, automated UAV‑based monitoring predominantly relies on bounding‑box detection, which systematically overestimates the planar area of irregular litter objects. To address this geometric limitation, we develop PLAS‑Net (Pixel‑level Litter Area Segmentor), an instance segmentation framework that extracts pixel‑accurate physical footprints of coastal debris. Evaluated on UAV imagery from a monsoon‑driven pocket beach in Koh Tao, Thailand, PLAS‑Net achieves a mAP_50 of 58.7% with higher precision than eleven baseline models, demonstrating improved mask fidelity under complex coastal conditions. To illustrate how the accuracy of the masking affects the conclusions of environmental analysis, we conducted three downstream demonstrations: (i) power‑law fitting of normalized plastic density (NPD) to characterize fragmentation dynamics; (ii) area‑weighted ecological risk index (ERI) to map spatial pollution hotspots; and (iii) source composition analysis revealing the abundance‑area paradox: fishing gear constitutes a small proportion of the total number of items, but has the largest physical area per unit item. Pixel‑level area extraction can provide more valuable information for coastal monitoring compared to methods based solely on counting.
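The overestimation caused by bounding boxes is easy to reproduce on a toy mask. The sketch below compares the pixel count of a binary instance mask with the area of its axis-aligned bounding box for a thin diagonal object (for example, a piece of rope). The mask and function names are hypothetical and are not drawn from the PLAS‑Net dataset or code.

```python
def mask_area(mask: list[list[int]]) -> int:
    """Number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def bbox_area(mask: list[list[int]]) -> int:
    """Area in pixels of the tightest axis-aligned box around the foreground."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return 0
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[-1] - rows[0] + 1) * (cols[-1] - cols[0] + 1)


# A thin diagonal object: 4 foreground pixels inside a 4 x 4 box.
rope = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(mask_area(rope), bbox_area(rope))  # prints "4 16"
```

Here the bounding box reports four times the true footprint. Converting either pixel count to physical area would additionally require the image's ground sampling distance, which is not given in the abstract.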
Authors: Angel Ayala, Donling Sui, Francisco Cruz, Mitchell Torok, Mohammad Deghat, Bruno J. T. Fernandes
Abstract: Autonomous Unmanned Aerial Vehicles (UAVs) have revolutionized industries through their versatility with applications including aerial surveillance, search and rescue, agriculture, and delivery. Their autonomous capabilities offer unique advantages, such as operating in large open space environments. Reinforcement Learning (RL) empowers UAVs to learn intricate navigation policies, enabling them to optimize flight behavior autonomously. However, one of its main challenges is its inefficiency in using data samples to achieve a good policy. In object‑goal navigation (OGN) settings, target recognition arises as an extra challenge. Most UAV‑related approaches use relative or absolute coordinates to move from an initial position to a predefined location, rather than to find the target directly. This study addresses the data sample efficiency issue in solving a 3D OGN problem, in addition to formalizing the unknown target location setting as a Markov decision process. Experiments are conducted to analyze the interplay of different state representation learning (SRL) methods for perception with a model‑free RL algorithm for planning in an autonomous navigation system. The main contribution of this study is the development of the perception module, featuring a novel self‑predictive model named AmelPred. Empirical results demonstrate that its stochastic version, AmelPredSto, is the best‑performing SRL model when combined with actor‑critic RL algorithms. The obtained results show substantial improvement in RL algorithms' efficiency by using AmelPredSto in solving the OGN problem.
Authors: Jess Stephenson, Melissa Greeff
Abstract: Landing UAVs on heaving marine platforms is challenging because relative vertical motion can generate large impact forces and cause rebound on touchdown. To address this, we develop an impact‑aware Model Predictive Control (MPC) framework that models landing as a velocity‑level rigid‑body impact governed by Newton's restitution law. We embed this as a linear complementarity problem (LCP) within the MPC dynamics to predict the discontinuous post‑impact velocity and suppress rebound. In simulation, restitution‑aware prediction reduces pre‑impact relative velocity and improves landing robustness. Experiments on a heaving‑deck testbed show an 86.2% reduction in post‑impact deflection compared to a tracking MPC.
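Newton's restitution law, on which the impact model above is built, relates the relative normal velocity just after contact to the one just before it: v_rel⁺ = −e · v_rel⁻, with restitution coefficient e between 0 and 1. The sketch below applies only this closed-form impact map along the vertical axis, assuming the deck is massive enough that its velocity is unchanged by the impact. It does not reproduce the paper's LCP formulation or its MPC, and the function name and numbers are hypothetical.

```python
def post_impact_velocity(v_uav: float, v_deck: float, e: float) -> float:
    """Vertical UAV velocity just after touchdown (upward positive).

    Assumes the deck velocity is unaffected by the impact.
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("restitution coefficient must lie in [0, 1]")
    v_rel_pre = v_uav - v_deck      # negative while the UAV approaches the deck
    return v_deck - e * v_rel_pre   # relative velocity flips sign, scaled by e


# UAV descending at 0.5 m/s onto a deck heaving upward at 0.3 m/s.
v_post = post_impact_velocity(v_uav=-0.5, v_deck=0.3, e=0.5)
print(f"rebound relative to deck: {v_post - 0.3:.2f} m/s")  # prints 0.40 m/s
```

The rebound speed is proportional to the pre-impact relative speed, which is why a controller that predicts the impact can suppress rebound by reducing the relative velocity before touchdown.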
Authors: Bowen Li, Jiping Luo, Themistoklis Charalambous, Nikolaos Pappas
Abstract: Timely information delivery in low‑altitude networks is critical for many time‑sensitive applications, such as unmanned aerial vehicle (UAV) navigation, inspection, and surveillance. The key challenge lies in balancing three competing factors: stringent data freshness requirements, UAV onboard energy consumption, and interference with terrestrial services. Addressing this challenge requires not only efficient power and channel allocation strategies but also effective communication timing over the entire operation horizon. In this work, we propose a model predictive communication (MPComm) framework, enabled by advanced channel sensing techniques, in which the channel conditions that the UAV will experience are largely predictable. Within this framework, we formulate a constrained bi‑objective optimization problem to achieve a desired trade‑off between energy consumption and terrestrial channel occupation, subject to a strict timeliness constraint. We solve this problem using Pareto analysis and show that the original non‑convex, mixed‑integer problem can be decomposed into a two‑layer structure: the outer layer determines the optimal communication timing, while the inner layer determines the optimal power and channel allocation for each communication interval. An efficient algorithm for the inner problem is developed using non‑convex analysis, with asymptotic optimality guarantees, while the outer problem is solved optimally via a simple graph search, with edges characterized by inner solutions. The proposed approach applies to a broad class of problem variants, including objective transformations and single‑objective specializations. Numerical results demonstrate the efficiency of the proposed solution, achieving up to a six‑fold reduction in terrestrial channel occupation and a 6 dB energy saving compared to benchmark schemes.
Authors: Bhola, Yu-Jia Chen, Ashutosh Balakrishnan, Swades De, Li-Chun Wang
Abstract: The sixth generation (6G) communication networks are expected to provide high data rates, ultra‑reliable communication, and massive connectivity, especially in challenging environments such as dense urban areas and disaster‑affected regions. However, traditional terrestrial‑only networks face significant challenges in these scenarios, including signal blockages from high‑rise buildings, traffic congestion, and dynamic user distributions. To address these limitations, we propose the adaptive multi‑UAV deployment (AMUD) framework within satellite air‑ground integrated networks (SAGINs). The AMUD framework dynamically deploys multiple amplify‑and‑forward unmanned aerial vehicle relays (UAVr) in conjunction with low Earth orbit (LEO) satellites to improve coverage, alleviate congestion, and ensure reliable communication in non‑line‑of‑sight and high‑demand conditions. We formulate an optimization problem that aims to jointly maximize the energy efficiency of the total network and the total capacity while ensuring the fairness of the total capacity and satisfying the users' requirements. The simulation results demonstrate that AMUD improves the total capacity of the network, improves the total energy efficiency, and increases the fairness of the capacity compared to traditional LEO satellite and ground base station (LEO‑GBS) only systems.
Authors: Bingchen Cheng, Tielin Ma, Jingcheng Fu, Lulu Tao, Tianhui Guo
Abstract: To enable autonomous wind estimation for energy‑efficient flight in small unmanned aerial vehicles (UAVs), this study proposes a method that estimates flight states and wind using only the low‑cost essential onboard sensors required for autonomous flight, without relying on additional wind measurement devices. The core of the method includes an Extended Kalman Filter (EKF) integrated with the aerodynamic model and an Adaptive Moving Average Estimation (AMAE) technique, which improves the accuracy and smoothness of the wind estimation. Simulation results show that the approach efficiently estimates both steady and time‑varying 3D wind vectors without requiring flow angle measurements. The impact of aerodynamic model accuracy on wind estimation errors is also analyzed to assess practical applicability. Flight tests validate the effectiveness of the method and its feasibility for real‑time onboard computation. Additionally, uncertainties and error sources encountered during testing are systematically examined, providing a foundation for further refinement.
Authors: Amir Zamani, Zeinab Abedini
Abstract: Visual detection of Unmanned Aerial Vehicles (UAVs) is a critical task in surveillance systems due to their small physical size and environmental challenges. Although deep learning models have achieved significant progress, deploying them on edge devices necessitates the use of lightweight models, such as YOLOv11 Nano, which possess limited learning capacity. In this research, an efficient and context‑aware data augmentation pipeline, combining Mosaic strategies and HSV color‑space adaptation, is proposed to enhance the performance of these models. Experimental results on four standard datasets demonstrate that the proposed approach, compared to heavy and instance‑level methods like Copy‑Paste, not only prevents the generation of synthetic artifacts and overfitting but also significantly improves mean Average Precision (mAP) across all scenarios. Furthermore, the evaluation of generalization capability under foggy conditions revealed that the proposed method offers the optimal balance between Precision and stability for real‑time systems, whereas alternative methods, such as MixUp, are effective only in specific applications.
Authors: Conor Flynn, Radoslav Ivanov, Birsen Yazici
Abstract: With modern defense applications increasingly relying on inexpensive, small Unmanned Aerial Vehicles (UAVs), a major challenge lies in designing intelligent and computationally efficient onboard Automatic Target Recognition (ATR) algorithms to carry out operational objectives. This is especially critical in Synthetic Aperture Radar (SAR), where processing techniques such as ATR are often carried out post data collection, requiring onboard systems to bear the memory burden of storing the back‑scattered signals. To alleviate this high cost, we propose an online, direct, edge‑mapping technique which bypasses the image reconstruction step to classify scenes and targets. Furthermore, by reconstructing the scene as an edge‑map we inherently promote sparsity, requiring fewer measurements and computational power than classic SAR reconstruction algorithms such as backprojection.
Authors: Wen Li, Hui Wang, Jinya Su, Cunjia Liu, Wen-Hua Chen, Shihua Li
Abstract: Reliable pipeline inspection is critical to safe energy transportation, but is constrained by long distances, complex terrain, and risks to human inspectors. Unmanned aerial vehicles provide a flexible sensing platform, yet reliable autonomous inspection remains challenging. This paper presents an autonomous quadrotor near‑proximity pipeline inspection framework for three‑dimensional scenarios based on image‑based visual servoing model predictive control (VMPC). A unified predictive model couples quadrotor dynamics with image feature kinematics, enabling direct image‑space prediction within the control loop. To address low‑rate visual updates, measurement noise, and environmental uncertainties, an extended‑state Kalman filtering scheme with image feature prediction (ESKF‑PRE) is developed, and the estimated lumped disturbances are incorporated into the VMPC prediction model, yielding the ESKF‑PRE‑VMPC framework. A terrain‑adaptive velocity design is introduced to maintain the desired cruising speed while generating vertical velocity references over unknown terrain slopes without prior terrain information. The framework is validated in high‑fidelity Gazebo simulations and real‑world experiments. In real‑world tests, the proposed method reduces RMSE by 52.63% and 75.04% in pipeline orientation and lateral deviation in the image, respectively, for straight‑pipeline inspection without wind, and successfully completes both wind‑disturbance and bend‑pipeline tasks where the baseline method fails. An open‑source nano quadrotor is modified for indoor experimentation.
Authors: Luiz Giacomossi, Håkan Forsberg, Ivan Tomasic, Baran Çürüklü, Tommaso Cucinotta
Abstract: Modern UAV architectures increasingly aim to unify high‑level autonomy and low‑level flight control on a single General‑Purpose Operating System (GPOS). However, complex multi‑core System‑on‑Chips (SoCs) introduce significant timing indeterminism due to shared resource contention. This paper performs an architectural analysis of the PREEMPT_RT Linux kernel on a Raspberry Pi 5, specifically isolating the impact of kernel activation paths (deferred execution SoftIRQs versus real‑time direct activation) on a 250 Hz control loop. Results show that under heavy stress, the standard kernel is unsuitable, exhibiting worst‑case latencies exceeding 9 ms. In contrast, PREEMPT_RT reduced the worst‑case latency by nearly 88 percent to under 225 microseconds, enforcing a direct wake‑up path that mitigates OS noise. These findings demonstrate that while PREEMPT_RT resolves scheduling variance, the residual jitter on modern SoCs is primarily driven by hardware memory contention.
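To put the quoted latencies in context, the following Python sketch compares them with the period of a 250 Hz loop. The variable and function names are assumptions made for this illustration, and the two latency values are simply the bounds quoted in the abstract ("exceeding 9 ms" and "under 225 microseconds"), not measurements.

```python
# Period of a 250 Hz control loop, in microseconds.
LOOP_HZ = 250
period_us = 1_000_000 / LOOP_HZ  # 4000.0 us per iteration

# Bounds quoted in the abstract, used here only as illustrative inputs.
worst_case_us = {"standard": 9_000, "preempt_rt": 225}


def budget_fraction(latency_us: float) -> float:
    """Fraction of one loop period consumed by a wake-up latency."""
    return latency_us / period_us


def misses_period(latency_us: float) -> bool:
    """True if the wake-up latency alone is at least one full loop period."""
    return latency_us >= period_us


for kernel, latency in worst_case_us.items():
    print(kernel, f"{budget_fraction(latency):.1%}", misses_period(latency))
```

Under these inputs, a 9 ms wake-up delay spans more than two 4 ms periods, whereas 225 microseconds uses a small fraction of one period.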
Authors: Lingxue Lyu
Abstract: Language‑guided unmanned aerial vehicles (UAVs) often fail not from bad reasoning or perception, but from execution mismatch: the gap between a planned trajectory and the controller's ability to track it when the real dynamics differ from training (mass changes, drag shifts, actuator delay, wind). We propose AeroBridge‑TTA, a language‑conditioned control pipeline that targets this gap with test‑time adaptation. It has three parts: a language encoder that maps the command into a subgoal, an adaptive policy conditioned on the subgoal and a learned latent, and a test‑time adaptation (TTA) module that updates the latent online from observed transitions. On five language‑conditioned UAV tasks under 13 mismatch conditions with the same domain randomization, AeroBridge‑TTA ties a strong PPO‑MLP baseline in‑distribution and wins all 5 out‑of‑distribution (OOD) conditions, +22.0 pts on average (62.7% vs. 40.7%); the +8.5 pt overall gain comes entirely from the OOD regime. A same‑weights ablation that only changes the step size α shows the latent update itself is responsible for a 4.6× OOD lift.
Authors: Sufian Al majmaie, Ghazal Ghajari, Niraj Prasad Bhatta, Fathi Amsaad
Abstract: The integration of Fog Computing with Flying Ad‑Hoc Networks (FANETs) offers promising capabilities for decentralized, low‑latency intelligence in UAV‑based applications. However, the distributed nature, mobility, and resource constraints of FANETs expose them to significant security and privacy challenges, particularly against quantum threats. To address these issues, this work introduces a blockchain‑based, AI‑enhanced key management framework designed for fog‑enabled FANETs. The proposed scheme employs a Post‑Quantum Multivariate Identity‑Based Signature Scheme (PQ‑MISS) and Zero‑Knowledge Proofs (ZKPs) to achieve secure key establishment, privacy‑preserving data aggregation, and integrity verification. A polynomial composition‑based encryption mechanism and an aggregate signature model support secure and efficient multi‑device communication across fog and UAV layers. Fog servers construct partial blockchain blocks from validated UAV data. These blocks are completed and mined by Cloud Servers (CSs). AI algorithms then analyze the verified data to generate accurate predictions and insights. NS‑3 simulations validate the efficiency of PQ‑MISS in reducing communication overhead while improving the speed and reliability of data aggregation and verification. Comparative analysis demonstrates the proposed scheme's advantages over existing methods in computational cost, post‑quantum security, and scalability, making it a robust solution for secure, intelligent, and future‑ready FANET systems.
Authors: Wenchi Cheng, Jingqing Wang, Zhuohui Yao, Wei Zhang
Abstract: To support mission‑critical services in emergency scenarios, wireless networks are required to provide stringent guarantees under massive Ultra‑Reliable Low‑Latency Communications (mURLLC) constraints. Distributed unmanned aerial vehicle (UAV)‑based massive multiple‑input multiple‑output (MIMO) architectures have recently emerged as a promising solution for rapidly deployable emergency communication systems. However, how to fundamentally characterize and guarantee statistical quality‑of‑service (QoS) for such systems in the finite blocklength regime remains largely unexplored. To overcome these challenges, in this paper we develop a fundamental analytical framework for delay and reliability bounded QoS guarantees in distributed UAV‑based massive MIMO emergency networks under finite blocklength coding (FBC). By rigorously modeling the stochastic service process of distributed massive MIMO fading channels, we derive statistical characterizations of the delay and error‑rate bounded QoS exponents. We also establish QoS‑driven controlling functions, including the ε‑effective capacity and the feasible QoS region. Finally, the obtained simulation results validate and evaluate our developed modeling techniques and asymptotic formulations to support mURLLC.
Authors: Ravi Kumar Thakur, Luis Granados Segura, Jan Klivan, Radim Špetlík, Tobiáš Vinklárek, Matouš Vrba, Martin Saska
Abstract: Autonomous multi‑Unmanned Aerial Vehicle (UAV) swarm systems require accurate and fast relative state estimation. Although monocular frame‑based camera methods perform well in ideal conditions, they are slow, suffer from scale ambiguity, and often struggle in visually challenging conditions. The advent of event cameras addresses these challenging tasks by providing low latency, high dynamic range, and microsecond‑level temporal resolution. This paper proposes a framework for relative state estimation for quadrotors using event‑based propeller sensing. The propellers in the event stream are tracked by detection to extract the regions of interest. The event streams in these regions are processed in temporal chunks to estimate per‑propeller frequencies. These frequency measurements drive a kinematic state estimation module as a thrust input, while camera‑derived position measurements provide the update step. Additionally, we use geometric primitives derived from event streams to estimate the orientation of the quadrotor by fitting an ellipse over a propeller and backprojecting it to recover the body‑frame tilt axis. The existing event‑based approaches for quadrotor state estimation use the propeller frequency in simulated flight sequences. Our approach estimates the propeller frequency under 3% error on a test dataset of five real‑world outdoor flight sequences, providing a method for decentralized relative localization for multi‑robot systems using an event camera.
Authors: Sascha Emanuel Zell, Toni Schneidereit, Armin Fügenschuh, Michael Breuß
Abstract: Drowning is an omnipresent risk associated with any activity on or in the water, and rescuing a drowning person is particularly challenging because of the time pressure, making a short response time important. Further complicating water rescue are unsupervised and extensive swimming areas, precise localization of the target, and the transport of rescue personnel. Technical innovations can provide a remedy: We propose an Unmanned Aircraft System (UAS), also known as a drone‑in‑a‑box system, consisting of a fleet of Unmanned Aerial Vehicles (UAVs) allocated to purpose‑built hangars near swimming areas. In an emergency, the UAS can be deployed in addition to Standard Rescue Operation (SRO) equipment to locate the distressed person early by performing a fully automated Search and Rescue (S&R) operation and dropping a flotation device. In this paper, we address automatically locating distressed swimmers using the image‑based object detection architecture You Only Look Once (YOLO). We present a dataset created for this application and outline the training process. We evaluate the performance of YOLO versions 3, 5, and 8 and architecture sizes (nano, extra‑large) using Mean Average Precision (mAP) metrics mAP@.5 and mAP@.5:.95. Furthermore, we present two Discrete‑Event Simulation (DES) approaches to simulate response times of SRO and UAS‑based water rescue. This enables estimation of time savings relative to SRO when selecting the UAS configuration (type, number, and location of UAVs and hangars). Computational experiments for a test area in the Lusatian Lake District, Germany, show that UAS assistance shortens response time. Even a small UAS with two hangars, each containing one UAV, reduces response time by a factor of five compared to SRO.
Authors: Gautam Kumar, Amit Shivam, Ashwini Ratnoo
Abstract: This paper presents a decentralized, collision‑free framework for path following guidance of multiple uncrewed aerial vehicles (UAVs), while maintaining uniform spacing along a reference path. A vector field‑based guidance law is employed to drive each UAV toward the reference path. A rotational repulsion mechanism, utilizing relative distance and bearing between UAVs, is proposed to avoid collisions during convergence to the path, and an inter‑UAV spacing error‑based velocity control law is presented to achieve uniform separation along the path. Analytical guarantees are established for collision avoidance and convergence of the inter‑UAV spacing errors to zero, ensuring uniform separation along the path. Numerical simulations demonstrate the efficacy of the proposed method.
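As a generic illustration of vector field‑based path following (not the guidance law of this paper, which the abstract does not specify), the sketch below uses the classic straight‑line vector field: the commanded course blends from a fixed approach angle far from the path to the path direction on the path. The function name, the gain, the approach angle, and the sign convention of the cross‑track error are all assumptions.

```python
import math


def desired_course(path_course: float, cross_track_error: float,
                   chi_inf: float = math.pi / 2, gain: float = 0.05) -> float:
    """Desired course (rad) for following a straight reference path.

    path_course        direction of the path (rad)
    cross_track_error  signed lateral offset from the path (m); the sign
                       convention is an assumption of this sketch
    chi_inf            approach angle used far from the path, in (0, pi/2]
    gain               how quickly the field turns onto the path (1/m)
    """
    return path_course - chi_inf * (2.0 / math.pi) * math.atan(gain * cross_track_error)


# On the path the command equals the path direction; far away it tends
# toward an approach at chi_inf relative to the path.
print(desired_course(0.0, 0.0), desired_course(0.0, 100.0))
```

Collision avoidance and the spacing‑error velocity law described in the abstract are outside the scope of this sketch.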
Authors: Riddhi Apte, Shubhada Gadgil, Gaurav Kasbekar, Rushikesh Patil, Prasanna Chaporkar
Abstract: Driven by the demands of 5G/Beyond 5G and 6G networks, Unmanned Aerial Vehicles (UAVs) have surfaced in critical roles for aerial communications. In the present survey, we explore the multi‑mode roles of UAVs as relays, User Equipment (UE), gNB and Reconfigurable Intelligent Surfaces (RIS), along with their deployment scenarios, architectural frameworks, and different communication models incorporating Artificial Intelligence (AI) configurations. We consider the effects of alternate power sources on the communication payload. The survey also aims to address security issues in UAV communications. As an advancement, we propose a novel UAV‑Network‑in‑a‑Box (NIB) architecture for disaster recovery and temporary coverage as an alternative to traditional network infrastructure.
Authors: Yuwei Ning, Ganlong Zhao, Yipeng Qin, Si Liu, Yang Liu, Liang Lin, Guanbin Li
Abstract: Aerial Vision‑and‑Language Navigation (Aerial VLN) enables unmanned aerial vehicles (UAVs) to follow natural language instructions and navigate complex urban environments. While recent advances have achieved progress through large‑scale memory graphs and lookahead path planning, they remain limited by shallow instruction understanding and high computational cost. In particular, existing methods rely primarily on landmark descriptions, overlooking directional cues, a key source of spatial context in human navigation. In this work, we propose LookasideVLN, a new paradigm that exploits directional cues in natural language to achieve both more accurate spatial reasoning and greater computational efficiency. LookasideVLN comprises three core components: (1) an Egocentric Lookaside Graph (ELG) that dynamically encodes instruction‑relevant landmarks and their directional relationships, (2) a Spatial Landmark Knowledge Base (SLKB) that provides lightweight memory retrieval from prior navigation experiences, and (3) a Lookaside MLLM Navigation Agent that aligns multimodal information from user instructions, visual observations, and landmark‑direction information from ELG for path planning. Extensive experiments show that LookasideVLN significantly outperforms the state‑of‑the‑art CityNavAgent, even with a single‑level lookahead, demonstrating that leveraging directional cues is a powerful yet efficient strategy for Aerial VLN.
Authors: Yusuke Tsunoda, Yusuke Goto, Takao Sato
Abstract: In this study, we propose a new sheepdog‑inspired control method for a swarm of small unmanned aerial vehicles (UAVs), which predicts the swarm behavior while explicitly accounting for the motion constraints of real robots. Sheepdog‑inspired guidance control refers to a framework in which a small number of navigator agents (sheepdog agents) indirectly drive a large number of autonomous agents (a flock of sheep agents) so as to steer the group toward a target position. In conventional studies on sheepdog‑inspired guidance, both types of agents have typically been modeled as point masses, and the guidance law for the navigator agents has been designed using simple interaction vectors based on the instantaneous relative positions between the agents. However, when implementing such methods on real robots such as drones, it is necessary to consider each agent's motion constraints, including upper bounds on velocity and acceleration. Moreover, we argue that guidance can be made more efficient by predicting the future behavior of the autonomous swarm that is observable to the navigator agents. To this end, we propose a three‑dimensional guidance control law based on behavior prediction of autonomous agents under motion constraints, inspired by the Dynamic Window Approach (DWA). At each control cycle, the navigator agent generates a set of feasible motion candidates that satisfy its motion constraints, and predicts the short‑horizon swarm evolution using an internal model of the autonomous agents maintained within the navigator agent. The motion candidates are then evaluated according to criteria such as the progress velocity toward the target, the positioning strategy with respect to the swarm, and safety margins, and the optimal motion is selected to achieve safe and efficient guidance. Numerical simulation results demonstrate the effectiveness of the proposed guidance control law.
Authors: Yun-Ping Hsiao, Yanda Li, Youssef Gamal, Halima Bouzidi, Mohammad Abdullah Al Faruque
Abstract: As Cyber‑Physical Systems (CPS) become increasingly pervasive and autonomous, ensuring the resilience of their embedded logic is critical to maintaining safety and integrity. Among the most stealthy and damaging threats are non‑invasive fault injection attacks, where hardware‑level disturbances propagate into software execution and compromise control logic. In this paper, we investigate the susceptibility of Unmanned Aerial Vehicle (UAV) autopilot fail‑safe mechanisms to voltage glitch fault injection. We introduce a dual evaluation approach: software‑based fault simulation using ARMORY and hardware‑based experiments with a voltage glitching platform (Chip‑Whisperer), applying controlled and timely faults to an STM32 microcontroller running UAV‑Autopilot fail‑safe logic. Our targeted analysis of specific fail‑safe modes uncovers timing‑sensitive vulnerabilities that can suppress or alter safety responses, such as disabling emergency failsafe activation at critical moments, potentially enabling UAV hijacking. Furthermore, we validate software‑based fault injection results against real hardware behavior, demonstrating how simulated attacks translate into tangible risks for CPS security and reliability.
Authors: Craig Iaboni, Pramod Abichandani
Abstract: Reliable UAV object detection requires robustness to illumination changes, motion blur, and scene dynamics that suppress RGB cues. Thermal long‑wave infrared (LWIR) sensing preserves contrast in low light, and event cameras retain microsecond‑level temporal edges, but integrating all three modalities in a unified detector has not been systematically studied. We present a tri‑modal framework that processes RGB, thermal, and event data with a dual‑stream hierarchical vision transformer. At selected encoder depths, a Modality‑Aware Gated Exchange (MAGE) applies inter‑sensor channel and spatial gating, and a Bidirectional Token Exchange (BiTE) module performs bidirectional token‑level attention with depthwise‑pointwise refinement, producing resolution‑preserving fused maps for a standard feature pyramid and two‑stage detector.
We introduce a 10,489‑frame UAV dataset with synchronized and pre‑aligned RGB‑thermal‑event streams and 24,223 annotated vehicles across day and night flights. Through 61 controlled ablations, we evaluate fusion placement, mechanism (baseline MAGE+BiTE, CSSA, GAFF), modality subsets, and backbone capacity. Tri‑modal fusion improves over all dual‑modal baselines, with fusion depth having a significant effect and a lightweight CSSA variant recovering most of the benefit at minimal cost. This work provides the first systematic benchmark and modular backbone for tri‑modal UAV‑based object detection.
Authors: Vishal Ramesh, Antony Thomas
Abstract: Multi‑UAV inspection missions require spare drones to replace active drones during recharging cycles. Existing fleet‑sizing approaches often assume steady‑state operating conditions that do not apply to finite‑horizon missions, or they treat replacement requests as statistically independent events. The latter provides per‑request blocking guarantees that fail to translate to mission‑level reliability when demands cluster. This paper identifies a structural failure mode where efficient routing assigns similar workloads to each UAV, leading to synchronized battery depletion and replacement bursts that exhaust the spare pool even when average capacity is sufficient.
We derive a closed‑form sufficient fleet‑sizing rule, k = m(ceil(R) + 1), where m is the number of active UAVs and R is the recovery‑to‑active time ratio. This additive buffer of m spares absorbs worst‑case synchronized demand at recovery‑cycle boundaries and ensures mission‑level reliability even when all UAVs deplete simultaneously. Monte Carlo validation across five scenarios (m in [2, 10], R in [0.87, 3.39], 1000 trials each) shows that Erlang‑B sizing with a per‑request blocking target epsilon = 0.01 drops to 69.9% mission success at R = 3.39, with 95% of spare exhaustion events concentrated in the top‑decile 5‑minute demand windows. In contrast, the proposed rule maintains 99.8% success (Wilson 95% lower bound 99.3%) across all tested conditions, including wind variability up to CV = 0.30, while requiring only four additional drones in the most demanding scenario.
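The sizing rule quoted above is simple enough to evaluate directly. The Python sketch below computes k = m(ceil(R) + 1) and, as a separate helper, a Wilson score lower bound of the kind reported for the success rate. The function names are assumptions for this illustration, and the 998‑out‑of‑1000 input in the example is a hypothetical count chosen to match a 99.8% rate, not a figure taken from the paper.

```python
import math


def fleet_size(m: int, r: float) -> int:
    """Evaluate the rule k = m * (ceil(R) + 1) from the abstract.

    m  number of active UAVs
    r  recovery-to-active time ratio R
    """
    if m < 1 or r < 0:
        raise ValueError("m must be >= 1 and R must be non-negative")
    return m * (math.ceil(r) + 1)


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


# Extremes of the ranges stated in the abstract (m in [2, 10], R in [0.87, 3.39]).
print(fleet_size(2, 0.87), fleet_size(10, 3.39))
# Hypothetical count: 998 successes in 1000 trials.
print(round(wilson_lower_bound(998, 1000), 4))
```

Because the rule depends on R only through ceil(R), any ratio between 3 and 4 (exclusive of 3) yields the same k for a given m.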
Authors: Huan Lin, Lianghui Ding
Abstract: Large‑scale Unmanned Aerial Vehicle (UAV) failures can split a UAV swarm network into disconnected sub‑networks, making decentralized recovery both urgent and difficult. Centralized recovery methods depend on global topology information and become communication‑heavy after severe fragmentation. Decentralized heuristics and multi‑agent reinforcement learning methods are easier to deploy, but their performance often degrades when the swarm scale and damage severity vary. We present the Physics‑informed Graph Adversarial Imitation Learning (PhyGAIL) algorithm, which adopts centralized training with decentralized execution. PhyGAIL builds bounded local interaction graphs from heterogeneous observations, and uses a physics‑informed graph neural network to encode directional local interactions as gated message passing with explicit attraction and repulsion. This gives the policy a physically grounded coordination bias while keeping local observations scale‑invariant. It also uses scenario‑adaptive imitation learning to improve training under fragmented topologies and variable‑length recovery episodes. Our analysis establishes bounded local graph amplification, bounded interaction dynamics, and controlled variance of the terminal success signal. A policy trained on 20‑UAV swarms transfers directly to swarms of up to 500 UAVs without fine‑tuning, and achieves better performance across reconnection reliability, recovery speed, motion safety, and runtime efficiency than representative baselines.
Authors: Shuyan Ke, Yifan Mei, Changli Wu, Yonghan Zheng, Jiayi Ji, Liujuan Cao, Rongrong Ji
Abstract: Reasoning segmentation has recently expanded from ground‑level scenes to remote‑sensing imagery, yet UAV data poses distinct challenges, including oblique viewpoints, ultra‑high resolutions, and extreme scale variations. To address these issues, we formally define the UAV Reasoning Segmentation task and organize its semantic requirements into three dimensions: Spatial, Attribute, and Scene‑level reasoning. Based on this formulation, we construct DRSeg, a large‑scale benchmark for UAV reasoning segmentation, containing 10k high‑resolution aerial images paired with Chain‑of‑Thought QA supervision across all three reasoning types. As a benchmark companion, we introduce PixDLM, a simple yet effective pixel‑level multimodal language model that serves as a unified baseline for this task. Experiments on DRSeg establish strong baseline results and highlight the unique challenges of UAV reasoning segmentation, providing a solid foundation for future research.
Authors: Jianqiao Yu, Jia Li, Tianhua Gao
Abstract: This paper presents a two‑stage trajectory planning framework for a multi‑UAV rigid‑payload cascaded transportation system, aiming to address planning challenges in densely cluttered environments. In Stage I, an Enhanced Tube‑RRT algorithm is developed by integrating active hybrid sampling and an adaptive expansion strategy, enabling rapid generation of a safe and feasible virtual tube in environments with dense obstacles. Moreover, a trajectory smoothness cost is explicitly incorporated into the edge cost to reduce excessive turns and thereby mitigate cable‑induced oscillations. Simulation results demonstrate that the proposed Enhanced Tube‑RRT achieves a higher success rate and effective sampling rate than mixed‑sampling Tube‑RRT (STube‑RRT) and adaptive‑extension Tube‑RRT (AETube‑RRT), while producing a shorter optimal path with a smaller cumulative turning angle. In Stage II, a convex quadratic program is formulated by considering payload translational and rotational dynamics, cable tension constraints, and collision‑safety constraints, yielding a smooth, collision‑free desired payload trajectory. Finally, a centralized geometric control scheme is applied to the cascaded system to validate the effectiveness and feasibility of the proposed planning framework, offering a practical solution for payload attitude maneuvering in densely cluttered environments.
Authors: Michael R. Chang, Anna Candotti, Karl von Ellenrieder, Enrico Tomelleri, Marco Camurri
Abstract: We present a curated multi‑platform LiDAR reference dataset from an instrumented ICOS forest plot, explicitly designed to support calibration, benchmarking, and integration of 3D structural data with ecological observations and standard allometric models. The dataset integrates UAV‑borne laser scanning (ULS) to measure canopy coverage, terrestrial laser scanning (TLS) for detailed stem mapping, and backpack mobile laser scanning (MLS) with real‑time SLAM for efficient sub‑canopy acquisition. We focus on the control plot with the most complete and internally consistent registration, where TLS point clouds (~333 million points) are complemented by ULS and MLS data capturing canopy and understory strata. Marker‑free, SLAM‑aware protocols were used to reduce field and processing time, while manual and automated methods were combined. Final products are available in LAZ and E57 formats with UTM coordinates, together with registration reports for reproducibility. The dataset provides a benchmark for testing registration methods, evaluating scanning efficiency, and linking point clouds with segmentation, quantitative structure models, and allometric biomass estimation. By situating the acquisitions at a long‑term ICOS site, it explicitly links 3D structure with decades of ecological and flux measurements. More broadly, it illustrates how TLS, MLS, and ULS can be combined for repeated inventories and digital twins of forest ecosystems.
Authors: Domonkos Varga
Abstract: Reliable evaluation is essential in machine learning research, yet methodological flaws, particularly data leakage, continue to undermine the validity of reported results. In this work, we investigate whether large language models (LLMs) can act as independent analytical agents capable of identifying such issues in published studies. As a case study, we analyze a gesture‑recognition paper reporting near‑perfect accuracy on a small, human‑centered dataset. We first show that the evaluation protocol is consistent with subject‑level data leakage due to non‑independent training and test splits. We then assess whether this flaw can be detected independently by six state‑of‑the‑art LLMs, each analyzing the original paper without prior context using an identical prompt. All models consistently identify the evaluation as flawed and attribute the reported performance to non‑independent data partitioning, supported by indicators such as overlapping learning curves, minimal generalization gap, and near‑perfect classification results. These findings suggest that LLMs can detect common methodological issues based solely on published artifacts. While not definitive, their consistent agreement highlights their potential as complementary tools for improving reproducibility and supporting scientific auditing.
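Subject‑level leakage of the kind described above arises when samples from the same person land in both the training and test sets. The following Python sketch shows one common remedy, splitting by subject rather than by sample, together with a simple leakage check. The function names and the toy subject labels are assumptions for this illustration and are not taken from the paper under discussion.

```python
import random
from collections import defaultdict


def subject_wise_split(subject_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no subject appears in both."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    subjects = sorted(by_subject)            # deterministic order before shuffling
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))

    test_idx = sorted(i for s in subjects[:n_test] for i in by_subject[s])
    train_idx = sorted(i for s in subjects[n_test:] for i in by_subject[s])
    return train_idx, test_idx


def has_subject_leakage(subject_ids, train_idx, test_idx):
    """True if any subject contributes samples to both splits."""
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    return bool(train_subjects & test_subjects)


# Toy example: five hypothetical subjects with two samples each.
ids = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"]
train, test = subject_wise_split(ids)
print(has_subject_leakage(ids, train, test))   # subject-wise split
print(has_subject_leakage(ids, [0], [1]))      # naive sample-wise split of subject "a"
```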
Authors: Hanxuan Chen, Jie Zheng, Siqi Yang, Tianle Zeng, Siwei Feng, Songsheng Cheng, Ruilong Ren, Hanzhong Guo, Shuai Yuan, Xiangyue Wang, Kangli Wang, Ji Pei
Abstract: Vision‑and‑Language Navigation for Unmanned Aerial Vehicles (UAV‑VLN) represents a pivotal challenge in embodied artificial intelligence, focused on enabling UAVs to interpret high‑level human commands and execute long‑horizon tasks in complex 3D environments. This paper provides a comprehensive and structured survey of the field, from its formal task definition to the current state of the art. We establish a methodological taxonomy that charts the technological evolution from early modular and deep learning approaches to contemporary agentic systems driven by large foundation models, including Vision‑Language Models (VLMs), Vision‑Language‑Action (VLA) models, and the emerging integration of generative world models with VLA architectures for physically‑grounded reasoning. The survey systematically reviews the ecosystem of essential resources (simulators, datasets, and evaluation metrics) that facilitates standardized research. Furthermore, we conduct a critical analysis of the primary challenges impeding real‑world deployment: the simulation‑to‑reality gap, robust perception in dynamic outdoor settings, reasoning with linguistic ambiguity, and the efficient deployment of large models on resource‑constrained hardware. By synthesizing current benchmarks and limitations, this survey concludes by proposing a forward‑looking research roadmap to guide future inquiry into key frontiers such as multi‑agent swarm coordination and air‑ground collaborative robotics.
Authors: Sicheng Wu, Minghui Liwang, Yangyang Gao, Deqing Wang, Wenbo Zhu, Yiguang Hong, Wei Ni, Seyyedali Hosseinalipour
Abstract: In air‑ground integrated networks (AGINs), unmanned aerial vehicles (UAVs) provide on‑demand edge services to ground vehicles. Realizing this vision requires carefully designed incentives to coordinate interactions among self‑interested participants. This is exacerbated by the dynamic nature of AGINs, where spatio‑temporal variations introduce significant uncertainty in matching UAVs and vehicles. Existing real‑time service provisioning typically relies on precise trajectory information, raising privacy concerns and incurring decision latency. To address these challenges, we propose look one‑step ahead (LOSA), a novel framework for efficient and privacy‑aware service provisioning. By exploiting predictable vehicle travel times between intersections, LOSA decomposes the process into two coupled phases: (i) a privacy‑aware look‑ahead phase and (ii) a lightweight real‑time execution phase. The look‑ahead phase allows vehicles to adaptively adjust privacy budgets based on historical utility, balancing trajectory exposure and matching accuracy. Leveraging this, a double auction mechanism establishes binding one‑step‑ahead agreements (OSAAs) through trajectory similarity clustering, while constructing preference lists to hedge against mobility uncertainty. The execution phase then enforces pre‑established OSAAs and preference lists, resolving real‑time resource conflicts without costly re‑negotiations. This design reduces computational overhead and preserves robustness. We analytically corroborate that LOSA guarantees truthfulness, individual rationality, and budget balance. Experiments on real‑world datasets (DAIR‑V2X, HighD, and RCooper) demonstrate that LOSA achieves superior privacy protection while lowering transaction latency compared to baseline approaches.
Authors: Tianshun Li, Hongliang Lu, Yanggang Sheng, Zhongzhen Wang, Haoang Li, Xinhu Zheng
Abstract: Ensuring energy feasibility under wind uncertainty is critical for the safety and reliability of UAV delivery missions. In realistic truck‑drone logistics systems, UAVs must deliver parcels and safely return under time‑varying wind conditions that are only partially observable during flight. However, most existing routing approaches assume static or deterministic energy models, making them unreliable in dynamic wind environments. We propose Battery‑Efficient Routing (BER), an online risk‑sensitive planning framework for wind‑sensitive truck‑assisted UAV delivery. The problem is formulated as routing on a time‑dependent energy graph whose edge costs evolve according to wind‑induced aerodynamic effects. BER continuously evaluates return feasibility while balancing instantaneous energy expenditure and uncertainty‑aware risk. The approach is embedded in a hierarchical aerial‑ground delivery architecture that combines task allocation, routing, and decentralized trajectory execution. Extensive simulations on synthetic ER graphs generated in Unreal Engine environments and quasi‑real wind logs demonstrate that BER significantly improves mission success rates and reduces wind‑induced failures compared with static and greedy baselines. These results highlight the importance of integrating real‑time energy budgeting and environmental awareness for UAV delivery planning under dynamic wind conditions.
Authors: Yann V. Bellec
Abstract: Aerial object detection in UAV imagery presents unique challenges due to the high prevalence of tiny objects, adverse environmental conditions, and strict computational constraints. Standard YOLO‑based detectors fail to address these jointly: their minimum detection stride of 8 pixels renders sub‑32px objects nearly undetectable, their CIoU loss produces zero gradients for non‑overlapping tiny boxes, and their architectures contain significant filter redundancy. We propose DroneScan‑YOLO, a holistic system contribution that addresses these limitations through four coordinated design choices: (1) increased input resolution of 1280x1280 to maximize spatial detail for tiny objects, (2) RPA‑Block, a dynamic filter pruning mechanism based on lazy cosine‑similarity updates with a 10‑epoch warm‑up period, (3) MSFD, a lightweight P2 detection branch at stride 4 adding only 114,592 parameters (+1.1%), and (4) SAL‑NWD, a hybrid loss combining Normalized Wasserstein Distance with size‑adaptive CIoU weighting, integrated into YOLOv8's TaskAligned assignment pipeline. Evaluated on VisDrone2019‑DET, DroneScan‑YOLO achieves 55.3% mAP@50 and 35.6% mAP@50‑95, outperforming the YOLOv8s baseline by +16.6 and +12.3 points respectively, improving recall from 0.374 to 0.518, and maintaining 96.7 FPS inference speed with only +4.1% parameters. Gains are most pronounced on tiny object classes: bicycle AP@50 improves from 0.114 to 0.328 (+187%), and awning‑tricycle from 0.156 to 0.237 (+52%).
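The claim that IoU-based losses give no signal for non-overlapping tiny boxes, while a Wasserstein-based measure does, can be illustrated with a minimal sketch. It uses the standard Normalized Wasserstein Distance, in which each box is modeled as a 2D Gaussian. The constant `c`, the function names, and the example boxes are illustrative assumptions; the size-adaptive weighting of SAL‑NWD is not reproduced here.

```python
import math


def iou(box_a, box_b):
    """Plain IoU for boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance for boxes given as (cx, cy, w, h).

    c is a dataset-dependent normalizing constant; 12.8 is an assumed,
    illustrative value rather than one taken from the abstract.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # 2-Wasserstein distance between the Gaussians fitted to the two boxes.
    w2 = math.sqrt(
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-w2 / c)


tiny = (10.0, 10.0, 4.0, 4.0)
near = (20.0, 10.0, 4.0, 4.0)   # no overlap with `tiny`
far = (30.0, 10.0, 4.0, 4.0)    # no overlap, further away

print(iou(tiny, near), iou(tiny, far))   # both zero: IoU cannot rank them
print(nwd(tiny, near), nwd(tiny, far))   # NWD still decreases with distance
```

Because the similarity stays positive and varies smoothly with center distance, a loss built on it can still provide a gradient when a predicted tiny box misses its target entirely.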
Authors: Koffi Titus Sergio Aglin, Anthony K. Muchiri, Celestin Nkundineza
Abstract: Reliable image quality assessment is essential in applications where large volumes of images are acquired automatically and must be filtered before further analysis. In many practical scenarios, a pristine reference image is unavailable, making no‑reference image quality assessment (NR‑IQA) particularly important. This paper introduces Multi‑Metric Image Quality Assessment (MM‑IQA), a lightweight multi‑metric framework for NR‑IQA. It combines interpretable cues related to blur, edge structure, low‑resolution artifacts, exposure imbalance, noise, haze, and frequency content to produce a single quality score in the range [0,100]. MM‑IQA was evaluated on five benchmark datasets (KonIQ‑10k, LIVE Challenge, KADID‑10k, TID2013, and BIQ2021) and achieved SRCC values ranging from 0.647 to 0.830. Additional experiments on a synthetic agricultural dataset showed consistent behavior of the designed cues. The Python/OpenCV implementation required about 1.97 s per image. The method also has modest memory requirements because it stores only a limited number of intermediate grayscale, filtered, and frequency‑domain representations, resulting in memory usage that scales linearly with image size. The results show that MM‑IQA can be used for fast image quality screening with explicit distortion‑aware cues and modest computational cost.
Authors: Ping Huang, Bin Duo, Ziedor Godfred, Liuwei Huo, Jin Ning, Xiaojun Yuan, Jun Li
Abstract: Natural disasters often damage ground infrastructure, making unmanned aerial vehicles (UAVs) essential for emergency supply delivery. Yet safe operation in complex post‑disaster environments requires reliable command‑and‑control (C2) links; link instability can cause loss of control, delay rescue, and trigger severe secondary harm. To provide continuous three‑dimensional (3D) C2 coverage during dynamic missions, we propose a Heterogeneous Dual‑Network Framework (HDNF) for safe and reliable emergency delivery. HDNF tightly couples an Emergency Communication Support Network (ECSN), formed by hovering UAV base stations, with a Delivery Path Network (DPN), formed by fast‑moving delivery UAVs. The ECSN dynamically safeguards mission‑critical flight corridors, while the DPN aligns trajectories with reliable coverage regions. We formulate a joint optimization problem over task assignment, 3D UAV‑BS deployment, and DPN path planning to maximize end‑to‑end C2 reliability while minimizing UAV flight energy consumption and base‑station deployment cost. To solve this computationally intractable NP‑hard problem, we develop a layered strategy with three components: (i) a multi‑layer C2 service model that overcomes 2D‑metric limitations and aligns UAV‑BS deployment with mission‑critical 3D phases; (ii) a 3D coverage‑aware multi‑agent reinforcement learning algorithm that addresses the high‑dimensional search space and improves both training efficiency and topology resilience; and (iii) a 3D communication‑aware A* planner that jointly optimizes C2 quality and flight energy, mitigating trajectory‑coverage mismatch and improving routing safety. Extensive simulations show that HDNF markedly improves C2 reliability, eliminates outages in critical phases, and sustains high task success rates while reducing hardware deployment cost.
Authors: Vangara Saiprudhvi, Sandeep Singh, Keshav Singh, Hariharan Subramaniyam, Chih-Peng Li
Abstract: Integrated terrestrial and non‑terrestrial networks (ITNTNs) are regarded as a key architectural paradigm for sixth‑generation (6G) wireless systems. This paper investigates a dual‑aerial reconfigurable intelligent surface (RIS)‑assisted ITNTN, where a terrestrial base station (TBS) and a satellite (SAT) jointly serve terrestrial and satellite users with the aid of an unmanned aerial vehicle (UAV)‑mounted RIS and a high‑altitude platform (HAP)‑mounted RIS. We formulate an average sum‑rate maximization problem by jointly optimizing the TBS and SAT precoders, the RIS phase shift matrices, and the three‑dimensional trajectories of the UAV and the HAP, subject to transmit power, unit‑modulus, and mobility constraints. The resulting optimization problem is highly non‑convex due to the strong coupling among the transmit precoders, RIS phase shifts, and aerial platform mobility. To efficiently address this challenge, we propose a block coordinate descent (BCD) framework that integrates weighted minimum mean square error (WMMSE) optimization for precoder design, a manifold‑based Riemannian conjugate gradient (RCG) method for RIS phase‑shift optimization, and successive convex approximation (SCA) for trajectory optimization. The proposed algorithm is shown to converge to a stationary point. The simulation results show that the proposed joint design achieves an approximately 7.05% higher average sum‑rate compared to the random RIS scheme, highlighting the effectiveness of dual‑aerial RIS deployment and joint communication‑mobility optimization in ITNTNs.
Authors: Vangara Saiprudhvi, Keshav Singh, Hariharan Subramaniyam, Chih-Peng Li
Abstract: Integrated terrestrial and non‑terrestrial networks (ITNTNs) are envisioned as a key paradigm for sixth‑generation (6G) wireless systems, enabling seamless global connectivity. In this paper, we investigate a dual‑aerial active reconfigurable intelligent surface (ARIS)‑assisted non‑orthogonal multiple access (NOMA)‑based ITNTN, where a terrestrial base station (TBS) and a satellite (SAT) simultaneously serve terrestrial and satellite users with the aid of a UAV‑mounted ARIS and a HAP‑mounted ARIS. Users are multiplexed via power‑domain NOMA with a predefined SIC decoding order. We formulate an average sum‑rate maximization problem by jointly optimizing transmit beamforming, ARIS coefficients, and the 3D trajectories of the UAV and HAP, subject to power, unit‑modulus, ARIS power, and mobility constraints. The problem is highly non‑convex due to coupled variables, nonlinear SINR expressions, ARIS amplification, and trajectory‑dependent channels. To address this, a block coordinate descent (BCD)‑based framework is proposed. Specifically, beamforming is optimized via WMMSE, ARIS phase shifts via a manifold‑based RCG method, amplification factors via SCA, and trajectories via first‑order approximations. The proposed algorithm is guaranteed to converge to a stationary point. Simulation results demonstrate that the proposed design achieves significant performance gains over benchmark schemes. In particular, it provides an average sum‑rate improvement of approximately 8.44% over passive RIS under given power constraints, highlighting the benefits of dual‑aerial ARIS and joint communication‑mobility optimization.
Authors: Ou Zheng, Ruyi Feng, Yufeng Yang, Shengxuan Ding, Lishengsa Yue, Ye Li, Yunhan Zheng, Minwei Kong, Dingyi Zhuang, Ao Qu, Zhibin Li, Meng Li, Dongjie Wang, Wangyang Ying
Abstract: Intelligent Transportation Systems increasingly depend on heterogeneous data from roadside cameras, UAV imagery, LiDAR, and in‑vehicle sensors, yet the lack of unified data standards, model interfaces, and evaluation protocols across these sources hampers reproducibility, cross‑dataset benchmarking, and cross‑region transferability of research findings. Existing trajectory datasets follow incompatible conventions for coordinate systems, object representations, and metadata fields, forcing researchers to build custom preprocessing pipelines for each dataset and simulator combination. To address these challenges, we propose Ozone, a unified platform for transportation research organized around five interconnected layers ‑‑ Hardware, Data, Model, Evaluation, and Prototype ‑‑ each with standardized schemas, automated conversion pipelines, and interoperable interfaces. In the first release, the data schema unifies four trajectory datasets ‑‑ NGSIM, highD, CitySim, and UTE ‑‑ into a canonical format with oriented bounding boxes, kinematic variables, and pre‑computed surrogate safety measures. Digital‑twin maps in CARLA and calibrated traffic models provide integrated benchmarking environments. Case studies in human‑factor research, traffic scene generation, and safety‑critical modeling demonstrate that Ozone reduces experiment setup time by 85%, achieves 91% cross‑city transfer efficiency for safety models, and improves cross‑dataset reproducibility to within 3% variance. The source code and datasets are publicly available.
Authors: Wenhao Wang, Yanyan Li, Long Jiao, Jiawei Yuan
Abstract: Recent advances in large language models (LLMs) provide robots with contextual reasoning abilities to comprehend human instructions. Yet, current LLM‑enabled robots typically depend on cloud‑based models or high‑performance computing infrastructure, which limits their deployment on robots with unreliable internet access or constrained computational resources, such as UAVs and small ground vehicles. Thus, deploying fine‑tuned small language models (SLMs) that support onboard deployment offers a promising alternative. This paper introduces Ro‑SLM, a framework that enables reliable SLM‑driven robot operation by distilling LLMs' knowledge and reasoning. Ro‑SLM starts from dataset synthesis by leveraging LLMs to generate diverse task instructions, produce corresponding ground truth code with minimal human assistance, and augment instructions into real‑world application scenarios. Ro‑SLM is then fine‑tuned on the dataset, with an LLM serving as a reward function to guide the training. Extensive experiments on UAV operation tasks demonstrate that Ro‑SLM improves the performance of the SLM from being incapable of supporting robotic task planning and code generation to achieving performance that approaches that of LLMs.
Authors: Yizhe Zhang, Jianping Li, Liangliang Yin, Zhen Dong, Bisheng Yang
Abstract: Human‑in‑the‑loop (HITL) UAV operation is essential in complex and safety‑critical aerial surveying environments, where human operators provide navigation intent while onboard autonomy must maintain accurate and robust state estimation. A key challenge in this setting is that resource‑constrained UAV platforms are often limited to narrow‑field‑of‑view LiDAR sensors. In geometrically degenerate or feature‑sparse scenes, limited sensing coverage often weakens LiDAR Inertial Odometry (LIO)'s observability, causing drift accumulation, degraded geometric accuracy, and unstable state estimation, which directly compromise safe and effective HITL operation and the reliability of downstream surveying products. To overcome this limitation, we present AWARE, a bio‑inspired whole‑body active yawing framework that exploits the UAV's own rotational agility to extend the effective sensor horizon and improve LIO's observability without additional mechanical actuation. The core of AWARE is a differentiable Model Predictive Control (MPC) framework embedded in a Reinforcement Learning (RL) loop. It first identifies the viewing direction that maximizes information gain across the full yaw space, and a lightweight RL agent then adjusts the MPC cost weights online according to the current environmental context, enabling an adaptive balance between estimation accuracy and flight stability. A Safe Flight Corridor mechanism further ensures operational safety within this HITL paradigm by decoupling the operator's navigational intent from autonomous yaw optimization to enable safe and efficient cooperative control. We validate AWARE through extensive experiments in diverse simulated and real‑world environments.
Authors: Qinxiao Ma, Ruiqian Li, Cheng Wang, Yang Wang
Abstract: Unmanned aerial vehicles (UAVs) operating in confined, cluttered environments face significant performance degradation due to nonlinear, time‑varying unmodeled dynamics, such as ground/ceiling effects and wake recirculation, that are unaccounted for in traditional controllers. While learning‑based compensators (e.g., MLPs, TCNs, LSTMs) struggle with historical data dependency, vanishing gradients, and prohibitive computational costs, this work pioneers the integration of a deep photonic reservoir computer (PRC) with feedforward control to overcome these limitations. Harnessing semiconductor laser dynamics and optical feedback, our hardware‑implemented deep PRC architecture achieves intrinsic temporal memory without explicit historical inputs, while reducing training time from hours to milliseconds and slashing inference latency to nanoseconds. Reliable high‑performance CFD simulations capturing proximity‑induced flows demonstrate that deep PRC delivers residual‑force prediction accuracy comparable to or exceeding TCN/MLP baselines, while training only a linear readout layer via ridge regression. By injecting these predictions into a nonlinear feedback PID controller via a feedforward channel, the framework significantly enhances closed‑loop tracking stability in confined spaces. Essentially, this work establishes the first deep PRC‑based lightweight, ultrafast solution for real‑time UAV dynamic compensation, with promising extensibility to unseen scenarios with more complex fluid environments.
Authors: Yu Wu, Guangzeng Han, Ibra Niang Niang, Francia Ravelombola, Maiara Oliveira, Jason Davis, Dong Chen, Feng Lin, Xiaolei Huang
Abstract: To improve crop genetics, high‑throughput, effective and comprehensive phenotyping is a critical prerequisite. While such tasks were traditionally performed manually, recent advances in multimodal foundation models, especially in vision‑language models (VLMs), have enabled more automated and robust phenotypic analysis. However, plant science remains a particularly challenging domain for foundation models because it requires domain‑specific knowledge, fine‑grained visual interpretation, and complex biological and agronomic reasoning. To address this gap, we develop PlantXpert, an evidence‑grounded multimodal reasoning benchmark for soybean and cotton phenotyping. Our benchmark provides a structured and reproducible framework for agronomic adaptation of VLMs, and enables controlled comparison between base models and their domain‑adapted counterparts. We constructed a dataset comprising 385 digital images and more than 3,000 benchmark samples spanning key plant science domains including disease, pest control, weed management, and yield. The benchmark can assess diverse capabilities including visual expertise, quantitative reasoning, and multi‑step agronomic reasoning. A total of 11 state‑of‑the‑art VLMs were evaluated. The results indicate that task‑specific fine‑tuning leads to substantial improvement in accuracy, with models such as Qwen3‑VL‑4B and Qwen3‑VL‑30B achieving up to 78%. At the same time, gains from model scaling diminish beyond a certain capacity, generalization across soybean and cotton remains uneven, and quantitative as well as biologically grounded reasoning continue to pose substantial challenges. These findings suggest that PlantXpert can serve as a foundation for assessing evidence‑grounded agronomic reasoning and for advancing multimodal model development in plant science.
Authors: Negar Fathi
Abstract: Autonomous Unmanned Aerial Vehicles (UAVs) must reliably detect thin obstacles such as wires, poles, and branches to navigate safely in real‑world environments. These structures remain difficult to perceive because they occupy few pixels, often exhibit weak visual contrast, and are strongly affected by class imbalance. Existing segmentation methods primarily target coarser obstacles and do not fully exploit the complementary multimodal cues needed for thin‑structure perception. We present EDFNet, a modular early‑fusion segmentation framework that integrates RGB, depth, and edge information for thin‑obstacle perception in cluttered aerial scenes. We evaluate EDFNet on the Drone Depth and Obstacle Segmentation (DDOS) dataset across sixteen modality‑backbone configurations using U‑Net and DeepLabV3 in pretrained and non‑pretrained settings. The results show that early RGB‑Depth‑Edge fusion provides a competitive and well‑balanced baseline, with the most consistent gains appearing in boundary‑sensitive and recall‑oriented metrics. The pretrained RGBDE U‑Net achieves the best overall performance, with the highest Thin‑Structure Evaluation Score (0.244), mean IoU (0.219), and boundary IoU (0.234), while maintaining competitive runtime performance (19.62 FPS) on our evaluation hardware. However, performance on the rarest ultra‑thin categories remains low across all models, indicating that reliable ultra‑thin segmentation is still an open challenge. Overall, these findings position early RGB‑Depth‑Edge fusion as a practical and modular baseline for thin‑obstacle segmentation in UAV navigation.
Authors: Jinquan Yan, Zhicheng Zhao, Zhengzheng Tu, Chenglong Li, Jin Tang, Bin Luo
Abstract: UAV images are critical for applications such as large‑area mapping, infrastructure inspection, and emergency response. However, in real‑world flight environments, a single image is often affected by multiple degradation factors, including rain, haze, and noise, undermining downstream task performance. Current unified restoration approaches typically rely on implicit degradation representations that entangle multiple factors into a single condition, causing mutual interference among heterogeneous corrections. To this end, we propose DAME‑Net, a Degradation‑Aware Mixture‑of‑Experts Network that decouples explicit degradation perception from degradation‑conditioned reconstruction for compositional UAV image restoration. Specifically, we design a Factor‑wise Degradation Perception module (FDPM) to provide explicit per‑factor degradation cues for the restoration stage through multi‑label prediction with label‑similarity‑guided soft alignment, replacing implicit entangled conditions with interpretable and generalizable degradation descriptions. Moreover, we develop a Conditioned Decoupled MoE module (CDMM) that leverages these cues for stage‑wise conditioning, spatial‑frequency hybrid processing, and mask‑constrained decoupled expert routing, enabling selective factor‑specific correction while suppressing irrelevant interference. In addition, we construct the Multi‑Degradation UAV Restoration benchmark (MDUR), the first large‑scale UAV benchmark for compositional UAV image restoration, with 43 degradation configurations from single degradations to four‑factor composites and standardized seen/unseen splits. Extensive experiments on MDUR demonstrate consistent improvements over representative unified restoration methods, with greater gains on unseen and higher‑order composite degradations. Downstream experiments further validate benefits for UAV object detection.
Authors: Junhui Gao, Yan Pan, Qianru Wang, Wenzhe Hou, Yiqin Deng, Liangliang Jiang, Yuguang Fang
Abstract: Instant delivery, shipping items before critical deadlines, is essential in daily life. While multiple delivery agents, such as couriers, Unmanned Aerial Vehicles (UAVs), and crowdsourced agents, have been widely employed, each of them faces inherent limitations (e.g., low efficiency/labor shortages, flight control, and dynamic capabilities, respectively), preventing them from meeting the surging demands alone. This paper proposes TriDeliver, the first hierarchical cooperative framework integrating human couriers, UAVs, and crowdsourced ground vehicles (GVs) for efficient instant delivery. To obtain the initial scheduling knowledge for GVs and UAVs and to improve cooperative delivery performance, we design a Transfer Learning (TL)‑based algorithm that extracts delivery knowledge from couriers' behavioral history and transfers it to UAVs and GVs with fine‑tuning; the transferred knowledge is then used to dispatch parcels for efficient delivery. Evaluations on one‑month real‑world trajectory and delivery datasets demonstrate that 1) by integrating couriers, UAVs, and crowdsourced GVs, TriDeliver reduces the delivery cost by 65.8% versus state‑of‑the‑art cooperative delivery by UAVs and couriers; and 2) TriDeliver achieves further improvements in delivery time (‑17.7%), delivery cost (‑9.8%), and impact on the original tasks of crowdsourced GVs (‑43.6%), even when the transferred knowledge is represented by simple neural networks.
Authors: Wen Qiu, Zhiqiang He, Wei Zhao, Hiroshi Masui
Abstract: Unmanned aerial vehicles serving as aerial base stations can rapidly restore connectivity after disasters, yet abrupt changes in user mobility and traffic demands shift the quality of service trade‑offs and induce strong non‑stationarity. Deep reinforcement learning policies suffer from plasticity loss under such shifts, as representation collapse and neuron dormancy impair adaptation. We propose plasticity enhanced multi‑agent mixture of experts (PE‑MAMoE), a centralized training with decentralized execution framework built on multi‑agent proximal policy optimization. PE‑MAMoE equips each UAV with a sparsely gated mixture of experts actor whose router selects a single specialist per step. A non‑parametric Phase Controller injects brief, expert‑only stochastic perturbations after phase switches, resets the action log‑standard‑deviation, anneals entropy and learning rate, and schedules the router temperature, all to re‑plasticize the policy without destabilizing safe behaviors. We derive a dynamic regret bound showing the tracking error scales with both environment variation and cumulative noise energy. In a phase‑driven simulator with mobile users and 3GPP‑style channels, PE‑MAMoE improves normalized interquartile mean return by 26.3% over the best baseline, increases served‑user capacity by 12.8%, and reduces collisions by approximately 75%. Diagnostics confirm persistently higher expert feature rank and periodic dormant‑neuron recovery at regime switches.
Authors: Zexin Fang, Bin Han, Anjie Qiu, Zhuojun Tian, Hans D. Schotten
Abstract: Reliable positioning is essential for Uncrewed Aerial Vehicles (UAVs) in safety‑critical urban operations, yet achieving sub‑meter accuracy under stringent latency constraints remains challenging. While 3rd Generation Partnership Project (3GPP) specifies repeated Positioning Reference Signals (PRS) transmissions for accurate Time Difference of Arrival (TDoA) measurements, denoising techniques specifically tailored for extremely limited measurement sequences within 3GPP frameworks remain underexplored. We propose Adaptive Gain Exponential Smoother (AGES), a lightweight filter combining exponentially weighted averaging with adaptive gains informed by 3GPP measurement quality reports. Simulations demonstrate AGES achieves 30‑40% reduction in positioning error with only 3‑5 repeated measurements while maintaining Fifth Generation New Radio (5G‑NR) infrastructure compatibility.
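The idea of exponentially weighted averaging with a gain that depends on reported measurement quality can be sketched as follows. This is a minimal illustration, not the AGES filter itself: the linear mapping from quality to gain, the `base_gain` parameter, and the normalized quality scale in [0, 1] are all assumptions, since the abstract does not specify how quality reports enter the gain.

```python
def adaptive_gain_smooth(measurements, qualities, base_gain=0.5):
    """Exponential smoother whose gain scales with measurement quality.

    measurements: short sequence of repeated scalar measurements
                  (e.g. TDoA values for one anchor pair).
    qualities:    hypothetical per-measurement quality in [0, 1],
                  standing in for a measurement quality report.
    """
    if not measurements or len(measurements) != len(qualities):
        raise ValueError("need equally long, non-empty sequences")
    estimate = measurements[0]
    for z, q in zip(measurements[1:], qualities[1:]):
        # Low-quality measurements get a small gain and barely move the estimate.
        gain = base_gain * min(max(q, 0.0), 1.0)
        estimate += gain * (z - estimate)
    return estimate


samples = [10.0, 10.2, 30.0, 9.8]          # third sample is an outlier
flagged = adaptive_gain_smooth(samples, [1.0, 1.0, 0.0, 1.0])
uniform = adaptive_gain_smooth(samples, [1.0, 1.0, 1.0, 1.0])
print(flagged, uniform)
```

With only four samples, a fixed-gain smoother is pulled far off by the single outlier, whereas the quality-weighted version stays close to the consistent measurements.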
Authors: Assane Sankara, Daniel Bonilla Licea, Hajar El Hammouti
Abstract: Unmanned Aerial Vehicles (UAVs) have emerged as a key enabler technology for data collection from Internet of Things (IoT) devices. However, effective data collection is challenged by resource constraints and the need for real‑time decision‑making. In this work, we propose a novel framework that integrates semantic communication with UAV command‑and‑control (C&C) to enable efficient image data collection from IoT devices. Each device uses Deep Joint Source‑Channel Coding (DeepJSCC) to generate a compact semantic latent representation of its image to enable image reconstruction even under partial transmission. A base station (BS) controls the UAV's trajectory by transmitting acceleration commands. The objective is to maximize the average quality of reconstructed images by maintaining proximity to each device for a sufficient duration within a fixed time horizon. To address the challenging trade‑off and account for delayed C&C signals, we model the problem as a Markov Decision Process and propose a Double Deep Q‑Learning (DDQN)‑based adaptive flight policy. Simulation results show that our approach outperforms baseline methods such as greedy and traveling salesman algorithms, in both device coverage and semantic reconstruction quality.
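For readers unfamiliar with the Double DQN update mentioned above, the following sketch shows the standard target computation: the online network selects the next action and the target network evaluates it. The function name and the plain-list Q-values are illustrative; the abstract's state, action, and reward definitions (including the handling of delayed C&C signals) are not reproduced.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard Double DQN bootstrap target for one transition.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


online = [0.2, 0.9, 0.5]
target = [1.0, 0.4, 2.0]
y_double = double_dqn_target(1.0, online, target, gamma=0.9)
y_vanilla = 1.0 + 0.9 * max(target)   # plain DQN target, for comparison
print(y_double, y_vanilla)
```

Decoupling selection from evaluation avoids always bootstrapping from the largest target-network value, which is the usual motivation for preferring Double DQN over plain DQN.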
Authors: Xingyu Xia, Lekai Zhou, Yujie Tang, Xiaozhou Zhu, Hai Zhu, Wen Yao
Abstract: Aerial vision‑and‑language navigation (Aerial VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and autonomously navigate complex three‑dimensional environments by grounding language in visual perception. This survey provides a critical and analytical review of the Aerial VLN field, with particular attention to the recent integration of large language models (LLMs) and vision‑language models (VLMs). We first formally introduce the Aerial VLN problem and define two interaction paradigms: single‑instruction and dialog‑based, as foundational axes. We then organize the body of Aerial VLN methods into a taxonomy of five architectural categories: sequence‑to‑sequence and attention‑based methods, end‑to‑end LLM/VLM methods, hierarchical methods, multi‑agent methods, and dialog‑based navigation methods. For each category, we systematically analyze design rationales, technical trade‑offs, and reported performance. We critically assess the evaluation infrastructure for Aerial VLN, including datasets, simulation platforms, and metrics, and identify their gaps in scale, environmental diversity, real‑world grounding, and metric coverage. We consolidate cross‑method comparisons on shared benchmarks and analyze key architectural trade‑offs, including discrete versus continuous actions, end‑to‑end versus hierarchical designs, and the simulation‑to‑reality gap. Finally, we synthesize seven concrete open problems: long‑horizon instruction grounding, viewpoint robustness, scalable spatial representation, continuous 6‑DoF action execution, onboard deployment, benchmark standardization, and multi‑UAV swarm navigation, with specific research directions grounded in the evidence presented throughout the survey.
Authors: Xiaohuan Li, Junchuan Fan, Bingqi Zhang, Rong Yu, Xumin Huang, Qian Chen
Abstract: To implement the intelligent transportation digital twin (ITDT), unmanned aerial vehicles (UAVs) are scheduled to process the sensing data from roadside sensors. Generative artificial intelligence (GAI) technologies such as diffusion models are deployed on the UAVs to transform the raw sensing data into high‑quality, valuable data. We therefore propose the GAI‑empowered ITDT. The dynamic processing of a set of diffusion model inference (DMI) tasks on mobile UAVs simultaneously influences the DT updating fidelity and delay. In this paper, we formulate the joint optimization of DMI task offloading, inference optimization, and UAV trajectory planning as a system utility maximization (SUM) problem to address the fidelity‑delay tradeoff for the GAI‑empowered ITDT. To solve the problem under network dynamics, we model the SUM problem as a heterogeneous‑agent Markov decision process and propose the sequential update‑based heterogeneous‑agent twin delayed deep deterministic policy gradient (SU‑HATD3) algorithm, which can quickly learn a near‑optimal solution. Numerical results demonstrate that, compared with several baseline algorithms, the proposed algorithm offers clear advantages in system utility and convergence rate.
Authors: Kota Kondo, Jesús Tordesillas, Jonathan P. How
Abstract: SANDO is a safe trajectory planner for 3D dynamic unknown environments, where obstacle locations and motions are unknown a priori and a collision‑free plan can become unsafe at any moment, requiring fast replanning. Existing soft‑constraint planners are fast but cannot guarantee collision‑free paths, while hard‑constraint methods ensure safety at the cost of longer computation. SANDO addresses this trade‑off through three contributions. First, a heat map‑based A* global planner steers paths away from high‑risk regions using soft costs, and a spatiotemporal safe flight corridor (STSFC) generator produces time‑layered polytopes that inflate obstacles only by their worst‑case reachable set at each time layer, rather than by the worst case over the entire horizon. Second, trajectory optimization is formulated as a Mixed‑Integer Quadratic Program (MIQP) with hard collision‑avoidance constraints, and a variable elimination technique reduces the number of decision variables, enabling fast computation. Third, a formal safety analysis establishes collision‑free guarantees under explicit velocity‑bound and estimation‑error assumptions. Ablation studies show that variable elimination yields up to 7.4x speedup in optimization time, and that STSFCs are critical for feasibility in dense dynamic environments. Benchmark simulations against state‑of‑the‑art methods across standardized static benchmarks, obstacle‑rich static forests, and dynamic environments show that SANDO consistently achieves the highest success rate with no constraint violations across all difficulty levels; perception‑only experiments without ground truth obstacle information confirm robust performance under realistic sensing. Hardware experiments on a UAV with fully onboard planning, perception, and localization demonstrate six safe flights in static environments and ten safe flights among dynamic obstacles.
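The difference between per-layer inflation and whole-horizon inflation can be made concrete with a small sketch. It assumes a spherical obstacle with a known radius and a constant speed bound `v_max`, and it takes the end time of each layer as that layer's worst case; these modeling choices and all names are illustrative assumptions, and the actual polytope construction is not reproduced.

```python
def layered_inflation(radius, v_max, horizon, num_layers):
    """Inflated obstacle radius for each time layer of the planning horizon.

    Layer k covers the interval [k * dt, (k + 1) * dt]; within it the obstacle
    can have moved at most v_max * (k + 1) * dt from its observed position.
    """
    dt = horizon / num_layers
    return [radius + v_max * (k + 1) * dt for k in range(num_layers)]


def horizon_inflation(radius, v_max, horizon):
    """Single conservative radius covering the whole horizon."""
    return radius + v_max * horizon


per_layer = layered_inflation(radius=0.5, v_max=2.0, horizon=2.0, num_layers=4)
whole = horizon_inflation(radius=0.5, v_max=2.0, horizon=2.0)
print(per_layer)   # grows layer by layer
print(whole)       # equals the last layer's radius
```

Early layers keep much more free space than the single whole-horizon radius would, which is consistent with the abstract's observation that time-layered corridors matter for feasibility in dense dynamic scenes.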
Authors: Zhaowen Fan, Rongchao Zhang
Abstract: Autonomous agents operating in dynamic and safety‑critical environments require decision‑making frameworks that are both computationally efficient and physically grounded. However, many existing approaches rely on end‑to‑end learning, which often lacks interpretability and explicit mechanisms for ensuring consistency with physical constraints. In this work, we propose an event‑centric world modeling framework with memory‑augmented retrieval for embodied decision‑making. The framework represents the environment as a structured set of semantic events, which are encoded into a permutation‑invariant latent representation. Decision‑making is performed via retrieval over a knowledge bank of prior experiences, where each entry associates an event representation with a corresponding maneuver. The final action is computed as a weighted combination of retrieved solutions, providing a transparent link between decision and stored experiences. The proposed design enables structured abstraction of dynamic environments and supports interpretable decision‑making through case‑based reasoning. In addition, incorporating physics‑informed knowledge into the retrieval process encourages the selection of maneuvers that are consistent with observed system dynamics. Experimental evaluation in UAV flight scenarios demonstrates that the framework operates within real‑time control constraints while maintaining interpretable and consistent behavior.
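The retrieval step described above, where the action is a weighted combination of stored maneuvers, can be sketched as a softmax-weighted nearest-neighbor lookup. The distance metric, the softmax weighting, the `temperature` parameter, and the toy knowledge bank are assumptions made for illustration; the event encoder and the physics-informed terms are not modeled.

```python
import math


def retrieve_action(query, bank, k=2, temperature=1.0):
    """Blend the maneuvers of the k stored entries closest to the query.

    bank: list of (event_embedding, maneuver) pairs, both as tuples of floats.
    Returns the blended maneuver and the weights of the retrieved entries,
    so each decision can be traced back to specific stored experiences.
    """
    nearest = sorted(bank, key=lambda entry: math.dist(query, entry[0]))[:k]
    logits = [-math.dist(query, emb) / temperature for emb, _ in nearest]
    peak = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    weights = [value / total for value in exps]
    dim = len(nearest[0][1])
    action = [
        sum(w * maneuver[i] for w, (_, maneuver) in zip(weights, nearest))
        for i in range(dim)
    ]
    return action, weights


bank = [
    ((0.0, 0.0), (1.0, 0.0)),    # e.g. "climb"
    ((2.0, 0.0), (0.0, 1.0)),    # e.g. "turn"
    ((10.0, 10.0), (5.0, 5.0)),  # unrelated experience, should not be retrieved
]
action, weights = retrieve_action((1.0, 0.0), bank, k=2)
print(action, weights)
```

Returning the weights alongside the action is what makes the decision inspectable: each output can be attributed to the retrieved cases in proportion to their weight.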
Authors: Md Sharif Hossen, Vijay K. Shah, Ismail Guvenc
Abstract: Cellular‑connected unmanned aerial vehicles (UAVs) operating in 5G New Radio (NR) macro networks experience severe and spatially non‑uniform downlink interference. This is primarily caused by the interference from the sidelobes of downtilted base station (BS) antennas serving terrestrial users, which limits the ability of the network to provide uniform and high‑quality coverage to aerial users. Supporting aerial users requires boosting the coverage of certain cells or sectors, which can further exacerbate inter‑cell interference in dense macro deployments. This motivates the need for inter‑cell interference coordination (ICIC) in multi‑cell 5G NR networks serving both aerial and terrestrial users. In this work, we propose an ICIC framework that jointly optimizes antenna‑domain coordination through BS uptilt angle optimization and time‑domain interference coordination (TDIC) through NR‑compliant scheduling. The framework is formulated as a multi‑cell NR macro deployment problem that maximizes the minimum UAV signal‑to‑interference ratio (SIR) over a spatial grid of UAV locations while maintaining acceptable performance for ground user equipment (GUEs). The resulting optimization problem is non‑convex and is solved using bio‑inspired optimization techniques, including particle swarm optimization (PSO) and genetic algorithm (GA). Simulation results demonstrate that coordinated uptilt optimization with the booster‑cell architecture significantly improves worst‑case UAV SIR and downlink reliability in multi‑cell 5G NR networks.
Authors: Zhaochen Chu, Tao Song, Ren Jin, Shaoming He, Defu Lin, Siqing Cheng
Abstract: Air‑to‑air tracking of swarm UAVs presents significant challenges due to the complex nonlinear group motion and weak visual cues for small objects, which often cause detection failures, trajectory fragmentation, and identity switches. Although existing methods have attempted to improve performance by incorporating trajectory prediction, they model each object independently, neglecting the swarm‑level motion dependencies. Their limited integration between motion prediction and appearance representation also weakens the spatio‑temporal consistency required for tracking in visually ambiguous and cluttered environments, making it difficult to maintain coherent trajectories and reliable associations. To address these challenges, we propose SCT‑MOT, a tracking framework that integrates Swarm‑Coupled motion modeling and Trajectory‑guided feature fusion. First, we develop a Swarm Motion‑Aware Trajectory Prediction (SMTP) module that jointly models historical trajectories and posture‑aware appearance features from a swarm‑level perspective, enabling more accurate forecasting of the nonlinear, coupled group trajectories. Second, we design a Trajectory‑Guided Spatio‑Temporal Feature Fusion (TG‑STFF) module that aligns predicted positions with historical visual cues and deeply integrates them with current frame features, enhancing temporal consistency and spatial discriminability for weak objects. Extensive experiments on three public air‑to‑air swarm UAV tracking datasets, including AIRMOT, MOT‑FLY, and UAVSwarm, demonstrate that SMTP achieves more accurate trajectory forecasts and yields a 1.21% IDF1 improvement over the state‑of‑the‑art trajectory prediction module EqMotion when integrated into the same MOT framework. Overall, our SCT‑MOT consistently achieves superior accuracy and robustness compared to state‑of‑the‑art trackers across multiple metrics under complex swarm scenarios.
Authors: Sergio Vargas Villar, Simran Singh, Özgür Özdemir, Mihail L. Sichitiu, İsmail Güvenç
Abstract: This paper presents a field‑based evaluation of Long Range Wide Area Network (LoRaWAN) signal propagation conducted at two locations within the Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) testbed: Lake Wheeler Field and NC State University's Centennial Campus. Three distinct transmission platforms were deployed: a ground vehicle, a multirotor drone at 50 meters, and a helikite at steady altitudes of approximately 150 meters and 300 meters. These platforms enabled a comparative study on how altitude, mobility, and terrain influence wireless signal reception across a LoRaWAN gateway network. We analyze received signal strength (RSSI) and signal‑to‑noise ratio (SNR) as functions of distance and spreading factor (SF). Three complementary metrics are visualized: SNR versus distance with demodulation thresholds, probability of successful reception, and SNR boxplots grouped by distance bins. These plots reveal link degradation patterns and demonstrate the role of adaptive SF selection in maintaining communication reliability. To characterize propagation behavior, we apply a log‑distance path loss model to empirical data from the ground vehicle experiment, which encompass both rural and urban non‑line‑of‑sight (NLOS) conditions. Model parameters are optimized through error minimization techniques. Our results show that the helikite platform, due to its stable high‑altitude position, provided the most reliable and consistent link performance. Conversely, the drone and vehicle exhibited higher variability due to movement, obstructions, and terrain‑induced multipath. These findings demonstrate the influence of platform dynamics and altitude on LoRaWAN reception performance, providing support for future aerial network planning efforts.
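As an illustration of fitting a log‑distance path loss model, the sketch below uses ordinary least squares on PL(d) = PL0 + 10 n log10(d / d0). Least squares is assumed here as one possible error‑minimization technique; the abstract does not say which one the authors used. The samples are synthetic and noise‑free, not measurements from the paper.

```python
import math


def fit_log_distance(distances_m, path_loss_db, d0=1.0):
    """Least-squares fit of PL(d) = PL0 + 10 * n * log10(d / d0)."""
    x = [10.0 * math.log10(d / d0) for d in distances_m]
    n_pts = len(x)
    mean_x = sum(x) / n_pts
    mean_y = sum(path_loss_db) / n_pts
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, path_loss_db))
    n_exp = sxy / sxx                # path loss exponent n
    pl0 = mean_y - n_exp * mean_x    # reference loss PL0 at distance d0
    return pl0, n_exp


# Synthetic samples generated from assumed parameters (PL0 = 40 dB, n = 2.7).
distances = [100.0, 200.0, 500.0, 1000.0, 2000.0]
true_pl0, true_n = 40.0, 2.7
samples = [true_pl0 + 10.0 * true_n * math.log10(d) for d in distances]
pl0_hat, n_hat = fit_log_distance(distances, samples)
```

With real measurements the fit would not be exact, and the residuals would give an estimate of the shadowing spread.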
Authors: Alberto Piccina, Massimiliano Bertoni, Angelo Cenedese, Giulia Michieletto
Abstract: From a maneuverability perspective, the main advantage of tilting multirotor UAVs lies in the dynamic variability of the feasible executable wrench, which represents a key asset for physical interaction tasks. Accordingly, cant‑angle selection should be optimized to ensure high performance while avoiding abrupt variations and preserving real‑world feasibility. In this context, this work proposes a lightweight control framework for star‑shaped interdependent cant‑tilting hexarotor UAVs performing interaction tasks. The method uses an offline‑computed look‑up table of zero‑moment force polytopes to identify feasible cant angles for a desired control force and select the optimal one by balancing efficiency and smoothness. The framework is integrated with a geometric full‑pose controller and validated through Monte Carlo simulations in MATLAB/Simulink and compared against a baseline strategy. The results show a significant reduction in computation time, together with improved pose‑tracking performance and competitive actuation efficiency. A final physics‑based simulation of a complete wall inspection task in Simscape further confirms the feasibility of the proposed strategy in interacting scenarios.
Authors: Marcello Sorge, Federico Ciresola, Giulia Michieletto, Angelo Cenedese
Abstract: This paper focuses on dynamic control allocation for a hexarotor UAV platform, considering a trajectory tracking task as a case study. It is assumed that the platform is dual‑tilting, meaning that it is able to tilt each propeller independently during flight, along two orthogonal axes. We present a hierarchical control structure composed of a high‑level controller generating the required wrench for the tracking task, and a control allocation law ensuring that the actuators produce such wrench. The allocator imposes desired first‑order dynamics on the actuator set, and exploits system redundancy to optimize the actuator state with respect to a given objective function. Unlike other studies on the subject, we explicitly model actuator saturation and provide theoretical insights on its effect on control performance. We also investigate the role of propeller tilt angles, by imposing asymmetric shapes in the objective function. Numerical simulations are presented to validate the allocation strategy.
Authors: Jintao Sun, Gangyi Ding, Donglin Di, Hu Zhang, Zhedong Zheng
Abstract: Vision‑Language Models have achieved strong progress in ground‑view visual understanding, yet they remain brittle in high‑altitude Unmanned Aerial Vehicle scenes, where objects are tiny and densely packed, textures are repetitive, and top‑down orientations are ambiguous. We introduce UAVReason, a large‑scale UAV‑native dataset and evaluation suite for studying unified aerial reasoning and generation under this nadir‑view domain shift. UAVReason aligns RGB imagery, depth maps, semantic segmentation masks, captions, and question‑answer pairs within a consistent aerial domain. It contains 23.6K captioned frames, 273K VQA pairs including 68.2K two‑frame temporal questions, and 188.8K cross‑modal generation samples across RGB, depth, and segmentation modalities. We further adapt UAVReason‑Bagel as a unified understanding‑and‑generation baseline that jointly optimizes language reasoning and dense visual generation objectives. Experiments show that general‑purpose VLMs and off‑the‑shelf unified generators struggle with UAV‑native grounding, while UAVReason‑Bagel substantially improves over its pretrained counterpart, increasing VQA‑1F F1 from 0.394 to 0.711, VQA‑2F F1 from 0.427 to 0.822, and heading‑aware VQA F1 from 0.798 to 0.973. For generation, it improves segmentation mIoU to 0.143 and reduces KID from 0.078 to 0.048 for depth‑segmentation‑text‑conditioned RGB synthesis. More importantly, our ablations reveal a bidirectional synergy between synthesis and reasoning. Dense generation objectives improve temporal semantic consistency, while language‑level reasoning regularizes sparse‑condition image synthesis. These results suggest that unified reasoning and generation provide effective geometry‑aware structural priors for physically grounded aerial intelligence. All data, code, and evaluation tools will be released.
Authors: Akram Hossain, Rabab Abdelfattah, Xiaofeng Wang, Kareem Abdelfatah
Abstract: The deployment of lightweight segmentation models on drones for autonomous power line inspection presents a critical challenge: maintaining reliable performance under real‑world conditions that differ from training data. Although compact architectures such as U‑Net enable real‑time onboard inference, their segmentation outputs can degrade unpredictably in adverse environments, raising safety concerns. In this work, we study the feasibility of using a large language model (LLM) as a semantic judge to assess the reliability of power line segmentation results produced by drone‑mounted models. Rather than introducing a new inspection system, we formalize a watchdog scenario in which an offboard LLM evaluates segmentation overlays and examine whether such a judge can be trusted to behave consistently and perceptually coherently. To this end, we design two evaluation protocols that analyze the judge's repeatability and sensitivity. First, we assess repeatability by repeatedly querying the LLM with identical inputs and fixed prompts, measuring the stability of its quality scores and confidence estimates. Second, we evaluate perceptual sensitivity by introducing controlled visual corruptions (fog, rain, snow, shadow, and sunflare) and analyzing how the judge's outputs respond to progressive degradation in segmentation quality. Our results show that the LLM produces highly consistent categorical judgments under identical conditions while exhibiting appropriate declines in confidence as visual reliability deteriorates. Moreover, the judge remains responsive to perceptual cues such as missing or misidentified power lines, even under challenging conditions. These findings suggest that, when carefully constrained, an LLM can serve as a reliable semantic judge for monitoring segmentation quality in safety‑critical aerial inspection tasks.
Authors: Selim Ahmet Iz, Francesco Nex, Norman Kerle, Henry Meissner, Ralf Berger
Abstract: Real‑time depth reconstruction from ultra‑high‑resolution UAV imagery is essential for time‑critical geospatial tasks such as disaster response, yet remains challenging due to wide‑baseline parallax, large image sizes, low‑texture or specular surfaces, occlusions, and strict computational constraints. Recent zero‑shot diffusion models offer fast per‑image dense predictions without task‑specific retraining, and require fewer labelled datasets than transformer‑based predictors while avoiding the rigid capture geometry requirement of classical multi‑view stereo. However, their probabilistic inference prevents reliable metric accuracy and temporal consistency across sequential frames and overlapping tiles. We present ZeD‑MAP, a cluster‑level framework that converts a test‑time diffusion depth model into a metrically consistent, SLAM‑like mapping pipeline by integrating incremental cluster‑based bundle adjustment (BA). Streamed UAV frames are grouped into overlapping clusters; periodic BA produces metrically consistent poses and sparse 3D tie‑points, which are reprojected into selected frames and used as metric guidance for diffusion‑based depth estimation. Validation on ground‑marker flights captured at approximately 50 m altitude (GSD is approximately 0.85 cm/px, corresponding to 2,650 square meters ground coverage per frame) with the DLR Modular Aerial Camera System (MACS) shows that our method achieves sub‑meter accuracy, with approximately 0.87 m error in the horizontal (XY) plane and 0.12 m in the vertical (Z) direction, while maintaining per‑image runtimes between 1.47 and 4.91 seconds. Results are subject to minor noise from manual point‑cloud annotation. These findings show that BA‑based metric guidance provides consistency comparable to classical photogrammetric methods while significantly accelerating processing, enabling real‑time 3D map generation.
Authors: Simran Singh, Anıl Gürses, Özgür Özdemir, Ram Asokan, Mihail L. Sichitiu, İsmail Güvenç, Rudra Dutta, Magreth Mushi
Abstract: The integration of cellular communication with Unmanned Aerial Vehicles (UAVs) extends the range of command and control and payload communications of autonomous UAV applications. Accurate modeling of this air‑to‑ground wireless environment aids UAV mission planning. Models built on and insights obtained from real‑life experiments intricately capture the variations in air‑to‑ground link quality with UAV position, offering more fidelity for simulations and system design than those that rely on generic theoretical models designed for ground scenarios or ray‑tracing simulations. In this work, we conduct aerial flights at the Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) Lake Wheeler testbed to study the variation in key performance indicators (KPIs) of a private 4G/5G cellular base station (BS) with the UAV's altitude, distance from the BS, elevation, and azimuth relative to the BS. Variations in 4G and 5G physical layer KPIs and application layer throughput are logged and analyzed, using two Android smartphones: a Keysight Nemo device, with enhanced KPI access, through a rooted operating system, and a standard smartphone running a custom application that utilizes open‑source Android APIs. The observed signal strength measurements are compared to theoretical predictions from free space path loss models that incorporate the BS antenna radiation patterns. Mathematical model parameters for polynomial curve approximations are derived to fit the observed data. Light machine learning approaches, namely random forests, gradient boosting regressors and neural networks, are used to model KPI behaviour as a function of UAV position relative to the BS. The insights and models generated from real‑life experiments in this study can serve as valuable tools in the design, simulation and deployment of cellular communication‑based UAV systems.
Authors: Maria G. Mendoza, Victoria Marie Tuck, Chinmay Maheshwari, Shankar Sastry
Abstract: A key challenge in disaster response is maintaining situational awareness of an evolving landscape, which requires balancing exploration of unobserved regions with sustained monitoring of changing Regions of Interest (ROIs). Unmanned Aerial Vehicles (UAVs) have emerged as an effective response tool, particularly in applications like environmental monitoring and search‑and‑rescue, due to their ability to provide aerial coverage, withstand hazardous conditions, and navigate quickly and flexibly. However, efficient and adaptable multi‑robot coverage with limited sensing in disaster settings and evolving time‑varying information maps remains a significant challenge, necessitating better methods for UAVs to continuously adapt their trajectories in response to changes. In this paper, we propose a decentralized multi‑agent coverage framework that serves as a high‑level planning strategy for adaptive coverage in unknown, time‑varying environments under partial observability. Each agent computes an adaptive ergodic policy, implemented via a Markov‑chain transition model, that tracks a continuously updated belief over the underlying importance map. Gaussian Processes are used to perform those online belief updates. The resulting policy drives agents to spend time in ROIs proportional to their estimated importance, while preserving sufficient exploration to detect and adapt to time‑varying environmental changes. Unlike existing approaches that assume known importance maps, require centralized coordination, or assume a static environment, our framework addresses the combined challenges of unknown, time‑varying distributions in a more realistic decentralized and partially observable setting. We compare against alternative coverage strategies and analyze our method's response to simulated disaster evolution, highlighting its improved adaptability and transient performance in dynamic scenarios.
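One way to build a Markov chain whose long‑run visit frequencies are proportional to an importance map is a Metropolis‑Hastings construction, sketched below on a ring of cells. This is a stand‑in for illustration, not the authors' policy: the importance values are fixed and made up, whereas the paper updates its belief online with Gaussian Processes.

```python
def metropolis_chain(importance):
    """Transition matrix on a ring whose stationary distribution is
    proportional to `importance` (all entries must be positive)."""
    n = len(importance)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            # Propose each neighbor with probability 1/2,
            # accept with probability min(1, w_j / w_i).
            P[i][j] += 0.5 * min(1.0, importance[j] / importance[i])
        # Rejected proposals keep the agent in its current cell.
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P


importance = [1.0, 4.0, 2.0, 1.0]  # hypothetical importance per cell
P = metropolis_chain(importance)
total = sum(importance)
pi = [w / total for w in importance]
# One step of the chain applied to pi; stationarity means pi_next equals pi.
pi_next = [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(pi))]
```

Recomputing `P` whenever the importance estimate changes gives a simple adaptive variant of the same idea.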
Authors: Ali Akarma, Toqeer Ali Syed, Salman Jan, Hammad Muneer, Abdul Khadar Jilani
Abstract: AI‑based sensing and autonomous monitoring have become central to wildfire early detection, but current systems lack adaptive inter‑agent coordination, structurally defined human control, and cryptographically verifiable accountability. In safety‑critical disasters, purely autonomous alert dissemination risks false alarms, governance failure, and loss of trust in the system. This paper presents a blockchain‑based, governance‑aware agentic AI architecture for trusted wildfire early warning. Wildfire monitoring is modeled as a constrained partially observable Markov decision process (POMDP) that accounts for detection latency, false‑alarm reduction, and resource consumption under explicit governance constraints. Hierarchical multi‑agent coordination enables dynamic, risk‑adaptive reallocation of unmanned aerial vehicles (UAVs). Alongside the risk‑adaptive policies, a permissioned blockchain layer enforces mandatory human authorization as a state‑transition invariant implemented in a smart contract. We establish formal guarantees, including alert integrity, human control, non‑repudiation, and bounded detection latency, under Byzantine fault assumptions. Security analysis shows resistance to alert injection, replay, and tampering attacks. Experimental evaluation in a high‑fidelity simulation environment demonstrates that governance enforcement introduces limited operational overhead, reduces false public alerts, and maintains adaptive detection performance. This work is a step toward a principled design paradigm for reliable AI systems, incorporating accountability into the agentic control loop of safety‑critical disaster intelligence systems.
Authors: Tianhao Liang, Nanchi Su, Yuqi Ping, Guangyu Lei, Xinglin Chen, Longyu Zhou, Tingting Zhang, Qinyu Zhang, Tony Q. S. Quek
Abstract: The emerging low‑altitude economy has catalyzed the large‑scale deployment of unmanned aerial vehicles (UAVs), driving a paradigm shift in environment monitoring, logistics, and emergency response. However, operating within these environments presents notable challenges such as pervasive coverage holes, unpredictable interference, and spectrum scarcity. To this end, this article presents a communication and control co‑design framework to enable a resilient architecture for cellular‑connected UAVs. Specifically, we first characterize typical service applications and their stringent performance requirements, followed by a comprehensive analysis of the unique challenges. To bridge the gap between volatile wireless links and rigid flight stability, a three‑layered architecture is proposed, integrating pre‑flight strategic planning, in‑flight adaptive action, and system‑level resource orchestration. Furthermore, we detail the key enabling technologies for communication and control co‑design. Preliminary case studies are presented to validate that the co‑design framework significantly improves the resilience of cellular‑connected UAV systems, providing a robust foundation for the evolution of intelligent low‑altitude networks.
Authors: Arthur Amorim, Paul Gazzillo, Max Taylor, Lance Joneckis
Abstract: Standard communication protocols for Unmanned Aerial Vehicles (UAVs), such as MAVLink, lack the capability to enforce the contextual validity of message sequences. Autopilots therefore remain vulnerable to stealthy attacks, where syntactically correct but semantically ill‑timed commands induce unsafe states without triggering physical anomaly detectors. Prior work (DATUM) demonstrated that global Refined Multiparty Session Types (RMPSTs) are an effective specification language for centralized MAVLink protocol enforcement, but suffered from two engineering failures: manual proof terms interleaved with protocol definitions, and an OCaml extraction backend whose managed runtime is incompatible with resource‑constrained UAV hardware. We present Platum, a framework that addresses both failures with a minimal DSL requiring only the five semantic components of a global session type (sender, receiver, label, payload variable, refinement predicate), whose structural well‑formedness conditions are confirmed via reflective decision procedures in Meta‑F*. Confirmed specifications are compiled directly into flat, allocation‑free C Finite State Machines (FSMs), deployed as centralized proxy monitors at the GCS/UAV communication boundary. Our evaluation demonstrates a 4x reduction in total monitor latency and lower memory overhead compared to DATUM, measured via ArduPilot SITL simulation.
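The idea of a proxy monitor that rejects syntactically valid but ill‑timed messages can be sketched as a small state machine. The sketch is in Python for readability only (Platum emits C FSMs), and the states, labels, and altitude predicate are hypothetical placeholders rather than actual MAVLink message definitions or Platum output.

```python
class SequenceMonitor:
    """Accepts a message only if the current state allows its label
    and the refinement predicate on its payload holds."""

    def __init__(self, transitions, start):
        # transitions maps (state, label) -> (predicate, next_state)
        self.transitions = transitions
        self.state = start

    def accept(self, label, payload):
        rule = self.transitions.get((self.state, label))
        if rule is None:
            return False  # label not allowed in this state
        predicate, next_state = rule
        if not predicate(payload):
            return False  # payload violates the refinement predicate
        self.state = next_state
        return True


# Hypothetical protocol: arm, then take off to at most 50 m, then land.
transitions = {
    ("idle", "ARM"): (lambda p: True, "armed"),
    ("armed", "TAKEOFF"): (lambda alt: 0 < alt <= 50, "flying"),
    ("flying", "LAND"): (lambda p: True, "idle"),
}
monitor = SequenceMonitor(transitions, "idle")
results = [
    monitor.accept("TAKEOFF", 10),   # rejected: vehicle is not armed yet
    monitor.accept("ARM", None),     # accepted
    monitor.accept("TAKEOFF", 500),  # rejected: altitude predicate fails
    monitor.accept("TAKEOFF", 10),   # accepted
]
```

A rejected message leaves the state unchanged, so a single bad command cannot desynchronize the monitor from the protocol.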
Authors: Yejia Liu, Hengle Jiang, Haoxian Liu, Runxi Huang, Xiaomin Ouyang
Abstract: 3D human pose estimation is a key enabling technology for applications such as healthcare monitoring, human‑robot collaboration, and immersive gaming, but real‑world deployment remains challenged by viewpoint variations. Existing methods struggle to generalize to unseen camera viewpoints, require large amounts of training data, and suffer from high inference latency. We propose MoViD, a viewpoint‑invariant 3D human pose estimation framework that disentangles viewpoint information from motion features. The key idea is to extract viewpoint information from intermediate pose features and leverage it to enhance both the robustness and efficiency of pose estimation. MoViD introduces a view estimator that models key joint relationships to predict viewpoint information, and an orthogonal projection module to disentangle motion and view features, further enhanced through physics‑grounded contrastive alignment across views. For real‑time edge deployment, MoViD employs a frame‑by‑frame inference pipeline with a view‑aware strategy that adaptively activates flip refinement based on the estimated viewpoint. Evaluations on nine public datasets and newly collected multiview UAV and gait analysis datasets show that MoViD reduces pose estimation error by over 24.2% compared to state‑of‑the‑art methods, maintains robust performance under severe occlusions with 60% less training data, and achieves real‑time inference at 15 FPS on NVIDIA edge devices.
Authors: Martin Zoula, Daniel Bonilla Licea, Jan Faigl, Václav Navrátil, Martin Saska
Abstract: The paper presents an approach for learning antenna Radiation Patterns (RPs) of a pair of heterogeneous quadrotor Uncrewed Aerial Vehicles (UAVs) from calibration flight data. RPs are modeled either as a Spherical Harmonics series or as a weighted average over inducing samples. Linear regression of polynomial coefficients simultaneously decouples the two independent UAVs' RPs. A joint calibration trajectory exploits available flight time in an obstacle‑free anechoic altitude. Evaluation on a real‑world dataset demonstrates the feasibility of learning both radiation patterns, achieving 3.6 dB RMS error, which matches the measurement noise level. The proposed RP learning and decoupling can be exploited in rapid recalibration upon payload changes, thereby enabling precise autonomous path planning and swarm control in real‑world applications where setup changes are expected.
Authors: Linzuo Zhang, Yu Hu, Feng Yu, Yang Deng, Wenxian Yu, Danping Zou
Abstract: Navigation through narrow and irregular gaps is an essential skill in autonomous drones for applications such as inspection, search‑and‑rescue, and disaster response. However, traditional planning and control methods rely on explicit gap extraction and measurement, while recent end‑to‑end approaches often assume regularly shaped gaps, leading to poor generalization and limited practicality. In this work, we present a fully vision‑based, end‑to‑end framework that maps depth images directly to control commands, enabling drones to traverse complex gaps within unseen environments. Operating in the Special Euclidean group SE(3), where position and orientation are tightly coupled, the framework leverages differentiable simulation, a Stop‑Gradient operator, and a Bimodal Initialization Distribution to achieve stable traversal through consecutive gaps. Two auxiliary prediction modules, a gap‑crossing success classifier and a traversability predictor, further enhance continuous navigation and safety. Extensive simulation and real‑world experiments demonstrate the approach's effectiveness, generalization capability, and practical robustness.
Authors: Gaoxiang Cao, Wenke Yuan, Yunpeng Hou, Huasen He, Quan Zheng, Jian Yang
Abstract: Vehicular Ad Hoc Networks (VANETs) play a crucial role in realizing vehicle‑road collaboration and intelligent transportation. However, urban VANETs often face challenges such as frequent link disconnections and subnet fragmentation, which hinder reliable connectivity. To address these issues, we dynamically deploy multiple Unmanned Aerial Vehicles (UAVs) as communication relays to enhance VANET connectivity. A novel Score‑based Dynamic Action Mask enhanced QMIX algorithm (Q‑SDAM) is proposed for multi‑UAV deployment, which maximizes vehicle connectivity while minimizing multi‑UAV energy consumption. Specifically, we design a score‑based dynamic action mask mechanism that guides UAV agents in exploring large action spaces, accelerates the learning process, and enhances optimization performance. The practicality of Q‑SDAM is validated using real‑world datasets. We show that Q‑SDAM improves connectivity by 18.2% while reducing energy consumption by 66.6% compared with existing algorithms.
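A score‑based action mask can be illustrated with the sketch below, which restricts greedy action selection to the actions with the best heuristic scores. The abstract does not specify how Q‑SDAM computes its scores or mask, so the top‑`keep` rule, the function name, and all values here are assumptions for illustration.

```python
def masked_greedy_action(q_values, scores, keep):
    """Pick the highest-Q action among the `keep` actions
    with the best heuristic scores; all others are masked out."""
    ranked = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    allowed = set(ranked[:keep])
    best = max(allowed, key=lambda a: q_values[a])
    return best, allowed


# Hypothetical per-action Q-values and heuristic scores for one UAV agent.
q = [0.9, 0.2, 0.5, 0.7]
s = [0.1, 0.8, 0.6, 0.9]
action, allowed = masked_greedy_action(q, s, keep=2)
```

Here action 0 has the highest Q‑value but a poor score, so it is masked and the agent chooses among the better‑scored actions instead.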
Authors: Steve Blandino, Neeraj Varshney, Jian Wang, Jack Chuang, Camillo Gentile, Nada Golmie
Abstract: 3GPP Release 19 has initiated the standardization of integrated sensing and communications (ISAC), including a channel model for monostatic sensing, evaluation scenarios, and performance assessment methodologies. These common assumptions provide an important basis for ISAC evaluation, but reproducible end‑to‑end studies still require a transparent sensing implementation. This paper evaluates 5G New Radio (NR) base station (gNB)‑based monostatic sensing for the Unmanned Aerial Vehicle (UAV) use case using a 5G NR downlink Cyclic Prefix‑Orthogonal Frequency Division Multiplexing (CP‑OFDM) waveform and positioning reference signals (PRS), following 3GPP Urban Macro‑Aerial Vehicle (UMa‑AV) scenario assumptions. We present an end‑to‑end processing chain for multi‑target detection and 3D localization, achieving more than 70% detection probability with less than 5% false alarm rate in the considered scenario. For correctly detected targets, localization errors are on the order of a few meters, with 90th‑percentile errors of 4 m and 6 m in the vertical and horizontal directions, respectively. To support reproducible baseline studies and further research, we release the simulator 5GNRad, which reproduces our evaluation
Authors: Roger Fowler, Cahit Ikbal Er, Benjamin Johnsenberg, Yasin Yazicioglu
Abstract: We consider energy‑aware planning for an unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) team operating in a stochastic environment. The UAV must visit a set of air points in minimum time while respecting energy constraints, relying on the UGV as a mobile charging station. Unlike prior work that assumed deterministic travel times or used fixed robustness margins, we model travel times as random variables and bound the probability of failure (energy depletion) across the entire mission to a user‑specified risk level. We formulate the problem as a Mixed‑Integer Program and propose PRO‑SPECT, a polynomial‑time algorithm that generates risk‑bounded plans. The algorithm supports both offline planning and online re‑planning, enabling the team to adapt to disturbances while preserving the risk bound. We provide theoretical results on solution feasibility and time complexity. We also demonstrate the performance of our method via numerical comparisons and simulations.
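To make the mission‑level risk bound concrete, the sketch below assumes independent Gaussian leg travel times and applies a union bound over UAV sorties. Both assumptions are illustrative and are not claimed to be how PRO‑SPECT computes risk; the sortie data and the endurance value are made up.

```python
from statistics import NormalDist


def sortie_failure_prob(mean_times, std_times, endurance):
    """P(total travel time > endurance) for independent Gaussian leg times."""
    mean = sum(mean_times)
    std = sum(s ** 2 for s in std_times) ** 0.5
    return 1.0 - NormalDist(mean, std).cdf(endurance)


def mission_risk_bound(sorties, endurance):
    """Union bound on the probability that any sortie depletes the battery."""
    return sum(sortie_failure_prob(m, s, endurance) for m, s in sorties)


# Each sortie: (mean leg times, leg standard deviations), in minutes.
sorties = [([4.0, 5.0], [0.3, 0.4]), ([6.0, 2.0], [0.6, 0.8])]
risk = mission_risk_bound(sorties, endurance=10.0)
risk_level = 0.05  # hypothetical user-specified risk level
plan_ok = risk <= risk_level
```

A planner could reject or re‑plan any candidate whose bound exceeds the user‑specified level, which is what `plan_ok` checks here.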
Authors: Dian Liu, Jie Feng, Di Li, Yuhui Zheng, Guanbin Li, Weisheng Dong, Guangming Shi
Abstract: Synergistic spatial intelligence between UAVs and satellites is indispensable for emergency response and security operations, as it uniquely integrates macro‑scale global coverage with dynamic, real‑time local perception. However, the capacity of Vision‑Language Models (VLMs) to master this complex interplay remains largely unexplored. This gap persists primarily because existing benchmarks are confined to isolated Unmanned Aerial Vehicle (UAV) videos or static satellite imagery, failing to evaluate the dynamic local‑to‑global spatial mapping essential for comprehensive cross‑view reasoning. To bridge this gap, we introduce LinkS^2Bench, the first comprehensive benchmark designed to evaluate VLMs' wide‑area, dynamic cross‑view spatial intelligence. LinkS^2Bench links 1,022 minutes of dynamic UAV footage with high‑resolution satellite imagery covering over 200 km^2. Through an LMM‑assisted pipeline and rigorous human annotation, we constructed 17.9k high‑quality question‑answer pairs comprising 12 fine‑grained tasks across four dimensions: perception, localization, relation, and reasoning. Evaluations of 18 representative VLMs reveal a substantial gap compared to human baselines, identifying accurate cross‑view dynamic alignment as the critical bottleneck. To alleviate this, we design a Cross‑View Alignment Adapter, demonstrating that explicit alignment significantly improves model performance. Furthermore, fine‑tuning experiments underscore the potential of LinkS^2Bench in advancing VLM adaptation for complex spatial reasoning.
Authors: Haoyuan Li, Wen Yang, Fang Xu, Hong Tan, Haijian Zhang, Shengyang Li, Gui-Song Xia
Abstract: Cross‑view geo‑localization for Unmanned Aerial Vehicles (UAVs) operating in GNSS‑denied environments remains challenging due to the severe geometric discrepancy between oblique UAV imagery and orthogonal satellite maps. Most existing methods address this problem through a decoupled pipeline of place retrieval and pose estimation, implicitly treating perspective distortion as appearance noise rather than an explicit geometric transformation. In this work, we propose a geometry‑aware UAV geo‑localization framework that explicitly models the 3D scene geometry to unify coarse place recognition and fine‑grained pose estimation within a single inference pipeline. Our approach reconstructs a local 3D scene from multi‑view UAV image sequences using a Visual Geometry Grounded Transformer (VGGT), and renders a virtual Bird's‑Eye View (BEV) representation that orthorectifies the UAV perspective to align with satellite imagery. This BEV serves as a geometric intermediary that enables robust cross‑view retrieval and provides spatial priors for accurate 3 Degrees of Freedom (3‑DoF) pose regression. To efficiently handle multiple location hypotheses, we introduce a Satellite‑wise Attention Block that isolates the interaction between each satellite candidate and the reconstructed UAV scene, preventing inter‑candidate interference while maintaining linear computational complexity. In addition, we release a recalibrated version of the University‑1652 dataset with precise coordinate annotations and spatial overlap analysis, enabling rigorous evaluation of end‑to‑end localization accuracy. Extensive experiments on the refined University‑1652 benchmark and SUES‑200 demonstrate that our method significantly outperforms state‑of‑the‑art baselines, achieving robust meter‑level localization accuracy and improved generalization in complex urban environments.
Authors: Anıl Gürses, John Kesler, Mihail L. Sichitiu
Abstract: Uncrewed Aerial Vehicle (UAV) networks require accurate Air‑to‑Air (A2A) channel models, but most existing work focuses on Air‑to‑Ground links and leaves the sub‑6 GHz A2A channel poorly characterized. We present preliminary 3.4 GHz A2A channel measurements collected with a lightweight, reconfigurable, open‑source channel sounder built from USRP B210 software‑defined radios and a high‑precision GNSS‑disciplined oscillator mounted on two UAVs. Measurements were conducted at the AERPAW Lake Wheeler testbed using a spherical flight trajectory around a second drone to capture channel behavior over varying altitudes, elevation angles, and relative headings. From these data, we analyze fundamental channel properties, extract channel impulse responses, model fading behavior as a function of link geometry, and characterize fading statistics including RMS delay spread. The resulting dataset and analysis provide a more realistic basis for the design, emulation, and evaluation of physical‑layer and MAC protocols for next‑generation UAV communication networks.
Authors: Tao Liu, Yingzhi Zhang, Kan Ren, Xiaoqi Zhao
Abstract: Drone‑view geo‑localization (DVGL) aims to determine the location of drones in GPS‑denied environments by retrieving the corresponding geotagged satellite tile from a reference gallery given UAV observations of a location. In many existing formulations, these observations are represented by a single oblique UAV image. In contrast, our satellite‑free setting is designed for multi‑view UAV sequences, which are used to construct a geometry‑normalized UAV‑side location representation before cross‑view retrieval. Existing approaches rely on satellite imagery during training, either through paired supervision or unsupervised alignment, which limits practical deployment when satellite data are unavailable or restricted. In this paper, we propose a satellite‑free training (SFT) framework that converts drone imagery into cross‑view compatible representations through three main stages: drone‑side 3D scene reconstruction, geometry‑based pseudo‑orthophoto generation, and satellite‑free feature aggregation for retrieval. Specifically, we first reconstruct dense 3D scenes from multi‑view drone images using 3D Gaussian splatting and project the reconstructed geometry into pseudo‑orthophotos via PCA‑guided orthographic projection. This rendering stage operates directly on reconstructed scene geometry without requiring camera parameters at rendering time. Next, we refine these orthophotos with lightweight geometry‑guided inpainting to obtain texture‑complete drone‑side views. Finally, we extract DINOv3 patch features from the generated orthophotos, learn a Fisher vector aggregation model solely from drone data, and reuse it at test time to encode satellite tiles for cross‑view retrieval. Experimental results on University‑1652 and SUES‑200 show that our SFT framework substantially outperforms satellite‑free generalization baselines and narrows the gap to methods trained with satellite imagery.
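As a rough illustration of the PCA-guided orthographic projection step mentioned in the abstract above, the sketch below projects a point cloud onto the plane of its two dominant principal axes. It assumes the least-variance axis approximates the viewing direction; the function name, the synthetic cloud, and this assumption are illustrative and not taken from the paper.

```python
import numpy as np

def pca_orthographic_projection(points):
    """Project an (N, 3) point cloud onto the plane of its two dominant principal axes."""
    centered = points - points.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane_axes = vt[:2]            # two dominant directions span the assumed ground plane
    normal = vt[2]                 # least-variance direction, used as the viewing axis
    uv = centered @ plane_axes.T   # 2D orthographic coordinates
    height = centered @ normal     # signed offset along the viewing axis
    return uv, height

rng = np.random.default_rng(0)
# Hypothetical scene: wide in two directions, thin in the third, then rotated arbitrarily.
flat = rng.normal(size=(500, 3)) * np.array([10.0, 6.0, 0.5])
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
cloud = flat @ rotation.T
uv, height = pca_orthographic_projection(cloud)
```

Rasterizing `uv` into an image grid (with colors carried along per point) would give a pseudo-orthophoto; that step, and any inpainting, is omitted here.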
Authors: Hanbing Liang, Fujun Liu
Abstract: Macroscopic unmanned aerial vehicle (UAV) traffic organization in three‑dimensional airspace faces significant challenges from static wind fields and complex obstacles. A critical difficulty lies in simultaneously capturing the strong anisotropy induced by wind while strictly preserving transport consistency and boundary semantics, which are often compromised in standard physics‑informed learning approaches. To resolve this, we propose a constraint‑preserving hybrid solver that integrates a physics‑informed neural network for the anisotropic Eikonal value problem with a conservative finite‑volume method for steady density transport. These components are coupled through an outer Picard iteration with under‑relaxation, where the target condition is hard‑encoded and strictly conservative no‑flux boundaries are enforced during the transport step. We evaluate the framework on reproducible homing and point‑to‑point scenarios, effectively capturing value slices, induced‑motion patterns, and steady density structures such as bands and bottlenecks. Ultimately, our perspective emphasizes the value of a reproducible computational framework supported by transparent empirical diagnostics to enable the traceable assessment of macroscopic traffic phenomena.
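The outer Picard iteration with under-relaxation mentioned in the abstract above can be illustrated on a scalar fixed-point problem. This is only a minimal sketch: the `update` map below is a hypothetical stand-in for the coupled value/density solve, and the relaxation factor `omega` is an assumed parameter.

```python
def picard_under_relaxed(update, x0, omega=0.5, tol=1e-10, max_iter=1000):
    """Solve x = update(x) with the damped iteration x <- (1 - omega) * x + omega * update(x)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = (1.0 - omega) * x + omega * update(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("Picard iteration did not converge")

# Hypothetical map with fixed point x = 1. Its slope is -2, so the undamped
# iteration (omega = 1) diverges, while omega = 0.5 gives a contraction.
toy_update = lambda x: 3.0 - 2.0 * x
solution, iterations = picard_under_relaxed(toy_update, x0=0.0, omega=0.5)
```

The damping trades speed for robustness: smaller `omega` widens the set of maps for which the outer loop contracts, at the cost of more iterations.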
Authors: Yiyang Wu, Xiaohu Zhang, Yanjin Du, Tongsu Zhang, Chujun Li, Siyang Chen, Guoyi Zhang, Xiangpeng Xu
Abstract: Accurate pose estimation is fundamental for unmanned aerial vehicle (UAV) applications, where Visual‑Inertial SLAM (VI‑SLAM) provides a cost‑effective solution for localization and mapping. However, existing VI‑SLAM methods mainly rely on sensors with limited fields of view (FoV), which can lead to drift and even failure in complex UAV scenarios. Although panoramic cameras provide omnidirectional perception to improve robustness, panoramic VI‑SLAM and corresponding real‑world datasets for UAVs remain underexplored. To address this limitation, we first construct a real‑world panoramic visual‑inertial dataset covering diverse flight conditions, including varying illumination, altitudes, trajectory lengths, and motion dynamics. To achieve accurate and robust pose estimation under such challenging UAV scenarios, we propose a panoramic VI‑SLAM framework that exploits the omnidirectional FoV via the proposed panoramic feature extraction and panoramic loop closure, enhancing feature constraints and ensuring global consistency. Extensive experiments on both the proposed dataset and public benchmarks demonstrate that our method achieves superior accuracy, robustness, and consistency compared to existing approaches. Moreover, deployment on an embedded platform validates its practical applicability, achieving computational efficiency comparable to PC implementations.
The source code and dataset are publicly available at https://drive.google.com/file/d/1lG1Upn6yi‑N6tYpEHAt6dfR1uhzNtWbT/view
Authors: Rodolfo Verdin, Hugo Moreno, Mark W. Spong, Gerardo Flores
Abstract: This paper presents QuadSoft, a novel fully actuated quadrotor equipped with continuous‑curvature, tendon‑driven soft robotic arms. The design combines a semi‑rigid central frame with flexible arms, enabling controlled structural reconfiguration during flight without altering the propeller layout. Unlike existing soft aerial platforms that rely on discrete bending joints, QuadSoft utilizes a continuum deformation approach to modulate arm curvature, actively adjusting its thrust vector and aerodynamic characteristics. We characterize the geometric mapping between servomotor input and the resulting constant curvature, validating it experimentally. Outdoor flight tests demonstrate stable take‑off, hover, directional maneuvers, and landing, confirming that controlled arm bending can generate horizontal displacement while preserving altitude. Measurements of pitch, roll, and curvature angles show that the platform follows intended actuation patterns with minimal attitude deviations. These results demonstrate that QuadSoft preserves the baseline stability of rigid quadrotors while enabling morphology‑driven maneuverability, all under the standard PX4 autopilot without retuning. Beyond a proof of concept, this work establishes a distinctive outdoor validation of a tendon‑driven continuum morphing quadrotor, opening a new research avenue toward adaptive aerial systems that combine the safety and versatility of soft robotics with the performance of conventional UAVs.
Authors: Hans Riess, Yujun Huang, Matthew Klawonn, Gioele Zardini, Matthew Hale
Abstract: Monotone co‑design enables compositional engineering design by modeling components through feasibility relations between required resources and provided functionalities. However, its standard boolean formulation cannot natively represent quantitative criteria such as cost, confidence, or implementation choice. In practice, these quantities are often introduced through ad hoc scalarization or by augmenting the resource space, which obscures system structure and increases computational burden. We address this limitation by developing a quantale‑enriched theory of co‑design. We model resources and functionalities as quantale‑enriched categories and design problems as quantale‑enriched profunctors, thereby lifting co‑design from boolean feasibility to general quantitative evaluation. We show that the fundamental operations of series, parallel, and feedback composition remain valid over arbitrary commutative quantales. We further introduce heterogeneous composition through change‑of‑base maps between quantales, enabling different subsystems to be evaluated in different local semantics and then composed in a common framework. The resulting theory unifies feasibility‑, cost‑, confidence‑, and implementation‑aware co‑design within one compositional formalism. Numerical examples on a target‑tracking system and a UAV delivery problem demonstrate the framework and highlight how native quantitative enrichment can avoid the architectural and computational drawbacks of boolean‑only formulations.
Authors: Yuhua Xu, Mingtao Jiang, Chenfei Hu, Yinglong Wang, Chuan Zhang, Meng Li, Ming Lu, Liehuang Zhu
Abstract: In low‑altitude wireless networks (LAWN), federated learning (FL) enables collaborative intelligence among unmanned aerial vehicles (UAVs) and integrated sensing and communication (ISAC) devices while keeping raw sensing data local. Due to the "right to be forgotten" requirements and the high mobility of ISAC devices that frequently enter or leave the coverage region of UAV‑assisted servers, the influence of departing devices must be removed from trained models. This necessity motivates the adoption of federated unlearning (FUL) to eliminate historical device contributions from the global model in LAWN. However, existing FUL approaches implicitly assume that the UAV‑assisted server executes unlearning operations honestly. Without client‑verifiable guarantees, an untrusted server may retain residual device information, leading to potential privacy leakage and undermining trust. To address this issue, we propose VerFU, a privacy‑preserving and client‑verifiable federated unlearning framework designed for LAWN. It empowers ISAC devices to validate the server‑side unlearning operations without relying on original data samples. By integrating linear homomorphic hash (LHH) with commitment schemes, VerFU constructs tamper‑proof records of historical updates. ISAC devices ensure the integrity of unlearning results by verifying decommitment parameters and utilizing the linear composability of LHH to check whether the global model accurately removes their historical contributions. Furthermore, VerFU is capable of efficiently processing parallel unlearning requests and verification from multiple ISAC devices. Experimental results demonstrate that our framework efficiently preserves model utility post‑unlearning while maintaining low communication and verification overhead.
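The "linear composability" of a linear homomorphic hash referred to in the abstract above can be sketched with a toy exponent-based hash. This is an illustration of the algebraic property only, not VerFU's construction: the modulus, the bases, and the integer-quantized updates are all assumptions, the parameters are not cryptographically secure, and commitments are omitted.

```python
P = 2**61 - 1               # Mersenne prime used as a toy modulus (assumption)
GENERATORS = [3, 5, 7, 11]  # hypothetical public bases, one per model coordinate

def lhh(vector):
    """Toy hash H(x) = prod_i g_i^{x_i} mod P for non-negative integer vectors."""
    digest = 1
    for g, x in zip(GENERATORS, vector):
        digest = (digest * pow(g, x, P)) % P
    return digest

def combine(h1, h2):
    """Hash of a sum of vectors, computed from the individual hashes alone."""
    return (h1 * h2) % P

update_a = [4, 0, 9, 2]     # hypothetical quantized update of device A
update_b = [1, 7, 3, 5]     # hypothetical quantized update of device B
aggregate = [a + b for a, b in zip(update_a, update_b)]

# The server claims to have removed device B's contribution.
unlearned = [s - b for s, b in zip(aggregate, update_b)]
# Device B checks the claim using only hashes: H(unlearned) * H(b) must equal H(aggregate).
claim_ok = combine(lhh(unlearned), lhh(update_b)) == lhh(aggregate)
```

Because the check uses only digests, a device does not need the other devices' updates or the original data samples to test whether its own contribution was subtracted.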
Authors: Anja Bosak, Dorian Erić, Ana Milas, Stjepan Bogdan
Abstract: In this paper, we present a generalized, comprehensive nonlinear mathematical model and conceptual design for the MetaMorpher, a metamorphic Unmanned Aerial Vehicle (UAV) designed to bridge the gap between vertical takeoff and landing agility and fixed‑wing cruising efficiency. Building on the successful design of the spincopter platform, this work introduces a simplified mechanical architecture using lightweight materials and a novel wing‑folding strategy. Unlike traditional rigid‑body approximations, we derive a nonlinear flight dynamics model that enables arbitrary force distributions across a segmented wing structure. This modularity allows for testing different airfoils, mass distributions, and chord lengths in a single environment. As part of this work, various flight modes were specifically tested and analyzed in the Simulink environment. The results show that the model behaves predictably under different structural configurations, demonstrating its reliability as a tool for rapid design evaluation.
Authors: Chengzhen Meng, Chenming He, Yidong Jiang, Xiaoran Fan, Dequan Wang, Lingyu Wang, Jianmin Ji, Yanyong Zhang
Abstract: The potential usage of UAVs in daily life has made monitoring them essential. However, existing systems for monitoring UAVs typically rely on cameras, LiDARs, or radars, whose limited sensing range or high deployment cost hinder large‑scale adoption. In response, we develop BSense, the first system that tracks UAVs by leveraging point clouds from commercial 5G‑A base stations. The key challenge lies in the dominant number of noise points that closely resemble true UAV points, resulting in a noise‑to‑UAV ratio over 100:1. Therefore, identifying UAVs from the raw point clouds is like finding a needle in a haystack. To overcome this, we propose a layered framework that filters noise at the point, object, and trajectory levels. At the raw point level, we observe that noise points from different spatial regions exhibit distinguishable and consistent signal fingerprints, which we can model to identify and remove them. At the object level, we design spatial and velocity consistency checks to identify false objects, and further compute confidence scores by aggregating these checks over multiple frames for more reliable discrimination. At the final trajectory level, we propose a Transformer‑based network that captures multi‑frame motion patterns to filter the few remaining false trajectories.
We evaluated BSense on a commercial 5G‑A base station deployed in an urban environment. The UAV was instructed to fly along 25 distinct trajectories across 54 cases over 7 days, yielding 155 minutes of data with more than 14,000 frames. On this dataset, our system reduces the number of false detections from an average of 168.05 per frame to 0.04, achieving an average F1 score of 95.56% and a mean localization error of 4.9 m at ranges up to 1,000 m.
Authors: Andrew Nash, Dirk Pesch, Krishnendu Guha
Abstract: With the ever‑increasing range of applications of the Internet of Things (IoT) and sensor networks, challenges are emerging in various categories of classification tasks. Applications such as vehicular networking, UAV swarm coordination and cyber‑physical systems require global classification over distributed sensors, with tight constraints on communication and computation resources. There has been much research in decentralized and distributed data exchange for communication‑efficient collective inference. Likewise, there has been considerable research involving the use of cloud and edge computing paradigms for efficient task allocation. To the best of our knowledge, there has been no research on the integration of these two concepts to create a hybrid cloud and distributed approach that makes dynamic runtime communication strategy decisions. In this paper, we focus on aspects of combining distributed and hierarchical communication and classification approaches for collective inference. We derive optimal policies for agents that implement this hybrid approach, and evaluate their performance under various scenarios of the distribution of underlying data. Our analysis shows that this approach can maintain a high level of classification accuracy (comparable to that of centralised joint inference over all data), at reduced theoretical communication cost. We expect there is potential for our approach to facilitate efficient collective inference for real‑world applications, including instances that involve more complex underlying data distributions.
Authors: Xiaobin Zhou, Zihao Zheng, Aoxu Jin, Lei Qiang, Bo Zhu
Abstract: Unmanned Aerial Vehicle (UAV) perception relies on onboard sensors like cameras and LiDAR, which are limited by a narrow field of view (FoV). We present Self‑Perception INertial Navigation Enabled Rotorcraft (SPINNER), a self‑rotating tri‑rotor UAV for FoV expansion and autonomous flight. Without adding extra sensors or energy consumption, SPINNER significantly expands the FoV of onboard camera and LiDAR sensors through continuous spin motion, thereby enhancing environmental perception efficiency. SPINNER achieves full 3‑dimensional position and roll‑pitch attitude control using only three brushless motors, while adjusting the rotation speed via an anti‑torque plate design. To address the strong coupling, severe nonlinearity, and complex disturbances induced by spinning flight, we develop a disturbance compensation control framework that combines nonlinear model predictive control (MPC) with incremental nonlinear dynamic inversion. Experimental results demonstrate that SPINNER maintains robust flight under wind disturbances up to 4.8 m/s and achieves high‑precision trajectory tracking at a maximum speed of 2.0 m/s. Moreover, tests in parking garages and forests show that the rotational perception mechanism substantially improves FoV coverage and enhances the perception capability of SPINNER.
Authors: Daniel Gutierrez, Ruben Martinez, Leyre Arnedo, Antonio Cuesta, Soukaina El Hamry
Abstract: The demand for high‑speed, low‑latency, and energy‑efficient object detection in autonomous systems, such as advanced driver‑assistance systems (ADAS), unmanned aerial vehicles (UAVs), and Industry 4.0 robotics, has exposed the limitations of traditional Convolutional Neural Networks (CNNs). To address these challenges, we have developed AceleradorSNN, a third‑generation artificial intelligence cognitive system. This architecture integrates a Neuromorphic Processing Unit (NPU) based on Spiking Neural Networks (SNNs) to process asynchronous data from Dynamic Vision Sensors (DVS), alongside a dynamically reconfigurable Cognitive Image Signal Processor (ISP) for RGB cameras. This paper details the hardware‑oriented design of both IP cores, the evaluation of surrogate‑gradient‑trained SNN backbones, and the real‑time streaming ISP architecture implemented on Field‑Programmable Gate Arrays (FPGA).
Authors: Mohammad Farhoudi, Hamidreza Mazandarani, Masoud Shokrnezhad, Tarik Taleb, Ignacio Lacalle
Abstract: The proliferation of users, devices, and novel vehicular applications ‑ propelled by advancements in autonomous systems and connected technologies ‑ is precipitating an unprecedented surge in novel services. These emerging services require substantial bandwidth allocation, adherence to stringent Quality of Service (QoS) parameters, and energy‑efficient implementations, particularly within highly dynamic vehicular environments. The complexity of these requirements necessitates a fundamental paradigm shift in service orchestration methodologies to facilitate seamless and robust service delivery. This paper addresses this challenge by presenting a novel framework for service orchestration in Unmanned Aerial Vehicles (UAV)‑assisted 6G aerial‑terrestrial networks. The proposed framework synergistically integrates UAV trajectory planning, Multiple‑Access Control (MAC), and service placement to facilitate energy‑efficient service coverage while maintaining ultra‑low latency communication for vehicular user service requests. We first present a non‑linear programming model that formulates the optimization problem. Next, to address the problem, we employ a Hierarchical Deep Reinforcement Learning (HDRL) algorithm that dynamically predicts service requests, user mobility, and channel conditions, addressing the challenges of interference, resource scarcity, and mobility in heterogeneous networks. Simulation results demonstrate that the proposed framework outperforms state‑of‑the‑art solutions in request acceptance, energy efficiency, and latency minimization, showcasing its potential to support the high demands of next‑generation vehicular networks.
Authors: Yuqi Ping, Huahao Ding, Tianhao Liang, Longyu Zhou, Guangyu Lei, Xinglin Chen, Junwei Wu, Jieyu Zhou, Tingting Zhang
Abstract: Natural language (NL) navigation for low‑altitude unmanned aerial vehicles (UAVs) offers an intelligent and convenient solution for low‑altitude aerial services by enabling an intuitive interface for non‑expert operators. However, deploying this capability in urban environments necessitates the precise grounding of underspecified instructions into safety‑critical, dynamically feasible motion plans subject to spatiotemporal constraints. To address this challenge, we propose a unified framework that translates NL instructions into Signal Temporal Logic (STL) specifications and subsequently synthesizes trajectories via mixed‑integer linear programming (MILP). Specifically, to generate executable STL formulas from free‑form NL, we develop a reasoning‑enhanced large language model (LLM) leveraging chain‑of‑thought (CoT) supervision and group‑relative policy optimization (GRPO), which ensures high syntactic validity and semantic consistency. Furthermore, to resolve infeasibilities induced by stringent logical or spatial requirements, we introduce a specification repair mechanism. This module combines MILP‑based diagnosis with LLM‑guided semantic reasoning to selectively relax task constraints while strictly enforcing safety guarantees. Extensive simulations and real‑world flight experiments demonstrate that the proposed closed‑loop framework significantly improves NL‑to‑STL translation robustness, enabling safe, interpretable, and adaptable UAV navigation in complex scenarios.
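To make the role of Signal Temporal Logic in the abstract above more concrete, the sketch below evaluates the standard quantitative (robustness) semantics of "always" and "eventually" on a sampled trajectory. The trajectory values, thresholds, and the example specification are hypothetical; the paper's MILP-based synthesis and repair are not reproduced here.

```python
def always(margins, start, end):
    """Robustness of G_[start,end] phi: the worst-case predicate margin in the window."""
    return min(margins[start:end + 1])

def eventually(margins, start, end):
    """Robustness of F_[start,end] phi: the best-case predicate margin in the window."""
    return max(margins[start:end + 1])

# Hypothetical sampled trajectory (one sample per time step).
altitude = [12.0, 15.0, 18.0, 22.0, 25.0, 24.0]      # metres
dist_to_goal = [40.0, 30.0, 18.0, 9.0, 3.0, 1.0]     # metres

safe_margin = [z - 10.0 for z in altitude]           # predicate: altitude > 10
reach_margin = [5.0 - d for d in dist_to_goal]       # predicate: distance < 5

rho_safe = always(safe_margin, 0, 5)
rho_reach = eventually(reach_margin, 0, 5)
# Conjunction takes the minimum; a positive value means the specification holds.
rho_spec = min(rho_safe, rho_reach)
```

A positive `rho_spec` indicates the trajectory satisfies "always stay above 10 m and eventually come within 5 m of the goal", and its magnitude indicates how much slack remains.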
Authors: Qian Yang, Miaomiao Wang, Abdelhamid Tayebi
Abstract: This paper proposes a model predictive trajectory tracking approach for quadrotors subject to input constraints. Our proposed approach relies on a hierarchical control strategy with an outer‑loop feedback generating the required thrust and desired attitude and an inner‑loop feedback regulating the actual attitude to the desired one. For the outer‑loop translational dynamics, the generation of the virtual control input is formulated as a constrained model predictive control problem with time‑varying input constraints and a control strategy, endowed with uniform global asymptotic stability guarantees, is proposed. For the inner‑loop rotational dynamics, a hybrid geometric controller is adopted, achieving semi‑global exponential tracking of the desired attitude. Finally, we prove that the overall cascaded system is semi‑globally asymptotically stable. Simulation results illustrate the effectiveness of the proposed approach.
Authors: Pengzhi Zhong, Jiwei Mo, Dan Zeng, Feixiang He, Shuiwang Li
Abstract: Spiking Neural Networks (SNNs), characterized by their event‑driven computation and low power consumption, have shown great potential for energy‑efficient visual tracking on unmanned aerial vehicles (UAVs). However, existing efficient SNN‑based trackers heavily rely on costly event cameras, limiting their deployment on UAVs. To address this limitation, we propose STATrack, an efficient fully spiking neural network framework for UAV visual tracking using RGB inputs only. To the best of our knowledge, this work is the first to investigate spiking neural networks for UAV visual tracking tasks. To mitigate the weakening of target features by background tokens, we propose adaptively maximizing the mutual information between templates and features. Extensive experiments on four widely used UAV tracking benchmarks demonstrate that STATrack achieves competitive tracking performance while maintaining low energy consumption.
Authors: Hussein Naser, Hashim A. Hashim, Mojtaba Ahmadi
Abstract: This paper introduces an advanced Quaternion‑based Unscented Kalman Filter (QUKF) for real‑time, robust estimation of system states and external wrenches in assistive aerial payload transportation systems that engage in direct physical interaction. Unlike conventional filtering techniques, the proposed approach employs a unit‑quaternion representation to inherently avoid singularities and ensure globally consistent, drift‑free estimation of the platform's pose and interaction wrenches. A rigorous quaternion‑based dynamic model is formulated to capture coupled translational and rotational dynamics under interaction forces. Building on this model, a comprehensive QUKF framework is established for state prediction, measurement updates, and external wrench estimation. The proposed formulation fully preserves the nonlinear characteristics of rotational motion, enabling more accurate and numerically stable estimation during physical interaction compared to linearized filtering schemes. Extensive simulations validate the effectiveness of the QUKF, showing significant improvements over the Extended Kalman Filter (EKF). Specifically, the QUKF achieved a 79.41% reduction in Root Mean Squared Error (RMSE) for torque estimation, with average RMSE improvements of 79% and 56%, for position and angular rates, respectively. These findings demonstrate enhanced robustness to measurement noise and modeling uncertainties, providing a reliable foundation for safe, stable, and responsive human‑UAV physical interaction in cooperative payload transportation tasks.
Authors: Dikai Shang, Jingyue Zhao, Shi Xu, Nanyang Ye, Lei Wang
Abstract: Achieving safe, high‑speed autonomous flight in complex environments with static, dynamic, or mixed obstacles remains challenging, as a single perception modality is incomplete. Depth cameras are effective for static objects but suffer from motion blur at high speeds. Conversely, event cameras excel at capturing rapid motion but struggle to perceive static scenes. To exploit the complementary strengths of both sensors, we propose an end‑to‑end flight control network that achieves feature‑level fusion of depth images and event data through a bidirectional cross‑attention module. The end‑to‑end network is trained via imitation learning, which relies on high‑quality supervision. Building on this insight, we design an efficient expert planner using Spherical Principal Search (SPS). This planner reduces computational complexity from O(n^2) to O(n) while generating smoother trajectories, achieving over 80% success rate at 17 m/s, nearly 20% higher than traditional planners. Simulation experiments show that our method attains a 70‑80% success rate at 17 m/s across varied scenes, surpassing single‑modality and unidirectional fusion models by 10‑20%. These results demonstrate that bidirectional fusion effectively integrates event and depth information, enabling more reliable obstacle avoidance in complex environments with both static and dynamic objects.
Authors: Vinay Kathiriya, Saurabh Kumar, Shashi Ranjan Kumar
Abstract: This paper addresses the three‑dimensional path‑following guidance problem for unmanned aerial vehicles under explicit actuator constraints. Unlike conventional approaches that assume unbounded control inputs or handle saturation heuristically, the proposed method incorporates bounded lateral acceleration directly into the guidance design. A nonlinear guidance framework is developed employing a nested saturation‑based control technique. The proposed guidance strategy guarantees bounded control inputs while ensuring exponential convergence of cross‑track errors to zero. The formulation is applicable to general smooth paths and is systematically extended from planar to three‑dimensional scenarios using a path‑tangent coordinate framework. Rigorous stability analysis based on Lyapunov theory establishes convergence and feasibility properties of the closed‑loop system. Numerical simulations on representative paths, including straight‑line, circular, and sinusoidal paths, demonstrate that the proposed method achieves superior tracking performance, reduced control effort, and robustness against disturbances compared to existing guidance laws. The simplicity of the design and its compatibility with practical actuator limits make it suitable for real‑world UAV applications.
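A generic nested-saturation law for a double-integrator cross-track model gives a feel for how bounded lateral acceleration can be built into the guidance command, as the abstract above describes. This is a minimal sketch under stated assumptions, not the paper's guidance law: the error model (cross-track acceleration equals the command), the gains `k_p` and `k_d`, and the limit `a_max` are all hypothetical.

```python
def sat(limit, value):
    """Clamp value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))

def nested_saturation_accel(e, e_dot, a_max=2.0, k_p=0.5, k_d=1.0):
    """Lateral acceleration command; its magnitude never exceeds a_max."""
    inner = sat(a_max / 2.0, k_p * e)         # position term, limited to half the authority
    return -sat(a_max, k_d * e_dot + inner)   # outer saturation enforces the actuator bound

def simulate(e0=20.0, e_dot0=0.0, dt=0.01, steps=6000):
    """Explicit-Euler simulation of e_ddot = a; returns final error and peak |a|."""
    e, e_dot, peak = e0, e_dot0, 0.0
    for _ in range(steps):
        a = nested_saturation_accel(e, e_dot)
        peak = max(peak, abs(a))
        e, e_dot = e + e_dot * dt, e_dot + a * dt
    return e, peak

final_error, peak_accel = simulate()
```

With a large initial error the inner term saturates, so the vehicle closes on the path at a bounded rate; near the path both saturations are inactive and the law reduces to a linear PD-type feedback.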
Authors: Daniel T. Bonkowsky, Ibrahim Kilinc, Robert W. Heath
Abstract: Unmanned aerial vehicles (UAVs) fill coverage holes as wireless relays during emergency situations. Fixed‑wing UAVs offer longer flight duration and larger coverage in such situations than rotary‑wing counterparts. Maximizing the effectiveness of fixed‑wing UAV relay systems requires careful tuning of system and flight parameters. This process is challenging because factors including flight trajectory, timeshare, and user scheduling are not easily optimized. In this paper, we propose an optimization for UAV‑based wireless relaying networks based on a setup which is applicable to arbitrary spatial user positions. In the setup, a fixed‑wing UAV flies over a circular trajectory and relays data from ground users in a coverage hole to a distant base station (BS). Our optimization iteratively maximizes the average achievable spectral efficiency (SE) for the UAV trajectory, user scheduling, and relay timeshare. The simulation results show that our optimization is effective for varying user distributions and that it performs especially well on distributions with a high standard deviation.
Authors: Hazim Alzorgan, Sayed Pedram Haeri Boroujeni, Abolfazl Razi
Abstract: Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact‑based interaction. However, the motion of the drone and its manipulator is tightly linked, and even small attitude changes caused by wind or control imperfections shift the end‑effector away from its intended path. This coupling makes reliable tracking difficult and also limits the direct use of learning‑based arm controllers that were originally designed for fixed‑base robots. These effects appear consistently in our tests whenever the UAV body experiences drift or rapid attitude corrections. To address this behavior, we develop a reinforcement‑learning (RL) framework based on Transformer‑based double deep Q‑learning (DDQN). Its core idea is an adaptive beam‑search planner that applies a short‑horizon beam search over candidate control sequences, using the learned critic as the forward estimator. This allows the controller to anticipate the end‑effector's motion through simulated rollouts rather than executing those actions directly on the actual model, realizing a software‑in‑the‑loop (SITL) approach. The lookahead relies on value estimates from a Transformer critic that processes short sequences of states, while a DDQN backbone provides the one‑step targets needed to keep the learning process stable. Evaluated on a 3‑DoF aerial manipulator under identical training conditions, the proposed meta‑adaptive planner shows the strongest overall performance with a 10.2% reward increase, a substantial reduction in mean tracking error (from about 6% to 3%), and a 29.6% improvement in the combined reward‑error metric relative to the DDQN baseline. Our method tracks the target tip trajectory more stably (maintaining a 5 cm tracking error) when the drone base drifts due to external disturbances, compared with the fixed‑beam and Transformer‑only variants.
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Abstract: Autonomous tree pruning with unmanned aerial vehicles (UAVs) is a safety‑critical real‑world task: the onboard perception system must estimate the metric distance from a cutting tool to thin tree branches in real time so that the UAV can approach, align, and actuate the pruner without collision. We address this problem by training five variants of DEFOM‑Stereo ‑ a recent foundation‑model‑based stereo matcher ‑ on a task‑specific synthetic dataset and deploying the checkpoints on an NVIDIA Jetson Orin Super 16 GB. The training corpus is built in Unreal Engine 5 with a simulated ZED Mini stereo camera capturing 5,520 stereo pairs across 115 tree instances from three viewpoints at a distance of 2 m; dense EXR depth maps provide exact, spatially complete supervision for thin branches. On the synthetic test set, DEFOM‑Stereo ViT‑S achieves the best depth‑domain accuracy (EPE 1.74 px, D1‑all 5.81%, delta‑1 95.90%, depth MAE 23.40 cm), but its Jetson inference speed of ~2.2 FPS (~450 ms per frame) remains too slow for responsive closed‑loop tool control. A newly introduced balanced variant, DEFOM‑PrunePlus (~21M backbone, ~3.3 FPS on Jetson), offers the best deployable accuracy‑speed trade‑off (EPE 5.87 px, depth MAE 64.26 cm, delta‑1 87.59%): its frame rate is sufficient for real‑time guidance and its depth accuracy supports safe branch approach planning at the 2 m operating range. The lightweight DEFOM‑PruneStereo (~6.9 FPS) and DEFOM‑PruneNano (~8.5 FPS) run fast but sacrifice substantial accuracy (depth MAE > 57 cm), making estimates too unreliable for safe actuation. Zero‑shot inference on real photographs confirms that full‑capacity models preserve branch geometry, validating the sim‑to‑real transfer. We conclude that DEFOM‑PrunePlus provides the most practical accuracy‑latency balance for onboard distance estimation, while ViT‑S serves as the reference for future hardware.
Authors: John Ayotunde, Qinghua Xu, Guancheng Wang, Lionel C. Briand
Abstract: Safety monitoring is essential for Cyber‑Physical Systems (CPSs). However, unsafe events are rare in real‑world CPS operations, creating an extreme class imbalance that degrades safety predictors. Standard rebalancing techniques perform poorly on time‑series CPS telemetry, either generating unrealistic synthetic samples or overfitting on the minority class. Meanwhile, behavioral uncertainty in CPS operations, defined as the degree of doubt or uncertainty in CPS decisions, is often correlated with safety outcomes but unexplored in safety monitoring. To that end, we propose U‑Balance, a supervised approach that leverages behavioral uncertainty to rebalance imbalanced datasets prior to training a safety predictor. U‑Balance first trains a GatedMLP‑based uncertainty predictor that summarizes each telemetry window into distributional kinematic features and outputs an uncertainty score. It then applies an uncertainty‑guided label rebalancing (uLNR) mechanism that probabilistically relabels safe‑labeled windows with unusually high uncertainty as unsafe, thereby enriching the minority class with informative boundary samples without synthesizing new data. Finally, a safety predictor is trained on the rebalanced dataset for safety monitoring. We evaluate U‑Balance on a large‑scale UAV benchmark with a 46:1 safe‑to‑unsafe ratio. Results confirm a moderate but significant correlation between behavioral uncertainty and safety. We then identify uLNR as the most effective strategy to exploit uncertainty information, compared to direct early and late fusion. U‑Balance achieves a 0.806 F1 score, outperforming the strongest baseline by 14.3 percentage points, while maintaining competitive inference efficiency. Ablation studies confirm that both the GatedMLP‑based uncertainty predictor and the uLNR mechanism contribute significantly to U‑Balance's effectiveness.
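The relabeling idea described above (probabilistically flipping safe‑labeled windows with unusually high uncertainty to unsafe) can be sketched as follows. The abstract does not give the exact rule, so the fixed threshold, the single flip probability, and all names here are assumptions made for illustration rather than the uLNR mechanism itself.

```python
import random


def relabel_uncertain_safe(labels, uncertainty, threshold, flip_prob, seed=0):
    """Flip safe (0) labels to unsafe (1) with probability flip_prob when the
    window's uncertainty score exceeds threshold. Unsafe labels are kept.

    threshold and flip_prob are hypothetical knobs, not values from the paper.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    relabeled = []
    for label, score in zip(labels, uncertainty):
        if label == 0 and score > threshold and rng.random() < flip_prob:
            relabeled.append(1)
        else:
            relabeled.append(label)
    return relabeled


labels = [0, 0, 1, 0]
scores = [0.10, 0.90, 0.20, 0.95]

# With flip_prob=1.0 every high-uncertainty safe window is relabeled.
always = relabel_uncertain_safe(labels, scores, threshold=0.8, flip_prob=1.0)
# With flip_prob=0.0 the labels are returned unchanged.
never = relabel_uncertain_safe(labels, scores, threshold=0.8, flip_prob=0.0)
```

No new samples are created: the dataset size stays the same and only the label column changes, which is the property the abstract contrasts with synthetic oversampling.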
Authors: Pengyu Chen, Haotian Sa, Yiwei Hu, Yuhan Cheng, Junbo Wang
Abstract: Detecting small unmanned aerial vehicles (UAVs) from a ground‑to‑air (G2A) perspective presents significant challenges, including extremely low pixel occupancy, cluttered aerial backgrounds, and strict real‑time constraints. Existing YOLO‑based detectors are primarily optimized for general object detection and often lack adequate feature resolution for sub‑pixel targets, while introducing complexities during deployment. In this paper, we propose SDD‑YOLO, a small‑target detection framework tailored for G2A anti‑UAV surveillance. To capture fine‑grained spatial details critical for micro‑targets, SDD‑YOLO introduces a P2 high‑resolution detection head operating at 4 times downsampling. Furthermore, we integrate the recent architectural advancements from YOLO26, including a DFL‑free, NMS‑free architecture for streamlined inference, and the MuSGD hybrid training strategy with ProgLoss and STAL, which substantially mitigates gradient oscillation on sparse small‑target signals. To support our evaluation, we construct DroneSOD‑30K, a large‑scale G2A dataset comprising approximately 30,000 annotated images covering diverse meteorological conditions. Experiments demonstrate that SDD‑YOLO‑n achieves a mAP@0.5 of 86.0% on DroneSOD‑30K, surpassing the YOLOv5n baseline by 7.8 percentage points. Extensive inference analysis shows our model attains 226 FPS on an NVIDIA RTX 5090 and 35 FPS on an Intel Xeon CPU, demonstrating exceptional efficiency for future edge deployment.
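To make the benefit of a detection head at 4 times downsampling concrete, the arithmetic below compares how many feature‑map cells a small target spans at stride 4 (a P2 head) versus stride 8 (the finest head, P3, in typical YOLO configurations). The 640‑pixel input size and the 8‑pixel target are assumptions chosen for the example, not figures from the paper.

```python
def grid_size(image_px, stride):
    """Side length of the feature map produced at a given stride."""
    return image_px // stride


def cells_spanned(object_px, stride):
    """Approximate number of cells an object covers along one axis."""
    return object_px / stride


IMAGE_PX = 640   # assumed input resolution
TARGET_PX = 8    # assumed width of a small UAV target in pixels

p2_grid = grid_size(IMAGE_PX, 4)          # stride-4 head
p3_grid = grid_size(IMAGE_PX, 8)          # stride-8 head
p2_cells = cells_spanned(TARGET_PX, 4)
p3_cells = cells_spanned(TARGET_PX, 8)
```

Under these assumptions the stride‑4 map has four times as many cells as the stride‑8 map, and the target covers two cells per axis instead of one, at the price of more computation in the head.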
Authors: Yunes Alqudsi
Abstract: Drone light shows (DLShows) represent a rapidly growing application of swarm robotics, creating captivating aerial displays through the synchronized flight of hundreds or thousands of unmanned aerial vehicles (UAVs) as environmentally friendly and reusable alternatives to traditional pyrotechnics. This domain presents unique challenges in optimally assigning drones to visual waypoints and generating smooth, collision‑free trajectories at a very large scale. This article introduces the Unified Assignment and Trajectory Generation (UATG) framework. The proposed approach concurrently solves two core problems: the optimal assignment of drones to designated goal locations and the generation of dynamically feasible, collision‑free, time‑parameterized trajectories. The UATG framework is specifically designed for DLShows, ensuring minimal transition times between formations and guaranteeing inter‑drone collision avoidance. A key innovation is its exceptional computational efficiency, enabling real‑time coordination of large‑scale swarms; for instance, it computes the optimal assignment and trajectories for 1008 drones in approximately one second on a standard laptop. Extensive simulations in realistic environments validate the framework's performance, demonstrating its capability to orchestrate complex formations, from alphanumeric characters to intricate 3D shapes, with precision and visual smoothness. This work provides a critical advancement for the DLShow industry, offering a practical and scalable solution for generating complex aerial choreography and establishing a valuable benchmark for ground control station software designed for the efficient coordination of multiple UAVs. A supplemental animated simulation of this work is available at https://youtu.be/‑Fjrhw03594.
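The assignment half of the problem described above can be illustrated with a toy example: given start positions and goal positions, pick the one‑to‑one pairing with the lowest total cost. The brute‑force search below is only a sketch for a handful of drones. The squared‑distance cost and all names are assumptions, and it says nothing about how UATG actually solves the problem at the scale of 1008 drones.

```python
from itertools import permutations


def squared_distance(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_brute_force(starts, goals):
    """Return a list where entry i is the goal index assigned to drone i.

    Tries every permutation, so the cost grows factorially; practical
    systems use polynomial-time solvers such as the Hungarian algorithm.
    """
    best = min(
        permutations(range(len(goals))),
        key=lambda perm: sum(
            squared_distance(starts[i], goals[j]) for i, j in enumerate(perm)
        ),
    )
    return list(best)


starts = [(0, 0, 0), (10, 0, 0), (0, 10, 0)]
goals = [(9, 1, 5), (1, 9, 5), (1, 1, 5)]
assignment = assign_brute_force(starts, goals)
total_cost = sum(squared_distance(starts[i], goals[j]) for i, j in enumerate(assignment))
```

In this toy case each drone is sent to the goal almost directly above it, which also keeps the straight‑line paths from crossing.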
Authors: Zimao Sheng, Zirui Yu, Hong'an Yang
Abstract: Multiple fixed‑wing unmanned aerial vehicles (multi‑UAVs) encounter significant challenges in cooperative path following over complex Digital Elevation Model (DEM) low‑altitude airspace, including wind field disturbances, sudden obstacles, and requirements of distributed temporal synchronization during differentiated path tracking. Existing methods lack efficient distributed coordination mechanisms for time‑consistent tracking of 3D differentiated paths, fail to quantify robustness against disturbances, and lack effective online obstacle avoidance replanning capabilities. To address these gaps, a cooperative control strategy is proposed: first, the distributed cooperative path‑following problem is quantified via time indices, and consistency is ensured through a distributed communication protocol; second, a longitudinal‑lateral look‑ahead angle adjustment method coupled with a robust guidance law is developed to achieve finite‑time stabilization of the path‑following error to zero under wind disturbances; third, an efficient local path replanning method with minimal time cost is designed for real‑time online obstacle avoidance. Experimental validations demonstrate the effectiveness and superiority of the proposed strategy.
Authors: Mohamed Nennouche, Mohammad-Ali Khalighi, Alexis Alfredo Dowhuszko, Djamal Merad
Abstract: Underwater observatories have recently emerged as an efficient solution for marine biodiversity monitoring. The primary objective of this work is to enable efficient and cost‑effective data muling from underwater sensors by investigating the use of optical wireless communications to transmit data from the underwater sensors to an aerial node close to the water surface, such as an unmanned aerial vehicle (UAV). More specifically, we utilize a direct water‑to‑air (W2A) optical communication link between the sensor node equipped with an LED emitter and the UAV equipped with an ultra‑sensitive receiver, i.e., a silicon photo‑multiplier. As a main contribution, we develop a comprehensive Monte Carlo‑based ray‑tracing algorithm to characterize this complex channel. This framework rigorously incorporates the impact of air bubbles modeled through the Mie scattering theory, a realistic sea surface representation derived from the JONSWAP spectrum, and an analytical derivation of the channel loss resulting from UAV instability under wind‑induced perturbations. Furthermore, we conduct a comprehensive analysis of the W2A channel, examining the influence of key parameters such as wind speed, transmitter configurations, and receiver characteristics. The end‑to‑end performance evaluation demonstrates the practical feasibility of the proposed approach, achieving a bit‑error rate of 10^‑3 at a data rate of 1 Mbps for a transmitter depth of 47 m and wind speeds up to 13 m/s.
Authors: Niloufar Amiri, Farrokh Janabi-Sharifi
Abstract: Tendon‑driven aerial continuum manipulators (TD‑ACMs) combine the maneuverability of uncrewed aerial vehicles (UAVs) with the compliance of lightweight continuum robots (CRs). Existing coupled dynamic modeling approaches for TD‑ACMs incur high computational costs and do not explicitly account for aerial platform underactuation. To address these limitations, this paper presents a generalized dynamic formulation of a coupled TD‑ACM with an underactuated base. The proposed approach integrates a strain‑parameterized Cosserat rod model with a rigid‑body model of the UAV into a unified Lagrangian ordinary differential equation (ODE) framework on SE(3), thereby eliminating computationally intensive symbolic derivations. Building upon the developed model, a robust dual‑camera image‑based visual servoing (IBVS) scheme is introduced. The proposed controller mitigates the field‑of‑view (FoV) limitations of conventional IBVS, compensates for attitude‑induced image motion caused by UAV lateral dynamics, and incorporates a low‑level adaptive controller to address modeling uncertainties with formal stability guarantees. Extensive simulations and experimental validation on a compact custom‑built prototype demonstrate the effectiveness and robustness of the proposed framework in real‑world scenarios.
Authors: Samar Heydari, Jawher Said, Galip Ümit Yolcu, Evgenii Kortukov, Elena Golimblevskaia, Evgenios Vlachos, Vasileios Mygdalis, Ioannis Pitas, Sebastian Lapuschkin, Leila Arras
Abstract: Deep learning models for flood and wildfire segmentation and object detection enable precise, real‑time disaster localization when deployed on embedded drone platforms. However, in natural disaster management, the lack of transparency in their decision‑making process hinders human trust required for emergency response. To address this, we present an explainability framework for understanding flood segmentation and car detection predictions on the widely used PIDNet and YOLO architectures. More specifically, we introduce a novel redistribution strategy that extends Layer‑wise Relevance Propagation (LRP) explanations for sigmoid‑gated element‑wise fusion layers. This extension allows LRP relevances to flow through the fusion modules of PIDNet, covering the entire computation graph back to the input image. Furthermore, we apply Prototypical Concept‑based Explanations (PCX) to provide both local and global explanations at the concept level, revealing which learned features drive the segmentation and detection of specific disaster semantic classes. Experiments on a publicly available flood dataset show that our framework provides reliable and interpretable explanations while maintaining near real‑time inference capabilities, rendering it suitable for deployment on resource‑constrained platforms, such as Unmanned Aerial Vehicles (UAVs).
Authors: Li Dong, Feibo Jiang, Kezhi Wang, Cunhua Pan, Dong In Kim, Ekram Hossain
Abstract: Low‑Altitude Wireless Networks (LAWNs), composed of Unmanned Aerial Vehicles (UAVs) and mobile terminals, are emerging as a critical extension of 6G. However, applying Large Language Models in LAWNs faces three major challenges: 1) Computational and energy constraints; 2) Communication and bandwidth limitations; 3) Real‑time and reliability conflicts. To address these challenges, we propose Aerial Agentic AI, a hierarchical framework integrating UAV‑side fast‑thinking Small Language Models (SLMs) with BS‑side slow‑thinking Large Language Models (LLMs). First, we design SLM‑based Agents capable of on‑board perception, short‑term memory enhancement, and real‑time decision‑making on the UAVs. Second, we implement an LLM‑based Agent system that leverages long‑term memory, global knowledge, and tool orchestration at the Base Station (BS) to perform deep reasoning, knowledge updates, and strategy optimization. Third, we establish an efficient hierarchical coordination mechanism, enabling UAVs to execute high‑frequency tasks locally while synchronizing with the BS only when necessary. Experimental results validate the effectiveness of the proposed Aerial Agentic AI.
Authors: Jieting Yuan, Songhan Zhao, Ye Xue, Yu Zhao, Bo Gu, Shimin Gong
Abstract: This paper focuses on secure communications in UAV‑assisted wireless networks, which comprise multiple legitimate UAVs (LE‑UAVs) and an intelligent eavesdropping UAV (EA‑UAV). The intelligent EA‑UAV can observe the LE‑UAVs' transmission strategies and adaptively adjust its trajectory to maximize information interception. To counter this threat, we propose a mode‑switching scheme that enables LE‑UAVs to dynamically switch between the data transmission and jamming modes, thereby balancing data collection efficiency and communication security. However, acquiring full global network state information for LE‑UAVs' decision‑making incurs significant overhead, as the network state is highly dynamic and time‑varying. To address this challenge, we propose a digital twin‑enabled simultaneous learning and modeling (DT‑SLAM) framework that allows LE‑UAVs to learn policies efficiently within the DT, thereby avoiding frequent interactions with the real environment. To capture the competitive relationship between the EA‑UAV and the LE‑UAVs, we model their interactions as a multi‑stage Stackelberg game and jointly optimize the ground users' (GUs') transmission control, UAVs' trajectory planning, mode selection, and network formation to maximize overall secure throughput. Considering potential model mismatch between the DT and the real environment, we propose a robust proximal policy optimization (RPPO) algorithm that encourages LE‑UAVs to explore service regions with higher uncertainty. Numerical results demonstrate that the proposed DT‑SLAM framework effectively supports the learning process. Meanwhile, the RPPO algorithm converges about 12% faster and increases the secure throughput by 8.6% compared to benchmark methods.
Authors: Cahit Ikbal Er, Saikiran Juttu, Yasin Yazicioglu
Abstract: We present an energy‑aware collaborative exploration framework for a UAV‑UGV team operating in unknown environments, where the UAV's energy constraint is modeled as a maximum flight‑time limit. The UAV executes a sequence of energy‑bounded exploration tours, while the UGV simultaneously explores on the ground and serves as a mobile charging station. Rendezvous is enforced under a shared time budget so that the vehicles meet at the end of each tour before the UAV reaches its flight‑time limit. We construct a sparsely coupled air‑ground roadmap using a density‑aware layered probabilistic roadmap (PRM) and formulate tour selection over the roadmap as coupled orienteering problems (OPs) to maximize information gain subject to the rendezvous constraint. The resulting tours are constructed over collision‑validated roadmap edges. We validate our method through simulation studies, benchmark comparisons, and real‑world experiments.
Authors: Sandeep Zachariah, Francisco Yandun, Sachet Korada, Abhisesh Silwal
Abstract: Monitoring and controlling invasive tree species across large forests, parks, and trail networks is challenging due to limited accessibility, reliance on manual scouting, and degraded under‑canopy GNSS. We present MapForest, a modular field robotics system that transforms multi‑modal sensor data into GIS‑ready invasive‑species maps. Our system features: (i) a compact, platform‑agnostic sensing payload that can be rapidly mounted on UAV, bicycle, or backpack platforms, and (ii) a software pipeline comprising LiDAR‑inertial mapping, image‑based invasive‑species detection, and georeferenced map generation. To ensure reliable operation in GNSS‑intermittent environments, we enhance a LiDAR‑inertial mapping backbone with covariance‑aware GNSS factors and robust loss kernels. We train an object detector to detect the Tree‑of‑Heaven (Ailanthus altissima) from onboard RGB imagery and fuse detections with the reconstructed map to produce geospatial outputs suitable for downstream decision making. We collected a dataset spanning six sites across urban environments, parks, trails, and forests to evaluate individual system modules, and report end‑to‑end results on two sites containing Tree‑of‑Heaven. The enhanced mapping module achieved a trajectory deviation error of 1.95 m over a 1.2 km forest traversal, and the Tree‑of‑Heaven detector achieved an F1 score of 0.653. The datasets and associated tooling are released to support reproducible research in forest mapping and invasive‑species monitoring.
Authors: Kejia Liu, Haoyang Zhou, Ruoyu Xu, Peicheng Wang, Mingli Song, Haofei Zhang
Abstract: Recent advances in cross‑view geo‑localization (CVGL) methods have shown strong potential for supporting unmanned aerial vehicle (UAV) navigation in GNSS‑denied environments. However, existing work predominantly focuses on matching UAV views to onboard map tiles, which introduces an inherent trade‑off between accuracy and storage overhead, and overlooks the importance of the UAV's heading during navigation. Moreover, the substantial discrepancies and varying overlaps in cross‑view scenarios have been insufficiently considered, limiting their generalization to real‑world scenarios. In this paper, we present Bearing‑UAV, a purely vision‑driven cross‑view navigation method that jointly predicts UAV absolute location and heading from neighboring features, enabling accurate, lightweight, and robust navigation in the wild. Our method leverages global and local structural features and explicitly encodes relative spatial relationships, making it robust to cross‑view variations, misalignment, and feature‑sparse conditions. We also present Bearing‑UAV‑90k, a multi‑city benchmark for evaluating cross‑view localization and navigation. Extensive experiments show that Bearing‑UAV yields lower localization error than previous matching/retrieval paradigms across diverse terrains. Our code and dataset will be made publicly available.
Authors: Afsoon Alidadi Shamsabadi, Cosmas Mwaba, Thomas Nugent, Jie Gao, Pablo Madoery, Halim Yanikomeroglu, Subhadeep Pal
Abstract: Advanced Air Mobility (AAM) has emerged as a key pillar of next‑generation transportation systems, encompassing a wide range of uncrewed aerial vehicle (UAV) applications. To enable AAM, maintaining reliable and efficient communication links between UAVs and control centers is essential. At the same time, the highly dynamic nature of wireless networks, combined with the limited onboard energy of UAVs, makes efficient trajectory planning and network association crucial. Existing terrestrial networks often fail to provide ubiquitous coverage due to frequent handovers and coverage gaps. To address these challenges, geostationary Earth orbit (GEO) satellites offer a promising complementary solution for extending UAV connectivity beyond terrestrial boundaries. This work proposes an integrated GEO terrestrial network architecture to ensure seamless UAV connectivity. Leveraging artificial intelligence (AI), a deep Q network (DQN) based algorithm is developed for joint UAV trajectory and association planning (JUTAP), aiming to minimize energy consumption, handover frequency, and disconnectivity. Simulation results validate the effectiveness of the proposed algorithm within the integrated GEO terrestrial framework.
Authors: Mogens Plessen
Abstract: A method for trajectory smoothing for UAV reference path planning is presented. It is derived based on the dynamics of a Dubins airplane model, and involves a decoupling step, spatial modeling and linear programming. The decoupling step enables algebraic control laws for flight‑path angle and speed control. An optimization step, involving the solution of a small linear program, is applied only for roll‑angle control. Two variations are discussed. They differ by reference centerline tracking and the introduction of a path shaping constraint. The benefit of natural dimensionality reduction for spatial modeling is discussed. The simplicity of the overall method is highlighted. An extension to aerobatic flight is outlined, which comes at the cost of a model approximation but maintains the general model structure. An extension of the method to tractor path planning along 3D terrain is discussed. The method is validated in simulations.
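For orientation, a commonly used form of the Dubins airplane kinematics treats speed, flight‑path angle, and roll angle as inputs, with the heading rate given by the coordinated‑turn relation (g/V)·tan(roll). The forward‑Euler step below sketches that common form. The paper's exact model, sign conventions, and discretization are not stated in the abstract, so the function and its conventions (z pointing up, angles in radians) are assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def dubins_airplane_step(state, speed, flight_path_angle, roll_angle, dt):
    """Advance (x, y, z, heading) by one forward-Euler step of length dt.

    Inputs are airspeed (m/s), flight-path angle and roll angle (rad).
    """
    x, y, z, heading = state
    x += speed * math.cos(heading) * math.cos(flight_path_angle) * dt
    y += speed * math.sin(heading) * math.cos(flight_path_angle) * dt
    z += speed * math.sin(flight_path_angle) * dt
    # Coordinated turn: the roll angle sets the heading rate.
    heading += (G / speed) * math.tan(roll_angle) * dt
    return (x, y, z, heading)


# Straight and level flight: only x changes.
level = dubins_airplane_step((0.0, 0.0, 100.0, 0.0), 20.0, 0.0, 0.0, 1.0)
# A positive roll angle turns the aircraft towards increasing heading.
turning = dubins_airplane_step((0.0, 0.0, 100.0, 0.0), 20.0, 0.0, math.radians(30), 1.0)
```

In this form the flight‑path angle and speed enter the position equations directly while the roll angle acts only through the heading rate, which is consistent with the abstract handling roll‑angle control separately from the other two inputs.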
Authors: Che Chen, Lanhua Li, Shimin Gong, Yu Zhao, Yuming Fang, Dusit Niyato
Abstract: In this paper, we employ multiple UAVs to accelerate data transmissions from ground users (GUs) to a remote base station (BS) via the UAVs' relay communications. The UAVs' intermittent information exchanges typically result in delays in acquiring the complete system state and hinder their effective collaboration. To maximize the overall throughput, we first propose a delay‑tolerant multi‑agent deep reinforcement learning (MADRL) algorithm that integrates a delay‑penalized reward to encourage information sharing among UAVs, while jointly optimizing the UAVs' trajectory planning, network formation, and transmission control strategies. Additionally, considering information loss due to unreliable channel conditions, we further propose a spatio‑temporal attention based prediction approach to recover the lost information and enhance each UAV's awareness of the network state. These two designs are envisioned to enhance the network capacity in UAV‑assisted wireless networks with limited communications. The simulation results reveal that our new approach achieves over 50% reduction in information delay and 75% throughput gain compared to the conventional MADRL. Interestingly, it is shown that improving the UAVs' information sharing will not sacrifice the network capacity. Instead, it significantly improves the learning performance and throughput simultaneously. It is also effective in reducing the need for UAVs' information exchange and thus fostering practical deployment of MADRL in UAV‑assisted wireless networks.
Authors: Kesheng Chen, Wenjian Luo, Xin Lin, Zhen Song, Yatong Chang
Abstract: Unmanned aerial vehicles (UAVs) have been widely used in urban missions, and proper planning of UAV paths can improve mission efficiency while reducing the risk of potential third‑party impact. Existing work has considered all efficiency and safety objectives for a single decision‑maker (DM) and regarded this as a multiobjective optimization problem (MOP). However, there is usually not a single DM but two DMs, i.e., an efficiency DM and a safety DM, and the DMs are only concerned with their respective objectives. The final decision is made based on the solutions of both DMs. In this paper, for the first time, biparty multiobjective UAV path planning (BPMO‑UAVPP) problems involving both efficiency and safety departments are modeled. The existing multiobjective immune algorithm with nondominated neighbor‑based selection (NNIA), the hybrid evolutionary framework for the multiobjective immune algorithm (HEIA), and the adaptive immune‑inspired multiobjective algorithm (AIMA) are modified for solving the BPMO‑UAVPP problem, and then biparty multiobjective optimization algorithms, including the BPNNIA, BPHEIA, and BPAIMA, are proposed and comprehensively compared with traditional multiobjective evolutionary algorithms and typical multiparty multiobjective evolutionary algorithms (i.e., OptMPNDS and OptMPNDS2). The experimental results show that BPAIMA performs better than ordinary multiobjective evolutionary algorithms such as NSGA‑II and multiparty multiobjective evolutionary algorithms such as OptMPNDS, OptMPNDS2, BPNNIA and BPHEIA.
Authors: Ebasa Temesgen, Nathnael Minyelshowa, Lebsework Negash
Abstract: The use of unmanned aerial vehicles (UAVs) in precision agriculture has seen a huge increase recently. As such, systems that aim to apply various algorithms on the field need a structured framework of abstractions. This paper defines the various tasks of the UAVs in precision agriculture and models them into an architectural framework. The presented architecture is built on the context that there will be minimal physical intervention to do the tasks defined with multiple coordinated and cooperative UAVs. Various tasks such as image processing, path planning, communication, data acquisition, and field mapping are employed in the architecture to provide an efficient system. In addition, various limitations of applying multi‑UAV systems in precision agriculture have been considered in designing the architecture. The architecture provides an autonomous end‑to‑end solution, spanning mission planning, data acquisition, and an image processing framework, that is highly efficient and can enable farmers to comprehensively deploy UAVs onto their lands. Simulation and field tests show that the architecture offers a number of advantages, including fault tolerance, robustness, and developer‑ and user‑friendliness.
Authors: Wen Jiang, Kangyao Huang, Li Wang, Wang Xu, Wei Fan, Jinyuan Liu, Shaoyu Liu, Hanfang Liang, Hongwei Duan, Bin Xu, Xiangyang Ji
Abstract: UAVs play an important role in applications such as autonomous exploration, disaster response, and infrastructure inspection. However, UAV vision‑and‑language navigation (VLN) in complex 3D environments remains challenging. A key difficulty is the structural representation mismatch between 2D visual perception and the 3D trajectory decision space, which limits spatial reasoning. To this end, we propose SpatialFly, a geometry‑guided spatial representation framework for UAV VLN. Operating on RGB observations without explicit 3D reconstruction, SpatialFly introduces a geometry‑guided 2D representation alignment mechanism. Specifically, the geometric prior injection module injects global structural cues into 2D semantic tokens to provide scene‑level geometric guidance. The geometry‑aware reparameterization module then aligns 2D semantic tokens with 3D geometric tokens through cross‑modal attention, followed by gated residual fusion to preserve semantic discrimination. Experimental results show that SpatialFly consistently outperforms state‑of‑the‑art UAV VLN baselines across both seen and unseen environments, reducing navigation error (NE) by 4.03 m and improving success rate (SR) by 1.27% over the strongest baseline on the unseen Full split. Additional trajectory‑level analysis shows that SpatialFly produces trajectories with better path alignment and smoother, more stable motion.
Authors: Zelin Wan, Jin-Hee Cho, Mu Zhu, Ahmed H. Anwar, Charles Kamhoua, Munindar P. Singh
Abstract: Unmanned Aerial Vehicles (UAVs) are valuable for mission‑critical systems like surveillance, rescue, or delivery. Not surprisingly, such systems attract cyberattacks, including Denial‑of‑Service (DoS) attacks to overwhelm the resources of mission drones (MDs). How can we defend UAV mission systems against DoS attacks? We adopt cyber deception as a defense strategy, in which honey drones (HDs) are proposed to bait and divert attacks. The attack and deceptive defense hinge upon radio signal strength: The attacker selects victim MDs based on their signals, and HDs attract the attacker from afar by emitting stronger signals, despite this reducing battery life. We formulate an optimization problem for the attacker and defender to identify their respective strategies for maximizing mission performance while minimizing energy consumption. To address this problem, we propose a novel approach, called HT‑DRL. HT‑DRL identifies optimal solutions without a long learning convergence time by feeding the solutions of hypergame theory into the neural network of deep reinforcement learning. This provides a systematic way to intelligently deceive attackers. We analyze the performance of diverse defense mechanisms under different attack strategies. Further, the HT‑DRL‑based HD approach outperforms existing non‑HD counterparts by up to a factor of two in mission performance while incurring low energy consumption.
Authors: Yifei Deng, Chenglong Li, Yuyang Zhang, Guyue Hu, Jin Tang
Abstract: Text‑aerial person retrieval aims to identify targets in UAV‑captured images from eyewitness descriptions, supporting intelligent transportation and public security applications. Compared to ground‑view text‑image person retrieval, UAV‑captured images often suffer from degraded visual information due to drastic variations in viewing angles and flight altitudes, making semantic alignment with textual descriptions very challenging. To address this issue, we propose a novel Cross‑modal Fuzzy Alignment Network, which quantifies the token‑level reliability by fuzzy logic to achieve accurate fine‑grained alignment and incorporates ground‑view images as a bridge agent to further mitigate the gap between aerial images and text descriptions, for text‑aerial person retrieval. In particular, we design the Fuzzy Token Alignment module that employs the fuzzy membership function to dynamically model token‑level association strength and suppress the influence of unobservable or noisy tokens. It can alleviate the semantic inconsistencies caused by missing visual cues and significantly enhance the robustness of token‑level semantic alignment. Moreover, to further mitigate the gap between aerial images and text descriptions, we design a Context‑Aware Dynamic Alignment module to incorporate the ground‑view agent as a bridge in text‑aerial alignment and adaptively combine direct alignment and agent‑assisted alignment to improve the robustness. In addition, we construct a large‑scale benchmark dataset called AERI‑PEDES by using a chain‑of‑thought to decompose text generation into attribute parsing, initial captioning, and refinement, thus boosting textual accuracy and semantic consistency. Experiments on AERI‑PEDES and TBAPR demonstrate the superiority of our method.
Authors: Abdullahi Isa Ahmed, Ana Maria Drăgulinescu, El Mehdi Amhoud
Abstract: The evolution of Internet of Things (IoT) into multi‑layered environments has positioned Low‑Power Wide Area Networks (LPWANs), particularly Long Range (LoRa), as the backbone for connectivity across both surface and subterranean landscapes. However, existing LoRa‑based network designs often treat ground‑based wireless sensor networks (WSNs) and wireless underground sensor networks (WUSNs) as separate systems, resulting in inefficient and non‑integrated connectivity across diverse environments. To address this, we propose Hetero‑Net, a unified heterogeneous LoRa framework that integrates diverse LoRa end devices with multiple unmanned aerial vehicle (UAV)‑mounted LoRa gateways. Our objective is to maximize system energy efficiency through the joint optimization of the spreading factor, transmission power, and three‑dimensional (3D) placement of the UAVs. To manage the dynamic and partially observable nature of this system, we model the problem as a partially observable stochastic game (POSG) and address it using a multi‑agent proximal policy optimization (MAPPO) framework. An ablation study shows that our proposed MAPPO Hetero‑Net significantly outperforms traditional, isolated network designs, achieving energy efficiency improvements of 55.81% and 198.49% over isolated WSN‑only and WUSN‑only deployments, respectively.
Authors: Men Niu, Xinxin Fan, Quanliang Jing, Shaoye Luo, Yunfeng Lu
Abstract: Cooperative multi‑agent reinforcement learning (c‑MARL) has been widely deployed in real‑world applications, such as social robots, embodied intelligence, and UAV swarms. Nevertheless, many adversarial attacks still threaten c‑MARL systems. At present, studies mainly focus on single‑adversary perturbation attacks and white‑box adversarial attacks that manipulate agents' internal observations or actions. To address these limitations, in this paper we study collusive adversarial attacks by strategically organizing a set of malicious agents into three collusive attack modes: Collective Malicious Agents, Disguised Malicious Agents, and Spied Malicious Agents. Three novelties are involved: i) three collusive adversarial attacks are proposed for the first time, and a unified framework CAMA for policy‑level collusive attacks is designed; ii) the attack effectiveness is theoretically analyzed from the perspectives of disruptiveness, stealthiness, and attack cost; and iii) the three collusive adversarial attacks are technically realized through fusion of agents' observation information and attack‑trigger control. Finally, multi‑facet experiments on four SMAC II maps are performed, and the experimental results show that the three collusive attacks have an additive adversarial synergy, strengthening the attack outcome while maintaining high stealthiness and stability over long horizons. Our work fills the gap for collusive adversarial learning in c‑MARL.
Authors: Islam Guven, Mehmet Parlak
Abstract: Multi‑UAV networks are increasingly deployed for large‑scale inspection and monitoring missions, where operational performance depends on the coordination of sensing reliability, communication quality, and energy constraints. In particular, the rapid increase in overflowing waste bins and illegal dumping sites has created a need for efficient detection of waste hotspots. In this work, we introduce JCAS‑MARL, a resource‑aware multi‑agent reinforcement learning (MARL) framework for joint communication and sensing (JCAS)‑enabled UAV networks. Within this framework, multiple UAVs operate in a shared environment where each agent jointly controls its trajectory and the resource allocation of an OFDM waveform used simultaneously for sensing and communication. Battery consumption, charging behavior, and associated CO2 emissions are incorporated into the system state to model realistic operational constraints. Information sharing occurs over a dynamic communication graph determined by UAV positions and wireless channel conditions. Waste hotspot detection requires consensus among multiple UAVs to improve reliability. Using this environment, we investigate how MARL policies exploit the sensing‑communication‑energy trade‑off in JCAS‑enabled UAV networks. Simulation results demonstrate that adaptive pilot‑density control learned by the agents can outperform static configurations, particularly in scenarios where sensing accuracy and communication connectivity vary across the environment.
Authors: Liangshun Wu, Jianbo Du, Junsuo Qu
Abstract: Efficient computation offloading in multi‑UAV edge networks becomes particularly challenging in dense urban areas, where line‑of‑sight (LoS) links are frequently blocked and user demand varies rapidly. Reconfigurable intelligent surfaces (RISs) can mitigate blockage by creating controllable reflected links, but realizing their potential requires tightly coupled decisions on UAV trajectories, offloading schedules, and RIS phase configurations. This joint optimization is hard to solve in practice because multiple UAVs must coordinate under limited information exchange, and purely model‑free multi‑agent reinforcement learning (MARL) often learns too slowly in highly dynamic environments. To address these challenges, we propose a decentralized model‑based MARL framework. Each UAV optimizes mobility and offloading using observations from several hop neighbors, and submits an RIS phase proposal that is aggregated by a lightweight RIS controller. To boost sample efficiency and stability, agents learn local dynamics models and perform short horizon branched rollouts for proximal policy optimization (PPO) updates. Simulations show near centralized performance with improved throughput and energy efficiency at scale.
Authors: Jingyu Guo, Ziye Chen, Ziwen Li, Zhengqing Gao, Jiaxin Huang, Hanlue Zhang, Fengming Huang, Yu Yao, Tongliang Liu, Mingming Gong
Abstract: Existing UAV vision‑language navigation (VLN) benchmarks have enabled language‑guided flight, but they largely focus on long, step‑wise route descriptions with goal‑centric evaluation, making them less diagnostic for real operations where brief, high‑level commands must be grounded into safe multi‑stage behaviors. We present HUGE‑Bench, a benchmark for High‑Level UAV Vision‑Language‑Action (HL‑VLA) tasks that tests whether an agent can interpret concise language and execute complex, process‑oriented trajectories with safety awareness. HUGE‑Bench comprises 4 real‑world digital twin scenes, 8 high‑level tasks, and 2.56M meters of trajectories, and is built on an aligned 3D Gaussian Splatting (3DGS)‑Mesh representation that combines photorealistic rendering with collision‑capable geometry for scalable generation and collision‑aware evaluation. We introduce process‑oriented and collision‑aware metrics to assess process fidelity, terminal accuracy, and safety. Experiments on representative state‑of‑the‑art VLA models reveal significant gaps in high‑level semantic completion and safe execution, highlighting HUGE‑Bench as a diagnostic testbed for high‑level UAV autonomy.
Authors: Gaoxiang Cao, Wenke Yuan, Huasen He, Yunpeng Hou, Xiaofeng Jiang, Shuangwu Chen, Jian Yang
Abstract: Vehicular Ad‑hoc Networks (VANETs) are the digital cornerstone of autonomous driving, yet they suffer from severe network fragmentation in urban environments due to physical obstructions. Unmanned Aerial Vehicles (UAVs), with their high mobility, have emerged as a vital solution to bridge these connectivity gaps. However, traditional Deep Reinforcement Learning (DRL)‑based UAV deployment strategies lack semantic understanding of road topology, often resulting in blind exploration and sample inefficiency. By contrast, Large Language Models (LLMs) possess powerful reasoning capabilities capable of identifying topological importance, though applying them to control tasks remains challenging. To address this, we propose the Semantic‑Augmented DRL (SA‑DRL) framework. Firstly, we propose a fragmentation quantification method based on Road Topology Graphs (RTG) and Dual Connected Graphs (DCG). Subsequently, we design a four‑stage pipeline to transform a general‑purpose LLM into a domain‑specific topology expert. Finally, we propose the Semantic‑Augmented PPO (SA‑PPO) algorithm, which employs a Logit Fusion mechanism to inject the LLM's semantic reasoning directly into the policy as a prior, effectively guiding the agent toward critical intersections. Extensive high‑fidelity simulations demonstrate that SA‑PPO achieves state‑of‑the‑art performance with remarkable efficiency, reaching baseline performance levels using only 26.6% of the training episodes. Ultimately, SA‑PPO improves two key connectivity metrics by 13.2% and 23.5% over competing methods, while reducing energy consumption to just 28.2% of the baseline.
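The Logit Fusion idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the fusion rule, so the additive form, the weight `beta`, and the function names below are assumptions for illustration only, not the authors' SA‑PPO implementation.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def fuse_logits(policy_logits, llm_prior_logits, beta=1.0):
    """Hypothetical additive fusion: policy logits plus a weighted LLM prior.

    beta = 0 ignores the prior; larger beta trusts the prior more.
    """
    policy = np.asarray(policy_logits, dtype=float)
    prior = np.asarray(llm_prior_logits, dtype=float)
    return softmax(policy + beta * prior)

# Three candidate actions (e.g. intersections); the prior favours action 1.
policy_logits = np.array([0.0, 0.0, 0.0])
llm_prior_logits = np.array([0.0, 2.0, 0.0])

probs = fuse_logits(policy_logits, llm_prior_logits, beta=1.0)
uniform = fuse_logits(policy_logits, llm_prior_logits, beta=0.0)
```

With an uninformed policy, the fused distribution shifts toward the action the prior favours, while `beta = 0` recovers the unmodified policy.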
Authors: Damyon Kim, Yuichi Honjo, Tatsuya Iizuka, Naomi Okubo, Naoto Endo, Hiroshi Matsubara, Yoshihiro Kawahara, Naoto Morita, Takuya Sasatani
Abstract: Air‑dispersed sensor networks deployed from aerial robotic systems (e.g., UAVs) provide a low‑cost approach to wide‑area environmental monitoring. However, existing methods often rely on active actuators for mid‑air shape or trajectory control, increasing both power consumption and system cost. Here, we introduce a passive elastic‑folding hinge mechanism that transforms sensors from a flat, stackable form into a three‑dimensional structure upon release. Hinges are fabricated by laminating commercial sheet materials with rigid printed circuit boards (PCBs) and programming fold angles through a single oven‑heating step, enabling scalable production without specialized equipment. Our geometric model links laminate geometry, hinge mechanics, and resulting fold angle, providing a predictive design methodology for target configurations. Laboratory tests confirmed fold angles between 10 degrees and 100 degrees, with a standard deviation of 4 degrees and high repeatability. Field trials further demonstrated reliable data collection and LoRa transmission during dispersion, while the Horizontal Wind Model (HWM)‑based trajectory simulations indicated strong potential for wide‑area sensing exceeding 10 km.
Authors: Nikolaos D. Tantaroudas, Guanqun Gai, Ilias Karachalios
Abstract: Model Reference Adaptive Control based on Lyapunov stability theory is developed for gust load alleviation of nonlinear aeroelastic systems. The controller operates on a nonlinear reduced‑order model derived from Taylor series expansion and eigenvector projection of the coupled fluid‑structure‑flight dynamic equations. The complete MRAC formulation is presented, including the reference model design that encodes desired closed‑loop damping characteristics, the adaptive control law with real‑time gain adjustment, and the Lyapunov derivation of the adaptation law that guarantees asymptotic tracking in the linear case and bounded tracking under a Lipschitz condition on the nonlinear residual. The adaptation rate matrix is identified as the single most important design parameter, governing the trade‑off between convergence speed, peak load reduction, and actuator demand. Two test cases are considered: a 3DOF aerofoil with cubic stiffness nonlinearities, and a Global Hawk type unmanned aerial vehicle. For the UAV under discrete gusts, MRAC achieves significant wing‑tip deflection reductions, outperforming the H infinity robust control benchmark with comparable control effort. Under Von Karman stochastic turbulence, meaningful reductions are also obtained, with performance scaling with the adaptation rate. The results demonstrate that MRAC provides an effective framework for GLA of flexible aircraft operating in both deterministic and stochastic disturbance environments.
Authors: Khushiyant
Abstract: This paper transfers three statistical methods from particle physics to multirotor propeller fault detection: the likelihood ratio test (LRT) for binary detection, the CLs modified frequentist method for false alarm rate control, and sequential neural posterior estimation (SNPE) for quantitative fault characterization. Operating on spectral features tied to rotor harmonic physics, the system returns three outputs: binary detection, controlled false alarm rates, and calibrated posteriors over fault severity and motor location. On UAV‑FD, a hexarotor dataset of 18 real flights with 5% and 10% blade damage, leave‑one‑flight‑out cross‑validation gives AUC 0.862 +/‑ 0.007 (95% CI: 0.849‑‑0.876), outperforming CUSUM (0.708 +/‑ 0.010), autoencoder (0.753 +/‑ 0.009), and LSTM autoencoder (0.551). At 5% false alarm rate the system detects 93% of significant and 81% of subtle blade damage. On PADRE, a quadrotor platform, AUC reaches 0.986 after refitting only the generative models. SNPE gives a full posterior over fault severity (90% credible interval coverage 92‑‑100%, MAE 0.012), so the output includes uncertainty rather than just a point estimate or fault flag. Per‑flight sequential detection achieves 100% fault detection with 94% overall accuracy.
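As a rough illustration of the first step named above, a likelihood ratio test with a threshold set for a 5% false alarm rate can be sketched as follows. The data are synthetic equal‑variance Gaussians with made‑up means, not UAV‑FD features, and the sketch covers neither the CLs method nor SNPE.

```python
import numpy as np

def log_likelihood_ratio(x, mu_healthy, mu_faulty, sigma):
    """Log of p(x | faulty) / p(x | healthy) for equal-variance Gaussians."""
    x = np.asarray(x, dtype=float)
    return ((x - mu_healthy) ** 2 - (x - mu_faulty) ** 2) / (2.0 * sigma ** 2)

# Synthetic scalar feature; the means and sigma are illustrative assumptions.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=10_000)
faulty = rng.normal(2.0, 1.0, size=10_000)

healthy_scores = log_likelihood_ratio(healthy, 0.0, 2.0, 1.0)
faulty_scores = log_likelihood_ratio(faulty, 0.0, 2.0, 1.0)

# Pick the threshold as the 95th percentile of healthy scores, so that
# roughly 5% of healthy samples are flagged as faulty.
threshold = np.quantile(healthy_scores, 0.95)
false_alarm_rate = float(np.mean(healthy_scores > threshold))
detection_rate = float(np.mean(faulty_scores > threshold))
```

Here the threshold is calibrated on the same healthy samples it is evaluated on; a real system would calibrate on held‑out flights.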
Authors: Shenghui Huang, Menghao Hu, Longkun Zou, Hongyu Chi, Zekai Li, Feng Gao, Fan Yang, Qingyao Wu, Ke Chen
Abstract: Detecting Unmanned Aerial Vehicles (UAVs) in low‑altitude environments is essential for perception and defense systems but remains highly challenging due to complex backgrounds, camouflage, and multimodal interference. In real‑world scenarios, UAVs are frequently visually blended with surrounding structures such as buildings, vegetation, and power lines, resulting in low contrast, weak boundaries, and strong confusion with cluttered background textures. Existing UAV detection datasets, though diverse, are not specifically designed to capture these camouflage and complex‑background challenges, which limits progress toward robust real‑world perception. To fill this gap, we construct UAV‑CB, a new RGB‑T UAV detection dataset deliberately curated to emphasize complex low‑altitude backgrounds and camouflage characteristics. Furthermore, we propose the Local Frequency Bridge Network (LFBNet), which models features in localized frequency space to bridge both the frequency‑spatial fusion gap and the cross‑modality discrepancy gap in RGB‑T fusion. Extensive experiments on UAV‑CB and public benchmarks demonstrate that LFBNet achieves state‑of‑the‑art detection performance and strong robustness under camouflaged and cluttered conditions, offering a frequency‑aware perspective on multimodal UAV perception in real‑world applications.
Authors: Nikolaos D. Tantaroudas, Ilias Karachalios
Abstract: H infinity robust control synthesis for gust load alleviation of very flexible aircraft is presented. The controller is synthesised on a compact reduced‑order model comprising 8 degrees of freedom for the UAV configuration and 9 for the flying‑wing, obtained through nonlinear model order reduction of the coupled fluid‑structure‑flight dynamics system, and validated on the full nonlinear model. The control architecture employs trailing‑edge flap deflection as the actuator and wing‑tip displacement as the performance output, with an input‑shaping weighting function Kc that governs the trade‑off between structural load alleviation and rigid‑body trajectory deviation. Results are presented for a Global Hawk‑like UAV and a very flexible flying‑wing configuration. The methodology demonstrates that H infinity controllers designed on low‑order ROMs can robustly alleviate gust loads when applied to high‑dimensional nonlinear aeroelastic systems.
Authors: Yunting Xu, Jiacheng Wang, Ruichen Zhang, Changyuan Zhao, Yinqiu Liu, Dusit Niyato, Liang Yu, Haibo Zhou, Dong In Kim
Abstract: Multi‑uncrewed aerial vehicle (UAV) cooperative perception has emerged as a promising paradigm for diverse low‑altitude economy applications, where complementary multi‑view observations are leveraged to enhance perception performance via wireless communications. However, the massive visual data generated by multiple UAVs poses significant challenges in terms of communication latency and resource efficiency. To address these challenges, this paper proposes a communication‑efficient cooperative perception framework, termed Base‑Station‑Helped UAV (BHU), which reduces communication overhead while enhancing perception performance. Specifically, we employ a Top‑K selection mechanism to identify the most informative pixels from UAV‑captured RGB images, enabling sparsified visual transmission with reduced data volume and latency. The sparsified images are transmitted to a ground server via multi‑user MIMO (MU‑MIMO), where a Swin‑large‑based MaskDINO encoder extracts bird's‑eye‑view (BEV) features and performs cooperative feature fusion for ground vehicle perception. Furthermore, we develop a diffusion model‑based deep reinforcement learning (DRL) algorithm to jointly select cooperative UAVs, sparsification ratios, and precoding matrices, achieving a balance between communication efficiency and perception utility. Simulation results on the Air‑Co‑Pred dataset demonstrate that, compared with traditional CNN‑based BEV fusion baselines, the proposed BHU framework improves perception performance by over 5% while reducing communication overhead by 85%, providing an effective solution for multi‑UAV cooperative perception under resource‑constrained wireless environments.
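The Top‑K sparsification step described above can be sketched as keeping the k highest‑scoring pixels and zeroing the rest. How BHU scores pixel informativeness is not stated in the abstract, so the `importance` map and the function name here are hypothetical placeholders.

```python
import numpy as np

def top_k_sparsify(image, importance, k):
    """Keep the k pixels with the highest importance; zero out the others.

    image:      (H, W, C) array
    importance: (H, W) array of per-pixel scores (hypothetical input)
    k:          number of pixels to keep, 1 <= k <= H * W
    """
    h, w = importance.shape
    if not 1 <= k <= h * w:
        raise ValueError("k must be between 1 and the number of pixels")
    flat = importance.ravel()
    # argpartition places the k largest scores in the last k positions.
    keep = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(h * w, dtype=bool)
    mask[keep] = True
    mask = mask.reshape(h, w)
    sparse = np.where(mask[..., None], image, 0)
    return sparse, mask

rng = np.random.default_rng(0)
image = rng.integers(1, 256, size=(4, 4, 3), dtype=np.uint8)
importance = np.arange(16, dtype=float).reshape(4, 4)  # bottom row scores highest
sparse, mask = top_k_sparsify(image, importance, k=4)
```

Only the masked pixels (plus their positions) would need to be transmitted, which is where the reduction in data volume comes from.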
Authors: Nadine Muller, Stefano DeRosa, Su Zhang, Chun Lee Huan
Abstract: Multi‑agent deep learning (MADL), including multi‑agent deep reinforcement learning (MADRL), distributed/federated training, and graph‑structured neural networks, is becoming a unifying framework for decision‑making and inference in wireless systems where sensing, communication, and computing are tightly coupled. Recent 5G‑Advanced and 6G visions strengthen this coupling through integrated sensing and communication, edge intelligence, open programmable RAN, and non‑terrestrial/UAV networking, which create decentralized, partially observed, time‑varying, and resource‑constrained control problems. This survey synthesizes the state of the art, with emphasis on 2021‑2025 research, on MADL for distributed sensing and wireless communications. We present a task‑driven taxonomy across (i) learning formulations (Markov games, Dec‑POMDPs, CTDE), (ii) neural architectures (GNN‑based radio resource management, attention‑based policies, hierarchical learning, and over‑the‑air aggregation), (iii) advanced techniques (federated reinforcement learning, communication‑efficient federated deep RL, and serverless edge learning orchestration), and (iv) application domains (MEC offloading with slicing, UAV‑enabled heterogeneous networks with power‑domain NOMA, intrusion detection in sensor networks, and ISAC‑driven perceptive mobile networks). We also provide comparative tables of algorithms, training topologies, and system‑level trade‑offs in latency, spectral efficiency, energy, privacy, and robustness. Finally, we identify open issues including scalability, non‑stationarity, security against poisoning and backdoors, communication overhead, and real‑time safety, and outline research directions toward 6G‑native sense‑communicate‑compute‑learn systems.
Authors: Linghao Zhang, Haitao Zhao, Bo Xu, Hongbo Zhu, Xianbin Wang
Abstract: Space‑air‑ground integrated networks (SAGIN) promise ubiquitous 6G connectivity but face significant resource management challenges due to heterogeneous infrastructure, dynamic topologies, and stringent quality‑of‑service (QoS) requirements. Conventional model‑driven approaches struggle with scalability and adaptability in such complex environments. This paper presents an agentic artificial intelligence (AI) framework for autonomous SAGIN resource management by embedding large language model (LLM)‑based agents into a Monitor‑Analyze‑Plan‑Execute‑Knowledge (MAPE‑K) control plane. The framework incorporates three specialized agents, namely semantic resource perceivers, intent‑driven orchestrators, and adaptive learners, that collaborate through natural language reasoning to bridge the gap between operator intents and network execution. A key innovation is the hierarchical agent‑reinforcement learning (RL) collaboration mechanism, wherein LLM‑based orchestrators dynamically shape reward functions for RL agents based on semantic network conditions. Validation through UAV‑assisted AIGC service orchestration in energy‑constrained scenarios demonstrates that LLM‑driven reward shaping achieves 14% energy reduction and the lowest average service latency among all compared methods. This agentic paradigm offers a scalable pathway toward adaptive, AI‑native 6G networks, capable of autonomously interpreting intents and adapting to dynamic environments.
Authors: Nikolaos D. Tantaroudas, Ilias Karachalios
Abstract: Identification of worst‑case gust loads is a critical step in the certification of very flexible aircraft, yet the computational cost of nonlinear full‑order simulations renders exhaustive parametric searches impractical. This paper presents a reduced‑order model (ROM) based methodology for rapid worst‑case gust identification that achieves computational speedups of up to 600 times relative to full‑order nonlinear simulations. The approach employs nonlinear model order reduction via Taylor series expansion and eigenvector projection of the coupled fluid‑structure‑flight dynamic system. Three test cases of increasing complexity are considered: a three‑degree‑of‑freedom aerofoil (14 states, worst‑case identified from 1,000 design sites), a Global Hawk‑like UAV (540 states, 80 parametric calculations with 30 times speedup), and a very flexible flying‑wing (1,616 states, 37 parametric calculations reduced from 222 hours to 22 minutes). The linear ROM is shown to be accurate for deformations below 10% of the wingspan, while the nonlinear ROM with second‑order Taylor expansion accurately captures the large‑deformation regime. The methodology provides a practical tool for integrating worst‑case gust search into aircraft certification workflows.
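The flying‑wing timings quoted above are consistent with the headline speedup figure, as a short arithmetic check suggests. This assumes the "up to 600 times" claim refers to the 222‑hour versus 22‑minute case.

```python
# Flying-wing case: 37 parametric calculations, full-order vs. ROM wall time.
full_order_minutes = 222 * 60   # 222 hours expressed in minutes
rom_minutes = 22

speedup = full_order_minutes / rom_minutes  # roughly 605, i.e. about 600x
```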
Authors: Enguang Fan, Yifan Chen, Zihan Shan, Matthew Caesar, Jae Kim
Abstract: Autonomous Unmanned Aerial Vehicle (UAV) swarms are increasingly used as rapidly deployable aerial relays and sensing platforms, yet practical deployments must operate under partial observability and intermittent peer‑to‑peer links. We present a graph‑based multi‑agent reinforcement learning framework trained under centralized training with decentralized execution (CTDE): a centralized critic and global state are available only during training, while each UAV executes a shared policy using local observations and messages from nearby neighbors. Our architecture encodes local agent state and nearby entities with an agent‑entity attention module, and aggregates inter‑UAV messages with neighbor self‑attention over a distance‑limited communication graph. We evaluate primarily on a cooperative relay deployment task (DroneConnect) and secondarily on an adversarial engagement task (DroneCombat). In DroneConnect, the proposed method achieves high coverage under restricted communication and partial observation (e.g. 74% coverage with M = 5 UAVs and N = 10 nodes) while remaining competitive with a mixed‑integer linear programming (MILP) optimization‑based offline upper bound, and it generalizes to unseen team sizes without fine‑tuning. In the adversarial setting, the same framework transfers without architectural changes and improves win rate over non‑communicating baselines.
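A distance‑limited communication graph of the kind mentioned above can be sketched as an adjacency matrix over UAV positions. The range value and the function name are illustrative assumptions, and the attention‑based message aggregation itself is not shown.

```python
import numpy as np

def communication_graph(positions, comm_range):
    """Boolean adjacency matrix: True where two distinct UAVs are within range."""
    positions = np.asarray(positions, dtype=float)
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    adjacency = dists <= comm_range
    np.fill_diagonal(adjacency, False)  # a UAV is not its own neighbour
    return adjacency

# Three UAVs at 10 m altitude; the third is far from the other two.
positions = [[0.0, 0.0, 10.0], [3.0, 4.0, 10.0], [50.0, 0.0, 10.0]]
adj = communication_graph(positions, comm_range=10.0)
neighbours_of_0 = np.flatnonzero(adj[0]).tolist()
```

Each UAV would then aggregate messages only from the indices marked True in its row, so the graph changes as the swarm moves.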
Authors: Moji Shi, Rajitha de Silva, Hang Yu, Riccardo Polvara, Marija Popović
Abstract: Autonomous exploration in unknown environments typically relies on onboard state estimation for localisation and mapping. Existing exploration methods primarily maximise coverage efficiency, but often overlook that visual‑inertial odometry (VIO) performance strongly depends on the availability of robust visual features. As a result, exploration policies can drive a robot into feature‑sparse regions where tracking degrades, leading to odometry drift, corrupted maps, and mission failure. We propose a hierarchical perception‑aware exploration framework for a stereo‑equipped unmanned aerial vehicle (UAV) that explicitly couples exploration progress with feature observability. Our approach (i) associates each candidate frontier with an expected feature quality using a global feature map, and prioritises visually informative subgoals, and (ii) optimises a continuous yaw trajectory along the planned motion to maintain stable feature tracks. We evaluate our method in simulation across environments with varying texture levels and in real‑world indoor experiments with largely textureless walls. Compared to baselines that ignore feature quality and/or do not optimise continuous yaw, our method maintains more reliable feature tracking, reduces odometry drift, and achieves on average 30% higher coverage before the odometry error exceeds specified thresholds.
Authors: Jacob Elskamp, Moji Shi, Leonard Bauersfeld, Davide Scaramuzza, Marija Popović
Abstract: Battery‑powered multirotor unmanned aerial vehicles (UAVs) can rapidly map unknown environments, but mission performance is often limited by energy rather than geometry alone. Standard exploration policies that optimise for coverage or time can therefore waste energy through manoeuvre‑heavy trajectories. In this paper, we address energy‑aware autonomous 3D exploration for multirotor UAVs in initially unknown environments. We propose Energy‑Aware Autonomous Exploration (EAAE), a modular frontier‑based framework that makes energy an explicit decision variable during frontier selection. EAAE clusters frontiers into view‑consistent regions, plans dynamically feasible candidate trajectories to the most informative clusters, and predicts their execution energy using an offline power estimation loop. The next target is then selected by minimising predicted trajectory energy while preserving exploration progress through a dual‑layer planning architecture for safe execution. We evaluate EAAE in a full exploration pipeline with a rotor‑speed‑based power model across simulated 3D environments of increasing complexity. Compared to representative distance‑based and information gain‑based frontier baselines, EAAE consistently reduces total energy consumption while maintaining competitive exploration time and comparable map quality, providing a practical drop‑in energy‑aware layer for frontier exploration.
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Abstract: Dense ground‑truth disparity maps are practically unobtainable in forestry environments, where thin overlapping branches and complex canopy geometry defeat conventional depth sensors ‑‑ a critical bottleneck for training supervised stereo matching networks for autonomous UAV‑based pruning. We present UE5‑Forest, a photorealistic synthetic stereo dataset built entirely in Unreal Engine 5 (UE5). One hundred and fifteen photogrammetry‑scanned trees from the Quixel Megascans library are placed in virtual scenes and captured by a simulated stereo rig whose intrinsics ‑‑ 63 mm baseline, 2.8 mm focal length, 3.84 mm sensor width ‑‑ replicate the ZED Mini camera mounted on our drone. Orbiting each tree at up to 2 m across three elevation bands (horizontal, +45 degrees, ‑45 degrees) yields 5,520 rectified 1920 x 1080 stereo pairs with pixel‑perfect disparity labels. We provide a statistical characterisation of the dataset ‑‑ covering disparity distributions, scene diversity, and visual fidelity ‑‑ and a qualitative comparison with real‑world Canterbury Tree Branches imagery that confirms the photorealistic quality and geometric plausibility of the rendered data. The dataset will be publicly released to provide the community with a ready‑to‑use benchmark and training resource for stereo‑based forestry depth estimation.
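The stated rig intrinsics imply a disparity scale that can be checked with the standard pinhole stereo relation, disparity = focal length in pixels × baseline / depth. The sketch assumes the 3.84 mm sensor width maps onto the full 1920‑pixel image width and treats 2 m as the depth to the tree; both are assumptions made for illustration.

```python
# Intrinsics quoted in the abstract.
baseline_m = 0.063        # 63 mm stereo baseline
focal_mm = 2.8
sensor_width_mm = 3.84
image_width_px = 1920

# Focal length in pixels under the pinhole model: about 1400 px.
focal_px = focal_mm / sensor_width_mm * image_width_px

def disparity_px(depth_m: float) -> float:
    """Disparity in pixels for a point at the given depth (rectified pair)."""
    return focal_px * baseline_m / depth_m

disparity_at_2m = disparity_px(2.0)   # about 44 px
disparity_at_1m = disparity_px(1.0)   # about 88 px
```

Under these assumptions, branches at the 2 m orbit distance produce disparities of a few tens of pixels, and disparity doubles when the distance halves.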
Authors: Riya Samanta, Bidyut Saha
Abstract: Precision agriculture increasingly integrates artificial intelligence to enhance crop monitoring, irrigation management, and resource efficiency. Nevertheless, most current systems remain cloud‑based and require reliable connectivity, which hampers adoption in small‑scale and smallholder farming and in underdeveloped countries. Drawing on recent literature from 2023 to 2026, this review covers deployments of Edge AI, with a focus on the evolution and acceptance of Tiny Machine Learning in low‑cost, low‑power agriculture. A hardware‑targeted, deployment‑oriented analysis shows pronounced variation in architecture, with microcontroller‑class platforms (namely ESP32, STM32, and ATMega) dominating the inference options, alongside single‑board computers and UAV‑assisted solutions. Quantitative synthesis shows that quantization is the dominant optimization strategy, applied in around 50% of the surveyed works, while structured pruning, multi‑objective compression, and hardware‑aware neural architecture search remain relatively under‑researched. Resource profiling practices are also not uniform: while model size is occasionally reported, explicit flash, RAM, MAC, latency, and millijoule‑level energy metrics are not well documented, hampering reproducibility and cross‑system comparison. Moreover, to bridge the gap between research prototypes and deployment‑ready systems, the review presents a literature‑informed deployment perspective in the form of a privacy‑preserving layered Edge AI architecture for agriculture, synthesizing the key system‑level design insights emerging from the surveyed works. Overall, the findings demonstrate a clear architectural shift toward localized inference with centralized training asymmetry.
Authors: Hürkan Şahin, Huy Xuan Pham, Van Huyen Dang, Alper Yegenoglu, Erdal Kayacan
Abstract: Autonomous navigation in GPS‑denied and visually degraded environments remains challenging for unmanned aerial vehicles (UAVs). To this end, we investigate the use of a monocular thermal camera as a standalone sensor on a UAV platform for real‑time depth estimation and simultaneous localization and mapping (SLAM). To extract depth information from thermal images, we propose a novel pipeline employing a lightweight supervised network with recurrent blocks (RBs) integrated to capture temporal dependencies, enabling more robust predictions. The network combines lightweight convolutional backbones with a thermal refinement network (T‑RefNet) to refine raw thermal inputs and enhance feature visibility. The refined thermal images and predicted depth maps are integrated into ORB‑SLAM3, enabling thermal‑only localization. Unlike previous methods, the network is trained on a custom non‑radiometric dataset, obviating the need for high‑cost radiometric thermal cameras. Experimental results on datasets and UAV flights demonstrate competitive depth accuracy and robust SLAM performance under low‑light conditions. On the radiometric VIVID++ (indoor‑dark) dataset, our method achieves an absolute relative error of approximately 0.06, compared to baselines exceeding 0.11. In our non‑radiometric indoor set, baseline errors remain above 0.24, whereas our approach remains below 0.10. Thermal‑only ORB‑SLAM3 maintains a mean trajectory error under 0.4 m.
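The absolute relative error quoted above is commonly computed as the mean of |predicted − true| / true over pixels with valid ground truth. The sketch below follows that common definition; the abstract does not spell out the exact evaluation protocol, so the validity mask (`gt > 0`) is an assumption.

```python
import numpy as np

def abs_rel_error(pred, gt):
    """Mean absolute relative depth error over pixels with valid ground truth."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0  # assumed convention: non-positive depth marks missing data
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Toy depths in metres; the last pixel has no ground truth and is ignored.
pred = np.array([1.1, 2.2, 5.0])
gt = np.array([1.0, 2.0, 0.0])
error = abs_rel_error(pred, gt)   # each valid pixel is 10% off
```

An error of about 0.06 therefore means predictions deviate from the true depth by roughly 6% on average.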
Authors: Yueshan Lin, Wei Feng, Yunfei Chen, Yongxu Zhu, Ning Ge, Shi Jin
Abstract: This paper investigates latency‑constrained resource synergization for mission‑oriented non‑terrestrial networks (NTNs) in post‑disaster emergency scenarios. When terrestrial infrastructures are damaged, unmanned aerial vehicles (UAVs) equipped with edge information hubs (EIHs) are deployed to provide temporary coverage and synergize communication and computing resources for rapid situation awareness. We formulate a joint resource configuration and location optimization problem to minimize overall resource costs while guaranteeing stringent latency requirements. Through analytical derivations, we obtain closed‑form optimal solutions that reveal the fundamental tradeoff between communication and computing resources, and develop a successive convex approximation method for EIH location optimization. Simulation results demonstrate that the proposed scheme achieves approximately 20% cost reduction compared with benchmark approaches, validating its optimality and effectiveness for mission‑critical emergency response applications in the sixth‑generation (6G) era.
Authors: Valentin Gaucher, Wenlong Zhang
Abstract: Unmanned aerial vehicles (UAVs) operating in cluttered environments require efficient and accurate impact modeling to maintain stability after collisions; however, classical impulse contact models decouple the normal and tangential components. This letter presents a dual quaternion impulse reset map directly on the SE(3) manifold. By operating on the unified spatial twist (combined linear and angular velocities), the proposed formulation retains the cross‑coupling between normal and tangential impulse components in a single closed‑form expression, and recovers the classical decoupled Newton impulse model as a special case. A recovery controller is designed that couples linear and angular momentum to enforce kinetic energy dissipation across impacts. Hardware‑in‑the‑loop benchmarks demonstrate a 24% reduction in execution latency compared to an optimized matrix‑based implementation, and a 20% reduction relative to a position‑plus‑quaternion (PQ) formulation. MuJoCo simulations across Monte Carlo sweeps over impact angles and friction coefficients show a 50.8%‑75.1% reduction in position root‑mean‑square error (RMSE) and a 68.7%‑85% decrease in peak kinetic energy compared to published linear‑admittance baselines.
Authors: Seoyoung Lee, Shaekh Mohammad Shithil, Durgakant Pushp, Lantao Liu, Zhangyang Wang
Abstract: Inspection of confined infrastructure such as culverts often requires accessing hidden spaces whose entrances are reachable primarily from elevated viewpoints. Aerial‑ground cooperation enables a UAV to deploy a compact UGV for interior exploration, but selecting a suitable deployment region from aerial observations requires metric terrain reasoning involving scale ambiguity, reconstruction uncertainty, and terrain semantics. We present a metric RGB‑based geometric‑semantic reconstruction and traversability analysis framework for aerial‑to‑ground hidden space inspection. A feed‑forward multi‑view RGB reconstruction backbone produces dense geometry, while temporally consistent semantic segmentation yields a 3D semantic map. To enable deployment‑relevant measurements without LiDAR‑based dense mapping, we introduce an embodied motion prior that recovers metric scale by enforcing consistency between predicted camera motion and onboard platform egomotion. From the metrically grounded reconstruction, we construct a confidence‑aware geometric‑semantic traversability map and evaluate candidate deployment zones under explicit reachability constraints. Experiments on a tethered UAV‑UGV platform demonstrate reliable deployment‑zone identification in hidden space scenarios.
Authors: Jiarui Zhang, Junqi Hu, Zurong Mai, Yuhang Chen, Shuohong Lou, Henglian Huang, Lingyuan Zhao, Jianxi Huang, Yutong Lu, Haohuan Fu, Juepeng Zheng
Abstract: Agricultural multimodal reasoning requires robust spatial understanding across varying scales, from ground‑level close‑ups to top‑down UAV and satellite imagery. Existing Multi‑modal Large Language Models (MLLMs) suffer from a significant "terrestrial‑centric" bias, causing scale confusion and logic drift during complex agricultural planning. To address this, we introduce AgroOmni (288K), the first large‑scale multi‑view training corpus designed to capture diverse spatial topologies and scales in modern precision agriculture. Built on this dataset, we propose AgroNVILA, an MLLM that utilizes a novel Perception‑Reasoning Decoupling (PRD) architecture. On the perception side, we incorporate a View‑Conditioned Meta‑Net (VCMN), which injects macroscopic spatial context into visual tokens, resolving scale ambiguities with minimal computational overhead. On the reasoning side, Agriculture‑aware Relative Policy Optimization (ARPO) leverages reinforcement learning to align the model's decision‑making with expert agricultural logic, preventing statistical shortcuts. Extensive experiments demonstrate that AgroNVILA outperforms state‑of‑the‑art MLLMs, achieving significant improvements (+15.18%) in multi‑altitude agricultural reasoning, reflecting its robust capability for holistic agricultural spatial planning.
Authors: Yang Zhan, Yuan Yuan
Abstract: Multimodal Large Language Models (MLLMs) have made significant strides in natural images and satellite remote sensing images. However, understanding low‑altitude drone scenarios remains a challenge. Existing datasets primarily focus on a few specific low‑altitude visual tasks, which cannot fully assess the ability of MLLMs in real‑world low‑altitude UAV applications. Therefore, we introduce UAVBench, a comprehensive benchmark, and UAVIT‑1M, a large‑scale instruction tuning dataset, designed to evaluate and improve MLLMs' abilities in low‑altitude vision‑language tasks. UAVBench comprises 43 test units and 966k high‑quality data samples across 10 tasks at the image‑level and region‑level. UAVIT‑1M consists of approximately 1.24 million diverse instructions, covering 789k multi‑scene images and about 2,000 types of spatial resolutions with 11 distinct tasks. UAVBench and UAVIT‑1M feature pure real‑world visual images and rich weather conditions, and involve manual verification to ensure high quality. Our in‑depth analysis of 11 state‑of‑the‑art MLLMs using UAVBench reveals that open‑source MLLMs cannot generate accurate conversations about low‑altitude visual content, lagging behind closed‑source MLLMs. Extensive experiments demonstrate that fine‑tuning open‑source MLLMs on UAVIT‑1M significantly addresses this gap. Our contributions pave the way for bridging the gap between current MLLMs and low‑altitude UAV real‑world application demands. (Project page: https://UAVBench.github.io/)
Authors: Kautuk Astu, Yogesh Simmhan
Abstract: Designing correct UAV autonomy programs is challenging due to joint navigation, sensing and analytics requirements. While LLMs can generate code, their reliability for safety‑critical UAVs remains uncertain. This paper presents AeroGen, an open‑loop framework that enables consistently correct single‑shot AI‑generated drone control programs through structured guardrail prompting and integration with the AeroDaaS drone SDK. AeroGen encodes API descriptions, flight constraints and operational world rules directly into the system context prompt, enabling generic LLMs to produce constraint‑aware code from user prompts, with minimal example code. We evaluate AeroGen across a diverse benchmark of 20 navigation tasks and 5 drone missions on urban, farm and inspection environments, using both imperative and declarative user prompts. AeroGen generates about 40 lines of AeroDaaS Python code in about 20s per mission, in both real‑world and simulated settings, showing that structured prompting with a well‑defined SDK improves robustness, correctness and deployability of LLM‑generated drone autonomy programs.
Authors: Xianke Wu, Songlin Bai, Chengxiang Li, Zhiyao Luo, Yulin Tian, Fenghua Zhu, Yisheng Lv, Yonglin Tian
Abstract: While Vehicle‑to‑Vehicle (V2V) collaboration extends sensing ranges through multi‑agent data sharing, its reliability remains severely constrained by ground‑level occlusions and the limited perspective of chassis‑mounted sensors, which often result in critical perception blind spots. We propose OpenCOOD‑Air, a novel framework that integrates UAVs as extensible platforms into V2V collaborative perception to overcome these constraints. To mitigate gradient interference from ground‑air domain gaps and data sparsity, we adopt a transfer learning strategy to fine‑tune UAV weights from pre‑trained V2V models. To prevent the spatial information loss inherent in this transition, we formulate ground‑air collaborative perception as a heterogeneous integration task with explicit altitude supervision and introduce a Cross‑Domain Spatial Converter (CDSC) and a Spatial Offset Prediction Transformer (SOPT). Furthermore, we present the OPV2V‑Air benchmark to validate the transition from V2V to Vehicle‑to‑Vehicle‑to‑UAV. Compared to state‑of‑the‑art methods, our approach improves 2D and 3D AP@0.7 by 4% and 7%, respectively.
Authors: Wenchao Wu, Shutong Chen, Wenjie Liu, Zhibo Pang, Yansha Deng, Robert Schober
Abstract: Wirelessly‑connected robotic systems empower robots with real‑time intelligence by leveraging remote computing resources for decision‑making. However, the data exchange between robots and edge servers often overwhelms communication links, introducing latency that degrades task performance. To tackle this, goal‑oriented semantic communication (GSC) has been introduced for wirelessly‑connected robotic systems to extract and transmit only goal‑relevant semantic representations. While this improves task effectiveness, it generally overlooks practical safety requirements. Meanwhile, existing robotics research often treats safety primarily as a control‑level problem, without systematically considering safety across sensing, communication, and control in a closed‑loop manner. To bridge this gap, we investigate how to enable safety‑aware goal‑oriented semantic (SA‑GS) sensing, communication, and control co‑design in wirelessly‑connected robotic systems, aiming to maximize the robotic task effectiveness subject to practical safety requirements. We first introduce an architecture for wirelessly‑connected robotic systems and representative use cases. We then summarize general safety requirements and effectiveness metrics across the use cases. Next, we systematically analyze the unique safety and effectiveness challenges in sensing, communication, and control. Based on these, we further present potential SA‑GS research directions. Finally, an Unmanned Aerial Vehicle (UAV) target tracking case study validates that one of the presented SA‑GS research directions, i.e., semantic‑based C&C packet execution, could significantly improve safety rate and tracking success rate by more than 2 times and 4.5 times, respectively.
Authors: Yara AlaaEldin
Abstract: In this thesis, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low‑altitude unstructured environments. We propose a joint deep‑learning architecture, named Co‑SemDepth, that can perform the two tasks accurately and rapidly, and validate its effectiveness on a variety of datasets. The training of neural networks requires an abundance of annotated data, and in the UAV field, the availability of such data is limited. To help fill this gap, we introduce a new synthetic dataset, TopAir, which contains images captured with a nadir view in outdoor environments at different altitudes.
While using synthetic data for training is convenient, it raises issues when shifting to the real domain for testing. We conduct an extensive analytical study to assess the effect of several factors on synthetic‑to‑real generalization. The Co‑SemDepth and TaskPrompter models are used for comparison in this study. The results reveal superior generalization performance for Co‑SemDepth in depth estimation and for TaskPrompter in semantic segmentation. Our analysis also allows us to determine which training datasets lead to better generalization. Moreover, to help narrow the gap between the synthetic and real domains, image style transfer techniques are explored on aerial images to convert from the synthetic to the realistic style. Cycle‑GAN and diffusion models are employed. The results reveal that diffusion models perform better in synthetic‑to‑real style transfer.
Finally, we focus on the marine domain and address its challenges. Co‑SemDepth is trained on a collected synthetic marine dataset, called MidSea, and tested on both synthetic and real data. The results reveal good generalization performance of Co‑SemDepth when tested on real data from the SMD dataset, while further enhancement is needed on the MIT dataset.
Authors: Qishen Zhong, Junlong Wu, Jian Yang, Guanwei Xiao, Junqi Wu, Zimeng Jiang, Pingan Fang
Abstract: This paper presents a feasibility‑enhanced control barrier function (FECBF) framework for multi‑UAV collision avoidance. In dense multi‑UAV scenarios, the feasibility of the CBF quadratic program (CBF‑QP) can be compromised due to internal incompatibility among multiple CBF constraints. To address this issue, we analyze the internal compatibility of CBF constraints and derive a sufficient condition for internal compatibility. Based on this condition, a sign‑consistency constraint is introduced to mitigate internal incompatibility. The proposed constraint is incorporated into a decentralized CBF‑QP formulation using worst‑case estimates and slack variables. Simulation results demonstrate that the proposed method significantly reduces infeasibility and improves collision avoidance performance compared with existing baselines in dense scenarios. Additional simulations under varying time delays demonstrate the robustness of the proposed method. Real‑world experiments validate the practical applicability of the proposed method.
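As background for the CBF‑QP mentioned above, the following is a minimal sketch of a standard single‑constraint CBF‑QP with a closed‑form solution. It is not the FECBF formulation and omits the sign‑consistency constraint, worst‑case estimates, and slack variables; it assumes single‑integrator dynamics, one circular obstacle, and the barrier h(p) = ||p - p_obs||^2 - radius^2. The function name and all parameters are hypothetical.

```python
def cbf_qp_single(p, p_obs, u_nom, radius, alpha):
    """Minimally modify u_nom so that dh/dt >= -alpha * h holds.

    Solves  min ||u - u_nom||^2  s.t.  a . u >= b  in closed form,
    with a = 2 (p - p_obs) and b = -alpha * h (single-integrator model).
    """
    d = [pi - oi for pi, oi in zip(p, p_obs)]
    h = sum(di * di for di in d) - radius ** 2      # barrier value
    a = [2.0 * di for di in d]                      # gradient of h
    b = -alpha * h
    margin = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if margin >= 0.0:
        return list(u_nom)                          # nominal input is safe
    aa = sum(ai * ai for ai in a)
    if aa == 0.0:
        raise ValueError("constraint gradient vanishes; QP is infeasible")
    # Project u_nom onto the half-space boundary a . u = b.
    return [ui - (margin / aa) * ai for ui, ai in zip(u_nom, a)]


# UAV at (2, 0) flying straight at an obstacle of radius 1 at the origin.
u_safe = cbf_qp_single([2.0, 0.0], [0.0, 0.0], [-1.0, 0.0], 1.0, 1.0)
```

With several such constraints active at once, the half‑spaces can have an empty intersection, which is the infeasibility problem the abstract sets out to mitigate.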
Authors: MoniJesu Wonders James, Amir Atef Habel, Aleksey Fedoseev, Dzmitry Tsetserukou
Abstract: Cooperative visual semantic navigation is a foundational capability for aerial robot teams operating in unknown environments. However, achieving robust open‑vocabulary object‑goal navigation remains challenging due to the computational constraints of deploying heavy perception models onboard and the complexity of decentralized multi‑agent coordination. We present GoalSwarm, a fully decentralized multi‑UAV framework for zero‑shot semantic object‑goal navigation. The UAVs collaboratively construct a shared, lightweight 2D top‑down semantic occupancy map by projecting depth observations from aerial vantage points, eliminating the computational burden of full 3D representations while preserving essential geometric and semantic structure. The core contributions of GoalSwarm are threefold: (1) integration of the zero‑shot foundation model SAM3 for open‑vocabulary detection and pixel‑level segmentation, enabling open‑vocabulary target identification without task‑specific training; (2) a Bayesian Value Map that fuses multi‑viewpoint detection confidences into a per‑pixel goal‑relevance distribution, enabling informed frontier scoring via Upper Confidence Bound (UCB) exploration; and (3) a decentralized coordination strategy combining semantic frontier extraction, cost‑utility bidding with geodesic path costs, and spatial separation penalties to minimize redundant exploration across the swarm.
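One way to picture the fusion‑then‑UCB idea in contribution (2) is the sketch below. It is an assumption for illustration only: the abstract does not specify the distribution or the bonus term, so a Beta posterior per map cell and a mean‑plus‑standard‑deviation bonus are used here, and the names `beta_update`, `ucb_score`, and `kappa` are hypothetical.

```python
import math


def beta_update(alpha, beta, confidence):
    """Fuse one detection confidence in [0, 1] as fractional evidence."""
    return alpha + confidence, beta + (1.0 - confidence)


def ucb_score(alpha, beta, kappa=1.0):
    """Posterior mean plus kappa posterior standard deviations."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    return mean + kappa * math.sqrt(var)


# A cell seen from two viewpoints with detection confidences 0.9 and 0.8,
# starting from a uniform Beta(1, 1) prior.
a, b = 1.0, 1.0
for conf in (0.9, 0.8):
    a, b = beta_update(a, b, conf)
score = ucb_score(a, b)
```

Under this scheme a rarely observed cell keeps a large uncertainty bonus, so frontiers near it stay attractive until more viewpoints have been fused.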
Authors: Jinwen Zhu, Xudong Zhao, Fangcheng Zhu, Jun Hu, Shi Jin, Yinian Mao, Guoquan Huang
Abstract: Robust and accurate navigation is critical for Unmanned Aerial Vehicles (UAVs), especially for those with stringent Size, Weight, and Power (SWaP) constraints. However, most state‑of‑the‑art (SOTA) LiDAR‑Inertial Odometry (LIO) systems still suffer from estimation inconsistency and computational bottlenecks when deployed on such platforms. To address these issues, this paper proposes a consistent and efficient tightly‑coupled LIO framework tailored for UAVs. Within the efficient Multi‑State Constraint Kalman Filter (MSCKF) framework, we build coplanar constraints inferred from planar features observed across a sliding window. By applying null‑space projection to sliding‑window coplanar constraints, we eliminate the direct dependency on feature parameters in the state vector, thereby mitigating overconfidence and improving consistency. More importantly, to further boost efficiency, we introduce a parallel voxel‑based data association and a novel compact cluster‑to‑plane measurement model. This compact measurement model losslessly reduces observation dimensionality and significantly accelerates the update process. Extensive evaluations demonstrate that our method outperforms most SOTA approaches by providing a superior balance of consistency and efficiency. It exhibits improved robustness in degenerate scenarios, achieves the lowest memory usage via its map‑free nature, and runs in real‑time on resource‑constrained embedded platforms (e.g., NVIDIA Jetson TX2).
Authors: Fuhai Chen, Pengpeng Huang, Junwen Wu, Hehong Zhang, Shiping Wang, Xiaoguang Ma, Xuri Ge
Abstract: This paper proposes a novel task for UAV scene understanding, UAV Scene Change Captioning (UAV‑SCC), which aims to generate natural language descriptions of semantic changes in dynamic aerial imagery captured from a movable viewpoint. Unlike traditional change captioning that mainly describes differences between image pairs captured from a fixed camera viewpoint over time, UAV scene change captioning focuses on image‑pair differences resulting from both temporal and spatial scene variations dynamically captured by a moving camera. The key challenge lies in understanding viewpoint‑induced scene changes from UAV image pairs that share only partially overlapping scene content due to viewpoint shifts caused by camera rotation, while effectively exploiting the relative orientation between the two images. To this end, we propose a Hierarchical Dual‑Change Collaborative Learning (HDC‑CL) method for UAV scene change captioning. In particular, a novel transformer, i.e., the Dynamic Adaptive Layout Transformer (DALT), is designed to adaptively model diverse spatial layouts of the image pair, where the interrelated features derived from the overlapping and non‑overlapping regions are learned within the flexible and unified encoding layer. Furthermore, we propose a Hierarchical Cross‑modal Orientation Consistency Calibration (HCM‑OCC) method to enhance the model's sensitivity to viewpoint shift directions, enabling more accurate change captioning. To facilitate in‑depth research on this task, we construct a new benchmark dataset, named UAV‑SCC dataset, for UAV scene change captioning. Extensive experiments demonstrate that the proposed method achieves state‑of‑the‑art performance on this task. The dataset and code will be publicly released upon acceptance of this paper.
Authors: Kan Yu, Kaixuan Li, Yujia Zhao, Dingyou Ma, Qixun Zhang, Zhiyong Feng
Abstract: The rapid proliferation of unmanned aerial vehicle (UAV) applications imposes stringent requirements on continuous and reliable communication coverage in low‑altitude airspace. Conventional cellular systems built upon fixed‑position antennas (FPAs) are inherently constrained by static array geometries and limited mechanical degrees of freedom, which severely restrict their ability to adapt to highly dynamic three‑dimensional (3D) propagation environments. Movable antenna (MA) technology has recently emerged as a promising paradigm to overcome these limitations by actively reconfiguring electromagnetic radiation characteristics through controllable antenna positioning and array orientation, thereby enabling flexible spatial coverage adaptation. To systematically quantify the airspace coverage capability of MA‑enabled systems, this paper formulates a spatial coverage maximization problem over a discretized 3D voxel space. For each voxel, the received signal‑to‑noise ratio (SNR) is maximized via joint optimization of the MA's 3D positions and beamforming matrices. To efficiently solve the resulting non‑convex problem, a hybrid particle swarm optimization and simulated annealing framework is developed to search for high‑quality antenna configurations. Simulation results demonstrate that the proposed MA design framework substantially outperforms conventional FPA‑based schemes in terms of spatial coverage, achieving coverage rates of 26.8% and 29.65% for airspace below 300m and 600m, respectively. Moreover, further coverage enhancement can be attained by incorporating mechanical tilt adjustment, highlighting the strong potential of MA technology for reliable low‑altitude communication coverage.
Authors: Nivand Khosravi, Rodrigo Ventura, Meysam Basiri
Abstract: Non‑repetitive solid‑state LiDAR scanning leads to an extremely sparse measurement regime for detecting airborne UAVs: a small quadrotor at 10‑25 m typically produces only 1‑2 returns per scan, which is far below the point densities assumed by most existing detection approaches and inadequate for robust multi‑target data association. We introduce an unsupervised, LiDAR‑only pipeline that addresses both detection and tracking without the need for labeled training data. The detector integrates range‑adaptive DBSCAN clustering with a three‑stage temporal consistency check and is benchmarked on real‑world air‑to‑air flight data under eight different parameter configurations. The best setup attains 0.891 precision, 0.804 recall, and 0.63 m RMSE, and a systematic minPts sweep verifies that most scans contain at most 1‑2 target points, directly quantifying the sparsity regime. For multi‑target tracking, we compare deterministic Hungarian assignment with joint probabilistic data association (JPDA), each coupled with Interacting Multiple Model filtering, in four simulated scenarios with increasing levels of ambiguity. JPDA cuts identity switches by 64% with negligible impact on MOTA, demonstrating that probabilistic association is advantageous when UAV trajectories approach one another closely. A two‑environment evaluation strategy, combining real‑world detection with RTK‑GPS ground truth and simulation‑based tracking with identity‑annotated ground truth, overcomes the limitations of GNSS‑only evaluation at inter‑UAV distances below 2 m.
Authors: Zhirun Li, Derek Hollenbeck, Ruikun Wu, Michelle Sherman, Sihua Shao, Xiang Sun, Mostafa Hassanalian
Abstract: Undocumented orphaned wells pose significant health and environmental risks to nearby communities by releasing toxic gases and contaminating water sources, with methane emissions being a primary concern. Traditional survey methods such as magnetometry often fail to detect older wells effectively. In contrast, aerial in‑situ sensing using unmanned aerial vehicles (UAVs) offers a promising alternative for methane emission detection and source localization. This study presents a robust and efficient framework based on a multi‑agent deep reinforcement learning (MARL) algorithm for the chemical plume source localization (CPSL) problem. The proposed approach leverages virtual anchor nodes to coordinate UAV navigation, enabling collaborative sensing of gas concentrations and wind velocities through onboard and shared measurements. Source identification is achieved by analyzing the historical trajectory of anchor node placements within the plume. Comparative evaluations against the fluxotaxis method demonstrate that the MARL framework achieves superior performance in both localization accuracy and operational efficiency.
Authors: Rui Wang, Kaitao Meng, Deshi Li, Liang Xu
Abstract: Integrated sensing and communication (ISAC) has attracted growing research interest to facilitate the large‑scale development of the low‑altitude economy (LAE). However, the high dynamics of low‑altitude targets may overwhelm fixed ISAC systems, particularly at the edge of their coverage or in blind zones. Driven by high flexibility, unmanned aerial vehicle (UAV)‑assisted ISAC can provide more freedom of design to enhance communication and sensing abilities. In this paper, we propose an ISAC‑enabled multi‑UAV dynamic collaborative target sensing scheme, where UAVs can dynamically adjust their flight and resource allocation for cooperative sensing of a mobile target through communicating with the terrestrial cellular network with ISAC signals. To achieve the precise sensing of the dynamic target, the posterior Cramer‑Rao bound (PCRB) for the target state is derived. Subsequently, the PCRB minimization problem is formulated by jointly optimizing the UAV‑BS association, UAVs' trajectories and bandwidth allocation, subject to the communication requirements for the UAVs. However, the problem is challenging since it involves a non‑convex and implicit objective function with coupled optimization variables. For a fast implementation of sensing and tracking, we propose a low‑complexity iterative algorithm that can efficiently obtain a sub‑optimal solution to the problem. Specifically, the UAV‑BS association is first determined by the communication‑optimal solution. Then the UAVs' trajectories and bandwidth allocation are alternately optimized based on the descent direction search algorithm. Finally, numerical results are provided to validate the superiority of our proposed designs as compared to various benchmarks.
Authors: Min Hao, Zhizhuo Li, Zirui Zhang, Maoqiang Wu, Han Zhang, Rong Yu
Abstract: Millimeter‑wave or terahertz communications can meet the demands of low‑altitude economy networks for high‑throughput sensing and real‑time decision making. However, high‑frequency characteristics of wireless channels result in severe propagation loss and strong beam directivity, which make beam prediction challenging in highly mobile uncrewed aerial vehicle (UAV) scenarios. In this paper, we employ agentic AI to enable the transformation of mmWave base stations toward embodied intelligence. We design a multi‑agent collaborative reasoning architecture for UAV‑to‑ground mmWave communications and propose a hybrid beam prediction model system based on bimodal data. The multi‑agent architecture is designed to overcome the limited context window and weak controllability of large language model (LLM)‑based reasoning by decomposing beam prediction into task analysis, solution planning, and completeness assessment. To align with the agentic reasoning process, a hybrid beam prediction model system is developed to process multimodal UAV data, including numeric mobility information and visual observations. The proposed hybrid model system integrates Mamba‑based temporal modelling, convolutional visual encoding, and cross‑attention‑based multimodal fusion, and dynamically switches data‑flow strategies under multi‑agent guidance. Extensive simulations on a real UAV mmWave communication dataset demonstrate that the proposed architecture and system achieve high prediction accuracy and robustness under diverse data conditions, with maximum top‑1 accuracy reaching 96.57%.
Authors: Adrian Andrei Buda, Xavier Chen, Nicolò Botteghi, Urban Fasel
Abstract: Co‑design optimisation of autonomous systems has emerged as a powerful alternative to sequential approaches by jointly optimising physical design and control strategies. However, existing frameworks often neglect the robustness required for autonomous systems navigating unstructured, real‑world environments. For agile Unmanned Aerial Vehicles (UAVs) operating at the edge of the flight envelope, this lack of robustness yields designs that are sensitive to perturbations and model mismatch. To address this, we propose a robust co‑design framework for agile fixed‑wing UAVs that integrates parametric uncertainty and wind disturbances directly into the concurrent optimisation process. Our bi‑level approach optimises physical design in a high‑level loop while discovering nominal solutions via a constrained trajectory planner and evaluating performance across a stochastic Monte Carlo ensemble using feedback LQR control. Validated across three agile flight missions, our strategy consistently outperforms deterministic baselines. The results demonstrate that our robust co‑design strategy inherently tailors aerodynamic features, such as wing placement and aspect ratio, to achieve an optimal trade‑off between mission performance and disturbance rejection.
Authors: Islam Guven, Mehmet Parlak
Abstract: Unmanned aerial vehicles (UAVs) are increasingly used to support time‑critical medical supply delivery, providing rapid and flexible logistics during emergencies and resource shortages. However, effective deployment of UAV fleets requires coordination mechanisms capable of prioritizing medical requests, allocating limited aerial resources, and adapting delivery schedules under uncertain operational conditions. This paper presents a multi‑agent reinforcement learning (MARL) framework for coordinating UAV fleets in stochastic medical delivery scenarios where requests vary in urgency, location, and delivery deadlines. The problem is formulated as a partially observable Markov decision process (POMDP) in which UAV agents maintain awareness of medical delivery demands while having limited visibility of other agents due to communication and localization constraints. The proposed framework employs Proximal Policy Optimization (PPO) as the primary learning algorithm and evaluates several variants, including asynchronous extensions, classical actor‑critic methods, and architectural modifications to analyze scalability and performance trade‑offs. The model is evaluated using real‑world geographic data from selected clinics and hospitals extracted from the OpenStreetMap dataset. The framework provides a decision‑support layer that prioritizes medical tasks, reallocates UAV resources in real time, and assists healthcare personnel in managing urgent logistics. Experimental results show that classical PPO achieves superior coordination performance compared to asynchronous and sequential learning strategies, highlighting the potential of reinforcement learning for adaptive and scalable UAV‑assisted healthcare logistics.
Authors: Sizhe Huang, Shujie Yang
Abstract: As backdoor attacks in UAV‑based decentralized federated learning (DFL) grow increasingly stealthy and sophisticated, existing defenses have likewise escalated in complexity. Yet these defenses, which rely heavily on outlier detection, remain vulnerable to carefully crafted backdoors. In UAV‑DFL, the lack of global coordination and limited resources further render outlier‑based defenses impractical. Against this backdrop, gradient spectral analysis offers a promising alternative. While prior work primarily leverages low‑frequency coefficients for pairwise comparisons, it neglects to analyze the intrinsic spectral characteristics of backdoor gradients. Through empirical analysis of existing stealthy attacks, we reveal a key insight: the more effort attackers invest in mimicking benign behaviors, the more distinct the spectral concentration becomes. Motivated by this, we propose Task‑Aware Spectral Energy Refine (TASER), a decentralized defense framework. To our knowledge, this is the first efficient backdoor defense that utilizes spectral concentration instead of complex outlier detection, enabling mitigation of stealthy attacks by structurally disrupting the backdoor task. To suppress the backdoor task, TASER preserves main‑task‑relevant frequency coefficients and discards others. We provide theoretical guarantees and demonstrate through experiments that TASER remains effective against stealthy backdoor attacks that bypass outlier‑based defenses, achieving an attack success rate below 20% and an accuracy loss under 5%.
Authors: Nivand Khosravi, Meysam Basiri, Rodrigo Ventura
Abstract: Accurate relative positioning is crucial for swarm aerial robotics, enabling coordinated flight and collision avoidance. Although vision‑based tracking has been extensively studied, 3D LiDAR‑based methods remain underutilized despite their robustness under varying lighting conditions. Existing systems often rely on bulky, power‑intensive sensors, making them impractical for small UAVs with strict payload and energy constraints. This paper presents a lightweight LiDAR‑based UAV tracking system incorporating an Adaptive Extended Kalman Filter (AEKF) framework. Our approach effectively addresses the challenges posed by sparse, noisy, and nonuniform point cloud data generated by non‑repetitive scanning 3D LiDARs, ensuring reliable tracking while remaining suitable for small drones with strict payload constraints. Unlike conventional filtering techniques, the proposed method dynamically adjusts the noise covariance matrices using innovation and residual statistics, thereby enhancing tracking accuracy under real‑world conditions. Additionally, a recovery mechanism ensures continuity of tracking during temporary detection failures caused by scattered LiDAR returns or occlusions. Experimental validation was performed using a Livox Mid‑360 LiDAR mounted on a DJI F550 UAV in real‑world flight scenarios. The proposed method demonstrated robust UAV tracking performance under sparse LiDAR returns and intermittent detections, consistently outperforming both standard Kalman filtering and particle filtering approaches during aggressive maneuvers. These results confirm that the framework enables reliable relative positioning in GPS‑denied environments without the need for multi‑sensor arrays or external infrastructure.
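The idea of adjusting noise covariances from innovation and residual statistics can be illustrated with a scalar filter. This sketch uses one common forgetting‑factor scheme and a random‑walk model (F = H = 1); it is an assumption for illustration and not the paper's AEKF. The function name `adaptive_kf_step` and the parameter `forget` are hypothetical.

```python
def adaptive_kf_step(x, P, z, Q, R, forget=0.95):
    """One predict/update/adapt cycle of a scalar adaptive Kalman filter."""
    # Predict with a random-walk model.
    x_pred = x
    P_pred = P + Q
    # Standard Kalman update.
    d = z - x_pred                  # innovation (pre-update error)
    K = P_pred / (P_pred + R)       # Kalman gain
    x_new = x_pred + K * d
    P_new = (1.0 - K) * P_pred
    # Adapt R from the post-update residual and Q from the innovation,
    # blending with the previous values through the forgetting factor.
    eps = z - x_new                 # residual (post-update error)
    R_new = forget * R + (1.0 - forget) * (eps * eps + P_new)
    Q_new = forget * Q + (1.0 - forget) * (K * d) ** 2
    return x_new, P_new, Q_new, R_new


# A consistent measurement followed by a large outlier.
x1, P1, Q1, R1 = adaptive_kf_step(0.0, 1.0, 1.0, 0.1, 1.0)
x2, P2, Q2, R2 = adaptive_kf_step(x1, P1, 100.0, Q1, R1)
```

After the outlier, the estimated measurement noise `R2` rises well above `R1`, so later measurements are trusted less until the statistics settle again.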
Authors: Ziye Jia, Yao Wu, Qihui Wu, Lijun He, Qiuming Zhu, Fuhui Zhou, Zhu Han
Abstract: Unmanned aerial vehicle (UAV) swarms are increasingly explored for their potential in various applications such as surveillance, disaster response, and military operations. However, UAV swarms face significant challenges in making effective and rapid decisions under dynamic and uncertain environments. Traditional decision‑making frameworks, which mainly rely on centralized control and rigid architectures, are limited in their adaptability and scalability, especially in complex environments. To overcome these challenges, in this paper, we propose a hierarchical Observe‑Orient‑Decide‑Act (H‑OODA) loop based framework for UAV swarm operation in uncertain environments, which is implemented by embedding the classical OODA loop across the cloud‑edge‑terminal layers and leveraging network function virtualization (NFV) technology to provide flexible and scalable decision‑making functions. In addition, based on the proposed H‑OODA framework, we combine autonomous decision‑making and cooperative control to enhance the adaptability and efficiency of UAV swarms. Furthermore, we present some typical case studies to verify the improvement and efficiency of the proposed framework. Finally, the potential challenges and possible directions are analyzed to provide insights for future H‑OODA enabled UAV swarms.
Authors: Haoxiang Lei, Daotong Wang, Shenghai Yuan, Jianbo Su
Abstract: Reliable 3D trajectory estimation of unmanned aerial vehicles (UAVs) is a fundamental requirement for anti‑UAV systems, yet the acquisition of large‑scale and accurately annotated trajectory data remains prohibitively expensive. In this work, we present a novel framework that derives UAV 3D trajectories and category information directly from Internet‑scale UAV videos, without relying on manual annotations. First, language‑driven data acquisition is employed to autonomously discover and collect UAV‑related videos, while vision‑language reasoning progressively filters task‑relevant segments. Second, a training‑free cross‑modal label generation module is introduced to infer 3D trajectory hypotheses and UAV type cues. Third, a physics‑informed refinement process is designed to impose temporal smoothness and kinematic consistency on the estimated trajectories. The resulting video clips and trajectory annotations can be readily utilized for downstream anti‑UAV tasks. To assess effectiveness and generalization, we conduct zero‑shot transfer experiments on a public, well‑annotated 3D UAV benchmark. Results reveal a clear data scaling behavior: as the amount of online video data increases, zero‑shot transfer performance on the target dataset improves consistently, without any target‑domain training. The proposed method closely approaches the current state‑of‑the‑art, highlighting its robustness and applicability to real‑world anti‑UAV scenarios. Code and datasets will be released upon acceptance.
Authors: Faryal Batool, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Roohan Ahmed Khan, Aleksey Fedoseev, Dzmitry Tsetserukou
Abstract: Safe swarm navigation in cluttered indoor environments requires long‑horizon planning, reactive obstacle avoidance, and adaptive compliance. We propose ImpedanceDiffusion, a hierarchical framework that leverages image‑conditioned diffusion‑based global path planning with Artificial Potential Field (APF) tracking and semantic‑aware variable impedance control for aerial drone swarms.
The diffusion model generates geometric global trajectories directly from RGB images without explicit map construction. These trajectories are tracked by an APF‑based reactive layer, while a VLM‑RAG module performs semantic obstacle classification with 90% retrieval accuracy to adapt impedance parameters for mixed obstacle environments during execution.
Two diffusion planners are evaluated: (i) a top‑view long‑horizon planner using single‑pass inference and (ii) a first‑person‑view (FPV) short‑horizon planner deployed via a two‑stage inference pipeline. Both planners achieve a 100% trajectory generation rate across twenty static and dynamic experimental configurations and are validated via zero‑shot sim‑to‑real deployment on Crazyflie 2.1 drones through the hierarchical APF‑impedance control stack. The top‑view planner produces smoother trajectories that yield conservative tracking speeds of 1.0‑1.2 m/s near hard obstacles and 0.6‑1.0 m/s near soft obstacles. In contrast, the FPV planner generates trajectories with greater local clearance and typically higher speeds, reaching 1.4‑2.0 m/s near hard obstacles and up to 1.6 m/s near soft obstacles. Across 20 experimental configurations (100 total runs), the framework achieved a 92% success rate while maintaining stable impedance‑based formation control with bounded oscillations and no in‑flight collisions, demonstrating reliable and adaptive swarm navigation in cluttered indoor environments.
Authors: Valerio Brunacci, Davide Plozza, Alessio De Angelis, Michele Magno, Tommaso Polonelli
Abstract: We present a complete infrastructure‑less magneto‑inductive (MI) localization system enabling a lightweight UAV to autonomously hover, track, and land with centimeter precision on a mobile quadruped robot acting as a dynamic docking pad. This work advances the vision of heterogeneous robot collaboration, where ultra‑lightweight flying robots serve as mobile perception agents for ground‑based Unmanned Ground Vehicles (UGVs). By extending the sensing horizon and providing complementary viewpoints, the UAVs enhance exploration efficiency and improve the quality of data collection in large‑scale, unknown environments. The proposed system aims to complement traditional localization modalities with a compact, embedded, and infrastructure‑less magnetic sensing approach, providing accurate short‑range relative positioning to bridge the gap between coarse navigation and precise UAV docking. A single lightweight receive coil and a fully embedded estimation pipeline on the UAV deliver 20 Hz relative pose estimates in the UGV's frame, achieving a 3D position root‑mean‑square error (RMSE) of 5 cm. The system uses real‑time estimation and a warm‑started solver to estimate the 3D position, which is then fused with inertial and optical‑flow measurements in the onboard extended Kalman filter. Real‑world experiments validate the effectiveness of the framework, demonstrating significant improvements in UAV‑UGV teaming in infrastructure‑less scenarios compared to state‑of‑the‑art methods, requiring no external anchors or global positioning. In dynamic scenarios, the UAV tracks and docks with a moving UGV while maintaining a 7.2 cm RMSE and achieving successful autonomous landings.
Authors: Fawad Mehboob, Amir Atef Habel, Roohan Ahmed Khan, Mikhail Derevianchenko, Clement Fortin, Dzmitry Tsetserukou
Abstract: The stability and control of Unmanned Aerial Vehicles (UAVs) in a turbulent environment is a matter of great concern. Devising a robust control algorithm to reject disturbances is challenging due to the highly nonlinear nature of wind dynamics, and modeling the dynamics using analytical techniques is not straightforward. While traditional techniques using disturbance observers and classical adaptive control have shown some progress, they are mostly limited to relatively non‑complex environments. On the other hand, learning‑based approaches are increasingly being used for modeling of residual forces and disturbance rejection; however, their generalization and interpretability remain a concern. To this end, we propose a novel integration of data‑driven system identification using Sparse Identification of Non‑Linear Dynamics (SINDy) with Recursive Least Squares (RLS) adaptive control to adapt to and reject wind disturbances in a turbulent environment. We tested and validated our approach in the Gazebo Harmonic environment and on real flights with wind speeds of up to 2 m/s from four directions, creating a highly dynamic and turbulent environment. Adaptive SINDy outperformed the baseline PID and INDI controllers on several trajectory tracking error metrics without crashing. A root mean square error (RMSE) of up to 12.2 cm and 17.6 cm, and a mean absolute error (MAE) of 13.7 cm and 10.5 cm were achieved on circular and lemniscate trajectories, respectively. The validation was performed on a very lightweight Crazyflie drone under a highly dynamic environment for complex trajectory tracking.
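As a rough illustration of the Recursive Least Squares step that the abstract pairs with SINDy, the sketch below adapts the coefficients of a model that is linear in its parameters. Everything specific here is an assumption made for illustration: the function name `rls_update`, the two drag-like features `[v, v*|v|]`, the forgetting factor, and the synthetic data. It is not the authors' controller or feature library.

```python
import math


def rls_update(theta, P, phi, y, lam=0.99):
    """One RLS step for the model y ~ theta . phi with forgetting factor lam.

    theta: current parameter estimate (list of floats)
    P:     current covariance-like matrix (list of lists, symmetric)
    phi:   feature vector for this sample
    y:     measured output for this sample
    """
    n = len(theta)
    # P @ phi
    p_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * p_phi[i] for i in range(n))
    gain = [value / denom for value in p_phi]
    # Prediction error with the old parameters
    err = y - sum(theta[i] * phi[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * err for i in range(n)]
    # P <- (P - gain * phi^T P) / lam; phi^T P equals p_phi because P is symmetric
    P_new = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


# Synthetic, noise-free example with hypothetical "true" coefficients.
true_theta = [0.8, -0.3]
theta = [0.0, 0.0]
P = [[1000.0, 0.0], [0.0, 1000.0]]
for k in range(50):
    v = 2.0 * math.sin(0.3 * k)          # assumed wind-speed-like signal
    phi = [v, v * abs(v)]                # assumed candidate features
    y = true_theta[0] * phi[0] + true_theta[1] * phi[1]
    theta, P = rls_update(theta, P, phi, y)
```

A forgetting factor below 1 discounts old samples, which is the usual way to let the estimate follow slowly changing disturbances; the value 0.99 is only a placeholder.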
Authors: Seyedreza Rezaei, Junjie Kang, Amaldev Haridevan, Jinjun Shan
Abstract: Model Predictive Control (MPC) is widely adopted for agile multirotor vehicles, yet achieving both stability and obstacle‑free flight is particularly challenging when a payload is suspended beneath the airframe. This paper introduces a Safety Enhanced Passivity‑Based Nonlinear MPC (SEP‑NMPC) that provides formal guarantees of stability and safety for a quadrotor transporting a slung payload through cluttered environments. Stability is enforced by embedding a strict passivity inequality, which is derived from a shaped energy storage function with adaptive damping, directly into the NMPC. This formulation dissipates excess energy and ensures asymptotic convergence despite payload swings. Safety is guaranteed through high‑order control barrier functions (HOCBFs) that render user‑defined clearance sets forward‑invariant, obliging both the quadrotor and the swinging payload to maintain separation while interacting with static and dynamic obstacles. The optimization remains quadratic‑program compatible and is solved online at each sampling time without gain scheduling or heuristic switching. Extensive simulations and real‑world experiments confirm stable payload transport, collision‑free trajectories, and real‑time feasibility across all tested scenarios. The SEP‑NMPC framework therefore unifies passivity‑based closed‑loop stability with HOCBF‑based safety guarantees for UAV slung‑payload transportation.
Authors: Sachin Kadam
Abstract: Wide‑area IoT sensor networks require efficient data collection mechanisms when sensors are dispersed over large regions with limited communication infrastructure. Unmanned aerial vehicle (UAV)‑mounted Mobile Base Stations (MBSs) provide a flexible solution; however, their limited onboard energy and the strict energy budgets of sensors necessitate carefully optimized tour planning. In this paper, we introduce the Mobile Base Station Optimal Tour (MOT) problem, which seeks a minimum‑cost, non‑revisiting tour over a subset of candidate stops such that the union of their coverage regions ensures complete sensor data collection under a global sensor energy constraint. The tour also avoids restricted areas. We formally model the MOT problem as a combinatorial optimization problem, which is NP‑complete. Owing to its computational intractability, we develop a polynomial‑time greedy heuristic that jointly considers travel cost and incremental coverage gain while avoiding restricted areas. Using simulations, we obtain tours with low cost, complete sensor coverage, and faster execution. Our proposed greedy algorithm outperforms state‑of‑the‑art approaches in terms of a performance indicator defined as the product of tour length and algorithm execution time, achieving an improvement of 39.15%. The proposed framework provides both theoretical insight into the structural complexity of MBS‑assisted data collection and a practical algorithmic solution for large‑scale IoT deployments.
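A greedy heuristic of the kind described above, one that trades travel cost against incremental coverage gain, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the function name `greedy_tour`, the gain-per-distance score, and the toy data are hypothetical. Restricted areas are modeled only by excluding stops that lie inside them, and the global sensor energy constraint is ignored.

```python
import math


def greedy_tour(start, stops, coverage, restricted=frozenset()):
    """Build a non-revisiting tour that covers every reachable sensor.

    start:      (x, y) of the depot
    stops:      dict name -> (x, y) of candidate stops
    coverage:   dict name -> set of sensor ids covered from that stop
    restricted: names of stops that may not be used
    """
    allowed = {s for s in stops if s not in restricted}
    uncovered = set().union(*(coverage[s] for s in allowed))
    tour, pos, length = [], start, 0.0
    while uncovered:
        best, best_score = None, 0.0
        for s in sorted(allowed):        # sorted for deterministic ties
            if s in tour:
                continue
            gain = len(coverage[s] & uncovered)
            if gain == 0:
                continue
            # Score: newly covered sensors per unit of travel distance
            score = gain / (math.dist(pos, stops[s]) + 1e-9)
            if score > best_score:
                best, best_score = s, score
        tour.append(best)
        length += math.dist(pos, stops[best])
        pos = stops[best]
        uncovered -= coverage[best]
    length += math.dist(pos, start)      # return to the depot
    return tour, length


stops = {"A": (1.0, 0.0), "B": (2.0, 0.0), "C": (0.0, 5.0), "D": (1.0, 1.0)}
coverage = {"A": {1, 2}, "B": {2, 3}, "C": {4}, "D": {1, 2, 3}}
tour, length = greedy_tour((0.0, 0.0), stops, coverage, restricted={"D"})
```

Each iteration scans all candidate stops once, and at most one stop is added per iteration, so the running time is polynomial in the number of stops.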
Authors: Michael Bezick, Majid Sahin
Abstract: Detecting fast‑moving objects, such as unmanned aerial vehicles (UAVs), from event camera data is challenging due to the sparse, asynchronous nature of the input. Traditional Discrete Fourier Transforms (DFT) are effective at identifying periodic signals, such as spinning rotors, but they assume uniformly sampled data, which event cameras do not provide. We propose a novel per‑pixel temporal analysis framework using the Non‑uniform Discrete Fourier Transform (NDFT), which we call Drone Detection via Harmonic Fingerprinting (DDHF). Our method uses purely analytical techniques that identify the frequency signature of drone rotors, as characterized by frequency combs in their power spectra, enabling a tunable and generalizable algorithm that achieves accurate real‑time localization of UAVs. We compare against a YOLO detector under equivalent conditions, demonstrating improvement in accuracy and latency across a difficult array of drone speeds, distances, and scenarios. DDHF achieves an average localization F1 score of 90.89% and average latency of 2.39 ms per frame, while YOLO achieves an F1 score of 66.74% and requires 12.40 ms per frame. Through utilization of purely analytic techniques, DDHF is quickly tuned on small data, easily interpretable, and achieves accuracies and latencies competitive with deep learning alternatives.
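To make the non-uniform transform concrete, the sketch below evaluates an NDFT directly on the irregular event timestamps of a single pixel and picks the strongest candidate frequency. It is a minimal illustration, not the DDHF pipeline: the function names, the unit-impulse event model, the candidate frequency grid, and the synthetic 100 Hz event train are all assumptions, and no frequency-comb test is performed beyond simple peak picking.

```python
import cmath
import math


def ndft_power(timestamps, freqs_hz):
    """Power spectrum of a unit-impulse event train at arbitrary times.

    X(f) = sum_k exp(-2j * pi * f * t_k); returns |X(f)|^2 for each f.
    No resampling onto a uniform grid is needed.
    """
    power = []
    for f in freqs_hz:
        acc = sum(cmath.exp(-2j * math.pi * f * t) for t in timestamps)
        power.append(abs(acc) ** 2)
    return power


def dominant_frequency(timestamps, freqs_hz):
    """Return the candidate frequency with the largest NDFT power."""
    power = ndft_power(timestamps, freqs_hz)
    best = max(range(len(freqs_hz)), key=lambda i: power[i])
    return freqs_hz[best]


# Synthetic pixel: events roughly every 10 ms (100 Hz) with deterministic
# jitter, so the samples are not uniformly spaced.
ts = [k / 100.0 + 0.0005 * math.sin(1.7 * k) for k in range(50)]
freqs = [float(f) for f in range(20, 181, 5)]   # candidate grid in Hz
peak = dominant_frequency(ts, freqs)
```

A periodic source also produces power at integer multiples of its fundamental, which is the comb structure the abstract refers to; the grid above deliberately stops below the second harmonic to keep the example unambiguous.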
Authors: Manuel Boldrer, Michal Kamler, Afzal Ahmad, Martin Saska
Abstract: We present a communication‑free method for safe multi‑robot coordination in complex environments such as forests with dense canopy cover, where GNSS is unavailable. Our approach relies on an onboard anisotropic 3D LiDAR sensor used for SLAM as well as for detecting obstacles and neighboring robots. We develop a novel perception‑aware 3D navigation framework that enables robots to safely and effectively progress toward a goal region despite limited sensor field‑of‑view. The approach is evaluated through extensive simulations across diverse scenarios and validated in real‑world field experiments, demonstrating its scalability, robustness, and reliability.
Authors: Ishrat Jahan, Molla E Majid, M Murugappan, Muhammad E. H. Chowdhury, N. B. Prakash, Saad Bin Abul Kashem, Balamurugan Balusamy, Amith Khandakar
Abstract: Reliable unmanned aerial vehicle (UAV) detection is critical for autonomous airspace monitoring but remains challenging when integrating sensor streams that differ substantially in resolution, perspective, and field of view. Conventional fusion methods, such as wavelet‑, Laplacian‑, and decision‑level approaches, often fail to preserve spatial correspondence across modalities and suffer from annotation inconsistencies, limiting their robustness in real‑world settings. This study introduces two fusion strategies, Registration‑aware Guided Image Fusion (RGIF) and Reliability‑Gated Modality‑Attention Fusion (RGMAF), designed to overcome these limitations. RGIF employs Enhanced Correlation Coefficient (ECC)‑based affine registration combined with guided filtering to maintain thermal saliency while enhancing structural detail. RGMAF integrates affine and optical‑flow registration with a reliability‑weighted attention mechanism that adaptively balances thermal contrast and visual sharpness. Experiments were conducted on the Multi‑Sensor and Multi‑View Fixed‑Wing (MMFW)‑UAV dataset comprising 147,417 annotated air‑to‑air frames collected from infrared, wide‑angle, and zoom sensors. Among single‑modality detectors, YOLOv10x demonstrated the most stable cross‑domain performance and was selected as the detection backbone for evaluating fused imagery. RGIF improved the visual baseline by 2.13% mAP@50 (achieving 97.65%), while RGMAF attained the highest recall of 98.64%. These findings show that registration‑aware and reliability‑adaptive fusion provides a robust framework for integrating heterogeneous modalities, substantially enhancing UAV detection performance in multimodal environments.
Authors: Bowen Liu, Pengyue Jia, Wanyu Wang, Derong Xu, Jiawei Cheng, Jiancheng Dong, Xiao Han, Zimo Zhao, Chao Zhang, Bowen Yu, Fangyu Hong, Xiangyu Zhao
Abstract: Cross‑view UAV geolocalization is fundamentally a challenging large‑scale image retrieval task, aiming to determine the geographic coordinates of Unmanned Aerial Vehicle (UAV) queries by matching them against an extensive geo‑tagged satellite image database. Most existing methods learn separate feature representations for each view and determine the final prediction using naive heuristics to assess feature similarity, thereby neglecting to model the crucial cross‑view relationships. In this paper, we propose SkyLink, a novel plug‑and‑play ranking framework that pioneers joint relational modeling of inter‑view relationships to enhance cross‑view UAV geolocalization. SkyLink leverages a Large Vision‑Language Model (LVLM) to model the intricate visual‑semantic relationships between UAV and satellite views, facilitating effective cross‑view matching. To further refine the learning process, we introduce a relational‑aware loss. It leverages soft labels to provide a more nuanced supervision signal, mitigating the harsh penalty on near‑positive pairs. This approach enhances both training stability and the model's discriminative capacity. Extensive experiments conducted across multiple base retrieval architectures and benchmark datasets demonstrate that SkyLink significantly boosts the ranking effectiveness of existing models, consistently achieving superior performance in various challenging scenarios.
Authors: Yiming Zhang, Junyi Geng
Abstract: Aerial manipulation (AM) expands UAV capabilities beyond passive observation to contact‑based operations at high altitudes and in otherwise inaccessible environments. Although recent advances show promise, most AM systems are developed in controlled settings that overlook key aerodynamic effects. Simplified thrust models are often insufficient to capture the nonlinear wind disturbances and proximity‑induced flow variations present in real‑world environments near infrastructure, while high‑fidelity CFD methods remain impractical for real‑time use. Learning‑based models are computationally efficient at inference, but often struggle to generalize to unseen conditions. This paper combines both approaches by integrating a physics‑based blade‑element model with a learning‑based residual force estimator, along with a rotor‑speed allocation strategy for disturbance compensation, resulting in a unified control framework. The blade‑element model computes per‑rotor aerodynamic forces under wind and provides a refined feedforward disturbance estimate. A learning‑based estimator then predicts the residual forces not captured by the model, enabling compensation for unmodeled aerodynamic effects. An online adaptation mechanism further updates the residual‑force prediction and rotor‑speed allocation jointly to reduce the mismatch between desired and realized thrust. We evaluate this framework in both free‑flight and wall‑contact tracking tasks in a simulated near‑wall wind environment. Results demonstrate improved disturbance estimation and trajectory‑tracking accuracy over conventional approaches, enabling robust wall‑contact execution under challenging aerodynamic conditions.
Authors: Zeyu Fang, Beomyeol Yu, Cheng Liu, Zeyuan Yang, Rongqian Chen, Yuxin Lin, Mahdi Imani, Tian Lan
Abstract: Human‑AI joint planning in Unmanned Aerial Vehicles (UAVs) typically relies on control handover when facing environmental uncertainties, which is often inefficient and cognitively demanding for non‑expert operators. To address this, we propose a novel framework that shifts the collaboration paradigm from control takeover to active information elicitation. We introduce the Minimal Information Neuro‑Symbolic Tree (MINT), a reasoning mechanism that explicitly structures knowledge gaps regarding obstacles and goals into a queryable format. By leveraging large language models, our system formulates optimal binary queries to resolve specific ambiguities with minimal human interaction. We demonstrate the efficacy of this approach through a comprehensive workflow integrating a vision‑language model for perception, voice interfaces, and a low‑level UAV control module in both high‑fidelity NVIDIA Isaac simulations and real‑world deployments. Experimental results show that our method achieves a significant improvement in the success rate for complex search‑and‑rescue tasks while significantly reducing the frequency of human interaction compared to exhaustive querying baselines.
Authors: Zeyu Fang, Yuxin Lin, Cheng Liu, Beomyeol Yu, Zeyuan Yang, Rongqian Chen, Taeyoung Lee, Mahdi Imani, Tian Lan
Abstract: Effective human‑robot collaboration in open‑world environments requires joint planning under uncertain conditions. However, existing approaches often treat humans as passive supervisors, preventing autonomous agents from becoming human‑like teammates that can actively model teammate behaviors, reason about knowledge gaps, query, and elicit responses through communication to resolve uncertainties. To address these limitations, we propose a unified human‑robot joint planning system designed to tackle dual sources of uncertainty: task‑relevant knowledge gaps and latent human intent. Our system operates in two complementary modes. First, an uncertainty‑mitigation joint planning module enables two‑way conversations to resolve semantic ambiguity and object uncertainty. It utilizes an LLM‑assisted active elicitation mechanism and a hypothesis‑augmented A* search, subsequently computing an optimal querying policy via dynamic programming to minimize interaction and verification costs. Second, a real‑time intent‑aware collaboration module maintains a probabilistic belief over the human's latent task intent via spatial and directional cues, enabling dynamic, coordination‑aware task selection for agents without explicit communication. We validate the proposed system in both Gazebo simulations and real‑world UAV deployments integrated with a Vision‑Language Model (VLM)‑based 3D semantic perception pipeline. Experimental results demonstrate that the system significantly cuts the interaction cost by 51.9% in uncertainty‑mitigation planning and reduces the task execution time by 25.4% in intent‑aware cooperation compared to the baselines.
Authors: Yibin Ye, Shuo Chen, Kun Wang, Xiaokai Song, Jisheng Dang, Qifeng Yu, Xichao Teng, Zhang Li
Abstract: Cross‑View Geo‑Localization (CVGL) between UAV imagery and satellite images plays a crucial role in target localization and UAV self‑positioning. However, most existing methods rely on the idealized assumption of scale consistency between UAV queries and satellite galleries, overlooking the severe scale ambiguity commonly encountered in real‑world scenarios. This discrepancy leads to field‑of‑view misalignment and feature mismatch, significantly degrading CVGL robustness. To address this issue, we propose a geometric framework that recovers the absolute metric scale from monocular UAV images using semantic anchors. Specifically, small vehicles (SVs), characterized by relatively stable prior size distributions and high detectability, are exploited as metric references. A Decoupled Stereoscopic Projection Model is introduced to estimate the absolute image scale from these semantic targets. By decomposing vehicle dimensions into radial and tangential components, the model compensates for perspective distortions in 2D detections of 3D vehicles, enabling more accurate scale estimation. To further reduce intra‑class size variation and detection noise, a dual‑dimension fusion strategy with Interquartile Range (IQR)‑based robust aggregation is employed. The estimated global scale is then used as a physical constraint for scale‑adaptive satellite image cropping, improving UAV‑to‑satellite feature alignment. Experiments on augmented DenseUAV and UAV‑VisLoc datasets demonstrate that the proposed method significantly improves CVGL robustness under unknown UAV image scales. Additionally, the framework shows strong potential for downstream applications such as passive UAV altitude estimation and 3D model scale recovery.
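The IQR-based robust aggregation mentioned above can be illustrated with a short sketch that turns per-vehicle pixel lengths into a single image scale while discarding outlying detections. The assumptions are loud ones: the 4.5 m prior vehicle length, the 1.5 x IQR fence, the function names, and the sample detections are all hypothetical, and the paper's radial/tangential decomposition and dual-dimension fusion are not reproduced here.

```python
import statistics


def iqr_filtered_mean(values, k=1.5):
    """Mean of the values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lo <= v <= hi]
    return statistics.fmean(kept)


def estimate_scale(pixel_lengths, prior_length_m=4.5):
    """Estimate metres per pixel from detected vehicle lengths in pixels.

    prior_length_m is an assumed typical small-vehicle length.
    """
    per_vehicle = [prior_length_m / p for p in pixel_lengths]
    return iqr_filtered_mean(per_vehicle)


# Seven plausible detections plus one spurious 15-pixel box.
detections_px = [45, 44, 46, 45, 47, 43, 45, 15]
scale = estimate_scale(detections_px)
naive = statistics.fmean(4.5 / p for p in detections_px)
```

With these sample values the spurious detection pulls the plain mean well above the filtered estimate, which is the kind of detection noise the robust aggregation is meant to suppress.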
Authors: Xin Tang, Xiaohuan Li, Qian Chen, Binhan Liao, Yaqi Zhang, Jianxin Chen, Changyuan Zhao, Junchuan Fan, Junxi Tian
Abstract: Unmanned aerial vehicular networks (UAVNs) are envisioned to provide flexible connectivity, wide‑area coverage, and low‑latency services in dynamic environments. From an agentic artificial intelligence (Agentic AI) perspective, UAVNs naturally operate as multi‑agent systems, where UAVs act as intelligent agents that coordinate deployment and networking decisions to achieve global performance objectives. However, the strong coupling between discrete link decisions and continuous deployment parameters makes UAVN deployment optimization a mixed‑integer nonconvex problem, resulting in challenges in scalability, efficiency, and solution consistency under dynamic network conditions. This paper proposes a dual spatial‑scale UAVN deployment optimization framework based on exact potential games (EPGs), enhanced by Agentic AI. At the large spatial scale, a log‑linear learning based EPG (L3‑EPG) algorithm is developed to optimize inter‑UAV link configurations, enabling sparse yet connected network topologies while reducing redundant links and interference. At the small spatial scale, an approximate gradient based EPG (AG‑EPG) algorithm jointly optimizes UAV deployment, transmission power allocation, and ground user (GU) association to improve network throughput and latency. To further enhance adaptability across heterogeneous scenarios, a large language model (LLM) is incorporated as a knowledge‑driven decision enhancer to automatically generate utility weights according to network characteristics, alleviating reliance on manual parameter tuning. Simulation results demonstrate that the proposed framework consistently outperforms baseline methods in terms of energy consumption, end‑to‑end latency, and system throughput.
Authors: Jiaxu Zhou, Shaobo Wang, Zhiyuan Yang, Zhenjun Yu, Tao Li
Abstract: Vision‑Language Navigation aims to enable agents to understand natural language instructions and carry out appropriate navigation actions in real‑world environments. Most work focuses on indoor settings, with little research on complex outdoor scenes. Current UAV Vision‑and‑Language Navigation models typically act as black boxes without explicit reasoning. We introduce FreeFly‑thinking, an end‑to‑end VLN framework that converts the UAV agent's egocentric images and language instructions into a series of actions, inspired by the urban architecture environment proposed by OpenFly. We first construct a UAV dataset for the navigation task and then perform natural language chain‑of‑thought reasoning. We adopt a two‑stage training strategy: supervised fine‑tuning followed by reinforcement fine‑tuning. Experiments on unseen test data demonstrate strong performance, showing robustness and efficiency in UAV navigation.
Authors: Hussein N. Naser, Hashim A. Hashim, Mojtaba Ahmadi
Abstract: This paper presents an Adaptive Gain Nonlinear Observer (AGNO) for estimating the external interaction wrench (forces and torques) in human‑UAV physical interaction for assistive payload transportation. The proposed AGNO uses the full nonlinear dynamic model to achieve an accurate and robust wrench estimation without relying on dedicated force‑torque sensors. A key feature of this approach is the explicit consideration of the non‑constant inertia matrix, which is essential for aerial systems with asymmetric mass distribution or shifting payloads. A comprehensive dynamic model of a cooperative transportation system composed of two quadrotors and a shared payload is derived, and the stability of the observer is rigorously established using Lyapunov‑based analysis. Simulation results validate the effectiveness of the proposed observer in enabling intuitive and safe human‑UAV interaction. Comparative evaluations demonstrate that the proposed AGNO outperforms an Extended Kalman Filter (EKF) in terms of estimation root mean square errors (RMSE), particularly for torque estimation under nonlinear interaction conditions. This approach reduces system weight and cost by eliminating additional sensing hardware, enhancing practical feasibility.
Authors: Zihao Deng, Qianhuang Li, Peng Gao, Maggie Wigness, John Rogers, Donghyun Kim, Hao Zhang
Abstract: Collaborative planning under operational constraints is an essential capability for heterogeneous robot teams tackling complex large‑scale real‑world tasks. Unmanned Aerial Vehicles (UAVs) offer rapid environmental coverage, but flight time is often limited by energy constraints, whereas Unmanned Ground Vehicles (UGVs) have greater energy capacity to support long‑duration missions, but movement is constrained by traversable terrain. Individually, neither can complete tasks such as environmental monitoring. Effective UAV‑UGV collaboration therefore requires energy‑constrained multi‑UAV task planning, traversability‑constrained multi‑UGV path planning, and crucially, synchronized concurrent co‑planning to ensure timely in‑mission recharging. To enable these capabilities, we propose Collaborative Planning with Concurrent Synchronization (CoPCS), a learning‑based approach that integrates a heterogeneous graph transformer for operationally constrained task encoding with a transformer decoder for joint, synchronized co‑planning that enables UAVs and UGVs to act concurrently in a coordinated manner. CoPCS is trained end‑to‑end under a unified imitation learning paradigm. We conducted extensive experiments to evaluate CoPCS in both robotic simulations and physical robot teams. Experimental results demonstrate that our method provides the novel multi‑robot capability of synchronized concurrent co‑planning and substantially improves team performance. More details of this work are available on the project website: https://hcrlab.gitlab.io/project/CoPCS.
Authors: Riccardo Pretto, Mahmoud Hamandi, Abdullah Mohamed Ali, Gokhan Alcan, Anthony Tzes, Fares Abu-Dakka
Abstract: Fully actuated omnidirectional UAVs enable independent control of forces and torques along all six degrees of freedom, broadening the operational envelope for agile flight and aerial interaction tasks. However, conventional control allocation methods neglect the asymmetric dynamics of the onboard actuators, which can induce oscillatory motor commands and degrade trajectory tracking during dynamic maneuvers. This work proposes a receding‑horizon, actuation‑aware allocation strategy that explicitly incorporates asymmetric motor dynamics and exploits the redundancy of over‑actuated platforms through nullspace optimization. By forward‑simulating the closed‑loop system over a prediction horizon, the method anticipates actuator‑induced oscillations and suppresses them through smooth redistribution of motor commands, while preserving the desired body wrench exactly. The approach is formulated as a constrained optimal control problem solved online via Constrained iterative LQR. Simulation results on the OmniOcta platform demonstrate that the proposed method significantly reduces motor command oscillations compared to a conventional single‑step quadratic programming allocator, yielding improved trajectory tracking in both position and orientation.
Authors: Jingtao Ye, Kexin Zhang, Xunchi Ma, Yuehan Li, Guangming Zhu, Peiyi Shen, Linhua Jiang, Xiangdong Zhang, Liang Zhang
Abstract: The rapid movements and agile maneuvers of unmanned aerial vehicles (UAVs) induce significant observational challenges for multi‑object tracking (MOT). However, existing UAV‑perspective MOT benchmarks often lack these complexities, featuring predominantly predictable camera dynamics and linear motion patterns. To address this gap, we introduce DynUAV, a new benchmark for dynamic UAV‑perspective MOT, characterized by intense ego‑motion and the resulting complex apparent trajectories. The benchmark comprises 42 video sequences with over 1.7 million bounding box annotations, covering vehicles, pedestrians, and specialized industrial categories such as excavators, bulldozers and cranes. Compared to existing benchmarks, DynUAV introduces substantial challenges arising from ego‑motion, including drastic scale changes and viewpoint changes, as well as motion blur. Comprehensive evaluations of state‑of‑the‑art trackers on DynUAV reveal their limitations, particularly in managing the intertwined challenges of detection and association under such dynamic conditions, thereby establishing DynUAV as a rigorous benchmark. We anticipate that DynUAV will serve as a demanding testbed to spur progress in real‑world UAV‑perspective MOT, and we will make all resources available at link.
Authors: Muhammad Zawad Mahmud, Samiha Islam, Damian Lyons
Abstract: The generation of synthetic novel views has the potential to positively impact robot navigation in several ways. In image‑based navigation, a novel overhead view generated from a scene taken by a ground robot could be used to guide an aerial robot to that location. In Video Place Recognition (VPR), novel views of ground locations from the air can be added that enable a UAV to identify places seen by the ground robot, and similarly, overhead views can be used to generate novel ground views.
This paper presents a systematic evaluation of synthetic novel views in VPR using five public VPR image databases and seven typical image similarity methods. We show that for small synthetic additions, novel views improve VPR recognition statistics. We find that for larger additions, the magnitude of viewpoint change is less important than the number of views added and the type of imagery in the dataset.
Authors: Guangyuan Liu, Changyuan Zhao, Yinqiu Liu, Dusit Niyato, Biplab Sikdar
Abstract: Mobile agentic AI is extending autonomous capabilities to resource‑constrained platforms such as edge robots and unmanned aerial vehicles (UAVs), where strict size, weight, power, and cost (SWAP‑C) constraints and intermittent wireless connectivity limit both on‑device computation and cloud access. Existing approaches mostly optimize per‑round communication efficiency, yet mobile agents must sustain competence across a stream of tasks. We propose a knowledge‑driven reasoning framework that extracts reusable decision structures from past execution, synchronizes them over bandwidth‑limited links, and injects them into on‑device reasoning to reduce latency, energy, and error accumulation. A DIKW‑inspired taxonomy distinguishes raw observations, episode‑scoped traces, and persistent cross‑task knowledge, and categorizes knowledge into retrieval, structured, procedural, and parametric representations, each with a distinct tradeoff between reasoning speedup and failure risk. A key finding is that knowledge exposure is non‑monotonic: too little forces costly trial‑and‑error replanning, while too much introduces conflicting cues and errors. A UAV case study validates the framework, where a compact knowledge pack synchronized over intermittent backhaul enables a 3B‑parameter onboard model to achieve perfect mission reliability with lower reasoning cost than both knowledge‑free on‑device reasoning and cloud‑centric replanning.
Authors: Shuaichen Yan, Xiao Hu, Jiayang Sun, Zeyuan Yang, Shipeng Li, Heung-Yeung Shum, Shijun Yin, Yuqing Tang
Abstract: The explosive growth of the low‑altitude economy, driven by eVTOLs and UAVs, demands a unified digital infrastructure to ensure safety and scalability. However, the current aviation vertical references are dangerously fragmented: manned aviation relies on barometric pressure, cartography uses Mean Sea Level (MSL), and obstacle avoidance depends on Above Ground Level (AGL). This fragmentation creates significant ambiguity for autonomous systems and hinders cross‑stakeholder interoperability. In this article, we propose Height Above Ellipsoid (HAE) as the standardized vertical reference for lower airspace. Unlike legacy systems prone to environmental drift and inconsistent datums, HAE provides a globally consistent, GNSS‑native, and mathematically stable reference. We present a pragmatic bidirectional transformation framework to bridge HAE with legacy systems and demonstrate its efficacy through (1) real‑world implementation in Shenzhen's partitioned airspace management, and (2) a probabilistic risk assessment driven by empirical flight logs from the PX4 ecosystem. Results show that transitioning to HAE reduces the required vertical separation minimum, effectively increasing dynamic airspace capacity while maintaining a target safety level. This work offers a roadmap for transitioning from analog height keeping to a digital‑native vertical standard.
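As a rough illustration of a bidirectional transformation between vertical references, the sketch below uses the standard geodetic relation h = H + N (ellipsoidal height = orthometric height + geoid undulation) and treats MSL altitude as orthometric height. The class, field, and function names are hypothetical and are not taken from the article. Barometric altitude is deliberately left out because it needs an atmospheric model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerticalContext:
    """Local quantities (metres) needed to convert between references.

    Both values are assumptions supplied by the caller, e.g. from a geoid
    model and a terrain model; nothing here is specific to the article.
    """
    geoid_undulation_m: float  # N: geoid height above the ellipsoid
    terrain_msl_m: float       # terrain elevation above mean sea level


def msl_to_hae(msl_m: float, ctx: VerticalContext) -> float:
    # h = H + N, treating MSL altitude as orthometric height H
    return msl_m + ctx.geoid_undulation_m


def hae_to_msl(hae_m: float, ctx: VerticalContext) -> float:
    return hae_m - ctx.geoid_undulation_m


def agl_to_hae(agl_m: float, ctx: VerticalContext) -> float:
    # AGL is measured from the terrain surface directly below
    return msl_to_hae(ctx.terrain_msl_m + agl_m, ctx)


def hae_to_agl(hae_m: float, ctx: VerticalContext) -> float:
    return hae_to_msl(hae_m, ctx) - ctx.terrain_msl_m


# Example values are made up for illustration only.
ctx = VerticalContext(geoid_undulation_m=-3.0, terrain_msl_m=20.0)
hae = agl_to_hae(100.0, ctx)
```

Because every conversion is a fixed offset at a given location, the round trip HAE to AGL and back is lossless up to floating-point error.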
Authors: Yuxuan Yang, Bin Lyu, Abbas Jamalipour
Abstract: Space‑air‑ground integrated networks (SAGINs) interconnect satellites, uncrewed aerial vehicles (UAVs), and ground devices to enable flexible and ubiquitous wireless services. The integration of reconfigurable intelligent surfaces (RISs) and fluid antenna systems (FASs) further enhances radio environment controllability. However, the tight integration of cross‑layer facilities and radio enhancement technologies leads to pronounced environmental dynamics and heterogeneity, posing fundamental challenges for system modeling and optimization in large‑scale SAGINs. This paper investigates a SAGIN in which low Earth orbit (LEO) satellite constellations communicate with multiple ground hotspots via RIS‑assisted UAV relays, serving both FAS‑equipped and conventional users. A system model is developed that explicitly captures satellite mobility, UAV trajectories, RIS phase control, and heterogeneous user reception capabilities. Accordingly, a multi‑hotspot downlink rate maximization problem is studied, whose solvability is analyzed through a hierarchical Stackelberg game. To address heterogeneous and time‑varying multi‑hotspot environments, an adaptive personalized federated reinforcement learning (FRL) algorithm is proposed for adaptive optimization of UAV trajectories and RIS phase controls. Simulation results demonstrate superior performance and validate the effectiveness of personalization in dynamic heterogeneous SAGIN scenarios.
Authors: Yuxuan Yang, Bin Lyu, Abbas Jamalipour
Abstract: The low‑altitude economy (LAE) is a rapidly emerging paradigm that builds a service‑centric economic ecosystem through large‑scale and sustainable uncrewed aerial vehicle (UAV)‑enabled service provisioning, reflecting the transition of the 6G era from technological advancement toward commercial deployment. The significant market potential of LAE attracts an increasing number of service providers (SPs), resulting in intensified competition in service deployment. In this paper, we study a realistic LAE scenario in which multiple SPs dynamically deploy UAVs to deliver multiple services to user hotspots, aiming to jointly optimize communication and computation resource allocation. To resolve deployment competition among SPs, an authenticity‑guaranteed auction mechanism is designed, and game‑theoretic analysis is conducted to establish the solvability of the proposed resource allocation problem. Furthermore, a resilient federated reinforcement learning (FRL)‑based solution is developed with strong fault tolerance, effectively countering transmission errors and malicious competition while facilitating potential cooperation among self‑interested SPs. Simulation results demonstrate that the proposed approach significantly improves service performance and robustness compared with baseline methods, providing a practical and scalable solution for competitive LAE service deployment.
Authors: Wagner Comin Sonaglio, Ágney Lopes Roth Ferraz, André Elias Melo, Murray Evangelista de Souza, Guevara Noubir, Lourenço Alves Pereira Júnior
Abstract: Beyond Visual Line of Sight (BVLOS) unmanned aerial vehicle (UAV) operations increasingly use 5G standalone (SA) networks for command and control (C2) between the UAV and the ground control station (GCS). The 3rd Generation Partnership Project (3GPP) has specified mechanisms for authentication and authorization of unmanned aircraft systems (UAS) in this architectural setting. As a result, operators may treat registration state, Protocol Data Unit (PDU) session status, and IP reachability as evidence that the C2 path is available. In practice, however, these connectivity indicators alone do not guarantee that closed‑loop control remains operationally safe. Attacks can degrade UAS C2 when timeliness degrades under shared User Plane contention, mobility continuity fails during Control Plane instability, or command integrity is violated at a trusted next‑generation Node B (gNodeB). Such failures undermine connectivity as the central security indicator for UAV operations. In this paper, we demonstrate these issues using three distinct threat models on a reproducible Open5GS and UERANSIM testbed that carries Micro Air Vehicle Link (MAVLink) over the 5G User Plane, and we use a commercial Nokia core to ground deployment assumptions. We address timeliness, availability, and integrity through experiments in which attack success is defined as forcing an unsafe closed‑loop state without a clean disconnect. We observe stale telemetry and heavy‑tailed delay under co‑tenant User Plane contention, failsafe after handover under Control Plane instability, and navigation hijacking after command rewriting at a compromised gNodeB. We further discuss why each threat model arises and evaluate mitigations for these cross‑layer failures. Across the study, we disclosed five robustness issues: three CVEs have already been assigned, and two additional CVE requests are pending.
Authors: Jack R. Pence, Jackson Fezell, Jack W. Langelaan, Junyi Geng
Abstract: Transporting heavy or oversized slung loads using rotorcraft has traditionally relied on single‑aircraft systems, which limits both payload capacity and control authority. Cooperative multilift using teams of rotorcraft offers a scalable and efficient alternative, especially for infrequent but challenging "long‑tail" payloads, without the need to build ever larger rotorcraft. Most prior multilift research assumes GPS availability, uses centralized estimation architectures, or relies on controlled laboratory motion‑capture setups. As a result, these methods lack robustness to sensor loss and are not viable in GPS‑denied or operationally constrained environments. This paper addresses this limitation by presenting a distributed and decentralized payload state estimation framework for vision‑based multilift operations. Using onboard monocular cameras, each UAV detects a fiducial marker on the payload and estimates its relative pose. These measurements are fused via a Distributed and Decentralized Extended Information Filter (DDEIF), enabling robust and scalable estimation that is resilient to individual sensor dropouts. This payload state estimate is then used for closed‑loop trajectory tracking control. Monte Carlo simulation results in Gazebo show the effectiveness of the proposed approach, including the effect of communication loss during flight.
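The reason an information filter tolerates sensor dropouts can be seen in a minimal scalar sketch: in information form, each sensor's contribution is simply added, so a missing measurement just omits a term. The code below is a simplified linear, single-axis illustration with a direct measurement model, not the DDEIF itself; the function name and the use of `None` to mark a dropout are assumptions made for this example.

```python
def fuse_information(prior_mean, prior_var, measurements):
    """Scalar information-filter measurement update.

    measurements: list of (z, variance) tuples, or None for a sensor
    that dropped out this step. Assumes each sensor observes the state
    directly (measurement matrix H = 1).
    """
    info_matrix = 1.0 / prior_var          # Y = P^-1
    info_vector = prior_mean / prior_var   # y = P^-1 * x
    for m in measurements:
        if m is None:
            continue  # dropout: this sensor contributes no information
        z, r = m
        info_matrix += 1.0 / r  # add H^T R^-1 H
        info_vector += z / r    # add H^T R^-1 z
    posterior_var = 1.0 / info_matrix
    return info_vector * posterior_var, posterior_var


# Three UAVs observe the payload position on one axis; the second drops out.
mean, var = fuse_information(0.0, 1.0, [(2.0, 1.0), None, (4.0, 1.0)])
```

The additive structure is also what makes a decentralized layout natural, since each vehicle can accumulate the terms it receives in any order.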
Authors: Yifei Chen, Xupeng Chen, Feng Wang, Niangang Jiao, Jiayin Liu
Abstract: Autonomous aerial robots operating in GPS‑denied or communication‑degraded environments frequently lose access to camera metadata and telemetry, leaving onboard perception systems unable to recover the absolute metric scale of the scene. As LLM/VLM‑based planners are increasingly adopted as high‑level agents for embodied systems, their ability to reason about physical dimensions becomes safety‑critical, yet our experiments show that five state‑of‑the‑art VLMs suffer from spatial scale hallucinations, with median area estimation errors exceeding 50%. We propose VANGUARD, a lightweight, deterministic Geometric Perception Skill designed as a callable tool that any LLM‑based agent can invoke to recover Ground Sample Distance (GSD) from ubiquitous environmental anchors: small vehicles detected via oriented bounding boxes, whose modal pixel length is robustly estimated through kernel density estimation and converted to GSD using a pre‑calibrated reference length. The tool returns both a GSD estimate and a composite confidence score, enabling the calling agent to autonomously decide whether to trust the measurement or fall back to alternative strategies. On the DOTA v1.5 benchmark, VANGUARD achieves 6.87% median GSD error on 306 images. Integrated with SAM‑based segmentation for downstream area measurement, the pipeline yields 19.7% median error on a 100‑entry benchmark, with 2.6x lower category dependence and 4x fewer catastrophic failures than the best VLM baseline, demonstrating that equipping agents with deterministic geometric tools is essential for safe autonomous spatial reasoning.
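The GSD recovery step described above can be sketched as follows: find the mode of the detected vehicle lengths (in pixels) with a Gaussian kernel density estimate, then divide a reference vehicle length (in metres) by that mode. This is a minimal sketch under stated assumptions. The 4.5 m reference length, the normal-reference bandwidth rule, the grid search, and all function names are illustrative choices rather than VANGUARD's actual settings, and the composite confidence score is omitted.

```python
import math


def kde_mode(samples, bandwidth=None, grid_points=512):
    """Return the location of the highest Gaussian-KDE density on a grid."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return float(lo)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption of this sketch)
        mean = sum(samples) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
        bandwidth = 1.06 * std * n ** (-0.2)
    step = (hi - lo) / (grid_points - 1)
    best_x, best_density = lo, -1.0
    for i in range(grid_points):
        x = lo + i * step
        # Unnormalised density is enough for locating the maximum
        density = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                      for s in samples)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def estimate_gsd(pixel_lengths, reference_length_m=4.5, bandwidth=None):
    """GSD in metres per pixel from the modal vehicle length in pixels."""
    return reference_length_m / kde_mode(pixel_lengths, bandwidth)


# Hypothetical detections: five cars near 45 px and one outlier (a truck).
lengths = [44.0, 45.0, 45.0, 46.0, 45.0, 90.0]
gsd = estimate_gsd(lengths, bandwidth=2.0)
```

Using the mode rather than the mean keeps the single long detection from pulling the estimate, which is the robustness property the abstract attributes to the kernel density step.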
Authors: Giorgio Audrito, Daniele Bortoluzzi, Ferruccio Damiani, Giordano Scarso, Gianluca Torta, Andrea Basso, Monica Cochi, Lorenzo Gusman, Lorenzo Comba, Paolo Gay, Paola Dal Zovo, Giada Galati, Francesco Gallo, Aljaž Grdadolnik, Massimo Pescarollo, Paola Pisano
Abstract: Aggregate Programming (AP) is a paradigm for programming the collective behaviour of sets of distributed devices, possibly situated at the network far edge, by relying on asynchronous proximity‑based interactions. The eXchange Calculus (XC), a recently proposed foundational model for AP, is essentially a typed lambda calculus extended with an operator (the exchange operator) providing an implicit communication mechanism between neighbour devices. This paper provides a gentle introduction to XC and to its implementation as a C++ library, called FCPP. The FCPP library and toolchain have been mainly developed at the Department of Computer Science of the University of Turin, where Stefano Berardi spent most of his academic career conducting outstanding research on the logical foundations of computer science and transmitting his passion for research to students and young researchers, often exploiting typed lambda calculi. An FCPP program is essentially a typed lambda term, and FCPP has been used to write code that has been deployed on devices at the far edge of the network, including rovers and (soon) Uncrewed Aerial Vehicles (UAVs); hence the title of the paper.
Authors: Augustin Borne, Pierre Notin, Christophe Hennequin, Sebastien Changey, Stephane Bazeille, Christophe Cudel, Franz Quint
Abstract: Object tracking from Unmanned Aerial Vehicles (UAVs) is challenged by platform dynamics, camera motion, and limited onboard resources. Existing visual trackers either lack robustness in complex scenarios or are too computationally demanding for real‑time embedded use. We propose a Modular Asynchronous Tracking Architecture (MATA) that combines a transformer‑based tracker with an Extended Kalman Filter, integrating ego‑motion compensation from sparse optical flow and an object trajectory model. We further introduce a hardware‑independent, embedded‑oriented evaluation protocol and a new metric called Normalized time to Failure (NT2F) to quantify how long a tracker can sustain a tracking sequence without external help. Experiments on UAV benchmarks, including an augmented UAV123 dataset with synthetic occlusions, show consistent improvements in Success and NT2F metrics across multiple tracking processing frequencies. A ROS 2 implementation on an NVIDIA Jetson AGX Orin confirms that the evaluation protocol more closely matches real‑time performance on embedded systems.
Authors: Taige Luo, Junru Xie, Chenyang Fan, Bingrong Liu, Ruisheng Wang, Yang Shao, Sheng Xu, Lin Cao
Abstract: Intelligent forest tree breeding has advanced plant phenotyping, yet existing research largely focuses on large‑leaf agricultural crops, with limited attention to fine‑grained leaf analysis of sapling trees in open‑field environments. Natural scenes introduce challenges including scale variation, illumination changes, and irregular leaf morphology. To address these issues, we collected UAV RGB imagery of field‑grown saplings and constructed the Poplar‑leaf dataset, containing 1,202 branches and 19,876 pixel‑level annotated leaf instances. To our knowledge, this is the first instance segmentation dataset specifically designed for forestry leaves in open‑field conditions. We propose LeafInst, a novel segmentation framework tailored for irregular and multi‑scale leaf structures. The model integrates an Asymptotic Feature Pyramid Network (AFPN) for multi‑scale perception, a Dynamic Asymmetric Spatial Perception (DASP) module for irregular shape modeling, and a dual‑residual Dynamic Anomalous Regression Head (DARH) with Top‑down Concatenation decoder Feature Fusion (TCFU) to improve detection and segmentation performance. On Poplar‑leaf, LeafInst achieves 68.4 mAP, outperforming YOLOv11 by 7.1 percent and MaskDINO by 6.5 percent. On the public PhenoBench benchmark, it reaches 52.7 box mAP, exceeding MaskDINO by 3.4 percent. Additional experiments demonstrate strong generalization and practical utility for large‑scale leaf phenotyping.
Authors: Danish Rizvi, David Boyle
Abstract: Coordinating multiple autonomous agents to explore and serve spatially heterogeneous demand requires jointly learning unknown spatial patterns and planning trajectories that maximize task performance. Pure model‑based approaches provide structured uncertainty estimates but lack adaptive policy learning, while deep reinforcement learning often suffers from poor sample efficiency when spatial priors are absent. This paper presents a hybrid belief‑reinforcement learning (HBRL) framework to address this gap. In the first phase, agents construct spatial beliefs using a Log‑Gaussian Cox Process (LGCP) and execute information‑driven trajectories guided by a Pathwise Mutual Information (PathMI) planner with multi‑step lookahead. In the second phase, trajectory control is transferred to a Soft Actor‑Critic (SAC) agent, warm‑started through dual‑channel knowledge transfer: belief state initialization supplies spatial uncertainty, and replay buffer seeding provides demonstration trajectories generated during LGCP exploration. A variance‑normalized overlap penalty enables coordinated coverage through shared belief state, permitting cooperative sensing in high‑uncertainty regions while discouraging redundant coverage in well‑explored areas. The framework is evaluated on a multi‑UAV wireless service provisioning task. Results show 10.8% higher cumulative reward and 38% faster convergence over baselines, with ablation studies confirming that dual‑channel transfer outperforms either channel alone.
Authors: Hanjian Liu, Jinsong Gui
Abstract: The deployment of Multipath QUIC (MPQUIC) in Unmanned Aerial Vehicle (UAV)‑assisted Space‑Air‑Ground Integrated Networks (SAGINs) is severely hampered by the out‑of‑order (OFO) packet delivery problem. Frequent stream handovers, high mobility, and massive multi‑access contention in these networks introduce severe transport‑layer challenges. Existing solutions typically isolate multipath scheduling from congestion control, which leads to suboptimal performance and transient congestion in highly dynamic environments. To overcome these limitations, this paper proposes the GPR Hierarchical Synergistic Framework, representing the first joint optimization of multipath scheduling and congestion control for multi‑access MPQUIC in SAGINs. Our framework introduces the GradNorm Probabilistic Self‑Predictive (GPASP) module to forecast latent states and filter task‑irrelevant information in high‑dimensional, noisy observation spaces. Furthermore, we develop a Proactive Handover‑Aware Congestion Control (PHACC) algorithm that leverages neural network‑driven decisions to proactively distinguish handover‑induced packet losses from actual network congestion. To address decision‑making lag caused by neural network inference latency, a Neural‑network Preference Estimation (NNPE) algorithm is designed for highly efficient, real‑time scheduling. Extensive ns‑3 simulations demonstrate that the proposed framework significantly outperforms state‑of‑the‑art baselines, achieving substantial goodput improvements and a marked reduction in OFO degrees.
Authors: Yinghao Zhao, Chenguang Dai, Liang Lyu, Zhenchao Zhang, Chaozhen Lan, Hong Xie
Abstract: Motion planning is a critical component of intelligent unmanned systems, enabling their complex autonomous operations. However, current planning algorithms still face limitations in planning efficiency due to inflexible strategies and weak adaptability. To address this, this paper proposes a multi‑mode hybrid trajectory planning method for UAVs based on real‑time environmental awareness, which dynamically selects the optimal planning model for high‑quality trajectory generation in response to environmental changes. First, we introduce a goal‑oriented spatial awareness method that rapidly assesses flight safety in the upcoming environments. Second, a multi‑mode hybrid trajectory planning mechanism is proposed, which can enhance the planning efficiency by selecting the optimal planning model for trajectory generation based on prior spatial awareness. Finally, we design a lazy replanning strategy that triggers replanning only when necessary to reduce computational resource consumption while maintaining flight quality. To validate the performance of the proposed method, we conducted comprehensive comparative experiments in simulation environments. Results demonstrate that our approach outperforms existing state‑of‑the‑art (SOTA) algorithms across multiple metrics, achieving the best performance particularly in terms of the average number of planning iterations and computational cost per iteration. Furthermore, the effectiveness of our approach is further verified through real‑world flight experiments integrated with a self‑developed intelligent UAV platform.
Authors: Yang Zhao, Zihao Li, Zhiyu Jiang, Dandan Ma, Ganchao Liu, Wenzhe Zhao
Abstract: While Large Language Models (LLMs) form the cornerstone of sequential decision‑making agent development, they have inherent limitations in high‑frequency decision tasks. Existing research mainly focuses on discrete embodied decision scenarios with low‑frequency decisions and significant semantic differences in the state space (e.g., household planning). These methods suffer from limited performance in high‑frequency decision‑making tasks, since high‑precision numerical state information in such tasks undergoes frequent updates with minimal fluctuations, and they exhibit policy misalignment between the learned sub‑tasks and composite tasks. To address these issues, this paper proposes Normalized Action Reward guided Consistency Policy Optimization (NAR‑CP). 1) Our method first acquires predefined dense rewards from environmental feedback of candidate actions via reward functions, then completes reward shaping through normalization, and theoretically verifies that action reward normalization does not impair the optimal policy. 2) To reduce policy misalignment in composite tasks, we use LLMs to infer sub‑observation candidate actions and generate joint policies, with a consistency loss ensuring precise alignment between global semantic policies and sub‑semantic policies. Experiments on UAV pursuit, a typical high‑frequency task, show our method delivers superior performance on independent and composite tasks with excellent generalization to unseen tasks.
Authors: Deokyun Kim, Jeongjun Lee, Jungwon Choi, Jonggeon Park, Giyoung Lee, Yookyung Kim, Myungseok Ki, Juho Lee, Jihun Cha
Abstract: Detecting missing persons in forest environments remains a challenge, as dense canopy cover often conceals individuals from detection in top‑down or oblique aerial imagery typically captured by Unmanned Aerial Vehicles (UAVs). While UAVs are effective for covering large, inaccessible areas, their aerial perspectives often miss critical visual cues beneath the forest canopy. This limitation underscores the need for under‑canopy perspectives better suited for detecting missing persons in such environments. To address this gap, we introduce ForestPersons, a novel large‑scale dataset specifically designed for under‑canopy person detection. ForestPersons contains 96,482 images and 204,078 annotations collected under diverse environmental and temporal conditions. Each annotation includes a bounding box, pose, and visibility label for occlusion‑aware analysis. ForestPersons provides ground‑level and low‑altitude perspectives that closely reflect the visual conditions encountered by Micro Aerial Vehicles (MAVs) during forest Search and Rescue (SAR) missions. Our baseline evaluations reveal that standard object detection models, trained on prior large‑scale object detection datasets or SAR‑oriented datasets, show limited performance on ForestPersons. This indicates that prior benchmarks are not well aligned with the challenges of missing person detection under the forest canopy. We offer this benchmark to support advanced person detection capabilities in real‑world SAR scenarios. The dataset is publicly available at https://huggingface.co/datasets/etri/ForestPersons.
Authors: Wenjie Liu, Yansha Deng, Henk Wymeersch
Abstract: We investigate an integrated sensing and communication (ISAC)‑enabled BS for the unmanned aerial vehicle (UAV) obstacle avoidance task, and propose a goal‑oriented semantic communication (GOSC) framework for the BS to transmit sensing and command and control (C&C) signals efficiently and effectively. Our GOSC framework establishes a closed loop for sensing‑C&C generation‑sensing and C&C transmission: For sensing, a Kalman filter (KF) is applied to continuously predict UAV positions, mitigating the reliance of UAV position acquisition on continuous sensing signal transmission, and enhancing position estimation accuracy through sensing‑prediction fusion. Based on the refined position estimate provided by the KF, we develop a Mahalanobis distance‑based dynamic window approach (MD‑DWA) to generate precise C&C signals under uncertainty, in which we derive the mathematical expression of the minimum Mahalanobis distance required to guarantee collision avoidance. Finally, for efficient sensing and C&C signal transmission, we propose an effectiveness‑aware deep Q‑network (E‑DQN) to determine the transmission of sensing and C&C signals based on their value of information (VoI). The VoI of sensing signals is quantified by the reduction in the uncertainty entropy of the UAV's position estimate, while the VoI of C&C signals is measured by their contribution to UAV navigation improvement. Extensive simulations validate the effectiveness of our proposed GOSC framework. Compared to the conventional ISAC transmission framework that transmits sensing and C&C signals at every time slot, GOSC achieves the same 100% task success rate while reducing the number of transmitted sensing and C&C signals by 92.4% and the number of transmission time slots by 85.5%.
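Two quantities named in the abstract have standard closed forms for a 2D Gaussian position estimate. The first is the Mahalanobis distance between the estimated position and an obstacle. The second is the entropy reduction achieved by an update, which equals half the log ratio of the covariance determinants. The sketch below computes both. The function names and example covariances are hypothetical, and the paper's derived minimum-distance threshold and exact VoI definition are not reproduced here.

```python
import math


def _det2(m):
    """Determinant of a 2x2 matrix given as nested lists or tuples."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance between points p and q under covariance cov."""
    det = _det2(cov)
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    (a, b), (c, d) = cov
    dx, dy = p[0] - q[0], p[1] - q[1]
    # Quadratic form with the explicit 2x2 inverse (1/det) * [[d, -b], [-c, a]]
    m2 = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(m2)


def entropy_reduction_nats(prior_cov, post_cov):
    """Drop in Gaussian differential entropy from prior to posterior.

    The (2*pi*e)^k factor cancels, leaving 0.5 * ln(det prior / det post).
    """
    return 0.5 * math.log(_det2(prior_cov) / _det2(post_cov))


prior = [[4.0, 0.0], [0.0, 1.0]]   # position uncertainty before sensing
post = [[1.0, 0.0], [0.0, 1.0]]    # uncertainty after fusing a sensing signal
d = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), prior)
gain = entropy_reduction_nats(prior, post)
```

A sensing signal that barely shrinks the covariance yields a gain near zero, which is the intuition behind skipping its transmission.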
Authors: Haitian Wang, Xinyu Wang, Muhammad Ibrahim, Dustin Severtson, Ajmal Mian
Abstract: Accurate weed mapping in cereal fields requires pixel‑level segmentation from UAV imagery that remains reliable across fields, seasons, and illumination. Existing multispectral pipelines often depend on thresholded vegetation indices, which are brittle under radiometric drift and mixed crop‑weed pixels, or on single‑stream CNN and Transformer backbones that ingest stacked bands and indices, where radiance cues and normalized index cues interfere and reduce sensitivity to small weed clusters embedded in crop canopy. We propose VISA, a two‑stream segmentation network that decouples these cues and fuses them at native resolution. The radiance stream learns from calibrated five‑band reflectance using local residual convolutions, channel recalibration, spatial gating, and skip‑connected decoding, which preserve fine textures, row boundaries, and small weed structures that are often weakened after ratio‑based index compression. The index stream operates on vegetation‑index maps with windowed self‑attention to model local structure efficiently, state‑space layers to propagate field‑scale context without quadratic attention cost, and Slot Attention to form stable region descriptors that improve discrimination of sparse weeds under canopy mixing. To support supervised training and deployment‑oriented evaluation, we introduce BAWSeg, a four‑year UAV multispectral dataset collected over commercial barley paddocks in Western Australia, providing radiometrically calibrated blue, green, red, red edge, and near‑infrared orthomosaics, derived vegetation indices, and dense crop, weed, and other labels with leakage‑free block splits. On BAWSeg, VISA achieves 75.6% mIoU and 63.5% weed IoU with 22.8 M parameters, outperforming a multispectral SegFormer‑B1 baseline by 1.2 mIoU and 1.9 weed IoU. Under cross‑plot and cross‑year protocols, VISA maintains 71.2% and 69.2% mIoU, respectively.
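For readers unfamiliar with ratio-based vegetation indices, the sketch below computes NDVI, the most common one, from near-infrared and red reflectance. The abstract does not say which indices BAWSeg provides, so NDVI is an assumed example. The function name and sample values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for 2D reflectance grids.

    nir, red: equally shaped lists of rows with reflectance values.
    Pixels where nir + red == 0 are mapped to 0.0 to avoid division by zero.
    """
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total != 0 else 0.0)
        out.append(row)
    return out


# Two pixels with the same NIR/red ratio but different brightness
# receive the same index value.
index_map = ndvi([[0.50, 0.25]], [[0.10, 0.05]])
```

The example shows why such indices resist illumination changes, and also why absolute radiance and fine texture are lost in the ratio, which is the motivation the abstract gives for keeping a separate radiance stream.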
Authors: Ngo Tran Anh Thu, Pham Dang Anh Duc, Bui Trong Duc, Nguyen Minh Quan, Trinh Van Chien, Hoang D. Le
Abstract: This paper investigates energy efficiency maximization in an integrated sensing and communication framework for satellite‑UAV MIMO systems, where a LEO satellite and a UAV simultaneously serve ground users and perform target sensing. Both the satellite and UAV are equipped with uniform planar arrays of transmit antennas, enabling a distributed multi‑user and multi‑target architecture. We derive the achievable downlink throughput by considering that the high‑altitude satellite maintains a line‑of‑sight (LoS) link with users, while adopting a probabilistic model for the UAV that accounts for the likelihood of both LoS and non‑line‑of‑sight conditions. The energy efficiency maximization problem is formulated as a complex non‑convex optimization problem, subject to power constraints, quality of service (QoS) requirements, and beampattern gain constraints for accurate sensing. To tackle this challenge, we propose an efficient alternating optimization algorithm capable of handling the complex search space and QoS guarantees. Numerical results across diverse scenarios with multiple users demonstrate that the proposed method achieves high energy efficiency while meeting both communication and sensing performance targets.
Authors: Snehashish Ghosh, Sasthi C. Ghosh
Abstract: Unmanned aerial vehicle (UAV) networks are emerging as a promising solution for ultra‑reliable low‑latency communication (URLLC) in next‑generation wireless systems. A key challenge in millimeter wave UAV networks is maintaining continuous line of sight (LoS) coverage for mobile users, as existing snapshot‑based trajectory planning methods fail to account for user mobility within decision intervals, leading to catastrophic coverage gaps. Standard uniform sampling for continuous coverage verification is computationally prohibitive, requiring a huge number of samples to estimate rare failure events, with latencies incompatible with real‑time requirements. In this work, we propose a predictive importance sampling (PIS) framework that drastically reduces sample complexity by concentrating verification efforts on predicted failure regions. Specifically, we develop a long short‑term memory mixture density network (LSTM‑MDN) architecture to capture multimodal user trajectory distributions and combine it with defensive mixture sampling for robustness against prediction errors. We prove that PIS provides unbiased failure probability estimates with lower variance than uniform sampling. We then integrate PIS with multi‑agent deep deterministic policy gradient (MADDPG) for coordinated multi‑UAV trajectory planning using an adaptive multi‑objective reward function balancing throughput, coverage, fairness, and energy consumption. Finally, simulation results show that the proposed method outperforms three other state‑of‑the‑art methods in terms of coverage rate, throughput, and verification latency, making proactive coverage management for URLLC‑aware UAV networks feasible.
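The idea behind importance sampling with a defensive mixture can be shown on a toy rare-event problem: estimating P(X > t) for a standard normal X. Samples are drawn mostly from a proposal centred on the failure region, mixed with a small share from the nominal distribution so that the likelihood-ratio weights stay bounded by 1/alpha. This is a generic sketch, not the paper's PIS. The fixed Gaussian proposal stands in for the learned LSTM‑MDN predictor, and every name and parameter below is an assumption.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def defensive_is_estimate(threshold, n, alpha=0.1, seed=0):
    """Estimate P(X > threshold), X ~ N(0, 1), by defensive importance sampling.

    Proposal: alpha * N(0, 1) + (1 - alpha) * N(threshold, 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        if rng.random() < alpha:
            x = rng.gauss(0.0, 1.0)        # defensive (nominal) component
        else:
            x = rng.gauss(threshold, 1.0)  # component focused on failures
        if x > threshold:
            p = normal_pdf(x, 0.0, 1.0)
            q = normal_pdf(x, threshold, 1.0)
            # Weight = nominal density / mixture density, bounded by 1/alpha
            total += p / (alpha * p + (1.0 - alpha) * q)
    return total / n


estimate = defensive_is_estimate(4.0, 20000)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed-form tail probability
```

Plain Monte Carlo with the same 20,000 samples would be expected to see fewer than one failure at this threshold, so most of the variance reduction comes from steering samples into the rare region.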
Authors: Yansheng Liu, Jinbo Wen, Kun Zhu, Yang Zhang, Jiawen Kang
Abstract: Low‑Altitude Economy Networks (LAENets) have emerged as a critical communication paradigm for operation‑critical and regulation‑aware applications, where Unmanned Aerial Vehicles (UAVs) transmit task‑related information under stringent low‑probability‑of‑detection constraints. These constraints severely limit the available transmission power and bandwidth, rendering conventional bit‑level communication inefficient when task performance depends on high‑level semantic understanding rather than raw data fidelity. Fortunately, Semantic Communication (SemCom) can be a promising solution by prioritizing task‑relevant information over bit‑level accuracy. However, different levels of semantic abstraction inherently introduce different degrees of information loss and redundancy, which may either compromise task reliability or incur excessive transmission overhead if not properly controlled. To this end, we propose an incentive‑aware semantic entropy control framework for covert communications in LAENets. Specifically, we regulate semantic uncertainty at the receiver by adjusting the semantic abstraction level at the UAV side, thereby enabling reliable task information delivery under extreme covert constraints. Since the Base Station (BS) cannot directly observe the semantic processing capabilities and abstraction‑dependent transmission costs of UAVs, information asymmetry naturally arises in SemCom service provision. Accordingly, we propose a contract theoretic model, where we adopt Prospect Theory (PT) to capture the subjective utility of the BS toward personalized semantic services. Furthermore, we design a Regularized Diffusion‑based Soft Actor‑Critic (RDSAC) algorithm for optimal contract design under PT. This algorithm enhances contract design by introducing diffusion entropy regularization together with action entropy regularization.
Authors: Wael Hafez, Amir Nazeri
Abstract: Model Predictive Control (MPC) is a vital technique for autonomous systems, like Unmanned Aerial Vehicles (UAVs), enabling optimized motion planning. However, traditional MPC struggles to adapt to real‑time changes such as dynamic obstacles and shifting system dynamics, lacking inherent mechanisms for self‑monitoring and adaptive optimization. Here, we introduce Entanglement Learning (EL), an information‑theoretic framework that enhances MPC adaptability through an Information Digital Twin (IDT). The IDT monitors and quantifies, in bits, the information flow between MPC inputs, control actions, and UAV behavior. By introducing new information‑theoretic metrics we call entanglement metrics, it tracks variations in these dependencies. These metrics measure the mutual information between the optimizer's input, its control actions, and the resulting UAV dynamics, enabling a deeper understanding of their interrelationships. This allows the IDT to detect performance deviations and generate real‑time adaptive signals to recalibrate MPC parameters, preserving stability. Unlike traditional MPC, which relies on error‑based feedback, this dual‑feedback approach leverages information flow for proactive adaptation to evolving conditions. Scalable and leveraging existing infrastructure, this framework improves MPC reliability and robustness across diverse scenarios, extending beyond UAV control to any MPC implementation requiring adaptive performance.
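Measuring a dependency "in bits" usually means estimating mutual information with base-2 logarithms. The sketch below is a plug-in estimator for two discretized signals, for example binned optimizer inputs and binned control actions. The abstract does not define the entanglement metrics, so this is an assumed building block rather than the authors' metric. The function name and sample data are hypothetical.

```python
import math
from collections import Counter


def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) written with counts
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


# Control action fully determined by the input: 1 bit shared.
coupled = mutual_information_bits([0, 1, 0, 1], [0, 1, 0, 1])
# Action independent of the input: 0 bits shared.
decoupled = mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1])
```

A monitor built on this quantity would watch for the estimate drifting from its baseline, although plug-in estimates are biased upward on short windows, so window length and bin count matter in practice.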
Authors: Susmita Ghanta, Karan Nathwani, Rohit Chaurasiya
Abstract: Real‑time unmanned aerial vehicle (UAV) acoustic detection at the edge demands low‑latency inference under strict power and hardware limits. This paper presents SHIELD8‑UAV, a sequential 8‑bit hardware implementation of a precision‑aware 1D feature‑driven CNN (1D‑F‑CNN) accelerator for continuous acoustic monitoring. The design performs layer‑wise execution on a shared multi‑precision datapath, eliminating the need for replicated processing elements. A layer‑sensitivity quantisation framework supports FP32, BF16, INT8, and FXP8 modes, while structured channel pruning reduces the flattened feature dimension from 35,072 to 8,704 (75%), thereby lowering serialised dense‑layer cycles. The model achieves 89.91% detection accuracy in FP32 with less than 2.5% degradation in 8‑bit modes. The accelerator uses 2,268 LUTs and 0.94 W power with 116 ms end‑to‑end latency, achieving 37.8% and 49.6% latency reduction compared with QuantMAC and LPRE, respectively, on a Pynq‑Z2 FPGA, and 5‑9% lower logic usage than parallel designs. ASIC synthesis in UMC 40 nm technology shows a maximum operating frequency of 1.56 GHz, 3.29 mm² core area, and 1.65 W total power. These results demonstrate that sequential execution combined with precision‑aware quantisation and serialisation‑aware pruning enables practical low‑energy edge inference without relying on massive parallelism.
Authors: Jiahao Fu, Feng Yang
Abstract: Autonomous UAV infiltration in dynamic contested environments remains a significant challenge due to the partially observable nature of threats and the conflicting objectives of mission efficiency versus survivability. Traditional Reinforcement Learning (RL) approaches often suffer from myopic decision‑making and struggle to balance these trade‑offs in real‑time. To address these limitations, this paper proposes an Intent‑Context Synergy Reinforcement Learning (ICS‑RL) framework. The framework introduces two core innovations: (1) An LSTM‑based Intent Prediction Module that forecasts the future trajectories of hostile units, transforming the decision paradigm from reactive avoidance to proactive planning via state augmentation; (2) A Context‑Analysis Synergy Mechanism that decomposes the mission into hierarchical sub‑tasks (safe cruise, stealth planning, and hostile breakthrough). We design a heterogeneous ensemble of Dueling DQN agents, each specialized in a specific tactical context. A dynamic switching controller based on Max‑Advantage values seamlessly integrates these agents, allowing the UAV to adaptively select the optimal policy without hard‑coded rules. Extensive simulations demonstrate that ICS‑RL significantly outperforms baselines (Standard DDQN) and traditional methods (PSO, Game Theory). The proposed method achieves a mission success rate of 88% and reduces the average exposure frequency to 0.24 per episode, validating its superiority in ensuring robust and stealthy penetration in high‑dynamic scenarios.
Authors: Durgakant Pushp, Swapnil Kalhapure, Shaekh Mohammad Shithil, Lantao Liu
Abstract: Exploring and inspecting Hidden Spaces, defined as environments whose entrances are accessible only to aerial robots but remain unexplored due to geometric constraints, limited flight time, and communication loss, remains a major challenge. We present miniUGV_2, a compact UAV‑deployable tracked ground vehicle that extends UAV capabilities into confined environments. The system introduces dual articulated arms, integrated LiDAR and depth sensing, and modular electronics for enhanced autonomy. A novel tether module with an electro‑permanent magnetic head enables safe deployment, retrieval, and optional detachment, thereby overcoming prior entanglement issues. Experiments demonstrate robust terrain navigation, self‑righting, and manipulation of objects up to 3.5 kg, validating miniUGV_2 as a versatile platform for hybrid aerial‑ground robotics.
Authors: Talip Tolga Sarı, Rameez Ahmed, Abdullah Al Noman, Gökhan Seçinti, Chris Dick, Debashri Roy
Abstract: Low‑Altitude Wireless Networks (LAWN) are transforming the low‑altitude airspace into a mission‑driven, dynamically reconfigurable 3D network fabric for safety‑critical and public‑safety operations. In parallel, Direct‑to‑Cell (D2C) satellite access can rapidly restore connectivity after disasters, yet dense urban blockages make the satellite‑to‑ground link unreliable for many users. To overcome this, we leverage the LAWN aerial layer and form an adaptive low‑altitude relay topology where Unmanned Aerial Vehicles (UAVs) act as D2C‑assisted aerial relays for obstructed ground users. We introduce TITAN, a twin‑informed topology adaptation framework that builds a high‑fidelity Digital Twin (DT) of the affected urban area and performs site‑specific, ray‑traced air‑to‑ground channel modeling via Sionna RT. This informs a Bayesian optimization process that adapts the aerial topology to maximize coverage and Quality of Service (QoS) for ground users by using UAVs as optimal D2C relays. Extensive system‑level simulations with Sionna show that TITAN consistently outperforms the baselines and delivers +32.2% user coverage, +64.9% system sum‑rate, and +49.3% fairness over state‑of‑the‑art (SOTA) baselines that employ heuristic placement or statistical channel approximations. To support further research in resilient network design, we open‑source the codebase of the TITAN framework.
Authors: Kautuk Astu, Suman Raj, Priyanshu Pansari, Yogesh Simmhan
Abstract: The increasing adoption of UAVs equipped with advanced sensors and GPU‑accelerated edge computing has enabled real‑time AI‑driven applications in domains such as precision agriculture, wildfire monitoring, and environmental conservation. However, the integrated design and orchestration of navigation, sensing, and analytics, together with seamless real‑time coordination across drone, edge, and cloud resources, remains a significant challenge. To address these challenges, we propose AeroDaaS, a service‑oriented framework that abstracts UAV‑based sensing complexities and provides a Drone‑as‑a‑Service (DaaS) model for intelligent decision‑making. AeroDaaS offers modular service primitives for on‑demand UAV sensing, navigation, and analytics as composable microservices, ensuring cross‑platform compatibility and scalability across heterogeneous UAV and edge‑cloud infrastructures. AeroDaaS also supports plug‑and‑play scheduling modules, including Waypoint and Analytics schedulers, which enable trajectory optimization and real‑time coordination of inference workloads. We implement and evaluate AeroDaaS for six real‑world DaaS applications, of which two are evaluated in real‑world scenarios and four in simulation. AeroDaaS requires less than 40 lines of code for the applications and has minimal platform overhead of less than 20 ms per frame and about 1 GB memory usage on Orin Nano and an AMD RTX 3090 GPU workstation. These results are promising for AeroDaaS as an efficient, flexible and scalable UAV programming framework for autonomous aerial analytics.
Authors: Yao Wu, Ziye Jia, Jingjing Zhao, Haoyang Wang, Qihui Wu, Zhu Han
Abstract: Unmanned aerial vehicle (UAV) networks are increasingly deployed for complex missions, including disaster response, intelligent logistics, and environmental monitoring. These missions generally require coordinated collaboration among multiple UAVs across distinct administrative domains. To support such cross‑domain cooperation, service function chains (SFCs) are constructed, where complex workflows are decomposed into ordered service functions assigned to appropriate UAVs along the mission path. However, it is challenging to ensure secure, trustworthy, and low‑latency cross‑domain SFC orchestration in identity management, authentication, and resilience to node failures. To address these issues, this paper proposes a consortium blockchain‑based trust architecture for cross‑domain decentralized identity verification, auditable task execution, and dynamic service‑aware orchestrator selection. The framework employs a hierarchical four‑phase cross‑domain authentication protocol covering credential pre‑verification, intra‑domain execution, secure relay, and audit logging. The use case analysis confirms that the proposed framework achieves substantial reductions in authentication latency and significant improvements in system throughput against centralized and static schemes. The open challenges in scalability, adaptive trust assessment, interoperability, and energy efficiency are discussed, thereby providing directions for future research on secure and efficient cross‑domain UAV service orchestration.
Authors: Haolin Zheng, Ning Gao, Zhenghang Zhu, Zhijun Huang, Shi Jin, Michail Matthaiou
Abstract: We present a real‑world multi‑scenario unmanned aerial vehicle (UAV) radio frequency (RF) dataset, namely DRFF‑R2, which is collected using a dedicated acquisition platform under diverse operational conditions. All signals are acquired within a unified framework to ensure consistency in hardware configuration and environmental settings. The dataset is systematically organized into seven well‑defined subsets corresponding to different operational and signal composition scenarios to facilitate structured experimentation. Each file follows a clearly annotated naming convention to enable convenient data indexing and reproducible analysis. The dataset contains RF recordings from 26 UAV units spanning 8 distinct models, captured across varying flight states, altitudes, speeds, acquisition days, and receiver configurations. By covering diverse acquisition settings and signal compositions, the dataset provides a comprehensive resource for future UAV RF signal research, including RF fingerprinting (RFF) identification, model‑level recognition, flight state analysis, time‑varying RFF study, and interference‑aware signal processing.
Authors: Alexandre Anahory Simoes, Leonardo Colombo, Juan Giribet, Efstratios Stratoglou
Abstract: We propose a geometric control framework on SE(3) for quadrotors that enforces pointing‑driven missions without completing a full attitude reference. The mission is encoded through virtual constraints defining a task manifold and an associated set of admissible velocities, and invariance is achieved by a feedback law obtained from a linear system in selected inputs. Under a transversality condition with the effective actuation distribution, the invariance‑enforcing input is uniquely defined, yielding a constructive control law and, for relevant tasks, closed‑form expressions. We further derive a local off‑manifold stabilization extension. As a case study, we lock a body axis to a prescribed line‑of‑sight direction while maintaining fixed altitude.
Authors: Václav Riss, Vít Krátký, Robert Pěnička, Martin Saska
Abstract: This paper introduces an online inspection algorithm that enables an autonomous UAV to fly around a transmission tower and obtain detailed inspection images without a prior map of the tower. Our algorithm relies on camera‑LiDAR sensor fusion for online detection and localization of insulators. In particular, the algorithm is based on insulator detection using a convolutional neural network, projection of LiDAR points onto the image, and filtering them using the bounding boxes. The detection pipeline is coupled with several proposed insulator localization methods based on DBSCAN, RANSAC, and PCA algorithms. The performance of the proposed online inspection algorithm and camera‑LiDAR sensor fusion pipeline is demonstrated through simulation and real‑world flights. In simulation, we showed that our single‑flight inspection strategy can save up to 24% of total inspection time, compared to the two‑flight strategy of scanning the tower and afterwards visiting the inspection waypoints in the optimal way. In a real‑world experiment, the best performing proposed method achieves a mean horizontal and vertical localization error for the insulator of 0.16 ± 0.08 m and 0.16 ± 0.11 m, respectively. Compared to the most relevant approach, the proposed method achieves more than an order of magnitude lower variance in horizontal insulator localization error.
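Note: the fusion step described above (project LiDAR points onto the image, then keep those inside a detected bounding box) can be sketched as below. This assumes an ideal pinhole camera and points already expressed in the camera frame; the intrinsics, the box, and the sample points are hypothetical. The centroid at the end is only a simple stand-in for the DBSCAN, RANSAC, and PCA localization methods named in the abstract.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in the camera frame (z forward) to pixels."""
    x, y, z = point_cam
    if z <= 0:
        return None  # point is behind the camera, so it has no valid projection
    return (fx * x / z + cx, fy * y / z + cy)

def points_in_box(points_cam, box, fx, fy, cx, cy):
    """Keep the 3D points whose projection falls inside box = (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for p in points_cam:
        uv = project_point(p, fx, fy, cx, cy)
        if uv is not None and u_min <= uv[0] <= u_max and v_min <= uv[1] <= v_max:
            kept.append(p)
    return kept

def centroid(points):
    """Mean of a non-empty list of 3D points (stand-in for the localization step)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Hypothetical intrinsics, detector box, and LiDAR points (metres, camera frame).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
insulator_box = (300.0, 220.0, 340.0, 260.0)
lidar_points = [(0.0, 0.0, 10.0), (0.2, -0.1, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -1.0)]

hits = points_in_box(lidar_points, insulator_box, FX, FY, CX, CY)
estimate = centroid(hits)
print(len(hits), estimate)
```

In a real system the LiDAR points would first be transformed with the calibrated LiDAR-to-camera extrinsics, and background points that fall inside the box would need to be rejected, which is presumably where the clustering and model-fitting methods come in.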
Authors: Xingyu Shao, Mengfan He, Chunyu Li, Liangzheng Sun, Ziyang Meng
Abstract: To address the scale mismatch caused by large altitude variations in UAV visual place recognition, we propose a monocular vision‑only altitude‑adaptive geo‑localization framework. The method first estimates relative altitude from a single downward‑looking image by transforming the input into the frequency domain and formulating altitude estimation as a regression‑as‑classification (RAC) problem. The estimated altitude is then used to crop the query image to a canonical scale, after which a classification‑then‑retrieval visual place recognition module performs coarse localization. To improve retrieval robustness under varying image quality, we further introduce a quality‑adaptive margin classifier (QAMC) and refine the final location by weighted coordinate estimation over the top retrieved candidates. Experiments on two synthetic datasets and two real‑flight datasets show that the relative altitude estimation (RAE) module yields clear overall improvements in downstream retrieval performance under significant altitude changes. With our visual place recognition module, altitude adaptation improves average R@1 and R@5 by 41.50 and 56.83 percentage points, respectively, compared with using the same retrieval pipeline without altitude normalization, and the full system runs at 13.3 frames/s on the reported workstation hardware. These results indicate that relative altitude estimation provides an effective scale prior for cross‑altitude UAV geo‑localization and supports GPS‑denied coarse initialization without auxiliary range sensors or temporal inputs.
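Note: regression-as-classification (RAC), as mentioned above, replaces direct regression of a continuous value with classification over discretised bins. The sketch below shows one common way to do this, with equal-width bins and a softmax-expectation decoder. The bin range, bin count, and decoding rule are assumptions for illustration; the abstract does not state which variant the authors use, and the frequency-domain front end is not shown.

```python
import math

def make_bin_centres(lo, hi, n_bins):
    """Centres of n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)]

def altitude_to_class(alt, lo, hi, n_bins):
    """Training target: index of the bin containing alt, clamped to the valid range."""
    idx = int((alt - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_altitude(logits, centres):
    """Inference: expected altitude under the predicted class distribution."""
    return sum(p * c for p, c in zip(softmax(logits), centres))

# Hypothetical setup: relative altitudes between 100 m and 500 m, four bins.
LO, HI, N_BINS = 100.0, 500.0, 4
centres = make_bin_centres(LO, HI, N_BINS)        # [150.0, 250.0, 350.0, 450.0]
label = altitude_to_class(320.0, LO, HI, N_BINS)  # bin index 2
uniform_estimate = decode_altitude([0.0, 0.0, 0.0, 0.0], centres)
print(centres, label, uniform_estimate)
```

The decoded altitude would then set the crop scale for the query image; that mapping depends on camera parameters and is left out here.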
Authors: Jixiang Wang, Siyuan Yang, Ziyi Wu, Siqi Wei, Ashay Wakode, Agata Barcis, Hung Nguyen, Shaoming He
Abstract: Acceleration‑commanded guidance laws (e.g., proportional navigation) are attractive for high‑level decision making, but their direct deployment on fixed‑wing UAVs is challenging because accelerations are not directly actuated and must be realized through attitude and thrust under flight‑envelope constraints. This paper presents an acceleration‑level outer‑loop control framework that converts commanded tangential and normal accelerations into executable body‑rate and normalized thrust commands compatible with mainstream autopilots (e.g., PX4/APM). For the normal channel, we derive an engineering mapping from the desired normal acceleration to roll‑ and pitch‑rate commands that regulate the direction and magnitude of the lift vector under small‑angle assumptions. For the tangential channel, we introduce an energy‑based formulation inspired by total energy control and identify an empirical thrust‑energy acceleration relationship directly from flight data, avoiding explicit propulsion modeling or thrust bench calibration. We further discuss priority handling between normal and tangential accelerations under saturation and non‑level maneuvers. Extensive real‑flight experiments on a VTOL fixed‑wing platform demonstrate accurate acceleration tracking and enable practical implementation of proportional navigation using only body‑rate and normalized thrust interfaces.
Authors: Wenzhe Zhao, Yang Zhao, Ganchao Liu, Zhiyu Jiang, Dandan Ma, Zihao Li, Xuelong Li
Abstract: In dynamic UAV decision‑making, complex and variable hazardous factors pose severe challenges to the generalization capability of algorithms. Despite offering semantic understanding and scene generalization, Large Language Models (LLMs) lack domain‑specific UAV control knowledge and formal safety assurances, restricting their direct applicability. To bridge this gap, this paper proposes a training‑free two‑layer decision architecture based on LLMs, integrating high‑level safety planning with low‑level precise control. The framework introduces three key contributions: 1) A fuzzy Control Barrier Function verification mechanism for semantically‑augmented actions, providing provable safety certification for LLM outputs. 2) A star‑hierarchical graph‑based retrieval‑augmented generation system, enabling efficient, elastic, and interpretable scene adaptation. 3) Systematic experimental validation in pursuit‑evasion scenarios with unknown obstacles and emergent threats, demonstrating that our SAGE‑LLM maintains performance while significantly enhancing safety and generalization without online training. The proposed framework demonstrates strong extensibility, suggesting its potential for generalization to broader embodied intelligence systems and safety‑critical control domains.
Authors: Seungyeol Baek, Jaspreet Singh, Lala Shakti Swarup Ray, Hymalai Bello, Paul Lukowicz, Sungho Suh
Abstract: Human operators are still frequently exposed to hazardous environments such as disaster zones and industrial facilities, where intuitive and reliable teleoperation of mobile robots and Unmanned Aerial Vehicles (UAVs) is essential. In this context, hands‑free teleoperation enhances operator mobility and situational awareness, thereby improving safety in hazardous environments. While vision‑based gesture recognition has been explored as one method for hands‑free teleoperation, its performance often deteriorates under occlusions, lighting variations, and cluttered backgrounds, limiting its applicability in real‑world operations. To overcome these limitations, we propose a multimodal gesture recognition framework that integrates inertial data (accelerometer, gyroscope, and orientation) from Apple Watches on both wrists with capacitive sensing signals from custom gloves. We design a late fusion strategy based on the log‑likelihood ratio (LLR), which not only enhances recognition performance but also provides interpretability by quantifying modality‑specific contributions. To support this research, we introduce a new dataset of 20 distinct gestures inspired by aircraft marshalling signals, comprising synchronized RGB video, IMU, and capacitive sensor data. Experimental results demonstrate that our framework achieves performance comparable to a state‑of‑the‑art vision‑based baseline while significantly reducing computational cost, model size, and training time, making it well suited for real‑time robot control. We therefore underscore the potential of sensor‑based multimodal fusion as a robust and interpretable solution for gesture‑driven mobile robot and drone teleoperation.
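Note: the abstract above describes late fusion based on the log-likelihood ratio (LLR) that also quantifies modality-specific contributions. The sketch below illustrates one generic form of that idea: each modality's class posterior is converted to a one-vs-rest log-odds score, the scores are summed across modalities, and the per-modality terms for the winning class are reported. The exact formulation used by the authors is not given in the abstract, so the scoring rule, names, and numbers here are assumptions.

```python
import math

def llr(p, eps=1e-9):
    """Log-odds of a class against all others, with clamping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def fuse(posteriors_by_modality):
    """Sum per-modality LLR scores per class.

    posteriors_by_modality maps a modality name to a list of class posteriors.
    Returns (predicted class index, {modality: LLR contribution for that class}).
    """
    n_classes = len(next(iter(posteriors_by_modality.values())))
    scores = [0.0] * n_classes
    for probs in posteriors_by_modality.values():
        for k, p in enumerate(probs):
            scores[k] += llr(p)
    best = max(range(n_classes), key=scores.__getitem__)
    contributions = {m: llr(probs[best]) for m, probs in posteriors_by_modality.items()}
    return best, contributions

# Hypothetical posteriors over three gestures from two sensor branches.
posteriors = {
    "imu": [0.6, 0.3, 0.1],
    "capacitive": [0.2, 0.7, 0.1],
}
predicted, contributions = fuse(posteriors)
print(predicted, contributions)
```

In this toy case the capacitive branch contributes a positive score to the fused decision while the IMU branch contributes a negative one, which is the kind of per-modality attribution the abstract refers to as interpretability.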
Authors: Nazia Hossain, Xintong Jiang, Yu Tian, Philippe Seguin, O. Grant Clark, Shangpeng Sun
Abstract: Fine‑grained crop‑weed segmentation is essential for enabling targeted herbicide application in precision agriculture. However, existing deep learning models struggle to generalize across heterogeneous agricultural environments due to reliance on dataset‑specific visual features. We propose Vision‑Language Weed Segmentation (VL‑WS), a novel framework that addresses this limitation by grounding pixel‑level segmentation in semantically aligned, domain‑invariant representations. Our architecture employs a dual‑encoder design, where frozen Contrastive Language‑Image Pretraining (CLIP) embeddings and task‑specific spatial features are fused and modulated via Feature‑wise Linear Modulation (FiLM) layers conditioned on natural language captions. This design enables image‑level textual descriptions to guide channel‑wise feature refinement while preserving fine‑grained spatial localization. Unlike prior works restricted to training and evaluation on single‑source datasets, VL‑WS is trained on a unified corpus that includes close‑range ground imagery (robotic platforms) and high‑altitude UAV imagery, covering diverse crop types, weed species, growth stages, and sensing conditions. Experimental results across four benchmark datasets demonstrate the effectiveness of our framework, with VL‑WS achieving a mean Dice score of 91.64% and outperforming the CNN baseline by 4.98%. The largest gains occur on the most challenging weed class, where VL‑WS attains 80.45% Dice score compared to 65.03% for the best baseline, representing a 15.42 percentage‑point improvement. VL‑WS further maintains stable weed segmentation performance under limited target‑domain supervision, indicating improved generalization and data efficiency. These findings highlight the potential of vision‑language alignment to enable scalable, label‑efficient segmentation models deployable across diverse real‑world agricultural domains.
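Note: the results above are reported as Dice scores. As a reminder of what that metric measures, here is a minimal sketch of the Dice coefficient on flat binary masks, followed by the arithmetic behind the weed-class gap quoted in the abstract (80.45 versus 65.03). The toy masks are hypothetical, and treating two empty masks as a perfect match is a convention chosen for this sketch.

```python
def dice_score(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for flat binary masks given as lists of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention for this sketch: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Hypothetical six-pixel weed masks.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
toy_dice = dice_score(pred_mask, true_mask)  # 2 * 2 / (3 + 3)

# Gap between the weed-class scores quoted in the abstract.
vlws_weed, baseline_weed = 80.45, 65.03
absolute_gap = round(vlws_weed - baseline_weed, 2)  # percentage points
relative_gain = (vlws_weed - baseline_weed) / baseline_weed * 100.0
print(toy_dice, absolute_gap, relative_gain)
```

The absolute gap is the 15.42 figure in the abstract; expressed relative to the baseline score, the gain is somewhat larger, so the two readings should not be confused.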
Authors: Ziye Jia, Sijie He, Ligang Yuan, Fuhui Zhou, Qihui Wu, Zhu Han, Dusit Niyato
Abstract: Owing to their scalability and portability, low‑altitude intelligent networks (LAINs) are essential in various fields such as surveillance and disaster rescue. However, in LAINs, unmanned aerial vehicles (UAVs) are characterized by the distributed topology and high mobility, and are thus vulnerable to security threats, which may degrade routing performance for data transmissions. Hence, ensuring the routing stability and security of LAINs is challenging. In this paper, we focus on the routing with multiple UAV clusters in LAINs. To minimize the damage caused by potential threats, we present the zero‑trust architecture with the software‑defined perimeter and blockchain techniques to manage the identity and mobility of UAVs. Besides, we formulate the routing problem to optimize the end‑to‑end (E2E) delay and transmission success ratio (TSR) simultaneously, which is an integer nonlinear programming problem and intractable to solve. Therefore, we reformulate the problem into a decentralized partially observable Markov decision process. We design the multi‑agent double deep Q‑network‑based routing algorithms to solve the problem, empowered by the soft‑hierarchical experience replay buffer and prioritized experience replay mechanisms. Finally, extensive simulations are conducted and the numerical results demonstrate that the proposed framework reduces the average E2E delay by 59% and improves the TSR by 29% on average compared to benchmarks, while simultaneously enabling faster and more robust identification of low‑trust UAVs.
Authors: Anuraj Uthayasooriyan, Krishna Manaswi Digumarti, Jack Breward, Fernando Vanegas, Julian Galvez-Serna, Felipe Gonzalez
Abstract: Aerial manipulators extend the reach and manipulation capabilities of uncrewed multirotor aerial vehicles for inspection, agriculture, sampling, and delivery. Continuum arm aerial manipulation systems offer lightweight, dexterous, and compliant interaction opportunities. Existing designs allow manipulation only below the UAV, which restricts their deployability in multiple directions and through clutter. They are also sensitive to propeller downwash. Addressing these limitations, we present Tilt‑X, a continuum arm aerial manipulator that integrates a tilting mechanism, a telescopic stage, and a cable‑driven continuum section. We present its design and kinematic model and validate it through flight demonstrations. Tilt‑X enables a volumetric workspace with up to 75 mm extension and planar orientations between 0° and 90°. Experiments comparing end effector pose with and without downwash quantitatively measure its accuracy, providing critical evidence to guide the design and control of reliable aerial manipulators. Results show stabilisation of end effector pose as the manipulator extends out of the propeller influence zone.
Authors: Tao Liu, Gang Wan, Kan Ren, Shibo Wen
Abstract: We propose a new unsupervised framework for online video stabilization. Unlike methods based on deep learning that require paired stable and unstable datasets, our approach instantiates the classical stabilization pipeline with three stages and incorporates a multithreaded buffering mechanism. This design addresses three longstanding challenges in end‑to‑end learning: limited data, poor controllability, and inefficiency on hardware with constrained resources. Existing benchmarks focus mainly on handheld videos with a forward view in visible light, which restricts the applicability of stabilization to domains such as UAV nighttime remote sensing. To fill this gap, we introduce a new multimodal UAV aerial video dataset (UAV‑Test). Experiments show that our method consistently outperforms state‑of‑the‑art online stabilizers in both quantitative metrics and visual quality, while achieving performance comparable to offline methods.
Authors: Chong-Yi Sun, Heling Yuan, Xu Fang, Yan He, Xi-Ming Sun
Abstract: This paper investigates the position‑tracking control problem for fixed‑wing unmanned aerial vehicles (UAVs) equipped with a turbojet engine via an integrated flight and propulsion control scheme. To this end, a hierarchical control framework with thrust and disturbance compensation is proposed. In particular, we first propose a perturbed fixed‑wing UAV model with turbojet engine dynamics, accounting for both unmodeled dynamics and external disturbances. Second, a versatile extended observer is designed to handle both unmeasurable thrust dynamics and external disturbances. Third, a hierarchical control framework is implemented using three observer‑based controllers to guarantee position‑tracking performance. With the proposed control strategy, we prove that the closed‑loop system asymptotically converges to the desired trajectory. Finally, a comparative simulation is performed to illustrate the proposed control strategy.
Authors: Yuankai Chen, Kai Lin, Qihong Wu, Xinxuan Yang, Jiashuo Lai, Ruoen Chen, Haonan Shi, Minfan He, Meihua Wang
Abstract: Small target detection in UAV imagery faces significant challenges such as scale variations, dense distribution, and the dominance of small targets. Existing algorithms rely on manually designed components, and general‑purpose detectors are not optimized for UAV images, making it difficult to balance accuracy and complexity. To address these challenges, this paper proposes an end‑to‑end object detection framework, UFO‑DETR, which integrates an LSKNet‑based backbone network to optimize the receptive field and reduce the number of parameters. By combining the DAttention and AIFI modules, the model flexibly models multi‑scale spatial relationships, improving multi‑scale target detection performance. Additionally, the DynFreq‑C3 module is proposed to enhance small target detection capability through cross‑space frequency feature enhancement. Experimental results show that, compared to RT‑DETR‑L, the proposed method offers significant advantages in both detection performance and computational efficiency, providing an efficient solution for UAV edge computing.
Authors: Kai Li, Shengtao Zheng, Linkun Xiu, Yuze Sheng, Xiao-Ping Zhang, Dongyue Huang, Xinlei Chen
Abstract: Autonomous exploration in unknown environments is key for mobile robots, helping them perceive, map, and make decisions in complex areas. However, current methods often rely on frequent global optimization, suffering from high computational latency and trajectory oscillation, especially on resource‑constrained edge devices. To address these limitations, we propose SCOPE, a novel framework that incrementally constructs a real‑time skeletal graph and introduces Implicit Unknown Region Analysis for efficient spatial reasoning. The planning layer adopts a hierarchical on‑demand strategy: the Proximal Planner generates smooth, high‑frequency local trajectories, while the Region‑Sequence Planner is activated only when necessary to optimize global visitation order. Comparative evaluations in simulation demonstrate that SCOPE achieves competitive exploration performance comparable to state‑of‑the‑art global planners, while reducing computational cost by an average of 86.9%. Real‑world experiments further validate the system's robustness and low latency in practical scenarios.
Authors: Alfonso Sciacchitano, Douglas L. Van Bossuyt
Abstract: Small, low‑size, weight, power, and cost (SWaP‑C) uncrewed aerial vehicles (UAVs) are increasingly used for intelligence, surveillance, and reconnaissance (ISR) missions due to their affordability, attritability, and suitability for distributed operations. However, their design poses challenges including limited endurance, constrained payload capacity, and reliance on simple sensing modalities such as fixed‑field‑of‑view, bearing‑only cameras. Traditional platform‑centric methods cannot capture the coupled performance, cost, and coordination trade‑offs that emerge at the system‑of‑systems level.
This paper presents a mission engineering framework for early‑phase design of low‑SWaP‑C UAV ISR architectures. The framework integrates design of experiments, multi‑objective optimization, and high‑fidelity simulation into a closed‑loop process linking design variables to estimator‑informed performance and mission cost. Candidate architectures are explored via Latin hypercube sampling and refined using a genetic algorithm, with performance evaluated through Monte Carlo trials of a federated Kalman filter benchmarked against the posterior Cramer‑Rao lower bound. Validation follows the Validation Square methodology, combining theoretical, empirical, and structural assessments.
A case study on man‑overboard localization in a GNSS‑denied maritime environment shows that localization accuracy saturates at sub‑meter levels, while higher‑cost configurations primarily add redundancy and resilience. The framework thus quantifies mission trade‑offs between performance, affordability, and robustness, providing a scalable decision‑support tool for contested, resource‑constrained ISR missions.
Authors: Hanyang Liu, Rongjun Qin
Abstract: Recent advances in 4D scene reconstruction have significantly improved dynamic modeling across various domains. However, existing approaches remain limited under aerial conditions with single‑view capture, wide spatial range, and dynamic objects of limited spatial footprint and large motion disparity. These challenges cause severe depth ambiguity and unstable motion estimation, making monocular aerial reconstruction inherently ill‑posed. To this end, we present AeroDGS, a physics‑guided 4D Gaussian splatting framework for monocular UAV videos. AeroDGS introduces a Monocular Geometry Lifting module that reconstructs reliable static and dynamic geometry from a single aerial sequence, providing a robust basis for dynamic estimation. To further resolve monocular ambiguity, we propose a Physics‑Guided Optimization module that incorporates differentiable ground‑support, upright‑stability, and trajectory‑smoothness priors, transforming ambiguous image cues into physically consistent motion. The framework jointly refines static backgrounds and dynamic entities with stable geometry and coherent temporal evolution. We additionally build a real‑world UAV dataset that spans various altitudes and motion conditions to evaluate dynamic aerial reconstruction. Experiments on synthetic and real UAV scenes demonstrate that AeroDGS outperforms state‑of‑the‑art methods, achieving superior reconstruction fidelity in dynamic aerial environments.
Authors: Shuang Song, Debao Huang, Deyan Deng, Haolin Xiong, Yang Tang, Yajie Zhao, Rongjun Qin
Abstract: Intrinsic image decomposition (IID) of outdoor scenes is crucial for relighting, editing, and understanding large‑scale environments, but progress has been limited by the lack of real‑world datasets with reliable albedo and shading supervision. We introduce Olbedo, a large‑scale aerial dataset for outdoor albedo‑shading decomposition in the wild. Olbedo contains 5,664 UAV images captured across four landscape types, multiple years, and diverse illumination conditions. Each view is accompanied by multi‑view consistent albedo and shading maps, metric depth, surface normals, sun and sky shading components, camera poses, and, for recent flights, measured HDR sky domes. These annotations are derived from an inverse‑rendering refinement pipeline over multi‑view stereo reconstructions and calibrated sky illumination, together with per‑pixel confidence masks. We demonstrate that Olbedo enables state‑of‑the‑art diffusion‑based IID models, originally trained on synthetic indoor data, to generalize to real outdoor imagery: fine‑tuning on Olbedo significantly improves single‑view outdoor albedo prediction on the MatrixCity benchmark. We further illustrate applications of Olbedo‑trained models to multi‑view consistent relighting of 3D assets, material editing, and scene change analysis for urban digital twins. We release the dataset, baseline models, and an evaluation protocol to support future research in outdoor intrinsic decomposition and illumination‑aware aerial vision.
Authors: Xinkai Ji, Pan Liu, Ying Yang, Yu Han
Abstract: In Part I of this companion paper series, we introduced SWIFTraj, a new open‑source vehicle trajectory dataset collected using an unmanned aerial vehicle (UAV) swarm. The dataset has two distinctive features. First, by connecting trajectories across consecutive UAV videos, it provides long‑distance continuous trajectories, with the longest exceeding 4.5 km. Second, it covers an integrated traffic network consisting of both freeways and their connected urban roads. Obtaining such long‑distance continuous trajectories from a UAV swarm is challenging, due to the need for accurate time alignment across multiple videos and the irregular spatial distribution of UAVs. To address these challenges, this paper proposes a novel graph‑based approach for connecting vehicle trajectories captured by a UAV swarm. An undirected graph is constructed to represent flexible UAV layouts, and an automatic time alignment method based on trajectory matching cost minimization is developed to estimate optimal time offsets across videos. To associate trajectories of the same vehicle observed in different videos, a vehicle matching table is established using the Hungarian algorithm. The proposed approach is evaluated using both simulated and real‑world data. Results from real‑world experiments show that the time alignment error is within three video frames, corresponding to approximately 0.1 s, and that the vehicle matching achieves an F1‑score of about 0.99. These results demonstrate the effectiveness of the proposed method in addressing key challenges in UAV‑based trajectory connection and highlight its potential for large‑scale vehicle trajectory collection.
Authors: Leonardo Colombo, Thomas Beckers, Juan Giribet
Abstract: This paper presents an aggressiveness‑aware control framework for quadrotor UAVs that integrates learning‑based oracles to mitigate the effects of unknown disturbances. Starting from a nominal tracking controller on SE(3), unmodeled generalized forces and moments are estimated using a learning‑based oracle and compensated in the control inputs. An aggressiveness‑aware gain scheduling mechanism adapts the feedback gains based on probabilistic model‑error bounds, enabling reduced feedback‑induced aggressiveness while guaranteeing a prescribed practical exponential tracking performance. The proposed approach makes explicit the trade‑off between model accuracy, robustness, and control aggressiveness, and provides a principled way to exploit learning for safer and less aggressive quadrotor maneuvers.
Authors: Luka Šiktar, Branimir Ćaran, Bojan Šekoranja, Marko Švaco
Abstract: Search and rescue (SAR) operations require rapid responses to save lives or property. Unmanned Aerial Vehicles (UAVs) equipped with vision‑based systems support these missions through prior terrain investigation or real‑time assistance during the mission itself. Vision‑based UAV frameworks aid human search tasks by detecting and recognizing specific individuals, then tracking and following them while maintaining a safe distance. A key safety requirement for UAV following is the accurate estimation of the distance between camera and target object under real‑world conditions, achieved by fusing multiple image modalities. UAVs with deep learning‑based vision systems offer a new approach to the planning and execution of SAR operations. As part of the system for automatic people detection and face recognition using deep learning, in this paper we present the fusion of depth camera measurements and monocular camera‑to‑body distance estimation for robust tracking and following. Deep learning‑based filtering of depth camera data and estimation of camera‑to‑body distance from a monocular camera are achieved with YOLO‑pose, enabling real‑time fusion of depth information using the Extended Kalman Filter (EKF) algorithm. The proposed subsystem, designed for use in drones, estimates and measures the distance between the depth camera and the human body keypoints, to maintain the safe distance between the drone and the human target. Our system provides an accurate estimated distance, which has been validated against motion capture ground truth data. The system has been tested in real time indoors, where it reduces the average error, root mean square error (RMSE), and standard deviation of distance estimation by up to 15.3% in three tested scenarios.
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Abstract: Accurate per‑branch 3D reconstruction is a prerequisite for autonomous UAV‑based tree pruning; however, dense disparity maps from modern stereo matchers often remain too noisy for individual branch analysis in complex forest canopies. This paper introduces a progressive pipeline integrating DEFOM‑Stereo foundation‑model disparity estimation, SAM3 instance segmentation, and multi‑stage depth optimization to deliver robust per‑branch point clouds. Starting from a naive baseline, we systematically identify and resolve three error families through successive refinements. Mask boundary contamination is first addressed through morphological erosion and subsequently refined via a skeleton‑preserving variant to safeguard thin‑branch topology. Segmentation inaccuracy is then mitigated using LAB‑space Mahalanobis color validation coupled with cross‑branch overlap arbitration. Finally, depth noise ‑ the most persistent error source ‑ is initially reduced by outlier removal and median filtering, before being superseded by a robust five‑stage scheme comprising MAD global detection, spatial density consensus, local MAD filtering, RGB‑guided filtering, and adaptive bilateral filtering. Evaluated on 1920x1080 stereo imagery of Radiata pine (Pinus radiata) acquired with a ZED Mini camera (63 mm baseline) from a UAV in Canterbury, New Zealand, the proposed pipeline reduces the average per‑branch depth standard deviation by 82% while retaining edge fidelity. The result is geometrically coherent 3D point clouds suitable for autonomous pruning tool positioning. All code and processed data are publicly released to facilitate further UAV forestry research.
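The first stage of the five‑stage depth scheme named above, MAD global detection, can be illustrated with a minimal sketch. It is not the authors' released code: the function name, the threshold k = 3.5, and the sample depths are assumptions chosen for illustration. It flags depth samples whose deviation from the median exceeds k scaled median absolute deviations.

```python
from statistics import median


def mad_inlier_mask(depths, k=3.5):
    """Return True for samples within k scaled MADs of the median.

    The factor 1.4826 makes the MAD comparable to a standard deviation
    for Gaussian data. The threshold k is an illustrative assumption.
    """
    med = median(depths)
    mad = median(abs(d - med) for d in depths)
    if mad == 0:
        # Degenerate case: more than half the samples share one value.
        return [d == med for d in depths]
    scale = 1.4826 * mad
    return [abs(d - med) / scale <= k for d in depths]


# Hypothetical per-branch depth samples in metres, with two gross outliers.
branch_depths_m = [2.01, 1.98, 2.03, 2.00, 5.70, 1.99, 2.02, 0.40]
mask = mad_inlier_mask(branch_depths_m)
cleaned = [d for d, keep in zip(branch_depths_m, mask) if keep]
print(cleaned)
```

Because both the centre and the spread are medians, the two gross outliers do not inflate the threshold, which is the usual reason to prefer MAD over a mean and standard deviation test on noisy disparity‑derived depth.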
Authors: Christos Maikos, Georgios Angelidis, Georgios Th. Papadopoulos
Abstract: In this study, we present an end‑to‑end pipeline capable of converting drone‑captured video streams into high‑fidelity 3D reconstructions with minimal latency. Unmanned aerial vehicles (UAVs) are extensively used in aerial real‑time perception applications. Moreover, recent advances in 3D Gaussian Splatting (3DGS) have demonstrated significant potential for real‑time neural rendering. However, their integration into end‑to‑end UAV‑based reconstruction and visualization systems remains underexplored. Our goal is to propose an efficient architecture that combines live video acquisition via RTMP streaming, synchronized sensor fusion, camera pose estimation, and 3DGS optimization, achieving continuous model updates and low‑latency deployment within interactive visualization environments that support immersive augmented and virtual reality (AR/VR) applications. Experimental results demonstrate that the proposed method achieves competitive visual fidelity, while delivering significantly higher rendering performance and substantially reduced end‑to‑end latency, compared to NeRF‑based approaches. Reconstruction quality remains within 4‑7% of high‑fidelity offline references, confirming the suitability of the proposed system for real‑time, scalable augmented perception from aerial platforms.
Authors: Chenran Kou, Changsheng You, Mingjiang Wu, Dingzhu Wen, Zezhong Zhang, Chengwen Xing
Abstract: For low‑altitude economy (LAE), fast and accurate beam prediction between high‑mobility unmanned aerial vehicles (UAVs) and ground base stations is of paramount importance, which ensures seamless coverage and reliable communications. However, existing deep learning‑based beam prediction methods lack high‑level semantic understanding of dynamic environments, resulting in poor generalization. On the other hand, the emerging large language model (LLM) based approaches show promise in enhancing generalization, but they typically lack rich environmental perception, thereby failing to capture fine‑grained spatial semantics essential for precise beam alignment. To tackle these limitations, we propose in this correspondence a novel end‑to‑end generative framework for beam prediction, called BeamVLM, which treats beam prediction as a visual question answering task capitalizing on powerful existing vision‑language models (VLMs). By projecting raw visual patches directly into the language domain and judiciously designing an instructional prompt, the proposed BeamVLM enables the VLM to jointly reason over UAV trajectories and environmental context. Finally, experimental results on real‑world datasets demonstrate that the proposed BeamVLM outperforms state‑of‑the‑art methods in prediction accuracy and also exhibits superior generalization for other scenarios such as vehicle‑to‑infrastructure (V2I) beam prediction.
Authors: Bang Huang, Baha Eddine Youcef Belmekki, Mohamed-Slim Alouini
Abstract: The Low‑Altitude Economy (LAE) is rapidly emerging as a new technological and industrial frontier, with unmanned aerial vehicles (UAVs), electric vertical takeoff and landing (eVTOL) aircraft, and aerial swarms increasingly deployed in logistics, infrastructure inspection, security, and emergency response. However, the large‑scale development of the LAE demands a reliable aerial foundation that ensures not only real‑time connectivity and computational support, but also navigation integrity and safe airspace management for safety‑critical operations. High‑Altitude Platforms (HAPs), positioned at around 20 km, provide a unique balance between wide‑area coverage and low‑latency responsiveness. Compared with low earth orbit (LEO) satellites, HAPs are closer to end users and thus capable of delivering millisecond‑level connectivity, fine‑grained regulatory oversight, and powerful onboard computing and caching resources. Beyond connectivity and computation, HAPs‑assisted sensing and regulation further enable navigation integrity and airspace trust, which are essential for safety‑critical UAV and eVTOL operations in the LAE. This article proposes a five‑stage evolutionary roadmap for HAPs in the LAE: from serving as aerial infrastructure bases, to becoming super back‑ends for UAVs, to acting as frontline support for ground users, further enabling swarm‑scale UAV coordination, and ultimately advancing toward edge‑air‑cloud closed‑loop autonomy. In parallel, HAPs complement LEO satellites and cloud infrastructures to form a global‑regional‑local three‑tier architecture. Looking forward, HAPs are expected to evolve from simple platforms into intelligent hubs, emerging as pivotal nodes for air traffic management, intelligent logistics, and emergency response. By doing so, they will accelerate the transition of the LAE toward large‑scale deployment, autonomy, and sustainable growth.
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Abstract: Autonomous drone‑based tree pruning needs accurate, real‑time depth estimation from stereo cameras. Depth is computed from disparity maps using Z = f B/d, so even small disparity errors cause noticeable depth mistakes at working distances. Building on our earlier work that identified DEFOM‑Stereo as the best reference disparity generator for vegetation scenes, we present the first study to train and test ten deep stereo matching networks on real tree branch images. We use the Canterbury Tree Branches dataset (5,313 stereo pairs from a ZED Mini camera at 1080P and 720P) with DEFOM‑generated disparity maps as training targets. The ten methods cover step‑by‑step refinement, 3D convolution, edge‑aware attention, and lightweight designs. Using perceptual metrics (SSIM, LPIPS, ViTScore) and structural metrics (SIFT/ORB feature matching), we find that BANet‑3D produces the best overall quality (SSIM = 0.883, LPIPS = 0.157), while RAFT‑Stereo scores highest on scene‑level understanding (ViTScore = 0.799). Testing on an NVIDIA Jetson Orin Super (16 GB, independently powered) mounted on our drone shows that AnyNet reaches 6.99 FPS at 1080P, the only near‑real‑time option, while BANet‑2D gives the best quality‑speed balance at 1.21 FPS. We also compare 720P and 1080P processing times to guide resolution choices for forestry drone systems.
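The relation Z = f B/d can be made concrete with a short sketch. Differentiating it gives the first‑order sensitivity |dZ| ≈ Z² / (f B) · |dd|, so depth error grows quadratically with distance. The baseline of 0.063 m is the 63 mm ZED Mini value quoted in the companion abstract above; the focal length of 1400 px and the 0.5 px disparity error are assumptions for illustration, not values from the paper.

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Stereo depth Z = f * B / d, in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px


def depth_error(f_px, baseline_m, depth_m, disparity_err_px):
    """First-order depth error |dZ| = Z^2 / (f * B) * |dd|."""
    return depth_m ** 2 / (f_px * baseline_m) * abs(disparity_err_px)


F_PX = 1400.0       # assumed focal length in pixels (hypothetical)
BASELINE_M = 0.063  # 63 mm baseline quoted for the ZED Mini

z = depth_from_disparity(F_PX, BASELINE_M, disparity_px=44.1)
err_near = depth_error(F_PX, BASELINE_M, depth_m=z, disparity_err_px=0.5)
err_far = depth_error(F_PX, BASELINE_M, depth_m=2 * z, disparity_err_px=0.5)
print(f"Z = {z:.2f} m, error at Z: {err_near * 100:.1f} cm, "
      f"error at 2Z: {err_far * 100:.1f} cm")
```

Under these assumptions, the same half‑pixel disparity error costs four times as much depth accuracy when the working distance doubles.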
Authors: Yulun Huang, Zhiyu Wang, Rajkumar Buyya
Abstract: Wildfire monitoring demands timely data collection and processing for early detection and rapid response. UAV‑assisted edge computing is a promising approach, but jointly minimizing end‑to‑end service response time while satisfying energy, revisit time, and capacity constraints remains challenging. We propose an integrated framework that co‑optimizes UAV route planning, fleet sizing, and edge service provisioning for wildfire monitoring. The framework combines fire‑history‑weighted clustering to prioritize high‑risk areas, Quality of Service (QoS)‑aware edge assignment balancing proximity and computational load, 2‑opt route optimization with adaptive fleet sizing, and a dynamic emergency rerouting mechanism. The key insight is that these subproblems are interdependent: clustering decisions simultaneously shape patrol efficiency and edge workloads, while capacity constraints feed back into feasible configurations. Experiments show that the proposed framework reduces average response time by 70.6‑84.2%, energy consumption by 73.8‑88.4%, and fleet size by 26.7‑42.1% compared to GA, PSO, and greedy baselines. The emergency mechanism responds within 233 seconds, well under the 300‑second deadline, with negligible impact on normal operations.
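The 2‑opt step mentioned above is a standard local search for closed tours. The following is a minimal sketch for a single UAV patrol loop; the waypoint coordinates and function names are hypothetical, and it omits the clustering, fleet‑sizing, and energy constraints of the described framework.

```python
import math


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def two_opt(points, order):
    """Reverse tour segments until no reversal shortens the tour."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        n = len(order)
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                # Reversing order[i..j] replaces two edges of the tour.
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(points, candidate) < tour_length(points, order) - 1e-12:
                    order = candidate
                    improved = True
    return order


# Hypothetical patrol waypoints; the initial order crosses itself.
waypoints = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
initial = [0, 1, 2, 3]
best = two_opt(waypoints, initial)
print(tour_length(waypoints, initial), "->", tour_length(waypoints, best))
```

Recomputing the full tour length for each candidate keeps the sketch short; a production version would evaluate only the two changed edges.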
Authors: Yousef Emami, Hao Zhou, Radha Reddy, Atefeh Hajijamali Arani, Biliang Wang, Kai Li, Luis Almeida, Zhu Han
Abstract: Uncrewed Aerial Vehicles (UAVs) are widely deployed across diverse applications due to their mobility and agility. Recent advances in Large Language Models (LLMs) offer a transformative opportunity to enhance UAV intelligence beyond conventional optimization‑based and learning‑based approaches. By integrating LLMs into UAV systems, advanced environmental understanding, swarm coordination, mobility optimization, and high‑level task reasoning can be achieved, thereby allowing more adaptive and context‑aware aerial operations. This survey systematically explores the intersection of LLMs and UAV technologies and proposes a unified framework that consolidates existing architectures, methodologies, and applications for UAVs. We first present a structured taxonomy of LLM adaptation techniques for UAVs, including pretraining, fine‑tuning, Retrieval‑Augmented Generation (RAG), and prompt engineering, along with key reasoning capabilities such as Chain‑of‑Thought (CoT) and In‑Context Learning (ICL). We then examine LLM‑assisted UAV communications and operations, covering navigation, mission planning, swarm control, safety, autonomy, and network management. After that, the survey further discusses Multimodal LLMs (MLLMs) for human‑swarm interaction, perception‑driven navigation, and collaborative control. Finally, we address ethical considerations, including bias, transparency, accountability, and Human‑in‑the‑Loop (HITL) strategies, and outline future research directions. Overall, this work positions LLM‑assisted UAVs as a foundation for intelligent and adaptive aerial systems.
Authors: Nicola Cigarini, Giulia Michieletto, Angelo Cenedese
Abstract: In recent years, aerial platforms have evolved from passive flying sensors into versatile, contact‑aware robotic systems, leading to rapid advances in platform design. Standard coplanar and collinear quadrotors have been complemented by modern tilted and tilting multi‑rotor platforms with enhanced maneuverability. To properly analyze, control, and validate the performance of these emerging platforms, an accurate modeling step is required; however, this can be time‑consuming, user‑dependent and error‑prone. To address this issue, we propose a MATLAB/Simulink toolbox for modeling and simulating the dynamics of a broad class of multi‑rotor platforms through both analytical and physics‑based approaches. The toolbox, named RotorSuite, is provided with comprehensive documentation and example use cases, representing a valuable tool for didactic, research, and industrial development purposes.
Authors: Domonkos Varga
Abstract: This paper presents a methodological analysis of the gesture‑recognition approach proposed by Liu and Szirányi, with a particular focus on the validity of their evaluation protocol. We show that the reported near‑perfect accuracy metrics result from a frame‑level random train‑test split that inevitably mixes samples from the same subjects across both sets, causing severe data leakage. By examining the published confusion matrix, learning curves, and dataset construction, we demonstrate that the evaluation does not measure generalization to unseen individuals. Our findings underscore the importance of subject‑independent data partitioning in vision‑based gesture‑recognition research, especially for applications ‑ such as UAV‑human interaction ‑ that require reliable recognition of gestures performed by previously unseen people.
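The subject‑independent partitioning recommended above can be sketched in a few lines. The idea is to split at the level of subject identifiers rather than frames, so that no person contributes samples to both sets. The function name, the subject labels, and the 25% test fraction are assumptions for illustration and are not taken from either paper.

```python
import random


def subject_independent_split(subject_ids, test_fraction=0.25, seed=0):
    """Split frame indices so that each subject lands in exactly one set."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    held_out = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
    return train_idx, test_idx


# Hypothetical dataset: 16 frames from 4 subjects, 4 frames each.
frame_subjects = ["s1"] * 4 + ["s2"] * 4 + ["s3"] * 4 + ["s4"] * 4
train_idx, test_idx = subject_independent_split(frame_subjects)
train_subjects = {frame_subjects[i] for i in train_idx}
test_subjects = {frame_subjects[i] for i in test_idx}
print(sorted(train_subjects), sorted(test_subjects))
```

A frame‑level random split over the same 16 frames would almost surely place every subject in both sets, which is the leakage pattern the analysis describes.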
Authors: Mark Spiller, Lennart Kracke, Johannes Autenrieb
Abstract: Many unmanned aerial vehicles (UAVs) can remain aerodynamically flyable after sustaining structural or control surface damage, yet insufficient robustness in conventional autopilots often leads to mission failure. This paper proposes a robust adaptive sliding mode controller (RASMC) for fixed‑wing UAVs subject to aerodynamic coefficient perturbations and partial loss of control surface effectiveness. A damage‑aware flight dynamics model is developed to systematically analyze the impact of such impairments on the closed‑loop behavior. The RASMC is designed to ensure reliable tracking and stabilization, while a gain adaptation law maintains low control effort under nominal conditions and increases the gains as needed in the presence of aerodynamic damage. Lyapunov‑based stability guarantees are derived, and assumptions on admissible uncertainty bounds are formulated to characterize the limits within which closed‑loop stability and performance can be ensured. The proposed controller is implemented within an existing UAV autopilot framework, where outer‑loop guidance and speed control modules provide reference commands to the RASMC for attitude stabilization. Simulations demonstrate that, despite significant damage, all closed‑loop states remain stable with bounded tracking errors.
Authors: Nuno Saavedra, Pedro Ribeiro, André Coelho, Rui Campos
Abstract: Unmanned Aerial Vehicle (UAV)‑assisted networks are increasingly foreseen as a promising approach for emergency response, providing rapid, flexible, and resilient communications in environments where terrestrial infrastructure is degraded or unavailable. In such scenarios, voice radio communications remain essential for first responders due to their robustness; however, their unstructured nature prevents direct integration with automated UAV‑assisted network management. This paper proposes SIREN, an AI‑driven framework that enables voice‑driven perception for UAV‑assisted networks. By integrating Automatic Speech Recognition (ASR) with Large Language Model (LLM)‑based semantic extraction and Natural Language Processing (NLP) validation, SIREN converts emergency voice traffic into structured, machine‑readable information, including responding units, location references, emergency severity, and Quality‑of‑Service (QoS) requirements. SIREN is evaluated using synthetic emergency scenarios with controlled variations in language, speaker count, background noise, and message complexity. The results demonstrate robust transcription and reliable semantic extraction across diverse operating conditions, while highlighting speaker diarization and geographic ambiguity as the main limiting factors. These findings establish the feasibility of voice‑driven situational awareness for UAV‑assisted networks and show a practical foundation for human‑in‑the‑loop decision support and adaptive network management in emergency response operations.
Authors: Sehani Siriwardana, Jean Michel de Souza Sant'Ana, Richard Demo Souza, Abolfazl Zakeri, Onel Luis Alcaraz López
Abstract: This paper investigates goal‑oriented remote monitoring of an unobservable Markov source using energy‑harvesting sensors that communicate with a mobile receiver, such as a Low Earth Orbit (LEO) satellite or Unmanned Aerial Vehicle (UAV). Unlike conventional systems that assume stationary base stations, the proposed framework explicitly accounts for receiver mobility, which induces time‑varying channel characteristics modeled as a finite‑state Markov process. The remote monitoring problem is formulated as a partially observable Markov decision process (POMDP), which is transformed into a tractable belief‑state MDP and solved using relative value iteration to obtain optimal sampling and transmission policies. Two estimation strategies are considered: Maximum Likelihood (ML) and Minimum Mean Distortion (MMD). Numerical results demonstrate that incorporating receiver mobility and channel state information into the optimization reduces the average distortion by 10% to 42% compared to baseline policies and constant‑channel assumptions, highlighting the importance of base station motion knowledge for effective goal‑oriented communication.
Authors: Antonio Rapuano, Yaolei Shen, Federico Califano, Chiara Gabellieri, Antonio Franchi
Abstract: This paper presents a framework for aerial manipulation of an extensible cable that combines a high‑fidelity model based on partial differential equations (PDEs) with a reduced‑order representation suitable for real‑time control. The PDEs are discretised using a finite‑difference method, and proper orthogonal decomposition is employed to extract a reduced‑order model (ROM) that retains the dominant deformation modes while significantly reducing computational complexity. Based on this ROM, a nonlinear model predictive control scheme is formulated, capable of stabilizing cable oscillations and handling hybrid transitions such as payload attachment and detachment. Simulation results confirm the stability, efficiency, and robustness of the ROM, as well as the effectiveness of the controller in regulating cable dynamics under a range of operating conditions. Additional simulations illustrate the application of the ROM for trajectory planning in constrained environments, demonstrating the versatility of the proposed approach. Overall, the framework enables real‑time, dynamics‑aware control of unmanned aerial vehicles (UAVs) carrying suspended flexible cables.
Authors: Gitae Park, Kisong Lee
Abstract: This paper investigates the joint optimization of trajectory, user scheduling, and time‑slot duration in unmanned aerial vehicle (UAV)‑assisted wireless communication systems under minimum expected spectral efficiency (SE) constraints. Unlike most existing studies that approximate the expected SE by substituting the random channel gain with its mean value, thereby evaluating the SE at the average channel realization and overestimating the true expected SE due to Jensen's inequality, we approximate the expected SE by numerically integrating the SE over the channel distributions. Specifically, instead of relying on average‑channel‑based approximations, we develop a conservative yet tractable quadrature‑based approximation by discretizing the associated cumulative distribution functions. The resulting finite‑sum representation explicitly accounts for the probabilistic LoS structure and channel fading effects, while remaining tractable for optimization. Leveraging this lower bound, we formulate a mission completion time minimization problem subject to minimum expected‑SE requirements for all ground nodes. The resulting problem is a mixed‑integer nonconvex optimization, which is tackled via a penalty‑based block coordinate descent framework. The proposed algorithm alternately optimizes the scheduling decisions and the UAV trajectory along with adaptive time‑slot durations, and maintains feasibility with respect to the original expected‑SE constraints by leveraging successive convex approximation and quadratic transform techniques. Simulation results demonstrate that the proposed method strictly satisfies the minimum expected‑SE constraints and achieves a significantly shorter mission completion time than conventional average‑channel‑based approaches, which are shown to yield infeasible or overly conservative solutions.
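The Jensen's inequality argument above can be checked numerically with a small sketch. It assumes Rayleigh fading, so the channel power gain is exponentially distributed with unit mean, and an SNR of 10; neither is taken from the paper, and the probabilistic LoS component is omitted. Since the SE is increasing in the gain, sampling each equal‑probability CDF bin at its lower edge yields a conservative finite sum, which is one simple way (not necessarily the authors') to discretize the CDF.

```python
import math


def se(snr, gain):
    """Spectral efficiency in bit/s/Hz for a given channel power gain."""
    return math.log2(1.0 + snr * gain)


def exp_quantile(u, mean_gain=1.0):
    """Inverse CDF of an exponential gain (Rayleigh fading assumption)."""
    return -mean_gain * math.log(1.0 - u)


def mean_channel_se(snr, mean_gain=1.0):
    """Average-channel approximation: SE evaluated at the mean gain."""
    return se(snr, mean_gain)


def quantile_lower_bound_se(snr, n_bins=64):
    """Lower Riemann sum over the CDF: each bin uses its smallest gain."""
    return sum(se(snr, exp_quantile(k / n_bins)) for k in range(n_bins)) / n_bins


def reference_expected_se(snr, n_bins=20000):
    """Fine midpoint rule over the CDF, used here as a reference value."""
    return sum(se(snr, exp_quantile((k + 0.5) / n_bins)) for k in range(n_bins)) / n_bins


SNR = 10.0  # assumed linear SNR at unit gain (hypothetical)
lower = quantile_lower_bound_se(SNR)
reference = reference_expected_se(SNR)
optimistic = mean_channel_se(SNR)
print(f"lower bound {lower:.3f} <= E[SE] ~ {reference:.3f} <= mean-channel {optimistic:.3f}")
```

The ordering shows why constraints enforced on the average‑channel value can be violated in expectation, while constraints enforced on the lower‑edge sum remain feasible.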
Authors: Leonardo Spampinato, Lorenzo Mario Amorosa, Enrico Testi, Chiara Buratti, Riccardo Marini
Abstract: Future vehicular networks require continuous connectivity to serve highly mobile users in urban environments. To mitigate the coverage limitations of fixed terrestrial macro base stations (MBS) under non line‑of‑sight (NLoS) conditions, fleets of unmanned aerial base stations (UABSs) can be deployed, dynamically repositioning to track vehicular users and traffic hotspots in coordination with the terrestrial network. This paper addresses cooperative multi‑agent trajectory design under different service areas and takeoff configurations, where rapid and safe adaptation across scenarios is essential. We formulate the problem as a multi‑task decentralized partially observable Markov decision process and solve it using centralized training and decentralized execution with double dueling deep Q‑network (3DQN), enabling online training for real‑world deployments. However, efficient exploration remains a bottleneck, with conventional strategies like ε‑greedy requiring careful tuning. To overcome this, we propose the multi‑agent meta‑advisor with advisor override (MAMO). This framework guides agent exploration through a meta‑policy learned jointly across tasks. It uses a dynamic override mechanism that allows agents to reject misaligned guidance when the advisor fails to generalize to a specific scenario. Simulation results across three realistic urban scenarios and multiple takeoff configurations show that MAMO achieves faster convergence and higher returns than tuned ε‑greedy baselines, outperforming both an advisor‑only ablation and a single generalized policy. Finally, we demonstrate that the learned UABS fleet significantly improves network performance compared to deployments without aerial support.
Authors: Md Sharif Hossen, Cole Dickerson, Ozgur Ozdemir, Anil Gurses, Mohamed Rabeek Sarbudeen, Thomas Zajkowski, Ahmed Manavi Alam, Everett Tucker, William Bjorndahl, Fred Solis, Sadaf Javed, Anirudh Kamath, Xiangyao Tang, Joarder Jafor Sadique, Kevin Liu Hermstein, Kaies Al Mahmud, Jose Angel Sanchez Viloria, Skyler Hawkins, Yuqing Cui, Annoy Dey, Yuchen Liu, Ali Gurbuz, Joseph Camp, Rizwan Ahmad, Jacobus van der Merwe, Ahmed Ibrahim Mohamed, Gil Zussman, Mehmet Kurum, Namuduri Kamesh, Zhangyu Guan, Dimitris Pados, George Sklivanitis, Ismail Guvenc, Mihail Sichitiu, Magreth Mushi, Rudra Dutta
Abstract: In this work, we present an unmanned aerial vehicle (UAV) wireless dataset collected as part of the AERPAW Autonomous Aerial Data Mule (AADM) challenge, organized by the NSF Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) project. The AADM challenge was the second competition in which an autonomous UAV acted as a data mule, where the UAV downloaded data from multiple base stations (BSs) in a dynamic wireless environment. Participating teams designed flight control and decision‑making algorithms for choosing which BSs to communicate with and how to plan flight trajectories to maximize data download within a mission completion time. The competition was conducted in two stages: Stage 1 involved development and experimentation using a digital twin (DT) environment, and in Stage 2, the final test run was conducted on the outdoor testbed. The total score for each team was compiled from both stages. The resulting dataset includes link quality and data download measurements, both in DT and physical environments. Along with the USRP measurements used in the contest, the dataset also includes UAV telemetry, Keysight RF sensor position estimates, link quality measurements from LoRa receivers, and Fortem radar measurements. It supports reproducible research on autonomous UAV networking, multi‑cell association and scheduling, air‑to‑ground propagation modeling, DT‑to‑real‑world transfer learning, and integrated sensing and communication, and serves as a benchmark for future autonomous wireless experimentation.
Authors: Zhiyuan Ren, Yudong Fang, Tao Zhang, Wenchi Cheng, Ben Lan
Abstract: Post‑disaster survivor localization using Unmanned Aerial Vehicles (UAVs) faces a fundamental physical challenge: the prevalence of Non‑Line‑of‑Sight (NLOS) propagation in collapsed structures. Unlike standard Gaussian noise, signal reflection from debris introduces strictly non‑negative ranging biases. Existing robust estimators, typically designed with symmetric loss functions (e.g., Huber or Tukey), implicitly rely on the assumption of error symmetry. Consequently, they experience a theoretical mismatch in this regime, leading to a phenomenon we formally identify as Statistical‑Geometric Degeneracy (SGD), a state where the estimator stagnates due to the coupling of persistent asymmetric bias and limited observation geometry. While emerging data‑driven approaches offer alternatives, they often struggle with the scarcity of training data and the sim‑to‑real gap inherent in unstructured disaster zones. In this work, we propose a physically‑grounded solution, the AsymmetricHuberEKF, which explicitly incorporates the non‑negative physical prior of NLOS biases via a derived asymmetric loss function. Theoretically, we show that standard symmetric filters correspond to a degenerate case of our framework where the physical constraint is relaxed. Furthermore, we demonstrate that resolving SGD requires not just a robust filter, but specific bilateral information, which we achieve through a co‑designed active sensing strategy. Validated in a 2D nadir‑view scanning scenario, our approach significantly accelerates convergence compared to symmetric baselines, offering a resilient building block for search operations where data is scarce and geometry is constrained.
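The idea of an asymmetric loss for non‑negative NLOS bias can be illustrated with a one‑sided Huber function. This is a sketch and not the loss derived in the paper: it assumes the residual is defined as measured range minus predicted range, so NLOS bias appears only as large positive residuals, which are down‑weighted, while negative residuals keep the full quadratic penalty. The threshold delta = 1.0 is likewise an assumption.

```python
def asymmetric_huber(residual, delta=1.0):
    """One-sided Huber loss: quadratic up to delta, linear above it.

    Negative residuals (measured < predicted) cannot come from NLOS
    bias under the stated assumption, so they stay fully quadratic.
    """
    if residual <= delta:
        return 0.5 * residual ** 2
    return delta * (residual - 0.5 * delta)


def asymmetric_huber_weight(residual, delta=1.0):
    """Reweighting factor psi(r) / r for an iteratively reweighted update."""
    if residual <= delta:
        return 1.0
    return delta / residual


for r in (-3.0, 0.5, 3.0):
    print(r, asymmetric_huber(r), round(asymmetric_huber_weight(r), 3))
```

A symmetric Huber loss would also down‑weight the residual of -3.0, discarding a measurement that, under this assumption, carries valid geometric information.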
Authors: Huan Liu, Michel Gendreau, Binjie Xu, Guohua Wu, Yi Gu
Abstract: In this paper, we introduce a close‑enough multi‑UAV general routing problem (CEMUAVGRP) where a fleet of homogeneous UAVs conduct monitoring tasks containing nodes, each of which has its disk neighborhood, and edges, aiming to minimize the total distance. A two‑phase iterative method is proposed, partitioning the CEMUAVGRP into a general routing phase where a satisfactory route including required nodes and edges for each UAV is obtained without considering the disk neighborhoods of required nodes, and a close‑enough routing phase where representative points are optimized for each required node in the determined route. To be specific, a variable neighborhood descent (VND) heuristic is proposed for the general routing phase, while a second‑order cone programming (SOCP) procedure is applied in the close‑enough routing phase. These two phases are performed in an iterative fashion under the framework of an adaptive iterated local search (AILS) algorithm until the predefined termination criteria are satisfied. Extensive experiments and comparative studies are conducted, demonstrating the efficiency of the proposed AILS‑VND‑SOCP algorithm and the superiority of disk neighborhoods.
Authors: Haichao Liu, Yufeng Hu, Shuang Wang, Kangjun Guo, Jun Ma, Jinni Zhou
Abstract: Autonomous landing of Uncrewed Aerial Vehicles (UAVs) on oscillating marine platforms is severely constrained by wave‑induced multi‑frequency oscillations, wind disturbances, and phase lags in motion prediction. Existing methods either treat platform motion as a general random process or lack explicit modeling of wave spectral characteristics, leading to suboptimal performance under dynamic sea conditions. To address these limitations, we propose SpecFuse: a novel spectral‑temporal fusion predictive control framework that integrates frequency‑domain wave decomposition with time‑domain recursive state estimation for high‑precision 6‑DoF motion forecasting of Uncrewed Surface Vehicles (USVs). The framework explicitly models dominant wave harmonics to mitigate phase lags, refining predictions in real time via IMU data without relying on complex calibration. Additionally, we design a hierarchical control architecture featuring a sampling‑based HPO‑RRT algorithm for dynamic trajectory planning under non‑convex constraints and a learning‑augmented predictive controller that fuses data‑driven disturbance compensation with optimization‑based execution. Extensive validations (2,000 simulations + 8 lake experiments) show our approach achieves a 3.2 cm prediction error, 4.46 cm landing deviation, 98.7% / 87.5% success rates (simulation / real‑world), and 82 ms latency on embedded hardware, outperforming state‑of‑the‑art methods by 44%‑48% in accuracy. Its robustness to wave‑wind coupling disturbances supports critical maritime missions such as search and rescue and environmental monitoring. All code, experimental configurations, and datasets will be released as open‑source to facilitate reproducibility.
Authors: Sunjung Kang, Vishrant Tripathi, Christopher G. Brinton
Abstract: We investigate a remote monitoring framework with multiple sensing modalities including IoT sensors on the ground, mobile UAVs in the air, and a periodically available satellite constellation. While the IoT sensors cover small areas and remain fixed, the UAVs can move between locations and cover larger areas, and the satellites can observe the entire region but have high latency and low reliability. We divide the deployment region into cells and model it as a graph, with the nodes representing individual cells and edges representing possible UAV mobility patterns. To evaluate the freshness of collected information from this graph, we adopt the Age of Information (AoI) metric, measured separately for each cell. Under a given deployment of IoT nodes and UAV mobility patterns, our objective is to ascertain whether the system should actually utilize monitoring updates from satellites ‑ a seemingly simple yet surprisingly elusive question. For stationary randomized scheduling policies, we develop closed‑form expressions and lower bounds for the weighted‑sum AoI and utilize this analysis to explore performance tradeoffs as system parameters vary. We also provide a Lyapunov style max‑weight policy and detailed simulations that provide crucial insights for deploying such systems in practice.
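To make the Age of Information metric concrete, the following is a minimal sketch and not the paper's model. It assumes a slotted system in which a cell's age grows by one per slot and resets to 1 whenever an update is delivered, and a stationary randomized policy that delivers an update to the cell with probability p in each slot. Under these assumptions the long-run average age is 1/p. The function names and the weighting scheme are hypothetical.

```python
import random

def simulate_average_aoi(p, slots=200_000, seed=0):
    """Average age of one cell when an update arrives with probability p per slot.

    Illustrative model (assumption): age resets to 1 on an update, else grows by 1.
    """
    rng = random.Random(seed)
    age = 1
    total = 0
    for _ in range(slots):
        if rng.random() < p:
            age = 1        # fresh update delivered in this slot
        else:
            age += 1       # information gets one slot older
        total += age
    return total / slots

def weighted_sum_aoi(update_probs, weights, **kwargs):
    """Weighted sum of per-cell average ages, one update probability per cell."""
    return sum(w * simulate_average_aoi(p, **kwargs)
               for p, w in zip(update_probs, weights))
```

Comparing `simulate_average_aoi(p)` against 1/p for a few values of p is a quick sanity check on the simulation.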
Authors: Maria Conceição, António Grilo, Meysam Basiri
Abstract: A networked aerial robot team (NART) comprises a group of agents (e.g., unmanned aerial vehicles (UAVs), ground control stations, etc.) interconnected by wireless links. Inter‑agent connectivity, even if intermittent (i.e. sparse), enables data exchanges between agents and supports cooperative behaviours in several NART missions. It can benefit online decentralised decision‑making and group resilience, particularly when prior knowledge is inaccurate or incomplete. These requirements can be accounted for in the offline mission planning stages to incentivise cooperative behaviours and improve mission efficiency during the NART deployment. This paper proposes a novel path planning tool for a Sparse, Aware, and Cooperative Networked Aerial Robot Team (SpArC‑NART) in exploration missions. It simultaneously considers different levels of prior information regarding the environment, limited agent energy, sensing, and communication, as well as distinct NART constitutions. The communication model takes into account the limitations of user‑defined radio technology and physical phenomena. The proposed tool aims to maximise the mission goals (e.g., finding one or multiple targets, covering the full area of the environment, etc.), while cooperating with other agents to reduce agent reporting times, increase their global situational awareness (e.g., their knowledge of the environment), and facilitate mission replanning, if required. The developed cooperation mechanism leverages soft‑motion constraints and dynamic rewards based on the Value of Movement and the expected communication availability between the agents at each time step. A ground sensing coverage use case was chosen to illustrate the current capabilities of this tool.
Authors: Aykut Kabaoglu, Sanem Sariel
Abstract: Accurate state estimation in Unmanned Aerial Vehicles (UAVs) is crucial for ensuring reliable and safe operation, as anomalies occurring during mission execution may induce discrepancies between expected and observed system behaviors, thereby compromising mission success or posing potential safety hazards. It is essential to continuously monitor and detect such conditions in order to ensure a timely response and maintain system reliability. In this work, we focus on UAV state estimation anomalies and provide a large‑scale real‑world UAV dataset to facilitate research aimed at improving the development of anomaly detection. Unlike existing datasets that primarily rely on injected faults into simulated data, this dataset comprises 1396 real flight logs totaling over 52 hours of flight time, collected across diverse indoor and outdoor environments using a collection of PX4‑based UAVs equipped with a variety of sensor configurations. The dataset comprises both normal and anomalous flights without synthetic manipulation, making it uniquely suitable for realistic anomaly detection tasks. A structured classification is proposed that categorizes UAV state estimation anomalies into four classes: mechanical and electrical, external position, global position, and altitude anomalies. These classifications reflect collective, contextual, and outlier anomalies observed in multivariate sensor data streams, including IMU, GPS, barometer, magnetometer, distance sensors, visual odometry, and optical flow, that can be found in the PX4 logging mechanism. It is anticipated that this dataset will play a key role in the development, training, and evaluation of anomaly detection and isolation systems to address the critical gap in UAV reliability research.
Authors: Chen Feng, Yang Xu, Shaojie Shen
Abstract: Autonomous aerial scanning of target structures is crucial for practical applications, requiring online adaptation to unknown obstacles during flight. Existing methods largely emphasize collision avoidance and efficiency, but overlook occlusion‑induced visibility degradation, severely compromising scanning quality. In this study, we propose FC‑Vision, an on‑the‑fly visibility‑aware replanning framework that proactively and safely prevents target occlusions while preserving the intended coverage and efficiency of the original plan. Our approach explicitly enforces dense surface‑visibility constraints to regularize replanning behavior in real‑time via an efficient two‑level decomposition: occlusion‑free viewpoint repair that maintains coverage with minimal deviation from the nominal scan intent, followed by segment‑wise clean‑sensing connection in 5‑DoF space. A plug‑in integration strategy is also presented to seamlessly interface FC‑Vision with existing UAV scanning systems without architectural changes. Comprehensive simulation and real‑world evaluations show that FC‑Vision consistently improves scanning quality under unexpected occluders, delivering a maximum coverage gain of 55.32% and a 73.17% reduction in the occlusion ratio, while achieving real‑time performance with a moderate increase in flight time. The source code will be made publicly available.
Authors: Haiquan Lu, Chao Feng, Yong Zeng, Shaodan Ma, Long Shi, Shi Jin, Rui Zhang
Abstract: Unmanned aerial vehicles (UAVs), with their intrinsic three‑dimensional (3D) mobility, provide an ideal platform for implementing aerial movable antenna (AMA) systems enabled by UAV swarm cooperation. Moreover, an AMA system can readily achieve an extremely large‑scale array aperture, rendering the conventional far‑field uniform plane wave (UPW) model no longer valid for aerial‑to‑ground links. This paper studies UAV swarm enabled near‑field AMA communication by taking into account the non‑uniform spherical wave (NUSW) model, where the UAV swarm trajectory simultaneously influences the channel amplitude and phase. We formulate a general optimization problem to maximize the minimum average communication rate over user equipments (UEs), by jointly optimizing the 3D UAV swarm trajectory and receive beamforming for all UEs. To draw useful insights, the special case of a single UE is first studied, and a successive convex approximation (SCA) technique is proposed to efficiently optimize the UAV swarm trajectory. For the special case of placement optimization, the optimal placement positions of UAVs for the cases of a single UAV and two UAVs are derived in closed form. Then, for the special case of two UEs, we show that inter‑UE interference (IUI)‑free communication can be achieved by symmetrically placing an even number of UAVs along a hyperbola, with its foci corresponding to the locations of the two UEs. Furthermore, for an arbitrary number of UEs, an alternating optimization algorithm is proposed to efficiently tackle the non‑convex optimization problem. Numerical results validate the significant performance gains over the benchmark schemes.
Authors: Weian Guo, Shixin Deng, Wuzhao Li, Li Li
Abstract: We address multi‑objective unmanned aerial vehicle (UAV) placement for motorway intelligent transportation systems, where deployments must balance coverage, link quality, and UAV count under geometric constraints. We construct a reproducible benchmark from highD motorway recordings with recording‑level splits and generate Pareto‑optimal labels via NSGA‑II. A preference rule yields deployable targets while preserving multi‑objective evaluation. We train fast surrogate models that map unordered vehicle positions to UAV count and continuous placements, using permutation‑aware losses and constraint‑regularized training across set‑based and sequence‑based architectures. The evaluation protocol combines Pareto quality metrics, success‑rate curves, runtime benchmarks, and robustness studies, with uncertainty quantified by recording‑level bootstrap. Results indicate that permutation‑invariant set models provide the strongest coverage‑SNR‑count trade‑off among learned predictors and approach NSGA‑II quality while enabling real‑time inference. Under shared budgets, they offer a more favorable success‑latency trade‑off than heuristic baselines. The benchmark and splits are released to support reproducible ITS deployment studies and to facilitate comparisons under shared operational budgets.
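The notion of Pareto optimality over coverage, link quality (SNR), and UAV count can be illustrated with a plain non-dominated filter. This is only a sketch: it is not NSGA-II and not the paper's labeling pipeline, and the candidate tuples below are made-up numbers chosen for illustration. It assumes coverage and SNR are maximized while UAV count is minimized.

```python
def dominates(a, b):
    """True if deployment a dominates b.

    Each deployment is (coverage, snr_db, uav_count); coverage and SNR are
    maximized, UAV count is minimized (assumed objective directions).
    """
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical candidate deployments (illustrative values only).
candidates = [
    (0.90, 12.0, 3),
    (0.95, 11.0, 4),
    (0.85, 10.0, 3),  # dominated by the first candidate
    (0.95, 11.0, 5),  # dominated by the second candidate
]
front = pareto_front(candidates)
```

A preference rule, as mentioned in the abstract, would then pick a single deployable target from `front`; how that rule is defined is not specified here.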
Authors: Sohail Ali Farooqui, Zuhair Ahmed Khan Taha, Mohammed Mudassir Uddin, Shahnawaz Alam
Abstract: Unmanned aerial vehicles serve as primary sensing platforms for surveillance, traffic monitoring, and disaster response, making aerial object detection a central problem in applied computer vision. Current detectors struggle with UAV‑specific challenges: targets spanning only a few pixels, cluttered backgrounds, heavy occlusion, and strict onboard computational budgets. This study introduces LAF‑YOLOv10, built on YOLOv10n, integrating four complementary techniques to improve small‑object detection in drone imagery. A Partial Convolution C2f (PC‑C2f) module restricts spatial convolution to one quarter of backbone channels, reducing redundant computation while preserving discriminative capacity. An Attention‑Guided Feature Pyramid Network (AG‑FPN) inserts Squeeze‑and‑Excitation channel gates before multi‑scale fusion and replaces nearest‑neighbor upsampling with DySample for content‑aware interpolation. An auxiliary P2 detection head at 160×160 resolution extends localization to objects below 8×8 pixels, while the P5 head is removed to redistribute parameters. Wise‑IoU v3 replaces CIoU for bounding box regression, attenuating gradients from noisy annotations in crowded aerial scenes. The four modules address non‑overlapping bottlenecks: PC‑C2f compresses backbone computation, AG‑FPN refines cross‑scale fusion, the P2 head recovers spatial resolution, and Wise‑IoU stabilizes regression under label noise. No individual component is novel; the contribution is the joint integration within a single YOLOv10 framework. Across three training runs (seeds 42, 123, 256), LAF‑YOLOv10 achieves 35.1 ± 0.3% mAP@0.5 on VisDrone‑DET2019 with 2.3 M parameters, exceeding YOLOv10n by 3.3 points. Cross‑dataset evaluation on UAVDT yields 35.8 ± 0.4% mAP@0.5. Benchmarks on NVIDIA Jetson Orin Nano confirm 24.3 FPS at FP16, demonstrating viability for embedded UAV deployment.
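The benefit of a P2 head for very small objects comes down to simple stride arithmetic, sketched below. The 640-pixel input size is an assumption inferred from the stated 160×160 P2 grid at stride 4, and the stride values follow the usual YOLO pyramid convention; neither is confirmed by the abstract.

```python
def head_grid(input_px, stride):
    """Side length of the detection grid for a square input image."""
    return input_px // stride

def cells_spanned(object_px, stride):
    """Number of grid cells an object side spans at the given stride."""
    return object_px / stride

INPUT_PX = 640  # assumption: implied by a 160x160 grid at stride 4
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}  # conventional strides (assumption)

for name, stride in STRIDES.items():
    # Print grid side and how many cells an 8-pixel object side covers.
    print(name, head_grid(INPUT_PX, stride), cells_spanned(8, stride))
```

Under these assumptions an 8-pixel object side spans two cells at P2 but only a quarter of a cell at P5, which is consistent with adding P2 and dropping P5 for small-object imagery.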
Authors: Hengyu Mu, Jianshi Wu, Yuxin Guo, XianLian Lin, Qingyong Hu, Sheng Ao, Chenglu Wen, Cheng Wang
Abstract: Regression‑based LiDAR relocalization has recently emerged as a promising solution for high‑precision positioning in GNSS‑denied environments. However, these methods are primarily tailored to autonomous driving, exhibiting significantly degraded accuracy in unmanned aerial vehicle (UAV) scenarios due to arbitrary pose variations and irregular flight paths. In this paper, we propose SOAR, a regression‑based LiDAR relocalization framework for UAVs. Specifically, we introduce a locality‑preserving sliding window attention module with locally invariant positional encoding to capture discriminative geometric structures robust to viewpoint changes. A coordinate‑independent feature initialization module is further designed to eliminate sensitivity to global transformations. Furthermore, most existing UAV datasets are ill‑suited to evaluating LiDAR relocalization in the real world, due to the lack of synchronized LiDAR scans, accurate 6‑DoF poses, or multiple traversals. Thus, we construct a large‑scale UAV LiDAR localization dataset with 4 scenes and 13 irregular paths exhibiting rotation and altitude variations, providing a more realistic benchmark for UAVs. Extensive experiments demonstrate that our method achieves state‑of‑the‑art performance, improving the localization success rate by 40% and reducing mean error by over 10 m on UAVLoc. Our code and dataset will be released soon.
Authors: Jie Zheng, Ruichen Zhang, Dusit Niyato, Haijun Zhang, Jiacheng Wang, Hongyang Du, Jiawen Kang, Zehui Xiong
Abstract: Enhancing future wireless networks presents a significant challenge for networking systems due to diverse user demands and the emergence of 6G technology. While reinforcement learning (RL) is a powerful framework, it often encounters difficulties with high‑dimensional state spaces and complex environments, leading to substantial computational demands, distributed intelligence, and potentially inconsistent outcomes. Large language models (LLMs), with their extensive pretrained knowledge and advanced reasoning capabilities, offer promising tools to enhance RL in optimizing 6G wireless networks. We explore RL models augmented by LLMs, emphasizing their roles and the potential benefits of their synergy in wireless network optimization. We then examine LLM‑enabled RL across various protocol layers: physical, data link, network, transport, and application layers. Additionally, we propose an LLM‑assisted state representation and semantic extraction to enhance the multi‑agent reinforcement learning (MARL) framework. This approach is applied to service migration and request routing, as well as topology graph generation in unmanned aerial vehicle (UAV)‑satellite networks. Through case studies, we demonstrate that our framework effectively optimizes wireless networks. Finally, we outline prospective research directions for LLM‑enabled RL in wireless network optimization.
Authors: Abdikarim Mohamed Ibrahim, Rosdiadee Nordin
Abstract: Artificial intelligence (AI) and reinforcement learning (RL) have shown significant promise in wireless systems, enabling dynamic spectrum allocation, traffic management, and large‑scale Internet of Things (IoT) coordination. However, their deployment in mission‑critical applications introduces the risk of unsafe emergent behaviors, such as UAV collisions, denial‑of‑service events, or instability in vehicular networks. Existing safety mechanisms are predominantly reactive, relying on anomaly detection or fallback controllers that intervene only after unsafe actions occur, which cannot guarantee reliability in ultra‑reliable low‑latency communication (URLLC) settings. In this work, we propose a proactive safety‑constrained RL framework that integrates proof‑carrying control (PCC) with empowerment‑budgeted (EB) enforcement. Each agent action is verified through lightweight mathematical certificates to ensure compliance with interference constraints, while empowerment budgets regulate the frequency of safety overrides to balance safety and autonomy. We implement this framework on a wireless uplink scheduling task using Proximal Policy Optimization (PPO). Simulation results demonstrate that the proposed PCC+EB controller eliminates unsafe transmissions while preserving system throughput and predictable autonomy. Compared with unconstrained and reactive baselines, our method achieves provable safety guarantees with minimal performance degradation. These results highlight the potential of proactive safety constrained RL to enable trustworthy wireless autonomy in future 6G networks.
Authors: Cunlai Pu, Fangrui Wu, Zhe Wang, Xiangbo Shu
Abstract: In complex Unmanned Aerial Vehicle (UAV) networks, UAVs can establish dynamic and heterogeneous links with one another for various purposes, such as communication coverage, collective sensing, and task collaboration. These interactions give rise to dynamic multiplex UAV networks, where each layer represents a distinct type of interaction among UAVs. Understanding how such links form and evolve is both of theoretical interest and of practical importance for the control and maintenance of networked UAV systems. In this paper, we first develop a dynamic multiplex network model for UAV networks to characterize their dynamic and heterogeneous link properties. We then propose a cross‑layer fusion‑based deep learning model, termed CLF‑ULP, to predict future inter‑UAV links based on historical topology data. CLF‑ULP incorporates graph attention networks to extract topological features within each layer and perform a cross‑layer attention fusion to capture inter‑layer dependencies. Furthermore, a shared‑parameter long short‑term memory network is employed to model the temporal evolution of each layer. To improve embedding quality and link prediction performance, we develop a joint loss function that considers both intra‑layer and inter‑layer UAV adjacency. Extensive experiments on simulated UAV datasets under diverse mobility patterns demonstrate that CLF‑ULP achieves state‑of‑the‑art performance in predicting links within dynamic multiplex UAV networks.
Authors: Andrii Grekhov, Volodymyr Kharchenko, Vasyl Kondratiuk
Abstract: The purpose of this paper is to model traffic in an Ad Hoc network of Unmanned Aerial Vehicles and to demonstrate a way of adapting the communication channel using Artificial Intelligence. The modeling was based on an original model of an Ad Hoc network comprising 20 Unmanned Aerial Vehicles. The dependences of packet loss on the packet size for different transmission powers, on the packet size for different frequencies, on the Unmanned Aerial Vehicles' flight area, and on the number of Unmanned Aerial Vehicles were obtained and analyzed. The implementation of adaptive data transmission is presented in the program code. The dependences of packet loss, power, and transaction size on time during Artificial Intelligence adaptation are shown.
Authors: Andrii Grekhov, Volodymyr Kharchenko, Vasyl Kondratiuk
Abstract: This paper presents a simulation‑based study of Artificial Intelligence assisted communication channel adaptation in Unmanned Aerial Vehicle enabled cellular networks. The considered system model includes communication channel Ground Base Station Aerial Repeater UAV Base Station Cluster of Cellular Network Users. The primary objective of the study is to investigate the impact of adaptive channel parameter control on communication performance under dynamically changing interference conditions. A lightweight supervised machine learning approach based on linear regression is employed to implement cognitive channel adaptation. The AI model operates on packet‑level performance indicators and enables real‑time adjustment of Transaction Size in response to variations in Bit Error Rate and effective Data Rate. A custom simulation environment is developed to generate training and testing datasets and to evaluate system behavior under both static and adaptive channel configurations.
Authors: Songxin Lei, Chunming Ma, Haomin Wen, Yexin Li, Lizhenghe Chen, Qianyu Yang, Fugee Tsung, Lei Chen, Sijie Ruan, Yuxuan Liang
Abstract: Cooperative air‑ground delivery has emerged as a promising logistics paradigm by leveraging the complementary strengths of UAVs and ground carriers. However, effective dispatching in such heterogeneous systems faces two critical challenges: i) the heterogeneity between flight and road dynamics, and ii) the scalability bottleneck caused by the exponentially growing number of decision variables in large‑scale fleets. To address these challenges, we propose HRL4AG, a Hierarchical Reinforcement Learning framework for cooperative Air‑Ground delivery. Specifically, HRL4AG employs a high‑level manager to tackle the scalability bottleneck by decomposing the joint action space, and mode‑specific workers that encode distinct flight and road dynamics to address the heterogeneity. Furthermore, a novel internal reward mechanism is designed to guide the hierarchical policy learning, addressing the credit assignment problem in sparse‑reward settings. Extensive experiments on two real‑world datasets and an evaluation platform demonstrate that HRL4AG significantly outperforms state‑of‑the‑art baselines, improving the delivery success rate by up to 26% while achieving an 80‑fold increase in computational efficiency.
Authors: Houssem Eddine Mohamadi, Nadjia Kara
Abstract: The success of surveillance applications involving small unmanned aerial vehicles (UAVs) depends on how long the limited on‑board power lasts. To cope with this challenge, alternative renewable sources of lift are sought. One promising solution is to extract energy from rising masses of buoyant air. This paper proposes a local‑global behavioral management and decision‑making approach for the autonomous deployment of soaring‑capable UAVs. The cooperative UAVs are modeled as non‑deterministic finite state‑based rational agents. In addition, a mission planning module assigns tasks and issues dynamic navigation waypoints for a new path planning scheme, in which the concepts of visibility and prediction are applied to avoid collisions. Moreover, a delayed learning and tuning strategy is employed to optimize the gains of the path tracking controller. Rigorous comparative analyses carried out with three benchmarking baselines and 15 evolutionary algorithms highlight the adequacy of the proposed approach for maintaining the surveillance persistency (staying aloft for longer periods without landing) and maximizing the detection of targets (two times better than non‑cooperative and semi‑cooperative approaches) with less power consumption (almost 6% of battery consumed in six hours).
Authors: Amath Sow, Mauricio Rodriguez Cesen, Fabiola Martins Campos de Oliveira, Mariusz Wzorek, Daniel de Leng, Mattias Tiger, Fredrik Heintz, Christian Esteve Rothenberg
Abstract: Preflight planning for large‑scale Unmanned Aerial Vehicle (UAV) fleets in dynamic, shared airspace presents significant challenges, including temporal No‑Fly Zones (NFZs), heterogeneous vehicle profiles, and strict delivery deadlines. While Multi‑Agent Path Finding (MAPF) provides a formal framework, existing methods often lack the scalability and flexibility required for real‑world Unmanned Traffic Management (UTM). We propose DTAPP‑IICR: a Delivery‑Time Aware Prioritized Planning method with Incremental and Iterative Conflict Resolution. Our framework first generates an initial solution by prioritizing missions based on urgency. Secondly, it computes roundtrip trajectories using SFIPP‑ST, a novel 4D single‑agent planner (Safe Flight Interval Path Planning with Soft and Temporal Constraints). SFIPP‑ST handles heterogeneous UAVs, strictly enforces temporal NFZs, and models inter‑agent conflicts as soft constraints. Subsequently, an iterative Large Neighborhood Search, guided by a geometric conflict graph, efficiently resolves any residual conflicts. A completeness‑preserving directional pruning technique further accelerates the 3D search. On benchmarks with temporal NFZs, DTAPP‑IICR achieves near‑100% success with fleets of up to 1,000 UAVs and gains up to 50% runtime reduction from pruning, outperforming batch Enhanced Conflict‑Based Search in the UTM context. Scaling successfully in realistic city‑scale operations where other priority‑based methods fail even at moderate deployments, DTAPP‑IICR is positioned as a practical and scalable solution for preflight planning in dense, dynamic urban airspace.
Authors: Muhammad Farhan Ahmed, Vincent Frémont
Abstract: Autonomous aerial‑surface robot teams offer a scalable solution for maritime monitoring, but deployment remains difficult due to water‑induced visual artifacts and bandwidth‑limited coordination. This paper presents a decentralized multi‑robot framework to detect and track floating containers using multiple UAVs cooperating with an autonomous surface vessel. Each UAV runs a YOLOv8 detector augmented with stereo disparity and maintains per‑target EKF tracks with uncertainty‑aware data association. Robots exchange compact track summaries that are fused conservatively using Covariance Intersection, preserving estimator consistency under unknown cross‑correlations. An information‑driven allocator assigns targets and selects UAV hover viewpoints by trading expected uncertainty reduction against travel effort and safety separation. Implemented in ROS, the proposed system is validated in simulations and compared with representative tracking and fusion baselines, showing improved identity continuity and localization accuracy with modest communication overhead.
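Covariance Intersection fuses two estimates whose cross-correlation is unknown by taking a convex combination in information form: P⁻¹ = w·P₁⁻¹ + (1 − w)·P₂⁻¹ and x = P·(w·P₁⁻¹·x₁ + (1 − w)·P₂⁻¹·x₂). The sketch below is a generic 2-D textbook version, not the paper's implementation. Choosing w by a grid search that minimizes the trace of P is one common option and is an assumption here, as are all function names.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def covariance_intersection(x1, P1, x2, P2, steps=100):
    """Fuse two 2-D track estimates; returns (x, P, w).

    The weight w is picked on a uniform grid to minimize trace(P)
    (an assumed criterion; determinant minimization is another option).
    """
    I1, I2 = inv2(P1), inv2(P2)          # information matrices
    y1, y2 = matvec(I1, x1), matvec(I2, x2)
    best = None
    for k in range(steps + 1):
        w = k / steps
        info = [[w * I1[i][j] + (1 - w) * I2[i][j] for j in range(2)]
                for i in range(2)]
        P = inv2(info)
        trace = P[0][0] + P[1][1]
        if best is None or trace < best[0]:
            x = matvec(P, [w * y1[0] + (1 - w) * y2[0],
                           w * y1[1] + (1 - w) * y2[1]])
            best = (trace, x, P, w)
    return best[1], best[2], best[3]
```

For example, fusing one track that is accurate along the first axis with another that is accurate along the second axis yields a fused covariance that is tighter in both directions than the worse of the two inputs, without assuming the tracks are independent.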
Authors: Zhihan Zeng, Kaihe Wang, Zhongpei Zhang, Yue Xiu
Abstract: The integration of Generative AI (GenAI) into Consumer Electronics (CE), from AI‑powered assistants in wearables to generative planning in autonomous Uncrewed Aerial Vehicles (UAVs), has revolutionized user experiences. However, these GenAI applications impose immense computational burdens on edge hardware, leaving strictly limited resources for fundamental security tasks like Global Navigation Satellite System (GNSS) signal protection. Furthermore, training robust classifiers for such devices is hindered by the scarcity of real‑world interference data. To address the dual challenges of data scarcity and the extreme efficiency required by the GenAI era, this paper proposes a novel framework named GAC‑KAN. First, we adopt a physics‑guided simulation approach to synthesize a large‑scale, high‑fidelity jamming dataset, mitigating the data bottleneck. Second, to reconcile high accuracy with the stringent resource constraints of GenAI‑native chips, we design a Multi‑Scale Ghost‑ACB‑Coordinate (MS‑GAC) backbone. This backbone combines Asymmetric Convolution Blocks (ACB) and Ghost modules to extract rich spectral‑temporal features with minimal redundancy. Replacing the traditional Multi‑Layer Perceptron (MLP) decision head, we introduce a Kolmogorov‑Arnold Network (KAN), which employs learnable spline activation functions to achieve superior non‑linear mapping capabilities with significantly fewer parameters. Experimental results demonstrate that GAC‑KAN achieves an overall accuracy of 98.0%, outperforming state‑of‑the‑art baselines. Significantly, the model contains only 0.13 million parameters, approximately 660 times fewer than Vision Transformer (ViT) baselines. This extreme lightweight characteristic makes GAC‑KAN an ideal "always‑on" security companion, ensuring GNSS reliability without contending for the computational resources required by primary GenAI tasks.
Authors: Yin Tang, Jiawei Ma, Jinrui Zhang, Alex Jinpeng Wang, Deyu Zhang
Abstract: Continuous navigation in complex environments is critical for Unmanned Aerial Vehicles (UAVs). However, existing Vision‑Language Navigation (VLN) models follow a dead‑reckoning scheme, iteratively updating the position for the next waypoint prediction and subsequently constructing the complete trajectory. Such a stepwise approach inevitably accumulates position errors over time, resulting in misalignment between internal belief and objective coordinates, which is known as "state drift" and ultimately compromises the full trajectory prediction. Drawing inspiration from classical control theory, we propose to correct for errors by formulating such sequential prediction as a recursive Bayesian state estimation problem. In this paper, we design NeuroKalman, a novel framework that decouples navigation into two complementary processes: a Prior Prediction, based on motion dynamics, and a Likelihood Correction, from historical observations. We first mathematically associate Kernel Density Estimation of the measurement likelihood with the attention‑based retrieval mechanism, which then allows the system to rectify the latent representation using retrieved historical anchors without gradient updates. Comprehensive experiments on the TravelUAV benchmark demonstrate that, when fine‑tuned on only 10% of the training data, our method clearly outperforms strong baselines and regulates drift accumulation.
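The contrast between dead reckoning and recursive Bayesian estimation can be seen in the simplest linear-Gaussian case, a scalar Kalman filter. The sketch below is a textbook illustration only and is not NeuroKalman: the biased motion command, the noise-free position measurement, and all parameter values are assumptions made to keep the example deterministic.

```python
def kalman_step(x, P, u, z, Q, R):
    """One predict/correct cycle of a scalar Kalman filter."""
    # Prior prediction from motion dynamics: position advances by command u.
    x_prior = x + u
    P_prior = P + Q
    # Likelihood correction: blend the prior with measurement z.
    K = P_prior / (P_prior + R)           # Kalman gain
    x_post = x_prior + K * (z - x_prior)
    P_post = (1.0 - K) * P_prior
    return x_post, P_post

def run_demo(steps=50, true_step=1.0, biased_step=1.1, Q=0.01, R=0.25):
    """Return (dead-reckoning error, filtered error) after `steps` steps."""
    truth = 0.0
    dead_reckoning = 0.0
    x, P = 0.0, 1.0
    for _ in range(steps):
        truth += true_step
        dead_reckoning += biased_step     # integrates the biased motion only
        # Measurement is the true position (noise-free, for simplicity).
        x, P = kalman_step(x, P, biased_step, truth, Q, R)
    return abs(dead_reckoning - truth), abs(x - truth)

drift_error, filtered_error = run_demo()
```

In this toy setting the dead-reckoning error grows linearly with the number of steps, while the corrected estimate settles to a bounded offset, which is the behavior the abstract refers to as regulating drift accumulation.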
Authors: Alfonso Sciacchitano, Liraz Mudrik, Sean Kragelund, Isaac Kaminer
Abstract: Accurate localization of maritime targets by unmanned aerial vehicles (UAVs) remains challenging in GPS‑denied environments. UAVs equipped with gimballed electro‑optical sensors are typically used to localize targets; however, reliance on these sensors increases mechanical complexity, cost, and susceptibility to single‑point failures, limiting scalability and robustness in multi‑UAV operations. This work presents a new trajectory optimization framework that enables cooperative target localization using UAVs with fixed, non‑gimballed cameras operating in coordination with a surface vessel. This estimation‑aware optimization generates dynamically feasible trajectories that explicitly account for mission constraints, platform dynamics, and out‑of‑frame events. Estimation‑aware trajectories outperform heuristic paths by reducing localization error by more than a factor of two, motivating their use in cooperative operations. Results further demonstrate that coordinated UAVs with fixed, non‑gimballed cameras achieve localization accuracy that meets or exceeds that of single gimballed systems, while substantially lowering system complexity and cost, enabling scalability, and enhancing mission resilience.
Authors: Jijia Tian, Junting Chen, Pooi-Yuen Kam
Abstract: Unmanned aerial vehicle (UAV) downlink transmission facilitates critical time‑sensitive visual applications but is fundamentally constrained by bandwidth scarcity and dynamic channel impairments. The rapid fluctuation of the air‑to‑ground (A2G) link creates a regime where reliable transmission slots are intermittent and future channel quality can only be predicted with uncertainty. Conventional deep joint source‑channel coding (DeepJSCC) methods transmit coupled feature streams, causing global reconstruction failure when specific time slots experience deep fading. Decoupling semantic content into a deterministic structure component and a stochastic texture component enables differentiated error protection strategies aligned with channel reliability. A predictive transmission framework is developed that utilizes a split‑stream variational codec and a channel‑aware scheduler to prioritize the delivery of structural layout over reliable slots. Experimental evaluations indicate that this approach achieves a 5.6 dB gain in peak signal‑to‑noise ratio (PSNR) over single‑stream baselines and maintains structural fidelity under significant prediction mismatch.
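For reference, peak signal-to-noise ratio is defined as PSNR = 10·log10(MAX² / MSE), so a gain of 5.6 dB corresponds to a mean squared error that is lower by a factor of about 10^0.56 ≈ 3.6. The sketch below computes PSNR on flat lists of pixel values; the 8-bit peak value of 255 is an assumption, and the helper name is hypothetical.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf                   # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [100.0, 120.0, 140.0, 160.0]
coarse = [p + 10.0 for p in reference]    # per-pixel error of 10
fine = [p + 5.0 for p in reference]       # per-pixel error of 5
gain_db = psnr(reference, fine) - psnr(reference, coarse)
```

Halving the per-pixel error quarters the MSE, so `gain_db` is 10·log10(4), roughly 6 dB, which gives a feel for the size of the reported improvement.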
Authors: Chuan-Chi Lai
Abstract: Ensuring continuous service coverage under unexpected hardware failures is a fundamental challenge for 3D Aerial‑Ground Integrated Networks. Although Multi‑Agent Reinforcement Learning facilitates autonomous coordination, traditional architectures often lack resilience to sudden topology deformations. This paper proposes the Topology‑Aware Graph MAPPO (TAG‑MAPPO) framework to enhance system survivability through autonomous 3D spatial reconfiguration. Our framework integrates graph‑based feature aggregation with a residual ego‑state fusion mechanism to capture intricate inter‑agent dependencies. To achieve structural robustness, we introduce a Random Observation Shuffling mechanism that fosters strong generalization to agent population fluctuations by breaking coordinate‑index dependencies. Extensive simulations across heterogeneous environments, including high‑speed mobility at 15 meters per second, demonstrate that TAG‑MAPPO significantly outperforms Multi‑Layer Perceptron baselines. Specifically, the framework reduces redundant handoffs by up to 50 percent while maintaining superior energy efficiency. Most notably, TAG‑MAPPO exhibits exceptional self‑healing capabilities, restoring over 90 percent of pre‑failure coverage within 15 time steps. In dense urban scenarios, the framework achieves a post‑failure fairness index surpassing its original four‑UAV configuration by autonomously resolving service overlaps and interference. These findings confirm that topology‑aware coordination is essential for resilient 6G aerial networks.
Authors: Chuan-Chi Lai, Chi Jai Choy
Abstract: In the era of 6G Air‑Ground Integrated Networks (AGINs), Unmanned Aerial Vehicles (UAVs) are pivotal for providing on‑demand wireless coverage in mission‑critical environments, such as post‑disaster rescue operations. However, traditional Deep Reinforcement Learning (DRL) approaches for multi‑UAV orchestration often face critical challenges: instability due to the non‑stationarity of multi‑agent environments and the difficulty of balancing energy efficiency with service equity. To address these issues, this paper proposes ORCHID (Orchestration of Resilient Coverage via Hybrid Intelligent Deployment), a novel stability‑enhanced two‑stage learning framework. First, ORCHID leverages a GBS‑aware topology partitioning strategy to mitigate the exploration cold‑start problem. Second, we introduce a Reset‑and‑Finetune (R&F) mechanism within the MAPPO architecture that stabilizes the learning process via synchronized learning rate decay and optimizer state resetting. This mechanism effectively suppresses gradient variance to prevent policy degradation, thereby ensuring algorithmic resilience in dynamic environments. Furthermore, we uncover a counter‑intuitive efficiency‑fairness synergy: contrary to the conventional trade‑off, our results demonstrate that the proposed Max‑Min Fairness (MMF) design not only guarantees service for cell‑edge users but also achieves superior energy efficiency compared to Proportional Fairness (PF), which tends to converge to suboptimal greedy equilibria. Extensive experiments confirm that ORCHID occupies a superior Pareto‑dominant position compared to state‑of‑the‑art baselines, ensuring robust convergence and resilient connectivity in mission‑critical scenarios.
Authors: François Marcoux, François Grondin
Abstract: In recent years, the illicit use of unmanned aerial vehicles (UAVs) for deliveries in restricted areas such as prisons has become a significant security challenge. While numerous studies have focused on UAV detection or localization, little attention has been given to identifying delivery events. This study presents the first acoustic package delivery detection algorithm using a ground‑based microphone array. The proposed method estimates both the drone's propeller speed and the delivery event using solely acoustic features. A deep neural network detects the presence of a drone and estimates the propeller's rotation speed, or blade passing frequency (BPF), from a mel spectrogram. The algorithm analyzes the BPFs to identify probable delivery moments based on sudden changes before and after a specific time. Results demonstrate a mean absolute error of the blade passing frequency estimator of 16 Hz when the drone is less than 150 meters away from the microphone array. The drone presence detection estimator has an accuracy of 97%. The delivery detection algorithm correctly identifies 96% of events with a false positive rate of 8%. This study shows that deliveries can be identified using acoustic signals up to a range of 100 meters.
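As a rough illustration of the two quantities named in the abstract above, the Python sketch below computes a blade passing frequency from rotor speed using the standard relation BPF = blades × RPM / 60, and locates the largest before/after step in a BPF time series. This is a sketch under stated assumptions, not the authors' algorithm: the window-mean comparison, the function names, and the `window` parameter are all hypothetical, and the paper's estimator is a deep neural network operating on mel spectrograms.

```python
from statistics import mean


def blade_passing_frequency(rpm, n_blades):
    """Standard relation: each blade passes a fixed point once per revolution."""
    return n_blades * rpm / 60.0


def largest_bpf_step(bpf_hz, window=5):
    """Return (index, jump_hz) of the largest change in mean BPF.

    The jump at index t is mean(bpf_hz[t:t+window]) - mean(bpf_hz[t-window:t]).
    Hypothetical stand-in for "sudden changes before and after a specific time".
    Returns (None, 0.0) if the series is too short or perfectly flat.
    """
    best_idx, best_jump = None, 0.0
    for t in range(window, len(bpf_hz) - window + 1):
        jump = mean(bpf_hz[t:t + window]) - mean(bpf_hz[t - window:t])
        if abs(jump) > abs(best_jump):
            best_idx, best_jump = t, jump
    return best_idx, best_jump


if __name__ == "__main__":
    # Synthetic series: steady BPF, then an abrupt 20 Hz drop at sample 10.
    series = [200.0] * 10 + [180.0] * 10
    print(blade_passing_frequency(6000, 2))
    print(largest_bpf_step(series, window=5))
```

In practice a candidate step would only be meaningful if it clearly exceeds the estimator's error, which the abstract reports as a 16 Hz mean absolute error.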
Authors: Chuan-Chi Lai
Abstract: Unmanned Aerial Vehicle (UAV) mounted Base Stations (UAV‑BSs) provide flexible coverage for temporary hotspot scenarios; however, efficiently optimizing 3D deployment to satisfy heterogeneous user distributions remains a significant challenge. While Deep Reinforcement Learning (DRL) approaches have shown promise, they often suffer from prohibitive training overhead and poor generalization in cold‑start scenarios where the user topology is unknown a priori. To address these limitations, this paper proposes Satisfaction‑driven Coverage Optimization via Perimeter Extraction (SCOPE), which is a deterministic and training‑free 3D deployment framework. Unlike existing heuristics that rely on fixed‑altitude assumptions, SCOPE integrates a perimeter‑based peeling strategy with the Welzl Smallest Enclosing Circle (SEC) algorithm to dynamically optimize 3D positions. Theoretically, we provide a rigorous convergence proof and derive a polynomial time complexity of O(N^2 \log N), ensuring predictable execution for real‑time applications. Experimentally, we evaluate SCOPE in unpredictable hotspot environments against both traditional heuristics and state‑of‑the‑art DRL baselines under a matched hardware budget. Simulation results demonstrate that SCOPE maintains a high user satisfaction rate between 82% and 88% while generating solutions within millisecond‑level latency on commodity hardware. Furthermore, SCOPE demonstrates exceptional resilience by maintaining an approximate 40% functional coverage rate at a minimum altitude constraint of 60 m; in this challenging regime, baseline methods suffer a significant performance degradation, dropping to approximately 20% due to altitude‑induced path loss. These findings validate SCOPE as a robust and agile solution for establishing instantaneous digital lifelines in zero‑day disaster response missions.
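For readers unfamiliar with the Smallest Enclosing Circle (SEC) step named in the abstract above, the following is a minimal Python sketch of the iterative form of Welzl's randomized algorithm in 2D, which runs in expected linear time. It is not the SCOPE implementation: the perimeter‑based peeling strategy and the altitude optimization are not shown, and the function names and the `seed` parameter are illustrative assumptions.

```python
import math
import random


def _circle_two(a, b):
    """Circle with segment ab as its diameter."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0, math.dist(a, b) / 2.0)


def _circle_three(a, b, c):
    """Circumcircle of a, b, c; None if the points are (nearly) collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))


def _contains(circle, p, eps=1e-9):
    return math.dist((circle[0], circle[1]), p) <= circle[2] + eps


def smallest_enclosing_circle(points, seed=0):
    """Return (cx, cy, r) of the smallest circle enclosing all points, or None."""
    pts = list(points)
    random.Random(seed).shuffle(pts)  # random order gives expected O(N) time
    circle = None
    for i, p in enumerate(pts):
        if circle is not None and _contains(circle, p):
            continue
        circle = (p[0], p[1], 0.0)  # p must lie on the new boundary
        for j in range(i):
            q = pts[j]
            if _contains(circle, q):
                continue
            circle = _circle_two(p, q)  # p and q on the boundary
            for k in range(j):
                r = pts[k]
                if _contains(circle, r):
                    continue
                three = _circle_three(p, q, r)
                if three is not None:
                    circle = three
    return circle


if __name__ == "__main__":
    users = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
    print(smallest_enclosing_circle(users))
```

One plausible use, stated here only as an assumption, is to treat the circle centre as a horizontal hover position and the radius as the footprint a UAV base station must cover.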
Authors: Xiaolou Sun, Wufei Si, Wenhui Ni, Yuntian Li, Dongming Wu, Fei Xie, Runwei Guan, He-Yang Xu, Henghui Ding, Yuan Wu, Yutao Yue, Yongming Huang, Hui Xiong
Abstract: Vision‑language navigation (VLN) requires intelligent agents to navigate environments by interpreting linguistic instructions alongside visual observations, serving as a cornerstone task in Embodied AI. Current VLN research for unmanned aerial vehicles (UAVs) relies on detailed, pre‑specified instructions to guide the UAV along predetermined routes. However, real‑world outdoor exploration typically occurs in unknown environments where detailed navigation instructions are unavailable. Instead, only coarse‑grained positional or directional guidance can be provided, requiring UAVs to autonomously navigate through continuous planning and obstacle avoidance. To bridge this gap, we propose AutoFly, an end‑to‑end Vision‑Language‑Action (VLA) model for autonomous UAV navigation. AutoFly incorporates a pseudo‑depth encoder that derives depth‑aware features from RGB inputs to enhance spatial reasoning, coupled with a progressive two‑stage training strategy that effectively aligns visual, depth, and linguistic representations with action policies. Moreover, existing VLN datasets have fundamental limitations for real‑world autonomous navigation, stemming from their heavy reliance on explicit instruction‑following over autonomous decision‑making and insufficient real‑world data. To address these issues, we construct a novel autonomous navigation dataset that shifts the paradigm from instruction‑following to autonomous behavior modeling through: (1) trajectory collection emphasizing continuous obstacle avoidance, autonomous planning, and recognition workflows; (2) comprehensive real‑world data integration. Experimental results demonstrate that AutoFly achieves a 3.9% higher success rate compared to state‑of‑the‑art VLA baselines, with consistent performance across simulated and real environments.
Authors: Aamer Mohamed Huroon, Li-Chun Wang
Abstract: Unmanned Aerial Vehicles (UAVs) are crucial for advancing railway communication by offering reliable connectivity, adaptive coverage, and mobile edge services. This survey examines UAV‑assisted approaches for 6G railway needs including ultra‑reliable low‑latency communication (URLLC) and integrated sensing and communication (ISAC). We cover railway channel models, reconfigurable intelligent surfaces (RIS), and UAV‑assisted mobile edge computing (MEC). Key challenges include coexistence with existing systems, handover management, Doppler effect, and security. The roadmap suggests work on integrated communication‑control systems and AI‑driven optimization for intelligent railway networks.
Authors: Mohammad Morsali, Siavash H. Khajavi
Abstract: According to the United Nations, wildfire frequency and intensity are projected to increase by approximately 14% by 2030 and 30% by 2050 due to global warming, posing critical threats to life, infrastructure, and ecosystems. Conventional disaster management frameworks rely on static simulations and passive data acquisition, hindering their ability to adapt to arbitrarily evolving wildfire episodes in real time. To address these limitations, we introduce the Intelligent Virtual Situation Room (IVSR), a bidirectional Digital Twin (DT) platform augmented by autonomous AI agents. The IVSR continuously ingests multisource sensor imagery, weather data, and 3D forest models to create a live virtual replica of the fire environment. A similarity engine powered by AI aligns emerging conditions with a precomputed Disaster Simulation Library, retrieving and calibrating intervention tactics under the watchful eyes of experts. Authorized action, ranging from UAV redeployment to crew reallocation, is cycled back through standardized procedures to the physical layer, completing the loop between response and analysis. We validate IVSR through detailed case‑study simulations provided by an industrial partner, demonstrating capabilities in localized incident detection, privacy‑preserving playback, collider‑based fire‑spread projection, and site‑specific ML retraining. Our results indicate marked reductions in detection‑to‑intervention latency and more effective resource coordination versus traditional systems. By uniting real‑time bidirectional DTs with agentic AI, IVSR offers a scalable, semi‑automated decision‑support paradigm for proactive, adaptive wildfire disaster management.
Authors: Jiarui Zhang, Chengyong Lei, Chengjiang Dai, Lijie Wang, Zhichao Han, Fei Gao
Abstract: Quadrotor unmanned aerial vehicles (UAVs) are increasingly deployed in complex missions that demand reliable autonomous navigation and robust obstacle avoidance. However, traditional modular pipelines often incur cumulative latency, whereas pure reinforcement learning (RL) approaches typically provide limited formal safety guarantees. To bridge this gap, we propose an end‑to‑end RL framework augmented with model‑based safety mechanisms. We incorporate physical priors in both training and deployment. During training, we design a physics‑informed reward structure that provides global navigational guidance. During deployment, we integrate a real‑time safety filter that projects the policy outputs onto a provably safe set to enforce strict collision‑avoidance constraints. This hybrid architecture reconciles high‑speed flight with robust safety assurances. Benchmark evaluations demonstrate that our method outperforms both traditional planners and recent end‑to‑end obstacle avoidance approaches based on differentiable physics. Extensive experiments demonstrate strong generalization, enabling reliable high‑speed navigation in dense clutter and challenging outdoor forest environments at velocities up to 7.5 m/s.
Authors: Stefan Ivić, Luka Lanča, Karlo Jakac, Ante Sikirica, Stella Dumenčić, Matej Mališa, Zvonimir Mrle, Bojan Crnković
Abstract: This paper presents the integration of flow field reconstruction, dynamic probabilistic modeling, search control, and machine vision detection in a system for autonomous maritime search operations. Field experiments conducted in Valun Bay (Cres Island, Croatia) involved real‑time drifter data acquisition, surrogate flow model fitting based on computational fluid dynamics and numerical optimization, advanced multi‑UAV search control and vision sensing, as well as deep learning‑based object detection. The results demonstrate that a tightly coupled approach enables reliable detection of floating targets under realistic uncertainties and complex environmental conditions, providing concrete insights for future autonomous maritime search and rescue applications.
Authors: Melone Nyoba Tchonkeu, Soulaimane Berkane, Tarek Hamel
Abstract: This paper investigates the problem of attitude and air velocity estimation for fixed‑wing unmanned aerial vehicles (UAVs) using IMU measurements and at least one Pitot tube measurement, with almost global asymptotic stability (AGAS) guarantees. A cascade observer architecture is developed, in which a Riccati/Kalman‑type filter estimates the body‑fixed frame air velocity and the vehicle's tilt using IMU data as inputs and Pitot measurements as outputs. Under mild excitation conditions, the resulting air velocity and tilt estimation error dynamics are shown to be uniformly observable. The estimated tilt is then combined with magnetometer measurements in a nonlinear observer on SO(3) to recover the full attitude. Rigorous analysis establishes AGAS of the overall cascade structure under the uniform observability (UO) condition. The effectiveness of the proposed approach is demonstrated through validation on real flight data.
Authors: Yuanzhu Zhan, Yufei Jiang, Muqing Cao, Junyi Geng
Abstract: Aerial manipulation (AM) promises to move Unmanned Aerial Vehicles (UAVs) beyond passive inspection to contact‑rich tasks such as grasping, assembly, and in‑situ maintenance. Most prior AM demonstrations rely on external motion capture (MoCap) and emphasize position control for coarse interactions, limiting deployability. We present a fully onboard perception‑control pipeline for contact‑rich AM that achieves accurate motion tracking and regulated contact wrenches without MoCap. The main components are (1) an augmented visual‑inertial odometry (VIO) estimator with contact‑consistency factors that activate only during interaction, tightening uncertainty around the contact frame and reducing drift, and (2) image‑based visual servoing (IBVS) to mitigate perception‑control coupling, together with a hybrid force‑motion controller that regulates contact wrenches and lateral motion for stable contact. Experiments show that our approach closes the perception‑to‑wrench loop using only onboard sensing, yielding a velocity estimation improvement of 66.01% at contact, reliable target approach, and stable force holding, pointing toward deployable, in‑the‑wild aerial manipulation.
Authors: Mei Ling Chee, Thangarajah Akilan, Aparna Ravindra Phalke, Kanchan Keisham
Abstract: Semantic segmentation in high‑resolution agricultural imagery demands models that strike a careful balance between accuracy and computational efficiency to enable deployment in practical systems. In this work, we propose DAS‑SK, a novel lightweight architecture that retrofits selective kernel convolution (SK‑Conv) into the dual atrous separable convolution (DAS‑Conv) module to strengthen multi‑scale feature learning. The model further enhances the atrous spatial pyramid pooling (ASPP) module, enabling the capture of fine‑grained local structures alongside global contextual information. Built upon a modified DeepLabV3 framework with two complementary backbones, MobileNetV3‑Large and EfficientNet‑B3, the DAS‑SK model mitigates limitations associated with large dataset requirements, limited spectral generalization, and the high computational cost that typically restricts deployment on UAVs and other edge devices. Comprehensive experiments across three benchmarks (LandCover.ai, VDD, and PhenoBench) demonstrate that DAS‑SK consistently achieves state‑of‑the‑art performance while being more efficient than CNN‑, transformer‑, and hybrid‑based competitors. Notably, DAS‑SK requires up to 21x fewer parameters and 19x fewer GFLOPs than top‑performing transformer models. These findings establish DAS‑SK as a robust, efficient, and scalable solution for real‑time agricultural robotics and high‑resolution remote sensing, with strong potential for broader deployment in other vision domains.
Authors: Zinan Lv, Yeqian Qian, Chen Sang, Hao Liu, Danping Zou, Ming Yang
Abstract: UAV navigation in unstructured outdoor environments using passive monocular vision is hindered by the substantial visual domain gap between simulation and reality. While 3D Gaussian Splatting enables photorealistic scene reconstruction from real‑world data, existing methods inherently couple static lighting with geometry, severely limiting policy generalization to dynamic real‑world illumination. In this paper, we propose a novel end‑to‑end reinforcement learning framework designed for effective zero‑shot transfer to unstructured outdoors. Within a high‑fidelity simulation grounded in real‑world data, our policy is trained to map raw monocular RGB observations directly to continuous control commands. To overcome photometric limitations, we introduce Relightable 3D Gaussian Splatting, which decomposes scene components to enable explicit, physically grounded editing of environmental lighting within the neural representation. By augmenting training with diverse synthesized lighting conditions ranging from strong directional sunlight to diffuse overcast skies, we compel the policy to learn robust, illumination‑invariant visual features. Extensive real‑world experiments demonstrate that a lightweight quadrotor achieves robust, collision‑free navigation in complex forest environments at speeds up to 10 m/s, exhibiting significant resilience to drastic lighting variations without fine‑tuning.
Authors: Yongkang Lai, Xihan Mu, Dasheng Fan, Donghui Xie, Shanxin Guo, Wenli Huang, Tianjie Zhao, Guangjian Yan
Abstract: Large‑scale, high‑resolution forest canopy height mapping plays a crucial role in understanding regional and global carbon and water cycles. Spaceborne LiDAR missions, including the Ice, Cloud, and Land Elevation Satellite‑2 (ICESat‑2) and the Global Ecosystem Dynamics Investigation (GEDI), provide global observations of forest structure but are spatially sparse and subject to inherent uncertainties. In contrast, near‑surface LiDAR platforms, such as airborne and unmanned aerial vehicle (UAV) LiDAR systems, offer much finer measurements of forest canopy structure, and a growing number of countries have made these datasets openly available. In this study, a state‑of‑the‑art monocular depth estimation model, Depth Anything V2, was trained using approximately 16,000 km² of canopy height models (CHMs) derived from publicly available airborne LiDAR point clouds and related products across multiple countries, together with 3 m resolution PlanetScope and airborne RGB imagery. The trained model, referred to as Depth2CHM, enables the estimation of spatially continuous CHMs directly from PlanetScope RGB imagery. Independent validation was conducted at sites in China (approximately 1 km²) and the United States (approximately 116 km²). The results showed that Depth2CHM could accurately estimate canopy height, with biases of 0.59 m and 0.41 m and root mean square errors (RMSEs) of 2.54 m and 5.75 m for these two sites, respectively. Compared with an existing global meter‑resolution CHM product, the mean absolute error is reduced by approximately 1.5 m and the RMSE by approximately 2 m. These results demonstrated that monocular depth estimation networks trained with large‑scale airborne LiDAR‑derived canopy height data provide a promising and scalable pathway for high‑resolution, spatially continuous forest canopy height estimation from satellite RGB imagery.
Authors: Xusheng Zhu, Kai-Kit Wong, Hanjiang Hong, Han Xiao, Hao Xu, Tuo Wu, Chan-Byoung Chae
Abstract: This paper develops a framework for analyzing UAV‑enabled short‑packet communication, leveraging fluid antenna system (FAS)‑assisted relaying networks. Operating in the short‑packet regime and focusing on challenging urban environments, we derive novel, closed‑form expressions for the block error rate (BLER). This is achieved by modeling the spatially correlated Nakagami‑m fading link via a tractable eigenvalue‑based approach. A high‑signal‑to‑noise ratio (SNR) asymptotic analysis is also presented, revealing the system's fundamental diversity order. Building on this analysis, we formulate a novel energy efficiency (EE) maximization problem that, unlike idealized models, uniquely incorporates the non‑trivial time and energy overheads of FAS port selection. An efficient hierarchical algorithm is proposed to jointly optimize key system parameters. Numerical results validate our analysis, demonstrating that while FAS provides substantial power gains, the operational overhead creates a critical trade‑off. This trade‑off dictates an optimal number of FAS ports and a non‑trivial optimal UAV deployment altitude, governed by the balance between blockage and path loss. This work provides key insights for FAS‑aided UAV communications.
Authors: Faisal Al-Kamali, Francois Chan, Hussein A. Ammar, James H. Bayes, Claude D'Amours
Abstract: Relays are pivotal in military communication networks, expanding coverage and ensuring reliable connectivity in challenging operational environments. While traditional terrestrial relays (TR) are constrained by fixed locations and vulnerability to physical obstructions, unmanned aerial vehicle (UAV)‑mounted aerial relays (AR) offer a dynamic and flexible alternative by operating above obstacles and adapting to changing battlefield conditions. This paper provides a comprehensive survey of AR systems in military communications, presenting a detailed comparison between AR and TR paradigms and examining two specific AR technologies: active aerial relays (AAR) and aerial reconfigurable intelligent surface (ARIS) relays. The survey delves into their operation, benefits, challenges, and military applications, supported by a qualitative analysis across metrics such as coverage, flexibility, security, and cost. A novel multi‑dimensional metric, the mission‑critical relay effectiveness score (MCRES), is introduced as a quantitative method for evaluating relay suitability based on mission‑specific weights for critical attributes like mobility, jamming resilience, deployment speed, stealth, coverage, and autonomy. Furthermore, we present Algorithm 1, a decision‑making framework that leverages the MCRES to guide the systematic selection of the optimal relay type, AR or TR, and subsequently AAR or ARIS, tailored to the unique demands of a given military scenario, such as dynamic battlefield operations, electronic warfare, or covert missions. Finally, the paper addresses current implementation challenges and outlines promising future research directions to advance the deployment of robust and resilient UAV‑mounted relay systems in contested military environments.
Authors: Zhang Hengyu, Maryam Cheraghy, Liu Wei, Armin Farhadi, Meysam Soltanpour, Zhong Zhuoqing
Abstract: This paper proposes an Improved Noisy Deep Q‑Network (Noisy DQN) to enhance the exploration and stability of Unmanned Aerial Vehicles (UAVs) when applying deep reinforcement learning in simulated environments. The method enhances exploration by combining a residual NoisyLinear layer with an adaptive noise scheduling mechanism, while improving training stability through a smooth loss and soft target network updates. Experiments show that the proposed model converges faster, achieves rewards up to 40 higher than standard DQN, and quickly reaches the minimum number of steps required for the task (28) in the 15 × 15 grid navigation environment. The results show that our comprehensive improvements to the network structure of NoisyNet, exploration control, and training stability contribute to enhancing the efficiency and reliability of deep Q‑learning.
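The two stability ingredients mentioned in the abstract above can be sketched in a few lines of dependency‑free Python. This is a hedged illustration, not the paper's code: the abstract does not say which smooth loss is used, so the Huber loss here is an assumption, and the function names, `tau`, and `delta` are hypothetical. The soft update is the usual Polyak averaging rule, target ← tau · online + (1 − tau) · target.

```python
def soft_update(target_weights, online_weights, tau=0.005):
    """Polyak averaging: move each target weight a fraction tau toward the online weight."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]


def huber_loss(td_error, delta=1.0):
    """Quadratic near zero, linear for large errors (assumed 'smooth loss')."""
    a = abs(td_error)
    if a <= delta:
        return 0.5 * td_error ** 2
    return delta * (a - 0.5 * delta)


if __name__ == "__main__":
    target = [0.0, 1.0]
    online = [1.0, 1.0]
    # With tau = 0.1 the first weight moves 10% of the way toward the online value.
    print(soft_update(target, online, tau=0.1))
    # Small errors are penalised quadratically, large ones linearly.
    print(huber_loss(0.5), huber_loss(3.0))
```

A small `tau` makes the target network change slowly, which is the usual motivation for soft updates over periodic hard copies.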
Authors: Bessie Dominguez-Dager, Sergio Suescun-Ferrandiz, Felix Escalona, Francisco Gomez-Donoso, Miguel Cazorla
Abstract: This paper introduces VLN‑Pilot, a novel framework in which a large Vision‑and‑Language Model (VLLM) assumes the role of a human pilot for indoor drone navigation. By leveraging the multimodal reasoning abilities of VLLMs, VLN‑Pilot interprets free‑form natural language instructions and grounds them in visual observations to plan and execute drone trajectories in GPS‑denied indoor environments. Unlike traditional rule‑based or geometric path‑planning approaches, our framework integrates language‑driven semantic understanding with visual perception, enabling context‑aware, high‑level flight behaviors with minimal task‑specific engineering. VLN‑Pilot supports fully autonomous instruction‑following for drones by reasoning about spatial relationships, obstacle avoidance, and dynamic reactivity to unforeseen events. We validate our framework on a custom photorealistic indoor simulation benchmark and demonstrate the ability of the VLLM‑driven agent to achieve high success rates on complex instruction‑following tasks, including long‑horizon navigation with multiple semantic targets. Experimental results highlight the promise of replacing remote drone pilots with a language‑guided autonomous agent, opening avenues for scalable, human‑friendly control of indoor UAVs in tasks such as inspection, search‑and‑rescue, and facility monitoring. Our results suggest that VLLM‑based pilots may dramatically reduce operator workload while improving safety and mission flexibility in constrained indoor environments.
Authors: Runxiao Liu, Pengda Mao, Xiangli Le, Shuang Gu, Yapeng Chen, Quan Quan
Abstract: This paper proposes a novel control framework for cooperative transportation of cable‑suspended loads by multiple unmanned aerial vehicles (UAVs) operating in constrained environments. Leveraging virtual tube theory and principles from dissipative systems theory, the framework facilitates efficient multi‑UAV collaboration for navigating obstacle‑rich areas. The proposed framework offers several key advantages. (1) It achieves tension distribution and coordinated transportation within the UAV‑cable‑load system with low computational overhead, dynamically adapting UAV configurations based on obstacle layouts to facilitate efficient navigation. (2) By integrating dissipative systems theory, the framework ensures high stability and robustness, essential for complex multi‑UAV operations. The effectiveness of the proposed approach is validated through extensive simulations, demonstrating its scalability for large‑scale multi‑UAV systems. Furthermore, the method is experimentally validated in outdoor scenarios, showcasing its practical feasibility and robustness under real‑world conditions.
Authors: Zhiyu Chen, Ming-Min Zhao, Songfu Cai, Ming Lei, Min-Jian Zhao
Abstract: Unmanned aerial vehicles (UAVs) are increasingly deployed in mission‑critical applications such as target tracking, where they must simultaneously sense dynamic environments, ensure reliable communication, and achieve precise control. A key challenge here is to jointly guarantee tracking accuracy, communication reliability, and control stability within a unified framework. To address this issue, we propose an integrated sensing, communication, and control (ISCC) framework for UAV‑assisted target tracking, where the considered tracking system is modeled as a discrete‑time linear control process, with the objective of driving the deviation between the UAV and target states toward zero. We formulate a stochastic model predictive control (MPC) optimization problem for joint control and beamforming design, which is highly non‑convex and intractable in its original form. To overcome this difficulty, the target state is first estimated using an extended Kalman filter (EKF). Then, by deriving the closed‑form optimal beamforming solution under a given control input, the original problem is equivalently reformulated into a tractable control‑oriented form. Finally, we convexify the remaining non‑convex constraints via a relaxation‑based convex approximation, yielding a computationally tractable convex optimization problem that admits an efficient global solution. Numerical results show that the proposed ISCC framework achieves tracking accuracy comparable to a non‑causal benchmark while maintaining stable communication, and it significantly outperforms the conventional control and tracking method.
Authors: Jan Michalczyk
Abstract: Recently, progress in radar sensing technology, namely the miniaturization of packages and increased measuring precision, has drawn the interest of the robotics research community. Indeed, a crucial task enabling autonomy in robotics is to precisely determine the pose of the robot in space. To fulfill this task, sensor fusion algorithms are often used, in which data from one or several exteroceptive sensors, such as LiDAR, camera, laser ranging sensor, or GNSS, are fused with Inertial Measurement Unit (IMU) measurements to obtain an estimate of the navigation states of the robot. Nonetheless, owing to their particular sensing principles, some exteroceptive sensors are often incapacitated in extreme environmental conditions, such as extreme illumination or the presence of fine particles such as smoke or fog. Radars are largely immune to the aforementioned factors thanks to the characteristics of the electromagnetic waves they use. In this thesis, we present Radar‑Inertial Odometry (RIO) algorithms that fuse information from an IMU and a radar to estimate the navigation states of an Uncrewed Aerial Vehicle (UAV); the algorithms run in real time on a portable, resource‑constrained embedded computer and use inexpensive, consumer‑grade sensors. We present novel RIO approaches relying on the multi‑state tightly‑coupled Extended Kalman Filter (EKF) and Factor Graphs (FG) fusing instantaneous velocities of and distances to 3D points delivered by a lightweight, low‑cost, off‑the‑shelf Frequency Modulated Continuous Wave (FMCW) radar with IMU readings. We also show a novel way to exploit advances in deep learning to retrieve 3D point correspondences in sparse and noisy radar point clouds.
Authors: Yufei Ye, Shijian Gao, Xinhu Zheng, Liuqing Yang
Abstract: The agile mobility of Unmanned Aerial Vehicles (UAVs) makes them ideal for low‑altitude edge computing. This paper proposes a novel multi‑tier UAV edge computing system where lightweight Low‑Tier UAVs (L‑UAVs) function as edge servers for vehicle users, supported by a powerful High‑Tier UAV (H‑UAV) acting as a backup server. The objective is to minimize task execution delays while ensuring the long‑term energy stability of the L‑UAVs, despite unknown future system states. To this end, the problem is decoupled using Lyapunov optimization, which adaptively balances the priorities of task delays and L‑UAV energy cost based on their real‑time energy states. An efficient vehicle to L‑UAV matching scheme is designed, and the joint optimization problem for task assignment, computing resource allocation, and trajectory control of L‑UAVs and H‑UAV is then solved via a Block Coordinate Descent (BCD) algorithm. Simulation results demonstrate a reduction in L‑UAV transmission energy of over 26% and superior L‑UAV energy stability compared to existing benchmarks.
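The way Lyapunov optimization balances delay against energy based on the real‑time energy state, as described in the abstract above, can be illustrated with the generic drift‑plus‑penalty pattern. The Python sketch below is a textbook‑style toy under stated assumptions, not the paper's formulation: the virtual energy queue, the weight `v`, the per‑slot `energy_budget`, and the candidate action tuples are all hypothetical.

```python
def choose_action(actions, queue, v):
    """Pick the action minimising v * delay + queue * energy.

    actions: iterable of (name, delay, energy) tuples (hypothetical values).
    A large virtual queue (energy deficit) shifts the choice toward low energy.
    """
    return min(actions, key=lambda a: v * a[1] + queue * a[2])


def update_queue(queue, energy_used, energy_budget):
    """Virtual queue: grows when a slot spends more energy than budgeted."""
    return max(queue + energy_used - energy_budget, 0.0)


if __name__ == "__main__":
    actions = [("fast", 1.0, 5.0), ("slow", 3.0, 1.0)]
    queue = 0.0
    for slot in range(6):
        name, delay, energy = choose_action(actions, queue, v=4.0)
        queue = update_queue(queue, energy, energy_budget=2.0)
        print(slot, name, queue)
```

The single parameter `v` sets the trade‑off: larger values favour low delay and tolerate a larger energy backlog before the policy switches to cheaper actions.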
Authors: Nicolaj Haarhøj Malle, Emad Ebeid
Abstract: Detecting and estimating distances to power lines is a challenge for both human UAV pilots and autonomous systems, which increases the risk of unintended collisions. We present a mmWave radar‑based perception system that provides spherical sensing coverage around a small UAV for robust power line detection and avoidance. The system integrates multiple compact solid‑state mmWave radar modules to synthesize an omnidirectional field of view while remaining lightweight. We characterize the sensing behavior of this omnidirectional radar arrangement in power line environments and develop a robust detection‑and‑avoidance algorithm tailored to that behavior. Field experiments on real power lines demonstrate reliable detection at ranges up to 10 m, successful avoidance maneuvers at flight speeds upwards of 10 m/s, and detection of wires as thin as 1.2 mm in diameter. These results indicate the approach's suitability as an additional safety layer for both autonomous and manual UAV flight.
Authors: Tiago Leite, Maria Conceição, António Grilo
Abstract: The exploration of unknown, Global Navigation Satellite System (GNSS) denied environments by an autonomous communication‑aware and collaborative group of Unmanned Aerial Vehicles (UAVs) presents significant challenges in coordination, perception, and decentralized decision‑making. This paper implements Multi‑Agent Reinforcement Learning (MARL) to address these challenges in a 2D indoor environment, using high‑fidelity game‑engine simulations (Godot) and continuous action spaces. Policy training aims to achieve emergent collaborative behaviours and decision‑making under uncertainty using Network‑Distributed Partially Observable Markov Decision Processes (ND‑POMDPs). Each UAV is equipped with a Light Detection and Ranging (LiDAR) sensor and can share data (sensor measurements and a local occupancy map) with neighbouring agents. Inter‑agent communication constraints include limited range, bandwidth and latency. Extensive ablation studies evaluated MARL training paradigms, reward function, communication system, neural network (NN) architecture, memory mechanisms, and POMDP formulations. This work jointly addresses several key limitations in prior research, namely reliance on discrete actions, single‑agent or centralized formulations, assumptions of a priori knowledge and permanent connectivity, inability to handle dynamic obstacles, short planning horizons and architectural complexity in Recurrent NNs/Transformers. Results show that the scalable training paradigm, combined with a simplified architecture, enables rapid autonomous exploration of an indoor area. The implementation of Curriculum‑Learning (five increasingly complex levels) also enabled faster, more robust training. This combination of high‑fidelity simulation, MARL formulation, and computational efficiency establishes a strong foundation for deploying learned cooperative strategies in physical robotic systems.
Authors: Filip Novák, Matěj Petrlík, Matej Novosad, Parakh M. Gupta, Robert Pěnička, Martin Saska
Abstract: Fast flights with aggressive maneuvers in cluttered GNSS‑denied environments require fast, reliable, and accurate UAV state estimation. In this paper, we present an approach for onboard state estimation of a high‑speed UAV using a monocular RGB camera and an IMU. Our approach fuses data from Visual‑Inertial Odometry (VIO), an onboard landmark‑based camera measurement system, and an IMU to produce an accurate state estimate. Using onboard measurement data, we estimate and compensate for VIO drift through a novel mathematical drift model. State‑of‑the‑art approaches often rely on more complex hardware (e.g., stereo cameras or rangefinders) and use uncorrected drifting VIO velocities, orientation, and angular rates, leading to errors during fast maneuvers. In contrast, our method corrects all VIO states (position, orientation, linear and angular velocity), resulting in accurate state estimation even during rapid and dynamic motion. Our approach was thoroughly validated through 1600 simulations and numerous real‑world experiments. Furthermore, we applied the proposed method in the A2RL Drone Racing Challenge 2025, where our team advanced to the final four out of 210 teams and earned a medal.
Authors: Dibyayan Patra, Pasindu Ranasinghe, Bikram Banerjee, Simit Raval
Abstract: Characterisation of structural discontinuity sets in exposed rock faces of underground mine cavities is essential for assessing rock‑mass stability, excavation safety, and operational efficiency. UAV and other mobile laser‑scanning techniques provide efficient means of collecting point clouds from rock faces. However, the development of a robust and efficient approach for automatic characterisation of discontinuity sets in real‑world scenarios, such as fully enclosed rock faces in cavities, remains an open research problem. In this study, a new approach is proposed for automatic discontinuity set characterisation that uses a single‑shot filtering strategy, an innovative cyclic orientation transformation scheme and a hierarchical clustering technique. The single‑shot filtering step isolates planar regions while robustly suppressing noise and high‑curvature artefacts in one pass using a signal‑processing technique. To address the limitations of Cartesian clustering on polar orientation data, a cyclic orientation transformation scheme is developed, enabling accurate representation of dip angle and dip direction in Cartesian space. The transformed orientations are then grouped into sets using a hierarchical clustering technique, which handles varying density distributions and identifies clusters without requiring user‑defined set numbers. The accuracy of the method is validated on a real‑world mine stope against ground truth obtained from manually picked discontinuity planes identified with the Virtual Compass tool, and compared with widely used automated structure mapping techniques. The proposed approach outperforms the other techniques, exhibiting the lowest mean absolute error in estimating discontinuity set orientations in real‑world stope data, with errors of 1.95° and 2.20° in nominal dip angle and dip direction, respectively, and dispersion errors below 3°.
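The reason polar orientation data needs a transformation before Cartesian clustering can be seen in a small sketch. The abstract does not specify the cyclic orientation transformation scheme, so the snippet below is not the authors' method; it uses the standard conversion from dip angle and dip direction to an upward unit normal, with an assumed axis convention (x = east, y = north, z = up) and hypothetical function names.

```python
import math

def pole_vector(dip_deg, dip_dir_deg):
    """Upward unit normal of a plane from dip and dip direction (degrees).

    Assumed convention: x = east, y = north, z = up.
    """
    dip = math.radians(dip_deg)
    ddir = math.radians(dip_dir_deg)
    return (math.sin(dip) * math.sin(ddir),
            math.sin(dip) * math.cos(ddir),
            math.cos(dip))

def angle_between(u, v):
    """Angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))

# Two planes that are nearly parallel but sit on either side of north.
a = pole_vector(60.0, 359.0)
b = pole_vector(60.0, 1.0)

raw_gap = abs(359.0 - 1.0)       # naive difference of dip directions
true_gap = angle_between(a, b)   # angular separation of the normals
```

Here the naive dip-direction difference is 358°, while the normals are less than 2° apart, which is the wrap-around effect a clustering step on raw angles would get wrong.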
Authors: Jiaming Cui, Wenqiang Li, Shuai Zhou, Ruifeng Qin, Feng Shen
Abstract: Transmission line defect detection remains challenging for automated UAV inspection due to the dominance of small‑scale defects, complex backgrounds, and illumination variations. Existing RGB‑based detectors, despite recent progress, struggle to distinguish geometrically subtle defects from visually similar background structures under limited chromatic contrast. This paper proposes CMAFNet, a Cross‑Modal Alignment and Fusion Network that integrates RGB appearance and depth geometry through a principled purify‑then‑fuse paradigm. CMAFNet consists of a Semantic Recomposition Module that performs dictionary‑based feature purification via a learned codebook to suppress modality‑specific noise while preserving defect‑discriminative information, and a Contextual Semantic Integration Framework that captures global spatial dependencies using partial‑channel attention to enhance structural semantic reasoning. Position‑wise normalization within the purification stage enforces explicit reconstruction‑driven cross‑modal alignment, ensuring statistical compatibility between heterogeneous features prior to fusion. Extensive experiments on the TLRGBD benchmark, where 94.5% of instances are small objects, demonstrate that CMAFNet achieves 32.2% mAP@50 and 12.5% AP_s, outperforming the strongest baseline by 9.8 and 4.0 percentage points, respectively. A lightweight variant reaches 24.8% mAP@50 at 228 FPS with only 4.9M parameters, surpassing all YOLO‑based detectors while matching transformer‑based methods at substantially lower computational cost.
Authors: Aditya Shibu, Marah Saleh, Mohamed Al-Musleh, Nidhal Abdulaziz
Abstract: Unmanned Aerial Vehicle (UAV) swarms offer versatile applications in logistics, agriculture, and surveillance, yet controlling them safely and feasibly requires expert knowledge. Traditional static methods limit adaptability, while Large Language Models (LLMs) enable natural language control but generate unsafe trajectories because they lack physical grounding. This paper introduces SkySim, a ROS2‑based simulation framework in Gazebo that decouples LLM high‑level planning from low‑level safety enforcement. Using Gemini 3.5 Pro, SkySim translates user commands (e.g., "Form a circle") into spatial waypoints, informed by real‑time drone states. An Artificial Potential Field (APF) safety filter applies minimal adjustments for collision avoidance, kinematic limits, and geo‑fencing, ensuring feasible execution at 20 Hz. Experiments with swarms of 3, 10, and 30 Crazyflie drones validate spatial reasoning accuracy (100% across tested geometric primitives), real‑time collision prevention, and scalability. SkySim empowers non‑experts to iteratively refine behaviors, bridging AI cognition with robotic safety for dynamic environments. Future work targets hardware integration.
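The idea of an APF safety filter sitting between a planner's waypoint and the executed motion can be sketched as follows. This is a generic 2D artificial potential field step, not SkySim's implementation: the function name, the gains `k_rep` and `d_safe`, and the speed limit `v_max` are all assumptions chosen for illustration; only the 20 Hz update rate (`dt = 0.05`) comes from the abstract.

```python
import math

def apf_step(pos, target, neighbors, d_safe=0.5, k_rep=0.2, v_max=1.0, dt=0.05):
    """One filtered position update toward `target` (all values hypothetical).

    pos, target: (x, y) tuples; neighbors: list of (x, y) of other drones.
    dt = 0.05 s corresponds to a 20 Hz control loop.
    """
    # Attractive term: head straight for the planner's waypoint.
    vx, vy = target[0] - pos[0], target[1] - pos[1]

    # Repulsive term: push away from any neighbor closer than d_safe.
    for nx, ny in neighbors:
        dx, dy = pos[0] - nx, pos[1] - ny
        d = math.hypot(dx, dy)
        if 1e-9 < d < d_safe:
            magnitude = k_rep * (1.0 / d - 1.0 / d_safe) / (d * d)
            vx += magnitude * dx / d
            vy += magnitude * dy / d

    # Kinematic limit: clamp the commanded speed to v_max.
    speed = math.hypot(vx, vy)
    if speed > v_max:
        vx, vy = vx / speed * v_max, vy / speed * v_max

    return (pos[0] + vx * dt, pos[1] + vy * dt)

free_step = apf_step((0.0, 0.0), (10.0, 0.0), [])
blocked_step = apf_step((0.0, 0.0), (10.0, 0.0), [(0.2, 0.0)])
```

With no neighbors the drone advances toward the waypoint at the clamped speed; with a neighbor 0.2 m ahead, the repulsive term dominates and the step backs away instead. A geo-fence could be added in the same style as one more repulsive source.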
Authors: Astik Srivastava, Thomas J Chackenkulam, Bitla Bhanu Teja, Antony Thomas, Madhava Krishna
Abstract: We address the problem of reactive motion planning for quadrotors operating in unknown environments with dynamic obstacles. Our approach leverages a 4‑dimensional spatio‑temporal planner, integrated with vision‑based Safe Flight Corridor (SFC) generation and trajectory optimization. Unlike prior methods that rely on map fusion, our framework is mapless, enabling collision avoidance directly from perception while reducing computational overhead. Dynamic obstacles are detected and tracked using a vision‑based object segmentation and tracking pipeline, allowing robust classification of static versus dynamic elements in the scene. To further enhance robustness, we introduce a backup planning module that reactively avoids dynamic obstacles when no direct path to the goal is available, mitigating the risk of collisions during deadlock situations. We validate our method extensively in both simulation and real‑world hardware experiments, and benchmark it against state‑of‑the‑art approaches, showing significant advantages for reactive UAV navigation in dynamic, unknown environments.
Authors: Chunliang Hua, Zeyuan Yang, Lei Zhang, Jiayang Sun, Fengwen Chen, Chunlan Zeng, Xiao Hu
Abstract: Safe UAV emergency landing requires more than just identifying flat terrain; it demands understanding complex semantic risks (e.g., crowds, temporary structures) invisible to traditional geometric sensors. In this paper, we propose a novel framework leveraging Remote Sensing (RS) imagery and Multimodal Large Language Models (MLLMs) for global context‑aware landing site assessment. Unlike local geometric methods, our approach employs a coarse‑to‑fine pipeline: first, a lightweight semantic segmentation module efficiently pre‑screens candidate areas; second, a vision‑language reasoning agent fuses visual features with Point‑of‑Interest (POI) data to detect subtle hazards. To validate this approach, we construct and release the Emergency Landing Site Selection (ELSS) benchmark. Experiments demonstrate that our framework significantly outperforms geometric baselines in risk identification accuracy. Furthermore, qualitative results confirm its ability to generate human‑like, interpretable justifications, enhancing trust in automated decision‑making. The benchmark dataset is publicly accessible at https://anonymous.4open.science/r/ELSS‑dataset‑43D7.
Authors: Yue Zhong, Jiawen Kang, Yongju Tong, Hong-Ning Dai, Dong In Kim, Abbas Jamalipour, Shengli Xie
Abstract: With the rapid expansion of the low‑altitude economy, Unmanned Aerial Vehicles (UAVs) serve as pivotal aerial base stations supporting diverse services from users, ranging from latency‑sensitive critical missions to bandwidth‑intensive data streaming. However, the efficacy of such heterogeneous networks is often compromised by the conflict between limited onboard resources and stringent stability requirements. Moving beyond traditional throughput‑centric designs, we propose a Sensing‑Communication‑Computing‑Control closed‑loop framework that explicitly models the impact of communication latency on physical control stability. To guarantee mission reliability, we leverage the Lyapunov stability theory to derive an intrinsic mapping between the state evolution of the control system and communication constraints, transforming abstract stability requirements into quantifiable resource boundaries. Then, we formulate the resource allocation problem as a Stackelberg game, where UAVs (as leaders) dynamically price resources to balance load and ensure stability, while users (as followers) optimize requests based on service urgency. Furthermore, addressing the prohibitive computational overhead of standard Deep Reinforcement Learning (DRL) on energy‑constrained edge platforms, we propose a novel and lightweight pruning‑based Proximal Policy Optimization (PPO) algorithm. By integrating a dynamic structured pruning mechanism, the proposed algorithm significantly compresses the neural network scale during training, enabling the UAV to rapidly approximate the game equilibrium with minimal inference latency. Simulation results demonstrate that the proposed scheme effectively secures control loop stability while maximizing system utility in dynamic low‑altitude environments.
Authors: Weiqi Gai, Yuman Gao, Yuan Zhou, Yufan Xie, Zhiyang Liu, Yuze Wu, Xin Zhou, Fei Gao, Zhijun Meng
Abstract: Zero‑Shot Object Navigation in unknown environments poses significant challenges for Unmanned Aerial Vehicles (UAVs) due to the conflict between high‑level semantic reasoning requirements and limited onboard computational resources. To address this, we present USS‑Nav, a lightweight framework that incrementally constructs a Unified Spatio‑Semantic scene graph and enables efficient Large Language Model (LLM)‑augmented Zero‑Shot Object Navigation in unknown environments. Specifically, we introduce an incremental Spatial Connectivity Graph generation method utilizing polyhedral expansion to capture global geometric topology, which is dynamically partitioned into semantic regions via graph clustering. Concurrently, open‑vocabulary object semantics are instantiated and anchored to this topology to form a hierarchical environmental representation. Leveraging this hierarchical structure, we present a coarse‑to‑fine exploration strategy: an LLM grounded in the scene graph's semantics determines global target regions, while a local planner optimizes frontier coverage based on information gain. Experimental results demonstrate that our framework outperforms state‑of‑the‑art methods in terms of computational efficiency and real‑time update frequency (15 Hz) on a resource‑constrained platform. Furthermore, ablation studies confirm the effectiveness of our framework, showing substantial improvements in Success weighted by Path Length (SPL). The source code will be made publicly available to foster further research.
Authors: Jiahe Wu, Bing Cao, Qilong Wang, Qinghua Hu, Dongdong Li, Pengfei Zhu
Abstract: Multimodal Large Language Models (MLLM) are primarily pre‑trained on the RGB modality, thereby limiting their performance on other modalities, such as infrared, depth, and event data, which are crucial for complex scenarios. To address this, we propose RGBX‑R1, a framework to enhance MLLM's perception and reasoning capacities across various X visual modalities. Specifically, we employ an Understand‑Associate‑Validate (UAV) prompting strategy to construct the Visual Modality Chain‑of‑Thought (VM‑CoT), which aims to expand the MLLMs' RGB understanding capability into X modalities. To progressively enhance reasoning capabilities, we introduce a two‑stage training paradigm: Cold‑Start Supervised Fine‑Tuning (CS‑SFT) and Spatio‑Temporal Reinforcement Fine‑Tuning (ST‑RFT). CS‑SFT supervises the reasoning process with the guidance of VM‑CoT, equipping the MLLM with fundamental modality cognition. Building upon GRPO, ST‑RFT employs a Modality‑understanding Spatio‑Temporal (MuST) reward to reinforce modality reasoning. Notably, we construct the first RGBX‑Grounding benchmark, and extensive experiments verify our superiority in multimodal understanding and spatial perception, outperforming baselines by 22.71% on three RGBX grounding tasks.
Authors: Boya Li, Xiaonan Liu, Dongzhu Liu, Dusit Niyato, Zhu Han
Abstract: Uncrewed aerial vehicles (UAVs) have played an important role in the low‑altitude economy and have been used in various applications. However, with the increasing number of UAVs and the explosive growth of wireless data, existing bit‑oriented UAV communication networks have approached the Shannon capacity and cannot satisfy the quality of service (QoS) and ultra‑reliable low‑latency communication (URLLC) requirements of command and control (C&C) transmission. To address this issue, we propose a novel semantic‑aware C&C transmission scheme for multiple UAVs under limited wireless resources. Specifically, we leverage semantic similarity to measure the variation in C&C messages for each UAV over continuous transmission time intervals (TTIs) and capture the correlation of C&C messages among UAVs, enabling multicast transmission. Based on the semantic similarity and the importance of UAV commands, we design a trigger function to quantify the QoS of UAVs. Then, to maximize the long‑term QoS and exploit multicast opportunities of C&C messages induced by semantic similarity, we develop a proximal policy optimization (PPO) algorithm to jointly determine the transmission mode (unicast/multicast/idle) and the allocation of limited resource blocks (RBs) between a base station (BS) and UAVs. Experimental results show that our proposed semantic‑aware framework significantly increases transmission efficiency and improves effectiveness compared with bit‑oriented UAV transmission.
Authors: Yuan Gao, Xinyu Guo, Wenjing Xie, Zifan Wang, Hongwen Yu, Gongyang Li, Shugong Xu
Abstract: To meet the requirements for managing unauthorized UAVs in the low‑altitude economy, a multi‑modal UAV trajectory prediction method based on the fusion of LiDAR and millimeter‑wave radar information is proposed. A deep fusion network for multi‑modal UAV trajectory prediction, termed the Multi‑Modal Deep Fusion Framework, is designed. The overall architecture consists of two modality‑specific feature extraction networks and a bidirectional cross‑attention fusion module, aiming to fully exploit the complementary information of LiDAR and radar point clouds in spatial geometric structure and dynamic reflection characteristics. In the feature extraction stage, the model employs independent but structurally identical feature encoders for LiDAR and radar. After feature extraction, the model enters the Bidirectional Cross‑Attention Mechanism stage to achieve information complementarity and semantic alignment between the two modalities. To verify the effectiveness of the proposed model, the MMAUD dataset used in the CVPR 2024 UG2+ UAV Tracking and Pose‑Estimation Challenge is adopted as the training and testing dataset. Experimental results show that the proposed multi‑modal fusion model significantly improves trajectory prediction accuracy, achieving a 40% improvement compared to the baseline model. In addition, ablation experiments are conducted to demonstrate the effectiveness of different loss functions and post‑processing strategies in improving model performance. The proposed model can effectively utilize multi‑modal data and provides an efficient solution for unauthorized UAV trajectory prediction in the low‑altitude economy.
Authors: Tian-Tian Lin, Yi Liu, Xiao-Wei Tang, Yunmei Shi, Yi Huang, Zhongxiang Wei, Qingqing Wu, Yuhan Dong
Abstract: Recently, the integration of unmanned aerial vehicle (UAV) and visible light communication (VLC) technologies has emerged as a promising solution to offer flexible communication and efficient lighting. This letter investigates three‑dimensional trajectory planning in a UAV‑assisted VLC system, where a UAV is dispatched to collect data from ground users (GUs). The core objective is to develop a trajectory planning framework that minimizes the UAV flight distance, which is equivalent to maximizing the data collection efficiency. This issue is formulated as a challenging mixed‑integer non‑convex optimization problem. To tackle it, we first derive a closed‑form optimal flight altitude under a specific VLC channel gain threshold. Subsequently, we optimize the UAV horizontal trajectory by integrating a novel pheromone‑driven reward mechanism with the twin delayed deep deterministic policy gradient algorithm, which enables an adaptive UAV motion strategy in complex environments. Simulation results validate that the derived optimal altitude effectively reduces the flight distance by up to 35% compared to baseline methods. Additionally, the proposed reward mechanism reduces the number of steps to convergence by approximately 50%, demonstrating notable efficiency gains in the context of UAV‑assisted VLC data collection.
Authors: Elhadj Moustapha Diallo, Mamadou Aliou Diallo, Abusaeed B. M. Adam, Muhammad Naeem Shah
Abstract: This paper considers a hybrid reconfigurable environment comprising a UAV‑mounted reflecting RIS, an outdoor STAR‑RIS enabling simultaneous transmission and reflection, and an indoor holographic RIS (H‑RIS), jointly enhancing secure downlink communication for indoor and outdoor users. The system operates under user mobility, dynamic blockages, colluding idle and active eavesdroppers, and transceiver and surface hardware impairments. A 3GPP and ITU‑compliant stochastic channel model is developed, capturing mobility‑induced covariance evolution, outdoor‑indoor penetration losses, and distortion‑aware noise due to practical EVM‑based impairments. We aim to minimize the aggregate secrecy‑outage probability subject to secrecy‑rate constraints, QoS requirements, power limitations, and statistical CSI uncertainty. The resulting problem contains coupled secrecy and QoS chance constraints and nonlinear interactions among the BS beamforming vectors, multi‑surface phase coefficients, and UAV position. To handle these difficulties, we derive rigorous Bernstein‑type deterministic approximations for all chance constraints, yielding a distributionally robust reformulation. Building on this, we propose an alternating optimization framework that employs successive convex approximation (SCA) to convexify each block and solve the BS beamforming, RIS, STAR‑RIS, H‑RIS configuration, and UAV placement subproblems efficiently. The proposed algorithm is shown to monotonically decrease a smooth surrogate of the secrecy‑outage cost and converge to a stationary point of the robustified problem. Simulations based on 3GPP TR 38.901, TR 36.873, and ITU‑R P.2109 demonstrate that integrating UAV‑RIS, STAR‑RIS, and H‑RIS significantly reduces secrecy‑outage probability compared with benchmark schemes and provides strong robustness to channel uncertainty, blockages, colluding eavesdroppers, and hardware impairments.
Authors: Chuan-Chi Lai
Abstract: This paper addresses catastrophic forgetting in mobile edge UAV networks within dynamic spatiotemporal environments. Conventional deep reinforcement learning often fails during task transitions, necessitating costly retraining to adapt to new user distributions. We propose the spatiotemporal continual learning (STCL) framework, realized through the group‑decoupled multi‑agent proximal policy optimization (G‑MAPPO) algorithm. The core innovation lies in the integration of a group‑decoupled policy optimization (GDPO) mechanism with a gradient orthogonalization layer to balance heterogeneous objectives including energy efficiency, user fairness, and coverage. This combination employs dynamic z‑score normalization and gradient projection to mitigate conflicts without offline resets. Furthermore, 3D UAV mobility serves as a spatial compensation layer to manage extreme density shifts. Simulations demonstrate that the STCL framework ensures resilience, with service reliability recovering to over 0.9 for moderate loads of up to 100 users. Even under extreme saturation with 140 users, G‑MAPPO maintains a significant performance lead over the multi‑agent deep deterministic policy gradient (MADDPG) baseline by preventing policy stagnation. The algorithm delivers an effective capacity gain of 20 percent under high traffic loads, validating its potential for scalable aerial edge swarms.
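The two ingredients named above, z-score normalization and gradient projection, can be illustrated in isolation. The abstract does not give the G‑MAPPO update rule, so this is only a generic sketch: the projection shown is the common "remove the conflicting component" rule used in multi-objective training, and the function names and example vectors are hypothetical.

```python
import math

def zscore(values, eps=1e-8):
    """Normalize a batch of per-objective signals to zero mean, unit scale."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (math.sqrt(var) + eps) for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_if_conflicting(g, h):
    """If gradient g opposes h (negative dot product), drop g's component along h."""
    d = dot(g, h)
    if d >= 0:
        return list(g)  # no conflict: leave g unchanged
    scale = d / dot(h, h)
    return [gi - scale * hi for gi, hi in zip(g, h)]

# Hypothetical gradients for two objectives, e.g. energy efficiency and coverage.
g_energy = [1.0, 0.0]
g_coverage = [-1.0, 1.0]

g_fixed = project_if_conflicting(g_energy, g_coverage)
normalized = zscore([2.0, 4.0, 6.0])
```

After projection, `g_fixed` is orthogonal to `g_coverage`, so a step along it no longer undoes progress on the second objective; the z-score step puts objectives with different magnitudes on a comparable scale before they are combined.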
Authors: Carmen D. R. Pita-Romero, Pedro Arias-Perez, Miguel Fernandez-Cortizas, Rafael Perez-Segui, Pascual Campoy
Abstract: Maintaining the formation of complex structures with multiple UAVs and achieving complex trajectories remains a major challenge. This work presents an algorithm for implementing the flocking behavior of UAVs based on the concept of Virtual Centroid to easily develop a structure for the flock. The approach builds on the classical virtual‑based behavior, providing a theoretical framework for incorporating enhancements to dynamically control both the number of agents and the formation of the structure. Simulation tests and real‑world experiments were conducted, demonstrating its simplicity even with complex formations and complex trajectories.
Authors: Junaid Sajid, Ivo Müürsepp, Luca Reggiani, Davide Scazzoli, Federico Francesco Luigi Mariani, Maurizio Magarini, Rizwan Ahmad, Muhammad Mahtab Alam
Abstract: Uncrewed Aerial Vehicles (UAVs) are increasingly used in civilian and industrial applications, making secure low‑altitude operations crucial. In dense mmWave environments, accurately classifying low‑altitude UAVs as being inside authorized or restricted airspace remains challenging, requiring models that handle complex propagation and signal variability. This paper proposes a deep learning model, referred to as CoBA, which combines a Convolutional Neural Network (CNN), Bidirectional Long Short‑Term Memory (BiLSTM), and Attention, and which leverages Fifth Generation (5G) millimeter‑wave (mmWave) radio measurements to classify UAV operations in authorized and restricted airspace at low altitude. The convolutional, bidirectional recurrent, and attention layers capture both spatial and temporal patterns in UAV radio measurements. To validate the model, a dedicated dataset is collected using the 5G mmWave network at TalTech, with controlled low‑altitude UAV flights in authorized and restricted scenarios. The model is evaluated against conventional ML models and a fingerprinting‑based benchmark. Experimental results show that CoBA achieves superior accuracy, significantly outperforming all baseline models and demonstrating its potential for reliable and regulated UAV airspace monitoring.
Authors: Xinran Wang, Peng Wu, Xiaopeng Yuan, Yulin Hu, Anke Schmeink
Abstract: We study dual‑unmanned aerial vehicle (UAV) jamming‑aided secure communication networks, in which one UAV delivers confidential data to multiple ground users (GUs), while a cooperative UAV provides protective interference against a ground eavesdropper. To enforce fairness, we maximize the minimum secrecy throughput across GUs by jointly designing trajectories and communication scheduling. The key difficulty lies in the continuous‑time nature of UAV trajectories and the tight space‑time coupling between the transmitter and the jammer, which jointly render the problem infinite‑dimensional and nonconvex. To address these challenges, we characterize, for the first time, the structure of the optimal trajectories and rigorously prove that they follow a collaborative successive hover‑and‑fly (co‑SHF) structure, where the two UAVs visit a limited number of synchronized co‑hovering point pairs, and during each flight segment at least one UAV moves at maximum speed. Leveraging this structure, we reformulate the problem into a finite‑dimensional form, without loss of optimality, over hovering and turning points, hovering durations, and scheduling. For tractability, we adopt a minimum‑distance approximation of continuous anti‑collision constraints and employ concave lower bounds on secrecy throughput within a successive convex approximation (SCA) method, which converges and, thanks to the co‑SHF reduction in optimization variables and constraints, achieves low computational complexity. Numerical results show that, compared with time‑discretization and no‑jamming benchmarks, the proposed co‑SHF design improves the min‑secrecy and user fairness while requiring significantly less runtime.
Authors: Huixiang Zhang, Mahzabeen Emu, Octavia A. Dobre
Abstract: Next‑generation Unmanned Aerial Vehicle (UAV) communication networks must maintain reliable connectivity under rapid topology changes, fluctuating link quality, and time‑critical data exchange. Existing topology control methods rely on global optimization to produce a single optimal topology or involve high computational complexity, which limits adaptability in dynamic environments. This paper presents a two‑stage quantum‑assisted framework for efficient and resilient topology control in dynamic UAV networks that exploits quantum parallelism to generate a set of high‑quality and structurally diverse candidate topologies. In the offline stage, we formulate the problem as a Quadratic Unconstrained Binary Optimization (QUBO) model and leverage quantum annealing (QA) to sample multiple high‑quality and structurally distinct topologies in parallel, providing a rich solution space for adaptive decision‑making. In the online stage, a lightweight classical selection mechanism rapidly identifies the most suitable topology based on real‑time link stability and channel conditions, substantially reducing the computation delay. The simulation results show that, compared to a single static optimal topology, the proposed framework improves performance retention by 6.6% in a 30‑second dynamic window. Moreover, relative to the classic method, QA achieves an additional 5.15% reduction in objective value and a 28.3% increase in solution diversity. These findings demonstrate the potential of QA to enable fast and robust topology control for next‑generation UAV communication networks.
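To make the QUBO formulation concrete, the sketch below evaluates a toy QUBO and keeps several low-energy solutions as a candidate pool. It does not use a quantum annealer: exhaustive enumeration stands in for QA sampling, and the 3-variable matrix `Q` (a reward for enabling each link, a penalty for enabling two interfering links) is entirely hypothetical rather than the paper's model.

```python
from itertools import product

def qubo_energy(x, Q):
    """Energy x^T Q x of a binary assignment x under QUBO matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def lowest_energy_candidates(Q, k=3):
    """Return the k lowest-energy assignments as (energy, assignment) pairs.

    Brute force over all 2^n assignments; only viable for tiny n and used
    here as a stand-in for sampling from an annealer.
    """
    n = len(Q)
    scored = sorted((qubo_energy(x, Q), x) for x in product((0, 1), repeat=n))
    return scored[:k]

# Hypothetical toy problem: x[i] = 1 means link i is active.
# Diagonal: reward for activating a link. Off-diagonal: interference penalty.
Q = [
    [-1.0, 2.0, 0.0],
    [0.0, -1.0, 2.0],
    [0.0, 0.0, -1.0],
]

candidates = lowest_energy_candidates(Q, k=3)
best_energy, best_assignment = candidates[0]
```

The best assignment activates links 0 and 2, which do not interfere, and the remaining candidates form the pool from which an online selection step could pick according to current link conditions.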
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Abstract: Autonomous UAV forestry operations require robust depth estimation with strong cross‑domain generalization, yet existing evaluations focus on urban and indoor scenarios, leaving a critical gap for vegetation‑dense environments. We present the first systematic zero‑shot evaluation of eight stereo methods spanning iterative refinement, foundation model, diffusion‑based, and 3D CNN paradigms. All methods use officially released pretrained weights (trained on Scene Flow) and are evaluated on four standard benchmarks (ETH3D, KITTI 2012/2015, Middlebury) plus a novel 5,313‑pair Canterbury Tree Branches dataset (1920 × 1080). Results reveal scene‑dependent patterns: foundation models excel on structured scenes (BridgeDepth: 0.23 px on ETH3D; DEFOM: 4.65 px on Middlebury), while iterative methods show variable cross‑benchmark performance (IGEV++: 0.36 px on ETH3D but 6.77 px on Middlebury; IGEV: 0.33 px on ETH3D but 4.99 px on Middlebury). Qualitative evaluation on the Tree Branches dataset establishes DEFOM as the gold‑standard baseline for vegetation depth estimation, with superior cross‑domain consistency (consistently ranking 1st‑2nd across benchmarks, average rank 1.75). DEFOM predictions will serve as pseudo‑ground‑truth for future benchmarking.
Authors: Junhao Wei, Wenxuan Zhu, Qingyang Xu, Yanxiao Li, Yifu Zhao, Zikun Li, Ran Zhang, Yanzhao Gu, Jinhong Song, Yapeng Wang, Zhiwen Wang, Ngai Cheong, Sio-Kei Im, Xu Yang
Abstract: The Sparrow Search Algorithm (SSA), characterized by its simple structure and ease of implementation, nevertheless suffers from an insufficient balance between exploration and exploitation, making it prone to premature convergence and slow optimization progress. To address these shortcomings, this paper proposes a Geometric Sparrow Search Algorithm (GeoSSA). By integrating Good Nodes Set initialization, a Sine‑Cosine Enhanced Producer position update strategy, and a Triangular‑Walk Enhanced Edge Sparrow update strategy, GeoSSA significantly improves the global exploration ability, local exploitation efficiency, and convergence stability of the original SSA. To thoroughly validate the effectiveness of GeoSSA, we conducted ablation studies, qualitative analysis, and comparative experiments on 23 benchmark functions against state‑of‑the‑art algorithms. Experimental results show that GeoSSA achieves the best or near‑best performance in terms of average fitness, standard deviation, Wilcoxon tests, and Friedman rankings, with an Overall Effectiveness (OE) of 95.65%. Its overall performance is significantly superior to all compared algorithms. In three‑dimensional UAV path planning tasks, GeoSSA demonstrates excellent stability and superior path quality. In four categories of engineering design optimization problems, GeoSSA consistently attains the highest solution accuracy and strongest stability. GeoSSA not only exhibits outstanding global optimization performance on standard benchmark functions but also shows strong robustness and generalization ability in practical applications such as UAV path planning and engineering design. Therefore, GeoSSA provides an efficient and reliable solution framework for complex optimization problems.
Authors: Abdullah Khanfor, Raby Hamadi, Noureddine Lasla, Hakim Ghazzai
Abstract: UAVs have the potential to revolutionize urban management and provide valuable services to citizens. They can be deployed across diverse applications, including traffic monitoring, disaster response, environmental monitoring, and numerous other domains. However, this integration introduces novel security challenges that must be addressed to ensure safe and trustworthy urban operations. This paper provides a structured, evidence‑based synthesis of UAV applications in smart cities and their associated security challenges as reported in the literature over the last decade, with particular emphasis on developments from 2019 to 2025. We categorize these challenges into two primary classes: 1) cyber‑attacks targeting the communication infrastructure of UAVs and 2) unwanted or unauthorized physical intrusions by UAVs themselves. We examine the potential of Artificial Intelligence (AI) techniques in developing intrusion detection mechanisms to mitigate these security threats. We analyze how AI‑based methods, such as machine/deep learning for anomaly detection and computer vision for object recognition, can play a pivotal role in enhancing UAV security through unified detection systems that address both cyber and physical threats. Furthermore, we consolidate publicly available UAV datasets across network traffic and vision modalities suitable for Intrusion Detection Systems (IDS) development and evaluation. The paper concludes by identifying ten key research directions, including scalability, robustness, explainability, data scarcity, automation, hybrid detection, large language models, multimodal approaches, federated learning, and privacy preservation. Finally, we discuss the practical challenges of implementing UAV IDS solutions in real‑world smart city environments.
Authors: Venkatakrishna Reddy Oruganti
Abstract: Autonomous drone pursuit requires not only detecting drones but also predicting their trajectories in a manner that enables kinematically feasible interception. Existing tracking methods optimize for prediction accuracy but ignore pursuit feasibility, resulting in trajectories that are physically impossible to intercept 99.9% of the time. We propose Perception‑to‑Pursuit (P2P), a track‑centric temporal reasoning framework that bridges detection and actionable pursuit planning. Our method represents drone motion as compact 8‑dimensional tokens capturing velocity, acceleration, scale, and smoothness, enabling a 12‑frame causal transformer to reason about future behavior. We introduce the Intercept Success Rate (ISR) metric to measure pursuit feasibility under realistic interceptor constraints. Evaluated on the Anti‑UAV‑RGBT dataset with 226 real drone sequences, P2P achieves 28.12 pixel average displacement error and 0.597 ISR, representing a 77% improvement in trajectory prediction and 597x improvement in pursuit feasibility over tracking‑only baselines, while maintaining perfect drone classification accuracy (100%). Our work demonstrates that temporal reasoning over motion patterns enables both accurate prediction and actionable pursuit planning.
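The average displacement error quoted above is conventionally the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon. The sketch below illustrates that standard definition only. It is not the authors' evaluation code, the function name and sample trajectories are hypothetical, and the ISR metric is not reproduced because the abstract does not define it.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matched predicted and true 2D points."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = 0.0
    for (px, py), (gx, gy) in zip(predicted, ground_truth):
        total += math.hypot(px - gx, py - gy)
    return total / len(predicted)


# Hypothetical 3-step trajectories in pixel coordinates (illustrative values only).
pred = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
truth = [(0.0, 0.0), (0.0, 0.0), (10.0, 4.0)]
ade = average_displacement_error(pred, truth)  # per-step errors are 0, 5 and 6 pixels
```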
Authors: Yu Xia, Chang Liu, Tianqi Xiang, Zhigang Tu
Abstract: Real‑time small object detection in Unmanned Aerial Vehicle (UAV) imagery remains challenging due to limited feature representation and ineffective multi‑scale fusion. Existing methods underutilize frequency information and rely on static convolutional operations, which constrain the capacity to obtain rich feature representations and hinder the effective exploitation of deep semantic features. To address these issues, we propose EFSI‑DETR, a novel detection framework that integrates efficient semantic feature enhancement with dynamic frequency‑spatial guidance. EFSI‑DETR comprises two main components: (1) a Dynamic Frequency‑Spatial Unified Synergy Network (DyFusNet) that jointly exploits frequency and spatial cues for robust multi‑scale feature fusion, and (2) an Efficient Semantic Feature Concentrator (ESFC) that enables deep semantic extraction with minimal computational cost. Furthermore, a Fine‑grained Feature Retention (FFR) strategy incorporates spatially rich shallow features during fusion to preserve the fine‑grained details that are crucial for small object detection in UAV imagery. Extensive experiments on the VisDrone and CODrone benchmarks demonstrate that EFSI‑DETR achieves state‑of‑the‑art performance with real‑time efficiency, yielding improvements of 1.6% and 5.8% in AP and AP_s on VisDrone, respectively, while reaching an inference speed of 188 FPS on a single RTX 4090 GPU.
Authors: Valerii Serpiva, Artem Lykov, Jeffrin Sam, Aleksey Fedoseev, Dzmitry Tsetserukou
Abstract: We propose a novel Unmanned Aerial Vehicle (UAV)‑assisted creative capture system that leverages diffusion models to interpret high‑level natural language prompts and automatically generate optimal flight trajectories for cinematic video recording. Instead of manually piloting the drone, the user simply describes the desired shot (e.g., "orbit around me slowly from the right and reveal the background waterfall"). Our system encodes the prompt along with an initial visual snapshot from the onboard camera, and a diffusion model samples plausible spatio‑temporal motion plans that satisfy both the scene geometry and shot semantics. The generated flight trajectory is then executed autonomously by the UAV to record smooth, repeatable video clips that match the prompt. User evaluation using NASA‑TLX showed a significantly lower overall workload with our interface (M = 21.6) compared to a traditional remote controller (M = 58.1), demonstrating a substantial reduction in perceived effort. Mental demand (M = 11.5 vs. 60.5) and frustration (M = 14.0 vs. 54.5) were also markedly lower for our system, confirming clear usability advantages in autonomous text‑driven flight control. This project demonstrates a new interaction paradigm: text‑to‑cinema flight, where diffusion models act as the "creative operator" converting story intentions directly into aerial motion.
Authors: Rongxin Huang, Guangfeng Lin, Wenbo Zhou, Zhirong Li, Wenhuan Wu
Abstract: Unmanned Aerial Vehicle (UAV) applications have become increasingly prevalent in aerial photography and object recognition. However, accurately capturing small targets in object detection remains a major challenge because of imbalanced scales and blurred edges. To address these issues, a boundary and position information mining (BPIM) framework is proposed for capturing object edge and location cues. BPIM includes a position information guidance (PIG) module for obtaining location information, a boundary information guidance (BIG) module for extracting object edges, a cross scale fusion (CSF) module for gradually assembling shallow‑layer image features, a three feature fusion (TFF) module for progressively combining position and boundary information, and an adaptive weight fusion (AWF) module for flexibly merging deep‑layer semantic features. BPIM can therefore integrate boundary, position, and scale information in an image for small object detection using attention mechanisms and cross‑scale feature fusion strategies. Furthermore, BPIM not only improves the discrimination of contextual features through adaptive weight fusion with boundary information, but also enhances small object perception through cross‑scale position fusion. On the VisDrone2021, DOTA1.0, and WiderPerson datasets, experimental results show that BPIM outperforms the baseline Yolov5‑P2 and achieves performance competitive with state‑of‑the‑art methods at a comparable computational load.
Authors: Lingxiao Sun, Zhaoyang Zhang, Zihan Lin, Zirui Chen, Weijie Zhou, Zhaohui Yang, Tony Q. S. Quek
Abstract: Future sixth‑generation (6G) networks are expected to support low‑altitude wireless networks (LAWNs), where unmanned aerial vehicles (UAVs) and aerial robots operate in highly dynamic three‑dimensional environments under stringent latency, reliability, and autonomy requirements. In such scenarios, autonomous task execution at the network edge demands holistic coordination among sensing, communication, computing, and control (SC3) processes. Agentic Artificially Intelligent Radio Access Networks (Agentic AI‑RAN) offer a promising paradigm by enabling the edge network to function as an autonomous decision‑making entity for low‑altitude agents with limited onboard resources. In this article, we propose and design a task‑oriented Agentic AI‑RAN architecture that enables SC3 task execution within a single edge node. This integrated design tackles the fundamental problem of coordinating heterogeneous workloads in resource‑constrained edge environments. Furthermore, a representative low‑altitude embodied intelligence system is prototyped based on a general‑purpose Graphics Processing Unit (GPU) platform to demonstrate autonomous drone navigation in realistic settings. By leveraging the Multi‑Instance GPU (MIG) partitioning technique and the containerized deployment, the demonstration system achieves physical resource isolation while supporting tightly coupled coordination between real‑time communication and multimodal inference under a unified task framework. Experimental results demonstrate low closed‑loop latency, robust bidirectional communication, and stable performance under dynamic runtime conditions, highlighting its viability for mission‑critical low‑altitude wireless networks in 6G.
Authors: Wen Zhang, Aimin Wang, Geng Sun, Jiahui Li, Jiacheng Wang, Changyuan Zhao, Dusit Niyato
Abstract: The development of wireless power transfer (WPT) and Internet of Things (IoT) offers significant potential but faces challenges such as limited energy supply, dynamic environmental changes, and unstable transmission links. This paper presents an unmanned aerial vehicle (UAV)‑assisted data collection and WPT scheme to support batteryless sensor (BLS) networks in remote areas. In this system, BLSs harvest energy from the UAV and utilize the harvested energy to transmit the collected data back to the UAV. The goal is to maximize the collected data volume and fairness index while minimizing the UAV energy consumption. To achieve these objectives, an optimization problem is formulated to jointly optimize the transmit power and UAV trajectory. Due to the non‑convexity and dynamic nature of the problem, a deep reinforcement learning (DRL)‑based algorithm is proposed to solve the problem. Specifically, this algorithm integrates prioritized experience replay and the performer module to enhance system stability and accelerate convergence. Simulation results demonstrate that the proposed approach consistently outperforms benchmark schemes in terms of collected data volume, fairness, and UAV energy consumption.
Authors: Xiaoya Zheng, Geng Sun, Jiahui Li, Jiacheng Wang, Weijie Yuan, Qingqing Wu, Dusit Niyato, Abbas Jamalipour
Abstract: The low‑altitude economy (LAE) is an emerging economic paradigm that fosters integrated development across multiple fields. As a pivotal component of the LAE, low‑altitude uncrewed aerial vehicles (UAVs) can restore communication by serving as aerial relays between post‑disaster areas and remote base stations (BSs). However, conventional approaches face challenges from vulnerable long‑distance links between the UAVs and remote BSs, and data bottlenecks arising from massive data volumes and limited onboard UAV resources. In this work, we investigate a low‑altitude multi‑UAV‑assisted data collection and semantic forwarding network, in which multiple UAVs collect data from ground users, form clusters, perform intra‑cluster data aggregation with semantic extraction, and then cooperate as virtual antenna arrays (VAAs) to transmit the extracted semantic information to a remote BS via collaborative beamforming (CB). We formulate a data collection and semantic forwarding multi‑objective optimization problem (DCSFMOP) that jointly maximizes both the user and semantic transmission rates while minimizing UAV energy consumption. The formulated DCSFMOP is a mixed‑integer nonlinear programming (MINLP) problem that is inherently NP‑hard and characterized by dynamically varying decision variable dimensionality. To address these challenges, we propose a large language model‑enabled alternating optimization approach (LLM‑AOA), which effectively handles the complex search space and variable dimensionality by optimizing different subsets of decision variables through tailored optimization strategies. Simulation results demonstrate that LLM‑AOA outperforms AOA by approximately 26.8% and 22.9% in transmission rate and semantic rate, respectively.
Authors: Hossein Mohammadalizadeh, Holger Karl
Abstract: Dynamic resource allocation to parallel queues is a cornerstone of network scheduling, yet classical solutions often fail when accounting for the overhead of switching delays to queues with superior link conditions. In particular, system performance is further degraded when switching delays are stochastic and inhomogeneous. In this domain, the myopic Max‑Weight policy struggles, as it is agnostic to switching delays. This paper introduces ACI, a non‑myopic, frame‑based scheduling framework that directly amortizes these switching delays. We first use a Lyapunov drift analysis to prove that backlog‑driven ACI is throughput‑optimal with respect to a scaled capacity region; we then validate ACI's effectiveness on multi‑UAV networks with an FSO backhaul. Finally, we demonstrate how adapting its core urgency metric provides the flexibility to navigate the throughput‑latency trade‑off.
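For context, the classical Max‑Weight rule that the abstract calls myopic can be sketched in a few lines: in each slot it serves the queue with the largest product of backlog and current service rate. This is a minimal illustration of that textbook rule, not of ACI. The function name and the sample numbers are hypothetical.

```python
def max_weight_choice(backlogs, rates):
    """Return the index of the queue maximizing backlog * service rate.

    Ties are broken in favor of the lowest index. The rule looks only at the
    current slot and ignores any delay incurred by switching queues.
    """
    if not backlogs or len(backlogs) != len(rates):
        raise ValueError("backlogs and rates must be non-empty and of equal length")
    weights = [q * r for q, r in zip(backlogs, rates)]
    return max(range(len(weights)), key=weights.__getitem__)


# Hypothetical snapshot: three queues with backlogs (packets) and link rates.
backlogs = [5, 2, 8]
rates = [1.0, 4.0, 0.5]
choice = max_weight_choice(backlogs, rates)  # weights are 5.0, 8.0 and 4.0
```

Because the decision is recomputed every slot with no switching cost in the weight, the rule may hop between queues whenever the weights change, which is the behavior a frame‑based scheduler is designed to avoid.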
Authors: Amir Habel, Ivan Snegirev, Elizaveta Semenyakina, Miguel Altamirano Cabrera, Jeffrin Sam, Fawad Mehboob, Roohan Ahmed Khan, Muhammad Ahsan Mustafa, Dzmitry Tsetserukou
Abstract: This paper presents Glove2UAV, a wearable IMU‑glove interface for intuitive UAV control through hand and finger gestures, augmented with vibrotactile warnings for exceeding predefined speed thresholds. To promote safer and more predictable interaction in dynamic flight, Glove2UAV is designed as a lightweight and easily deployable wearable interface intended for real‑time operation. Glove2UAV streams inertial measurements in real time and estimates palm and finger orientations using a compact processing pipeline that combines median‑based outlier suppression with Madgwick‑based orientation estimation. The resulting motion estimates are mapped to a small set of control primitives for directional flight (forward/backward and lateral motion) and, when supported by the platform, to object‑interaction commands. Vibrotactile feedback is triggered when flight speed exceeds predefined threshold values, providing an additional alert channel during operation. We validate real‑time feasibility by synchronizing glove signals with UAV telemetry in both simulation and real‑world flights. The results show fast gesture‑based command execution, stable coupling between gesture dynamics and platform motion, correct operation of the core command set in our trials, and timely delivery of vibrotactile warning cues.
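Median‑based outlier suppression of the kind mentioned above is commonly implemented as a sliding median over the raw sensor stream. The sketch below shows that generic technique on a single IMU channel. The window length, edge handling, and sample values are assumptions, since the abstract does not specify them, and the Madgwick orientation stage is not reproduced.

```python
from statistics import median


def median_filter(samples, window=3):
    """Sliding median with an odd window; edges use a truncated window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        filtered.append(median(samples[lo:hi]))
    return filtered


# Hypothetical gyroscope readings with one spike at index 2.
raw = [0.1, 0.2, 9.0, 0.3, 0.4]
filtered = median_filter(raw)  # the spike is replaced by a neighboring value
```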
Authors: Abd Ullah Khan, Wali Ullah Khan, Haejoon Jung, Hyundong Shin
Abstract: Unmanned aerial vehicles (UAVs) with multi‑connectivity (MC) capabilities efficiently and reliably transfer data between terrestrial networks (TNs) and non‑terrestrial networks (NTNs). However, optimally sharing and allocating spectrum and power resources to maintain MC while ensuring reliable connectivity and optimal performance remains challenging in such networks. Channel variations induced by mobility in UAV networks, coupled with the varying quality of service (QoS) demands of heterogeneous devices, resource sharing, and fairness requirements in capacity distribution pose challenges to optimal resource allocation. Thus, this paper investigates resource allocation for QoS‑constrained, MC‑enabled, dynamic UAVs in an integrated TN‑NTN environment with spectrum sharing and fairness considerations. To this end, we consider three types of links: UAV‑to‑radio base station (RBS), UAV‑to‑UAV, and UAV‑to‑HAP. We also assume two types of UAVs with diverse QoS requirements to reflect a practical scenario. Consequently, we propose two algorithms. The first algorithm maximizes the capacity of UAVs‑RBS and UAVs‑HAP links while ensuring the reliability of the UAV‑UAV link. To achieve this, the algorithm maximizes the collective throughput of the UAVs by optimizing the sum capacity of all the UAV‑RBS and UAV‑HAP links. Next, to provide constant capacity to all links and ensure fairness, we propose another algorithm that maximizes the minimum capacity across all links. We validate the performance of both algorithms through simulation
Authors: Berk Ciloglu, Ozgun Ersoy, Metin Ozturk, Ali Gorcin
Abstract: In disaster scenarios, ensuring both reliable communication and situational awareness becomes a critical challenge due to the partial or complete collapse of terrestrial networks. This paper proposes an integrated sensing and communication (ISAC) over non‑terrestrial networks (NTN) architecture referred to as ISAC‑over‑NTN that integrates multiple uncrewed aerial vehicles (UAVs) and a high‑altitude platform station (HAPS) to maintain resilient and reliable network operations in post‑disaster conditions. We aim to achieve two main objectives: i) provide a reliable communication infrastructure, thereby ensuring the continuity of search‑and‑rescue activities and connecting people to their loved ones, and ii) detect users, such as those trapped under rubble or those who are mobile, using a Doppler‑based mobility detection model. We employ an innovative beamforming method that simultaneously transmits data and detects Doppler‑based mobility by integrating multi‑user multiple‑input multiple‑output (MU‑MIMO) communication and monostatic sensing within the same transmission chain. The results show that the proposed framework maintains reliable connectivity and achieves high detection accuracy of users in critical locations, reaching 90% motion detection sensitivity and 88% detection accuracy.
Authors: Faryal Batool, Iana Zhura, Valerii Serpiva, Roohan Ahmed Khan, Ivan Valuev, Issatay Tokmurziyev, Dzmitry Tsetserukou
Abstract: Reliable human‑robot collaboration in emergency scenarios requires autonomous systems that can detect humans, infer navigation goals, and operate safely in dynamic environments. This paper presents HumanDiffusion, a lightweight image‑conditioned diffusion planner that generates human‑aware navigation trajectories directly from RGB imagery. The system combines YOLO‑11 based human detection with diffusion‑driven trajectory generation, enabling a quadrotor to approach a target person and deliver medical assistance without relying on prior maps or computationally intensive planning pipelines. Trajectories are predicted in pixel space, ensuring smooth motion and a consistent safety margin around humans. We evaluate HumanDiffusion in simulation and real‑world indoor mock‑disaster scenarios. On a 300‑sample test set, the model achieves a mean squared error of 0.02 in pixel‑space trajectory reconstruction. Real‑world experiments demonstrate an overall mission success rate of 80% across accident‑response and search‑and‑locate tasks with partial occlusions. These results indicate that human‑conditioned diffusion planning offers a practical and robust solution for human‑aware UAV navigation in time‑critical assistance settings.
Authors: Francesco Rossato, Mattia Figaro, Alessandro Traspadini, Takayuki Shimizu, Chinmay Mahabal, Sanjeewa Herath, Chunghan Lee, Dogan Kutay Pekcan, Michele Zorzi, Marco Giordani
Abstract: As 5th generation (5G) networks continue to evolve, there is a growing interest toward the integration of Terrestrial Networks (TNs) and Non‑Terrestrial Networks (NTNs). Specifically, NTNs leverage space/air base stations such as satellites, High Altitude Platforms (HAPs), and Unmanned Aerial Vehicles (UAVs) for expanding wireless coverage to underserved rural/remote areas, supporting emergency communications, and offloading traffic in highly congested urban environments. In this paper we focus on the 3GPP 5G NR‑NTN standard in the context of satellite communication networks, and highlight critical challenges that must be addressed for proper full‑stack protocol design, with considerations related to the PHY, MAC, and higher layers. We also present simulation results in ns‑3 to demonstrate the impact of some of these challenges on the network, as an initial step toward more advanced standardization activities on 3GPP 5G NR‑NTN.
Authors: Thuan Minh Nguyen, Vu Tuan Truong, Long Bao Le
Abstract: The integration of agentic AI, powered by large language models (LLMs) with autonomous reasoning, planning, and execution, into unmanned aerial vehicle (UAV) swarms opens new operational possibilities and brings the vision of the Internet of Drones closer to reality. However, infrastructure constraints, dynamic environments, and the computational demands of multi‑agent coordination limit real‑world deployment in high‑risk scenarios such as wildfires and disaster response. This paper investigates the integration of LLM‑based agentic AI and edge computing to realize scalable and resilient autonomy in UAV swarms. We first discuss three architectures for supporting UAV swarms ‑ standalone, edge‑enabled, and edge‑cloud hybrid deployment ‑ each optimized for varying autonomy and connectivity levels. Then, a use case for wildfire search and rescue (SAR) is designed to demonstrate the efficiency of the edge‑enabled architecture, enabling high SAR coverage, reduced mission completion times, and a higher level of autonomy compared to traditional approaches. Finally, we highlight open challenges in integrating LLMs and edge computing for mission‑critical UAV‑swarm applications.
Authors: Shrief Rizkalla, Adrian Kliks, Nila Bagheri, Miguel A. Bellido-Manganell, Aniruddha Chandra, Anja Dakic, Laura Finarelli, Davy Gaillot, Matti Hamalainen, Ruisi He, Markus Hofer, Sandaruwan Jayaweera, Francesco Linsalata, Konstantin Mikhaylov, Jon M. Peha, Ibrahim Rashdan, Gianluca Rizzo, Abdul Saboor, Martin Schmidhammer, Michal Sybis, Fredrik Tufvesson, Paul Unterhuber, Fernando J. Velez, Evgenii Vinogradov, Michael Walter, Thomas Zemen, Haibin Zhang, Zhengyu Zhang
Abstract: This white paper aims to comprehensively analyze and consolidate the state of the art in communication technologies supporting modern and future Information and Communication Technology (ICT). Its primary objective is to establish a common understanding of how communication solutions enable automation, safety, and efficiency across multiple transport domains, including railways, road vehicles, aircraft, and unmanned aerial vehicles. The document seeks to identify key communication requirements and technological enablers necessary for interoperable and reliable Intelligent Transportation Systems (ITS) operation. It also assesses the limitations of current systems and proposes pathways for integrating emerging technologies such as 5G, Sixth Generation (6G), and Artificial Intelligence (AI)‑driven network control. The white paper also intends to support harmonization between different transport modes through a unified framework for communication modeling, testing, and standardization. It highlights the importance of accurate channel modeling and empirical validation to design efficient, robust, and scalable systems. Another objective is to explore the use of reconfigurable intelligent surfaces, integrated sensing and communication, and digital twin concepts within ITS. The document emphasizes the role of spectrum management and standardization efforts in ensuring interoperability among diverse communication systems. Finally, the paper seeks to stimulate collaboration among academia, industry, and standardization bodies to advance the design of resilient and adaptive communication infrastructures for future transportation systems.
Authors: Babacar Toure, Dimitrios Tsilimantos, Omid Esrafilian, Marios Kountouris
Abstract: Due to their adaptability and mobility, Unmanned Aerial Vehicles (UAVs) are becoming increasingly essential for wireless network services, particularly for data harvesting tasks. In this context, Artificial Intelligence (AI)‑based approaches have gained significant attention for addressing UAV path planning tasks in large and complex environments, bridging the gap with real‑world deployments. However, many existing algorithms suffer from limited training data, which hampers their performance in highly dynamic environments. Moreover, they often overlook the inherently multi‑objective nature of the task, treating it in an overly simplistic manner. To address these limitations, we propose an attention‑based Multi‑Objective Reinforcement Learning (MORL) architecture that explicitly handles the trade‑off between data collection and energy consumption in urban environments, even without prior knowledge of wireless channel conditions. Our method develops a single model capable of adapting to varying trade‑off preferences and dynamic scenario parameters without the need for fine‑tuning or retraining. Extensive simulations show that our approach achieves substantial improvements in performance, model compactness, sample efficiency, and most importantly, generalization to previously unseen scenarios, outperforming existing RL solutions.
Authors: Sander Doodeman, Paula Chanfreut Palacio, Elena Torta, Duarte Antunes
Abstract: This paper studies the impact of rigidly attached heavy payload placement ‑ where the payload mass significantly influences the UAV's dynamics ‑ on the stability and control performance of a multirotor unmanned aerial vehicle (UAV). In particular, we focus on how the position of such a payload relative to the vehicle's Center of Gravity (CoG) affects the stability and control performance at an arbitrary point of interest on the UAV, such as the payload position, and on how this position can be optimized. Our conclusions are based on two key contributions. First, we analyze the stability of the zero‑dynamics of a complete nonlinear model of the UAV with payload. We demonstrate that the stability of the zero dynamics depends on the vertical signed distance in the body‑fixed frame between the controlled output position and the combined CoG of the UAV with payload. Specifically, positioning the output below the CoG yields unstable zero dynamics, while the linearized zero dynamics are marginally stable when placing it above, indicating reduced sensitivity to input disturbances. Second, we analyze the performance of the linearized UAV model with payload by providing an analytical expression for the H2‑norm, from which we can quantify the system's attenuation to white noise input disturbances. We conclude that less control authority leads to a higher optimal position of the controlled output with respect to the CoG for closed‑loop white‑noise disturbance rejection capabilities, also when the heavy payload is the controlled output. The results are illustrated through numerical examples.
Authors: Myong-Yol Choi, Hankyoul Ko, Hanse Cho, Changseung Kim, Seunghwan Kim, Jaemin Seo, Hyondong Oh
Abstract: This paper presents a deep reinforcement learning (DRL) based controller for collective navigation of unmanned aerial vehicle (UAV) swarms in communication‑denied environments, enabling robust operation in complex, obstacle‑rich environments. Inspired by biological swarms where informed individuals guide groups without explicit communication, we employ an implicit leader‑follower framework. In this paradigm, only the leader possesses goal information, while follower UAVs learn robust policies using only onboard LiDAR sensing, without requiring any inter‑agent communication or leader identification. Our system utilizes LiDAR point clustering and an extended Kalman filter for stable neighbor tracking, providing reliable perception independent of external positioning systems. The core of our approach is a DRL controller, trained in GPU‑accelerated Nvidia Isaac Sim, that enables followers to learn complex emergent behaviors ‑ balancing flocking and obstacle avoidance ‑ using only local perception. This allows the swarm to implicitly follow the leader while robustly addressing perceptual challenges such as occlusion and limited field‑of‑view. The robustness and sim‑to‑real transfer of our approach are confirmed through extensive simulations and challenging real‑world experiments with a swarm of five UAVs, which successfully demonstrated collective navigation across diverse indoor and outdoor environments without any communication or external localization.
Authors: Mahmud S. Zango, Jianglin Lan
Abstract: Autonomous navigation for nano‑scale unmanned aerial vehicles (nano‑UAVs) is governed by extreme Size, Weight, and Power (SWaP) constraints (a weight below 50 g and a sub‑100 mW onboard processor), distinguishing it fundamentally from standard robotic paradigms. This review synthesizes the state of the art in sensing, computing, and control architectures designed specifically for these sub‑100 mW computational envelopes. We critically analyse the transition from classical geometry‑based methods to emerging "Edge AI" paradigms, including quantized deep neural networks deployed on ultra‑low‑power System‑on‑Chips (SoCs) and neuromorphic event‑based control. Beyond algorithms, we evaluate the hardware‑software co‑design requisite for autonomy, covering advancements in dense optical flow, optimized Simultaneous Localization and Mapping (SLAM), and learning‑based flight control. While significant progress has been observed in visual navigation and relative pose estimation, our analysis reveals persistent gaps in long‑term endurance, robust obstacle avoidance in dynamic environments, and the "Sim‑to‑Real" transfer of reinforcement learning policies. This survey provides a roadmap for bridging these gaps, advocating for hybrid architectures that fuse lightweight classical control with data‑driven perception to enable fully autonomous, agile nano‑UAVs in GPS‑denied environments.
Authors: Jacob Swindell, Marija Popović, Riccardo Polvara
Abstract: Accurate agricultural weed mapping using unmanned aerial vehicles (UAVs) is crucial for precision farming. While traditional methods rely on rigid, pre‑defined flight paths and intensive offline processing, informative path planning (IPP) offers a way to collect data adaptively where it is most needed. Gaussian process (GP) mapping provides a continuous model of weed distribution with built‑in uncertainty. However, GPs must be discretised for practical use in autonomous planning. Many discretisation techniques exist, but the impact of discrete representation choice remains poorly understood. This paper investigates how different discrete GP representations influence both mapping quality and mission‑level performance in UAV‑based weed mapping. Considering a UAV equipped with a downward‑facing camera, we implement a receding‑horizon IPP strategy that selects sampling locations based on the map uncertainty, travel cost, and coverage penalties. We investigate multiple discretisation strategies for representing the GP posterior and use their induced map partitions to generate candidate viewpoints for planning. Experiments on real‑world weed distributions show that representation choice significantly affects exploration behaviour and efficiency. Overall, our results demonstrate that discretisation is not only a representational detail but a key design choice that shapes planning dynamics, coverage efficiency, and computational load in online UAV weed mapping.
Authors: Harry Huang, Talia Xu, Marco Zúñiga Zamalloa
Abstract: Micro‑Unmanned Aerial Vehicles (UAVs) are rapidly expanding into tasks from inventory to environmental sensing, yet their short endurance and unreliable navigation in GPS‑denied spaces limit deployment. Lighter‑Than‑Air (LTA) drones offer an energy‑efficient alternative: they use a helium envelope to provide buoyancy, which enables near‑zero power drain during hovering and much longer operation. LTAs are promising, but their design is complex, and they lack integrated solutions that enable sustained autonomous operation and navigation with simple, minimal infrastructure.
We propose a compact, self‑sustaining LTA drone that uses light for both energy harvesting and navigation. Our contributions are threefold: (i) a high‑fidelity simulation framework to analyze LTA aerodynamics and select a stable, efficient configuration; (ii) a framework to integrate solar cells on the envelope to provide net‑positive energy; and (iii) a point‑and‑go navigation system with three light‑seeking algorithms operating on a single light beacon.
Our LTA analysis, together with the integrated solar panels, not only saves energy while flying, but also enables sustainable operation: providing 1 minute of flying time for every 4 minutes of energy harvesting under an illumination of 80 klux. We also demonstrate robust single‑beacon navigation towards a light source that can be up to 7 m away, in indoor and outdoor environments, even with moderate winds. The resulting system indicates a plausible path toward persistent, autonomous operation for indoor and outdoor monitoring. More broadly, this work provides a practical pathway for translating the promise of LTA drones into a persistent, self‑sustaining aerial system.
Authors: Evangelos Ntouros, Ewoud J. J. Smeur
Abstract: This paper develops a guidance control law based on a parametric Guiding Vector Field (GVF) and integrates it with a state‑of‑the‑art acceleration and attitude control architecture for tailsitters. The resulting framework enables a direct comparison between traditional trajectory‑tracking guidance and GVF‑based path‑following guidance using a realistic tailsitter model operating under windy conditions. Through extensive simulations, it is shown that for agile flight scenarios with wind and small initial position error, both guidance strategies achieve comparable tracking performance, indicating that the additional complexity introduced by the GVF formulation is not always justified. However, the GVF‑based approach exhibits an advantage when initial deviation from the path is present, yielding smooth and well‑behaved convergence toward the desired path. Two additional contributions support this evaluation. First, a modification of the parametric GVF is proposed that guarantees exponential stability of the tracking error dynamics for a single integrator system. Second, the differential flatness transform of a tailsitter vehicle is extended to account for explicit knowledge of the wind velocity vector.
Authors: Kaleem Arshid, Ali Krayani, Lucio Marcenaro, David Martin Gomez, Carlo Regazzoni
Abstract: This paper proposes an Active Inference‑based framework for autonomous trajectory design in UAV swarms. The method integrates probabilistic reasoning and self‑learning to enable distributed mission allocation, route ordering, and motion planning. Expert trajectories generated using a Genetic Algorithm with Repulsion Forces (GA‑RF) are employed to train a hierarchical World Model capturing swarm behavior across mission, route, and motion levels. During online operation, UAVs infer actions by minimizing divergence between current beliefs and model‑predicted states, enabling adaptive responses to dynamic environments. Simulation results show faster convergence, higher stability, and safer navigation than Q‑Learning, demonstrating the scalability and cognitive grounding of the proposed framework for intelligent UAV swarm control.
Authors: Yulu Han, Ziye Jia, Jingjing Zhao, Lijun He, Yao Wu, Qihui Wu
Abstract: Unmanned aerial vehicle (UAV) networks play an important role in emergency communications. However, it is challenging to design reliable routing strategies that ensure low latency, energy efficiency, and security in dynamic and attack‑prone environments. To this end, we design a secure routing architecture integrating software‑defined networking (SDN) for centralized control and blockchain for tamper‑proof trust management. In particular, a novel security degree metric is introduced to quantify UAV trustworthiness. Based on this architecture, we propose a beam search‑proximal policy optimization (BSPPO) algorithm, where beam search (BS) pre‑screens high‑security candidate paths, and proximal policy optimization (PPO) performs hop‑by‑hop routing decisions to support dynamic rerouting upon attack detection. Finally, extensive simulations under varying attack densities, packet sizes, and rerouting events demonstrate that BSPPO outperforms PPO, BS‑Q learning, and BS‑actor critic in terms of delay, energy consumption, and transmission success rate, showing strong robustness and adaptability.
Authors: Manobendu Sarker, Md. Zoheb Hassan, Xianbin Wang
Abstract: In this paper, we investigate the uplink (UL) radio resource management for 5G aerial corridors with an open‑radio access network (O‑RAN)‑enabled cell‑free (CF) massive multiple‑input multiple‑output (mMIMO) system. Our objective is to maximize the minimum spectral efficiency (SE) by jointly optimizing unmanned aerial vehicle (UAV)‑open radio unit (O‑RU) association and UL transmit power under quality‑of‑service (QoS) constraints. Owing to its NP‑hard nature, the formulated problem is decomposed into two tractable sub‑problems solved via alternating optimization (AO) using two computationally efficient algorithms. We then propose (i) a QoS‑driven and multi‑connectivity‑enabled association algorithm incorporating UAV‑centric and O‑RU‑centric criteria with targeted refinement for weak UAVs, and (ii) a bisection‑guided fixed‑point power control algorithm achieving global optimality with significantly reduced complexity, hosted as an xApp at the near‑real‑time (near‑RT) RAN intelligent controller (RIC) of O‑RAN. Solving the resource‑allocation problem requires global channel state information (CSI), which incurs substantial measurement and signaling overhead. To mitigate this, we leverage a channel knowledge map (CKM) within the O‑RAN non‑RT RIC to enable efficient environment‑aware CSI inference. Simulation results show that the proposed framework achieves up to 440% improvement in minimum SE and 100% QoS satisfaction and fairness, while reducing runtime by up to 99.7% compared to an interior point solver‑based power allocation solution, thereby enabling O‑RAN compliant real‑time deployment.
Authors: Prosenjit Chatterjee, ANK Zaman
Abstract: The rapid proliferation of airborne platforms, including commercial aircraft, drones, and UAVs, has intensified the need for real‑time, automated threat assessment systems. Current approaches depend heavily on manual monitoring, resulting in limited scalability and operational inefficiencies. This work introduces a dual‑task model based on EfficientNetB4 capable of performing airborne object classification and threat‑level prediction simultaneously. To address the scarcity of clean, balanced training data, we constructed the AODTA Dataset by aggregating and refining multiple public sources. We benchmarked our approach on both the AVD Dataset and the newly developed AODTA Dataset and further compared performance against a ResNet‑50 baseline, which consistently underperformed EfficientNetB4. Our EfficientNetB4 model achieved 96% accuracy in object classification and 90% accuracy in threat‑level prediction, underscoring its promise for applications in surveillance, defense, and airspace management. Although the title references detection, this study focuses specifically on classification and threat‑level inference using pre‑localized airborne object images provided by existing datasets.
Authors: Suguru Sato, Kamesh Subbarao
Abstract: This paper presents a three‑dimensional, hydrodynamics‑inspired collision avoidance framework for uncrewed aerial vehicle (UAV) formations operating in dynamic environments. When moving obstacles enter a UAV's sensing region, they are modeled as three‑dimensional doublets or ellipsoids that generate local velocity fields, guiding nearby UAVs to execute smooth, collision‑free maneuvers without trajectory discontinuities or explicit trajectory replanning. This flow‑based approach enables real‑time operation and interpretable behavior by leveraging the nature of fluid flow around obstacles via the harmonic properties of Laplace's equation, inherently avoiding the local minima common in traditional potential field methods. To establish and maintain coordination among the UAVs, a Virtual Rigid Body (VRB) formation strategy is integrated, ensuring that formation geometry and trajectory tracking are preserved. Simulation results demonstrate the feasibility and scalability of the method for both individual and multi‑UAV scenarios with multiple formation geometries encountering moving obstacles. The proposed approach achieves safe, smooth, and computationally efficient avoidance maneuvers suitable for real‑time and practical applications.
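The doublet idea can be illustrated with the textbook potential flow past a sphere: a uniform stream plus a three-dimensional doublet, whose velocity field is tangent to the sphere surface. The sketch below is not the authors' implementation; it assumes a single static spherical obstacle, and the function name and arguments are hypothetical.

```python
import math


def sphere_avoidance_velocity(position, obstacle_center, radius, free_stream):
    """Velocity of a uniform stream plus a 3D doublet (potential flow past a sphere).

    v = U + (R^3 / 2) * (U / r^3 - 3 (U . r) r / r^5), with r measured from the
    obstacle center. The normal component vanishes on the sphere surface, so a
    vehicle following this field slides around the obstacle instead of hitting it.
    """
    r = [p - c for p, c in zip(position, obstacle_center)]
    dist = math.sqrt(sum(x * x for x in r))
    if dist < radius:
        raise ValueError("position lies inside the obstacle")
    u_dot_r = sum(u * x for u, x in zip(free_stream, r))
    k = 0.5 * radius ** 3
    return [
        u + k * (u / dist ** 3 - 3.0 * u_dot_r * x / dist ** 5)
        for u, x in zip(free_stream, r)
    ]
```

At the upstream pole of the sphere the field has a stagnation point, and at the equator the speed rises to 1.5 times the free-stream speed, which is the classical result for this flow.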
Authors: Abdelrahman Ramadan, Zahra Dorbeigi Namaghi, Emily Taylor, Lucas Edwards, Xan Giuliani, David S. McLagan, Sidney Givigi, Melissa Greeff
Abstract: Wildfire monitoring requires high‑resolution atmospheric measurements, yet low‑cost sensors on Unmanned Aerial Vehicles (UAVs) exhibit baseline drift, cross‑sensitivity, and response lag that corrupt concentration estimates. Traditional deep learning denoising approaches demand large datasets impractical to obtain from limited UAV flight campaigns. We present PC^2DAE, a physics‑informed denoising autoencoder that addresses data scarcity by embedding physical constraints directly into the network architecture. Non‑negative concentration estimates are enforced via softplus activations and physically plausible temporal smoothing, ensuring outputs are physically admissible by construction rather than relying on loss function penalties. The architecture employs hierarchical decoder heads for Black Carbon, Gas, and CO2 sensor families, with two variants: PC^2DAE‑Lean (21k parameters) for edge deployment and PC^2DAE‑Wide (204k parameters) for offline processing. We evaluate on 7,894 synchronized 1 Hz samples collected from UAV flights during prescribed burns in Saskatchewan, Canada (approximately 2.2 hours of flight data), two orders of magnitude below typical deep learning requirements. PC^2DAE‑Lean achieves 67.3% smoothness improvement and 90.7% high‑frequency noise reduction with zero physics violations. Five baselines (LSTM‑AE, U‑Net, Transformer, CBDAE, DeSpaWN) produce 15‑23% negative outputs. The lean variant outperforms wide (+5.6% smoothness), suggesting reduced capacity with strong inductive bias prevents overfitting in data‑scarce regimes. Training completes in under 65 seconds on consumer hardware.
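The "admissible by construction" claim rests on the softplus function, which maps any real input to a non-negative value. A minimal sketch of that output constraint follows; the temporal smoothing and the decoder heads are not shown, and `decode_concentrations` is a hypothetical helper name rather than anything from the paper.

```python
import math


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); never negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def decode_concentrations(raw_outputs):
    """Map unconstrained decoder outputs to non-negative concentration estimates."""
    return [softplus(v) for v in raw_outputs]
```

Because the constraint lives in the activation rather than in a loss penalty, even an untrained network cannot emit a negative concentration.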
Authors: Amir Farzin Nikkhah, Dong Chen, Bradford Campbell, Somayeh Asadi, Arsalan Heydarian
Abstract: Unmanned Aerial Vehicles (UAVs) are transforming infrastructure inspections in the Architecture, Engineering, Construction, and Facility Management (AEC+FM) domain. By synthesizing insights from over 150 studies, this review paper highlights UAV‑based methodologies for data acquisition, photogrammetric modeling, defect detection, and decision‑making support. Key innovations include path optimization, thermal integration, and advanced machine learning (ML) models such as YOLO and Faster R‑CNN for anomaly detection. UAVs have demonstrated value in structural health monitoring (SHM), disaster response, urban infrastructure management, energy efficiency evaluations, and cultural heritage preservation. Despite these advancements, challenges in real‑time processing, multimodal data fusion, and generalizability remain. A proposed workflow framework, informed by literature and a case study, integrates RGB imagery, LiDAR, and thermal sensing with transformer‑based architectures to improve accuracy and reliability in detecting structural defects, thermal anomalies, and geometric inconsistencies. The proposed framework ensures precise and actionable insights by fusing multimodal data and dynamically adapting path planning for complex environments, presented as a comprehensive step‑by‑step guide to address these challenges effectively. This paper concludes with future research directions emphasizing lightweight AI models, adaptive flight planning, synthetic datasets, and richer modality fusion to streamline modern infrastructure inspections.
Authors: Anna Abramowicz, Michal Laska, Adam Nadudvari, Oimahmad Rahmonov
Abstract: The study aimed to evaluate the applicability of environmental indices in the monitoring of smouldering coal‑waste dumps. A dump located in the Upper Silesian Coal Basin served as the research site for a multi‑method analysis combining remote sensing and field‑based data. Two UAV survey campaigns were conducted, capturing RGB, infrared, and multispectral imagery. These were supplemented with direct ground measurements of subsurface temperature and detailed vegetation mapping. Additionally, publicly available satellite data from the Landsat and Sentinel missions were analysed. A range of vegetation and fire‑related indices (NDVI, SAVI, EVI, BAI, among others) were calculated to identify thermally active zones and assess vegetation conditions within these degraded areas. The results revealed strong seasonal variability in vegetation indices on thermally active sites, with evidence of disrupted vegetation cycles, including winter greening in moderately heated root zones ‑ a pattern indicative of stress and degradation processes. While satellite data proved useful in reconstructing the fire history of the dump, their spatial resolution was insufficient for detailed monitoring of small‑scale thermal anomalies. The study highlights the diagnostic potential of UAV‑based remote sensing in post‑industrial environments undergoing land degradation but emphasises the importance of field validation for accurate environmental assessment.
Authors: Steffen Knoblauch, Ram Kumar Muthusamy, Hao Li, Iddy Chazua, Benedcto Adamu, Innocent Maholi, Alexander Zipf
Abstract: Climate change is intensifying human heat exposure, particularly in densely built urban centers of the Global South. Low‑cost construction materials and high thermal‑mass surfaces further exacerbate this risk. Yet scalable methods for assessing such heat‑relevant building attributes remain scarce. We propose a machine learning framework that fuses openly available unmanned aerial vehicle (UAV) and street‑view (SV) imagery via a coupled global context vision transformer (CGCViT) to learn heat‑relevant representations of urban structures. Thermal infrared (TIR) measurements from HotSat‑1 are used to quantify the relationship between building attributes and heat‑associated health risks. Our dual‑modality cross‑view learning approach outperforms the best single‑modality models by up to 9.3%, demonstrating that UAV and SV imagery provide valuable complementary perspectives on urban structures. The presence of vegetation surrounding buildings (versus no vegetation), brighter roofing (versus darker roofing), and roofing made of concrete, clay, or wood (versus metal or tarpaulin) are all significantly associated with lower HotSat‑1 TIR values. Deployed across the city of Dar es Salaam, Tanzania, the proposed framework illustrates how household‑level inequalities in heat exposure ‑ often linked to socio‑economic disadvantage and reflected in building materials ‑ can be identified and addressed using machine learning. Our results point to the critical role of localized, data‑driven risk assessment in shaping climate adaptation strategies that deliver equitable outcomes.
Authors: Vishisht Sharma, Sam Leroux, Lisa Landuyt, Nick Witvrouwen, Pieter Simoens
Abstract: Effective disaster response depends on speed, and oblique aerial video is the primary modality for initial scouting because it maximizes spatial coverage and situational awareness in limited flight time. However, the on‑board processing of high‑resolution oblique streams is severely bottlenecked by the strict Size, Weight, and Power (SWaP) constraints of Unmanned Aerial Vehicles (UAVs). The computational density required to process these wide‑field‑of‑view streams precludes low‑latency inference on standard edge hardware. To address this, we propose Temporal Token Reuse (TTR), an adaptive inference framework capable of accelerating video segmentation on embedded devices. TTR exploits the intrinsic spatiotemporal redundancy of aerial video by formulating image patches as tokens; it utilizes a lightweight similarity metric to dynamically identify static regions and propagate their precomputed deep features, thereby bypassing redundant backbone computations. We validate the framework on standard benchmarks and a newly curated Oblique Floodwater Dataset designed for hydrological monitoring. Experimental results on edge‑grade hardware demonstrate that TTR achieves a 30% reduction in inference latency with negligible degradation in segmentation accuracy (< 0.5% mIoU). These findings confirm that TTR effectively shifts the operational Pareto frontier, enabling high‑fidelity, real‑time oblique video understanding for time‑critical remote sensing missions.
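The reuse mechanism can be sketched in a few lines: compare each patch with its counterpart in the previous frame and rerun the expensive computation only where the patch changed. The abstract does not specify the similarity metric or how features are cached, so the mean absolute difference, the threshold, and the per-patch `backbone` callable below are all illustrative assumptions.

```python
def mean_abs_diff(patch_a, patch_b):
    """Cheap dissimilarity between two equally sized patches (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(patch_a, patch_b)) / len(patch_a)


def reuse_tokens(prev_patches, prev_features, patches, backbone, threshold=2.0):
    """Return per-patch features, recomputing only patches that changed.

    prev_patches / prev_features are None for the first frame. `backbone` is any
    callable mapping one patch to its feature (a stand-in for the deep network).
    Also returns how many patches were actually recomputed.
    """
    features = []
    recomputed = 0
    for i, patch in enumerate(patches):
        is_static = (
            prev_patches is not None
            and mean_abs_diff(patch, prev_patches[i]) < threshold
        )
        if is_static:
            features.append(prev_features[i])  # static region: reuse cached feature
        else:
            features.append(backbone(patch))   # changed region: run the backbone
            recomputed += 1
    return features, recomputed
```

A real segmentation backbone mixes information across tokens, so reused features are an approximation; the sketch only shows the bookkeeping that skips computation for static regions.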
Authors: Linxier Deng
Abstract: We present a theoretical framework for quantum key distribution (QKD) using orbital angular momentum (OAM)‑encoded BB84 on an unmanned aerial vehicle (UAV) platform. A unified channel model captures Kolmogorov turbulence, pointing‑induced misalignment, and finite‑aperture clipping, enabling quantitative predictions of inter‑mode crosstalk and the resulting quantum bit error rate (QBER). Using a weak‑plus‑vacuum decoy‑state formulation, we derive composable finite‑key lower bounds on the secret key rate that incorporate statistical fluctuations, detector dark counts, efficiency mismatch, and error‑correction leakage. To stabilize performance under non‑stationary flight conditions, we introduce a lightweight physics‑informed learning module that combines physical priors with measured link statistics to classify valid pulses, reject corrupted data, and recommend decoding strategies. We outline a complete evaluation pipeline including UAV system architecture, turbulence‑driven QBER maps, decoy optimization, finite‑key scaling, and AI calibration metrics. Simulations indicate that under moderate turbulence and milliradian‑level pointing jitter, the proposed AI‑assisted method can improve the secret key rate by 10% to 30% while preserving composable security.
Authors: Cuong Le, Symeon Chatzinotas, Thang X. Vu
Abstract: This paper addresses the joint optimization of trajectories and bandwidth allocation for multiple Unmanned Aerial Vehicles (UAVs) to enhance energy efficiency in the cooperative data collection problem. We focus on an important yet underestimated aspect of the system, where action synchronization across all UAVs is impossible. Since most existing learning‑based solutions are not designed to learn in this asynchronous environment, we formulate the trajectory planning problem as a Decentralized Partially Observable Semi‑Markov Decision Process and introduce an asynchronous multi‑agent learning algorithm to learn UAVs' cooperative policies. Once the UAVs' trajectory policies are learned, the bandwidth allocation can be optimally solved based on local observations at each collection point. Comprehensive empirical results demonstrate the superiority of the proposed method over other learning‑based and heuristic baselines in terms of both energy efficiency and mission completion time. Additionally, the learned policies exhibit robustness under varying environmental conditions.
Authors: Zhenyu Zhao, Tiankui Zhang, Xiaoxia Xu, Junjie Li, Yuanwei Liu, Wenjuan Xing
Abstract: Multi‑hop uncrewed aerial vehicle (UAV) networks are a promising way to extend terrestrial network coverage. Existing multi‑hop UAV networks employ a single routing path by selecting the next‑hop forwarding node in a hop‑by‑hop manner, which leads to local congestion and increases traffic delays. In this paper, a novel traffic‑adaptive multipath routing method is proposed for multi‑hop UAV networks, which enables each UAV to dynamically split and forward traffic flows across multiple next‑hop neighbors, thus meeting the latency requirements of diverse traffic flows in dynamic mobile environments. An on‑time packet delivery ratio maximization problem is formulated to determine the traffic splitting ratios at each hop. This sequential decision‑making problem is modeled as a decentralized partially observable Markov decision process (Dec‑POMDP). To solve this Dec‑POMDP, a novel multi‑agent deep reinforcement learning (MADRL) algorithm, termed Independent Proximal Policy Optimization with Dirichlet Modeling (IPPO‑DM), is developed. Specifically, IPPO serves as the core optimization framework, where the Dirichlet distribution is leveraged to parameterize a continuous stochastic policy network on the probability simplex, inherently ensuring feasible traffic splitting ratios. Simulation results demonstrate that IPPO‑DM outperforms benchmark schemes in terms of both delivery latency guarantee and packet loss performance.
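A Dirichlet policy guarantees feasible splitting ratios because every sample is positive and sums to one. The sketch below shows one common way to draw such an action from unconstrained network outputs by normalizing independent Gamma variates. The mapping `softplus(logit) + 1` for the concentration parameters is an assumption for illustration; the abstract does not say how IPPO‑DM parameterizes them.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_split_ratios(logits, rng):
    """Sample traffic-splitting ratios over next-hop neighbors from a Dirichlet.

    logits: one unconstrained policy output per neighbor (hypothetical input).
    rng:    a random.Random instance, passed in for reproducibility.
    """
    # Assumed parameterization: concentrations strictly greater than 1.
    alphas = [softplus(z) + 1.0 for z in logits]
    # Dirichlet sample = independent Gamma(alpha_i, 1) draws, normalized.
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]
```

For example, `sample_split_ratios([0.5, -1.0, 2.0], random.Random(0))` returns three positive fractions that sum to one, so no projection or clipping step is needed before forwarding traffic.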
Authors: Yiqin Deng, Zhengru Fang, Senkang Hu, Yanan Ma, Xiaoyu Guo, Haixia Zhang, Yuguang Fang
Abstract: This paper presents an innovative framework that boosts computing power by utilizing ubiquitous computing power distribution and enabling higher computing node accessibility via adaptive UAV positioning, establishing a UAV‑enabled Computing Power Network (UAV‑CPN). In a UAV‑CPN, a UAV functions as a dynamic relay, outsourcing computing tasks from the request zone to an expanded service zone with diverse computing nodes, including vehicle onboard units, edge servers, and dedicated powerful nodes. This approach has the potential to alleviate communication bottlenecks and overcome the "island effect" observed in multi‑access edge computing. A significant challenge is to quantify computing power performance under complex dynamics of communication and computing. To address this challenge, we introduce task completion probability to capture the capability of UAV‑CPNs for task computing. We further enhance UAV‑CPN performance under a hybrid energy architecture by jointly optimizing UAV altitude and transmit power, where fuel cells and batteries collectively power both UAV propulsion and communication systems. Extensive evaluations show significant performance gains, highlighting the importance of balancing communication and computing capabilities, especially under dual‑energy constraints. These findings underscore the potential of UAV‑CPNs to significantly boost computing power.
Authors: Maitiniyazi Maimaitijiang, Hillson Ghimire, Subash Thapa, Mohammad Maruf Billah, Shaurya Sehgal, Mandeep Singh, Swas Kaushal, Kushal Poudel, Santosh Subedi, Ubaid Ur Rehman Janjua, Lise-Olga Makonga, Jyotirmoy Halder, Harsimardeep S. Gill, Mazhar Sher, Jagdeep Singh Sidhu, Sunish K. Sehgal
Abstract: High‑throughput, low‑cost phenotyping remains a critical bottleneck in wheat breeding, genetics, and crop management. This is particularly evident in the measurement of complex yield components (i.e., spike and spikelet counts), disease and grain‑quality traits related to Fusarium Head Blight (FHB) and Fusarium‑Damaged Kernels (FDK), and microscale physiological traits such as stomatal density, size, and aperture. We introduce WheatAI (wheatai.net), an AI‑powered web application designed to bridge the gap between advanced computer vision and deep learning models on the one hand and high‑throughput phenotyping (HTP) and practical agricultural applications on the other. WheatAI v1.0 provides an accessible, browser‑based interface that supports multiscale data ingestion from smartphones, Unmanned Aerial Vehicles (UAVs), and portable microscopes. The core functionalities of the platform include plot‑ and field‑scale assessment via UAV‑ and smartphone‑based wheat spike detection and counting, as well as smartphone‑based spikelet counting. Additionally, it offers grain quality assessment through FDK ratio estimation and kernel morphometric measurements, such as length, width, and area, derived from smartphone images of kernel samples. For leaf‑level analysis, WheatAI provides microscale phenotyping through automated stomatal counting, size, and aperture measurement from digital microscopy images. The system supports both single‑image and bulk processing via a guided upload‑and‑run workflow. This platform is designed to reduce labor costs and rater subjectivity while accelerating field‑to‑lab decision cycles. By providing standardized, image‑based outputs, WheatAI enables breeders, agronomists, and producers to implement high‑throughput selection and precision scouting at scale.
Authors: Yizhan Feng, Hichem Snoussi, Yuhang Wang, Jing Teng, Abel Cherouat, Tian Wang
Abstract: With large language models demonstrating significant potential in code generation tasks, their application to onboard control of resource‑constrained Unmanned Aerial Vehicles (UAVs) has emerged as an important research direction. However, a notable contradiction exists between the high resource consumption of large models and the real‑time, lightweight requirements of UAV platforms. This paper proposes an integrated approach that combines knowledge distillation, chain‑of‑thought guidance, and supervised fine‑tuning for UAV multi‑SDK control tasks, aiming to efficiently transfer complex reasoning and code generation capabilities to smaller models. Firstly, a high‑quality dataset covering various mainstream UAV SDKs is constructed; it features instruction‑code‑reasoning chains and incorporates counterfactual negative samples for data augmentation, guiding the model to learn the end‑to‑end logic from instruction parsing to code generation. Secondly, leveraging DeepSeek‑Coder‑V2‑Lite quantized via QLoRA as the teacher model, and based on a hybrid black‑box and white‑box distillation strategy, high‑quality chain‑of‑thought soft labels are generated. These are combined with a weighted cross‑entropy loss using hard labels to transfer complex reasoning capabilities to the smaller student model. Finally, through prompt tuning engineering optimized for the UAV control scenario, the model performance on core tasks such as SDK type recognition and function call matching is enhanced. Experimental results indicate that the distilled lightweight model maintains high code generation accuracy while achieving significant improvements in deployment and inference efficiency, effectively demonstrating the feasibility and superiority of our approach in achieving precise and lightweight intelligent control for UAVs.
Authors: Yizhan Feng, Hichem Snoussi, Jing Teng, Jian Liu, Yuyang Wang, Abel Cherouat, Tian Wang
Abstract: The demand for real‑time visual understanding and interaction in complex scenarios is increasingly critical for unmanned aerial vehicles (UAVs). However, a significant challenge arises from the contradiction between the high computational cost of large vision‑language models and the limited computing resources available on UAV edge devices. To address this challenge, this paper proposes a lightweight multimodal task platform based on BLIP‑2, integrated with YOLO‑World and YOLOv8‑Seg models. This integration extends the multi‑task capabilities of BLIP‑2 for UAV applications with minimal adaptation and without requiring task‑specific fine‑tuning on drone data. Firstly, the deep integration of BLIP‑2 with YOLO models enables it to leverage the precise perceptual results of YOLO for fundamental tasks like object detection and instance segmentation, thereby facilitating deeper visual‑attention understanding and reasoning. Secondly, a content‑aware key frame sampling mechanism based on K‑Means clustering is designed, which incorporates intelligent frame selection and temporal feature concatenation. This equips the lightweight BLIP‑2 architecture with the capability to handle video‑level interactive tasks effectively. Thirdly, a unified prompt optimization scheme for multi‑task adaptation is implemented. This scheme strategically injects structured event logs from the YOLO models as contextual information into BLIP‑2's input. Combined with output constraints designed to filter out technical details, this approach effectively guides the model to generate accurate and contextually relevant outputs for various tasks.
Authors: Yinqiu Liu, Ruichen Zhang, Dusit Niyato, Abbas Jamalipour, Trung Q. Duong, Dong In Kim
Abstract: Nowadays, agentic AI is emerging as a transformative paradigm for next‑generation communication networks, promising to evolve large language models (LLMs) from passive chatbots into autonomous operators. However, unleashing this potential requires bridging the critical gap between abstract reasoning and physical actuation, a capability we term tool intelligence. In this article, we explore the landscape of tool engineering to empower agentic AI in communications. We first analyze the functionalities of tool intelligence and its effects on communications. We then present a systematic review of tool engineering, covering the entire lifecycle from tool creation and discovery to selection, learning, and benchmarking. Furthermore, we present a case study on tool‑assisted uncrewed aerial vehicle (UAV) trajectory planning to demonstrate the realization of tool intelligence in communications. By introducing a teacher‑guided reinforcement learning approach with a feasibility shield, we enable agents to intelligently operate tools. They utilize external tools to eliminate navigational uncertainty while mastering cost‑aware scheduling under strict energy constraints. This article aims to provide a roadmap for building the tool‑augmented intelligent agents of the 6G era.
Authors: Fei Li, Lang Qiao, Jiahao Fan, Yijia Xu, Shawn M. Kaeppler, Zhou Zhang
Abstract: High‑resolution UAV photogrammetry has become a key technology for precision agriculture, enabling centimeter‑level crop monitoring and point‑level plant localization. However, point‑level maize localization in UAV imagery remains challenging due to (1) extremely small object‑to‑pixel ratios, typically less than 0.1%, (2) prohibitive computational costs of quadratic attention on ultra‑high‑resolution images larger than 3000 x 4000 pixels, and (3) agricultural scene‑specific complexities such as sparse object distribution and environmental variability that are poorly handled by general‑purpose vision models.
To address these challenges, we propose the Additive Kolmogorov‑Arnold Transformer (AKT), which replaces conventional multilayer perceptrons with Padé Kolmogorov‑Arnold Network (PKAN) modules to enhance functional expressivity for small‑object feature extraction, and introduces PKAN Additive Attention (PAA) to model multiscale spatial dependencies with reduced computational complexity. In addition, we present the Point‑based Maize Localization (PML) dataset, consisting of 1,928 high‑resolution UAV images with approximately 501,000 point annotations collected under real field conditions.
Extensive experiments show that AKT achieves an average F1‑score of 62.8%, outperforming state‑of‑the‑art methods by 4.2%, while reducing FLOPs by 12.6% and improving inference throughput by 20.7%. For downstream tasks, AKT attains a mean absolute error of 7.1 in stand counting and a root mean square error of 1.95‑1.97 cm in interplant spacing estimation. These results demonstrate that integrating Kolmogorov‑Arnold representation theory with efficient attention mechanisms offers an effective framework for high‑resolution agricultural remote sensing.
Authors: Chaoyi Lin Yang, Gabriele Dessena, Oscar E. Bonilla-Manrique
Abstract: Structural vibration testing plays a key role in aerospace engineering for evaluating dynamic behaviour, ensuring reliability and verifying structural integrity. These tests rely on accurate and robust data acquisition (DAQ) systems to capture high‑quality acceleration data. However, commercial DAQ systems that provide the required performance and features are often expensive and complex, limiting their accessibility for small‑scale research and experimental applications. This work presents the design and experimental validation of an affordable, in‑house‑developed acceleration DAQ system, tested on a small fixed‑wing UAV through several Taxi Vibration Test (TVT) runs and ambient vibration measurements. The proposed system integrates several OrangePi 3 LTS single‑board computers with multiple LSM6DS3TR‑C MEMS inertial measurement units operating simultaneously via an Inter‑Integrated Circuit (I2C) communication interface, managed under a Python‑based master/slave architecture. Data are acquired at a stable sampling rate of approximately 208 Hz and post‑processed using Welch's method to estimate their Power Spectral Density (PSD). Results confirm the system's ability to provide consistent multi‑sensor acceleration data and repeatable PSD profiles under the same test conditions, thus demonstrating its reliability. With a total hardware cost below 600 EUR (approximately 690 USD), the developed DAQ system offers a compact, scalable and cost‑effective alternative for aerospace vibration analysis and structural testing.
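Welch's method estimates a PSD by splitting the record into overlapping segments, windowing each one, and averaging the resulting periodograms. The standard-library sketch below illustrates the idea at the reported 208 Hz sampling rate. The segment length, the Hann window, the 50% overlap and the per-segment mean removal are assumptions, since the abstract gives no processing parameters. In practice a library routine such as `scipy.signal.welch` would normally be used instead of a hand-written DFT.

```python
import cmath
import math


def welch_psd(samples, fs, nperseg=64):
    """One-sided PSD via Welch's method (periodic Hann window, 50% overlap)."""
    if nperseg % 2 or len(samples) < nperseg:
        raise ValueError("need an even nperseg and at least nperseg samples")
    step = nperseg // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nperseg) for n in range(nperseg)]
    scale = fs * sum(w * w for w in window)  # density normalization
    nfreq = nperseg // 2 + 1
    psd = [0.0] * nfreq
    count = 0
    for start in range(0, len(samples) - nperseg + 1, step):
        seg = samples[start:start + nperseg]
        mean = sum(seg) / nperseg
        seg = [(s - mean) * w for s, w in zip(seg, window)]  # remove mean, apply window
        for k in range(nfreq):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / nperseg)
                for n, x in enumerate(seg)
            )
            power = abs(acc) ** 2 / scale
            if 0 < k < nperseg // 2:
                power *= 2.0  # fold negative frequencies into the one-sided estimate
            psd[k] += power
        count += 1
    freqs = [k * fs / nperseg for k in range(nfreq)]
    return freqs, [p / count for p in psd]
```

With `fs = 208.0` and `nperseg = 64` the frequency resolution is 3.25 Hz, so a 26 Hz vibration falls exactly on bin 8. Longer segments give finer resolution at the cost of fewer averages and a noisier estimate.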
Authors: Andrei A. Korigodskii, Artem E. Vasiunik, Georgii A. Varin, Adilia M. Zukhurova, Matvei V. Urvantsev, Semen A. Osipenkov, Igor S. Efremov, Georgii E. Bondar
Abstract: The integration of autonomous unmanned aerial vehicles (UAVs) into large‑scale artistic projects has emerged as a new application in robotics. This paper presents the design, deployment, and testing of a novel multi‑drone system for automated mural painting in outdoor settings. The system uses new software that coordinates multiple drones simultaneously, utilizing state‑machine algorithms for task execution. Key advancements are a positioning system that combines 2D localization using a single motion tracking camera with onboard LiDAR for precise positioning, and a novel flight control algorithm that behaves differently along the trajectory and normal to it, ensuring both smoothness and high precision of the drawings. A 100‑square‑meter mural was created using the developed multi‑drone system, validating the system's efficacy. Compared to single‑drone approaches, our multi‑UAV solution significantly improves scalability and operational speed while maintaining high stability even in harsh weather conditions. The findings highlight the potential of autonomous robotic swarms in creative applications, paving the way for further advancements in large‑scale robotic art.
Authors: Md Sharif Hossen, Anil Gurses, Ozgur Ozdemir, Mihail Sichitiu, Ismail Guvenc
Abstract: Unmanned aerial vehicles (UAVs) can be critical for time‑sensitive data collection missions, yet existing research often relies on simulations that fail to capture real‑world complexities. Many studies assume ideal wireless conditions or focus only on path planning, neglecting the challenge of making real‑time decisions in dynamic environments. To bridge this gap, we address the problem of adaptive sensor selection for a data‑gathering UAV, considering both the buffered data at each sensor and realistic propagation conditions. We introduce the Hover‑based Greedy Adaptive Download (HGAD) strategy, designed to maximize data transfer by intelligently hovering over sensors during periods of peak signal quality. We validate HGAD using both a digital twin (DT) and a real‑world (RW) testbed at the NSF‑funded AERPAW platform. Our experiments show that HGAD significantly improves download stability and successfully meets per‑sensor data targets. Compared with the traditional Greedy approach, which simply follows the strongest signal, HGAD achieves a higher cumulative data download. This work demonstrates the importance of integrating signal‑to‑noise ratio (SNR)‑aware and buffer‑aware scheduling with DT and RW signal traces to design resilient UAV data‑mule strategies for realistic deployments.
Authors: Yiming Sun, Zifan Ye, Qinghua Hu, Pengfei Zhu
Abstract: Multi‑modal image fusion aims to integrate complementary information from multiple source images to produce high‑quality fused images with enriched content. Although existing approaches based on state space models have achieved satisfactory performance with high computational efficiency, they tend to either over‑prioritize infrared intensity at the cost of visible details, or conversely, preserve visible structure while diminishing thermal target salience. To overcome these challenges, we propose DIFF‑MF, a novel difference‑driven channel‑spatial state space model for multi‑modal image fusion. Our approach leverages feature discrepancy maps between modalities to guide feature extraction, followed by a fusion process across both channel and spatial dimensions. In the channel dimension, a channel‑exchange module enhances channel‑wise interaction through cross‑attention dual state space modeling, enabling adaptive feature reweighting. In the spatial dimension, a spatial‑exchange module employs cross‑modal state space scanning to achieve comprehensive spatial fusion. By efficiently capturing global dependencies while maintaining linear computational complexity, DIFF‑MF effectively integrates complementary multi‑modal features. Experimental results on driving‑scenario and low‑altitude UAV datasets demonstrate that our method outperforms existing approaches in both visual quality and quantitative evaluation.
Authors: Xiao Fan, Wenkun Wen, Peiran Wu, Junhui Zhao, Minghua Xia
Abstract: Uncrewed aerial vehicles (UAVs) play a pivotal role in ensuring seamless connectivity for Internet of Things (IoT) devices, particularly in scenarios where conventional terrestrial networks are constrained or temporarily unavailable. However, traditional coverage‑hole detection approaches, such as minimizing drive tests, are costly, time‑consuming, and reliant on outdated radio‑environment data, making them unsuitable for real‑time applications. To address these limitations, this paper proposes a UAV‑assisted framework for real‑time detection and recovery of coverage holes in IoT networks. In the proposed scheme, a patrol UAV is first dispatched to identify coverage holes in regions where the operational status of terrestrial base stations (BSs) is uncertain. Once a coverage hole is detected, one or more UAVs acting as aerial BSs are deployed by a satellite or nearby operational BSs to restore connectivity. The UAV swarm is organized based on Delaunay triangulation, enabling scalable deployment and tractable analytical characterization using stochastic geometry. Moreover, a collision‑avoidance mechanism grounded in multi‑agent system theory ensures safe and coordinated motion among multiple UAVs. Simulation results demonstrate that the proposed framework achieves high efficiency in both coverage‑hole detection and on‑demand connectivity restoration while significantly reducing operational cost and time.
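To make the swarm organization mentioned in this abstract concrete, the following sketch builds a Delaunay triangulation over a handful of UAV positions and extracts the neighbour links it induces. The positions, the swarm size, and the use of triangulation edges as neighbour links are illustrative assumptions; the paper's actual deployment and analysis are not reproduced here.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
n_uavs = 8
# Hypothetical horizontal UAV positions in metres (assumption, not from the paper).
uav_xy = rng.uniform(0.0, 1000.0, size=(n_uavs, 2))

tri = Delaunay(uav_xy)

# Each simplex is a triangle of three UAV indices; collect its undirected edges once.
edge_set = set()
for a, b, c in tri.simplices:
    for i, j in ((a, b), (b, c), (a, c)):
        edge_set.add((int(min(i, j)), int(max(i, j))))
edges = sorted(edge_set)

# Link lengths, e.g. as input to a spacing or collision-avoidance check.
lengths = [float(np.linalg.norm(uav_xy[i] - uav_xy[j])) for i, j in edges]
print(f"{len(tri.simplices)} triangles, {len(edges)} neighbour links")
```

Because every UAV only needs to track its triangulation neighbours, the number of links grows linearly with the swarm size rather than quadratically.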
Authors: Wei Shi, Wei Xu, Yongming Huang, Jiacheng Yao, Wenhao Hu, Dongming Wang
Abstract: Low‑altitude wireless networks (LAWNs) are expected to play a central role in future 6G infrastructures, yet uplink transmissions of uncrewed aerial vehicles (UAVs) remain vulnerable to eavesdropping due to their limited transmit power, constrained antenna resources, and highly exposed air‑ground propagation conditions. To address this fundamental bottleneck, we propose a flexible‑duplex cell‑free (CF) architecture in which each distributed access point (AP) can dynamically operate either as a receive AP for UAV uplink collection or as a transmit AP that generates cooperative artificial noise (AN) for secrecy enhancement. Such AP‑level duplex flexibility introduces an additional spatial degree of freedom that enables distributed and adaptive protection against wiretapping in LAWNs. Building upon this architecture, we formulate a max‑min secrecy‑rate problem that jointly optimizes AP mode selection, receive combining, and AN covariance design. This tightly coupled and nonconvex optimization is tackled by first deriving the optimal receive combiners in closed form, followed by developing a penalty dual decomposition (PDD) algorithm with guaranteed convergence to a stationary solution. To further reduce computational burden, we propose a low‑complexity sequential scheme that determines AP modes via a heuristic metric and then updates the AN covariance matrices through closed‑form iterations embedded in the PDD framework. Simulation results show that the proposed flexible‑duplex architecture yields substantial secrecy‑rate gains over CF systems with fixed AP roles. The joint optimization method attains the highest secrecy performance, while the low‑complexity approach achieves over 90% of the optimal performance with an order‑of‑magnitude lower computational complexity, offering a practical solution for secure uplink communications in LAWNs.
Authors: Mattia Figaro, Francesco Rossato, Alexander Bonora, Marco Giordani, Giovanni Schembra, Michele Zorzi
Abstract: Reliable connectivity is critical for Public Protection and Disaster Relief operations, especially in rural or compromised environments where terrestrial infrastructure is unavailable. In such scenarios, non‑terrestrial networks (NTNs), and specifically UAVs, are promising candidates to provide on‑demand and rapid connectivity on the ground, serving as aerial base stations. In this paper, we implement a setup in which a rotary‑wing UAV, equipped with a Starlink Mini terminal, provides Internet connectivity to an emergency ground user in the absence of cellular coverage via LEO satellites. The UAV functions as a Wi‑Fi access point, while backhauling the ground traffic through the Starlink constellation. We evaluate the system via both network simulations in ns‑3 and real‑world flight experiments in a rural environment, in terms of throughput, latency, coverage, and energy consumption under static and dynamic flight conditions. Our results demonstrate that the system can maintain a stable uplink throughput of approximately 30 Mbps up to approximately 200 meters, with minimal impact on the UAV battery lifetime. These findings demonstrate the feasibility of deploying commercial LEO satellite terminals on UAVs as a practical solution for emergency connectivity.
Authors: Hengxing Cai, Yijie Rao, Ligang Huang, Zanyang Zhong, Jinhan Dong, Jingjun Tan, Changhao Nai, Jue Hou, Wenhao Lu, Renxin Zhong
Abstract: Existing UAV vision‑and‑language navigation (VLN) benchmarks rarely provide realistic aerial scenes, natural process‑level instructions, and sufficient scale simultaneously, making it difficult to systematically train and evaluate UAV VLN agents under realistic settings. To address this, we propose AirNav, a large‑scale benchmark built on real urban aerial data, comprising 137K navigation samples with natural and diverse instructions generated via a human‑LLM collaborative pipeline with 10 user personas. We conduct a systematic evaluation of representative approaches on AirNav, ranging from traditional models to multimodal large language models (MLLMs), under unified metrics with open‑source implementations. We further propose AirVLN‑R1, trained via supervised fine‑tuning (SFT) and reinforcement fine‑tuning (RFT), achieving state‑of‑the‑art performance with a 51.82% success rate on the test‑unseen split. Real‑world experiments on a physical UAV platform provide preliminary evidence of sim‑to‑real transferability, and our dataset and code are publicly available.
Authors: Zhicheng Zhao, Fengjiao Peng, Jinquan Yan, Wei Lu, Chenglong Li, Jin Tang
Abstract: Optics‑guided thermal UAV image super‑resolution has attracted significant research interest due to its potential in all‑weather monitoring applications. However, existing methods typically compress optical features to match thermal feature dimensions for cross‑modal alignment and fusion, which not only causes the loss of high‑frequency information that is beneficial for thermal super‑resolution, but also introduces physically inconsistent artifacts such as texture distortions and edge blurring by overlooking differences in the imaging physics between modalities. To address these challenges, we propose PCNet to achieve cross‑resolution mutual enhancement between optical and thermal modalities, while physically constraining the optical guidance process via thermal conduction to enable robust thermal UAV image super‑resolution. In particular, we design a Cross‑Resolution Mutual Enhancement Module (CRME) to jointly optimize thermal image super‑resolution and optical‑to‑thermal modality conversion, facilitating effective bidirectional feature interaction across resolutions while preserving high‑frequency optical priors. Moreover, we propose a Physics‑Driven Thermal Conduction Module (PDTM) that incorporates two‑dimensional heat conduction into optical guidance, modeling spatially‑varying heat conduction properties to prevent inconsistent artifacts. In addition, we introduce a temperature consistency loss that enforces regional distribution consistency and boundary gradient smoothness to ensure generated thermal images align with real‑world thermal radiation principles. Extensive experiments on VGTSR2.0 and DroneVehicle datasets demonstrate that PCNet significantly outperforms state‑of‑the‑art methods on both reconstruction quality and downstream tasks including semantic segmentation and object detection.
Authors: Chris Webb, Mobin Habibpour, Mayamin Hamid Raha, Ali Reza Tavakkoli, Janice Coen, Fatemeh Afghah
Abstract: Wildfire monitoring demands autonomous systems capable of reasoning under extreme visual degradation, rapidly evolving physical dynamics, and scarce real‑world training data. Existing UAV navigation approaches rely on simplified simulators and supervised perception pipelines, and lack embodied agents interacting with physically realistic fire environments. We introduce FIRE‑VLM, the first end‑to‑end vision‑language model (VLM) guided reinforcement learning (RL) framework trained entirely within a high‑fidelity, physics‑grounded wildfire digital twin. Built from USGS Digital Elevation Model (DEM) terrain, LANDFIRE fuel inventories, and semi‑physical fire‑spread solvers, this twin captures terrain‑induced runs, wind‑driven acceleration, smoke plume occlusion, and dynamic fuel consumption. Within this environment, a PPO agent with dual‑view UAV sensing is guided by a CLIP‑style VLM. Wildfire‑specific semantic alignment scores, derived from a single prompt describing active fire and smoke plumes, are integrated as potential‑based reward shaping signals. Our contributions are: (1) a GIS‑to‑simulation pipeline for constructing wildfire digital twins; (2) a VLM‑guided RL agent for UAV firefront tracking; and (3) a wildfire‑aware reward design that combines physical terms with VLM semantics. Across five digital‑twin evaluation tasks, our VLM‑guided policy reduces time‑to‑detection by up to 6 times, increases time‑in‑FOV, and is, to our knowledge, the first RL‑based UAV wildfire monitoring system demonstrated in kilometer‑scale, physics‑grounded digital‑twin fires.
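The reward design described in this abstract can be sketched with the standard potential-based shaping term, F(s, s') = γΦ(s') − Φ(s), where the potential Φ is taken to be a VLM alignment score. In the sketch below, the random embeddings stand in for CLIP-style image and prompt features, and `alignment_score` and `shaped_reward` are hypothetical helper names; none of this is the authors' code.

```python
import numpy as np

def alignment_score(image_emb, text_emb):
    """Cosine similarity between an image embedding and a prompt embedding."""
    a = np.asarray(image_emb, dtype=float)
    b = np.asarray(text_emb, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def shaped_reward(env_reward, phi_s, phi_next, gamma=0.99):
    """Environment reward plus the potential-based shaping term gamma*phi(s') - phi(s)."""
    return env_reward + gamma * phi_next - phi_s

# Toy rollout: random vectors replace real VLM embeddings (assumption).
rng = np.random.default_rng(0)
prompt_emb = rng.standard_normal(16)
frame_embs = rng.standard_normal((6, 16))
phis = [alignment_score(f, prompt_emb) for f in frame_embs]

gamma = 0.99
env_rewards = [0.0] * 5  # zero task reward isolates the shaping contribution
shaped = [shaped_reward(r, phis[t], phis[t + 1], gamma) for t, r in enumerate(env_rewards)]

# The discounted shaping terms telescope to gamma^T * phi(s_T) - phi(s_0).
discounted_bonus = sum(gamma ** t * s for t, s in enumerate(shaped))
```

The telescoping sum is why this form of shaping leaves the optimal policy unchanged: over a whole trajectory it only adds a term that depends on the start and end states.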
Authors: Zongyang Lv, Yanmei Jia, Yongqing Liu, Alan F. Lynch, Qing Zhao, Yuhu Wu
Abstract: An unmanned aerial vehicle (UAV) with a slung load is a classic air transportation system. In practical applications, the suspension point of the slung load does not always align with the center of mass (CoM) of the UAV due to mission requirements or mechanical interference. This offset creates coupling in the system's nonlinear dynamics, which leads to a complicated motion control problem. In existing research, the system is modeled about the UAV's CoM. In this work, we use the point of suspension instead. Based on the new model, a cascade control strategy is developed. In the middle‑loop controller, the acceleration of the suspension point is used to regulate the swing angle of the slung load without the need to consider the coupling between the slung load and the UAV. An inner‑loop controller is designed to track the UAV's attitude without simplifying the coupling effects. We prove local exponential stability of the closed‑loop system using a Lyapunov approach. Finally, simulations and experiments are conducted to validate the proposed control system.
Authors: Chunhui Zhao, Xirui Kao, Yilin Lu, Yang Lyu
Abstract: Autonomous landing on mobile platforms is crucial for extending quadcopter operational flexibility, yet conventional methods are often too inefficient for highly dynamic scenarios. The core limitation lies in the prevalent "track‑then‑descend" paradigm, which treats the platform as a passive target and forces the quadcopter to perform complex, sequential maneuvers. This paper challenges that paradigm by introducing a bi‑directional cooperative landing framework that redefines the roles of the vehicle and the platform. The essential innovation is transforming the problem from a single‑agent tracking challenge into a coupled system optimization. Our key insight is that the mobile platform is not merely a target, but an active agent in the landing process. It proactively tilts its surface to create an optimal, stable terminal attitude for the approaching quadcopter. This active cooperation fundamentally breaks the sequential model by parallelizing the alignment and descent phases. Concurrently, the quadcopter's planning pipeline focuses on generating a time‑optimal and dynamically feasible trajectory that minimizes energy consumption. This bi‑directional coordination allows the system to execute the recovery in an agile manner, characterized by aggressive trajectory tracking and rapid state synchronization within transient windows. The framework's effectiveness, validated in dynamic scenarios, significantly improves the efficiency, precision, and robustness of autonomous quadrotor recovery in complex and time‑constrained missions.
Authors: Md. Asif Hossain, G M Mota-Tahrin Tayef, Nabil Subhan
Abstract: Manual inspection of solar panel systems is tedious, costly, and error‑prone, which makes Unmanned Aerial Vehicle (UAV)‑based monitoring desirable. Although deep learning models have excellent fault detection capabilities, almost all methods are either too large and heavy for edge computing devices or report biased accuracy estimates due to ineffective learning techniques. We propose a new solar panel fault detection model called HybridSolarNet, which integrates EfficientNet‑B0 with the Convolutional Block Attention Module (CBAM). We implemented it on the Kaggle Solar Panel Images competition dataset with a strict split‑before‑augmentation protocol, which avoids leakage in accuracy estimation. We also introduced focal loss and cosine annealing. Ablation analysis confirms that CBAM boosts accuracy (+1.53%) and that focal loss improves recognition of classes with imbalanced samples. In 5‑fold stratified cross‑validation on the competition dataset, the model reached an average accuracy of 92.37% +/‑ 0.41 and an F1‑score of 0.9226 +/‑ 0.39, while requiring merely 16.3 MB of storage, i.e., 32 times less than baselines like VGG19. Its inference speed of 54.9 FPS with GPU support makes it a strong candidate for real‑time UAV implementation. Moreover, Grad‑CAM visualizations illustrate that HybridSolarNet focuses on actual fault locations instead of irrelevant regions.
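The two training ingredients named in this abstract, focal loss and cosine annealing, can be written down in a few lines. The sketch below is a NumPy illustration under assumed hyperparameters (`gamma=2.0`, the learning-rate bounds, and the toy probabilities are all placeholders); it is not the HybridSolarNet training code.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Mean multi-class focal loss: -(1 - p_t)^gamma * log(p_t).

    probs:  (N, C) predicted class probabilities (rows sum to 1)
    labels: (N,) integer class indices
    alpha:  optional (C,) per-class weights for imbalanced data
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    p_t = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if alpha is not None:
        loss = np.asarray(alpha, dtype=float)[labels] * loss
    return float(loss.mean())

def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6):
    """Learning rate decayed from lr_max to lr_min along half a cosine period."""
    cos_term = 0.5 * (1.0 + np.cos(np.pi * step / total_steps))
    return float(lr_min + (lr_max - lr_min) * cos_term)

# Toy batch: two confident correct predictions and one hard example.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
ce = focal_loss(probs, labels, gamma=0.0)  # gamma = 0 reduces to cross-entropy
fl = focal_loss(probs, labels, gamma=2.0)  # easy examples are down-weighted
```

With `gamma > 0`, well-classified samples contribute little to the loss, so the gradient is dominated by hard or minority-class samples.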
Authors: Poorvi Joshi, Mohan Gurusamy
Abstract: In smart cities, bandwidth‑constrained Unmanned Aerial Vehicles (UAVs) often fail to relay mission‑critical data in time, compromising real‑time decision‑making. This highlights the need for faster and more efficient transmission of only the most relevant information. To address this, we propose the DSC‑UAV model, which leverages a context‑adaptive Digital Semantic Communication (DSC) framework. This model redefines aerial data transmission through three core components: prompt‑aware encoding, dynamic UAV‑enabled relaying, and user mobility‑optimized reinforcement learning. Ground users transmit context‑driven visual content. Images are encoded via a Vision Transformer combined with a prompt‑text encoder to generate semantic features based on the desired context (generic or object‑specific). These features are then quantized and transmitted over a UAV network that dynamically relays the data. Joint trajectory and resource allocation are optimized using a Truncated Quantile Critic (TQC)‑aided reinforcement learning technique, which offers greater stability and precision than standard SAC and TD3 due to its resistance to overestimation bias. Simulations demonstrate significant performance improvement, with up to a 22% gain in semantic‑structural similarity and a 14% reduction in Age of Information (AoI) compared to digital and prior UAV‑semantic communication baselines. By integrating mobility control with context‑driven visual abstraction, DSC‑UAV advances resilient, information‑centric surveillance for next‑generation UAV networks in bandwidth‑constrained environments.
Authors: Adari Rama Sukanya, Puvvula Roopesh Naga Sri Sai, Kota Moses, Rimalapudi Sarvendranath
Abstract: We present a large‑scale unmanned aerial vehicle (UAV)‑based RGB and multispectral image dataset collected over paddy fields in the Vijayawada region, Andhra Pradesh, India, covering nursery to harvesting stages. We used a 20‑megapixel RGB camera and a 5‑megapixel four‑band multispectral camera capturing red, green, red‑edge, and near‑infrared bands. A standardised operating procedure (SOP) and checklists were developed to ensure repeatable data acquisition. Our dataset comprises 42,430 raw images (415 GB) captured over 5 acres with 1 cm/pixel ground sampling distance (GSD) with associated metadata such as GPS coordinates, flight altitude, and environmental conditions. Captured images were validated using Pix4D Fields to generate orthomosaic maps and vegetation index maps, such as normalised difference vegetation index (NDVI) and normalised difference red‑edge (NDRE) index. Our dataset is one of the few datasets that provide high‑resolution images with rich metadata that cover all growth stages of Indian paddy crops. The dataset is available on IEEE DataPort with DOI, . It can support studies on targeted spraying, disease analysis, and yield estimation.
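The two vegetation indices mentioned in this abstract follow the usual normalised-difference form: NDVI = (NIR − Red) / (NIR + Red) and NDRE = (NIR − RedEdge) / (NIR + RedEdge). The sketch below computes both from small made-up reflectance arrays; the array values and the `normalized_difference` helper are illustrative and unrelated to the dataset's Pix4D Fields workflow.

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); pixels with a zero denominator become NaN."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

# Toy 2x2 reflectance tiles (assumed values, not taken from the dataset).
nir = np.array([[0.5, 0.6], [0.0, 0.4]])
red = np.array([[0.1, 0.2], [0.0, 0.4]])
red_edge = np.array([[0.3, 0.3], [0.0, 0.2]])

ndvi = normalized_difference(nir, red)       # vegetation vigour
ndre = normalized_difference(nir, red_edge)  # more sensitive in dense canopies
```

Both indices lie in [-1, 1] for non-negative reflectances, which makes maps from different flight dates directly comparable.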
Authors: Eslam Eldeeb, Hirley Alves
Abstract: The next‑generation wireless technologies, including beyond 5G and 6G networks, are paving the way for transformative applications such as vehicle platooning, smart cities, and remote surgery. These innovations are driven by a vast array of interconnected wireless entities, including IoT devices, access points, UAVs, and CAVs, which increase network complexity and demand more advanced decision‑making algorithms. Artificial intelligence (AI) and machine learning (ML), especially reinforcement learning (RL), are key enablers for such networks, providing solutions to high‑dimensional and complex challenges. However, as networks expand to multi‑agent environments, traditional online RL approaches face cost, safety, and scalability limitations. Offline multi‑agent reinforcement learning (MARL) offers a promising solution by utilizing pre‑collected data, reducing the need for real‑time interaction. This article introduces a novel offline MARL algorithm based on conservative Q‑learning (CQL), ensuring safe and efficient training. We extend this with meta‑learning to address dynamic environments and validate the approach through use cases in radio resource management and UAV networks. Our work highlights offline MARL's advantages, limitations, and future directions in wireless applications.
Authors: Aly Sabri Abdalla, Vuk Marojevic
Abstract: Despite the growing interest in low‑altitude economy (LAE) applications, including UAV‑based logistics and emergency response, fundamental challenges remain in orchestrating such missions over complex, signal‑constrained environments. These include the absence of real‑time, resilient, and context‑aware orchestration of aerial nodes with limited integration of artificial intelligence (AI) specialized for LAE missions. This paper introduces an open radio access network (O‑RAN)‑enabled LAE framework that leverages seamless coordination between the disaggregated RAN architecture, open interfaces, and RAN intelligent controllers (RICs) to facilitate closed‑loop, AI‑optimized, and mission‑critical LAE operations. We evaluate the feasibility and performance of the proposed architecture via a semantic‑aware rApp that acts as a terrain interpreter, offering semantic guidance to a reinforcement learning‑enabled xApp, which performs real‑time trajectory planning for LAE swarm nodes. We survey the capabilities of UAV testbeds that can be leveraged for LAE research, and present critical research challenges and standardization needs.
Authors: Julia Di, Kenneth A. W. Hoffmann, Tony G. Chen, Tian-Ao Ren, Mark R. Cutkosky
Abstract: Perching allows unmanned aerial vehicles (UAVs) to reduce energy consumption, remain anchored for surface sampling operations, or stably survey their surroundings. Previous efforts for perching on vertical surfaces have predominantly focused on lightweight mechanical design solutions with relatively scant system‑level integration. Furthermore, perching strategies for vertical surfaces commonly require high‑speed, aggressive landing operations that are dangerous for a surveyor drone with sensitive electronics onboard. This work presents a preliminary investigation of a perching approach suitable for larger drones that both gently perches on vertical tree trunks and reacts to and recovers from perch failures. The system in this work, called SLAP, consists of a vision‑based perch site detector, an IMU (inertial‑measurement‑unit)‑based perch failure detector, an attitude controller for soft perching, an optical close‑range detection system, and a fast active elastic gripper with microspines made from commercially available slapbands. We validated this approach on a modified 1.2 kg commercial quadrotor with component and system analysis. Initial human‑in‑the‑loop autonomous indoor flight experiments achieved a 75% perch success rate on a real oak tree segment across 20 flights, and 100% perch failure recovery across 2 flights with induced failures.
Authors: Anas K. Saeed, Mahmoud M. Salim, Ali Arshad Nasir, Ali H. Muqaibel
Abstract: Reconfigurable intelligent surfaces (RISs) mounted on unmanned aerial vehicles (UAVs) can reshape wireless propagation on‑demand. However, their performance is sensitive to UAV jitter and cascaded channel uncertainty. This paper investigates a downlink multiple‑input single‑output UAV‑mounted RIS system in which a ground multiple‑antenna base station (BS) serves multiple single‑antenna users under practical impairments. Our goal is to maximize the expected throughput under stochastic three‑dimensional UAV jitter and imperfect cascaded channel state information (CSI) based only on the available channel estimates. This leads to a stochastic nonconvex optimization problem subject to a BS transmit power constraint and strict unit‑modulus constraints on all RIS elements. To address this problem, we design a model‑free deep reinforcement learning (DRL) framework with a contextual bandit formulation. A differentiable feasibility layer is utilized to map continuous actions to feasible solutions, while the reward is a Monte Carlo estimate of the expected throughput. We instantiate this framework with constrained variants of deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3) that do not use target networks. Simulations show that the proposed algorithms yield higher throughput than conventional alternating optimization‑based weighted minimum mean‑square error (AO‑WMMSE) baselines under severe jitter and low CSI quality. Across different scenarios, the proposed methods achieve performance that is either comparable to or slightly below the AO‑WMMSE benchmark, based on sample average approximation (SAA) with a relative gap ranging from 0‑12%. Moreover, the proposed DRL controllers achieve online inference times of 0.6 ms per decision versus roughly 370‑550 ms for AO‑WMMSE solvers.
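One way to picture the feasibility layer described in this abstract is as a map from unconstrained actor outputs to variables that satisfy the two stated constraints: unit-modulus RIS coefficients and a BS transmit power budget. The sketch below is a plain NumPy illustration of such a mapping only. The function name, the phase parameterisation, the rescaling rule, and the dimensions are assumptions, and the paper's actual layer may differ.

```python
import numpy as np

def feasibility_layer(raw_phase, raw_beamformer, p_max):
    """Map unconstrained actions to feasible RIS coefficients and BS beamformers.

    raw_phase:      (N,) real values, interpreted as RIS phase shifts
    raw_beamformer: (M, K) complex matrix of unconstrained beamforming weights
    p_max:          BS transmit power budget
    """
    # exp(j*theta) has modulus exactly 1 for any real theta.
    ris = np.exp(1j * np.asarray(raw_phase, dtype=float))

    # Rescale the beamformer only if it exceeds the power budget.
    w = np.asarray(raw_beamformer, dtype=complex)
    power = float(np.sum(np.abs(w) ** 2))
    scale = min(1.0, np.sqrt(p_max / power)) if power > 0.0 else 1.0
    return ris, w * scale

# Toy dimensions (assumed): 16 RIS elements, 4 BS antennas, 2 users.
rng = np.random.default_rng(0)
raw_phase = rng.standard_normal(16)
raw_w = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
ris, w = feasibility_layer(raw_phase, raw_w, p_max=1.0)
```

Handling the constraints inside the action mapping means the learning agent never has to be penalised for infeasible actions, since every output it produces is feasible by construction.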
Authors: Yixian Wang, Geng Sun, Zemin Sun, Jiacheng Wang, Changyuan Zhao, Daxin Tian, Dusit Niyato, Shiwen Mao
Abstract: In this paper, we propose an intelligent reflecting surface (IRS)‑enabled low‑altitude multi‑access edge computing (MEC) architecture, where an aerial MEC server cooperates with a terrestrial MEC server to provide computing services, while hybrid IRSs (i.e., building‑installed and UAV‑carried IRSs) are deployed to enhance the air‑ground connectivity under blockage. Based on this architecture, we formulate a multi‑objective optimization problem (MOOP) to minimize the task completion delay and energy consumption by jointly optimizing task offloading, UAV trajectory control, IRS phase‑shift configuration, and computation resource allocation. The considered problem is NP‑hard, and thus we propose a hierarchical online optimization approach (HOOA) to efficiently solve the problem. Specifically, we reformulate the MOOP as a Stackelberg game, where MEC servers collectively act as the leader to determine the system‑level decisions, while the vehicles act as followers to make individual decisions. At the follower level, we present a many‑to‑one matching mechanism to generate feasible discrete decisions. At the leader level, we propose a generative diffusion model‑enhanced twin delayed deep deterministic policy gradient (GDMTD3) algorithm integrated with a Karush‑Kuhn‑Tucker (KKT)‑based method, which is a deep reinforcement learning (DRL)‑based approach, to determine the continuous decisions. Simulation results demonstrate that the proposed HOOA achieves significant improvements, which reduces average task completion delay by 2.5% and average energy consumption by 3.1% compared with the best‑performing benchmark approach and state‑of‑the‑art DRL algorithm, respectively. Moreover, the proposed HOOA exhibits superior convergence stability while maintaining strong robustness and scalability in dynamic environments.
Authors: Qingyu Xu, Runtong Zhang, Zihuan Qiu, Fanman Meng
Abstract: Object detection in fire rescue scenarios is important for command and decision‑making in firefighting operations. However, existing research still suffers from two main limitations. First, current work predominantly focuses on environments such as mountainous or forest areas, while paying insufficient attention to urban rescue scenes, which are more frequent and structurally complex. Second, existing detection systems include a limited number of classes, such as flames and smoke, and lack a comprehensive system covering key targets crucial for command decisions, such as fire trucks and firefighters. To address the above issues, this paper first constructs a new dataset named "FireRescue" for rescue command, which covers multiple rescue scenarios, including urban, mountainous, forest, and water areas, and contains eight key categories such as fire trucks and firefighters, with a total of 15,980 images and 32,000 bounding boxes. Secondly, to tackle the problems of inter‑class confusion and missed detection of small targets caused by chaotic scenes, diverse targets, and long‑distance shooting, this paper proposes an improved model named FRS‑YOLO. On the one hand, the model introduces a plug‑and‑play multidimensional collaborative enhancement attention module, which enhances the discriminative representation of easily confused categories (e.g., fire trucks vs. ordinary trucks) through cross‑dimensional feature interaction. On the other hand, it integrates a dynamic feature sampler to strengthen high‑response foreground features, thereby mitigating the effects of smoke occlusion and background interference. Experimental results demonstrate that object detection in fire rescue scenarios is highly challenging, and the proposed method effectively improves the detection performance of YOLO series models in this context.
Authors: Nhut Le, Maryam Rahnemoonfar
Abstract: The increasing frequency of natural disasters poses severe threats to human lives and leads to substantial economic losses. While 3D semantic segmentation is crucial for post‑disaster assessment, existing deep learning models lack datasets specifically designed for post‑disaster environments. To address this gap, we constructed a specialized 3D dataset using unmanned aerial vehicle (UAV)‑captured aerial footage of areas affected by Hurricane Ian (2022), employing Structure‑from‑Motion (SfM) and Multi‑View Stereo (MVS) techniques to reconstruct 3D point clouds. We evaluated state‑of‑the‑art (SOTA) 3D semantic segmentation models, namely Fast Point Transformer (FPT), Point Transformer v3 (PTv3), and OA‑CNNs, on this dataset, exposing significant limitations in existing methods for disaster‑stricken regions. These findings underscore the urgent need for advancements in 3D segmentation techniques and the development of specialized 3D benchmark datasets to improve post‑disaster scene understanding and response.
Authors: Haojin Li, Anbang Zhang, Chen Sun, Chenyuan Feng, Kaiqian Qu, Tony Q. S. Quek, Haijun Zhang
Abstract: The low‑altitude economy (LAE) is rapidly expanding driven by urban air mobility, logistics drones, and aerial sensing, while fast and accurate beam prediction in uncrewed aerial vehicles (UAVs) communications is crucial for achieving reliable connectivity. Current research is shifting from single‑signal to multi‑modal collaborative approaches. However, existing multi‑modal methods mostly employ fixed or empirical weights, assuming equal reliability across modalities at any given moment. Indeed, the importance of different modalities fluctuates dramatically with UAV motion scenarios, and static weighting amplifies the negative impact of degraded modalities. Furthermore, modal mismatch and weak alignment further undermine cross‑scenario generalization. To this end, we propose a reliability‑aware dynamic weighting scheme applied to a semantic‑aware multi‑modal beam prediction framework, named SaM2B. Specifically, SaM2B leverages lightweight cues such as environmental visual, flight posture, and geospatial data to adaptively allocate contributions across modalities at different time points through reliability‑aware dynamic weight updates. Moreover, by utilizing cross‑modal contrastive learning, we align the "multi‑source representation beam semantics" associated with specific beam information to a shared semantic space, thereby enhancing discriminative power and robustness under modal noise and distribution shifts. Experiments on real‑world low‑altitude UAV datasets show that SaM2B achieves more satisfactory results than baseline methods.
Authors: Han Zhen Li, Yu Hu, Lai Zhang, Hong Bo Sun, Xu Chao Zhang
Abstract: Cycloidal propellers are known for their omnidirectional vectored thrust, enabling smooth transitions between hovering and forward flight, making them ideal for unmanned aerial vehicles (UAVs) and electric vertical take‑off and landing (eVTOL) aircraft. However, cycloidal propellers tend to have lower hovering efficiency compared to screw propellers. Adding end plates to the blade tips can enhance hovering efficiency by reducing blade tip vortices. However, the impact of these end plates and the optimal design for cycloidal propellers incorporating them have not been thoroughly studied. This paper seeks to optimize hovering efficiency and develop design theories for cycloidal propellers with end plates. Extensive force measurement experiments are conducted to identify designs with optimal hovering efficiency. The sliding mesh technique is employed to solve the unsteady Reynolds‑averaged Navier‑Stokes (URANS) equations for a detailed analysis. Experimental results indicate that the designs with end plates generally achieve significantly better hovering efficiency than those without end plates. End plates help to maintain hovering efficiency even when the blade aspect ratio is as small as 1.5. The designs with stationary end plates are superior to those with rotating end plates because rotation introduces additional torque caused by the friction force. Designs featuring thick end plates outperform those with thin end plates, as the rounded edges can eliminate end plate vortices. The best design features stationary thick end plates, a chord‑to‑radius ratio of 0.65, and a large pitching amplitude of 40 degrees. It achieves a hovering efficiency of 0.72 with a blade aspect ratio of 3, which is comparable to that of helicopters. In contrast, for the cases without end plates, the highest hovering efficiency is merely 0.54.
Authors: Fuqiang Gu, Jiangshan Ai, Xu Lu, Xianlei Long, Yan Li, Tao Jiang, Chao Chen, Huidong Liu
Abstract: Unmanned Aerial Vehicles (UAVs) play an important role in various applications, where precise trajectory tracking is crucial. However, conventional control algorithms for trajectory tracking often exhibit limited performance due to the underactuated, nonlinear, and highly coupled dynamics of quadrotor systems. To address these challenges, we propose HBO‑PID, a novel control algorithm that integrates the Heteroscedastic Bayesian Optimization (HBO) framework with the classical PID controller to achieve accurate and robust trajectory tracking. By explicitly modeling input‑dependent noise variance, the proposed method can better adapt to dynamic and complex environments, and therefore improve the accuracy and robustness of trajectory tracking. To accelerate the convergence of optimization, we adopt a two‑stage optimization strategy that allows us to find the optimal controller parameters more efficiently. Through experiments in both simulation and real‑world scenarios, we demonstrate that the proposed method significantly outperforms state‑of‑the‑art (SOTA) methods. Compared to SOTA methods, it improves the position accuracy by 24.7% to 42.9%, and the angular accuracy by 40.9% to 78.4%.
Authors: Zonghan Li, Tianwen Tao, Rao Fu, Liang Wang, Dongyuan Zhang, Quan Quan
Abstract: Simulation and testing in the field of low‑altitude unmanned aerial vehicle (UAV) traffic pose significant challenges due to the high costs associated with large‑scale UAV testing and the complexity of establishing low‑altitude traffic test scenarios. Stringent safety requirements make high fidelity one of the key metrics for simulation platforms. Despite advancements in simulation platforms for low‑altitude UAVs, there is still a shortage of platforms that feature rich traffic scenarios, high‑precision UAV and scenario simulators, and comprehensive testing capabilities for low‑altitude traffic. Therefore, this paper introduces an integrated high‑fidelity simulation platform for low‑altitude UAV traffic. This platform simulates all components of the UAV traffic network, including the control system, the traffic management system, the UAV system, the communication network, the anomaly and fault modules, etc. Furthermore, it integrates RflySim/AirSim and Unreal Engine 5 to develop full‑state models of UAVs and 3D maps that model the real world using the oblique photogrammetry technique. Additionally, the platform offers a wide range of interfaces, and all models and scenarios can be customized with a high degree of flexibility. The platform's source code has been released, making it easier to conduct research related to low‑altitude traffic.
Authors: Liangtao Feng, Zhenchang Liu, Feng Zhang, Xuefeng Ren
Abstract: This paper introduces SHIELD, a Spherical‑Projection Hybrid‑Frontier Integration for Efficient LiDAR‑based Drone exploration method. Although LiDAR offers the advantage of a wide field of view, its application in UAV exploration still faces several challenges. The observation quality of LiDAR point clouds is generally inferior to that of depth cameras. Traditional frontier methods based on known and unknown regions impose a heavy computational burden, especially when handling the wide field of view of LiDAR. In addition, regions without point clouds are difficult to classify as free space through ray‑casting. To address these problems, SHIELD is proposed. It maintains an observation‑quality occupancy map and performs ray‑casting on this map to address the issue of inconsistent point‑cloud quality during exploration. A hybrid frontier method is used to tackle both the computational burden and the limitations of point‑cloud quality exploration. In addition, an outward spherical‑projection ray‑casting strategy is proposed to jointly ensure flight safety and exploration efficiency in open areas. Simulations and flight experiments demonstrate the effectiveness of SHIELD. This work will be open‑sourced to contribute to the research community.
Authors: Zijian Ling, Man Zhou, Hongda Zhai, Yating Huang, Lingchen Zhao, Qi Li, Chao Shen, Qian Wang
Abstract: In recent years, drone delivery, which utilizes unmanned aerial vehicles (UAVs) for package delivery and pickup, has gradually emerged as a crucial method in logistics. Since delivery drones are expensive and may carry valuable packages, they must maintain a safe distance from individuals until user‑drone mutual authentication is confirmed. Although numerous authentication schemes have been developed, existing solutions are limited in authentication distance and lack resilience against sophisticated attacks. To this end, we introduce SyncGait, an implicit gait‑based mutual authentication system for drone delivery. SyncGait leverages the user's unique arm swing as they walk toward the drone to achieve mutual authentication without requiring additional hardware or specific authentication actions. We conducted extensive experiments on 14 datasets collected from 31 subjects. The results demonstrate that SyncGait achieves an average accuracy of 99.84% at a long distance (>18 m) and exhibits strong resilience against various spoofing attacks, making it a robust, secure, and user‑friendly solution in real‑world scenarios.
Authors: Shiqi Dai, Zizhi Ma, Zhicong Luo, Xuesong Yang, Yibin Huang, Wanyue Zhang, Chi Chen, Zonghao Guo, Wang Xu, Yufei Sun, Maosong Sun
Abstract: While Multimodal Large Language Models (MLLMs) have exhibited remarkable general intelligence across diverse domains, their potential in low‑altitude applications dominated by Unmanned Aerial Vehicles (UAVs) remains largely underexplored. Existing MLLM benchmarks rarely cover the unique challenges of low‑altitude scenarios, while UAV‑related evaluations mainly focus on specific tasks such as localization or navigation, without a unified evaluation of MLLMs' general intelligence. To bridge this gap, we present MM‑UAVBench, a comprehensive benchmark that systematically evaluates MLLMs across three core capability dimensions (perception, cognition, and planning) in low‑altitude UAV scenarios. MM‑UAVBench comprises 19 sub‑tasks with over 5.7K manually annotated questions, all derived from real‑world UAV data collected from public datasets. Extensive experiments on 16 open‑source and proprietary MLLMs reveal that current models struggle to adapt to the complex visual and cognitive demands of low‑altitude scenarios. Our analyses further uncover critical bottlenecks such as spatial bias and multi‑view understanding that hinder the effective deployment of MLLMs in UAV scenarios. We hope MM‑UAVBench will foster future research on robust and reliable MLLMs for real‑world UAV intelligence.
Authors: Soham Dutta, Soham Banerjee, Sneha Mahata, Anindya Sen, Sayantani Datta
Abstract: Apple orchards require timely disease detection, fruit quality assessment, and yield estimation, yet existing UAV‑based systems address such tasks in isolation and often rely on costly multispectral sensors. This paper presents a unified, low‑cost, RGB‑only UAV‑based orchard intelligence pipeline integrating ResNet50 for leaf disease detection, VGG16 for apple freshness determination, and YOLOv8 for real‑time apple detection and localization. The system runs on an ESP32‑CAM and Raspberry Pi, providing fully offline on‑site inference without cloud support. Experiments demonstrate 98.9% accuracy for leaf disease classification, 97.4% accuracy for freshness classification, and 0.857 F1 score for apple detection. The framework provides an accessible and scalable alternative to multispectral UAV solutions, supporting practical precision agriculture on affordable hardware.
Authors: Minh Bui, Simon Monckton, Mo Chen
Abstract: Reach‑avoid (RA) games have significant applications in security and defense, particularly for unmanned aerial vehicles (UAVs). These problems are inherently challenging due to the need to consider obstacles, handle the adversarial nature of opponents, ensure optimality, and account for nonlinear dynamics. Hamilton‑Jacobi (HJ) reachability analysis has emerged as a powerful tool for tackling these challenges; however, while it has been applied to games involving two spatial dimensions, directly extending this approach to three spatial dimensions is impossible due to high dimensionality. On the other hand, alternative approaches for solving RA games lack the generality to consider games with three spatial dimensions involving agents with non‑trivial system dynamics. In this work, we propose a novel framework for dimensionality reduction by decomposing the problem into a horizontal RA sub‑game and a vertical RA sub‑game. We then solve each sub‑game using HJ reachability analysis and consider second‑order dynamics that account for the defender's acceleration. To reconstruct the solution to the original RA game from the sub‑games, we introduce an HJ‑based tracking control algorithm in each sub‑game that guarantees not only capture of the attacker but also tracking of the attacker thereafter. We prove the conditions under which the capture guarantees are maintained. The effectiveness of our approach is demonstrated via numerical simulations, showing that the decomposition maintains optimality and guarantees in the original problem. Our methods are also validated in a Gazebo physics simulator, achieving, to the best of our knowledge, the first successful capture of quadrotors in three spatial dimensions.
Authors: Max Beffert, Andreas Zell
Abstract: One of the main limitations of multirotor UAVs is their short flight time due to battery constraints. A practical solution for continuous operation is to power the drone from the ground via a tether. While this approach has been demonstrated for stationary systems, scenarios with a fast‑moving base vehicle or strong wind conditions require modeling the tether forces, including aerodynamic effects. In this work, we propose two complementary approaches for low‑latency quasi‑static tether modeling with aerodynamics. The first is an analytical method based on catenary theory with a uniform drag assumption, achieving very fast solve times below 1 ms. The second is a numerical method that discretizes the tether into segments and lumped masses, solving the equilibrium equations using CasADi and IPOPT. By leveraging initialization strategies, such as warm starting and analytical initialization, low‑latency performance was achieved with a solve time of 5 ms, while allowing for flexible force formulations. Both approaches were validated in real‑world tests using a load cell to measure the tether force. The results show that the analytical method provides sufficient accuracy for most tethered UAV applications with minimal computational cost, while the numerical method offers higher flexibility and physical accuracy when required. These approaches form a lightweight and extensible framework for low‑latency tether simulation, applicable to both offline optimization and online tasks such as simulation, control, and trajectory planning.
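For intuition about the analytical approach mentioned above, the classical drag‑free catenary can be evaluated in closed form. The following is only a minimal sketch and not the authors' model: it omits the uniform drag term entirely, assumes the tether has a horizontal tangent at its lowest point, and the function name, parameter names, and numeric values are hypothetical.

```python
import math

def catenary_state(horizontal_tension, weight_per_length, x):
    """Drag-free catenary with its lowest point (horizontal tangent) at x = 0.

    horizontal_tension: constant horizontal tension component H [N] (assumed known)
    weight_per_length:  tether weight per unit length w [N/m]
    x:                  horizontal distance from the lowest point [m]
    Returns (height above lowest point [m], arc length [m], tension magnitude [N]).
    """
    a = horizontal_tension / weight_per_length      # catenary parameter [m]
    height = a * (math.cosh(x / a) - 1.0)           # tether shape
    arc_length = a * math.sinh(x / a)               # tether length from x = 0
    tension = horizontal_tension * math.cosh(x / a) # tension magnitude at x
    return height, arc_length, tension

# Hypothetical numbers: H = 2 N, w = 0.5 N/m, drone 10 m away horizontally.
height, arc_length, tension = catenary_state(2.0, 0.5, 10.0)
```

In this idealized case the vertical tension component equals the weight of the suspended tether length, so the tension magnitude satisfies T^2 = H^2 + (w s)^2. Aerodynamic loads, which the abstract treats as essential for fast‑moving bases and strong wind, would change both the shape and the force.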
Authors: Ndagijimana Cyprien, Mehdi Sookhak, Hosein Zarini, Chandra N Sekharan, Mohammed Atiquzzaman
Abstract: Joint deployment of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) has been shown to be an effective method to establish communications in areas affected by disasters. However, ensuring good Quality of Service (QoS) while using as few UAVs as possible also requires optimal positioning and trajectory planning for UAVs and UGVs. This paper proposes a joint positioning and trajectory planning framework for UAV and UGV deployment that guarantees optimal QoS for ground users. To model the UGVs' mobility, we introduce a road graph, which directs their movement along valid road segments and adheres to the road network constraints. To solve the sum rate optimization problem, we reformulate the problem as a Markov Decision Process (MDP) and propose a novel Asynchronous Advantage Actor‑Critic (A3C) algorithm combined with meta‑learning for rapid adaptation to new environments and dynamic conditions. Numerical results demonstrate that our proposed Meta‑A3C approach outperforms A3C and DDPG, delivering 13.1% higher throughput and 49% faster execution while meeting the QoS requirements.
Authors: Wen Jiang, Li Wang, Kangyao Huang, Wei Fan, Jinyuan Liu, Shaoyu Liu, Hongwei Duan, Bin Xu, Xiangyang Ji
Abstract: Unmanned aerial vehicles (UAVs) are crucial tools for post‑disaster search and rescue, facing challenges such as high information density, rapid changes in viewpoint, and dynamic structures, especially in long‑horizon navigation. However, current UAV vision‑and‑language navigation (VLN) methods struggle to model long‑horizon spatiotemporal context in complex environments, resulting in inaccurate semantic alignment and unstable path planning. To this end, we propose LongFly, a spatiotemporal context modeling framework for long‑horizon UAV VLN. LongFly adopts a history‑aware spatiotemporal modeling strategy that transforms fragmented and redundant historical data into structured, compact, and expressive representations. First, we propose the slot‑based historical image compression module, which dynamically distills multi‑view historical observations into fixed‑length contextual representations. Then, the spatiotemporal trajectory encoding module is introduced to capture the temporal dynamics and spatial structure of UAV trajectories. Finally, to integrate existing spatiotemporal context with current observations, we design the prompt‑guided multimodal integration module to support time‑based reasoning and robust waypoint prediction. Experimental results demonstrate that LongFly outperforms state‑of‑the‑art UAV VLN baselines by 7.89% in success rate and 6.33% in success weighted by path length, consistently across both seen and unseen environments.
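The abstract reports "success weighted by path length" without defining it. The sketch below uses the definition commonly adopted in embodied navigation evaluation, in which each successful episode contributes the ratio of shortest to actual path length; whether LongFly computes the metric in exactly this way is an assumption, and the function name and example episodes are hypothetical.

```python
def success_weighted_path_length(episodes):
    """Average of success * shortest / max(actual, shortest) over episodes.

    episodes: iterable of (success, shortest_path_length, actual_path_length),
              with shortest_path_length assumed to be strictly positive.
    """
    episodes = list(episodes)
    if not episodes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for success, shortest, actual in episodes:
        if shortest <= 0:
            raise ValueError("shortest path length must be positive")
        if success:
            # A detour (actual > shortest) is penalized; failures add nothing.
            total += shortest / max(actual, shortest)
    return total / len(episodes)

# Hypothetical episodes: optimal success, success with a 2x detour, failure.
example = [(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 10.0, 5.0)]
spl = success_weighted_path_length(example)
```

Under this definition the metric can never exceed the plain success rate, which is why the two numbers are usually reported together.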
Authors: Ali Lotfi, Adam Carter, Thuan Ha, Mohammad Meysami, Kwabena Nketia, Steve Shirtliffe
Abstract: Spectral indices such as NDVI have driven vegetation monitoring for decades, yet their design remains largely manual and ad hoc. Their usefulness stems not only from their empirical performance, but also from algebraic forms that remain compact and biologically interpretable. However, the space of possible algebraic expressions relating spectral bands is effectively infinite, making systematic search impractical without structural constraints. We introduce the Spectral Feature Polynomial (SFP) framework, a general pipeline that automatically discovers compact, interpretable spectral indices from labeled multispectral imagery. SFP constructs a library of ratio‑based spectral features that inherit illumination invariance by construction. It then applies cross‑validated feature selection and continuous coefficient optimization to produce a single closed‑form equation per task, transparent to domain experts and deployable on any remote sensing platform without requiring standardization statistics. We validate the framework on two agricultural applications. For Kochia (Bassia scoparia) detection in Sentinel‑2 imagery near Lucky Lake, Saskatchewan, over three growing seasons, the same two‑term equation emerged in 44 of 46 independent cross‑validation folds, achieving 98.6% mean accuracy, more than 4 percentage points above the best established index under year‑held‑out evaluation. For wheat plant classification from UAV multispectral imagery, stage‑specific indices achieved 99.5%, 97.2%, and 93.5% across three growth stages, compared to 78% or below for the best established index at late season, when NIR‑based contrasts lose discriminatory power as wheat senesces. In both applications, SFP yielded a single transparent equation that generalized across held‑out regions and outperformed established indices.
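The claim that ratio‑based features inherit illumination invariance can be illustrated with NDVI itself. The sketch below is not part of the SFP library; it assumes that a change in illumination scales every band by the same multiplicative factor, and the reflectance values and the scaling factor are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """Normalized-difference index (a - b) / (a + b); NDVI uses a = NIR, b = red."""
    return (band_a - band_b) / (band_a + band_b)

# Hypothetical reflectances of a vegetated pixel.
nir, red = 0.45, 0.09
ndvi = normalized_difference(nir, red)

# Same pixel under dimmer light, modeled (as an assumption) as a common
# multiplicative factor applied to both bands.
scale = 0.6
ndvi_dim = normalized_difference(scale * nir, scale * red)

# The factor cancels between numerator and denominator, so the index is unchanged,
# whereas a plain band difference would shrink by the same factor.
difference_bright = nir - red
difference_dim = scale * nir - scale * red
```

Because the common factor cancels algebraically, no per‑scene standardization is needed for such features, which is consistent with the abstract's statement that the resulting equations deploy without standardization statistics.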
Authors: Weichen Zhang, Peizhi Tang, Xin Zeng, Fanhang Man, Shiquan Yu, Zichao Dai, Baining Zhao, Hongjin Chen, Yu Shang, Wei Wu, Chen Gao, Xinlei Chen, Xin Wang, Yong Li, Wenwu Zhu
Abstract: Unmanned aerial vehicles (UAVs) have emerged as powerful embodied agents. One of the core abilities is autonomous navigation in large‑scale three‑dimensional environments. Existing navigation policies, however, are typically optimized for low‑level objectives such as obstacle avoidance and trajectory smoothness, lacking the ability to incorporate high‑level semantics into planning. To bridge this gap, we propose ANWM, an aerial navigation world model that predicts future visual observations conditioned on past frames and actions, thereby enabling agents to rank candidate trajectories by their semantic plausibility and navigational utility. ANWM is trained on 4‑DoF UAV trajectories and introduces a physics‑inspired module: Future Frame Projection (FFP), which projects past frames into future viewpoints to provide coarse geometric priors. This module mitigates representational uncertainty in long‑distance visual generation and captures the mapping between 3D trajectories and egocentric observations. Empirical results demonstrate that ANWM significantly outperforms existing world models in long‑distance visual forecasting and improves UAV navigation success rates in large‑scale environments.
Authors: Zhan Chen, Zile Guo, Enze Zhu, Peirong Zhang, Xiaoxuan Liu, Lei Wang, Yidan Zhang
Abstract: Video prediction is plagued by a fundamental trilemma: achieving high‑resolution and perceptual quality typically comes at the cost of real‑time speed, hindering its use in latency‑critical applications. This challenge is most acute for autonomous UAVs in dense urban environments, where foreseeing events from high‑resolution imagery is non‑negotiable for safety. Existing methods, reliant on iterative generation (diffusion, autoregressive models) or quadratic‑complexity attention, fail to meet these stringent demands on edge hardware. To break this long‑standing trade‑off, we introduce RAPTOR, a video prediction architecture that achieves real‑time, high‑resolution performance. RAPTOR's single‑pass design avoids the error accumulation and latency of iterative approaches. Its core innovation is Efficient Video Attention (EVA), a novel translator module that factorizes spatiotemporal modeling. Instead of processing flattened spacetime tokens with O((ST)^2) or O(ST) complexity, EVA alternates operations along the spatial (S) and temporal (T) axes. This factorization reduces the time complexity to O(S + T) and memory complexity to O(max(S, T)), enabling global context modeling at 512^2 resolution and beyond, operating directly on dense feature maps with a patch‑free design. Complementing this architecture is a 3‑stage training curriculum that progressively refines predictions from coarse structure to sharp, temporally coherent details. Experiments show RAPTOR is the first predictor to exceed 30 FPS on a Jetson AGX Orin for 512^2 video, setting a new state‑of‑the‑art on UAVid, KTH, and a custom high‑resolution dataset in PSNR, SSIM, and LPIPS. Critically, RAPTOR boosts the mission success rate in a real‑world UAV navigation task by 18%, paving the way for safer and more anticipatory embodied agents.
Authors: Ammar El Falou
Abstract: The integration of non‑terrestrial networks (NTNs) into 6G systems is crucial for achieving seamless global coverage, particularly in underserved and disaster‑prone regions. Among NTN platforms, unmanned aerial vehicles (UAVs) are especially promising due to their rapid deployability. However, this shift from fixed, wired base stations (BSs) to mobile, wireless, energy‑constrained UAV‑BSs introduces unique security challenges. Their central role in emergency communications makes them attractive candidates for emergency alert spoofing. Their limited computing and energy resources make them more vulnerable to denial‑of‑service (DoS) attacks, and their dependence on wireless backhaul links and GNSS navigation exposes them to jamming, interception, and spoofing. Furthermore, UAV mobility opens new attack vectors such as malicious handover manipulation. This paper identifies several attack surfaces of UAV‑BS systems and outlines principles for mitigating their threats.
Authors: Siddhartha Upadhyay, Ratnangshu Das, Pushpak Jagtap
Abstract: In this work, we extend the Spatiotemporal Tube (STT) framework to address Probabilistic Temporal Reach‑Avoid‑Stay (PrT‑RAS) tasks in dynamic environments with uncertain obstacles. We develop a real‑time tube synthesis procedure that explicitly accounts for time‑varying uncertain obstacles and provides formal probabilistic safety guarantees. The STT is formulated as a time‑varying ball in the state space whose center and radius evolve online based on uncertain sensory information. We derive a closed‑form, approximation‑free control law that confines the system trajectory within the tube, ensuring both probabilistic safety and task satisfaction. Our method offers a formal guarantee for probabilistic avoidance and finite‑time task completion. The resulting controller is model‑free, approximation‑free, and optimization‑free, enabling efficient real‑time execution while guaranteeing convergence to the target. The effectiveness and scalability of the framework are demonstrated through simulation studies and hardware experiments on mobile robots, a UAV, and a 7‑DOF manipulator navigating in cluttered and uncertain environments.
Authors: Yuanshuang Fu, Qianyao Wang, Qihao Wang, Bonan Zhang, Jiaxin Zhao, Yiming Cao, Zhijun Li
Abstract: Unmanned Aerial Vehicle (UAV) spectral remote sensing technology is widely used in water quality monitoring. However, in dynamic environments, varying illumination conditions, such as shadows and specular reflection (sun glint), can cause severe spectral distortion, thereby reducing data availability. To maximize the acquisition of high‑quality data while ensuring flight safety, this paper proposes an active path planning method for dynamic light and shadow disturbance avoidance. First, a dynamic prediction model is constructed to transform the time‑varying light and shadow disturbance areas into three‑dimensional virtual obstacles. Second, an improved Interfered Fluid Dynamical System (IFDS) algorithm is introduced, which generates a smooth initial obstacle avoidance path by building a repulsive force field. Subsequently, a Model Predictive Control (MPC) framework is employed for rolling‑horizon path optimization to handle flight dynamics constraints and achieve real‑time trajectory tracking. Furthermore, a Dynamic Flight Altitude Adjustment (DFAA) mechanism is designed to actively reduce the flight altitude when the observable area is narrow, thereby enhancing spatial resolution. Simulation results show that, compared with traditional PID and single obstacle avoidance algorithms, the proposed method achieves an obstacle avoidance success rate of 98% in densely disturbed scenarios, significantly improves path smoothness, and increases the volume of effective observation data by approximately 27%. This research provides an effective engineering solution for precise UAV water quality monitoring in complex illumination environments.
Authors: Wei Wu, Lingyi Wang, Fuhui Zhou, Zhaohui Yang, Qihui Wu
Abstract: Artificial intelligence (AI)‑native three‑dimensional (3D) spectrum maps are crucial in spectrum monitoring for intelligent communication networks. However, it is challenging to obtain and transmit 3D spectrum maps in a spectrum‑efficient, computation‑efficient, and AI‑driven manner, especially under complex communication environments and sparse sampling data. In this paper, we consider practical air‑to‑ground semantic communications for spectrum map completion, where the unmanned aerial vehicle (UAV) measures the spectrum at spatial points and extracts the spectrum semantics, which are then utilized to complete spectrum maps at the ground device. Since statistical machine learning can easily be misled by superficial data correlations and lacks interpretability, we propose a novel knowledge‑enhanced semantic spectrum map completion framework with two expert knowledge‑driven constraints from physical signal propagation models. This framework can capture real‑world physics and avoid being confined to superficial data distributions. Furthermore, a knowledge‑enhanced vector‑quantized Transformer (KE‑VQ‑Transformer) based multi‑scale low‑complexity intelligent completion approach is proposed, where the sparse window is applied to avoid ultra‑large 3D attention computation, and the multi‑scale design improves the completion performance. The knowledge‑enhanced mean square error (KMSE) and root KMSE (RKMSE) are introduced as novel metrics for semantic spectrum map completion that jointly consider the numerical precision and physical consistency with the signal propagation model, based on which a joint offline and online training method is developed with supervised and unsupervised knowledge loss. Simulations demonstrate that our proposed scheme outperforms the state‑of‑the‑art benchmark schemes in terms of RKMSE.
Authors: Siqi Mu, Shuo Wen, Yang Lu, Ruihong Jiang, Bo Ai
Abstract: Due to their inherent flexibility and autonomous operation, unmanned aerial vehicles (UAVs) have been widely used in the Internet of Medical Things (IoMT) to provide real‑time biomedical edge computing services for wireless body area network (WBAN) users. In this paper, considering the time‑varying task criticality characteristics of diverse WBAN users and the dual mobility between WBAN users and the UAV, we investigate the dynamic task offloading and UAV flight trajectory optimization problem to minimize the weighted average task completion time of all the WBAN users, under the constraint of UAV energy consumption. To tackle the problem, an embodied AI‑enhanced IoMT edge computing framework is established. Specifically, we propose a novel hierarchical multi‑scale Transformer‑based user trajectory prediction model based on the users' historical trajectory traces captured by the embodied AI agent (i.e., UAV). Afterwards, a prediction‑enhanced deep reinforcement learning (DRL) algorithm that integrates predicted users' mobility information is designed for intelligently optimizing UAV flight trajectory and task offloading decisions. Real‑world movement traces and simulation results demonstrate the superiority of the proposed methods in comparison with the existing benchmarks.
Authors: Tanmay P. Patel, Erica L. Tevere, Erik H. Kramer, Rudranarayan M. Mukherjee
Abstract: This paper presents a general‑purpose framework for autonomous, vision‑based interception of dynamic, non‑cooperative targets, validated across three distinct mobility platforms: an unmanned aerial vehicle (UAV), a four‑wheeled ground rover, and an air‑thruster spacecraft testbed. The approach relies solely on a monocular camera with fiducials for target tracking and operates entirely in the local observer frame without the need for global information. The core contribution of this work is a streamlined and general approach to autonomous interception that can be adapted across robots with varying dynamics, as well as our comprehensive study of the robot interception problem across heterogeneous mobility systems under limited observability and no global localization. Our method integrates (1) an Extended Kalman Filter for relative pose estimation amid intermittent measurements, (2) a history‑conditioned motion predictor for dynamic target trajectory propagation, and (3) a receding‑horizon planner solving a constrained convex program in real time to ensure time‑efficient and kinematically feasible interception paths. Our operating regime assumes that observability is restricted by partial fields of view, sensor dropouts, and target occlusions. Experiments are performed in these conditions and include autonomous UAV landing on dynamic targets, rover rendezvous and leader‑follower tasks, and spacecraft proximity operations. Results from simulated and physical experiments demonstrate robust performance with low interception errors (both during station‑keeping and upon scenario completion), high success rates under deterministic and stochastic target motion profiles, and real‑time execution on embedded processors such as the Jetson Orin, VOXL2, and Raspberry Pi 5. These results highlight the framework's generalizability, robustness, and computational efficiency.
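Estimation amid intermittent measurements, component (1) above, can be sketched with a much simpler stand‑in: a linear Kalman filter on a 1‑D constant‑velocity target that skips the update step whenever a measurement is missing. This is not the authors' Extended Kalman Filter or their relative‑pose state; the function name, noise parameters, and dropout window are all hypothetical.

```python
def kf_step(x, P, z, dt=0.1, q=0.01, r=0.25):
    """One predict/update cycle; z=None marks a dropped measurement.

    x = [position, velocity]; P = 2x2 covariance as nested lists.
    Assumed model: F = [[1, dt], [0, 1]], Q = q * I, H = [1, 0], R = r.
    """
    # Predict: x <- F x, P <- F P F^T + Q.
    x = [x[0] + dt * x[1], x[1]]
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    if z is None:
        # Dropout: keep the prediction, so the uncertainty keeps growing.
        return x, [[p00, p01], [p10, p11]]
    # Update with a position measurement: K = P H^T / (H P H^T + R).
    s = p00 + r
    k0, k1 = p00 / s, p10 / s
    innovation = z - x[0]
    x = [x[0] + k0 * innovation, x[1] + k1 * innovation]
    # P <- (I - K H) P
    P = [[(1 - k0) * p00, (1 - k0) * p01],
         [p10 - k1 * p00, p11 - k1 * p01]]
    return x, P

# Hypothetical run: target moves at 1 m/s; measurements drop out for steps 20..29.
x, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
variances = []
for k in range(1, 61):
    true_position = 1.0 * k * 0.1
    z = None if 20 <= k < 30 else true_position
    x, P = kf_step(x, P, z)
    variances.append(P[0][0])
```

During the dropout window the position variance grows with every prediction step and shrinks again once measurements resume, which is the behavior a planner would rely on when fields of view are partial or targets are occluded.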
Authors: Mazyar Taghavi, Javad Vahidi
Abstract: This study introduces a quantum‑inspired framework for optimizing the exploration‑exploitation tradeoff in multi‑agent reinforcement learning (MARL), applied to UAV‑assisted 6G network deployment. We consider a cooperative scenario where ten intelligent UAVs autonomously coordinate to maximize signal coverage and support efficient network expansion under partial observability and dynamic conditions. The proposed approach integrates classical MARL algorithms with quantum‑inspired optimization techniques, leveraging variational quantum circuits (VQCs) as the core structure and employing the Quantum Approximate Optimization Algorithm (QAOA) as a representative VQC‑based method for combinatorial optimization. Complementary probabilistic modeling is incorporated through Bayesian inference, Gaussian processes, and variational inference to capture latent environmental dynamics. A centralized training with decentralized execution (CTDE) paradigm is adopted, where shared memory and local view grids enhance local observability among agents. Comprehensive experiments including scalability tests, sensitivity analysis, and comparisons with PPO and DDPG baselines demonstrate that the proposed framework improves sample efficiency, accelerates convergence, and enhances coverage performance while maintaining robustness. Radar chart and convergence analyses further show that QI‑MARL achieves a superior balance between exploration and exploitation compared to classical methods. All implementation code and supplementary materials are publicly available on GitHub to ensure reproducibility.
Authors: Rajdeep Chatterjee, Sudip Chakrabarty, Trishaani Acharjee, Deepanjali Mishra
Abstract: Unmanned aerial vehicles (UAVs), commonly known as drones, are increasingly used across diverse domains, including logistics, agriculture, surveillance, and defense. While these systems provide numerous benefits, their misuse raises safety and security concerns, making effective detection mechanisms essential. Acoustic sensing offers a low‑cost and non‑intrusive alternative to vision or radar‑based detection, as drone propellers generate distinctive sound patterns. This study introduces AUDRON (AUdio‑based Drone Recognition Network), a hybrid deep learning framework for drone sound detection, employing a combination of Mel‑Frequency Cepstral Coefficients (MFCC), Short‑Time Fourier Transform (STFT) spectrograms processed with convolutional neural networks (CNNs), recurrent layers for temporal modeling, and autoencoder‑based representations. Feature‑level fusion integrates complementary information before classification. Experimental evaluation demonstrates that AUDRON effectively differentiates drone acoustic signatures from background noise, achieving high accuracy while maintaining generalizability across varying conditions. AUDRON achieves 98.51 percent and 97.11 percent accuracy in binary and multiclass classification. The results highlight the advantage of combining multiple feature representations with deep learning for reliable acoustic drone detection, suggesting the framework's potential for deployment in security and surveillance applications where visual or radar sensing may be limited.
Authors: Tim Aebersold, Soheyl Massoudi, Mark D. Fuge
Abstract: Engineering complex systems (aircraft, buildings, vehicles) requires coordinating geometric and performance couplings across subsystems. As generative models proliferate for specialized domains, a key research gap is how to coordinate frozen, pre‑trained submodels to generate full‑system designs that are feasible, diverse, and high‑performing. We introduce GLUE, which orchestrates pre‑trained, frozen generators while enforcing system‑level feasibility, optimality, and diversity. Compatible models must be end‑to‑end differentiable with a smooth, well‑behaved latent‑to‑output mapping. We propose and benchmark (i) data‑driven GLUE models trained on pre‑generated system‑level designs and (ii) a data‑free GLUE model trained on a differentiable geometry layer. On a UAV design problem with five coupling constraints, we find that data‑driven approaches yield diverse, high‑performing designs but require large datasets to satisfy constraints reliably. The data‑free approach is competitive with Bayesian optimization and gradient‑based optimization in performance and feasibility while training a full generative model in only ~10 min on an RTX 4090 GPU, requiring more than two orders of magnitude fewer geometry evaluations and FLOPs than the data‑driven method. We identify equality constraint satisfaction as a key difficulty and remaining limitation, and ablate approaches that improve this for the data‑free approach. As a first step toward scaling generative design to complex, real‑world engineering systems, this work explores how unmodified, domain‑informed submodels can be integrated into a modular generative workflow.
Authors: Tarek Bouazza, Alessandro Melis, Soulaimane Berkane, Robert Mahony, Tarek Hamel
Abstract: This paper tackles the problem of estimating the relative position, orientation, and velocity between a UAV and a planar platform undergoing arbitrary 3D motion during approach and landing. The estimation relies on measurements from Inertial Measurement Units (IMUs) mounted on both systems, assuming there is a suitable communication channel to exchange data, together with visual information provided by an onboard monocular camera, from which the bearing (line‑of‑sight direction) to the platform's center and the normal vector of its planar surface are extracted. We propose a cascade observer with a complementary filter on SO(3) to reconstruct the relative attitude, followed by a linear Riccati observer for relative position and velocity estimation. Convergence of both observers is established under persistently exciting conditions, and the cascade is shown to be almost globally asymptotically and locally exponentially stable. We further extend the design to the case where the platform's rotation is restricted to its normal axis and show that its measured linear acceleration can be exploited to recover the remaining unobservable rotation angle. A sufficient condition to ensure local exponential convergence in this setting is provided. The performance of the proposed observers is validated through extensive simulations.
Authors: Tao Li, Zhenbao Yu, Banglei Guan, Jianli Han, Weimin Lv, Friedrich Fraundorfer
Abstract: This work presents two novel solvers for estimating the relative poses among views with known vertical directions. The vertical directions of camera views can be easily obtained using inertial measurement units (IMUs), which are widely used in autonomous vehicles, mobile phones, and unmanned aerial vehicles (UAVs). Given the known vertical directions, our algorithms only need to solve for two rotation angles and two translation vectors. In this paper, a linear closed‑form solution is described, requiring only four point correspondences in three views. We also propose a minimal solution with three point correspondences using the latest Gröbner basis solver. Since the proposed methods require fewer point correspondences, they can be efficiently applied within the RANSAC framework for outlier removal and pose estimation in visual odometry. The proposed methods have been tested on both synthetic data and real‑world scenes from KITTI. The experimental results show that the accuracy of the estimated poses is superior to that of alternative methods.
Authors: Pengyu Chen, Tao Ouyang, Ke Luo, Weijie Hong, Xu Chen
Abstract: Autonomous navigation for Unmanned Aerial Vehicles faces key challenges from limited onboard computational resources, which restrict deployed deep neural networks to shallow architectures incapable of handling complex environments. Offloading tasks to remote edge servers introduces high latency, creating an inherent trade‑off in system design. To address these limitations, we propose CoDrone ‑ the first cloud‑edge‑end collaborative computing framework integrating foundation models into autonomous UAV cruising scenarios ‑ effectively leveraging foundation models to enhance performance of resource‑constrained unmanned aerial vehicle platforms. To reduce onboard computation and data transmission overhead, CoDrone employs grayscale imagery for the navigation model. When enhanced environmental perception is required, CoDrone leverages the edge‑assisted foundation model Depth Anything V2 for depth estimation and introduces a novel one‑dimensional occupancy grid‑based navigation method ‑ enabling fine‑grained scene understanding while advancing efficiency and representational simplicity of autonomous navigation. A key component of CoDrone is a Deep Reinforcement Learning‑based neural scheduler that seamlessly integrates depth estimation with autonomous navigation decisions, enabling real‑time adaptation to dynamic environments. Furthermore, the framework introduces a UAV‑specific vision language interaction module incorporating domain‑tailored low‑level flight primitives to enable effective interaction between the cloud foundation model and the UAV. The introduction of VLM enhances open‑set reasoning capabilities in complex unseen scenarios. Experimental results show CoDrone outperforms baseline methods under varying flight speeds and network conditions, achieving a 40% increase in average flight distance and a 5% improvement in average Quality of Navigation.
Authors: Zhenguo Gao, Hui Li, Yiqin Chen, Qingyu Gao, Zhufang Kuang, Shih-Hau Fang, Hsiao-Chun Wu
Abstract: The high mobility and flexible deployment capability of UAVs make them an attractive option for charging nodes in Wireless Rechargeable Sensor Networks (WRSNs) using Directional Wireless Power Transfer (WPT) technology. However, existing studies largely focus on 2D‑WRSNs, lacking designs catering to real 3D‑WRSNs. The spatial distribution characteristics of nodes in a 3D‑WRSN further increase the complexity of the charging scheduling task, thus requiring a systematic framework to solve this problem. In this paper, we investigate the Directional UAV Charging Scheduling problem for 3D‑WRSNs (DCS‑3D) and establish its NP‑hardness, and then propose a three‑step framework, named the directional charging scheduling algorithm using Functional Equivalent (FuncEqv) direction set and Lin‑Kernighan heuristic (LKH) for 3D‑WRSNs (FELKH‑3D), to solve it. In FELKH‑3D, the challenge of an infinite charging direction space is solved by designing an algorithm that generates a minimum‑size direction set guaranteed to be FuncEqv to the infinite set covering the whole sphere surface, and the optimality of the method is proved. To determine the optimal charging tour for the UAV, the LKH algorithm is employed. Simulation experiments demonstrate the superiority of FELKH‑3D over other classical algorithms.
Authors: Xu Liu, Yu Liu, Hanshuo Qiu, Yang Qirong, Zhouhui Lian
Abstract: Vision‑Language Navigation (VLN) enables agents to navigate in complex environments by following natural language instructions grounded in visual observations. Although most existing work has focused on ground‑based robots or outdoor Unmanned Aerial Vehicles (UAVs), indoor UAV‑based VLN remains underexplored, despite its relevance to real‑world applications such as inspection, delivery, and search‑and‑rescue in confined spaces. To bridge this gap, we introduce IndoorUAV, a novel benchmark and method specifically tailored for VLN with indoor UAVs. We begin by curating over 1,000 diverse and structurally rich 3D indoor scenes from the Habitat simulator. Within these environments, we simulate realistic UAV flight dynamics to collect diverse 3D navigation trajectories manually, further enriched through data augmentation techniques. Furthermore, we design an automated annotation pipeline to generate natural language instructions of varying granularity for each trajectory. This process yields over 16,000 high‑quality trajectories, comprising the IndoorUAV‑VLN subset, which focuses on long‑horizon VLN. To support short‑horizon planning, we segment long trajectories into sub‑trajectories by selecting semantically salient keyframes and regenerating concise instructions, forming the IndoorUAV‑VLA subset. Finally, we introduce IndoorUAV‑Agent, a novel navigation model designed for our benchmark, leveraging task decomposition and multimodal reasoning. We hope IndoorUAV serves as a valuable resource to advance research on vision‑language embodied AI in the indoor aerial navigation domain.
Authors: Wencan Mao, Quanxi Zhou, Tomas Couso Coddou, Manabu Tsukada, Yunling Liu, Yusheng Ji
Abstract: Unmanned aerial vehicles (UAVs) have emerged as a promising auxiliary platform for smart agriculture, capable of simultaneously performing weed detection, recognition, and data collection from wireless sensors. However, trajectory planning for UAV‑based smart agriculture is challenging due to the high uncertainty of the environment, partial observations, and limited battery capacity of UAVs. To address these issues, we formulate the trajectory planning problem as a Markov decision process (MDP) and leverage multi‑agent reinforcement learning (MARL) to solve it. Furthermore, we propose a novel imitation‑based triple deep Q‑network (ITDQN) algorithm, which employs an elite imitation mechanism to reduce exploration costs and utilizes a mediator Q‑network over a double deep Q‑network (DDQN) to accelerate and stabilize training and improve performance. Experimental results in both simulated and real‑world environments demonstrate the effectiveness of our solution. Moreover, our proposed ITDQN outperforms DDQN by 4.43% in weed recognition rate and 6.94% in data collection rate.
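The abstract above uses a double deep Q‑network (DDQN) as its baseline. As a rough illustration only, the standard DDQN bootstrap target can be sketched as follows. This is not the authors' ITDQN: the mediator Q‑network and elite imitation mechanism are not specified in the abstract and are not modelled here. The function name `ddqn_target` and the array layout (one row per transition, one column per action) are assumptions made for this sketch.

```python
import numpy as np

def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Standard Double DQN target (illustrative; not the paper's ITDQN).

    The online network picks the greedy next action, and the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)

    # Action selection uses the online network's Q-values.
    greedy_actions = np.argmax(q_online_next, axis=1)
    # Action evaluation uses the target network's Q-values.
    rows = np.arange(q_target_next.shape[0])
    q_eval = q_target_next[rows, greedy_actions]
    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_eval

# Hypothetical batch of two transitions with two actions each.
targets = ddqn_target(
    reward=[1.0, 1.0],
    done=[0.0, 1.0],
    q_online_next=[[1.0, 2.0], [3.0, 0.0]],
    q_target_next=[[10.0, 20.0], [30.0, 40.0]],
    gamma=0.5,
)
```

In the first transition the online network prefers action 1, so the target network's value for action 1 is used. The second transition is terminal, so its target equals the reward alone.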
Authors: Quanxi Zhou, Wencan Mao, Yilei Liang, Manabu Tsukada, Yunling Liu, Jon Crowcroft
Abstract: The widespread application of wireless communication technology has promoted the development of smart agriculture, where unmanned aerial vehicles (UAVs) play a multifunctional role. We target a multi‑UAV smart agriculture system where UAVs cooperatively perform data collection, image acquisition, and communication tasks. In this context, we model a Markov decision process to solve the multi‑UAV trajectory planning problem. Moreover, we propose a novel Elite Imitation Actor‑Shared Ensemble Critic (EIA‑SEC) framework, where agents adaptively learn from the elite agent to reduce trial‑and‑error costs, and a shared ensemble critic collaborates with each agent's local critic to ensure unbiased objective value estimates and prevent overestimation. Experimental results demonstrate that EIA‑SEC outperforms state‑of‑the‑art baselines in terms of reward performance, training stability, and convergence speed.
Authors: Zidong Gu, Shoufu Tian
Abstract: Object detection in aerial imagery is a critical task in applications such as UAV reconnaissance. Although existing methods have extensively explored feature interaction between different modalities, they commonly rely on simple fusion strategies for feature aggregation. This introduces two critical flaws: it is prone to cross‑modal noise and disrupts the hierarchical structure of the feature pyramid, thereby impairing the fine‑grained detection of small objects. To address this challenge, we propose the Pyramidal Adaptive Cross‑Gating Network (PACGNet), an architecture designed to perform deep fusion within the backbone. To this end, we design two core components: the Symmetrical Cross‑Gating (SCG) module and the Pyramidal Feature‑aware Multimodal Gating (PFMG) module. The SCG module employs a bidirectional, symmetrical "horizontal" gating mechanism to selectively absorb complementary information, suppress noise, and preserve the semantic integrity of each modality. The PFMG module reconstructs the feature hierarchy via a progressive hierarchical gating mechanism. This leverages the detailed features from a preceding, higher‑resolution level to guide the fusion at the current, lower‑resolution level, effectively preserving fine‑grained details as features propagate. Through evaluations conducted on the DroneVehicle and VEDAI datasets, our PACGNet sets a new state‑of‑the‑art benchmark, with mAP50 scores reaching 82.2% and 82.1% respectively.
Authors: Evangelos Vlachos
Abstract: This letter investigates the coupled control problem in UAV networks utilizing high‑frequency hybrid beamsteering. While phased arrays enable rapid electronic scanning, their finite Field of View (FoV) imposes a fundamental constraint that necessitates active mechanical steering of the airframe to maintain connectivity. We propose a decentralized Model Predictive Control (MPC) framework that jointly optimizes trajectory and heading to maximize network sum‑capacity subject to safety constraints. Addressing the numerical instability caused by fast‑fading channel nulls, we introduce a regularized surrogate cost function based on discrete spatial smoothing. We analytically prove that this approximation bounds the cost curvature, restoring the Lipschitz continuity of the gradient. Crucially, we derive a sufficient condition linking this Lipschitz constant to the controller gain, guaranteeing the contraction and linear convergence of the distributed best‑response dynamics. Simulation results demonstrate that the proposed algorithm effectively navigates the trade‑off between electronic beam tracking and kinematic safety, significantly and systematically outperforming velocity‑aligned baselines.
Authors: Ami Pandat, Punna Rajasekhar, Gopika Vinod, Rohit Shukla
Abstract: Unmanned Aerial Vehicles, commonly known as drones, pose increasing risks in civilian and defense settings, demanding accurate and real‑time drone detection systems. However, detecting drones is challenging because of their small size, rapid movement, and low visual contrast. A modified architecture of YolovN called YolovN‑CBi is proposed that incorporates the Convolutional Block Attention Module (CBAM) and the Bidirectional Feature Pyramid Network (BiFPN) to improve sensitivity to small object detections. A curated training dataset consisting of 28K images is created with various flying objects, and a local test dataset is collected with 2500 images consisting of very small drone objects. The proposed architecture is evaluated on four benchmark datasets, along with the local test dataset. The baseline Yolov5 and the proposed Yolov5‑CBi architecture outperform newer Yolo versions, including Yolov8 and Yolov12, in the speed‑accuracy trade‑off for small object detection. Four other variants of the proposed CBi architecture are also proposed and evaluated, which vary in the placement and usage of CBAM and BiFPN. These variants are further distilled using knowledge distillation techniques for edge deployment, using a Yolov5m‑CBi teacher and a Yolov5n‑CBi student. The distilled model achieved a mAP@0.5:0.9 of 0.6573, representing a 6.51% improvement over the teacher's score of 0.6171, highlighting the effectiveness of the distillation process. The distilled model is 82.9% faster than the baseline model, making it more suitable for real‑time drone detection. These findings highlight the effectiveness of the proposed CBi architecture, together with the distilled lightweight models, in advancing efficient and accurate real‑time detection of small UAVs.
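The 6.51% figure quoted above is a relative improvement, not an absolute difference in score. A small check of the arithmetic, using only the two scores reported in the abstract (the helper name `relative_improvement` is an assumption for this sketch):

```python
def relative_improvement(new_score, base_score):
    """Relative change of new_score over base_score, in percent."""
    return 100.0 * (new_score - base_score) / base_score

# Scores as reported in the abstract: distilled student vs. teacher.
student_score = 0.6573
teacher_score = 0.6171

improvement = relative_improvement(student_score, teacher_score)
print(f"{improvement:.2f}%")
```

The absolute gap between the two scores is about 0.040, which is roughly 6.5% of the teacher's score.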
Authors: Yuncheng Lu, Yucen Shi, Aobo Li, Zehao Li, Junying Li, Bo Wang, Tony Tae-Hyoung Kim
Abstract: We present an energy‑efficient anti‑UAV system that integrates frame‑based and event‑driven object tracking to enable reliable detection of small and fast‑moving drones. The system reconstructs binary event frames using run‑length encoding, generates region proposals, and adaptively switches between frame mode and event mode based on object size and velocity. A Fast Object Tracking Unit improves robustness for high‑speed targets through adaptive thresholding and trajectory‑based classification. The neural processing unit supports both grayscale‑patch and trajectory inference with a custom instruction set and a zero‑skipping MAC architecture, reducing redundant neural computations by more than 97 percent. Implemented in 40 nm CMOS technology, the 2 mm^2 chip achieves 96 pJ per frame per pixel and 61 pJ per event at 0.8 V, and reaches 98.2 percent recognition accuracy on public UAV datasets across 50 to 400 m ranges and 5 to 80 pixels per second speeds. The results demonstrate state‑of‑the‑art end‑to‑end energy efficiency for anti‑UAV systems.
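The abstract above mentions reconstructing binary event frames with run‑length encoding. The chip's actual encoding format is not given, so the following is only a generic sketch of run‑length encoding for one row of a binary frame. The function names and the `(value, run_length)` pair layout are assumptions.

```python
def rle_encode(row):
    """Encode a sequence of 0/1 values as (value, run_length) pairs."""
    runs = []
    for bit in row:
        if runs and runs[-1][0] == bit:
            # Same value as the current run: extend it.
            runs[-1][1] += 1
        else:
            # Value changed (or first element): start a new run.
            runs.append([bit, 1])
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Rebuild the original row from (value, run_length) pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row

# Hypothetical row of a sparse binary event frame.
row = [0, 0, 0, 1, 1, 0, 0, 0, 0]
encoded = rle_encode(row)
decoded = rle_decode(encoded)
```

Event frames are typically sparse, so long runs of zeros collapse into single pairs. That sparsity is the reason run‑length encoding suits this kind of data.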
Authors: Xiaopeng Yuan, Peng Wu, Xinran Wang, Yulin Hu, Anke Schmeink
Abstract: In this paper, we investigate an integrated sensing‑and‑communication (ISAC) network enabled by an unmanned aerial vehicle (UAV). The UAV is assumed to fly along a periodic circular trajectory at a fixed height to supply ISAC services from the sky. We consider on‑demand sensing services, where on‑demand detection and on‑demand localization requests may be activated at any time toward any position within the targeted serving region. While guaranteeing satisfactory accuracy for both on‑demand sensing tasks, we aim to maximize the minimum achievable throughput among all communication users by jointly optimizing the UAV trajectory and communication user scheduling. To address the complicated problem with infinitely many sensing constraints, we characterize the on‑demand detection constraint as a restricted deployment area for the UAV and the on‑demand localization constraint as Cramer‑Rao Bound (CRB) constraints over finite reference target points, based on which the original problem is simplified to a more tractable one. Afterwards, particularly aiming to ensure no violations of CRB constraints, we propose a convex approximation for the reformulated problem, where a tight approximation is guaranteed at a given local solution. The construction strategy for the convex problem approximation allows an efficient iterative algorithm with verified convergence to a superior suboptimal solution. Finally, with simulations, we verify the applicability of our developed optimization scheme in strictly fulfilling the on‑demand sensing constraints and the effectiveness of our proposed solution for simultaneously enhancing the communication throughput in UAV‑enabled ISAC.
Authors: Ufuk Asil, Efendi Nasibov
Abstract: This study presents an innovative hybrid Visual‑Inertial Odometry (VIO) method for Unmanned Aerial Vehicles (UAVs) that is resilient to environmental challenges and capable of dynamically assessing sensor reliability. Built upon a loosely coupled sensor fusion architecture, the system utilizes a novel hybrid Quaternion‑focused Error‑State EKF/UKF (Qf‑ES‑EKF/UKF) architecture to process inertial measurement unit (IMU) data. This architecture first propagates the entire state using an Error‑State Extended Kalman Filter (ESKF) and then applies a targeted Scaled Unscented Kalman Filter (SUKF) step to refine only the orientation. This sequential process blends the accuracy of SUKF in quaternion estimation with the overall computational efficiency of ESKF. The reliability of visual measurements is assessed via a dynamic sensor confidence score based on metrics such as image entropy, intensity variation, motion blur, and inference quality, adapting the measurement noise covariance to ensure stable pose estimation even under challenging conditions. Comprehensive experimental analyses on the EuRoC MAV dataset demonstrate key advantages: an average improvement of 49% in position accuracy in challenging scenarios, an average improvement of 57% in rotation accuracy over ESKF‑based methods, and SUKF‑comparable accuracy achieved with approximately 48% lower computational cost than a full SUKF implementation. These findings demonstrate that the presented approach strikes an effective balance between computational efficiency and estimation accuracy, and significantly enhances UAV pose estimation performance in complex environments with varying sensor reliability.
Authors: Yuqi Ping, Junwei Wu, Bofeng Zheng, Fan Liu, Tianhao Liang, Tingting Zhang
Abstract: In this letter, we present an uncertainty‑aware single‑anchor Ultra‑Wideband (UWB)‑based 3D tracking framework. Specifically, a mobile Unmanned Aerial Vehicle (UAV) maintains a desired standoff distance to a moving target using range and 3D bearing measurements from a multi‑antenna UWB anchor rigidly mounted on the UAV. To enhance the stability and safety under measurement degradation and motion uncertainty, we jointly design a robust factor‑graph‑based target localization method and a covariance‑aware control Lyapunov function‑control barrier function (CLF‑CBF) tracking controller. This controller adaptively adjusts distance bounds and safety margins based on the posterior target covariance provided by the factor graph. The proposed system is evaluated through numerical simulations and real‑world experiments carried out in a narrow indoor corridor environment.
Authors: Tianhao Shao, Kaixing Zhao, Feng Liu, Lixin Yang, Bin Guo
Abstract: As unmanned systems such as Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) become increasingly important to applications like urban sensing and emergency response, efficiently recruiting these autonomous devices to perform time‑sensitive tasks has become a critical challenge. This paper presents MPBS (Mobility‑aware Prediction and Behavior‑based Scheduling), a scalable task recruitment framework that treats each device as a recruitable "user". MPBS integrates three key modules: a behavior‑aware KNN classifier, a time‑varying Markov prediction model for forecasting device mobility, and a dynamic priority scheduling mechanism that considers task urgency and base station performance. By combining behavioral classification with spatiotemporal prediction, MPBS adaptively assigns tasks to the most suitable devices in real time. Experimental evaluations on the real‑world GeoLife dataset show that MPBS significantly improves task completion efficiency and resource utilization. The proposed framework offers a predictive, behavior‑aware solution for intelligent and collaborative scheduling in unmanned systems.
Authors: Huayu Huang, Chen Chen, Banglei Guan, Ze Tan, Yang Shang, Zhang Li, Qifeng Yu
Abstract: Tracking and measuring targets using a variety of sensors mounted on UAVs is an effective means to quickly and accurately locate the target. This paper proposes a fusion localization method based on ridge estimation, combining the advantages of rich scene information from sequential imagery with the high precision of laser ranging to enhance localization accuracy. Under limited conditions such as long distances, small intersection angles, and large inclination angles, the column vectors of the design matrix have serious multicollinearity when using the least squares estimation algorithm. The multicollinearity will lead to ill‑conditioned problems, resulting in significant instability and low robustness. Ridge estimation is introduced to mitigate the serious multicollinearity under the condition of limited observation. Experimental results demonstrate that our method achieves higher localization accuracy compared to ground localization algorithms based on single information. Moreover, the introduction of ridge estimation effectively enhances the robustness, particularly under limited observation conditions.
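The abstract above motivates ridge estimation by the ill‑conditioning that multicollinearity causes in least squares. The paper's actual design matrix (built from image and laser‑ranging observations) is not given, so the sketch below uses a synthetic matrix with two nearly collinear columns. The ridge parameter `k`, the noise levels, and all names are assumptions chosen for illustration.

```python
import numpy as np

def least_squares(A, b):
    """Ordinary least squares solution of A x ~= b."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

def ridge(A, b, k):
    """Ridge estimate: solve (A^T A + k I) x = A^T b with k > 0."""
    n_params = A.shape[1]
    return np.linalg.solve(A.T @ A + k * np.eye(n_params), A.T @ b)

# Synthetic design matrix whose two columns are almost identical,
# mimicking severe multicollinearity.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([t, t + 1e-6 * rng.standard_normal(50)])

x_true = np.array([1.0, 1.0])
b = A @ x_true + 1e-3 * rng.standard_normal(50)  # noisy observations

x_ls = least_squares(A, b)
x_ridge = ridge(A, b, k=1e-3)

print("condition number of A^T A:", np.linalg.cond(A.T @ A))
print("least squares:", x_ls)
print("ridge:        ", x_ridge)
```

The added `k * I` term bounds the smallest eigenvalue of the normal matrix away from zero. It trades a small bias for a large reduction in variance along the poorly observed direction. In this synthetic case the ridge estimate stays close to `x_true`, while the least squares estimate is free to swing widely along the near‑null direction.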
Authors: Yifei Qiu, Tianle Liao, Xin Jin, Qinyu Zhang, Shaohua Wu
Abstract: A space‑air‑ground‑sea integrated network (SAGSIN) has emerged as a cornerstone of 6G systems, establishing a unified global architecture by integrating multi‑domain network resources. Motivated by the demand for real‑time situational awareness and intelligent operational maintenance, digital twin (DT) technology was initially regarded as a promising solution, owing to its capability to create virtual replicas and emulate physical system behaviors. However, in the context of SAGSIN, the high‑fidelity, full‑scale modeling paradigm inherent to conventional DTs encounters fundamental limitations, including prohibitive computational overhead, delayed model synchronization, and cross‑system semantic gaps. To address these limitations, this survey paper proposes a novel twinning framework: goal‑oriented semantic twin (GOST). Unlike DTs that pursue physical mirroring, GOST prioritizes "utility" over "fidelity," leveraging semantic technologies and goal‑oriented principles to construct lightweight, task‑specific representations. This paper systematically articulates the GOST framework through three layers: knowledge‑based semantics, data‑driven semantics, and goal‑oriented principles. Furthermore, we provide a comprehensive tutorial on constructing GOST by detailing its core enabling technologies and introduce a multidimensional evaluation framework for GOST. We present a case study targeting collaborative tracking tasks in remote satellite‑UAV networks, demonstrating that GOST significantly outperforms conventional DTs in timeliness of perceptual data and collaborative tracking. Finally, we outline research directions, establishing GOST as a transformative twinning paradigm to guide the development of SAGSIN.
Authors: Shuaidong Ji, Mahdi Bamdad, Francisco Cruz
Abstract: Efficient and reliable UAV navigation in cluttered and dynamic environments remains challenging. We propose SWIFT‑Nav: Stability‑aware Waypoint‑level Integration of Fuzzy arbitration and TD3 for Navigation, a TD3‑based navigation framework that achieves fast, stable convergence to obstacle‑aware paths. The system couples a sensor‑driven perception front end with a TD3 waypoint policy: the perception module converts LiDAR ranges into a confidence‑weighted safety map and goal cues, while the TD3 policy is trained with Prioritised Experience Replay to focus on high‑error transitions and a decaying epsilon‑greedy exploration schedule that gradually shifts from exploration to exploitation. A lightweight fuzzy‑logic layer computes a safety score from radial measurements and near obstacles, gates mode switching and clamps unsafe actions; in parallel, task‑aligned reward shaping combining goal progress, clearance, and switch‑economy terms provides dense, well‑scaled feedback that accelerates learning. Implemented in Webots with proximity‑based collision checking, our approach consistently outperforms baselines in trajectory smoothness and generalization to unseen layouts, while preserving real‑time responsiveness. These results show that combining TD3 with replay prioritisation, calibrated exploration, and fuzzy‑safety rules yields a robust and deployable solution for UAV navigation in cluttered scenes.
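The abstract above describes a decaying epsilon‑greedy exploration schedule layered on a TD3 waypoint policy. The paper's schedule and its parameters are not given, so the following is a generic sketch of an exponentially decaying epsilon with a floor, applied to a continuous action vector. The values of `eps_start`, `eps_end`, `decay`, the action bounds, and the function names are all assumptions.

```python
import random

def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Exponentially decaying exploration rate, floored at eps_end."""
    return max(eps_end, eps_start * decay ** step)

def select_action(policy_action, eps, low=-1.0, high=1.0, rng=random):
    """With probability eps take a uniform random action, else the policy's.

    policy_action is the (hypothetical) continuous action vector proposed
    by the actor network for the current state.
    """
    if rng.random() < eps:
        # Explore: sample each action dimension uniformly within bounds.
        return [rng.uniform(low, high) for _ in policy_action]
    # Exploit: use the actor's proposed action unchanged.
    return list(policy_action)

# Early in training the agent mostly explores; later it mostly exploits.
early_eps = epsilon_at(0)
late_eps = epsilon_at(10_000)
action = select_action([0.1, -0.2], eps=late_eps)
```

The floor `eps_end` keeps a small amount of exploration alive late in training. TD3 implementations more commonly add Gaussian noise to the actor output, so this sketch follows the schedule named in the abstract, not the usual TD3 default.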
Authors: Yuze Wu, Mo Zhu, Xingxing Li, Yuheng Du, Yuxin Fan, Wenjun Li, Zhichao Han, Xin Zhou, Fei Gao
Abstract: This paper proposes VLA‑AN, an efficient and onboard Vision‑Language‑Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA‑AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high‑fidelity dataset utilizing 3D Gaussian Splatting (3D‑GS) to effectively bridge the domain gap. Second, we introduce a progressive three‑stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real‑time action module coupled with geometric safety correction. This module ensures fast, collision‑free, and stable command generation, mitigating the safety risks inherent in stochastic generative policies. Finally, through deep optimization of the onboard deployment pipeline, VLA‑AN achieves a robust real‑time 8.3x improvement in inference throughput on resource‑constrained UAVs. Extensive experiments demonstrate that VLA‑AN significantly improves spatial grounding, scene reasoning, and long‑horizon navigation, achieving a maximum single‑task success rate of 98.1%, and providing an efficient, practical solution for realizing full‑chain closed‑loop autonomy in lightweight aerial robots.
Authors: Yufeng Xie, Cong Wang
Abstract: Infrared and visible image fusion (IVIF) is a pivotal technology in low‑altitude Unmanned Aerial Vehicle (UAV) reconnaissance missions, enabling robust target detection and tracking by integrating thermal saliency with environmental textures. However, traditional no‑reference metrics (statistics‑based and gradient‑based metrics) fail in complex low‑light environments, a failure mode termed the "Noise Trap". This paper mathematically proves that these metrics are positively correlated with high‑frequency sensor noise, paradoxically assigning higher scores to degraded images and misguiding algorithm optimization. To address this, we propose the Target‑Background Contrast (TBC) metric. Inspired by Weber's Law, TBC focuses on the relative contrast of salient targets rather than global statistics. Unlike traditional metrics, TBC penalizes background noise and rewards target visibility. Extensive experiments on the DroneVehicle dataset demonstrate the superiority of TBC. Results show that TBC exhibits high "Semantic Discriminability" in distinguishing thermal targets from background clutter. Furthermore, TBC achieves remarkable computational efficiency, making it a reliable and real‑time standard for intelligent UAV systems.
Authors: Yiqin Deng, Zhengru Fang, Senkang Hu, Yanan Ma, Haixia Zhang, Yuguang Fang
Abstract: This paper presents an innovative framework that synergistically enhances computing performance through ubiquitous computing power distribution and dynamic computing node accessibility control via adaptive unmanned aerial vehicle (UAV) positioning, establishing UAV‑enabled Computing Power Networks (UAV‑CPNs). In UAV‑CPNs, UAVs function as dynamic aerial relays, outsourcing tasks generated in the request zone to an expanded service zone, consisting of a diverse range of computing devices, from vehicles with onboard computational capabilities and edge servers to dedicated computing nodes. This approach has the potential to alleviate communication bottlenecks in traditional computing power networks and overcome the "island effect" observed in multi‑access edge computing. However, how to quantify the network performance under the complex spatio‑temporal dynamics of both communication and computing power is a significant challenge, which introduces intricacies beyond those found in conventional networks. To address this, in this paper, we introduce task completion probability as the primary performance metric for evaluating the ability of UAV‑CPNs to complete ground users' tasks within specified end‑to‑end latency requirements. Utilizing theories from stochastic processes and stochastic geometry, we derive analytical expressions that facilitate the assessment of this metric. Our numerical results emphasize that striking a delicate balance between communication and computational capabilities is essential for enhancing the performance of UAV‑CPNs. Moreover, our findings show significant performance gains from the widespread distribution of computing nodes.
Authors: Jiayang Wan, Ke He, Yafei Wang, Fan Liu, Wenjin Wang, Shi Jin
Abstract: Due to the significant variations in unmanned aerial vehicle (UAV) altitude and horizontal mobility, it becomes difficult for any single network to ensure continuous and reliable three‑dimensional coverage. To that end, the space‑air‑ground integrated network (SAGIN) has emerged as an essential architecture for enabling ubiquitous UAV connectivity. To address the pronounced disparities in coverage and signal characteristics across heterogeneous networks, this paper formulates UAV mobility management in SAGIN as a constrained multi‑objective joint optimization problem. The formulation couples discrete link selection with continuous trajectory optimization. Building on this, we propose a two‑level multi‑agent hierarchical deep reinforcement learning (HDRL) framework that decomposes the problem into two alternately solvable subproblems. To map complex link selection decisions into a compact discrete action space, we conceive a double deep Q‑network (DDQN) algorithm at the top level, which achieves stable and high‑quality policy learning through double Q‑value estimation. To handle the continuous trajectory action space while satisfying quality of service (QoS) constraints, we integrate the maximum‑entropy mechanism of the soft actor‑critic (SAC) and employ a Lagrangian‑based constrained SAC (CSAC) algorithm at the lower level that dynamically adjusts the Lagrange multipliers to balance constraint satisfaction and policy optimization. Moreover, the proposed algorithm can be extended to multi‑UAV scenarios under the centralized training and decentralized execution (CTDE) paradigm, which enables more generalizable policies. Simulation results demonstrate that the proposed scheme substantially outperforms existing benchmarks in throughput, link switching frequency and QoS satisfaction.
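The dynamic adjustment of Lagrange multipliers mentioned above is commonly done with projected dual ascent. The sketch below shows that generic update, not the authors' CSAC implementation. The names `update_multiplier` and `penalized_objective`, the learning rate, and the "cost must stay below a limit" form of the constraint are assumptions made for illustration.

```python
def update_multiplier(multiplier, constraint_cost, cost_limit, lr=0.05):
    """One projected dual-ascent step on a Lagrange multiplier.

    The multiplier grows while the measured constraint cost exceeds its
    limit and shrinks otherwise; it is clipped at zero so it stays a valid
    multiplier for an inequality constraint. (Generic rule, assumed here.)
    """
    return max(0.0, multiplier + lr * (constraint_cost - cost_limit))


def penalized_objective(reward, constraint_cost, cost_limit, multiplier):
    """Lagrangian objective the policy would maximize for a fixed multiplier."""
    return reward - multiplier * (constraint_cost - cost_limit)


if __name__ == "__main__":
    multiplier = 0.0
    # Hypothetical per-episode QoS violation costs; the limit is 1.0.
    for cost in [1.5, 1.4, 1.2, 0.9, 0.8]:
        multiplier = update_multiplier(multiplier, cost, cost_limit=1.0, lr=0.1)
        print(round(multiplier, 3))
```

In this kind of scheme the policy update and the multiplier update alternate: a violated constraint raises the penalty weight, which pushes the next policy update toward feasibility.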
Authors: Aleksi Karhunen, Teemu Hakala, Väinö Karjalainen, Eija Honkavaara
Abstract: Interest in utilizing autonomous uncrewed aerial vehicles (UAVs) for under‑canopy forest remote sensing has increased in recent years, resulting in the publication of numerous autonomous flight algorithms in the scientific literature. To support the selection and development of such algorithms, a reliable comparison of existing approaches based on published studies is essential. However, reliable comparisons are currently challenging due to widely varying experimental setups and incomplete reporting practices. This study proposes a standardized experimental setup for evaluating autonomous under‑canopy UAV systems to fill this gap. The proposed setup emphasizes quantitative reporting of forest complexity, visual representation of test environments, execution of multiple repeated flights, and reporting of flight success rates alongside qualitative flight results. In addition, flights at multiple target speeds are encouraged, with reporting of realized flight speed, mission completion time, and point‑to‑point flight distance. The proposed setup is demonstrated using a lightweight lidar‑based quadrotor employing state‑of‑the‑art open‑source algorithms, evaluated through extensive experiments in two natural boreal forest environments. Based on a systematic evaluation of the original system, several improvements were introduced. The same experimental protocol was then repeated with the optimized system, resulting in a total of 93 real‑world flights. The optimized system achieved success rates of 12/15 and 15/15 at target flight speeds of 1 m/s and 2 m/s, respectively, in a medium‑difficulty forest, and 12/15 and 5/15 in a difficult forest. Adoption of the proposed experimental setup would facilitate the literature‑based comparison of autonomous under‑canopy flight systems and support systematic performance improvement of future UAV‑based forest robotics solutions.
Authors: Houyi Qi, Minghui Liwang, Seyyedali Hosseinalipour, Liqun Fu, Sai Zou, Xianbin Wang, Wei Ni, Yiguang Hong
Abstract: In this paper, we introduce a first‑of‑its‑kind forecasting‑driven, incentive‑inherent service provisioning framework for distributed air‑ground integrated networks that explicitly accounts for human‑machine coexistence. In our framework, vehicular‑UAV agent pairs (APs) are proactively dispatched to overloaded hotspots to augment the computing capacity of edge servers (ESs), which in turn gives rise to a set of challenges that we jointly address: highly uncertain spatio‑temporal workloads, spatio‑temporal coupling between road traffic and UAV capacity, forecast‑driven contracting risks, and heterogeneous quality‑of‑service (QoS) requirements of human users (HUs) and machine users (MUs). To address these challenges, we propose FUSION, a two‑stage optimization framework, consisting of an offline stage and an online stage. In the offline stage, a liquid neural network‑powered module performs multi‑step spatio‑temporal demand forecasting at distributed ESs, whose outputs are exploited by an enhanced ant colony optimization‑based routing scheme and an auction‑based incentive‑compatible contracting mechanism, to jointly determine ES‑AP contracts and pre‑planned service routes. In the online stage, we formulate the congestion‑aware task scheduling as a potential game among HUs, MUs, and heterogeneous ES/UAVs, and devise a potential‑guided best‑response dynamics algorithm that provably converges to a pure‑strategy Nash equilibrium. Experiments on both synthetic and real‑world datasets show that FUSION consistently achieves higher social welfare and improved resource utilization, while maintaining latency and energy costs comparable to state‑of‑the‑art baselines and preserving individual rationality, budget balance, and near‑truthfulness.
Authors: Xichen Ding, Jianzhe Gao, Cong Pan, Wenguan Wang, Jie Qin
Abstract: Aerial Vision‑and‑Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large‑scale urban environments based on linguistic instructions. While successful navigation demands both global environmental reasoning and local scene comprehension, existing UAV agents typically adopt mono‑granularity frameworks that struggle to balance these two aspects. To address this limitation, this work proposes a History‑Enhanced Two‑Stage Transformer (HETT) framework, which integrates the two aspects through a coarse‑to‑fine navigation pipeline. Specifically, HETT first predicts coarse‑grained target positions by fusing spatial landmarks and historical context, then refines actions via fine‑grained visual analysis. In addition, a historical grid map is designed to dynamically aggregate visual features into a structured spatial memory, enhancing comprehensive scene awareness. Additionally, the CityNav dataset annotations are manually refined to enhance data quality. Experiments on the refined CityNav dataset show that HETT delivers significant performance gains, while extensive ablation studies further verify the effectiveness of each component.
Authors: Zhuoxiao Li, Wenzong Ma, Taoyu Wu, Jinjing Zhu, Zhenchao Q, Shuai Zhang, Jing Ou, Yinrui Ren, Weiqing Qi, Guobin Shen, Hui Xiong, Wufan Zhao
Abstract: Recent advances in Neural Radiance Fields and 3D Gaussian Splatting have demonstrated strong potential for large‑scale UAV‑based 3D reconstruction tasks by fitting the appearance of images. However, real‑world large‑scale captures are often based on multi‑temporal data capture, where illumination inconsistencies across different times of day can lead to significant color artifacts, geometric inaccuracies, and inconsistent appearance. Due to the lack of UAV datasets that systematically capture the same areas under varying illumination conditions, this challenge remains largely underexplored. To fill this gap, we introduce SkyLume, a large‑scale, real‑world UAV dataset specifically designed for studying illumination‑robust 3D reconstruction in urban scene modeling: (1) We collect data from 10 urban regions, comprising more than 100k high‑resolution UAV images (four oblique views and nadir), where each region is captured at three periods of the day to systematically isolate illumination changes. (2) To support precise evaluation of geometry and appearance, we provide per‑scene LiDAR scans and accurate 3D ground‑truth for assessing depth, surface normals, and reconstruction quality under varying illumination. (3) For the inverse rendering task, we introduce the Temporal Consistency Coefficient (TCC), a metric that measures cross‑time albedo stability and directly evaluates the robustness of the disentanglement of light and material. We aim for this resource to serve as a foundation that advances research and real‑world evaluation in large‑scale inverse rendering, geometry reconstruction, and novel view synthesis.
Authors: Boyang Li, Zhongpeng Jin, Shuai Zhao, Jiahui Liao, Tian Liu, Han Liu, Yuanhai Zhang, Kai Huang
Abstract: The ability to adapt to changing environments is crucial for the autonomous navigation systems of Unmanned Aerial Vehicles (UAVs). However, existing navigation systems adopt fixed execution configurations, e.g., a high execution frequency and task workload, without considering environmental dynamics or the available computing resources. This static approach causes rigid flight strategies and excessive computations, ultimately degrading flight performance or even leading to failures in UAVs. Despite the necessity for an adaptive system, dynamically adjusting workloads remains challenging, due to difficulties in quantifying environmental complexity and modeling the relationship between environment and system configuration. To adapt to dynamic environments, this paper proposes E‑Navi, an environment‑adaptive navigation system for UAVs that dynamically adjusts task executions on the CPUs in response to environmental changes based on available computational resources. Specifically, the perception‑planning pipeline of the UAV navigation system is redesigned through dynamic adaptation of mapping resolution and execution frequency, driven by quantitative environmental complexity evaluations. In addition, E‑Navi supports flexible deployment across hardware platforms with varying levels of computing capability. Extensive Hardware‑In‑the‑Loop and real‑world experiments demonstrate that the proposed system significantly outperforms the baseline method across various hardware platforms, achieving up to 53.9% navigation task workload reduction, up to 63.8% flight time savings, and delivering more stable velocity control.
Authors: Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl
Abstract: Cyber‑physical systems (CPS) such as unmanned aerial vehicles are vulnerable to slow degradation that develops without causing immediate or obvious failures. Small sensor biases or timing irregularities can accumulate over time, gradually reducing stability while standard monitoring mechanisms continue to report normal operation. Detecting this early phase of degradation remains a challenge, as most existing approaches focus on abrupt faults or visible trajectory deviations. This paper introduces an early warning method based on stability drift, which measures the divergence between predicted and observed state transitions over short horizons. By tracking the gradual growth of this divergence, the proposed approach identifies emerging instability before it becomes visible in the flight trajectory or estimator residuals. The method operates externally to the flight stack and relies only on standard telemetry, making it suitable for deployment without modifying autopilot firmware. The approach was evaluated on a PX4 x500 platform in a software‑in‑the‑loop environment under two realistic degradation scenarios: gradual IMU bias drift and timing irregularities in the control loop. In both cases, the stability drift metric provided a consistent early warning signal several seconds before visible instability appeared, while remaining stable during nominal and aggressive but non‑degraded flight. The results demonstrate that stability drift can serve as a practical indicator of early degradation in UAV control systems. By providing advance notice during a pre‑instability phase, the proposed method complements existing safety mechanisms and offers additional time for mitigation or safe‑mode transitions under slow and subtle attacks.
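As a rough illustration of the stability drift idea described above, the sketch below computes a per-step divergence between a predicted and an observed next state, then compares the mean divergence of the latest window with that of the preceding window. The exact metric, horizon, and predictor used in the paper are not given in the abstract. The Euclidean norm, the window length, and the function names are assumptions.

```python
import math


def transition_divergence(predicted_state, observed_state):
    """Euclidean distance between predicted and observed next states."""
    return math.sqrt(sum((p - o) ** 2
                         for p, o in zip(predicted_state, observed_state)))


def stability_drift(divergences, window=5):
    """Growth of the divergence: mean of the latest window minus the mean
    of the window before it. Returns 0.0 until two full windows exist.
    (Illustrative definition; the paper's formula may differ.)
    """
    if len(divergences) < 2 * window:
        return 0.0
    recent = divergences[-window:]
    earlier = divergences[-2 * window:-window]
    return sum(recent) / window - sum(earlier) / window


if __name__ == "__main__":
    # Hypothetical telemetry: divergence is flat, then starts to grow.
    history = [0.1] * 5 + [0.3] * 5
    print(stability_drift(history))
```

A monitor built this way needs only logged telemetry and a state predictor, which is consistent with the abstract's claim that the method runs outside the flight stack. A warning threshold on the drift value would have to be tuned per platform.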
Authors: Yan Zhang, Baoxin Li, Han Sun, Yuhang Gao, Mingtai Zhang, Pei Wang
Abstract: Forest pests threaten ecosystem stability, requiring efficient monitoring. To overcome the limitations of traditional methods in large‑scale, fine‑grained detection, this study focuses on accurately identifying infected trees and analyzing infestation patterns. We propose FID‑Net, a deep learning model that detects pest‑affected trees from UAV visible‑light imagery and enables infestation analysis via three spatial metrics. Based on YOLOv8n, FID‑Net introduces a lightweight Feature Enhancement Module (FEM) to extract disease‑sensitive cues, an Adaptive Multi‑scale Feature Fusion Module (AMFM) to align and fuse dual‑branch features (RGB and FEM‑enhanced), and an Efficient Channel Attention (ECA) mechanism to enhance discriminative information efficiently. From detection results, we construct a pest situation analysis framework using: (1) Kernel Density Estimation to locate infection hotspots; (2) neighborhood evaluation to assess healthy trees' infection risk; (3) DBSCAN clustering to identify high‑density healthy clusters as priority protection zones. Experiments on UAV imagery from 32 forest plots in eastern Tianshan, China, show that FID‑Net achieves 86.10% precision, 75.44% recall, 82.29% mAP@0.5, and 64.30% mAP@0.5:0.95, outperforming mainstream YOLO models. Analysis confirms infected trees exhibit clear clustering, supporting targeted forest protection. FID‑Net enables accurate tree health discrimination and, combined with spatial metrics, provides reliable data for intelligent pest monitoring, early warning, and precise management.
Authors: Kai Xiong, Xingyu Wu, Anna Duan, Supeng Leng, Jianhua He
Abstract: The efficacy of UAV swarm cooperative perception fundamentally depends on three‑dimensional (3D) formation geometry, which governs target observability and sensor complementarity. In the literature, the exploitation of formation geometry and its impact on UAV sensing have rarely been studied, which can significantly degrade multimodal cooperative perception in scenarios where heterogeneous payloads (vision cameras and LiDAR) should be geometrically arranged to exploit their complementary strengths while managing communication interference and hardware budgets. To bridge this critical gap, we propose an information‑theoretic optimization framework that jointly addresses the allocation of UAVs and multimodal sensors, the configuration of formation geometries, and flight control. The UAV‑sensor allocation is optimized by maximizing the determinant of the Fisher Information Matrix (FIM). Under this framework, we introduce an equivalent formation transition strategy that enhances field‑of‑view (FOV) coverage without compromising perception accuracy or increasing communication interference. Furthermore, we design a novel Lyapunov‑stable flight control scheme with logarithmic potential fields to generate energy‑efficient trajectories for formation transitions. Extensive simulations demonstrate that our formation‑aware design achieves a 25.0% improvement in FOV coverage, a 104.2% enhancement in communication signal strength, and a 47.2% reduction in energy consumption compared to conventional benchmarks. This work establishes that task‑driven geometric configuration represents a foundational rather than incidental component in next‑generation UAV swarm systems.
Authors: Rishit Agnihotri, Sandeep Kumar Sharma
Abstract: Urban Air Mobility (UAM) poses unprecedented traffic coordination challenges, especially with increasing UAV densities in dense urban corridors. This paper introduces a mathematical model using a control algorithm to optimize an Edge AI‑driven decentralized swarm architecture for intelligent conflict resolution, enabling real‑time decision‑making with low latency. Using lightweight neural networks, the system leverages edge nodes to perform distributed conflict detection and resolution. A simulation platform was developed to evaluate the scheme under various UAV densities. Results indicate that conflict resolution is up to 3.8 times faster and more accurate than with traditional centralized control models. The proposed architecture is highly promising for scalable, efficient, and safe aerial traffic management in future UAM systems.
Authors: Tian Shi, Wenkun Wen, Peiran Wu, Minghua Xia
Abstract: Low‑altitude wireless networks are increasingly vital for the low‑altitude economy, enabling wireless coverage in high‑mobility and hard‑to‑reach environments. However, providing reliable connectivity to sparsely distributed aerial users in dynamic three‑dimensional (3D) spaces remains a significant challenge. This paper investigates downlink coverage enhancement in vertical heterogeneous networks (VHetNets) beyond 5G, where uncrewed aerial vehicles (UAVs) operate as emerging aerial base stations (ABSs) alongside legacy terrestrial base stations (TBSs). To improve coverage performance, we propose a coordinated multi‑point (CoMP) transmission framework that enables joint transmission from ABSs and TBSs. This approach mitigates the limitations of non‑uniform user distributions and enhances reliability for sparse aerial users. Two UAV deployment strategies are considered: i) random UAV placement, analyzed using stochastic geometry to derive closed‑form coverage expressions, and ii) optimized UAV placement using a coverage‑aware weighted K‑means clustering algorithm to maximize cooperative coverage in underserved areas. Theoretical analyses and Monte Carlo simulations demonstrate that the proposed CoMP‑enabled VHetNet significantly improves downlink coverage probability, particularly in scenarios with sparse aerial users. These findings highlight the potential of intelligent UAV coordination and geometry‑aware deployment to enable robust, adaptive connectivity in low‑altitude wireless networks.
Authors: Jinfan Zhou, Lixin Luo, Sungmin Eum, Heesung Kwon, Jeong Joon Park
Abstract: We explore spatiotemporal data augmentation using video foundation models to diversify both camera viewpoints and scene dynamics. Unlike existing approaches based on simple geometric transforms or appearance perturbations, our method leverages off‑the‑shelf video diffusion models to generate realistic 3D spatial and temporal variations from a given image dataset. Incorporating these synthesized video clips as supplemental training data yields consistent performance gains in low‑data settings, such as UAV‑captured imagery where annotations are scarce. Beyond empirical improvements, we provide practical guidelines for (i) choosing an appropriate spatiotemporal generative setup, (ii) transferring annotations to synthetic frames, and (iii) addressing disocclusion ‑ regions newly revealed and unlabeled in generated views. Experiments on COCO subsets and UAV‑captured datasets show that, when applied judiciously, spatiotemporal augmentation broadens the data distribution along axes underrepresented by traditional and prior generative methods, offering an effective lever for improving model performance in data‑scarce regimes.
Authors: Tasweer Ahmad, Arindam Sikdar, Sandip Pradhan, Ardhendu Behera
Abstract: Few‑shot image classification remains difficult under limited supervision and visual domain shift. Recent cache‑based adaptation approaches (e.g., Tip‑Adapter) address this challenge to some extent by learning lightweight residual adapters over frozen features, yet they still inherit CLIP's tendency to encode global, general‑purpose representations that are not optimally discriminative to adapt the generalist to the specialist's domain in low‑data regimes. We address this limitation with a novel patch‑driven relational refinement that learns cache adapter weights from intra‑image patch dependencies rather than treating an image embedding as a monolithic vector. Specifically, we introduce a relational gated graph attention network that constructs a patch graph and performs edge‑aware attention to emphasize informative inter‑patch interactions, producing context‑enriched patch embeddings. A learnable multi‑aggregation pooling then composes these into compact, task‑discriminative representations that better align cache keys with the target few‑shot classes. Crucially, the proposed graph refinement is used only during training to distil relational structure into the cache, incurring no additional inference cost beyond standard cache lookup. Final predictions are obtained by a residual fusion of cache similarity scores with CLIP zero‑shot logits. Extensive evaluations on 11 benchmarks show consistent gains over state‑of‑the‑art CLIP adapter and cache‑based baselines while preserving zero‑shot efficiency. We further validate battlefield relevance by introducing an Injured vs. Uninjured Soldier dataset for casualty recognition. It is motivated by the operational need to support triage decisions within the "platinum minutes" and the broader "golden hour" window in time‑critical UAV‑driven search‑and‑rescue and combat casualty care.
Authors: Ercan Erkalkan, Vedat Topuz, Ayça Ak
Abstract: This study introduces a lightweight perimeter tracking method designed for micro UAV teams operating over wildfire environments under limited bandwidth conditions. Thermal image frames generate coarse hot‑region masks through adaptive thresholding and morphological refinement, while RGB frames contribute edge cues and suppress texture‑related false detections using gradient‑based filtering. A rule‑level merging strategy selects boundary candidates and simplifies them via the Ramer‑Douglas‑Peucker algorithm. The system incorporates periodic beacons and an inertial feedback loop that maintains trajectory stability in the presence of GPS degradation. The guidance loop targets sub‑50 ms latency on embedded System on Chip (SoC) platforms by constraining per‑frame pixel operations and precomputing gradient tables. Small‑scale simulations demonstrate reductions in average path length and boundary jitter compared to a pure edge‑tracking baseline, while maintaining environmental coverage measured through intersection merge analysis. Battery consumption and computational utilization confirm the feasibility of achieving 10 to 15 m/s forward motion on standard micro platforms. This approach enables rapid deployment in the field, requiring robust sensing and minimal communications for emergency reconnaissance applications.
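The boundary simplification step names the Ramer‑Douglas‑Peucker algorithm, which can be sketched as follows. This is a generic textbook-style implementation for an open 2D polyline, not the authors' code. It measures distance to the chord as a segment (a common variant of the perpendicular-distance test), and the tolerance `epsilon` and the sample points are hypothetical.

```python
import math


def _point_segment_distance(point, seg_start, seg_end):
    """Shortest distance from `point` to the segment seg_start-seg_end."""
    (px, py), (ax, ay), (bx, by) = point, seg_start, seg_end
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of the point onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Simplify a polyline: keep the farthest point from the chord if it is
    more than `epsilon` away and recurse on both halves, else keep only the
    two endpoints."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    farthest_index, farthest_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > farthest_distance:
            farthest_index, farthest_distance = i, d
    if farthest_distance > epsilon:
        left = rdp(points[:farthest_index + 1], epsilon)
        right = rdp(points[farthest_index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [first, last]


if __name__ == "__main__":
    boundary = [(0, 0), (1, 0.05), (2, 0), (3, 0.05), (4, 0)]
    print(rdp(boundary, epsilon=0.1))
```

Fewer boundary vertices mean fewer bytes per perimeter update, which fits the limited-bandwidth setting the abstract describes. A larger `epsilon` trades boundary fidelity for a shorter message.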
Authors: Yun Hou, Yening Zhang
Abstract: This study addressed the challenge of improving network connectivity in autonomous V2X networks by jointly optimizing transmission power and vehicle mobility. We proposed a link reception model based on a sigmoid approximation of SINR and transformed it into a power‑based formulation for simplicity in optimization. Building on this, we formulated a multi‑node Network Utility Maximization (NUM) problem and demonstrated its concavity, enabling distributed trajectory and power adjustments. Both simulation and real‑world experiments validated the theoretical findings, showing that symmetric positioning and balanced power allocation significantly enhance packet reception rates under interference‑limited conditions. These results confirm that coordinated mobility and power control can effectively mitigate interference and improve connectivity in highly dynamic vehicular networks, paving the way for robust communication in future autonomous and UAV systems.
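The abstract mentions a link reception model based on a sigmoid approximation of SINR but does not state its form. The sketch below shows one common way such a model is written: a logistic curve in the dB domain centered on a decoding threshold. The threshold, the steepness parameter, and the function names are assumptions rather than the paper's formulation.

```python
import math


def sinr(signal_power, interference_powers, noise_power):
    """Linear SINR: received signal over summed interference plus noise."""
    return signal_power / (sum(interference_powers) + noise_power)


def reception_probability(sinr_linear, threshold_db=10.0, steepness=1.0):
    """Smooth (sigmoid) approximation of the step-like packet reception curve.

    Returns 0.5 exactly at the threshold and approaches 0 or 1 away from it.
    `threshold_db` and `steepness` are illustrative parameters.
    """
    sinr_db = 10.0 * math.log10(sinr_linear)
    return 1.0 / (1.0 + math.exp(-steepness * (sinr_db - threshold_db)))


if __name__ == "__main__":
    # Hypothetical powers in linear units.
    value = sinr(signal_power=20.0, interference_powers=[0.5, 0.5], noise_power=1.0)
    print(value, reception_probability(value))
```

Replacing the hard threshold with a smooth curve makes the reception rate differentiable in transmit power and position, which is what allows gradient-based distributed adjustments of the kind the abstract describes.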
Authors: Yawar Ali, K. Ramachandra Rao, Ashish Bhaskar, Niladri Chatterjee
Abstract: This paper offers openly available microscopic vehicle trajectory (MVT) datasets collected using unmanned aerial vehicles (UAVs) in heterogeneous, area‑based urban traffic conditions. Traditional roadside video collection often fails in dense mixed traffic due to occlusion, limited viewing angles, and irregular vehicle movements. UAV‑based recording provides a top‑down perspective that reduces these issues and captures rich spatial and temporal dynamics. The datasets described here were extracted using the Data from Sky (DFS) platform and validated against manual counts, space mean speeds, and probe trajectories in earlier work. Each dataset contains time‑stamped vehicle positions, speeds, longitudinal and lateral accelerations, and vehicle classifications at a resolution of 30 frames per second. Data were collected at six mid‑block locations in the national capital region of India, covering diverse traffic compositions and density levels. Exploratory analyses highlight key behavioural patterns, including lane‑keeping preferences, speed distributions, and lateral manoeuvres typical of heterogeneous and area‑based traffic settings. These datasets are intended as a resource for the global research community to support simulation modelling, safety assessment, and behavioural studies under area‑based traffic conditions. By making these empirical datasets openly available, this work offers researchers a unique opportunity to develop, test, and validate models that more accurately represent complex urban traffic environments.
Authors: Jiahao You, Ziye Jia, Can Cui, Chao Dong, Qihui Wu, Zhu Han
Abstract: The low‑altitude intelligent networks (LAINs) emerge as a promising architecture for delivering low‑latency and energy‑efficient edge intelligence in dynamic and infrastructure‑limited environments. By integrating unmanned aerial vehicles (UAVs), aerial base stations, and terrestrial base stations, LAINs can support mission‑critical applications such as disaster response, environmental monitoring, and real‑time sensing. However, these systems face key challenges, including energy‑constrained UAVs, stochastic task arrivals, and heterogeneous computing resources. To address these issues, we propose an integrated air‑ground collaborative network and formulate a time‑dependent integer nonlinear programming problem that jointly optimizes UAV trajectory planning and task offloading decisions. The problem is challenging to solve due to temporal coupling among decision variables. Therefore, we design a hierarchical learning framework with two timescales. At the large timescale, a Vickrey‑Clarke‑Groves auction mechanism enables the energy‑aware and incentive‑compatible trajectory assignment. At the small timescale, we propose the diffusion‑heterogeneous‑agent proximal policy optimization, a generative multi‑agent reinforcement learning algorithm that embeds latent diffusion models into actor networks. Each UAV samples actions from a Gaussian prior and refines them via observation‑conditioned denoising, enhancing adaptability and policy diversity. Extensive simulations show that our framework outperforms baselines in energy efficiency, task success rate, and convergence performance.
Authors: Yanan Liu, Jun Liu, Hao Zhang, Dan Xu, Hossein Rahmani, Mohammed Bennamoun, Qiuhong Ke
Abstract: Skeleton‑based action recognition has garnered significant attention in the computer vision community. Inspired by the recent success of the selective state‑space model (SSM) Mamba in modeling 1D temporal sequences, we propose TSkel‑Mamba, a hybrid Transformer‑Mamba framework that effectively captures both spatial and temporal dynamics. In particular, our approach leverages a Spatial Transformer for spatial feature learning while utilizing Mamba for temporal modeling. Mamba, however, employs separate SSM blocks for individual channels, which inherently limits its ability to model inter‑channel dependencies. To better adapt Mamba for skeleton data and enhance its ability to model temporal dependencies, we introduce a Temporal Dynamic Modeling (TDM) block, which is a versatile plug‑and‑play component that integrates a novel Multi‑scale Temporal Interaction (MTI) module. The MTI module employs multi‑scale Cycle operators to capture cross‑channel temporal interactions, a critical factor in action recognition. Extensive experiments on NTU‑RGB+D 60, NTU‑RGB+D 120, NW‑UCLA and UAV‑Human datasets demonstrate that TSkel‑Mamba achieves state‑of‑the‑art performance while maintaining low inference time, making it both efficient and highly effective.
Authors: Mohammad Sadegh Gholizadeh, Amir Arsalan Rezapour, Hamidreza Shayegh, Ehsan Pazouki
Abstract: Efficient crop detection via Unmanned Aerial Vehicles is critical for scaling precision agriculture, yet it remains challenging due to the small scale of targets and environmental variability. This paper addresses the detection of rice seedlings in paddy fields by leveraging a Faster R‑CNN architecture initialized via transfer learning. To overcome the specific difficulties of detecting minute objects in high‑resolution aerial imagery, we curate a significant UAV dataset for training and rigorously evaluate the model's generalization capabilities. Specifically, we validate performance across three distinct test sets acquired at different temporal intervals, thereby assessing robustness against varying imaging conditions. Our empirical results demonstrate that transfer learning not only facilitates the rapid convergence of object detection models in agricultural contexts but also yields consistent performance despite domain shifts in image acquisition.
Authors: Zamirddine Mari, Jérôme Pasquet, Julien Seinturier
Abstract: Autonomous drone navigation in confined tubular environments remains a major challenge due to the constraining geometry of the conduits, the proximity of the walls, and the perceptual limitations inherent to such scenarios. We propose a reinforcement learning approach enabling a drone to navigate unknown three‑dimensional tubes without any prior knowledge of their geometry, relying solely on local observations from LiDAR and a conditional visual detection of the tube center. In contrast, the Pure Pursuit algorithm, used as a deterministic baseline, benefits from explicit access to the centerline, creating an information asymmetry designed to assess the ability of RL to compensate for the absence of a geometric model. The agent is trained through a progressive Curriculum Learning strategy that gradually exposes it to increasingly curved geometries, where the tube center frequently disappears from the visual field. A turning‑negotiation mechanism, based on the combination of direct visibility, directional memory, and LiDAR symmetry cues, proves essential for ensuring stable navigation under such partial observability conditions. Experiments show that the PPO policy acquires robust and generalizable behavior, consistently outperforming the deterministic controller despite its limited access to geometric information. Validation in a high‑fidelity 3D environment further confirms the transferability of the learned behavior to continuous physical dynamics. The proposed approach thus provides a complete framework for autonomous navigation in unknown tubular environments and opens perspectives for industrial, underground, or medical applications where progressing through narrow and weakly perceptive conduits represents a central challenge.
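For readers unfamiliar with the deterministic baseline, the core of Pure Pursuit is a curvature command toward a lookahead point on the known centerline. The sketch below shows the standard planar form of that rule. The abstract's setting is three‑dimensional, so this 2D version, the pose convention, and the function name are simplifying assumptions, not the baseline as implemented in the paper.

```python
import math


def pure_pursuit_curvature(pose, lookahead_point):
    """Curvature of the circular arc from the vehicle to the lookahead point.

    `pose` is (x, y, heading_rad) in the world frame; `lookahead_point` is
    (x, y) on the reference centerline. Positive curvature turns left.
    """
    x, y, heading = pose
    dx, dy = lookahead_point[0] - x, lookahead_point[1] - y
    # Express the lookahead point in the vehicle's body frame.
    local_x = math.cos(heading) * dx + math.sin(heading) * dy
    local_y = -math.sin(heading) * dx + math.cos(heading) * dy
    distance_sq = local_x ** 2 + local_y ** 2
    if distance_sq == 0.0:
        return 0.0  # already at the lookahead point
    # Standard pure-pursuit relation: kappa = 2 * lateral offset / L^2.
    return 2.0 * local_y / distance_sq


if __name__ == "__main__":
    print(pure_pursuit_curvature((0.0, 0.0, 0.0), (1.0, 1.0)))
```

The rule needs the lookahead point, and therefore the centerline, as an input. That is the information asymmetry the abstract refers to: the learned policy has to infer an equivalent target from LiDAR and intermittent visual detections.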
Authors: Haowen Yu, Na Fan, Xing Liu, Ximin Lyu
Abstract: Accurate real‑time wind vector estimation is essential for enhancing the safety, navigation accuracy, and energy efficiency of unmanned aerial vehicles (UAVs). Traditional approaches rely on external sensors or simplify vehicle dynamics, which limits their applicability during agile flight or in resource‑constrained platforms. This paper proposes a real‑time wind estimation method based solely on onboard sensors. The approach first estimates external aerodynamic forces using a disturbance observer (DOB), and then maps these forces to wind vectors using a thin‑plate spline (TPS) model. A custom‑designed wind barrel mounted on the UAV enhances aerodynamic sensitivity, further improving estimation accuracy. The system is validated through comprehensive experiments in wind tunnels, indoor and outdoor flights. Experimental results demonstrate that the proposed method achieves consistently high‑accuracy wind estimation across controlled and real‑world conditions, with speed RMSEs as low as 0.06 m/s in wind tunnel tests, 0.22 m/s during outdoor hover, and below 0.38 m/s in indoor and outdoor dynamic flights, and direction RMSEs under 7.3° across all scenarios, outperforming existing baselines. Moreover, the method provides vertical wind estimates, which are unavailable in baselines, with RMSEs below 0.17 m/s even during fast indoor translations.
Authors: Chong Huang, Gaojie Chen, Zhuoao Xu, Jing Zhu, Taisong Pan, Rahim Tafazolli, Wei Huang
Abstract: In recent years, unmanned aerial vehicles (UAVs) have come to play a key role in wireless communication networks due to their flexibility and dynamic adaptability. However, the openness of UAV‑based communications leads to security and privacy concerns in wireless transmissions. This paper investigates a framework of UAV covert communications which introduces flexible reconfigurable intelligent surfaces (F‑RIS) in UAV networks. Unlike traditional RIS, F‑RIS provides advanced deployment flexibility by conforming to curved surfaces and dynamically reconfiguring its electromagnetic properties to enhance the covert communication performance. We establish an electromagnetic model for F‑RIS and further develop a fitted model that describes the relationship between F‑RIS reflection amplitude, reflection phase, and incident angle. To maximize the covert transmission rate among UAVs while meeting the covert constraint and public transmission constraint, we introduce a strategy of jointly optimizing UAV trajectories, F‑RIS reflection vectors, F‑RIS incident angles, and non‑orthogonal multiple access (NOMA) power allocation. Considering this is a complicated non‑convex optimization problem, we propose a deep reinforcement learning (DRL) algorithm‑based optimization solution. Simulation results demonstrate that our proposed framework and optimization method significantly outperform traditional benchmarks, and highlight the advantages of F‑RIS in enhancing covert communication performance within UAV networks.
Authors: Lidan Xu, Dadong Fan, Junhong Wang, Wenshuo Li, Hao Lu, Jianzhong Qiao
Abstract: Cooperative suspended aerial transportation is highly susceptible to multi‑source disturbances such as aerodynamic effects and thrust uncertainties. To achieve precise load manipulation, existing methods often rely on extra sensors to measure cable directions or the payload's pose, which increases the system cost and complexity. A fundamental question remains: is the payload's pose observable under multi‑source disturbances using only the drones' odometry information? To answer this question, this work focuses on the two‑drone‑bar system and proves that the whole system is observable when only two or fewer types of lumped disturbances exist by using the observability rank criterion. To the best of our knowledge, we are the first to present such a conclusion and this result paves the way for more cost‑effective and robust systems by minimizing their sensor suites. Next, to validate this analysis, we consider the situation where the disturbances are only exerted on the drones, and develop a composite disturbance filtering scheme. A disturbance observer‑based error‑state extended Kalman filter is designed for both state and disturbance estimation, which renders improved estimation performance for the whole system evolving on the manifold \((\mathbb{R}^3)^2 \times (TS^2)^3\). Our simulation and experimental tests have validated that it is possible to fully estimate the state and disturbance of the system with only odometry information of the drones.
Authors: Jingeun Kim, Yong-Hyuk Kim, Yourim Yoon
Abstract: We present a novel predict‑then‑optimize framework for maritime search operations that integrates trajectory forecasting with UAV deployment optimization, an end‑to‑end approach not addressed in prior work. A large language model predicts the drifter's trajectory, and spatial uncertainty is modeled using Gaussian‑based particle sampling. Unlike traditional static deployment methods, we dynamically adapt UAV detection radii based on distance and optimize their placement using meta‑heuristic algorithms. Experiments on real‑world data from the Korean coastline demonstrate that our method, particularly the repair mechanism designed for this problem, significantly outperforms the random search baselines. This work introduces a practical and robust integration of trajectory prediction and spatial optimization for intelligent maritime rescue.
Authors: Maoyu Wang, Yao Lu, Bo Zhou, Zhuangzhi Chen, Yun Lin, Qi Xuan, Guan Gui
Abstract: With the rapid development of Unmanned Aerial Vehicles (UAVs) and the increasing complexity of low‑altitude security threats, traditional UAV identification methods struggle to extract reliable signal features and meet real‑time requirements in complex environments. Recently, deep‑learning‑based Radio Frequency Fingerprint Identification (RFFI) approaches have greatly improved recognition accuracy. However, their large model sizes and high computational demands hinder deployment on resource‑constrained edge devices. While model pruning offers a general solution for complexity reduction, existing weight, channel, and layer pruning techniques struggle to concurrently optimize compression rate, hardware acceleration, and recognition accuracy. To this end, in this paper, we introduce HSCP, a Hierarchical Spectral Clustering Pruning framework that combines layer pruning with channel pruning to achieve extreme compression, high performance, and efficient inference. In the first stage, HSCP employs spectral clustering guided by Centered Kernel Alignment (CKA) to identify and remove redundant layers. Subsequently, the same strategy is applied to the channel dimension to eliminate finer‑grained redundancy. To ensure robustness, we further employ a noise‑robust fine‑tuning strategy. Experiments on the UAV‑M100 benchmark demonstrate that HSCP outperforms existing channel and layer pruning methods. Specifically, HSCP achieves 86.39% parameter reduction and 84.44% FLOPs reduction on ResNet18 while improving accuracy by 1.49% compared to the unpruned baseline, and maintains superior robustness even in low signal‑to‑noise ratio environments.
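The abstract above names Centered Kernel Alignment (CKA) as the similarity measure that guides the clustering of layers and channels, but it does not say which CKA variant is used or how the similarity matrix is built. As a rough illustration only, the sketch below computes the widely used linear form of CKA between two activation matrices (rows are samples, columns are features). The function names `center_columns`, `cross_gram_sq_norm`, and `linear_cka` are hypothetical and are not taken from the paper.

```python
import math

def center_columns(X):
    """Subtract the per-column mean from a list-of-rows matrix."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def cross_gram_sq_norm(A, B):
    """Squared Frobenius norm of A^T B (A and B share the same number of rows)."""
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            s = sum(A[k][i] * B[k][j] for k in range(len(A)))
            total += s * s
    return total

def linear_cka(X, Y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features.

    Assumption: this is the generic linear-kernel form, not necessarily the
    variant used by HSCP. Constant (zero-variance) inputs make the
    denominator zero and are not handled here.
    """
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = cross_gram_sq_norm(Yc, Xc)
    denominator = math.sqrt(cross_gram_sq_norm(Xc, Xc) * cross_gram_sq_norm(Yc, Yc))
    return numerator / denominator
```

A value close to 1 suggests that two layers (or channels) carry nearly the same representation, which is the kind of redundancy a pruning criterion would look for. Pairwise scores of this kind could serve as the affinity matrix for spectral clustering, although that pairing is an assumption about the pipeline rather than a detail stated in the abstract.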
Authors: Manduhu Manduhu, Alexander Dow, Gerard Dooly, James Riordan
Abstract: Rotation invariance is essential for precise, object‑level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine‑scale details. Conventional segmentation architectures like U‑Net rely on convolution operators that are not rotation‑invariant, leading to degraded segmentation accuracy across varying viewpoints. Rotation invariance can be achieved by expanding the filter bank across multiple orientations; however, this will significantly increase computational cost and memory traffic. In this paper, we introduce a GPU‑optimized rotation‑invariant convolution framework that eliminates the traditional data‑lowering (im2col) step required for matrix‑multiplication‑based convolution. By exploiting structured data sharing among symmetrically rotated filters, our method achieves multi‑orientation convolution with greatly reduced memory traffic and computational redundancy. We further generalize the approach to accelerate convolution with arbitrary (non‑symmetric) rotation angles.
Across extensive benchmarks, the proposed convolution achieves 20‑55% faster training and 15‑45% lower energy consumption than cuDNN, while maintaining accuracy comparable to state‑of‑the‑art rotation‑invariant methods. In the eight‑orientation setting, our approach achieves up to 45% speedup and 41% energy savings on 256×256 inputs, and 32% speedup and 23% lower energy usage on 1024×1024 inputs. Integrated into a U‑Net segmentation model, the framework yields up to 6% improvement in accuracy over the non‑rotation‑aware baseline. These results demonstrate that the proposed method provides an effective and highly efficient alternative to existing rotation‑invariant CNN frameworks.
Authors: Dongdong Yang, Bin Li, Jiguang He
Abstract: Reconfigurable intelligent surface (RIS) and simultaneously transmitting and reflecting RIS (STAR‑RIS) have emerged as key enablers for enhancing wireless coverage and capacity in next‑generation networks. When mounted on unmanned aerial vehicles (UAVs), they benefit from flexible deployment and improved line‑of‑sight conditions. Despite their promising potential, a comprehensive performance comparison between aerial RIS and STAR‑RIS architectures has not been thoroughly investigated. This letter presents a detailed performance comparison between aerial RIS and STAR‑RIS in three‑dimensional wireless environments. Accurate channel models incorporating directional radiation patterns are established, and the influence of deployment altitude and orientation is thoroughly examined. To optimize the system sum‑rate, we formulate joint optimization problems for both architectures and propose an efficient solution based on the weighted minimum mean square error and block coordinate descent algorithms. Simulation results reveal that STAR‑RIS outperforms RIS in low‑altitude scenarios due to its full‑space coverage capability, whereas RIS delivers better performance near the base station at higher altitudes. The findings provide practical insights for the deployment of aerial intelligent surfaces in future 6G communication systems.
Authors: Jason Hughes, Marcel Hussing, Edward Zhang, Shenbagaraj Kannapiran, Joshua Caswell, Kenneth Chaney, Ruichen Deng, Michaela Feehery, Agelos Kratimenos, Yi Fan Li, Britny Major, Ethan Sanchez, Sumukh Shrote, Youkang Wang, Jeremy Wang, Daudi Zein, Luying Zhang, Ruijun Zhang, Alex Zhou, Tenzi Zhouga, Jeremy Cannon, Zaffir Qasim, Jay Yelon, Fernando Cladera, Kostas Daniilidis, Camillo J. Taylor, Eric Eaton
Abstract: This report presents a heterogeneous robotic system designed for remote primary triage in mass‑casualty incidents (MCIs). The system employs a coordinated air‑ground team of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to locate victims, assess their injuries, and prioritize medical assistance without risking the lives of first responders. The UAVs identify and provide overhead views of casualties, while UGVs equipped with specialized sensors measure vital signs and detect and localize physical injuries. Unlike previous work that focused on exploration or limited medical evaluation, this system addresses the complete triage process: victim localization, vital sign measurement, injury severity classification, mental status assessment, and data consolidation for first responders. Developed as part of the DARPA Triage Challenge, this approach demonstrates how multi‑robot systems can augment human capabilities in disaster response scenarios to maximize lives saved.
Authors: Marta Manzoni, Alessandro Nazzari, Roberto Rubinacci, Marco Lovera
Abstract: This paper investigates the use of Multi‑Task Bayesian Optimization for tuning decentralized trajectory generation algorithms in multi‑drone systems. We treat each task as a trajectory generation scenario defined by a specific number of drone‑to‑drone interactions. To model relationships across scenarios, we employ Multi‑Task Gaussian Processes, which capture shared structure across tasks and enable efficient information transfer during optimization. We compare two strategies: optimizing the average mission time across all tasks and optimizing each task individually. Through a comprehensive simulation campaign, we show that single‑task optimization leads to progressively shorter mission times as swarm size grows, but requires significantly more optimization time than the average‑task approach.
Authors: Vit Kratky, Robert Penicka, Parakh M. Gupta, Ondrej Prochazka, Martin Saska
Abstract: This paper presents an approach to mutual collision avoidance based on Nonlinear Model Predictive Control (NMPC) with time‑dependent Reciprocal Velocity Constraints (RVCs). Unlike most existing methods, the proposed approach relies solely on observable information about other robots, eliminating the need for excessive communication. The computationally efficient algorithm for computing RVCs, together with the direct integration of these constraints into the NMPC problem formulation at the controller level, allows the whole pipeline to run at 100 Hz. This high processing rate, combined with modeled nonlinear dynamics of the controlled Uncrewed Aerial Vehicles (UAVs), is a key feature that facilitates the use of the proposed approach for agile UAV flight. The proposed approach was evaluated through extensive simulations emulating real‑world conditions in scenarios involving up to 10 UAVs and velocities of up to 25 m/s, and in real‑world experiments with accelerations up to 30 m/s^2. Comparison with the state of the art shows a 31% improvement in terms of flight time reduction in challenging scenarios, while maintaining collision‑free navigation in all trials.
Authors: Ngoc-Tan Nguyen, Thi-Thu Hoang, Trung-Dung Hoang, Thai-Duong Nguyen
Abstract: The open and broadcast nature of wireless communication systems, while enabling ubiquitous connectivity, also exposes them to jamming attacks that may critically compromise network performance or disrupt service availability. The proliferation of Unmanned Aerial Vehicles (UAVs) introduces a new dimension to this threat, as UAVs can act as mobile, intelligent jammers capable of launching sophisticated attacks by leveraging Line‑of‑Sight (LoS) channels and adaptive strategies. This paper addresses a critical challenge of countering intelligent UAV jamming in the context of energy‑constrained ambient backscatter communication systems. Traditional anti‑jamming techniques often fall short against such dynamic threats or are unsuitable for low‑power backscatter devices. Hence, we propose a novel anti‑jamming framework based on Deep Reinforcement Learning (DRL) that empowers the transmitter to not only defend against but also strategically exploit the UAV's jamming signals. In particular, our approach allows the transmitter to learn an optimal policy for switching between active transmission, energy harvesting from the jamming signal, and backscattering information using the jammer's own emissions. We then formulate the problem as a Markov Decision Process (MDP) and employ a Deep Q‑Network (DQN) to derive the optimal operational strategy. Simulation results demonstrate that our DQN‑based method significantly outperforms conventional Q‑learning in convergence speed and surpasses a greedy anti‑jamming strategy in terms of average throughput, packet loss rate, and packet delivery ratio.
Authors: Thai Duong Nguyen, Ngoc-Tan Nguyen, Thanh-Dao Nguyen, Nguyen Van Huynh, Dinh-Hieu Tran, Symeon Chatzinotas
Abstract: The deployment of Unmanned Aerial Vehicle (UAV) swarms as dynamic communication relays is critical for next‑generation tactical networks. However, operating in contested environments requires solving a complex trade‑off, including maximizing system throughput while ensuring collision avoidance and resilience against adversarial jamming. Existing heuristic‑based approaches often struggle to find effective solutions due to the dynamic and multi‑objective nature of this problem. This paper formulates this challenge as a cooperative Multi‑Agent Reinforcement Learning (MARL) problem, solved using the Centralized Training with Decentralized Execution (CTDE) framework. Our approach employs a centralized critic that uses global state information to guide decentralized actors which operate using only local observations. Simulation results show that our proposed framework significantly outperforms heuristic baselines, increasing the total system throughput by approximately 50% while simultaneously achieving a near‑zero collision rate. A key finding is that the agents develop an emergent anti‑jamming strategy without explicit programming. They learn to intelligently position themselves to balance the trade‑off between mitigating interference from jammers and maintaining effective communication links with ground users.
Authors: Thanh-Dao Nguyen, Ngoc-Tan Nguyen, Thai-Duong Nguyen, Nguyen Van Huynh, Dinh-Hieu Tran, Symeon Chatzinotas
Abstract: Non‑terrestrial networks are critical for achieving global 6G coverage, yet efficient resource management in aerial and space environments remains challenging due to limited onboard power and dynamic operational conditions. Network slicing offers a promising solution for spectrum optimization in UAV‑based systems serving heterogeneous service demands. To this end, this paper proposes a hierarchical network slicing framework for UAV‑satellite integrated networks supporting eMBB, URLLC, and mMTC services. Specifically, we formulate a joint optimization of UAV trajectory, transmission power, and spectrum allocation as a decentralized partially observable Markov decision process that ensures quality of service while minimizing energy consumption and maximizing resource fairness. To address the computational intractability and partial observability, we develop a multi‑agent deep reinforcement learning solution under the centralized training and decentralized execution paradigm. In the proposed system, UAV agents act as distributed actors coordinated by a shared critic operating with a multi‑head attention mechanism at a low Earth orbit satellite. Experimental results demonstrate that our approach outperforms existing methods by up to 33% in cumulative reward while achieving superior energy efficiency and fairness.
Authors: Haoran Wang, Zhuohang Chen, Guang Li, Bo Ma, Chuanghuang Li
Abstract: The future of UAV interaction systems is evolving from engineer‑driven to user‑driven, aiming to replace traditional predefined Human‑UAV Interaction (HUI) designs. This shift focuses on enabling more personalized task planning and design, thereby achieving a higher quality of interaction experience and greater flexibility, which can be used in many fields, such as agriculture, aerial photography, logistics, and environmental monitoring. However, due to the lack of a common language between users and UAVs, such interactions are often difficult to achieve. Advances in Large Language Models, which can understand natural language and the behaviors of robots (UAVs), make personalized Human‑UAV Interaction possible. Recently, some HUI frameworks based on LLMs have been proposed, but they commonly suffer from difficulties in mixed task planning and execution, leading to low adaptability in complex scenarios. In this paper, we propose a novel dual‑agent HUI framework. This framework constructs two independent LLM agents (a task planning agent and an execution agent) and applies different Prompt Engineering to separately handle the understanding, planning, and execution of tasks. To verify the effectiveness and performance of the framework, we have built a task database covering four typical application scenarios of UAVs and quantified the performance of the HUI framework using three independent metrics. In addition, different LLMs are used to control the UAVs and their performance is compared. Our user study results demonstrate that the framework improves the smoothness of HUI and the flexibility of task execution in the task scenarios we set up, effectively meeting users' personalized needs.
Authors: Lampis Papakostas, Aristeidis Geladaris, Athanasios Mastrogeorgiou, Jim Sharples, Gautier Hattenberger, Panagiotis Chatzakos, Panagiotis Polygerinos
Abstract: This paper presents a UAV swarm system designed to assist first responders in disaster scenarios like wildfires. By distributing sensors across multiple agents, the system extends flight duration and enhances data availability, reducing the risk of mission failure due to collisions. To mitigate this risk further, we introduce an autonomous navigation framework that utilizes a local Euclidean Signed Distance Field (ESDF) map for obstacle avoidance while maintaining swarm formation with minimal path deviation. Additionally, we incorporate a Traveling Salesman Problem (TSP) variant to optimize area coverage, prioritizing Points of Interest (POIs) based on preassigned values derived from environmental behavior and critical infrastructure. The proposed system is validated through simulations with varying swarm sizes, demonstrating its ability to maximize coverage while ensuring collision avoidance between UAVs and obstacles.
Authors: Nikita Vaibhav Pavle, Shrreya Rajneesh, Rakesh Kumar Sahoo, Manoranjan Sinha
Abstract: The conventional Artificial Potential Field (APF) is fundamentally limited by the local minima issue and its inability to account for the kinematics of moving obstacles. This paper addresses the critical challenge of autonomous collision avoidance for Unmanned Aerial Vehicles (UAVs) operating in dynamic and cluttered airspace by proposing a novel Direction and Relative Velocity Weighted Artificial Potential Field (APF). In this approach, a bounded weighting function, ω(θ,v_e), is introduced to dynamically scale the repulsive potential based on the direction and velocity of the obstacle relative to the UAV. This robust APF formulation is integrated within a Model Predictive Control (MPC) framework to generate collision‑free trajectories while adhering to kinematic constraints. Simulation results demonstrate that the proposed method effectively resolves local minima and significantly enhances safety by enabling smooth, predictive avoidance maneuvers. The system ensures superior path integrity and reliable performance, confirming its viability for autonomous navigation in complex environments.
Authors: Ximing Huang, Yirui Rao
Abstract: The proliferation of autonomous Unmanned Aerial Vehicles (UAVs) in Beyond Visual Line of Sight (BVLOS) applications is critically dependent on resilient, high‑bandwidth, and low‑latency communication links. Existing solutions face critical limitations: TCP's head‑of‑line blocking stalls time‑sensitive data, UDP lacks reliability and congestion control, and cellular networks designed for terrestrial users degrade severely for aerial platforms. This paper introduces AQUILA, a cross‑layer communication architecture built on QUIC to address these challenges. AQUILA contributes three key innovations: (1) a unified transport layer using QUIC's reliable streams for MAVLink Command and Control (C2) and unreliable datagrams for video, eliminating head‑of‑line blocking under unified congestion control; (2) a priority scheduling mechanism that structurally ensures C2 latency remains bounded and independent of video traffic intensity; (3) a UAV‑adapted congestion control algorithm extending SCReAM with altitude‑adaptive delay targeting and telemetry headroom reservation. AQUILA further implements 0‑RTT connection resumption to minimize handover blackouts with application‑layer replay protection, deployed over an IP‑native architecture enabling global operation. Experimental validation demonstrates that AQUILA significantly outperforms TCP‑ and UDP‑based approaches in C2 latency, video quality, and link resilience under realistic conditions, providing a robust foundation for autonomous BVLOS missions.
Authors: Jifar Wakuma Ayana, Huang Qiming
Abstract: Large Language Models (LLMs) are emerging as powerful enablers for autonomous reasoning and natural‑language coordination in unmanned aerial vehicle (UAV) swarms operating within Internet of Things (IoT) environments. However, existing LLM‑driven UAV systems process sensitive operational data in plaintext, exposing them to privacy and security risks. This work introduces PrivLLMSwarm, a privacy‑preserving framework that performs secure LLM inference for UAV swarm coordination through Secure Multi‑Party Computation (MPC). The framework incorporates MPC‑optimized transformer components with efficient approximations of nonlinear activations, enabling practical encrypted inference on resource‑constrained aerial platforms. A fine‑tuned GPT‑based command generator, enhanced through reinforcement learning in simulation, provides reliable instructions while maintaining confidentiality. Experimental evaluation in urban‑scale simulations demonstrates that PrivLLMSwarm achieves high semantic accuracy, low encrypted inference latency, and robust formation control under privacy constraints. Comparative analysis shows PrivLLMSwarm offers a superior privacy‑utility balance compared to differential privacy, federated learning, and plaintext baselines. To support reproducibility, the full implementation including source code, MPC components, and a synthetic dataset is publicly available. PrivLLMSwarm establishes a practical foundation for secure, LLM‑enabled UAV swarms in privacy‑sensitive IoT applications including smart‑city monitoring and emergency response.
Authors: Marvin Harms, Jaeyoung Lim, David Rohr, Friedrich Rockenbauer, Nicholas Lawrance, Roland Siegwart
Abstract: Dynamic soaring is a flying technique to exploit the energy available in wind shear layers, enabling potentially unlimited flight without the need for internal energy sources. We propose a framework for autonomous dynamic soaring with a fixed‑wing unmanned aerial vehicle (UAV). The framework makes use of an explicit representation of the wind field and a classical approach for guidance and control of the UAV. Robustness to wind field estimation error is achieved by constructing point‑wise robust reference paths for dynamic soaring and the development of a robust path following controller for the fixed‑wing UAV. Wind estimation and path tracking performance are validated with real flight tests to demonstrate robust path‑following in real wind conditions. In simulation, we demonstrate robust dynamic soaring flight subject to varied wind conditions, estimation errors and disturbances. Together, our results strongly indicate the ability of the proposed framework to achieve autonomous dynamic soaring flight in wind shear.
Authors: Andrii Lysyi, Anatoliy Sachenko, Pavlo Radiuk, Mykola Lysyi, Oleksandr Melnychenko, Diana Zahorodnia
Abstract: The subject of this research is the development of an intelligent, integrated framework for the automated inspection of photovoltaic (PV) infrastructure that addresses the critical shortcomings of conventional methods, including thermal palette bias, data redundancy, and high communication bandwidth requirements. The goal of this study is to design, develop, and validate a comprehensive, multi‑modal system that fully automates the monitoring workflow, from data acquisition to the generation of actionable, geo‑located maintenance alerts, thereby enhancing plant safety and operational efficiency. The methods employed involve a synergistic architecture that begins with a palette‑invariant thermal embedding, learned by enforcing representational consistency, which is fused with a contrast‑normalized RGB stream via a gated mechanism. This is supplemented by a closed‑loop, adaptive re‑acquisition controller that uses Rodrigues‑based updates for targeted confirmation of ambiguous anomalies and a geospatial deduplication module that clusters redundant alerts using DBSCAN over the haversine distance. In conclusion, this study establishes a powerful new paradigm for proactive PV inspection, with the proposed system achieving a mean Average Precision (mAP@0.5) of 0.903 on the public PVF‑10 benchmark, a significant 12‑15% improvement over single‑modality baselines. Field validation confirmed the system's readiness, achieving 96% recall, while the de‑duplication process reduced duplicate‑induced false positives by 15‑20%, and relevance‑only telemetry cut airborne data transmission by 60‑70%.
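The geospatial deduplication step mentioned above (DBSCAN over the haversine distance) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the 5 m radius, the `min_pts` value, the choice to keep the first alert of each cluster, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # A point counts as its own neighbor, as in common DBSCAN implementations.
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels

def deduplicate(alerts, eps_m=5.0, min_pts=2):
    """Keep isolated alerts plus the first alert of every cluster (assumed policy)."""
    kept, seen = [], set()
    for alert, label in zip(alerts, dbscan(alerts, eps_m, min_pts)):
        if label == -1:
            kept.append(alert)
        elif label not in seen:
            seen.add(label)
            kept.append(alert)
    return kept
```

With these hypothetical settings, three alerts logged within a couple of metres of one another collapse into a single report, while an alert more than a kilometre away is kept as its own entry. A real system would more likely merge cluster members (for example, by averaging positions or keeping the highest-confidence detection) than simply keep the first.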
Authors: Yuxuan Song, Haiquan Lu, Chiya Zhang, Beixiong Zheng, Yong Zeng
Abstract: Cellular‑connected unmanned aerial vehicles (UAVs) are expected to play an increasingly important role in future wireless networks. To facilitate the reliable navigation for cellular‑connected UAVs, channel knowledge map (CKM) is considered a promising approach capable of tackling the non‑negligible co‑channel interference resulting from the high line‑of‑sight (LoS) probability of air‑ground (AG) channels. Nevertheless, due to measurement constraints and the aging of information, CKM is usually incomplete and needs to be regularly updated to capture the dynamic nature of complex environments. In this paper, we propose a novel trajectory design strategy in which UAV navigation and CKM completion are incorporated into a common framework, enabling mutual benefits for both tasks. Specifically, a cellular‑connected UAV deployed in an urban environment measures the radio information during its flight and completes the CKM with Kriging interpolation. Based on the method of grid discretization and spherical approximation, a mixed‑integer multi‑objective optimization problem is formulated. The problem falls into the category of combinatorial mathematics and is essentially equivalent to determining an optimum sequence of grid points to traverse. Through proper mathematical manipulation, the problem is reformulated as variants of two classic models in graph theory, namely the shortest‑path problem (SPP) and the traveling salesman problem (TSP). Two navigation strategies based on the two different models are proposed and thoroughly compared based on numerical results to provide implementable methods for engineering practice and reveal the trade‑offs between UAV navigation and CKM completion. Simulation results reveal that the proposed navigation strategies can quickly expand the Pareto boundary of the problem and approach the performance of fully‑known CKM.
Authors: Melone Nyoba Tchonkeu, Soulaimane Berkane, Tarek Hamel
Abstract: This paper addresses the problem of estimating air velocity and full attitude for unmanned aerial vehicles (UAVs) in GNSS‑denied environments using minimal onboard sensing, an interesting and practically relevant challenge for UAV navigation. The contribution of the paper is twofold: (i) an observability analysis establishing the conditions for uniform observability, which are useful for trajectory planning and motion control of the UAV; and (ii) the design of a nonlinear observer on \(SO(3) \times \mathbb{R}^3 \times \mathbb{R}\) that incorporates pitot‑tube, barometric altitude, and magnetometer measurements as outputs, with IMU data used as inputs, within a unified framework. Simulation results are presented to confirm the convergence and robustness of the proposed design, including under minimally excited trajectories.
Authors: Amit Shivam, Manuel C. R. M. Fernandes, Fernando A. C. C. Fontes, Lorenzo Fagiano
Abstract: This paper presents a geometric and theoretical study of an exponentially varying look‑ahead parameter for UAV path‑following guidance. Conventional guidance laws with a fixed look‑ahead distance often drive the vehicle into turn‑rate saturation when the heading or cross‑track error is large, leading to constrained maneuvers and higher control effort. The proposed variable L0 strategy reshapes the look‑ahead profile so that the guidance command adapts to the evolving tracking error geometry. A detailed investigation shows that this adaptation significantly enlarges the region in which the commanded turn rate remains unsaturated, allowing the vehicle to operate smoothly over a broader range of error conditions. For representative settings, the unsaturated operational envelope increases by more than 70% relative to the constant L0 formulation. These geometric insights translate to smoother trajectories, earlier recovery from saturation, and reduced control demand. Simulation studies on straight‑line and elliptical paths demonstrate the merits of the variable look‑ahead strategy, highlighting its control‑efficient and reliable path‑following performance.
Authors: Xiaobo Wu, Youmin Zhang
Abstract: Accurate real‑time waypoint estimation for UAV‑based online terrain following during wildfire patrol missions is critical to ensuring flight safety and enabling wildfire detection. However, existing real‑time filtering algorithms struggle to maintain accurate waypoints under measurement noise in nonlinear and time‑varying systems, posing risks of flight instability and missed wildfire detections during UAV‑based terrain following. To address this issue, a Residual Variance Matching Recursive Least Squares (RVM‑RLS) filter, guided by a Residual Variance Matching Estimation (RVME) criterion, is proposed to adaptively estimate the real‑time waypoints of nonlinear, time‑varying UAV‑based terrain following systems. The proposed method is validated using a UAV‑based online terrain following system within a simulated terrain environment. Experimental results show that the RVM‑RLS filter improves waypoint estimation accuracy by approximately 88% compared with benchmark algorithms across multiple evaluation metrics. These findings demonstrate both the methodological advances in real‑time filtering and the practical potential of the RVM‑RLS filter for UAV‑based online wildfire patrol.
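For readers unfamiliar with the filter family named above, the following is a minimal sketch of one step of the standard exponentially weighted recursive least squares (RLS) update. It is not the RVM‑RLS filter: the abstract does not detail the residual variance matching criterion, so the forgetting factor `lam` is simply held fixed here. The function name and the linear measurement model `y ≈ phi · theta` are assumptions made for illustration.

```python
def rls_update(theta, P, phi, y, lam=0.98):
    """One step of standard RLS with forgetting factor lam for y ≈ phi · theta.

    theta: current parameter estimate (list of n floats)
    P:     current symmetric n x n covariance-like matrix (list of lists)
    phi:   regressor vector for this sample
    y:     scalar measurement
    Returns the updated (theta, P) and the a-priori residual.
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    residual = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * residual for i in range(n)]
    # P <- (P - gain * phi^T P) / lam, using the symmetry of P for phi^T P.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new, residual
```

The residual is returned alongside the estimate because an adaptive scheme would plausibly monitor its statistics to tune the filter; how RVM‑RLS actually does this is not specified in the abstract. Starting from a zero estimate and a large diagonal `P`, feeding noise‑free samples of a line such as y = 2x + 1 with `phi = [x, 1.0]` drives `theta` toward the slope and intercept.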
Authors: Ji Wang, Miroslav Krstic
Abstract: This paper presents a safe output regulation control strategy for a class of systems modeled by a coupled 2×2 hyperbolic PDE‑ODE structure, subject to fully distributed disturbances throughout the system. A state‑feedback controller is developed by the nonovershooting backstepping method to simultaneously achieve exponential output regulation and enforce safety constraints on the regulated output, which is the state furthest from the control input. To handle unmeasurable states and external disturbances, a state observer and a disturbance estimator are designed. Explicit bounds on the estimation errors are derived and used to construct a robust safe regulator that accounts for the uncertainties. The proposed control scheme guarantees that: 1) If the regulated output is initially within the safe region, it remains there; otherwise, it is returned to the safe region within a prescribed time; 2) The output tracking error converges to zero exponentially; 3) The observer accurately estimates both the distributed states and external disturbances, with estimation errors converging to zero exponentially; 4) All signals in the closed‑loop system remain bounded. The effectiveness of the proposed method is demonstrated through a UAV delivery scenario with a cable‑suspended payload, where the payload is regulated to track a desired reference while avoiding collisions with barriers.
Authors: Marios-Nektarios Stamatopoulos, Shridhar Velhal, Avijit Banerjee, George Nikolakopoulos
Abstract: This article presents a novel coordination and task‑planning framework to enable the simultaneous conflict‑free collaboration of multiple unmanned aerial vehicles (UAVs) for aerial 3D printing. The proposed framework formulates an optimization problem that takes a construction mission divided into sub‑tasks and a team of autonomous UAVs, along with limited volume and battery. It generates an optimal mission plan comprising task assignments and scheduling while accounting for task dependencies arising from the geometric and structural requirements of the 3D design, inter‑UAV safety constraints, material usage, and total flight time of each UAV. The potential conflicts occurring during the simultaneous operation of the UAVs are addressed at a segment level by dynamically selecting the starting time and location of each task to guarantee collision‑free parallel execution. An importance prioritization is proposed to accelerate the computation by guiding the solution toward more important tasks. Additionally, a utility maximization formulation is proposed to dynamically determine the optimal number of UAVs required for a given mission, balancing the trade‑off between minimizing makespan and the deployment of excess agents. The proposed framework's effectiveness is evaluated through a Gazebo‑based simulation setup, where agents are coordinated by a mission control module allocating the printing tasks based on the generated optimal scheduling plan while remaining within the material and battery constraints of each UAV.
Authors: Ali Krayani, Seyedeh Fatemeh Sadati, Lucio Marcenaro, Carlo Regazzoni
Abstract: This paper proposes a hierarchical trajectory planning framework for UAVs operating under adversarial jamming conditions. Leveraging Bayesian Active Inference, the approach combines expert‑generated demonstrations with probabilistic generative modeling to encode high‑level symbolic planning, low‑level motion policies, and wireless signal feedback. During deployment, the UAV performs online inference to anticipate interference, localize jammers, and adapt its trajectory accordingly, without prior knowledge of jammer locations. Simulation results demonstrate that the proposed method achieves near‑expert performance, significantly reducing communication interference and mission cost compared to model‑free reinforcement learning baselines, while maintaining robust generalization in dynamic environments.
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Abstract: Traditional stereo matching algorithms like Semi‑Global Block Matching (SGBM) with Weighted Least Squares (WLS) filtering offer speed advantages over neural networks for UAV applications, generating disparity maps in approximately 0.5 seconds per frame. However, these algorithms require meticulous parameter tuning. We propose a Genetic Algorithm (GA) based parameter optimization framework that systematically searches for optimal parameter configurations for SGBM and WLS, enabling UAVs to measure distances to tree branches with enhanced precision while maintaining processing efficiency. Our contributions include: (1) a novel GA‑based parameter optimization framework that eliminates manual tuning; (2) a comprehensive evaluation methodology using multiple image quality metrics; and (3) a practical solution for resource‑constrained UAV systems. Experimental results demonstrate that our GA‑optimized approach reduces Mean Squared Error by 42.86% while increasing Peak Signal‑to‑Noise Ratio and Structural Similarity by 8.47% and 28.52%, respectively, compared with baseline configurations. Furthermore, our approach demonstrates superior generalization performance across varied imaging conditions, which is critical for real‑world forestry applications.
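The GA‑based parameter search described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the parameter names and ranges in `PARAM_SPACE`, the placeholder `toy_fitness` (a real setup would run SGBM and WLS filtering and score the resulting disparity map), and the simple elitist selection, uniform crossover, and random‑reset mutation operators.

```python
import random

# Hypothetical integer search space: (name, low, high). Not taken from the paper.
PARAM_SPACE = [
    ("num_disparities", 1, 16),
    ("block_size", 3, 21),
    ("wls_lambda", 1000, 16000),
]


def toy_fitness(params):
    """Placeholder score (higher is better).

    A real fitness would run the stereo pipeline with `params` and
    evaluate the disparity map; here we only measure distance to an
    arbitrary target so the sketch stays self-contained.
    """
    target = (8, 9, 8000)  # arbitrary optimum, for illustration only
    return -sum(
        ((p - t) / (hi - lo)) ** 2
        for p, t, (_, lo, hi) in zip(params, target, PARAM_SPACE)
    )


def random_individual(rng):
    """Draw one parameter tuple uniformly from the search space."""
    return tuple(rng.randint(lo, hi) for _, lo, hi in PARAM_SPACE)


def mutate(ind, rng, rate=0.3):
    """Resample each gene within its bounds with probability `rate`."""
    return tuple(
        rng.randint(lo, hi) if rng.random() < rate else g
        for g, (_, lo, hi) in zip(ind, PARAM_SPACE)
    )


def crossover(a, b, rng):
    """Uniform crossover: take each gene from either parent."""
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def run_ga(fitness, generations=40, pop_size=30, elite=4, seed=0):
    """Elitist GA: keep the best `elite` individuals, breed the rest."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children
    return max(pop, key=fitness)


best = run_ga(toy_fitness)
print(dict(zip((name for name, _, _ in PARAM_SPACE), best)))
```

Because the best individuals are carried over unchanged, the best score never decreases between generations; swapping `toy_fitness` for a metric computed on real disparity maps is the only change needed to point the loop at an actual pipeline.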
Authors: Houzhang Fang, Chenxing Wu, Kun Bai, Tianqi Chen, Xiaolin Wang, Xiyang Liu, Yi Chang, Luxin Yan
Abstract: Unmanned aerial vehicle (UAV) target tracking based on thermal infrared imaging has been one of the most important sensing technologies in anti‑UAV applications. However, the infrared UAV targets often exhibit weak features and complex backgrounds, posing significant challenges to accurate tracking. To address these problems, we introduce SiamDFF, a novel dynamic feature fusion Siamese network that integrates feature enhancement and global contextual attention knowledge distillation for infrared UAV target (IRUT) tracking. The SiamDFF incorporates a selective target enhancement network (STEN), a dynamic spatial feature aggregation module (DSFAM), and a dynamic channel feature aggregation module (DCFAM). The STEN employs intensity‑aware multi‑head cross‑attention to adaptively enhance important regions for both template and search branches. The DSFAM enhances multi‑scale UAV target features by integrating local details with global features, utilizing spatial attention guidance within the search frame. The DCFAM effectively integrates the mixed template generated from STEN in the template branch and original template, avoiding excessive background interference with the template and thereby enhancing the emphasis on UAV target region features within the search frame. Furthermore, to enhance the feature extraction capabilities of the network for IRUT without adding extra computational burden, we propose a novel tracking‑specific target‑aware contextual attention knowledge distiller. It transfers the target prior from the teacher network to the student model, significantly improving the student network's focus on informative regions at each hierarchical level of the backbone network. Extensive experiments on real infrared UAV datasets demonstrate that the proposed approach outperforms state‑of‑the‑art target trackers under complex backgrounds while achieving a real‑time tracking speed.
Authors: Ghoshana Bista, Abbas Bradai, Emmanuel Moulay, Abdulhalim Dandoush
Abstract: The growing demand for robust, scalable wireless networks in the 5G‑and‑beyond era has led to the deployment of Unmanned Aerial Vehicles (UAVs) as mobile base stations to enhance coverage in dense urban and underserved rural areas. This paper presents a Multi‑Agent Deep Reinforcement Learning (MADRL) framework that integrates Multi‑Agent Proximal Policy Optimization (MAPPO), Multi‑Agent Deep Deterministic Policy Gradient (MADDPG), and Multi‑Agent Deep Q‑Networks (MADQN) to jointly optimize UAV positioning, resource allocation, Quality of Service (QoS), and energy efficiency through 5G network slicing. The framework adopts Centralized Training with Decentralized Execution (CTDE), enabling autonomous real‑time decision‑making while preserving global coordination. Users are prioritized into Premium (A), Silver (B), and Bronze (C) slices with distinct QoS requirements. Experiments in realistic urban and rural scenarios show that MAPPO achieves the best overall QoS‑energy tradeoff, especially in interference‑rich environments; MADDPG offers more precise continuous control and can attain slightly higher SINR in open rural settings at the cost of increased energy usage; and MADQN provides a computationally efficient baseline for discretized action spaces. These findings demonstrate that no single MARL algorithm is universally dominant; instead, algorithm suitability depends on environmental topology, user density, and service requirements. The proposed framework highlights the potential of MARL‑driven UAV systems to enhance scalability, reliability, and differentiated QoS delivery in next‑generation wireless networks.
Authors: Sonali Rout, Vireshwar Kumar
Abstract: Unmanned Aerial Vehicles (UAVs) or drones are being introduced in a wide range of commercial applications. This has also made them prime targets of attackers who compromise their fundamental security properties, including confidentiality, integrity, and availability. As researchers discover novel threat vectors in UAVs, the government and industry are increasingly concerned about their limited ability to secure and regulate UAVs and their usage. With the aim of unfolding a path for a large‑scale commercial UAV network deployment, we conduct a comprehensive state‑of‑the‑art study and examine the prevailing security challenges. Unlike the prior art, we focus on uncovering the research gaps that must be addressed to enforce security policy regulations in civilian off‑the‑shelf drone systems. To that end, we first examine the known security threats to UAVs based on their impact and effectiveness. We then analyze existing countermeasures to prevent, detect, and respond to these threats in terms of security and performance overhead. We further outline the future research directions for securing UAVs. Finally, we establish the fundamental requirements and highlight critical research challenges in introducing a regulatory entity to achieve a secure and regulated UAV network.
Authors: Zexin Lin, Yebin Zhong, Hanwen Wan, Jiu Cheng, Zhenglong Sun, Xiaoqiang Ji
Abstract: Transition control poses a critical challenge in Vertical Take‑Off and Landing Unmanned Aerial Vehicle (VTOL UAV) development due to the tilting rotor mechanism, which shifts the center of gravity and thrust direction during transitions. Current methods control altitude and position in a decoupled manner, which leads to significant vibration and limits both the consideration of their interaction and adaptability. In this study, we propose a novel coupled transition control methodology based on a reinforcement learning (RL) driven controller. In addition, in contrast to the conventional phase‑transition approach, the ST3M method offers a new perspective by treating cruise mode as a special case of hover. We validate the feasibility of applying our method in simulation and real‑world environments, demonstrating efficient controller development and migration while accurately controlling UAV position and attitude, exhibiting outstanding trajectory tracking and reduced vibrations during the transition process.
Authors: Pavlo Mykytyn, Ronald Chitauro, Onur Yener, Peter Langendoerfer
Abstract: This work presents an experimental performance evaluation of a private 5G airfield network under controlled directional SDR jamming attacks targeting UAV‑based UE nodes. Using a QualiPoc Android UE, mounted as a payload on a quadcopter UAV, we conducted a series of experiments to evaluate signal degradation, handover performance, and service stability in the presence of constant directional jamming. The conducted experiments aimed to examine the effects of varying travel speeds, altitudes, and moving patterns of a UAV‑based UE to record and analyze the key physical‑layer and network‑layer metrics such as CQI, MCS, RSRP, SINR, BLER, Net PDSCH Throughput and RLF. The results of this work describe the link stability and signal degradation dependencies, caused by the level of mobility of the UAV‑based UE nodes during autonomous and automatic operation in private 5G Airfield networks.
Authors: Zhen Wang, Bin Lin, Qiang Ye
Abstract: In this paper, we propose a double‑edge‑assisted computation offloading and resource allocation scheme tailored for space‑air‑marine integrated networks (SAMINs). Specifically, we consider a scenario where both unmanned aerial vehicles (UAVs) and a low earth orbit (LEO) satellite are equipped with edge servers, providing computing services for maritime autonomous surface ships (MASSs). Partial computation workloads of MASSs can be offloaded to both UAVs and the LEO satellite, concurrently, for processing via a multi‑access approach. To minimize the energy consumption of SAMINs under latency constraints, we formulate an optimization problem and propose energy efficient algorithms to jointly optimize offloading mode, offloading volume, and computing resource allocation of the LEO satellite and the UAVs, respectively. We further exploit an alternating optimization (AO) method and a layered approach to decompose the original problem to attain the optimal solutions. Finally, we conduct simulations to validate the effectiveness and efficiency of the proposed scheme in comparison with benchmark algorithms.
Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green
Abstract: Autonomous UAV forestry operations require robust depth estimation methods with strong cross‑domain generalization. However, existing evaluations focus on urban and indoor scenarios, leaving a critical gap for specialized vegetation‑dense environments. We present the first systematic zero‑shot evaluation of eight state‑of‑the‑art stereo methods‑‑RAFT‑Stereo, IGEV, IGEV++, BridgeDepth, StereoAnywhere, DEFOM (plus baseline methods ACVNet, PSMNet, TCstereo)‑‑spanning iterative refinement, foundation model, and zero‑shot adaptation paradigms. All methods are trained exclusively on Scene Flow and evaluated without fine‑tuning on four standard benchmarks (ETH3D, KITTI 2012/2015, Middlebury) plus a novel 5,313‑pair Canterbury forestry dataset captured with a ZED Mini camera (1920x1080). Performance reveals scene‑dependent patterns: foundation models excel on structured scenes (BridgeDepth: 0.23 px on ETH3D, 0.83‑1.07 px on KITTI; DEFOM: 0.35‑4.65 px across benchmarks), while iterative methods maintain cross‑domain robustness (IGEV++: 0.36‑6.77 px; IGEV: 0.33‑21.91 px). Critical finding: RAFT‑Stereo exhibits catastrophic ETH3D failure (26.23 px EPE, 98 percent error rate) due to negative disparity predictions, while performing normally on KITTI (0.90‑1.11 px). Qualitative evaluation on the Canterbury forestry dataset identifies DEFOM as the optimal gold‑standard baseline for vegetation depth estimation, exhibiting superior depth smoothness, occlusion handling, and cross‑domain consistency compared to IGEV++, despite IGEV++'s finer detail preservation.
Authors: Tristan Amadei, Enric Meinhardt-Llopis, Benedicte Bascle, Corentin Abgrall, Gabriele Facciolo
Abstract: Image‑based localization in GNSS‑denied environments is critical for UAV autonomy. Existing state‑of‑the‑art approaches rely on matching UAV images to geo‑referenced satellite images; however, they typically require large‑scale, paired UAV‑satellite datasets for training. Such data are costly to acquire and often unavailable, limiting their applicability. To address this challenge, we adopt a training paradigm that removes the need for UAV imagery during training by learning directly from satellite‑view reference images. This is achieved through a dedicated augmentation strategy that simulates the visual domain shift between satellite and real‑world UAV views. We introduce CAEVL, an efficient model designed to exploit this paradigm, and validate it on ViLD, a new and challenging dataset of real‑world UAV images that we release to the community. Our method achieves competitive performance compared to approaches trained with paired data, demonstrating its effectiveness and strong generalization capabilities.
Authors: Qionglin Ren, Dawei Zhang, Chunxu Tian, Dan Zhang
Abstract: Research in Anti‑UAV (Unmanned Aerial Vehicle) tracking has explored various modalities, including RGB, TIR, and RGB‑T fusion. However, a unified framework for cross‑modal collaboration is still lacking. Existing approaches have primarily focused on independent models for individual tasks, often overlooking the potential for cross‑modal information sharing. Furthermore, Anti‑UAV tracking techniques are still in their infancy, with current solutions struggling to achieve effective multimodal data fusion. To address these challenges, we propose UAUTrack, a unified single‑target tracking framework built upon a single‑stream, single‑stage, end‑to‑end architecture that effectively integrates multiple modalities. UAUTrack introduces a key component: a text prior prompt strategy that directs the model to focus on UAVs across various scenarios. Experimental results show that UAUTrack achieves state‑of‑the‑art performance on the Anti‑UAV and DUT Anti‑UAV datasets, and maintains a favourable trade‑off between accuracy and speed on the Anti‑UAV410 dataset, demonstrating both high accuracy and practical efficiency across diverse Anti‑UAV scenarios.
Authors: Bach Hung Luu, Sinh Cong Lam, Nam Hoang Nguyen
Abstract: Cell‑edge users (CEUs) in cellular networks typically suffer from poor channel conditions due to long distances from serving base stations and physical obstructions, resulting in much lower data rates compared to cell‑center users (CCUs). This paper proposes an Unmanned Aerial Vehicle (UAV)‑assisted cellular network with intelligent power control to address the performance gap between CEUs and CCUs. Unlike conventional approaches that either deploy UAVs for all users or use no UAV assistance, our model uses a distance‑based criterion where only users beyond a reference distance receive UAV relay assistance. Each UAV operates as an amplify‑and‑forward relay, enabling assisted users to receive signals from both the base station and the UAV simultaneously, thereby achieving diversity gain. To optimize transmission power allocation across base stations, we employ a Deep Q‑Network (DQN) learning framework that learns power control policies without requiring accurate channel models. Simulation results show that the proposed approach achieves a peak average rate of 2.28 bps/Hz at the optimal reference distance of 400 m, which represents a 3.6% improvement compared to networks without UAV assistance and a 0.9% improvement compared to networks where all users receive UAV support. The results also reveal that UAV altitude and reference distance are critical factors affecting system performance, with lower altitudes providing better performance.
Authors: Hongyang Pan, Bin Lin, Yanheng Liu, Shuang Liang, Chau Yuen
Abstract: The Internet‑of‑Things (IoT) is widely applied for forest monitoring, since the sensor nodes (SNs) in IoT networks are low‑cost and have the computing ability to process the monitoring data. To further improve the performance of forest monitoring, uncrewed aerial vehicles (UAVs) are employed as the data processors to enhance computing capability. However, efficient forest monitoring with a limited energy budget and limited computing resources presents a significant challenge. For this purpose, this paper formulates a multi‑objective optimization framework to simultaneously consider three optimization objectives, which are minimizing the maximum computing delay, minimizing the total motion energy consumption, and minimizing the maximum computing resource, corresponding to efficient forest monitoring, energy consumption reduction, and computing resource control, respectively. Due to the hybrid solution space that consists of continuous and discrete solutions, we propose a diffusion model‑enhanced improved multi‑objective grey wolf optimizer (IMOGWO) to solve the formulated framework. The simulation results show that the proposed IMOGWO outperforms other benchmarks for solving the formulated framework. Specifically, for a small‑scale network with 6 UAVs and 50 SNs, compared to the suboptimal benchmark, IMOGWO reduces the motion energy consumption and the computing resource by 53.32% and 9.83%, respectively, while maintaining computing delay at the same level. Similarly, for a large‑scale network with 8 UAVs and 100 SNs, IMOGWO achieves reductions of 41.81% in motion energy consumption and 7.93% in computing resource, with the computing delay also remaining comparable.
Authors: Wenhao Wang, Yi Rong, Yanyan Li, Long Jiao, Jiawei Yuan
Abstract: Recent advances in large language models (LLMs) have demonstrated their promising capabilities of generating robot operation code to enable LLM‑driven robots. To enhance the reliability of operation code generated by LLMs, corrective designs with feedback from the observation of executing code have been increasingly adopted in existing research. However, the code execution in these designs relies on either a physical experiment or a customized simulation environment, which limits their deployment due to the high configuration effort of the environment and the potentially long execution time. In this paper, we explore the possibility of directly leveraging an LLM to enable static simulation of robot operation code, and then leverage it to design a new reliable LLM‑driven corrective robot operation code generation framework. Our framework configures the LLM as a static simulator with enhanced capabilities that reliably simulate robot code execution by interpreting actions, reasoning over state transitions, analyzing execution outcomes, and generating semantic observations that accurately capture trajectory dynamics. To validate the performance of our framework, we performed experiments on various operation tasks for different robots, including UAVs and small ground vehicles. The experimental results demonstrated not only the high accuracy of our static text‑based simulation but also the reliable code generation of our LLM‑driven corrective framework, which achieves performance comparable to state‑of‑the‑art research while not relying on dynamic code execution using physical experiments or simulators.
Authors: Yujie Huang, Haibin Wan, Xiangcheng Li, Tuanfa Qin, Yun Li, Jun Li, Wen Chen
Abstract: With the rapid advances in programmable materials, reconfigurable intelligent surfaces (RIS) have become a pivotal technology for future wireless communications. The simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR‑RIS) can both transmit and reflect signals, enabling comprehensive signal control and expanding application scenarios. This paper introduces an unmanned aerial vehicle (UAV) to further enhance system flexibility and proposes an optimization design for the spectrum efficiency of the STAR‑RIS‑UAV‑assisted wireless communication system. We present a deep reinforcement learning (DRL) algorithm capable of iteratively optimizing beamforming, phase shifts, and UAV positioning to maximize the system's sum rate through continuous interactions with the environment. To improve exploration in deterministic policies, we introduce a stochastic perturbation factor, which enhances exploration capabilities. As exploration is strengthened, the algorithm's ability to accurately evaluate the state‑action value function becomes critical. Thus, based on the deep deterministic policy gradient (DDPG) algorithm, we propose a convolution‑augmented deep deterministic policy gradient (CA‑DDPG) algorithm that balances exploration and evaluation to improve the system's sum rate. The simulation results demonstrate that the CA‑DDPG algorithm effectively interacts with the environment, optimizing the beamforming matrix, phase shift matrix, and UAV location, thereby improving system capacity and achieving better performance than other algorithms.
Authors: Yasaswini Konapalli, Lotfi Ben Othmane, Cihan Tunc, Feras Benchellal, Likhita Mudagere
Abstract: Unmanned Aerial Vehicle (UAV) technologies are gaining high interest in many domains, which makes UAV security of utmost importance. ArduPilot is among the most widely used open‑source autopilot UAV frameworks; yet, many studies demonstrate the vulnerabilities affecting such systems. Vulnerabilities within its communication subsystems (including WiFi, telemetry, or GPS) expose critical entry points, and vulnerabilities in ArduPilot can affect the control procedure. In this paper, we reconstruct the software architecture and the control models implemented by ArduPilot and then examine how these control models could potentially be misused to induce malicious behaviors while relying on legitimate inputs.
Authors: Hui Zhou, Jiaying Guo, Marios Aristodemou, Zhaoyang Du, Shen Wang, Xiaolan Liu, Soufiene Djahel, Celimuge Wu
Abstract: As mission‑critical (MC) services such as Unmanned Aerial Vehicles (UAVs) based emergency communication and Internet of Vehicles (IoVs) enabled autonomous driving emerge, the traditional communication framework cannot meet the growing demands for higher reliability and lower latency or the increasing transmission loads. Semantic Communication (SemCom), an emerging communication paradigm that shifts the focus from bit‑level data to its context and intended task at the receiver (i.e., semantic level), is envisioned to be a key revolution in Sixth Generation (6G) networks. However, an explicit and systematic SemCom framework specifically tailored for Vehicle‑based MC (VbMC) services has yet to be proposed, primarily due to the complexity and lack of analysis on their MC characteristics. In this article, we first present the key information‑critical and infrastructure‑critical vehicle‑based services within the SemCom framework. We then analyze the unique characteristics of MC services and the corresponding challenges they present for SemCom. Building on this, we propose a novel SemCom framework designed to address the specific needs of MC services in vehicle systems, offering potential solutions to existing challenges. Finally, we present a case study on UAV‑based rapid congestion relief, utilizing eXplainable AI (XAI) to validate the effectiveness of the proposed SemCom framework.
Authors: Orestis Kaparounakis, Yunqi Zhang, Phillip Stanley-Marbell
Abstract: Particle filtering algorithms have enabled practical solutions to problems in autonomous robotics (self‑driving cars, UAVs, warehouse robots), target tracking, and econometrics, with further applications in speech processing and medicine (patient monitoring). Yet, their inherent weakness at representing the likelihood of the observation (which often leads to particle degeneracy) remains unaddressed for real‑time resource‑constrained systems. Improvements such as the optimal proposal and auxiliary particle filter mitigate this issue under specific circumstances and with increased computational cost. This work presents a new particle filtering method and its implementation, which enables tunably‑approximative representation of arbitrary likelihood densities as program transformations of parametric distributions. Our method leverages a recent computing platform that can perform deterministic computation on probability distribution representations (UxHw) without relying on stochastic methods. For non‑Gaussian non‑linear systems and with an optimal‑auxiliary particle filter, we benchmark the likelihood evaluation error and speed for a total of 294840 evaluation points. For such models, the results show that the UxHw method leads to as much as 37.7x speedup compared to the Monte Carlo alternative. For narrow uniform measurement uncertainty, the particle filter falsely assigns zero likelihood as much as 81.89% of the time whereas UxHw achieves 1.52% false‑zero rate. The UxHw approach achieves filter RMSE improvement of as much as 18.9% (average 3.3%) over the Monte Carlo alternative.
Authors: Md Muzakkir Quamar, Ali Nasir, Sami ELFerik
Abstract: This paper presents a scalable and fault‑tolerant framework for unmanned aerial vehicle (UAV) mission management in complex and uncertain environments. The proposed approach addresses the computational bottleneck inherent in solving large‑scale Markov Decision Processes (MDPs) by introducing a two‑stage decomposition strategy. In the first stage, a factor‑based algorithm partitions the global MDP into smaller, goal‑specific sub‑MDPs by leveraging domain‑specific features such as goal priority, fault states, spatial layout, and energy constraints. In the second stage, a priority‑based recombination algorithm solves each sub‑MDP independently and integrates the results into a unified global policy using a meta‑policy for conflict resolution. Importantly, we present a theoretical analysis showing that, under mild probabilistic independence assumptions, the combined policy is provably equivalent to the optimal global MDP policy. Our work advances artificial intelligence (AI) decision scalability by decomposing large MDPs into tractable subproblems with provable global equivalence. The proposed decomposition framework enhances the scalability of Markov Decision Processes, a cornerstone of sequential decision‑making in artificial intelligence, enabling real‑time policy updates for complex mission environments. Extensive simulations validate the effectiveness of our method, demonstrating orders‑of‑magnitude reduction in computation time without sacrificing mission reliability or policy optimality. The proposed framework establishes a practical and robust foundation for scalable decision‑making in real‑time UAV mission execution.
Authors: Basilis Mamalis, Marios Perlitis
Abstract: Flying Ad‑hoc Networks (FANETs), formed by Unmanned Aerial Vehicles (UAVs), represent an emerging and promising communication paradigm. These networks face unique challenges due to UAVs' high mobility, limited energy resources, and dynamic topology. In this work, we propose a novel multi‑hop clustering algorithm aimed at creating stable, energy‑efficient clusters in FANET environments. The proposed solution enhances cluster longevity and communication efficiency through mobility‑aware clustering, energy‑centric cluster head (CH) selection, and a ground station (GS)‑assisted cluster maintenance mechanism. First, steady multi‑hop clusters are constructed, whose CHs not only have high stability and high energy but also have steady, high‑energy neighboring areas, and then a proper GS‑assisted cluster maintenance mechanism is applied. Experimental results, based on extended simulations, demonstrate that our approach significantly outperforms existing schemes in terms of cluster stability, communication overhead, and security resilience.
Authors: Dnyandeep Mandaokar, Bernhard Rinner
Abstract: Dynamic obstacle avoidance (DOA) for unmanned aerial vehicles (UAVs) requires fast reaction under limited onboard resources. We introduce the distributionally robust acceleration control barrier function (DR‑ACBF) as an efficient collision avoidance method maintaining safety regions. The method constructs a second‑order control barrier function as linear half‑space constraints on commanded acceleration. Latency, actuator limits, and obstacle accelerations are handled through an effective clearance that considers dynamics and delay. Uncertainty is mitigated using Cantelli tightening with per‑obstacle risk. A DR‑conditional value at risk (DR‑CVaR)‑based early trigger expands margins near violations to improve DOA. Real‑time execution is ensured via constant‑time Gauss‑Southwell projections. Simulation studies achieve similar avoidance performance at substantially lower computational effort than state‑of‑the‑art baseline approaches. Experiments with Crazyflie drones demonstrate the feasibility of our approach.
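The Cantelli tightening mentioned in this abstract can be illustrated with the standard one‑sided Chebyshev (Cantelli) inequality, P(X ≤ μ − kσ) ≤ 1/(1 + k²). Choosing k = sqrt((1 − ε)/ε) makes the right‑hand side equal to ε, so requiring μ − kσ ≥ 0 bounds the violation probability by ε using only the mean and standard deviation. The sketch below shows that generic calculation; the function names and the scalar "clearance" framing are assumptions for illustration and are not taken from the paper.

```python
import math


def cantelli_factor(eps):
    """Return k such that 1 / (1 + k**2) == eps, for a risk level 0 < eps < 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.sqrt((1.0 - eps) / eps)


def tightened_clearance(mean_clearance, std_clearance, eps):
    """Distribution-free lower bound on clearance at risk level eps.

    If the returned value is non-negative, Cantelli's inequality gives
    P(clearance < 0) <= eps for any distribution with this mean and
    standard deviation.
    """
    return mean_clearance - cantelli_factor(eps) * std_clearance


# Hypothetical numbers: 1.0 m mean clearance, 0.25 m std, 20% allowed risk.
print(tightened_clearance(1.0, 0.25, 0.2))
```

A smaller per‑obstacle risk ε gives a larger factor k and therefore a more conservative margin, which is the trade‑off a per‑obstacle risk assignment tunes.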
Authors: Hongzong Li, Luwei Liao, Xiangguang Dai, Yuming Feng, Rong Feng, Shiqin Tang
Abstract: Multi‑UAV cooperative path planning (MUCPP) is a fundamental problem in multi‑agent systems, aiming to generate collision‑free trajectories for a team of unmanned aerial vehicles (UAVs) to complete distributed tasks efficiently. A key challenge lies in achieving both efficiency, by minimizing total mission cost, and fairness, by balancing the workload among UAVs to avoid overburdening individual agents. This paper presents a novel Iterative Exchange Framework for MUCPP, balancing efficiency and fairness through iterative task exchanges and path refinements. The proposed framework formulates a composite objective that combines the total mission distance and the makespan, and iteratively improves the solution via local exchanges under feasibility and safety constraints. For each UAV, collision‑free trajectories are generated using A* search over a terrain‑aware configuration space. Comprehensive experiments on multiple terrain datasets demonstrate that the proposed method consistently achieves superior trade‑offs between total distance and makespan compared to existing baselines.
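To make the composite objective above concrete, the following minimal sketch scores a task assignment by a weighted sum of total distance and makespan, and accepts an exchange only if the score drops. It is not the authors' formulation: the weight parameter, the function names, and the use of the longest route length as a stand‑in for makespan (which assumes equal UAV speeds) are all assumptions for illustration.

```python
from typing import Sequence


def composite_cost(route_lengths: Sequence[float], weight: float = 0.5) -> float:
    """Weighted sum of total distance (efficiency) and longest route (fairness proxy)."""
    if not route_lengths:
        raise ValueError("at least one route is required")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    total_distance = sum(route_lengths)
    makespan = max(route_lengths)  # assumes all UAVs fly at the same speed
    return weight * total_distance + (1.0 - weight) * makespan


def exchange_improves(before: Sequence[float], after: Sequence[float],
                      weight: float = 0.5) -> bool:
    """Accept a candidate task exchange only if it lowers the composite cost."""
    return composite_cost(after, weight) < composite_cost(before, weight)


if __name__ == "__main__":
    unbalanced = [10.0, 30.0]  # total 40, longest route 30
    balanced = [20.0, 22.0]    # total 42, longest route 22
    print(composite_cost(unbalanced), composite_cost(balanced))
    print(exchange_improves(unbalanced, balanced))
```

In this toy case the balanced assignment is accepted even though its total distance is slightly larger, which is the efficiency versus fairness trade‑off the abstract describes.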
Authors: Diogo Ferreira, Pedro Ribeiro, André Coelho, Rui Campos
Abstract: Autonomous Flying Networks (FNs) are emerging as a key enabler of on‑demand connectivity in dynamic and infrastructure‑limited environments. However, current approaches mainly focus on UAV placement, routing, and resource management, neglecting the autonomous perception of users and their service demands ‑ a critical capability for zero‑touch network operation.
This paper presents the Multi‑Agent Perception System (MAPS), a modular and scalable system that leverages multi‑modal large language models (MM‑LLMs) and agentic Artificial Intelligence (AI) to interpret visual and audio data collected by UAVs and generate Service Level Specifications (SLSs) describing user count, spatial distribution, and traffic demand. MAPS is evaluated using a synthetic multimodal emergency dataset, achieving user detection accuracies above 70% and SLS generation under 130 seconds in 90% of cases. Results demonstrate that combining audio and visual modalities enhances user detection and show that MAPS provides the perception layer required for autonomous, zero‑touch FNs.
Authors: Andreas Kouloumpris, Georgios L. Stavrinides, Maria K. Michael, Theocharis Theocharides
Abstract: With the advent of the Internet of Things (IoT), novel critical applications have emerged that leverage the edge/hub/cloud paradigm, which diverges from the conventional edge computing perspective. A growing number of such applications require a streamlined architecture for their effective execution, often comprising a single edge device with sensing capabilities, a single hub device (e.g., a laptop or smartphone) for managing and assisting the edge device, and a more computationally capable cloud server. Typical examples include the utilization of an unmanned aerial vehicle (UAV) for critical infrastructure inspection or a wearable biomedical device (e.g., a smartwatch) for remote patient monitoring. Task allocation in this streamlined architecture is particularly challenging, due to the computational, communication, and energy limitations of the devices at the network edge. Consequently, there is a need for a comprehensive framework that can address the specific task allocation problem optimally and efficiently. To this end, we propose a complete, binary integer linear programming (BILP) based formulation for an application‑driven design‑time approach, capable of providing an optimal task allocation in the targeted edge/hub/cloud environment. The proposed method minimizes the desired objective, either the overall latency or overall energy consumption, while considering several crucial parameters and constraints often overlooked in related literature. We evaluate our framework using a real‑world use‑case scenario, as well as appropriate synthetic benchmarks. Our extensive experimentation reveals that the proposed approach yields optimal and scalable results, enabling efficient design space exploration for different applications and computational devices.
Authors: Jiachen Li, Shihao Li, Jian Chu, Dongmei Chen
Abstract: Data Enabled Predictive Control (DeePC) is an established model free approach to predictive control, but it faces two open challenges: computational complexity that scales cubically with dataset size and performance degradation when data are corrupted. This paper introduces Robust Data Selection DeePC (RDS DeePC), a framework that addresses both obstacles through influence function analysis. We derive a sensitivity score quantifying the leverage each trajectory segment exerts on the optimization solution and prove that high sensitivity segments correspond to outliers while low sensitivity segments represent consistent data. Selecting low sensitivity segments thus yields both computational efficiency and automatic outlier filtering without requiring data quality labels. For nonlinear systems, we extend the framework via a two stage online selection approach accelerated by the LiSSA algorithm. Experiments on four systems of increasing complexity including a DC motor, an inverted pendulum, a planar quadrotor UAV tracking a figure 8 trajectory, and a kinematic bicycle vehicle following a figure 8 path demonstrate that RDS DeePC achieves 94 to 97 percent clean data selection and comparable or better tracking performance under 20 percent data corruption.
Authors: Zhongming Feng, Qiling Gao, Zeping Sui, Yun Lin, Michail Matthaiou
Abstract: This letter proposes a two‑stage distributionally robust optimization (DRO) framework for secure deployment and beamforming in an aerial reconfigurable intelligent surface (A‑RIS) assisted millimeter‑wave system. To account for multi‑timescale uncertainties arising from user mobility, imperfect channel state information (CSI), and hardware impairments, our approach decouples the long‑term unmanned aerial vehicle (UAV) placement from the per‑slot beamforming design. By employing the conditional value‑at‑risk (CVaR) as a distribution‑free risk metric, a low‑complexity algorithm is developed, which combines a surrogate model for efficient deployment with an alternating optimization (AO) scheme for robust real‑time beamforming. Simulation results validate that the proposed DRO‑CVaR framework significantly enhances the tail‑end secrecy spectral efficiency and maintains a lower outage probability compared to benchmark schemes, especially under severe uncertainty conditions.
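For readers unfamiliar with the conditional value‑at‑risk (CVaR) used above as a tail‑risk metric, here is a minimal sample‑based sketch. It is a generic empirical estimator, not the DRO formulation of the letter: it averages the worst (1 − alpha) fraction of observed losses. The function name and the rounding guard are assumptions made for this example.

```python
import math
from typing import Sequence


def empirical_cvar(losses: Sequence[float], alpha: float) -> float:
    """Average of the worst (1 - alpha) fraction of the sampled losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest (worst) losses first
    # Round before ceil so floating-point noise such as 3.0000000000000004
    # does not add an extra sample to the tail.
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail_size]) / tail_size


if __name__ == "__main__":
    samples = [float(x) for x in range(1, 11)]  # losses 1.0 .. 10.0
    print(empirical_cvar(samples, 0.8))  # mean of the two largest losses
```

With alpha = 0 the estimator reduces to the plain mean, and as alpha approaches 1 it approaches the single worst sample, so alpha sets how strongly the tail is emphasized.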
Authors: Yuying Zhang, Na Fan, Haowen Zheng, Junning Liang, Zongliang Pan, Qifeng Chen, Ximin Lyu
Abstract: Uncrewed aerial vehicles (UAVs) performing tasks such as transportation and aerial photography are vulnerable to intentional projectile attacks from humans. Dodging such a sudden and fast projectile poses a significant challenge for UAVs, requiring ultra‑low latency responses and agile maneuvers. Drawing inspiration from baseball, in which pitchers' body movements are analyzed to predict the ball's trajectory, we propose a novel real‑time dodging system that leverages an RGB‑D camera. Our approach integrates human pose estimation with depth information to predict the attacker's motion trajectory and the subsequent projectile trajectory. Additionally, we introduce an uncertainty‑aware dodging strategy to enable the UAV to dodge incoming projectiles efficiently. Our perception system achieves high prediction accuracy and outperforms the baseline in effective distance and latency. The dodging strategy addresses temporal and spatial uncertainties to ensure UAV safety. Extensive real‑world experiments demonstrate the framework's reliable dodging capabilities against sudden attacks and its outstanding robustness across diverse scenarios.
Authors: Kanchon Gharami, Shafika Showkat Moni
Abstract: The rapid proliferation of unmanned aerial vehicles (UAVs) and their applications in diverse domains, such as surveillance, disaster management, agriculture, and defense, have revolutionized modern technology. While the potential benefits of swarm‑based UAV networks are growing significantly, they are vulnerable to various security attacks that can jeopardize the overall mission success by degrading their performance, disrupting decision‑making, and compromising the trajectory planning process. The Intrusion Detection System (IDS) plays a vital role in identifying potential security attacks to ensure the secure operation of UAV swarm networks. However, conventional IDS primarily focuses on binary classification with resource‑intensive neural networks and faces challenges, including latency, privacy breaches, increased performance overhead, and model drift. This research aims to address these challenges by developing a novel lightweight and federated continuous learning‑based IDS scheme. Our proposed model facilitates decentralized training across diverse UAV swarms to ensure data heterogeneity and privacy. The performance evaluation of our model demonstrates significant improvements, with classification accuracies of 99.45% on UKM‑IDS, 99.99% on UAV‑IDS, 96.85% on TLM‑UAV dataset, and 98.05% on Cyber‑Physical datasets.
Authors: Longkun Zou, Jiale Wang, Rongqin Liang, Hai Wu, Ke Chen, Yaowei Wang
Abstract: Accurate perception of UAVs in complex low‑altitude environments is critical for airspace security and related intelligent systems. Developing reliable solutions requires large‑scale, accurately annotated, and multimodal data. However, real‑world UAV data collection faces inherent constraints due to airspace regulations, privacy concerns, and environmental variability, while manual annotation of 3D poses and cross‑modal correspondences is time‑consuming and costly. To overcome these challenges, we introduce UAV‑MM3D, a high‑fidelity multimodal synthetic dataset for low‑altitude UAV perception and motion understanding. It comprises 400K synchronized frames across diverse scenes (urban areas, suburbs, forests, coastal regions) and weather conditions (clear, cloudy, rainy, foggy), featuring multiple UAV models (micro, small, medium‑sized) and five modalities: RGB, IR, LiDAR, Radar, and DVS (Dynamic Vision Sensor). Each frame provides 2D/3D bounding boxes, 6‑DoF poses, and instance‑level annotations, enabling core tasks related to UAVs such as 3D detection, pose estimation, target tracking, and short‑term trajectory forecasting. We further propose LGFusionNet, a LiDAR‑guided multimodal fusion baseline, and a dedicated UAV trajectory prediction baseline to facilitate benchmarking. With its controllable simulation environment, comprehensive scenario coverage, and rich annotations, UAV‑MM3D offers a public benchmark for advancing 3D perception of UAVs.
Authors: Cahit Ikbal Er, Amin Kashiri, Yasin Yazicioglu
Abstract: We consider the robust planning of energy‑constrained unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), which act as mobile charging stations, to perform long‑horizon aerial monitoring missions. More specifically, given a set of points to be visited by the UAVs and desired final positions of the UAV‑UGV teams, the objective is to find a robust plan (the vehicle trajectories) that can be realized without a major revision in the face of uncertainty (e.g., unknown obstacles/terrain, wind) to complete this mission in minimum time. We provide a formal description of this problem as a mixed‑integer program (MIP), which is NP‑hard. Since exact solution methods are computationally intractable for such problems, we propose RSPECT, a scalable and efficient heuristic. We provide theoretical results on the complexity of our algorithm and the feasibility and robustness of resulting plans. We also demonstrate the performance of our method via simulations and experiments.
Authors: Xusheng Zhu, Kai-Kit Wong, Hanjiang Hong, Han Xiao, Hao Xu, Tuo Wu, Chan-Byoung Chae
Abstract: This paper develops a comprehensive framework for the performance analysis of fluid antenna system (FAS)‑enabled unmanned aerial vehicle (UAV) relaying networks operating in the finite blocklength regime. Our contribution lies in establishing a rigorous methodology for characterizing system reliability under diverse propagation environments. Closed‑form expressions for the block error rate (BLER) are derived by employing a tractable eigenvalue‑based approximation of the spatially correlated UAV‑to‑user link, whose underlying independent diversity components are modeled as Nakagami‑m fading. This approach addresses both line‑of‑sight (LoS) dominant rural and probabilistic non‑line‑of‑sight (NLoS) urban scenarios. Furthermore, a high signal‑to‑noise ratio (SNR) asymptotic analysis is developed, revealing the fundamental diversity order of the UAV‑to‑user link. Based on this, we further address the practical issue of energy efficiency. A realistic energy efficiency maximization problem is formulated, which explicitly accounts for the time and energy overhead inherent in the FAS port selection process, a factor often omitted in idealized models. An efficient hierarchical algorithm is then proposed to jointly optimize the key system parameters. Extensive numerical results validate the analysis and illustrate that while FASs can yield substantial power gains, the operational overhead introduces a non‑trivial trade‑off. This trade‑off leads to an optimal number of ports and fundamentally different UAV deployment strategies in rural versus urban environments. This work provides both foundational analysis and practical design guidelines for FAS‑enabled UAV communications.
Authors: Kang Du, Xue Liao, Junpeng Xia, Chaozheng Guo, Yi Gu, Yirui Guan, Duotun Wang, Sheng Huang, Zeyu Wang
Abstract: Illumination inconsistency is a fundamental challenge in multi‑view 3D reconstruction. Variations in sunlight direction, cloud cover, and shadows break the constant‑lighting assumption underlying both classical multi‑view stereo (MVS) and structure from motion (SfM) pipelines and recent neural rendering methods, leading to geometry drift, color inconsistency, and shadow imprinting. This issue is especially critical in UAV‑based reconstruction, where long flight durations and outdoor environments make lighting changes unavoidable. However, existing datasets either restrict capture to short time windows, thus lacking meaningful illumination diversity, or span months and seasons, where geometric and semantic changes confound the isolated study of lighting robustness. We introduce UAVLight, a controlled‑yet‑real benchmark for illumination‑robust 3D reconstruction. Each scene is captured along repeatable, geo‑referenced flight paths at multiple fixed times of day, producing natural lighting variation under consistent geometry, calibration, and viewpoints. With standardized evaluation protocols across lighting conditions, UAVLight provides a reliable foundation for developing and benchmarking reconstruction methods that are consistent, faithful, and relightable in real outdoor environments.
Authors: Chenglizhao Chen, Shaofeng Liang, Runwei Guan, Xiaolou Sun, Haocheng Zhao, Haiyun Jiang, Tao Huang, Henghui Ding, Qing-Long Han
Abstract: Referring Multi‑Object Tracking (RMOT) aims to achieve precise object detection and tracking through natural language instructions, representing a fundamental capability for intelligent robotic systems. However, current RMOT research remains mostly confined to ground‑level scenarios, which constrains their ability to capture broad‑scale scene contexts and perform comprehensive tracking and path planning. In contrast, Unmanned Aerial Vehicles (UAVs) leverage their expansive aerial perspectives and superior maneuverability to enable wide‑area surveillance. Moreover, UAVs have emerged as critical platforms for Embodied Intelligence, which has given rise to an unprecedented demand for intelligent aerial systems capable of natural language interaction. To this end, we introduce AerialMind, the first large‑scale RMOT benchmark in UAV scenarios, which aims to bridge this research gap. To facilitate its construction, we develop an innovative semi‑automated collaborative agent‑based labeling assistant (COALA) framework that significantly reduces labor costs while maintaining annotation quality. Furthermore, we propose HawkEyeTrack (HETrack), a novel method that collaboratively enhances vision‑language representation learning and improves the perception of UAV scenarios. Comprehensive experiments validated the challenging nature of our dataset and the effectiveness of our method.
Authors: Benjamin Sportich, Kenza Boubakri, Olivier Simonin, Alessandro Renzaglia
Abstract: Reasons for mapping an unknown environment with autonomous robots are wide‑ranging, but in practice, they are often overlooked when developing planning strategies. Rapid information gathering and comprehensive structural assessment of buildings have different requirements and therefore necessitate distinct methodologies. In this paper, we propose a novel modular Next‑Best‑View (NBV) planning framework for aerial robots that explicitly uses a reconstruction quality objective to guide the exploration planning. In particular, our approach introduces new and efficient methods for view generation and selection of viewpoint candidates that are adaptive to the user‑defined quality requirements, fully exploiting the uncertainty encoded in a Truncated Signed Distance Field (TSDF) representation of the environment. This results in informed and efficient exploration decisions tailored towards the predetermined objective. Finally, we validate our method via extensive simulations in realistic environments. We demonstrate that it successfully adjusts its behavior to the user goal while consistently outperforming conventional NBV strategies in terms of coverage, quality of the final 3D map, and path efficiency.
Authors: Sakib Ahmed, Oscar Pizarro
Abstract: Unmanned Aerial Vehicles (UAVs) are crucial in Search and Rescue (SAR) missions due to their ability to monitor vast maritime areas. However, small objects often remain difficult to detect from high altitudes due to low object‑to‑background pixel ratios. We propose an altitude‑aware dynamic tiling method that scales and adaptively subdivides the image into tiles for enhanced small object detection. By integrating altitude‑dependent scaling with an adaptive tiling factor, we reduce unnecessary computation while maintaining detection performance. Tested on the SeaDronesSee dataset [1] with YOLOv5 [2] and Slicing Aided Hyper Inference (SAHI) framework [3], our approach improves Mean Average Precision (mAP) for small objects by 38% compared to a baseline and achieves more than double the inference speed compared to static tiling. This approach enables more efficient and accurate UAV‑based SAR operations under diverse conditions.
Authors: Mahmud Suhaimi Ibrahim, Shantanu Rahman, Muhammad Samin Hasan, Minhaj Uddin Ahmad, Abdullah Abrar
Abstract: Collision‑free path planning is the most crucial component in multi‑UAV formation‑flying (MFF). We use unlabeled homogeneous quadcopters (UAVs) to demonstrate the use of a flow network to create complete (inter‑UAV) collision‑free paths. This procedure has three main parts: 1) Creating a flow network graph from physical GPS coordinates, 2) Finding a path of minimum cost (least distance) using any graph‑based path‑finding algorithm, and 3) Implementing the Ford‑Fulkerson Method to find the paths with the maximum flow (no collision). Simulations of up to 64 UAVs were conducted for various formations, followed by a practical experiment with 3 quadcopters for testing physical plausibility and feasibility. The results of these tests demonstrate the method's ability to produce safe, collision‑free paths.
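The third step above relies on the Ford‑Fulkerson method. The following is a minimal, generic sketch of it using breadth‑first augmenting paths (the Edmonds‑Karp variant), not the authors' code. The toy graph, its node names, and the choice of unit capacities (so that each edge carries at most one UAV) are assumptions made purely for illustration.

```python
from collections import deque
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, int]]


def max_flow(capacity: Graph, source: Hashable, sink: Hashable) -> int:
    """Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp)."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual: Graph = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path is left
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow: reduce forward edges, grow reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


if __name__ == "__main__":
    # Hypothetical toy network: two UAV start nodes, two goal nodes.
    toy = {
        "s": {"u1": 1, "u2": 1},
        "u1": {"g1": 1, "g2": 1},
        "u2": {"g1": 1},
        "g1": {"t": 1},
        "g2": {"t": 1},
    }
    print(max_flow(toy, "s", "t"))
```

In the toy network the first augmenting path sends u1 to g1, which would block u2. The reverse edge lets the second augmentation reroute u1 to g2, so both UAVs reach distinct goals over edge‑disjoint paths.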
Authors: Ehsan Karimi, Nhut Le, Maryam Rahnemoonfar
Abstract: Timely and accurate assessment of damages following natural disasters is essential for effective emergency response and recovery. Recent AI‑based frameworks have been developed to analyze large volumes of aerial imagery collected by Unmanned Aerial Vehicles, providing actionable insights rapidly. However, creating and annotating data for training these models is costly and time‑consuming, resulting in datasets that are limited in size and diversity. Furthermore, most existing approaches rely on traditional classification‑based frameworks with fixed answer spaces, restricting their ability to provide new information without additional data collection or model retraining. Using pre‑trained generative models built on in‑context learning (ICL) allows for flexible and open‑ended answer spaces. However, these models often generate hallucinated outputs or produce generic responses that lack domain‑specific relevance. To address these limitations, we propose ThiFAN‑VQA, a two‑stage reasoning‑based framework for visual question answering (VQA) in disaster scenarios. ThiFAN‑VQA first generates structured reasoning traces using chain‑of‑thought (CoT) prompting and ICL to enable interpretable reasoning under limited supervision. A subsequent answer selection module evaluates the generated responses and assigns the most coherent and contextually accurate answer, effectively improving model performance. By integrating a custom information retrieval system, domain‑specific prompting, and reasoning‑guided answer selection, ThiFAN‑VQA bridges the gap between zero‑shot and supervised methods, combining flexibility with consistency. Experiments on FloodNet and RescueNet‑VQA, UAV‑based datasets from flood‑ and hurricane‑affected regions, demonstrate that ThiFAN‑VQA achieves superior accuracy, interpretability, and adaptability for real‑world post‑disaster damage assessment tasks.
Authors: Shuyu Cao, Minxin Chen, Yucheng Song, Zhaozhong Chen, Xinyou Zhang
Abstract: Small object detection in Unmanned Aerial Vehicle (UAV) imagery is a persistent challenge, hindered by low resolution and background clutter. While fusing RGB and infrared (IR) data offers a promising solution, existing methods often struggle with the trade‑off between effective cross‑modal interaction and computational efficiency. In this letter, we introduce MambaRefine‑YOLO. Its core contributions are a Dual‑Gated Complementary Mamba fusion module (DGC‑MFM) that adaptively balances RGB and IR modalities through illumination‑aware and difference‑aware gating mechanisms, and a Hierarchical Feature Aggregation Neck (HFAN) that uses a "refine‑then‑fuse" strategy to enhance multi‑scale features. Our comprehensive experiments validate this dual‑pronged approach. On the dual‑modality DroneVehicle dataset, the full model achieves a state‑of‑the‑art mAP of 83.2%, an improvement of 7.9% over the baseline. On the single‑modality VisDrone dataset, a variant using only the HFAN also shows significant gains, demonstrating its general applicability. Our work presents a superior balance between accuracy and speed, making it highly suitable for real‑world UAV applications.
Authors: Nguyen Duc Minh Quang, Chang Liu, Huy-Trung Nguyen, Shuangyang Li, Derrick Wing Kwan Ng, Wei Xiang
Abstract: Low‑altitude wireless networks (LAWN) are rapidly expanding with the growing deployment of unmanned aerial vehicles (UAVs) for logistics, surveillance, and emergency response. Reliable connectivity remains a critical yet challenging task due to three‑dimensional (3D) mobility, time‑varying user density, and limited power budgets. The transmit power of base stations (BSs) fluctuates dynamically according to user locations and traffic demands, leading to a highly non‑stationary 3D radio environment. Radio maps (RMs) have emerged as an effective means to characterize spatial power distributions and support radio‑aware network optimization. However, most existing works construct static or offline RMs, overlooking real‑time power variations and spatio‑temporal dependencies in multi‑UAV networks. To overcome this limitation, we propose a 3D dynamic radio map (3D‑DRM) framework that learns and predicts the spatio‑temporal evolution of received power. Specifically, a Vision Transformer (ViT) encoder extracts high‑dimensional spatial representations from 3D RMs, while a Transformer‑based module models sequential dependencies to predict future power distributions. Experiments show that 3D‑DRM accurately captures fast‑varying power dynamics and substantially outperforms baseline models in both RM reconstruction and short‑term prediction.
Authors: Xueyan Oh, Leonard Loh, Shaohui Foong, Zhong Bao Andy Koh, Kow Leong Ng, Poh Kang Tan, Pei Lin Pearlin Toh, U-Xuan Tan
Abstract: General Visual Inspection is a manual inspection process regularly used to detect and localise obvious damage on the exterior of commercial aircraft. There has been increasing demand to perform this process at the boarding gate to minimise the downtime of the aircraft, and automating this process is desired to reduce the reliance on human labour. Automating this typically requires estimating a camera's pose with respect to the aircraft for initialisation, but most existing localisation methods require infrastructure, which is very challenging in uncontrolled outdoor environments and within the limited turnover time (approximately 2 hours) on an airport tarmac. Additionally, many airlines and airports do not allow contact with the aircraft's surface or using UAVs for inspection between flights, and restrict access to commercial aircraft. Hence, this paper proposes an on‑site method that is infrastructure‑free and easy to deploy for estimating a pan‑tilt‑zoom camera's pose and localising scan images. This method initialises using the same pan‑tilt‑zoom camera used for the inspection task by utilising a Deep Convolutional Neural Network fine‑tuned on only synthetic images to predict its own pose. We apply domain randomisation to generate the dataset for fine‑tuning the network and modify its loss function by leveraging aircraft geometry to improve accuracy. We also propose a workflow for initialisation, scan path planning, and precise localisation of images captured from a pan‑tilt‑zoom camera. We evaluate and demonstrate our approach through experiments with real aircraft, achieving root‑mean‑square camera pose estimation errors of less than 0.24 m and 2 degrees for all real scenes.
Authors: Yanbo Yin, Dingzhu Wen, Changsheng You, XiaoWen Cao, Tat-Ming Lok, Dusit Niyato
Abstract: Space‑Air‑Ground Integrated Networks (SAGINs) are pivotal for enabling ubiquitous connectivity in 6G systems, yet they face significant challenges due to severe satellite‑to‑ground link impairments. Although Unmanned Aerial Vehicles (UAVs) can function as relay nodes to compensate for air‑to‑ground channel degradation, the satellite‑to‑UAV link remains a critical bottleneck. Semantic Communication (SemCom) emerges as a promising solution to enhance spectral efficiency by transmitting essential semantic information. This paper proposes a novel multi‑cluster UAV‑aided SAGIN SemCom architecture that supports both semantic users (SemUsers) and conventional users (ConUsers). While SemCom is employed in the satellite‑to‑UAV link to improve transmission efficiency, the UAVs implement an intelligent adaptive relay strategy, capable of either directly forwarding semantic data to SemUsers or converting it into bit‑level data for ConUsers. Compared to existing similar schemes, this design retains the high‑efficiency advantages of SemCom while enabling network access over a larger coverage area. A joint optimization problem is formulated to maximize the system's sum‑rate through coordinated allocation of power, bandwidth, and UAV positions. To address this non‑convex problem, we develop an efficient alternating optimization (AO) algorithm, which decomposes the original problem into tractable subproblems. Numerical results demonstrate that the proposed algorithm significantly outperforms baseline schemes in terms of both sum‑rate and spectral efficiency across various channel conditions and user distributions, underscoring the importance of joint resource allocation and intelligent UAV deployment.
Authors: Sigrid Helene Strand, Thomas Wiedemann, Bram Burczek, Dmitriy Shutin
Abstract: Search and rescue missions are often critical following sudden natural disasters or in high‑risk environmental situations. The most challenging search and rescue missions involve difficult‑to‑access terrains, such as dense forests with high occlusion. Deploying unmanned aerial vehicles for exploration can significantly enhance search effectiveness, facilitate access to challenging environments, and reduce search time. However, in dense forests, the effectiveness of unmanned aerial vehicles depends on their ability to capture clear views of the ground, necessitating a robust search strategy to optimize camera positioning and perspective. This work presents an optimized planning strategy and an efficient algorithm for the next best view problem in occluded environments. Two novel optimization heuristics, a geometry heuristic and a visibility heuristic, are proposed to enhance search performance by selecting optimal camera viewpoints. Comparative evaluations in both simulated and real‑world settings reveal that the visibility heuristic performs better, identifying over 90% of hidden objects in simulated forests and offering 10% better detection rates than the geometry heuristic. Additionally, real‑world experiments demonstrate that the visibility heuristic provides better coverage under the canopy, highlighting its potential for improving search and rescue missions in occluded environments.
Authors: Rajat Bhattacharjya, Sing-Yao Wu, Hyunwoo Oh, Chaewon Nam, Suyeon Koo, Mohsen Imani, Elaheh Bozorgzadeh, Nikil Dutt
Abstract: Unmanned Aerial Vehicles (UAVs) in disaster response require complex, queryable intelligence that onboard CNNs cannot provide. While Vision‑Language Models (VLMs) offer this semantic reasoning, their high resource demands make on‑device deployment infeasible, and naive cloud offloading fails under the low‑bandwidth, unstable networks endemic to disaster zones. We present AVERY, an intent‑driven adaptive split computing framework for efficient VLM deployment on resource‑constrained platforms. AVERY is motivated by the observation that operator intent must be treated as a first‑class system objective, since missions such as broad situational monitoring and precise, spatially grounded investigation require different semantic products, latency targets, and resource allocations. To reflect this, AVERY advances split computing beyond traditional depth‑wise partitioning through a functional, cognitive‑inspired dual‑stream split: a high‑frequency, low‑resolution Context stream for real‑time awareness, and a low‑frequency, high‑fidelity Insight stream for deep analysis. This design enables a hierarchical split strategy: computation is first separated by function, then partitioned depth‑wise across edge and cloud when the Insight stream is required. A lightweight, self‑aware onboard controller monitors network conditions and operator intent to select from pre‑trained compression models, navigating the accuracy‑throughput trade‑off at runtime. Evaluated using LISA‑7B in an edge‑cloud setting under fluctuating network conditions, AVERY achieves 11.2% higher accuracy than raw image compression, 93.98% lower energy consumption than full‑edge execution, and average accuracy within 0.75% of the static High‑Accuracy baseline during dynamic adaptation. Overall, AVERY enhances mission efficiency and enables real‑time, queryable intelligence in dynamic disaster environments.
Authors: Miguel Lourenço, António Grilo
Abstract: Unmanned Aerial Vehicle (UAV) swarms represent a key advancement in autonomous systems, enabling coordinated missions through inter‑UAV communication. However, their reliance on wireless links makes them vulnerable to jamming, which can disrupt coordination and mission success. This work investigates whether a UAV swarm can effectively overcome jamming while maintaining communication and mission efficiency.
To address this, a unified optimization framework combining Genetic Algorithms (GA), Supervised Learning (SL), and Reinforcement Learning (RL) is proposed. The mission model, structured into epochs and timeslots, allows dynamic path planning, antenna orientation, and swarm formation while progressively enforcing collision rules. Null‑steering antennas enhance resilience by directing antenna nulls toward interference sources.
Results show that the GA achieved stable, collision‑free trajectories but with high computational cost. SL models replicated GA‑based configurations but struggled to generalize under dynamic or constrained settings. RL, trained via Proximal Policy Optimization (PPO), demonstrated adaptability and real‑time decision‑making with consistent communication and lower computational demand. Additionally, the Adaptive Movement Model generalized UAV motion to arbitrary directions through a rotation‑based mechanism, validating the scalability of the proposed system.
Overall, UAV swarms equipped with null‑steering antennas and guided by intelligent optimization algorithms effectively mitigate jamming while maintaining communication stability, formation cohesion, and collision safety. The proposed framework establishes a unified, flexible, and reproducible basis for future research on resilient swarm communication systems.
Authors: Darren Chiu, Zhehui Huang, Ruohai Ge, Gaurav S. Sukhatme
Abstract: Nano‑UAV teams offer great agility yet face severe navigation challenges due to constrained onboard sensing, communication, and computation. Existing approaches rely on high‑resolution vision or compute‑intensive planners, rendering them infeasible for these platforms. We introduce LEARN, a lightweight, two‑stage safety‑guided reinforcement learning (RL) framework for multi‑UAV navigation in cluttered spaces. Our system combines low‑resolution Time‑of‑Flight (ToF) sensors and a simple motion planner with a compact, attention‑based RL policy. In simulation, LEARN outperforms two state‑of‑the‑art planners by 10% while using substantially fewer resources. We demonstrate LEARN's viability on six Crazyflie quadrotors, achieving fully onboard flight in diverse indoor and outdoor environments at speeds up to 2.0 m/s and traversing 0.2 m gaps.
Authors: Kien Nguyen, Feng Liu, Clinton Fookes, Sridha Sridharan, Xiaoming Liu, Arun Ross
Abstract: The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment, and covert observation capabilities. This paper provides a comprehensive overview of 150+ papers published over the last 10 years on human‑centric aerial surveillance tasks from a computer vision and machine learning perspective. It aims to provide readers with an in‑depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs, and other airborne platforms. The object of interest is humans, where human subjects are to be detected, identified, and re‑identified. More specifically, for each of these tasks, we first identify unique challenges in performing these tasks in an aerial setting compared to the popular ground‑based setting and subsequently compile and analyze aerial datasets publicly available for each task. Most importantly, we delve deep into the approaches in the aerial surveillance literature with a focus on investigating how they presently address aerial challenges and techniques for improvement. We conclude the paper by discussing the gaps and open research questions to inform future research avenues.
Authors: Xusheng Zhu, Kai-Kit Wong, Qingqing Wu, Hyundong Shin, Yangyang Zhang
Abstract: Fluid antenna systems (FAS) have emerged as a revolutionary technology offering enhanced spatial diversity within a compact form factor. Concurrently, unmanned aerial vehicles (UAVs) are integral to future networks, necessitating channel models that capture both multipath fading and shadowing. This letter presents a novel performance analysis of a UAV‑to‑ground link, where the receiver is equipped with an N‑port FAS operating over the challenging double‑shadowing fading channel. By adapting a tractable eigenvalue‑based approximation for the correlated FAS ports, we derive new analytical expressions for the end‑to‑end signal‑to‑noise ratio statistics, namely the cumulative distribution function and the probability density function. Based on these statistics, we present exact integral expressions for the outage probability, average bit error rate, and average channel capacity. We further derive new, tractable closed‑form solutions for the average bit error rate and capacity for the practical dual‑rank, independent but non‑identically distributed case. Finally, a key asymptotic analysis reveals that the system achieves a multiplicative diversity order of G_d = M × d, which is precisely the product of the FAS spatial rank M and the intrinsic channel diversity order d. Simulation results are provided to validate the high accuracy of our entire theoretical framework.
Authors: Tomáš Musil, Matěj Petrlík, Martin Saska
Abstract: Autonomous exploration of unknown environments is a key capability for mobile robots, but it is largely unsolved for robots equipped with only a single monocular camera and no dense range sensors. In this paper, we present a novel approach to monocular vision‑based exploration that can safely cover large‑scale unstructured indoor and outdoor 3D environments by explicitly accounting for the properties of a sparse monocular SLAM frontend in both mapping and planning. The mapping module solves the problems of sparse depth data, free‑space gaps, and large depth uncertainty by oversampling free space in texture‑sparse areas and keeping track of obstacle position uncertainty. The planning module handles the added free‑space uncertainty through rapid replanning and perception‑aware heading control. We further show that frontier‑based exploration is possible with sparse monocular depth data when parallax requirements and the possibility of textureless surfaces are taken into account. We evaluate our approach extensively in diverse real‑world and simulated environments, including ablation studies. To the best of the authors' knowledge, the proposed method is the first to achieve 3D monocular exploration in real‑world unstructured outdoor environments. We open‑source our implementation to support future research.
Authors: Ali Azarbahram, Chrystian Pool Yuca Huanca, Gian Paolo Incremona, Patrizio Colaneri
Abstract: This paper introduces a Koopman‑enhanced distributed switched model predictive control (SMPC) framework for safe and scalable navigation of quadrotor unmanned aerial vehicles (UAVs) in dynamic environments with moving obstacles. The proposed method integrates switched motion modes and data‑driven prediction to enable real‑time, collision‑free coordination. A localized Koopman operator approximates nonlinear obstacle dynamics as linear models based on online measurements, enabling accurate trajectory forecasting. These predictions are embedded into a distributed SMPC structure, where each UAV makes autonomous decisions using local and cluster‑based information. This computationally efficient architecture is particularly promising for applications in surface transportation, including coordinated vehicle flows, shared infrastructure with pedestrians or cyclists, and urban UAV traffic. Simulation results demonstrate reliable formation control and real‑time obstacle avoidance, highlighting the framework's broad relevance for intelligent and cooperative mobility systems.
Authors: Tim Lakemann, Daniel Bonilla Licea, Viktor Walter, Martin Saska
Abstract: Reflections of active markers in the environment are a common source of ambiguity in onboard visual relative localization. This work presents a novel approach that exploits these typically unwanted reflections for onboard relative localization in heterogeneous multi‑UAV teams. The method operates without prior knowledge of robot size or predefined marker configurations, remains independent of surface properties, and explicitly accounts for uncertainties caused by surface irregularities, including dynamic water surfaces relevant for marine deployments. We validated the approach in both indoor and outdoor experiments, demonstrating reliable operation across varying lighting conditions and achieving greater effective range (above 30 m) and accuracy than state‑of‑the‑art methods. The video is available under the following link: https://youtu.be/y0zp8cIwkig.
Authors: Yendo Hu, Yiliang Wu, Weican Chen
Abstract: In multi‑UAV scenarios, the traditional Artificial Potential Field (APF) method often leads to redundant flight paths and frequent abrupt heading changes due to unreasonable obstacle avoidance path planning, and is highly prone to inter‑UAV collisions during the obstacle avoidance process. To address these issues, this study proposes a novel hybrid algorithm that combines the improved Multi‑Robot Formation Obstacle Avoidance (MRF‑IAPF) algorithm with an enhanced APF optimized for single‑UAV path planning. Its core ideas are as follows: first, integrating three types of interaction forces from MRF‑IAPF, namely obstacle repulsion force, inter‑UAV interaction force, and target attraction force; second, incorporating a refined single‑UAV path optimization mechanism, including collision risk assessment and an auxiliary sub‑goal strategy. When a UAV faces a high collision threat, temporary waypoints are generated to guide obstacle avoidance, ensuring eventual precise arrival at the actual target. Simulation results demonstrate that, compared with traditional APF‑based formation algorithms, the proposed algorithm achieves significant improvements in path length optimization and heading stability, and can effectively avoid obstacles and quickly restore the formation configuration, thus verifying its applicability and effectiveness in static environments with unknown obstacles.
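The three force types named in this abstract can be illustrated with a minimal sketch of a classic artificial potential field. This is not the authors' MRF‑IAPF formulation: the function names, the gains `k_att` and `k_rep`, the influence radius `d0`, and the choice to model the inter‑UAV interaction as pure repulsion are all assumptions made for illustration.

```python
import numpy as np

def attractive_force(pos, goal, k_att=1.0):
    """Linear pull toward the goal (gradient of a quadratic potential)."""
    return k_att * (goal - pos)

def repulsive_force(pos, source, k_rep=1.0, d0=2.0):
    """Classic APF repulsion, active only within the influence radius d0."""
    diff = pos - source
    d = np.linalg.norm(diff)
    if d == 0.0 or d >= d0:
        return np.zeros_like(pos)
    # Negative gradient of 0.5 * k_rep * (1/d - 1/d0)^2, pointing away from source.
    return k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)

def total_force(pos, goal, obstacles, neighbours):
    """Sum of target attraction, obstacle repulsion, and inter-UAV repulsion."""
    pos = np.asarray(pos, dtype=float)
    goal = np.asarray(goal, dtype=float)
    force = attractive_force(pos, goal)
    for obs in obstacles:
        force += repulsive_force(pos, np.asarray(obs, dtype=float))
    for nb in neighbours:
        # Assumed: neighbours repel with a weaker gain and a shorter radius.
        force += repulsive_force(pos, np.asarray(nb, dtype=float), k_rep=0.5, d0=1.5)
    return force

if __name__ == "__main__":
    print(total_force([0, 0], [10, 0], obstacles=[[1, 0]], neighbours=[[0, 1]]))
```

In this sketch an obstacle directly between the UAV and the goal only reduces the forward force, which is the kind of situation where a plain APF stalls or oscillates and where a temporary sub‑goal, as described in the abstract, could help.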
Authors: Chi-Han Chen, Chieh-Ming Chen, Wen-Huang Cheng, Ching-Chun Huang
Abstract: The study of terrain and landform classification through UAV remote sensing diverges significantly from ground vehicle patrol tasks. Besides grappling with the complexity of data annotation and ensuring temporal consistency, it also confronts the scarcity of relevant data and the limitations imposed by the effective range of many technologies. This research substantiates that, in aerial positioning tasks, both the mean Intersection over Union (mIoU) and temporal consistency (TC) metrics are of paramount importance. It is demonstrated that fully labeled data is not the optimal choice, as selecting only key data lacks the enhancement in TC, leading to failures. Hence, a teacher‑student architecture, coupled with key frame selection and key frame updating algorithms, is proposed. This framework successfully performs weakly supervised learning and TC knowledge distillation, overcoming the deficiencies of traditional TC training in aerial tasks. The experimental results reveal that our method, utilizing merely 30% of labeled data, concurrently elevates mIoU and temporal consistency, ensuring stable localization of terrain objects. Result demo: https://gitlab.com/prophet.ai.inc/drone‑based‑riverbed‑inspection
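For readers unfamiliar with the mIoU metric mentioned above, the following is a minimal sketch of how mean Intersection over Union is commonly computed for integer label maps. The function name and the rule of skipping classes absent from both maps are assumptions; the abstract does not give the authors' evaluation code or their definition of TC.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it (assumed convention)
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

if __name__ == "__main__":
    pred = [[0, 0], [1, 1]]
    target = [[0, 1], [1, 1]]
    print(mean_iou(pred, target, num_classes=2))
```

In the small example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.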
Authors: Huan Lin, Dakai Liu, Lianghui Ding, Lin Wang, Feng Yang
Abstract: Unmanned aerial vehicle (UAV) swarms encounter the challenge of high overhead due to both network management and formation control requirements. In this paper, we propose a Bio‑inspired Integrated Networking and Control (BINC) scheme, enabling efficient formation management for swarms comprising thousands of UAVs. The scheme forms a two‑layer hierarchical structure, where network clusters and formations share the same groups so that cross‑cluster control is eliminated. For networking, we design a fused routing message together with control information to reduce overhead, and limit clusters' size to local two‑hop topologies for fast command transmission. For control, we develop a hybrid bio‑inspired control approach, including a pigeon‑like leader‑follower algorithm within formations under the consideration of cluster topology maintenance, and a starling‑like algorithm among formations that helps to improve the ability of obstacle avoidance. We establish a simulation platform for UAV swarms with over 1000 nodes, and experimental results show that the proposed BINC scheme can achieve highly maneuverable swarm formation marching with a significant reduction in communication overhead.
Authors: Timilehin T. Ayanlade, Anirudha Powadi, Talukder Z. Jubery, Baskar Ganapathysubramanian, Soumik Sarkar
Abstract: Recent advances in plant phenotyping have driven widespread adoption of multi‑sensor platforms for collecting crop canopy reflectance data. This includes the collection of heterogeneous data across multiple platforms, with Unmanned Aerial Vehicles (UAV) seeing significant usage due to their high performance in crop monitoring, forecasting, and prediction tasks. Similarly, satellite missions have been shown to be effective for agriculturally relevant tasks. In contrast to UAVs, such missions are bound to the limitation of spatial resolution, which hinders their effectiveness for modern farming systems focused on micro‑plot management. In this work, we propose a cross‑modal learning strategy that enriches high‑resolution satellite imagery with UAV‑level visual detail for crop canopy trait estimation. Using a dataset of approximately co‑registered satellite‑UAV image pairs collected from replicated plots of 84 hybrid maize varieties across five distinct locations in the U.S. Corn Belt, we train a model that learns fine‑grained spectral‑spatial correspondences between sensing modalities. Results show that the generated UAV‑like representations from satellite inputs consistently outperform real satellite imagery on multiple downstream tasks, including yield and nitrogen prediction, demonstrating the potential of cross‑modal correspondence learning to bridge the gap between satellite and UAV sensing in agricultural monitoring.
Authors: Md Main Uddin Hasan, Milena Radenkovic
Abstract: Opportunistic routing architectures offer a resilient communication paradigm in environments where conventional networks fail due to disrupted infrastructure, dynamic node mobility, and intermittent connectivity, conditions that commonly arise during large‑scale disasters. In Bangladesh, recurring floods severely hinder communication systems, isolating affected populations and obstructing emergency response efforts. To address these challenges, there is a growing demand for intelligent and adaptive routing solutions capable of sustaining critical communication and services without relying on fixed infrastructure. This research presents AZIZA (Adaptive Zone‑based Intelligent Fully Distributed Trust‑Aware Routing Protocol), a next‑generation opportunistic protocol designed to improve the resiliency of critical communication and services in disaster‑prone and flood‑affected regions. AZIZA supports adaptive data delivery for emergency alerts, sensor readings, and inter‑zone coordination by integrating (1) zone‑based forwarding to optimize localized transmission, (2) trust‑aware logic to bypass uncooperative or malicious nodes, and (3) context‑driven decision‑making based on trust metrics, residual energy, and historical delivery patterns. AZIZA operates over lightweight, infrastructure‑less edge ad hoc networks comprising mobile phones, UAVs, and ground vehicles acting as decentralized service relays. Simulation results using The Opportunistic Network Environment (ONE) Simulator configured with real‑world mobility traces and flood data from Bangladesh demonstrate that AZIZA significantly outperforms benchmark approaches in delivery reliability, energy efficiency, and routing resilience. As a scalable and deployable framework, AZIZA advances the use of next‑generation opportunistic routing in environments where traditional systems routinely collapse.
Authors: Yajun Zhao, Mengnan Jian, Yifei Yuan
Abstract: Unmanned Aerial Vehicles (UAVs) play a pivotal role in the emerging low‑altitude economy. However, they face significant challenges in achieving reliable network coverage during transit operations. This paper provides an in‑depth investigation into the characteristics and challenges of communication networks tailored for UAVs. First, we outline typical operational scenarios, traffic patterns, and a dual‑layer heterogeneous network topology. This topology is essential for enabling three‑dimensional continuous coverage and ensuring seamless network coexistence between UAVs and other network entities. Moreover, the paper delves into the channel characteristics and specific challenges faced by UAV Integrated Sensing and Communication (ISAC) networks. It highlights the limitations of traditional Active Phased Array Antenna (APAA)‑based networks, particularly regarding cost, complexity, and site deployment constraints. We then introduce Reconfigurable Intelligent Surface (RIS)‑assisted networks as a promising solution for enhancing UAV signal coverage. The key technical features of RIS are discussed, including design principles, antenna tilt configurations, new beam types, and beam tracking mechanisms. In addition, we examine the impact of high‑frequency bands and their absorption peaks on signal attenuation. The paper further explores network architecture designs aimed at improving UAV signal coverage, facilitating network coexistence, and supporting RIS‑enhanced UAV sensing. Field trial results evaluating the effectiveness of RIS in improving UAV coverage are presented. Finally, we outline future technological trends and highlight potential advancements to further optimize UAV communication systems. We also emphasize the importance of engineering implementation and standardization efforts in RIS‑based UAV‑ISAC networks.
Authors: Zizhen Zhou, Ying-Chang Liang, Yanyu Cheng, Wei Yang Bryan Lim
Abstract: Deploying foundation models (FMs) on uncrewed aerial vehicles (UAVs) promises broad "low‑altitude economy" applications. Split federated learning (SFL)‑based fine‑tuning leverages distributed data while keeping raw data local and reduces client‑side burden by partitioning the model between client and server. However, the per‑round training latency is dominated by stragglers. Training paradigms featuring parallel gradient transmission (GT) allocate dedicated portions of downlink communication resources to each client. They may leave resources idle and suffer from prolonged GT latency, especially in UAV networks, where the communication latency typically far exceeds the computation latency. To address this, we propose a sequential GT paradigm, where the server dedicates all downlink resources for the current GT. We further propose communication‑pipelined SFL (CPSFL), characterized by downlink GT priority scheduling and intra‑round asynchronous training. We investigate CPSFL‑based LoRA fine‑tuning of FMs in UAV networks and formulate an optimization problem to minimize a weighted sum of per‑round training latency and worst‑case client energy consumption by optimizing the split point selection (SPS) and the computing and communication resource allocation (CCRA) (the uplink bandwidth allocation and the server computing frequency allocation). To solve this problem, we develop an attention‑based deep reinforcement learning (DRL) framework, where the base station agent decides the split point and the CCRA in each round by leveraging previous round information, including UAV trajectories. Simulation results show that the proposed DRL‑based CPSFL scheme outperforms the parallel GT benchmarks, the ablation variants, and the fixed CCRA scheme, while approaching the best fixed‑SPS scheme.
Authors: Spyridon Loukovitis, Vasileios Karampinis, Athanasios Voulodimos
Abstract: Developing reliable UAV navigation systems requires robust air‑to‑air object detectors capable of distinguishing between objects seen during training and previously unseen objects. While many methods address closed‑set detection and achieve high‑confidence recognition of in‑domain (ID) targets, they generally do not tackle open‑set detection, which requires simultaneous handling of both ID and out‑of‑distribution (OOD) objects. Existing open‑set approaches typically rely on a single uncertainty score with thresholding, limiting flexibility and often conflating OOD objects with background clutter. In contrast, we propose a lightweight, model‑agnostic post‑processing framework that explicitly separates background from unknown objects while preserving the base detector's performance. Our approach extends open‑set detection beyond binary ID/OOD classification to real‑time three‑way classification among ID targets, OOD objects, and background. To this end, we employ a fusion scheme that aggregates multiple confidence estimates and per‑detection features using a compact multilayer perceptron (MLP). Incorporating different logit variants into the MLP consistently enhances performance across both binary and three‑class classification without compromising throughput. Extensive ablation and comparative experiments confirm that our method surpasses threshold‑based baselines in two‑class classification by an average of 2.7% AUROC, while retaining or improving open‑set mAP. Furthermore, our study uniquely enables robust three‑class classification, a critical capability for safe UAV navigation, where OOD objects must be actively avoided and background regions safely ignored. Comparative analysis highlights that our method surpasses competitive techniques in AUROC across datasets, while improving closed‑set mAP by up to 9 points, an 18% relative gain.
Authors: Mauro Larrat, Claudomiro Sales
Abstract: Unmanned aerial vehicle (UAV) detection and aerial object recognition are critical for modern surveillance and security, prompting a need for robust systems that overcome limitations of single‑modality approaches. This research addresses these challenges by designing and rigorously evaluating a novel multimodal Transformer model that integrates diverse data streams: radar, visual band video (RGB), infrared (IR) video, and audio. The architecture effectively fuses distinct features from each modality, leveraging the Transformer's self‑attention mechanisms to learn comprehensive, complementary, and highly discriminative representations for classification. The model demonstrated exceptional performance on an independent test set, achieving macro‑averaged metrics of 0.9812 accuracy, 0.9873 recall, 0.9787 precision, 0.9826 F1‑score, and 0.9954 specificity. Notably, it exhibited particularly high precision and recall in distinguishing drones from other aerial objects. Furthermore, computational analysis confirmed its efficiency, with 1.09 GFLOPs, 1.22 million parameters, and an inference speed of 41.11 FPS, highlighting its suitability for real‑time applications. This study presents a significant advancement in aerial object classification, validating the efficacy of multimodal data fusion via a Transformer architecture for achieving state‑of‑the‑art performance, thereby offering a highly accurate and resilient solution for UAV detection and monitoring in complex airspace.
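The macro‑averaged metrics quoted in this abstract follow a standard recipe: compute each metric per class from the confusion matrix, then average over classes. The sketch below illustrates that recipe only. The function name and the toy confusion matrix are assumptions, and it does not reproduce the paper's figures.

```python
import numpy as np

def macro_metrics(cm):
    """Macro-averaged precision, recall, specificity, and F1.

    cm[i, j] counts samples of true class i predicted as class j.
    Assumes every class is both present and predicted at least once,
    so no denominator is zero.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp   # belonging to the class but missed
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "specificity": float(specificity.mean()),
        "f1": float(f1.mean()),
    }

if __name__ == "__main__":
    # Toy two-class example (hypothetical counts).
    print(macro_metrics([[8, 2], [1, 9]]))
```

Macro averaging weights every class equally regardless of how many samples it has, which matters when drones are much rarer than other aerial objects in the test set.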
Authors: Jiajun Liu, Yimin Zhu, Xiaorui Liu, Mingye Cao, Mingchao Li, Lixian Zhang
Abstract: This paper proposes a novel fully‑actuated hexacopter. It features a dual‑frame passive tilting structure and achieves independent control of translational motion and attitude with minimal actuators. Compared to previous fully‑actuated UAVs, it eliminates internal force cancellation, resulting in higher flight efficiency and endurance under equivalent payload conditions. Based on the dynamic model of the fully‑actuated hexacopter, a full‑actuation controller is designed to achieve efficient and stable control. Finally, simulation is conducted, validating the superior fully‑actuated motion capability of the fully‑actuated hexacopter and the effectiveness of the proposed control strategy.
Authors: Dimitria Silveria, Kleber Cabral, Peter Jardine, Sidney Givigi
Abstract: This paper presents the integration and experimental validation of advanced control strategies for quadcopters based on Lie groups. We build upon recent theoretical developments on SE_2(3)‑based controllers and introduce a novel SE_2(3) model predictive controller (MPC) that combines the predictive capabilities and constraint‑handling of optimal control with the geometric properties of Lie group formulations. We evaluated this MPC against a state‑of‑the‑art SE_2(3)‑based LQR approach and obtained comparable performance in simulation. Both controllers were also deployed on the Quanser QDrone platform and compared to each other and an industry‑standard control architecture. Results show that the SE_2(3) MPC achieves superior trajectory tracking performance and robustness across a range of scenarios. This work demonstrates the practical effectiveness of Lie group‑based controllers and offers comparative insights into their impact on system behaviour and real‑time performance.
Authors: Yue Yu, Xiaobo Zheng, Shaoming He
Abstract: Distributed optimization offers a promising paradigm for trajectory planning in Unmanned Aerial Vehicle (UAV) swarms, yet its deployment in communication‑constrained environments remains challenging due to unreliable links and limited data exchange. This paper addresses this issue via a two‑tier architecture explicitly designed for operation under communication constraints. We develop a Communication‑Aware Asynchronous Distributed Trajectory Optimization (CA‑ADTO) framework that integrates Parameterized Differential Dynamic Programming (PDDP) for local trajectory optimization of individual UAVs with an asynchronous Alternating Direction Method of Multipliers (async‑ADMM) for swarm‑level coordination. The proposed architecture enables fully distributed optimization while substantially reducing communication overhead, making it suitable for real‑world scenarios in which reliable connectivity cannot be guaranteed. The method is particularly effective in handling nonlinear dynamics and spatio‑temporal coupling under communication constraints.
Authors: Jan Quenzel, Valerij Sekin, Daniel Schleich, Alexander Miller, Merlin Stampa, Norbert Pahlke, Christof Röhrig, Sven Behnke
Abstract: Fires in industrial facilities pose special challenges to firefighters, e.g., due to the sheer size and scale of the buildings. The resulting visual obstructions impair firefighting accuracy, further compounded by inaccurate assessments of the fire's location. Such imprecision simultaneously increases the overall damage and prolongs the fire brigade's operation unnecessarily.
We propose an automated assistance system for firefighting using a motorized fire monitor on a turntable ladder with aerial support from an unmanned aerial vehicle (UAV). The UAV flies autonomously within an obstacle‑free flight funnel derived from geodata, detecting and localizing heat sources. An operator supervises the operation on a handheld controller and selects a fire target in reach. After the selection, the UAV automatically plans and traverses between two triangulation poses for continued fire localization. Simultaneously, our system steers the fire monitor to ensure the water jet reaches the detected heat source. In preliminary tests, our assistance system successfully localized multiple heat sources and directed a water jet towards the fires.
Authors: Svetlana Seliunina, Daniel Schleich, Sven Behnke
Abstract: In our work, we extend the current state‑of‑the‑art approach for autonomous multi‑UAV exploration to consumer‑level UAVs, such as the DJI Mini 3 Pro. We propose a pipeline that selects viewpoint pairs from which the depth can be estimated and plans a trajectory that satisfies the motion constraints necessary for odometry estimation. For the multi‑UAV exploration, we propose a semi‑distributed communication scheme that distributes the workload in a balanced manner. We evaluate our model's performance in simulation for different numbers of UAVs and demonstrate its ability to safely explore the environment and reconstruct the map even with the hardware limitations of consumer‑grade UAVs.
Authors: Xiaolin Wang, Houzhang Fang, Qingshan Li, Lu Wang, Yi Chang, Luxin Yan
Abstract: Infrared unmanned aerial vehicle (UAV) target images often suffer from motion blur degradation caused by rapid sensor movement, significantly reducing contrast between target and background. Generally, detection performance heavily depends on the discriminative feature representation between target and background. Existing methods typically treat deblurring as a preprocessing step focused on visual quality, while neglecting the enhancement of task‑relevant features crucial for detection. Improving feature representation for detection under blur conditions remains challenging. In this paper, we propose a novel Joint Feature‑Domain Deblurring and Detection end‑to‑end framework, dubbed JFD3. We design a dual‑branch architecture with shared weights, where the clear branch guides the blurred branch to enhance discriminative feature representation. Specifically, we first introduce a lightweight feature restoration network, where features from the clear branch serve as feature‑level supervision to guide the blurred branch, thereby enhancing its distinctive capability for detection. We then propose a frequency structure guidance module that refines the structure prior from the restoration network and integrates it into shallow detection layers to enrich target structural information. Finally, a feature consistency self‑supervised loss is imposed between the dual‑branch detection backbones, driving the blurred branch to approximate the feature representations of the clear one. We also construct a benchmark, named IRBlurUAV, containing 30,000 simulated and 4,118 real infrared UAV target images with diverse motion blur. Extensive experiments on IRBlurUAV demonstrate that JFD3 achieves superior detection performance while maintaining real‑time efficiency.
Authors: Hesam Mojtahedi, Reza Akhavian
Abstract: This paper presents a BIM‑discrepancy‑driven active sensing framework for cooperative navigation between unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) in dynamic construction environments. Traditional navigation approaches rely on static Building Information Modeling (BIM) priors or limited onboard perception. In contrast, our framework continuously fuses real‑time LiDAR data from aerial and ground robots with BIM priors to maintain an evolving 2D occupancy map. We quantify navigation safety through a unified corridor‑risk metric integrating occupancy uncertainty, BIM‑map discrepancy, and clearance. When risk exceeds safety thresholds, the UAV autonomously re‑scans affected regions to reduce uncertainty and enable safe replanning. Validation in PX4‑Gazebo simulation with Robotec GPU LiDAR demonstrates that risk‑triggered re‑scanning reduces mean corridor risk by 58% and map entropy by 43% compared to static BIM navigation, while maintaining clearance margins above 0.4 m. Compared to frontier‑based exploration, our approach achieves similar uncertainty reduction in half the mission time. These results demonstrate that integrating BIM priors with risk‑adaptive aerial sensing enables scalable, uncertainty‑aware autonomy for construction robotics.
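The abstract reports a reduction in map entropy without stating how entropy is defined. A common choice for occupancy grids, assumed here, is the sum of per‑cell binary Shannon entropies. The sketch below shows that definition and how a re‑scan that resolves uncertain cells lowers it. The function name and the toy grids are hypothetical.

```python
import numpy as np

def map_entropy(p_occ, eps=1e-12):
    """Total binary Shannon entropy (in bits) of an occupancy grid.

    p_occ holds per-cell occupancy probabilities in [0, 1]. Cells at 0.5
    are maximally uncertain (1 bit); cells near 0 or 1 contribute ~0 bits.
    """
    p = np.clip(np.asarray(p_occ, dtype=float), eps, 1.0 - eps)
    return float(-(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p)).sum())

if __name__ == "__main__":
    before = np.full((2, 2), 0.5)                 # nothing observed yet
    after = np.array([[0.5, 0.5], [0.0, 1.0]])    # two cells resolved by a re-scan
    h0, h1 = map_entropy(before), map_entropy(after)
    print(f"entropy before: {h0:.2f} bits, after: {h1:.2f} bits")
    print(f"relative reduction: {100.0 * (h0 - h1) / h0:.0f}%")
```

Clipping with `eps` avoids evaluating `log2(0)` for fully resolved cells, and it changes the result only negligibly.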
Authors: Yuqi Ping, Tingting Zhang, Tianhao Liang
Abstract: In this paper, we study a cellular‑connected unmanned aerial vehicle (UAV) which aims to fly between two predetermined locations while maintaining ultra‑reliable low‑latency communications (URLLC) for command‑and‑control (C2) links with terrestrial base stations (BSs). Long‑range flights often trigger frequent inter‑cell handovers, which may introduce delays and synchronization overhead. We jointly optimize the continuous trajectory and BS association to minimize handovers, path length, and flying time, subject to communication reliability and kinematic constraints. To address this problem, we reformulate it as an optimization based on the graph of convex sets (GCS). First, the URLLC requirement is translated into spatially feasible regions in the flight plane for each BS, and an intersection graph is constructed including the start and goal points. Each graph node is associated with a smooth and dynamically feasible trajectory segment. The trajectory is parameterized in space by Bézier curves and in time by a monotonic Bézier scaling, together with convex constraints that ensure continuity and enforce speed bounds. Next, we impose unit‑flow constraints to enforce a single path, and by coupling the resulting binary edge‑selection variables with the convex constraints, we obtain a mixed‑integer convex program (MICP). Applying a convex relaxation and rounding to the mixed‑integer convex program produces nearly globally optimal routes, and a final refinement yields smooth, dynamically feasible trajectories. Simulations verify that the method preserves URLLC connectivity while achieving a clear trade‑off between fewer handovers and flight efficiency.
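As background for the Bézier parameterization mentioned above, the sketch below evaluates a Bézier curve with de Casteljau's algorithm and computes the control points of its derivative. Because a Bézier curve lies in the convex hull of its control points, bounding the derivative's control points bounds the derivative itself. This is a standard way to express velocity limits as convex constraints, though the abstract does not specify the authors' exact constraints. The function names and the example control points are assumptions.

```python
import numpy as np

def bezier_point(ctrl, s):
    """Evaluate a Bezier curve at s in [0, 1] using de Casteljau's algorithm."""
    pts = np.asarray(ctrl, dtype=float)
    while len(pts) > 1:
        # Repeated linear interpolation between neighbouring points.
        pts = (1.0 - s) * pts[:-1] + s * pts[1:]
    return pts[0]

def derivative_control_points(ctrl):
    """Control points of dB/ds, itself a Bezier curve of one degree lower."""
    pts = np.asarray(ctrl, dtype=float)
    n = len(pts) - 1
    return n * (pts[1:] - pts[:-1])

if __name__ == "__main__":
    ctrl = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]   # hypothetical quadratic segment
    print(bezier_point(ctrl, 0.5))
    d_ctrl = derivative_control_points(ctrl)
    # Upper bound on |dB/ds| over the whole segment (convex hull property).
    print(np.linalg.norm(d_ctrl, axis=1).max())
```

The same property lets a segment be confined to a convex region by constraining its control points to that region, which is what makes Bézier segments a natural fit for graph‑of‑convex‑sets formulations.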
Authors: Harsh Abhinandan, Aditya Dhanraj, Aryan Katoch, R. Raja Singh
Abstract: Unmanned Aerial Vehicles (UAVs), or drones, have witnessed a spectacular surge in applications for military, commercial, and civilian purposes. However, their flight endurance is always limited by the finite power budget of their onboard power supplies. The limited flight time problem has led to intensive research into new sources of power and innovative charging strategies to enable protracted, autonomous flight. This paper gives a comparative summary of the current state‑of‑the‑art in UAV power and refuelling technology. The paper begins with an analysis of the variety of energy sources, from classical batteries to fuel cells and hybrid systems, based on their relative advantages and disadvantages in energy density, weight, and safety. Subsequently, the review explores a spectrum of replenishment options, from simple manual battery swapping to sophisticated automatic docking stations and smart contact‑based charging pads. Most of the review is dedicated to the newer technology of wireless power transfer, which involves near‑field (inductive, capacitive) and far‑field (laser, microwave) technology. The article also delves into the most important power electronic converter topologies, battery management systems, and control approaches that form the core of these charging systems. Finally, it summarizes the most significant technical, economic, and social challenges and outlines promising avenues for future research. The comprehensive review is a valuable guide for researchers, engineers, and policymakers striving to enhance UAV operational performance.
Authors: Fangzhi Li, Zhichu Ren, Cunhua Pan, Hong Ren, Jing Jin, Qixing Wang, Jiangzhou Wang
Abstract: To enhance the performance of aerial‑ground networks, this paper proposes an integrated sensing and communication (ISAC) framework for multi‑UAV systems. In our model, ground base stations (BSs) cooperatively serve multiple unmanned aerial vehicles (UAVs), employing a dynamic time‑division strategy where beam scanning for sensing precedes data communication in each time slot. To maximize the sum communication rate while satisfying a mission‑level cumulative radar mutual information (MI) requirement, we jointly optimize the UAV trajectories, communication and sensing power allocation, and the time‑division ratio. The resulting highly coupled non‑convex optimization problem is efficiently solved using an alternating optimization (AO) and successive convex approximation (SCA) framework, which yields a non‑decreasing objective sequence and convergence to a finite objective value under the adopted surrogate‑based iterative procedure. Extensive simulation results demonstrate that our proposed joint design significantly outperforms benchmark schemes with static trajectories, partially optimized resources, or non‑cooperative single‑BS transmission. Furthermore, a comprehensive sensitivity analysis reveals the distinct mechanisms by which sensing thresholds and the number of UAVs influence resource allocation and spatial organization, highlighting the critical importance of dynamic, multi‑dimensional resource management for effectively navigating the sensing‑communication trade‑off in low‑altitude economies.
Authors: Liangshun Wu, Wen Chen, Shunqing Zhang, Yajun Wang, Kunlun Wang
Abstract: In post‑disaster space‑air‑ground integrated networks (SAGINs), terrestrial infrastructure is often impaired, and unmanned aerial vehicles (UAVs) must rapidly restore connectivity for mission‑critical ground terminals in cluttered non‑line‑of‑sight (NLoS) urban environments. To enhance coverage, UAVs employ movable antennas (MAs), while reconfigurable intelligent surfaces (RISs) on surviving high‑rises redirect signals. The key challenge is communication‑limited partial observability, leaving each UAV with a narrow, fast‑changing neighborhood view that destabilizes value estimation. Existing multi‑agent reinforcement learning (MARL) approaches are inadequate: non‑communication methods rely on unavailable global critics, heuristic sharing is brittle and redundant, and learnable protocols (e.g., CommNet, DIAL) lose per‑neighbor structure and aggravate non‑stationarity under tight bandwidth. To address partial observability, we propose a spatiotemporal A2C where each UAV transmits prior‑decision messages with local state, a compact policy fingerprint, and a recurrent belief, encoded per neighbor and concatenated. A spatial discount shapes value targets to emphasize local interactions, while analysis under one‑hop‑per‑slot latency explains stable training with delayed views. Experimental results show our policy outperforms IA2C, ConseNet, FPrint, DIAL, and CommNet, achieving faster convergence, higher asymptotic reward, reduced temporal‑difference (TD)/advantage errors, and a better communication throughput‑energy trade‑off.
Authors: Amelia Samandari, Andreas Willig, Barry Wu, Philippa Martin
Abstract: Deployment of Unmanned Aerial Vehicles (UAVs) in autonomous formations necessitates accurate and timely communication of safety information. A communication protocol that supports timely and successful transfer of safety information between UAVs is therefore needed. This paper presents Distributed Self‑allocated Time slot Reuse (D‑STR). Our D‑STR protocol addresses the essential task of communicating safety information in rigid Unmanned Aerial Vehicle (UAV) formations with different network topologies, enabling collision‑free deployment of the formation. This is an important step for improving the safety and practicality of UAV formations in application scenarios that span a range of industries.
Authors: Jordan Leyva, Nahim J. Moran Vera, Yihan Xu, Adrien Durasno, Christopher U. Romero, Tendai Chimuka, Gabriel O. Huezo Ramirez, Ziqian Dong, Roberto Rojas-Cessa
Abstract: Obstacle avoidance path planning for uncrewed aerial vehicles (UAVs), or drones, is rarely addressed in most flight path planning schemes, despite obstacles being a realistic condition. Obstacle avoidance can also be energy‑intensive, making it a critical factor in efficient point‑to‑point drone flights. To address these gaps, we propose EcoFlight, an energy‑efficient pathfinding algorithm that determines the lowest‑energy route in 3D space with obstacles. The algorithm models energy consumption based on the drone propulsion system and flight dynamics. We conduct extensive evaluations, comparing EcoFlight with direct‑flight and shortest‑distance schemes. The simulation results across various obstacle densities show that EcoFlight consistently finds paths with lower energy consumption than comparable algorithms, particularly in high‑density environments. We also demonstrate that a suitable flying speed can further enhance energy savings.
Authors: Sungjun Seo, Kooktae Lee
Abstract: The growing scale of modern farms has increased the need for efficient and adaptive multi‑agent coverage strategies for pest, weed, and disease management. Traditional methods such as manual inspection and blanket pesticide spraying often lead to excessive chemical use, resource waste, and environmental impact. While unmanned aerial vehicles (UAVs) offer a promising platform for precision agriculture through targeted spraying and improved operational efficiency, existing UAV‑based approaches remain limited by battery life, payload capacity, and scalability, especially in large fields where single‑UAV or uniformly distributed spraying is insufficient. Although multi‑UAV coordination has been explored, many current frameworks still assume uniform spraying and do not account for infestation severity, UAV dynamics, non‑uniform resource allocation, or energy‑efficient coordination.
To address these limitations, this paper proposes a Density‑Driven Optimal Control (D2OC) framework that integrates Optimal Transport (OT) theory with multi‑UAV coverage control for large‑scale agricultural spraying. The method supports non‑uniform, priority‑aware resource allocation based on infestation intensity, reducing unnecessary chemical application. UAVs are modeled as a linear time‑varying (LTV) system to capture variations in mass and inertia during spraying missions. The D2OC control law, derived using Lagrangian mechanics, enables efficient coordination, balanced workload distribution, and improved mission duration. Simulation results demonstrate that the proposed approach outperforms uniform spraying and Spectral Multiscale Coverage (SMC) in coverage efficiency, chemical reduction, and operational sustainability, providing a scalable solution for smart agriculture.
Authors: Jiao Chen, Haoyi Wang, Jianhua Tang, Junyi Wang
Abstract: Low‑altitude Unmanned Aerial Vehicle (UAV) networks rely on robust semantic segmentation as a foundational enabler for distributed sensing‑communication‑control co‑design across heterogeneous agents within the network. However, segmentation foundation models deteriorate quickly under weather, lighting, and viewpoint drift. Resource‑limited UAVs cannot run gradient‑based test‑time adaptation, while resource‑massive UAVs adapt independently, wasting shared experience. To address these challenges, we propose AdaptFly, a prompt‑guided test‑time adaptation (TTA) framework that adjusts segmentation models without weight updates. AdaptFly features two complementary adaptation modes. For resource‑limited UAVs, it employs lightweight token‑prompt retrieval from a shared global memory. For resource‑massive UAVs, it uses gradient‑free sparse visual prompt optimization via the Covariance Matrix Adaptation Evolution Strategy. An activation‑statistic detector triggers adaptation, while a cross‑UAV knowledge pool consolidates prompt knowledge and enables fleet‑wide collaboration with negligible bandwidth overhead. Extensive experiments on the UAVid and VDD benchmarks, along with real‑world UAV deployments under diverse weather conditions, demonstrate that AdaptFly significantly improves segmentation accuracy and robustness over static models and state‑of‑the‑art TTA baselines. The results highlight a practical path to resilient, communication‑efficient perception in the emerging low‑altitude economy.
Authors: Rathin Chandra Shit, Sharmila Subudhi
Abstract: Real‑time performance, adversarial resilience, and privacy preservation are the key metrics that must be balanced for collision avoidance in large‑scale multi‑UAV (Unmanned Aerial Vehicle) systems. Current frameworks tend to prescribe monolithic solutions that are not only computationally prohibitive, with a scaling cost of O(n^2), but also offer no Byzantine fault tolerance. The hierarchical framework presented in this paper aims to eliminate such trade‑offs through a three‑layer architecture. We distribute the intelligence across three layers: a local layer for immediate collision avoidance, running dense graph attention with a latency of <10 ms; a regional layer using sparse attention with O(nk) computational complexity and asynchronous federated learning with coordinate‑wise trimmed mean aggregation; and a global layer using a lightweight Hashgraph‑inspired protocol. We propose an adaptive differential privacy mechanism in which the noise level (ε ∈ [0.1, 1.0]) is dynamically reduced based on the measured real‑time threat, which in turn maximizes the privacy‑utility trade‑off. Through the use of Distributed Hash Table (DHT)‑based lightweight audit logging instead of heavyweight blockchain consensus, the median cost of reaching a 95th‑percentile decision within 50 ms is observed across all tested swarm sizes. The architecture scales to 500 UAVs with a collision rate of < 2.0% and Byzantine fault tolerance of f < n/3.
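The coordinate‑wise trimmed mean aggregation named in the abstract above can be illustrated with a minimal sketch. The function name `trimmed_mean`, the `trim_ratio` parameter, and the sample updates are assumptions made for illustration; the abstract does not give the authors' implementation or trimming fraction.

```python
def trimmed_mean(updates, trim_ratio=0.2):
    """Coordinate-wise trimmed mean of equally sized update vectors.

    For every coordinate, the k smallest and k largest values are dropped
    (k = floor(trim_ratio * n)) and the remaining values are averaged.
    Hypothetical helper; trim_ratio is an illustrative choice.
    """
    n = len(updates)
    k = int(trim_ratio * n)
    if n == 0 or 2 * k >= n:
        raise ValueError("trim_ratio removes every update")
    aggregated = []
    for column in zip(*updates):          # iterate coordinate by coordinate
        kept = sorted(column)[k:n - k]    # drop k values from each end
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Four honest updates plus one outlier (e.g., a faulty or malicious agent).
updates = [[1, 1], [2, 2], [3, 3], [4, 4], [100, -100]]
result = trimmed_mean(updates, trim_ratio=0.2)
print(result)
```

Because the extreme value in each coordinate is discarded before averaging, a single outlying update cannot pull the aggregate arbitrarily far, which is the usual motivation for this rule in Byzantine‑tolerant federated learning.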
Authors: Michael Z. Zgurovsky, Pavlo O. Kasyanov, Liliia S. Paliichuk
Abstract: This note presents an analytical framework for decision‑making in drone swarm systems operating under uncertainty, based on the integration of Partially Observable Markov Decision Processes (POMDP) with Deep Deterministic Policy Gradient (DDPG) reinforcement learning. The proposed approach enables adaptive control and cooperative behavior of unmanned aerial vehicles (UAVs) within a cognitive AI platform, where each agent learns optimal energy management and navigation policies from dynamic environmental states. We extend the standard DDPG architecture with a belief‑state representation derived from Bayesian filtering, allowing for robust decision‑making in partially observable environments. In this paper, for the Gaussian case, we numerically compare the performance of policies derived from DDPG to optimal policies for discretized versions of the original continuous problem. Simulation results demonstrate that the POMDP‑DDPG‑based swarm control model significantly improves mission success rates and energy efficiency compared to baseline methods. The developed framework supports distributed learning and decision coordination across multiple agents, providing a foundation for scalable cognitive swarm autonomy. The outcomes of this research contribute to the advancement of energy‑aware control algorithms for intelligent multi‑agent systems and can be applied in security, environmental monitoring, and infrastructure inspection scenarios.
Authors: Abdul Saboor, Evgenii Vinogradov
Abstract: Uncrewed Aerial Vehicles (UAVs) serving as Aerial Base Stations (ABSs) are expected to extend 6G millimeter‑Wave (mmWave) coverage and improve link reliability in urban areas. However, UAV‑based Air‑to‑Ground (A2G) channels are highly dependent on height and urban geometry. This paper proposes an ABS height‑dependent mmWave channel model and investigates whether urban geometry, beyond the standard built‑up parameters, significantly affects LoS probability (PLoS) and Large‑Scale Fading (LSF). Using MATLAB ray tracing at 26 GHz, we simulate approximately 10K city realizations for four urban layouts that share identical built‑up parameters but differ in their spatial organization. We extract elevation‑based PLoS using a sigmoid model and derive height‑dependent Path‑Loss Exponents (PLEs) and shadow‑fading trends using exponential fits. Results show that PLE for Non‑Line‑of‑Sight (NLoS) decreases toward 2.5‑3 at high altitudes, Line‑of‑Sight (LoS) PLE remains near 2, and shadow fading reduces with height. We also find that geometric layout introduces a modest but consistent change in PLE (±0.2), even when built‑up parameters are fixed. The proposed unified model aligns well with ray‑tracing statistics and offers a practical, height‑dependent LSF model suitable for ABS planning in complex urban scenarios.
Authors: Raghav Adhikari, Sachet Khatiwada, Suman Poudel
Abstract: Post‑disaster situations pose unique navigation challenges. One of those challenges is the unstructured nature of the environment, which makes it hard to lay out paths for rescue vehicles. We propose the use of an Uncrewed Aerial Vehicle (UAV) in such scenarios to perform reconnaissance across the environment. To accomplish this, we propose an optimization‑based approach to plan a path for the UAV at an optimal height where the sensors of the UAV can cover the most area and collect data with minimum uncertainty.
Authors: Honghao Wang, Qingqing Wu, Yifan Jiang, Ziyuan Zheng, Ziheng Zhang, Yanze Zhu, Ying Gao, Wen Chen, Guanghai Liu, Abbas Jamalipour
Abstract: Low‑altitude unmanned aerial vehicle (UAV) networks are integral to future 6G integrated sensing and communication (ISAC) systems. However, their deployment is hindered by challenges stemming from high mobility of UAVs, complex propagation environments, and the inherent trade‑offs between coexisting sensing and communication functions. This article proposes a novel framework that leverages movable antennas (MAs) and intelligent reflecting surfaces (IRSs) as dual enablers to overcome these limitations. MAs, through active transceiver reconfiguration, and IRSs, via passive channel reconstruction, can work in synergy to significantly enhance system performance. Our analysis first elaborates on the fundamental gains offered by MAs and IRSs, and provides simulation results that validate the immense potential of the MA‑IRS‑enabled ISAC architecture. Two core UAV deployment scenarios are then investigated: (i) UAVs as ISAC users, where we focus on achieving high‑precision tracking and aerial safety, and (ii) UAVs as aerial network nodes, where we address robust design and complex coupled resource optimization. Finally, key technical challenges and research opportunities are identified and analyzed for each scenario, charting a clear course for the future design of advanced low‑altitude ISAC networks.
Authors: Houzhang Fang, Shukai Guo, Qiuhuan Chen, Yi Chang, Luxin Yan
Abstract: Moving infrared small target detection (IRSTD) plays a critical role in practical applications, such as surveillance of unmanned aerial vehicles (UAVs) and UAV‑based search systems. Moving IRSTD remains highly challenging due to weak target features and complex background interference. Accurate spatio‑temporal feature modeling is crucial for moving target detection, typically achieved through either temporal differences or spatio‑temporal (3D) convolutions. Temporal difference can explicitly leverage motion cues but exhibits limited capability in extracting spatial features, whereas 3D convolution effectively represents spatio‑temporal features yet lacks explicit awareness of motion dynamics along the temporal dimension. In this paper, we propose a novel moving IRSTD network (TDCNet), which effectively extracts and enhances spatio‑temporal features for accurate target detection. Specifically, we introduce a novel temporal difference convolution (TDC) re‑parameterization module that comprises three parallel TDC blocks designed to capture contextual dependencies across different temporal ranges. Each TDC block fuses temporal difference and 3D convolution into a unified spatio‑temporal convolution representation. This re‑parameterized module can effectively capture multi‑scale motion contextual features while suppressing pseudo‑motion clutter in complex backgrounds, significantly improving detection performance. Moreover, we propose a TDC‑guided spatio‑temporal attention mechanism that performs cross‑attention between the spatio‑temporal features from the TDC‑based backbone and a parallel 3D backbone. This mechanism models their global semantic dependencies to refine the current frame's features. Extensive experiments on IRSTD‑UAV and public infrared datasets demonstrate that our TDCNet achieves state‑of‑the‑art detection performance in moving target detection.
Authors: Ziqi Chen, Jun Du, Chunxiao Jiang, Tony Q. S. Quek, Zhu Han
Abstract: With the explosive advancement of unmanned aerial vehicles (UAVs), the security of efficient UAV networks has become increasingly critical. Owing to the open nature of its communication environment, illegitimate malicious UAVs (MUs) can infer the position of the source UAV (SU) by analyzing received signals, thus compromising the SU location privacy. To protect the SU location privacy while ensuring efficient communication with legitimate receiving UAVs (RUs), we propose an Active Reconfigurable Intelligent Surface (ARIS)‑assisted covert communication scheme based on virtual partitioning and artificial noise (AN). Specifically, we design a novel ARIS architecture integrated with an AN module. This architecture dynamically partitions its reflecting elements into multiple sub‑regions: one subset is optimized to enhance the communication rate between the SU and RUs, while the other subset generates AN to interfere with the localization of the SU by MUs. We first derive the Cramér‑Rao Lower Bound (CRLB) for the localization with received signal strength (RSS), based on which, we establish a joint optimization framework for communication enhancement and localization interference. Subsequently, we derive and validate the optimal ARIS partitioning and power allocation under average channel conditions. Finally, tailored optimization methods are proposed for the reflection precoding and AN design of the two partitions. Simulation results validate that, compared to baseline schemes, the proposed scheme significantly increases the localization error of the SU by MUs while maintaining efficient communication between the SU and RUs, thereby effectively protecting the SU location privacy.
Authors: Mingyang Yu, Haorui Yang, Kangning An, Xinjian Wei, Xiaoxuan Xu, Jing Xu
Abstract: With the widespread adoption of unmanned aerial vehicles (UAVs), effective path planning has become increasingly important. Although traditional search methods have been extensively applied, metaheuristic algorithms have gained popularity due to their efficiency and problem‑specific heuristics. However, challenges such as premature convergence and lack of solution diversity still hinder their performance in complex scenarios. To address these issues, this paper proposes an Enhanced Multi‑Strategy Dwarf Mongoose Optimization (EDMO) algorithm, tailored for three‑dimensional UAV trajectory planning in dynamic and obstacle‑rich environments. EDMO integrates three novel strategies: (1) a Dynamic Quantum Tunneling Optimization Strategy (DQTOS) to enable particles to probabilistically escape local optima; (2) a Bio‑phototactic Dynamic Focusing Search Strategy (BDFSS) inspired by microbial phototaxis for adaptive local refinement; and (3) an Orthogonal Lens Opposition‑Based Learning (OLOBL) strategy to enhance global exploration through structured dimensional recombination. EDMO is benchmarked on 39 standard test functions from CEC2017 and CEC2020, outperforming 14 advanced algorithms in convergence speed, robustness, and optimization accuracy. Furthermore, real‑world validations on UAV three‑dimensional path planning and three engineering design tasks confirm its practical applicability and effectiveness in field robotics missions requiring intelligent, adaptive, and time‑efficient planning.
Authors: Selim Ahmet Iz, Mustafa Unel
Abstract: This paper presents a novel image‑based path planning algorithm that was developed using computer vision techniques, as well as its comparative analysis with well‑known deterministic and probabilistic algorithms, namely A* and the Probabilistic Road Map (PRM) algorithm. The terrain depth has a significant impact on the calculated path safety. The craters and hills on the surface cannot be distinguished in a two‑dimensional image. The proposed method uses a disparity map of the terrain that is generated by using a UAV. Several computer vision techniques, including edge, line and corner detection methods, as well as the stereo depth reconstruction technique, are applied to the captured images, and the resulting disparity map is used to define candidate way‑points of the trajectory. The initial and desired points are detected automatically using ArUco marker pose estimation and circle detection techniques. After presenting the mathematical model and vision techniques, the developed algorithm is compared with well‑known algorithms on different virtual scenes created in the V‑REP simulation program and a physical setup created in a laboratory environment. Results are promising and demonstrate the effectiveness of the proposed algorithm.
Authors: Collin Hague, Artur Wolek
Abstract: This paper considers the problem of searching for a point of interest (POI) moving along an urban road network with an uncrewed aerial vehicle (UAV). The UAV is modeled as a variable‑speed Dubins vehicle with a line‑of‑sight sensor in an urban environment that may occlude the sensor's view of the POI. A search strategy is proposed that exploits a probabilistic visibility volume (VV) to plan its future motion with iterative deepening A*. The probabilistic VV is a time‑varying three‑dimensional representation of the sensing constraints for a particular distribution of the POI's state. To find the path most likely to view the POI, the planner uses a heuristic to optimistically estimate the probability of viewing the POI over a time horizon. The probabilistic VV is max‑pooled to create a variable‑timestep planner that reduces the search space and balances long‑term and short‑term planning. The proposed path planning method is compared to prior work with a Monte‑Carlo simulation and is shown to outperform the baseline methods in cluttered environments when the UAV's sensor has a higher false alarm probability.
Authors: Dao Lan Vy Dinh, Anh Nguyen Thi Mai, Hung Tran, Giang Quynh Le Vu, Tu Dac Ho, Zhenni Pan, Vo Nhan Van, Symeon Chatzinotas, Dinh-Hieu Tran
Abstract: This paper investigates the unmanned aerial vehicle (UAV)‑assisted resilience perspective in the 6G network energy saving (NES) scenario. More specifically, we consider multiple ground base stations (GBSs), each with three different sectors/cells in the terrestrial network, where multiple cells may become inactive due to unexpected events such as power outages, disasters, hardware failures, or erroneous energy‑saving decisions made by external network management systems. During the time required to reactivate these cells, UAVs are deployed to temporarily restore user service. To address this, we propose a Multi‑Agent Deep Deterministic Policy Gradient (MADDPG) framework to enable UAV‑assisted communication by jointly optimizing UAV trajectories, transmission power, and user‑UAV association under a sleeping GBS strategy. This framework aims to ensure the resilience of active users in the network and the long‑term operability of UAVs. Specifically, it maximizes service coverage for users during power outages or in NES zones, while minimizing the energy consumption of UAVs. Simulation results demonstrate that the proposed MADDPG policy consistently achieves a high coverage ratio across different testing episodes, outperforming other baselines. Moreover, the MADDPG framework attains the lowest total energy consumption, while maintaining a comparable user service rate. These results confirm the effectiveness of the proposed approach in achieving a superior trade‑off between energy efficiency and service performance, supporting the development of sustainable and resilient UAV‑assisted cellular networks.
Authors: Mengqi Li, Lixin Li, Wensheng Lin, Zhu Han, Tamer Başar
Abstract: Emerging 6G wireless systems suffer severe performance degradation in challenging environments like high‑speed trains traversing dense urban corridors and Unmanned Aerial Vehicle (UAV) links over mountainous terrain. These scenarios exhibit non‑Gaussian, non‑stationary channels with heavy‑tailed fading and abrupt signal fluctuations. To address these challenges, this paper proposes a novel wireless channel model based on symmetric α‑stable Lévy processes, thereby enabling continuous‑time state‑space characterization of both long‑term and short‑term fading. Building on this model, a generalized optimal control framework is developed via a fractional Hamilton‑Jacobi‑Bellman (HJB) equation that incorporates the Riesz fractional operator to capture non‑local spatial effects and memory‑dependent dynamics. The existence and uniqueness of viscosity solutions to the fractional HJB equation are rigorously established, thus ensuring the theoretical validity of the proposed control formulation. Numerical simulations conducted in a multi‑cell, multi‑user downlink setting demonstrate the effectiveness of the fractional HJB‑based strategy in optimizing transmission power under heavy‑tailed co‑channel and multi‑user interference.
Authors: Vitor Bueno, Ali Azarbahram, Marcello Farina, Lorenzo Fagiano
Abstract: This paper presents a Koopman‑based model predictive control (MPC) framework for safe UAV navigation in dynamic environments using real‑time LiDAR data. By leveraging the Koopman operator to linearly approximate the dynamics of surrounding objects, we enable efficient and accurate prediction of the position of moving obstacles. Embedding this into an MPC formulation ensures robust, collision‑free trajectory planning suitable for real‑time execution. The method is validated through simulation and ROS2‑Gazebo implementation, demonstrating reliable performance under sensor noise, actuation delays, and environmental uncertainty.
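As a rough illustration of how a linear operator can be fitted to observed obstacle motion and then rolled forward, the sketch below uses a least‑squares fit on raw state snapshots (the dynamic mode decomposition special case of a Koopman approximation, with identity observables). The state layout `[position, velocity]`, the function names, and the synthetic data are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def fit_linear_operator(X, Y):
    """Least-squares estimate of A such that Y ≈ A @ X.

    X and Y have shape (state_dim, num_snapshots); column j of Y is the
    successor of column j of X. Identity observables are an assumption.
    """
    A_transposed, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
    return A_transposed.T


def predict(A, x0, steps):
    """Roll the fitted linear model forward; returns shape (steps, state_dim)."""
    states = []
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = A @ x
        states.append(x)
    return np.array(states)


# Synthetic obstacle with constant velocity, sampled every 0.1 s (assumed).
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
X = rng.normal(size=(2, 20))   # snapshots of [position, velocity]
Y = A_true @ X                 # their one-step successors
A_fit = fit_linear_operator(X, Y)
future = predict(A_fit, [0.0, 1.0], steps=3)
print(future)
```

A predictor of this kind stays linear in the state, which is what makes it convenient to embed as a constraint model inside an MPC problem.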
Authors: Selim Ahmet Iz, Mustafa Unel
Abstract: Unmanned Aerial Vehicles (UAVs) are widely used for aerial photography and remote sensing applications. One of the main challenges is to stitch together multiple images into a single high‑resolution image that covers a large area. Feature‑based image stitching algorithms are commonly used but can suffer from errors and ambiguities in feature detection and matching. To address this, several approaches have been proposed, including using bundle adjustment techniques or direct image alignment. In this paper, we present a novel method that uses a combination of IMU data and computer vision techniques for stitching images captured by a UAV. Our method involves several steps such as estimating the displacement and rotation of the UAV between consecutive images, correcting for perspective distortion, and computing a homography matrix. We then use a standard image stitching algorithm to align and blend the images together. Our proposed method leverages the additional information provided by the IMU data, corrects for various sources of distortion, and can be easily integrated into existing UAV workflows. Our experiments demonstrate the effectiveness and robustness of our method, outperforming some of the existing feature‑based image stitching algorithms in terms of accuracy and reliability, particularly in challenging scenarios such as large displacements, rotations, and variations in camera pose.
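To make the step from an IMU rotation estimate to a homography concrete, the following sketch uses the standard rotation‑only relation H = K R K⁻¹, which holds when the camera translation between frames is negligible relative to scene depth. This is a simplifying assumption chosen for illustration; the abstract also mentions displacement, and the paper's actual formulation is not given here. The intrinsic matrix values and function names are hypothetical.

```python
import numpy as np


def rotation_homography(K, R):
    """Homography induced by a pure camera rotation R with intrinsics K."""
    return K @ R @ np.linalg.inv(K)


def warp_point(H, xy):
    """Apply homography H to a pixel (x, y) and dehomogenize."""
    p = H @ np.array([xy[0], xy[1], 1.0])
    return p[:2] / p[2]


# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# 90-degree rotation about the optical axis, as an IMU might report for yaw.
R_yaw = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])

H = rotation_homography(K, R_yaw)
center = warp_point(H, (320.0, 240.0))
shifted = warp_point(H, (330.0, 240.0))
print(center, shifted)
```

In this example the principal point stays fixed while a pixel 10 px to its right moves to 10 px below it, which is the expected effect of rotating about the optical axis.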
Authors: Jiawei Huang, Aimin Wang, Geng Sun, Jiahui Li, Jiacheng Wang, Weijie Yuan, Dusit Niyato, Xianbin Wang
Abstract: Low‑altitude wireless networks (LAWNs) have emerged as a viable solution for maritime communications. In these maritime LAWNs, unmanned aerial vehicles (UAVs) serve as practical low‑altitude platforms for wireless communications due to their flexibility and ease of deployment. However, the open and clear UAV communication channels make maritime LAWNs vulnerable to eavesdropping attacks. Existing security approaches often assume eavesdroppers follow predefined trajectories, which fails to capture the dynamic movement patterns of eavesdroppers in realistic maritime environments. To address this challenge, we consider a low‑altitude maritime communication system that employs intelligent jamming to counter dynamic eavesdroppers with uncertain positioning to enhance the physical layer security. Since such a system requires balancing the conflicting performance metrics of the secrecy rate and energy consumption of UAVs, we formulate a secure and energy‑efficient maritime communication multi‑objective optimization problem (SEMCMOP). To solve this dynamic and long‑term optimization problem, we first reformulate it as a partially observable Markov decision process (POMDP). We then propose a novel soft actor‑critic with conditional variational autoencoder (SAC‑CVAE) algorithm, which is a deep reinforcement learning algorithm improved by generative artificial intelligence. Specifically, the SAC‑CVAE algorithm employs advantage‑conditioned latent representations to disentangle and optimize policies, while enhancing computational efficiency by reducing the state space dimension. Simulation results demonstrate that our proposed intelligent jamming approach achieves secure and energy‑efficient maritime communications.
Authors: Sivaram Krishnan, Jinho Choi, Jihong Park
Abstract: A wide variety of real‑world data, such as sea measurements (e.g., temperatures collected by distributed sensors) and the trajectories of multiple unmanned aerial vehicles (UAVs), can be naturally represented as graphs, often exhibiting non‑Euclidean structures. These graph representations may evolve over time, forming time‑varying graphs. Effectively modeling and analyzing such dynamic graph data is critical for tasks like predicting graph evolution and reconstructing missing graph data. In this paper, we propose a framework based on the Koopman autoencoder (KAE) to handle time‑varying graph data. Specifically, we assume the existence of a hidden non‑linear dynamical system, where the state vector corresponds to the graph embedding of the time‑varying graph signals. To capture the evolving graph structures, the graph data is first converted into a vector time series through graph embedding, representing the structural information in a finite‑dimensional latent space. In this latent space, the KAE is applied to learn the underlying non‑linear dynamics governing the temporal evolution of graph features, enabling both prediction and reconstruction tasks.
Authors: Peican Lin, Gan Sun, Chenxi Liu, Fazeng Li, Weihong Ren, Yang Cong
Abstract: Vision‑language models (VLMs) have been widely applied in ground‑based vision‑language navigation (VLN). However, the vast complexity of outdoor aerial environments compounds data acquisition challenges and imposes long‑horizon trajectory planning requirements on Unmanned Aerial Vehicles (UAVs), introducing novel complexities for aerial VLN. To address these challenges, we propose a data‑efficient Open‑world aerial Vision‑Language Navigation (i.e., OpenVLN) framework, which can execute language‑guided flight under limited data and enhances long‑horizon trajectory planning in complex aerial environments. Specifically, we reconfigure a reinforcement learning framework to optimize the VLM for UAV navigation tasks, which can efficiently fine‑tune the VLM by using rule‑based policies under limited training data. Concurrently, we introduce a long‑horizon planner for trajectory synthesis that dynamically generates precise UAV actions via value‑based rewards. To this end, we conduct extensive navigation experiments on the TravelUAV benchmark with dataset scaling across diverse reward settings. Our method demonstrates consistent performance gains of up to 4.34% in Success Rate, 6.19% in Oracle Success Rate, and 4.07% in Success weighted by Path Length over baseline methods, validating its deployment efficacy for long‑horizon UAV navigation in complex aerial environments.
Authors: Selim Ahmet Iz, Francesco Nex, Norman Kerle, Henry Meissner, Ralf Berger
Abstract: Real‑time processing of UAV imagery is crucial for applications requiring urgent geospatial information, such as disaster response, where rapid decision‑making and accurate spatial data are essential. However, processing high‑resolution imagery in real time presents significant challenges due to the computational demands of feature extraction, matching, and bundle adjustment (BA). Conventional BA methods either downsample images, sacrificing important details, or require extensive processing time, making them unsuitable for time‑critical missions. To overcome these limitations, we propose a novel real‑time BA framework that operates directly on full‑resolution UAV imagery without downsampling. Our lightweight, onboard‑compatible approach divides each image into user‑defined patches (e.g., NxN grids, default 150x150 pixels) and dynamically tracks them across frames using UAV GNSS/IMU data and a coarse, globally available digital surface model (DSM). This ensures spatial consistency for robust feature extraction and matching between patches. Overlapping relationships between images are determined in real time using the UAV navigation system, enabling the rapid selection of relevant neighbouring images for localized BA. By limiting optimization to a sliding cluster of overlapping images, including those from adjacent flight strips, the method achieves real‑time performance while preserving the accuracy of global BA. The proposed algorithm is designed for seamless integration into the DLR Modular Aerial Camera System (MACS), supporting large‑area mapping in real time for disaster response, infrastructure monitoring, and coastal protection. Validation on MACS datasets with 50MP images demonstrates that the method maintains precise camera orientations and high‑fidelity mapping across multiple strips, running full bundle adjustment in under 2 seconds without GPU acceleration.
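The patch subdivision step can be pictured with a short sketch. This is only an illustration of tiling an image into fixed‑size patches with the 150‑pixel default mentioned above; how the paper treats partial patches at the image border is not stated, so clipping them to the image bounds is an assumption, as is the function name `patch_grid`.

```python
def patch_grid(width, height, patch=150):
    """Tile an image of size width x height into patch x patch boxes.

    Returns a list of (x0, y0, x1, y1) pixel boxes in row-major order.
    Border patches are clipped to the image bounds (an assumption).
    """
    boxes = []
    for y0 in range(0, height, patch):
        for x0 in range(0, width, patch):
            boxes.append((x0, y0, min(x0 + patch, width), min(y0 + patch, height)))
    return boxes


# Example: a small 400 x 300 image yields a 3 x 2 grid of patches.
example_boxes = patch_grid(400, 300)
```

Tracking a patch across frames would then require projecting each box through the GNSS/IMU pose and the DSM, which is outside the scope of this sketch.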
Authors: Ali Elkhazraji, Mohamed-Slim Alouini, Aamir Farooq
Abstract: High‑Altitude Platform Stations (HAPS) are emerging as key enablers of future non‑terrestrial networks (NTNs), supporting gigabit‑class free‑space optical (FSO) backhaul links while hosting laser‑based sensing payloads. This tutorial and survey reviews recent advances in HAPS optical communication and integration with atmospheric remote sensing via shared optical links. Among several sensing techniques, Differential Absorption Lidar (DIAL) is identified as most promising due to its range‑resolved sensitivity, spectral selectivity, and compatibility with HAPS constraints. A roadmap is outlined for implementing telecom‑band DIAL on HAPS alongside high‑throughput FSO systems. The paper analyzes architectural and atmospheric advantages of HAPS over terrestrial and satellite nodes, emphasizing spatial‑temporal coverage, station‑keeping ability, and support for compact laser payloads such as DIAL, in‑situ sensors, and multispectral imagers. It highlights the feasibility of co‑locating sensing and communication within a shared optical and power envelope, especially in the telecom C‑band (1.53‑1.57 um), enabling trace‑gas retrieval (CO2, CH4, N2O, H2S, O3) while maintaining multi‑Gbps downlinks. Suitable HAPS architectures (balloons, UAVs, airships) and use cases are identified where integrated sensing and communication (ISAC)‑enabled HAPS outperform satellites and UAVs, including greenhouse gas monitoring, disaster response, air‑quality mapping, and 6G NTN extensions. A literature survey for 2005‑2025 shows HAPS publications have tripled since 2014, indicating rapid growth. The results confirm that optical hardware, favorable transmission windows, and active R&D are positioning HAPS as a persistent stratospheric layer for 6G ISAC communications and environmental observation.
Authors: Addison Kalanther, Daniel Bostwick, Chinmay Maheshwari, Shankar Sastry
Abstract: We consider a scenario where a team of two unmanned aerial vehicles (UAVs) pursue an evader UAV within an urban environment. Each agent has a limited view of their environment where buildings can occlude their field‑of‑view. Additionally, the pursuer team is agnostic about the evader in terms of its initial and final location, and the behavior of the evader. Consequently, the team needs to gather information by searching the environment and then track it to eventually intercept. To solve this multi‑player, partially‑observable, pursuit‑evasion game, we develop a two‑phase neuro‑symbolic algorithm centered around the principle of bounded rationality. First, we devise an offline approach using deep reinforcement learning to progressively train adversarial policies for the pursuer team against fictitious evaders. This creates k‑levels of rationality for each agent in preparation for the online phase. Then, we employ an online classification algorithm to determine a "best guess" of our current opponent from the set of iteratively‑trained strategic agents and apply the best player response. Using this schema, we improved average performance when facing a random evader in our environment.
Authors: Li-Yu Lin, Benjamin Perseghetti, James Goppert
Abstract: Most rigid‑body systems evolve on nonlinear Lie groups, where Euclidean control designs lose geometric meaning. In this paper, we introduce a log‑linear backstepping control law on SE2(3) that preserves full rotational‑translational coupling. Leveraging a class of mixed‑invariant systems, which are group‑affine dynamic models, we derive exact logarithmic error dynamics that are linear in the Lie algebra. We give closed‑form expressions for the left‑ and right‑Jacobian inverses of SE2(3), which yield the exact error dynamics without local approximations. The log‑linear backstepping control design ensures exponential stability of the error dynamics; because the error dynamics have a block‑triangular structure, a Linear Matrix Inequality (LMI) formulation or an H_\infty gain performance design can be used. This work establishes an exact backstepping framework for a class of mixed‑invariant systems, providing a geometrically consistent foundation for future Unmanned Aerial Vehicle (UAV) and spacecraft control design.
Authors: Jinfeng Liang, Haocheng Guo, Ximin Lyu
Abstract: The tailsitter vertical takeoff and landing (VTOL) UAV is widely used due to its lower dead weight, as it eliminates the actuators and mechanisms for tilting. However, the tailsitter UAV is susceptible to wind disturbances in multi‑rotor mode, as it exposes a large frontal fuselage area. To address this issue, our tailsitter UAV features a reconfigurable wing design, allowing wings to retract in multi‑rotor mode and extend in fixed‑wing mode. Considering power efficiency, we design a coaxial heterogeneous dual‑rotor configuration, which significantly reduces the total power consumption. To reduce structural weight and complexity, we employ a swashplateless mechanism with an improved design to control pitch and roll in multi‑rotor mode. We optimize the structure of the swashplateless mechanism by adding flapping hinges, which reduces vibration during cyclic acceleration and deceleration. Finally, we perform comprehensive transition flight tests to validate stable flight performance across the entire flight envelope of the tailsitter UAV.
Authors: Kailun Ji, Xiaoyu Hu, Xinyu Zhang, Jun Chen
Abstract: Large‑scale disaster Search And Rescue (SAR) operations are persistently challenged by complex terrain and disrupted communications. While Unmanned Aerial Vehicle (UAV) swarms offer a promising solution for tasks like wide‑area search and supply delivery, their effective coordination places a significant cognitive burden on human operators. The core human‑machine collaboration bottleneck lies in the "intention‑to‑action gap", which is an error‑prone process of translating a high‑level rescue objective into a low‑level swarm command under high intensity and pressure. To bridge this gap, this study proposes a novel LLM‑CRF system that leverages Large Language Models (LLMs) to model and augment human‑swarm teaming cognition. The proposed framework first captures the operator's intention through natural, multi‑modal interactions with the device via voice or graphical annotations. It then employs the LLM as a cognitive engine to perform intention comprehension, hierarchical task decomposition, and mission planning for the UAV swarm. This closed‑loop framework enables the swarm to act as a proactive partner, providing active feedback in real time while reducing the need for manual monitoring and control, which considerably advances the efficacy of the SAR task. We evaluate the proposed framework in a simulated SAR scenario. Experimental results demonstrate that, compared to traditional order and command‑based interfaces, the proposed LLM‑driven approach reduced task completion time by approximately 64.2% and improved task success rate by 7%. It also leads to a considerable reduction in subjective cognitive workload, with NASA‑TLX scores dropping by 42.9%. This work establishes the potential of LLMs to create more intuitive and effective human‑swarm collaborations in high‑stakes scenarios.
Authors: Hangyu Teng
Abstract: Co‑simulation is a critical approach for the design and analysis of complex cyber‑physical systems, as it enhances development efficiency and reduces costs. This paper presents a co‑simulation framework integrating ROS 2 and MATLAB/Simulink for quadrotor unmanned aerial vehicle (UAV) control system design and verification. First, a six‑degree‑of‑freedom nonlinear dynamic model of the quadrotor is derived based on the Newton‑Euler equations. Second, within the proposed framework, a hierarchical control architecture is designed and implemented: an LQR controller for attitude control to achieve optimal regulation performance, and a PID controller for position control to ensure robustness and practical applicability. Third, we elaborate on the architecture of the framework, including the implementation details of the cross‑platform data exchange mechanism. Simulation results demonstrate the effectiveness of the framework, highlighting its capability to provide an efficient and standardized solution for rapid prototyping and Software‑in‑the‑Loop (SIL) validation of UAV control algorithms.
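As a rough illustration of the outer position loop in such a hierarchy, the sketch below runs a PID controller on a one‑dimensional double integrator (acceleration equals command plus a constant disturbance). It is not the paper's quadrotor model or its ROS 2/Simulink setup; the plant, the gains (chosen so the ideal closed loop has all poles at s = -2), the disturbance value, and the name `simulate_pid` are all assumptions made for the example.

```python
def simulate_pid(setpoint=1.0, kp=12.0, ki=8.0, kd=6.0,
                 disturbance=-0.5, dt=0.01, steps=1000):
    """Simulate PID position control of a 1-D double integrator.

    The derivative term acts on the measured velocity rather than on the
    error, which avoids a spike when the setpoint steps.
    Returns the final position after `steps` Euler steps of size `dt`.
    """
    x, v, integral = 0.0, 0.0, 0.0
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt
        u = kp * error + ki * integral - kd * v   # PID control law
        accel = u + disturbance                   # plant: x'' = u + d
        v += accel * dt
        x += v * dt
    return x


final_position = simulate_pid()
```

The integral term is what removes the steady‑state offset caused by the constant disturbance; with `ki = 0` the position would settle away from the setpoint.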
Authors: Mohsin Mahmud Topu, Mahfuz Ahmed Anik, Azmine Toushik Wasi, Md Manjurul Ahsan
Abstract: Pavement infrastructure monitoring is challenged by complex spatial dependencies, changing environmental conditions, and non‑linear deterioration across road networks. Traditional Pavement Management Systems (PMS) remain largely reactive, lacking real‑time intelligence for failure prevention and optimal maintenance planning. To address this, we propose a unified Digital Twin (DT) and Graph Neural Network (GNN) framework for scalable, data‑driven pavement health monitoring and predictive maintenance. Pavement segments and spatial relations are modeled as graph nodes and edges, while real‑time UAV, sensor, and LiDAR data stream into the DT. The inductive GNN learns deterioration patterns from graph‑structured inputs to forecast distress and enable proactive interventions. Trained on a real‑world‑inspired dataset with segment attributes and dynamic connectivity, our model achieves an R2 of 0.3798, outperforming baseline regressors and effectively capturing non‑linear degradation. We also develop an interactive dashboard and reinforcement learning module for simulation, visualization, and adaptive maintenance planning. This DT‑GNN integration enhances forecasting precision and establishes a closed feedback loop for continuous improvement, positioning the approach as a foundation for proactive, intelligent, and sustainable pavement management, with future extensions toward real‑world deployment, multi‑agent coordination, and smart‑city integration.
Authors: Rui Zhang, Fuwang Dong, Wei Wang
Abstract: In this paper, we construct an air‑sea collaborative system framework based on the Integrated Sensing and Communication (ISAC) techniques, where the Unmanned Aerial Vehicle (UAV) and Unmanned Surface Vehicle (USV) jointly inspect targets of interest while keeping communication with each other simultaneously. First, we demonstrate the unique challenges encountered in this collaborative system, i.e., the coupling and heterogeneity of the UAV/USV's trajectories. Then, we formulate a total energy consumption minimization problem to jointly optimize the trajectories, flying and hovering times, target scheduling, and beamformers under the constraints of water currents, collision avoidance, and Sensing and Communication (S&C) requirements. To address the strong coupling of the variables, we divide the original problem into two subproblems, namely, the hover point selection and the joint trajectory planning and beamforming design. In the first subproblem, we propose a three‑step hierarchical method including: (1) a virtual base station coverage (VBSC) and clustering algorithm to obtain the target scheduling and rough position of hover points; (2) a Bi‑traveling salesman problem with neighborhood (Bi‑TSPN)‑based algorithm to determine the visiting order sequence of the hover points; (3) a hover point refinement and time allocation algorithm to further optimize the time allocation. In the latter subproblem, we complete the remaining trajectory planning and beamforming design in each flying and hovering stage by developing a semi‑definite relaxation (SDR) and successive convex approximation (SCA) method. Finally, we conduct a series of simulations to demonstrate the superiority of the proposed scheme over existing sequential access and leader‑follower strategies.
Authors: Rushi Moliya, Dhaval K. Patel, Brijesh Soni, Miguel López-Benítez
Abstract: In this work, we consider a multi‑unmanned aerial vehicle (UAV) cooperative sensing system where UAVs equipped with directional antennas are deployed over uneven terrain to sense multiple targets under terrain‑aware line of sight (LoS) conditions. To mitigate terrain‑induced LoS blockages that degrade detection performance, we incorporate a binary LoS indicator and propose a bounding volume hierarchy (BVH)‑based adaptive scheme for efficient LoS evaluation. We formulate a bi‑objective problem that maximizes the probability of cooperative detection while minimizing hover energy, subject to spatial, orientational, and safety constraints. To address the problem, which is inherently non‑convex, we propose a hierarchical heuristic framework that combines exploration through a genetic algorithm (GA) with per‑UAV refinement via particle swarm optimization (PSO), where a penalty‑based fitness evaluation guides solutions toward feasibility within the constraints. The proposed methodology offers an effective trade‑off between traversing a complex search space and maintaining terrain‑aware LoS connectivity and energy‑aware deployment. Monte Carlo simulations on real‑world terrain data show that the proposed GA+PSO framework improves detection probability by 37.02% and 36.5% for 2 and 3 UAVs, respectively, while reducing average excess hover energy by 45.0% and 48.9% compared to the PSO‑only baseline. Relative to the non‑optimized scheme, it further achieves 59.5% and 54.2% higher detection probability with 59.8% and 65.9% lower excess hover energy, thereby showing its effectiveness with a small number of UAVs over uneven terrain.
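A binary LoS indicator over a terrain height grid can be sketched as follows. This is a brute‑force sampling check, not the hierarchy‑accelerated scheme the abstract proposes: it walks along the straight segment between UAV and target and reports a blockage if any sample lies at or below the terrain. The unit‑spaced grid, nearest‑cell height lookup, sample count, and the name `has_line_of_sight` are assumptions for illustration.

```python
def has_line_of_sight(terrain, uav, target, samples=100):
    """Return 1 if the segment uav -> target clears the terrain, else 0.

    terrain[row][col] holds ground heights on a unit-spaced grid, and
    uav / target are (x, y, z) tuples whose x, y lie inside the grid.
    Interior points of the segment are sampled; endpoints are skipped.
    """
    for k in range(1, samples):
        t = k / samples
        x = uav[0] + t * (target[0] - uav[0])
        y = uav[1] + t * (target[1] - uav[1])
        z = uav[2] + t * (target[2] - uav[2])
        ground = terrain[round(y)][round(x)]  # nearest-cell height lookup
        if z <= ground:
            return 0
    return 1


flat = [[0.0] * 5 for _ in range(5)]
hilly = [row[:] for row in flat]
hilly[2][2] = 20.0  # a hill between the UAV and the target

los_flat = has_line_of_sight(flat, (0, 0, 10.0), (4, 4, 1.0))
los_hilly = has_line_of_sight(hilly, (0, 0, 10.0), (4, 4, 1.0))
```

A spatial hierarchy becomes worthwhile when this check must be repeated for many UAV and target pairs inside an optimization loop, since it avoids testing terrain cells far from the ray.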
Authors: Rodrigo Nunes, André Melo, Rafael Albarello, Reinaldo Gomes, Cesar Marcondes, Lourenço Pereira
Abstract: The integration of Uncrewed Aerial Vehicles (UAVs) into low‑altitude airspace has led authorities to adopt distributed Uncrewed Traffic Management (UTM) architectures that ensure interoperability and safety. Blockchain has been proposed as an enabler for trustworthy coordination among UTM stakeholders. Yet, its real‑time performance under aeronautical constraints remains insufficiently characterized. This paper presents a quantitative benchmark comparing two regulation‑compliant distributed architectures: the federated InterUSS platform maintained by the Linux Foundation and a permissioned blockchain based on Hyperledger Fabric. Both systems were evaluated through Operational Intent Reference (OIR) registration workloads generated via Hyperledger Caliper, measuring throughput, latency, and transaction loss under loads up to 50 transactions per second (TPS). Results show that InterUSS sustained sub‑second latency and stable performance up to 30 TPS. In contrast, Fabric exhibited exponential degradation, with median latency exceeding 3 s and tail latencies above 15 s beyond that point. These findings demonstrate that blockchain‑based architectures must be redesigned to meet aeronautical timing and scalability requirements, suggesting that hybrid models combining distributed ledgers for auditability with federated frameworks for real‑time coordination are more suitable for future UTM deployments.
Authors: Alaa Awad Abdellatif, Helder Fontes, Andre Coelho, Luis M. Pessoa, Rui Campos
Abstract: This paper presents an optimized Joint Radar‑Communication (JRC) system utilizing multiple Unmanned Aerial Vehicles (UAVs) to simultaneously achieve sensing and communication objectives. By leveraging UAVs equipped with dual radar and communication capabilities, the proposed framework aims to maximize radar sensing performance across all UAVs in challenging environments. The proposed approach focuses on formulating and solving a UAV positioning and power allocation problem to optimize multi‑UAV sensing and communications performance over multiple targets within designated zones. Due to the NP‑hard and combinatorial nature of the problem, we propose a Distributed JRC‑based (DJRC) solution. This solution employs an efficient reward for potential actions and consistently selects the best action that maximizes the reward while ensuring both communications and sensing performance. Simulation results demonstrate significant performance improvements of the proposed solution over state‑of‑the‑art radar‑ or communication‑centric trajectory planning methods, with polynomial complexity dependent on the number of UAVs and linear dependence on the iteration count.
Authors: Sivaram Krishnan, Jinho Choi, Jihong Park, Gregory Sherman, Benjamin Campbell
Abstract: The application of machine learning (ML) to communication systems is expected to play a pivotal role in future artificial intelligence (AI)‑based next‑generation wireless networks. While most existing works focus on ML techniques for static wireless environments, they often face limitations when applied to highly dynamic environments, such as flying ad hoc networks (FANETs). This paper explores the use of data‑driven Koopman approaches to address these challenges. Specifically, we investigate how these approaches can model UAV trajectory dynamics within FANETs, enabling more accurate predictions and improved network performance. By leveraging Koopman operator theory, we propose two possible approaches ‑‑ centralized and distributed ‑‑ to efficiently address the challenges posed by the constantly changing topology of FANETs. To demonstrate this, we consider a FANET performing surveillance with UAVs following pre‑determined trajectories and predict signal‑to‑interference‑plus‑noise ratios (SINRs) to ensure reliable communication between UAVs. Our results show that these approaches can accurately predict connectivity and isolation events that lead to modelled communication outages. This capability could help UAVs schedule their transmissions based on these predictions.
Authors: Zihan Wang, Jianwen Li, Li-Fan Wu, Nina Mahmoudian
Abstract: Rivers are critical corridors for environmental monitoring and disaster response, where Unmanned Aerial Vehicles (UAVs) guided by vision‑driven policies can provide fast, low‑cost coverage. However, deployment exposes simulation‑trained policies to distribution shift and safety risks and requires efficient adaptation from limited human interventions. We study human‑in‑the‑loop (HITL) learning with a conservative overseer who vetoes unsafe or inefficient actions and provides statewise preferences by comparing the agent's proposal with a corrective override. We introduce Statewise Hybrid Preference Alignment for Robotics (SPAR‑H), which fuses direct preference optimization on policy logits with a reward‑based pathway that trains an immediate‑reward estimator from the same preferences and updates the policy using a trust‑region surrogate. With five HITL rollouts collected from a fixed novice policy, SPAR‑H achieves the highest final episodic reward and the lowest variance across initial conditions among tested methods. The learned reward model aligns with human‑preferred actions and elevates nearby non‑intervened choices, supporting stable propagation of improvements. We benchmark SPAR‑H against imitation learning (IL), direct preference variants, and evaluative reinforcement learning (RL) in the HITL setting, and demonstrate real‑world feasibility of continual preference alignment for UAV river following. Overall, dual statewise preferences empirically provide a practical route to data‑efficient online adaptation in riverine navigation.
Authors: Mathias Mankoe, Fuqiang Lu, Hualing Bi, Abdulsalam Sibidoo Mubashiru
Abstract: Autonomous navigation of UAV swarms in perceptually‑degraded environments, where GPS is unavailable and terrain is densely cluttered, presents a critical bottleneck for real‑world deployment. Existing optimization‑based planners lack the resilience to avoid catastrophic convergence to local optima under such uncertainty. Inspired by principles of computational meta‑cognition, this paper introduces a novel swarm intelligence framework that enables a fleet of UAVs to autonomously sense, adapt, and recover from planning failures in real‑time. At its core is the Self‑Learning Slime Mould Algorithm (SLSMA), which integrates three meta‑cognitive layers: a situation‑aware search strategy that dynamically selects between exploration and exploitation based on perceived search stagnation; a collective memory mechanism that allows the swarm to learn from and avoid previously failed trajectories; and an adaptive recovery behavior that triggers global re‑exploration upon entrapment. We formulate the multi‑UAV trajectory problem as a resilient planning challenge, with a cost function that penalizes not only path length and collisions but also navigational uncertainty and proximity to failure states. Extensive simulations in synthetically complex 3D worlds and against the CEC 2017 benchmark suite demonstrate the framework's superior performance. The SLSMA does not merely optimize paths; it generates resilient trajectories, demonstrating a 99.5% mission success rate and significantly outperforming state‑of‑the‑art metaheuristics in recovery speed and solution reliability. This work provides a foundational step towards truly autonomous swarms capable of persistent operation in denied and dynamic environments.
Authors: Shuang Qi, Bin Lin, Yiqin Deng, Xianhao Chen, Yuguang Fang
Abstract: Unmanned Aerial Vehicles (UAVs) play a crucial role in Maritime Search and Rescue (MSAR), contributing to the improvement of rescue efficiency and reduction of casualties. Typically, UAVs equipped with cameras collect data from disaster areas and transmit it to the shore‑based rescue command centers. By deploying Mobile Edge Computing (MEC) servers, UAVs can pre‑process video footage to reduce data transmission volume, thus reducing transmission delays. However, the limited computational capacity and energy of UAVs pose significant challenges to the efficiency of UAV‑assisted MSAR systems. To address these problems, in this paper, we investigate a multi‑UAV assisted MSAR system consisting of multiple Surveillance UAVs (S‑UAVs) and a Relay UAV (R‑UAV). Then, we formulate a joint optimization problem to minimize the maximum total latency among all S‑UAVs by jointly optimizing the computation offloading decisions, the R‑UAV deployment, and the association between S‑UAVs and rescue targets while ensuring that all targets are monitored by S‑UAVs. Since the formulated optimization problem is hard to solve due to its non‑convexity, we propose an effective iterative algorithm by breaking it into three sub‑problems. Numerical simulation results show the effectiveness of the proposed algorithm with various performance parameters.
Authors: Shuaijun Li, Jie Tang, Beixiong Zheng, Lipeng Zhu, Cui Yang, Nan Zhao, Xiu Yin Zhang, Kai-Kit Wong
Abstract: Low‑altitude economy (LAE) is an emerging technological paradigm that enables continuous airspace coverage at multiple altitudes by providing highly reliable data connectivity for numerous low‑altitude applications. However, existing networks cannot sufficiently support LAE development, as current base stations (BSs) are primarily designed for terrestrial users and lack the capability to provide continuous coverage at low altitudes. To overcome these challenges, rotatable antenna system (RAS) is introduced in LAE, enabling flexible beamforming by dynamically adjusting the boresight of directional antennas to extend low‑altitude coverage and enhance the stability of data transmission. In this article, we first provide an overview of RAS‑empowered LAE applications, including low‑altitude communication, sensing, control, and computation. Then, we present two practical RAS deployment strategies for LAE scenarios, namely RAS‑aided multi‑BS and multi‑unmanned aerial vehicle (UAV) cooperative coverages, as well as provide detailed discussions on their system architectures and performance benefits. Additionally, key design issues of RAS in LAE are discussed, including channel modeling and estimation, cellular access and interference cancellation, as well as RAS configuration and boresight optimization. Finally, we demonstrate the performance gains of RAS in LAE networks through experimental and simulation results.
Authors: Xiaoling Han, Bin Lin, Zhenyu Na, Bowen Li, Chaoyue Zhang, Ran Zhang
Abstract: Driven by the unceasing development of maritime services, tasks of unmanned aerial vehicle (UAV)‑assisted maritime data collection (MDC) are becoming increasingly diverse, complex and personalized. As a result, effective task allocation for MDC is becoming increasingly critical. In this work, integrating the concept of spatial crowdsourcing (SC), we develop an SC‑based MDC network model and investigate the task allocation problem for UAV‑assisted MDC. In variable maritime service scenarios, tasks are allocated to UAVs based on the spatial and temporal requirements of the tasks, as well as the mobility of the UAVs. To address this problem, we design an SC‑based task allocation algorithm for the MDC (SC‑MDC‑TA). The quality estimation is utilized to assess and regulate task execution quality by evaluating signal to interference plus noise ratio and the UAV energy consumption. The reverse auction is employed to potentially reduce the task waiting time as much as possible while ensuring timely completion. Additionally, we establish typical task allocation scenarios based on maritime service requirements indicated by electronic navigational charts. Simulation results demonstrate that the proposed SC‑MDC‑TA algorithm effectively allocates tasks for various MDC scenarios. Furthermore, compared to the benchmark, the SC‑MDC‑TA algorithm can also reduce the task completion time and lower the UAV energy consumption.
Authors: Ran Xu, Yupeng Qi, Jingsen Feng, Xu Chu
Abstract: In modern engineering practice, human engineers collaborate in specialized teams to design complex products, with each expert completing their respective tasks while communicating and exchanging results and data with one another. While this division of expertise is essential for managing multidisciplinary complexity, it demands substantial development time and cost. Recently, we introduced OpenFOAMGPT (1.0, 2.0), which functions as an autonomous AI engineer for computational fluid dynamics, and turbulence.ai, which can conduct end‑to‑end research in fluid mechanics and draft publications and PhD theses. Building upon these foundations, we present Engineering.ai, a platform for teams of AI engineers in computational design. The framework employs a hierarchical multi‑agent architecture where a Chief Engineer coordinates specialized agents consisting of Aerodynamics, Structural, Acoustic, and Optimization Engineers, each powered by an LLM with domain‑specific knowledge. Agent‑agent collaboration is achieved through file‑mediated communication for data provenance and reproducibility, while a comprehensive memory system maintains project context, execution history, and retrieval‑augmented domain knowledge to ensure reliable decision‑making across the workflow. The system integrates FreeCAD, Gmsh, OpenFOAM, CalculiX, and BPM acoustic analysis, enabling parallel multidisciplinary simulations while maintaining computational accuracy. The framework is validated through UAV wing optimization. This work demonstrates that agentic‑AI‑enabled AI engineers have the potential to perform complex engineering tasks autonomously. Remarkably, the automated workflow achieved a 100% success rate across over 400 parametric configurations, with zero mesh generation failures, solver convergence issues, or manual interventions required, validating that the framework is trustworthy.
Authors: Leonhard Duda, Khadijeh Alibabaei, Elena Vollmer, Leon Klug, Valentin Kozlov, Lisana Berberi, Mishal Benz, Rebekka Volk, Juan Pedro Gutiérrez Hermosillo Muriedas, Markus Götz, Judith Sáínz-Pardo Díaz, Álvaro López García, Frank Schultmann, Achim Streit
Abstract: Federated Learning (FL) is an approach for training a shared Machine Learning (ML) model with distributed training data and multiple participants. FL allows bypassing limitations of the traditional Centralized Machine Learning (CL) if data cannot be shared or stored centrally due to privacy or technical restrictions ‑‑ the participants train the model locally with their training data and do not need to share it with the other participants. This paper investigates the practical implementation and effectiveness of FL in a real‑world scenario, specifically focusing on unmanned aerial vehicle (UAV)‑based thermal images for common thermal feature detection in urban environments. The distributed nature of the data arises naturally and makes it suitable for FL applications, as images captured in two German cities are available. This application presents unique challenges due to non‑identical distribution and feature characteristics of data captured at both locations. The study makes several key contributions by evaluating FL algorithms in real deployment scenarios rather than simulation. We compare several FL approaches with a centralized learning baseline across key performance metrics such as model accuracy, training time, communication overhead, and energy usage. This paper also explores various FL workflows, comparing client‑controlled workflows and server‑controlled workflows. The findings of this work serve as a valuable reference for understanding the practical application and limitations of the FL methods in segmentation tasks in UAV‑based imaging.
Authors: Suman Raj, Radhika Mittal, Rajiv Mayani, Pawel Zuk, Anirban Mandal, Michael Zink, Yogesh Simmhan, Ewa Deelman
Abstract: Drone fleets equipped with onboard cameras, computer vision, and Deep Neural Network (DNN) models present a powerful paradigm for real‑time spatio‑temporal decision‑making. In wildfire response, such drones play a pivotal role in monitoring fire dynamics, supporting firefighter coordination, and facilitating safe evacuation. In this paper, we introduce AeroResQ, an edge‑accelerated UAV framework designed for scalable, resilient, and collaborative escape route planning during wildfire scenarios. AeroResQ adopts a multi‑layer orchestration architecture comprising service drones (SDs) and coordinator drones (CDs), each performing specialized roles. SDs survey fire‑affected areas, detect stranded individuals using onboard edge accelerators running fire detection and human pose identification DNN models, and issue requests for assistance. CDs, equipped with lightweight data stores such as Apache IoTDB, dynamically generate optimal ground escape routes and monitor firefighter movements along these routes. The framework proposes a collaborative path‑planning approach based on a weighted A* search algorithm, where CDs compute context‑aware escape paths. AeroResQ further incorporates intelligent load‑balancing and resilience mechanisms: CD failures trigger automated data redistribution across IoTDB replicas, while SD failures initiate geo‑fenced re‑partitioning and reassignment of spatial workloads to operational SDs. We evaluate AeroResQ using a realistic emulated wildfire setup modeled on recent Southern California wildfires. Experimental results demonstrate that AeroResQ achieves a nominal end‑to‑end latency of ≤500 ms, well below the 2 s request interval, while maintaining over 98% successful task reassignment and completion, underscoring its feasibility for real‑time, on‑field deployment in emergency response and firefighter safety operations.
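As a rough illustration of the path‑planning idea mentioned in the abstract above, the following is a minimal sketch of weighted A* on a 4‑connected grid. It is not AeroResQ's implementation. The grid representation, the per‑cell traversal costs, the Manhattan heuristic, and the `weight` inflation factor are all assumptions made for this example, and the abstract does not say whether "weighted" refers to edge weights or to an inflated heuristic, so the sketch supports both.

```python
import heapq

def weighted_a_star(grid, start, goal, weight=1.5):
    """Weighted A* on a 4-connected grid (illustrative sketch only).

    grid[r][c] is the positive cost of entering a cell, or None if blocked
    (for example, a cell on fire). Costs are assumed to be >= 1 so that the
    Manhattan distance never overestimates. weight >= 1 inflates the
    heuristic; weight = 1 gives plain A*. Returns a list of cells or None.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance to the goal.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(weight * h(start), 0.0, start)]
    g_cost = {start: 0.0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue  # stale heap entry
        closed.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = grid[nr][nc]
            if step is None or nxt in closed:
                continue  # blocked or already expanded (no re-expansion)
            new_g = g + step
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_g + weight * h(nxt), new_g, nxt))
    return None
```

With `weight > 1` the search expands fewer cells at the price of a path that may be up to `weight` times longer than optimal, which is the usual trade‑off when planning under a tight latency budget.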
Authors: Robert Pommeranz, Kevin Tebbe, Ralf Heynicke, Gerd Scholl
Abstract: In this paper, a modular and scalable architecture for heterogeneous swarm‑based Counter Unmanned Aerial Systems (C‑UASs) built on the PX4‑Autopilot and Robot Operating System 2 (ROS 2) frameworks is presented. The proposed architecture emphasizes seamless integration of hardware components by introducing independent ROS 2 nodes for each component of an Unmanned Aerial Vehicle (UAV). Communication between swarm participants is abstracted in software, allowing the use of various technologies without architectural changes. Key functionalities such as leader following and formation flight are supported to maneuver the swarm. The system also allows computer vision algorithms to be integrated for the detection and tracking of UAVs. Additionally, a ground control station is integrated for the coordination of swarm operations. The swarm‑based Unmanned Aerial System (UAS) architecture is verified both in a Gazebo simulation environment and in real‑world demonstrations.
Authors: Marius Feldmann, Tobias Nöthlich, Felix Walter, Maximilian Nitsch, Juan A. Fraire, Georg A. Murzik, Fiona Fuchs
Abstract: This whitepaper presents parts of the results of the REDMARS2 project conducted in 2021‑2022, exploring the integration of Recursive Internetwork Architecture (RINA) concepts into Delay‑ and Disruption‑Tolerant Networking (DTN) protocols. Using Bundle‑in‑Bundle Encapsulation (BIBE), we implemented scope‑based separation mechanisms resulting in scalable DTNs. A key contribution of this work is the demonstration of practical BIBE‑based use cases, including a realistic Solar System Internet communication scenario involving unmanned aerial vehicles (UAVs) and satellite relays. The evaluation, supported by field tests in collaboration with the European Space Agency (ESA), confirmed the viability of BIBE as a foundation for scalable, recursive, and interoperable DTN architectures.
Authors: Luis Antonio L. F. da Costa, Rodrigo C. de Lamare, Rafael Kunst, Edison Pignaton de Freitas
Abstract: Sixth‑generation (6G) wireless networks are envisioned to deliver ultra‑low latency, massive connectivity, and high data rates, enabling advanced applications such as autonomous unmanned aerial vehicle (UAV) swarms and aerial edge computing. However, realizing this vision in Flying Ad Hoc Networks (FANETs) requires intelligent and adaptive clustering mechanisms to ensure efficient routing and resource utilization. This paper proposes a novel machine learning‑driven framework for dynamic cluster formation and cluster head selection in 6G‑enabled FANETs. The system leverages mobility prediction using Extreme Gradient Boosting (XGBoost) and a composite optimization strategy based on signal strength and spatial proximity to identify optimal cluster heads. To evaluate the proposed method, comprehensive simulations were conducted in both centralized (5G) and decentralized (6G) topologies using realistic video traffic patterns. Results show that the proposed model achieves significant improvements in delay, jitter, and throughput in decentralized scenarios. These findings demonstrate the potential of combining machine learning with clustering techniques to enhance scalability, stability, and performance in next‑generation aerial networks.
Authors: Yunfeng Jiang, Zhiming Huang, Jianping Pan
Abstract: The analytical characterization of coverage probability in finite three‑dimensional wireless networks has long remained an open problem, hindered by the loss of spatial independence in finite‑node settings and the coupling between link distances and interference in bounded geometries. This paper closes this gap by presenting the first exact analytical framework for coverage probability in finite 3D networks modeled by a binomial point process within a cylindrical region. To bypass the intractability that has long hindered such analyses, we leverage the independence structure, convolution geometry, and derivative properties of Laplace transforms, yielding a formulation that is both mathematically exact and computationally efficient. Extensive Monte Carlo simulations verify the analysis and demonstrate significant accuracy gains over conventional Poisson‑based models. The results generalize to any confined 3D wireless system, including aerial, underwater, and robotic networks.
Authors: Bowen Li, Jiping Luo, Themistoklis Charalambous, Nikolaos Pappas
Abstract: Guaranteeing stringent data freshness for low‑altitude unmanned aerial vehicles (UAVs) in shared spectrum forces a critical trade‑off between two operational costs: the UAV's own energy consumption and the occupation of terrestrial channel resources. The core challenge is to satisfy the aerial data freshness while finding a Pareto‑optimal balance between these costs. Leveraging predictive channel models and predictive UAV trajectories, we formulate a bi‑objective Pareto optimization problem over a long‑term planning horizon to jointly optimize the sampling timing for aerial traffic and the power and spectrum allocation for fair coexistence. However, the problem's non‑convex, mixed‑integer nature renders classical methods incapable of fully characterizing the complete Pareto frontier. Notably, we show monotonicity properties of the frontier, building on which we transform the bi‑objective problem into several single‑objective problems. We then propose a new graph‑based algorithm and prove that it can find the complete set of Pareto optima with low complexity, linear in the horizon and near‑quadratic in the resource block (RB) budget. Numerical comparisons show that our approach meets the stringent timeliness requirement and achieves a six‑fold reduction in RB utilization or a 6 dB energy saving compared to benchmarks.
Authors: Bingcong Huo, Zhiming Wang
Abstract: To address the challenges in UAV object detection, such as complex backgrounds, severe occlusion, dense small objects, and varying lighting conditions, this paper proposes PT‑DETR based on RT‑DETR, a novel detection algorithm specifically designed for small objects in UAV imagery. In the backbone network, we introduce the Partially‑Aware Detail Focus (PADF) Module to enhance feature extraction for small objects. Additionally, we design the Median‑Frequency Feature Fusion (MFFF) module, which effectively improves the model's ability to capture small‑object details and contextual information. Furthermore, we incorporate Focaler‑SIoU to strengthen the model's bounding box matching capability and increase its sensitivity to small‑object features, thereby further enhancing detection accuracy and robustness. Compared with RT‑DETR, our PT‑DETR achieves mAP improvements of 1.6% and 1.7% on the VisDrone2019 dataset with lower computational complexity and fewer parameters, demonstrating its robustness and feasibility for small‑object detection tasks.
Authors: Chuang Zhang, Geng Sun, Jiahui Li, Jiacheng Wang, Qingqing Wu, Dusit Niyato, Shiwen Mao, Tony Q. S. Quek
Abstract: The proliferation of Internet of Things (IoT) networks has created an urgent need for sustainable energy solutions, particularly for battery‑constrained, spatially distributed IoT nodes. While low‑altitude uncrewed aerial vehicles (UAVs) equipped with wireless power transfer (WPT) capabilities offer a promising solution, the line‑of‑sight channels that facilitate efficient energy delivery also expose sensitive operational data to adversaries. This paper proposes a novel low‑altitude UAV‑carried movable antenna‑enhanced transmission system for joint WPT and covert communications, which simultaneously supplies energy to IoT nodes and establishes transmission links with a covert user by leveraging wireless energy signals as a natural cover. Then, we formulate a multi‑objective optimization problem that jointly maximizes the total harvested energy of IoT nodes and the sum achievable rate of the covert user, while minimizing the propulsion energy consumption of the low‑altitude UAV. To address the non‑convex and temporally coupled optimization problem, we propose a mixture‑of‑experts‑augmented soft actor‑critic (MoE‑SAC) algorithm that employs a sparse Top‑K gated mixture‑of‑shallow‑experts architecture to represent multimodal policy distributions arising from the conflicting optimization objectives. We also incorporate an action projection module that explicitly enforces per‑time‑slot power budget constraints and antenna position constraints. Simulation results demonstrate that the proposed approach significantly outperforms baseline approaches and other state‑of‑the‑art deep reinforcement learning algorithms.
Authors: Jikang Deng, Hui Zhou, Mohamed-Slim Alouini
Abstract: In post‑disaster scenarios, the rapid deployment of adequate communication infrastructure is essential to support disaster search, rescue, and recovery operations. To achieve this, the uncrewed aerial vehicle (UAV) has emerged as a promising solution for emergency communication due to its low cost and deployment flexibility. However, the conventional untethered UAV (U‑UAV) is constrained by size, weight, and power (SWaP) limitations, making it incapable of maintaining the operation of a macro base station. To address this limitation, we propose a heterogeneous UAV‑based framework that integrates a tethered UAV (T‑UAV) and U‑UAVs, where U‑UAVs are utilized to enhance the throughput of cell‑edge ground user equipments (G‑UEs) and guarantee seamless connectivity during G‑UEs' mobility to safe zones. The integrated access and backhaul (IAB) technique is adopted to support the wireless backhaul of U‑UAVs. Accordingly, we formulate a two‑timescale joint user scheduling and trajectory control optimization problem, aiming to maximize the downlink throughput under asymmetric traffic demands and G‑UEs' mobility. To solve the formulated problem, we propose a two‑timescale multi‑agent deep deterministic policy gradient (TTS‑MADDPG) algorithm based on the centralized training and distributed execution paradigm. Numerical results show that the proposed algorithm outperforms other benchmarks, including the two‑timescale multi‑agent proximal policy optimization (TTS‑MAPPO) algorithm and the MADDPG scheduling method, with robust and higher throughput. Specifically, the proposed algorithm obtains up to 12.2% average throughput gain compared to the MADDPG scheduling method.
Authors: Wondimagegn Abebe Demissie, Stefano Roccella, Rudy Rossetto, Antonio Minnocci, Andrea Vannini, Luca Sebastiani
Abstract: Olive tree biovolume estimation is a key task in precision agriculture, supporting yield prediction and resource management, especially in Mediterranean regions severely impacted by climate‑induced stress. This study presents a comparative analysis of three deep learning models (U‑Net, YOLOv11m‑seg, and Mask R‑CNN) for segmenting olive tree crowns and their shadows in ultra‑high resolution UAV imagery. The UAV dataset, acquired over Vicopisano, Italy, includes manually annotated crown and shadow masks. Building on these annotations, the methodology emphasizes spatial feature extraction and robust segmentation; per‑tree biovolume is then estimated by combining crown projected area with shadow‑derived height using solar geometry. In testing, Mask R‑CNN achieved the best overall accuracy (F1 = 0.86; mIoU = 0.72), while YOLOv11m‑seg provided the fastest throughput (0.12 seconds per image). The estimated biovolumes spanned from approximately 4 to 24 cubic meters, reflecting clear structural differences among trees. These results indicate Mask R‑CNN is preferable when biovolume accuracy is paramount, whereas YOLOv11m‑seg suits large‑area deployments where speed is critical; U‑Net remains a lightweight, high‑sensitivity option. The framework enables accurate, scalable orchard monitoring and can be further strengthened with DEM or DSM integration and field calibration for operational decision support.
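The shadow‑based height step described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes flat ground, a shadow length already measured in metres along the sun's azimuth, and a hypothetical `shape_factor` that converts the bounding prism (crown area times height) into a biovolume. The abstract does not specify the volume model, so that factor is a placeholder.

```python
import math

def height_from_shadow(shadow_length_m, sun_elevation_deg):
    """Tree height from shadow length on flat ground: h = L * tan(elevation)."""
    if not 0.0 < sun_elevation_deg < 90.0:
        raise ValueError("sun elevation must be strictly between 0 and 90 degrees")
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))

def biovolume(crown_area_m2, height_m, shape_factor=1.0):
    """Biovolume as shape_factor * projected crown area * height.

    shape_factor is a hypothetical calibration constant (1.0 gives the
    bounding prism); it is not taken from the paper.
    """
    return shape_factor * crown_area_m2 * height_m
```

For example, a 3 m shadow at a 45 degree sun elevation corresponds to a height of about 3 m, and a 4 m² crown with `shape_factor=0.5` would then give roughly 6 m³.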
Authors: David Leprich, Mario Rosenfelder, Markus Herrmann-Wicklmayr, Kathrin Flaßkamp, Peter Eberhard, Henrik Ebel
Abstract: This article proposes a modular optimal control framework for local three‑dimensional ellipsoidal obstacle avoidance, exemplarily applied to model predictive path‑following control. Static as well as moving obstacles are considered. Central to the approach is a computationally efficient and continuously differentiable condition for detecting collisions with ellipsoidal obstacles. A novel two‑stage optimization approach mitigates numerical issues arising from the structure of the resulting optimal control problem. The effectiveness of the approach is demonstrated through simulations and real‑world experiments with the Crazyflie quadrotor. This represents the first hardware demonstration of an MPC controller of this kind for UAVs in a three‑dimensional task.
Authors: Federica Tonti, Ricardo Vinuesa
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly populating urban areas for delivery and surveillance purposes. In this work, we develop an optimal navigation strategy based on Deep Reinforcement Learning. The environment is represented by a three‑dimensional high‑fidelity simulation of an urban flow, characterized by turbulence and recirculation zones. The algorithm presented here is a flow‑aware Proximal Policy Optimization (PPO) combined with a Gated Transformer eXtra Large (GTrXL) architecture, giving the agent richer information about the turbulent flow field in which it navigates. The results are compared with a PPO+GTrXL without the secondary prediction tasks, a PPO combined with Long Short Term Memory (LSTM) cells and a traditional navigation algorithm. The obtained results show a significant increase in the success rate (SR) and a lower crash rate (CR) compared to a PPO+LSTM, PPO+GTrXL and the classical Zermelo's navigation algorithm, paving the way to a completely reimagined UAV landscape in complex urban environments.
Authors: Sana Hafeez, Ghulam E Mustafa Abro, Hifza Mustafa
Abstract: The rapid deployment of unmanned aerial vehicle (UAV) corridors in sixth‑generation (6G) networks requires safe, intelligence‑driven integrated sensing and communications (ISAC). Reconfigurable intelligent surfaces (RIS) enhance spectrum efficiency, localisation accuracy, and situational awareness, while introducing new vulnerabilities. The rise of quantum computing increases the risks associated with harvest‑now‑decrypt‑later strategies and quantum‑enhanced spoofing. We propose a Quantum‑Resilient Threat Modelling (QRTM) framework for RIS‑assisted ISAC in UAV corridors to address these challenges. QRTM integrates classical, quantum‑ready, and quantum‑aided adversaries, countered using post‑quantum cryptographic (PQC) primitives: ML‑KEM for key establishment and Falcon for authentication, both embedded within RIS control signalling and UAV coordination. To strengthen security sensing, the framework introduces RIS‑coded scene watermarking validated through a generalised likelihood ratio test (GLRT), with its detection probability characterised by the Marcum Q function. Furthermore, a Secure ISAC Utility (SIU) jointly optimises secrecy rate, spoofing detection, and throughput under RIS constraints, enabled by a scheduler with computational complexity of O(n^2). Monte Carlo evaluations using 3GPP Release 19 mid‑band urban‑canyon models (7‑15 GHz) demonstrate a spoof‑detection probability approaching 0.99 at a false‑alarm rate of 1e‑3, secrecy‑rate retention exceeding 90 percent against quantum‑capable adversaries, and signal‑interference utilisation improvements of about 25 percent compared with baselines. These results show a standards‑compliant path towards reliable, quantum‑resilient ISAC for UAV corridors in smart cities and non‑terrestrial networks.
Authors: Lei Han, Jinhao Zhang, Jinhui Liu, Zhiyong Yu, Liang Wang, Quan Wang, Zhiwen Yu
Abstract: Frequent natural disasters cause significant losses to human society, and timely, efficient collection of post‑disaster environmental information is the foundation for effective rescue operations. Due to the extreme complexity of post‑disaster environments, existing sensing technologies such as mobile crowdsensing suffer from weak environmental adaptability, insufficient professional sensing capabilities, and poor practicality of sensing solutions. Therefore, this paper explores a heterogeneous multi‑agent online collaborative scheduling algorithm, HoCs‑MPQ, to achieve efficient collection of post‑disaster environmental information. HoCs‑MPQ models collaboration and conflict relationships among multiple elements through weighted undirected graph construction, and iteratively solves the maximum weight independent set based on multi‑priority queues, ultimately achieving collaborative sensing scheduling of time‑dependent UAVs, vehicles, and workers. Specifically, (1) HoCs‑MPQ constructs weighted undirected graph nodes based on collaborative relationships among multiple elements and quantifies their weights, then models the weighted undirected graph based on conflict relationships between nodes; (2) HoCs‑MPQ solves the maximum weight independent set based on iterated local search, and accelerates the solution process using multi‑priority queues. Finally, we conducted detailed experiments based on extensive real‑world and simulated data. The experiments show that, compared to baseline methods (e.g., HoCs‑GREEDY, HoCs‑K‑WTA, HoCs‑MADL, and HoCs‑MARL), HoCs‑MPQ improves task completion rates by an average of 54.13%, 23.82%, 14.12%, and 12.89% respectively, with computation time for single online autonomous scheduling decisions not exceeding 3 seconds.
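To make the graph formulation in the abstract above concrete, here is a minimal greedy heuristic for the maximum weight independent set problem, driven by a single priority queue. It is a simple baseline under assumed inputs, not HoCs‑MPQ: the iterated local search and the multi‑priority‑queue acceleration described by the authors are not reproduced. Node names, weights, and the conflict list are hypothetical.

```python
import heapq

def greedy_mwis(weights, conflicts):
    """Greedy heuristic for maximum weight independent set.

    weights:   dict mapping node -> non-negative weight
               (e.g. the value of one UAV/vehicle/worker assignment).
    conflicts: iterable of (u, v) pairs that cannot both be selected.
    Returns the selected nodes in the order they were picked.
    """
    neighbors = {node: set() for node in weights}
    for u, v in conflicts:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Max-heap via negated weights; ties are broken by node name.
    heap = [(-w, node) for node, w in weights.items()]
    heapq.heapify(heap)

    chosen, blocked = [], set()
    while heap:
        _, node = heapq.heappop(heap)
        if node in blocked:
            continue  # already chosen or in conflict with a chosen node
        chosen.append(node)
        blocked.add(node)
        blocked |= neighbors[node]
    return chosen
```

A greedy pass like this gives a feasible schedule quickly but carries no optimality guarantee, which is the gap a local search stage is typically meant to close.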
Authors: Hongyu Song, Rishabh Dev Yadav, Cheng Guo, Wei Pan
Abstract: Autonomous navigation under natural language instructions represents a crucial step toward embodied intelligence, enabling complex task execution in environments ranging from industrial facilities to domestic spaces. However, language‑driven 3D navigation for Unmanned Aerial Vehicles (UAVs) requires precise spatial reasoning, a capability inherently lacking in current zero‑shot Vision‑Language Models (VLMs) which often generate ambiguous outputs and cannot guarantee geometric feasibility. Furthermore, existing Vision‑Language Navigation (VLN) methods are predominantly tailored for 2.5D ground robots, rendering them unable to generalize to the unconstrained 3D spatial reasoning required for aerial tasks in small‑scale, cluttered environments. In this paper, we present SoraNav, a novel framework enabling zero‑shot VLM reasoning for UAV task‑centric navigation. To address the spatial‑semantic gap, we introduce Multi‑modal Visual Annotation (MVA), which encodes 3D geometric priors directly into the VLM's 2D visual input. To mitigate hallucinated or infeasible commands, we propose an Adaptive Decision Making (ADM) strategy that validates VLM proposals against exploration history, seamlessly switching to geometry‑based exploration to avoid dead‑ends and redundant revisits. Deployed on a custom PX4‑based micro‑UAV, SoraNav demonstrates robust real‑world performance. Quantitative results show our approach significantly outperforms state‑of‑the‑art baselines, increasing Success Rate (SR) by 25.7% and navigation efficiency (SPL) by 17.3% in 2.5D scenarios, and achieving improvements of 39.3% (SR) and 24.7% (SPL) in complex 3D scenarios.
Authors: Bin Li, Dongdong Yang, Lei Liu, Dusit Niyato
Abstract: The reconfigurable intelligent surface (RIS) has emerged as a pivotal technology for enhancing wireless networks. Compared to terrestrial RIS deployed on building facades, aerial RIS (ARIS) mounted on a quadrotor unmanned aerial vehicle (UAV) offers superior flexibility and extended coverage. However, the inevitable tilt and altitude variations of a quadrotor UAV during flight may lead to severe beam misalignment, significantly degrading ARIS's performance. To address this challenge, we propose an Euler‑angle‑based ARIS control scheme that jointly optimizes the altitude and trajectory of the ARIS by leveraging the UAV's dynamic model. Considering the constraints on ARIS flight energy consumption, flight safety, and the transmission power of a base station (BS), we jointly design the ARIS's altitude, trajectory, phase shifts, and BS beamforming to maximize the system sum‑rate. Due to the continuous control nature of ARIS flight and the strong coupling among variables, we formulate the problem as a Markov decision process and adopt a soft actor‑critic algorithm with prioritized experience replay to learn efficient ARIS control policies. Based on the optimized ARIS configuration, we further employ the water‑filling and bisection methods to efficiently determine the optimal BS beamforming. Numerical results demonstrate that the proposed algorithm significantly outperforms benchmarks in both convergence and communication performance, achieving approximately 14.4% improvement in sum‑rate. Moreover, in comparison to the fixed‑horizontal ARIS scheme, the proposed scheme yields more adaptive trajectories and significantly mitigates performance degradation caused by ARIS tilting, demonstrating strong potential for practical ARIS deployment.
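The water‑filling and bisection step mentioned in the abstract above can be illustrated with the textbook scalar version: allocating a total power budget across parallel channels to maximize the sum of log2(1 + g_i p_i). This is a generic sketch, not the paper's beamforming design. The effective channel gains `gains` (already normalized by noise power) and the tolerance `tol` are assumptions for the example.

```python
def water_filling(gains, total_power, tol=1e-9):
    """Classical water-filling via bisection on the water level mu.

    Each channel i receives p_i = max(0, mu - 1/g_i), and mu is chosen so
    that the powers sum to total_power. gains must be positive.
    """
    inverse = [1.0 / g for g in gains]
    lo, hi = 0.0, total_power + max(inverse)  # hi always uses >= total_power
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - inv) for inv in inverse)
        if used > total_power:
            hi = mu  # water level too high
        else:
            lo = mu  # budget not yet exhausted
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - inv) for inv in inverse]
```

Bisection works here because the total allocated power is a non‑decreasing function of the water level, so the level that exactly spends the budget can be bracketed and halved until it is within tolerance.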
Authors: Baozhe Zhang, Xinwei Chen, Qingcheng Chen, Chao Xu, Fei Gao, Yanjun Cao
Abstract: CoNi‑MPC provides an efficient framework for UAV control in air‑ground cooperative tasks by relying exclusively on relative states, eliminating the need for global state estimation. However, its lack of environmental information poses significant challenges for obstacle avoidance. To address this issue, we propose a novel obstacle avoidance algorithm, Cooperative Non‑inertial frame‑based Obstacle Avoidance (CoNi‑OA), designed explicitly for UAV‑UGV cooperative scenarios without reliance on global state estimation or obstacle prediction. CoNi‑OA uniquely utilizes a single frame of raw LiDAR data from the UAV to generate a modulation matrix, which directly adjusts the quadrotor's velocity to achieve obstacle avoidance. This modulation‑based method enables real‑time generation of collision‑free trajectories within the UGV's non‑inertial frame, significantly reducing computational demands (less than 5 ms per iteration) while maintaining safety in dynamic and unpredictable environments. The key contributions of this work include: (1) a modulation‑based obstacle avoidance algorithm specifically tailored for UAV‑UGV cooperation in non‑inertial frames without global states; (2) rapid, real‑time trajectory generation based solely on single‑frame LiDAR data, removing the need for obstacle modeling or prediction; and (3) adaptability to both static and dynamic environments, thus extending applicability to featureless or unknown scenarios.
Authors: Jihao Luo, Zesong Fei, Xinyi Wang, Le Zhao, Yuanhao Cui, Guangxu Zhu, Dusit Niyato
Abstract: Unmanned aerial vehicles (UAVs) are emerging as key enablers for low‑altitude wireless network (LAWN), particularly when terrestrial networks are unavailable. In such scenarios, the environmental topology is typically unknown; hence, designing efficient and safe UAV trajectories is essential yet challenging. To address this, we propose a digital twin (DT)‑assisted training and deployment framework. In this framework, the UAV transmits integrated sensing and communication signals to provide communication services to ground users, while simultaneously collecting echoes that are uploaded to the DT server to progressively construct virtual environments (VEs). These VEs accelerate model training and are continuously updated with real‑time UAV sensing data during deployment, supporting decision‑making and enhancing flight safety. Based on this framework, we further develop a trajectory design scheme that integrates simulated annealing for efficient user scheduling with the twin‑delayed deep deterministic policy gradient algorithm for continuous trajectory design, aiming to minimize mission completion time while ensuring obstacle avoidance. Simulation results demonstrate that the proposed approach achieves faster convergence, higher flight safety, and shorter mission completion time compared with baseline methods, providing a robust and efficient solution for LAWN deployment in unknown environments.
Authors: Lysander Miller, Airlie Chapman, James Kennedy, Richard Hebden, Jeremy M. C. Brown
Abstract: Advances in scintillation crystal and Silicon PhotoMultiplier (SiPM) technologies have enabled the development of compact, lightweight, and low‑power radiation detectors that are suitable for integration with Unmanned Aerial Vehicles (UAVs). This integration enables efficient and cost‑effective large‑area radiation monitoring while minimising occupational exposure. In this work, a SiPM‑based NaIL scintillation detection payload was developed, characterised, and mounted on a multirotor UAV for gamma ray and neutron source localisation and activity estimation applications. To support these capabilities, an analytic radionuclide detection efficiency model was developed and used to estimate radioactivity on the ground from aerial energy spectrum measurements. The analytic expression for the detection efficiency incorporated physical phenomena, including the branching ratio, detector solid angle, air attenuation, and intrinsic peak efficiency, leading to agreement within 10% of experimental radionuclide detection efficiencies. The UAV‑based radiation detection system was physically validated through a controlled indoor live radioactive source demonstration at 1.5 m, 3 m, and 4.5 m flight heights. Using the developed ground‑level radioactivity estimation method, Cs‑137 and Co‑60 sources were successfully localised within 0.5 m, and their activities were estimated with errors on the order of 10% or less.
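The factors listed in the abstract above (branching ratio, detector solid angle, air attenuation, intrinsic peak efficiency) suggest a multiplicative efficiency model. The sketch below is one plausible reading under simplifying assumptions and is not the authors' expression: it assumes a point source on the axis of a circular detector face, a single linear attenuation coefficient for air, and hypothetical parameter names.

```python
import math

def solid_angle_fraction(distance_m, detector_radius_m):
    """Fraction of 4*pi subtended by a disk seen on-axis from a point source."""
    d, r = distance_m, detector_radius_m
    return 0.5 * (1.0 - d / math.sqrt(d * d + r * r))

def peak_detection_efficiency(branching_ratio, distance_m, detector_radius_m,
                              mu_air_per_m, intrinsic_peak_efficiency):
    """Full-energy-peak counts per decay, as a product of the four factors."""
    geometry = solid_angle_fraction(distance_m, detector_radius_m)
    transmission = math.exp(-mu_air_per_m * distance_m)  # Beer-Lambert in air
    return branching_ratio * geometry * transmission * intrinsic_peak_efficiency

def estimate_activity_bq(net_peak_count_rate_cps, efficiency):
    """Source activity (Bq) implied by a net peak count rate and an efficiency."""
    return net_peak_count_rate_cps / efficiency
```

Dividing a background‑subtracted peak count rate by this efficiency gives an activity estimate in becquerels; in practice the geometry term would need to account for off‑axis positions along the flight path.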
Authors: Maria G. Mendoza, Addison Kalanther, Daniel Bostwick, Emma Stephan, Chinmay Maheshwari, Shankar Sastry
Abstract: Autonomous drone technology holds significant promise for enhancing search and rescue operations during evacuations by guiding humans toward safety and supporting broader emergency response efforts. However, their application in dynamic, real‑time evacuation support remains limited. Existing models often overlook the psychological and emotional complexity of human behavior under extreme stress. In real‑world fire scenarios, evacuees frequently deviate from designated safe routes due to panic and uncertainty. To address these challenges, this paper presents a multi‑agent coordination framework in which autonomous Unmanned Aerial Vehicles (UAVs) assist human evacuees in real‑time by locating, intercepting, and guiding them to safety under uncertain conditions. We model the problem as a Partially Observable Markov Decision Process (POMDP), where two heterogeneous UAV agents, a high‑level rescuer (HLR) and a low‑level rescuer (LLR), coordinate through shared observations and complementary capabilities. Human behavior is captured using an agent‑based model grounded in empirical psychology, where panic dynamically affects decision‑making and movement in response to environmental stimuli. The environment features stochastic fire spread, unknown evacuee locations, and limited visibility, requiring UAVs to plan over long horizons to search for humans and adapt in real‑time. Our framework employs the Proximal Policy Optimization (PPO) algorithm with recurrent policies to enable robust decision‑making in partially observable settings. Simulation results demonstrate that the UAV team can rapidly locate and intercept evacuees, significantly reducing the time required for them to reach safety compared to scenarios without UAV assistance.
Authors: Abdul Saboor, Zhuangzhuang Cui, Achiel Colpaert, Evgenii Vinogradov, Wout Joseph, Sofie Pollin
Abstract: This paper presents a comprehensive measurement‑based trajectory‑aware characterization of low‑altitude Air‑to‑Ground (A2G) channels in a suburban environment. A 64‑element Massive Multi‑Input Multi‑Output (MaMIMO) array was used to capture channels for three trajectories of an Uncrewed Aerial Vehicle (UAV), including two horizontal zig‑zag flights at fixed altitudes and one vertical ascent, chosen to emulate AUE operations and to induce controlled azimuth and elevation sweeps for analyzing geometry‑dependent propagation dynamics. We examine large‑scale power variations and their correlation with geometric features, such as elevation, azimuth, and 3D distance, followed by an analysis of fading behavior through distribution fitting and Rician K‑factor estimation. Furthermore, temporal non‑stationarity is quantified using the Correlation Matrix Distance (CMD), and angular stationarity spans are utilized to demonstrate how channel characteristics change with the movement of the UAV. We also analyze Spectral Efficiency (SE) in relation to K‑factor and Root Mean Square (RMS) delay spread, highlighting their combined influence on link performance. The results show that the elevation angle is the strongest predictor of the received power, with a correlation of more than 0.77 for each trajectory, while the Nakagami model best fits the small‑scale fading. The K‑factor increases from approximately 5 dB at low altitudes to over 15 dB at higher elevations, indicating stronger LoS dominance. Non‑stationarity patterns are highly trajectory‑ and geometry‑dependent, with azimuth most affected in horizontal flights and elevation during vertical flight. These findings offer valuable insights for modeling and improving UAV communication channels in 6G Non‑Terrestrial Networks (NTNs).
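The Correlation Matrix Distance used in the abstract above to quantify non‑stationarity has a compact standard definition, CMD(R1, R2) = 1 − tr(R1 R2) / (‖R1‖_F ‖R2‖_F), which is 0 when the two spatial correlation matrices are equal up to a scale factor and 1 when they are maximally different. The following is a small standard‑library sketch of that formula. The nested‑list matrix representation is an assumption for the example; the measurement processing in the paper is not reproduced.

```python
import math

def correlation_matrix_distance(r1, r2):
    """CMD between two square Hermitian correlation matrices.

    r1, r2: nested lists of real or complex numbers with identical shape.
    Returns a float in [0, 1] for positive semidefinite inputs.
    """
    n = len(r1)
    # tr(R1 R2) = sum_i sum_j R1[i][j] * R2[j][i]
    trace = sum(r1[i][j] * r2[j][i] for i in range(n) for j in range(n))
    norm1 = math.sqrt(sum(abs(x) ** 2 for row in r1 for x in row))
    norm2 = math.sqrt(sum(abs(x) ** 2 for row in r2 for x in row))
    # The trace is real for Hermitian inputs; complex() also handles real entries.
    return 1.0 - complex(trace).real / (norm1 * norm2)
```

Evaluating this distance between the correlation matrix at a reference position and the matrices at later positions along a trajectory yields the kind of stationarity span the abstract refers to, once a threshold is chosen.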
Authors: Guanwang Jiang, Ziye Jia, Can Cui, Lijun He, Qiuming Zhu, Qihui Wu
Abstract: Low‑altitude networks (LANs) integrating unmanned aerial vehicles (UAVs) and high‑altitude platforms (HAPs) have become a promising solution for rising computation demands. However, the uncertain task sizes and high mobility of UAVs make it challenging to guarantee the quality of service. To address these issues, we propose an LAN architecture where UAVs and HAPs collaboratively provide computation offloading for ground users. Moreover, uncertainty sets are constructed to characterize the uncertain task size, and a distributionally robust optimization problem is formulated to minimize the worst‑case delay by jointly optimizing the offloading decisions and UAV trajectories. To solve the mixed‑integer min‑max optimization problem, we design the distributionally robust computation offloading and trajectories optimization algorithm. Specifically, the original problem is solved by iteratively solving the outer‑layer and inner‑layer problems. The convex outer‑layer problem with probability distributions is solved by the optimization toolkit. As for the inner‑layer mixed‑integer problem, we employ the Benders decomposition. The decoupled master problem concerning the binary offloading decisions is solved by the integer solver, and UAV trajectories in the sub‑problem are optimized via the successive convex approximation. Simulation results show the proposed algorithm outperforms traditional optimization methods in balancing the worst‑case delay and robustness.
Authors: Zhiyu Wang, Suman Raj, Rajkumar Buyya
Abstract: Multiple Unmanned Aerial Vehicles (UAVs) cooperative Mobile Edge Computing (MEC) systems face critical challenges in coordinating trajectory planning, task offloading, and resource allocation while ensuring Quality of Service (QoS) under dynamic and uncertain environments. Existing approaches suffer from limited scalability, slow convergence, and inefficient knowledge sharing among UAVs, particularly when handling large‑scale IoT device deployments with stringent deadline constraints. This paper proposes AirFed, a novel federated graph‑enhanced multi‑agent reinforcement learning framework that addresses these challenges through three key innovations. First, we design dual‑layer dynamic Graph Attention Networks (GATs) that explicitly model spatial‑temporal dependencies among UAVs and IoT devices, capturing both service relationships and collaborative interactions within the network topology. Second, we develop a dual‑Actor single‑Critic architecture that jointly optimizes continuous trajectory control and discrete task offloading decisions. Third, we propose a reputation‑based decentralized federated learning mechanism with gradient‑sensitive adaptive quantization, enabling efficient and robust knowledge sharing across heterogeneous UAVs. Extensive experiments demonstrate that AirFed achieves 42.9% reduction in weighted cost compared to state‑of‑the‑art baselines, attains over 99% deadline satisfaction and 94.2% IoT device coverage rate, and reduces communication overhead by 54.5%. Scalability analysis confirms robust performance across varying UAV numbers, IoT device densities, and system scales, validating AirFed's practical applicability for large‑scale UAV‑MEC deployments.
Authors: Zhenglai Shen, Hongyu Zhou
Abstract: Compounding climate hazards, such as wildfire‑induced outages and urban heatwaves, challenge the stability and equity of cities. We present a Hazard‑Responsive Digital Twin (H‑RDT) that combines physics‑informed neural network modeling, multimodal data fusion, and equity‑aware risk analytics for urban‑scale response. In a synthetic district with diverse building archetypes and populations, a simulated wildfire‑outage‑heatwave cascade shows that H‑RDT maintains stable indoor temperature predictions (approximately 31 to 33 °C) under partial sensor loss, reproducing outage‑driven surges and recovery. The reinforcement‑learning‑based fusion module adaptively reweights IoT, UAV, and satellite inputs to sustain spatiotemporal coverage, while the equity‑adjusted mapping isolates high‑vulnerability clusters (schools, clinics, low‑income housing). Prospective interventions, such as preemptive cooling‑center activation and microgrid sharing, reduce population‑weighted thermal risk by 11 to 13 percent, shrink the 95th‑percentile (tail) risk by 7 to 17 percent, and cut overheating hours by up to 9 percent. Beyond the synthetic demonstration, the framework establishes a transferable foundation for real‑city implementation, linking physical hazard modeling with social equity and decision intelligence. The H‑RDT advances digital urban resilience toward adaptive, learning‑based, and equity‑centered decision support for climate adaptation.
Authors: Mingze Gong, Juan Du, Jianbang You
Abstract: Anomaly detection in complex, high‑dimensional data, such as UAV sensor readings, is essential for operational safety but challenging for existing methods due to their limited sensitivity, scalability, and inability to capture intricate dependencies. We propose the Diffuse to Detect (DTD) framework, a novel approach that innovatively adapts diffusion models for anomaly detection, diverging from their conventional use in generative tasks with high inference time. By comparison, DTD employs a single‑step diffusion process to predict noise patterns, enabling rapid and precise identification of anomalies without reconstruction errors. This approach is grounded in robust theoretical foundations that link noise prediction to the data distribution's score function, ensuring reliable deviation detection. By integrating Graph Neural Networks to model sensor relationships as dynamic graphs, DTD effectively captures spatial (inter‑sensor) and temporal anomalies. Its two‑branch architecture, with parametric neural network‑based energy scoring for scalability and nonparametric statistical methods for interpretability, provides flexible trade‑offs between computational efficiency and transparency. Extensive evaluations on UAV sensor data, multivariate time series, and images demonstrate DTD's superior performance over existing methods, underscoring its generality across diverse data modalities. This versatility, combined with its adaptability, positions DTD as a transformative solution for safety‑critical applications, including industrial monitoring and beyond.
Authors: Van Le, Tan Le
Abstract: SpoofTrackBench is a reproducible, modular benchmark for evaluating adversarial robustness in real‑time localization and tracking (RTLS) systems under radar spoofing. Leveraging the Hampton University Skyler Radar Sensor dataset, we simulate drift, ghost, and mirror‑type spoofing attacks and evaluate tracker performance using both Joint Probabilistic Data Association (JPDA) and Global Nearest Neighbor (GNN) architectures. Our framework separates clean and spoofed detection streams, visualizes spoof‑induced trajectory divergence, and quantifies assignment errors via direct drift‑from‑truth metrics. Clustering overlays, injection‑aware timelines, and scenario‑adaptive visualizations enable interpretability across spoof types and configurations. Evaluation figures and logs are auto‑exported for reproducible comparison. SpoofTrackBench sets a new standard for open, ethical benchmarking of spoof‑aware tracking pipelines, enabling rigorous cross‑architecture analysis and community validation.
Authors: Qingjie Wu, Miao Cui, Guangchi Zhang, Beixiong Zheng, Xiaoli Chu, Qingqing Wu
Abstract: Unmanned aerial vehicle (UAV)‑enabled mobile edge computing (MEC) systems can use different multiple access schemes to coordinate multi‑user task offloading. However, it is still unknown which scheme is the most energy‑efficient, especially when the offloading blocklength is finite. To answer this question, this paper minimizes and compares the MEC‑related energy consumption of non‑orthogonal multiple access (NOMA), frequency division multiple access (FDMA), and time division multiple access (TDMA)‑based offloading schemes within UAV‑enabled MEC systems, considering both infinite and finite blocklength scenarios. Through theoretical analysis of the minimum energy consumption required by these three schemes, two novel findings are presented. First, TDMA consistently achieves lower energy consumption than FDMA in both infinite and finite blocklength cases, due to the degrees of freedom afforded by sequential task offloading. Second, NOMA does not necessarily achieve lower energy consumption than FDMA when the offloading blocklength is finite, especially when the channel conditions and the offloaded task data sizes of two user equipments (UEs) are relatively symmetric. Furthermore, an alternating optimization algorithm that jointly optimizes the portions of task offloaded, the offloading times of all UEs, and the UAV location is proposed to solve the formulated energy consumption minimization problems. Simulation results verify the correctness of our analytical findings and demonstrate that the proposed algorithm effectively reduces MEC‑related energy consumption compared to benchmark schemes that do not optimize task offloading portions and/or offloading times.
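To illustrate why the infinite and finite blocklength cases differ, the sketch below evaluates the widely used normal approximation for the achievable rate over an AWGN channel, R ≈ log2(1 + γ) - sqrt(V/n) Q^{-1}(ε) log2(e) with dispersion V = 1 - (1 + γ)^{-2}. The abstract does not state which rate expression the paper uses, so this formula and the numeric values are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def shannon_rate(snr):
    """Infinite-blocklength rate in bits per channel use (linear SNR)."""
    return math.log2(1.0 + snr)


def finite_blocklength_rate(snr, blocklength, error_prob):
    """Normal approximation of the achievable rate at finite blocklength."""
    dispersion = 1.0 - 1.0 / (1.0 + snr) ** 2          # channel dispersion V
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)     # Q^{-1}(error_prob)
    penalty = math.sqrt(dispersion / blocklength) * q_inv * math.log2(math.e)
    # The approximation can go negative for very short blocks; clip at zero.
    return max(0.0, shannon_rate(snr) - penalty)


# Hypothetical operating point: 10 dB SNR, target error probability 1e-5.
snr = 10.0
for n in (100, 1000, 10000):
    print(n, round(finite_blocklength_rate(snr, n, 1e-5), 3))
print("capacity", round(shannon_rate(snr), 3))
```

The rate penalty shrinks as the blocklength grows, which is why conclusions drawn under the Shannon-rate assumption need not carry over to short offloading blocks.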
Authors: Qiao Li, Jie Li, Yukang Zhang, Lei Tan, Jing Chen, Jiayi Ji
Abstract: Aerial‑Ground person re‑identification (AG‑ReID) is an emerging yet challenging task that aims to match pedestrian images captured from drastically different viewpoints, typically from unmanned aerial vehicles (UAVs) and ground‑based surveillance cameras. The task poses significant challenges due to extreme viewpoint discrepancies, occlusions, and domain gaps between aerial and ground imagery. While prior works have made progress by learning cross‑view representations, they remain limited in handling severe pose variations and spatial misalignment. To address these issues, we propose a Geometric and Semantic Alignment Network (GSAlign) tailored for AG‑ReID. GSAlign introduces two key components to jointly tackle geometric distortion and semantic misalignment in aerial‑ground matching: a Learnable Thin Plate Spline (LTPS) Module and a Dynamic Alignment Module (DAM). The LTPS module adaptively warps pedestrian features based on a set of learned keypoints, effectively compensating for geometric variations caused by extreme viewpoint changes. In parallel, the DAM estimates visibility‑aware representation masks that highlight visible body regions at the semantic level, thereby alleviating the negative impact of occlusions and partial observations in cross‑view correspondence. A comprehensive evaluation on CARGO with four matching protocols demonstrates the effectiveness of GSAlign, achieving significant improvements of +18.8% in mAP and +16.8% in Rank‑1 accuracy over previous state‑of‑the‑art methods on the aerial‑ground setting.
Authors: Weixian Qian, Tianyi Yang, Sebastian Schroder, Yao Deng, Jiaohong Yao, Xiao Cheng, Richard Han, Xi Zheng
Abstract: Reliable assessment of safe landing sites in unstructured environments is essential for deploying Unmanned Aerial Vehicles (UAVs) in real‑world applications such as delivery, inspection, and surveillance. Existing learning‑based approaches often degrade under covariate shift and offer limited transparency, making their decisions difficult to interpret and validate on resource‑constrained platforms. We present NeuroSymLand, a neuro‑symbolic framework for marker‑free UAV landing site safety assessment that explicitly separates perception‑driven world modeling from logic‑based safety reasoning. A lightweight segmentation model incrementally constructs a probabilistic semantic scene graph encoding objects, attributes, and spatial relations. Symbolic safety rules, synthesized offline via large language models with human‑in‑the‑loop refinement, are executed directly over this world model at runtime to perform white‑box reasoning, producing ranked landing candidates with human‑readable explanations of the underlying safety constraints. Across 72 simulated and hardware‑in‑the‑loop landing scenarios, NeuroSymLand achieves 61 successful assessments, outperforming four competitive baselines, which achieve between 37 and 57 successes. Qualitative analysis highlights its superior interpretability and transparent reasoning, while deployment incurs negligible edge overhead. Our results suggest that combining explicit world modeling with symbolic reasoning can support accurate, interpretable, and edge‑deployable safety assessment in mobile systems, as demonstrated through UAV landing site assessment.
Authors: Jiahui Li, Xinyue Liang, Geng Sun, Hui Kang, Jiacheng Wang, Dusit Niyato, Shiwen Mao, Abbas Jamalipour
Abstract: Low‑altitude wireless networks (LAWNs) represent a promising architecture that integrates unmanned aerial vehicles (UAVs) as aerial nodes to provide enhanced coverage, reliability, and throughput for diverse applications. However, these networks face significant security vulnerabilities from both known and potential unknown eavesdroppers, which may threaten data confidentiality and system integrity. To solve this critical issue, we propose a novel secure communication framework for LAWNs where the selected UAVs within a swarm function as a virtual antenna array (VAA), complemented by an intelligent reflecting surface (IRS), to create a robust defense against eavesdropping attacks. Specifically, we formulate a multi‑objective optimization problem that simultaneously maximizes the secrecy rate while minimizing the maximum sidelobe level and total energy consumption, requiring joint optimization of UAV excitation current weights, flight trajectories, and IRS phase shifts. This problem presents significant difficulties due to the dynamic nature of the system and heterogeneous components. Thus, we first transform the problem into a heterogeneous Markov decision process (MDP). Then, we propose a heterogeneous multi‑agent control approach (HMCA) that integrates a dedicated IRS control policy with a multi‑agent soft actor‑critic framework for UAV control, which enables coordinated operation across heterogeneous network elements. Simulation results show that the proposed HMCA achieves superior performance compared to baseline approaches in terms of secrecy rate improvement, sidelobe suppression, and energy efficiency. Furthermore, we find that the collaborative and passive beamforming synergy between VAA and IRS creates robust security guarantees when the number of UAVs increases.
Authors: Xinyue Liang, Hui Kang, Junwei Che, Jiahui Li, Geng Sun, Qingqing Wu, Jiacheng Wang, Dusit Niyato
Abstract: While low‑altitude wireless networks (LAWNs) based on uncrewed aerial vehicles (UAVs) offer high mobility, flexibility, and coverage for urban communications, they face severe signal attenuation in dense environments due to obstructions. To address this critical issue, we consider introducing collaborative beamforming (CB) of UAVs and omnidirectional reconfigurable beamforming (ORB) of simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR‑RIS) to enhance the signal quality and directionality. On this basis, we formulate a joint rate and energy optimization problem (JREOP) to maximize the transmission rate of the overall system, while minimizing the energy consumption of the UAV swarm. Due to the non‑convex and NP‑hard nature of JREOP, we propose a heterogeneous multi‑agent collaborative dynamic (HMCD) optimization framework, which has two core components. The first component is a simulated annealing (SA)‑based STAR‑RIS control method, which dynamically optimizes reflection and transmission coefficients to enhance signal propagation. The second component is an improved multi‑agent deep reinforcement learning (MADRL) control method, which incorporates a self‑attention evaluation mechanism to capture interactions between UAVs and an adaptive velocity transition mechanism to enhance training stability. Simulation results demonstrate that HMCD outperforms various baselines in terms of convergence speed, average transmission rate, and energy consumption. Further analysis reveals that the average transmission rate of the overall system scales positively with both UAV count and STAR‑RIS element numbers.
Authors: Shuning Zhang
Abstract: Unmanned aerial vehicles (UAVs) operating in dynamic wind fields must generate safe and energy‑efficient trajectories under physical and environmental constraints. Traditional planners, such as A* and kinodynamic RRT, often yield suboptimal or non‑smooth paths due to discretization and sampling limitations. This paper presents a physics‑informed neural network (PINN) framework that embeds UAV dynamics, wind disturbances, and obstacle avoidance directly into the learning process. Without requiring supervised data, the PINN learns dynamically feasible and collision‑free trajectories by minimizing physical residuals and risk‑aware objectives. Comparative simulations show that the proposed method outperforms A* and Kino‑RRT in control energy, smoothness, and safety margin, while maintaining similar flight efficiency. The results highlight the potential of physics‑informed learning to unify model‑based and data‑driven planning, providing a scalable and physically consistent framework for UAV trajectory optimization.
Authors: Liangqi Yuan, Chuhao Deng, Dong-Jun Han, Inseok Hwang, Sabine Brunswicker, Christopher G. Brinton
Abstract: With the rapid advancement of Large Language Models (LLMs), their capabilities in various automation domains, particularly Unmanned Aerial Vehicle (UAV) operations, have garnered increasing attention. Current research remains predominantly constrained to small‑scale UAV applications, with most studies focusing on isolated components such as path planning for toy drones, while lacking comprehensive investigation of medium‑ and long‑range UAV systems in real‑world operational contexts. Larger UAV platforms introduce distinct challenges, including stringent requirements for airport‑based take‑off and landing procedures, adherence to complex regulatory frameworks, and specialized operational capabilities with elevated mission expectations. This position paper presents the Next‑Generation LLM for UAV (NeLV) system, a comprehensive demonstration and automation roadmap for integrating LLMs into multi‑scale UAV operations. The NeLV system processes natural language instructions to orchestrate short‑, medium‑, and long‑range UAV missions through five key technical components: (i) LLM‑as‑Parser for instruction interpretation, (ii) Route Planner for Points of Interest (POI) determination, (iii) Path Planner for waypoint generation, (iv) Control Platform for executable trajectory implementation, and (v) UAV monitoring. We demonstrate the system's feasibility through three representative use cases spanning different operational scales: multi‑UAV patrol, multi‑POI delivery, and multi‑hop relocation. Beyond the current implementation, we establish a five‑level automation taxonomy that charts the evolution from current LLM‑as‑Parser capabilities (Level 1) to fully autonomous LLM‑as‑Autopilot systems (Level 5), identifying technical prerequisites and research challenges at each stage.
Authors: Inbazhagan Ravikumar, Ram Sundhar, Narendhiran Vijayakumar
Abstract: Micro aerial vehicles are becoming increasingly important in search and rescue operations due to their agility, speed, and ability to access confined spaces or hazardous areas. However, designing lightweight aerial systems presents significant structural, aerodynamic, and computational challenges. This work addresses two key limitations in many low‑cost aerial systems under two kilograms: their lack of structural durability during flight through rough terrains and inability to replan paths dynamically when new victims or obstacles are detected. We present a fully customised drone built from scratch using only commonly available components and materials, emphasising modularity, low cost, and ease of assembly. The structural frame is reinforced with lightweight yet durable materials to withstand impact, while the onboard control system is powered entirely by free, open‑source software solutions. The proposed system demonstrates real‑time perception and adaptive navigation capabilities without relying on expensive hardware accelerators, offering an affordable and practical solution for real‑world search and rescue missions.
Authors: Xuzhao Li, Xuchen Li, Shiyu Hu
Abstract: Nighttime UAV tracking faces significant challenges in real‑world robotics operations. Low‑light conditions not only limit visual perception capabilities, but cluttered backgrounds and frequent viewpoint changes also cause existing trackers to drift or fail during deployment. To address these difficulties, researchers have proposed solutions based on low‑light enhancement and domain adaptation. However, these methods still have notable shortcomings in actual UAV systems: low‑light enhancement often introduces visual artifacts, domain adaptation methods are computationally expensive, and existing lightweight designs struggle to fully leverage dynamic object information. Based on an in‑depth analysis of these key issues, we propose MATrack, a multiscale adaptive system designed specifically for nighttime UAV tracking. MATrack tackles the main technical challenges of nighttime tracking through three cooperating core modules: the Multiscale Hierarchy Blender (MHB) enhances feature consistency between static and dynamic templates, the Adaptive Key Token Gate accurately identifies object information within complex backgrounds, and the Nighttime Template Calibrator (NTC) ensures stable tracking performance over long sequences. Extensive experiments show that MATrack achieves a significant performance improvement. On the UAVDark135 benchmark, its precision, normalized precision and AUC surpass state‑of‑the‑art (SOTA) methods by 5.9%, 5.4% and 4.2% respectively, while maintaining a real‑time processing speed of 81 FPS. Further tests on a real‑world UAV platform validate the system's reliability, demonstrating that MATrack can provide stable and effective nighttime UAV tracking support for critical robotics applications such as nighttime search and rescue and border patrol.
Authors: Weihong Qin, Aimin Wang, Geng Sun, Zemin Sun, Jiacheng Wang, Dusit Niyato, Dong In Kim, Zhu Han
Abstract: Space‑air‑ground integrated multi‑access edge computing (SAGIN‑MEC) provides a promising solution for the rapidly developing low‑altitude economy (LAE) to deliver flexible and wide‑area computing services. However, fully realizing the potential of SAGIN‑MEC in the LAE presents significant challenges, including coordinating decisions across heterogeneous nodes with different roles, modeling complex factors such as mobility and network variability, and handling real‑time decision‑making in a partially observable environment with hybrid variables. To address these challenges, we first present a hierarchical SAGIN‑MEC architecture that enables the coordination between user devices (UDs), uncrewed aerial vehicles (UAVs), and satellites. Then, we formulate a UD cost minimization optimization problem (UCMOP) to minimize the UD cost by jointly optimizing the task offloading ratio, UAV trajectory planning, computing resource allocation, and UD association. We show that the UCMOP is an NP‑hard problem. To overcome this challenge, we propose a multi‑agent deep deterministic policy gradient (MADDPG)‑convex optimization and coalitional game (MADDPG‑COCG) algorithm. Specifically, we employ the MADDPG algorithm to optimize the continuous temporal decisions for heterogeneous nodes in the partially observable SAGIN‑MEC system. Moreover, we propose a convex optimization and coalitional game (COCG) method to enhance the conventional MADDPG by deterministically handling the hybrid and varying‑dimensional decisions. Simulation results demonstrate that the proposed MADDPG‑COCG algorithm significantly enhances the user‑centric performance in terms of the aggregated UD cost, task completion delay, and UD energy consumption, with a slight increase in UAV energy consumption, compared to the benchmark algorithms. Moreover, the MADDPG‑COCG algorithm shows superior convergence stability and scalability.
Authors: Daniel Schleich, Jan Quenzel, Sven Behnke
Abstract: In recent years, consumer‑grade UAVs have been widely adopted by first responders. In general, they are operated manually, which requires trained pilots, especially in unknown GNSS‑denied environments and in the vicinity of structures. Autonomous flight can facilitate the application of UAVs and reduce operator strain. However, autonomous systems usually require special programming interfaces, custom sensor setups, and strong onboard computers, which limits a broader deployment.
We present a system for autonomous flight using lightweight consumer‑grade DJI drones. They are controlled by an Android app for state estimation and obstacle avoidance directly running on the UAV's remote control. Our ground control station enables a single operator to configure and supervise multiple heterogeneous UAVs at once. Furthermore, it combines the observations of all UAVs into a joint 3D environment model for improved situational awareness.
Authors: Reza Ahmari, Ahmad Mohammadi, Vahid Hemmati, Mohammed Mynuddin, Mahmoud Nabil Mahmoud, Parham Kebria, Abdollah Homaifar, Mehrdad Saif
Abstract: This study investigates the vulnerabilities of autonomous navigation and landing systems in Urban Air Mobility (UAM) vehicles. Specifically, it focuses on Trojan attacks that target deep learning models, such as Convolutional Neural Networks (CNNs). Trojan attacks work by embedding covert triggers within a model's training data. These triggers cause specific failures under certain conditions, while the model continues to perform normally in other situations. We assessed the vulnerability of Urban Autonomous Aerial Vehicles (UAAVs) using the DroNet framework. Our experiments showed a significant drop in accuracy, from 96.4% on clean data to 73.3% on data triggered by Trojan attacks. To conduct this study, we collected a custom dataset and trained models to simulate real‑world conditions. We also developed an evaluation framework designed to identify Trojan‑infected models. This work demonstrates the potential security risks posed by Trojan attacks and lays the groundwork for future research on enhancing the resilience of UAM systems.
Authors: Václav Pritzl, Xianjia Yu, Tomi Westerlund, Petr Štěpán, Martin Saska
Abstract: Accurate long‑term localization using onboard sensors is crucial for robots operating in Global Navigation Satellite System (GNSS)‑denied environments. While complementary sensors mitigate individual degradations, carrying all the available sensor types on a single robot significantly increases the size, weight, and power demands. Distributing sensors across multiple robots enhances the deployability but introduces challenges in fusing asynchronous, multi‑modal data from independently moving platforms. We propose a novel adaptive multi‑modal multi‑robot cooperative localization approach using a factor‑graph formulation to fuse asynchronous Visual‑Inertial Odometry (VIO), LiDAR‑Inertial Odometry (LIO), and 3D inter‑robot detections from distinct robots in a loosely‑coupled fashion. The approach adapts to changing conditions, leveraging reliable data to assist robots affected by sensory degradations. A novel interpolation‑based factor enables fusion of the unsynchronized measurements. LIO degradations are evaluated based on the approximate scan‑matching Hessian. A novel approach of weighting odometry data proportionally to the Wasserstein distance between the consecutive VIO outputs is proposed. A theoretical analysis is provided, investigating the cooperative localization problem under various conditions, mainly in the presence of sensory degradations. The proposed method has been extensively evaluated on real‑world data gathered with heterogeneous teams of an Unmanned Ground Vehicle (UGV) and Unmanned Aerial Vehicles (UAVs), showing that the approach provides significant improvements in localization accuracy in the presence of various sensory degradations.
Authors: Shuang Qi, Bin Lin, Yiqin Deng, Hongyang Pan, Xu Hu
Abstract: Devices within the marine Internet of Things (MIoT) can connect to low Earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs) to facilitate low‑latency data transmission and execution, as well as enhanced‑capacity data storage. However, without a proper traffic handling strategy, it is still difficult to effectively meet the low‑latency requirements. In this paper, we consider a cooperative satellite‑aerial‑MIoT network (CSAMN) for maritime edge computing and maritime data storage to prioritize delay‑sensitive (DS) tasks by employing mobile edge computing, while handling delay‑tolerant (DT) tasks via the store‑carry‑forward method. Considering the delay constraints of DS tasks, we formulate a constrained joint optimization problem of maximizing satellite‑collected data volume while minimizing system energy consumption by controlling four interdependent variables: the transmit power of UAVs for DS tasks, the start time of DT tasks, computing resource allocation, and offloading ratio. To solve this non‑convex and non‑linear problem, we propose a joint computation offloading and resource management (JCORM) algorithm using the Dinkelbach method and linear programming. Our results show that the volume of data collected by the proposed JCORM algorithm can be increased by up to 41.5% compared to baselines. Moreover, the JCORM algorithm achieves a dramatic reduction in computational time, from a maximum of 318.21 seconds down to just 0.16 seconds per experiment, making it highly suitable for real‑time maritime applications.
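The Dinkelbach method named above turns a ratio objective max N(x)/D(x), with D(x) > 0, into a sequence of subtractive problems max N(x) - λ D(x), updating λ to the ratio achieved by the latest maximizer until the subtractive optimum reaches zero. The sketch below shows only this generic iteration on a small finite candidate set. The abstract does not say how the paper's objective is cast as a ratio, and its inner problem is solved with linear programming rather than enumeration, so the objective and candidate values here are purely hypothetical.

```python
def dinkelbach(candidates, numer, denom, tol=1e-9, max_iter=100):
    """Maximize numer(x) / denom(x) over a finite set, assuming denom(x) > 0."""
    best = candidates[0]
    lam = numer(best) / denom(best)          # initial ratio estimate
    for _ in range(max_iter):
        # Inner (parametric) problem: maximize N(x) - lam * D(x).
        best = max(candidates, key=lambda x: numer(x) - lam * denom(x))
        gap = numer(best) - lam * denom(best)
        if gap <= tol:                       # F(lam) = 0 means lam is optimal
            break
        lam = numer(best) / denom(best)      # Dinkelbach update
    return best, numer(best) / denom(best)


# Hypothetical toy ratio, e.g. "collected data" over "energy spent".
x_star, ratio = dinkelbach(
    [0.5, 1.0, 2.0, 3.0],
    numer=lambda x: 4.0 * x,
    denom=lambda x: 1.0 + x * x,
)
print(x_star, ratio)
```

Each iteration needs only a solver for the subtractive problem, which is why the method pairs naturally with linear programming when N and D are affine in the decision variables.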
Authors: Wenli Yuan, Kan Yu, Xiaowu Liu, Kaixuan Li, Qixun Zhang, Zhiyong Feng
Abstract: In low altitude UAV communications, accurate channel estimation remains challenging due to the dynamic nature of air to ground links, exacerbated by high node mobility and the use of large scale antenna arrays, which introduce hybrid near and far field propagation conditions. While conventional estimation methods rely on far field assumptions, they fail to capture the intricate channel variations in near‑field scenarios and overlook valuable geometric priors such as real‑time transceiver positions. To overcome these limitations, this paper introduces a unified channel estimation framework based on a location aware hybrid deep learning architecture. The proposed model synergistically combines convolutional neural networks (CNNs) for spatial feature extraction, bidirectional long short term memory (BiLSTM) networks for modeling temporal evolution, and a multihead self attention mechanism to enhance focus on discriminative channel components. Furthermore, real‑time transmitter and receiver locations are embedded as geometric priors, improving sensitivity to distance under near field spherical wavefronts and boosting model generalization. Extensive simulations validate the effectiveness of the proposed approach, showing that it outperforms existing benchmarks by a significant margin, achieving at least a 30.25% reduction in normalized mean square error (NMSE) on average.
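For reference, the normalized mean square error used as the metric above is conventionally NMSE = ||H - Ĥ||² / ||H||², often reported in dB. The short sketch below computes it for flattened complex channel vectors and shows how a relative NMSE reduction is obtained; the sample values are hypothetical and are not taken from the paper.

```python
import math


def nmse(h_true, h_est):
    """Normalized mean square error between true and estimated channels."""
    error_energy = sum(abs(t - e) ** 2 for t, e in zip(h_true, h_est))
    signal_energy = sum(abs(t) ** 2 for t in h_true)
    return error_energy / signal_energy


def nmse_db(h_true, h_est):
    """NMSE expressed in decibels (requires a non-zero estimation error)."""
    return 10.0 * math.log10(nmse(h_true, h_est))


def relative_reduction(baseline, proposed):
    """Fractional reduction of a metric relative to a baseline value."""
    return (baseline - proposed) / baseline


# Hypothetical two-coefficient channel and an imperfect estimate.
h = [1 + 1j, 2 + 0j]
h_hat = [1 + 1j, 1 + 0j]
print(nmse(h, h_hat), nmse_db(h, h_hat))
# A drop from NMSE 0.10 to 0.07 is a 30% relative reduction.
print(relative_reduction(0.10, 0.07))
```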
Authors: Zhenyu Zhao, Xiaoxia Xu, Tiankui Zhang, Junjie Li, Yuanwei Liu
Abstract: This paper proposes a novel multi‑unmanned aerial vehicle (UAV) assisted collaborative mobile edge computing (MEC) framework, where the computing tasks of terminal devices (TDs) can be decomposed into serial or parallel sub‑tasks and offloaded to collaborative UAVs. We first model the dependencies among all sub‑tasks as a directed acyclic graph (DAG) and design a two‑timescale frame structure to decouple the sub‑task interdependencies for sub‑task scheduling. Then, a joint sub‑task offloading, computational resource allocation, and UAV trajectory optimization problem is formulated, which aims to minimize the system cost, i.e., the weighted sum of the task completion delay and the system energy consumption. To solve this non‑convex mixed‑integer nonlinear programming (MINLP) problem, a penalty dual decomposition and successive convex approximation (PDD‑SCA) algorithm is developed. In particular, the original MINLP problem is equivalently transformed into a continuous form using PDD theory. After decoupling the resulting problem into three nested subproblems, the SCA method is applied to recast the non‑convex components and obtain desirable solutions. Numerical results demonstrate that: 1) compared to the benchmark algorithms, the proposed scheme significantly reduces the system cost and thus realizes an improved trade‑off between task latency and energy consumption; 2) the proposed algorithm achieves efficient workload balancing for distributed computation across multiple UAVs.
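The DAG model of serial and parallel sub‑tasks can be illustrated with a topological sort (Kahn's algorithm) followed by an earliest‑finish‑time pass. This is a simplified sketch, not the paper's two‑timescale scheduler: it assumes every ready sub‑task can start immediately (no contention for UAV processors or links), and the task names and durations are hypothetical.

```python
from collections import deque


def topological_order(deps):
    """Kahn's algorithm. deps maps each sub-task to the sub-tasks it waits for."""
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    ready = deque(task for task, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return order


def earliest_finish_times(deps, duration):
    """Finish time of each sub-task when it starts as soon as its parents end."""
    finish = {}
    for task in topological_order(deps):
        start = max((finish[parent] for parent in deps[task]), default=0.0)
        finish[task] = start + duration[task]
    return finish


# Hypothetical task: "b" and "c" run in parallel after "a"; "d" joins them.
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
duration = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 1.0}
print(earliest_finish_times(deps, duration))
```

The largest finish time is the task completion delay along the critical path, which is the delay term that enters a weighted delay‑plus‑energy cost of the kind described above.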
Authors: Swati Dantu, Robert Pěnička, Martin Saska
Abstract: This paper tackles the challenge of learning a generalizable minimum‑time flight policy for UAVs, capable of navigating between arbitrary start and goal states while balancing agile flight and stable hovering. Traditional approaches, particularly in autonomous drone racing, achieve impressive speeds and agility but are constrained to predefined track layouts, limiting real‑world applicability. To address this, we propose a reinforcement learning‑based framework that simultaneously learns state‑to‑state minimum‑time planning and control and generalizes to arbitrary state‑to‑state flights. Our approach leverages Point Mass Model (PMM) trajectories as proxy rewards to approximate the true optimal flight objective and employs curriculum learning to scale the training process efficiently and to achieve generalization. We validate our method through simulation experiments, comparing it against Nonlinear Model Predictive Control (NMPC) tracking PMM‑generated trajectories and conducting ablation studies to assess the impact of curriculum learning. Finally, real‑world experiments confirm the robustness of our learned policy in outdoor environments, demonstrating its ability to generalize and operate on a small ARM‑based single‑board computer.
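A point‑mass model treats the vehicle as a particle with bounded acceleration, for which the minimum‑time motion along one axis is bang‑bang. The sketch below covers only the simplest rest‑to‑rest case with an optional speed limit, and takes the slowest axis as the 3D time. The paper's PMM trajectories handle arbitrary start and goal states, so this simplification, the function names, and the per‑axis limits are assumptions for illustration.

```python
import math


def pmm_rest_to_rest_time(distance, a_max, v_max=math.inf):
    """Minimum time to travel `distance` along one axis, starting and ending at rest."""
    d = abs(distance)
    if d == 0.0:
        return 0.0
    # Distance needed to accelerate to v_max and brake back to zero.
    d_switch = v_max ** 2 / a_max
    if d <= d_switch:
        # Triangular speed profile: accelerate half the way, brake the rest.
        return 2.0 * math.sqrt(d / a_max)
    # Trapezoidal profile: accelerate, cruise at v_max, then brake.
    return d / v_max + v_max / a_max


def pmm_time_3d(delta, a_max, v_max=math.inf):
    """With independent per-axis limits, the slowest axis sets the total time."""
    return max(pmm_rest_to_rest_time(d, a_max, v_max) for d in delta)


print(pmm_rest_to_rest_time(4.0, a_max=1.0))              # triangular profile
print(pmm_rest_to_rest_time(10.0, a_max=1.0, v_max=2.0))  # trapezoidal profile
print(pmm_time_3d((4.0, 1.0, 0.0), a_max=1.0))
```

Closed‑form times like these are cheap to evaluate, which is what makes a point‑mass trajectory usable as a proxy signal during training.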
Authors: Vojtěch Vrba, Viktor Walter, Petr Štěpán, Martin Saska
Abstract: A novel approach for the fast onboard detection of isolated markers for visual relative localisation of multiple teammates in agile UAV swarms is introduced in this paper. As the detection forms a key component of real‑time localisation systems, a three‑fold innovation is presented, consisting of an optimised procedure for CPUs, a GPU shader program, and a functionally equivalent FPGA streaming architecture. For the proposed CPU and GPU solutions, the mean processing time per pixel of input camera frames was accelerated by two to three orders of magnitude compared to the unoptimised state‑of‑the‑art approach. For the localisation task, the proposed FPGA architecture offered the most significant overall acceleration by minimising the total delay from camera exposure to detection results. Additionally, the proposed solutions were evaluated on various 32‑bit and 64‑bit embedded platforms to demonstrate their efficiency, as well as their feasibility for applications using low‑end UAVs and MAVs. Thus, it has become a crucial enabling technology for agile UAV swarming.
Authors: Francesco Vona, Mohamed Amer, Omar Abdellatif, Michelle Celina Hallmann, Maximilian Warsinke, Adriana-Simona Mihaita, Jan-Niklas Voigt-Antons
Abstract: Controlling Unmanned Aerial Vehicles (UAVs) is a cognitively demanding task, with accidents often arising from insufficient situational awareness, inadequate training, and poor user experiences. Providing more intuitive and immersive visual feedback, particularly through Digital Twin technologies, offers new opportunities to enhance pilot awareness and overall experience quality. In this study, we investigate how different virtual points of view (POVs) influence user experience and performance during UAV piloting in Virtual Reality (VR), utilizing a digital twin that faithfully replicates the real‑world flight environment. We developed a VR application that enables participants to control a physical DJI Mini 4 Pro drone while immersed in a digital twin with four distinct camera perspectives: Baseline View (static external), First‑Person View, Chase View, and Third‑Person View. Nineteen participants completed a series of ring‑based obstacle courses from each perspective. In addition to objective flight data, we collected standardized subjective assessments of user experience, presence, workload, cybersickness, and situational awareness. Quantitative analyses revealed that the First‑Person View was associated with significantly higher mental demand and effort, greater trajectory deviation, but smoother control inputs compared to the Third‑Person and Chase perspectives. Complementing these findings, preference data indicated that the Third‑Person View was most consistently favored, whereas the First‑Person View elicited polarized reactions.
Authors: Jie Song, Yang Bai, Mikhail Svinin, Naoki Wakamiya
Abstract: This study presents a control strategy for coordinating multiple unmanned aerial vehicles (UAVs) to monitor unknown flood regions and estimate the extent of inundation. The proposed method adopts a density‑driven coverage framework based on Centroidal Voronoi Tessellation (CVT), in which the density function is modeled using a Gaussian Mixture of Density Functions (GMDF). This formulation provides a more accurate characterization of inundated areas compared to conventional axis‑aligned Gaussian models. The performance of the two density modeling approaches is systematically evaluated under different UAV fleet sizes (16, 20, and 24), with multiple simulation trials conducted in the ROS/Gazebo environment. The results show that the GMDF‑based formulation consistently achieves higher coverage rates, demonstrating its effectiveness in enhancing flood monitoring and improving UAV spatial distribution.
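The density‑driven CVT coverage idea described above can be illustrated with a discrete Lloyd iteration. This is a minimal sketch, not the authors' controller: the function names, the sampled‑point representation of the region, and the isotropic Gaussian components are all assumptions made for illustration (the abstract's GMDF is contrasted with axis‑aligned Gaussians, so its components are presumably more general).

```python
import math

def lloyd_step(generators, samples, density):
    """One Lloyd iteration: move each generator (UAV position) to the
    density-weighted centroid of the samples nearest to it, i.e. its
    discrete Voronoi cell."""
    sums = [[0.0, 0.0, 0.0] for _ in generators]  # [sum w*x, sum w*y, sum w]
    for (x, y) in samples:
        # The nearest generator defines Voronoi cell membership.
        k = min(range(len(generators)),
                key=lambda i: math.dist((x, y), generators[i]))
        w = density(x, y)
        sums[k][0] += w * x
        sums[k][1] += w * y
        sums[k][2] += w
    new_generators = []
    for g, (sx, sy, sw) in zip(generators, sums):
        # Keep the generator where it is if its cell carries no mass.
        new_generators.append((sx / sw, sy / sw) if sw > 0 else g)
    return new_generators

def gaussian_mixture(components):
    """Build a density from (weight, mean_x, mean_y, sigma) tuples.
    Isotropic components are a simplification assumed for this sketch."""
    def density(x, y):
        return sum(w * math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * s ** 2))
                   for (w, mx, my, s) in components)
    return density
```

Repeating `lloyd_step` until the generators stop moving gives an approximate centroidal configuration; a mixture density pulls the generators toward regions of high weight, such as suspected inundated areas.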
Authors: Yue Wang, Lixian Zhang, Yimin Zhu, Yangguang Liu, Xuwei Yang
Abstract: The aim of this paper is to design a new type of grasping and perching unmanned aerial vehicle (UAV), called Flexbee, which features a soft vector‑propulsion nozzle (SVPN). Compared to previous UAVs, Flexbee integrates flight, grasping, and perching functionalities into the four SVPNs. This integration offers advantages including decoupled position and attitude control, high structural reuse, and strong adaptability for grasping and perching. A dynamics model of Flexbee has been developed, and the nonlinear coupling issue of the moment has been resolved through linearization of the equivalent moment model. A hierarchical control strategy was used to design controllers for the two operational modes of Flexbee. Finally, flight, grasping, and perching experiments were conducted to validate Flexbee's kinematic capabilities and the effectiveness of the control strategy.
Authors: Zaineh Abughazzah, Emna Baccour, Loay Ismail, Amr Mohamed, Mounir Hamdi
Abstract: The integration of Unmanned Aerial Vehicles (UAVs) into Open Radio Access Networks (O‑RAN) enhances communication in disaster management and Search and Rescue (SAR) operations by ensuring connectivity when infrastructure fails. However, SAR scenarios demand stringent security and low‑latency communication, as delays or breaches can compromise mission success. While UAVs serve as mobile relays, they introduce challenges in energy consumption and resource management, necessitating intelligent allocation strategies. Existing UAV‑assisted O‑RAN approaches often overlook the joint optimization of security, latency, and energy efficiency in dynamic environments. This paper proposes a novel Reinforcement Learning (RL)‑based framework for dynamic resource allocation in UAV relays, explicitly addressing these trade‑offs. Our approach formulates an optimization problem that integrates security‑aware resource allocation, latency minimization, and energy efficiency, which is solved using RL. Unlike heuristic or static methods, our framework adapts in real‑time to network dynamics, ensuring robust communication. Simulations demonstrate superior performance compared to heuristic baselines, achieving enhanced security and energy efficiency while maintaining ultra‑low latency in SAR scenarios.
Authors: Yiheng Wang
Abstract: Intelligent reflecting surface (IRS) assisted unmanned aerial vehicle (UAV) systems provide a new paradigm for reconfigurable and flexible wireless communications. To enable more energy efficient and spectrum efficient IRS assisted UAV wireless communications, this paper introduces a novel IRS‑assisted UAV enabled spectrum sharing system with orthogonal frequency division multiplexing (OFDM). The goal is to maximize the energy efficiency (EE) of the secondary network by jointly optimizing the beamforming, subcarrier allocation, IRS phase shifts, and the UAV trajectory subject to practical transmit power and passive reflection constraints as well as UAV physical limitations. A physically grounded propulsion‑energy model is adopted, with its tight upper bound used to form a tractable EE lower bound for the spectrum sharing system. To handle highly non convex, time coupled optimization problems with a mixed continuous and discrete policy space, we develop a deep reinforcement learning (DRL) approach based on the actor critic framework. Extended experiments show the significant EE improvement of the proposed DRL‑based approach compared to several benchmark schemes, thus demonstrating the effectiveness and robustness of the proposed approach with mobility.
Authors: Zeeshan Kaleem, Muhammad Afaq, Chau Yuen, Octavia A. Dobre, John M. Cioffi
Abstract: This letter introduces a Graph‑Condensed Quantum‑Inspired Placement (GC‑QAP) framework for reliability‑driven trajectory optimization in Uncrewed Aerial Vehicle (UAV) assisted low‑altitude wireless networks. The dense waypoint graph is condensed using probabilistic quantum‑annealing to preserve interference‑aware centroids while reducing the control state space and maintaining link‑quality. The resulting problem is formulated as a priority‑aware Markov decision process and solved using epsilon‑greedy off‑policy Q‑learning, considering UAV kinematic and flight corridor constraints. Unlike complex continuous‑action reinforcement learning approaches, GC‑QAP achieves stable convergence and low outage with substantially lower computational cost compared to baseline schemes.
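The epsilon‑greedy off‑policy Q‑learning named above follows a standard tabular pattern, sketched below. This is the generic textbook update, not the GC‑QAP implementation: the state and action indexing, the reward, and the default `alpha` and `gamma` values are hypothetical, and the graph condensation and priority‑aware MDP from the letter are not modelled.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Off-policy Q-learning update: bootstrap from the best next action,
    regardless of which action the behaviour policy actually takes next."""
    td_target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]
```

Here `q` is a list of per‑state action‑value lists; in a waypoint setting each state could index a condensed graph node and each action an outgoing edge, but that mapping is an assumption of this sketch.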
Authors: Zenghuang Fu, Xiaofeng Han, Mingda Jia, Jin ming Yang, Qi Zeng, Muyang Zahng, Changwei Wang, Weiliang Meng, Xiaopeng Zhang
Abstract: Multi‑object tracking (MOT) from unmanned aerial vehicles (UAVs) presents unique challenges due to unpredictable object motion, frequent occlusions, and limited appearance cues inherent to aerial viewpoints. These issues are further exacerbated by abrupt UAV movements, leading to unreliable trajectory estimation and identity switches. Conventional motion models, such as Kalman filters or static sequence encoders, often fall short in capturing both linear and non‑linear dynamics under such conditions. To tackle these limitations, we propose DMTrack, a deformable motion tracking framework tailored for UAV‑based MOT. Our DMTrack introduces three key components: DeformMamba, a deformable state‑space predictor that dynamically aggregates historical motion states for adaptive trajectory modeling; MotionGate, a lightweight gating module that fuses Kalman and Mamba predictions based on motion context and uncertainty; and an uncertainty‑aware association strategy that enhances identity preservation by aligning motion trends with prediction confidence. Extensive experiments on the VisDrone‑MOT and UAVDT benchmarks demonstrate that our DMTrack achieves state‑of‑the‑art performance in identity consistency and tracking accuracy, particularly under high‑speed and non‑linear motion. Importantly, our method operates without appearance models and maintains competitive efficiency, highlighting its practicality for robust UAV‑based tracking.
Authors: Shumaila Javaid, Nasir Saeed
Abstract: Integrated Satellite Aerial Terrestrial Networks (ISATNs) are envisioned as key enablers of 6G, providing global connectivity for applications such as autonomous transportation, Industrial IoT, and disaster response. Their large‑scale deployment, however, risks unsustainable energy use and carbon emissions. This work advances prior energy‑aware studies by proposing a carbon‑aware orchestration framework for ISATNs that leverages Digital Twin (DT) technology. The framework adopts grams of CO_2‑equivalent per bit (gCO_2/bit) as a primary sustainability metric and implements a multi timescale Plan Do Check Act (PDCA) loop that combines day‑ahead forecasting with real‑time adaptive optimization. ISATN‑specific control knobs, including carbon‑aware handovers, UAV duty cycling, and renewable‑aware edge placement, are exploited to reduce emissions. Simulation results with real carbon intensity data show up to 29% lower gCO_2/bit than QoS‑only orchestration, while improving renewable utilization and resilience under adverse events.
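To make the gCO_2/bit metric concrete, the sketch below computes it from energy use, grid carbon intensity, and delivered traffic. The abstract does not say how the authors compute the metric, so this formula (energy times carbon intensity, divided by bits) and all parameter names are assumptions for illustration.

```python
def gco2_per_bit(energy_kwh, carbon_intensity_g_per_kwh, bits_delivered):
    """Grams of CO2-equivalent emitted per delivered bit.

    energy_kwh: energy consumed over the accounting window (kWh)
    carbon_intensity_g_per_kwh: grid carbon intensity (gCO2e per kWh)
    bits_delivered: traffic delivered in the same window (bits)
    """
    if bits_delivered <= 0:
        raise ValueError("bits_delivered must be positive")
    return energy_kwh * carbon_intensity_g_per_kwh / bits_delivered

def lowest_carbon_site(sites):
    """Pick the site name with the lowest gCO2/bit.
    `sites` maps name -> (energy_kwh, intensity, bits); hypothetical layout."""
    return min(sites, key=lambda name: gco2_per_bit(*sites[name]))
```

A carbon‑aware placement or handover decision could compare candidate nodes with `lowest_carbon_site`, subject to whatever QoS constraints apply; those constraints are outside this sketch.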
Authors: Siqi Chen, Shanyue Guan
Abstract: The advancement of UAV technology has enabled efficient, non‑contact structural health monitoring. Combined with photogrammetry, UAVs can capture high‑resolution scans and reconstruct detailed 3D models of infrastructure. However, a key challenge remains in segmenting specific structural components from these models, a process traditionally reliant on time‑consuming and error‑prone manual labeling. To address this issue, we propose a machine learning‑based framework for automated segmentation of 3D point clouds. Our approach uses the complementary strengths of real‑world UAV‑scanned point clouds and synthetic data generated from Building Information Modeling (BIM) to overcome the limitations associated with manual labeling. Validation on a railroad track dataset demonstrated high accuracy in identifying and segmenting major components such as rails and crossties. Moreover, by using smaller‑scale datasets supplemented with BIM data, the framework significantly reduced training time while maintaining reasonable segmentation accuracy. This automated approach improves the precision and efficiency of 3D infrastructure model segmentation and advances the integration of UAV and BIM technologies in structural health monitoring and infrastructure management.
Authors: Xiaobo Zheng, Pan Tang, Defu Lin, Shaoming He
Abstract: Swarm trajectory optimization problems are a well‑recognized class of multi‑agent optimal control problems with strong nonlinearity. However, the heuristic nature of needing to set the final time for agents beforehand and the time‑consuming limitation of the significant number of iterations prohibit the application of existing methods to large‑scale swarms of Unmanned Aerial Vehicles (UAVs) in practice. In this paper, we propose a spatial‑temporal trajectory optimization framework that accomplishes multi‑UAV consensus based on the Alternating Direction Method of Multipliers (ADMM) and uses Differential Dynamic Programming (DDP) for fast local planning of individual UAVs. The introduced framework is a two‑level architecture that employs Parameterized DDP (PDDP) as the trajectory optimizer for each UAV, and ADMM to satisfy the local constraints and accomplish the spatial‑temporal parameter consensus among all UAVs. This results in a fully distributed algorithm called Distributed Parameterized DDP (D‑PDDP). In addition, an adaptive tuning criterion based on the spectral gradient method for the penalty parameter is proposed to reduce the number of algorithmic iterations. Several simulation examples are presented to verify the effectiveness of the proposed algorithm.
Authors: Shantnav Agarwal, Javier Alonso-Mora, Sihao Sun
Abstract: Existing approaches for transporting and manipulating cable‑suspended loads using multiple UAVs along reference trajectories typically rely on either centralized control architectures or reliable inter‑agent communication. In this work, we propose a novel machine learning based method for decentralized kinodynamic planning that operates effectively under partial observability and without inter‑agent communication. Our method leverages imitation learning to train a decentralized student policy for each UAV by imitating a centralized kinodynamic motion planner with access to privileged global observations. The student policy generates smooth trajectories using physics‑informed neural networks that respect the derivative relationships in motion. During training, the student policies utilize the full trajectory generated by the teacher policy, leading to improved sample efficiency. Moreover, each student policy can be trained in under two hours on a standard laptop. We validate our method in both simulation and real‑world environments to follow an agile reference trajectory, demonstrating performance comparable to that of centralized approaches.
Authors: Sebastian Mocanu, Emil Slusanschi, Marius Leordeanu
Abstract: This paper presents a vision‑only autonomous flight system for small UAVs operating in controlled indoor environments. The system combines semantic segmentation with monocular depth estimation to enable obstacle avoidance, scene exploration, and autonomous safe landing operations without requiring GPS or expensive sensors such as LiDAR. A key innovation is an adaptive scale factor algorithm that converts non‑metric monocular depth predictions into accurate metric distance measurements by leveraging semantic ground plane detection and camera intrinsic parameters, achieving a mean distance error of 14.4 cm. The approach uses a knowledge distillation framework where a color‑based Support Vector Machine (SVM) teacher generates training data for a lightweight U‑Net student network (1.6M parameters) capable of real‑time semantic segmentation. For more complex environments, the SVM teacher can be replaced with a state‑of‑the‑art segmentation model. Testing was conducted in a controlled 5x4 meter laboratory environment with eight cardboard obstacles simulating urban structures. Extensive validation across 30 flight tests in a real‑world environment and 100 flight tests in a digital‑twin environment demonstrates that the combined segmentation and depth approach increases the distance traveled during surveillance and reduces mission time while maintaining 100% success rates. The system is further optimized through end‑to‑end learning, where a compact student neural network learns complete flight policies from demonstration data generated by our best‑performing method, achieving an 87.5% autonomous mission success rate. This work advances practical vision‑based drone navigation in structured environments, demonstrating solutions for metric depth estimation and computational efficiency challenges that enable deployment on resource‑constrained platforms.
Authors: Chi Zhang, Xian Huang, Wei Dong
Abstract: UAVs equipped with a single depth camera encounter significant challenges in dynamic obstacle avoidance due to limited field of view and inevitable blind spots. While active vision strategies that steer onboard cameras have been proposed to expand sensing coverage, most existing methods separate motion planning from sensing considerations, resulting in less effective and delayed obstacle response. To address this limitation, we introduce SPOT (Sensing‑augmented Planning via Obstacle Threat modeling), a unified planning framework for observation‑aware trajectory planning that explicitly incorporates sensing objectives into motion optimization. At the core of our method is a Gaussian Process‑based obstacle belief map, which establishes a unified probabilistic representation of both recognized (previously observed) and potential obstacles. This belief is further processed through a collision‑aware inference mechanism that transforms spatial uncertainty and trajectory proximity into a time‑varying observation urgency map. By integrating urgency values within the current field of view, we define differentiable objectives that enable real‑time, observation‑aware trajectory planning with computation times under 10 ms. Simulation and real‑world experiments in dynamic, cluttered, and occluded environments show that our method detects potential dynamic obstacles 2.8 seconds earlier than baseline approaches, increasing dynamic obstacle visibility by over 500%, and enabling safe navigation through cluttered, occluded environments.
Authors: Vienna Li, Justin Villa, Dan Diessner, Jayson Clifford, Laxima Niure Kandel
Abstract: GPS spoofing poses a growing threat to aviation by falsifying satellite signals and misleading aircraft navigation systems. This paper demonstrates a proof‑of‑concept spoofing detection strategy based on analyzing satellite Carrier‑to‑Noise Density Ratio (C/N_0) variation during controlled static antenna orientations. Using a u‑blox EVK‑M8U receiver and a GPSG‑1000 satellite simulator, C/N_0 data is collected under three antenna orientations (flat, banked right, and banked left) in both real‑sky (non‑spoofed) and spoofed environments. Our findings reveal that under non‑spoofed signals, C/N_0 values fluctuate naturally with orientation, reflecting true geometric dependencies. However, spoofed signals demonstrate a distinct pattern: the flat orientation, which directly faces the spoofing antenna, consistently yielded the highest C/N_0 values, while both banked orientations showed reduced C/N_0 due to misalignment with the spoofing source. These findings suggest that simple maneuvers such as brief banking to induce C/N_0 variations can provide early cues of GPS spoofing for general aviation and UAV systems.
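The spoofing pattern reported above, where the flat orientation gives the highest C/N_0 across satellites, suggests a simple screening heuristic. The sketch below is an illustration only and not the authors' detector: the data layout, the function names, and the 0.9 threshold are hypothetical, and it assumes the spoofing antenna faces the flat orientation as in the described setup.

```python
def flat_dominance_fraction(cn0_by_sat):
    """Fraction of satellites whose flat-orientation C/N0 exceeds both
    banked orientations.

    cn0_by_sat maps a satellite ID to a dict of mean C/N0 values (dB-Hz)
    keyed by 'flat', 'right', and 'left' (hypothetical layout).
    """
    if not cn0_by_sat:
        raise ValueError("no satellites observed")
    hits = sum(1 for m in cn0_by_sat.values()
               if m["flat"] > m["right"] and m["flat"] > m["left"])
    return hits / len(cn0_by_sat)

def looks_spoofed(cn0_by_sat, threshold=0.9):
    """Flag a possible single-source spoofer when nearly every satellite
    peaks in the same (flat) orientation. The threshold is an assumption."""
    return flat_dominance_fraction(cn0_by_sat) >= threshold
```

Under genuine signals, satellites sit at different azimuths and elevations, so banking should favour some and penalise others; a uniform drop on both banks for all satellites is the cue this heuristic looks for.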
Authors: Manuel J. Fernandez, Alejandro Suarez, Anibal Ollero, Matteo Fumagalli
Abstract: This paper presents the integration of a Variable Stiffness Link (VSL) for long‑reach aerial manipulation, enabling adaptable mechanical coupling between an aerial multirotor platform and a dual‑arm manipulator. Conventional long‑reach manipulation systems rely on rigid or cable connections, which limit precision or transmit disturbances to the aerial vehicle. The proposed VSL introduces an adjustable stiffness mechanism that allows the link to behave either as a flexible rope or as a rigid rod, depending on task requirements.
The system is mounted on a quadrotor equipped with the LiCAS dual‑arm manipulator and evaluated through teleoperated experiments, involving external disturbances and parcel transportation tasks. Results demonstrate that varying the link stiffness significantly modifies the dynamic interaction between the UAV and the payload. The flexible configuration attenuates external impacts and aerodynamic perturbations, while the rigid configuration improves positional accuracy during manipulation phases.
These results confirm that VSL enhances versatility and safety, providing a controllable trade‑off between compliance and precision. Future work will focus on autonomous stiffness regulation, multi‑rope configurations, cooperative aerial manipulation and user studies to further assess its impact on teleoperated and semi‑autonomous aerial tasks.
Authors: Andre Rochow, Jonas Marcic, Svetlana Seliunina, Sven Behnke
Abstract: 3D phenotyping of plants plays a crucial role in understanding plant growth, yield prediction, and disease control. We present a pipeline capable of generating high‑quality 3D reconstructions of individual agricultural plants. To acquire data, a small commercially available UAV captures images of a selected plant. Apart from placing ArUco markers, the entire image acquisition process is fully autonomous, controlled by a self‑developed Android application running on the drone's controller. The reconstruction task is particularly challenging due to environmental wind and downwash of the UAV. Our proposed pipeline supports the integration of arbitrary state‑of‑the‑art 3D reconstruction methods. To mitigate errors caused by leaf motion during image capture, we use an iterative method that gradually adjusts the input images through deformation. Motion is estimated using optical flow between the original input images and intermediate 3D reconstructions rendered from the corresponding viewpoints. This alignment gradually reduces scene motion, resulting in a canonical representation. After a few iterations, our pipeline improves the reconstruction of state‑of‑the‑art methods and enables the extraction of high‑resolution 3D meshes. We will publicly release the source code of our reconstruction pipeline. Additionally, we provide a dataset consisting of multiple plants from various crops, captured across different points in time.
Authors: Chitralekha Gupta, Soundarya Ramesh, Praveen Sasikumar, Kian Peen Yeo, Suranga Nanayakkara
Abstract: Unmanned Aerial Vehicles (UAVs), or drones, are increasingly used in search and rescue missions to detect human presence. Existing systems primarily leverage vision‑based methods, which are prone to fail under low visibility or occlusion. Drone‑based audio perception offers promise but suffers from extreme ego‑noise that masks sounds indicating human presence. Existing datasets are either limited in diversity or synthetic, lacking real acoustic interactions, and there are no standardized setups for drone audition. To this end, we present DroneAudioset (The dataset is publicly available at https://huggingface.co/datasets/ahlab‑drone‑project/DroneAudioSet/ under the MIT license), a comprehensive drone audition dataset featuring 23.5 hours of annotated recordings, covering a wide range of signal‑to‑noise ratios (SNRs) from ‑57.2 dB to ‑2.5 dB, across various drone types, throttles, microphone configurations, and environments. The dataset enables development and systematic evaluation of noise suppression and classification methods for human‑presence detection under challenging conditions, while also informing practical design considerations for drone audition systems, such as microphone placement trade‑offs, and development of drone noise‑aware audio processing. This dataset is an important step towards enabling design and deployment of drone‑audition systems.
Authors: Shiying Chen, Guangji Chen, Long Shi, Qingqing Wu, Kang Wei
Abstract: Integrated sensing and communication (ISAC) is viewed as a key enabler for future wireless networks by sharing the hardware and wireless resources between the functionalities of sensing and communication (S&C). Due to the shared wireless resources for both S&C, it is challenging to achieve a critical trade‑off between these two integrated functionalities. To address this issue, this paper proposes a novel dual‑level channel reconfiguration framework for ISAC by deploying rotatable antennas (RAs) at an unmanned aerial vehicle (UAV), where both the large‑scale path loss and the correlation of S&C channels can be proactively controlled, thereby allowing a flexible trade‑off between S&C performance. To characterize the S&C trade‑off, we aim to maximize the communication rate by jointly optimizing the RA rotation, the transmit beamforming, and the UAV trajectory, subject to the given requirement of sensing performance. For the typical scenario of static UAV deployment, we introduce the concept of subspace correlation coefficient to derive closed‑form solutions for the optimal RA rotation, transmit beamforming, and UAV hovering location. For the scenario of a fully mobile UAV, we prove that the optimal trajectory of a UAV follows a hover‑fly‑hover (HFH) structure, thereby obtaining its global optimal solution. Simulation results show that the proposed design significantly improves the achievable S&C trade‑off region compared to benchmark schemes.
Authors: Sina Kazemdehbashi, Yanchao Liu, Boris S. Mordukhovich
Abstract: Natural and human‑made disasters can cause severe devastation and claim thousands of lives worldwide. Therefore, developing efficient methods for disaster response and management is a critical task for relief teams. One of the most essential components of effective response is the rapid collection of information about affected areas, damages, and victims. More data translates into better coordination, faster rescue operations, and ultimately, more lives saved. However, in some disasters, such as earthquakes, the communication infrastructure is often partially or completely destroyed, making it extremely difficult for victims to send distress signals and for rescue teams to locate and assist them in time. Unmanned Aerial Vehicles (UAVs) have emerged as valuable tools in such scenarios. In particular, a fleet of UAVs can be dispatched from a mobile station to the affected area to facilitate data collection and establish temporary communication networks. Nevertheless, real‑world deployment of UAVs faces several challenges, with adverse weather conditions, especially wind, being among the most significant. To address this, we develop a novel mathematical framework to determine the optimal location of a mobile UAV station while explicitly accounting for the heterogeneity of the UAVs and the effect of wind. In particular, we generalize the Sylvester problem to introduce the Sylvester‑Fermat‑Torricelli (SFT) problem, which captures complex factors such as wind influence, UAV heterogeneity, and back‑and‑forth motion within a unified framework. The proposed framework enhances the practicality of UAV‑based disaster response planning by accounting for real‑world factors such as wind and UAV heterogeneity. Experimental results demonstrate that it can reduce wasted operational time by up to 84%, making post‑disaster missions significantly more efficient and effective.
Authors: Marios-Nektarios Stamatopoulos, Elias Small, Shridhar Velhal, Avijit Banerjee, George Nikolakopoulos
Abstract: This article presents a fully autonomous aerial masonry construction framework using heterogeneous unmanned aerial vehicles (UAVs), supported by experimental validation. Two specialized UAVs were developed for the task: (i) a brick‑carrier UAV equipped with a ball‑joint actuation mechanism for precise brick manipulation, and (ii) an adhesion UAV integrating a servo‑controlled valve and extruder nozzle for accurate adhesion application. The proposed framework employs a reactive mission planning unit that combines a dependency graph of the construction layout with a conflict graph to manage simultaneous task execution, while hierarchical state machines ensure robust operation and safe transitions during task execution. Dynamic task allocation allows real‑time adaptation to environmental feedback, while minimum‑jerk trajectory generation ensures smooth and precise UAV motion during brick pickup and placement. Additionally, the brick‑carrier UAV employs an onboard vision system that estimates brick poses in real time using ArUco markers and a least‑squares optimization filter, enabling accurate alignment during construction. To the best of the authors' knowledge, this work represents the first experimental demonstration of fully autonomous aerial masonry construction using heterogeneous UAVs, where one UAV precisely places the bricks while another autonomously applies adhesion material between them. The experimental results supported by the video showcase the effectiveness of the proposed framework and demonstrate its potential to serve as a foundation for future developments in autonomous aerial robotic construction.
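The minimum‑jerk trajectory generation mentioned above has a well‑known closed form in the rest‑to‑rest case, sketched below for one axis. This is the classic quintic profile with zero velocity and acceleration at both ends; the abstract does not state the boundary conditions or formulation the authors use, so treat the function names and the rest‑to‑rest assumption as illustrative.

```python
def min_jerk_position(p0, p1, duration, t):
    """Position at time t on a rest-to-rest minimum-jerk move from p0 to p1.
    Uses the quintic blend s(tau) = 10 tau^3 - 15 tau^4 + 6 tau^5."""
    tau = min(max(t / duration, 0.0), 1.0)  # clamp normalised time to [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return p0 + (p1 - p0) * s

def min_jerk_velocity(p0, p1, duration, t):
    """Velocity at time t; zero at both ends of the move."""
    tau = min(max(t / duration, 0.0), 1.0)
    ds = 30 * tau**2 - 60 * tau**3 + 30 * tau**4  # ds/dtau
    return (p1 - p0) * ds / duration
```

Applying the same blend independently to x, y, and z gives a straight‑line move between a pickup pose and a placement pose with smooth start and stop.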
Authors: Rajendra Upadhyay, Al Nahian Bin Emran, Rajendra Paudyal, Lisa Donnan, Duminda Wijesekera
Abstract: Uncooperative unmanned aerial vehicles (UAVs) pose emerging threats to critical infrastructure and border protection by operating as rogue user equipment (UE) within cellular networks, consuming resources, creating interference, and potentially violating restricted airspaces. This paper presents an end‑to‑end simulation framework, built on a minimal model of the operating space, to analyze the detect‑to‑mitigate latency of such intrusions in a hybrid terrestrial‑non‑terrestrial (LEO satellite) 5G system. The system model includes terrestrial gNBs, satellite backhaul (with stochastic outages), and a detection logic (triggered by handover instability and signal quality variance). A lockdown mechanism is invoked upon detection, with optional local fallback to cap mitigation delays. Monte Carlo sweeps across UAV altitudes, speeds, and satellite outage rates yield several insights. First, satellite backhaul outages can cause arbitrarily long mitigation delays, yet, to meet fallback deadlines, they need to be effectively bounded. Second, while handover instability was hypothesized, our results show that extra handovers have a negligible effect within the range of parameters we considered. The main resilience benefit of fallback comes from limiting the mitigation delay. Third, patrol UEs experience negligible collateral impact, with handover rates close to terrestrial baselines. Stress scenarios further highlight that fallback is indispensable in preventing extreme control‑plane and physical security vulnerabilities: Without fallback, prolonged outages in the satellite backhaul delay lockdown commands, allowing rogue UAVs to linger inside restricted corridors for several seconds longer. These results underscore the importance of complementing non‑terrestrial links with local control to ensure robust and timely response against uncooperative UAV intrusions.
Authors: José Manuel Rúa-Estévez, Alicia Meleiro-Estévez, Pablo Fondo-Ferreiro, Felipe Gil-Castiñeira, Brais Sánchez-Rama, Lois Gomez-Gonzalez
Abstract: This work presents a simulator designed for the validation, evaluation, and demonstration of flying ad hoc networks (FANETs) using 5G vehicle‑to‑everything (V2X) communications and the named‑data networking (NDN) paradigm. The simulator integrates the ns‑3 network simulator and the Zenoh NDN protocol, enabling realistic testing of applications that involve multi‑hop communication among multiple unmanned aerial vehicles (UAVs).
Authors: Chen Chen, Kangcheng Bin, Ting Hu, Jiahao Qi, Xingyue Liu, Tianpeng Liu, Zhen Liu, Yongxiang Liu, Ping Zhong
Abstract: Unmanned aerial vehicle (UAV)‑based object detection with visible (RGB) and infrared (IR) images facilitates robust around‑the‑clock detection, driven by advancements in deep learning techniques and the availability of high‑quality datasets. However, existing datasets struggle to fully capture real‑world complexity because they cover only limited imaging conditions. To this end, we introduce a high‑diversity dataset, ATR‑UMOD, covering varying scenarios, spanning altitudes from 80m to 300m, angles from 0° to 75°, and all‑day, all‑year time variations in rich weather and illumination conditions. Moreover, each RGB‑IR image pair is annotated with 6 condition attributes, offering valuable high‑level contextual information. To meet the challenge raised by such diverse conditions, we propose a novel prompt‑guided condition‑aware dynamic fusion (PCDF) to adaptively reassign multimodal contributions by leveraging annotated condition cues. By encoding imaging conditions as text prompts, PCDF effectively models the relationship between conditions and multimodal contributions through a task‑specific soft‑gating transformation. A prompt‑guided condition‑decoupling module further ensures that the method remains usable in practice when condition annotations are unavailable. Experiments on the ATR‑UMOD dataset demonstrate the effectiveness of PCDF.
Authors: Francesco Barbato, Matteo Caligiuri, Pietro Zanuttigh
Abstract: The development of computer vision algorithms for Unmanned Aerial Vehicle (UAV) applications in urban environments heavily relies on the availability of large‑scale datasets with accurate annotations. However, collecting and annotating real‑world UAV data is extremely challenging and costly. To address this limitation, we present FlyAwareV2, a novel multimodal dataset encompassing both real and synthetic UAV imagery tailored for urban scene understanding tasks. Building upon the recently introduced SynDrone and FlyAware datasets, FlyAwareV2 introduces several key contributions: 1) Multimodal data (RGB, depth, semantic labels) across diverse environmental conditions, including varying weather and times of day; 2) Depth maps for real samples computed via state‑of‑the‑art monocular depth estimation; 3) Benchmarks for RGB and multimodal semantic segmentation on standard architectures; 4) Studies on synthetic‑to‑real domain adaptation to assess the generalization capabilities of models trained on the synthetic data. With its rich set of annotations and environmental diversity, FlyAwareV2 provides a valuable resource for research on UAV‑based 3D urban scene understanding.
Authors: Shahab Ataei, Dipankar Maity, Debdipta Goswami
Abstract: Cloud‑assisted system identification and control have emerged as practical solutions for low‑power, resource‑constrained control systems such as micro‑UAVs. In a typical cloud‑assisted setting, state and input data are transmitted from local agents to a central computer over low‑bandwidth wireless links, leading to quantization. This paper investigates the impact of state and input data quantization on linear time‑invariant (LTI) system identification, derives a worst‑case bound on the identification error, and develops a robust controller for guaranteed cost control. We establish a fundamental bound on the model error that depends only on the quantized data and quantization resolution, and develop a linear matrix inequality (LMI)‑based guaranteed cost robust controller under this error bound.
Authors: Pavel Pochobradský, Ondřej Procházka, Robert Pěnička, Vojtěch Vonásek, Martin Saska
Abstract: In this letter, we introduce Geometric Model Predictive Path Integral (GMPPI), a sampling‑based controller capable of tracking agile trajectories while avoiding obstacles. In each iteration, GMPPI generates a large number of candidate rollout trajectories and then averages them to create a nominal control to be followed by the controlled Unmanned Aerial Vehicle (UAV). Classical Model Predictive Path Integral (MPPI) faces a trade‑off between tracking precision and obstacle avoidance; high‑noise random rollouts are inefficient for tracking but necessary for collision avoidance. To this end, we propose leveraging geometric SE(3) control to generate a portion of GMPPI rollouts. To maximize their benefit, we introduce a UAV‑tailored cost function balancing tracking performance with obstacle avoidance. All generated rollouts are projected onto depth images for collision avoidance, representing, to our knowledge, the first method utilizing depth data directly in a UAV MPPI loop. Simulations show GMPPI matches the tracking error of an obstacle‑blind geometric controller while exceeding the avoidance capabilities of state‑of‑the‑art planners and learning‑based controllers. Real‑world experiments demonstrate flight at speeds up to 17 m/s and obstacle avoidance up to 10 m/s.
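The averaging step mentioned in the abstract above can be illustrated with the weighting commonly used in classical MPPI, where each rollout is weighted by its exponentiated negative cost. This is only a minimal sketch under that assumption; the abstract does not state how GMPPI weights its rollouts, and the names `mppi_weights`, `weighted_control`, and `temperature` are hypothetical.

```python
import math


def mppi_weights(costs, temperature=1.0):
    """Softmin weights over rollout costs (classical MPPI form, assumed here)."""
    c_min = min(costs)  # subtract the minimum cost for numerical stability
    unnorm = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighted_control(controls, costs, temperature=1.0):
    """Average per-rollout control sequences with the weights above.

    controls: list of rollouts, each a list of scalar controls per time step.
    costs:    one scalar cost per rollout (lower is better).
    """
    w = mppi_weights(costs, temperature)
    horizon = len(controls[0])
    return [
        sum(w[k] * controls[k][t] for k in range(len(controls)))
        for t in range(horizon)
    ]
```

A lower `temperature` concentrates the average on the cheapest rollouts, while a higher value approaches a plain mean.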
Authors: Zefu Lin, Wenbo Chen, Xiaojuan Jin, Yuran Yang, Lue Fan, Yixin Zhang, Yufeng Zhang, Zhaoxiang Zhang
Abstract: Unmanned Aerial Vehicle (UAV) swarm systems necessitate efficient collaborative perception mechanisms for diverse operational scenarios. Current Bird's Eye View (BEV)‑based approaches exhibit two main limitations: bounding‑box representations fail to capture complete semantic and geometric information of the scene, and their performance significantly degrades when encountering undefined or occluded objects. To address these limitations, we propose a novel multi‑UAV collaborative occupancy prediction framework. Our framework effectively preserves 3D spatial structures and semantics by integrating a Spatial‑Aware Feature Encoder and Cross‑Agent Feature Integration. To enhance efficiency, we further introduce Altitude‑Aware Feature Reduction to compactly represent scene information, along with a Dual‑Mask Perceptual Guidance mechanism to adaptively select features and reduce communication overhead. Due to the absence of suitable benchmark datasets, we extend three datasets for evaluation: two virtual datasets (Air‑to‑Pred‑Occ and UAV3D‑Occ) and one real‑world dataset (GauUScene‑Occ). Experimental results demonstrate that our method achieves state‑of‑the‑art accuracy, significantly outperforming existing collaborative methods while reducing communication overhead to only a fraction of previous approaches.
Authors: Qizhi Guo, Siyuan Yang, Junning Lyu, Jianjun Sun, Defu Lin, Shaoming He
Abstract: Accurate and robust heading estimation is crucial for unmanned aerial vehicles (UAVs) when conducting indoor inspection tasks. However, the cluttered nature of indoor environments often introduces severe magnetic disturbances, which can significantly degrade heading accuracy. To address this challenge, this paper presents an Adaptive MARG‑Only Heading (AMO‑HEAD) estimation approach for UAVs operating in magnetically disturbed environments. AMO‑HEAD is a lightweight and computationally efficient Extended Kalman Filter (EKF) framework that leverages inertial and magnetic sensors to achieve reliable heading estimation. In the proposed approach, gyroscope angular rate measurements are integrated to propagate the quaternion state, which is subsequently corrected using accelerometer and magnetometer data. The corrected quaternion is then used to compute the UAV's heading. An adaptive process noise covariance method is introduced to model and compensate for gyroscope measurement noise, bias drift, and discretization errors arising from Euler integration. To mitigate the effects of external magnetic disturbances, a scaling factor is applied based on real‑time magnetic deviation detection. A theoretical observability analysis of the proposed AMO‑HEAD is performed using the Lie derivative. Extensive experiments were conducted in real‑world indoor environments with customized UAV platforms. The results demonstrate the effectiveness of the proposed algorithm in providing precise heading estimation under magnetically disturbed conditions.
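The propagation and heading steps described in the abstract above can be sketched as follows. This is only an illustration of textbook first‑order (Euler) quaternion integration and yaw extraction, not the AMO‑HEAD filter: the EKF correction, adaptive process noise, and disturbance scaling are omitted. The function names and the scalar‑first (w, x, y, z) convention are assumptions.

```python
import math


def propagate_quaternion(q, gyro, dt):
    """One Euler step of q_dot = 0.5 * q (x) [0, omega], then renormalize.

    q:    unit quaternion (w, x, y, z), scalar first (assumed convention).
    gyro: body-frame angular rate (wx, wy, wz) in rad/s.
    """
    w, x, y, z = q
    wx, wy, wz = gyro
    dw = 0.5 * (-x * wx - y * wy - z * wz)
    dx = 0.5 * (w * wx + y * wz - z * wy)
    dy = 0.5 * (w * wy - x * wz + z * wx)
    dz = 0.5 * (w * wz + x * wy - y * wx)
    w, x, y, z = w + dw * dt, x + dx * dt, y + dy * dt, z + dz * dt
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    return (w / norm, x / norm, y / norm, z / norm)


def heading_from_quaternion(q):
    """Yaw angle in radians (rotation about the z axis) of a unit quaternion."""
    w, x, y, z = q
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
```

Renormalizing after each step keeps the quaternion on the unit sphere; the remaining discretization error is the kind of effect the abstract says the adaptive process noise is meant to cover.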
Authors: Dhrumil Bhatt, Siddharth Penumatsa, Vidushi Kumar
Abstract: Flying Ad Hoc Networks (FANETs) present unique challenges due to high node mobility, dynamic topologies, and strict resource constraints. Existing routing protocols often optimize for a single metric, such as path length or energy, while neglecting the complex dependencies between network performance, security, and MAC‑layer efficiency. This paper introduces a novel hardware‑software co‑design framework for secure and adaptive UAV swarm communications, featuring an energy‑aware protocol stack. The architecture employs a multicast, clustered organization where routing decisions integrate dynamic trust scores, historical link quality, and internodal distance. A hybrid MAC protocol combines contention‑based and scheduled channel access for optimized throughput. Security is ensured through a zero‑trust model that fuses cryptographic authentication with a behavioral reputation system, alongside hardware‑accelerated AES‑GCM encryption. Comparative analysis in an ns‑3 simulation environment demonstrates the framework's superiority in packet delivery ratio, latency, resilience, and overhead, providing a scalable foundation for high‑performance swarm operations.
Authors: Zixu Zhao, Yang Zhan
Abstract: Unmanned aerial vehicles (UAVs) have become powerful platforms for real‑time, high‑resolution data collection, producing massive volumes of aerial videos. Efficient retrieval of relevant content from these videos is crucial for applications in urban management, emergency response, security, and disaster relief. While text‑video retrieval has advanced in natural video domains, the UAV domain remains underexplored due to limitations in existing datasets, such as coarse and redundant captions. Thus, in this work, we construct the Drone Video‑Text Match Dataset (DVTMD), which contains 2,864 videos and 14,320 fine‑grained, semantically diverse captions. The annotations capture multiple complementary aspects, including human actions, objects, background settings, environmental conditions, and visual style, thereby enhancing text‑video correspondence and reducing redundancy. Building on this dataset, we propose the Text‑Conditioned Multi‑granularity Alignment (TCMA) framework, which integrates global video‑sentence alignment, sentence‑guided frame aggregation, and word‑guided patch alignment. To further refine local alignment, we design a Word and Patch Selection module that filters irrelevant content, as well as a Text‑Adaptive Dynamic Temperature Mechanism that adapts attention sharpness to text type. Extensive experiments on DVTMD and CapERA establish the first complete benchmark for drone text‑video retrieval. Our TCMA achieves state‑of‑the‑art performance, including 45.5% R@1 in text‑to‑video and 42.8% R@1 in video‑to‑text retrieval, demonstrating the effectiveness of our dataset and method. The code and dataset will be released.
Authors: Hongxing Peng, Haopei Xie, Weijia Lia, Huanai Liuc, Ximing Li
Abstract: Litchi is a high‑value fruit, yet traditional manual selection methods are increasingly inadequate for modern production demands. Integrating UAV‑based aerial imagery with deep learning offers a promising solution to enhance efficiency and reduce costs. This paper introduces YOLOv11‑Litchi, a lightweight and robust detection model specifically designed for UAV‑based litchi detection. Built upon the YOLOv11 framework, the proposed model addresses key challenges such as small target size, large parameter counts that hinder deployment, and frequent target occlusion. To tackle these issues, three major innovations are incorporated: a multi‑scale residual module to improve contextual feature extraction across scales, a lightweight feature fusion method to reduce model size and computational costs while maintaining high accuracy, and a litchi occlusion detection head to mitigate occlusion effects by emphasizing target regions and suppressing background interference. Experimental results validate the model's effectiveness. YOLOv11‑Litchi achieves a parameter size of 6.35 MB, 32.5% smaller than the YOLOv11 baseline, while improving mAP by 2.5% to 90.1% and F1‑Score by 1.4% to 85.5%. Additionally, the model achieves a frame rate of 57.2 FPS, meeting real‑time detection requirements. These findings demonstrate the suitability of YOLOv11‑Litchi for UAV‑based litchi detection in complex orchard environments, showcasing its potential for broader applications in precision agriculture.
Authors: Aniruddha Srinivas Joshi, Godwyn James William, Shreyas Srinivas Joshi
Abstract: Accurate fire and smoke detection is critical for safety and disaster response, yet existing vision‑based methods face challenges in balancing efficiency and reliability. Compact deep learning models such as YOLOv5n and YOLOv8n are widely adopted for deployment on UAVs, CCTV systems, and IoT devices, but their reduced capacity often results in false positives and missed detections. Conventional post‑detection methods such as Non‑Maximum Suppression and Soft‑NMS rely only on spatial overlap, which can suppress true positives or retain false alarms in cluttered or ambiguous fire scenes. To address these limitations, we propose an uncertainty‑aware post‑detection framework that rescales detection confidences using both statistical uncertainty and domain‑relevant visual cues. A lightweight Confidence Refinement Network integrates uncertainty estimates with color, edge, and texture features to adjust detection scores without modifying the base model. Experiments on the D‑Fire dataset demonstrate improved precision, recall, and mean average precision compared to existing baselines, with only modest computational overhead. These results highlight the effectiveness of post‑detection rescoring in enhancing the robustness of compact deep learning models for real‑world fire and smoke detection.
Authors: Pîrvu Mihai-Cristian, Marius Leordeanu
Abstract: The computer vision domain has greatly benefited from an abundance of data across many modalities to improve on various visual tasks. Recently, there has been a lot of focus on self‑supervised pre‑training methods through Masked Autoencoders (MAE) \cite{he2022masked,bachmann2022multimae}, usually used as a first step before optimizing for a downstream task, such as classification or regression. This is very useful as it doesn't require any manually labeled data. In this work, we introduce Probabilistic Hyper‑Graphs using Masked Autoencoders (PHG‑MAE): a novel model that unifies the classical work on neural graphs \cite{leordeanu2021semi} with the modern approach of masked autoencoders under a common theoretical framework. Through random masking of entire modalities, not just patches, the model samples from the distribution of hyper‑edges on each forward pass. Additionally, the model adapts the standard MAE algorithm by combining pre‑training and fine‑tuning into a single training loop. Moreover, our approach enables the creation of inference‑time ensembles which, through aggregation, boost the final prediction performance and consistency. Lastly, we show that we can apply knowledge distillation on top of the ensembles with little loss in performance, even with models that have fewer than 1M parameters. While our work mostly focuses on outdoor UAV scenes that contain multiple world interpretations and modalities, the same steps can be followed in other similar domains, such as autonomous driving or indoor robotics. In order to streamline the process of integrating external pre‑trained experts for computer vision multi‑modal multi‑task learning (MTL) scenarios, we developed a data‑pipeline software. Using this tool, we have created and released a fully‑automated extension of the Dronescapes dataset. All the technical details, code and reproduction steps are publicly released.
Authors: Yang Li, Ruichen Zhang, Yinqiu Liu, Guangyuan Liu, Dusit Niyato, Abbas Jamalipour, Xianbin Wang, Dong In Kim
Abstract: The rapid advancement of Low‑Altitude Economy Networks (LAENets) has enabled a variety of applications, including aerial surveillance, environmental sensing, and semantic data collection. To support these scenarios, unmanned aerial vehicles (UAVs) equipped with onboard vision‑language models (VLMs) offer a promising solution for real‑time multimodal inference. However, ensuring both inference accuracy and communication efficiency remains a significant challenge due to limited onboard resources and dynamic network conditions. In this paper, we first propose a UAV‑enabled LAENet system model that jointly captures UAV mobility, user‑UAV communication, and the onboard visual question answering (VQA) pipeline. Based on this model, we formulate a mixed‑integer non‑convex optimization problem to minimize task latency and power consumption under user‑specific accuracy constraints. To solve the problem, we design a hierarchical optimization framework composed of two parts: (i) an Alternating Resolution and Power Optimization (ARPO) algorithm for resource allocation under accuracy constraints, and (ii) a Large Language Model‑augmented Reinforcement Learning Approach (LLaRA) for adaptive UAV trajectory optimization. The large language model (LLM) serves as an expert in refining reward design of reinforcement learning in an offline fashion, introducing no additional latency in real‑time decision‑making. Numerical results demonstrate the efficacy of our proposed framework in improving inference performance and communication efficiency under dynamic LAENet conditions.
Authors: D. V. Brovko
Abstract: The relevance of this research lies in the growing demand for unmanned aerial vehicles (UAVs) capable of operating reliably in complex environments where conventional navigation becomes unreliable due to interference, poor visibility, or camouflage. Hyperspectral imaging (HSI) provides unique opportunities for UAV‑based computer vision by enabling fine‑grained material recognition and object differentiation, which are critical for navigation, surveillance, agriculture, and environmental monitoring. The aim of this work is to develop a deep learning architecture integrating HSI into UAV perception for navigation, object detection, and terrain classification. Objectives include: reviewing existing HSI methods, designing a hybrid 2D/3D convolutional architecture with spectral‑spatial cross‑attention, training, and benchmarking. The methodology is based on the modification of the Mobile 3D Vision Transformer (MDvT) by introducing the proposed SpectralCA block. This block employs bi‑directional cross‑attention to fuse spectral and spatial features, enhancing accuracy while reducing parameters and inference time. Experimental evaluation was conducted on the WHU‑Hi‑HongHu dataset, with results assessed using Overall Accuracy, Average Accuracy, and the Kappa coefficient. The findings confirm that the proposed architecture improves UAV perception efficiency, enabling real‑time operation for navigation, object recognition, and environmental monitoring tasks.
Keywords: SpectralCA, deep learning, computer vision, hyperspectral imaging, unmanned aerial vehicle, object detection, semi‑supervised learning.
Authors: Sajad Khatiri, Francisco Eli Vina Barrientos, Maximilian Wulf, Paolo Tonella, Sebastiano Panichella
Abstract: Ensuring robust robotic navigation in dynamic environments is a key challenge, as traditional testing methods often struggle to cover the full spectrum of operational requirements. This paper presents the industrial adoption of Surrealist, a simulation‑based test generation framework originally for UAVs, now applied to the ANYmal quadrupedal robot for industrial inspection. Our method uses a search‑based algorithm to automatically generate challenging obstacle avoidance scenarios, uncovering failures often missed by manual testing. In a pilot phase, generated test suites revealed critical weaknesses in one experimental algorithm (40.3% success rate) and served as an effective benchmark to prove the superior robustness of another (71.2% success rate). The framework was then integrated into the ANYbotics workflow for a six‑month industrial evaluation, where it was used to test five proprietary algorithms. A formal survey confirmed its value, showing it enhances the development process, uncovers critical failures, provides objective benchmarks, and strengthens the overall verification pipeline.
Authors: Juanqin Liu, Leonardo Plotegher, Eloy Roura, Shaoming He
Abstract: The extensive application of unmanned aerial vehicles (UAVs) in military reconnaissance, environmental monitoring, and related domains has created an urgent need for accurate and efficient multi‑object tracking (MOT) technologies, which are also essential for UAV situational awareness. However, complex backgrounds, small‑scale targets, and frequent occlusions and interactions continue to challenge existing methods in terms of detection accuracy and trajectory continuity. To address these issues, this paper proposes the Global‑Local Detection and Tracking (GL‑DT) framework. It employs a Spatio‑Temporal Feature Fusion (STFF) module to jointly model motion and appearance features, combined with a global‑local collaborative detection strategy, effectively enhancing small‑target detection. Building upon this, the JPTrack tracking algorithm is introduced to mitigate common issues such as ID switches and trajectory fragmentation. Experimental results demonstrate that the proposed approach significantly improves the continuity and stability of MOT while maintaining real‑time performance, providing strong support for the advancement of UAV detection and tracking technologies.
Authors: Tianhao Liang, Mu Jia, Tingting Zhang, Junting Chen, Longyu Zhou, Tony Q. S. Quek, Pooi-Yuen Kam
Abstract: The rapid growth of the low‑altitude economy has resulted in a significant increase in the number of low, slow, and small (LSS) unmanned aerial vehicles (UAVs), raising critical challenges for secure airspace management and reliable trajectory planning. To address this, this paper proposes a cooperative radio‑frequency (RF) detection and localization framework that leverages existing cellular base stations (BSs). The proposed approach features a robust scheme for LSS target identification, integrating a cell averaging‑constant false alarm rate (CA‑CFAR) detector with a micro‑Doppler signature (MDS)‑based recognition method. Multi‑station measurements are fused through a grid‑based probabilistic algorithm combined with clustering techniques, effectively mitigating ghost targets and improving localization accuracy in multi‑UAV scenarios. Furthermore, the Cramér‑Rao lower bound (CRLB) is derived as a performance benchmark, and reinforcement learning (RL)‑based optimization is employed to balance localization accuracy against station resource usage. Simulations demonstrate that increasing from one to multiple BSs reduces the positioning error to near the CRLB, while practical experiments further verify the framework's effectiveness. In addition, our RL‑based optimization can find solutions that maintain high accuracy while minimizing resource usage, highlighting its potential as a scalable solution for ensuring airspace safety in the emerging low‑altitude economy.
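The CA‑CFAR detector named in the abstract above can be sketched in one dimension as follows. This is a generic textbook form and not the paper's implementation: it assumes square‑law detected samples with exponentially distributed noise, for which the threshold factor is alpha = N * (Pfa^(-1/N) - 1) with N training cells. The function name and the parameters `train_per_side`, `guard_per_side`, and `pfa` are hypothetical.

```python
def ca_cfar(power, train_per_side=4, guard_per_side=2, pfa=1e-3):
    """Return indices of cells whose power exceeds the CA-CFAR threshold.

    power: list of non-negative power samples (e.g. one range profile).
    Cells too close to either edge to have a full window are skipped.
    """
    n_train = 2 * train_per_side
    # Threshold factor for exponentially distributed noise power (assumed).
    alpha = n_train * (pfa ** (-1.0 / n_train) - 1.0)
    half = train_per_side + guard_per_side
    detections = []
    for i in range(half, len(power) - half):
        # Training cells on each side, excluding guard cells next to the
        # cell under test.
        lead = power[i - half : i - guard_per_side]
        lag = power[i + guard_per_side + 1 : i + half + 1]
        noise = (sum(lead) + sum(lag)) / n_train
        if power[i] > alpha * noise:
            detections.append(i)
    return detections
```

Because the noise level is estimated locally from neighboring cells, the threshold adapts to the background, which is what keeps the false alarm rate roughly constant.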
Authors: Victor Victor, Tania Krisanty, Matthew McGinity, Stefan Gumhold, Uwe Aßmann
Abstract: As the markets for unmanned aerial vehicles (UAVs) and mixed reality (MR) headsets continue to grow, recent research has increasingly explored their integration, which enables more intuitive, immersive, and situationally aware control systems. We present IGUANA, an MR‑based immersive guidance, navigation, and control system for consumer UAVs. IGUANA introduces three key elements beyond conventional control interfaces: (1) a 3D terrain map interface with draggable waypoint markers and live camera preview for high‑level control, (2) a novel spatial control metaphor that uses a virtual ball as a physical analogy for low‑level control, and (3) a spatial overlay that helps track the UAV when it is not visible to the naked eye or when visual line of sight is interrupted. We conducted a user study to evaluate our design, both quantitatively and qualitatively, and found that (1) the 3D map interface is intuitive and easy to use, relieving users from manual control and suggesting improved accuracy and consistency with lower perceived workload relative to a conventional dual‑stick controller, (2) the virtual ball interface is intuitive but limited by the lack of physical feedback, and (3) the spatial overlay is very useful in enhancing the users' situational awareness.
Authors: Dewi Endah Kharismawati, Toni Kazic
Abstract: Accurate maize stand counts are essential for crop management and research, informing yield prediction, planting density optimization, and early detection of germination issues. Manual counting is labor‑intensive, slow, and error‑prone, especially across large or variable fields. We present MaizeStandCounting (MaSC), a robust algorithm for automated maize seedling stand counting from RGB imagery captured by low‑cost UAVs and processed on affordable hardware. MaSC operates in two modes: (1) mosaic images divided into patches, and (2) raw video frames aligned using homography matrices. Both modes use a lightweight YOLOv9 model trained to detect maize seedlings from V2‑V10 growth stages. MaSC distinguishes maize from weeds and other vegetation, then performs row and range segmentation based on the spatial distribution of detections to produce precise row‑wise stand counts. Evaluation against in‑field manual counts from our 2024 summer nursery showed strong agreement with ground truth (R^2 = 0.616 for mosaics, R^2 = 0.906 for raw frames). MaSC processed 83 full‑resolution frames in 60.63 s, including inference and post‑processing, highlighting its potential for real‑time operation. These results demonstrate MaSC's effectiveness as a scalable, low‑cost, and accurate tool for automated maize stand counting in both research and production environments.
Authors: Tiago Silva, António Grilo
Abstract: In recent years, Unmanned Aerial Vehicles (UAVs) have brought a true revolution to military tactics. While UAVs already constitute an advantage when operating alone, multi‑UAV swarms expand the available possibilities, allowing the UAVs to collaborate and support each other as a team to carry out a given task. This entails the capability to exchange information related to situation awareness and action coordination by means of a suitable wireless communication technology. In such a scenario, the adversary is expected to disrupt communications by jamming the communication channel, which becomes the Achilles heel of the swarm. While anti‑jamming techniques constitute a well‑covered topic in the literature, the use of intelligent swarm behaviors to leverage those techniques is still an open research issue.
This paper explores the use of Genetic Algorithms (GAs) to jointly optimize UAV swarm formation, beam‑steering antennas and traffic routing in order to mitigate the effect of jamming in the main coordination channel, under the assumption that a more robust, low‑data‑rate channel is used for formation management signaling. Simulation results show the effectiveness of the proposed approach. However, its significant computational cost calls for further research.
Authors: Fengze Xie, Xiaozhou Fan, Jacob Schuster, Yisong Yue, Morteza Gharib
Abstract: Fixed‑wing unmanned aerial vehicles (UAVs) offer endurance and efficiency but lack low‑speed agility due to highly coupled dynamics. We present an end‑to‑end sensing‑to‑control pipeline that combines bio‑inspired hardware, physics‑informed dynamics learning, and convex control allocation. Measuring airflow on a small airframe is difficult because near‑body aerodynamics, propeller slipstream, control‑surface actuation, and ambient gusts distort pressure signals. Inspired by the narwhal's protruding tusk, we mount in‑house multi‑hole probes far upstream and complement them with sparse, carefully placed wing pressure sensors for local flow measurement. A data‑driven calibration maps probe pressures to airspeed and flow angles. We then learn a control‑affine dynamics model using the estimated airspeed/angles and sparse sensors. A soft left/right symmetry regularizer improves identifiability under partial observability and limits confounding between wing pressures and flaperon inputs. Desired wrenches (forces and moments) are realized by a regularized least‑squares allocator that yields smooth, trimmed actuation. Wind‑tunnel studies across a wide operating range show that adding wing pressures reduces force‑estimation error by 25‑30%, the proposed model degrades less under distribution shift (about 12% versus 44% for an unstructured baseline), and force tracking improves with smoother inputs, including a 27% reduction in normal‑force RMSE versus a plain affine model and 34% versus an unstructured baseline.
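The regularized least‑squares allocation step mentioned in the abstract above can be illustrated with a generic Tikhonov‑regularized solve. This is a sketch under stated assumptions, not the authors' allocator: it assumes a linear effectiveness matrix `B` mapping actuator commands to wrench components and penalizes deviation from a reference command `u_ref` (for example a trim setting). All names here are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def allocate(B, wrench, reg=1e-3, u_ref=None):
    """Minimize ||B u - wrench||^2 + reg * ||u - u_ref||^2 over commands u.

    B:      rows = wrench components, columns = actuators (assumed linear map).
    u_ref:  reference command, zeros if omitted (an assumption of this sketch).
    """
    rows, cols = len(B), len(B[0])
    if u_ref is None:
        u_ref = [0.0] * cols
    # Normal equations: (B^T B + reg I) u = B^T wrench + reg u_ref
    gram = [
        [sum(B[k][i] * B[k][j] for k in range(rows)) + (reg if i == j else 0.0)
         for j in range(cols)]
        for i in range(cols)
    ]
    rhs = [
        sum(B[k][i] * wrench[k] for k in range(rows)) + reg * u_ref[i]
        for i in range(cols)
    ]
    return solve_linear(gram, rhs)
```

With more actuators than wrench components, the regularizer picks a unique command among the many that realize the wrench, which is one common way to obtain smooth actuation.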
Authors: Fangzhou Zhao, Yao Sun, Jianglin Lan, Lan Zhang, Xuesong Liu, Muhammad Ali Imran
Abstract: Effective path planning is fundamental to the coordination of unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) systems, particularly in applications such as surveillance, navigation, and emergency response. Combining UAVs' broad field of view with UGVs' ground‑level operational capability greatly improves the likelihood of successfully achieving task objectives such as locating victims, monitoring target areas, or navigating hazardous terrain. In complex environments, UAVs need to provide precise environmental perception information for UGVs to optimize their routing policy. However, due to severe interference and non‑line‑of‑sight conditions, wireless communication is often unstable in such complex environments, making it difficult to support timely and accurate path planning for UAV‑UGV coordination. To this end, this paper proposes a semantic communication (SemCom) framework to enhance UAV‑UGV cooperative path planning under unreliable wireless conditions. Unlike traditional methods that transmit raw data, SemCom transmits only the key information for path planning, reducing transmission volume without sacrificing accuracy. The proposed framework is developed by defining key semantics for path planning and designing a transceiver that meets the requirements of UAV‑UGV cooperative path planning. Simulation results show that, compared to conventional SemCom transceivers, the proposed transceiver significantly reduces data transmission volume while maintaining path planning accuracy, thereby enhancing system collaboration efficiency.
Authors: Praveen Kumar Ranjan, Abhinav Sinha, Yongcan Cao
Abstract: This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader‑follower formation flight of fixed‑wing unmanned aerial vehicles (UAVs) while accounting for high‑fidelity aerodynamics and thrust dynamics. Unlike conventional leader‑follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain its range and bearing angle (the angle between its velocity vector and its line‑of‑sight (LOS) to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader‑follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control‑based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader, so that it can execute anticipatory reconfiguration by transitioning between relative positions in the admissible formation set when the leader maneuvers aggressively. The proposed IGC law relies only on relative information and onboard sensors, without information about the leader's maneuver, making it suitable for GPS‑denied or non‑cooperative scenarios. Finally, we present simulation results that demonstrate the effectiveness and robustness of our approach.
Authors: Kürşat Tekbıyık, Güneş Karabulut Kurt, Antoine Lesage-Landry
Abstract: Unmanned aerial vehicle (UAV) communications demand accurate yet interpretable air‑to‑ground (A2G) channel models that can adapt to nonstationary propagation environments. While deterministic models offer interpretability and deep learning (DL) models provide accuracy, both approaches suffer from either rigidity or a lack of explainability. To bridge this gap, we propose the Physics‑Inspired Kolmogorov‑Arnold Network (PIKAN) that embeds physical principles (e.g., free‑space path loss, two‑ray reflections) into the learning process. Unlike physics‑informed neural networks (PINNs), PIKAN is more flexible for applying physical information because it introduces them as flexible inductive biases. Thus, it enables a more flexible training process. Experiments on UAV A2G measurement data show that PIKAN achieves comparable accuracy to DL models while providing symbolic and explainable expressions aligned with propagation laws. Remarkably, PIKAN achieves this performance with only 232 parameters, making it up to 37 times lighter than multilayer perceptron (MLP) baselines with thousands of parameters, without sacrificing correlation with measurements and also providing symbolic expressions. These results highlight PIKAN as an efficient, interpretable, and scalable solution for UAV channel modelling in beyond‑5G and 6G networks.
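As a point of reference for the free‑space path loss term mentioned in the abstract above, the standard Friis‑based formula can be sketched as follows. This is only the textbook expression, not the PIKAN model or the authors' code; the function name and the example distance and frequency are illustrative assumptions.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def free_space_path_loss_db(distance_m: float, frequency_hz: float) -> float:
    """Textbook free-space path loss in dB: 20*log10(4*pi*d*f/c).

    Hypothetical helper for illustration; isotropic antennas are assumed.
    """
    if distance_m <= 0 or frequency_hz <= 0:
        raise ValueError("distance and frequency must be positive")
    ratio = 4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT_M_S
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    # Example values (assumed, not from the paper): 1 km link at 2.4 GHz.
    print(round(free_space_path_loss_db(1_000.0, 2.4e9), 2))
```

Doubling the distance adds 20·log10(2) ≈ 6.02 dB, which is the kind of simple, interpretable behaviour a physics‑inspired model can use as an inductive bias.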
Authors: Yousef Emami, Seyedsina Nabavirazavi, Jingjing Zheng, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly being investigated to collect sensory data in post‑disaster monitoring scenarios, such as tsunamis, where early actions are critical to limit coastal damage. A major challenge is to design the data collection schedules and flight velocities, as unfavorable schedules and velocities can lead to transmission errors and buffer overflows of the ground sensors, ultimately resulting in significant packet loss. Meanwhile, online Deep Reinforcement Learning (DRL) solutions have a complex training process and a mismatch between simulation and reality that does not meet the urgent requirements of tsunami monitoring. Recent advances in Large Language Models (LLMs) offer a compelling alternative. With their strong reasoning and generalization capabilities, LLMs can adapt to new tasks through In‑Context Learning (ICL), which enables task adaptation through natural language prompts and example‑based guidance without retraining. However, LLMs have input data limitations and thus require customized approaches. In this paper, a joint optimization of data collection schedules and velocity control for multiple UAVs is proposed to minimize data loss. The battery level of the ground sensors, the length of the queues, and the channel conditions, as well as the trajectories of the UAVs, are taken into account. Attention‑Based In‑Context Learning for Velocity Control and Data Collection Schedule (AIC‑VDS) is proposed as an alternative to DRL in emergencies. The simulation results show that the proposed AIC‑VDS outperforms both the Deep‑Q‑Network (DQN) and maximum channel gain baselines.
Authors: Junfeng Cai, Marco Lovera
Abstract: A novel active fault‑tolerant control (AFTC) scheme for a dual‑system vertical takeoff and landing (VTOL) unmanned aerial vehicle (UAV) during transition flight is proposed in this paper. The AFTC scheme is composed of a baseline control law and an online control reallocation module. First, the structured H_\infty baseline control law is able to guarantee the stability of closed‑loop systems without being reconfigured under simultaneous actuator fault conditions. Second, compared to the existing mainstream method of sliding mode control that is a discontinuous control strategy, the AFTC scheme can effectively avoid control chattering problem by adopting the structured H_\infty baseline control law. Third, an online control allocation (CA) module is implemented to carry out a unified CA for all the available actuators. When actuator faults/failures occur, the CA matrix is updated according to fault information and real‑time airspeed, which is able to redistribute the virtual control signals to the remaining healthy actuators, avoiding significant performance degradation. Based on the developed AFTC scheme, symmetric and non‑symmetric actuator fault scenarios are simulated on a nonlinear six‑degree‑of‑freedom simulator, where the cases of merely structured H_\infty control and structured H_\infty based AFTC are compared and analyzed. The results show that the proposed structured H_\infty based AFTC system is capable of handling more complicated fault scenarios and model uncertainties with no need to reconfigure the baseline control law. The proposed AFTC scheme significantly improves the safety and reliability of the transition flight of dual‑system VTOL UAVs.
Authors: André Coelho, Pedro Ribeiro, Helder Fontes, Rui Campos
Abstract: This position paper presents A4FN, an Agentic Artificial Intelligence (AI) architecture for intent‑driven automation in Flying Networks (FNs) using Unmanned Aerial Vehicles (UAVs) as access nodes. A4FN leverages Generative AI and Large Language Models (LLMs) to enable real‑time, context‑aware network control via a distributed agentic system. It comprises two components: the Perception Agent (PA), which semantically interprets multimodal input ‑‑ including imagery, audio, and telemetry data ‑‑ from UAV‑mounted sensors to derive Service Level Specifications (SLSs); and the Decision‑and‑Action Agent (DAA), which reconfigures the network based on inferred intents. A4FN embodies key properties of Agentic AI, including autonomy, goal‑driven reasoning, and continuous perception‑action cycles. Designed for mission‑critical, infrastructure‑limited scenarios such as disaster response, it supports adaptive reconfiguration, dynamic resource management, and interoperability with emerging wireless technologies. The paper details the A4FN architecture, its core innovations, and open research challenges in multi‑agent coordination and Agentic AI integration in next‑generation FNs.
Authors: Duanjiao Li, Yun Chen, Ying Zhang, Junwen Yao, Dongyue Huang, Jianguo Zhang, Ning Ding
Abstract: For typical applications of UAVs in power grid scenarios, we formulate the problem as planning UAV trajectories for coverage in cluttered environments. In this paper, we propose an optimal smooth coverage trajectory planning algorithm. The algorithm consists of two stages. In the front‑end, a Genetic Algorithm (GA) is employed to solve the Traveling Salesman Problem (TSP) for Points of Interest (POIs), generating an initial sequence of optimized visiting points. In the back‑end, the sequence is further optimized by considering trajectory smoothness, time consumption, and obstacle avoidance. This is formulated as a nonlinear least squares problem and solved to produce a smooth coverage trajectory that satisfies these constraints. Numerical simulations validate the effectiveness of the proposed algorithm, showing that UAVs can smoothly cover all POIs in cluttered environments.
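The front‑end idea in the abstract above (a genetic algorithm ordering the POIs as a TSP) can be illustrated with a minimal sketch. This is not the authors' implementation: the closed‑tour objective, the simplified order crossover, the swap mutation, and every parameter value below are assumptions chosen for brevity.

```python
import math
import random


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def order_crossover(p1, p2, rng):
    """Simplified order crossover: copy a slice of p1, fill the rest in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    remaining = iter(c for c in p2 if c not in kept)
    for k in range(n):
        if child[k] is None:
            child[k] = next(remaining)
    return child


def solve_tsp_ga(points, pop_size=30, generations=200, mutation_rate=0.2, seed=0):
    """Hypothetical GA for a small TSP; needs at least 2 points and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(points)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: keep the better half unchanged, so the best tour never gets worse.
        population.sort(key=lambda t: tour_length(points, t))
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:
                a, b = rng.sample(range(n), 2)
                child[a], child[b] = child[b], child[a]  # swap mutation
            children.append(child)
        population = elite + children
    best = min(population, key=lambda t: tour_length(points, t))
    return best, tour_length(points, best)


if __name__ == "__main__":
    pois = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # made-up POIs
    order, length = solve_tsp_ga(pois)
    print(order, length)
```

In a two‑stage planner of the kind described, the returned visiting order would only seed the back‑end; smoothness, timing, and obstacle avoidance are handled by the subsequent optimization, which is not sketched here.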
Authors: Amir Habel, Fawad Mehboob, Jeffrin Sam, Clement Fortin, Dzmitry Tsetserukou
Abstract: Achieving precise lateral motion modeling and decoupled control in hover remains a significant challenge for tail‑sitter Unmanned Aerial Vehicles (UAVs), primarily due to complex aerodynamic couplings and the absence of well‑defined lateral dynamics. This paper presents a novel modeling and control strategy that enhances yaw authority and lateral motion by introducing a sideslip force model derived from differential propeller slipstream effects acting on the fuselage under differential thrust. The resulting lateral force along the body y‑axis enables yaw‑based lateral position control without inducing roll coupling. The control framework employs a YXZ Euler rotation formulation to accurately represent attitude and incorporate gravitational components while directly controlling yaw in the y‑axis, thereby improving lateral dynamic behavior and avoiding singularities. The proposed approach is validated through trajectory‑tracking simulations conducted in a Unity‑based environment. Tests on both rectangular and circular paths in hover mode demonstrate stable performance, with low mean absolute position errors and yaw deviations constrained within 5.688 degrees. These results confirm the effectiveness of the proposed lateral force generation model and provide a foundation for the development of agile, hover‑capable tail‑sitter UAVs.
Authors: Tarun Kumar Biswas, Ashrafun Zannat, Waqas Ishtiaq, Md. Alamgir Hossain
Abstract: The growing integration of drones across commercial, industrial, and civilian domains has introduced significant cybersecurity challenges, particularly due to the susceptibility of drone networks to a wide range of cyberattacks. Existing intrusion detection mechanisms often lack the adaptability, efficiency, and generalizability required for the dynamic and resource constrained environments in which drones operate. This paper proposes TSLT‑Net, a novel lightweight and unified Temporal Spatial Transformer based intrusion detection system tailored specifically for drone networks. By leveraging self attention mechanisms, TSLT‑Net effectively models both temporal patterns and spatial dependencies in network traffic, enabling accurate detection of diverse intrusion types. The framework includes a streamlined preprocessing pipeline and supports both multiclass attack classification and binary anomaly detection within a single architecture. Extensive experiments conducted on the ISOT Drone Anomaly Detection Dataset, consisting of more than 2.3 million labeled records, demonstrate the superior performance of TSLT‑Net with 99.99 percent accuracy in multiclass detection and 100 percent in binary anomaly detection, while maintaining a minimal memory footprint of only 0.04 MB and 9722 trainable parameters. These results establish TSLT‑Net as an effective and scalable solution for real time drone cybersecurity, particularly suitable for deployment on edge devices in mission critical UAV systems.
Authors: Sagar Lekhak, Emmett J. Ientilucci, Jasper Baur, Susmita Ghosh
Abstract: This paper introduces a novel benchmark dataset of Visible and Near‑Infrared (VNIR) hyperspectral imagery acquired via an unmanned aerial vehicle (UAV) platform for landmine and unexploded ordnance (UXO) detection research. The dataset was collected over a controlled test field seeded with 143 realistic surrogate landmine and UXO targets, including surface, partially buried, and fully buried configurations. Data acquisition was performed using a Headwall Nano‑Hyperspec sensor mounted on a multi‑sensor drone platform, flown at an altitude of approximately 20.6 m, capturing 270 contiguous spectral bands spanning 398‑1002 nm. Radiometric calibration, orthorectification, and mosaicking were performed followed by reflectance retrieval using a two‑point Empirical Line Method (ELM), with reference spectra acquired using an SVC spectroradiometer. Cross‑validation against six reference objects yielded RMSE values below 1.0 and SAM values between 1 and 6 degrees in the 400‑900 nm range, demonstrating high spectral fidelity. The dataset is released alongside raw radiance cubes, GCP/AeroPoint data, and reference spectra to support reproducible research. This contribution fills a critical gap in open‑access UAV‑based hyperspectral data for landmine detection and offers a multi‑sensor benchmark when combined with previously published drone‑based electromagnetic induction (EMI) data from the same test field.
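The two fidelity metrics quoted in the abstract above, RMSE and the spectral angle (SAM), have standard definitions that can be sketched as follows. The function names and the toy spectra are illustrative assumptions; nothing here reproduces the dataset's actual processing chain.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between two equally long spectra."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("spectra must be non-empty and of equal length")
    squared = sum((m - r) ** 2 for m, r in zip(measured, reference))
    return math.sqrt(squared / len(measured))


def spectral_angle_deg(measured: Sequence[float], reference: Sequence[float]) -> float:
    """Spectral angle in degrees: arccos of the normalized dot product."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("spectra must be non-empty and of equal length")
    dot = sum(m * r for m, r in zip(measured, reference))
    norm_m = math.sqrt(sum(m * m for m in measured))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_m == 0.0 or norm_r == 0.0:
        raise ValueError("spectra must not be all zeros")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_m * norm_r)))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    # Toy reflectance values (made up for illustration).
    uav_spectrum = [0.10, 0.22, 0.35, 0.41]
    field_spectrum = [0.11, 0.20, 0.36, 0.40]
    print(rmse(uav_spectrum, field_spectrum))
    print(spectral_angle_deg(uav_spectrum, field_spectrum))
```

The spectral angle ignores overall brightness (a uniformly scaled spectrum gives an angle near zero), whereas RMSE is sensitive to it, which is why the two are typically reported together.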
Authors: Yan Miao, Ege Yuceel, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Sayan Mitra
Abstract: Visual policy design is crucial for aerial navigation. However, state‑of‑the‑art visual policies often overfit to a single track and their performance degrades when track geometry changes. We develop FalconGym 2.0, a photorealistic simulation framework built on Gaussian Splatting (GSplat) with an Edit API that programmatically generates diverse static and dynamic tracks in milliseconds. Leveraging FalconGym 2.0's editability, we propose a Performance‑Guided Refinement (PGR) algorithm, which concentrates visual policy's training on challenging tracks while iteratively improving its performance. Across two case studies (fixed‑wing UAVs and quadrotors) with distinct dynamics and environments, we show that a single visual policy trained with PGR in FalconGym 2.0 outperforms state‑of‑the‑art baselines in generalization and robustness: it generalizes to three unseen tracks with 100% success without per‑track retraining and maintains higher success rates under gate‑pose perturbations. Finally, we demonstrate that the visual policy trained with PGR in FalconGym 2.0 can be zero‑shot sim‑to‑real transferred to a quadrotor hardware, achieving a 98.6% success rate (69 / 70 gates) over 30 trials spanning two three‑gate tracks and a moving‑gate track.
Authors: Masoud Ghazikor, Van Ly Nguyen, Morteza Hashemi
Abstract: This paper investigates the performance of downlink non‑orthogonal multiple access (NOMA) communication in unmanned aerial vehicle (UAV) networks enhanced by partitionable reconfigurable intelligent surfaces (RISs). We analyze three types of links between base station (BS) and UAVs: direct, RIS‑only indirect, and composite links, under both Line‑of‑Sight (LoS) and Non‑LoS (NLoS) propagation. The RIS‑only indirect link and direct link are modeled using double Nakagami‑m and Nakagami‑m fading, respectively, while the composite link follows a combined fading channel model. Closed‑form expressions for the cumulative distribution function (CDF) of the received signal‑to‑noise ratio (SNR) are derived for all links, enabling tractable outage probability analysis. Then, we formulate a fairness‑efficiency bilevel optimization problem to minimize the maximum outage probability among UAVs while minimizing the total number of required RIS reflecting elements. Accordingly, an RIS‑assisted UAV Outage Minimization (RUOM) algorithm is proposed, which fairly allocates the NOMA power coefficients while minimizing the total number of RIS reflecting elements required, subject to NOMA‑defined constraints, RIS resource limitations, and maximum allowable outage threshold. Simulation results validate the analytical models and demonstrate that the proposed RUOM algorithm significantly improves fairness and efficiency in BS‑UAV communication.
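To make the link between an SNR CDF and outage probability concrete, the sketch below evaluates the outage probability of a single Nakagami‑m link with integer m, using the well‑known gamma‑distribution CDF of the SNR. It covers only this simple direct‑link case; the double Nakagami‑m and composite‑link expressions derived in the paper are not reproduced, and the function name and parameters are assumptions.

```python
import math


def nakagami_outage_probability(m: int, avg_snr: float, snr_threshold: float) -> float:
    """P(SNR < threshold) for Nakagami-m fading with integer shape m >= 1.

    The SNR is gamma distributed with shape m and mean avg_snr, so for
    integer m the CDF is 1 - exp(-x) * sum_{k<m} x^k / k!, where
    x = m * threshold / avg_snr. SNRs are in linear scale, not dB.
    """
    if m < 1 or avg_snr <= 0 or snr_threshold < 0:
        raise ValueError("need m >= 1, avg_snr > 0, snr_threshold >= 0")
    x = m * snr_threshold / avg_snr
    partial_sum = sum(x ** k / math.factorial(k) for k in range(m))
    return 1.0 - math.exp(-x) * partial_sum


if __name__ == "__main__":
    # m = 1 reduces to Rayleigh fading: 1 - exp(-threshold / avg_snr).
    print(nakagami_outage_probability(1, avg_snr=10.0, snr_threshold=1.0))
    print(nakagami_outage_probability(3, avg_snr=10.0, snr_threshold=1.0))
```

With the average SNR well above the threshold, a larger m (less severe fading) gives a lower outage probability, which is the kind of trend an outage‑minimizing allocation would exploit.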
Authors: Alessandro Nazzari, Roberto Rubinacci, Marco Lovera
Abstract: When a single pilot is responsible for managing a multi‑drone system, the task may demand varying levels of autonomy, from direct control of individual UAVs, to group‑level coordination, to fully autonomous swarm behaviors for accomplishing high‑level tasks. Enabling such flexible interaction requires a framework that supports multiple modes of shared autonomy. As language models continue to improve in reasoning and planning, they provide a natural foundation for such systems, reducing pilot workload by enabling high‑level task delegation through intuitive, language‑based interfaces. In this paper, we present TACOS (Task‑Agnostic COordinator of a multi‑drone System), a unified framework that enables high‑level natural language control of multi‑UAV systems through Large Language Models (LLMs). TACOS integrates three key capabilities into a single architecture: a one‑to‑many natural language interface for intuitive user interaction, an intelligent coordinator for translating user intent into structured task plans, and an autonomous agent that executes plans while interacting with the real world. TACOS allows an LLM to interact with a library of executable APIs, bridging semantic reasoning with real‑time multi‑robot coordination. We demonstrate the system on a real‑world multi‑drone system and conduct an ablation study to assess the contribution of each module.
Authors: Shaba Shaon, Dinh C. Nguyen
Abstract: This paper investigates federated multimodal learning (FML) assisted by unmanned aerial vehicles (UAVs) with a focus on minimizing system latency and providing convergence analysis. In this framework, UAVs are distributed throughout the network to collect data, participate in model training, and collaborate with a base station (BS) to build a global model. By utilizing multimodal sensing, the UAVs overcome the limitations of unimodal systems, enhancing model accuracy, generalization, and offering a more comprehensive understanding of the environment. The primary objective is to optimize FML system latency in UAV networks by jointly addressing UAV sensing scheduling, power control, trajectory planning, resource allocation, and BS resource management. To address the computational complexity of our latency minimization problem, we propose an efficient iterative optimization algorithm combining block coordinate descent and successive convex approximation techniques, which provides high‑quality approximate solutions. We also present a theoretical convergence analysis for the UAV‑assisted FML framework under a non‑convex loss function. Numerical experiments demonstrate that our FML framework outperforms existing approaches in terms of system latency and model training performance under different data settings.
Authors: Alex Paul Hoffmann, Matthew G. Finley, Eftyhia Zesta, Mark B. Moldwin, Lauro V. Ojeda
Abstract: Landmines have been extensively used in conflict zones as an indiscriminate weapon to control military movements, often remaining active long after hostilities have ended. Their presence poses a persistent danger to civilians, hindering post‑war recovery efforts, causing injuries or death, and restricting access to essential land for agriculture and infrastructure. Unmanned aerial vehicles (UAVs) equipped with magnetometers are commonly used to detect remnant hidden landmines but come with significant technical challenges due to magnetic field interference from UAV electronics such as motors. We propose the use of a frame‑mounted UAV‑borne two‑magnetometer payload to perform a two‑step automated interference removal and landmine detection analysis. The first step removes interference via the Wavelet‑Adaptive Interference Cancellation for Underdetermined Platform (WAIC‑UP) method designed for spaceflight magnetometers. The second step uses the Rapid Unsupervised Detection of Events (RUDE) algorithm to detect landmine signatures. This two‑step WAIC‑UP/RUDE approach with multiple magnetometers achieves high‑fidelity ordnance detection at a low computational cost and simplifies the design of magnetic survey payloads. We validate the method through a Monte Carlo simulation of randomized landmine placements in a 10 x 10 m square grid and drone motor interference. Additionally, we assess the efficacy of the algorithm by varying the drone's altitude, examining its performance at different heights above the ground.
Authors: Michal Werner, David Čapek, Tomáš Musil, Ondřej Franěk, Tomáš Báča, Martin Saska
Abstract: Reliable long‑range flight of unmanned aerial vehicles (UAVs) in GNSS‑denied environments is challenging: integrating odometry leads to drift, loop closures are unavailable in previously unseen areas and embedded platforms provide limited computational power. We present a fully onboard UAV system developed for the SPRIN‑D Funke Fully Autonomous Flight Challenge, which required 9 km long‑range waypoint navigation below 25 m AGL (Above Ground Level) without GNSS or prior dense mapping. The system integrates perception, mapping, planning, and control with a lightweight drift‑correction method that matches LiDAR‑derived local heightmaps to a prior geo‑data heightmap via gradient‑template matching and fuses the evidence with odometry in a clustered particle filter. Deployed during the competition, the system executed kilometer‑scale flights across urban, forest, and open‑field terrain and reduced drift substantially relative to raw odometry, while running in real time on CPU‑only hardware. We describe the system architecture, the localization pipeline, and the competition evaluation, and we report practical insights from field deployment that inform the design of GNSS‑denied UAV autonomy.
Authors: Junfeng Cai, Marco Lovera
Abstract: Dual‑system UAVs with vertical take‑off and landing capabilities have become increasingly popular in recent years. As a safety‑critical system, it is important that a dual‑system UAV can maintain safe flight after faults/failures occur. This paper proposes a gain‑scheduled passive fault‑tolerant control (PFTC) method for the transition flight of dual‑system UAVs. In this novel FTC design method, the model uncertainties arising from the loss of control effectiveness caused by actuator faults/failures, for the first time, are treated as model input uncertainty, allowing us to use multiplicative uncertainty descriptions to represent it. The advantages of the proposed method consist in significantly reducing the number of design points, thereby simplifying the control synthesis process and improving the efficiency of designing the FTC system for dual‑system UAV transition flight compared with the existing FTC design methods. As a general method, it can be applied to the design of FTC systems with multiple uncertain parameters and multiple channels. The developed passive FTC system is validated on a nonlinear six‑degree‑of‑freedom simulator. The simulation results demonstrate that the gain‑scheduled structured H infinity (GS SHIF) PFTC system provides superior fault tolerance performance compared with the LQR and structured H infinity control systems, thereby showcasing the effectiveness and the advantages of the proposed GS SHIF PFTC approach.
Authors: Ian Reid, Joseph Ritchie, Jacob Moore, Brandon Sutherland, Gabe Snow, Phillip Tokumaru, Tim McLain
Abstract: Unmanned aerial vehicle (UAV) research requires the integration of cutting‑edge technology into existing autopilot frameworks. This process can be arduous, requiring extensive resources, time, and detailed knowledge of the existing system. ROSplane is a lean, open‑source fixed‑wing autonomy stack built by researchers for researchers. It is designed to accelerate research by providing clearly defined interfaces with an easily modifiable framework. Built around ROS 2, ROSplane allows for rapid integration of low or high‑level control, path planning, or estimation algorithms. A focus on lean, easily‑understood code and extensive documentation lowers the barrier to entry for researchers. Recent developments to ROSplane improve its capacity to accelerate UAV research, including the transition from ROS 1 to ROS 2, enhanced estimation and control algorithms, increased modularity, and an improved aerodynamic modeling pipeline. This aerodynamic modeling pipeline significantly reduces the effort of transitioning from simulation to real‑world testing without requiring costly system identification or computational fluid dynamics tools. ROSplane's architecture reduces the effort required to integrate new research tools and methods, expediting hardware experimentation.
Authors: Jacob Moore, Phil Tokumaru, Ian Reid, Brandon Sutherland, Joseph Ritchie, Gabe Snow, Tim McLain
Abstract: ROSflight is a lean, open‑source autopilot ecosystem for unmanned aerial vehicles (UAVs). Designed by researchers for researchers, it is built to lower the barrier to entry to UAV research and accelerate the transition from simulation to hardware experiments by maintaining a lean (not full‑featured), well‑documented, and modular codebase. This publication builds on previous treatments and describes significant additions to the architecture that improve the modularity and usability of ROSflight, including the transition from ROS 1 to ROS 2, supported hardware, low‑level actuator mixing, and the simulation environment. We believe that these changes improve the usability of ROSflight and enable ROSflight to accelerate research in areas like advanced‑air mobility. Hardware results are provided, showing that ROSflight is able to control a multirotor over a serial connection at 400 Hz while closing all control loops on the companion computer.
Authors: Chengzhen Li, Likun Zhang, Chuang Zhang, Jiahui Li, Changyuan Zhao, Ruichen Zhang, Geng Sun
Abstract: Low‑altitude uncrewed aerial vehicles (UAVs) have become integral enablers for the Internet of Things (IoT) by offering enhanced coverage, improved connectivity, and access to remote areas. A critical challenge limiting their operational capacity lies in the energy constraints of both aerial platforms and ground‑based sensors. This paper explores WLPT as a transformative solution for sustainable energy provisioning in UAV‑assisted IoT networks. We first systematically investigate the fundamental principles of WLPT and analyze its comparative advantages. Then, we introduce three operational paradigms for system integration, identify key challenges, and discuss corresponding potential solutions. In a case study, we propose a multi‑agent reinforcement learning framework to address the coordination and optimization challenges in WLPT‑enabled UAV‑assisted IoT data collection. Simulation results demonstrate that our framework significantly improves energy sustainability and data freshness. Finally, we discuss some future directions.
Authors: Tanay Kumar, Raktim Bhattacharya
Abstract: Attitude stabilization of unmanned aerial vehicles (UAVs) in uncertain environments presents significant challenges due to nonlinear dynamics, parameter variations, and sensor limitations. This paper presents a comparative study of \mathcal{H}_\infty and classical PID controllers for multi‑rotor attitude regulation in the presence of wind disturbances and gyroscope noise. The flight dynamics are modeled using a linear parameter‑varying (LPV) framework, where nonlinearities and parameter variations are systematically represented as structured uncertainties within a linear fractional transformation formulation. A robust controller based on \mathcal{H}_\infty formulation is designed using only gyroscope measurements to ensure guaranteed performance bounds. Nonlinear simulation results demonstrate the effectiveness of the robust controllers compared to classical PID control, showing significant improvement in attitude regulation under severe wind disturbances.
Authors: Andrés Martínez-Silva, David Alejo, Luis Merino, Fernando Caballero
Abstract: Radio‑based methods such as Ultra‑Wideband (UWB) and RAdio Detection And Ranging (radar), which have traditionally seen limited adoption in robotics, are experiencing a boost in popularity thanks to their robustness to harsh environmental conditions and cluttered environments. This work proposes a multi‑robot UGV‑UAV localization system that leverages the two technologies with inexpensive and readily‑available sensors, such as Inertial Measurement Units (IMUs) and wheel encoders, to estimate the relative position of an aerial robot with respect to a ground robot. The first stage of the system pipeline includes a nonlinear optimization framework to trilaterate the location of the aerial platform based on UWB range data, and a radar pre‑processing module with loosely coupled ego‑motion estimation which has been adapted for a multi‑robot scenario. Then, the pre‑processed radar data as well as the relative transformation are fed to a pose‑graph optimization framework with odometry and inter‑robot constraints. The system, implemented for the Robot Operating System (ROS 2) with the Ceres optimizer, has been validated in Software‑in‑the‑Loop (SITL) simulations and on a real‑world dataset. The proposed relative localization module outperforms state‑of‑the‑art closed‑form methods, which are less robust to noise. Our SITL environment includes a custom Gazebo plugin for generating realistic UWB measurements modeled after real data. Conveniently, the proposed factor graph formulation makes the system readily extensible to full Simultaneous Localization And Mapping (SLAM). Finally, all the code and experimental data are publicly available to support reproducibility and to serve as a common open dataset for benchmarking.
Authors: Shuoyu Yue, Pengpeng Li, Yang Xu, Kunrui Ze, Xingjian Long, Huazi Cao, Guibin Sun
Abstract: Mean‑shift‑based approaches have recently emerged as a representative class of methods for robot swarm shape assembly. They rely on image‑based target‑shape representations to compute local density gradients and perform mean‑shift exploration, which constitute their core mechanism. However, such representations incur substantial memory overhead, especially for high‑resolution or 3D shapes. To address this limitation, we propose a memory‑efficient tree representation that hierarchically encodes user‑specified shapes in both 2D and 3D. Based on this representation, we design a behavior‑based distributed controller for assignment‑free shape assembly. Comparative 2D and 3D simulations against a state‑of‑the‑art mean‑shift algorithm show one to two orders of magnitude lower memory usage and two to four times faster shape entry. Physical experiments with 6 to 7 UAVs further validate real‑world practicality.
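The memory argument in the abstract above can be illustrated with a generic quadtree over a binary occupancy image, where any uniform block collapses into a single leaf. This is an assumption made purely for illustration: the abstract does not say which tree structure the authors use, and the helper names and the 8 x 8 toy shape are hypothetical.

```python
def build_quadtree(grid, x=0, y=0, size=None):
    """Encode a square binary grid (side a power of two) as a nested list.

    A leaf is the int 0 or 1; an internal node is a list of four children
    in the order [top-left, top-right, bottom-left, bottom-right].
    """
    if size is None:
        size = len(grid)
    first = grid[y][x]
    if all(grid[y + dy][x + dx] == first for dy in range(size) for dx in range(size)):
        return first  # uniform block: store a single value
    half = size // 2
    return [
        build_quadtree(grid, x, y, half),
        build_quadtree(grid, x + half, y, half),
        build_quadtree(grid, x, y + half, half),
        build_quadtree(grid, x + half, y + half, half),
    ]


def count_nodes(node):
    """Total number of nodes (internal nodes plus leaves) in the tree."""
    if isinstance(node, list):
        return 1 + sum(count_nodes(child) for child in node)
    return 1


def contains(node, px, py, size):
    """Return True if cell (px, py) lies inside the encoded shape."""
    x = y = 0
    while isinstance(node, list):
        size //= 2
        right = px >= x + size
        bottom = py >= y + size
        x += size if right else 0
        y += size if bottom else 0
        node = node[(1 if right else 0) + (2 if bottom else 0)]
    return node == 1


if __name__ == "__main__":
    # Toy shape: the left half of an 8 x 8 image is filled.
    image = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(8)]
    tree = build_quadtree(image)
    print(count_nodes(tree), "tree nodes versus", 8 * 8, "image cells")
```

The saving grows with resolution for shapes made of large uniform regions, and the same idea extends to 3D with an octree (eight children per node).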
Authors: Houyi Qi, Minghui Liwang, Liqun Fu, Sai Zou, Xinlei Yi, Wei Ni, Huaiyu Dai
Abstract: Incentive‑driven resource trading is essential for UAV applications with intensive, time‑sensitive computing demands. Traditional spot trading suffers from negotiation delays and high energy costs, while conventional futures trading struggles to adapt to the dynamic, uncertain UAV‑edge environment. To address these challenges, we propose PAST (pilot‑and‑adaptive stable trading), a novel framework for edge‑assisted UAV networks with spatio‑temporal dynamism. PAST integrates two complementary mechanisms: PilotAO (pilot trading agreements with overbooking), a risk‑aware, overbooking‑enabled early‑stage decision‑making module that establishes long‑term, mutually beneficial agreements and boosts resource utilization; and AdaptAO (adaptive trading agreements with overbooking rate update), an intelligent adaptation module that dynamically updates agreements and overbooking rates based on UAV mobility, supply‑demand variations, and agreement performance. Together, these mechanisms enable both stability and flexibility, guaranteeing individual rationality, strong stability, competitive equilibrium, and weak Pareto optimality. Extensive experiments on real‑world datasets show that PAST consistently outperforms benchmark methods in decision‑making overhead, task completion latency, resource utilization, and social welfare. By combining predictive planning with real‑time adjustments, PAST offers a valuable reference on robust and adaptive practice for improving low‑altitude mission performance.
Authors: Suyu Lv, Meng Li, Qi Li, Yuanwei Liu
Abstract: A pinching‑antenna systems (PASS)‑enabled unmanned aerial vehicle (UAV) delivery framework is proposed, which exploits the capability of PASS to establish a strong line‑of‑sight link and reduce free‑space pathloss. Aiming at minimizing the communication energy consumption in one cycle, a double‑layer optimization (DLO) algorithm is developed by jointly optimizing the UAV delivery sequence and the pinching antenna (PA) activation vector. More specifically, at the outer layer, a hierarchical alternating optimization (HAO) scheme is proposed to tackle the NP‑hard problem of delivery sequence planning, where a genetic algorithm performs global exploration to generate candidate solutions at the top level, while dynamic programming performs local refinement to obtain elite solutions at the lower level. With the UAV trajectory determined, at the inner layer, focus is placed on addressing the highly coupled mixed‑integer nonlinear programming problem of PA activation vector optimization, where a pair of algorithms is proposed: 1) a Branch‑and‑Bound (BnB) algorithm for finding the global optimum; 2) an incremental search and local refinement (ISLR) algorithm for reducing computational complexity. Simulation results indicate that: i) The proposed HAO‑based delivery sequence planning scheme can effectively reduce the total flight distance, thereby decreasing flight time and communication energy consumption; ii) Both the proposed BnB and ISLR algorithms can achieve energy‑efficient PA activation, with the former exhibiting better performance and the latter having lower complexity; iii) PASS outperforms the conventional multi‑antenna systems, especially with higher communication rate requirements.
Authors: Xianyang Deng, Wenshuai Liu, Yaru Fu, Qi Zhu
Abstract: Unmanned aerial vehicle (UAV)‑assisted mobile crowdsensing (MCS) has emerged as a promising paradigm for data collection. However, challenges such as spectrum scarcity, device heterogeneity, and user mobility hinder efficient coordination of sensing, communication, and computation. To tackle these issues, we propose a joint optimization framework that integrates time slot partition for sensing, communication, and computation phases, resource allocation, and UAV 3D trajectory planning, aiming to maximize the amount of processed sensing data. The problem is formulated as a non‑convex stochastic optimization problem and further modeled as a partially observable Markov decision process (POMDP) that can be solved by a multi‑agent deep reinforcement learning (MADRL) algorithm. To overcome the limitations of conventional multi‑layer perceptron (MLP) networks, we design a novel MADRL algorithm with a hybrid actor network. The newly developed method is based on heterogeneous agent proximal policy optimization (HAPPO), empowered by convolutional neural networks (CNN) for feature extraction and Kolmogorov‑Arnold networks (KAN) to capture structured state‑action dependencies. Extensive numerical results demonstrate that our proposed method achieves significant improvements in the amount of processed sensing data when compared with other benchmarks.
Authors: Pablo Pueyo, Fernando Caballero, Ana Cristina Murillo, Eduardo Montijano
Abstract: Drones, or unmanned aerial vehicles (UAVs), have become powerful tools across domains, from industry to the arts. In documentary filmmaking, they offer dynamic, otherwise unreachable perspectives, transforming how stories are told. Wildlife documentaries especially benefit, yet drones also raise ethical concerns: the risk of disturbing the animals they aim to capture. This paper introduces CineWild, an autonomous UAV framework that combines robotics, cinematography, and ethics. Built on model predictive control, CineWild dynamically adjusts flight paths and camera settings to balance cinematic quality with animal welfare. Key features include adaptive zoom for filming from acoustic and visual safe distances, path‑planning that avoids an animal's field of view, and smooth, low‑noise maneuvers. CineWild exemplifies interdisciplinary innovation, bridging engineering, visual storytelling, and environmental ethics. We validate the system through simulation studies and will release the code upon acceptance.
Authors: Qingyang Wang, Zhuohui Yao, Wenchi Cheng, Xiao Zheng
Abstract: This paper proposes a rate‑splitting multiple access (RSMA) transmission scheme to maximize the minimum achievable rate among ground users for emergency communications in post‑disaster scenarios with obstacles, with which the optimal positioning of multiple unmanned aerial vehicle (UAV)‑enabled base stations can be achieved in a timely manner. To address the resulting non‑convex and intractable optimization problem, we design an alternating optimization approach. Specifically, we relax obstacle‑related constraints using penalty terms. In each iteration, block coordinate descent (BCD) and successive convex approximation (SCA) are applied alternately to obtain locally optimal solutions, and penalty multipliers are updated to ensure convergence of the relaxed problem to the original one. Simulation results demonstrate that the proposed scheme significantly outperforms benchmark methods in terms of the minimum achievable rate, verifying its effectiveness and superiority.
Authors: Tianjiao Sun, Ningyan Guo, Haozhe Gu, Yanyan Peng, Zhiyong Feng
Abstract: The deployment of unmanned aerial vehicle (UAV) swarm‑assisted communication networks has become an increasingly vital approach for remediating coverage limitations in infrastructure‑deficient environments, with especially pressing applications in temporary scenarios, such as emergency rescue, military and security operations, and remote area coverage. However, complex geographic environments lead to unpredictable and highly dynamic wireless channel conditions, resulting in frequent interruptions of air‑to‑ground (A2G) links that severely constrain the reliability and quality of service in UAV swarm‑assisted mobile communications. To improve the quality of UAV swarm‑assisted communications in complex geographic environments, we propose an integrated communication and control co‑design mechanism. Given the stringent energy constraints inherent in UAV swarms, our proposed mechanism is designed to optimize energy efficiency while maintaining an equilibrium between equitable communication rates for mobile ground users (GUs) and UAV energy expenditure. We formulate the joint resource allocation and 3D trajectory control problem as a Markov decision process (MDP), and develop a multi‑agent reinforcement learning (MARL) framework to enable real‑time coordinated actions across the UAV swarm. To optimize the action policy of UAV swarms, we propose a novel multi‑agent hybrid proximal policy optimization with action masking (MAHPPO‑AM) algorithm, specifically designed to handle complex hybrid action spaces. The algorithm incorporates action masking to enforce hard constraints in high‑dimensional action spaces. Experimental results demonstrate that our approach achieves a fairness index of 0.99 while reducing energy consumption by up to 25% compared to baseline methods.
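Action masking, as mentioned in the abstract above, is commonly implemented by setting the logits of infeasible actions to negative infinity before the softmax, so that those actions receive zero probability. The following is a minimal NumPy sketch of that general idea for a single discrete action head. The function name `masked_softmax`, the example logits, and the example mask are illustrative assumptions and are not taken from the MAHPPO‑AM paper.

```python
import numpy as np

def masked_softmax(logits, mask):
    """Return action probabilities with disallowed actions forced to zero.

    logits: 1-D sequence of raw policy scores (hypothetical values).
    mask:   1-D boolean sequence, True where the action is allowed.
    """
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be allowed")
    # Disallowed actions get -inf, which exp() maps to exactly 0.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest allowed logit for numerical stability.
    shifted = masked - masked[mask].max()
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical example: the last two actions violate a hard constraint.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], [True, True, False, False])
```

Because the mask is applied before normalization, the remaining probabilities still sum to one, and a sampled action can never violate the masked constraint.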
Authors: Zhouyu Qu, Andreas Willig, Xiaobing Wu
Abstract: Unmanned Aerial Vehicles (UAVs), commonly known as drones, have experienced expanding use in urban environments in recent years. However, the growing density of drones raises significant challenges, such as avoiding collisions and managing air traffic efficiently, especially in congested areas. To address these issues, a structured road system and an effective guidance algorithm are essential. In this paper, we introduce a markup language for describing drone road systems (DRS), in which a road system is given by a set of individual roads, each of which can have a varying number of lanes. Roads can be linked through connecting lanes. Furthermore, we propose a novel short‑term decentralized greedy (STDG) guidance algorithm that uses only the position and speed information of nearby drones (communicated via periodically transmitted beacons) to make real‑time decisions such as stopping, changing lanes, or adjusting speed for the next few seconds. Unlike existing methods that rely on centralized coordination, our algorithm enables drones to operate independently while ensuring safety and efficiency. We present simulation results showing the impact of key wireless and algorithm parameters on performance metrics such as the drone collision rate, average speed, and throughput of the drone road system.
Authors: Runze Dong, Buhong Wang, Cunqian Feng, Jiang Weng, Chen Han, Jiwei Tian
Abstract: Integrated sensing and communication (ISAC) emerges as a key enabler for next‑generation applications such as smart cities and autonomous systems. Its integration with unmanned aerial vehicles (UAVs) unlocks new potential for reliable communication and precise sensing in dynamic aerial environments. However, existing research predominantly treats UAVs as aerial base stations, overlooking their role as ISAC users, and fails to leverage large‑scale antenna arrays at terrestrial base stations to enhance security and spectral efficiency. This paper proposes a secure and spectrally efficient ISAC framework for multi‑UAV networks, and a two‑stage optimization approach is developed to jointly design hybrid beamforming (HBF), artificial noise (AN) injection, and UAV trajectories. Aiming at maximizing the sum secrecy rate, the first stage employs Proximal Policy Optimization (PPO) to optimize digital beamformers and trajectories, and the second stage decomposes the digital solution into analog and digital components via low‑complexity matrix factorization. Simulation results demonstrate the effectiveness of the proposed framework compared to benchmark schemes.
Authors: Aubida A. Al-Hameed, Mohammed M. H. Qazzaz, Maryam Hafeez, Syed A. Zaidi
Abstract: 6G wireless networks aim to exploit semantic awareness to optimize radio resources. By optimizing the transmission through the lens of the desired goal, the energy consumption of transmissions can also be reduced, and the latency can be improved. To that end, this paper investigates a paradigm in which the capabilities of generative AI (GenAI) on the edge are harnessed for network optimization. In particular, we investigate an Unmanned Aerial Vehicle (UAV) handover framework that takes advantage of GenAI and semantic communication to maintain reliable connectivity. Specifically, we propose a framework in which a lightweight MobileBERT language model, fine‑tuned using Low‑Rank Adaptation (LoRA), is deployed on the UAV. This model processes multi‑attribute flight and radio measurements and performs multi‑label classification to determine the appropriate handover action. Concurrently, the model identifies an appropriate set of contextual "Reason Tags" that elucidate the decision's rationale. Evaluated on a rule‑based synthetic dataset of UAV handover scenarios, the model learns these rules effectively, achieving high accuracy in predicting the primary handover decision. The model also shows strong performance in identifying supporting reasons, with an F1 micro‑score of approximately 0.9 for reason tags.
Authors: Chih Yao Hu, Yang-Sen Lin, Yuna Lee, Chih-Hai Su, Jie-Ying Lee, Shr-Ruei Tsai, Chin-Yang Lin, Kuan-Wen Chen, Tsung-Wei Ke, Yu-Lun Liu
Abstract: We present See, Point, Fly (SPF), a training‑free aerial vision‑and‑language navigation (AVLN) framework built atop vision‑language models (VLMs). SPF is capable of navigating to any goal based on any type of free‑form instructions in any kind of environment. In contrast to existing VLM‑based approaches that treat action prediction as a text generation task, our key insight is to consider action prediction for AVLN as a 2D spatial grounding task. SPF harnesses VLMs to decompose vague language instructions into iterative annotation of 2D waypoints on the input image. Along with the predicted traveling distance, SPF transforms predicted 2D waypoints into 3D displacement vectors as action commands for UAVs. Moreover, SPF also adaptively adjusts the traveling distance to facilitate more efficient navigation. Notably, SPF performs navigation in a closed‑loop control manner, enabling UAVs to follow dynamic targets in dynamic environments. SPF sets a new state of the art on the DRL simulation benchmark, outperforming the previous best method by an absolute margin of 63%. In extensive real‑world evaluations, SPF outperforms strong baselines by a large margin. We also conduct comprehensive ablation studies to highlight the effectiveness of our design choices. Lastly, SPF shows remarkable generalization to different VLMs. Project page: https://spf‑web.pages.dev
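The abstract above does not spell out how a 2D waypoint and a predicted traveling distance become a 3D displacement vector. One plausible way to sketch it, assuming a standard pinhole camera model, is to cast a ray through the waypoint pixel and scale it to the predicted distance. Everything in this example is an assumption for illustration: the function name `waypoint_to_displacement`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the camera‑frame convention (x right, y down, z forward).

```python
import math

def waypoint_to_displacement(u, v, distance, fx, fy, cx, cy):
    """Map a pixel waypoint (u, v) and a travel distance to a 3D displacement.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy); the result is expressed in the camera frame.
    """
    # Direction of the ray through the pixel, at unit depth.
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = 1.0
    # Rescale so the displacement has exactly the requested length.
    scale = distance / math.sqrt(x * x + y * y + z * z)
    return (x * scale, y * scale, z * scale)

# A waypoint at the image center maps to pure forward motion.
center = waypoint_to_displacement(320, 240, 2.0, 500.0, 500.0, 320.0, 240.0)
# A waypoint right of center adds a positive x (rightward) component.
right = waypoint_to_displacement(420, 240, 2.0, 500.0, 500.0, 320.0, 240.0)
```

In a real system the camera‑frame vector would still need to be rotated into the UAV body or world frame, which is omitted here.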
Authors: Xuhui Zhang, Wenchao Liu, Chunjie Wang, Jinke Ren, Huijun Xing, Shuqiang Wang, Yanyan Shen
Abstract: The fluid antenna system (FAS) is emerging as a key technology for enhancing spatial flexibility and sensing accuracy in future wireless systems. This paper investigates an unmanned aerial vehicle (UAV)‑enabled FAS for multi‑target wireless sensing in low‑altitude wireless consumer networks (LAWCNs) to support low‑altitude economy (LAE) missions. We formulate an optimization problem aimed at minimizing the average Cramér‑Rao bound (CRB) for multiple target estimations. To tackle this non‑convex problem, an efficient alternating optimization (AO) algorithm is proposed, which jointly optimizes the UAV trajectory, the antenna positions of the transmit fluid antennas (FAs) and the receive FAs, and the transmit beamforming at the UAV. Simulation results demonstrate significant performance improvements in estimation accuracy and sensing reliability compared to conventional schemes, e.g., the fixed position antenna scheme. The proposed system achieves enhanced sensing performance through adaptive trajectory design and beamforming, alongside effective interference suppression via flexible FAS antenna repositioning, underscoring its practical potential for precision sensing in UAV‑enabled LAWCNs.
Authors: Defan Chen, Yaohua Hu, Luchan Zhang
Abstract: Real‑time detection of small objects in complex scenes, such as imagery captured by unmanned aerial vehicles (UAVs), faces the dual challenges of detecting small targets (<32 pixels) and maintaining real‑time efficiency on resource‑constrained platforms. While YOLO‑series detectors have achieved remarkable success in real‑time large object detection, they suffer from significantly higher false negative rates for drone‑based detection where small objects dominate, compared to large object scenarios. This paper proposes HierLight‑YOLO, a hierarchical feature fusion and lightweight model that enhances the real‑time detection of small objects, based on the YOLOv8 architecture. We propose the Hierarchical Extended Path Aggregation Network (HEPAN), a multi‑scale feature fusion method through hierarchical cross‑level connections, enhancing the small object detection accuracy. HierLight‑YOLO includes two innovative lightweight modules: the Inverted Residual Depthwise Convolution Block (IRDCB) and the Lightweight Downsample (LDown) module, which significantly reduce the model's parameters and computational complexity without sacrificing detection capabilities. A small object detection head is designed to further enhance spatial resolution and feature fusion to tackle tiny object (4 pixels) detection. Comparison experiments and ablation studies on the VisDrone2019 benchmark demonstrate state‑of‑the‑art performance of HierLight‑YOLO.
Authors: Boying Li, Chang Liu, Petter Kyösti, Mattias Öhman, Devashish Singha Roy, Sofia Plazzi, Hamam Mokayed, Olle Hagner
Abstract: Aside from common challenges in remote sensing like small, sparse targets and computation cost limitations, detecting vehicles from UAV images in the Nordic regions faces strong visibility challenges and domain shifts caused by diverse levels of snow coverage. Although annotated data are expensive, unannotated data are cheaper to obtain by simply flying the drones. In this work, we propose a sideload‑CL‑adaptation framework that enables the use of unannotated data to improve vehicle detection using lightweight models. Specifically, we propose to train a CNN‑based representation extractor through contrastive learning on the unannotated data in the pretraining stage, and then sideload it to a frozen YOLO11n backbone in the fine‑tuning stage. To find a robust sideload‑CL‑adaptation, we conduct extensive experiments to compare various fusion methods and granularity. Our proposed sideload‑CL‑adaptation model improves the detection performance by 3.8% to 9.5% in terms of mAP50 on the NVD dataset.
Authors: Md. Mahfuzur Rahman, Kishor Datta Gupta, Marufa Kamal, Fahad Rahman, Sunzida Siddique, Ahmed Rafi Hasan, Mohd Ariful Haque, Roy George
Abstract: General‑purpose vision‑language models (VLMs) such as LLaVA and QwenVL produce descriptions of disaster imagery that lack domain‑specific vocabulary and actionable detail. We propose the Vision‑Language Caption Enhancer (VLCE), a framework that integrates external semantic knowledge from ConceptNet and WordNet into the caption generation process for post‑disaster satellite and UAV imagery. VLCE operates in two stages: first, a baseline VLM generates an initial caption conditioned on YOLOv8 object detections; second, a knowledge‑enriched sequential model, a CNN‑LSTM or a hierarchical cross‑modal Transformer, refines the caption using a vocabulary augmented with 1,566 domain‑relevant terms extracted from knowledge graphs. We evaluate VLCE on two disaster benchmarks: xBD (satellite, 6,369 images, 3 damage classes) and RescueNet (UAV, 4,494 images, 12 damage classes), using CLIPScore for semantic alignment and InfoMetIC for informativeness. On RescueNet with the Transformer decoder, VLCE with knowledge graph enrichment produces captions preferred over QwenVL baselines in 95.33% of image pairs on InfoMetIC and 73.64% on CLIPScore. Qualitative analysis shows that without knowledge graph integration, generated captions exhibit hallucinations, word repetition, and semantic incoherence, whereas knowledge‑enriched captions maintain factual consistency and domain‑appropriate vocabulary.
Authors: Haozhe Xu, Cheng Cheng, Hongrui Sang, Zhipeng Wang, Qiyong He, Xiuxian Li, Bin He
Abstract: Autonomous docking between Unmanned Aerial Vehicles (UAVs) and ground robots is essential for heterogeneous systems, yet most existing approaches target wheeled platforms whose limited mobility constrains exploration in complex terrains. Quadruped robots offer superior adaptability but undergo frequent posture variations, making it difficult to provide a stable landing surface for UAVs. To address these challenges, we propose an autonomous UAV‑quadruped docking framework for GPS‑denied environments. On the quadruped side, a Hybrid Internal Model with Horizontal Alignment (HIM‑HA), learned via deep reinforcement learning, actively stabilizes the torso to provide a level platform. On the UAV side, a three‑phase strategy is adopted, consisting of long‑range acquisition with a median‑filtered YOLOv8 detector, close‑range tracking with a constraint‑aware controller that integrates a Nonsingular Fast Terminal Sliding Mode Controller (NFTSMC) and a logarithmic Barrier Function (BF) to guarantee finite‑time error convergence under field‑of‑view (FOV) constraints, and terminal descent guided by a Safety Period (SP) mechanism that jointly verifies tracking accuracy and platform stability. The proposed framework is validated in both simulation and real‑world scenarios, successfully achieving docking on outdoor staircases higher than 17 cm and rough slopes steeper than 30 degrees. Supplementary materials and videos are available at: https://uav‑quadruped‑docking.github.io.
Authors: Xiaofan Yu, Yuwei Wu, Katherine Mao, Ye Tian, Vijay Kumar, Tajana Rosing
Abstract: Multi‑robot target tracking is a fundamental problem that requires coordinated monitoring of dynamic entities in applications such as precision agriculture, environmental monitoring, disaster response, and security surveillance. While Federated Learning (FL) has the potential to enhance learning across multiple robots without centralized data aggregation, its use in multi‑Unmanned Aerial Vehicle (UAV) target tracking remains largely underexplored. Key challenges include limited onboard computational resources, significant data heterogeneity in FL due to varying targets and the fields of view, and the need for tight coupling between trajectory prediction and multi‑robot planning. In this paper, we introduce DroneFL, the first federated learning framework specifically designed for efficient multi‑UAV target tracking. We design a lightweight local model to predict target trajectories from sensor inputs, using a frozen YOLO backbone and a shallow transformer for efficient onboard training. The updated models are periodically aggregated in the cloud for global knowledge sharing. To alleviate the data heterogeneity that hinders FL convergence, DroneFL introduces a position‑invariant model architecture with altitude‑based adaptive instance normalization. Finally, we fuse predictions from multiple UAVs in the cloud and generate optimal trajectories that balance target prediction accuracy and overall tracking performance. Our results show that DroneFL reduces prediction error by 6%‑83% and tracking distance by 0.4%‑4.6% compared to a distributed non‑FL framework. In terms of efficiency, DroneFL runs in real time on a Raspberry Pi 5 and has on average just 1.56 KBps data rate to the cloud.
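The abstract above says that updated local models are periodically aggregated in the cloud but does not state the aggregation rule. A common choice in federated learning is sample‑count‑weighted averaging of parameters (often called FedAvg), sketched below with NumPy. Treating this as DroneFL's rule is an assumption; the function name `federated_average`, the parameter name `"w"`, and the example values are all hypothetical.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-client parameters, weighted by local sample counts.

    client_weights: list of dicts mapping parameter name -> np.ndarray.
    client_sizes:   list of local training-set sizes, one per client.
    """
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    names = client_weights[0].keys()
    # Each client's contribution is proportional to its share of the data.
    return {
        name: sum(w[name] * (n / total)
                  for w, n in zip(client_weights, client_sizes))
        for name in names
    }

# Two hypothetical UAV clients; the second holds three times as much data.
uav_a = {"w": np.array([1.0, 2.0])}
uav_b = {"w": np.array([3.0, 6.0])}
global_model = federated_average([uav_a, uav_b], [1, 3])
```

Only the parameters that are actually trained on board would need to be exchanged; with a frozen backbone, as described above, that would exclude the backbone weights.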
Authors: Nyi Nyi Aung, Neil Muralles, Adrian Stein
Abstract: This work addresses object identification under known dynamics in unmanned aerial vehicle applications, where learning and classification are combined through a physics‑informed residual neural network. The proposed framework leverages physics‑informed learning for state mapping and state‑derivative prediction, while a softmax layer enables multi‑class confidence estimation. Quadcopter, fixed‑wing, and helicopter aerial vehicles are considered as case studies. The results demonstrate high classification accuracy with reduced training time, offering a promising solution for system identification problems in domains where the underlying dynamics are well understood.
Authors: Wenchao Liu, Xuhui Zhang, Jinke Ren, Weijie Yuan, Changsheng You, Shuangyang Li
Abstract: Unmanned aerial vehicle (UAV)‑enabled integrated sensing and communication (ISAC) is regarded as a key enabler for next‑generation wireless systems. However, conventional fixed‑position antennas limit the ability of UAVs to fully exploit their inherent potential. To overcome this limitation, we propose a UAV‑enabled ISAC framework equipped with fluid antennas (FAs), where the mobility of antenna elements introduces additional spatial degrees of freedom to simultaneously enhance communication and sensing performance. A multi‑objective optimization problem is formulated to maximize the communication rates of multiple users while minimizing the Cramér‑Rao bound (CRB) for the angle estimation of a single target. Because excessively frequent updates of FA positions may lead to response delay, a three‑timescale optimization framework is developed to jointly optimize transmit beamforming, FA positions, and UAV trajectory based on their characteristics. To handle the non‑convexity of the problem, an alternating optimization‑based algorithm is developed to obtain a sub‑optimal solution. Numerical results show that the proposed scheme significantly outperforms various benchmark schemes, validating the effectiveness of integrating FA technology into UAV‑enabled ISAC systems.
Authors: Xiaowei Wang, Di Wang, Ke Li, Yifeng Wang, Chengjian Wang, Libin Sun, Zhihong Wu, Yiming Zhang, Quan Wang
Abstract: Cross‑view geo‑localization (CVGL) aims to match images of the same location captured from drastically different viewpoints. Despite recent progress, existing methods still face two key challenges: (1) achieving robustness under severe appearance variations induced by diverse UAV orientations and fields of view, which hinders cross‑domain generalization, and (2) establishing reliable correspondences that capture both global scene‑level semantics and fine‑grained local details. In this paper, we propose EGS, a novel CVGL framework designed to enhance cross‑domain generalization. Specifically, we introduce an E(2)‑Steerable CNN encoder to extract stable and reliable features under rotation and viewpoint shifts. Furthermore, we construct a graph with a virtual super‑node that connects to all local nodes, enabling global semantics to be aggregated and redistributed to local regions, thereby enforcing global‑local consistency. Extensive experiments on the University‑1652 and SUES‑200 benchmarks demonstrate that EGS consistently achieves substantial performance gains and establishes a new state of the art in cross‑domain CVGL.
Authors: Kishor Datta Gupta, Mohd Ariful Haque, Marufa Kamal, Ahmed Rafi Hasan, Md. Mahfuzur Rahman, Roy George
Abstract: Traditional clustering techniques often rely solely on similarity in the input data, limiting their ability to capture structural or semantic constraints that are critical in many domains. We introduce the Domain Aware Rule Triggered Variational Autoencoder (DARTVAE), a rule guided multimodal clustering framework that incorporates domain specific constraints directly into the representation learning process. DARTVAE extends the VAE architecture by embedding explicit rules, semantic representations, and data driven features into a unified latent space, while enforcing constraint compliance through rule consistency and violation penalties in the loss function. Unlike conventional clustering methods that rely only on visual similarity or apply rules as post hoc filters, DARTVAE treats rules as first class learning signals. The rules are generated by LLMs, structured into knowledge graphs, and enforced through a loss function combining reconstruction, KL divergence, consistency, and violation penalties. Experiments on aircraft and automotive datasets demonstrate that rule guided clustering produces more operationally meaningful and interpretable clusters (for example, isolating UAVs, unifying stealth aircraft, or separating SUVs from sedans) while improving traditional clustering metrics. However, the framework faces challenges: LLM generated rules may hallucinate or conflict, excessive rules risk overfitting, and scaling to complex domains increases computational and consistency difficulties. By combining rule encodings with learned representations, DARTVAE achieves more meaningful and consistent clustering outcomes than purely data driven models, highlighting the utility of constraint guided multimodal clustering for complex, knowledge intensive settings.
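The abstract above names four loss terms (reconstruction, KL divergence, consistency, and violation penalties) without giving their exact form. A generic way to write such a combined objective is shown below. The weighting coefficients $\beta$, $\lambda_{\mathrm{c}}$, and $\lambda_{\mathrm{v}}$, and the additive form itself, are assumptions for illustration; only the first two terms follow the standard VAE objective.

```latex
\mathcal{L}
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[-\log p_\theta(x \mid z)\right]}_{\text{reconstruction}}
  + \beta \, \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{KL divergence}}
  + \lambda_{\mathrm{c}} \, \mathcal{L}_{\mathrm{consistency}}
  + \lambda_{\mathrm{v}} \, \mathcal{L}_{\mathrm{violation}}
```

Here $q_\phi(z \mid x)$ is the encoder, $p_\theta(x \mid z)$ the decoder, and $p(z)$ the latent prior. Setting $\lambda_{\mathrm{c}} = \lambda_{\mathrm{v}} = 0$ and $\beta = 1$ recovers the plain VAE loss.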
Authors: Zhouxiang Zhao, Ran Yi, Yihan Cang, Boyang Jin, Zhaohui Yang, Mingzhe Chen, Chongwen Huang, Zhaoyang Zhang
Abstract: This letter addresses the energy efficiency issue in unmanned aerial vehicle (UAV)‑assisted autonomous systems. We propose a framework for an agentic artificial intelligence (AI)‑powered low‑altitude semantic wireless network that intelligently orchestrates a sense‑communicate‑decide‑control workflow. A system‑wide energy consumption minimization problem is formulated to enhance mission endurance. This problem holistically optimizes key operational variables, including the UAV's location, the semantic compression ratio, the transmit power of the UAV and a mobile base station, and the binary decision for AI inference task offloading, under stringent latency and quality‑of‑service constraints. To tackle the formulated mixed‑integer non‑convex problem, we develop a low‑complexity algorithm which can obtain the globally optimal solution with a two‑dimensional search. Simulation results validate the effectiveness of our proposed design, demonstrating significant reductions in total energy consumption compared to conventional baseline approaches.
Authors: Wenwen Xie, Geng Sun, Jiahui Li, Jiacheng Wang, Yinqiu Liu, Dusit Niyato, Dong In Kim, Shiwen Mao
Abstract: Low‑altitude wireless networks (LAWNs) have become effective solutions for collecting data from low‑power Internet‑of‑Things devices (IoTDs) in remote areas with limited communication infrastructure. However, some outdoor IoTDs deployed in such areas face both energy constraints and poor channel quality, making it difficult to ensure timely data collection from these IoTDs in LAWNs. In this work, we investigate a reconfigurable intelligent surface (RIS)‑assisted uncrewed aerial vehicle (UAV)‑enabled data collection and wireless power transfer system in LAWNs. Specifically, IoTDs first harvest energy from a low‑altitude UAV, and then upload their data to the UAV by applying the time division multiple access (TDMA) protocol, supported by an RIS to improve the channel quality. To maintain satisfactory data freshness of the IoTDs and save energy for an energy‑constrained UAV, we aim to minimize the age of information (AoI) and energy consumption of the UAV by jointly optimizing the RIS phase shifts, UAV trajectory, charging time allocation, and binary IoTD scheduling. We propose a deep reinforcement learning (DRL)‑based approach, namely the alternating optimization‑improved parameterized deep Q‑network (AO‑IPDQN). Specifically, considering that an RIS typically contains a large number of reflecting elements, we first adopt an alternating optimization (AO) method to optimize the RIS phase shifts to reduce the dimension of the action space. Then, we propose the improved parameterized deep Q‑network (IPDQN) method to deal with the hybrid action space. Simulation results indicate that the AO‑IPDQN approach achieves excellent performance relative to multiple comparison methods across various simulation scenarios.
Authors: Alessandro Saviolo, Jeffrey Mao, Giuseppe Loianno
Abstract: Search and rescue operations require unmanned aerial vehicles to both traverse unknown unstructured environments at high speed and track targets once detected. Achieving both capabilities under degraded sensing and without global localization remains an open challenge. Recent works on relative navigation have shown robust tracking by anchoring planning and control to a visible detected object, but cannot address navigation when no target is in the field of view. We present HUNT (High‑speed UAV Navigation and Tracking), a real‑time framework that unifies traversal, acquisition, and tracking within a single relative formulation. HUNT defines navigation objectives directly from onboard instantaneous observables such as attitude, altitude, and velocity, enabling reactive high‑speed flight during search. Once a target is detected, the same perception‑control pipeline transitions seamlessly to tracking. Outdoor experiments in dense forests, container compounds, and search‑and‑rescue operations with vehicles and mannequins demonstrate robust autonomy where global methods fail.
Authors: Ruiqi Zheng, Jingxu Chen, Jinkun Hu, Haikun Huang, Junyi Zhang, Wufei Zhou, Sheng Dong, Xudong Wang, Xinhuan Feng, Jiejun Zhang, Jianping Yao
Abstract: Future space‑ground communication networks require a seamless fusion of technologies that combine the all‑weather reliability of microwave links with the ultra‑high data capacity of near‑infrared optical systems. Achieving this vision demands compact, robust, and multifunctional hardware, yet monolithic integration of these fundamentally distinct domains has remained elusive. Here, we present the first monolithically integrated silicon photonic chip that bridges microwave and optical domains for dual‑band free‑space communications and dynamic beamforming. The chip integrates a microwave true time delay (TTD) beamforming network, an optical phased array (OPA) beamforming network, and an optical coherent transceiver, all on a silicon‑on‑insulator (SOI) platform. By uniting the strengths of microwave resilience, optical bandwidth, and coherent detection sensitivity, this photonic integrated circuit represents a critical step toward reconfigurable, interference‑resistant, high‑throughput links for satellites, UAVs, and ground stations. Experimental demonstrations confirm two‑dimensional dynamic beam steering in both bands: 24.9 deg x 18.5 deg at microwave frequencies and 10 deg x 4.7 deg in the optical domain. In a 5‑meter free‑space link, the chip achieves error‑free transmission at 10 Gbps for microwave and 80 Gbps per wavelength in the near‑infrared band. These results establish integrated microwave photonics as a promising platform for bridging Earth and orbit through compact, dual‑band, beamforming‑enabled transceivers.
Authors: Abdullahi Isa Ahmed, Jamal Bentahar, El Mehdi Amhoud
Abstract: The rapid advancement of Low‑Power Wide Area Networks (LPWANs), particularly Long Range (LoRa) systems, has positioned them as a cornerstone for Next‑Generation Internet of Things (NG‑IoT) applications within 5G/6G ecosystems. Despite their long‑range and low‑power advantages, achieving high energy efficiency in LoRa networks remains a significant challenge in highly dynamic environments. Traditional terrestrial gateway deployments often suffer from coverage gaps and non‑line‑of‑sight propagation, while satellite‑based alternatives incur excessive energy consumption and prohibitive latency. To address these limitations, we propose a multi‑UAV architecture where unmanned aerial vehicles (UAVs) serve as mobile LoRa gateways to dynamically collect data from ground‑based end devices (EDs). We formulate a joint optimization problem to maximize the system's weighted energy efficiency by jointly optimizing spreading factors, transmission powers, UAV trajectories, and ED‑UAV associations. This problem is transformed into a partially observable stochastic game (POSG), which we solve using our proposed Green LoRa Multi‑Agent Proximal Policy Optimization (GLo‑MAPPO). Our framework leverages centralized training with decentralized execution (CTDE) and is enhanced by a gain‑based ED‑UAV association scheme. Simulation results show that GLo‑MAPPO significantly outperforms state‑of‑the‑art multi‑agent reinforcement learning (MARL) benchmarks in energy efficiency and power consumption across varying network densities. Furthermore, ablation studies validate the necessity of each optimization component and the effectiveness of the proposed association scheme.
Authors: Xiaoyu Wang, Yan Rui Tan, William Leong, Sunan Huang, Rodney Teo, Cheng Xiang
Abstract: This paper proposes an image‑based visual servoing (IBVS) framework for UAV navigation and collision avoidance using only an RGB camera. While UAV navigation has been extensively studied, it remains challenging to apply IBVS in missions involving multiple visual targets and collision avoidance. The proposed method achieves navigation without explicit path planning, and collision avoidance is realized through AI‑based monocular depth estimation from RGB images. Unlike approaches that rely on stereo cameras or external workstations, our framework runs fully onboard a Jetson platform, ensuring a self‑contained and deployable system. Experimental results validate that the UAV can navigate across multiple AprilTags and avoid obstacles effectively in GPS‑denied environments.
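As background for the IBVS term used above, the classical point-feature formulation relates the image-plane error to a camera velocity command. This is the textbook form, not necessarily the controller used in this paper. The symbols (normalized image coordinates x and y, feature depth Z, gain \lambda, desired features s^*) are assumptions introduced for illustration; in a monocular setup, Z would have to come from an estimate such as a learned depth map.

```latex
% Feature motion induced by the camera velocity v_c = (v_x, v_y, v_z, w_x, w_y, w_z)
\[
\dot{\mathbf{s}} = \mathbf{L}_{\mathbf{s}}\,\mathbf{v}_c,
\qquad
\mathbf{L}_{\mathbf{s}} =
\begin{bmatrix}
-\dfrac{1}{Z} & 0 & \dfrac{x}{Z} & xy & -(1+x^{2}) & y \\[6pt]
0 & -\dfrac{1}{Z} & \dfrac{y}{Z} & 1+y^{2} & -xy & -x
\end{bmatrix}
\]
% Proportional control law driving the feature error to zero
\[
\mathbf{e} = \mathbf{s} - \mathbf{s}^{*},
\qquad
\mathbf{v}_c = -\lambda\,\widehat{\mathbf{L}}_{\mathbf{s}}^{+}\,\mathbf{e}
\]
```

Here \widehat{L}_s^{+} is the pseudo-inverse of an approximation of the interaction matrix, so errors in the estimated depth Z mainly affect the translational part of the command.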
Authors: Andrea Vaiuso, Gabriele Immordino, Ludovica Onofri, Giuliano Coppotelli, Marcello Righi
Abstract: Integrating unmanned aerial vehicles into daily use requires controllers that ensure stable flight, efficient energy use, and reduced noise. Proportional integral derivative controllers remain standard but are highly sensitive to gain selection, with manual tuning often yielding suboptimal trade‑offs. This paper studies different optimization techniques for the automated tuning of quadrotor proportional integral derivative gains under a unified simulation that couples a blade element momentum based aerodynamic model with a fast deep neural network surrogate, six degrees of freedom rigid body dynamics, turbulence, and a data driven acoustic surrogate model that predicts third octave spectra and propagates them to ground receivers. We compare three families of gradient‑free optimizers: metaheuristics, Bayesian optimization, and deep reinforcement learning. Candidate controllers are evaluated using a composite cost function that incorporates multiple metrics, such as noise footprint and power consumption, simultaneously. Metaheuristics improve performance consistently, with Grey Wolf Optimization producing optimal results. Bayesian optimization is sample efficient but carries higher per iteration overhead and depends on the design domain. The reinforcement learning agents do not surpass the baseline in the current setup, suggesting the problem formulation requires further refinement. On unseen missions the best tuned controller maintains accurate tracking while reducing oscillations, power demand, and acoustic emissions. These results show that noise aware proportional integral derivative tuning through black box search can deliver quieter and more efficient flight without hardware changes.
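To make the black-box tuning idea concrete, the following is a minimal sketch of Grey Wolf Optimization applied to PID gains. Everything in it is an assumption for illustration: the toy second-order plant, the single tracking cost (the paper's composite cost also covers noise and power), the gain bounds, and the names `step_cost` and `grey_wolf_optimize`. It is not the authors' simulator or implementation.

```python
import random


def step_cost(gains, dt=0.01, steps=500):
    """Integrated absolute error of a unit-step response.

    Toy plant (an assumption, not the paper's simulator): x'' = u - 0.5 * x'.
    """
    kp, ki, kd = gains
    x = v = integ = 0.0
    prev_err = 1.0
    cost = 0.0
    for _ in range(steps):
        err = 1.0 - x
        integ += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        # PID law with actuator saturation to keep the toy simulation bounded.
        u = max(-50.0, min(50.0, kp * err + ki * integ + kd * deriv))
        v += (u - 0.5 * v) * dt
        x += v * dt
        cost += abs(err) * dt
    return cost


def grey_wolf_optimize(cost_fn, bounds, n_wolves=12, n_iters=40, seed=0):
    """Minimize cost_fn inside box bounds with a basic Grey Wolf Optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best, best_cost = None, float("inf")
    for it in range(n_iters + 1):
        costs = [cost_fn(w) for w in wolves]
        order = sorted(range(n_wolves), key=costs.__getitem__)
        if costs[order[0]] < best_cost:
            best, best_cost = list(wolves[order[0]]), costs[order[0]]
        if it == n_iters:
            break
        leaders = [wolves[i] for i in order[:3]]  # alpha, beta, delta wolves
        a = 2.0 * (1.0 - it / n_iters)  # exploration factor decays from 2 toward 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                est = 0.0
                for leader in leaders:
                    big_a = a * (2.0 * rng.random() - 1.0)
                    big_c = 2.0 * rng.random()
                    est += leader[d] - big_a * abs(big_c * leader[d] - w[d])
                # Average the three leader-guided moves and clip to the bounds.
                pos.append(max(lo, min(hi, est / 3.0)))
            new_wolves.append(pos)
        wolves = new_wolves
    return best, best_cost


BOUNDS = [(0.0, 20.0), (0.0, 5.0), (0.0, 5.0)]  # hypothetical (kp, ki, kd) ranges
best_gains, best_cost = grey_wolf_optimize(step_cost, BOUNDS)
print("best gains:", best_gains, "cost:", best_cost)
```

The optimizer only ever calls `cost_fn`, which is what makes the approach gradient-free: a richer cost combining tracking error, power, and noise could be swapped in without touching the search loop.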
Authors: Yao Wu, Ziye Jia, Qihui Wu, Yian Zhu
Abstract: The advancement of low‑altitude intelligent networks enables unmanned aerial vehicle (UAV) interconnection via flying ad‑hoc networks (FANETs), offering flexibility and decentralized coordination. However, resource constraints, dynamic topologies, and UAV operations in open environments present significant security and communication challenges. Existing multi‑factor and public‑key cryptography protocols are vulnerable due to their reliance on stored sensitive information, increasing the risk of exposure and compromise. This paper proposes a lightweight authentication and key agreement protocol for FANETs, integrating physical unclonable functions with dynamic credential management and lightweight cryptographic primitives. The protocol reduces computational and communication overhead while enhancing security. Security analysis confirms its resilience against various attacks, and comparative evaluations demonstrate its superiority in security, communication efficiency, and computational cost.
Authors: Jeongmin Lee, Chanhong Jeon, Hyungjoo Seo, Taewook Kang
Abstract: This paper proposes DroFiT (Drone Frequency lightweight Transformer for speech enhancement), a single‑microphone speech enhancement network for severe drone self‑noise. DroFiT integrates a frequency‑wise Transformer with a full/sub‑band hybrid encoder‑decoder and a TCN back‑end for memory‑efficient streaming. A learnable skip‑and‑gate fusion with a combined spectral‑temporal loss further refines reconstruction. The model is trained on VoiceBank‑DEMAND mixed with recorded drone noise (‑5 to ‑25 dB SNR) and evaluated using standard speech enhancement metrics and computational efficiency. Experimental results show that DroFiT achieves competitive enhancement performance while significantly reducing computational and memory demands, paving the way for real‑time processing on resource‑constrained UAV platforms. Audio demo samples are available on our demo page.
Authors: Rim Zrelli, Henrique Amaral Misson, Sorelle Kamkuimo, Maroua Ben Attia, Abdo Shabah, Felipe Gohring de Magalhaes, Gabriela Nicolescu
Abstract: This technical report presents the detailed implementation of a Collision Avoidance System (CAS) for Unmanned Aerial Vehicles (UAVs), developed as a case study to demonstrate a rigorous methodology for achieving DO‑178C compliance in safety‑critical software. The CAS is based on functional requirements inspired by NASA's Access 5 project and is designed to autonomously detect, evaluate, and avoid potential collision threats in real‑time, supporting the safe integration of UAVs into civil airspace.
The implementation environment combines formal methods, model‑based development, and automated verification tools, including Alloy, SPIN, Simulink Embedded Coder, and the LDRA tool suite. The report documents each phase of the software lifecycle: requirements specification and validation, architectural and detailed design, coding, verification, and traceability, with a strong focus on compliance with DO‑178C Design Assurance Level B objectives.
Results demonstrate that formal modelling and automated toolchains enabled early detection and correction of specification defects, robust traceability, and strong evidence of verification and validation across all development stages. Static and dynamic analyses confirmed code quality and coverage, while formal verification methods provided mathematical assurance of correctness for critical components. Although the integration phase was not fully implemented, the approach proved effective in addressing certification challenges for UAV safety‑critical systems.
Keywords: Collision Avoidance System (CAS), Unmanned Aerial Vehicles (UAVs), DO‑178C compliance, Safety‑critical software, Formal methods, Model‑based development, Alloy, SPIN model checker, Simulink Embedded Coder, LDRA tool suite, Software verification and validation, Traceability, Certification.
Authors: Yifan Lin, Sophie Ziyu Liu, Ran Qi, George Z. Xue, Xinping Song, Chao Qin, Hugh H. -T. Liu
Abstract: We present Agentic Aerial Cinematography: From Dialogue Cues to Cinematic Trajectories (ACDC), an autonomous drone cinematography system driven by natural language communication between human directors and drones. The main limitation of previous drone cinematography workflows is that they require manual selection of waypoints and view angles based on predefined human intent, which is labor‑intensive and yields inconsistent performance. In this paper, we propose employing large language models (LLMs) and vision foundation models (VFMs) to convert free‑form natural language prompts directly into executable indoor UAV video tours. Specifically, our method comprises a vision‑language retrieval pipeline for initial waypoint selection, a preference‑based Bayesian optimization framework that refines poses using aesthetic feedback, and a motion planner that generates safe quadrotor trajectories. We validate ACDC through both simulation and hardware‑in‑the‑loop experiments, demonstrating that it robustly produces professional‑quality footage across diverse indoor scenes without requiring expertise in robotics or cinematography. These results highlight the potential of embodied AI agents to close the loop from open‑vocabulary dialogue to real‑world autonomous aerial cinematography.
Authors: Ashwin Gupta, Kevin Wolfe, Gino Perrotta, Joseph Moore
Abstract: Unsteady aerodynamic effects can have a profound impact on aerial vehicle flight performance, especially during agile maneuvers and in complex aerodynamic environments. In this paper, we present a real‑time planning and control approach capable of reasoning about unsteady aerodynamics. Our approach relies on a lightweight vortex particle model, parallelized to allow GPU acceleration, and a sampling‑based policy optimization strategy capable of leveraging the vortex particle model for predictive reasoning. We demonstrate, through both simulation and hardware experiments, that by replanning with our unsteady aerodynamics model, we can improve the performance of aggressive post‑stall maneuvers in the presence of unsteady environmental flow disturbances.
Authors: Matteo Repetto, Enrico Cambiaso, Fabio Patrone, Sandro Zappatore
Abstract: Today, many critical services and industrial systems rely on wireless networks for interaction with the IoT, hence becoming vulnerable to a broad range of cyber‑threats. While detecting such attacks is not difficult with common cyber‑security tools, and even trivial for jamming, finding their origin and identifying culprits is almost impossible today, yet indispensable to stop them, especially when attacks are generated with portable or self‑made devices that continuously move around. To address this open challenge, the FOLLOWME project investigates the feasibility of using UAVs to locate and even chase attackers during illicit usage of the radio spectrum. The main objective is to develop a cyber‑physical security framework that integrates network telemetry with wireless localization. The former triggers alarms in case of anomalies or known attack patterns and provides a coarse‑grained indication of the physical area (i.e., the position of affected access gateways), whereas the latter systematically scans that area to identify the exact location of the attacker. The project will specifically address long‑range metropolitan area networks and focus on the LoRaWAN protocol, which is the typical scenario for Smart City services.
Authors: Can Cui, Ziye Jia, Jiahao You, Chao Dong, Qihui Wu, Han Zhu
Abstract: Unmanned aerial vehicle (UAV) based multi‑access edge computing (MEC) has emerged as a popular paradigm for reducing task processing latency. However, secure offloading becomes an important issue when aerial eavesdropping occurs. Besides, the potential uncertainties in practical applications and flexible trajectory optimizations of UAVs pose formidable challenges for realizing robust offloading. In this paper, we consider the aerial secure MEC network including ground users, service unmanned aerial vehicles (S‑UAVs) integrated with edge servers, and malicious UAVs overhearing transmission links. To deal with the task computation complexities, which are characterized as uncertainties, a robust problem is formulated with chance constraints. The energy cost is minimized by optimizing the connections, trajectories of S‑UAVs, and offloading ratios. Then, the proposed non‑linear problem is tackled via distributionally robust optimization and the conditional value‑at‑risk mechanism, and is further transformed into second‑order cone programming form. Moreover, we decouple the reformulated problem and design the successive convex approximation for S‑UAV trajectories. The global algorithm is designed to solve the sub‑problems in a block coordinate descent manner. Finally, extensive simulations and numerical analyses are conducted to verify the robustness of the proposed algorithms, which incur just 2% more energy cost than the ideal case.
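For readers unfamiliar with conditional value-at-risk, the sketch below estimates it from samples: CVaR at level alpha is the average of the worst (1 - alpha) fraction of losses. This is a generic sample estimator, not the paper's distributionally robust reformulation; the function name `empirical_cvar` and the energy values are made up for illustration.

```python
def empirical_cvar(losses, alpha=0.95):
    """Return (VaR, CVaR) at level alpha from a list of sampled losses.

    CVaR is taken as the mean of the worst (1 - alpha) fraction of samples,
    and VaR as the smallest loss inside that tail (a simple sample estimate).
    """
    if not losses or not 0.0 < alpha < 1.0:
        raise ValueError("need at least one loss and 0 < alpha < 1")
    ordered = sorted(losses)
    # Number of tail samples; round() guards against floating-point error
    # in (1 - alpha) * n, and at least one sample is always kept.
    k = max(1, round((1.0 - alpha) * len(ordered)))
    tail = ordered[-k:]
    return tail[0], sum(tail) / k


# Hypothetical per-task energy costs (arbitrary units), for illustration only.
energy_samples = [float(i) for i in range(1, 11)]
var_80, cvar_80 = empirical_cvar(energy_samples, alpha=0.8)
print(var_80, cvar_80)  # prints 9.0 9.5
```

Because CVaR averages over the tail rather than reading off a single quantile, constraining it bounds how bad the worst cases are on average, which is the usual reason it serves as a tractable surrogate for chance constraints.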
Authors: Gokul Puthumanaillam, Ram Padmanabhan, Jose Fuentes, Nicole Cruz, Paulo Padrao, Ruben Hernandez, Hao Jiang, William Schafer, Leonardo Bobadilla, Melkior Ornik
Abstract: In supervisory control settings, autonomous systems are not monitored continuously. Instead, monitoring often occurs at sporadic intervals within known bounds. We study the problem of deception, where an agent pursues a private objective while remaining plausibly compliant with a supervisor's reference policy when observations occur. Motivated by the behavior of real, human supervisors, we situate the problem within Theory of Mind: the representation of what an observer believes and expects to see. We show that Theory of Mind can be repurposed to steer online reinforcement learning (RL) toward such deceptive behavior. We model the supervisor's expectations and distill from them a single, calibrated scalar: the expected evidence of deviation if an observation were to happen now. This scalar combines how unlike the reference and current action distributions appear, with the agent's belief that an observation is imminent. Injected as a state‑dependent weight into a KL‑regularized policy improvement step within an online RL loop, this scalar informs a closed‑form update that smoothly trades off self‑interest and compliance, thus sidestepping hand‑crafted or heuristic policies. In real‑world, real‑time hardware experiments on marine (ASV) and aerial (UAV) navigation, our ToM‑guided RL runs online, achieves high return and success with observed‑trace evidence calibrated to the supervisor's expectations.
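The closed-form update mentioned above can be illustrated with the standard KL-regularized policy improvement result. The notation is assumed here: Q is the agent's action value, \pi_{\mathrm{ref}} the supervisor's reference policy, and \beta(s) > 0 a state-dependent weight. How the paper maps its evidence scalar to \beta(s) is not stated in the abstract, so this is only the generic form.

```latex
% KL-regularized improvement step with a state-dependent weight \beta(s)
\[
\pi_{\mathrm{new}}(\cdot \mid s)
= \arg\max_{\pi}\;
\mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\left[ Q(s,a) \right]
- \beta(s)\, \mathrm{KL}\!\left( \pi(\cdot \mid s) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid s) \right)
\]
% Its maximizer is a reference policy reweighted by exponentiated values
\[
\pi_{\mathrm{new}}(a \mid s)
= \frac{\pi_{\mathrm{ref}}(a \mid s)\, \exp\!\left( Q(s,a)/\beta(s) \right)}
       {\sum_{a'} \pi_{\mathrm{ref}}(a' \mid s)\, \exp\!\left( Q(s,a')/\beta(s) \right)}
\]
```

A large \beta(s) pulls the policy toward \pi_{\mathrm{ref}} (compliance), while a small \beta(s) lets it concentrate on high-value actions (self-interest), which matches the smooth trade-off the abstract describes.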
Authors: Seth Farrell, Chenghao Li, Hesam Mojtahedi, Henrik I. Christensen
Abstract: We present a cooperative aerial‑ground search‑and‑rescue (SAR) framework that pairs two unmanned aerial vehicles (UAVs) with an unmanned ground vehicle (UGV) to achieve rapid victim localization and obstacle‑aware navigation in unknown environments. We dub this framework Guided Long‑horizon Integrated Drone Escort (GLIDE), highlighting the UGV's reliance on UAV guidance for long‑horizon planning. In our framework, a goal‑searching UAV executes real‑time onboard victim detection and georeferencing to nominate goals for the ground platform, while a terrain‑scouting UAV flies ahead of the UGV's planned route to provide mid‑level traversability updates. The UGV fuses aerial cues with local sensing to perform time‑efficient A* planning and continuous replanning as information arrives. Additionally, we present a hardware demonstration (using a GEM e6 golf cart as the UGV and two X500 UAVs) to evaluate end‑to‑end SAR mission performance and include simulation ablations to assess the planning stack in isolation from detection. Empirical results demonstrate that explicit role separation across UAVs, coupled with terrain scouting and guided planning, improves reach time and navigation safety in time‑critical SAR missions.
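The plan-then-replan loop described above, with the planner read here as A*, can be sketched on a small occupancy grid. This is a generic 4-connected grid A* with a Manhattan heuristic, not the GLIDE planning stack; the grid, the `astar` function, and the simulated scouting update are hypothetical.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) / 1 (blocked) cells.

    Returns the list of (row, col) cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    came_from = {}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cost > best_cost[cur]:
            continue  # stale heap entry, a cheaper route was found already
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cur
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
print(len(path))  # prints 9

# Simulated scouting update: the only gap in the wall is reported blocked,
# so replanning on the updated grid finds no route.
grid[1][3] = 1
print(astar(grid, (0, 0), (2, 0)))  # prints None
```

Replanning here simply reruns the search on the updated grid; incremental planners avoid that recomputation, but the abstract does not say which variant is used.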
Authors: Guangjin Pan, Liping Bai, Zhuojun Tian, Hui Chen, Mehdi Bennis, Henk Wymeersch
Abstract: Integrated sensing and communication (ISAC) is a core technology for 6G, and its application to closed‑loop sensing, communication, and control (SCC) enables various services. Existing SCC solutions often treat sensing and control separately, leading to suboptimal performance and resource usage. In this work, we introduce the active inference framework (AIF) into SCC‑enabled unmanned aerial vehicle (UAV) systems for joint state estimation, control, and sensing resource allocation. By formulating a unified generative model, the problem reduces to minimizing variational free energy for inference and expected free energy for action planning. Simulation results show that both control cost and sensing cost are reduced relative to baselines.
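The two objectives named above have standard textbook forms in the active inference literature, given here as background. The notation is assumed: hidden states s, observations o, approximate posterior q, generative model p, policy \pi, and a preference-biased model \tilde{p}. The paper's specific generative model and cost terms are not stated in the abstract.

```latex
% Variational free energy, minimized for state estimation (inference)
\[
F[q] = \mathbb{E}_{q(s)}\!\left[ \ln q(s) - \ln p(o, s) \right]
     = \mathrm{KL}\!\left( q(s) \,\|\, p(s \mid o) \right) - \ln p(o)
\]
% Expected free energy of a policy \pi, minimized for action planning
\[
G(\pi) = \sum_{\tau} \mathbb{E}_{q(o_{\tau}, s_{\tau} \mid \pi)}
\!\left[ \ln q(s_{\tau} \mid \pi) - \ln \tilde{p}(o_{\tau}, s_{\tau}) \right]
\]
```

Since the KL term is non-negative, minimizing F tightens an upper bound on the negative log evidence, while G scores future trajectories before they are observed.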
Authors: Viktor Lorentz, Khaled Wahba, Sayantan Auddy, Marc Toussaint, Wolfgang Hönig
Abstract: Collaborative transportation of cable‑suspended payloads by teams of Unmanned Aerial Vehicles (UAVs) has the potential to enhance payload capacity, adapt to different payload shapes, and provide built‑in compliance, making it attractive for applications ranging from disaster relief to precision logistics. However, multi‑UAV coordination under disturbances, nonlinear payload dynamics, and slack–taut cable modes remains a challenging control problem. To our knowledge, no prior work has addressed these cable mode transitions in the multi‑UAV context, instead relying on simplifying rigid‑link assumptions. We propose CrazyMARL, a decentralized Reinforcement Learning (RL) framework for multi‑UAV cable‑suspended payload transport. Simulation results demonstrate that the learned policies can outperform classical decentralized controllers in terms of disturbance rejection and tracking precision, achieving an 80% recovery rate from harsh conditions compared to 44% for the baseline method. We also achieve successful zero‑shot sim‑to‑real transfer and demonstrate that our policies are highly robust under harsh conditions, including wind, random external disturbances, and transitions between slack and taut cable dynamics. This work paves the way for autonomous, resilient UAV teams capable of executing complex payload missions in unstructured environments.
Authors: Valerii Serpiva, Artem Lykov, Faryal Batool, Vladislav Kozlovskiy, Miguel Altamirano Cabrera, Dzmitry Tsetserukou
Abstract: We present FlightDiffusion, a diffusion‑model‑based framework for training autonomous drones from first‑person view (FPV) video. Our model generates realistic video sequences from a single frame, enriched with corresponding action spaces to enable reasoning‑driven navigation in dynamic environments. Beyond direct policy learning, FlightDiffusion leverages its generative capabilities to synthesize diverse FPV trajectories and state‑action pairs, facilitating the creation of large‑scale training datasets without the high cost of real‑world data collection. Our evaluation demonstrates that the generated trajectories are physically plausible and executable, with a mean position error of 0.25 m (RMSE 0.28 m) and a mean orientation error of 0.19 rad (RMSE 0.24 rad). This approach enables improved policy learning and dataset scalability, leading to superior performance in downstream navigation tasks. Results in simulated environments highlight enhanced robustness, smoother trajectory planning, and adaptability to unseen conditions. An ANOVA revealed no statistically significant difference between performance in simulation and reality (F(1, 16) = 0.394, p = 0.541), with success rates of M = 0.628 (SD = 0.162) and M = 0.617 (SD = 0.177), respectively, indicating strong sim‑to‑real transfer. The generated datasets provide a valuable resource for future UAV research. This work introduces diffusion‑based reasoning as a promising paradigm for unifying navigation, action generation, and data synthesis in aerial robotics.
Authors: Salim Oyinlola, Nitesh Subedi, Soumik Sarkar
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly used in automated inspection, delivery, and navigation tasks that require reliable autonomy. This project develops a reinforcement learning (RL) approach to enable a single UAV to autonomously navigate between predefined points without manual intervention. The drone learns navigation policies through trial‑and‑error interaction, using a custom reward function that encourages goal‑reaching efficiency while penalizing collisions and unsafe behavior. The control system integrates ROS with a Gym‑compatible training environment, enabling flexible deployment and testing. After training, the learned policy is deployed on a real UAV platform and evaluated under practical conditions. Results show that the UAV can successfully perform autonomous navigation with minimal human oversight, demonstrating the viability of RL‑based control for point‑to‑point drone operations in real‑world scenarios.
Authors: Zhixion Chen, Jiangzhou Wang, Hyundong Shin, Arumugam Nallanathan
Abstract: The deployment of unmanned aerial vehicles (UAVs) for reliable and energy‑efficient data collection from spatially distributed devices holds great promise in supporting diverse Internet of Things (IoT) applications. Nevertheless, the limited endurance and communication range of UAVs necessitate intelligent trajectory planning. While reinforcement learning (RL) has been extensively explored for UAV trajectory optimization, its interactive nature entails high costs and risks in real‑world environments. Offline RL mitigates these issues but remains susceptible to unstable training and relies heavily on expert‑quality datasets. To address these challenges, we formulate a joint UAV trajectory planning and resource allocation problem to maximize the energy efficiency of data collection. The resource allocation subproblem is first transformed into an equivalent linear programming formulation and solved optimally with polynomial‑time complexity. Then, we propose a large language model (LLM)‑empowered critic‑regularized decision transformer (DT) framework, termed LLM‑CRDT, to learn effective UAV control policies. In LLM‑CRDT, we incorporate critic networks to regularize the DT model training, thereby integrating the sequence modeling capabilities of DT with critic‑based value guidance to enable learning effective policies from suboptimal datasets. Furthermore, to mitigate the data‑hungry nature of transformer models, we employ a pre‑trained LLM as the transformer backbone of the DT model and adopt a parameter‑efficient fine‑tuning strategy, i.e., LoRA, enabling rapid adaptation to UAV control tasks with small‑scale datasets and low computational overhead. Extensive simulations demonstrate that LLM‑CRDT outperforms benchmark online and offline RL methods, achieving up to 36.7% higher energy efficiency than the current state‑of‑the‑art DT approaches.
Authors: Songhao Huang, Yuwei Wu, Guangyao Shi, Gaurav S. Sukhatme, Vijay Kumar
Abstract: We investigate the problem of automatic domain generation for the Planning Domain Definition Language (PDDL) using Large Language Models (LLMs), with a particular focus on unmanned aerial vehicle (UAV) tasks. Although PDDL is a widely adopted standard in robotic planning, manually designing domains for diverse applications such as surveillance, delivery, and inspection is labor‑intensive and error‑prone, which hinders adoption and real‑world deployment. To address these challenges, we propose SPAR, a framework that leverages the generative capabilities of LLMs to automatically produce valid, diverse, and semantically accurate PDDL domains from natural language input. To this end, we first introduce a systematically formulated and validated UAV planning dataset, consisting of ground‑truth PDDL domains and associated problems, each paired with detailed domain and action descriptions. Building on this dataset, we design a prompting framework that generates high‑quality PDDL domains from language input. The generated domains are evaluated through syntax validation, executability, feasibility, and interpretability. Overall, this work demonstrates that LLMs can substantially accelerate the creation of complex planning domains, providing a reproducible dataset and evaluation pipeline that enables application experts without prior experience to leverage it for practical tasks and advance future research in aerial robotics and automated planning.
Authors: Md Bokhtiar Al Zami, Md Raihan Uddin, Dinh C. Nguyen
Abstract: Federated learning (FL) has gained popularity as a privacy‑preserving method of training machine learning models on decentralized networks. However, to ensure reliable operation of UAV‑assisted FL systems, issues such as excessive energy consumption, communication inefficiencies, and security vulnerabilities must be addressed. This paper proposes an innovative framework that integrates Digital Twin (DT) technology and Zero‑Knowledge Federated Learning (zkFed) to tackle these challenges. UAVs act as mobile base stations, allowing scattered devices to train FL models locally and upload model updates for aggregation. By incorporating DT technology, our approach enables real‑time system monitoring and predictive maintenance, improving UAV network efficiency. Additionally, Zero‑Knowledge Proofs (ZKPs) strengthen security by allowing model verification without exposing sensitive data. To optimize energy efficiency and resource management, we introduce a dynamic allocation strategy that adjusts UAV flight paths, transmission power, and processing rates based on network conditions. Using block coordinate descent and convex optimization techniques, our method reduces system energy consumption by up to 29.6% compared to conventional FL approaches. Simulation results demonstrate improved learning performance, security, and scalability, positioning this framework as a promising solution for next‑generation UAV‑based intelligent networks.
Authors: Anis Koubaa, Khaled Gabr
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly used in defense, surveillance, and disaster response, yet most systems still operate at SAE Level 2 to 3 autonomy. Their dependence on rule‑based control and narrow AI limits adaptability in dynamic and uncertain missions. Current UAV architectures lack context‑aware reasoning, autonomous decision‑making, and integration with external systems. Importantly, none make use of Large Language Model (LLM) agents with tool‑calling for real‑time knowledge access.
This paper introduces the Agentic UAVs framework, a five‑layer architecture consisting of Perception, Reasoning, Action, Integration, and Learning. The framework enhances UAV autonomy through LLM‑driven reasoning, database querying, and interaction with third‑party systems.
A prototype built with ROS 2 and Gazebo combines YOLOv11 for object detection with GPT‑4 for reasoning and a locally deployed Gemma 3 model. In simulated search‑and‑rescue scenarios, agentic UAVs achieved higher detection confidence (0.79 compared to 0.72), improved person detection rates (91% compared to 75%), and a major increase in correct action recommendations (92% compared to 4.5%). These results show that modest computational overhead can enable significantly higher levels of autonomy and system‑level integration.
Authors: Mehran Behjati, Rosdiadee Nordin, Nor Fadzilah Abdullah
Abstract: This paper presents a reinforcement learning (RL) based approach for path planning of cellular connected unmanned aerial vehicles (UAVs) operating beyond visual line of sight (BVLoS). The objective is to minimize travel distance while maximizing the quality of cellular link connectivity by considering real world aerial coverage constraints and employing an empirical aerial channel model. The proposed solution employs RL techniques to train an agent, using the quality of communication links between the UAV and base stations (BSs) as the reward function. Simulation results demonstrate the effectiveness of the proposed method in training the agent and generating feasible UAV path plans. The proposed approach addresses the challenges due to limitations in UAV cellular communications, highlighting the need for investigations and considerations in this area. The RL algorithm efficiently identifies optimal paths, ensuring maximum connectivity with ground BSs to ensure safe and reliable BVLoS flight operation. Moreover, the solution can be deployed as an offline path planning module that can be integrated into future ground control systems (GCS) for UAV operations, enhancing their capabilities and safety. The method holds potential for complex long range UAV applications, advancing the technology in the field of cellular connected UAV path planning.
Authors: Danish Rizvi, David Boyle
Abstract: This study departs from the prevailing assumption of independent Transmission and Reflection Coefficients (TRC) in Airborne Simultaneous Transmit and Reflect Reconfigurable Intelligent Surface (STAR‑RIS) research. Instead, we explore a novel multi‑user downlink communication system that leverages a UAV‑mounted STAR‑RIS (Aerial‑STAR) incorporating a coupled TRC phase shift model. Our key contributions include the joint optimization of UAV trajectory, active beamforming vectors at the base station, and passive RIS TRCs to enhance communication efficiency, while considering UAV energy constraints. We design the TRC as a combination of discrete and continuous actions, and propose a novel Dual Actor Deep Deterministic Policy Gradient (DA‑DDPG) algorithm. The algorithm relies on two separate actor networks for high‑dimensional hybrid action space. We also propose a novel harmonic mean index (HFI)‑based reward function to ensure communication fairness amongst users. For comprehensive analysis, we study the impact of RIS size on UAV aerodynamics showing that it increases drag and energy demand. Simulation results demonstrate that the proposed DA‑DDPG algorithm outperforms conventional DDPG and DQN‑based solutions by 24% and 97%, respectively, in accumulated reward. Three‑dimensional UAV trajectory optimization achieves 28% higher communication efficiency compared to two‑dimensional and altitude optimization. The HFI based reward function provides 41% lower QoS denial rates as compared to other benchmarks. The mobile Aerial‑STAR system shows superior performance over fixed deployed counterparts, with the coupled phase STAR‑RIS outperforming dual Transmit/Reflect RIS and conventional RIS setups. These findings highlight the potential of Aerial‑STAR systems and the effectiveness of our proposed DA‑DDPG approach in optimizing their performance.
Authors: Yihua Chen, Xingle Que, Jiashuo Zhang, Ting Chen, Guangshun Li, Jiachi Chen
Abstract: The integration of unmanned aerial vehicles (UAVs) and large language models (LLMs) has emerged as a research direction of growing interest, with the potential to address challenges in autonomous decision‑making, human‑UAV interaction, and real‑time adaptability. However, existing studies have remained largely in preliminary exploration with a limited understanding of real‑world practice, risking a misalignment between academic research and practical needs and hindering the translation of results. To examine and address these potential challenges, we conducted an empirical study of 74 selected papers and 56 public GitHub projects, identified nine task types for LLMs in UAV systems, and quantified their distribution. Our findings show that academic research emphasizes theoretical modeling and task optimization with dispersed attention across tasks. In contrast, industrial projects focus on flight control, task planning, and human‑machine interaction, prioritizing operability and efficiency. To further capture industry perspectives, we distributed an online questionnaire. We obtained 52 valid responses: 40.4% of practitioners have attempted to apply LLMs to UAV tasks. We further identify factors that impede real‑world integration, including technological maturity, performance, safety, cost, and other considerations. Finally, we highlight challenges for future development and provide recommendations.
Authors: Yifan Jiang, Qingqing Wu, Hongxun Hui, Wen Chen, Derrick Wing Kwan Ng
Abstract: Sensing‑assisted predictive beamforming, as one of the enabling technologies for the emerging integrated sensing and communication (ISAC) paradigm, shows significant promise for enhancing various future unmanned aerial vehicle (UAV) applications. However, existing works predominantly emphasize spectral efficiency enhancement, while the impact of such beamforming techniques on communication reliability remains largely unexplored and challenging to characterize. To fill this research gap, this paper investigates outage capacity maximization for UAV tracking under the sensing‑assisted predictive beamforming scheme. Specifically, a cellular‑connected UAV tracking scheme is proposed leveraging extended Kalman filtering (EKF), where the predicted UAV trajectory, sensing duration ratio, and target constant received signal‑to‑noise ratio (SNR) are jointly optimized to maximize the outage capacity at each time slot. To address the implicit nature of the objective function, closed‑form approximations of the outage probabilities (OPs) at both prediction and measurement stages of each time slot are proposed based on second‑order Taylor expansions, providing an efficient and full characterization of outage capacity. Subsequently, an efficient algorithm is proposed based on a combination of bisection search and successive convex approximation (SCA) to address the non‑convex optimization problem with guaranteed convergence. To further reduce computational complexity, a second efficient algorithm is developed based on alternating optimization (AO). Simulation results validate the accuracy of the derived OP approximations, the effectiveness of the proposed algorithms, and the significant outage capacity enhancement over various benchmarks, while also indicating a trade‑off between decreasing path loss and enjoying wide beam coverage for outage capacity maximization.
Authors: YiTong Liu, TianZhu Liu, YanFeng GU
Abstract: Cross‑view geo‑localization aims to determine the geographical location of a query image by matching it against a gallery of images. This task is challenging due to the significant appearance variations of objects observed from different viewpoints, along with the difficulty of extracting discriminative features. Existing approaches often rely on extracting features through feature map segmentation while neglecting spatial and semantic information. To address these issues, we propose the EVA02‑based Multi‑scale Frequency Attention Fusion (MFAF) method. The MFAF method consists of the Multi‑Frequency Branch‑wise Block (MFB) and the Frequency‑aware Spatial Attention (FSA) module. The MFB effectively captures both low‑frequency structural features and high‑frequency edge details across multiple scales, improving the consistency and robustness of feature representations across various viewpoints. Meanwhile, the FSA module adaptively focuses on the key regions of frequency features, significantly mitigating the interference caused by background noise and viewpoint variability. Extensive experiments on widely recognized benchmarks, including University‑1652, SUES‑200, and Dense‑UAV, demonstrate that the MFAF method achieves competitive performance in both drone localization and drone navigation tasks.
Authors: Siri Vennela Geddam, Sruthi Ilapuram, Kamesh Namuduri, K L V Sai Prakash Sakuru
Abstract: Space‑Air‑Ground‑Integrated Networks (SAGIN) enable seamless data connectivity for applications such as smart transport, healthcare, smart cities, and disaster response through the coordinated use of low‑earth orbit (LEO) satellites, base stations mounted on uncrewed aerial vehicles (UAVs), and terrestrial infrastructure. This paper provides a detailed analysis of resource management frameworks, reviews the literature, and evaluates key methods such as alternating optimization (AO), damped iterative water filling (DIWF), and genetic algorithms (GA) for resource allocation. MATLAB simulation results benchmark these algorithms across 10,000 trials, demonstrating robust, fair, and low‑latency resource allocation. This paper also analyzes strategies for user association with terrestrial and aerial base stations during emergencies and network overloads. The main contributions include a comparative assessment of resource allocation strategies in SAGIN and an in‑depth analysis of user association policies for emergency scenarios. The study provides guidance for designing resilient and efficient next‑generation networks. Potential future research directions include investigating satellite handover and multi‑domain orchestration for SAGIN deployments.
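For readers unfamiliar with the water-filling family of methods named in the abstract above, the following is a minimal sketch of classical single-user water-filling power allocation. It is not the damped iterative variant (DIWF) evaluated in the paper, and it is not the authors' MATLAB code; the function name `water_filling`, the channel-gain model, and the bisection tolerance are all assumptions made for illustration.

```python
def water_filling(gains, total_power, iters=100):
    """Classical water-filling (illustrative sketch, not the paper's DIWF).

    Maximizes sum(log(1 + g_i * p_i)) subject to sum(p_i) == total_power
    and p_i >= 0. The optimum has the form p_i = max(0, mu - 1/g_i),
    where the "water level" mu is found here by bisection.
    """
    if total_power < 0 or any(g <= 0 for g in gains):
        raise ValueError("gains must be positive and total_power non-negative")
    inv = [1.0 / g for g in gains]          # noise-to-gain "floor" per channel
    lo, hi = 0.0, max(inv) + total_power    # the water level lies in [lo, hi]
    for _ in range(iters):
        mu = (lo + hi) / 2.0
        used = sum(max(0.0, mu - v) for v in inv)
        if used > total_power:
            hi = mu                          # too much power used: lower the level
        else:
            lo = mu                          # budget not exhausted: raise the level
    mu = (lo + hi) / 2.0
    return [max(0.0, mu - v) for v in inv]


# Hypothetical example: two channels sharing a power budget of 3 units.
allocation = water_filling([1.0, 0.5], total_power=3.0)
```

Stronger channels receive more power, and channels whose floor `1/g_i` lies above the water level receive none. Iterative variants such as DIWF repeat an allocation step of this kind across users.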
Authors: Àlmos Veres-Vitàlyos, Genis Castillo Gomez-Raya, Filip Lemic, Daniel Johannes Bugelnig, Bernhard Rinner, Sergi Abadal, Xavier Costa-Pérez
Abstract: Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard‑to‑reach areas, yet their significant constraints in payload and autonomy have largely prevented their use for complex tasks like high‑quality 3‑Dimensional (3D) reconstruction. To overcome this challenge, we introduce a novel system architecture that enables fully autonomous, high‑fidelity 3D scanning of static objects using UAVs weighing under 100 grams. Our core innovation lies in a dual‑reconstruction pipeline that creates a real‑time feedback loop between data capture and flight control. A near‑real‑time (near‑RT) process uses Structure from Motion (SfM) to generate an instantaneous pointcloud of the object. The system analyzes the model quality on the fly and dynamically adapts the UAV's trajectory to intelligently capture new images of poorly covered areas. This ensures comprehensive data acquisition. For the final, detailed output, a non‑real‑time (non‑RT) pipeline employs a Neural Radiance Fields (NeRF)‑based Neural 3D Reconstruction (N3DR) approach, fusing SfM‑derived camera poses with precise Ultra Wide‑Band (UWB) location data to achieve superior accuracy. We implemented and validated this architecture using Crazyflie 2.1 UAVs. Our experiments, conducted in both single‑ and multi‑UAV configurations, conclusively show that dynamic trajectory adaptation consistently improves reconstruction quality over static flight paths. This work demonstrates a scalable and autonomous solution that unlocks the potential of miniaturized UAVs for fine‑grained 3D reconstruction in constrained environments, a capability previously limited to much larger platforms.
Authors: Rongkun Zhu, Kangning Cui, Wei Tang, Rui-Feng Wang, Sarra Alqahtani, David Lutz, Fan Yang, Paul Fine, Jordan Karubian, Robert Plemmons, Jean-Michel Morel, Victor Pauca, Miles Silman
Abstract: Accurate mapping of individual trees is essential for ecological monitoring and forest management. Orthomosaic imagery from unmanned aerial vehicles (UAVs) is widely used, but stitching artifacts and heavy preprocessing limit its suitability for field deployment. This study explores the use of raw UAV imagery for palm detection and crown‑center localization in tropical forests. Two research questions are addressed: (1) how detection performance varies across orthomosaic and raw imagery, including within‑domain and cross‑domain transfer, and (2) to what extent crown‑center annotations improve localization accuracy beyond bounding‑box centroids. Using state‑of‑the‑art detectors and keypoint models, we show that raw imagery yields superior performance in deployment‑relevant scenarios, while orthomosaics retain value for robust cross‑domain generalization. Incorporating crown‑center annotations in training further improves localization and provides precise tree positions for downstream ecological analyses. These findings offer practical guidance for UAV‑based biodiversity and conservation monitoring.
Authors: Xuli Cai, Poonam Lohan, Burak Kantarci
Abstract: This letter addresses a critical challenge in the context of 6G and beyond wireless networks: the joint optimization of power and bandwidth resource allocation for aerial intelligent platforms, specifically uncrewed aerial vehicles (UAVs), operating in highly dynamic environments with mobile ground user equipment (UEs). We introduce FLARE (Flying Learning Agents for Resource Efficiency), a learning‑enabled aerial intelligence framework that jointly optimizes UAV positioning, altitude, transmit power, and bandwidth allocation in real time. To adapt to UE mobility, we employ Silhouette‑based K‑Means clustering, enabling dynamic grouping of users and deployment of UAVs at cluster centroids for efficient service delivery. The problem is modeled as a multi‑agent control task, with bandwidth discretized into resource blocks and power treated as a continuous variable. To solve this, FLARE employs a hybrid reinforcement learning strategy that combines Multi‑Agent Deep Deterministic Policy Gradient (MADDPG) and Deep Q‑Network (DQN) to enhance learning efficiency. Simulation results demonstrate that our method significantly enhances user coverage, achieving a 73.45% improvement in the number of served users under a 5 Mbps data rate constraint and outperforming the MADDPG baseline.
Authors: Nhut Le, Ehsan Karimi, Maryam Rahnemoonfar
Abstract: Timely assessment of structural damage is critical for disaster response and recovery. However, most prior work in natural disaster analysis relies on 2D imagery, which lacks depth, suffers from occlusions, and provides limited spatial context. 3D semantic segmentation offers a richer alternative, but existing 3D benchmarks focus mainly on urban or indoor scenes, with little attention to disaster‑affected areas. To address this gap, we present 3DAeroRelief, the first 3D benchmark dataset specifically designed for post‑disaster assessment. Collected using low‑cost unmanned aerial vehicles (UAVs) over hurricane‑damaged regions, the dataset features dense 3D point clouds reconstructed via Structure‑from‑Motion and Multi‑View Stereo techniques. Semantic annotations were produced through manual 2D labeling and projected into 3D space. Unlike existing datasets, 3DAeroRelief captures large‑scale 3D outdoor environments with fine‑grained structural damage in real‑world disaster contexts. UAVs enable affordable, flexible, and safe data collection in hazardous areas, making them particularly well‑suited for emergency scenarios. To demonstrate the utility of 3DAeroRelief, we evaluate several state‑of‑the‑art 3D segmentation models on the dataset to highlight both the challenges and opportunities of 3D scene understanding in disaster response. Our dataset serves as a valuable resource for advancing robust 3D vision systems in real‑world applications for post‑disaster scenarios.
Authors: Nisha Pillai, Aditi Virupakshaiah, Harrison W. Smith, Amanda J. Ashworth, Prasanna Gowda, Phillip R. Owens, Adam R. Rivers, Bindu Nanduri, Mahalingam Ramkumar
Abstract: Animal health monitoring and population management are critical aspects of wildlife conservation and livestock management that increasingly rely on automated detection and tracking systems. While Unmanned Aerial Vehicle (UAV) based systems combined with computer vision offer promising solutions for non‑invasive animal monitoring across challenging terrains, limited availability of labeled training data remains an obstacle in developing effective deep learning (DL) models for these applications. Transfer learning has emerged as a potential solution, allowing models trained on large datasets to be adapted for resource‑limited scenarios such as those with limited data. However, the vast landscape of pre‑trained neural network architectures makes it challenging to select optimal models, particularly for researchers new to the field. In this paper, we propose a reinforcement learning (RL)‑based transfer learning framework that employs an upper confidence bound (UCB) algorithm to automatically select the most suitable pre‑trained model for animal detection tasks. Our approach systematically evaluates and ranks candidate models based on their performance, streamlining the model selection process. Experimental results demonstrate that our framework achieves a higher detection rate while requiring significantly less computational time compared to traditional methods.
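As an illustration of the model-selection idea described in the abstract above, the following is a minimal sketch of the standard UCB1 bandit rule applied to choosing among candidate pre-trained models. It is not the authors' framework: the function name `ucb1_select`, the `evaluate` callback (assumed to return a validation score in [0, 1] for one short fine-tuning or evaluation run), and the exploration constant `c` are all hypothetical.

```python
import math


def ucb1_select(evaluate, n_models, rounds, c=2.0):
    """Pick the most promising candidate model with the UCB1 rule (sketch).

    evaluate(i) is a hypothetical callback returning a reward in [0, 1]
    for candidate model i. Returns (best_index, pull_counts, mean_rewards).
    """
    if rounds < n_models:
        raise ValueError("rounds must be at least n_models")
    counts = [0] * n_models      # how often each candidate was evaluated
    sums = [0.0] * n_models      # accumulated reward per candidate
    for t in range(1, rounds + 1):
        if t <= n_models:
            arm = t - 1          # evaluate every candidate once first
        else:
            # Mean reward plus an exploration bonus that shrinks as a
            # candidate is evaluated more often.
            arm = max(
                range(n_models),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(c * math.log(t) / counts[i]),
            )
        reward = evaluate(arm)
        counts[arm] += 1
        sums[arm] += reward
    means = [s / n for s, n in zip(sums, counts)]
    best = max(range(n_models), key=lambda i: means[i])
    return best, counts, means


# Hypothetical usage: three candidates with fixed (noise-free) scores.
scores = [0.2, 0.8, 0.5]
best, counts, means = ucb1_select(lambda i: scores[i], n_models=3, rounds=200)
```

The appeal of a bandit rule in this setting is that weak candidates are evaluated only a few times, so most of the compute budget goes to the models that look strongest.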
Authors: Dimitri Jacquemont, Carlo Bosio, Teaya Yang, Ruiqi Zhang, Ozgur Orun, Shuai Li, Reza Alam, Thomas M. Schutzius, Simo A. Makiharju, Mark W. Mueller
Abstract: Photovoltaic (PV) panels are becoming increasingly widespread in the domain of renewable energy, and thus, small efficiency gains can have massive effects. Anti‑reflective and self‑cleaning coatings enhance panel performance but degrade over time, requiring periodic reapplication. Uncrewed Aerial Vehicles (UAVs) offer a flexible and autonomous way to apply protective coatings more often and at lower cost compared to traditional manual coating methods. In this letter, we propose a quadcopter‑based system, equipped with a liquid dispersion mechanism, designed to automate such tasks. The localization stack only uses onboard sensors, relying on visual‑inertial odometry and the relative position of the PV panel detected with respect to the quadcopter. The control relies on a model‑based controller that accounts for the ground effect and the mass decrease of the quadcopter during liquid dispersion. We validate the autonomy capabilities of our system through extensive indoor and outdoor experiments.
Authors: Yilun Xiao
Abstract: Dense small objects in UAV imagery are often missed due to long‑range viewpoints, occlusion, and clutter. This paper presents a detector‑agnostic post‑processing framework that converts overlap‑induced redundancy into group evidence. Overlapping tiling first recovers low‑confidence candidates. A Spatial Gate (DBSCAN on box centroids) and a Semantic Gate (DBSCAN on ResNet‑18 embeddings) then validate group evidence. Validated groups receive controlled confidence reweighting before class‑aware NMS fusion. Experiments on VisDrone show a recall increase from 0.685 to 0.778 (+0.093) and a precision adjustment from 0.801 to 0.595, yielding F1=0.669. Post‑processing latency averages 0.095 s per image. These results indicate recall‑first, precision‑trade‑off behavior that benefits recall‑sensitive applications such as far‑field counting and monitoring. Ablation confirms that tiling exposes missed objects, spatial clustering stabilizes geometry, semantic clustering enforces appearance coherence, and reweighting provides calibrated integration with the baseline. The framework requires no retraining and integrates with modern detectors. Future work will reduce semantic gating cost and extend the approach with temporal cues.
Authors: Yunfan Ren, Yixi Cai, Haotian Li, Nan Chen, Fangcheng Zhu, Longji Yin, Fanze Kong, Rundong Li, Fu Zhang
Abstract: This survey offers a comprehensive overview of recent advancements in LiDAR‑based autonomous Unmanned Aerial Vehicles (UAVs), covering their design, perception, planning, and control strategies. Over the past decade, LiDAR technology has become a crucial enabler for high‑speed, agile, and reliable UAV navigation, especially in GPS‑denied environments. The paper begins by examining the evolution of LiDAR sensors, emphasizing their unique advantages such as high accuracy, long‑range depth measurements, and robust performance under various lighting conditions, making them particularly well‑suited for UAV applications. The integration of LiDAR with UAVs has significantly enhanced their autonomy, enabling complex missions in diverse and challenging environments. Subsequently, we explore essential software components, including perception technologies for state estimation and mapping, as well as trajectory planning and control methodologies, and discuss their adoption in LiDAR‑based UAVs. Additionally, we analyze various practical applications of the LiDAR‑based UAVs, ranging from industrial operations to supporting different aerial platforms and UAV swarm deployments. The survey concludes by discussing existing challenges and proposing future research directions to advance LiDAR‑based UAVs and enhance multi‑UAV collaboration. By synthesizing recent developments, this paper aims to provide a valuable resource for researchers and practitioners working to push the boundaries of LiDAR‑based UAV systems.
Authors: Alireza Mohammadhosseini, Jacob Chakareski, Nicholas Mastronarde
Abstract: We propose ASL360, an adaptive deep reinforcement learning‑based scheduler for on‑demand 360° video streaming to mobile VR users in next‑generation wireless networks. We aim to maximize the overall Quality of Experience (QoE) of the users served over a UAV‑assisted 5G wireless network. Our system model comprises a macro base station (MBS) and a UAV‑mounted base station, both of which use mmWave transmission to the users. The 360° video is encoded into dependent layers and segmented tiles, allowing a user to schedule downloads of each layer's segments. Furthermore, each user utilizes multiple buffers to store the corresponding video layer's segments. We model the scheduling decision as a Constrained Markov Decision Process (CMDP), where the agent selects Base or Enhancement layers to maximize the QoE, and we use a policy gradient‑based method (PPO) to find the optimal policy. Additionally, we implement a dynamic adjustment mechanism for cost components, allowing the system to adaptively balance and prioritize the video quality, buffer occupancy, and quality change based on real‑time network and streaming session conditions. We demonstrate that ASL360 significantly improves the QoE, achieving approximately 2 dB higher average video quality, 80% lower average rebuffering time, and 57% lower video quality variation, relative to competitive baseline methods. Our results show the effectiveness of our layered and adaptive approach in enhancing the QoE in immersive video streaming applications, particularly in dynamic and challenging network environments.
Authors: Weiyan Lu, Huizhe Li, Yuhao Fang, Zhexuan Zhou, Junda Wu, Yude Li, Youmin Gong, Jie Mei
Abstract: Unmanned aerial vehicles (UAVs) with suspended payloads offer significant advantages for aerial transportation in complex and cluttered environments. However, existing systems face critical limitations, including unreliable perception of the cable‑payload dynamics, inefficient planning in large‑scale environments, and the inability to guarantee whole‑body safety under cable bending and external disturbances. This paper presents Acetrans, an Autonomous, Corridor‑based, and Efficient UAV suspended transport system that addresses these challenges through a unified perception, planning, and control framework. A LiDAR‑IMU fusion module is proposed to jointly estimate both payload pose and cable shape under taut and bent modes, enabling robust whole‑body state estimation and real‑time filtering of cable point clouds. To enhance planning scalability, we introduce the Multi‑size‑Aware Configuration‑space Iterative Regional Inflation (MACIRI) algorithm, which generates safe flight corridors while accounting for varying UAV and payload geometries. A spatio‑temporal, corridor‑constrained trajectory optimization scheme is then developed to ensure dynamically feasible and collision‑free trajectories. Finally, a nonlinear model predictive controller (NMPC) augmented with cable‑bending constraints provides robust whole‑body safety during execution. Simulation and experimental results validate the effectiveness of Acetrans, demonstrating substantial improvements in perception accuracy, planning efficiency, and control safety compared to state‑of‑the‑art methods.
Authors: Junshan Luo, Shilian Wang, Boxiang He
Abstract: Aerial reconfigurable intelligent surfaces (ARIS), deployed on unmanned aerial vehicles (UAVs), could enhance anti‑jamming communication performance by dynamically configuring channel conditions and establishing reliable air‑ground links. However, large‑scale ARIS faces critical deployment challenges due to the prohibitive computational complexity of conventional discrete optimization methods and sophisticated jamming threats. In this paper, we introduce a mean field modeling approach to design the spatial configuration of ARIS by a continuous density function, thus bypassing high‑dimensional combinatorial optimization. We consider an adaptive jammer which adjusts its position and beamforming to minimize the sum‑rate. A key finding reveals that the jammer's optimal strategy is governed by a proximity‑directivity trade‑off between reducing path loss and enhancing spatial focusing. To combat the jamming, we propose a robust anti‑jamming transmission framework that jointly optimizes the BS beamforming, the ARIS reflection, and the ARIS spatial distribution to maximize the worst‑case sum‑rate. By leveraging variational optimization and Riemannian manifold methods, we efficiently solve the functional optimization problems. Our analysis further unveils that the optimal ARIS deployment follows a spatial water‑filling principle, concentrating resources in high‑gain regions while avoiding interference‑prone areas. Simulation results demonstrate that the proposed framework remarkably improves the sum‑rate. Furthermore, the computational complexity of the proposed algorithm is independent of the number of UAVs, validating its effectiveness for scalable ARIS‑assisted anti‑jamming communications.
Authors: Jonas Kühne, Christian Vogt, Michele Magno, Luca Benini
Abstract: Visual Inertial Odometry (VIO) is a widely used computer vision method that determines an agent's movement through a camera and an IMU sensor. This paper presents an efficient and accurate VIO pipeline optimized for applications on micro‑ and nano‑UAVs. The proposed design incorporates state‑of‑the‑art feature detection and tracking methods (SuperPoint, PX4FLOW, ORB), all optimized and quantized for emerging RISC‑V‑based ultra‑low‑power parallel systems on chips (SoCs). Furthermore, by employing a rigid body motion model, the pipeline reduces estimation errors and achieves improved accuracy in planar motion scenarios. The pipeline's suitability for real‑time VIO is assessed on an ultra‑low‑power SoC in terms of compute requirements and tracking accuracy after quantization. The pipeline, including the three feature tracking methods, was implemented on the SoC for real‑world validation. This design bridges the gap between high‑accuracy VIO pipelines that are traditionally run on computationally powerful systems and lightweight implementations suitable for microcontrollers. The optimized pipeline on the GAP9 low‑power SoC reduces the average RMSE by up to a factor of 3.65 relative to the baseline pipeline when using the ORB feature tracker. The analysis of the computational complexity of the feature trackers further shows that PX4FLOW achieves on‑par tracking accuracy with ORB at a lower runtime for movement speeds below 24 pixels/frame.
Authors: Mohammadreza Narimani, Alireza Pourreza, Ali Moghimi, Mohsen Mesgaran, Parastoo Farajpoor, Hamid Jafarbiglu
Abstract: This study addresses the escalating threat of branched broomrape (Phelipanche ramosa) to California's tomato industry, which supplies over 90 percent of U.S. processing tomatoes. The parasite's largely underground life cycle makes early detection difficult, while conventional chemical controls are costly, environmentally harmful, and often ineffective. To address this, we combined drone‑based multispectral imagery with Long Short‑Term Memory (LSTM) deep learning networks, using the Synthetic Minority Over‑sampling Technique (SMOTE) to handle class imbalance. Research was conducted on a known broomrape‑infested tomato farm in Woodland, Yolo County, CA, across five key growth stages determined by growing degree days (GDD). Multispectral images were processed to isolate tomato canopy reflectance. At 897 GDD, broomrape could be detected with 79.09 percent overall accuracy and 70.36 percent recall without integrating later stages. Incorporating sequential growth stages with LSTM improved detection substantially. The best‑performing scenario, which integrated all growth stages with SMOTE augmentation, achieved 88.37 percent overall accuracy and 95.37 percent recall. These results demonstrate the strong potential of temporal multispectral analysis and LSTM networks for early broomrape detection. While further real‑world data collection is needed for practical deployment, this study shows that UAV‑based multispectral sensing coupled with deep learning could provide a powerful precision agriculture tool to reduce losses and improve sustainability in tomato production.
Authors: Razvan Stefanescu, Ethan Oh, Ruben Vazquez, Chris Mesterharm, Constantin Serban, Ritu Chadha
Abstract: We introduce a multi‑modal WAVE‑DETR drone detector combining visible RGB and acoustic signals for robust real‑life UAV object detection. Our approach fuses visual and acoustic features in a unified object detector model relying on the Deformable DETR and Wav2Vec2 architectures, achieving strong performance under challenging environmental conditions. Our work leverages the existing Drone‑vs‑Bird dataset and the newly generated ARDrone dataset containing more than 7,500 synchronized images and audio segments. We show how the acoustic information is used to improve the performance of the Deformable DETR object detector on the real ARDrone dataset. We developed, trained, and tested four different fusion configurations based on a gated mechanism, a linear layer, an MLP, and cross attention. The Wav2Vec2 acoustic embeddings are fused with the multi‑resolution feature maps of the Deformable DETR and enhance the object detection performance across all drone sizes. The best performer is the gated fusion approach, which improves the mAP of the Deformable DETR object detector on our in‑distribution and out‑of‑distribution ARDrone datasets by 11.1% to 15.3% for small drones across all IoU thresholds between 0.5 and 0.9. The mAP scores for medium and large drones are also enhanced, with overall gains across all drone sizes ranging from 3.27% to 5.84%.
Authors: Spyridon Loukovitis, Anastasios Arsenos, Vasileios Karampinis, Athanasios Voulodimos
Abstract: Open‑set detection is crucial for robust UAV autonomy in air‑to‑air object detection under real‑world conditions. Traditional closed‑set detectors degrade significantly under domain shifts and flight data corruption, posing risks to safety‑critical applications. We propose a novel, model‑agnostic open‑set detection framework designed specifically for embedding‑based detectors. The method explicitly handles unknown object rejection while maintaining robustness against corrupted flight data. It estimates semantic uncertainty via entropy modeling in the embedding space and incorporates spectral normalization and temperature scaling to enhance open‑set discrimination. We validate our approach on the challenging AOT aerial benchmark and through extensive real‑world flight tests. Comprehensive ablation studies demonstrate consistent improvements over baseline methods, achieving up to a 10% relative AUROC gain compared to standard YOLO‑based detectors. Additionally, we show that background rejection further strengthens robustness without compromising detection accuracy, making our solution particularly well‑suited for reliable UAV perception in dynamic air‑to‑air environments.
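To make the entropy and temperature-scaling ingredients mentioned in the abstract above concrete, the following is a minimal sketch of an entropy-based unknown-object test. It is not the authors' method: it assumes the detector exposes one similarity score (logit) per known class for each detection, and the names `softmax`, `predictive_entropy`, and `is_unknown`, as well as the threshold value, are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; larger temperatures flatten the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_unknown(logits, temperature, threshold):
    """Flag a detection as unknown when its class entropy exceeds the threshold.

    In practice, temperature and threshold would be tuned on held-out data.
    """
    return predictive_entropy(softmax(logits, temperature)) > threshold


# Hypothetical usage: a confident detection versus an ambiguous one.
confident = is_unknown([10.0, 0.0, 0.0], temperature=1.0, threshold=1.0)
ambiguous = is_unknown([0.0, 0.0, 0.0, 0.0], temperature=1.0, threshold=1.0)
```

A detection whose scores are spread evenly over the known classes has high entropy and is rejected, while one dominated by a single class is kept.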
Authors: Yuan Shufang
Abstract: Object detection in unmanned aerial vehicle (UAV) imagery presents significant challenges. Issues such as densely packed small objects, scale variations, and occlusion are commonplace. This paper introduces RT‑DETR++, which enhances the encoder component of the RT‑DETR model. Our improvements focus on two key aspects. First, we introduce a channel‑gated attention‑based upsampling/downsampling (AU/AD) mechanism. This dual‑path system minimizes errors and preserves details during feature layer propagation. Second, we incorporate CSP‑PAC during feature fusion. This technique employs parallel dilated (atrous) convolutions to process local and contextual information within the same layer, facilitating the integration of multi‑scale features. Evaluation demonstrates that our novel neck design achieves superior performance in detecting small and densely packed objects. The model maintains sufficient speed for real‑time detection without increasing computational complexity. This study provides an effective approach for feature encoding design in real‑time detection systems.
Authors: Luke Snow, Vikram Krishnamurthy
Abstract: Multi‑agent inverse reinforcement learning (IRL) aims to identify Pareto‑efficient behavior in a multi‑agent system and to reconstruct the utility functions of the individual agents. Motivated by the problem of detecting UAV coordination, we ask: how can we construct a statistical detector for Pareto‑efficient behavior given noisy measurements of the decisions of a multi‑agent system? This paper approaches this IRL problem by deriving necessary and sufficient conditions for a dataset of multi‑agent system dynamics to be consistent with Pareto‑efficient coordination, and providing algorithms for recovering utility functions which are consistent with the system dynamics. We derive an optimal statistical detector for determining Pareto‑efficient coordination from noisy system measurements, which minimizes Type‑I statistical detection error. Then, we provide a utility estimation algorithm which minimizes the worst‑case estimation error over a statistical ambiguity set centered at empirical observations; this min‑max solution achieves distributionally robust IRL, which is crucial in adversarial strategic interactions. We illustrate these results in a detailed example for detecting Pareto‑efficient coordination among multiple UAVs given noisy measurements recorded at a radar. We then reconstruct the utility functions of the UAVs in a distributionally robust sense.
Authors: Muhammad Ali Jamshed, Muhammad Ahmed Mohsin, Hongliang Zhang, Bushra Haq, Aryan Kaushik, Boya Di, Weiwei Jiang
Abstract: To overcome the challenges of ultra‑low latency, ubiquitous coverage, and soaring data rates, this article presents a combined use of Near Field Communication (NFC) and Reconfigurable Holographic Surfaces (RHS) for Non‑Terrestrial Networks (NTN). We present a system architecture showing that the integration of RHS with NTN platforms such as satellites, High Altitude Platform Stations (HAPS), and Uncrewed Aerial Vehicles (UAVs) can achieve precise beamforming and intelligent wavefront control in near‑field regions, enhancing Energy Efficiency (EE), spectral utilization, and spatial resolution. Moreover, key applications, challenges, and future directions are identified for fully adopting this integration. In addition, a use case analysis shows how the EE of the system can be improved in a public safety scenario, further strengthening the case for UAV‑RHS fusion.
Authors: Zeinab Ghasemi Darehnaei, Mohammad Shokouhifar, Hossein Yazdanjouei, S. M. J. Rastegar Fatemi
Abstract: This paper introduces SI‑EDTL, a two‑stage swarm intelligence ensemble deep transfer learning model for detecting multiple vehicles in UAV images. It combines three pre‑trained Faster R‑CNN feature extractor models (InceptionV3, ResNet50, GoogLeNet) with five transfer classifiers (KNN, SVM, MLP, C4.5, Naïve Bayes), resulting in 15 different base learners. These are aggregated via weighted averaging to classify regions as Car, Van, Truck, Bus, or background. Hyperparameters are optimized with the whale optimization algorithm to balance accuracy, precision, and recall. Implemented in MATLAB R2020b with parallel processing, SI‑EDTL outperforms existing methods on the AU‑AIR UAV dataset.
Authors: Kristan Hilby, Ian Hunter
Abstract: Stop‑rotor aircraft have long been proposed as the ideal vertical takeoff and landing (VTOL) aircraft for missions with equal time spent in both flight regimes, such as agricultural monitoring, search and rescue, and last‑mile delivery. Featuring a central lifting surface that rotates in VTOL to generate vertical thrust and locks in forward flight to generate passive lift, the stop‑rotor offers the potential for high efficiency across both modes. However, practical implementation has remained infeasible due to aerodynamic and stability conflicts between flight modes. In this work, we present SPERO (Stopped‑Penta Rotor), a stop‑rotor uncrewed aerial vehicle (UAV) featuring a flipping and latching wing, an active center of pressure mechanism, thrust vectored counterbalances, a five‑rotor architecture, and an eleven‑state machine flight controller coordinating geometric and controller reconfiguration. Furthermore, SPERO establishes a generalizable design and control framework for stopped‑rotor UAVs. Together, these innovations overcome longstanding challenges in stop‑rotor flight and enable the first stable, bidirectional transition between VTOL and forward flight.
Authors: Shucong Li, Zhenyu Liu, Zijie Hong, Zhiheng Zhou, Xianghai Cao
Abstract: Multispectral object detection is an important application for unmanned aerial vehicles (UAVs). However, it faces several challenges. First, low‑light RGB images weaken multispectral fusion due to loss of detail. Second, interfering information is introduced into local target modeling during multispectral fusion. Third, computational cost poses a deployment challenge on UAV platforms, for example for transformer‑based methods with quadratic complexity. To address these issues, a framework named DEPFusion, consisting of two designed modules, Dual‑Domain Enhancement (DDE) and Priority‑Guided Mamba Fusion (PGMF), is proposed for UAV multispectral object detection. First, to use low‑frequency components for global brightness enhancement and frequency‑spectrum features for texture‑detail recovery, the DDE module is designed with a Cross‑Scale Wavelet Mamba (CSWM) block and a Fourier Details Recovery (FDR) block. Second, to guide Mamba scanning to start from tokens with high priority scores, which contain local target features, a novel Priority‑Guided Serialization is proposed with theoretical proof. Based on it, the PGMF module is designed for multispectral feature fusion, which enhances local modeling and reduces interfering information. Experiments on the DroneVehicle and VEDAI datasets demonstrate that DEPFusion performs competitively with state‑of‑the‑art methods.
Authors: Tianyu Huo, Jian Xiong, Yiyan Wu, Songjie Yang, Bo Liu, Wenjun Zhang
Abstract: Extremely large antenna arrays (ELAAs) are key to enhancing spectral efficiency in 6G networks. Leveraging the distributed nature of multi‑unmanned aerial vehicle (UAV) systems enables the formation of distributed ELAAs, which often operate in the near‑field region with spatial sparsity, rendering the conventional far‑field plane wave assumption invalid. This paper investigates channel estimation for distributed near‑field multi‑UAV communication systems. We first derive closed‑form signal‑to‑noise ratio (SNR) expressions under the plane wave model (PWM), spherical wave model (SWM), and a hybrid spherical‑plane wave model (HSPWM), also referred to as the cross‑field model, within a distributed uniform planar array (UPA) scenario. The analysis shows that HSPWM achieves a good balance between modeling accuracy and analytical tractability. Based on this, we propose two channel estimation algorithms: the spherical‑domain orthogonal matching pursuit (SD‑OMP) and the tensor‑OMP. The SD‑OMP generalizes the polar domain to jointly consider elevation, azimuth, and range. Under the HSPWM, the channel is naturally formulated as a tensor, enabling the use of tensor‑OMP. Simulation results demonstrate that tensor‑OMP achieves normalized mean square error (NMSE) performance comparable to SD‑OMP, while offering reduced computational complexity and improved scalability.
Authors: Christian Geckeler, Niklas Neugebauer, Manasi Muglikar, Davide Scaramuzza, Stefano Mintchev
Abstract: Uncrewed aerial vehicles (UAVs) are increasingly deployed in forest environments for tasks such as environmental monitoring and search and rescue, which require safe navigation through dense foliage and precise data collection. Traditional sensing approaches, including passive multispectral and RGB imaging, suffer from latency, poor depth resolution, and strong dependence on ambient light, especially under forest canopies. In this work, we present a novel event spectroscopy system that simultaneously enables high‑resolution, low‑latency depth reconstruction with integrated multispectral imaging using a single sensor. Depth is reconstructed using structured light, and by modulating the wavelength of the projected structured light, our system captures spectral information in controlled bands between 650 nm and 850 nm. We demonstrate up to 60% improvement in RMSE over commercial depth sensors and validate the spectral accuracy against a reference spectrometer and commercial multispectral cameras, demonstrating comparable performance. A portable version limited to RGB (3 wavelengths) is used to collect real‑world depth and spectral data from a Masoala Rainforest. We demonstrate the use of this prototype for color image reconstruction and material differentiation between leaves and branches using spectral and depth data. Our results show that adding depth (available at no extra effort with our setup) to material differentiation improves the accuracy by over 30% compared to a color‑only method. Our system, tested in both lab and real‑world rainforest environments, shows strong performance in depth estimation, RGB reconstruction, and material differentiation, paving the way for lightweight, integrated, and robust UAV perception and data collection in complex natural environments.
Authors: Sajad Ahmadi, Mohammadreza Davoodi, Javad Mohammadpour Velni
Abstract: This paper presents an adaptive coverage control method for a fleet of off‑road and Unmanned Ground Vehicles (UGVs) operating in dynamic (time‑varying) agricultural environments. Traditional coverage control approaches often assume static conditions, making them unsuitable for real‑world farming scenarios where obstacles, such as moving machinery and uneven terrains, create continuous challenges. To address this, we propose a real‑time path planning framework that integrates Unmanned Aerial Vehicles (UAVs) for obstacle detection and terrain assessment, allowing UGVs to dynamically adjust their coverage paths. The environment is modeled as a weighted directed graph, where the edge weights are continuously updated based on the UAV observations to reflect obstacle motion and terrain variations. The proposed approach incorporates Voronoi‑based partitioning, adaptive edge weight assignment, and cost‑based path optimization to enhance navigation efficiency. Simulation results demonstrate the effectiveness of the proposed method in improving path planning, reducing traversal costs, and maintaining robust coverage in the presence of dynamic obstacles and muddy terrains.
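The abstract above combines a weighted directed graph, UAV-driven edge-weight updates, and Voronoi-based partitioning. The following is a minimal sketch of how such a partition could be recomputed after a weight update, using a multi-source Dijkstra search. It is an illustration only, not the authors' implementation: the function name `graph_voronoi`, the node and vehicle names, and all edge costs are hypothetical.

```python
import heapq


def graph_voronoi(graph, sites):
    """Assign every reachable node to the vehicle with the cheapest path to it.

    graph: dict mapping node -> {neighbor: non-negative edge cost} (directed).
    sites: dict mapping vehicle name -> start node.
    Returns (owner, cost), two dicts keyed by node.
    """
    owner, cost = {}, {}
    # Seed the queue with every vehicle at distance 0 (multi-source Dijkstra).
    heap = [(0.0, name, node) for name, node in sites.items()]
    heapq.heapify(heap)
    while heap:
        d, name, node = heapq.heappop(heap)
        if node in owner:
            continue  # already claimed via a cheaper path
        owner[node], cost[node] = name, d
        for nbr, w in graph.get(node, {}).items():
            if nbr not in owner:
                heapq.heappush(heap, (d + w, name, nbr))
    return owner, cost


# Hypothetical field: five waypoints in a row, two ground vehicles at the ends.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 2.0},
    "d": {"c": 2.0, "e": 1.0},
    "e": {"d": 1.0},
}
sites = {"ugv1": "a", "ugv2": "e"}

before, _ = graph_voronoi(graph, sites)

# Assumed UAV observation: the b -> c segment has become muddy, so its cost rises.
graph["b"]["c"] = 5.0
after, cost_after = graph_voronoi(graph, sites)

print(before["c"], "->", after["c"])
```

In this toy example the middle waypoint changes hands once the edge leading to it becomes expensive, which is the kind of dynamic reassignment the abstract describes. A real system would also need path planning inside each cell, which this sketch leaves out.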
Authors: Guangyu Lei, Tianhao Liang, Yuqi Ping, Xinglin Chen, Longyu Zhou, Junwei Wu, Xiyuan Zhang, Huahao Ding, Xingjian Zhang, Weijie Yuan, Tingting Zhang, Qinyu Zhang
Abstract: The rapid development of the low‑altitude economy emphasizes the critical need for effective perception and intent recognition of non‑cooperative unmanned aerial vehicles (UAVs). The advanced generative reasoning capabilities of multimodal large language models (MLLMs) present a promising approach in such tasks. In this paper, we focus on the combination of UAV intent recognition and MLLMs. Specifically, we first present an MLLM‑enabled UAV intent recognition architecture, where the multimodal perception system is utilized to obtain real‑time payload and motion information of UAVs, generating structured input information, and the MLLM outputs intent recognition results by incorporating environmental information, prior knowledge, and tactical preferences. Subsequently, we review the related work and demonstrate its progress within the proposed architecture. Then, a use case for low‑altitude confrontation is conducted to demonstrate the feasibility of our architecture and offer valuable insights for practical system design. Finally, the future challenges are discussed, followed by corresponding strategic recommendations for further applications.
Authors: Yanwei Gong, Ruichen Zhang, Xiaoqing Wang, Xiaolin Chang, Bo Ai, Junchao Fan, Bocheng Ju, Dusit Niyato
Abstract: Unmanned Aerial Vehicle (UAV) cluster services are crucial for promoting the low‑altitude economy by enabling scalable, flexible, and adaptive aerial networks. To meet diverse service demands, clusters must dynamically incorporate New UAVs (NUAVs) or Existing UAVs (EUAVs). However, achieving sustained service reliability remains challenging due to the need for efficient and scalable NUAV authentication, privacy‑preserving cross‑cluster authentication for EUAVs, and robust protection of the cluster session key, including both forward and backward secrecy. To address these challenges, we propose a Lightweight and Privacy‑Preserving Cluster Authentication and Session Key Update (LP2‑CASKU) scheme tailored for dynamic UAV clusters in low‑altitude economy networks. LP2‑CASKU integrates an efficient batch authentication mechanism that simultaneously authenticates multiple NUAVs with minimal communication overhead. It further introduces a lightweight cross‑cluster authentication mechanism that ensures EUAV anonymity and unlinkability. Additionally, a secure session key update mechanism is incorporated to maintain key confidentiality over time, thereby preserving both forward and backward secrecy. We provide a comprehensive security analysis and evaluate LP2‑CASKU performance through both theoretical analysis and OMNeT++ simulations. Experimental results demonstrate that, compared to the baseline, LP2‑CASKU achieves a latency reduction of 82.8%‑90.8% across different UAV swarm configurations and network bitrates, demonstrating strong adaptability to dynamic communication environments. In addition, under varying UAV swarm configurations, LP2‑CASKU reduces the energy consumption by approximately 37.6‑72.6%, while effectively supporting privacy‑preserving authentication in highly dynamic UAV cluster environments.
Authors: Feng Shen, Jiaming Cui, Wenqiang Li, Shuai Zhou
Abstract: Automated defect detection from UAV imagery of transmission lines is a challenging task due to the small size, ambiguity, and complex backgrounds of defects. This paper proposes TinyDef‑DETR, a DETR‑based framework designed to achieve accurate and efficient detection of transmission line defects from UAV‑acquired images. The model integrates four major components: an edge‑enhanced ResNet backbone to strengthen boundary‑sensitive representations, a stride‑free space‑to‑depth module to enable detail‑preserving downsampling, a cross‑stage dual‑domain multi‑scale attention mechanism to jointly model global context and local cues, and a Focaler‑Wise‑SIoU regression loss to improve the localization of small and difficult objects. Together, these designs effectively mitigate the limitations of conventional detectors. Extensive experiments on both public and real‑world datasets demonstrate that TinyDef‑DETR achieves superior detection performance and strong generalization capability, while maintaining modest computational overhead. The accuracy and efficiency of TinyDef‑DETR make it a suitable method for UAV‑based transmission line defect detection, particularly in scenarios involving small and ambiguous objects.
Authors: Zhenhai Weng, Xinjie Li, Can Wu, Weijie He, Jianfeng Lv, Dong Zhou, Zhongliang Yu
Abstract: Open‑Vocabulary Object Detection (OVD) faces severe performance degradation when applied to UAV imagery due to the domain gap from ground‑level datasets. To address this challenge, we propose a complete UAV‑oriented solution that combines both dataset construction and model innovation. First, we design a refined UAV‑Label Engine, which efficiently resolves annotation redundancy, inconsistency, and ambiguity, enabling the generation of large‑scale UAV datasets. Based on this engine, we construct two new benchmarks: UAVDE‑2M, with over 2.4M instances across 1,800+ categories, and UAVCAP‑15K, providing rich image‑text pairs for vision‑language pretraining. Second, we introduce the Cross‑Attention Gated Enhancement (CAGE) module, a lightweight dual‑path fusion design that integrates cross‑attention, adaptive gating, and global FiLM modulation for robust text‑vision alignment. By embedding CAGE into the YOLO‑World‑v2 framework, our method achieves significant gains in both accuracy and efficiency, notably improving zero‑shot detection on VisDrone by +5.3 mAP while reducing parameters and GFLOPs, and demonstrating strong cross‑domain generalization on SIMD. Extensive experiments and real‑world UAV deployment confirm the effectiveness and practicality of our proposed solution for UAV‑based OVD.
Authors: Hongyu Zhou, Yunzhou Zhang, Tingsong Huang, Fawei Ge, Man Qi, Xichen Zhang, Yizhong Zhang
Abstract: Cross‑view geo‑localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. However, significant challenges arise from the drastic viewpoint differences and appearance variations between images. Existing methods predominantly rely on semantic features from RGB images, often neglecting the importance of spatial structural information in capturing viewpoint‑invariant features. To address this issue, we incorporate geometric structural information from normal images and introduce a Joint perception network to integrate RGB and Normal images (JRN‑Geo). Our approach utilizes a dual‑branch feature extraction framework, leveraging a Difference‑Aware Fusion Module (DAFM) and a Joint‑Constrained Interaction Aggregation (JCIA) strategy to enable deep fusion and joint‑constrained semantic and structural information representation. Furthermore, we propose a 3D geographic augmentation technique to generate potential viewpoint variation samples, enhancing the network's ability to learn viewpoint‑invariant features. Extensive experiments on the University‑1652 and SUES‑200 datasets validate the robustness of our method against complex viewpoint variations, achieving state‑of‑the‑art performance.
Authors: Andrzej D. Dobrzycki, Ana M. Bernardos, José R. Casar
Abstract: The You Only Look Once (YOLO) architecture is crucial for real‑time object detection. However, deploying it in resource‑constrained environments such as unmanned aerial vehicles (UAVs) requires efficient transfer learning. Although layer freezing is a common technique, the specific impact of various freezing configurations on contemporary YOLOv8 and YOLOv10 architectures remains unexplored, particularly with regard to the interplay between freezing depth, dataset characteristics, and training dynamics. This research addresses this gap by presenting a detailed analysis of layer‑freezing strategies. We systematically investigate multiple freezing configurations across YOLOv8 and YOLOv10 variants using four challenging datasets that represent critical infrastructure monitoring. Our methodology integrates a gradient behavior analysis (L2 norm) and visual explanations (Grad‑CAM) to provide deeper insights into training dynamics under different freezing strategies. Our results reveal that there is no universal optimal freezing strategy but, rather, one that depends on the properties of the data. For example, freezing the backbone is effective for preserving general‑purpose features, while a shallower freeze is better suited to handling extreme class imbalance. These configurations reduce graphics processing unit (GPU) memory consumption by up to 28% compared to full fine‑tuning and, in some cases, achieve mean average precision (mAP@50) scores that surpass those of full fine‑tuning. Gradient analysis corroborates these findings, showing distinct convergence patterns for moderately frozen models. Ultimately, this work provides empirical findings and practical guidelines for selecting freezing strategies. It offers a practical, evidence‑based approach to balanced transfer learning for object detection in scenarios with limited resources.
Authors: Ashen Rodrigo, Isuru Munasinghe, Pubudu Sanjeewani, Asanka Perera
Abstract: Timely and accurate detection of defects and contaminants in solar panels is critical for maintaining the efficiency and reliability of photovoltaic (PV) systems. While recent studies have applied deep learning to PV inspection, fair benchmarking across detector architectures and unbiased handling of class imbalance remain limited. This work presents a comprehensive benchmark of convolutional and transformer‑based object detectors on UAV‑captured RGB imagery of solar panels. It introduces a class‑targeted augmentation strategy applied exclusively to the training split to mitigate imbalance without compromising evaluation integrity. Faster R‑CNN with ResNet50 and MobileNetV3 backbones, RetinaNet with ResNet50, YOLOv5, YOLOv8, and Swin Transformer backbones integrated with Faster R‑CNN (Tiny, Small, and Base variants) are evaluated. Performance is assessed using mean Average Precision (mAP) across multiple IoU thresholds, precision, recall, F1 score, and inference throughput to enable accuracy‑throughput tradeoff analysis relevant to UAV deployment. Experimental results show that Faster R‑CNN with a ResNet50 backbone achieves the highest localization accuracy, with mAP@0.5 of 0.893 and mAP@0.5:0.95 of 0.759, whereas the MobileNetV3 variant provides the best overall reliability balance, achieving recall of 0.745, F1‑score of 0.809, and accuracy of 0.679 on the test set. The dataset and code will be released upon acceptance of the paper.
Authors: Zhangding Liu, Neda Mohammadi, John E. Taylor
Abstract: Rapid and accurate post‑hurricane damage assessment is vital for disaster response and recovery. Yet existing CNN‑based methods struggle to capture multi‑scale spatial features and to distinguish visually similar or co‑occurring damage types. To address these issues, we propose MCANet, a multi‑label classification framework that learns multi‑scale representations and adaptively attends to spatially relevant regions for each damage category. MCANet employs a Res2Net‑based hierarchical backbone to enrich spatial context across scales and a multi‑head class‑specific residual attention module to enhance discrimination. Each attention branch focuses on different spatial granularities, balancing local detail with global context. We evaluate MCANet on the RescueNet dataset of 4,494 UAV images collected after Hurricane Michael. MCANet achieves a mean average precision (mAP) of 91.75%, outperforming ResNet, Res2Net, VGG, MobileNet, EfficientNet, and ViT. With eight attention heads, performance further improves to 92.35%, boosting average precision for challenging classes such as Road Blocked by over 6%. Class activation mapping confirms MCANet's ability to localize damage‑relevant regions, supporting interpretability. Outputs from MCANet can inform post‑disaster risk mapping, emergency routing, and digital twin‑based disaster response. Future work could integrate disaster‑specific knowledge graphs and multimodal large language models to improve adaptability to unseen disasters and enrich semantic understanding for real‑world decision‑making.
Authors: Ali Khanpour, Tianyi Wang, Afra Vahidi-Shams, Wim Ectors, Farzam Nakhaie, Amirhossein Taheri, Christian Claudel
Abstract: Traffic congestion and violations pose significant challenges for urban mobility and road safety. Traditional traffic monitoring systems, such as fixed cameras and sensor‑based methods, are often constrained by limited coverage, low adaptability, and poor scalability. To address these challenges, this paper introduces an advanced unmanned aerial vehicle (UAV)‑based traffic surveillance system capable of accurate vehicle detection, classification, tracking, and behavioral analysis in real‑world, unconstrained urban environments. The system leverages multi‑scale and multi‑angle template matching, Kalman filtering, and homography‑based calibration to process aerial video data collected from altitudes of approximately 200 meters. A case study in an urban area demonstrates robust performance, achieving a detection precision of 91.8%, an F1‑score of 90.5%, and tracking metrics (MOTA/MOTP) of 92.1% and 93.7%, respectively. Beyond precise detection, the system classifies five vehicle types and automatically detects critical traffic violations, including unsafe lane changes, illegal double parking, and crosswalk obstructions, through the fusion of geofencing, motion filtering, and trajectory deviation analysis. The integrated analytics module supports origin‑destination tracking, vehicle count visualization, inter‑class correlation analysis, and heatmap‑based congestion modeling. Additionally, the system enables entry‑exit trajectory profiling, vehicle density estimation across road segments, and movement direction logging, supporting comprehensive multi‑scale urban mobility analytics. Experimental results confirm the system's scalability, accuracy, and practical relevance, highlighting its potential as an enforcement‑aware, infrastructure‑independent traffic monitoring solution for next‑generation smart cities.
Authors: Nicole Fronda, Hariharan Narayanan, Sadia Afrin Ananna, Steven Weber, Houssam Abbas
Abstract: We present a new approach for designing risk‑bounded controllers for Uncrewed Aerial Vehicles (UAVs). Existing frameworks for assessing risk of UAV operations rely on knowing the conditional probability of an incident occurring given different causes. Limited data for computing these probabilities makes real‑world implementation of these frameworks difficult. Furthermore, existing frameworks do not include control methods for risk mitigation. Our approach relies on UAV dynamics, and employs reachability analysis for a probabilistic risk assessment over all feasible UAV trajectories. We use this holistic risk assessment to formulate a control optimization problem that minimally changes a UAV's existing control law to be bounded by an accepted risk threshold. We call our approach PRReach. Public and readily available UAV dynamics models and open source spatial data for mapping hazard outcomes enable practical implementation of PRReach for both offline pre‑flight and online in‑flight risk assessment and mitigation. We evaluate PRReach through simulation experiments on real‑world data. Results show that PRReach controllers reduce risk by up to 24% offline, and up to 53% online, compared with classical controllers.
Authors: Guangyu Lei, Yuqi Ping, Tianhao Liang, Huahao Ding, Tingting Zhang
Abstract: Relative localization of unmanned aerial vehicle (UAV) swarms in global navigation satellite system (GNSS) denied environments is essential for emergency rescue and battlefield reconnaissance. Existing methods suffer from significant localization errors among UAVs due to packet loss and high computational complexity in large swarms. This paper proposes a clustering‑based framework where the UAVs simultaneously use communication signals for channel estimation and ranging. First, spectral clustering is utilized to divide the UAV swarm into different sub‑clusters, where matrix completion and multidimensional scaling yield high‑precision relative coordinates. Subsequently, a global map is created by inter‑cluster anchor fusion. A case study of a UAV integrated communication and sensing (ISAC) system is presented, where Orthogonal Time Frequency Space (OTFS) is adopted for ranging and communication. Experimental results show that the proposed method reduces localization errors in large swarms and loss of range information. The study also explores the impact of signal parameters on communication and localization, highlighting the interplay between communication and localization performance.
Authors: Serhii Svystun, Pavlo Radiuk, Oleksandr Melnychenko, Oleg Savenko, Anatoliy Sachenko
Abstract: Unmanned aerial vehicles (UAVs) equipped with advanced sensors have opened up new opportunities for monitoring wind power plants, including blades, towers, and other critical components. However, reliable defect detection requires high‑resolution data and efficient methods to process multispectral imagery. In this research, we aim to enhance defect detection accuracy through the development of an ensemble of YOLO‑based deep learning models that integrate both visible and thermal channels. We propose an ensemble approach that integrates a general‑purpose YOLOv8 model with a specialized thermal model, using a sophisticated bounding box fusion algorithm to combine their predictions. Our experiments show this approach achieves a mean Average Precision (mAP@.5) of 0.93 and an F1‑score of 0.90, outperforming a standalone YOLOv8 model, which scored an mAP@.5 of 0.91. These findings demonstrate that combining multiple YOLO architectures with fused multispectral data provides a more reliable solution, improving the detection of both visual and thermal defects.
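The abstract above does not specify its bounding box fusion algorithm. As a rough illustration of the general idea only, the sketch below pools single-class detections from two models, groups boxes that overlap by IoU, and replaces each group with a confidence-weighted average box. The function names `iou` and `fuse`, the 0.5 threshold, and the sample detections are all assumptions, not details taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(detections, iou_thr=0.5):
    """Fuse pooled single-class detections given as (box, score) pairs.

    Scores are assumed to be strictly positive.
    Returns a list of (fused_box, mean_score), highest-scoring group first.
    """
    clusters = []
    # Visit boxes from most to least confident; each cluster is anchored
    # by its first (most confident) member.
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        for cluster in clusters:
            if iou(cluster[0][0], box) >= iou_thr:
                cluster.append((box, score))
                break
        else:
            clusters.append([(box, score)])
    fused = []
    for cluster in clusters:
        total = sum(s for _, s in cluster)
        # Confidence-weighted average of each coordinate.
        box = tuple(sum(b[i] * s for b, s in cluster) / total for i in range(4))
        fused.append((box, total / len(cluster)))
    return fused


# Hypothetical detections from a visible-light model and a thermal model.
rgb_dets = [((10, 10, 50, 50), 0.9), ((200, 200, 240, 240), 0.6)]
thermal_dets = [((12, 12, 52, 52), 0.7)]
fused = fuse(rgb_dets + thermal_dets)
print(len(fused))
```

Here the two overlapping boxes collapse into one fused detection and the isolated box passes through unchanged. Production fusion schemes typically also handle multiple classes and per-model weights, which this sketch omits.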
Authors: Wenfei Yao, Xiaoming Chen, Qi Wang, Xingyu Peng
Abstract: Low Earth orbit (LEO) satellite constellations play a pivotal role in sixth‑generation (6G) wireless networks by providing global coverage, massive connections, and huge capacity. In this paper, we present a novel LEO satellite constellation communication framework, where a reconfigurable intelligent surface‑mounted unmanned aerial vehicle (RIS‑UAV) is deployed to improve the communication quality of multiple terrestrial user equipments (UEs) under the condition of long distance between satellite and ground. To reduce the overhead for channel state information (CSI) acquisition with multiple‑satellite collaboration, statistical CSI (sCSI) is utilized in the system. In such a situation, we first derive an approximated but exact expression for ergodic rate of each UE. Then, we aim to maximize the minimum approximated UE ergodic rate by the proposed alternating optimization (AO)‑based algorithm that jointly optimizes LEO satellite beamforming, RIS phase shift, and UAV trajectory. Finally, extensive simulations are conducted to demonstrate the superiority of the proposed algorithm in terms of spectrum efficiency over baseline algorithms.
Authors: Yuchen Zhu, Longxiang Yin, Kai Zhao
Abstract: In the frontier research and application of current video surveillance technology, traditional camera systems exhibit significant limitations of response delay exceeding 200 ms in dynamic scenarios due to the insufficient deep feature extraction capability of automatic recognition algorithms and the efficiency bottleneck of computing architectures, failing to meet the real‑time requirements in complex scenes. To address this issue, this study proposes a heterogeneous computing architecture based on Phytium processors and Cambricon accelerator cards, constructing a UAV tracking and gazing system with millisecond‑level response capability. At the hardware level, the system adopts a collaborative computing architecture of Phytium FT‑2000/4 processors and MLU220 accelerator cards, enhancing computing power through multi‑card parallelism. At the software level, it innovatively integrates a lightweight YOLOv5s detection network with a DeepSORT cascaded tracking algorithm, forming a closed‑loop control chain of "detection‑tracking‑feedback". Experimental results demonstrate that the system achieves a stable single‑frame comprehensive processing delay of 50‑100 ms in 1920×1080 resolution video stream processing, with a multi‑scale target recognition accuracy of over 98.5%, featuring both low latency and high precision. This study provides an innovative solution for UAV monitoring and the application of domestic chips.
Authors: Sherwan Jalal Abdullah, Sravan Reddy Chintareddy, Victor S. Frost, Shawn Keshmiri, Morteza Hashemi
Abstract: In this work, we develop a measurement platform to capture mobile network performance metrics including coverage and quality of service in regions where conventional coverage testing approaches are frequently time‑intensive, labor‑demanding, and occasionally hazardous. Traditionally, crowd‑sourcing methods are used to collect cellular network performance metrics. However, these approaches are inadequate in rural areas due to low population density and difficult terrain. The platform described here is UAV‑based and is designed to investigate mobile network performance through aerial operations and gather Radio Access Network (RAN) signal alongside end‑to‑end network performance metrics. Our platform gathers metrics through the integration of an onboard computation unit and a commercial off‑the‑shelf cellular modem. The gathered data are subsequently analyzed and displayed using geospatial mapping utilities and statistical techniques to deliver key observations on cellular network performance. Experimental results showed that the received signal power improves at higher altitudes due to enhanced line‑of‑sight (LoS) conditions, as expected. However, the signal quality degrades as a result of increased interference from neighboring cells. The analysis reveals that for most of the geographic area covered in the initial experiments the system maintained acceptable signal quality, with adequate throughput performance for both uplink and downlink communications, while maintaining satisfactory round‑trip time characteristics. Notably, the experiment showed that a strong radio signal metric for a given cell does not necessarily translate to consistent spatial coverage across the tested region.
Authors: Josafat-Mattias Burmeister, Andreas Tockner, Stefan Reder, Markus Engel, Rico Richter, Jan-Peter Mund, Jürgen Döllner
Abstract: Close‑range laser scanning provides detailed 3D captures of forest stands but requires efficient software for processing 3D point cloud data and extracting individual trees. Although recent studies have introduced deep learning methods for tree instance segmentation, these approaches require large annotated datasets and substantial computational resources. As a resource‑efficient alternative, we present a revised version of the treeX algorithm, an unsupervised method that combines clustering‑based stem detection with region growing for crown delineation. While the original treeX algorithm was developed for personal laser scanning (PLS) data, we provide two parameter presets, one for ground‑based laser scanning (stationary terrestrial laser scanning, TLS, and PLS), and one for UAV‑borne laser scanning (ULS). We evaluated the method on six public datasets (FOR‑instance, ForestSemantic, LAUTx, NIBIO MLS, TreeLearn, Wytham Woods) and compared it to six open‑source methods (original treeX, treeiso, RayCloudTools, ForAINet, SegmentAnyTree, TreeLearn). Compared to the original treeX algorithm, our revision reduces runtime and improves accuracy, with instance detection F1‑score gains of +0.11 to +0.49 for ground‑based data. For ULS data, our preset achieves an F1‑score of 0.58, whereas the original algorithm fails to segment any correct instances. For TLS and PLS data, our algorithm achieves accuracy similar to recent open‑source methods, including deep learning. Given its algorithmic design, we see two main applications for our method: (1) as a resource‑efficient alternative to deep learning approaches in scenarios where the data characteristics align with the method design (sufficient stem visibility and point density), and (2) for the semi‑automatic generation of labels for deep learning models. To enable broader adoption, we provide an open‑source Python implementation in the pointtree package.
Authors: Ignacio Rubio Scola, Omar Alejandro Garcia Alcantara, Steven Sandoval, Eduardo Steed Espinoza Quesada, Hernan Haimovich, Luis Rodolfo Garcia Carrillo
Abstract: This paper employs Geometric Algebra (GA) tools to model the dynamics of objects in 3‑dimensional space, serving as a proof of concept to facilitate control design for trajectory tracking in underactuated systems. For control purposes, the model is structured as a cascade system, where a rotational subsystem drives a translational one. The rotational subsystem is linear, while the translational subsystem follows a linear‑plus‑perturbation form, thereby reducing the complexity of control design. A control strategy requiring only simple operations, no memory, and no iterative search loops is presented to illustrate the main features of the GA model. By employing GA to model both translations and rotations, a singularity‑free and geometrically intuitive representation can be achieved through the use of the geometric product. Closed‑loop stability is rigorously established using input‑to‑state stability methods. Numerical simulations of a quad tilt‑rotorcraft performing trajectory tracking in a windy environment validate the controller's stability and performance.
Authors: Mazyar Taghavi, Rahman Farnoosh
Abstract: Protecting endangered wildlife from illegal poaching presents a critical challenge, particularly in vast and partially observable environments where real‑time response is essential. This paper introduces a novel Expectation‑Maximization (EM) based latent variable modeling approach in the context of Multi‑Agent Reinforcement Learning (MARL) for Unmanned Aerial Vehicle (UAV) coordination in wildlife protection. By modeling hidden environmental factors and inter‑agent dynamics through latent variables, our method enhances exploration and coordination under uncertainty. We implement and evaluate our EM‑MARL framework using a custom simulation involving 10 UAVs tasked with patrolling protected habitats of the endangered Iranian leopard. Extensive experimental results demonstrate superior performance in detection accuracy, adaptability, and policy convergence when compared to standard algorithms such as Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradient (DDPG). Our findings underscore the potential of combining EM inference with MARL to improve decentralized decision‑making in complex, high‑stakes conservation scenarios. The full implementation, simulation environment, and training scripts are publicly available on GitHub.
Authors: Keiwan Soltani, Vishesh Kumar Tanwar, Ashish Gupta, Sajal K. Das
Abstract: Smart farming systems encounter significant challenges, including limited resources, the need for data privacy, and poor connectivity in rural areas. To address these issues, we present eEnergy‑Split, an energy‑efficient framework that utilizes split learning (SL) to enable collaborative model training without direct data sharing or heavy computation on edge devices. By distributing the model between edge devices and a central server, eEnergy‑Split reduces on‑device energy usage by up to 86 percent compared to federated learning (FL) while safeguarding data privacy. Moreover, SL improves classification accuracy by up to 6.2 percent over FL on ResNet‑18 and by more modest amounts on GoogleNet and MobileNetV2. We propose an optimal edge deployment algorithm and a UAV trajectory planning strategy that solves the Traveling Salesman Problem (TSP) exactly to minimize flight cost and maximize the number of communication rounds. Comprehensive evaluations on agricultural pest datasets reveal that eEnergy‑Split lowers UAV energy consumption compared to baseline methods and boosts overall accuracy by up to 17 percent. Notably, the energy efficiency of SL is shown to be model‑dependent, yielding substantial savings in lightweight models like MobileNet, while communication and memory overheads may reduce efficiency gains in deeper networks. These results highlight the potential of combining SL with energy‑aware design to deliver a scalable, privacy‑preserving solution for resource‑constrained smart farming environments.
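As a rough illustration of what solving the TSP exactly means for a small set of edge devices, the sketch below enumerates every visiting order and keeps the shortest closed tour. It is not the authors' planner: the function name `exact_tsp_tour`, the Euclidean flight cost, and the coordinates are assumptions made for this example, and brute-force enumeration is only practical for a handful of stops.

```python
from itertools import permutations
from math import dist


def exact_tsp_tour(depot, stops):
    """Return (length, tour) of the shortest closed tour from depot through all stops.

    Exhaustive search: every permutation of the stops is evaluated, so the
    result is exact, but the cost grows factorially with the number of stops.
    """
    best_len, best_tour = float("inf"), None
    for order in permutations(stops):
        tour = (depot, *order, depot)
        length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


# Hypothetical edge-device coordinates in metres (corners of a 4 x 3 rectangle).
length, tour = exact_tsp_tour((0, 0), [(0, 3), (4, 3), (4, 0)])
print(length, tour)
```

For larger deployments an exact dynamic-programming or branch-and-bound solver would be the usual replacement for the enumeration above.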
Authors: Zehra Yigit, Sefa Kayraklik, Ertugrul Basar, Ali Gorcin
Abstract: The synergy between integrated sensing and communication (ISAC) and reconfigurable intelligent surfaces (RISs) unlocks novel applications and advanced services for next‑generation wireless networks, yet also introduces new security challenges. In this study, a novel dual target‑mounted RISs‑assisted ISAC scheme is proposed, where a base station with ISAC capability performs sensing of two unmanned aerial vehicle (UAV) targets, one of which is legitimate and the other an eavesdropper, while communicating with the users through an RIS mounted on the legitimate UAV target. The proposed scheme addresses dual security threats posed by a hostile UAV target: eavesdropping on legitimate user communications and random interference attacks launched by a malicious RIS mounted on this eavesdropper UAV target, aiming to disrupt secure transmissions. Moreover, malicious RIS interference is also optimized for a worst‑case scenario, in which both the channel state information (CSI) and the transmit beamforming of the base station are assumed to be fully compromised by a malicious RIS‑mounted eavesdropper UAV. A non‑convex optimization problem maximizing the secrecy rate of the users is formulated, and a semi‑definite relaxation (SDR)‑based two‑stage solution is developed to optimize the transmit beamforming matrix of the base station and the phase shift coefficients of the legitimate RIS. Extensive computer simulations are conducted to evaluate the robustness of the proposed solution under various system configurations. The proposed system's communication performance is assessed using the secrecy rate metric, while the sensing performance is evaluated through the signal‑to‑interference‑plus‑noise ratio and the Cramer‑Rao bound (CRB) for angle‑of‑departure (AoD) estimation of the eavesdropper UAV target.
Authors: Kuan-Cheng Chen, Samuel Yen-Chi Chen, Tai-Yue Li, Chen-Yu Liu, Kin K. Leung
Abstract: Intrusion detection in unmanned‑aerial‑vehicle (UAV) swarms is complicated by high mobility, non‑stationary traffic, and severe class imbalance. Leveraging a 120 k‑flow simulation corpus that covers five attack types, we benchmark three quantum‑machine‑learning (QML) approaches ‑ quantum kernels, variational quantum neural networks (QNNs), and hybrid quantum‑trained neural networks (QT‑NNs) ‑ against strong classical baselines. All models consume an 8‑feature flow representation and are evaluated under identical preprocessing, balancing, and noise‑model assumptions. We analyse the influence of encoding strategy, circuit depth, qubit count, and shot noise, reporting accuracy, macro‑F1, ROC‑AUC, Matthews correlation, and quantum‑resource footprints. Results reveal clear trade‑offs: quantum kernels and QT‑NNs excel in low‑data, nonlinear regimes, while deeper QNNs suffer from trainability issues, and CNNs dominate when abundant data offset their larger parameter count. The complete codebase and dataset partitions are publicly released to enable reproducible QML research in network security.
Authors: Li Weigang, Pedro Carvalho Brom, Lucas Ramson Siefert
Abstract: We propose a novel SuperBrain framework for collective intelligence, grounded in the co‑evolution of large language models (LLMs) and human users. Unlike static prompt engineering or isolated agent simulations, our approach emphasizes a dynamic pathway from Subclass Brain to Superclass Brain: (1) A Subclass Brain arises from persistent, personalized interaction between a user and an LLM, forming a cognitive dyad with adaptive learning memory. (2) Through GA‑assisted forward‑backward evolution, these dyads iteratively refine prompts and task performance. (3) Multiple Subclass Brains coordinate via Swarm Intelligence, optimizing across multi‑objective fitness landscapes and exchanging distilled heuristics. (4) Their standardized behaviors and cognitive signatures integrate into a Superclass Brain, an emergent meta‑intelligence capable of abstraction, generalization and self‑improvement. We outline the theoretical constructs, present initial implementations (e.g., UAV scheduling, KU/KI keyword filtering) and propose a registry for cross‑dyad knowledge consolidation. This work provides both a conceptual foundation and an architectural roadmap toward scalable, explainable and ethically aligned collective AI.
Authors: Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, Yung-Sze Gan, Frederic Barbaresco, Muhammad Shafique
Abstract: The growing demand for drone navigation in urban and restricted airspaces requires real‑time path planning that is both safe and scalable. Classical methods often struggle with the computational load of high‑dimensional optimization under dynamic constraints like obstacle avoidance and no‑fly zones. This work introduces QUAV, a quantum‑assisted UAV path planning framework based on the Quantum Approximate Optimization Algorithm (QAOA). To the best of our knowledge, this is one of the first applications of QAOA for drone trajectory optimization. QUAV models pathfinding as a quantum optimization problem, allowing efficient exploration of multiple paths while incorporating obstacle constraints and geospatial accuracy through UTM coordinate transformation. A theoretical analysis shows that QUAV achieves linear scaling in circuit depth relative to the number of edges, under fixed optimization settings. Extensive simulations and a real‑hardware implementation on IBM's ibm_kyiv backend validate its performance and robustness under noise. Despite hardware constraints, results demonstrate that QUAV generates feasible, efficient trajectories, highlighting the promise of quantum approaches for future drone navigation systems.
Authors: Changheng Wang, Zhiqing Wei, Wangjun Jiang, Haoyue Jiang, Zhiyong Feng
Abstract: The high mobility of unmanned aerial vehicles (UAVs) enables them to be used in various civilian fields, such as rescue and cargo transport. Path‑following is a crucial way to perform these tasks, while sensing and collision avoidance are essential for safe flight. In this paper, we investigate how to efficiently and accurately achieve the path‑following, obstacle sensing, and avoidance subtasks, as well as their conflict‑free fusion scheduling. Firstly, a high‑precision deep reinforcement learning (DRL)‑based UAV formation path‑following model is developed, and a reward function with adaptive weights is designed from the perspective of distance and velocity errors. Then, we use integrated sensing and communication (ISAC) signals to detect the obstacle and derive the Cramer‑Rao lower bound (CRLB) for obstacle sensing by information‑level fusion, based on which we propose the variable formation enhanced obstacle position estimation (VFEO) algorithm. In addition, an online obstacle avoidance scheme without pretraining is designed to address the sparse reward problem. Finally, with the aid of the null‑space‑based (NSB) behavioral method, we present a hierarchical subtask fusion strategy. Simulation results demonstrate the effectiveness and superiority of the subtask algorithms and the hierarchical fusion strategy.
Authors: Kalyani Panigrahi, Rohan Bhattacharya, Sabareesh G. R., Pardha S Gurugubelli
Abstract: Wind tunnels consisting of an array of computer‑controlled, programmable fans have evolved over the last three decades to simulate atmospheric turbulence for studying the aerodynamic loads over bluff bodies such as buildings and bridges. In addition to civil engineering structures, these wind tunnels can also have applications extending towards free‑flight aerodynamic tests on drones and unmanned aerial vehicles (UAVs). Achieving velocity profiles such as linearly sheared flows or boundary layer flows in a traditional wind tunnel would require long fetch lengths for the flow to evolve. However, in this work, with the aid of an array of computer‑controlled fans, a perfectly linearly sheared flow can be developed at a distance of 1 meter from the source. Unlike the previous fan‑array wind tunnels (FAWT) developed for atmospheric boundary layer flows, the fan‑array tunnel constructed in this work does not include a contraction section or a honeycomb, offering portability and compactness for performing unconfined flight tests. The current work involves the design and development of a 10 x 10 two‑dimensional array of fans facing a 2.4 m long, enclosed test section with a 1.44 m^2 cross‑sectional area. Key flow parameters, including turbulence intensity, mean velocity, and pressure, are comprehensively characterized in the streamwise, spanwise, and transverse directions of the test section of the developed FAWT. These measurements were conducted under three distinct flow conditions, namely uniform, linearly sheared, and parabolic. Additionally, this work presents the evolution of the turbulent length scales and time scales across various sections inside the tunnel for a range of wind speeds by employing autocorrelation techniques and power spectral analysis.
Authors: Alexander Gräfe, Joram Eickhoff, Marco Zimmerling, Sebastian Trimpe
Abstract: Swarms of unmanned aerial vehicles (UAVs) are increasingly becoming vital to our society, undertaking tasks such as search and rescue, surveillance and delivery. A special variant of Distributed Model Predictive Control (DMPC) has emerged as a promising approach for the safe management of these swarms by combining the scalability of distributed computation with dynamic swarm motion control. In this DMPC method, multiple agents solve local optimization problems with coupled anti‑collision constraints, periodically exchanging their solutions. Despite its potential, existing methodologies using this DMPC variant have yet to be deployed on distributed hardware that fully utilizes true distributed computation and wireless communication. This is primarily due to the lack of a communication system tailored to meet the unique requirements of mobile swarms and an architecture that supports distributed computation while adhering to the payload constraints of UAVs. We present DMPC‑SWARM, a new swarm control methodology that integrates an efficient, stateless low‑power wireless communication protocol with a novel DMPC algorithm that provably avoids UAV collisions even under message loss. By utilizing event‑triggered and distributed off‑board computing, DMPC‑SWARM supports nano UAVs, allowing them to benefit from additional computational resources while retaining scalability and fault tolerance. In a detailed theoretical analysis, we prove that DMPC‑SWARM guarantees collision avoidance under realistic conditions, including communication delays and message loss. Finally, we present DMPC‑SWARM's implementation on a swarm of up to 16 nano‑quadcopters, demonstrating the first realization of these DMPC variants with computation distributed on multiple physical devices interconnected by a real wireless mesh network. A video showcasing DMPC‑SWARM is available at http://tiny.cc/DMPCSwarm.
Authors: Zheng Li, Xueyi Zhang, Yanming Guo, Yuxiang Xie, Ding Zhaoyun, Siqi Cai, Haizhou Li, Mingrui Lao
Abstract: Cross‑view geo‑localization is a critical task for UAV navigation, event detection, and aerial surveying, as it enables matching between drone‑captured and satellite imagery. Most existing approaches embed multi‑modal data into a joint feature space to maximize the similarity of paired images. However, these methods typically assume perfect alignment of image pairs during training, which rarely holds true in real‑world scenarios. In practice, factors such as urban canyon effects, electromagnetic interference, and adverse weather frequently induce GPS drift, resulting in systematic alignment shifts where only partial correspondences exist between pairs. Despite its prevalence, this source of noisy correspondence has received limited attention in current research. In this paper, we formally introduce and address the Noisy Correspondence on Cross‑View Geo‑Localization (NC‑CVGL) problem, aiming to bridge the gap between idealized benchmarks and practical applications. To this end, we propose PAUL (Partition and Augmentation by Uncertainty Learning), a novel framework that partitions and augments training data based on estimated data uncertainty through uncertainty‑aware co‑augmentation and evidential co‑training. Specifically, PAUL selectively augments regions with high correspondence confidence and utilizes uncertainty estimation to refine feature learning, effectively suppressing noise from misaligned pairs. Distinct from traditional filtering or label correction, PAUL leverages both data uncertainty and loss discrepancy for targeted partitioning and augmentation, thus providing robust supervision for noisy samples. Comprehensive experiments validate the effectiveness of individual components in PAUL, which consistently achieves superior performance over other competitive noisy‑correspondence‑driven methods across various noise ratios.
Authors: Kaiqiang Lin, Mohamed-Slim Alouini
Abstract: Wireless underground sensor networks (WUSNs) offer significant social and economic benefits by enabling the monitoring of subterranean entities. However, the communication reliability of WUSNs diminishes in harsh environments where terrestrial network infrastructure is either unavailable or unreliable. To address this challenge, we explore the feasibility of integrating buried massive machine‑type communication (mMTC) sensors with non‑terrestrial networks (NTNs), including unmanned aerial vehicles (UAVs), high‑altitude platforms (HAPs), and low Earth orbit (LEO) satellites, to establish underground‑to‑NTN connectivity for various large‑scale underground monitoring applications. To assess the effectiveness of underground‑to‑NTN connectivity, we develop a Monte Carlo simulator that incorporates a multi‑layer underground attenuation model, the 3GPP empirical path loss model for various NTN platforms, and two LoRaWAN modulation schemes, i.e., LoRa and LoRa‑frequency hopping spread spectrum (LR‑FHSS). Our results indicate that LoRa SF7 is a strong candidate for short‑range UAV communication in rural environments, while LR‑FHSS modulation proves to be a promising option for HAP and LEO satellite platforms in massive WUSN scenarios thanks to its adequate link budget and robustness to interference. Finally, we demonstrate that the success probability of underground‑to‑NTN connectivity using LoRa and LR‑FHSS is significantly affected by factors such as the monitoring environment, the number of devices, burial depth, and the soil's volumetric water content.
Authors: Hichem Cheriet, Khellat Kihel Badra, Chouraqui Samira
Abstract: Efficient and safe navigation of Unmanned Aerial Vehicles (UAVs) is critical for various applications, including combat support, package delivery and Search and Rescue Operations. This paper introduces the Tangent Intersection Guidance (TIG) algorithm, an advanced approach for UAV path planning in both static and dynamic environments. The algorithm uses the elliptic tangent intersection method to generate feasible paths. It generates two sub‑paths for each threat, selects the optimal route based on a heuristic rule, and iteratively refines the path until the target is reached. Considering the UAV kinematic and dynamic constraints, a modified smoothing technique based on quadratic Bézier curves is adopted to generate a smooth and efficient route. Experimental results show that the TIG algorithm can generate the shortest path in less time, starting from 0.01 seconds, with fewer turning angles compared to A*, PRM, RRT, Tangent Graph, and Static APPATT algorithms in static environments. Furthermore, in completely unknown and partially known environments, TIG demonstrates efficient real‑time path planning capabilities for collision avoidance, outperforming APF and Dynamic APPATT algorithms.
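For readers unfamiliar with the smoothing primitive mentioned above, the following sketch samples a standard quadratic Bézier curve, B(t) = (1 - t)^2 P0 + 2(1 - t)t P1 + t^2 P2, which rounds a corner at P1 between two path segments. It shows the textbook curve only; the abstract's "modified" technique and its handling of kinematic constraints are not reproduced, and the function name and waypoints are assumptions for this example.

```python
def quadratic_bezier(p0, p1, p2, n=20):
    """Sample n + 1 points on the quadratic Bezier curve with control points p0, p1, p2."""
    points = []
    for i in range(n + 1):
        t = i / n
        # Bernstein weights of degree 2; they sum to 1 for every t.
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append(tuple(a * u + b * v + c * w for u, v, w in zip(p0, p1, p2)))
    return points


# Hypothetical waypoints: the sharp turn at (1, 1) is replaced by a smooth arc.
curve = quadratic_bezier((0.0, 0.0), (1.0, 1.0), (2.0, 0.0), n=4)
print(curve)
```

The curve starts at P0, ends at P2, and never passes through P1 unless the three points are collinear, which is why it shortens and smooths a turn.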
Authors: Matteo Contini, Victor Illien, Sylvain Poulain, Serge Bernard, Julien Barde, Sylvain Bonhommeau, Alexis Joly
Abstract: Obtaining pixel‑level annotations over large spatial extents remains a major bottleneck for deploying machine learning in ecological applications. Here we present a multi‑scale weakly supervised semantic segmentation (WSSS) framework that enables training high‑resolution segmentation models from dense, classification‑based outputs. Our method combines fine‑scale, multi‑label predictions from underwater imagery with broad‑coverage aerial data. We convert these point‑level classifications into coarse supervision masks that can be used to train a semantic segmentation model on Unmanned Aerial Vehicle (UAV) orthophotos. A second training step using the model's own refined predictions is then used to further improve spatial accuracy without requiring additional annotations. We demonstrate the approach on coral reef imagery, enabling large‑area segmentation of coral morphotypes and illustrating its flexibility in integrating new classes. The final model achieves 86.07% pixel accuracy and 52.23% mean Intersection over Union (mIoU) on manually annotated reef zones, demonstrating that accurate large‑scale coral segmentation can be obtained without pixel‑level annotations. By bridging image classification and segmentation across scales and modalities, this method provides an efficient solution for deploying segmentation models in settings where annotations are unavailable and opens opportunities for scalable, efficient monitoring in ecology and beyond.
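The two reported metrics can be computed as in the minimal sketch below, which operates on flattened lists of per-pixel class labels. The function name and toy labels are assumptions for illustration, and conventions differ between papers (for example, whether classes absent from both prediction and ground truth are skipped, as done here), so this is not necessarily the exact evaluation protocol behind the numbers above.

```python
def pixel_accuracy_and_miou(pred, truth, num_classes):
    """Return (pixel accuracy, mean IoU) for two equal-length label sequences."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and the same length")
    correct = sum(p == t for p, t in zip(pred, truth))
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union:  # skip classes that appear in neither prediction nor truth
            ious.append(inter / union)
    return correct / len(truth), sum(ious) / len(ious)


# Toy example with three classes and six pixels.
acc, miou = pixel_accuracy_and_miou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)
print(acc, miou)
```

Pixel accuracy rewards getting large classes right, whereas mIoU averages over classes, which is one reason the two figures can differ widely on imbalanced reef imagery.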
Authors: Afan Ali, Irfanullah Khan
Abstract: Non‑Terrestrial Networks (NTNs) based on Unmanned Aerial Vehicles (UAVs) as base stations are extremely susceptible to security attacks due to their distributed and dynamic nature, which makes them vulnerable to rogue nodes. In this paper, a new Dynamic Trust Score Adjustment Mechanism with Energy‑Aware Consensus (DTSAM‑EAC) is proposed to enhance security in UAV‑based NTNs. The proposed framework integrates a permissioned Hyperledger Fabric blockchain with Federated Learning (FL) to support privacy‑preserving trust evaluation. Trust ratings are updated continuously through weighted aggregation of past trust, present behavior, and energy contribution, thus making the system adaptive to changing network conditions. An energy‑aware consensus mechanism prioritizes UAVs with greater available energy for block validation, ensuring efficient use of resources under resource‑constrained environments. FL aggregation with trust‑weighting further increases the resilience of the global trust model. Simulation results verify the designed framework achieves 94% trust score prediction accuracy and 96% rogue UAV detection rate while outperforming centralized and static baselines of trust‑based solutions on privacy, energy efficiency, and reliability. It complies with 6G requirements in terms of distributed intelligence and sustainability and is an energy‑efficient and scalable solution to secure NTNs.
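A weighted aggregation of past trust, present behavior, and energy contribution could look like the sketch below. The weights, the [0, 1] scaling of each input, and the function name `update_trust` are assumptions chosen for illustration; the abstract does not state the actual coefficients or normalization used by DTSAM‑EAC.

```python
def update_trust(past_trust, behavior, energy, weights=(0.5, 0.3, 0.2)):
    """Blend three signals in [0, 1] into a new trust score in [0, 1].

    weights = (w_past, w_behavior, w_energy) are hypothetical and must sum to 1.
    """
    w_past, w_behavior, w_energy = weights
    if abs(w_past + w_behavior + w_energy - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    score = w_past * past_trust + w_behavior * behavior + w_energy * energy
    # Clamp to guard against inputs slightly outside [0, 1].
    return min(1.0, max(0.0, score))


# A UAV with good history, perfect current behavior, and half its energy budget left.
print(update_trust(past_trust=0.8, behavior=1.0, energy=0.5))
```

Calling the function once per round with the previous output as `past_trust` gives an exponentially smoothed score, so a single bad round lowers trust without erasing the history.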
Authors: Ziye Jia, Jia He, Lijun He, Min Sheng, Junyu Liu, Qihui Wu, Zhu Han
Abstract: Unmanned aerial vehicles (UAVs) can serve as aerial base stations (BSs) to extend the ubiquitous connectivity for ground users (GUs) in the sixth‑generation (6G) era. However, it is challenging to cooperatively deploy multiple UAV swarms in large‑scale remote areas. Hence, in this paper, we propose a hierarchical UAV swarms structure for 6G aerial access networks, where the head UAVs serve as aerial BSs, and tail UAVs (T‑UAVs) are responsible for relay. In detail, we jointly optimize the dynamic deployment and trajectory of UAV swarms, which is formulated as a multi‑objective optimization problem (MOP) to concurrently minimize the energy consumption of UAV swarms and GUs, as well as the delay of GUs. However, the proposed MOP is a mixed‑integer nonlinear programming problem and is NP‑hard to solve. Therefore, we develop a K‑means and Voronoi diagram based area division method, and construct Fermat points to establish connections between GUs and T‑UAVs. Then, an improved non‑dominated sorting whale optimization algorithm is proposed to seek Pareto optimal solutions for the transformed MOP. Finally, extensive simulations are conducted to verify the performance of the proposed algorithms by comparing them with baseline mechanisms, resulting in a 50% complexity reduction.
Authors: Filippos Fotiadis, Brian M. Sadler, Ufuk Topcu
Abstract: Efficient mobile jamming against eavesdroppers in wireless networks necessitates accurate coordination between mobility and antenna beamforming. We study the coordinated beamforming and control problem for a UAV that carries two omnidirectional antennas, and which uses them to jam an eavesdropper while leaving a friendly client unaffected. The UAV can shape its jamming beampattern by controlling its position, its antennas' orientation, and the relative phasing for each antenna. We derive a closed‑form expression for the antennas' phases that guarantees zero jamming impact on the client. In addition, we determine the antennas' orientation and the UAV's position that maximizes jamming impact on the eavesdropper through an optimal control problem, optimizing the orientation pointwise and the position through the UAV's control input. Simulations show how this coordinated beamforming and control scheme enables directional GPS denial while guaranteeing zero interference towards a friendly direction.
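The idea of choosing a relative phase that cancels the signal toward one direction can be illustrated with a simplified two-element array. The sketch assumes equal-amplitude omnidirectional antennas on the x-axis and a far-field client, so the combined field toward angle theta is proportional to 1 + exp(j(k d cos(theta) + phi)), which vanishes when k d cos(theta) + phi = pi. This is a textbook model, not the paper's closed-form expression, and the function names and numbers are assumptions for this example.

```python
import cmath
import math


def null_phase(separation, wavelength, client_angle):
    """Relative phase (rad) of antenna 2 that nulls the far field toward client_angle.

    Antennas lie on the x-axis `separation` metres apart; angles are in radians
    measured from the x-axis.
    """
    k = 2 * math.pi / wavelength
    return (math.pi - k * separation * math.cos(client_angle)) % (2 * math.pi)


def array_gain(separation, wavelength, phase, angle):
    """Magnitude of the summed unit-amplitude fields toward `angle` (between 0 and 2)."""
    k = 2 * math.pi / wavelength
    return abs(1 + cmath.exp(1j * (k * separation * math.cos(angle) + phase)))


# Illustrative values: 0.19 m is roughly the GPS L1 wavelength; client at 30 degrees.
d, lam, client = 0.10, 0.19, math.radians(30)
phi = null_phase(d, lam, client)
print(array_gain(d, lam, phi, client))             # essentially zero toward the client
print(array_gain(d, lam, phi, math.radians(120)))  # non-zero in another direction
```

With the phase fixed by the null condition, the remaining freedom (array orientation and UAV position) determines how much power reaches other directions, which is the part the abstract treats as an optimal control problem.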
Authors: Shayesta Naziri, Xu Wang, Guangsheng Yu, Christy Jie Liang, Wei Ni
Abstract: The increasing deployment of Unmanned Aerial Vehicles (UAVs) for military, commercial, and logistics applications has raised significant concerns regarding flight path privacy. Conventional UAV communication systems often expose flight path data to third parties, making them vulnerable to tracking, surveillance, and location inference attacks. Existing encryption techniques provide security but fail to ensure complete privacy, as adversaries can still infer movement patterns through metadata analysis. To address these challenges, we propose a zk‑SNARK (Zero‑Knowledge Succinct Non‑Interactive Argument of Knowledge)‑based privacy‑preserving flight path authentication and verification framework. Our approach ensures that a UAV can prove its authorisation, validate its flight path with a control centre, and comply with regulatory constraints without revealing any sensitive trajectory information. By leveraging zk‑SNARKs, the UAV can generate cryptographic proofs that verify compliance with predefined flight policies while keeping the exact path and location undisclosed. This method mitigates risks associated with real‑time tracking, identity exposure, and unauthorised interception, thereby enhancing UAV operational security in adversarial environments. Our proposed solution balances privacy, security, and computational efficiency, making it suitable for resource‑constrained UAVs in both civilian and military applications.
Authors: Han Zeng, Haibo Wang, Kan Wang, Xutao Yu, Zaichen Zhang
Abstract: The rise of sixth‑generation (6G) wireless networks sets high demands on UAV‑assisted Free Space Optical (FSO) communications, where the channel environment becomes more complex and variable due to both atmospheric turbulence and UAV‑induced vibrations. These factors increase the challenge of maintaining reliable communication and require adaptive processing methods. Autoencoders are promising as they learn optimal encodings from channel data. However, existing autoencoder designs are generic and lack the specific adaptability and computational flexibility needed for UAV‑FSO scenarios. To address this, we propose AEAT‑AE (Adaptive Environment‑aware Transformer Autoencoder), a Transformer‑based framework that integrates environmental parameters into both encoder and decoder via a cross‑attention mechanism. Moreover, AEAT‑AE incorporates a Deep Q‑Network (DQN) that dynamically selects which layers of the Transformer autoencoder to activate based on real‑time environmental inputs, effectively balancing performance and computational cost. Simulation results demonstrate that AEAT‑AE outperforms conventional methods in bit error rate while maintaining efficient runtime, representing a novel tailored solution for next‑generation UAV‑FSO communications.
Authors: Marco S. Tayar, Lucas K. de Oliveira, Felipe Andrade G. Tommaselli, Juliano D. Negri, Thiago H. Segreto, Ricardo V. Godoy, Marcelo Becker
Abstract: Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade‑off between on‑policy and off‑policy algorithms. Off‑policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real‑world fine‑tuning. In contrast, on‑policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard‑dense environments. This paper directly investigates this trade‑off by comparing a leading on‑policy algorithm, Proximal Policy Optimization (PPO), against an off‑policy counterpart, Soft Actor‑Critic (SAC), for precision flight in procedurally generated ducts within a high‑fidelity simulator. Our results show that PPO consistently learned a stable, collision‑free policy that completed the entire course. In contrast, SAC failed to find a complete solution, converging to a suboptimal policy that navigated only the initial segments before failure. This work provides evidence that for high‑precision, safety‑critical navigation tasks, the reliable convergence of a well‑established on‑policy method can be more decisive than the nominal sample efficiency of an off‑policy algorithm.
Authors: Yanbing Bai, Rui-Yang Ju, Lemeng Zhao, Junjie Hu, Jianchao Bi, Erick Mas, Shunichi Koshimura
Abstract: Unmanned Aerial Vehicles (UAVs) have become increasingly important in disaster emergency response by facilitating aerial video analysis. Due to the limited computational resources available on UAVs, large models cannot be run efficiently for on‑board analysis. To overcome this challenge, we propose a lightweight and efficient two‑stage framework for wildfire monitoring and fire source detection on UAV platforms. Specifically, in Stage 1, we utilize a policy network to identify and discard redundant video clips, thereby reducing computational costs. We also introduce a station point mechanism that incorporates future frame information within the sequential policy network to improve prediction accuracy. This mechanism allows Stage 1 to operate in a near‑real‑time manner. In Stage 2, for frames classified as containing fire, we apply an improved YOLOv8 model to accurately localize the fire source in real‑time on selected frames. We evaluate Stage 1 using the FLAME and HMDB51 datasets, and Stage 2 using the Fire & Smoke Detection Dataset. Experimental results show that our method significantly reduces computational costs while maintaining classification accuracy in Stage 1, and achieves high detection accuracy with real‑time inference in Stage 2.
Authors: Hichem Cheriet, Khellat Kihel Badra, Chouraqui Samira
Abstract: The most crucial challenges for UAVs are planning paths and avoiding obstacles in their way. In recent years, a wide variety of path‑planning algorithms have been developed. These algorithms have successfully solved path‑planning problems; however, they suffer from multiple challenges and limitations. To test the effectiveness and efficiency of three widely used algorithms, namely A*, RRT, and Particle Swarm Optimization (PSO), this paper conducts extensive experiments in 3D urban city environments cluttered with obstacles. Three experiments were designed with two scenarios each to test the aforementioned algorithms. These experiments consider different city map sizes, different altitudes, and varying obstacle densities and sizes in the environment. According to the experimental results, the A* algorithm outperforms the others in both computation efficiency and path quality. PSO is especially suitable for tight turns and dense environments, and RRT offers a balance and works well across all experiments due to its randomized approach to finding solutions.
Authors: Jiri Horyna, Roland Jung, Stephan Weiss, Eliseo Ferrante, Martin Saska
Abstract: In this paper, we present the Swarming Without an Anchor (SWA) approach to state estimation in swarms of Unmanned Aerial Vehicles (UAVs) experiencing ego‑localization dropout, where individual agents are laterally stabilized using relative information only. We propose to fuse decentralized state estimation with robust mutual perception and onboard sensor data to maintain accurate state awareness despite intermittent localization failures. Thus, the relative information used to estimate the lateral state of UAVs enables the identification of the unambiguous state of UAVs with respect to the local constellation. The resulting behavior reaches velocity consensus, as this task can be referred to as the double integrator synchronization problem. All disturbances and performance degradations, except a uniform translation drift of the swarm as a whole, are attenuated, which opens new opportunities for using tight cooperation to increase the reliability and resilience of multi‑UAV systems. Simulations and real‑world experiments validate the effectiveness of our approach, demonstrating its capability to sustain cohesive swarm behavior in challenging conditions of unreliable or unavailable primary localization.
Authors: Feibo Jiang, Li Dong, Xitao Pan, Kezhi Wang, Cunhua Pan
Abstract: This paper proposes a novel Agentic Retrieval‑augmented generation with Mamba‑Attention Integrated Transformer (ARMAIT) framework for multi‑Unmanned Aerial Vehicle (UAV) trajectory optimization. The framework is built upon Large Language Models (LLMs), incorporating Retrieval‑Augmented Generation (RAG) empowered by Agentic AI and integrated with a UAV‑specific knowledge base. Through the Agentic RAG, the LLM autonomously interprets high‑level task requirements and identifies the key components necessary for trajectory optimization, including model inputs and outputs, network architecture, reward functions, and task constraints. To support efficient modeling across different system scales, we introduce the Mamba‑Attention Integrated Transformer (MAIT), a hybrid neural architecture that combines the long‑range dependency modeling capability of attention mechanisms with the efficient temporal dynamic representation of Mamba. Furthermore, a Trajectory‑Group Relative Policy Optimization (T‑GRPO) method is proposed to achieve unified policy gradient optimization in both discrete and continuous trajectory spaces for MAIT training. Extensive experimental results validate the feasibility and effectiveness of the proposed ARMAIT framework.
Authors: Aykut Sirma, Angelos Plastropoulos, Gilbert Tang, Argyrios Zolotas
Abstract: Recent advancements in computer vision and deep learning have enhanced disaster‑response capabilities, particularly in the rapid assessment of earthquake‑affected urban environments. Timely identification of accessible entry points and structural obstacles is essential for effective search‑and‑rescue (SAR) operations. To address this need, we introduce DRespNeT, a high‑resolution dataset specifically developed for aerial instance segmentation of post‑earthquake structural environments. Unlike existing datasets, which rely heavily on satellite imagery or coarse semantic labeling, DRespNeT provides detailed polygon‑level instance segmentation annotations derived from high‑definition (1080p) aerial footage captured in disaster zones, including the 2023 Turkiye earthquake and other impacted regions. The dataset comprises 28 operationally critical classes, including structurally compromised buildings, access points such as doors, windows, and gaps, multiple debris levels, rescue personnel, vehicles, and civilian visibility. A distinctive feature of DRespNeT is its fine‑grained annotation detail, enabling differentiation between accessible and obstructed areas, thereby improving operational planning and response efficiency. Performance evaluations using YOLO‑based instance segmentation models, specifically YOLOv8‑seg, demonstrate significant gains in real‑time situational awareness and decision‑making. Our optimized YOLOv8‑DRN model achieves 92.7% mAP50 with an inference speed of 27 FPS on an RTX‑4090 GPU for multi‑target detection, meeting real‑time operational requirements. The dataset and models support SAR teams and robotic systems, providing a foundation for enhancing human‑robot collaboration, streamlining emergency response, and improving survivor outcomes.
Authors: Ousmane Youme, Jean Marie Dembélé, Eugene C. Ezin, Christophe Cambier
Abstract: Convolutional neural networks (CNNs) have been used efficiently in several fields, including environmental challenges. In fact, CNNs can help with the monitoring of marine litter, which has become a worldwide problem. UAV images have higher resolution and are more adaptable in local areas than satellite images, making it easier to find and count trash. Since the sand is heterogeneous, a basic CNN model encounters plenty of interference caused by reflections of sand color, human footsteps, shadows, algae, dunes, holes, and tire tracks. For these types of images, other CNN models, such as CNN‑based segmentation methods, may be more appropriate. In this paper, we use an instance‑based segmentation method and a panoptic segmentation method that show good accuracy with just a few samples. The model is more robust and less
Authors: Mauro Belgiovine, Chris Dick, Kaushik Chowdhury
Abstract: Airborne Base Stations (ABSs) allow for flexible geographical allocation of network resources with dynamically changing load as well as rapid deployment of alternate connectivity solutions during natural disasters. Since the radio infrastructure is carried by unmanned aerial vehicles (UAVs) with limited flight time, it is important to establish the best location for the ABS without exhaustive field trials. This paper proposes a digital twin (DT)‑guided approach to achieve this through the following key contributions: (i) Implementation of an interactive software bridge between two open‑source DTs such that the same scene is evaluated with high fidelity across NVIDIA's Sionna and Aerial Omniverse Digital Twin (AODT), highlighting the unique features of each of these platforms for this allocation problem, (ii) Design of a back‑propagation‑based algorithm in Sionna for rapidly converging on the physical location of the UAVs, orientation of the antennas and transmit power to ensure efficient coverage across the swarm of the UAVs, and (iii) numerical evaluation in AODT for large network scenarios (50 UEs, 10 ABS) that identifies the environmental conditions in which there is agreement or divergence of performance results between these twins. Finally, (iv) we propose a resilience mechanism to provide consistent coverage to mission‑critical devices and demonstrate a use case for bi‑directional flow of information between the two DTs.
Authors: Ruipu Wu, Yige Zhang, Jinyu Chen, Linjiang Huang, Shifeng Zhang, Xu Zhou, Liang Wang, Si Liu
Abstract: Aerial Vision‑and‑Language Navigation (VLN) is an emerging task that enables Unmanned Aerial Vehicles (UAVs) to navigate outdoor environments using natural language instructions and visual cues. However, due to the extended trajectories and complex maneuverability of UAVs, achieving reliable UAV‑VLN performance is challenging and often requires human intervention or overly detailed instructions. To harness the advantages of UAVs' high mobility, which could provide multi‑grained perspectives, while maintaining a manageable motion space for learning, we introduce a novel task called Dual‑Altitude UAV Collaborative VLN (DuAl‑VLN). In this task, two UAVs operate at distinct altitudes: a high‑altitude UAV responsible for broad environmental reasoning, and a low‑altitude UAV tasked with precise navigation. To support the training and evaluation of the DuAl‑VLN, we construct the HaL‑13k, a dataset comprising 13,838 collaborative high‑low UAV demonstration trajectories, each paired with target‑oriented language instructions. This dataset includes both unseen maps and an unseen object validation set to systematically evaluate the model's generalization capabilities across novel environments and unfamiliar targets. To consolidate their complementary strengths, we propose a dual‑UAV collaborative VLN framework, AeroDuo, where the high‑altitude UAV integrates a multimodal large language model (Pilot‑LLM) for target reasoning, while the low‑altitude UAV employs a lightweight multi‑stage policy for navigation and target grounding. The two UAVs work collaboratively and only exchange minimal coordinate information to ensure efficiency.
Authors: Yufei Ye, Shijian Gao, Xinhu Zheng, Liuqing Yang
Abstract: This paper presents a novel multi‑tier UAV‑assisted edge computing system designed for low‑altitude networks. The system comprises vehicle users, lightweight Low‑Tier UAVs (L‑UAVs), and a High‑Tier UAV (H‑UAV). L‑UAVs function as small‑scale edge servers positioned closer to vehicle users, while the H‑UAV, equipped with a more powerful server and a larger‑capacity battery, serves as a mobile backup server to address the limitations in endurance and computing resources of L‑UAVs. The primary objective is to minimize task execution delays while ensuring long‑term energy stability for L‑UAVs. To address this challenge, the problem is first decoupled into a series of deterministic problems for each time slot using Lyapunov optimization. The priorities of task delay and energy consumption for L‑UAVs are adaptively adjusted based on real‑time energy status. The optimization tasks include assignment of tasks, allocation of computing resources, and trajectory planning for both L‑UAVs and the H‑UAV. Simulation results demonstrate that the proposed approach achieves a reduction of at least 26% in transmission energy for L‑UAVs and exhibits superior energy stability compared to existing benchmarks.
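The per-slot decoupling via Lyapunov optimization described above is commonly realized with a drift-plus-penalty rule. The following is a minimal sketch of that generic rule, not the paper's formulation: the action tuples, the trade-off weight `v`, and the per-slot `energy_budget` are illustrative assumptions. A virtual queue tracks accumulated energy overuse, and its backlog raises the priority of energy over delay.

```python
def choose_action(actions, queue, v):
    """Pick the action minimising v * delay + queue * energy.

    actions: list of (name, delay, energy) tuples (hypothetical format).
    queue:   current virtual energy-deficit queue backlog.
    v:       weight on delay; larger v favours low delay.
    """
    return min(actions, key=lambda a: v * a[1] + queue * a[2])


def update_queue(queue, energy_used, energy_budget):
    """Virtual queue update: grows when a slot overspends its budget."""
    return max(queue + energy_used - energy_budget, 0.0)
```

With an empty queue the rule chooses purely on delay; as the backlog grows, energy-hungry actions become progressively less attractive, which mirrors the adaptive priority adjustment the abstract describes.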
Authors: Xinkai Liang, Yigu Ge, Yangxi Shi, Haoyu Yang, Xu Cao, Hao Fang
Abstract: To address the challenges of localization drift and perception‑planning coupling in unmanned aerial vehicles (UAVs) operating in open‑top scenarios (e.g., collapsed buildings, roofless mazes), this paper proposes EAROL, a novel framework with a downward‑mounted tilted LiDAR configuration (20° inclination), integrating a LiDAR‑Inertial Odometry (LIO) system and a hierarchical trajectory‑yaw optimization algorithm. The hardware innovation enables constraint enhancement via dense ground point cloud acquisition and forward environmental awareness for dynamic obstacle detection. A tightly‑coupled LIO system, empowered by an Iterative Error‑State Kalman Filter (IESKF) with dynamic motion compensation, achieves high 6‑DoF localization accuracy in feature‑sparse environments. The environment‑augmented planner balances environmental exploration, target tracking precision, and energy efficiency. Physical experiments demonstrate 81% tracking error reduction, 22% improvement in perceptual coverage, and near‑zero vertical drift across indoor maze and 60‑meter‑scale outdoor scenarios. This work proposes a hardware‑algorithm co‑design paradigm, offering a robust solution for UAV autonomy in post‑disaster search and rescue missions. We will release our software and hardware as an open‑source package for the community. Video: https://youtu.be/7av2ueLSiYw.
Authors: Nicole Fronda, Bardh Hoxha, Houssam Abbas
Abstract: We propose injecting notions of fairness into multi‑robot motion planning. When robots have competing interests, it is important to optimize for some kind of fairness in their usage of resources. In this work, we explore how the robots' energy expenditures might be fairly distributed among them, while maintaining mission success. We formulate a distributed fair motion planner and integrate it with safe controllers in an algorithm called FiReFly. For simulated reach‑avoid missions, FiReFly produces fairer trajectories and improves mission success rates over a non‑fair planner. We find that real‑time performance is achievable up to 15 UAVs, and that scaling up to 50 UAVs is possible with trade‑offs between runtime and fairness improvements.
Authors: Dian Ning, Dong Seog Han
Abstract: In one‑stage multi‑object detection tasks, various intersection over union (IoU)‑based solutions aim at smooth and stable convergence near the targets during training. However, IoU‑based losses fail to correctly update the gradient of small objects due to an extremely flat gradient. During the update of multiple objects, the learning of small objects' gradients suffers more because of insufficient gradient updates. Therefore, we propose an inter‑class relational (ICR) loss to efficiently update the gradient of small objects while not sacrificing the learning efficiency of other objects, based on the simple fact that an object has a spatial relationship to another object (e.g., a car plate is attached to a car in a similar position). When the predicted car plate's bounding box is not within its car, a loss penalty is added to guide the learning, which is inversely proportional to the overlapped area of the car's and predicted car plate's bounding boxes. By leveraging the spatial relationship at the inter‑class level, the loss guides small object predictions using larger objects and enhances latent information in deeper feature maps. In this paper, we present twofold contributions using license plate detection as a case study: (1) a new small vehicle multi‑license plate dataset (SVMLP), featuring diverse real‑world scenarios with high‑quality annotations; and (2) a novel inter‑class relational loss function designed to promote effective detection performance. We highlight that the proposed ICR loss penalty can be easily added to existing IoU‑based losses to enhance performance. These contributions improve the standard mean Average Precision (mAP) metric, achieving gains of 10.3% and 1.6% in $\text{mAP}^{\text{test}}_{50}$ for YOLOv12‑T and UAV‑DETR, respectively, without any additional hyperparameter tuning. Code and dataset will be available soon.
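To make the idea of a penalty that shrinks as the plate box overlaps its car box more concrete, here is a minimal sketch. The abstract does not give the formula, so the specific form used here (one minus the fraction of the plate box lying inside the car box) is an assumption chosen only because it decreases with the overlapped area; the function names and the `(x1, y1, x2, y2)` box format are also illustrative.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def intersection_area(a, b):
    """Overlapped area of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def relational_penalty(car_box, plate_box):
    """Illustrative penalty: 0 when the plate lies fully inside the car,
    1 when the two boxes do not overlap at all (assumed form)."""
    area = box_area(plate_box)
    if area == 0.0:
        return 1.0  # degenerate prediction, treat as fully outside
    return 1.0 - intersection_area(car_box, plate_box) / area
```

In training, a term like this would be added to an existing IoU-based loss for the small-object class, so a plate predicted outside its car receives an extra gradient signal even when its own IoU gradient is flat.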
Authors: Zhanxi Xie, Baili Lu, Yanzhao Gu, Zikun Li, Junhao Wei, Ngai Cheong
Abstract: This study investigates the application of unmanned aerial vehicles (UAVs) in public management, focusing on optimizing path planning to address challenges such as energy consumption, obstacle avoidance, and airspace constraints. As UAVs transition from 'technical tools' to 'governance infrastructure', driven by advancements in low‑altitude economy policies and smart city demands, efficient path planning becomes critical. The research proposes an enhanced Rapidly‑exploring Random Tree algorithm (dRRT), incorporating four strategies: Target Bias (to accelerate convergence), Dynamic Step Size (to balance exploration and obstacle navigation), Detour Priority (to prioritize horizontal detours over vertical ascents), and B‑spline smoothing (to enhance path smoothness). Simulations in a 500 m³ urban environment with randomized buildings demonstrate dRRT's superiority over traditional RRT, A*, and Ant Colony Optimization (ACO). Results show dRRT achieves a 100% success rate with an average runtime of 0.01468 s, shorter path lengths, fewer waypoints, and smoother trajectories (maximum yaw angles <45°). Despite improvements, limitations include increased computational overhead from added mechanisms and potential local optima due to goal biasing. The study highlights dRRT's potential for efficient UAV deployment in public management scenarios like emergency response and traffic monitoring, while underscoring the need for integration with real‑time obstacle avoidance frameworks. This work contributes to interdisciplinary advancements in urban governance, robotics, and computational optimization.
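Two of the four strategies named above, Target Bias and Dynamic Step Size, can be sketched on a plain 2-D RRT. This is an illustrative reconstruction, not the dRRT implementation: circular obstacles, the clearance-based step rule, and every parameter (`goal_bias`, `ref`, `min_frac`, `tol`) are assumptions, only nodes (not edges) are collision-checked, and Detour Priority and B-spline smoothing are omitted.

```python
import math
import random


def sample(rng, goal, bounds, goal_bias):
    """Target bias: return the goal with probability goal_bias."""
    if rng.random() < goal_bias:
        return goal
    (xmin, xmax), (ymin, ymax) = bounds
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))


def clearance(p, obstacles):
    """Distance from p to the nearest circular obstacle surface."""
    return min((math.dist(p, c) - r for c, r in obstacles), default=math.inf)


def dynamic_step(p, obstacles, base_step, ref=5.0, min_frac=0.2):
    """Shrink the step near obstacles, use the full step in open space."""
    frac = max(min_frac, min(1.0, clearance(p, obstacles) / ref))
    return base_step * frac


def steer(src, dst, step):
    """Move from src toward dst by at most step."""
    d = math.dist(src, dst)
    if d <= step:
        return dst
    t = step / d
    return (src[0] + t * (dst[0] - src[0]), src[1] + t * (dst[1] - src[1]))


def rrt(start, goal, bounds, obstacles, rng,
        goal_bias=0.2, base_step=1.0, tol=0.5, max_iter=5000):
    """Return a list of waypoints from start to near goal, or None."""
    parent = {start: None}
    for _ in range(max_iter):
        target = sample(rng, goal, bounds, goal_bias)
        near = min(parent, key=lambda n: math.dist(n, target))
        new = steer(near, target, dynamic_step(near, obstacles, base_step))
        if new in parent or clearance(new, obstacles) <= 0.0:
            continue  # duplicate node or node inside an obstacle
        parent[new] = near
        if math.dist(new, goal) <= tol:
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
    return None
```

Passing a seeded `random.Random` instance keeps runs reproducible. A high `goal_bias` speeds convergence in open space but, as the abstract notes for goal biasing in general, can trap the tree behind obstacles.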
Authors: Wenguang Tao, Xiaotian Wang, Tian Yan, Jie Yan, Guodong Li, Kun Bai
Abstract: As a key research direction in the field of multi‑object tracking (MOT), UAV‑based multi‑object tracking has significant application value in the analysis and understanding of urban intelligent transportation systems. However, in complex UAV perspectives, challenges such as small target scale variations, occlusions, nonlinear crossing motions, and motion blur severely hinder the stability of multi‑object tracking. To address these challenges, this paper proposes a novel multi‑object tracking framework, SocialTrack, aimed at enhancing the tracking accuracy and robustness of small targets in complex urban traffic environments. The specialized small‑target detector enhances the detection performance by employing a multi‑scale feature enhancement mechanism. The Velocity Adaptive Cubature Kalman Filter (VACKF) improves the accuracy of trajectory prediction by incorporating a velocity dynamic modeling mechanism. The Group Motion Compensation Strategy (GMCS) models social group motion priors to provide stable state update references for low‑quality tracks, significantly improving the target association accuracy in complex dynamic environments. Furthermore, the Spatio‑Temporal Memory Prediction (STMP) leverages historical trajectory information to predict the future state of low‑quality tracks, effectively mitigating identity switching issues. Extensive experiments on the UAVDT and MOT17 datasets demonstrate that SocialTrack outperforms existing state‑of‑the‑art (SOTA) methods across several key metrics. Significant improvements in MOTA and IDF1, among other core performance indicators, highlight its superior robustness and adaptability. Additionally, SocialTrack is highly modular and compatible, allowing for seamless integration with existing trackers to further enhance performance.
Authors: Zhongyao Li, Peirui Cheng, Liangjin Zhao, Chen Chen, Yundu Li, Zhechao Wang, Xue Yang, Xian Sun, Zhirui Wang
Abstract: Multi‑UAV collaborative 3D detection enables accurate and robust perception by fusing multi‑view observations from aerial platforms, offering significant advantages in coverage and occlusion handling, while posing new challenges for computation on resource‑constrained UAV platforms. In this paper, we present AdaBEV, a novel framework that learns adaptive instance‑aware BEV representations through a refine‑and‑contrast paradigm. Unlike existing methods that treat all BEV grids equally, AdaBEV introduces a Box‑Guided Refinement Module (BG‑RM) and an Instance‑Background Contrastive Learning (IBCL) to enhance semantic awareness and feature discriminability. BG‑RM refines only BEV grids associated with foreground instances using 2D supervision and spatial subdivision, while IBCL promotes stronger separation between foreground and background features via contrastive learning in BEV space. Extensive experiments on the Air‑Co‑Pred dataset demonstrate that AdaBEV achieves superior accuracy‑computation trade‑offs across model scales, outperforming other state‑of‑the‑art methods at low resolutions and approaching upper bound performance while maintaining low‑resolution BEV inputs and negligible overhead.
Authors: Chunliang Hua, Xiao Hu, Jiayang Sun, Zeyuan Yang
Abstract: As urban aerial mobility (UAM) infrastructure development accelerates globally, cities like Shenzhen are planning large‑scale vertiport networks (e.g., 1,200+ facilities by 2026). Existing planning frameworks remain inadequate for this complexity due to historical limitations in data granularity and real‑world applicability. This paper addresses these gaps by first proposing the Capacitated Dynamic Maximum Covering Location Problem (CDMCLP), a novel optimization framework that simultaneously models urban‑scale spatial‑temporal demand, heterogeneous user behaviors, and infrastructure capacity constraints. Building on this foundation, we introduce an Integrated Planning Recommendation System that combines CDMCLP with socio‑economic factors and dynamic clustering initialization. This system leverages adaptive parameter tuning based on empirical user behavior to generate practical planning solutions. Validation in a Chinese center city demonstrates the effectiveness of the new optimization framework and recommendation system. Under the evaluation and optimization of CDMCLP, the quantitative performance of traditional location methods is exposed and can be improved by 38% to 52%, while the recommendation system shows user‑friendliness and the effective integration of complex elements. By integrating mathematical rigor with practical implementation considerations, this hybrid approach bridges the gap between theoretical location modeling and real‑world UAM infrastructure planning, offering municipalities a pragmatic tool for vertiport network design.
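For readers unfamiliar with the underlying problem class, the classic (static, uncapacitated) Maximum Covering Location Problem can be approximated with a simple greedy heuristic. The sketch below shows only that baseline; it does not model the capacity, temporal demand, or user-behavior components that CDMCLP adds, and the data layout (`demand` and `coverage` dictionaries) is an assumption made for illustration.

```python
def greedy_max_cover(demand, coverage, p):
    """Greedy heuristic for the basic maximum covering location problem.

    demand:   {demand_point: weight}
    coverage: {candidate_site: set of demand points it can serve}
    p:        maximum number of sites to open
    Returns (chosen_sites, total_covered_demand).
    """
    chosen, covered = [], set()
    for _ in range(p):
        best, best_gain = None, 0.0
        for site, points in coverage.items():
            if site in chosen:
                continue
            # Marginal gain: demand newly covered by opening this site.
            gain = sum(demand[x] for x in points - covered)
            if gain > best_gain:
                best, best_gain = site, gain
        if best is None:
            break  # no remaining site adds coverage
        chosen.append(best)
        covered |= coverage[best]
    return chosen, sum(demand[x] for x in covered)
```

The greedy choice is a heuristic and is not guaranteed to be optimal; exact formulations of covering location problems are typically solved as integer programs.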
Authors: Haolin Zheng, Ning Gao, Donghong Cai, Shi Jin, Michail Matthaiou
Abstract: Unmanned aerial vehicle (UAV) individual (ID) identification is a critical security surveillance strategy in low‑altitude integrated sensing and communication (ISAC) networks. In this paper, we propose a novel dynamic knowledge distillation (KD)‑enabled wireless radio frequency fingerprint large language model (RFF‑LLM) framework for UAV ID identification. First, we propose an RFF‑LLM framework based on the modified GPT‑2 model to improve the identification accuracy in complex outdoor environments. Then, considering the parameter overhead of the RFF‑LLM, we design a dynamic KD strategy to compress the model. Specifically, the proximal policy optimization (PPO) algorithm is employed to dynamically adjust the distillation temperature, overcoming the local optimum dilemma inherent in static KD. As a next step, the knowledge of the RFF‑LLM is adequately transferred to the lightweight Lite‑HRNet model. Finally, our experiments are conducted based on the self‑built drone RFF dataset of Release one, namely DRFF‑R1, by collecting the I/Q signals of 20 commercial UAVs in channel 149. The experiment results show that the proposed framework achieves 98.38% ID identification accuracy with merely 0.15 million parameters and 2.74 ms response time, which outperforms the benchmarks.
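The role of the distillation temperature mentioned above can be seen in the standard temperature-scaled distillation loss. The sketch below implements that common formulation (KL divergence between softened teacher and student distributions, scaled by the squared temperature); whether the paper uses exactly this loss is an assumption, and the PPO controller that adjusts the temperature is not modeled, so `temperature` is simply an argument here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature flattens the output."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature):
    """T^2 * KL(teacher_T || student_T) on softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl
```

A static schedule fixes `temperature` for the whole run; a dynamic strategy like the one the abstract describes would instead pass a new value at each training step.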
Authors: Fei Lin, Tengchao Zhang, Qinghua Ni, Jun Huang, Siji Ma, Yonglin Tian, Yisheng Lv, Naiqi Wu
Abstract: The rapid adoption of Large Language Models (LLMs) in unmanned systems has significantly enhanced the semantic understanding and autonomous task execution capabilities of Unmanned Aerial Vehicle (UAV) swarms. However, limited communication bandwidth and the need for high‑frequency interactions pose severe challenges to semantic information transmission within the swarm. This paper explores the feasibility of LLM‑driven UAV swarms for autonomous semantic compression communication, aiming to reduce communication load while preserving critical task semantics. To this end, we construct four types of 2D simulation scenarios with different levels of environmental complexity and design a communication‑execution pipeline that integrates system prompts with task instruction prompts. On this basis, we systematically evaluate the semantic compression performance of nine mainstream LLMs in different scenarios and analyze their adaptability and stability through ablation studies on environmental complexity and swarm size. Experimental results demonstrate that LLM‑based UAV swarms have the potential to achieve efficient collaborative communication under bandwidth‑constrained and multi‑hop link conditions.
Authors: Sangwoo Jeon, Juchul Shin, YeonJe Cho, Gyeong-Tae Kim, Seongwoo Kim
Abstract: Modern autonomous drone missions increasingly require software frameworks capable of seamlessly integrating structured symbolic planning with adaptive reinforcement learning (RL). Although traditional rule‑based architectures offer robust structured reasoning for drone autonomy, their capabilities fall short in dynamically complex operational environments that require adaptive symbolic planning. Symbolic RL (SRL), using the Planning Domain Definition Language (PDDL), explicitly integrates domain‑specific knowledge and operational constraints, significantly improving the reliability and safety of unmanned aerial vehicle (UAV) decision making. In this study, we propose the AMAD‑SRL framework, an extended and refined version of the Autonomous Mission Agents for Drones (AMAD) cognitive multi‑agent architecture, enhanced with symbolic reinforcement learning for dynamic mission planning and execution. We validated our framework in a Software‑in‑the‑Loop (SIL) environment structured identically to an intended Hardware‑In‑the‑Loop Simulation (HILS) platform, ensuring seamless transition to real hardware. Experimental results demonstrate stable integration and interoperability of modules, successful transitions between BDI‑driven and symbolic RL‑driven planning phases, and consistent mission performance. Specifically, we evaluate a target acquisition scenario in which the UAV plans a surveillance path followed by a dynamic reentry path to secure the target while avoiding threat zones. In this SIL evaluation, mission efficiency improved by approximately 75% over a coverage‑based baseline, measured by travel distance reduction. This study establishes a robust foundation for handling complex UAV missions and discusses directions for further enhancement and validation.
Authors: Hamza Kheddar, Yassine Habchi, Mohamed Chahine Ghanem, Mustapha Hemis, Dusit Niyato
Abstract: The rapid advancement of Transformer‑based models has reshaped the landscape of uncrewed aerial vehicle (UAV) systems by enhancing perception, decision‑making, and autonomy. This review paper systematically categorizes and evaluates recent developments in Transformer architectures applied to UAVs, including attention mechanisms, CNN‑Transformer hybrids, reinforcement learning Transformers, and large language models (LLMs). Unlike previous surveys, this work presents a unified taxonomy of Transformer‑based UAV models, highlights emerging applications such as precision agriculture and autonomous navigation, and provides comparative analyses through structured tables and performance benchmarks. The paper also reviews key datasets, simulators, and evaluation metrics used in the field. Furthermore, it identifies existing gaps in the literature, outlines critical challenges in computational efficiency and real‑time deployment, and offers future research directions. This comprehensive synthesis aims to guide researchers and practitioners in understanding and advancing Transformer‑driven UAV technologies.
Authors: Jingpu Yang, Mingxuan Cui, Hang Zhang, Fengxian Ji, Zhengzhao Lai, Yufeng Wang
Abstract: Unmanned Aerial Vehicle communications are encountering increasingly severe multi‑source interference challenges in dynamic adversarial environments, which impose higher demands on their reliability and resilience. To address these challenges, agent‑based autonomous anti‑jamming techniques have emerged as a crucial research direction. This paper presents a comprehensive survey that first formalizes the concept of intelligent anti‑jamming agents for UAV communications and establishes a closed‑loop decision‑making framework centered on the "Perception‑Decision‑Action" (P‑D‑A) paradigm. Within this framework, we systematically review key technologies at each stage, with particular emphasis on employing game theory to model UAV‑jammer interactions and integrating reinforcement learning‑based intelligent algorithms to derive adaptive anti‑jamming strategies. Furthermore, we discuss potential limitations of current approaches, identify critical engineering challenges, and outline promising future research directions, aiming to provide valuable references for developing more intelligent and robust anti‑jamming communication systems for UAVs.
Authors: Martin Jiroušek, Tomáš Báča, Martin Saska
Abstract: This paper addresses the problem of tracking the position of a cable‑suspended payload carried by an unmanned aerial vehicle, with a focus on real‑world deployment and minimal hardware requirements. In contrast to many existing approaches that rely on motion‑capture systems, additional onboard cameras, or instrumented payloads, we propose a framework that uses only standard onboard sensors (specifically, real‑time kinematic global navigation satellite system measurements and data from the onboard inertial measurement unit) to estimate and control the payload's position. The system models the full coupled dynamics of the aerial vehicle and payload, and integrates a linear Kalman filter for state estimation, a model predictive contouring control planner, and an incremental model predictive controller. The control architecture is designed to remain effective despite sensing limitations and estimation uncertainty. Extensive simulations demonstrate that the proposed system achieves performance comparable to control based on ground‑truth measurements, with only minor degradation (< 6%). The system also shows strong robustness to variations in payload parameters. Field experiments further validate the framework, confirming its practical applicability and reliable performance in outdoor environments using only off‑the‑shelf aerial vehicle hardware.
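The linear Kalman filter named above follows the usual predict/update cycle. The sketch below shows that cycle on a deliberately simple stand-in model: a 1-D constant-velocity state observed through position only. The coupled vehicle-payload dynamics of the paper are not reproduced, and the noise levels `q` and `r` and the initial covariance are illustrative assumptions.

```python
def kf_step(x, P, z, dt, q, r):
    """One predict/update cycle for x = [position, velocity].

    z is a position measurement; q is process noise variance added to
    both states; r is measurement noise variance.
    """
    # Predict with F = [[1, dt], [0, 1]] and Q = q * I.
    p, v = x
    (a, b), (c, d) = P
    x_pred = [p + dt * v, v]
    P_pred = [
        [a + dt * (b + c) + dt * dt * d + q, b + dt * d],
        [c + dt * d, d + q],
    ]
    # Update with H = [1, 0].
    innovation = z - x_pred[0]
    s = P_pred[0][0] + r
    k0, k1 = P_pred[0][0] / s, P_pred[1][0] / s
    x_new = [x_pred[0] + k0 * innovation, x_pred[1] + k1 * innovation]
    P_new = [
        [(1 - k0) * P_pred[0][0], (1 - k0) * P_pred[0][1]],
        [P_pred[1][0] - k1 * P_pred[0][0], P_pred[1][1] - k1 * P_pred[0][1]],
    ]
    return x_new, P_new


def filter_track(measurements, dt, q=1e-4, r=0.01):
    """Run the filter over a measurement sequence from an uninformed prior."""
    x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
    for z in measurements:
        x, P = kf_step(x, P, z, dt, q, r)
    return x, P
```

Even though only position is measured, the filter recovers velocity through the cross-covariance terms, which is the same mechanism that lets an estimator infer unmeasured states from limited sensors.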
Authors: Jiajin Guan, Haibo Mei, Bonan Zhang, Dan Liu, Yuanshuang Fu, Yue Zhang
Abstract: Recent advances in vision‑language models (VLMs) have demonstrated strong generalization in natural image tasks. However, their performance often degrades on unmanned aerial vehicle (UAV)‑based aerial imagery, which features high resolution, complex spatial semantics, and strict real‑time constraints. These challenges limit the applicability of general‑purpose VLMs to structured aerial reasoning tasks. To address these challenges, we propose UAV‑VL‑R1, a lightweight VLM explicitly designed for aerial visual reasoning. It is trained using a hybrid method that combines supervised fine‑tuning (SFT) and multi‑stage reinforcement learning (RL). We leverage the group relative policy optimization (GRPO) algorithm to promote structured and interpretable reasoning through rule‑guided rewards and intra‑group policy alignment. To support model training and evaluation, we introduce a high‑resolution visual question answering dataset named HRVQA‑VL, which consists of 50,019 annotated samples covering eight UAV‑relevant reasoning tasks, including object counting, transportation recognition, and spatial scene inference. Experimental results show that UAV‑VL‑R1 achieves a 48.17% higher zero‑shot accuracy than the Qwen2‑VL‑2B‑Instruct baseline and even outperforms its 72B‑scale variant, which is 36x larger, on multiple tasks. Ablation studies reveal that while SFT improves semantic alignment, it may reduce reasoning diversity in mathematical tasks. GRPO‑based RL compensates for this limitation by enhancing logical flexibility and the robustness of inference. Additionally, UAV‑VL‑R1 requires only 3.9GB of memory under FP16 inference and can be quantized to 2.5GB with INT8, supporting real‑time deployment on resource‑constrained UAV platforms.
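The "group relative" part of GRPO refers to scoring each sampled response against the other responses drawn for the same prompt rather than against a learned value function. A minimal sketch of that normalization step follows; the use of the population standard deviation and the `eps` constant are assumptions (implementations differ), and the rule-guided rewards themselves are taken as given inputs.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one group of sampled responses.

    rewards: scalar rewards for responses to the same prompt.
    Returns one advantage per response: (r - mean) / (std + eps).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

When every response in a group earns the same reward, all advantages are zero and that group contributes no policy gradient, which is why reward designs that separate good from bad responses matter for this method.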
Authors: Metin Ozturk, Maryam Salamatmoghadasi, Halim Yanikomeroglu
Abstract: Sustainability is paramount in modern cellular networks, which face significant energy consumption challenges from rising mobile traffic and advancements in wireless technology. Cell‑switching, well‑established in literature as an effective solution, encounters limitations such as inadequate capacity and limited coverage when implemented through terrestrial networks (TN). This study enhances cell‑switching by integrating non‑terrestrial networks (NTN), including satellites (used for cell‑switching for the first time), high altitude platform stations (HAPS), and uncrewed aerial vehicles (UAVs) into TN. This integration significantly boosts energy savings by expanding capacity, enhancing coverage, and increasing operational flexibility. We introduce a multi‑tier cell‑switching approach that dynamically offloads users across network layers to manage energy effectively and minimize delays, accommodating diverse user demands with a context aware strategy. Additionally, we explore the role of artificial intelligence (AI), particularly generative AI, in optimizing network efficiency through data compression, handover optimization between different network layers, and enhancing device compatibility, further improving the adaptability and energy efficiency of cell‑switching operations. A case study confirms substantial improvements in network power consumption and user satisfaction, demonstrating the potential of our approach for future networks.
Authors: Kan Yu, Kaixuan Li, Xiaowu Liu, Qixun Zhang, Zhiyong Feng
Abstract: In complex urban environments, dynamic obstacles and multipath effects lead to significant link attenuation and pervasive coverage blind spots. Conventional approaches based on large‑scale fixed antenna arrays and UAV trajectory optimization struggle to balance energy efficiency, real‑time adaptation, and spatial flexibility. The movable antenna (MA) technology has emerged as a promising solution, offering enhanced spatial flexibility and reduced energy consumption to overcome the bottlenecks of urban low‑altitude communications. However, MA deployment faces a critical velocity mismatch between UAV mobility and mechanical repositioning latency, undermining real‑time link optimization and security assurance. To overcome this, we propose a predictive MA‑UAV collaborative control framework. First, optimal antenna positions are derived via secrecy rate maximization. Second, a Transformer‑enhanced long short‑term memory (LSTM) network predicts future MA positions by capturing spatio‑temporal correlations in antenna trajectories. Extensive simulations demonstrate superior prediction accuracy (NMSE reduction exceeds 49%) and communication reliability versus current popular benchmarks.
Authors: Zihan Wang, Nina Mahmoudian
Abstract: Vision‑driven autonomous river following by Unmanned Aerial Vehicles is critical for applications such as rescue, surveillance, and environmental monitoring, particularly in dense riverine environments where GPS signals are unreliable. These safety‑critical navigation tasks must satisfy hard safety constraints while optimizing performance. Moreover, the reward in river following is inherently history‑dependent (non‑Markovian), because it depends on which river segments have already been visited, making it challenging for standard safe Reinforcement Learning (SafeRL). To address these gaps, we propose three contributions. First, we introduce Marginal Gain Advantage Estimation (MGAE), which refines the reward advantage function by using a sliding window baseline computed from historical episodic returns, aligning the advantage estimate with non‑Markovian dynamics. Second, we develop a Semantic Dynamics Model (SDM) based on patchified water semantic masks, offering more interpretable and data‑efficient short‑term prediction of future observations compared to latent vision dynamics models. Third, we present the Constrained Actor Dynamics Estimator (CADE) architecture, which integrates the actor, cost estimator, and SDM for cost advantage estimation to form a model‑based SafeRL framework. Simulation results demonstrate that MGAE achieves faster convergence and superior performance over traditional critic‑based methods like Generalized Advantage Estimation. SDM provides more accurate short‑term state predictions that enable the cost estimator to better predict potential violations. Overall, CADE effectively integrates safety regulation into model‑based RL, with the Lagrangian approach providing a "soft" balance between reward and safety during training, while the safety layer enhances inference by imposing a "hard" action overlay.
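The sliding window baseline described for Marginal Gain Advantage Estimation can be pictured as follows. The abstract gives no formula, so this sketch assumes the simplest reading: the baseline is the mean of the most recent episodic returns, and the advantage of a new episode is its return minus that baseline. The class name, the window length, and the zero baseline for the first episode are all hypothetical.

```python
from collections import deque


class SlidingWindowBaseline:
    """Critic-free baseline built from recent episodic returns (assumed form)."""

    def __init__(self, window):
        self.returns = deque(maxlen=window)  # oldest returns drop out

    def advantage(self, episodic_return):
        """Return the episode's gain over the recent average, then record it."""
        if self.returns:
            baseline = sum(self.returns) / len(self.returns)
        else:
            baseline = 0.0  # no history yet (assumption)
        self.returns.append(episodic_return)
        return episodic_return - baseline
```

Because the baseline is computed from whole-episode returns rather than per-state value estimates, it does not require the reward to be a function of the current state alone, which is the difficulty the abstract raises for history-dependent rewards.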
Authors: Zhehan Zhou, Xiaoming Chen, Ming Ying, Zhaohui Yang, Chongwen Huang, Yunlong Cai, Zhaoyang Zhang
Abstract: With the explosive growth of maritime activities, seamless communications with quality of service (QoS) guarantees are expected over broad sea areas. In this context, this paper proposes a space‑air‑ground‑sea integrated maritime communication architecture combining satellite, unmanned aerial vehicle (UAV), terrestrial base station (TBS) and unmanned surface vessel (USV). First, according to the distance from the shore, the whole marine space is divided into coastal, offshore, middle‑sea and open‑sea areas, in which the maritime users are served by TBS, USV, UAV and satellite, respectively. Then, by exploiting the potential of the integrated maritime communication system, a joint beamforming and trajectory optimization algorithm is designed to maximize the minimum transmission rate of maritime users. Finally, theoretical analysis and simulation results validate the effectiveness of the proposed algorithm.
Authors: Changyuan Zhao, Guangyuan Liu, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Jiawen Kang, Dusit Niyato, Zan Li, Xuemin Shen, Zhu Han, Sumei Sun, Chau Yuen, Dong In Kim
Abstract: Edge General Intelligence (EGI) represents a transformative evolution of edge computing, where distributed agents possess the capability to perceive, reason, and act autonomously across diverse, dynamic environments. Central to this vision are world models, which act as proactive internal simulators that not only predict but also actively imagine future trajectories, reason under uncertainty, and plan multi‑step actions with foresight. This proactive nature allows agents to anticipate potential outcomes and optimize decisions ahead of real‑world interactions. While prior works in robotics and gaming have showcased the potential of world models, their integration into the wireless edge for EGI remains underexplored. This survey bridges this gap by offering a comprehensive analysis of how world models can empower agentic artificial intelligence (AI) systems at the edge. We first examine the architectural foundations of world models, including latent representation learning, dynamics modeling, and imagination‑based planning. Building on these core capabilities, we illustrate their proactive applications across EGI scenarios such as vehicular networks, unmanned aerial vehicle (UAV) networks, the Internet of Things (IoT) systems, and network functions virtualization, thereby highlighting how they can enhance optimization under latency, energy, and privacy constraints. We then explore their synergy with foundation models and digital twins, positioning world models as the cognitive backbone of EGI. Finally, we highlight open challenges, such as safety guarantees, efficient training, and constrained deployment, and outline future research directions. This survey provides both a conceptual foundation and a practical roadmap for realizing the next generation of intelligent, autonomous edge systems.
Authors: Jialei Xu, Zizhuang Wei, Weikang You, Linyun Li, Weijian Sun
Abstract: Semantic segmentation of city‑scale point clouds is a critical technology for Unmanned Aerial Vehicle (UAV) perception systems, enabling the classification of 3D points without relying on any visual information to achieve comprehensive 3D understanding. However, existing models are frequently constrained by the limited scale of 3D data and the domain gap between datasets, which lead to reduced generalization capability. To address these challenges, we propose CitySeg, a foundation model for city‑scale point cloud semantic segmentation that incorporates text modality to achieve open vocabulary segmentation and zero‑shot inference. Specifically, in order to mitigate the issue of non‑uniform data distribution across multiple domains, we customize the data preprocessing rules, and propose a local‑global cross‑attention network to enhance the perception capabilities of point networks in UAV scenarios. To resolve semantic label discrepancies across datasets, we introduce a hierarchical classification strategy. A hierarchical graph established according to the data annotation rules consolidates the data labels, and the graph encoder is used to model the hierarchical relationships between categories. In addition, we propose a two‑stage training strategy and employ hinge loss to increase the feature separability of subcategories. Experimental results demonstrate that the proposed CitySeg achieves state‑of‑the‑art (SOTA) performance on nine closed‑set benchmarks, significantly outperforming existing approaches. Moreover, for the first time, CitySeg enables zero‑shot generalization in city‑scale point cloud scenarios without relying on visual information.
Authors: Kelen C. Teixeira Vivaldini, Robert Pěnička, Martin Saska
Abstract: One of the most critical features for the successful operation of autonomous UAVs is the ability to make decisions based on the information acquired from their surroundings. Each UAV must be able to make decisions during the flight in order to deal with uncertainties in its system and the environment, and to further act upon the information being received. Such decisions influence the future behavior of the UAV, which is expressed as the path plan. Thus, decision‑making in path planning is an enabling technique for deploying autonomous UAVs in real‑world applications. This survey provides an overview of existing studies that use aspects of decision‑making in path planning, presenting the research strands for Exploration Path Planning and Informative Path Planning, and focusing on characteristics of how data have been modeled and understood. Finally, we highlight the existing challenges for relevant topics in this field.
Authors: Jimin Choi, Max Z. Li
Abstract: Hazardous environments such as chemical spills, radiological zones, and bio‑contaminated sites pose significant threats to human safety and public infrastructure. Rapid and reliable hazard mitigation in these settings is often unsafe for humans, calling for autonomous systems that can adaptively sense and respond to evolving risks. This paper presents a decision‑making framework for autonomous vehicle dispatch in hazardous environments with uncertain and evolving risk levels. The system integrates a Bayesian Upper Confidence Bound (BUCB) sensing strategy with task‑specific vehicle routing problems with profits (VRPP), enabling adaptive coordination of unmanned aerial vehicles (UAVs) for hazard sensing and unmanned ground vehicles (UGVs) for cleaning. Using VRPP allows selective site visits under resource constraints by assigning each site a visit value that reflects sensing or cleaning priorities. Site‑level hazard beliefs are maintained through a time‑weighted Bayesian update. BUCB scores guide UAV routing to balance exploration and exploitation under uncertainty, while UGV routes are optimized to maximize expected hazard reduction under resource constraints. Simulation results demonstrate that our framework reduces the number of dispatch cycles to resolve hazards by around 30% on average compared to baseline dispatch strategies, underscoring the value of uncertainty‑aware vehicle dispatch for reliable hazard mitigation.
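A time‑weighted Bayesian belief with an upper‑confidence score, as described in this abstract, might look like the following sketch. It assumes a Beta‑Bernoulli hazard model, an exponential discount on old pseudo‑counts, and a mean‑plus‑scaled‑standard‑deviation score in place of a posterior quantile; the abstract specifies none of these, and `SiteBelief` and its parameters are hypothetical.

```python
import math


class SiteBelief:
    """Hypothetical Beta belief over whether a site is currently hazardous."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0, decay: float = 0.9):
        self.alpha = alpha  # pseudo-count of "hazard observed"
        self.beta = beta    # pseudo-count of "no hazard observed"
        self.decay = decay  # assumed discount applied to older evidence

    def update(self, hazard_observed: bool) -> None:
        # Discount old pseudo-counts, then add the new observation.
        self.alpha = self.decay * self.alpha + (1.0 if hazard_observed else 0.0)
        self.beta = self.decay * self.beta + (0.0 if hazard_observed else 1.0)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def std(self) -> float:
        total = self.alpha + self.beta
        return math.sqrt(self.alpha * self.beta / (total * total * (total + 1.0)))

    def score(self, c: float = 2.0) -> float:
        # Optimistic score: sites that are risky OR uncertain rank higher.
        return self.mean() + c * self.std()
```

In a routing context, a score like this could serve as the visit value of a site, so that rarely observed sites are not starved of sensing visits.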
Authors: Fen Liu, Shenghai Yuan, Thien-Minh Nguyen, Wei Meng, Lihua Xie
Abstract: This paper proposes a strategy to encircle and intercept a non‑cooperative aerial point‑mass moving target by leveraging noisy range measurements for state estimation. In this approach, the guardians actively ensure the observability of the target by using an anti‑synchronization (AS), 3D "vibrating string" trajectory, which enables rapid position and velocity estimation based on the Kalman filter. Additionally, a novel anti‑target controller is designed for the guardians to enable adaptive transitions from encircling a protected target to encircling, intercepting, and neutralizing a hostile target, taking into consideration the input constraints of the guardians. Based on the guaranteed uniform observability, the exponentially bounded stability of the state estimation error and the convergence of the encirclement error are rigorously analyzed. Simulation results and real‑world UAV experiments are presented to further validate the effectiveness of the system design.
Authors: Malaika Zafar, Roohan Ahmed Khan, Faryal Batool, Yasheerah Yaqoot, Ziang Guo, Mikhail Litvinov, Aleksey Fedoseev, Dzmitry Tsetserukou
Abstract: With the growing demand for efficient logistics, unmanned aerial vehicles (UAVs) are increasingly being paired with automated guided vehicles (AGVs). While UAVs offer the ability to navigate through dense environments and varying altitudes, they are limited by battery life, payload capacity, and flight duration, necessitating coordinated ground support.
Focusing on heterogeneous navigation, SwarmVLM addresses these limitations by enabling semantic collaboration between UAVs and ground robots through impedance control. The system leverages the Vision Language Model (VLM) and the Retrieval‑Augmented Generation (RAG) to adjust impedance control parameters in response to environmental changes. In this framework, the UAV acts as a leader using Artificial Potential Field (APF) planning for real‑time navigation, while the ground robot follows via virtual impedance links with adaptive link topology to avoid collisions with short obstacles.
The system demonstrated a 92% success rate across 12 real‑world trials. Under optimal lighting conditions, the VLM‑RAG framework achieved 8% accuracy in object detection and selection of impedance parameters. The mobile robot prioritized short obstacle avoidance, occasionally resulting in a lateral deviation of up to 50 cm from the UAV path, which showcases safe navigation in a cluttered setting.
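The Artificial Potential Field (APF) planning used by the leader UAV can be sketched generically as one normalized step along the sum of an attractive force toward the goal and repulsive forces from nearby obstacles. This is the textbook 2D formulation, not SwarmVLM's implementation; the function name `apf_step` and the gains `k_att`, `k_rep`, influence radius `d0`, and `step` are assumed values.

```python
import math


def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=2.0, step=0.1):
    """Move `pos` one fixed-length step along the combined potential-field force."""
    # Attractive force pulls linearly toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Repulsion acts only inside the influence radius d0.
        if 0.0 < d < d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # Zero net force: stay in place (local minimum or goal).
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)
```

A known limitation of plain APF is that the agent can stall in local minima where attraction and repulsion cancel, which is one reason such planners are often paired with additional logic.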
Authors: Shashwat Jaiswal, Suman Raj, Subhajit Sidhanta, Yogesh Simmhan
Abstract: Recent years have seen an unprecedented growth in research that leverages the newest computing paradigm of Internet of Drones, comprising a fleet of connected Unmanned Aerial Vehicles (UAVs) used for a wide range of tasks such as monitoring and analytics in highly mobile and changing environments characteristic of disaster regions. Given that the typical data (i.e., videos and images) collected by the fleet of UAVs deployed in such scenarios can be considerably larger than what the onboard computers can process, the UAVs need to offload their data in real‑time to the edge and the cloud for further processing. To that end, we present the design of AerialDB ‑ a lightweight decentralized data storage and query system that can store and process time series data on a multi‑UAV system comprising: A) a fleet of hundreds of UAVs fitted with onboard computers, and B) ground‑based edge servers connected through a cellular link. Leveraging lightweight techniques for content‑based replica placement and indexing of shards, AerialDB has been optimized for efficient processing of different possible combinations of typical spatial and temporal queries performed by real‑world disaster management applications. Using containerized deployment spanning up to 400 drones and 80 edges, we demonstrate that AerialDB is able to scale efficiently while providing near real‑time performance with different realistic workloads. Further, AerialDB comprises a decentralized and locality‑aware distributed execution engine which provides graceful degradation of performance upon edge failures with relatively low latency while processing large spatio‑temporal data. AerialDB exhibits comparable insertion performance and 100 times improvement in query performance against state‑of‑the‑art baseline. Moreover, it exhibits a 10 times and 100 times improvement with insertion and query workloads respectively over the cloud baseline.
Authors: Bing Li, Haoming Guo, Zhiyuan Ren, Wenchi Cheng, Jialin Hu, Xinke Jian
Abstract: In emergency scenarios, the dynamic and harsh conditions necessitate timely trajectory adjustments for drones, leading to highly dynamic network topologies and potential task failures. To address these challenges, a collaborative computing strategy based on strapdown inertial navigation system (SINS) prediction is proposed for emergency UAV networks (EUN), where a two‑step weighted time expanded graph (WTEG) is constructed to deal with dynamic network topology changes. Furthermore, the task scheduling is formulated as a Directed Acyclic Graph (DAG) to WTEG mapping problem to achieve collaborative computing while transmitting among UAVs. Finally, the binary particle swarm optimization (BPSO) algorithm is employed to choose the mapping strategy that minimizes end‑to‑end processing latency. The simulation results validate that the collaborative computing strategy significantly outperforms both cloud and local computing in terms of latency. Moreover, the task success rate using SINS is substantially improved compared to approaches without prior prediction.
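Binary particle swarm optimization, which this abstract uses to select a mapping, can be sketched in its standard sigmoid‑transfer form. The DAG‑to‑WTEG encoding and latency model are not given in the abstract, so the cost function here is left to the caller; the function name `bpso` and all hyperparameters are assumptions.

```python
import math
import random


def bpso(cost, n_bits, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `cost(bits)` over bit lists using standard binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_bits):
                # Velocity update pulls toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, vel[i][d]))  # clamp for stability
                # Sigmoid maps velocity to the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost
```

In a scheduling setting, each bit could indicate whether a given task is placed on a given node, with `cost` returning the resulting end‑to‑end latency plus a penalty for infeasible assignments.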
Authors: Hamidreza Asadian-Rad, Hossein Soleimani, Shahrokh Farahmand
Abstract: Unmanned aerial vehicles (UAVs) have been recently utilized in multi‑access edge computing (MEC) as edge servers. It is desirable to design UAVs' trajectories and user to UAV assignments to ensure satisfactory service to the users and energy efficient operation simultaneously. The posed optimization problem is challenging to solve because: (i) The formulated problem is non‑convex, (ii) Due to the mobility of ground users, their future positions and channel gains are not known in advance, (iii) Local UAVs' observations should be communicated to a central entity that solves the optimization problem. The (semi‑) centralized processing leads to communication overhead, communication/processing bottlenecks, lack of flexibility and scalability, and loss of robustness to system failures. To simultaneously address all these limitations, we advocate a fully decentralized setup with no centralized entity. Each UAV obtains its local observation and then communicates with its immediate neighbors only. After sharing information with neighbors, each UAV determines its next position via a locally run deep reinforcement learning (DRL) algorithm. None of the UAVs need to know the global communication graph. Two main components of our proposed solution are (i) Graph attention layers (GAT), and (ii) Experience and parameter sharing proximal policy optimization (EPS‑PPO). Our proposed approach eliminates all the limitations of semi‑centralized MADRL methods such as MAPPO and MA deep deterministic policy gradient (MADDPG), while guaranteeing a better performance than independent local DRLs such as in IPPO. Numerical results reveal notable performance gains in several different criteria compared to the existing MADDPG algorithm, demonstrating the potential for offering a better performance, while utilizing local communications only.
Authors: Xin Tang, Qian Chen, Fengshun Li, Youchun Gong, Yinqiu Liu, Wen Tian, Shaowen Qin, Xiaohuan Li
Abstract: With the growing demand for Uncrewed Aerial Vehicle (UAV) networks in sensitive applications, such as urban monitoring, emergency response, and secure sensing, ensuring reliable connectivity and covert communication has become increasingly vital. However, dynamic mobility and exposure risks pose significant challenges. To tackle these challenges, this paper proposes a self‑organizing UAV network framework combining Graph Diffusion‑based Policy Optimization (GDPO) with a Stackelberg Game (SG)‑based incentive mechanism. The GDPO method uses generative AI to dynamically generate sparse but well‑connected topologies, enabling flexible adaptation to changing node distributions and Ground User (GU) demands. Meanwhile, the Stackelberg Game (SG)‑based incentive mechanism guides self‑interested UAVs to choose relay behaviors and neighbor links that support cooperation and enhance covert communication. Extensive experiments are conducted to validate the effectiveness of the proposed framework in terms of model convergence, topology generation quality, and enhancement of covert communication performance.
Authors: Sasa Maric, Rasil Baidar, Robert Abbas, Sam Reisenfeld
Abstract: The integration of Terrestrial Networks (TN) and Non‑Terrestrial Networks (NTN), including 5G Advanced/6G and the Internet of Things (IoT) technologies, using Low Earth Orbit (LEO) satellites, high‑altitude platforms (HAPS), and Unmanned Aerial Vehicles (UAVs), is redefining the landscape of global connectivity. This paper introduces a new system‑level security framework for 5G Advanced/6G IoT‑integrated TN‑NTN architectures with AI‑native‑enabled cloud security. Due to the heterogeneity, scale, and distributed nature of these networks, new security challenges have emerged. Leveraging AI‑native cloud platforms offers powerful capabilities for real‑time threat detection, security automation, and intelligent policy enforcement. The NTN satellite access function enhances security for discontinuous coverage via satellite connections. In addition, this paper explores the security risks associated with integrated 5G Advanced/6G IoT TN‑NTN systems, including full network segmentation, network slicing, and the cloudification of the RAN and core. We present a comprehensive AI‑enabled cloud security framework and conclude with proposals for implementing AI‑powered, satellite‑based NTN within future 5G Advanced/6G IoT networks. Our approach emphasizes zero‑trust principles, federated learning, secure orchestration, a layered security framework, and resilience against adversarial threats.
Authors: Gabriele Magrini, Lorenzo Berlincioni, Luca Cultrera, Federico Becattini, Pietro Pala
Abstract: The diffusion of drones presents significant security and safety challenges. Traditional surveillance systems, particularly conventional frame‑based cameras, struggle to reliably detect these targets due to their small size, high agility, and the resulting motion blur and poor performance in challenging lighting conditions. This paper surveys the emerging field of event‑based vision as a robust solution to these problems. Event cameras virtually eliminate motion blur and enable consistent detection in extreme lighting. Their sparse, asynchronous output suppresses static backgrounds, enabling low‑latency focus on motion cues. We review the state‑of‑the‑art in event‑based drone detection, from data representation methods to advanced processing pipelines using spiking neural networks. The discussion extends beyond simple detection to cover more sophisticated tasks such as real‑time tracking, trajectory forecasting, and unique identification through propeller signature analysis. By examining current methodologies, available datasets, and the distinct advantages of the technology, this work demonstrates that event‑based vision provides a powerful foundation for the next generation of reliable, low‑latency, and efficient counter‑UAV systems.
Authors: Abdul Saboor, Zhuangzhuang Cui, Achiel Colpaert, Evgenii Vinogradov, Sofie Pollin
Abstract: Urban Air Mobility (UAM) envisions aerial corridors for Unmanned Aerial Vehicles (UAVs) to reduce ground traffic congestion by supporting 3D mobility, such as air taxis. A key challenge in these high‑mobility aerial corridors is ensuring reliable connectivity, where frequent handovers can degrade network performance. To resolve this, we present a Context‑Aware Smart Handover (CASH) protocol that uses a forward‑looking scoring mechanism based on UAV trajectory to make proactive handover decisions. We evaluate the performance of the proposed CASH against existing handover protocols in a custom‑built simulator. Results show that CASH reduces handover frequency by up to 78% while maintaining low outage probability. We then investigate the impact of base station density and safety margin on handover performance, where their optimal setups are empirically obtained to ensure reliable UAM communication.
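A forward‑looking handover score based on the planned trajectory could be sketched as below. The abstract does not define the CASH scoring function, so this example simply scores each base station by the fraction of upcoming waypoints inside an assumed coverage radius and switches only when a candidate beats the serving station by a margin; `lookahead_score`, `choose_bs`, `radius`, and `margin` are hypothetical.

```python
import math


def lookahead_score(bs_xy, waypoints, radius):
    """Fraction of upcoming waypoints within `radius` of the base station."""
    covered = sum(1 for p in waypoints if math.dist(bs_xy, p) <= radius)
    return covered / len(waypoints)


def choose_bs(serving, stations, waypoints, radius, margin=0.2):
    """Return the station to use next; keep `serving` unless a rival is clearly better."""
    scores = {name: lookahead_score(xy, waypoints, radius)
              for name, xy in stations.items()}
    best = max(scores, key=scores.get)
    # The margin acts as hysteresis to avoid ping-pong handovers.
    if best != serving and scores[best] - scores[serving] > margin:
        return best
    return serving
```

Scoring future positions rather than the current signal alone is what lets such a scheme skip short‑lived cells along a corridor, which is the intuition behind reducing handover frequency.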
Authors: Amirreza Rouhi, Sneh Patel, Noah McCarthy, Siddiqa Khan, Hadi Khorsand, Kaleb Lefkowitz, David K. Han
Abstract: The exponential growth in Unmanned Aerial Vehicle (UAV) usage underscores the critical need for detecting them at extended distances to ensure safe operations, especially in densely populated areas. Despite the tremendous advances made in computer vision through deep learning, the detection of these small airborne objects remains a formidable challenge. While several datasets have been developed specifically for drone detection, the need for a more extensive and diverse collection of drone image data persists, particularly for long‑range detection under varying environmental conditions. We introduce here the Long Range Drone Detection (LRDD) Version 2 dataset, comprising 39,516 meticulously annotated images, as a second release of the LRDD dataset released previously. The LRDDv2 dataset enhances LRDDv1 by incorporating a greater variety of images, providing a more diverse and comprehensive resource for drone detection research. What sets LRDDv2 apart is its inclusion of target range information for over 8,000 images, making it possible to develop algorithms for drone range estimation. Tailored for long‑range aerial object detection, the majority of LRDDv2 consists of images capturing drones with 50 or fewer pixels in 1080p resolution. For access to the complete Long‑Range Drone Detection Dataset (LRDD)v2, please visit https://research.coe.drexel.edu/ece/imaple/lrddv2/ .
Authors: Chien-Wei Fu, Meng-Lin Ku
Abstract: In this paper, we propose an unmanned aerial vehicle (UAV)‑assisted federated learning (FL) framework that jointly optimizes UAV trajectory, user participation, power allocation, and data volume control to minimize overall system energy consumption. We begin by deriving the convergence accuracy of the FL model under multiple local updates, enabling a theoretical understanding of how user participation and data volume affect FL learning performance. The resulting joint optimization problem is non‑convex; to address this, we employ alternating optimization (AO) and successive convex approximation (SCA) techniques to convexify the non‑convex constraints, leading to the design of an iterative energy consumption optimization (ECO) algorithm. Simulation results confirm that ECO consistently outperforms existing baseline schemes.
Authors: Jingpu Yang, Hang Zhang, Fengxian Ji, Yufeng Wang, Mingjie Wang, Yizhe Luo, Wenrui Ding
Abstract: Unmanned Aerial Vehicles (UAVs) have made significant advancements in communication stability and security through techniques such as frequency hopping, signal spreading, and adaptive interference suppression. However, challenges remain in modeling spectrum competition, integrating expert knowledge, and predicting opponent behavior. To address these issues, we propose UAV‑FPG (Unmanned Aerial Vehicle ‑ Frequency Point Game), a game‑theoretic environment model that simulates the dynamic interaction between interference and anti‑interference strategies of opponent and ally UAVs in communication frequency bands. The model incorporates a prior expert knowledge base to optimize frequency selection and employs large language models for path planning, simulating a "strong adversary". Experimental results highlight the effectiveness of integrating the expert knowledge base and the large language model, with the latter significantly improving path planning in dynamic scenarios through iterative interactions, outperforming fixed‑path strategies. UAV‑FPG provides a robust platform for advancing anti‑jamming strategies and intelligent decision‑making in UAV communication systems.
Authors: Peng Wei, Prabhash Ragbir, Stavros G. Vougioukas, Zhaodan Kong
Abstract: Autonomous unmanned aerial vehicle (UAV) navigation in orchards presents significant challenges due to obstacles and GPS‑deprived environments. In this work, we introduce a learning‑based approach to achieve vision‑based navigation of UAVs within orchard rows. Our method employs a variational autoencoder (VAE)‑based controller, trained with an intervention‑based learning framework that allows the UAV to learn a visuomotor policy from human experience. We validate our approach in real orchard environments with a custom‑built quadrotor platform. Field experiments demonstrate that after only a few iterations of training, the proposed VAE‑based controller can autonomously navigate the UAV based on a front‑mounted camera stream. The controller exhibits strong obstacle avoidance performance, achieves longer flying distances with less human assistance, and outperforms existing algorithms. Furthermore, we show that the policy generalizes effectively to novel environments and maintains competitive performance across varying conditions and speeds. This research not only advances UAV autonomy but also holds significant potential for precision agriculture, improving efficiency in orchard monitoring and management.
Authors: Xin Dong, Yiwei Zhang, Yangjie Cui, Jinwu Xiang, Daochun Li, Zhan Tu
Abstract: Event cameras offer significant advantages, including a wide dynamic range, high temporal resolution, and immunity to motion blur, making them highly promising for addressing challenging visual conditions. Extracting and utilizing effective information from asynchronous event streams is essential for the onboard implementation of event cameras. In this paper, we propose a streamlined event‑based intensity reconstruction scheme, event‑based single integration (ESI), to address such implementation challenges. This method guarantees the portability of conventional frame‑based vision methods to event‑based scenarios and maintains the intrinsic advantages of event cameras. The ESI approach reconstructs intensity images by performing a single integration of the event streams combined with an enhanced decay algorithm. Such a method enables real‑time intensity reconstruction at a high frame rate, typically 100 FPS. Furthermore, the relatively low computational load of ESI makes it well suited to onboard implementation, such as in UAV‑based visual tracking scenarios. Extensive experiments have been conducted to compare the performance of ESI with state‑of‑the‑art algorithms. Compared to these algorithms, ESI demonstrates remarkable runtime efficiency improvements, superior reconstruction quality, and a high frame rate. As a result, ESI significantly enhances UAV onboard perception in visually adverse surroundings. In flight tests, ESI demonstrates effective performance for UAV onboard visual tracking under extremely low illumination conditions (2‑10 lux), whereas other comparative algorithms fail due to insufficient frame rate, poor image quality, or limited real‑time performance.
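The idea of reconstructing intensity by integrating events with a decay term can be illustrated with a generic per‑pixel leaky integrator. This is not the ESI algorithm or its "enhanced decay"; the exponential decay law, the time constant `tau`, the `contrast` step, and the event tuple layout `(x, y, t, polarity)` are assumptions for the sketch.

```python
import math


def integrate_events(events, width, height, tau=0.05, contrast=0.1):
    """Accumulate signed events per pixel, decaying old contributions over time."""
    image = [[0.0] * width for _ in range(height)]
    last_t = [[0.0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        # Decay the stored value by the time elapsed since this pixel's last event.
        dt = t - last_t[y][x]
        image[y][x] = image[y][x] * math.exp(-dt / tau) + contrast * polarity
        last_t[y][x] = t
    return image
```

Each event touches a single pixel and costs a constant amount of work, which suggests why an integration‑based reconstruction can be cheap enough for onboard use.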
Authors: Muhammad Farhan Khan, Muhammad Ahmed Mohsin, Zeeshan Alam, Muhammad Saad, Muhammad Waqar
Abstract: In future 6G wireless networks, particularly in dense urban environments, bandwidth exhaustion and limited capacity pose significant challenges to enhancing data rates. We introduce a novel system model designed to improve the data rate of users in next‑generation multi‑cell networks by integrating Unmanned Aerial Vehicle (UAV)‑Assisted Reconfigurable Intelligent Surfaces (RIS), Non‑Orthogonal Multiple Access (NOMA), and Coordinated Multipoint Transmission (CoMP). By optimally deploying Aerial RIS for higher data rates, employing NOMA to improve spectral efficiency, and utilizing CoMP to mitigate inter‑cell interference (ICI), we significantly enhance the overall system capacity and sum rate. Furthermore, we address the challenge of feedback overhead associated with Quantized Phase Shifts (QPS) from the receiver to RIS. The feedback channel is band‑limited and cannot support a large overhead of QPS for uplink communication. To ensure seamless transmission, we propose a Machine Learning Autoencoder technique for compressed communication of QPS from the receiver to RIS, while maintaining high accuracy. Additionally, we investigate the impact of the number of Aerial RIS elements and the power allocation ratio for NOMA on the individual data rate of users. Our simulation results demonstrate substantial improvements in spectral efficiency, outage probability, and bandwidth utilization, highlighting the potential of the proposed architecture to enhance network performance.
Authors: Jianbo Ma, Hui Luo, Qi Chen, Yuankai Qi, Yumei Sun, Amin Beheshti, Jianlin Zhang, Ming-Hsuan Yang
Abstract: Multi‑object tracking (MOT) aims to track multiple objects while maintaining consistent identities across frames of a given video. In unmanned aerial vehicle (UAV) recorded videos, frequent viewpoint changes and complex UAV‑ground relative motion dynamics pose significant challenges, which often lead to unstable affinity measurement and ambiguous association. Existing methods typically model motion and appearance cues separately, overlooking their spatio‑temporal interplay and resulting in suboptimal tracking performance. In this work, we propose AMOT, which jointly exploits appearance and motion cues through two key components: an Appearance‑Motion Consistency (AMC) matrix and a Motion‑aware Track Continuation (MTC) module. Specifically, the AMC matrix computes bi‑directional spatial consistency under the guidance of appearance features, enabling more reliable and context‑aware identity association. The MTC module complements AMC by reactivating unmatched tracks through appearance‑guided predictions that align with Kalman‑based predictions, thereby reducing broken trajectories caused by missed detections. Extensive experiments on three UAV benchmarks, including VisDrone2019, UAVDT, and VT‑MOT‑UAV, demonstrate that our AMOT outperforms current state‑of‑the‑art methods and generalizes well in a plug‑and‑play and training‑free manner.
Authors: Luke Snow, Vikram Krishnamurthy
Abstract: Suppose there is an adversarial UAV network being tracked by a radar. How can the radar determine whether the UAVs are coordinating, in some well‑defined sense? How can the radar infer the objectives of the individual UAVs and the network as a whole? We present an abstract interpretation of such a strategic interaction, allowing us to conceptualize coordination as a linearly constrained multi‑objective optimization problem. Then, we present some tools from microeconomic theory that allow us to detect coordination and reconstruct individual UAV objective functions, from radar tracking signals. This corresponds to performing inverse multi‑objective optimization. We present details for how the abstract microeconomic interpretation corresponds to, and naturally arises from, physical‑layer radar waveform modulation and multi‑target filtering. This article serves as a tutorial, bringing together concepts from several established research contributions in an expository style.
Authors: Jixia Li, Nanben Suo, Shenzhe Xu, Shijie Sun, Shifan Zuo, Yougang Wang, Fengquan Wu, Juyong Zhang, Peter Timbie, Reza Ansari, Albert Stebbins, Xuelei Chen
Abstract: The Tianlai Cylinder Pathfinder Array consists of three adjacent cylindrical reflectors fixed on the ground, each 40 meters long and 15 meters wide, with the cylinder axis oriented along the North‑South (N‑S) direction. Dual linear polarisation feeds are distributed along the focus line, parallel to the cylinder axis. Measurement of the primary beam profile of these cylindrical reflectors is difficult, as they are too large to be placed in an anechoic chamber. While the beam profile along the East‑West (E‑W) direction can be measured with the transit observations of bright astronomical radio sources, the beam profile along the N‑S direction remains very uncertain. We present a preliminary measurement of the beam profile of the Tianlai cylindrical antenna along both the N‑S direction and E‑W direction in the frequency range of 700‑800 MHz, using a calibrator source carried by an unmanned aerial vehicle (UAV) flying in the far field. The beam profile of the Tianlai cylindrical antenna is determined from the analysis of the auto‑correlation signals from the cylinder array correlator, taking into account the emitter antenna beam profile, itself measured with a dipole antenna on the ground. The accuracy of the UAV‑based determination of the cylinder beam profiles is validated by comparing the results with those derived from bright astronomical source transits, and with simulated beams.
Authors: Ziye Jia, Sijie He, Qiuming Zhu, Wei Wang, Qihui Wu, Zhu Han
Abstract: Due to their high flexibility and versatility, unmanned aerial vehicles (UAVs) are leveraged in various fields including surveillance and disaster rescue. However, in UAV networks, routing is vulnerable to malicious damage due to distributed topologies and high dynamics. Hence, ensuring the routing security of UAV networks is challenging. In this paper, we characterize the routing process in a time‑varying UAV network with malicious nodes. Specifically, we formulate the routing problem to minimize the total delay, which is an integer linear programming problem and intractable to solve. Then, to tackle the network security issue, a blockchain‑based trust management mechanism (BTMM) is designed to dynamically evaluate trust values and identify low‑trust UAVs. To improve traditional practical Byzantine fault tolerance algorithms in the blockchain, we propose a consensus UAV update mechanism. Besides, considering the local observability, the routing problem is reformulated into a decentralized partially observable Markov decision process. Further, a multi‑agent double deep Q‑network based routing algorithm is designed to minimize the total delay. Finally, simulations are conducted with attacked UAVs, and numerical results show that the delay of the proposed mechanism is 13.39%, 12.74%, and 16.6% lower than that of multi‑agent proximal policy optimization algorithms, multi‑agent deep Q‑network algorithms, and methods without BTMM, respectively.
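The abstract above names a multi‑agent double deep Q‑network. As a reminder of what "double" refers to, here is a minimal sketch of the standard double‑DQN target for a single agent, with Q‑values given as plain lists. The function name, the use of negative delay as the reward, and the discount factor are illustrative assumptions and are not taken from the paper.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard double-DQN target for one transition.

    The online network selects the next action; the target network
    evaluates it. Both Q-value lists are indexed by action.
    """
    if done:
        return reward
    # Action selection uses the online network's estimates.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network's estimate.
    return reward + gamma * next_q_target[best_action]


# Hypothetical transition: the reward is the negative delay of one hop (assumption).
y = double_dqn_target(
    reward=-2.0,
    next_q_online=[1.0, 3.0, 2.0],
    next_q_target=[0.5, 1.5, 4.0],
    gamma=0.9,
)
```

Decoupling selection from evaluation is what reduces the overestimation bias of plain Q‑learning; in the example the online network picks action 1 even though the target network scores action 2 highest.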
Authors: Prerana Ramkumar
Abstract: Generative Adversarial Networks (GANs) have achieved realistic super‑resolution (SR) of images; however, they lack semantic consistency and per‑pixel confidence, limiting their credibility in critical remote sensing applications such as disaster response, urban planning and agriculture. This paper introduces Semantic and Uncertainty‑Aware ESRGAN (SU‑ESRGAN), the first SR framework designed for satellite imagery to integrate ESRGAN, a segmentation loss via DeepLabv3 for class detail preservation, and Monte Carlo dropout to produce pixel‑wise uncertainty maps. The SU‑ESRGAN produces results (PSNR, SSIM, LPIPS) comparable to the baseline ESRGAN on aerial imagery. This novel model is valuable in satellite systems or UAVs that use wide field‑of‑view (FoV) cameras, trading off spatial resolution for coverage. The modular design allows integration in UAV data pipelines for on‑board or post‑processing SR to enhance imagery degraded by motion blur, compression and sensor limitations. Further, the model is fine‑tuned to evaluate its performance on cross‑domain applications. The tests are conducted on two drone‑based datasets which differ in altitude and imaging perspective. Performance evaluation of the fine‑tuned models shows a stronger adaptation to the Aerial Maritime Drone Dataset, whose imaging characteristics align with the training data, highlighting the importance of domain‑aware training in SR applications.
Authors: Kapel Dev, Yash Madhwal, Sofia Shevelo, Pavel Osinenko, Yury Yanovich
Abstract: Unmanned aerial vehicle (UAV) swarms are increasingly used in critical applications such as aerial mapping, environmental monitoring, and autonomous delivery. However, the reliability of these systems is highly dependent on uninterrupted access to the Global Navigation Satellite Systems (GNSS) signals, which can be disrupted in real‑world scenarios due to interference, environmental conditions, or adversarial attacks, causing disorientation, collision risks, and mission failure. This paper proposes SwarmRaft, a blockchain‑inspired positioning and consensus framework for maintaining coordination and data integrity in UAV swarms operating under GNSS‑denied conditions. SwarmRaft leverages the Raft consensus algorithm to enable distributed drones (nodes) to agree on state updates such as location and heading, even in the absence of GNSS signals for one or more nodes. In our prototype, each node uses GNSS and local sensing, and communicates over WiFi in a simulated swarm. Upon signal loss, consensus is used to reconstruct or verify the position of the failed node based on its last known state and trajectory. Our system demonstrates robustness in maintaining swarm coherence and fault tolerance through a lightweight, scalable communication model. This work offers a practical and secure foundation for decentralized drone operation in unpredictable environments.
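The abstract above describes reconstructing a node's position from its last known state and trajectory, and agreeing on it by consensus. The sketch below illustrates those two ideas in isolation, assuming a constant‑velocity model and a per‑axis median over peer estimates. All function names and the median rule are hypothetical choices for illustration; this is not the SwarmRaft implementation and it does not implement Raft's log replication or leader election.

```python
from statistics import median


def dead_reckon(last_pos, velocity, dt):
    """Propagate the last known position along a constant-velocity trajectory."""
    return tuple(p + v * dt for p, v in zip(last_pos, velocity))


def has_quorum(votes, cluster_size):
    """Raft-style majority: strictly more than half of the cluster must agree."""
    return votes > cluster_size // 2


def agreed_position(estimates):
    """Combine peer estimates with a per-axis median (illustrative assumption)."""
    return tuple(median(axis) for axis in zip(*estimates))


# A node lost GNSS 3 s ago while flying at (1, 2) m/s from the origin.
predicted = dead_reckon((0.0, 0.0), (1.0, 2.0), 3.0)
# Three peers report estimates; one is a gross outlier.
fused = agreed_position([(1.0, 2.0), (3.0, 2.5), (100.0, 3.0)])
```

The median is used here only because it tolerates a single faulty estimate; a real system would also need to bound how long dead reckoning remains trustworthy.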
Authors: Mingzhe Fan, Geng Sun, Hongyang Pan, Jiacheng Wang, Jiancheng An, Hongyang Du, Chau Yuen
Abstract: Stacked intelligent metasurfaces (SIMs) have emerged as a promising technology for realizing wave‑domain signal processing, but fixed SIMs limit the communication performance of the system compared with mobile SIMs. In this work, we consider a UAV‑mounted SIM (UAV‑SIM) assisted communication system, where UAVs acting as base stations (BSs) can cache the data processed by SIMs and, as mobile vehicles, flexibly deploy SIMs to enhance the communication performance. To this end, we formulate a UAV‑SIM‑based joint optimization problem (USBJOP) to comprehensively consider the association between UAV‑SIMs and users, the locations of UAV‑SIMs, and the phase shifts of UAV‑SIMs, aiming to maximize the network capacity. Due to the non‑convexity and NP‑hardness of USBJOP, we decompose it into three sub‑optimization problems, which are the association between UAV‑SIMs and users optimization problem (AUUOP), the UAV location optimization problem (ULOP), and the UAV‑SIM phase shifts optimization problem (USPSOP). Then, these three sub‑optimization problems are solved by an alternating optimization (AO) strategy. Specifically, AUUOP and ULOP are transformed to a convex form and then solved by the CVX tool, while we employ a layer‑by‑layer iterative optimization method for USPSOP. Simulation results verify the effectiveness of the proposed strategy under different simulation setups.
Authors: Hengxing Cai, Jinhan Dong, Yijie Rao, Jingcheng Deng, Jingjun Tan, Qien Chen, Haidong Wang, Zhen Wang, Shiyu Huang, Agachai Sumalee, Renxin Zhong
Abstract: Unmanned Aerial Vehicle (UAV) Vision‑Language Navigation (VLN) aims to enable agents to accurately localize targets and plan flight paths in complex environments based on natural language instructions, with broad applications in intelligent inspection, disaster rescue, and urban monitoring. Recent progress in Vision‑Language Models (VLMs) has provided strong semantic understanding for this task, while reinforcement learning (RL) has emerged as a promising post‑training strategy to further improve generalization. However, existing RL methods often suffer from inefficient use of training data, slow convergence, and insufficient consideration of the difficulty variation among training samples, which limits further performance improvement. To address these challenges, we propose Semantic‑Aware Gaussian Curriculum Scheduling (SA‑GCS), a novel training framework that systematically integrates Curriculum Learning (CL) into RL. SA‑GCS employs a Semantic‑Aware Difficulty Estimator (SA‑DE) to quantify the complexity of training samples and a Gaussian Curriculum Scheduler (GCS) to dynamically adjust the sampling distribution, enabling a smooth progression from easy to challenging tasks. This design significantly improves training efficiency, accelerates convergence, and enhances overall model performance. Extensive experiments on the CityNav benchmark demonstrate that SA‑GCS consistently outperforms strong baselines across all metrics, achieves faster and more stable convergence, and generalizes well across models of different scales, highlighting its robustness and scalability. The implementation of our approach is publicly available.
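The abstract above describes a Gaussian scheduler that shifts the sampling distribution from easy to hard samples. One plausible reading is sketched below: each sample gets a weight from a Gaussian centred on a target difficulty that moves with training progress. The function name, the assumption that difficulties are normalised to [0, 1], the linear schedule for the centre, and the fixed width `sigma` are all illustrative assumptions and may differ from the SA‑GCS design.

```python
import math


def gaussian_curriculum_weights(difficulties, progress, sigma=0.2):
    """Sampling probabilities centred on a difficulty that tracks training progress.

    difficulties: per-sample scores, assumed normalised to [0, 1].
    progress: fraction of training completed, clipped to [0, 1].
    """
    mu = min(max(progress, 0.0), 1.0)  # Gaussian centre moves from easy (0) to hard (1).
    raw = [math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) for d in difficulties]
    total = sum(raw)
    return [w / total for w in raw]


scores = [0.1, 0.5, 0.9]  # hypothetical easy, medium, and hard samples
early = gaussian_curriculum_weights(scores, progress=0.0)
late = gaussian_curriculum_weights(scores, progress=1.0)
```

Early in training nearly all probability mass sits on the easy sample, and late in training it sits on the hard one; a wider `sigma` would make the transition smoother.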
Authors: Belman Jahir Rodriguez, Sergio F. Chevtchenko, Marcelo Herrera Martinez, Yeshwanth Bethi, Saeed Afshar
Abstract: We introduce a U‑Net model for 360° acoustic source localization formulated as a spherical semantic segmentation task. Rather than regressing discrete direction‑of‑arrival (DoA) angles, our model segments beamformed audio maps (azimuth and elevation) into regions of active sound presence. Using delay‑and‑sum (DAS) beamforming on a custom 24‑microphone array, we generate signals aligned with drone GPS telemetry to create binary supervision masks. A modified U‑Net, trained on frequency‑domain representations of these maps, learns to identify spatially distributed source regions while addressing class imbalance via the Tversky loss. Because the network operates on beamformed energy maps, the approach is inherently array‑independent and can be transferred to different microphone configurations with minimal adaptation. The segmentation outputs are post‑processed by computing centroids over activated regions, enabling robust DoA estimates. Our dataset includes real‑world open‑field recordings of a DJI Air 3 drone, synchronized with 360° video and flight logs across multiple dates and locations. Experimental results show that the U‑Net generalizes across environments, providing improved angular precision and offering a new paradigm for dense spatial audio understanding beyond traditional Sound Source Localization (SSL). We additionally validate the same beamforming‑plus‑segmentation formulation on the DCASE 2019 TAU Spatial Sound Events benchmark, showing that the approach generalizes beyond drone acoustics to multiclass Sound Event Localization and Detection (SELD) scenarios.
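The abstract above mentions the Tversky loss as the remedy for class imbalance. A minimal sketch of that loss on flattened masks follows, using the usual definition 1 - TP / (TP + alpha * FP + beta * FN). The function name and the values of `alpha`, `beta`, and `eps` are illustrative assumptions; the paper's settings are not given in the abstract.

```python
def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """One minus the Tversky index for flattened predictions and binary targets.

    pred: predicted probabilities in [0, 1]; target: 0/1 labels.
    alpha weights false positives, beta weights false negatives.
    """
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


# One missed source pixel versus one false alarm, each with one true positive.
loss_miss = tversky_loss([1, 0, 0, 0], [1, 1, 0, 0])
loss_false_alarm = tversky_loss([1, 1, 0, 0], [1, 0, 0, 0])
```

With `beta` larger than `alpha`, a missed source pixel is penalised more than a false alarm, which is the usual motivation when active regions are rare; setting both to 0.5 recovers the Dice loss.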
Authors: Jianqiang Xiao, Yuexuan Sun, Yixin Shao, Boxi Gan, Rongqiang Liu, Yanjing Wu, Weili Guan, Xiang Deng
Abstract: Aerial navigation is a fundamental yet underexplored capability in embodied intelligence, enabling agents to operate in large‑scale, unstructured environments where traditional navigation paradigms fall short. However, most existing research follows the Vision‑and‑Language Navigation (VLN) paradigm, which heavily depends on sequential linguistic instructions, limiting its scalability and autonomy. To address this gap, we introduce UAV‑ON, a benchmark for large‑scale Object Goal Navigation (ObjectNav) by aerial agents in open‑world environments, where agents operate based on high‑level semantic goals without relying on detailed instructional guidance as in VLN. UAV‑ON comprises 14 high‑fidelity Unreal Engine environments with diverse semantic regions and complex spatial layouts, covering urban, natural, and mixed‑use settings. It defines 1270 annotated target objects, each characterized by an instance‑level instruction that encodes category, physical footprint, and visual descriptors, allowing grounded reasoning. These instructions serve as semantic goals, introducing realistic ambiguity and complex reasoning challenges for aerial agents. To evaluate the benchmark, we implement several baseline methods, including Aerial ObjectNav Agent (AOA), a modular policy that integrates instruction semantics with egocentric observations for long‑horizon, goal‑directed exploration. Empirical results show that all baselines struggle in this setting, highlighting the compounded challenges of aerial navigation and semantic goal grounding. UAV‑ON aims to advance research on scalable UAV autonomy driven by semantic goal descriptions in complex real‑world environments.
Authors: Saichao Liu, Geng Sun, Chuang Zhang, Xuejie Liu, Jiacheng Wang, Changyuan Zhao, Dusit Niyato
Abstract: Mobile edge computing (MEC) is a promising technique to improve the computational capacity of smart devices (SDs) in the Internet of Things (IoT). However, the performance of MEC is restricted due to its fixed location and limited service scope. Hence, we investigate an unmanned aerial vehicle (UAV)‑assisted MEC system, where multiple UAVs are dispatched and each UAV can simultaneously provide computing service for multiple SDs. To improve the performance of the system, we formulate a UAV‑based trajectory control and resource allocation multi‑objective optimization problem (TCRAMOP) to simultaneously maximize the offloading number of UAVs and minimize the total offloading delay and total energy consumption of UAVs by optimizing the flight paths of UAVs as well as the computing resources allocated to served SDs. Then, considering that the solution of TCRAMOP requires continuous decision‑making and the system is dynamic, we propose an enhanced deep reinforcement learning (DRL) algorithm, namely, distributed proximal policy optimization with imitation learning (DPPOIL). This algorithm incorporates the generative adversarial imitation learning technique to improve the policy performance. Simulation results demonstrate the effectiveness of our proposed DPPOIL and show that the strategy learned by DPPOIL outperforms those of the baseline methods.
Authors: Bin Xie, Congxuan Zhang, Fagan Wang, Peng Liu, Feng Lu, Zhen Chen, Weiming Hu
Abstract: The widespread application of Unmanned Aerial Vehicles (UAVs) has raised serious public safety and privacy concerns, making UAV perception crucial for anti‑UAV tasks. However, existing UAV tracking datasets predominantly feature conspicuous objects and lack diversity in scene complexity and attribute representation, limiting their applicability to real‑world scenarios. To overcome these limitations, we present the CST Anti‑UAV, a new thermal infrared dataset specifically designed for Single Object Tracking (SOT) in Complex Scenes with Tiny UAVs (CST). It contains 220 video sequences with over 240k high‑quality bounding box annotations, highlighting two key properties: a significant number of tiny‑sized UAV targets and the diverse and complex scenes. To the best of our knowledge, CST Anti‑UAV is the first dataset to incorporate complete manual frame‑level attribute annotations, enabling precise evaluations under varied challenges. To conduct an in‑depth performance analysis for CST Anti‑UAV, we evaluate 20 existing SOT methods on the proposed dataset. Experimental results demonstrate that tracking tiny UAVs in complex environments remains a challenge, as the state‑of‑the‑art method achieves only 35.92% state accuracy, much lower than the 67.69% observed on the Anti‑UAV410 dataset. These findings underscore the limitations of existing benchmarks and the need for further advancements in UAV tracking research. The CST Anti‑UAV benchmark is about to be publicly released, which not only fosters the development of more robust SOT methods but also drives innovation in anti‑UAV systems.
Authors: Bidya Debnath, Mst Mostary Begum, Prashant Neupant, Brooke E. Molen, Junming Diao
Abstract: This paper presents a novel UAV swarm‑based phased array antenna system that leverages MagSafe‑ and LEGO‑inspired radio frequency (RF) connectors to address key challenges in distributed phased arrays, including inter‑element oscillator synchronization, localization, phase coherence, and positional accuracy. The proposed non‑threaded, hands‑free connectors enable precise inter‑element spacing and establish a continuous, low‑loss RF signal propagation path during mid‑flight docking. A multi‑stage optimization of the RF connector achieves a compact form factor, DC‑to‑RF bandwidth, and a measured insertion loss as low as 0.2 dB. The system architecture offers scalability in gain and frequency by adjusting the array element density per UAV and UAV dimensions. Experimental results from both stationary and in‑flight tests of two UAV‑based phased array prototypes align closely with simulations, demonstrating robust beam steering to multiple directions. This work delivers a practical, scalable, and low‑complexity platform that enables rapid deployment for next‑generation airborne communications, radar, and remote sensing applications.
Authors: Shengcai Zhou, Luping Xiang, Kun Yang, Kai Kit Wong, Dapeng Oliver Wu, Chan-Byoung Chae
Abstract: Airborne mobile Integrated Sensing and Communication (ISAC) base stations have garnered significant attention recently, with ISAC technology being a crucial application for 6G networks. Since ISAC can sense potential mobile communication users, this paper studies an effective scheme for a multi‑UAV network tailored for emergency communication. We develop a temporal‑assisted frame structure utilizing integrated omnidirectional and directional beampatterns to facilitate efficient and frequent searching, with extended Kalman filtering (EKF) as an aid to beam alignment. Further, we address an optimization problem to maximize the total achievable rate per slot by jointly designing UAV beamforming, load management, and UAV direction planning, all while adhering to the constraints of the predicted beam coverage. Given that the problem is NP‑hard, we introduce three robust mechanisms for its resolution: an enhanced distributed Successive Convex Approximation (SCA)‑Iterative Rank Minimization (IRM) algorithm, a coalition game approach, and a Fermat point search method. In particular, the proposed SCA‑IRM algorithm decomposes the original complex optimization problem into several sub‑problems and assigns them equally to each UAV, so as to realize distributed computing and improve computational efficiency. Our simulations demonstrate the improved system performance in terms of communication rate, fairness, and sensing accuracy, providing design guidelines for UAV‑assisted emergency communication networking.
Authors: Yuda Chen, Shuaikang Wang, Jie Li, Meng Guo
Abstract: A reliable communication network is essential for multiple UAVs operating within obstacle‑cluttered environments, where limited communication due to obstructions often occurs. A common solution is to deploy intermediate UAVs to relay information via a multi‑hop network, which introduces two challenges: (i) how to design the structure of multi‑hop networks; and (ii) how to maintain connectivity during collaborative motion. To this end, this work first proposes an efficient constrained search method based on the minimum‑edge RRT* algorithm, to find a spanning‑tree topology that requires fewer UAVs for the deployment task. Then, to achieve this deployment, a distributed model predictive control strategy is proposed for the online motion coordination. It explicitly incorporates not only the inter‑UAV and UAV‑obstacle distance constraints, but also the line‑of‑sight (LOS) connectivity constraint. These constraints are well known to be nonlinear and are often tackled by various approximations. In contrast, this work provides a theoretical guarantee that all agent trajectories are collision‑free with a teamwise LOS connectivity at all times. Numerous simulations are performed in 3D valley‑like environments, while hardware experiments validate its dynamic adaptation when the deployment position changes online.
Authors: Julian Kanz, Christian Gesell, Christina Bonfert, David Werbunat, Alexander Grathwohl, Julian Aguilar, Martin Vossiek, Christian Waldschmidt
Abstract: Advancements in analog‑to‑digital converter (ADC) technology have enabled higher sampling rates, making it feasible to adopt digital radar architectures that directly sample the radio‑frequency (RF) signal, eliminating the need for analog downconversion. This digital approach supports greater flexibility in waveform design and signal processing, particularly through digital modulation schemes like orthogonal frequency division multiplexing (OFDM). This paper presents a digital radar system mounted on an uncrewed aerial vehicle (UAV), which employs OFDM waveforms for coherent multistatic synthetic aperture radar (SAR) imaging in the L‑band. The radar setup features a primary UAV node responsible for signal transmission and monostatic data acquisition, alongside secondary nodes that operate in a receive‑only mode. These secondary nodes capture the radar signal reflected from the scene as well as a direct sidelink signal. RF signals from both the radar and sidelink paths are sampled and processed offline. To manage data storage efficiently, a trigger mechanism is employed to record only the relevant portions of the radar signal. The system maintains coherency in both fast‑time and slow‑time domains, which is essential for multistatic SAR imaging. Because the secondary nodes are passive, the system can be easily scaled to accommodate a larger swarm of UAVs. The paper details the full signal processing workflow for both monostatic and multistatic SAR image formation, including an analysis and correction of synchronization errors that arise from the uncoupled operation of the nodes. The proposed coherent processing method is validated through static radar measurements, demonstrating coherency achieved by the concept. Additionally, a UAV‑based bistatic SAR experiment demonstrates the system's performance by producing high‑resolution monostatic, bistatic, and combined multistatic SAR images.
Authors: Huiyang Hu, Peijin Wang, Yingchao Feng, Kaiwen Wei, Wenxin Yin, Wenhui Diao, Mengyu Wang, Hanbo Bi, Kaiyue Kang, Tong Ling, Kun Fu, Xian Sun
Abstract: Remote sensing (RS) images from multiple modalities and platforms exhibit diverse details due to differences in sensor characteristics and imaging perspectives. Existing vision‑language research in RS largely relies on relatively homogeneous data sources. Moreover, they still remain limited to conventional visual perception tasks such as classification or captioning. As a result, these methods fail to serve as a unified and standalone framework capable of effectively handling RS imagery from diverse sources in real‑world applications. To address these issues, we propose RingMo‑Agent, a model designed to handle multi‑modal and multi‑platform data that performs perception and reasoning tasks based on user textual instructions. Compared with existing models, RingMo‑Agent 1) is supported by a large‑scale vision‑language dataset named RS‑VL3M, comprising over 3 million image‑text pairs, spanning optical, SAR, and infrared (IR) modalities collected from both satellite and UAV platforms, covering perception and challenging reasoning tasks; 2) learns modality adaptive representations by incorporating separated embedding layers to construct isolated features for heterogeneous modalities and reduce cross‑modal interference; 3) unifies task modeling by introducing task‑specific tokens and employing a token‑based high‑dimensional hidden state decoding mechanism designed for long‑horizon spatial tasks. Extensive experiments on various RS vision‑language tasks demonstrate that RingMo‑Agent not only proves effective in both visual understanding and sophisticated analytical tasks, but also exhibits strong generalizability across different platforms and sensing modalities.
Authors: Zhang Liu, Lianfen Huang, Zhibin Gao, Xianbin Wang, Dusit Niyato, Xuemin Shen
Abstract: Low‑altitude uncrewed aerial vehicles (UAVs) are expected to facilitate the development of aerial‑ground integrated intelligent transportation systems and unlock the potential of the emerging low‑altitude economy. However, several critical challenges persist, including the dynamic optimization of network resources and UAV trajectories, limited UAV endurance, and imperfect channel state information (CSI). In this paper, we offer new insights into low‑altitude economy networking by exploring intelligent UAV‑assisted vehicle‑to‑everything communication strategies aligned with UAV energy efficiency. Particularly, we formulate an optimization problem of joint channel allocation, power control, and flight altitude adjustment in UAV‑assisted vehicular networks. Taking CSI feedback delay into account, our objective is to maximize the vehicle‑to‑UAV communication sum rate while satisfying the UAV's long‑term energy constraint. To this end, we first leverage Lyapunov optimization to decompose the original long‑term problem into a series of per‑slot deterministic subproblems. We then propose a diffusion‑based deep deterministic policy gradient (D3PG) algorithm, which innovatively integrates diffusion models to determine optimal channel allocation, power control, and flight altitude adjustment decisions. Through extensive simulations using real‑world vehicle mobility traces, we demonstrate the superior performance of the proposed D3PG algorithm compared to existing benchmark solutions.
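The abstract above uses Lyapunov optimization to turn a long‑term energy constraint into per‑slot subproblems. The generic textbook mechanism is a virtual queue that accumulates energy overspend, combined with a drift‑plus‑penalty rule per slot. The sketch below shows that generic mechanism only; the function names, the dictionary layout of candidate actions, and the trade‑off weight `tradeoff_v` are hypothetical, and the paper's actual subproblem (solved with D3PG) is richer than picking from a short list.

```python
def update_virtual_queue(queue, energy_used, energy_budget):
    """Virtual queue for a long-term average energy constraint.

    The backlog grows when a slot spends more than the per-slot budget
    and drains otherwise: Q <- max(Q + e - e_avg, 0).
    """
    return max(queue + energy_used - energy_budget, 0.0)


def drift_plus_penalty_choice(queue, actions, tradeoff_v):
    """Pick the per-slot action maximising V * rate - Q * energy."""
    return max(actions, key=lambda a: tradeoff_v * a["rate"] - queue * a["energy"])


# Two hypothetical per-slot options: fast but costly, or slower and cheap.
options = [{"rate": 10.0, "energy": 5.0}, {"rate": 6.0, "energy": 1.0}]
first = drift_plus_penalty_choice(0.0, options, tradeoff_v=1.0)
backlog = update_virtual_queue(0.0, first["energy"], energy_budget=3.0)
second = drift_plus_penalty_choice(backlog, options, tradeoff_v=1.0)
```

With an empty queue the rule takes the high‑rate option; the resulting backlog then pushes the next slot toward the cheaper option, which is how the long‑term constraint is enforced without looking ahead.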
Authors: Ali M. Ali, Hashim A. Hashim, Awantha Jayasiri
Abstract: This paper presents a novel design for finite‑time position control of quadrotor Unmanned Aerial Vehicles (UAVs). A robust, finite‑time, nonlinear feedback controller is introduced to reject bounded disturbances in tracking tasks. The proposed control framework differs conceptually from conventional controllers that utilize Euler angle parameterization for attitude and adhere to the traditional hierarchical inner‑outer loop design. In standard approaches, the translational controller and the corresponding desired attitude are computed first, followed by the design of the attitude controller based on time‑scale separation between fast attitude and slow translational dynamics. In contrast, the proposed control scheme is quaternion‑based and utilizes a transit feed‑forward term in the attitude dynamics that anticipates the slower translational subsystem. Robustness is achieved through the use of continuously differentiable sliding manifolds. The proposed approach guarantees semi‑global finite‑time stability, without requiring time‑scale separation. Finally, numerical simulation results are provided to demonstrate the effectiveness of the proposed controller.
Authors: Beining Wu, Jun Huang, Shui Yu
Abstract: The development of next‑generation networking systems has inherently shifted from throughput‑based paradigms towards intelligent, information‑aware designs that emphasize the quality, relevance, and utility of transmitted information, rather than sheer data volume. While classical network metrics, such as latency and packet loss, remain significant, they are insufficient to quantify the nuanced information quality requirements of modern intelligent applications, including autonomous vehicles, digital twins, and metaverse environments. In this survey, we present the first comprehensive study of the "X of Information" continuum by introducing a systematic four‑dimensional taxonomic framework that structures information metrics along temporal, quality/utility, reliability/robustness, and network/communication dimensions. We uncover the increasing interdependencies among these dimensions, whereby temporal freshness triggers quality evaluation, which in turn helps with reliability appraisal, ultimately enabling effective network delivery. Our analysis reveals that artificial intelligence technologies, such as deep reinforcement learning, multi‑agent systems, and neural optimization models, enable adaptive, context‑aware optimization of competing information quality objectives. In our extensive study of six critical application domains, covering autonomous transportation, industrial IoT, healthcare digital twins, UAV communications, LLM ecosystems, and metaverse settings, we illustrate the revolutionary promise of multi‑dimensional information metrics for meeting diverse operational needs. Our survey identifies prominent implementation challenges, including ...
Authors: Bowen Li, Junting Chen
Abstract: Dynamic low‑altitude networks offer significant potential for efficient and reliable data transport via unmanned aerial vehicle (UAV) relays, which usually operate with predetermined trajectories. However, it is challenging to optimize the data routing and resource allocation due to the time‑varying topology and the need to control interference with terrestrial systems. Traditional schemes rely on time‑expanded graphs with uniform and fine time subdivisions, making them impractical for interference‑aware applications. This paper develops a dynamic space‑time graph model with a cross‑layer optimization framework that converts a joint routing and predictive resource allocation problem into a joint bottleneck path planning and resource allocation problem. We develop explicit deterministic bounds to handle the channel uncertainty and prove a monotonicity property in the problem structure that enables us to efficiently reach the globally optimal solution to the predictive resource allocation subproblem. Then, this approach is extended to multi‑commodity transmission tasks through time‑frequency allocation, and a bisection search algorithm is developed to find the optimum solution by leveraging the monotonicity of the feasible set family. Simulations verify that the single‑commodity algorithm approaches global optimality with more than 30 dB performance gain over the classical graph‑based methods for delay‑sensitive and large data transportation. At the same time, the multi‑commodity method achieves 100X improvements in dense service scenarios and enables an additional 20 dB performance gain by data segmenting.
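The abstract above relies on a bisection search made possible by monotonicity of the feasible set family. The general pattern is sketched below: if feasibility is monotone in a scalar parameter, the smallest feasible value can be located by halving an interval. The function name, the feasibility oracle, and the tolerance are illustrative assumptions; the paper's oracle would involve its time‑frequency allocation problem, which is not reproduced here.

```python
def bisect_threshold(is_feasible, lo, hi, tol=1e-6):
    """Approximate the smallest feasible value in [lo, hi].

    Assumes monotone feasibility: every value above the threshold is
    feasible and every value below it is not, and hi itself is feasible.
    """
    if not is_feasible(hi):
        raise ValueError("upper bound must be feasible")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            hi = mid  # threshold is at or below mid
        else:
            lo = mid  # threshold is above mid
    return hi


# Toy oracle: a resource level x is "feasible" once x * x reaches a demand of 2.
threshold = bisect_threshold(lambda x: x * x >= 2.0, 0.0, 2.0)
```

Each iteration costs one feasibility check, so the number of checks grows only logarithmically with the required precision.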
Authors: Weihao Mao, Yang Lu, Bo Ai, Tony Q. S. Quek
Abstract: Low‑altitude economy (LAE) is an emerging business model, which heavily relies on integrated sensing and communications (ISAC), mobile edge computing (MEC), and covert communications. This paper investigates the covert transmission design in MEC‑based networked ISAC systems towards LAE, where an MEC server coordinates multiple access points to simultaneously receive computation tasks from multiple unmanned aerial vehicles (UAVs), locate a target in a sensing area, and maintain UAVs' covert transmission against multiple wardens. We first derive closed‑form expressions for the detection error probability (DEP) at wardens. Then, we formulate a total energy consumption minimization problem by optimizing communication, sensing, and computation resources as well as UAV trajectories, subject to the requirements on quality of MEC services, DEP, and radar signal‑to‑interference‑and‑noise ratio, and the causality of UAV trajectories. An alternating optimization based algorithm is proposed to handle the considered problem, which decomposes it into two subproblems: joint optimization of communication, sensing, and computation resources, and UAV trajectory optimization. The former is addressed by a successive convex approximation based algorithm, while the latter is solved via a trust‑region based algorithm. Simulations validate the effectiveness of the proposed algorithm compared with various benchmarks, and reveal the trade‑offs among communication, sensing, and computation in LAE systems.
Authors: Luka Šiktar, Branimir Ćaran, Bojan Šekoranja, Marko Švaco
Abstract: In this paper, we present a subsystem, using Unmanned Aerial Vehicles (UAVs), for search and rescue missions, focusing on people detection, face recognition and tracking of identified individuals. The proposed solution integrates a UAV with the ROS2 framework, which utilizes multiple convolutional neural networks (CNNs) for search missions. System identification and PD controller deployment are performed for autonomous UAV navigation. The ROS2 environment utilizes the YOLOv11 and YOLOv11‑pose CNNs for tracking purposes, and the dlib library CNN for face recognition. The system detects a specific individual, performs face recognition and starts tracking. If the individual is not yet known, the UAV operator can manually locate the person, save their facial image and immediately initiate the tracking process. The tracking process relies on specific keypoints identified on the human body using the YOLOv11‑pose CNN model. These keypoints are used to track a specific individual and maintain a safe distance. To improve tracking accuracy, system identification is performed, based on measurement data from the UAV's IMU. The identified system parameters are used to design PD controllers that utilize YOLOv11‑pose to estimate the distance between the UAV's camera and the identified individual. The initial experiments, conducted on 14 known individuals, demonstrated that the proposed subsystem can be successfully used in real time. The next step involves implementing the system on a large experimental UAV for field use and integrating autonomous navigation with GPS‑guided control for rescue operations planning.
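The abstract above mentions PD controllers that keep a safe distance to the tracked person. A minimal discrete‑time PD controller is sketched below to make the idea concrete. The class name, the gains, the 3 m setpoint, and the sample period are hypothetical values chosen for illustration; the paper derives its gains from system identification, which is not reproduced here.

```python
class PDController:
    """Discrete-time proportional-derivative controller."""

    def __init__(self, kp, kd):
        self.kp = kp
        self.kd = kd
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for the current error and time step dt."""
        # No derivative term on the first call, since there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative


# Hypothetical distance loop: hold 3 m from the person, sampled every 0.1 s.
controller = PDController(kp=2.0, kd=0.5)
setpoint = 3.0
command_1 = controller.update(error=4.0 - setpoint, dt=0.1)  # measured 4.0 m
command_2 = controller.update(error=3.5 - setpoint, dt=0.1)  # measured 3.5 m
```

The second command is negative even though the UAV is still too far away, because the derivative term reacts to how quickly the gap is closing and damps the approach.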
Authors: Alberto Marchisio, Muhammad Shafique
Abstract: The growing need for intelligent, adaptive, and energy‑efficient autonomous systems across fields such as robotics, mobile agents (e.g., UAVs), and self‑driving vehicles is driving interest in neuromorphic computing. By drawing inspiration from biological neural systems, neuromorphic approaches offer promising pathways to enhance the perception, decision‑making, and responsiveness of autonomous platforms. This paper surveys recent progress in neuromorphic algorithms, specialized hardware, and cross‑layer optimization strategies, with a focus on their deployment in real‑world autonomous scenarios. Special attention is given to event‑based dynamic vision sensors and their role in enabling fast, efficient perception. The discussion highlights new methods that improve energy efficiency, robustness, adaptability, and reliability through the integration of spiking neural networks into autonomous system architectures. We integrate perspectives from machine learning, robotics, neuroscience, and neuromorphic engineering to offer a comprehensive view of the state of the field. Finally, emerging trends and open challenges are explored, particularly in the areas of real‑time decision‑making, continual learning, and the development of secure, resilient autonomous systems.
Authors: R. Spencer Hallyburton, Miroslav Pajic
Abstract: Multi‑agent collaboration enhances situational awareness in intelligence, surveillance, and reconnaissance (ISR) missions. Ad hoc networks of unmanned aerial vehicles (UAVs) allow for real‑time data sharing, but they face security challenges due to their decentralized nature, making them vulnerable to cyber‑physical attacks. This paper introduces a trust‑based framework for assured sensor fusion in distributed multi‑agent networks, utilizing a hidden Markov model (HMM)‑based approach to estimate the trustworthiness of agents and their provided information in a decentralized fashion. Trust‑informed data fusion prioritizes fusing data from reliable sources, enhancing resilience and accuracy in contested environments. To evaluate the assured sensor fusion under attacks on system/mission sensing, we present a novel multi‑agent aerial dataset built from the Unreal Engine simulator. We demonstrate through case studies improved ISR performance and an ability to detect malicious actors in adversarial settings.
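A two-state hidden Markov model filter gives a rough picture of how per-agent trust could be tracked and then used to weight fusion. The sketch below is an assumption-laden illustration: the state set (trusted / untrusted), the binary "consistent" observation, all probabilities, and the function names are hypothetical and are not taken from the paper.

```python
def hmm_trust_update(p_trusted, consistent, stay=0.95,
                     p_cons_trusted=0.9, p_cons_untrusted=0.3):
    """One forward-filter step of a two-state HMM (trusted vs. untrusted).

    p_trusted  : current belief that the agent is trustworthy
    consistent : True if the agent's report agreed with the fused estimate
    All probabilities are illustrative assumptions.
    """
    # Prediction step: the hidden state may switch with probability 1 - stay.
    prior = p_trusted * stay + (1.0 - p_trusted) * (1.0 - stay)
    # Measurement step: Bayes rule with the observation likelihoods.
    like_t = p_cons_trusted if consistent else 1.0 - p_cons_trusted
    like_u = p_cons_untrusted if consistent else 1.0 - p_cons_untrusted
    numerator = like_t * prior
    return numerator / (numerator + like_u * (1.0 - prior))


def filter_trust(observations, p0=0.5):
    """Run the filter over a sequence of consistency observations."""
    p = p0
    for consistent in observations:
        p = hmm_trust_update(p, consistent)
    return p


def trust_weighted_fusion(estimates, trusts):
    """Fuse scalar estimates, weighting each agent by its trust value."""
    total = sum(trusts)
    return sum(e * t for e, t in zip(estimates, trusts)) / total
```

With these assumed parameters, a run of consistent reports raises the belief toward 1 and a run of inconsistent reports drives it back down, so a compromised agent contributes progressively less to the fused estimate.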
Authors: Maharshi Shastri, Ujjval Shrivastav
Abstract: The increasing demand for fast and cost‑effective last‑mile delivery solutions has catalyzed significant advancements in drone‑based logistics. This research describes the development of an AI‑integrated drone delivery system, focusing on route optimization, object detection, secure package handling, and real‑time tracking. The proposed system leverages YOLOv4 Tiny for object detection, the NEO 6M GPS module for navigation, and the A7670 SIM module for real‑time communication. A comparative analysis of lightweight AI models and hardware components is conducted to determine the optimal configuration for real‑time UAV‑based delivery. Key challenges including battery efficiency, regulatory compliance, and security considerations are addressed through the integration of machine learning techniques, IoT devices, and encryption protocols. Preliminary studies demonstrate improvement in delivery time compared to conventional ground‑based logistics, along with high‑accuracy recipient authentication through facial recognition. The study also discusses ethical implications and societal acceptance of drone deliveries, ensuring compliance with FAA, EASA and DGCA regulatory standards. Note: This paper presents the architecture, design, and preliminary simulation results of the proposed system. Experimental results, simulation benchmarks, and deployment statistics are currently being acquired. A comprehensive analysis will be included in the extended version of this work.
Authors: Lijie Zheng, Ji He, Shih Yu Chang, Yulong Shen, Dusit Niyato
Abstract: This work tackles the physical layer security (PLS) problem of maximizing the secrecy rate in heterogeneous UAV networks (HetUAVNs) under propulsion energy constraints. Unlike prior studies that assume uniform UAV capabilities or overlook energy‑security trade‑offs, we consider a realistic scenario where UAVs with diverse payloads and computation resources collaborate to serve ground terminals in the presence of eavesdroppers. To manage the complex coupling between UAV motion and communication, we propose a hierarchical optimization framework. The inner layer uses a semidefinite relaxation (SDR)‑based S2DC algorithm combining penalty functions and difference‑of‑convex (d.c.) programming to solve the secrecy precoding problem with fixed UAV positions. The outer layer introduces a Large Language Model (LLM)‑guided heuristic multi‑agent reinforcement learning approach (LLM‑HeMARL) for trajectory optimization. LLM‑HeMARL efficiently incorporates expert heuristics policy generated by the LLM, enabling UAVs to learn energy‑aware, security‑driven trajectories without the inference overhead of real‑time LLM calls. The simulation results show that our method outperforms existing baselines in secrecy rate and energy efficiency, with consistent robustness across varying UAV swarm sizes and random seeds.
Authors: Maaz Qureshi, Mohammad Omid Bagheri, Abdelrahman Elbadrawy, William Melek, George Shaker
Abstract: Accurate characterization of modern on‑chip antennas remains challenging, as current probe‑station techniques offer limited angular coverage, rely on bespoke hardware, and require frequent manual alignment. This research introduces RAPTAR (Radiation Pattern Acquisition through Robotic Automation), a portable, state‑of‑the‑art, and autonomous system based on collaborative robotics. RAPTAR enables 3D radiation‑pattern measurement of integrated radar modules without dedicated anechoic facilities. The system is designed to address the challenges of testing radar modules mounted in diverse real‑world configurations, including vehicles, UAVs, AR/VR headsets, and biomedical devices, where traditional measurement setups are impractical. A 7‑degree‑of‑freedom Franka cobot holds the receiver probe and performs collision‑free manipulation across a hemispherical spatial domain, guided by real‑time motion planning and calibration accuracy with RMS error below 0.9 mm. The system achieves an angular resolution of up to 2.5 degrees and integrates seamlessly with RF instrumentation for near‑ and far‑field power measurements. Experimental scans of a 60 GHz radar module show a mean absolute error of less than 2 dB compared to ground truth from full‑wave electromagnetic simulations. Benchmarking against a baseline method demonstrates 36.5% lower mean absolute error, highlighting RAPTAR's accuracy and repeatability.
Authors: Boni Hu, Zhenyu Xia, Lin Chen, Pengcheng Han, Shuhui Bu
Abstract: Visual relocalization, which estimates the 6‑degree‑of‑freedom (6‑DoF) camera pose from query images, is fundamental to remote sensing and UAV applications. Existing methods face inherent trade‑offs: image‑based retrieval and pose regression approaches lack precision, while structure‑based methods that register queries to Structure‑from‑Motion (SfM) models suffer from computational complexity and limited scalability. These challenges are particularly pronounced in remote sensing scenarios due to large‑scale scenes, high altitude variations, and domain gaps of existing visual priors. To overcome these limitations, we leverage 3D Gaussian Splatting (3DGS) as a novel scene representation that compactly encodes both 3D geometry and appearance. We introduce $\mathrm{Hi}^2$‑GSLoc, a dual‑hierarchical relocalization framework that follows a sparse‑to‑dense and coarse‑to‑fine paradigm, fully exploiting the rich semantic information and geometric constraints inherent in Gaussian primitives. To handle large‑scale remote sensing scenarios, we incorporate partitioned Gaussian training, GPU‑accelerated parallel matching, and dynamic memory management strategies. Our approach consists of two stages: (1) a sparse stage featuring a Gaussian‑specific consistent render‑aware sampling strategy and landmark‑guided detector for robust and accurate initial pose estimation, and (2) a dense stage that iteratively refines poses through coarse‑to‑fine dense rasterization matching while incorporating reliability verification. Through comprehensive evaluation on simulation data, public datasets, and real flight experiments, we demonstrate that our method delivers competitive localization accuracy, recall rate, and computational efficiency while effectively filtering unreliable pose estimates. The results confirm the effectiveness of our approach for practical remote sensing applications.
Authors: Ioannis Tsampikos Papapetros, Ioannis Kansizoglou, Antonios Gasteratos
Abstract: Visual Place Recognition (vPR) plays a crucial role in Unmanned Aerial Vehicle (UAV) navigation, enabling robust localization across diverse environments. Despite significant advancements, aerial vPR faces unique challenges due to the limited availability of large‑scale, high‑altitude datasets, which limits model generalization, along with the inherent rotational ambiguity in UAV imagery. To address these challenges, we introduce LASED, a large‑scale aerial dataset with approximately one million images, systematically sampled from 170,000 unique locations throughout Estonia over a decade, offering extensive geographic and temporal diversity. Its structured design ensures clear place separation, significantly enhancing model training for aerial scenarios. Furthermore, we propose the integration of steerable Convolutional Neural Networks (CNNs) to explicitly handle rotational variance, leveraging their inherent rotational equivariance to produce robust, orientation‑invariant feature representations. Our extensive benchmarking demonstrates that models trained on LASED achieve significantly higher recall compared to those trained on smaller, less diverse datasets, highlighting the benefits of extensive geographic coverage and temporal diversity. Moreover, steerable CNNs effectively address rotational ambiguity inherent in aerial imagery, consistently outperforming conventional convolutional architectures, achieving on average a 12% recall improvement over the best‑performing non‑steerable network. By combining structured, large‑scale datasets with rotation‑equivariant neural networks, our approach significantly enhances model robustness and generalization for aerial vPR.
Authors: Andres Navarro, Carlos de Quinto, José Alberto Hernández
Abstract: Unmanned Aerial Vehicles are reshaping Non‑Terrestrial Networks (NTNs) by acting as agile, intelligent nodes capable of advanced analytics and instantaneous situational awareness. This article introduces a budget‑friendly quadcopter platform that unites 5G communications, edge‑based processing, and AI to tackle core challenges in NTN scenarios. Outfitted with a panoramic camera, robust onboard computation, and LLMs, the drone system delivers seamless object recognition, contextual analysis, and immersive operator experiences through virtual reality (VR) technology. Field evaluations confirm the platform's ability to process visual streams with low latency and sustain robust 5G links. Adding LLMs further streamlines operations by extracting actionable insights and refining collected data for decision support. Demonstrated use cases, including emergency response, infrastructure assessment, and environmental surveillance, underscore the system's adaptability in demanding contexts.
Authors: Zeeshan Kaleem, Misha Urooj Khan, Ahmad Suleman, Waqas Khalid, Kai-Kit Wong, Chau Yuen
Abstract: Recently, low‑altitude wireless networks (LAWNs) have emerged as a critical backbone for supporting the low‑altitude economy, particularly with the densification of unmanned aerial vehicles (UAVs) and high‑altitude platforms (HAPs). To meet growing data demands, some LAWN deployments incorporate free‑space optical (FSO) links, which offer exceptional bandwidth and beam directivity. However, without strong security measures in place, both conventional radio frequency channels and FSO beams remain vulnerable to interception and spoofing, and FSO in particular can suffer from turbulence, misalignment, and weather‑related attenuation. To address these challenges in the quantum era, a quantum‑secure architecture called Quantum Skyshield is proposed to enable reliable communication between the base transceiver station (BTS) and the LAWN. The proposed design integrates BB84 quantum key distribution (QKD) with post‑quantum authentication mechanisms. Simulation results confirm the reliable generation of a 128‑bit symmetric key when the quantum bit error rate (QBER) remains below the threshold of 11%. Authentication is enforced using Lamport one‑time signatures and hash‑based message authentication codes (HMAC) to ensure message integrity. A Grover‑inspired threat detection mechanism identifies anomalies with up to 89% probability in a single iteration, enabling real‑time trust evaluation. Lastly, future research challenges are identified and discussed to guide further development in this area.
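The authentication building blocks named in this abstract (Lamport one-time signatures, HMAC, and a QBER threshold check) can be sketched with the Python standard library alone. This is a generic illustration, not the Quantum Skyshield implementation: the choice of SHA‑256, the function names, and the way the 11% threshold is applied are assumptions.

```python
import hashlib
import hmac
import secrets


def qber(alice_bits, bob_bits):
    """Fraction of sifted key bits on which the two parties disagree."""
    errors = sum(a != b for a, b in zip(alice_bits, bob_bits))
    return errors / len(alice_bits)


def key_accepted(alice_bits, bob_bits, threshold=0.11):
    """Accept the sifted key only if QBER is below the threshold
    (11% is the value quoted in the abstract)."""
    return qber(alice_bits, bob_bits) < threshold


def _digest_bits(message: bytes):
    digest = hashlib.sha256(message).digest()
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]


def lamport_keygen(bits=256):
    """Secret key: 2 random values per digest bit; public key: their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(bits)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk


def lamport_sign(sk, message: bytes):
    """Reveal one secret per digest bit. A key pair must sign only once."""
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(message))]


def lamport_verify(pk, message: bytes, signature):
    bits = _digest_bits(message)
    if len(signature) != len(bits):
        return False
    return all(
        hmac.compare_digest(hashlib.sha256(s).digest(), pk[i][bit])
        for i, (s, bit) in enumerate(zip(signature, bits))
    )


def hmac_tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 tag; the key could be the 128-bit key agreed via QKD."""
    return hmac.new(key, message, hashlib.sha256).digest()
```

Because each Lamport signature reveals half of the secret key, a fresh key pair is needed for every signed message; how the proposed architecture manages key rotation is not described in the abstract.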
Authors: Kaiqiang Lin, Yijie Mao, Onel Luis Alcaraz López, Mohamed-Slim Alouini
Abstract: Wireless‑powered underground communication networks (WPUCNs), which allow underground devices (UDs) to harvest energy from wireless signals for battery‑free communication, offer a promising solution for sustainable underground monitoring. However, the severe wireless signal attenuation in challenging underground environments and the costly acquisition of channel state information (CSI) make large‑scale WPUCNs economically infeasible in practice. To address this challenge, we introduce flexible unmanned aerial vehicles (UAVs) into WPUCNs, leading to UAV‑enabled WPUCN systems. In this system, a UAV is first charged by a terrestrial hybrid access point (HAP), then flies to the monitoring area to wirelessly charge UDs. Afterwards, the UAV collects data from the UDs and finally returns to the HAP for data offloading. Based on the proposed UAV‑enabled WPUCN system, we first propose its energy consumption model and a hybrid wireless energy transfer (WET) approach (i.e., UDs can harvest energy from both the HAP and the UAV) relying on full‑CSI and CSI‑free multi‑antenna beamforming. Then, we formulate and address a time allocation problem to minimize the energy consumption of the UAV, while ensuring that the throughput requirements of all UDs are met and all sensor data is offloaded. Through simulations of a realistic farming scenario, we demonstrate that the proposed hybrid WET approach outperforms other WET approaches, with performance gains influenced by the number of antennas, communication distance, number of UDs, and underground conditions. Additionally, under the optimized time allocation, we find that the proposed hybrid WET approach based on a CSI‑free multi‑antenna scheme achieves the lowest UAV energy consumption among all WET mechanisms, thereby enabling sustainable underground monitoring in WPUCNs.
Authors: Haochen Liu, Jia Bi, Xiaomin Wang, Xin Yang, Ling Wang
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly used in surveillance, logistics, agriculture, disaster management, and military operations. Accurate detection and classification of UAV flight states, such as hovering, cruising, ascending, or transitioning, is essential for safe and effective operations. However, conventional time series classification (TSC) methods often lack robustness and generalization for dynamic UAV environments, while state‑of‑the‑art (SOTA) models like Transformers and LSTM‑based architectures typically require large datasets and entail high computational costs, especially with high‑dimensional data streams. This paper proposes a novel framework that integrates a Transformer‑based Generative Adversarial Network (GAN) with Multiple Instance Locally Explainable Learning (MILET) to address these challenges in UAV flight state classification. The Transformer encoder captures long‑range temporal dependencies and complex telemetry dynamics, while the GAN module augments limited datasets with realistic synthetic samples. MIL is incorporated to focus attention on the most discriminative input segments, reducing noise and computational overhead. Experimental results show that the proposed method achieves accuracies of 96.5% on the DroneDetect dataset and 98.6% on the DroneRF dataset, outperforming other SOTA approaches. The framework also demonstrates strong computational efficiency and robust generalization across diverse UAV platforms and flight states, highlighting its potential for real‑time deployment in resource‑constrained environments.
Authors: Yu Bai, Yifan Zhang, Boxuan Xie, Zheng Chang, Yanru Zhang, Riku Jantti, Zhu Han
Abstract: Unmanned aerial vehicles (UAVs) equipped with integrated sensing and communication (ISAC) capabilities are envisioned to play a pivotal role in future wireless networks due to their enhanced flexibility and efficiency. However, jointly optimizing UAV trajectory planning, multi‑user communication, and target sensing under stringent resource constraints and time‑critical conditions remains a significant challenge. To address this, we propose an Age of Information (AoI)‑centric UAV‑ISAC system that simultaneously performs target sensing and serves multiple ground users, emphasizing information freshness as the core performance metric. We formulate a long‑term average AoI minimization problem that jointly optimizes the UAV's flight trajectory and beamforming. To tackle the high dimensionality and non‑convexity of this problem, we develop a deep reinforcement learning (DRL)‑based algorithm capable of providing real‑time decisions on UAV movement and beamforming for both radar sensing and multi‑user communication. Specifically, a Kalman filter is employed for accurate target state prediction, regularized zero‑forcing is utilized to mitigate inter‑user interference, and the Soft Actor‑Critic algorithm is applied for training the DRL agent on continuous actions. The proposed framework adaptively balances the trade‑offs between sensing accuracy and communication quality. Extensive simulation results demonstrate that our proposed method consistently achieves lower average AoI compared to baseline approaches.
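For readers unfamiliar with the Age of Information metric that this abstract optimizes, the following Python sketch shows one common discrete-time definition: the age grows by one each slot and resets to one when a fresh update is delivered. The slotted model, the reset value, and the function names are assumptions; the paper's exact AoI definition is not given in the abstract.

```python
def aoi_trajectory(delivered, initial_age=1):
    """Age of Information after each time slot.

    delivered : iterable of booleans, True if a fresh update arrived in the slot
    The age resets to 1 on delivery and otherwise grows by 1 (assumed model).
    """
    ages = []
    age = initial_age
    for ok in delivered:
        age = 1 if ok else age + 1
        ages.append(age)
    return ages


def average_aoi(delivered, initial_age=1):
    """Time-average AoI over the horizon, the quantity to be minimized."""
    ages = aoi_trajectory(delivered, initial_age)
    return sum(ages) / len(ages)
```

For example, with deliveries in the third and fifth slots of a five-slot horizon, the ages are 2, 3, 1, 2, 1 and the average is 1.8. A policy that delivers updates more regularly lowers this average, which is the intuition behind using it as a freshness objective.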
Authors: Genliang Li, Yaxin Cui, Jinyu Su
Abstract: Metaheuristic algorithms have gained widespread application across various fields owing to their ability to generate diverse solutions. One such algorithm is the Snake Optimizer (SO), a progressive optimization approach. However, SO suffers from slow convergence and susceptibility to local optima. In light of these shortcomings, we propose a novel Multi‑strategy Improved Snake Optimizer (MISO). Firstly, we propose a new adaptive random disturbance strategy based on the sine function to alleviate the risk of getting trapped in a local optimum. Secondly, we introduce an adaptive Lévy flight strategy based on a scale factor and a leader, and endow the male snake leader with flight capability, which makes it easier for the algorithm to leap out of the local optimum and find the global optimum. More importantly, we put forward a position update strategy combining elite leadership and Brownian motion, effectively accelerating the convergence speed while ensuring precision. Finally, to demonstrate the performance of MISO, we utilize 30 CEC2017 test functions and the CEC2022 test suite, comparing it with 11 popular algorithms across different dimensions to validate its effectiveness. Moreover, Unmanned Aerial Vehicles (UAVs) have been widely used in various fields due to their advantages of low cost, high mobility and easy operation. However, the UAV path planning problem is crucial for flight safety and efficiency, and there are still challenges in establishing and optimizing the path model. Therefore, we apply MISO to the UAV 3D path planning problem as well as 6 engineering design problems to assess its feasibility in practical applications. The experimental results demonstrate that MISO exceeds other competitive algorithms in terms of solution quality and stability, establishing its strong potential for application.
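Lévy flight steps in metaheuristics are commonly drawn with Mantegna's algorithm, which the sketch below implements in Python. Whether MISO uses this exact sampler, and how its adaptive scale factor enters, is not stated in the abstract; the stability index `beta = 1.5` and the function names are assumptions.

```python
import math
import random


def mantegna_sigma(beta: float) -> float:
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta: float = 1.5, rng=random) -> float:
    """Draw one heavy-tailed step: u / |v|^(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against a division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)
```

Most draws are small, but occasional very large steps occur, which is what lets a search agent jump out of a local optimum. In an optimizer the step would typically be multiplied by a scale factor before being added to the current position.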
Authors: Ziliang Li, Hongming Chen, Yiyang Lin, Biyu Ye, Ximin Lyu
Abstract: Autonomous aerial systems play an increasingly vital role in a wide range of applications, particularly for transport and delivery tasks in complex environments. In airdrop missions, these platforms face the dual challenges of abrupt control mode switching and inherent system delays along with control errors. To address these issues, this paper presents an autonomous airdrop system based on an aerial manipulator (AM). The introduction of additional actuated degrees of freedom enables active compensation for UAV tracking errors. By imposing smooth and continuous constraints on the parabolic landing point, the proposed approach generates aerial throwing trajectories that are less sensitive to the timing of payload release. A hierarchical disturbance compensation strategy is incorporated into the Nonlinear Model Predictive Control (NMPC) framework to mitigate the effects of sudden changes in system parameters, while the predictive capabilities of NMPC are further exploited to improve the precision of aerial throwing. Both simulation and real‑world experimental results demonstrate that the proposed system achieves greater agility and precision in airdrop missions.
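The link between release state, parabolic landing point, and release-timing sensitivity can be made concrete with a drag-free point-mass model. The Python sketch below is only an illustration under that assumption (planar motion, no aerodynamic drag, constant vehicle velocity during the delay); the function names are hypothetical and the paper's trajectory constraints are not reproduced.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def landing_point(x: float, z: float, vx: float, vz: float, g: float = G):
    """Horizontal landing position and fall time for a payload released at
    height z above flat ground with velocity (vx, vz). Drag is ignored."""
    fall_time = (vz + math.sqrt(vz * vz + 2.0 * g * z)) / g
    return x + vx * fall_time, fall_time


def landing_shift_for_delay(x, z, vx, vz, delay, g: float = G) -> float:
    """How far the landing point moves if release happens `delay` seconds late,
    assuming the vehicle keeps a constant velocity during the delay."""
    nominal, _ = landing_point(x, z, vx, vz, g)
    late, _ = landing_point(x + vx * delay, z + vz * delay, vx, vz, g)
    return late - nominal
```

In level flight at 2 m/s, a 0.1 s release delay shifts the landing point by 0.2 m in this model. A trajectory along which this shift stays small is, in the sense used above, less sensitive to release timing.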
Authors: Minze Li, Wei Zhao, Ran Chen, Mingqiang Wei
Abstract: Real‑time trajectory planning for unmanned aerial vehicles (UAVs) in dynamic environments remains a key challenge due to high computational demands and the need for fast, adaptive responses. Traditional Particle Swarm Optimization (PSO) methods, while effective for offline planning, often struggle with premature convergence and latency in real‑time scenarios. To overcome these limitations, we propose PE‑PSO, an enhanced PSO‑based online trajectory planner. The method introduces a persistent exploration mechanism to preserve swarm diversity and an entropy‑based parameter adjustment strategy to dynamically adapt optimization behavior. UAV trajectories are modeled using B‑spline curves, which ensure path smoothness while reducing optimization complexity. To extend this capability to UAV swarms, we develop a multi‑agent framework that combines genetic algorithm (GA)‑based task allocation with distributed PE‑PSO, supporting scalable and coordinated trajectory generation. The distributed architecture allows for parallel computation and decentralized control, enabling effective cooperation among agents while maintaining real‑time performance. Comprehensive simulations demonstrate that the proposed framework outperforms conventional PSO and other swarm‑based planners across several metrics, including trajectory quality, energy efficiency, obstacle avoidance, and computation time. These results confirm the effectiveness and applicability of PE‑PSO in real‑time multi‑UAV operations under complex environmental conditions.
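As background for the PSO variant described here, the following Python sketch shows a plain global-best PSO loop. It deliberately omits the persistent exploration mechanism, the entropy-based parameter adjustment, and the B‑spline trajectory encoding, none of which are specified in enough detail in the abstract; the inertia and acceleration coefficients are conventional textbook values, not the authors' settings.

```python
import random


def pso_minimize(cost, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Baseline global-best PSO; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


def sphere(p):
    """Simple convex test function with its minimum at the origin."""
    return sum(x * x for x in p)
```

In a trajectory planner, the particle position would hold the decision variables (for instance spline control points) and `cost` would score path length, smoothness, and obstacle clearance. The premature convergence noted in the abstract shows up here as all particles collapsing onto `gbest`.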
Authors: Debao Huang, Rongjun Qin
Abstract: Uncertainty quantification of the photogrammetry process is essential for providing per‑point accuracy credentials of the point clouds. Unlike airborne LiDAR, whose accuracy generally remains consistent with objects with varying geometric complexity, the accuracy of photogrammetric point clouds is rather object/scene‑dependent, as it relies on algorithm‑derived measurements. Generally, errors of the photogrammetric point clouds propagate through a two‑step process: Structure‑from‑Motion (SfM) with Bundle adjustment (BA), followed by Multi‑view Stereo (MVS). While uncertainty estimation in the SfM stage has been well studied using the first‑order statistics of the reprojection error function, that in the MVS stage remains largely unsolved and non‑standardized, primarily due to its non‑differentiable and multi‑modal nature (i.e., from pixel values to geometry). In this paper, we present an uncertainty quantification framework closing this gap by associating an error covariance matrix per point accounting for this two‑step photogrammetry process. Specifically, to estimate the uncertainty in the MVS stage, we propose a novel, self‑calibrating method by taking reliable n‑view points (n>=6) per‑view to regress the disparity uncertainty using highly relevant cues (such as matching cost values) from the MVS stage. Compared to existing approaches, our method uses self‑contained, reliable 3D points extracted directly from the MVS process, with the benefit of being self‑supervised and naturally adhering to error propagation path of the photogrammetry process, thereby providing a robust and certifiable uncertainty quantification across diverse scenes. We evaluate the framework using a variety of publicly available airborne and UAV imagery datasets. Results demonstrate that our method outperforms existing approaches by achieving high bounding rates without overestimating uncertainty.
Authors: Alejandro Flores C., Konstantinos Ntontin, Ashok Bandi, Symeon Chatzinotas
Abstract: In this work, we consider the resource allocation problem for task offloading from Internet of Medical Things (IoMT) devices to a non‑terrestrial network. The architecture considers clusters of IoMT devices that offload their tasks to a dedicated unmanned aerial vehicle (UAV) serving as a multi‑access edge computing (MEC) server, which can compute the task or further offload it to an available high‑altitude platform station (HAPS) or to a low‑earth orbit (LEO) satellite for remote computing. We formulate a problem whose objective is to minimize the weighted sum delay of the tasks. Given the non‑convex nature of the problem, and acknowledging that the complexity of the optimization algorithms impacts their performance, we derive a low‑complexity joint subchannel allocation and offloading decision algorithm with dynamic computing resource initialization, developed as a greedy heuristic based on convex optimization criteria. Simulations show the gain obtained by including the different non‑terrestrial nodes against architectures without them.
Authors: Xudong Wang, Hongyang Du, Lei Feng, Kaibin Huang
Abstract: The growing demand for low‑latency computing in 6G is driving the use of UAV‑based low‑altitude mobile edge computing (MEC) systems. However, limited spectrum often leads to severe uplink interference among ground terminals (GTs). In this paper, we investigate a rate‑splitting multiple access (RSMA)‑enabled low‑altitude MEC system, where a UAV‑based edge server assists multiple GTs in concurrently offloading their tasks over a shared uplink. We formulate a joint optimization problem involving the UAV 3D trajectory, RSMA decoding order, task offloading decisions, and resource allocation, aiming to mitigate multi‑user interference and maximize energy efficiency. Given the high dimensionality, non‑convex nature, and dynamic characteristics of this optimization problem, we propose a generative AI‑enhanced deep reinforcement learning (DRL) framework to solve it efficiently. Specifically, we embed a diffusion model into the actor network to generate high‑quality action samples, improving exploration in hybrid action spaces and avoiding local optima. In addition, a priority‑based RSMA decoding strategy is designed to facilitate efficient successive interference cancellation with low complexity. Simulation results demonstrate that the proposed method for low‑altitude MEC systems outperforms baseline methods, and that integrating the generative diffusion model (GDM) with RSMA can achieve significantly improved energy efficiency.
Authors: Yuki Kondo, Norimichi Ukita, Riku Kanayama, Yuki Yoshida, Takayuki Yamaguchi, Xiang Yu, Guang Liang, Xinyao Liu, Guan-Zhang Wang, Wei-Ta Chu, Bing-Cheng Chuang, Jia-Hua Lee, Pin-Tseng Kuo, I-Hsuan Chu, Yi-Shein Hsiao, Cheng-Han Wu, Po-Yi Wu, Jui-Chien Tsou, Hsuan-Chi Liu, Chun-Yi Lee, Yuan-Fu Yang, Kosuke Shigematsu, Asuka Shin, Ba Tran
Abstract: Small Multi‑Object Tracking (SMOT) is particularly challenging when targets occupy only a few dozen pixels, rendering detection and appearance‑based association unreliable. Building on the success of the MVA2023 SOD4SB challenge, this paper introduces the SMOT4SB challenge, which leverages temporal information to address limitations of single‑frame detection. Our three main contributions are: (1) the SMOT4SB dataset, consisting of 211 UAV video sequences with 108,192 annotated frames under diverse real‑world conditions, designed to capture motion entanglement where both camera and targets move freely in 3D; (2) SO‑HOTA, a novel metric combining Dot Distance with HOTA to mitigate the sensitivity of IoU‑based metrics to small displacements; and (3) a competitive MVA2025 challenge with 78 participants and 308 submissions, where the winning method achieved a 5.1x improvement over the baseline. This work lays a foundation for advancing SMOT in UAV scenarios with applications in bird strike avoidance, agriculture, fisheries, and ecological monitoring.
Authors: Heegyeong Kim, Alice James, Avishkar Seth, Endrowednes Kuantama, Jane Williamson, Yimeng Feng, Richard Han
Abstract: This paper introduces an autonomous UAV vision system for continuous, real‑time tracking of marine animals, specifically sharks, in dynamic marine environments. The system integrates an onboard computer with a stabilised RGB‑D camera and a custom‑trained OSTrack pipeline, enabling visual identification under challenging lighting, occlusion, and sea‑state conditions. A key innovation is the inter‑UAV handoff protocol, which enables seamless transfer of tracking responsibilities between drones, extending operational coverage beyond single‑drone battery limitations. Performance is evaluated on a curated shark dataset of 5,200 frames, achieving a tracking success rate of 81.9% during real‑time flight control at 100 Hz, and robustness to occlusion, illumination variation, and background clutter. We present a seamless UAV handoff framework, where target transfer is attempted via high‑confidence feature matching, achieving 82.9% target coverage. These results confirm the viability of coordinated UAV operations for extended marine tracking and lay the groundwork for scalable, autonomous monitoring.
Authors: Pranav Rajbhandari, Abhi Veda, Matthew Garratt, Mandyam Srinivasan, Sridhar Ravi
Abstract: Bio‑inspired design is often used in autonomous UAV navigation due to the capacity of biological systems for flight and obstacle avoidance despite limited sensory and computational capabilities. In particular, honeybees mainly use the sensory input of optic flow, the apparent motion of objects in their visual field, to navigate cluttered environments. In our work, we train a Reinforcement Learning agent to navigate a tunnel with obstacles using only optic flow as sensory input. We inspect the attention patterns of trained agents to determine the regions of optic flow on which they primarily base their motor decisions. We find that agents trained in this way pay most attention to regions of discontinuity in optic flow, as well as regions with large optic flow magnitude. The trained agents appear to navigate a cluttered tunnel by avoiding the obstacles that produce large optic flow, while maintaining a centered position in their environment, which resembles the behavior seen in flying insects. This pattern persists across independently trained agents, which suggests that this could be a good strategy for developing a simple explicit control law for physical UAVs.
Authors: Deepak Kumar Panda, Weisi Guo
Abstract: Autonomous unmanned aerial vehicles (UAVs) rely on global navigation satellite system (GNSS) pseudorange measurements for accurate real‑time localization and navigation. However, this dependence exposes them to sophisticated spoofing threats, where adversaries manipulate pseudoranges to deceive UAV receivers. Among these, drift‑evasive spoofing attacks subtly perturb measurements, gradually diverting the UAV's trajectory without triggering conventional signal‑level anti‑spoofing mechanisms. Traditional distributional shift detection techniques often require accumulating a threshold number of samples, causing delays that impede rapid detection and timely response. Consequently, robust temporal‑scale detection methods are essential to identify attack onset and enable contingency planning with alternative sensing modalities, improving resilience against stealthy adversarial manipulations. This study explores a Bayesian online change point detection (BOCPD) approach that monitors temporal shifts in value estimates from a reinforcement learning (RL) critic network to detect subtle behavioural deviations in UAV navigation. Experimental results show that this temporal value‑based framework outperforms conventional GNSS spoofing detectors, temporal semi‑supervised learning frameworks, and the Page‑Hinkley test, achieving higher detection accuracy and lower false‑positive and false‑negative rates for drift‑evasive spoofing attacks.
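The Page‑Hinkley test named as a baseline in this abstract is simple enough to sketch in a few lines of Python. Note that this is the baseline detector, not the BOCPD method the study proposes, and that applying it to a scalar stream such as critic value estimates is an assumption made here for illustration. The tolerance `delta` and the alarm `threshold` are hypothetical values.

```python
def page_hinkley(stream, delta=0.05, threshold=5.0):
    """Detect an upward shift in the mean of a scalar stream.

    Returns the zero-based index at which the alarm fires, or None.
    delta     : tolerated magnitude of change (illustrative value)
    threshold : alarm threshold on the test statistic (illustrative value)
    """
    mean = 0.0      # running mean of the samples seen so far
    cum = 0.0       # cumulative deviation from the running mean
    cum_min = 0.0   # smallest cumulative deviation seen so far
    for t, x in enumerate(stream, start=1):
        mean += (x - mean) / t
        cum += x - mean - delta
        cum_min = min(cum_min, cum)
        if cum - cum_min > threshold:
            return t - 1
    return None
```

The detector needs several post-change samples before the statistic crosses the threshold, which illustrates the detection delay that the abstract attributes to accumulation-based methods. A slow drift produces smaller per-sample deviations and therefore a longer delay.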
Authors: Shuangyao Huang, Haibo Zhang, Zhiyi Huang
Abstract: This paper presents a multi‑agent reinforcement learning (MARL) framework for cooperative collision avoidance of UAV swarms leveraging domain knowledge‑driven reward. The reward is derived from knowledge in the domain of image processing, approximating contours on a two‑dimensional field. By modeling obstacles as maxima on the field, collisions are inherently avoided as contours never go through peaks or intersect. Additionally, contours are smooth and energy‑efficient. Our framework enables training with large swarm sizes as the agent interaction is minimized and the need for complex credit assignment schemes or observation sharing mechanisms in state‑of‑the‑art MARL approaches is eliminated. Moreover, UAVs obtain the ability to adapt to complex environments where contours may be non‑viable or non‑existent through intensive training. Extensive experiments are conducted to evaluate the performance of our framework against state‑of‑the‑art MARL algorithms.
Authors: Jianing Zhi, Xinghua Li, Zidong Chen
Abstract: The rapid development of the urban low‑altitude unmanned aerial vehicle (UAV) economy poses new challenges for dynamic site selection of UAV landing points and supply stations. Traditional deep reinforcement learning methods face computational complexity bottlenecks, particularly with standard attention mechanisms, when handling large‑scale urban‑level location problems. This paper proposes GeoHopNet, a Hopfield‑augmented sparse spatial attention network specifically designed for dynamic UAV site location problems. Our approach introduces four core innovations: (1) a distance‑biased multi‑head attention mechanism that explicitly encodes spatial geometric information; (2) K‑nearest neighbor sparse attention that reduces computational complexity from O(N^2) to O(NK); (3) a modern Hopfield external memory module; and (4) a memory regularization strategy. Experimental results demonstrate that GeoHopNet extends the boundary of solvable problem sizes. For large‑scale instances with 1,000 nodes, where standard attention models become prohibitively slow (over 3 seconds per instance) and traditional solvers fail, GeoHopNet finds high‑quality solutions (0.22% optimality gap) in under 0.1 seconds. Compared to the state‑of‑the‑art ADNet baseline on 100‑node instances, our method improves solution quality by 22.2% and is 1.8× faster.
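The first two ideas listed above, a distance bias on attention scores and restriction of each node to its K nearest neighbours, can be illustrated with a small NumPy sketch. It is not GeoHopNet: there are no learned projections, heads, or Hopfield memory, the node features serve directly as queries, keys, and values, and the names `knn_distance_biased_attention` and `bias_scale` are hypothetical.

```python
import numpy as np


def knn_distance_biased_attention(x, coords, k, bias_scale=1.0):
    """Single-head attention where each node attends to its k nearest nodes.

    x: (n, d) node features, used as queries, keys and values (an assumption).
    coords: (n, 2) node positions. Returns (output, weights).
    """
    n, d = x.shape
    # Distances are computed densely here for clarity; this step is still
    # O(N^2), so a spatial index would be needed for a sub-quadratic search.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    nbr = np.argsort(dist, axis=1)[:, :k]              # (n, k), includes self
    nbr_dist = np.take_along_axis(dist, nbr, axis=1)   # (n, k)
    keys = x[nbr]                                      # (n, k, d)
    # Only n*k scores are formed, instead of n*n for dense attention.
    scores = np.einsum("nd,nkd->nk", x, keys) / np.sqrt(d)
    scores = scores - bias_scale * nbr_dist            # farther => lower score
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkd->nd", weights, keys), weights


rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))
positions = rng.uniform(size=(6, 2))
out, w = knn_distance_biased_attention(features, positions, k=3)
```

The score and aggregation steps touch N·K entries, which is where the O(NK) figure comes from; the neighbour search itself is left dense in this sketch.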
Authors: Viktor Sinitsyn, Nils Schlautmann, Florian Schwaiger, Florian Holzapfel
Abstract: The aerospace industry has experienced significant transformations over the last decade, driven by technological advancements and innovative solutions in goods and personal transportation. This evolution has spurred the emergence of numerous start‑ups that now face challenges traditionally encountered by established aerospace companies. Among these challenges is the efficient processing of digital intra‑device communication interfaces for onboard equipment ‑ a critical component for ensuring seamless system integration and functionality. Addressing this challenge requires solutions that emphasize clear and consistent interface descriptions, automation of processes, and reduced labor‑intensive efforts.
This paper presents a novel process and toolchain designed to streamline the development of digital interfaces and onboard software, which our team has successfully applied in several completed projects. The proposed approach focuses on automation and flexibility while maintaining compliance with design assurance requirements.
Authors: Venkat Margapuri
Abstract: Visual coverage path planning with unmanned aerial vehicles (UAVs) requires agents to strategically coordinate UAV motion and camera control to maximize coverage, minimize redundancy, and maintain battery efficiency. Traditional reinforcement learning (RL) methods rely on environment‑specific reward formulations that lack semantic adaptability. This study proposes Prompt‑Informed Reinforcement Learning (PIRL), a novel approach that integrates the zero‑shot reasoning ability and in‑context learning capability of large language models with curiosity‑driven RL. PIRL leverages semantic feedback from an LLM, GPT‑3.5, to dynamically shape the reward function of the Proximal Policy Optimization (PPO) RL policy guiding the agent in position and camera adjustments for optimal visual coverage. The PIRL agent is trained using OpenAI Gym and evaluated in various environments. Furthermore, the sim‑to‑real‑like ability and zero‑shot generalization of the agent are tested by operating the agent in Webots simulator which introduces realistic physical dynamics. Results show that PIRL outperforms multiple learning‑based baselines such as PPO with static rewards, PPO with exploratory weight initialization, imitation learning, and an LLM‑only controller. Across different environments, PIRL outperforms the best‑performing baseline by achieving up to 14% higher visual coverage in OpenAI Gym and 27% higher in Webots, up to 25% higher battery efficiency, and up to 18% lower redundancy, depending on the environment. The results highlight the effectiveness of LLM‑guided reward shaping in complex spatial exploration tasks and suggest a promising direction for integrating natural language priors into RL for robotics.
Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida
Abstract: Uncrewed Aerial Vehicles (UAVs) play a vital role in public safety, especially in monitoring wildfires, where early detection reduces environmental impact. In UAV‑Assisted Wildfire Monitoring (UAWM) systems, jointly optimizing the data collection schedule and UAV velocity is essential to minimize the average Age of Information (AoI) for sensory data. Deep Reinforcement Learning (DRL) has been used for this optimization, but its limitations, including low sampling efficiency, discrepancies between simulation and real‑world conditions, and complex training, make it unsuitable for time‑critical applications such as wildfire monitoring. Recent advances in Large Language Models (LLMs) provide a promising alternative. With strong reasoning and generalization capabilities, LLMs can adapt to new tasks through In‑Context Learning (ICL), which enables task adaptation using natural language prompts and example‑based guidance without retraining. This paper proposes a novel online Flight Resource Allocation scheme based on LLM‑Enabled In‑Context Learning (FRSICL) to jointly optimize the data collection schedule and UAV velocity along the trajectory in real time, thereby asymptotically minimizing the average AoI across all ground sensors. Unlike DRL, FRSICL generates data collection schedules and velocities using natural language task descriptions and feedback from the environment, enabling dynamic decision‑making without extensive retraining. Simulation results confirm the effectiveness of FRSICL compared to state‑of‑the‑art baselines, namely Proximal Policy Optimization, Block Coordinate Descent, and Nearest Neighbor.
Authors: Guanghai Ding, Yihua Ren, Yuting Liu, Qijun Zhao, Shuiwang Li
Abstract: With the rapid advancement of UAV technology and its extensive application in various fields such as military reconnaissance, environmental monitoring, and logistics, achieving efficient and accurate Anti‑UAV tracking has become essential. The importance of Anti‑UAV tracking is increasingly prominent, especially in scenarios such as public safety, border patrol, search and rescue, and agricultural monitoring, where operations in complex environments can provide enhanced security. Current mainstream Anti‑UAV tracking technologies are primarily centered around computer vision techniques, particularly those that integrate multi‑sensor data fusion with advanced detection and tracking algorithms. This paper first reviews the characteristics and current challenges of Anti‑UAV detection and tracking technologies. Next, it investigates and compiles several publicly available datasets, providing accessible links to support researchers in efficiently addressing related challenges. Furthermore, the paper analyzes the major vision‑based and vision‑fusion‑based Anti‑UAV detection and tracking algorithms proposed in recent years. Finally, based on the above research, this paper outlines future research directions, aiming to provide valuable insights for advancing the field.
Authors: Nguyen Van Duc, Bui Duc Manh, Quang-Trung Luu, Dinh Thai Hoang, Van-Linh Nguyen, Diep N. Nguyen
Abstract: This paper aims to propose a novel machine learning (ML) approach incorporating Homomorphic Encryption (HE) to address privacy limitations in Unmanned Aerial Vehicles (UAV)‑based face detection. Due to challenges related to distance, altitude, and face orientation, high‑resolution imagery and sophisticated neural networks enable accurate face recognition in dynamic environments. However, privacy concerns arise from the extensive surveillance capabilities of UAVs. To resolve this issue, we propose a novel framework that integrates HE with advanced neural networks to secure facial data throughout the inference phase. This method ensures that facial data remains secure with minimal impact on detection accuracy. Specifically, the proposed system leverages the Cheon‑Kim‑Kim‑Song (CKKS) scheme to perform computations directly on encrypted data, optimizing computational efficiency and security. Furthermore, we develop an effective data encoding method specifically designed to preprocess the raw facial data into CKKS form in a Single‑Instruction‑Multiple‑Data (SIMD) manner. Building on this, we design a secure inference algorithm to compute on ciphertext without needing decryption. This approach not only protects data privacy during the processing of facial data but also enhances the efficiency of UAV‑based face detection systems. Experimental results demonstrate that our method effectively balances privacy protection and detection performance, making it a viable solution for UAV‑based secure face detection. Significantly, our approach (while maintaining data confidentiality with HE encryption) can still achieve an accuracy within 1% of the benchmark without encryption.
Authors: Zihao Zhou, Zipeng Dai, Linyi Huang, Cui Yang, Youjun Xiang, Jie Tang, Kai-kit Wong
Abstract: In unmanned aerial vehicle (UAV) networks, communication protocols and algorithms are essential for cooperation and collaboration between UAVs. Simulation provides a cost‑effective solution for prototyping, debugging, and analyzing protocols and algorithms, avoiding the prohibitive expenses of field experiments. In this paper, we present "UavNetSim‑v1", an open‑source Python‑based simulation platform designed for rapid development, testing, and evaluation of protocols and algorithms in UAV networks. "UavNetSim‑v1" provides most of the functionalities developers may need, including routing/medium access control (MAC) protocols, topology control algorithms and mobility/energy models, while maintaining ease of use. Furthermore, the platform supports comprehensive performance evaluation and features an interactive visualization interface for in‑depth algorithm analysis. In short, "UavNetSim‑v1" lends itself to both rapid prototyping and educational purposes, and can serve as a lightweight yet powerful alternative to mature network simulators for UAV communication research.
Authors: Azfar Azdi Arfakhsyad, Aufa Nasywa Rahman, Larasati Kinanti, Ahmad Ataka Awwalur Rizqi, Hannan Nur Muhammad
Abstract: Unmanned Aerial Vehicles (UAVs) have emerged as versatile platforms, driving the demand for accurate modeling to support developmental testing. This paper proposes data‑driven modeling software for UAVs. It emphasizes the use of cost‑effective sensors to obtain orientation and location data, which are then processed with data filtering algorithms and sensor fusion techniques to improve data quality and produce a precise model visualization in the software. The UAV's orientation is obtained from processed Inertial Measurement Unit (IMU) data and represented with quaternions to avoid the gimbal lock problem. The UAV's location is determined by combining data from the Global Positioning System (GPS), which provides stable geographic coordinates but a slower update frequency, and the accelerometer, which has a higher update frequency but yields unstable position estimates when integrated because of accumulated error. By combining data from these two sensors, the software is able to calculate and continuously update the UAV's real‑time position during its flight operations. The results show that the software renders UAV orientation and position with a high degree of accuracy and fluidity.
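The GPS and accelerometer trade‑off described above can be illustrated with a one‑dimensional complementary filter. The abstract does not say which fusion algorithm the software uses, so this is only one simple possibility; the function name, the blending weight `alpha`, and the use of `None` for time steps without a GPS fix are assumptions.

```python
def fuse_position(gps_fixes, accelerations, dt, alpha=0.9, initial_position=0.0):
    """Blend dead-reckoned position with GPS fixes along one axis.

    gps_fixes: per-step GPS position, or None when no fix arrived that step.
    accelerations: per-step accelerometer reading (same length).
    alpha: weight kept on the integrated estimate when a fix is available.
    """
    position = initial_position
    velocity = 0.0
    track = []
    for gps, acc in zip(gps_fixes, accelerations):
        # High-rate prediction: integrate acceleration twice (drifts over time).
        velocity += acc * dt
        position += velocity * dt
        # Low-rate correction: pull the estimate toward GPS when a fix exists.
        if gps is not None:
            position = alpha * position + (1.0 - alpha) * gps
        track.append(position)
    return track


# Hypothetical stationary UAV whose accelerometer has a constant 0.1 bias.
steps = 100
dead_reckoned = fuse_position([None] * steps, [0.1] * steps, dt=0.1)
fused = fuse_position([0.0] * steps, [0.1] * steps, dt=0.1)
```

In this sketch only the position is corrected, so the biased velocity keeps growing; a Kalman filter that also estimates velocity and bias would be the usual next step.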
Authors: Hongyu Nie, Xu Liu, Zhaotong Tan, Sen Mei, Wenbo Su
Abstract: Autonomous navigation in mobile robots, reliant on perception and planning, faces major hurdles in large‑scale, complex environments. These include heavy computational burdens for mapping, sensor occlusion failures for UAVs, and traversal challenges on irregular terrain for UGVs, all compounded by a lack of perception‑aware strategies. To address these challenges, we introduce Random Mapping and Random Projection (RMRP). This method constructs a lightweight linear parametric map by first mapping data to a high‑dimensional space, followed by a sparse random projection for dimensionality reduction. Our novel Residual Energy Preservation Theorem provides theoretical guarantees for this process, ensuring critical geometric properties are preserved. Based on this map, we propose the RPATR (Robust Perception‑Aware Trajectory Planner) framework. For UAVs, our method unifies grid and Euclidean Signed Distance Field (ESDF) maps. The front‑end uses an analytical occupancy gradient to refine initial paths for safety and smoothness, while the back‑end uses a closed‑form ESDF for trajectory optimization. Leveraging the trained RMRP model's generalization, the planner predicts unobserved areas for proactive navigation. For UGVs, the model characterizes terrain and provides closed‑form gradients, enabling online planning to circumvent large holes. Validated in diverse scenarios, our framework demonstrates superior mapping performance in time, memory, and accuracy, and enables computationally efficient, safe navigation for high‑speed UAVs and UGVs. The code will be released to foster community collaboration.
Authors: Xiaoren Xu, Hao Xu, Dongyu Wei, Walid Saad, Mehdi Bennis, Mingzhe Chen
Abstract: In this paper, a novel three‑dimensional (3D) positioning framework of fluid antenna system (FAS)‑enabled unmanned aerial vehicles (UAVs) is developed. In the proposed framework, a set of controlled UAVs cooperatively estimate the real‑time 3D position of a target UAV. Here, the active UAV transmits a measurement signal to the passive UAVs via the reflection from the target UAV. Each passive UAV estimates the distance of the active‑target‑passive UAV link and selects an antenna port to share the distance information with the base station (BS) that calculates the real‑time position of the target UAV. As the target UAV is moving due to its task operation, the controlled UAVs must optimize their trajectories and select optimal antenna ports, aiming to estimate the real‑time position of the target UAV. We formulate this problem as an optimization problem to minimize the target UAV positioning error via optimizing the trajectories of all controlled UAVs and antenna port selection of passive UAVs. Here, an attention‑based recurrent multi‑agent reinforcement learning (AR‑MARL) scheme is proposed, which enables each controlled UAV to use the local Q function to determine its trajectory and antenna port while optimizing the target UAV positioning performance without knowing the trajectories and antenna port selections of other controlled UAVs. Different from current MARL methods, the proposed method uses a recurrent neural network (RNN) that incorporates historical state‑action pairs of each controlled UAV, and an attention mechanism to analyze the importance of these historical state‑action pairs, thus improving the global Q function approximation accuracy and the target UAV positioning accuracy. Simulation results show that the proposed AR‑MARL scheme can reduce the average positioning error by up to 17.5% and 58.5% compared to the VD‑MARL scheme and the proposed method without FAS.
Authors: Geng Sun, Chenbang Liu, Jiahui Li, Guannan Qu, Shuang Liang, Jiacheng Wang, Changyuan Zhao, Dusit Niyato
Abstract: Unmanned aerial vehicle (UAV) swarms utilizing collaborative beamforming (CB) in low‑altitude wireless networks (LAWN) demonstrate significant potential for enhanced communication range, energy efficiency, and signal directivity through the formation of virtual antenna arrays (VAA). However, environmental disturbances, particularly wind fields, significantly degrade CB performance by introducing positional errors that disrupt beam patterns, thereby compromising transmission reliability. This paper investigates the critical challenge of maintaining CB performance in UAV‑based VAAs operating in LAWN under wind field disturbances. We propose a comprehensive framework that models the impact of three distinct wind conditions (constant, shear, and turbulent) on UAV array performance, and formulate a long‑term real‑time optimization problem to maximize directivity while minimizing maximum sidelobe levels through adaptive excitation current weight adjustments. To address the inherent complexity of this problem, we propose a novel proximal policy optimization algorithm with long short‑term memory (LSTM) structure and adaptive learning rate (PPO‑LA), which effectively captures temporal patterns in wind field disturbances and enables real‑time adaptation without requiring extensive prior training for specific wind conditions. Our simulation results demonstrate that the proposed PPO‑LA algorithm successfully recovers degraded CB performance across various wind scenarios, significantly outperforming benchmark algorithms.
Authors: Geng Sun, Likun Zhang, Jiahui Li, Jing Wu, Jiacheng Wang, Zemin Sun, Changyuan Zhao, Victor C. M. Leung
Abstract: The integration of unmanned aerial vehicles (UAVs) with Internet of Things (IoT) networks offers promising solutions for efficient data collection. However, the limited energy capacity of UAVs remains a significant challenge. In this case, laser beam directors (LBDs) have emerged as an effective technology for wireless charging of UAVs during operation, thereby enabling sustained data collection without frequent returns to charging stations (CSs). In this work, we investigate the age of information (AoI) optimization in LBD‑powered UAV‑assisted IoT networks, where multiple UAVs collect data from distributed IoT devices while being recharged by laser beams. We formulate a joint optimization problem that aims to minimize the peak AoI while determining optimal UAV trajectories and laser charging strategies. This problem is particularly challenging due to its non‑convex nature, complex temporal dependencies, and the need to balance data collection efficiency with energy consumption constraints. To address these challenges, we propose a novel multi‑agent proximal policy optimization with temporal memory and multi‑agent coordination (MAPPO‑TM) framework. Specifically, MAPPO‑TM incorporates temporal memory mechanisms to capture the dynamic nature of UAV operations and facilitates effective coordination among multiple UAVs through decentralized learning while considering global system objectives. Simulation results demonstrate that the proposed MAPPO‑TM algorithm outperforms conventional approaches in terms of peak AoI minimization and energy efficiency. In the best case, the proposed algorithm achieves up to 15.1% reduction in peak AoI compared to conventional multi‑agent deep reinforcement learning (MADRL) methods.
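To make the peak AoI objective concrete, the sketch below computes peak ages for a single data source under the common definition in which age at time t is t minus the generation time of the freshest delivered update, and a peak is the age reached just before a fresher update arrives. The paper's exact system model is not given in the abstract, so the definition, the function name, and the sample timestamps are assumptions.

```python
def peak_ages(updates, start_time=0.0):
    """Return the peak age reached just before each age-reducing delivery.

    updates: list of (generated_at, delivered_at), sorted by delivered_at.
    start_time: generation time assumed for the information held initially.
    """
    freshest = start_time   # generation time of the freshest delivered update
    peaks = []
    for generated_at, delivered_at in updates:
        if generated_at <= freshest:
            continue  # a stale update does not reduce the age
        peaks.append(delivered_at - freshest)
        freshest = generated_at
    return peaks


# Hypothetical timeline: the third update is older than one already received.
timeline = [(1.0, 2.0), (4.0, 5.0), (3.0, 6.0), (9.0, 10.0)]
peaks = peak_ages(timeline)
worst_peak = max(peaks)
```

Long gaps between fresh deliveries, for example while a UAV detours to recharge, show up directly as large peaks, which is why trajectory and charging decisions are coupled in this kind of objective.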
Authors: Gia-Huy Nguyen, Anh-Nhat Nguyen, Minh-Sang Nguyen, Khai Nguyen, Tung-Son Ngo, Ngoc-Anh Bui, Phuong-Chi Le, Manh-Duc Hoang
Abstract: This article studies the efficiency of secrecy data offloading for an unmanned aerial vehicle (UAV)‑assisted nonorthogonal multiple access (NOMA)‑integrated mobile‑edge computing (MEC) system incorporating wireless power transfer (WPT) within an Internet of Things (IoT) network. Specifically, this study assumes a UAV that functions in dual roles: as a mobile computation platform and as an aerial power‑supply station, offering substantial advantages for resource‑constrained edge devices (EDs) in mitigating interference from a passive eavesdropper. To assess the system's secrecy offloading efficacy, a closed‑form expression for the secrecy successful computation probability (SSCP) under Nakagami‑m fading channels is derived. The theoretical results are evaluated over a variety of parameters, thereby validating the precision of our analysis.
Authors: Rolando A. Hernandez-Hernandez, Adrian Rubio-Solis
Abstract: Multilayer Extreme Learning Machine (ML‑ELM) and its variants have proven to be an effective technique for the classification of different natural signals such as audio, video, acoustic and images. In this paper, a Hybrid Multilayer Extreme Learning Machine (HML‑ELM) that is based on an ELM‑based autoencoder (ELM‑AE) and Interval Type‑2 fuzzy logic theory is suggested for active image classification and applied to Unmanned Aerial Vehicles (UAVs). The proposed methodology is a hierarchical ELM learning framework that consists of two main phases: 1) self‑taught feature extraction and 2) supervised feature classification. First, unsupervised multilayer feature encoding is achieved by stacking a number of ELM‑AEs, in which input data is projected into a number of high‑level representations. In the second phase, the final features are classified using a novel Simplified Interval Type‑2 Fuzzy ELM (SIT2‑FELM) with a fast output reduction layer based on the SC algorithm, an improved version of the algorithm Center of Sets Type Reducer without Sorting Requirement (COSTRWSR). To validate the efficiency of the HML‑ELM, two types of experiments for the classification of images are suggested. First, the HML‑ELM is applied to solve a number of benchmark problems for image classification. Secondly, a number of real experiments on the active classification and transport of four different objects between two predefined locations using a UAV are implemented. Experiments demonstrate that the proposed HML‑ELM delivers superior efficiency compared to other similar methodologies such as ML‑ELM, Multilayer Fuzzy Extreme Learning Machine (ML‑FELM) and ELM.
Authors: Wen Zhang, Aimin Wang, Jiahui Li, Geng Sun, Jiacheng Wang, Weijie Yuan, Dusit Niyato
Abstract: The integration of wireless power transfer (WPT) with Internet of Things (IoT) offers promising solutions for sensing applications, but faces significant challenges when deployed in hard‑to‑access areas such as high‑temperature environments. In such extreme conditions, traditional fixed WPT infrastructure cannot be safely installed, and batteries rapidly degrade due to hardware failures. In this paper, we propose an uncrewed aerial vehicle (UAV)‑assisted data collection and WPT framework for batteryless sensor (BLS) networks deployed in these challenging environments. Specifically, we consider a practical scenario where a UAV first transfers energy to BLS nodes via WPT, enabling these nodes to subsequently transmit their collected data to the UAV through orthogonal frequency‑division multiple access (OFDMA). Then, we formulate a multi‑objective optimization problem that aims to maximize the fair data collection volume while minimizing the UAV energy consumption through joint optimization of transmit power allocation and flight trajectory planning. Due to the non‑convex nature and dynamic characteristics of this problem, conventional optimization methods prove inadequate. To address these challenges, we propose an enhanced soft actor‑critic algorithm with parameter‑free attention, prioritized experience replay, and value‑based reward centering (SAC‑PPV), thereby improving the exploration efficiency and learning stability of the algorithm in complex WPT scenarios. Simulation results demonstrate that the proposed approach consistently outperforms benchmark algorithms under various network configurations.
Authors: Antonella Barisic Kulas, Frano Petric, Stjepan Bogdan
Abstract: Autonomous maritime surveillance and target vessel identification in environments where Global Navigation Satellite Systems (GNSS) are not available is critical for a number of applications such as search and rescue and threat detection. When the target vessel is only described by visual cues and its last known position is not available, unmanned aerial vehicles (UAVs) must rely solely on on‑board vision to scan a large search area under strict computational constraints. To address this challenge, we leverage the YOLOv8 object detection model to detect all vessels in the field of view. We then apply feature matching and hue histogram distance analysis to determine whether any detected vessel corresponds to the target. When found, we localize the target using simple geometric principles. We demonstrate the proposed method in real‑world experiments during the MBZIRC2023 competition, integrated into a fully autonomous system with GNSS‑denied navigation. We also evaluate the impact of perspective on detection accuracy and localization precision and compare it with the oracle approach.
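The hue histogram comparison mentioned above can be sketched as follows. The abstract does not state which histogram distance is used, so the Hellinger distance here is a stand‑in chosen for illustration; the bin count, the function names, and the sample hue values are likewise assumptions, and the input is taken to be hue angles in degrees already extracted from each detection crop.

```python
import numpy as np


def hue_histogram(hue_degrees, bins=36):
    """Normalized histogram of hue angles (degrees, wrapped into [0, 360))."""
    hues = np.asarray(hue_degrees, dtype=float) % 360.0
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 360.0))
    return hist / max(hist.sum(), 1)


def hellinger_distance(p, q):
    """Distance in [0, 1]: 0 for identical histograms, 1 for disjoint ones."""
    overlap = np.sum(np.sqrt(p * q))
    return float(np.sqrt(max(0.0, 1.0 - overlap)))


# Hypothetical hue samples from a reddish target and a bluish detection.
target_hues = [0.0, 5.0, 355.0]
candidate_hues = [240.0, 245.0, 250.0]
d_same = hellinger_distance(hue_histogram(target_hues), hue_histogram(target_hues))
d_diff = hellinger_distance(hue_histogram(target_hues), hue_histogram(candidate_hues))
```

A detection would then be accepted as the target when the distance falls below a tuned threshold, typically in combination with the feature‑matching score.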
Authors: Xuyang Chen, Chong Huang, Daquan Feng, Lei Luo, Yao Sun, Xiang-Gen Xia
Abstract: Real‑time unmanned aerial vehicle (UAV) video streaming is essential for time‑sensitive applications, including remote surveillance, emergency response, and environmental monitoring. However, it faces challenges such as limited bandwidth, latency fluctuations, and high packet loss. To address these issues, we propose a novel semantic self‑correcting video transmission framework with ultra‑fine bitrate granularity (SSCV‑G). In SSCV‑G, video frames are encoded into a compact semantic codebook space, and the transmitter adaptively sends a subset of semantic indices based on bandwidth availability, enabling fine‑grained bitrate control for improved bandwidth efficiency. At the receiver, a spatio‑temporal vision transformer (ST‑ViT) performs multi‑frame joint decoding to reconstruct dropped semantic indices by modeling intra‑ and inter‑frame dependencies. To further improve performance under dynamic network conditions, we integrate a multi‑user proximal policy optimization (MUPPO) reinforcement learning scheme that jointly optimizes communication resource allocation and semantic bitrate selection to maximize user Quality of Experience (QoE). Extensive experiments demonstrate that the proposed SSCV‑G significantly outperforms state‑of‑the‑art video codecs in coding efficiency, bandwidth adaptability, and packet loss robustness. Moreover, the proposed MUPPO‑based QoE optimization consistently surpasses existing benchmarks.
Authors: Tianshun Li, Tianyi Huai, Zhen Li, Yichun Gao, Haoang Li, Xinhu Zheng
Abstract: Unmanned Aerial Vehicles (UAVs) have emerged as versatile tools across various sectors, driven by their mobility and adaptability. This paper introduces SkyVLN, a novel framework integrating vision‑and‑language navigation (VLN) with Nonlinear Model Predictive Control (NMPC) to enhance UAV autonomy in complex urban environments. Unlike traditional navigation methods, SkyVLN leverages Large Language Models (LLMs) to interpret natural language instructions and visual observations, enabling UAVs to navigate through dynamic 3D spaces with improved accuracy and robustness. We present a multimodal navigation agent equipped with a fine‑grained spatial verbalizer and a history path memory mechanism. These components allow the UAV to disambiguate spatial contexts, handle ambiguous instructions, and backtrack when necessary. The framework also incorporates an NMPC module for dynamic obstacle avoidance, ensuring precise trajectory tracking and collision prevention. To validate our approach, we developed a high‑fidelity 3D urban simulation environment using AirSim, featuring realistic imagery and dynamic urban elements. Extensive experiments demonstrate that SkyVLN significantly improves navigation success rates and efficiency, particularly in new and unseen environments.
Authors: Hongbao Li, Ziye Jia, Sijie He, Kun Guo, Qihui Wu
Abstract: With the emergence of compute‑intensive and delay‑sensitive applications in vehicular networks, unmanned aerial vehicles (UAVs) have emerged as a promising complement for vehicular edge computing due to the high mobility and flexible deployment. However, the existing UAV‑assisted offloading strategies are insufficient in coordinating heterogeneous computing resources and adapting to dynamic network conditions. Hence, this paper proposes a dual‑layer UAV‑assisted edge computing architecture based on partial offloading, composed of the relay capability of high‑altitude UAVs and the computing support of low‑altitude UAVs. The proposed architecture enables efficient integration and coordination of heterogeneous resources. A joint optimization problem is formulated to minimize the system delay and energy consumption while ensuring the task completion rate. To solve the high‑dimensional decision problem, we reformulate the problem as a Markov decision process and propose a hierarchical offloading scheme based on the soft actor‑critic algorithm. The method decouples global and local decisions, where the global decisions integrate offloading ratios and trajectory planning into continuous actions, while the local scheduling is handled via designing a priority‑based mechanism. Simulations are conducted and demonstrate that the proposed approach outperforms several baselines in task completion rate, system efficiency, and convergence speed, showing strong robustness and applicability in dynamic vehicular environments.
Authors: Markiyan Kostiv, Anatolii Adamovskyi, Yevhen Cherniavskyi, Mykyta Varenyk, Ostap Viniavskyi, Igor Krashenyi, Oles Dobosevych
Abstract: Multi‑object tracking (MOT) aims to maintain consistent identities of objects across video frames. Associating objects in low‑frame‑rate videos captured by moving unmanned aerial vehicles (UAVs) in actual combat scenarios is complex due to rapid changes in object appearance and position within the frame. The task becomes even more challenging due to image degradation caused by cloud video streaming and compression algorithms. We present how instance association learning from single‑frame annotations can overcome these challenges. We show that global features of the scene provide crucial context for low‑FPS instance association, allowing our solution to be robust to distractors and gaps in detections. We also demonstrate that such a tracking approach maintains high association quality even when reducing the input image resolution and latent representation size for faster inference. Finally, we present a benchmark dataset of annotated military vehicles collected from publicly available data sources. This paper was initially presented at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Scientific and Technical Committee, IST‑209‑RSY ‑ the ICMCIS, held in Oeiras, Portugal, 13‑14 May 2025.
Authors: Yichuan Shi, Hao Liu, Haowen Zheng, Haowen Yu, Xianqi Liang, Jie Li, Minmin Ma, Ximin Lyu
Abstract: Unmanned aerial vehicles (UAVs) are critical in the automated inspection of wind turbine blades. Nevertheless, several issues persist in this domain. Firstly, existing inspection platforms encounter challenges in meeting the demands of automated inspection tasks and scenarios. Moreover, current blade stop angle estimation methods are vulnerable to environmental factors, restricting their robustness. Additionally, there is an absence of real‑time blade detail prioritized exposure adjustment during capture, where lost details cannot be restored through post‑optimization. To address these challenges, we introduce a platform and two approaches. Initially, a UAV inspection platform is presented to meet the automated inspection requirements. Subsequently, a Fermat point based blade stop angle estimation approach is introduced, achieving higher precision and success rates. Finally, we propose a blade detail prioritized exposure adjustment approach to ensure appropriate brightness and preserve details during image capture. Extensive tests, comprising over 120 flights across 10 wind turbine models in 5 operational wind farms, validate the effectiveness of the proposed approaches in enhancing inspection autonomy.
Authors: Dibyabha Deb, Ujjwal Verma
Abstract: Identifying regions affected by disasters is a vital step in effectively managing and planning relief and rescue efforts. Unlike the traditional approaches of manually assessing post‑disaster damage, analyzing images of Unmanned Aerial Vehicles (UAVs) offers an objective and reliable way to assess the damage. In the past, segmentation techniques have been adopted to identify post‑flood damage in UAV aerial images. However, most of these supervised learning approaches rely on manually annotated datasets. Indeed, annotating images is a time‑consuming and error‑prone task that requires domain expertise. This work focuses on leveraging self‑supervised features to accurately identify flooded regions in UAV aerial images. This work proposes two encoder‑decoder‑based segmentation approaches, which integrate the visual features learned from DINOv2 with the traditional encoder backbone. This study investigates the generalization of self‑supervised features for UAV aerial images. Specifically, we evaluate the effectiveness of features from the DINOv2 model, trained on non‑aerial images, for segmenting aerial images, noting the distinct perspectives between the two image types. Our results demonstrate that DINOv2's self‑supervised pretraining on natural images generates transferable, general‑purpose visual features that streamline the development of aerial segmentation workflows. By leveraging these features as a foundation, we significantly reduce reliance on labor‑intensive manual annotation processes, enabling high‑accuracy segmentation with limited labeled aerial data.
Authors: Wanqing Tu
Abstract: Many UAV‑related applications require group communications between UAVs to reliably and efficiently deliver rich media content as well as to extend line‑of‑sight coverage between sky and ground. This paper studies fast yet resource‑efficient UAV transitions while maintaining high multicasting performance. We develop a set of analytic and algorithmic results to form the efficient transition formation (ETF) algorithm that deals with different UAV transition scenarios in a multicasting environment. The ETF algorithm first evaluates the seamlessness of a straight‑line trajectory (SLT), by processing low‑complexity computations (e.g., Euclidean distances) or a chain of fast checks with controlled traffic overheads. For an interrupted SLT, ETF establishes a new trajectory consisting of a minimum number of seamless straight lines that join at specially selected locations in terms of controlling mobile UAVs' seamless travel distances. Our simulation studies quantify the multicasting performance gains that ETF allows, outperforming compared studies when seamlessly transiting UAV group members.
Authors: Doumegna Mawuto Koudjo Felix, Xianjia Yu, Jiaqiang Zhang, Sier Ha, Zhuo Zou, Tomi Westerlund
Abstract: Lidar technology has been widely employed across various applications, such as robot localization in GNSS‑denied environments and 3D reconstruction. Recent advancements have introduced different lidar types, including cost‑effective solid‑state lidars such as the Livox Avia and Mid‑360. The Mid‑360, with its dome‑like design, is increasingly used in portable mapping and unmanned aerial vehicle (UAV) applications due to its low cost, compact size, and reliable performance. However, the lack of datasets that include dome‑shaped lidars, such as the Mid‑360, alongside other solid‑state and spinning lidars significantly hinders the comparative evaluation of novel approaches across platforms. Additionally, performance differences between low‑cost solid‑state and high‑end spinning lidars (e.g., Ouster OS series) remain insufficiently examined, particularly without an Inertial Measurement Unit (IMU) in odometry.
To address this gap, we introduce a novel dataset comprising data from multiple lidar types, including the low‑cost Livox Avia and the dome‑shaped Mid‑360, as well as high‑end spinning lidars such as the Ouster series. Notably, to the best of our knowledge, no existing dataset comprehensively includes dome‑shaped lidars such as Mid‑360 alongside both other solid‑state and spinning lidars. In addition to the dataset, we provide a benchmark evaluation of state‑of‑the‑art SLAM algorithms applied to this diverse sensor data. Furthermore, we present a quantitative analysis of point cloud registration techniques, specifically point‑to‑point, point‑to‑plane, and hybrid methods, using indoor and outdoor data collected from the included lidar systems. The outcomes of this study establish a foundational reference for future research in SLAM and 3D reconstruction across heterogeneous lidar platforms.
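To make the distinction between point‑to‑point and point‑to‑plane registration concrete, the following minimal sketch compares the two residuals for already‑matched point pairs. It is an illustration under stated assumptions (correspondences and unit surface normals are given; function names are hypothetical), not the evaluation code of the study.

```python
def point_to_point_error(src, dst):
    """Mean squared Euclidean distance between matched points."""
    total = 0.0
    for p, q in zip(src, dst):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(src)

def point_to_plane_error(src, dst, normals):
    """Mean squared distance along the target's unit surface normal."""
    total = 0.0
    for p, q, n in zip(src, dst, normals):
        r = sum((a - b) * c for a, b, c in zip(p, q, n))
        total += r * r
    return total / len(src)

# Hypothetical target: three points on the plane z = 0 with normal (0, 0, 1).
dst = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3

# Source slid along the plane: penalized by point-to-point only.
slid = [(x + 0.5, y, z) for x, y, z in dst]
# Source lifted off the plane: penalized equally by both metrics.
lifted = [(x, y, z + 0.2) for x, y, z in dst]
```

The point‑to‑plane residual ignores motion tangent to the surface, which is one reason the two metrics can behave differently on planar indoor scenes.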
Authors: Xiaochen Wei, Weiwei Guo, Zenghui Zhang, Wenxian Yu
Abstract: It is highly challenging to register large‑scale, heterogeneous SAR and optical images, particularly across platforms, due to significant geometric, radiometric, and temporal differences, which most existing methods struggle to address. To overcome these challenges, we propose Grid‑Reg, a grid‑based multimodal registration framework comprising a domain‑robust descriptor extraction network, Hybrid Siamese Correlation Metric Learning Network (HSCMLNet), and a grid‑based solver (Grid‑Solver) for transformation parameter estimation. In heterogeneous imagery with large modality gaps and geometric differences, obtaining accurate correspondences is inherently difficult. To robustly measure similarity between gridded patches, HSCMLNet integrates a hybrid Siamese module with a correlation metric learning module (CMLModule) based on equiangular unit basis vectors (EUBVs), together with a manifold consistency loss to promote modality‑invariant, discriminative feature learning. The Grid‑Solver estimates transformation parameters by minimizing a global grid matching loss through a progressive dual‑loop search strategy to reliably find patch correspondences across entire images. Furthermore, we curate a challenging benchmark dataset for SAR‑to‑optical registration using real‑world UAV MiniSAR data and Google Earth optical imagery. Extensive experiments demonstrate that our proposed approach achieves superior performance over state‑of‑the‑art methods.
Authors: Yizhou Luo, Kwan-Wu Chin, Ruyi Guan, Xi Xiao, Caimeng Wang, Jingyin Feng, Tengjiao He
Abstract: Devices operating in Internet of Things (IoT) networks may be deployed across vast geographical areas and interconnected via multi‑hop communications. Further, they may be unguarded. This makes them vulnerable to attacks and motivates operators to check on devices frequently. To this end, we propose and study an Unmanned Aerial Vehicle (UAV)‑aided attestation framework for use in IoT networks with a solar‑powered charging station. A key challenge is optimizing the trajectory of the UAV to ensure it attests as many devices as possible. A trade‑off here is that devices being checked by the UAV are offline, which affects the amount of data delivered to a gateway. Another challenge is that the charging station experiences time‑varying energy arrivals, which in turn affect the flight duration and charging schedule of the UAV. To address these challenges, we employ a Deep Reinforcement Learning (DRL) solution to optimize the UAV's charging schedule and the selection of devices to be attested during each flight. The simulation results show that our solution reduces the average age of trust by 88% and throughput loss due to attestation by 30%.
Authors: Hiba Bederina
Abstract: This article explores an approach to addressing the Close Enough Traveling Salesman Problem (CETSP). The objective is to streamline the mathematical formulation by introducing reformulations that approximate the Euclidean distances and simplify the objective function. Additionally, the use of convex sets in the constraint design offers computational benefits. The proposed methodology is empirically validated on real‑world CETSP instances, with the aid of computational strategies such as a fragmented CPLEX‑based approach. Results demonstrate its effectiveness in managing computational resources without compromising solution quality. Furthermore, the article analyzes the behavior of the proposed mathematical formulations, providing comprehensive insights into their performance.
Authors: Evangelos Ntouros, Pavel Kelley, Ewoud Smeur
Abstract: This work introduces a novel analytical model for estimating the airspeed of fixed‑wing Unmanned Aerial Vehicles (UAVs) using solely propeller power and rotational speed measurements. The model can be used to replace Pitot‑tube‑based airspeed sensors, or contribute to redundancy in airspeed estimation. It does not require knowledge of the vehicle's dynamic model and is computationally lightweight. It leverages power and rotational speed feedback, which is readily available from modern Electronic Speed Controllers (ESCs), thereby enabling seamless integration with existing systems and off‑the‑shelf components. A systematic approach is followed to derive the model structure based on least squares optimization and regularization techniques on Blade Element Momentum (BEM) simulation, wind tunnel, and flight test datasets. The final model generalizes well, achieving a normalized Root Mean Square Error (nRMSE) of 5% on unseen flight data. The model parameters can be identified either offline, using flight logs with airspeed measurements, or in‑flight, using a lightweight identification method based only on Global Positioning System (GPS) velocity data. The resulting system provides a robust and computationally efficient solution for real‑time airspeed estimation across diverse fixed‑wing UAV platforms.
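The abstract reports accuracy as a normalized RMSE but does not state the normalization. The sketch below shows one common convention, dividing the RMSE by the range of the reference signal; this choice, the function name, and the sample values are assumptions, and other conventions (for example, dividing by the mean) would give different numbers.

```python
import math

def nrmse(predicted, reference):
    """RMSE divided by the range of the reference (one common convention)."""
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    return math.sqrt(mse) / (max(reference) - min(reference))

# Hypothetical airspeed samples in m/s.
reference = [10.0, 15.0, 20.0]
predicted = [11.0, 14.0, 21.0]
score = nrmse(predicted, reference)  # RMSE of 1 m/s over a 10 m/s range
```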
Authors: Hanjian Liu, Jinsong Gui, Xiaoheng Deng
Abstract: This paper designs a post‑disaster powered communication intelligent network (PDPCIN) to address communication disruptions caused by ground base station (GBS) failures within the post‑disaster area. PDPCIN employs unmanned aerial vehicles (UAVs) to provide wireless data collection (WDC) and wireless energy transmission (WET) for affected areas and leverages low earth orbit satellites (LEO SATs) to relay UAV data to the nearest surviving GBS. To ensure basic post‑disaster communication while co‑optimizing age of information (AoI), energy efficiency, and spectrum efficiency, an intelligent synchronization‑UAV (IS‑UAV) architecture, an AoI‑based four thresholds updating (AFTU) mechanism, and a dynamic multi‑LEO access (DMLA) strategy are proposed. However, three key challenges remain: time‑varying task‑resource imbalances, complex topology caused by multi‑device scheduling, and nonlinear coupling in multidimensional metric optimization, making system optimization NP‑hard. Therefore, this paper proposes a hierarchical heterogeneous graph neural network (HHGNN) framework. It models heterogeneous device nodes and their communication relations as a hierarchical heterogeneous graph structure, integrating our defined graph sensing, exchange, and mask layers to handle the network's input, feature propagation, and output. To search for an appropriate number of single‑LEO SATs, we propose a single‑LEO SAT density optimization (S‑LSDO) algorithm. Finally, we compare the proposed scheme with state‑of‑the‑art benchmarks to validate its superior collaborative optimization of AoI, energy efficiency, and spectrum efficiency. Based on this, we derive the expressions for the expected values of AoI and stagnant AoI proportion.
Authors: Hanfang Liang, Shenghai Yuan, Fen Liu, Yizhuo Yang, Bing Wang, Zhuyu Huang, Chenyang Shi, Jing Jin
Abstract: The widespread use of consumer drones has introduced serious challenges for airspace security and public safety. Their high agility and unpredictable motion make drones difficult to track and intercept. While existing methods focus on detecting current positions, many counter‑drone strategies rely on forecasting future trajectories and thus require more than reactive detection to be effective. To address this critical gap, we propose an unsupervised vision‑based method for predicting the three‑dimensional trajectories of drones. Our approach first uses an unsupervised technique to extract drone trajectories from raw LiDAR point clouds, then aligns these trajectories with camera images through motion consistency to generate reliable pseudo‑labels. We then combine kinematic estimation with a visual Mamba neural network in a self‑supervised manner to predict future drone trajectories. We evaluate our method on the challenging MMAUD dataset, including the V2 sequences that feature wide‑field‑of‑view multimodal sensors and dynamic UAV motion in urban scenes. Extensive experiments show that our framework outperforms supervised image‑only and audio‑visual baselines in long‑horizon trajectory prediction, reducing 5‑second 3D error by around 40 percent without using any manual 3D labels. The proposed system offers a cost‑effective, scalable alternative for real‑time counter‑drone deployment. All code will be released upon acceptance to support reproducible research in the robotics community.
Authors: Gabriel O. Flores-Aquino, Octavio Gutierrez-Frias, Juan Irving Vasquez
Abstract: Path planning algorithms fundamentally aim to compute collision‑free paths, with many works focusing on finding the optimal distance path. However, for several applications, a more suitable approach is to balance response time, path safety, and path length. In this context, a skeleton map is a useful tool in graph‑based schemes, as it provides an intrinsic representation of the free workspace. However, standard skeletonization algorithms are computationally expensive, as they are primarily oriented towards image processing tasks. We propose an efficient path‑planning methodology that finds safe paths within an acceptable processing time. This methodology leverages a Deep Denoising Autoencoder (DDAE) based on the U‑Net architecture to compute a skeletonized version of the navigation map, which we refer to as SkelUnet. The SkelUnet network facilitates exploration of the entire workspace through one‑shot sampling (OSS), as opposed to the iterative or probabilistic sampling used by previous algorithms. SkelUnet is trained and tested on a dataset consisting of 12,500 two‑dimensional dungeon maps. The motion planning methodology is evaluated in a simulation environment with an Unmanned Aerial Vehicle (UAV) in 250 previously unseen maps and assessed using several navigation metrics to quantify the navigability of the computed paths. The results demonstrate that using SkelUnet to construct the roadmap offers significant advantages, such as connecting all regions of free workspace, providing safer paths, and reducing processing time.
Authors: Wenhao Wang, Yanyan Li, Long Jiao, Jiawei Yuan
Abstract: Recent advances in Large Language Models (LLMs) have revolutionized mobile robots, including unmanned aerial vehicles (UAVs), enabling their intelligent operation within Internet of Things (IoT) ecosystems. However, LLMs still face challenges in logical reasoning and complex decision‑making, leading to concerns about the reliability of LLM‑driven UAV operations in IoT applications. In this paper, we propose a closed‑loop LLM‑driven UAV operation code generation framework that enables reliable UAV operations powered by effective feedback and refinement using two LLM modules, i.e., a Code Generator and an Evaluator. Our framework transforms numerical state observations from UAV operations into semantic trajectory descriptions to enhance the evaluator LLM's understanding of UAV dynamics for precise feedback generation. Our framework also enables a simulation‑based refinement process, and hence eliminates the risks to physical UAVs caused by incorrect code execution during the refinement. Extensive experiments on UAV control tasks of varying complexity are conducted. The experimental results show that our framework can achieve reliable UAV operations using LLMs and significantly outperforms baseline methods in terms of success rate and completeness as task complexity increases.
Authors: Yulan Gao, Ziqiang Ye, Zhonghao Lyu, Ming Xiao, Yue Xiao, Ping Yang, Agata Manolova
Abstract: Emerging low‑altitude economy networks (LAENets) require agile and privacy‑preserving resource control under dynamic agent mobility and limited infrastructure support. To meet these challenges, we propose a vision‑aided integrated sensing and communication (ISAC) framework for UAV‑assisted access systems, where onboard masked De‑Diffusion models extract compact semantic tokens, including agent type, activity class, and heading orientation, while explicitly suppressing sensitive visual content. These tokens are fused with mmWave radar measurements to construct a semantic risk heatmap reflecting motion density, occlusion, and scene complexity, which guides access technology selection and resource scheduling. We formulate a multi‑objective optimization problem to jointly maximize weighted energy and perception efficiency via radio access technology (RAT) assignment, power control, and beamforming, subject to agent‑specific QoS constraints. To solve this, we develop De‑Diffusion‑driven vision‑aided risk‑aware resource optimization algorithm DeDiff‑VARARO, a novel two‑stage cross‑modal control algorithm: the first stage reconstructs visual scenes from tokens via De‑Diffusion model for semantic parsing, while the second stage employs a deep deterministic policy gradient (DDPG)‑based policy to adapt RAT selection, power control, and beam assignment based on fused radar‑visual states. Simulation results show that DeDiff‑VARARO consistently outperforms baselines in reward convergence, link robustness, and semantic fidelity, achieving within 4% of the performance of a raw‑image upper bound while preserving user privacy and scalability in dense environments.
Authors: Bingxi Liu, Calvin Chen, Junhao Li, Guyang Yu, Haoqian Song, Xuchen Liu, Jinqiang Cui, Hong Zhang
Abstract: The Vision Transformer (ViT) model has long struggled with the challenge of quadratic complexity, a limitation that becomes especially critical in unmanned aerial vehicle (UAV) tracking systems, where data must be processed in real time. In this study, we explore the recently proposed State‑Space Model, Mamba, leveraging its computational efficiency and capability for long‑sequence modeling to effectively process dense image sequences in tracking tasks. First, we highlight the issue of temporal inconsistency in existing Mamba‑based methods, specifically the failure to account for temporal continuity in the Mamba scanning mechanism. Secondly, building upon this insight, we propose TrackingMiM, a Mamba‑in‑Mamba architecture that handles the image sequences of tracking problems with minimal computational burden. In our framework, the Mamba scan is performed in a nested way, independently processing temporally and spatially coherent patch tokens, while the template frame is encoded as a query token and utilized for tracking in every scan. Extensive experiments conducted on five UAV tracking benchmarks confirm that the proposed TrackingMiM achieves state‑of‑the‑art precision while offering noticeably higher speed in UAV tracking.
Authors: Ziyao Wang, Rongpeng Li, Sizhao Li, Yuming Xiang, Haiping Wang, Zhifeng Zhao, Honggang Zhang
Abstract: Intelligent control of Unmanned Aerial Vehicle (UAV) swarms has emerged as a critical research focus, and it typically requires the swarm to navigate effectively while avoiding obstacles and achieving continuous coverage over multiple mission targets. Although traditional Multi‑Agent Reinforcement Learning (MARL) approaches offer dynamic adaptability, they are hindered by the semantic gap in numerical communication and the rigidity of homogeneous role structures, resulting in poor generalization and limited task scalability. Recent advances in Large Language Model (LLM)‑based control frameworks demonstrate strong semantic reasoning capabilities by leveraging extensive prior knowledge. However, due to the lack of online learning and over‑reliance on static priors, these works often struggle with effective exploration, leading to reduced individual potential and overall system performance. To address these limitations, we propose a Role‑Adaptive LLM‑Driven Yoked navigation algorithm RALLY. Specifically, we first develop an LLM‑driven semantic decision framework that uses structured natural language for efficient semantic communication and collaborative reasoning. Afterward, we introduce a dynamic role‑heterogeneity mechanism for adaptive role switching and personalized decision‑making. Furthermore, we propose a Role‑value Mixing Network (RMIX)‑based assignment strategy that integrates LLM offline priors with MARL online policies to enable semi‑offline training of role selection strategies. Experiments in the Multi‑Agent Particle Environment (MPE) and on a Software‑In‑The‑Loop (SITL) platform demonstrate that RALLY outperforms conventional approaches in terms of task coverage, convergence speed, and generalization, highlighting its strong potential for collaborative navigation in agentic multi‑UAV systems.
Authors: Enzhi Zhou, Yue Xiao, Ziyue Liu, Sotiris A. Tegos, Panagiotis D. Diamantoulakis, George K. Karagiannidis
Abstract: With the rapid development of aerial infrastructure, unmanned aerial vehicles (UAVs) that function as aerial base stations (ABSs) extend terrestrial network services into the sky, enabling on‑demand connectivity and enhancing emergency communication capabilities in cellular networks by leveraging the flexibility and mobility of UAVs. In such a UAV‑assisted network, this paper investigates position‑based beamforming between ABSs and ground users (GUs). To mitigate inter‑cell interference, we propose a novel fluid aerial network that leverages ABS rotation to increase multi‑cell capacity and overall network efficiency. Specifically, considering the line‑of‑sight channel model, the spatial beamforming weights are determined by the orientation angles of the GUs. In this direction, we examine the beamforming gain of a two‑dimensional multiple‑input multiple‑output (MIMO) array at various ground positions, revealing that ABS rotation significantly affects multi‑user channel correlation and inter‑cell interference. Based on these findings, we propose an alternative low‑complexity algorithm to design the optimal rotation angle for ABSs, aiming to reduce inter‑cell interference and thus maximize the sum rate of multi‑cell systems. In simulations, exhaustive search serves as a benchmark to validate the optimization performance of the proposed sequential ABS rotation scheme. Moreover, simulation results demonstrate that, in interference‑limited regions, the proposed ABS rotation paradigm can significantly reduce inter‑cell interference in terrestrial networks and improve the multi‑cell sum rate by approximately 10% compared to fixed‑direction ABSs without rotation.
Authors: Hongxing Peng, Lide Chen, Hui Zhu, Yan Chen
Abstract: Object detection in Unmanned Aerial Vehicle (UAV) imagery is fundamentally challenged by a prevalence of small, densely packed, and occluded objects within cluttered backgrounds. Conventional detectors struggle with this domain, as they rely on hand‑crafted components like pre‑defined anchors and heuristic‑based Non‑Maximum Suppression (NMS), creating a well‑known performance bottleneck in dense scenes. Even recent end‑to‑end frameworks have not been purpose‑built to overcome these specific aerial challenges, resulting in a persistent performance gap. To bridge this gap, we introduce HEDS‑DETR, a holistically enhanced real‑time Detection Transformer tailored for aerial scenes. Our framework features three key innovations. First, we propose a novel High‑Frequency Enhanced Semantics Network (HFESNet) backbone, which yields highly discriminative features by preserving critical high‑frequency details alongside robust semantic context. Second, our Efficient Small Object Pyramid (ESOP) counteracts information loss by efficiently fusing high‑resolution features, significantly boosting small object detection. Finally, we enhance decoder stability and localization precision with two synergistic components: Selective Query Recollection (SQR) and Geometry‑Aware Positional Encoding (GAPE), which stabilize optimization and provide explicit spatial priors for dense object arrangements. On the VisDrone dataset, HEDS‑DETR achieves a +3.8% AP and +5.1% AP50 gain over its baseline while reducing parameters by 4M and maintaining real‑time speeds. This demonstrates a highly competitive accuracy‑efficiency balance, especially for detecting dense and small objects in aerial scenes.
Authors: Can Cui, Ziye Jia, Chao Dong, Qihui Wu
Abstract: Unmanned aerial vehicles (UAVs) are recognized as a promising candidate for multi‑access edge computing (MEC) in future sixth‑generation communication networks. However, aerial eavesdropping UAVs (EUAVs) pose a significant security threat to data offloading. In this paper, we investigate a robust MEC scenario with multiple service UAVs (SUAVs) facing potential eavesdropping from an EUAV, in which random parameters such as task complexities are considered, as they arise in practical applications. In detail, the problem is formulated to optimize the deployment positions of SUAVs, the connection relationships between ground users (GUs) and SUAVs, and the offloading ratios. With the uncertain task complexities, the corresponding chance constraints are constructed under the uncertainty set, which are difficult to handle directly. Therefore, we first optimize the pre‑deployment of SUAVs by the K‑means algorithm. Then, the distributionally robust optimization method is employed, and the conditional value at risk is utilized to transform the chance constraints into convex forms, which can be solved via convex toolkits. Finally, the simulation results show that, when uncertainties are taken into account, just 5% more energy is consumed compared with the ideal circumstance, which verifies the robustness of the proposed algorithms.
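The abstract mentions using the conditional value at risk (CVaR) to turn chance constraints into convex forms. As a minimal sketch of the underlying quantity (not the paper's distributionally robust formulation), the code below evaluates the empirical CVaR of sampled losses through the standard Rockafellar‑Uryasev form, CVaR_alpha(L) = min over t of t + E[max(L - t, 0)] / (1 - alpha). Requiring CVaR_alpha(L) <= b is a convex, conservative surrogate for the chance constraint P(L <= b) >= alpha. The function name and the sample data are assumptions.

```python
def empirical_cvar(losses, alpha):
    """Empirical CVaR at level alpha via the Rockafellar-Uryasev objective.

    The objective is convex and piecewise linear in t with breakpoints at the
    samples, so it is enough to evaluate it at each sample value.
    """
    n = len(losses)

    def objective(t):
        excess = sum(max(loss - t, 0.0) for loss in losses)
        return t + excess / ((1.0 - alpha) * n)

    return min(objective(t) for t in losses)

# Hypothetical loss samples: the worst 20% are 9 and 10, whose mean is 9.5.
samples = [float(v) for v in range(1, 11)]
cvar_80 = empirical_cvar(samples, alpha=0.8)
```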
Authors: Reza Ahmadvand, Sarah Safura Sharif, Yaser Mike Banad
Abstract: Recent advances in multi‑agent systems manipulation have demonstrated a rising demand for the implementation of multi‑UAV systems in urban areas, which are always subjected to the presence of static and dynamic obstacles. Inspired by the collective behavior of tilapia fish and pigeons, the focus of the presented research is on the introduction of a nature‑inspired collision‑free formation control for a multi‑UAV system, considering the obstacle avoidance maneuvers. The developed framework in this study utilizes a semi‑distributed control approach, in which, based on a probabilistic Lloyd's algorithm, a centralized guidance algorithm works for optimal positioning of the UAVs, while a distributed control approach has been used for the intervehicle collision and obstacle avoidance. Further, the presented framework has been extended to the 3D space with a novel definition of 3D maneuvers. Finally, the presented framework has been applied to multi‑UAV systems in 2D and 3D scenarios, and the obtained results demonstrated the validity of the presented method in dynamic environments with stationary and moving obstacles.
Authors: Shawon Mitra, Subhojit Sarkar, Sasthi C. Ghosh
Abstract: One primary focus of next‑generation wireless communication networks is the millimeter‑wave (mmWave) spectrum, typically considered in the 30 GHz to 300 GHz frequency range. Despite their promise of high data rates, mmWaves suffer from severe attenuation while passing through obstacles. Unmanned aerial vehicles (UAVs) have been proposed to offset this limitation on account of their additional degrees of freedom, which can be leveraged to provide line of sight (LoS) transmission paths. While some prior works have proposed analytical frameworks to compute the LoS probability for static ground users and a UAV, the same is lacking for mobile users on the ground. In this paper, we consider the popular Manhattan point line process (MPLP) to model an urban environment, within which a ground user moves with a known velocity for a small time interval along the roads. We derive an expression for the expected duration of LoS between a static UAV in the air and a mobile ground user, and validate the same through simulations. To demonstrate the efficacy of the proposed analysis, we propose a simple user association algorithm that greedily assigns the UAVs to users with the highest expected LoS time, and show that it outperforms the existing benchmark schemes that assign the users to the nearest UAVs with LoS without considering the user mobility.
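The greedy association described above can be sketched as follows. This is an illustrative reading, not the authors' algorithm: it assumes the expected LoS durations are already available as a user‑by‑UAV table, and it adds a hypothetical per‑UAV capacity limit that the abstract does not mention.

```python
def greedy_association(expected_los, capacity):
    """Assign each user to at most one UAV, taking pairs in order of
    decreasing expected LoS time and respecting a per-UAV capacity."""
    pairs = [
        (los, user, uav)
        for user, row in enumerate(expected_los)
        for uav, los in enumerate(row)
    ]
    pairs.sort(key=lambda item: -item[0])

    load = [0] * len(expected_los[0])
    assignment = {}
    for los, user, uav in pairs:
        if user in assignment or load[uav] >= capacity:
            continue
        assignment[user] = uav
        load[uav] += 1
    return assignment

# Hypothetical expected LoS times in seconds: rows are users, columns are UAVs.
expected_los = [
    [5.0, 1.0],
    [4.0, 3.0],
    [2.0, 2.5],
]
result = greedy_association(expected_los, capacity=2)
```

With a capacity of 2, users 0 and 1 both take UAV 0 and user 2 takes UAV 1; with a capacity of 1, user 2 is left unassigned because both UAVs are already full.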
Authors: Jiahui Li, Geng Sun, Xiaoyu Sun, Fang Mei, Jingjing Wang, Xiangwang Hou, Daxin Tian, Victor C. M. Leung
Abstract: Low‑altitude wireless networks (LAWNs) have garnered significant attention in the forthcoming 6G networks. In LAWNs, satellites with wide coverage and unmanned aerial vehicles (UAVs) with flexible mobility can complement each other to form integrated satellite‑UAV networks, providing ubiquitous and high‑speed connectivity for low‑altitude operations. However, the higher line‑of‑sight probability in low‑altitude airspace increases transmission security concerns. In this work, we present a collaborative beamforming‑based physical layer security scheme for LAWNs. We introduce the fundamental aspects of integrated satellite‑UAV networks, physical layer security, UAV swarms, and collaborative beamforming for LAWN applications. Following this, we highlight several opportunities for collaborative UAV swarm secure applications enabled by satellite networks, including achieving physical layer security in scenarios involving data dissemination, data relay, eavesdropper collusion, and imperfect eavesdropper information. Next, we detail two case studies: a secure relay system and a two‑way aerial secure communication framework specifically designed for LAWN environments. Simulation results demonstrate that these physical layer security schemes are effective and beneficial for secure low‑altitude wireless communications. A short practicality analysis shows that the proposed method is applicable to LAWN scenarios. Finally, we discuss current challenges and future research directions for enhancing security in LAWNs.
Authors: Geng Sun, Mingzhe Fan, Lei Zhang, Hongyang Pan, Jiahui Li, Chuang Zhang, Linyao Li, Changyuan Zhao, Chau Yuen
Abstract: Wireless communication systems face challenges in meeting the demand for higher data rates and reliable connectivity in complex environments. Stacked intelligent metasurfaces (SIMs) have emerged as a promising technology for advanced wave‑domain signal processing, where mobile SIMs can outperform fixed counterparts. In this paper, we propose a novel unmanned aerial vehicle (UAV)‑mounted SIM (UAV‑SIM) assisted communication system within low‑altitude economy (LAE) networks, where UAVs act as both cache‑enabled base stations and mobile SIM carriers to enhance uplink transmissions. To maximize network capacity, we formulate a UAV‑SIM‑based joint optimization problem (USBJOP) that integrates user association, UAV‑SIM three‑dimensional positioning, and multi‑layer SIM phase shift design. Due to the non‑convexity and NP‑hardness of USBJOP, we decompose it into three subproblems, which are the association between UAV‑SIMs and users optimization problem (AUUOP), the UAV location optimization problem (ULOP), and the UAV‑SIM phase shifts optimization problem (USPSOP). Then, we solve them through an alternating optimization strategy. Specifically, AUUOP and ULOP are transformed into convex forms solvable via the CVX tool, while USPSOP is addressed by a generative artificial intelligence (GAI)‑based hybrid optimization algorithm. Simulation results show that the proposed approach achieves approximately 1.5 times higher network capacity compared with suboptimal schemes, effectively mitigates multi‑user interference with increasing SIM layers and meta‑atoms, and reduces runtime by 10% while maintaining solution quality, thereby demonstrating its practicality for real‑world deployments.
Authors: Kunwei Lv, Zhiren Xiao, Hang Ren, Ping Lan
Abstract: The rapid proliferation of unmanned aerial vehicles (UAVs) has highlighted the importance of robust and efficient object detection in diverse aerial scenarios. Detecting small objects under complex conditions, however, remains a significant challenge. To address this, we present DGE‑YOLO, an enhanced YOLO‑based detection framework designed to effectively fuse multi‑modal information. We introduce a dual‑branch architecture for modality‑specific feature extraction, enabling the model to process both infrared and visible images. To further enrich semantic representation, we propose an Efficient Multi‑scale Attention (EMA) mechanism that enhances feature learning across spatial scales. Additionally, we replace the conventional neck with a Gather‑and‑Distribute (GD) module to mitigate information loss during feature aggregation. Extensive experiments on the Drone Vehicle dataset demonstrate that DGE‑YOLO achieves superior performance over state‑of‑the‑art methods, validating its effectiveness in multi‑modal UAV object detection tasks.
Authors: Nohgyeom Ha, Horim Lee, Min Jang, Gyoungdeuk Kim, Hoyong Kim, Byeongjin Park, Manos M. Tentzeris, Sangkil Kim
Abstract: This paper presents a comprehensive review and tutorial on multi‑functional metasurfaces integrated with M‑type ferrite materials for millimeter‑wave (mmWave) absorption and beam control. As wireless communication systems transition toward beyond‑5G architectures, including non‑terrestrial networks (NTNs), the demand for adaptive, low‑profile electromagnetic surfaces that can manage interference while enabling beam reconfiguration becomes increasingly critical. Conventional metasurfaces often struggle to simultaneously achieve high absorption and beamforming over wide frequency ranges due to intrinsic material and structural limitations. This paper reviews the state‑of‑the‑art in metasurface design for dual‑functionality, particularly those combining frequency‑selective magnetic materials with periodic surface lattices, to enable passive, compact, and reconfigurable reflectors and absorbers. Special emphasis is placed on the role of M‑type ferrites in enhancing absorption via ferromagnetic resonance, and on the use of surface‑wave trapping mechanisms to achieve narrowband and broadband functionality. A case study of a ferrite‑based hybrid "reflectsorber" (reflectorarray + absorber) is presented to demonstrate key design concepts, analytical models, and application scenarios relevant to satellite, UAV, and NTN ground station deployments. Future directions for low‑loss, tunable, and scalable metasurfaces in next‑generation wireless infrastructures are also discussed.
Authors: Kamran Shafafi, Manuel Ricardo, Rui Campos
Abstract: Unmanned Aerial Vehicles (UAVs) offer a promising solution for enhancing wireless connectivity and Quality of Service (QoS) in urban environments, acting as aerial Wi‑Fi access points or cellular base stations. Their flexibility and rapid deployment capabilities make them suitable for addressing infrastructure gaps and traffic surges. However, optimizing UAV positions to maintain Line of Sight (LoS) links with ground User Equipment (UEs) remains challenging in obstacle‑dense urban scenarios. This paper proposes VTOPA, a Vision‑Aided Traffic‑ and Obstacle‑Aware Positioning Algorithm that autonomously extracts environmental information ‑‑ such as obstacles and UE locations ‑‑ via computer vision and optimizes UAV positioning accordingly. The algorithm prioritizes LoS connectivity and dynamically adapts to user traffic demands in real time. Evaluated through simulations in ns‑3, VTOPA achieves up to a 50% increase in aggregate throughput and a 50% reduction in delay, without compromising fairness, outperforming benchmark approaches in obstacle‑rich environments.
Authors: Hossein B. Jond, Logan Beaver, Martin Jiroušek, Naiemeh Ahmadlou, Veli Bakırcıoğlu, Martin Saska
Abstract: Optimal collision‑free formation control of the unmanned aerial vehicle (UAV) is a challenge. The state‑of‑the‑art optimal control approaches often rely on numerical methods sensitive to initial guesses. This paper presents an innovative collision‑free finite‑time formation control scheme for multiple UAVs leveraging the differential flatness of the UAV dynamics, eliminating the need for numerical methods. We formulate a finite‑time optimal control problem to plan a formation trajectory for feasible initial states. This optimal control problem in formation trajectory planning involves a collective performance index to meet the formation requirements to achieve relative positions and velocity consensus. It is solved by applying Pontryagin's principle. Subsequently, a collision‑constrained regulating problem is addressed to ensure collision‑free tracking of the planned formation trajectory. The tracking problem incorporates a directionally aware collision avoidance strategy that prioritizes avoiding UAVs in the forward path and relative approach. It assigns lower priority to those on the sides with an oblique relative approach, disregarding UAVs behind and not in the relative approach. The high‑fidelity simulation results validate the effectiveness of the proposed control scheme.
Authors: Kien Nguyen, Clinton Fookes, Sridha Sridharan, Huy Nguyen, Feng Liu, Xiaoming Liu, Arun Ross, Dana Michalski, Tamás Endrei, Ivan DeAndres-Tame, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zijing Gong, Yuhao Wang, Xuehu Liu, Pingping Zhang, Md Rashidunnabi, Hugo Proença, Kailash A. Hambarde, Saeid Rezaei
Abstract: Person re‑identification (ReID) across aerial and ground vantage points has become crucial for large‑scale surveillance and public safety applications. Although significant progress has been made in ground‑only scenarios, bridging the aerial‑ground domain gap remains a formidable challenge due to extreme viewpoint differences, scale variations, and occlusions. Building upon the achievements of the AG‑ReID 2023 Challenge, this paper introduces the AG‑VPReID 2025 Challenge ‑ the first large‑scale video‑based competition focused on high‑altitude (80‑120m) aerial‑ground ReID. Constructed on the new AG‑VPReID dataset with 3,027 identities, over 13,500 tracklets, and approximately 3.7 million frames captured from UAVs, CCTV, and wearable cameras, the challenge featured four international teams. These teams developed solutions ranging from multi‑stream architectures to transformer‑based temporal reasoning and physics‑informed modeling. The leading approach, X‑TFCLIP from UAM, attained 72.28% Rank‑1 accuracy in the aerial‑to‑ground ReID setting and 70.77% in the ground‑to‑aerial ReID setting, surpassing existing baselines while highlighting the dataset's complexity. For additional details, please refer to the official website at https://agvpreid25.github.io.
Authors: Sijie He, Ziye Jia, Qiuming Zhu, Fuhui Zhou, Qihui Wu
Abstract: Owing to their scalability and portability, low‑altitude intelligent networks (LAINs) are essential in various fields such as surveillance and disaster rescue. However, in LAINs, unmanned aerial vehicles (UAVs) are characterized by a distributed topology and highly dynamic mobility, and are vulnerable to security threats, which may degrade the routing performance for data transmission. Hence, ensuring the routing stability and security of LAINs is a challenge. In this paper, we focus on the routing process in LAINs with multiple UAV clusters and propose a blockchain‑enabled zero‑trust architecture to manage the joining and exiting of UAVs. Furthermore, we formulate the routing problem to minimize the end‑to‑end (E2E) delay, which is an integer linear programming problem and intractable to solve. Therefore, considering the distributed nature of LAINs, we reformulate the routing problem into a decentralized partially observable Markov decision process. With the proposed soft hierarchical experience replay buffer, a multi‑agent double deep Q‑network based adaptive routing algorithm is designed. Finally, simulations are conducted, and numerical results show that the total E2E delay of the proposed mechanism is 22.38% lower than that of the benchmark on average.
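The routing algorithm above builds on the double deep Q-network. As a minimal sketch of that one ingredient only, the snippet below computes the standard double DQN bootstrap target: the online network picks the next action and the target network evaluates it. The function name, the plain-list Q-values, and the idea of using a negative per-hop delay as the reward are illustrative assumptions; the abstract does not specify the reward design, and the soft hierarchical experience replay buffer is not modelled here.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard double DQN target for one transition.

    next_q_online / next_q_target: per-action Q-value estimates for the next
    state from the online and target networks (plain lists in this sketch).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


def vanilla_dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Plain DQN target, shown only for contrast (max over the target network)."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)
```

With a hypothetical reward of -1.0, `next_q_online = [0.2, 0.9, 0.5]`, `next_q_target = [1.0, 0.4, 2.0]` and `gamma = 0.9`, the double DQN target bootstraps from action 1 (value 0.4), whereas the plain DQN target bootstraps from the largest target-network value (2.0). Decoupling selection from evaluation in this way is what reduces overestimation.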
Authors: Fabio Paonessa, Lorenzo Ciorba, Giuseppe Addamo, Paz Alonso-Arias, Barbara Caccianiga, Marco Bersanelli, Francesco Cuttaia, Cristian Franceschet, Ricardo Tanausu Genova Santos, Massimo Gervasi, Roger Hoyland, Mike Jones, Carlos Hugo Lopez-Caraballo, Mauro Lumia, Michele Maris, Aniello Mennella, Gianluca Morgante, Oscar Antonio Peverini, Sabrina Realini, Jose Alberto Rubino-Martin, Stefano Sartor, Angela Taylor, Fabrizio Villa, Mario Zannoni, Giuseppe Virone
Abstract: The Large Scale Polarization Explorer (LSPE) project, funded by the Italian Space Agency (ASI), includes the development of LSPE‑Strip, a ground‑based radio telescope for observing Cosmic Microwave Background (CMB) anisotropies. LSPE‑Strip, nearing its construction phase, will operate from the Teide Observatory in Tenerife, employing 49 coherent polarimeters at 43 GHz to deliver critical data on CMB anisotropies and 6 channels at 95 GHz as atmospheric monitor. On‑site characterization of such advanced instruments is crucial to detect possible systematic effects, such as gain fluctuations, beam distortions, and pointing errors, that can compromise performance by introducing spurious polarizations or radiation collection from unintended directions. To address these challenges, a drone‑mounted Q‑band test source for on‑site characterization of LSPE‑Strip's polarimeter array was developed. Modern Unmanned Aerial Vehicles (UAVs) offer a flexible approach for antenna pattern measurements, yet their use in high‑frequency radio astronomy is not consolidated practice. In October 2022, a UAV‑based measurement campaign was conducted with the TFGI instrument on the second QUIJOTE telescope in Tenerife, in collaboration with the Instituto de Astrofisica de Canarias. This pioneering effort aimed to validate UAV‑based beam characterization methods and assess QUIJOTE's performance under operational conditions. Preliminary results demonstrated high measurement accuracy, leveraging QUIJOTE's dual‑receiver configuration for beam validation. These findings provide valuable insights for optimizing UAV systems in preparation for LSPE‑Strip's future characterization.
Authors: Xinxin Sun, Peter Chang
Abstract: Accurate image alignment is essential for monitoring crack evolution in structural health monitoring (SHM), particularly under real‑world conditions involving perspective distortion, occlusion, and low contrast. However, traditional feature detectors such as SIFT and SURF, which rely on Gaussian‑based scale spaces, tend to suppress high‑frequency edges, making them unsuitable for thin crack localization. Lightweight binary alternatives like ORB and BRISK, while computationally efficient, often suffer from poor keypoint repeatability on textured or shadowed surfaces. This study presents a physics‑informed alignment framework that adapts the open KAZE architecture to SHM‑specific challenges. By utilizing nonlinear anisotropic diffusion to construct a crack‑preserving scale space, and integrating RANSAC‑based homography estimation, the framework enables accurate geometric correction without the need for training, parameter tuning, or prior calibration. The method is validated on time‑lapse images of masonry and concrete acquired via handheld smartphone under varied field conditions, including shadow interference, cropping, oblique viewing angles, and surface clutter. Compared to classical detectors, the proposed framework reduces crack area and spine length errors by up to 70 percent and 90 percent, respectively, while maintaining sub‑5 percent alignment error in key metrics. Unsupervised, interpretable, and computationally lightweight, this approach supports scalable deployment via UAVs and mobile platforms. By tailoring nonlinear scale‑space modeling to SHM image alignment, this work offers a robust and physically grounded alternative to conventional techniques for tracking real‑world crack evolution.
Authors: Pritam Dash, Ethan Chan, Nathan P. Lawrence, Karthik Pattabiraman
Abstract: Unmanned Aerial Vehicles (UAVs) depend on onboard sensors for perception, navigation, and control. However, these sensors are susceptible to physical attacks, such as GPS spoofing, that can corrupt state estimates and lead to unsafe behavior. While reinforcement learning (RL) offers adaptive control capabilities, existing safe RL methods are ineffective against such attacks. We present ARMOR (Adaptive Robust Manipulation‑Optimized State Representations), an attack‑resilient, model‑free RL controller that enables robust UAV operation under adversarial sensor manipulation. Instead of relying on raw sensor observations, ARMOR learns a robust latent representation of the UAV's physical state via a two‑stage training framework. In the first stage, a teacher encoder, trained with privileged attack information, generates attack‑aware latent states for RL policy training. In the second stage, a student encoder is trained via supervised learning to approximate the teacher's latent states using only historical sensor data, enabling real‑world deployment without privileged information. Our experiments show that ARMOR outperforms conventional methods, ensuring UAV safety. Additionally, ARMOR improves generalization to unseen attacks and reduces training cost by eliminating the need for iterative adversarial training.
Authors: Nouf Almesafri, Hector Figueiredo, Miguel Arana-Catania
Abstract: This study investigates the performance of the two most relevant computer vision deep learning architectures, Convolutional Neural Network and Vision Transformer, for event‑based cameras. These cameras capture scene changes, unlike traditional frame‑based cameras, which capture static images, and are particularly suited for dynamic environments such as UAVs and autonomous vehicles. The deep learning models studied in this work are ResNet34 and ViT B16, fine‑tuned on the GEN1 event‑based dataset. The research evaluates and compares these models under both standard conditions and in the presence of simulated noise. Initial evaluations on the clean GEN1 dataset reveal that ResNet34 and ViT B16 achieve accuracies of 88% and 86%, respectively, with ResNet34 showing a slight advantage in classification accuracy. However, the ViT B16 model demonstrates notable robustness, particularly given its pre‑training on a smaller dataset. Although this study focuses on ground‑based vehicle classification, the methodologies and findings hold significant promise for adaptation to UAV contexts, including aerial object classification and event‑based vision systems for aviation‑related tasks.
Authors: Deepak Kumar Panda, Weisi Guo
Abstract: The growing integration of UAVs into civilian airspace underscores the need for resilient and intelligent intrusion detection systems (IDS), as traditional anomaly detection methods often fail to identify novel threats. A common approach treats unfamiliar attacks as out‑of‑distribution (OOD) samples; however, this leaves systems vulnerable when mitigation is inadequate. Moreover, conventional OOD detectors struggle to distinguish stealthy adversarial attacks from genuine OOD events. This paper introduces a conditional generative adversarial network (cGAN)‑based framework for crafting stealthy adversarial attacks that evade IDS mechanisms. We first design a robust multi‑class IDS classifier trained on benign UAV telemetry and known cyber‑attacks, including Denial of Service (DoS), false data injection (FDI), man‑in‑the‑middle (MiTM), and replay attacks. Using this classifier, our cGAN perturbs known attacks to generate adversarial samples that misclassify as benign while retaining statistical resemblance to OOD distributions. These adversarial samples are iteratively refined to achieve high stealth and success rates. To detect such perturbations, we implement a conditional variational autoencoder (CVAE), leveraging negative log‑likelihood to separate adversarial inputs from authentic OOD samples. Comparative evaluation shows that CVAE‑based regret scores significantly outperform traditional Mahalanobis distance‑based detectors in identifying stealthy adversarial threats. Our findings emphasize the importance of advanced probabilistic modeling to strengthen IDS capabilities against adaptive, generative‑model‑based cyber intrusions.
Authors: Deepak Kumar Panda, Adolfo Perrusquia, Weisi Guo
Abstract: Reinforcement learning (RL) policies deployed in safety‑critical systems, such as unmanned aerial vehicle (UAV) navigation in dynamic airspace, are vulnerable to out‑of‑distribution (OOD) adversarial attacks in the observation space. These attacks induce distributional shifts that significantly degrade value estimation, leading to unsafe or suboptimal decision making and rendering the existing policy fragile. To address this vulnerability, we propose an antifragile RL framework designed to adapt against a curriculum of incremental adversarial perturbations. The framework introduces a simulated attacker that incrementally increases the strength of observation‑space perturbations, which enables the RL agent to adapt and generalize across a wider range of OOD observations and anticipate previously unseen attacks. We begin with a theoretical characterization of fragility, formally defining catastrophic forgetting as a monotonic divergence in value function distributions with increasing perturbation strength. Building on this, we define antifragility as the boundedness of such value shifts and derive adaptation conditions under which forgetting is stabilized. Our method enforces these bounds through iterative expert‑guided critic alignment using Wasserstein distance minimization across incrementally perturbed observations. We empirically evaluate the approach in a UAV deconfliction scenario involving dynamic 3D obstacles. Results show that the antifragile policy consistently outperforms standard and robust RL baselines when subjected to both projected gradient descent (PGD) and GPS spoofing attacks, achieving up to 15% higher cumulative reward and over 30% fewer conflict events. These findings demonstrate the practical and theoretical viability of antifragile reinforcement learning for secure and resilient decision‑making in environments with evolving threat scenarios.
Authors: Deepak Kumar Panda, Weisi Guo
Abstract: Autonomous UAV navigation using reinforcement learning (RL) is vulnerable to adversarial attacks that manipulate sensor inputs, potentially leading to unsafe behavior and mission failure. Although robust RL methods provide partial protection, they often struggle to generalize to unseen or out‑of‑distribution (OOD) attacks due to their reliance on fixed perturbation settings. To address this limitation, we propose a meta‑policy switching framework in which a meta‑level policy dynamically selects among multiple robust policies to counter unknown adversarial shifts. At the core of this framework lies a discounted Thompson sampling (DTS) mechanism that formulates policy selection as a multi‑armed bandit problem, thereby minimizing value distribution shifts via self‑induced adversarial observations. We first construct a diverse ensemble of action‑robust policies trained under varying perturbation intensities. The DTS‑based meta‑policy then adaptively selects among these policies online, optimizing resilience against self‑induced, piecewise‑stationary attacks. Theoretical analysis shows that the DTS mechanism minimizes expected regret, ensuring adaptive robustness to OOD attacks and exhibiting emergent antifragile behavior under uncertainty. Extensive simulations in complex 3D obstacle environments under both white‑box (Projected Gradient Descent) and black‑box (GPS spoofing) attacks demonstrate significantly improved navigation efficiency and higher conflict‑free trajectory rates compared to standard robust and vanilla RL baselines, highlighting the practical security and dependability benefits of the proposed approach.
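To make the bandit view of policy selection concrete, here is a minimal sketch of discounted Thompson sampling with Beta posteriors, where each arm stands for one pre-trained robust policy. The class name, the discount value, the uniform prior, and the assumption that each episode yields a reward in [0, 1] are all illustrative; the abstract does not give the paper's reward signal or hyperparameters.

```python
import random


class DiscountedThompsonSampler:
    """Discounted Thompson sampling over a fixed set of arms (policies)."""

    def __init__(self, n_arms, discount=0.95, prior=1.0, seed=None):
        self.discount = discount          # < 1 forgets old evidence over time
        self.prior = prior                # Beta(prior, prior) pseudo-counts
        self.successes = [0.0] * n_arms   # discounted reward mass per arm
        self.failures = [0.0] * n_arms    # discounted (1 - reward) mass per arm
        self.rng = random.Random(seed)

    def select(self):
        """Sample one value per arm from its Beta posterior; play the argmax."""
        samples = [
            self.rng.betavariate(s + self.prior, f + self.prior)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        """Discount every arm's counts, then credit the played arm.

        reward is assumed to lie in [0, 1].
        """
        for k in range(len(self.successes)):
            self.successes[k] *= self.discount
            self.failures[k] *= self.discount
        self.successes[arm] += reward
        self.failures[arm] += 1.0 - reward
```

A typical loop calls `arm = sampler.select()`, runs the chosen policy for one episode, and then calls `sampler.update(arm, reward)`. Because all counts decay at every step, arms that have not been played recently drift back toward the prior and get re-explored, which is what lets the selector track piecewise-stationary changes in which policy performs best.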
Authors: Ritvik Agarwal, Behnoushsadat Hatami, Alvika Gautam, Parikshit Maini
Abstract: We consider an online variant of the fuel‑constrained UAV routing problem with a ground‑based mobile refueling station (FCURP‑MRS), where targets incur unknown fuel costs. We develop a two‑phase solution: an offline heuristic‑based planner computes initial UAV and UGV paths, and a novel online planning algorithm that dynamically adjusts rendezvous points based on real‑time fuel consumption during target processing. Preliminary Gazebo simulations demonstrate the feasibility of our approach in maintaining UAV‑UGV path validity, ensuring mission completion. Link to video: https://youtu.be/EmpVj‑fjqNY
Authors: Hamza Chakraa, François Guérin, Edouard Leclercq, Dimitri Lefebvre
Abstract: This study addresses the optimisation of task allocation for Unmanned Aerial Vehicles (UAVs) within industrial monitoring missions. The proposed methodology integrates a Genetic Algorithm (GA) with a 2‑Opt local search technique to obtain a high‑quality solution. Our approach was experimentally validated in an industrial zone to demonstrate its efficacy in real‑world scenarios. A Hardware‑in‑the‑loop (HIL) simulator for the UAV team is also introduced. Moreover, the correlation between the theoretical cost function and the actual battery consumption and time of flight is analysed in depth. Results show that the costs considered in the optimisation part of the problem closely correlate with real‑world data, confirming the practicality of the proposed approach.
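The 2-Opt step named above can be sketched as follows for a single closed tour. This is a generic textbook version, not the paper's implementation: the Euclidean cost, the fixed start node, and the function names are assumptions, and the genetic algorithm that would supply the initial tours is omitted.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given index order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Repeatedly reverse tour segments while doing so shortens the tour.

    The first node stays fixed (e.g. a depot); returns (tour, length).
    """
    best = list(tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] replaces two edges with two others.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len
```

For the four corners of a unit square visited in the crossing order `[(0, 0), (1, 1), (1, 0), (0, 1)]`, a single segment reversal removes the two diagonal edges and leaves the perimeter tour of length 4. In a hybrid GA, a step like this would usually be applied to offspring after crossover and mutation.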
Authors: Tian Liu, Han Liu, Boyang Li, Long Chen, Kai Huang
Abstract: Unmanned Aerial Vehicles (UAVs) are limited by their onboard energy. Refinement of the navigation strategy directly affects both the flight velocity and the trajectory based on the adjustment of key parameters in the UAV pipeline, thus reducing energy consumption. However, existing techniques tend to adopt static and conservative strategies in dynamic scenarios, leading to inefficient energy reduction. Dynamically adjusting the navigation strategy requires overcoming challenges including task pipeline interdependencies, environment‑strategy correlations, and parameter selection. To solve these problems, this paper proposes a method to dynamically adjust the navigation strategy of UAVs by analyzing their dynamic characteristics and the temporal characteristics of the autonomous navigation pipeline, thereby reducing UAV energy consumption in response to environmental changes. We compare our method with the baseline through hardware‑in‑the‑loop (HIL) simulation and real‑world experiments, showing that our method achieves 3.2X and 2.6X improvements in mission time and 2.4X and 1.6X improvements in energy, respectively.
Authors: Mohammad Taghi Dabiri, Mazen Hasna, Saif Al-Kuwari, Khalid Qaraqe
Abstract: This paper develops a comprehensive analytical framework for modeling and performance evaluation of unmanned aerial vehicles (UAVs)‑to‑ground quantum communication links, incorporating key physical impairments such as beam divergence, pointing errors at both transmitter and receiver, atmospheric attenuation, turbulence‑induced fading, narrow field‑of‑view (FoV) filtering, and background photon noise. To overcome the limitations of conventional wide‑beam assumptions, we introduce a grid‑based approximation for photon capture probability that remains accurate under tightly focused beams. Analytical expressions are derived for the quantum key generation rate and quantum bit error rate (QBER), enabling fast and reliable system‑level evaluation. Our results reveal that secure quantum key distribution (QKD) over UAV‑based free‑space optical (FSO) links requires beam waists below 10 cm and sub‑milliradian tracking precision to achieve Mbps‑level key rates and QBER below 10^‑3. Additionally, we highlight the critical role of receiver FoV in balancing background noise rejection and misalignment tolerance, and propose adaptive FoV tuning strategies under varying illumination and alignment conditions. The proposed framework provides a tractable and accurate tool for the design, optimization, and deployment of next‑generation airborne quantum communication systems.
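To illustrate why a low QBER matters for key generation, the sketch below evaluates the well-known asymptotic secret-key fraction for ideal BB84, 1 - 2 h(Q), where h is the binary entropy and Q the QBER. This is a textbook bound used purely for illustration. The abstract does not state which protocol or key-rate expression the paper derives, so the choice of BB84 and the function names are assumptions.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bb84_secret_fraction(qber):
    """Asymptotic secret bits per sifted bit for ideal BB84, clipped at zero."""
    return max(0.0, 1.0 - 2.0 * binary_entropy(qber))
```

Under this bound the secret fraction falls to zero at a QBER of roughly 11%, while a QBER of 10^-3 leaves most sifted bits usable. The secret-key rate would then be this fraction multiplied by the sifted-key rate, which in a link like the one analysed depends on photon capture probability and background noise.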
Authors: Jingwen Wei
Abstract: The growing use of mobile robots in sectors such as automotive, agriculture, and rescue operations reflects progress in robotics and autonomy. In unmanned aerial vehicles (UAVs), most research emphasizes visual SLAM, sensor fusion, and path planning. However, applying UAVs to search and rescue missions in disaster zones remains underexplored, especially for autonomous navigation.
This report develops methods for real‑time and secure UAV maneuvering in complex 3D environments, crucial during forest fires. Building upon past research, it focuses on designing navigation algorithms for unfamiliar and hazardous environments, aiming to improve rescue efficiency and safety through UAV‑based early warning and rapid response.
The work unfolds in phases. First, a 2D fusion navigation strategy is explored, initially for mobile robots, enabling safe movement in dynamic settings. This sets the stage for advanced features such as adaptive obstacle handling and decision‑making enhancements. Next, a novel 3D reactive navigation strategy is introduced for collision‑free movement in forest fire simulations, addressing the unique challenges of UAV operations in such scenarios.
Finally, the report proposes a unified control approach that integrates UAVs and unmanned ground vehicles (UGVs) for coordinated rescue missions in forest environments. Each phase presents challenges, proposes control models, and validates them with mathematical and simulation‑based evidence. The study offers practical value and academic insights for improving the role of UAVs in natural disaster rescue operations.
Authors: Fangzhi Li, Zhichu Ren, Cunhua Pan, Hong Ren, Jing Jin, Qixing Wang, Jiangzhou Wang
Abstract: To empower the low‑altitude economy with high‑accuracy sensing and high‑rate communication, this paper proposes a cooperative integrated sensing and communication (ISAC) framework for aerial‑ground networks. In the proposed system, the ground base stations (BSs) cooperatively serve the unmanned aerial vehicles (UAVs), which are equipped for either joint communication and sensing or sensing‑only operations. The BSs employ coordinated beamforming to simultaneously transmit communication and sensing signals, while the UAVs execute their missions. To maximize the weighted sum rate under the sensing signal‑to‑interference‑plus‑noise ratio (SINR) constraints, we jointly optimize the transmit beamforming, receive filtering, and UAV trajectory. The resulting non‑convex problem is solved using an alternating optimization framework incorporating semidefinite relaxation (SDR) and successive convex approximation (SCA). Simulation results demonstrate that the proposed joint design achieves higher communication throughput while ensuring required sensing robustness. Additionally, the sensing SINR threshold and the UAV altitude have a significant impact on the trajectory design, highlighting the necessity of adaptive deployment strategies in practical applications.
Authors: Genís Castillo Gómez-Raya, Álmos Veres-Vitályos, Filip Lemic, Pablo Royo, Mario Montagud, Sergi Fernández, Sergi Abadal, Xavier Costa-Pérez
Abstract: The increasing miniaturization of Unmanned Aerial Vehicles (UAVs) has expanded their deployment potential to indoor and hard‑to‑reach areas. However, this trend introduces distinct challenges, particularly in terms of flight dynamics and power consumption, which limit the UAVs' autonomy and mission capabilities. This paper presents a novel approach to overcoming these limitations by integrating Neural 3D Reconstruction (N3DR) with small UAV systems for fine‑grained 3‑Dimensional (3D) digital reconstruction of small static objects. Specifically, we design, implement, and evaluate an N3DR‑based pipeline that leverages advanced models, i.e., Instant‑ngp, Nerfacto, and Splatfacto, to improve the quality of 3D reconstructions using images of the object captured by a fleet of small UAVs. We assess the performance of the considered models using various imagery and pointcloud metrics, comparing them against the baseline Structure from Motion (SfM) algorithm. The experimental results demonstrate that the N3DR‑enhanced pipeline significantly improves reconstruction quality, making it feasible for small UAVs to support high‑precision 3D mapping and anomaly detection in constrained environments. In more general terms, our results highlight the potential of N3DR in advancing the capabilities of miniaturized UAV systems.
Authors: Kaixuan Li, Kan Yu, Dingyou Ma, Yujia Zhao, Xiaowu Liu, Qixun Zhang, Zhiyong Feng
Abstract: This paper investigates the potential of movable antenna (MA)‑enabled micro‑mobility to replace UAV‑enabled macro‑mobility for enhancing physical layer security (PLS) in air‑to‑ground communications. While UAV trajectory optimization offers high flexibility and Line‑of‑Sight (LoS) advantages, it suffers from significant energy consumption, latency, and complex trajectory optimization. Conversely, MA technology provides fine‑grained spatial reconfiguration (antenna positioning within a confined area) with ultra‑low energy overhead and millisecond‑scale response, enabling real‑time channel manipulation and covert beam steering. To systematically compare these paradigms, we establish a dual‑scale mobility framework where a UAV‑mounted uniform linear array (ULA) serves as a base station transmitting confidential information to a legitimate user (Bob) in the presence of an eavesdropper (Eve). We formulate non‑convex average secrecy rate (ASR) maximization problems for both schemes: 1) MA‑based micro‑mobility: Jointly optimizing antenna positions and beamforming (BF) vectors under positioning constraints; 2) UAV‑based macro‑mobility: Jointly optimizing the UAV's trajectory and BF vectors under kinematic constraints. Extensive simulations reveal distinct operational regimes: MA micro‑mobility demonstrates significant ASR advantages in low‑transmit‑power scenarios or under antenna constraints due to its energy‑efficient spatial control. Conversely, UAV macro‑mobility excels under resource‑sufficient conditions (higher power, larger antenna arrays) by leveraging global mobility for optimal positioning. The findings highlight the complementary strengths of both approaches, suggesting hybrid micro‑macro mobility as a promising direction for balancing security, energy efficiency, and deployment complexity in future wireless networks.
Authors: Qiangsheng Gao, Ka Ho Cheng, Li Qiu, Zijun Gong
Abstract: Relative localization in the near‑field scenario is critically important for unmanned vehicle (UxV) applications. Although the 2D relative localization problem has been widely studied for unmanned ground vehicles (UGVs), the problem in 3D scenarios for unmanned aerial vehicles (UAVs) involves more uncertainties and remains to be investigated. Inspired by the phenomenon that animals can achieve swarm behaviors solely based on individual perception of relative information, this study proposes an infrastructure‑free 3D relative localization framework that relies exclusively on onboard ultra‑wideband (UWB) sensors. Leveraging 2D relative positioning research, we conducted feasibility analysis, system modeling, simulations, performance evaluation, and field tests using UWB sensors. The key contributions of this work include: derivation of the Cramér‑Rao lower bound (CRLB) and geometric dilution of precision (GDOP) for near‑field scenarios; development of two localization algorithms ‑‑ one based on Euclidean distance matrix (EDM) and another employing maximum likelihood estimation (MLE); comprehensive performance comparison and computational complexity analysis against state‑of‑the‑art methods; simulation studies and field experiments; and a novel sensor deployment strategy inspired by animal behavior, enabling single‑sensor implementation within the proposed framework for UxV applications. The theoretical, simulation, and experimental results demonstrate strong generalizability to other 3D near‑field localization tasks, with significant potential for a cost‑effective cross‑platform UxV collaborative system.
Authors: Ellis Duncalfe, Milena Radenkovic
Abstract: The detonation of an improvised nuclear device (IND) in an urban area would cause catastrophic damage, followed by hazardous radioactive fallout. Timely dissemination of radiation data is crucial for evacuation and casualty reduction. However, conventional communication infrastructure is likely to be severely disrupted. This study designs and builds a pseudorealistic, geospatially and temporally dynamic post‑nuclear event (PNE) scenario using the Opportunistic Network Environment (ONE) simulator. It integrates radiation sensing by emergency responders, unmanned aerial vehicles (UAVs), and civilian devices as dynamic nodes within Delay‑Tolerant Networks (DTNs). The performance of two DTN routing protocols, Epidemic and PRoPHET, was evaluated across multiple PNE phases. Both protocols achieve high message delivery rates, with PRoPHET exhibiting lower network overhead but higher latency. Findings demonstrate the potential of DTN‑based solutions to support emergency response and evacuation safety by ensuring critical radiation data propagation despite severe infrastructure damage.
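The difference between the two protocols can be illustrated with the delivery-predictability bookkeeping that PRoPHET adds on top of Epidemic's copy-to-everyone behaviour. The sketch below follows the update rules of the original PRoPHET proposal as an assumption; the constants are illustrative values, not parameters reported by this study, and the function names are hypothetical.

```python
# Illustrative constants (assumed, not taken from the study).
P_INIT = 0.75   # predictability boost on a direct encounter
BETA = 0.25     # weight of the transitive update
GAMMA = 0.98    # ageing factor per elapsed time unit


def on_encounter(p_old: float, p_init: float = P_INIT) -> float:
    """Raise predictability for a node that was just met."""
    return p_old + (1.0 - p_old) * p_init


def age(p_old: float, elapsed_units: float, gamma: float = GAMMA) -> float:
    """Decay predictability for a node not seen for `elapsed_units` time units."""
    return p_old * gamma ** elapsed_units


def transitive(p_ac_old: float, p_ab: float, p_bc: float, beta: float = BETA) -> float:
    """Update predictability for C when A meets B and B often meets C."""
    return p_ac_old + (1.0 - p_ac_old) * p_ab * p_bc * beta


def should_forward(p_self_dest: float, p_peer_dest: float) -> bool:
    """Forward a message only if the peer is more likely to reach the destination.

    Epidemic routing would instead copy every message the peer lacks.
    """
    return p_peer_dest > p_self_dest
```

The selective forwarding rule is consistent with the trade-off the abstract reports: fewer message copies in the network, at the price of waiting for a better-placed carrier.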
Authors: Marios-Nektarios Stamatopoulos, Shridhar Velhal, Avijit Banerjee, George Nikolakopoulos
Abstract: This paper presents a novel high‑level task planning and optimal coordination framework for autonomous masonry construction, using a team of heterogeneous aerial robotic workers, consisting of agents with separate skills for brick placement and mortar application. This introduces new challenges in scheduling and coordination, particularly due to the mortar curing deadline required for structural bonding and ensuring the safety constraints among UAVs operating in parallel. To address this, an automated pipeline generates the wall construction plan based on the available bricks while identifying static structural dependencies and potential conflicts for safe operation. The proposed framework optimizes UAV task allocation and execution timing by incorporating dynamically coupled precedence deadline constraints that account for the curing process and static structural dependency constraints, while enforcing spatio‑temporal constraints to prevent collisions and ensure safety. The primary objective of the scheduler is to minimize the overall construction makespan while minimizing logistics, traveling time between tasks, and the curing time to maintain both adhesion quality and safe workspace separation. The effectiveness of the proposed method in achieving coordinated and time‑efficient aerial masonry construction is extensively validated through Gazebo simulated missions. The results demonstrate the framework's capability to streamline UAV operations, ensuring both structural integrity and safety during the construction process.
Authors: Jagadeswara PKV Pothuri, Aditya Bhatt, Prajit KrisshnaKumar, Manaswin Oddiraju, Souma Chowdhury
Abstract: Autonomous tracking of flying aerial objects has important civilian and defense applications, ranging from search and rescue to counter‑unmanned aerial systems (counter‑UAS). Ground‑based tracking requires setting up infrastructure, can be range‑limited, and may not be feasible in remote areas, crowded cities, or areas of dense vegetation. Vision‑based active tracking of aerial objects from another airborne vehicle, e.g., a chaser unmanned aerial vehicle (UAV), promises to fill this important gap, along with serving aerial coordination use cases. Vision‑based active tracking by a UAV entails solving two coupled problems: 1) compute‑efficient and accurate (target) object detection and target state estimation; and 2) maneuver decisions to ensure that the target remains in the field of view in future time‑steps and is favorably positioned for continued detection. As a solution to the first problem, this paper presents a novel integration of standard deep learning‑based architectures with the Kernelized Correlation Filter (KCF) to achieve compute‑efficient object detection without compromising accuracy, unlike standalone learning or filtering approaches. The proposed perception framework is validated using a lab‑scale setup. For the second problem, to obviate the linearity assumptions and background variations that limit the effectiveness of traditional controllers, we present the use of reinforcement learning to train a neuro‑controller for fast computation of velocity maneuvers. New state space, action space, and reward formulations are developed for this purpose, and training is performed in simulation using AirSim. The trained model is also tested in AirSim with respect to complex target maneuvers, and is found to outperform a baseline PID controller in terms of tracking up‑time and average distance maintained (from the target) during tracking.
Authors: Xiang Yuming, Li Sizhao, Li Rongpeng, Zhao Zhifeng, Zhang Honggang
Abstract: Multiple quadrotor unmanned aerial vehicle (UAV) systems have garnered widespread research interest and fostered many interesting applications, especially in multi‑constrained pursuit‑evasion games (MC‑PEG). The Cooperative Evasion and Formation Coverage (CEFC) task, where the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, is one of the most challenging problems in MC‑PEG, especially under communication‑limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, introduces significant high‑dimensional complications in finding a solution. In this paper, we propose a novel two‑level framework, Consensus Inference‑based Hierarchical Reinforcement Learning (CI‑HRL), which delegates target localization to a high‑level policy, while adopting a low‑level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high‑level policy, we develop a novel multi‑agent reinforcement learning module, Consensus‑oriented Multi‑Agent Communication (ConsMAC), to enable agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage an Alternative Training‑based Multi‑agent proximal policy optimization (AT‑M) and policy distillation to accomplish the low‑level control. The experimental results, including high‑fidelity software‑in‑the‑loop (SITL) simulations, validate that CI‑HRL provides a superior solution with enhanced collaborative evasion and task completion capabilities for the swarm.
Authors: Mahmoud M. Salim, Khaled M. Rabie, Ali H. Muqaibel
Abstract: In this letter, we propose an energy‑efficient design for an unmanned aerial vehicle (UAV)‑mounted reconfigurable intelligent surface (RIS) communication system with nonlinear energy harvesting (EH) and UAV jitter. A joint optimization problem is formulated to maximize the EH efficiency of the UAV‑mounted RIS by controlling the user powers, RIS phase shifts, and time‑switching factor, subject to quality of service and practical EH constraints. The problem is nonconvex and time‑coupled due to UAV angular jitter and nonlinear EH dynamics, making it intractable for conventional optimization methods. To address this, we reformulate the problem as a deep reinforcement learning (DRL) environment and develop a smoothed softmax dual deep deterministic policy gradient algorithm. The proposed method incorporates action clipping, entropy regularization, and softmax‑weighted Q‑value estimation to improve learning stability and exploration. Simulation results show that the proposed algorithm converges reliably under various UAV jitter levels and achieves an average EH efficiency of 45.07%, approaching the 53.09% upper bound of exhaustive search, and outperforming other DRL baselines.
Authors: Ming He, Peizhao Wang, Haihua Chen, Bin Sun, Hongpeng Wang
Abstract: Multiple unmanned aerial vehicles (UAVs) play a vital role in monitoring and data collection in wide area environments with harsh conditions. In most scenarios, issues such as real‑time data retrieval and real‑time UAV positioning are often disregarded, essentially neglecting the communication constraints. In this paper, we comprehensively address both the coverage of the target area and the data transmission capabilities of the flying ad hoc network (FANET). The data throughput of the network is therefore maximized by optimizing the network topology and the UAV trajectories. The resultant optimization problem is effectively solved by the proposed reinforcement learning‑based trajectory planning (RL‑TP) algorithm and the convex‑based topology optimization (C‑TOP) algorithm sequentially. The RL‑TP optimizes the UAV paths while considering the constraints of FANET. The C‑TOP maximizes the data throughput of the network while simultaneously constraining the neighbors and transmit powers of the UAVs, which is shown to be a convex problem that can be efficiently solved in polynomial time. Simulations and field experimental results show that the proposed optimization strategy can effectively plan the UAV trajectories and significantly improve the data throughput of the FANET over the adaptive local minimum spanning tree (A‑LMST) and cyclic pruning‑assisted power optimization (CPAPO) methods.
Authors: Mehmet Kaan Erol, Eyup Emre Ulku
Abstract: SDR (Software Defined Radio) provides flexible, reproducible, and longer‑lasting radio tools for military and civilian wireless communications infrastructure. SDR is a radio communication system whose components are implemented as software. This study aims to establish multi‑channel wireless communication with FANET between two SDRs to share location information and examine it in a realistic test environment. We used multi‑channel token circulation as a channel access protocol and the GNU Radio platform for SDR software development. The communication‑layer structures proposed in the literature, including protocols, communication systems, and network structures, are generally tested in simulation. Simulation offers researchers fast and easy development and testing, but it has disadvantages: it isolates a product from the hardware, software, and cost effects encountered during development and from the environmental factors that affect the communication channel during testing. Another contribution of the study is to present the developed block diagrams and code in a clear and reproducible form. The developed software and block diagrams are available at github.com/knrl/uav‑in‑802.11‑gnuradio.
Authors: Akarsh K Nair, Shanik Hubert Satheesh Kumar, Deepti Gupta
Abstract: The exponential growth of Android‑based mobile IoT systems has significantly increased the susceptibility of devices to cyberattacks, particularly in smart homes, UAVs, and other connected mobile environments. This article presents a federated learning‑based intrusion detection framework called AndroIDS that leverages system call traces as a personalized and privacy‑preserving data source. Unlike conventional centralized approaches, the proposed method enables collaborative anomaly detection without sharing raw data, thus preserving user privacy across distributed nodes. A generalized system call dataset was generated to reflect realistic Android system behavior and serves as the foundation for experimentation. Extensive evaluation demonstrates the effectiveness of the FL model under both IID and non‑IID conditions, achieving accuracies of 96.46% and 92.87% and F1‑scores of 89% and 86%, respectively. These results highlight the model's robustness to data heterogeneity, with only a minor performance drop in the non‑IID case. A detailed comparison with centralized deep learning further illustrates trade‑offs in detection performance and deployment feasibility. Overall, the results validate the practical applicability of the proposed approach for secure and scalable intrusion detection in real‑world mobile IoT scenarios.
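The "collaborative detection without sharing raw data" step can be sketched as a server-side aggregation of client model parameters. The example assumes sample-size-weighted averaging in the style of FedAvg; the abstract does not name the aggregation rule AndroIDS uses, so this is an illustration only, and `fedavg` and its arguments are hypothetical names.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of per-layer parameters.

    client_weights: list (one entry per client) of lists of np.ndarray,
                    one array per model layer, with matching shapes.
    client_sizes:   number of local training samples held by each client.
    Only parameters leave the device; the system call traces stay local.
    """
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]
```

Under non-IID data the clients' local updates pull in different directions, which is one common explanation for the kind of accuracy gap the abstract reports between the IID and non-IID settings.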
Authors: Qingyang Zhang, Mohammad Dwipa Furqan, Tasfia Nutzhat, Fumio Machida, Ermeson Andrade
Abstract: Uncrewed Aerial Vehicle (UAV) computing and networking are becoming a fundamental computation infrastructure for diverse cyber‑physical application systems. UAVs can be empowered by AI on edge devices and can communicate with other UAVs and ground stations via wireless communication networks. Dynamic computation demands and heterogeneous computing resources are distributed in the system and need to be controlled to maintain the quality of services and to accomplish critical missions. With the evolution of UAV‑based systems, dependability assurance of such systems emerges as a crucial challenge. UAV‑based systems confront diverse sources of uncertainty that may threaten their dependability, such as software bugs, component failures, network disconnections, battery shortages, and disturbances from the real world. In this paper, we conduct systematic literature reviews on the dependability of UAV‑based networks and computing systems. The survey report reveals emerging research trends in this field and summarizes the literature into comprehensive categories by threat types and adopted technologies. Based on our literature reviews, we identify eight research fields that require further exploration in the future to achieve dependable UAV‑based systems.
Authors: Liu Zongzhen, Luo Hui, Wang Zhixing, Wei Yuxing, Zuo Haorui, Zhang Jianlin
Abstract: Unmanned aerial vehicle (UAV) object detection plays a vital role in applications such as environmental monitoring and urban security. To improve robustness, recent studies have explored multimodal detection by fusing visible (RGB) and infrared (IR) imagery. However, due to UAV platform motion and asynchronous imaging, spatial misalignment frequently occurs between modalities, leading to weak alignment. This introduces two major challenges: semantic inconsistency at corresponding spatial locations and modality conflict during feature fusion. Existing methods often address these issues in isolation, limiting their effectiveness. In this paper, we propose Cross‑modal Offset‑guided Dynamic Alignment and Fusion (CoDAF), a unified framework that jointly tackles both challenges in weakly aligned UAV‑based object detection. CoDAF comprises two novel modules: the Offset‑guided Semantic Alignment (OSA), which estimates attention‑based spatial offsets and uses deformable convolution guided by a shared semantic space to align features more precisely; and the Dynamic Attention‑guided Fusion Module (DAFM), which adaptively balances modality contributions through gating and refines fused features via spatial‑channel dual attention. By integrating alignment and fusion in a unified design, CoDAF enables robust UAV object detection. Experiments on standard benchmarks validate the effectiveness of our approach, with CoDAF achieving a mAP of 78.6% on the DroneVehicle dataset.
Authors: Jinbo Wen, Cheng Su, Jiawen Kang, Jiangtian Nie, Yang Zhang, Jianhang Tang, Dusit Niyato, Chau Yuen
Abstract: Low‑Altitude Economy Networks (LAENets) are emerging as a promising paradigm to support various low‑altitude services through integrated air‑ground infrastructure. To satisfy low‑latency and high‑computation demands, the integration of Unmanned Aerial Vehicles (UAVs) with Mobile Edge Computing (MEC) systems plays a vital role, which offloads computing tasks from terminal devices to nearby UAVs, enabling flexible and resilient service provisions for ground users. To promote the development of LAENets, it is important to achieve low‑carbon multi‑UAV‑assisted MEC networks. However, several challenges hinder this implementation, including the complexity of multi‑dimensional UAV modeling and the difficulty of multi‑objective coupled optimization. To this end, this paper proposes a novel Retrieval Augmented Generation (RAG)‑based Large Language Model (LLM) agent framework for model formulation. Specifically, we develop HybridRAG by combining KeywordRAG, VectorRAG, and GraphRAG, empowering LLM agents to efficiently retrieve structural information from expert databases and generate more accurate optimization problems compared with traditional RAG‑based LLM agents. After customizing carbon emission optimization problems for multi‑UAV‑assisted MEC networks, we propose a Double Regularization Diffusion‑enhanced Soft Actor‑Critic (R²DSAC) algorithm to solve the formulated multi‑objective optimization problem. The R²DSAC algorithm incorporates diffusion entropy regularization and action entropy regularization to improve the performance of the diffusion policy. Furthermore, we dynamically mask unimportant neurons in the actor network to reduce the carbon emissions associated with model training. Simulation results demonstrate the effectiveness and reliability of the proposed HybridRAG‑based LLM agent framework and the R²DSAC algorithm.
Authors: Zakria Qadir, Muhammad Bilal, Guoqiang Liu, Xiaolong Xu
Abstract: Unmanned aerial vehicles (UAVs) in a disaster‑prone environment play an important role in assisting rescue services and providing internet connectivity with the outside world. However, in such a complex environment, the selection of the optimum UAV trajectory is of utmost importance. UAV trajectory optimization deals with finding the shortest path in the minimal possible time. In this paper, a cluster optimization scheme (COS) is proposed using the Henry gas optimization (HGO) metaheuristic algorithm to identify the shortest path having minimal transportation cost and algorithm complexity. The mathematical model is designed for COS using the HGO algorithm and compared with state‑of‑the‑art metaheuristic algorithms such as particle swarm optimization (PSO), grey wolf optimization (GWO), cuckoo search algorithm (CSA), and barnacles mating optimizer (BMO). To demonstrate the robustness of the proposed model, four different scenarios are evaluated: ambient environment, constrict environment, tangled environment, and complex environment. In all the aforementioned scenarios, the HGO algorithm outperforms the existing algorithms. In particular, in the ambient environment, the HGO algorithm achieves a 39.3% reduction in transportation cost and a 16.8% reduction in computational time compared with the PSO algorithm. Hence, the HGO algorithm can be used for autonomous trajectory optimization of UAVs in smart cities.
Authors: Di Wang, Shi Li
Abstract: Estimating forest above‑ground biomass (AGB) is crucial for assessing carbon storage and supporting sustainable forest management. Quantitative Structural Model (QSM) offers a non‑destructive approach to AGB estimation through 3D tree structural reconstruction. However, current QSM methods face significant limitations, as they are primarily designed for individual trees, depend on high‑quality point cloud data from terrestrial laser scanning (TLS), and also require multiple pre‑processing steps that hinder scalability and practical deployment. This study presents a novel unified framework that enables end‑to‑end processing of large‑scale point clouds using an innovative graph‑based pipeline. The proposed approach seamlessly integrates tree segmentation, leaf‑wood separation, and 3D skeletal reconstruction through dedicated graph operations including pathing and abstracting for tree topology reasoning. Comprehensive validation was conducted on datasets with varying leaf conditions (leaf‑on and leaf‑off), spatial scales (tree‑ and plot‑level), and data sources (TLS and UAV‑based laser scanning, ULS). Experimental results demonstrate strong performance under challenging conditions, particularly in leaf‑on scenarios (~20% relative error) and low‑density ULS datasets with partial coverage (~30% relative error). These findings indicate that the proposed framework provides a robust and scalable solution for large‑scale, non‑destructive AGB estimation. It significantly reduces dependency on specialized pre‑processing tools and establishes ULS as a viable alternative to TLS. To our knowledge, this is the first method capable of enabling seamless, end‑to‑end 3D tree reconstruction at operational scales. This advancement substantially improves the feasibility of QSM‑based AGB estimation, paving the way for broader applications in forest inventory and climate change research.
Authors: Jiahao You, Ziye Jia, Chao Dong, Qihui Wu, Zhu Han
Abstract: The computation demands of the maritime Internet of Things (MIoT) have increased rapidly in recent years, and multi‑access edge computing (MEC) based on unmanned aerial vehicles (UAVs) and vessels can fulfill these MIoT requirements. However, uncertain maritime tasks pose significant challenges in the form of inefficient computation offloading and resource allocation. In this paper, we focus on maritime computation offloading and resource allocation through the cooperation of UAVs and vessels, with consideration of uncertain tasks. Specifically, we propose a cooperative MEC framework for computation offloading and resource allocation, including MIoT devices, UAVs, and vessels. Then, we formulate the optimization problem to minimize the total execution time. As for the uncertain MIoT tasks, we leverage Lyapunov optimization to tackle the unpredictable task arrivals and varying computational resource availability. By converting the long‑term constraints into short‑term constraints, we obtain a set of small‑scale optimization problems. Further, considering the heterogeneity of actions and resources of UAVs and vessels, we reformulate the small‑scale optimization problem into a Markov game (MG). Moreover, a heterogeneous‑agent soft actor‑critic is proposed to sequentially update various neural networks and effectively solve the MG problem. Finally, simulations are conducted to verify the effectiveness in addressing computational offloading and resource allocation.
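The conversion of long-term constraints into per-slot problems can be illustrated with the standard virtual-queue and drift-plus-penalty construction from Lyapunov optimization. The sketch below is a generic single-queue example under assumed inputs; the action set, the cost values, and the trade-off weight `v` are hypothetical and do not come from the paper.

```python
def update_queue(q: float, arrival: float, service: float) -> float:
    """Virtual queue update: Q(t+1) = max(Q(t) + a(t) - b(t), 0)."""
    return max(q + arrival - service, 0.0)


def choose_action(q: float, actions, v: float):
    """Per-slot drift-plus-penalty choice.

    actions: list of (cost, service) tuples available in this slot.
    Minimises v * cost - q * service; the arrival term is the same for
    every action, so it is dropped from the comparison.
    """
    return min(actions, key=lambda a: v * a[0] - q * a[1])


def run(arrivals, actions, v: float):
    """Simulate the per-slot rule and return the queue backlog after each slot."""
    q, history = 0.0, []
    for arrival in arrivals:
        _cost, service = choose_action(q, actions, v)
        q = update_queue(q, arrival, service)
        history.append(q)
    return history
```

A larger `v` favours low per-slot cost and tolerates a longer backlog, while a smaller `v` drains the queue more aggressively; keeping the backlog bounded is what enforces the original long-term constraint.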
Authors: Wanzhe Wang, Jianqiu Peng, Menghao Hu, Weihuang Zhong, Tong Zhang, Shuai Wang, Yixin Zhang, Mingjie Shao, Wanli Ni
Abstract: Hyper‑parameters are critical to the performance of communication algorithms. However, current hyper‑parameter optimization approaches for the Warm‑Start Particles Swarm Optimization with Crossover and Mutation (WS‑PSO‑CM) algorithm, designed for radio map‑enabled unmanned aerial vehicle (UAV) trajectory and communication, are primarily heuristic‑based, exhibiting low levels of automation and leaving room for performance improvement. In this paper, we design a Large Language Model (LLM) agent for automatic hyper‑parameter tuning, where an iterative framework and the Model Context Protocol (MCP) are applied. In particular, the LLM agent is first set up via a profile, which specifies the boundary of hyper‑parameters, task objective, terminal condition, conservative or aggressive strategy of optimizing hyper‑parameters, and LLM configurations. Then, the LLM agent iteratively invokes the WS‑PSO‑CM algorithm for exploration. Finally, the LLM agent exits the loop based on the terminal condition and returns an optimized set of hyper‑parameters. Our experimental results show that the minimal sum‑rate achieved with hyper‑parameters generated by our LLM agent is significantly higher than that achieved with both human heuristics and random generation methods. This indicates that an LLM agent with PSO and WS‑PSO‑CM algorithm knowledge is useful in seeking high‑performance hyper‑parameters.
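The profile, iterate, and exit structure described above can be sketched as a plain loop. In this illustration the LLM call and the WS-PSO-CM run are replaced by two caller-supplied functions, `propose` and `evaluate`, so nothing here reflects the authors' actual agent, prompts, or MCP tooling; the profile keys are likewise hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
History = List[Tuple[Params, float]]


def clip_to_bounds(params: Params, bounds: Dict[str, Tuple[float, float]]) -> Params:
    """Keep every proposed hyper-parameter inside the profile's boundary."""
    return {k: min(max(params[k], lo), hi) for k, (lo, hi) in bounds.items()}


def tune(profile: dict,
         propose: Callable[[History], Params],
         evaluate: Callable[[Params], float]):
    """Iterative tuning loop.

    propose:  stand-in for the LLM agent; sees the history of (params, score).
    evaluate: stand-in for one run of the optimizer being tuned; higher is better.
    Stops when the target score is reached or the iteration budget is spent.
    """
    history: History = []
    best_params, best_score = None, float("-inf")
    for _ in range(profile["max_iters"]):
        candidate = clip_to_bounds(propose(history), profile["bounds"])
        score = evaluate(candidate)
        history.append((candidate, score))
        if score > best_score:
            best_params, best_score = candidate, score
        if best_score >= profile["target_score"]:
            break  # terminal condition from the profile
    return best_params, best_score, history
```

Passing the full history to `propose` is the design choice that lets a language model, or any other proposer, reason about earlier trials before suggesting the next set of values.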
Authors: Suman Raj, Swapnil Padhi, Ruchi Bhoot, Prince Modi, Yogesh Simmhan
Abstract: Autonomous navigation by drones using onboard sensors combined with machine learning and computer vision algorithms is impacting a number of domains, including agriculture, logistics, and disaster management. In this paper, we examine the use of drones for assisting visually impaired people (VIPs) in navigating through outdoor urban environments. Specifically, we present a perception‑based path planning system for local planning around the neighborhood of the VIP, integrated with a global planner based on GPS and maps for coarse planning. We represent the problem using a geometric formulation and propose a multi‑DNN‑based framework for obstacle avoidance of the UAV as well as the VIP. Our evaluations, conducted on a drone‑human system in a university campus environment, verify the feasibility of our algorithms in three scenarios: when the VIP walks on a footpath, near parked vehicles, and in a crowded street.
Authors: Rongchang Lu, Tianduo Luo, Yunzhi Jiang, Conghan Yue, Pei Yang, Guibao Liu, Changyang Gu
Abstract: Image restoration faces challenges including ineffective feature fusion, computational bottlenecks and inefficient diffusion processes. To address these, we propose DiffRWKVIR, a novel framework unifying Test‑Time Training (TTT) with efficient diffusion. Our approach introduces three key innovations: (1) Omni‑Scale 2D State Evolution extends RWKV's location‑dependent parameterization to hierarchical multi‑directional 2D scanning, enabling global contextual awareness with linear complexity O(L); (2) Chunk‑Optimized Flash Processing accelerates intra‑chunk parallelism by 3.2x via contiguous chunk processing (O(LCd) complexity), reducing sequential dependencies and computational overhead; (3) Prior‑Guided Efficient Diffusion extracts a compact Image Prior Representation (IPR) in only 5‑20 steps, proving 45% faster training/inference than DiffIR while solving computational inefficiency in denoising. Evaluated across super‑resolution and inpainting benchmarks (Set5, Set14, BSD100, Urban100, Places365), DiffRWKVIR outperforms SwinIR, HAT, and MambaIR/v2 in PSNR, SSIM, LPIPS, and efficiency metrics. Our method establishes a new paradigm for adaptive, high‑efficiency image restoration with optimized hardware utilization.
Authors: Zhuoyue Tan, Boyong He, Yuxiang Ji, Liaoni Wu
Abstract: This paper presents VisLanding, a monocular 3D perception‑based framework for safe UAV (Unmanned Aerial Vehicle) landing. Addressing the core challenge of autonomous UAV landing in complex and unknown environments, this study innovatively leverages the depth‑normal synergy prediction capabilities of the Metric3D V2 model to construct an end‑to‑end safe landing zones (SLZ) estimation framework. By introducing a safe zone segmentation branch, we transform the landing zone estimation task into a binary semantic segmentation problem. The model is fine‑tuned and annotated using the WildUAV dataset from a UAV perspective, while a cross‑domain evaluation dataset is constructed to validate the model's robustness. Experimental results demonstrate that VisLanding significantly enhances the accuracy of safe zone identification through a depth‑normal joint optimization mechanism, while retaining the zero‑shot generalization advantages of Metric3D V2. The proposed method exhibits superior generalization and robustness in cross‑domain testing compared to other approaches. Furthermore, it enables the estimation of landing zone area by integrating predicted depth and normal information, providing critical decision‑making support for practical applications.
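How predicted depth and normals can be combined into a landing zone area is sketched below. The abstract does not give the paper's exact procedure, so this is an assumption-laden illustration: it assumes a pinhole camera with known intrinsics, z-depth in metres, and unit surface normals expressed in the camera frame, and it uses the geometric relation that a pixel with viewing ray r = (x, y, 1) covers a surface patch of area z² / (fx · fy · |n · r|). All names are hypothetical.

```python
import numpy as np


def surface_area(depth, normals, mask, fx, fy, cx, cy):
    """Approximate metric area of the masked surface.

    depth:   (H, W) z-depth in metres.
    normals: (H, W, 3) unit surface normals in the camera frame.
    mask:    (H, W) boolean safe-zone segmentation.
    fx, fy, cx, cy: pinhole intrinsics in pixels.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Viewing ray through each pixel centre, scaled so that its z component is 1.
    rays = np.stack(
        [(u - cx) / fx, (v - cy) / fy, np.ones((h, w), dtype=float)], axis=-1
    )
    # Foreshortening term |n . r|; clamp to avoid blow-up at grazing angles.
    cos_term = np.maximum(np.abs(np.sum(normals * rays, axis=-1)), 1e-6)
    pixel_area = depth ** 2 / (fx * fy * cos_term)
    return float(np.sum(pixel_area[mask]))
```

For a flat surface facing the camera the foreshortening term equals 1 and the formula reduces to counting pixels scaled by z² / (fx · fy); tilted surfaces cover more area per pixel, which is where the normal prediction matters.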
Authors: Thien Nhan Vo
Abstract: This paper presents the development of a sliding mode controller using the backstepping approach. The controller is employed to synthesize tracking errors and Lyapunov functions. A novel state‑space representation is formulated by incorporating the dynamics of the quadrotor and accounting for non‑holonomic constraints. The proposed sliding mode controller effectively addresses system nonlinearities and improves tracking of predefined trajectories. Simulation results are presented graphically to demonstrate the controller's performance.
Authors: Vasiliki Balaska, Ioannis Tsampikos Papapetros, Katerina Maria Oikonomou, Loukas Bampis, Antonios Gasteratos
Abstract: The mining sector increasingly adopts digital tools to improve operational efficiency, safety, and data‑driven decision‑making. One of the key challenges remains the reliable acquisition of high‑resolution, geo‑referenced spatial information to support core activities such as extraction planning and on‑site monitoring. This work presents an integrated system architecture that combines UAV‑based sensing, LiDAR terrain modeling, and deep learning‑based object detection to generate spatially accurate information for open‑pit mining environments. The proposed pipeline includes geo‑referencing, 3D reconstruction, and object localization, enabling structured spatial outputs to be integrated into an industrial digital twin platform. Unlike traditional static surveying methods, the system offers higher coverage and automation potential, with modular components suitable for deployment in real‑world industrial contexts. While the current implementation operates in post‑flight batch mode, it lays the foundation for real‑time extensions. The system contributes to the development of AI‑enhanced remote sensing in mining by demonstrating a scalable and field‑validated geospatial data workflow that supports situational awareness and infrastructure safety.
Authors: Kamran Shafafi, Alaa Awad Abdellatif, Manuel Ricardo, Rui Campos
Abstract: Unmanned Aerial Vehicles (UAVs) have emerged as a key enabler for next‑generation wireless networks due to their on‑demand deployment, high mobility, and ability to provide Line‑of‑Sight (LoS) connectivity. These features make UAVs particularly well‑suited for dynamic and mission‑critical applications such as intelligent transportation systems and emergency communications. However, effectively positioning multiple UAVs in real‑time to meet non‑uniform, time‑varying traffic demands remains a significant challenge, especially when aiming to optimize network throughput and resource utilization. In this paper, we propose an Efficient Multi‑UAV Traffic‑Aware Deployment (EMTAD) Algorithm, a scalable and adaptive framework that dynamically adjusts UAV placements based on real‑time user locations and spatial traffic distribution. In contrast to existing methods, EMTAD jointly optimizes UAV positioning and minimizes the number of deployed UAVs, ensuring efficient UE‑UAV association while satisfying the traffic demand of users. Simulation results demonstrate that EMTAD significantly improves network performance while reducing deployment overhead by minimizing the number of UAVs required in dynamic and traffic‑aware environments.
Authors: Hongjiang Lei, Congke Jiang, Ki-Hong Park, Mohamed A. Aboulhassan, Sen Zhou, Gaofeng Pan
Abstract: Integrated communication and sensing, which can make full use of the limited spectrum resources to perform communication and sensing tasks simultaneously, is an up‑and‑coming technology in wireless communication networks. In this work, we investigate the secrecy performance of an uncrewed aerial vehicle (UAV)‑assisted secure integrated communication, sensing, and computing system, where the UAV sends radar signals to locate and disrupt potential eavesdroppers while providing offload services to ground users (GUs). Considering the constraints of UAV maximum speed, transmit power, and propulsion energy, as well as secure offloading, data transmission, and computation time, the total energy consumption of GUs is minimized by jointly optimizing user offloading ratio, user scheduling strategy, transmit beamforming, and UAV trajectory. An efficient iterative optimization algorithm is proposed to solve the non‑convex optimization problem caused by tightly coupled dependent variables. In particular, the original optimization problem is decomposed into four sub‑optimization problems, and the non‑convex sub‑problems are transformed into approximately convex forms via successive convex approximation. Then, all sub‑problems are solved successively by using the block coordinate descent technique. Numerical results demonstrate the convergence and validate the effectiveness of the proposed algorithm.
Authors: Fen Liu, Shenghai Yuan, Thien-Minh Nguyen, Rong Su
Abstract: Commercial UAVs are an emerging security threat as they are capable of carrying hazardous payloads or disrupting air traffic. To counter UAVs, we introduce an autonomous 3D target encirclement and interception strategy. Unlike traditional ground‑guided systems, this strategy employs autonomous drones to track and engage non‑cooperative hostile UAVs, which is effective in non‑line‑of‑sight conditions, GPS denial, and radar jamming, where conventional detection and neutralization from ground guidance fail. Using two noisy real‑time distances measured by drones, guardian drones estimate the relative position from their own to the target using observation and velocity compensation methods, based on anti‑synchronization (AS) and an X‑Y circular motion combined with vertical jitter. An encirclement control mechanism is proposed to enable UAVs to adaptively transition from encircling and protecting a target to encircling and monitoring a hostile target. Upon breaching a warning threshold, the UAVs may even employ a suicide attack to neutralize the hostile target. We validate this strategy through real‑world UAV experiments and simulated analysis in MATLAB, demonstrating its effectiveness in detecting, encircling, and intercepting hostile drones. More details: https://youtu.be/5eHW56lPVto.
Authors: Evgenii Vinogradov, Abdul Saboor, Zhuangzhuang Cui, Aymen Fakhreddine
Abstract: In this paper, we present a spatially consistent A2G channel model based on probabilistic LOS/NLOS segmentation to parameterize the deterministic path loss and stochastic shadow fading model. Motivated by the limitations of existing Unmanned Aerial Vehicle (UAV) channel models that overlook spatial correlation, our approach reproduces LOS/NLOS transitions along ground user trajectories in urban environments. This model captures environment‑specific obstructions by means of azimuth and elevation‑dependent LOS probabilities without requiring a full detailed 3D representation of the surroundings. We validate our framework against a geometry‑based simulator by evaluating it across various urban settings. The results demonstrate its accuracy and computational efficiency, enabling further realistic derivations of path loss and shadow fading models and thorough outage analysis.
Authors: Boran Wang, Ziye Jia, Can Cui, Qihui Wu
Abstract: With the development of low earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs), the space‑air‑ground integrated network (SAGIN) has become a major trend in next‑generation networks. However, due to the instability of heterogeneous communication and the time‑varying characteristics of SAGIN, it is challenging to meet the remote Internet of Things (IoT) demands for data collection and offloading. In this paper, we investigate a two‑phase hierarchical data uplink model in SAGIN. Specifically, UAVs optimize trajectories to enable efficient data collection from IoT devices, and then they transmit the data to LEO satellites with computing capabilities for further processing. The problem is formulated to minimize the total energy consumption for IoT devices, UAVs, and LEO satellites. Since the problem is in the form of mixed‑integer nonlinear programming and intractable to solve directly, we decompose it into two phases. In the IoT‑UAV phase, we design the algorithm to jointly optimize the IoT pairing, power allocation, and UAV trajectories. Considering the highly dynamic characteristics of LEO satellites, a real‑time LEO satellite selection mechanism combined with the Satellite Tool Kit is proposed in the UAV‑LEO phase. Finally, simulation results show the effectiveness of the proposed algorithms, with about 10% less energy consumption compared with the benchmark algorithm.
Authors: Yuqi Ping, Tianhao Liang, Huahao Ding, Guangyu Lei, Junwei Wu, Xuan Zou, Kuan Shi, Rui Shao, Chiya Zhang, Weizheng Zhang, Weijie Yuan, Tingting Zhang
Abstract: Recent breakthroughs in multimodal large language models (MLLMs) have endowed AI systems with unified perception, reasoning and natural‑language interaction across text, image and video streams. Meanwhile, Unmanned Aerial Vehicle (UAV) swarms are increasingly deployed in dynamic, safety‑critical missions that demand rapid situational understanding and autonomous adaptation. This paper explores potential solutions for integrating MLLMs with UAV swarms to enhance intelligence and adaptability across diverse tasks. Specifically, we first outline the fundamental architectures and functions of UAVs and MLLMs. Then, we analyze how MLLMs can enhance UAV system performance in terms of target detection, autonomous navigation, and multi‑agent coordination, while exploring solutions for integrating MLLMs into UAV systems. Next, we propose a practical case study focused on forest firefighting. To fully reveal the capabilities of the proposed framework, human‑machine interaction, swarm task planning, fire assessment, and task execution are investigated. Finally, we discuss the challenges and future research directions for MLLM‑enabled UAV swarms. An experiment illustration video can be found online at https://youtu.be/zwnB9ZSa5A4.
Authors: Yuxiang Wang, Xuecheng Bai, Boyu Hu, Chuanzhi Xu, Haodong Chen, Vera Chung, Tingxue Li, Xiaoming Chen
Abstract: Small object detection in UAV imagery is crucial for applications such as search‑and‑rescue, traffic monitoring, and environmental surveillance, but it is hampered by tiny object size, low signal‑to‑noise ratios, and limited feature extraction. Existing multi‑scale fusion methods help, but add computational burden and blur fine details, making small object detection in cluttered scenes difficult. To overcome these challenges, we propose the Multi‑scale Global‑detail Feature Integration Strategy (MGDFIS), a unified fusion framework that tightly couples global context with local detail to boost detection performance while maintaining efficiency. MGDFIS comprises three synergistic modules: the FusionLock‑TSS Attention Module, which marries token‑statistics self‑attention with DynamicTanh normalization to highlight spectral and spatial cues at minimal cost; the Global‑detail Integration Module, which fuses multi‑scale context via directional convolution and parallel attention while preserving subtle shape and texture variations; and the Dynamic Pixel Attention Module, which generates pixel‑wise weighting maps to rebalance uneven foreground and background distributions and sharpen responses to true object regions. Extensive experiments on the VisDrone benchmark demonstrate that MGDFIS consistently outperforms state‑of‑the‑art methods across diverse backbone architectures and detection frameworks, achieving superior precision and recall with low inference time. By striking an optimal balance between accuracy and resource usage, MGDFIS provides a practical solution for small‑object detection on resource‑constrained UAV platforms.
Authors: Shakil Ahmed, Muhammad Kamran Saeed, Ashfaq Khokhar
Abstract: Quantum communication is poised to become a foundational element of next‑generation networking, offering transformative capabilities in security, entanglement‑based connectivity, and computational offloading. However, the classical OSI model, designed for deterministic and error‑tolerant systems, cannot support quantum‑specific phenomena such as coherence fragility, probabilistic entanglement, and the no‑cloning theorem. This paper provides a comprehensive survey and proposes an architectural redesign of the OSI model for quantum networks in the context of 7G. We introduce a Quantum‑Converged OSI stack by extending the classical model with Layer 0 (Quantum Substrate) and Layer 8 (Cognitive Intent), supporting entanglement, teleportation, and semantic orchestration via LLMs and QML. Each layer is redefined to incorporate quantum mechanisms such as enhanced MAC protocols, fidelity‑aware routing, and twin‑based applications. This survey consolidates over 150 research works from IEEE, ACM, MDPI, arXiv, and Web of Science (2018‑2025), classifying them by OSI layer, enabling technologies such as QKD, QEC, PQC, and RIS, and use cases such as satellite QKD, UAV swarms, and quantum IoT. A taxonomy of cross‑layer enablers, such as hybrid quantum‑classical control, metadata‑driven orchestration, and blockchain‑integrated quantum trust, is provided, along with simulation tools including NetSquid, QuNetSim, and QuISP. We present several domain‑specific applications, including quantum healthcare telemetry, entangled vehicular networks, and satellite mesh overlays. An evaluation framework is proposed based on entropy throughput, coherence latency, and entanglement fidelity. Key future directions include programmable quantum stacks, digital twins, and AI‑defined QNet agents, laying the groundwork for a scalable, intelligent, and quantum‑compliant OSI framework for 7G and beyond.
Authors: Andrew P. Berg, Qian Zhang, Mia Y. Wang
Abstract: As unmanned aerial vehicles (UAVs) become increasingly prevalent in both consumer and defense applications, the need for reliable, modality‑specific classification systems grows in urgency. This paper addresses the challenge of data scarcity in UAV audio classification by expanding on prior work through the integration of pre‑trained deep learning models, parameter‑efficient fine‑tuning (PEFT) strategies, and targeted data augmentation techniques. Using a custom dataset of 3,100 UAV audio clips (15,500 seconds) spanning 31 distinct drone types, we evaluate the performance of transformer‑based and convolutional neural network (CNN) architectures under various fine‑tuning configurations. Experiments were conducted with five‑fold cross‑validation, assessing accuracy, training efficiency, and robustness. Results show that full fine‑tuning of the EfficientNet‑B0 model with three augmentations achieved the highest validation accuracy (95.95), outperforming both the custom CNN and transformer‑based models like AST. These findings suggest that combining lightweight architectures with PEFT and well‑chosen augmentations provides an effective strategy for UAV audio classification on limited datasets. Future work will extend this framework to multimodal UAV classification using visual and radar telemetry.
Authors: Yuhang Zhang, Haosheng Yu, Jiaping Xiao, Mir Feroskhan
Abstract: Vision‑and‑language navigation (VLN) is a long‑standing challenge in autonomous robotics, aiming to empower agents with the ability to follow human instructions while navigating complex environments. Two key bottlenecks remain in this field: generalization to out‑of‑distribution environments and reliance on fixed discrete action spaces. To address these challenges, we propose Vision‑Language Fly (VLFly), a framework tailored for Unmanned Aerial Vehicles (UAVs) to execute language‑guided flight. Without the requirement for localization or active ranging sensors, VLFly outputs continuous velocity commands purely from egocentric observations captured by an onboard monocular camera. The VLFly integrates three modules: an instruction encoder based on a large language model (LLM) that reformulates high‑level language into structured prompts, a goal retriever powered by a vision‑language model (VLM) that matches these prompts to goal images via vision‑language similarity, and a waypoint planner that generates executable trajectories for real‑time UAV control. VLFly is evaluated across diverse simulation environments without additional fine‑tuning and consistently outperforms all baselines. Moreover, real‑world VLN tasks in indoor and outdoor environments under direct and indirect instructions demonstrate that VLFly achieves robust open‑vocabulary goal understanding and generalized navigation capabilities, even in the presence of abstract language input.
Authors: Sharad Shrestha, Mohammed Ababneh, Satyajayant Misra, Henry M. Cathey, Roopa Vishwanathan, Matt Jansen, Jinhong Choi, Rakesh Bobba, Yeongjin Jang
Abstract: In the last decade, the use of Unmanned Aircraft Systems (UAS) and Unmanned Aircraft Vehicles (UAV) in communication, defense, and transportation has grown rapidly. The application of UAS will continue to increase rapidly. This has led researchers to examine security vulnerabilities in various facets of UAS infrastructure and UAVs, which form a part of the UAS, to reinforce these critical systems. This survey summarizes the cybersecurity vulnerabilities in several phases of UAV deployment, the likelihood of each vulnerability's occurrence, the impact of attacks, and mitigation strategies that could be applied. We go beyond the state‑of‑the‑art by taking a comprehensive approach to enhancing UAS security, analyzing both UAS‑specific and non‑UAS‑specific mitigation strategies that are applicable within the UAS domain to define the lessons learned. We also present relevant cybersecurity standards and their recommendations in the UAS context. Despite the significant literature on UAS security and the relevance of earlier cyberphysical and networked systems security approaches, which we identify in the survey, we find several critical research gaps that require further investigation. These form part of our discussions and recommendations for future exploration by our research community.
Authors: Xiangkai Zhang, Xiang Zhou, Mao Chen, Yuchen Lu, Xu Yang, Zhiyong Liu
Abstract: Absolute localization, aiming to determine an agent's location with respect to a global reference, is crucial for unmanned aerial vehicles (UAVs) in various applications, but it becomes challenging when global navigation satellite system (GNSS) signals are unavailable. Vision‑based absolute localization methods, which locate the current view of the UAV in a reference satellite map to estimate its position, have become popular in GNSS‑denied scenarios. However, existing methods mostly rely on traditional and low‑level image matching, suffering from difficulties due to significant differences introduced by cross‑source discrepancies and temporal variations. To overcome these limitations, in this paper, we introduce a hierarchical cross‑source image matching method designed for UAV absolute localization, which integrates a semantic‑aware and structure‑constrained coarse matching module with a lightweight fine‑grained matching module. Specifically, in the coarse matching module, semantic features derived from a vision foundation model first establish region‑level correspondences under semantic and structural constraints. Then, the fine‑grained matching module is applied to extract fine features and establish pixel‑level correspondences. Building upon this, a UAV absolute visual localization pipeline is constructed without any reliance on relative localization techniques, mainly by employing an image retrieval module before the proposed hierarchical image matching modules. Experimental evaluations on public benchmark datasets and a newly introduced CS‑UAV dataset demonstrate superior accuracy and robustness of the proposed method under various challenging conditions, confirming its effectiveness.
Authors: Huan Lin, Chenguang Zhu, Lianghui Ding, Lin Wang, Feng Yang
Abstract: Unmanned aerial vehicle (UAV) swarm networks leverage resilient algorithms to restore connectivity after communication network splits. However, existing graph learning‑based approaches face over‑aggregation and non‑convergence problems caused by uneven and sparse topology under massive damage. In this paper, we propose a novel Multi‑Level Damage‑Aware (MLDA) Graph Learning algorithm to generate recovery solutions, explicitly utilizing information about destroyed nodes to guide the recovery process. The algorithm first employs a Multi‑Branch Damage Attention (MBDA) module as a pre‑processing step, focusing attention on the critical relationships between remaining nodes and destroyed nodes in the global topology. By expanding the multi‑hop neighbor receptive fields of nodes to the damaged areas, it effectively mitigates the initial sparsity and unevenness before graph learning commences. Second, a Dilated Graph Convolution Network (DGCN) is designed to perform convolution on the MBDA‑processed bipartite graphs between remaining and destroyed nodes. The DGCN utilizes a specialized bipartite graph convolution operation to aggregate features and incorporates a residual‑connected architecture to extend depth, directly generating the target locations for recovery. We theoretically prove the convergence of the proposed algorithm, and its computational complexity is acceptable. Simulation results show that the proposed algorithm can guarantee connectivity restoration with excellent scalability, while significantly shortening the recovery time and improving the topology uniformity after recovery.
Authors: Haoran Peng, Ying-Jun Angela Zhang
Abstract: This research focuses on optimizing multi‑UAV systems with dual objectives: maximizing service coverage as the primary goal while extending battery lifetime as the secondary objective. We propose a Graph Attention‑based Decentralized Actor‑Critic (GADC) to optimize the dual objectives. The proposed approach leverages a graph attention network to process UAVs' limited local observation and reduce the dimension of the environment states. Subsequently, an actor‑double‑critic network is developed to manage dual policies for joint objective optimization. The proposed GADC uses a Kullback‑Leibler (KL) divergence factor to balance the tradeoff between coverage performance and battery lifetime in the multi‑UAV system. We assess the scalability and efficiency of GADC through comprehensive benchmarking against state‑of‑the‑art methods, considering both theory and experimental aspects. Extensive testing in both ideal settings and NVIDIA Sionna's realistic ray tracing environment demonstrates GADC's superior performance.
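For readers unfamiliar with the Kullback‑Leibler divergence mentioned in this abstract, the following is a minimal sketch of the quantity for two discrete action distributions. The abstract does not say how GADC forms or applies its KL factor, so the function name, the `eps` guard, and the example distributions are illustrative assumptions only.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps (an assumption
    made here purely to avoid division by zero in the toy example).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


if __name__ == "__main__":
    # Hypothetical action distributions of a coverage-oriented and a
    # battery-oriented policy over the same two actions.
    coverage_policy = [0.5, 0.5]
    battery_policy = [0.9, 0.1]
    print(kl_divergence(coverage_policy, battery_policy))
```

The divergence is zero when the two policies agree and grows as they diverge, which is why it is a natural measure of how far two objectives pull a policy apart.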
Authors: Antonio Calagna, Yenchia Yu, Paolo Giaccone, Carla Fabiana Chiasserini
Abstract: Stateful migration has emerged as the dominant technology to support microservice mobility at the network edge while ensuring a satisfying experience to mobile end users. This work addresses two pivotal challenges, namely, the implementation and the orchestration of the migration process. We first introduce a novel framework that efficiently implements stateful migration and effectively orchestrates the migration process by fulfilling both network and application KPI targets. Through experimental validation using realistic microservices, we then show that our solution (i) greatly improves migration performance, yielding up to 77% decrease of the migration downtime with respect to the state of the art, and (ii) successfully addresses the strict user QoE requirements of critical scenarios featuring latency‑sensitive microservices. Further, we consider two practical use cases, featuring, respectively, a UAV autopilot microservice and a multi‑object tracking task, and demonstrate how our framework outperforms current state‑of‑the‑art approaches in configuring the migration process and in meeting KPI targets.
Authors: Guiyang Luo, Jinglin Li, Qixun Zhang, Zhiyong Feng, Quan Yuan, Yijing Lin, Hui Zhang, Nan Cheng, Ping Zhang
Abstract: The low‑altitude economy (LAE) is rapidly advancing toward intelligence, connectivity, and coordination, bringing new challenges in dynamic airspace management, unmanned aerial vehicle (UAV) operation, and security management. Existing systems remain fragmented and lack effective coordination. To bridge these gaps, we propose UTICN (Ubiquitous and Trusted Intelligent Cellular‑native Network) for LAE, a unified cellular‑native architecture that integrates multi‑domain sensing, high‑precision positioning, intelligent aircraft‑to‑everything communication, dynamic airspace management, and UAV operational services. UTICN introduces key technologies such as integrated sensing and communication (ISAC), passive and active positioning, intelligent machine communication, swarm coordination, and control‑data decoupled management frameworks. We demonstrate UTICN's feasibility through two use cases, i.e., a city‑level LAE management platform and a multi‑frequency collaborative ISAC system. This work provides a fundamental reference for building a unified operational foundation and airspace management architecture for the LAE.
Authors: Yanwei Gong, Junchao Fan, Ruichen Zhang, Dusit Niyato, Yingying Yao, Xiaolin Chang
Abstract: The rapid growth of the low‑altitude economy has driven the widespread adoption of unmanned aerial vehicles (UAVs). This growing deployment presents new challenges for UAV trajectory planning in complex urban environments. However, existing studies often overlook key factors, such as urban airspace constraints and economic efficiency, which are essential in low‑altitude economy contexts. Deep reinforcement learning (DRL) is regarded as a promising solution to these issues, but its practical adoption remains limited by low learning efficiency. To overcome this limitation, we propose a novel UAV trajectory planning framework that combines DRL with large language model (LLM) reasoning to enable safe, compliant, and economically viable path planning. Experimental results demonstrate that our method significantly outperforms existing baselines across multiple metrics, including data collection rate, collision avoidance, successful landing, regulatory compliance, and energy efficiency. These results validate the effectiveness of our approach in addressing key UAV trajectory planning challenges under the constraints of low‑altitude economy networking.
Authors: Ji Hyuk Jung, Mi Yeon Hong, Ji Won Yoon
Abstract: Recently, approaches using Deep Reinforcement Learning (DRL) have been proposed for UAV navigation in complex and unknown environments. However, despite extensive research and attention, systematic studies on various security aspects have not yet been conducted. Therefore, in this paper, we study security vulnerabilities in DRL‑based navigation systems, focusing in particular on GPS spoofing attacks against the system. Many recent basic DRL‑based navigation systems fundamentally share an efficient structure. This paper presents an attack model that operates through GPS spoofing, briefly modeling the range of spoofing attacks against the EKF sensor fusion of the PX4 autopilot, and combines this with the DRL‑based system to design attack scenarios that are closer to reality. Finally, this paper experimentally demonstrates that attacks are possible both in the basic DRL system and in attack models combining the DRL system with the PX4 autopilot system.
Authors: Matteo Bordin, Madhukara S. Holla, Sakthivel Velumani, Salvatore D'Oro, Tommaso Melodia
Abstract: The application of small form‑factor, 5G‑enabled Unmanned Aerial Vehicles (UAVs) has recently gained significant interest in various aerial and Industry 4.0 applications. However, ensuring reliable, high‑throughput, and low‑latency 5G communication in aerial applications remains a critical and underexplored problem. This paper presents the 5G Aero, a compact UAV optimized for 5G connectivity, aimed at fulfilling stringent 3rd Generation Partnership Project (3GPP) requirements. We conduct a set of experiments in an indoor environment, evaluating the UAV's ability to establish high‑throughput, low‑latency communications in both Line‑of‑Sight (LoS) and Non‑Line‑of‑Sight (NLoS) conditions. Our findings demonstrate that the 5G Aero meets the required 3GPP standards for Command and Control (C2) packet latency in both LoS and NLoS conditions and for video latency in LoS communications, and that it maintains acceptable latency levels for video transmission in NLoS conditions. Additionally, we show that the 5G module installed on the UAV introduces a negligible 1% decrease in flight time, showing that 5G technologies can be integrated into commercial off‑the‑shelf UAVs with minimal impact on battery lifetime. This paper contributes to the literature by demonstrating the practical capabilities of current 5G networks to support advanced UAV operations in telecommunications, offering insights into potential enhancements and optimizations for UAV performance in 5G networks.
Authors: Ranjan Sapkota, Konstantinos I. Roumeliotis, Manoj Karkee
Abstract: Agentic UAVs represent a new frontier in autonomous aerial intelligence, integrating perception, decision‑making, memory, and collaborative planning to operate adaptively in complex, real‑world environments. Driven by recent advances in Agentic AI, these systems surpass traditional UAVs by exhibiting goal‑driven behavior, contextual reasoning, and interactive autonomy. We provide a comprehensive foundation for understanding the architectural components and enabling technologies that distinguish Agentic UAVs from traditional autonomous UAVs. Furthermore, a detailed comparative analysis highlights advancements in autonomy with AI agents, learning, and mission flexibility. This study explores seven high‑impact application domains: precision agriculture, construction & mining, disaster response, environmental monitoring, infrastructure inspection, logistics, security, and wildlife conservation, illustrating the broad societal value of agentic aerial intelligence. In addition, we identify key challenges in technical constraints, regulatory limitations, and data‑model reliability, and we present emerging solutions across hardware innovation, learning architectures, and human‑AI interaction. Finally, a future roadmap is proposed, outlining pathways toward self‑evolving aerial ecosystems, system‑level collaboration, and sustainable, equitable deployments. This survey establishes a foundational framework for the future development, deployment, and governance of agentic aerial systems (Agentic UAVs) across diverse societal and industrial domains.
Authors: Yian Zhu, Ziye Jia, Lei Zhang, Yao Wu, Qiuming Zhu, Qihui Wu
Abstract: The remote identification (Remote ID) broadcast capability allows unmanned aerial vehicles (UAVs) to exchange messages, which is a pivotal technology for inter‑UAV communications. Although this capability enhances the operational visibility, low delay in Remote ID‑based communications is critical for ensuring the efficiency and timeliness of multi‑UAV operations in dynamic environments. To address this challenge, we first establish delay models for Remote ID communications by considering packet reception and collisions across both BLE 4 and Wi‑Fi protocols. Building upon these models, we formulate an optimization problem to minimize the long‑term communication delay through adaptive protocol selection. Since the delay performance varies with the UAV density, we propose an adaptive BLE/Wi‑Fi switching algorithm based on the multi‑agent deep Q‑network approach. Experimental results demonstrate that in dynamic‑density scenarios, our strategy achieves 32.1% and 37.7% lower latency compared to static BLE 4 and Wi‑Fi modes respectively.
Authors: Yuyang Zhou, Guang Cheng, Kang Du, Zihan Chen, Tian Qin, Yuyu Zhao
Abstract: The proliferation of UAVs has enabled a wide range of mission‑critical applications and is becoming a cornerstone of low‑altitude networks, supporting smart cities, emergency response, and more. However, the open wireless environment, dynamic topology, and resource constraints of UAVs expose low‑altitude networks to severe DoS threats. Traditional defense approaches, which rely on fixed configurations or centralized decision‑making, cannot effectively respond to the rapidly changing conditions in UAV swarm environments. To address these challenges, we propose a novel federated multi‑agent deep reinforcement learning (FMADRL)‑driven moving target defense (MTD) framework for proactive DoS mitigation in low‑altitude networks. Specifically, we design lightweight and coordinated MTD mechanisms, including leader switching, route mutation, and frequency hopping, to disrupt attacker efforts and enhance network resilience. The defense problem is formulated as a multi‑agent partially observable Markov decision process, capturing the uncertain nature of UAV swarms under attack. Each UAV is equipped with a policy agent that autonomously selects MTD actions based on partial observations and local experiences. By employing a policy gradient‑based algorithm, UAVs collaboratively optimize their policies via reward‑weighted aggregation. Extensive simulations demonstrate that our approach significantly outperforms state‑of‑the‑art baselines, achieving up to a 34.6% improvement in attack mitigation rate, a reduction in average recovery time of up to 94.6%, and decreases in energy consumption and defense cost by as much as 29.3% and 98.3%, respectively, under various DoS attack strategies. These results highlight the potential of intelligent, distributed defense mechanisms to protect low‑altitude networks, paving the way for reliable and scalable low‑altitude economy.
Authors: Aditya Chakravarty
Abstract: Diffusion models have shown remarkable flexibility for solving inverse problems without task‑specific retraining. However, existing approaches such as Manifold Preserving Guided Diffusion (MPGD) apply only a single gradient update per denoising step, limiting restoration fidelity and robustness, especially in embedded or out‑of‑distribution settings. In this work, we introduce a multistep optimization strategy within each denoising timestep, significantly enhancing image quality, perceptual accuracy, and generalization. Our experiments on super‑resolution and Gaussian deblurring demonstrate that increasing the number of gradient updates per step improves LPIPS and PSNR with minimal latency overhead. Notably, we validate this approach on a Jetson Orin Nano using degraded ImageNet and a UAV dataset, showing that MPGD, originally trained on face datasets, generalizes effectively to natural and aerial scenes. Our findings highlight MPGD's potential as a lightweight, plug‑and‑play restoration module for real‑time visual perception in embodied AI agents such as drones and mobile robots.
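The effect of taking several gradient updates per denoising step can be illustrated with a minimal sketch. This is not MPGD and involves no diffusion model: it only applies repeated gradient steps to a data‑fidelity term sum((a_i * x_i - y_i)^2) with a hypothetical diagonal degradation operator `a`. The function names, step size, and data are assumptions chosen for illustration.

```python
def fidelity_loss(x, y, a):
    """Squared error between the degraded estimate a*x and the measurement y."""
    return sum((ai * xi - yi) ** 2 for xi, yi, ai in zip(x, y, a))


def multistep_guidance(x, y, a, n_steps=1, lr=0.1):
    """Apply n_steps gradient updates of the fidelity loss to the estimate x.

    n_steps=1 mimics a single guidance update per denoising step;
    n_steps>1 mimics the multistep variant described in the abstract.
    """
    x = list(x)
    for _ in range(n_steps):
        grad = [2.0 * ai * (ai * xi - yi) for xi, yi, ai in zip(x, y, a)]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    x0, y, a = [0.0, 0.0], [1.0, 2.0], [1.0, 0.5]
    for k in (1, 4):
        print(k, fidelity_loss(multistep_guidance(x0, y, a, n_steps=k), y, a))
```

With a sufficiently small step size, each extra update lowers the fidelity loss further, at the cost of one more gradient evaluation per denoising step.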
Authors: Shahid Mohammad Mulla, Aryan Kanakapudi, Lakshmi Narasimhan, Anuj Tiwari
Abstract: Navigation of UAVs in unknown environments with obstacles is essential for applications in disaster response and infrastructure monitoring. However, existing obstacle avoidance algorithms, such as Artificial Potential Field (APF), are unable to generalize across environments with different obstacle configurations. Furthermore, the precise location of the final target may not be available in applications such as search and rescue, in which case approaches such as RF source seeking can be used to align towards the target location. This paper proposes a real‑time trajectory planning method, which involves real‑time adaptation of APF through a sampling‑based approach. The proposed approach utilizes only the bearing angle of the target without its precise location, and adjusts the potential field parameters according to the environment with new obstacle configurations in real time. The main contributions of the article are i) an RF source seeking algorithm to provide a bearing angle estimate using RF signal calculations based on antenna placement, and ii) a modified APF for adaptable collision avoidance in changing environments, which are evaluated separately in the simulation software Gazebo, using ROS2 for communication. Simulation results show that the RF source‑seeking algorithm achieves high accuracy, with an average angular error of just 1.48 degrees, and with this estimate, the proposed navigation algorithm improves the success rate of reaching the target by 46% and reduces the trajectory length by 1.2% compared to standard potential fields.
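A minimal 2D sketch of one artificial potential field step driven by a target bearing is given below. It uses the textbook repulsive term and a unit attractive pull along the bearing. The authors' sampling‑based parameter adaptation and RF bearing estimator are not reproduced, and the gains `k_att`, `k_rep`, the influence radius `d0`, and the step length are hypothetical values.

```python
import math


def apf_step(pos, bearing, obstacles, k_att=1.0, k_rep=1.0, d0=2.0, step=0.1):
    """Move one fixed-length step along the combined potential-field force.

    pos       -- current (x, y) position
    bearing   -- angle to the target in radians (target range is not needed)
    obstacles -- iterable of (x, y) obstacle positions
    """
    # Attractive force: constant pull along the bearing direction.
    fx = k_att * math.cos(bearing)
    fy = k_att * math.sin(bearing)
    # Repulsive force: classic APF term, active only within distance d0.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # forces cancel exactly: a local minimum of the field
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)


if __name__ == "__main__":
    print(apf_step((0.0, 0.0), 0.0, []))
    print(apf_step((0.0, 0.0), 0.0, [(0.5, 0.0)]))
```

The zero‑force branch shows the well‑known weakness of fixed‑parameter APF, local minima, which is one reason adapting the field parameters to the obstacle layout is attractive.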
Authors: Zijiang Yan, Hao Zhou, Jianhua Pei, Hina Tabassum
Abstract: Unmanned aerial vehicles (UAVs) have been widely adopted in various real‑world applications. However, the control and optimization of multi‑UAV systems remain a significant challenge, particularly in dynamic and constrained environments. This work explores the joint motion and communication control of multiple UAVs operating within integrated terrestrial and non‑terrestrial networks that include high‑altitude platform stations (HAPS). Specifically, we consider an aerial highway scenario in which UAVs must accelerate, decelerate, and change lanes to avoid collisions and maintain overall traffic flow. Different from existing studies, we propose a novel hierarchical and collaborative method based on large language models (LLMs). In our approach, an LLM deployed on the HAPS performs UAV access control, while another LLM onboard each UAV handles motion planning and control. This LLM‑based framework leverages the rich knowledge embedded in pre‑trained models to enable both high‑level strategic planning and low‑level tactical decisions. This knowledge‑driven paradigm holds great potential for the development of next‑generation 3D aerial highway systems. Experimental results demonstrate that our proposed collaborative LLM‑based method achieves higher system rewards, lower operational costs, and significantly reduced UAV collision rates compared to baseline approaches.
Authors: Kaiyuan Chen, Wanpeng Zhao, Yongxi Liu, Yuanqing Xia, Wannian Liang, Shuo Wang
Abstract: In post‑disaster scenarios, rapid and efficient delivery of medical resources is critical and challenging due to severe damage to infrastructure. To provide an optimized solution, we propose a cooperative trajectory optimization and task allocation framework leveraging unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). This study integrates a Genetic Algorithm (GA) for efficient task allocation among multiple UAVs and UGVs, and employs an informed‑RRT* (Rapidly‑exploring Random Tree Star) algorithm for collision‑free trajectory generation. Further optimization of task sequencing and path efficiency is conducted using Covariance Matrix Adaptation Evolution Strategy (CMA‑ES). Simulation experiments conducted in a realistic post‑disaster environment demonstrate that our proposed approach significantly improves the overall efficiency of medical rescue operations compared to traditional strategies. Specifically, our method reduces the total mission completion time to 26.7 minutes for a 15‑task scenario, outperforming K‑Means clustering and random allocation by over 73%. Furthermore, the framework achieves a substantial 15.1% reduction in total traveled distance after CMA‑ES optimization. The cooperative utilization of UAVs and UGVs effectively balances their complementary advantages, highlighting the system's scalability and practicality for real‑world deployment.
Authors: Kaiyuan Chen, Yuhan Suo, Shaowei Cui, Yuanqing Xia, Wannian Liang, Shuo Wang
Abstract: This paper addresses the problem of trajectory optimization for unmanned aerial vehicles (UAVs) performing time‑sensitive medical deliveries in urban environments. Specifically, we consider a single UAV with 3 degree‑of‑freedom dynamics tasked with delivering blood packages to multiple hospitals, each with a predefined time window and priority. Mission objectives are encoded using Signal Temporal Logic (STL), enabling the formal specification of spatial‑temporal constraints. To ensure safety, city buildings are modeled as 3D convex obstacles, and obstacle avoidance is handled through a Convex Feasible Set (CFS) method. The entire planning problem, which combines UAV dynamics, STL satisfaction, and collision avoidance, is formulated as a convex optimization problem that ensures tractability and can be solved efficiently using standard convex programming techniques. Simulation results demonstrate that the proposed method generates dynamically feasible, collision‑free trajectories that satisfy temporal mission goals, providing a scalable and reliable approach for autonomous UAV‑based medical logistics.
Authors: Mohit Arora, Pratyush Shukla, Shivali Chopra
Abstract: Unmanned Aerial Vehicles (UAVs) are one of the most revolutionary inventions of the 21st century. At the core of a UAV lies the central processing system that uses wireless signals to control its movement. The most popular UAVs are quadcopters that use a set of four motors, arranged as two on either side with opposite spin. An autonomous UAV is called a drone. Drones have been in service in the US Army since the 1990s for covert missions critical to national security. It would not be wrong to claim that drones make up an integral part of national security and provide the most valuable service during surveillance operations. While UAVs are controlled using wireless signals, several challenges disrupt the operation of such vehicles, including signal quality and range, real‑time processing, human expertise, robust hardware, and data security. These challenges can be solved by programming UAVs to be autonomous, using object detection and tracking through computer vision algorithms. Computer vision is an interdisciplinary field that seeks to use deep learning to gain a high‑level understanding of digital images and videos in order to automate the tasks of the human visual system. Using computer vision, algorithms for detecting and tracking various objects can be developed to suit the hardware, allowing real‑time processing for immediate judgement. This paper reviews the approaches that several authors have proposed for autonomous navigation of UAVs through real‑time object detection and tracking algorithms, with applications in fields such as disaster management, dense area exploration, and traffic vehicle surveillance.
Authors: Qianli Dong, Xuebo Zhang, Shiyong Zhang, Ziyu Wang, Zhe Ma, Haobo Xi
Abstract: Efficient autonomous exploration in large‑scale environments remains challenging due to the high planning computational cost and low‑speed maneuvers. In this paper, we propose a fast and computationally efficient dual‑layer exploration planning method. The insight of our dual‑layer method is efficiently finding an acceptable long‑term region routing and greedily exploring the target in the region of the first routing area with high speed. Specifically, the proposed method finds the long‑term area routing through an approximate algorithm to ensure real‑time planning in large‑scale environments. Then, the viewpoint in the first routing region with the lowest curvature‑penalized cost, which can effectively reduce decelerations caused by sharp turn motions, will be chosen as the next exploration target. To further speed up the exploration, we adopt an aggressive and safe exploration‑oriented trajectory to enhance exploration continuity. The proposed method is compared to state‑of‑the‑art methods in challenging simulation environments. The results show that the proposed method outperforms other methods in terms of exploration efficiency, computational cost, and trajectory speed. We also conduct real‑world experiments to validate the effectiveness of the proposed method. The code will be open‑sourced.
Authors: Jaehoon Choi, Dongki Jung, Christopher Maxey, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon
Abstract: Despite significant advancements in dynamic neural rendering, existing methods fail to address the unique challenges posed by UAV‑captured scenarios, particularly those involving monocular camera setups, top‑down perspective, and multiple small, moving humans, which are not adequately represented in existing datasets. In this work, we introduce UAV4D, a framework for enabling photorealistic rendering for dynamic real‑world scenes captured by UAVs. Specifically, we address the challenge of reconstructing dynamic scenes with multiple moving pedestrians from monocular video data without the need for additional sensors. We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans. We propose a novel approach to resolve the scene scale ambiguity and place both humans and the scene in world coordinates by identifying human‑scene contact points. Additionally, we exploit the SMPL model and background mesh to initialize Gaussian splats, enabling holistic scene rendering. We evaluated our method on three complex UAV‑captured datasets: VisDrone, Manipal‑UAV, and Okutama‑Action, each with distinct characteristics and 10~50 humans. Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.
Authors: Mehdi Azarafza, Mojtaba Nayyeri, Faezeh Pasandideh, Steffen Staab, Achim Rettberg
Abstract: Autonomous UAV operation necessitates reliable mathematical reasoning for tasks such as trajectory planning and power management. While traditional flight control relies on hardcoded equations, recent Large Language Models (LLMs) offer potential for more flexible problem‑solving but struggle with reliably selecting and applying correct mathematical formulations and executing precise multi‑step arithmetic. We propose RAG‑UAV, a retrieval‑augmented generation framework designed to improve the mathematical reasoning of several LLMs (including GPT o1/Turbo, Llama‑3.2/3.3, Mistral, and DeepSeek R1) in UAV‑specific contexts by providing access to relevant domain literature. To conduct an initial assessment, we introduce the UAV‑Math‑Bench, a 20‑question problem set of UAV‑centric mathematical problems across four difficulty levels. Our experiments demonstrate that incorporating retrieval substantially increases exact answer accuracy (achieving up to 75% with o1), reduces instances of incorrect formulation selection (from 25% without RAG to 5% with RAG), and decreases numerical errors, reducing Mean Squared Error (MSE) by orders of magnitude for the best‑performing models. This pilot study indicates that RAG can enable general‑purpose LLMs to function as more reliable tools for engineering analysis, although direct real‑time flight control requires further investigation and validation on a larger scale. All benchmark data, questions, and answers are publicly available.
Authors: Diana Nunes, Ricardo Amorim, Pedro Ribeiro, André Coelho, Rui Campos
Abstract: This paper proposes FLUC, a modular framework that integrates open‑source Large Language Models (LLMs) with Unmanned Aerial Vehicle (UAV) autopilot systems to enable autonomous control in Flying Networks (FNs). FLUC translates high‑level natural language commands into executable UAV mission code, bridging the gap between operator intent and UAV behaviour. FLUC is evaluated using three open‑source LLMs (Qwen 2.5, Gemma 2, and LLaMA 3.2) across scenarios involving code generation and mission planning. Results show that Qwen 2.5 excels in multi‑step reasoning, Gemma 2 balances accuracy and latency, and LLaMA 3.2 offers faster responses with lower logical coherence. A case study on energy‑aware UAV positioning confirms FLUC's ability to interpret structured prompts and autonomously execute domain‑specific logic, showing its effectiveness in real‑time, mission‑driven control.
Authors: Lei Han, Yitong Guo, Pengfei Yang, Zhiyong Yu, Liang Wang, Quan Wang, Zhiwen Yu
Abstract: Natural disasters have caused significant losses to human society, and the timely and efficient acquisition of post‑disaster environmental information is crucial for the effective implementation of rescue operations. Due to the complexity of post‑disaster environments, existing sensing technologies face challenges such as weak environmental adaptability, insufficient specialized sensing capabilities, and limited practicality of sensing solutions. This paper explores the heterogeneous multi‑agent online autonomous collaborative scheduling algorithm HoAs‑PALN, aimed at achieving efficient collection of post‑disaster environmental information. HoAs‑PALN is realized through adaptive dimensionality reduction in the matching process and local Nash equilibrium game, facilitating autonomous collaboration among time‑dependent UAVs, workers and vehicles to enhance sensing scheduling. (1) In terms of adaptive dimensionality reduction during the matching process, HoAs‑PALN significantly reduces scheduling decision time by transforming a five‑dimensional matching process into two categories of three‑dimensional matching processes; (2) Regarding the local Nash equilibrium game, HoAs‑PALN combines the softmax function to optimize behavior selection probabilities and introduces a local Nash equilibrium determination mechanism to ensure scheduling decision performance. Finally, we conducted detailed experiments based on extensive real‑world and simulated data. Compared with the baselines (GREEDY, K‑WTA, MADL and MARL), HoAs‑PALN improves task completion rates by 64.12%, 46.48%, 16.55%, and 14.03% on average, respectively, while each online scheduling decision takes less than 10 seconds, demonstrating its effectiveness in dynamic post‑disaster environments.
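The abstract above mentions using the softmax function to turn behavior preferences into selection probabilities. A minimal sketch of that one step follows. The function name `softmax`, the `temperature` parameter, and the idea of feeding it a plain list of per‑action utilities are assumptions for illustration; the abstract does not specify how HoAs‑PALN computes its inputs.

```python
import math

def softmax(utilities, temperature=1.0):
    """Map a list of action utilities to selection probabilities."""
    # Subtracting the maximum keeps exp() from overflowing on large inputs.
    peak = max(utilities)
    exps = [math.exp((u - peak) / temperature) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]
```

A lower `temperature` concentrates probability on the highest‑utility action, while a higher one spreads it more evenly, which is the usual lever for trading exploitation against exploration.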
Authors: Zuhao Teng, Qian Dong, Ze Zhang, Shuangyao Huang, Wenzhang Zhang, Jingchen Wang, Ji Li, Xi Chen
Abstract: With the widespread application of Unmanned Aerial Vehicles (UAVs) in domains like military reconnaissance, emergency rescue, and logistics delivery, efficiently planning the shortest flight path has become a critical challenge. Traditional heuristic‑based methods often suffer from the inability to escape from local optima, which limits their effectiveness in finding the shortest path. To address these issues, a novel Improved Grey Wolf Optimizer (IGWO) is presented in this study. The proposed IGWO incorporates an Advanced Cooperative Predation (ACP) and a Lens Opposition‑based Learning Strategy (LOBL) in order to improve the optimization capability of the method. Simulation results show that IGWO ranks first in optimization performance on benchmark functions F1‑F5, F7, and F9‑F12, outperforming all other compared algorithms. Subsequently, IGWO is applied to UAV shortest path planning in various obstacle‑laden environments. Simulation results show that the paths planned by IGWO are, on average, shorter than those planned by GWO, PSO, and WOA by 1.70m, 1.68m, and 2.00m, respectively, across four different maps.
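As background for the abstract above, the baseline Grey Wolf Optimizer (GWO) that IGWO builds on can be sketched in a few lines. This is the standard leader‑guided update only; it does not include the ACP or LOBL strategies, whose details the abstract does not give. The function names, population size, iteration count, and the `sphere` test objective are assumptions for illustration.

```python
import random

def gwo_minimize(f, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Baseline Grey Wolf Optimizer for minimizing f over a box."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=f)[:]
    for t in range(n_iter):
        # The three fittest wolves (alpha, beta, delta) guide the pack.
        leaders = [w[:] for w in sorted(wolves, key=f)[:3]]
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        for w in wolves:
            for j in range(dim):
                estimate = 0.0
                for lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * lead[j] - w[j])
                    estimate += lead[j] - A * D
                # Average the three leader-guided estimates, clamp to bounds.
                w[j] = min(hi, max(lo, estimate / 3.0))
        candidate = min(wolves, key=f)
        if f(candidate) < f(best):
            best = candidate[:]
    return best

def sphere(x):
    """Simple convex test objective with its minimum at the origin."""
    return sum(v * v for v in x)
```

Calling `gwo_minimize(sphere, dim=2, bounds=(-5.0, 5.0))` returns a point near the origin. Because every wolf is pulled toward the same three leaders, the pack can stall in a local optimum on multimodal objectives, which is the weakness the abstract says IGWO addresses.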
Authors: Collin Hague, Artur Wolek
Abstract: This paper considers the problem of tracking a point of interest (POI) moving along a known trajectory on the ground with an uncrewed aerial vehicle (UAV) modeled as a Dubins vehicle using a line‑of‑sight (LOS) sensor through an urban environment that may occlude the POI. A visibility volume (VV) encodes a time‑varying, three‑dimensional representation of the sensing constraints for a particular POI position. A constant‑altitude, translating, and radially time‑varying circular standoff orbit is then inscribed within the dynamically changing VV centered at the POI position. The time‑varying VV is approximated by placing static VVs along the POI's trajectory using an adaptive metric that restricts the volume change of consecutive VVs to below a specified rate. The time‑varying circular standoff orbit is proven to be feasible for a Dubins vehicle and approximated with a piecewise set of linearly interpolated circular orbits inside the static VVs. A steering controller is derived that drives the UAV to the time‑varying standoff orbit. Numerical simulations and a flight test illustrate the proposed approach.
Authors: Runhan Liu, Hui Ren, Wei Fan
Abstract: As the number of Unmanned Aerial Vehicles (UAVs) operating in low‑altitude airspace continues to increase, non‑cooperative targets pose growing challenges to low‑altitude operations. To address this issue, this paper proposes a multi‑UAV‑tethered netted system as a non‑lethal solution for capturing non‑cooperative targets. To validate the proposed system, we develop mySim, a multibody dynamics‑based UAV simulation environment that integrates high‑precision physics modeling, vision‑based motion tracking, and reinforcement learning‑driven control strategies. In mySim, the spring‑damper model is employed to simulate the dynamic behavior of the tethered net, while the dynamics of the entire system are modeled using multibody dynamics (MBD) to achieve accurate representations of system interactions. The motion of the UAVs and the target is estimated using VINS‑MONO and DETR, and the system autonomously executes the capture strategy through MAPPO. Simulation results demonstrate that mySim accurately simulates the dynamics and control of the system, successfully enabling the multi‑UAV‑tethered netted system to capture both non‑propelled and maneuvering non‑cooperative targets. By providing a high‑precision simulation platform that integrates dynamics modeling with perception and learning‑based control, mySim enables efficient testing and optimization of UAV‑based control policies before real‑world deployment. This approach offers significant advantages for simulating complex UAV coordination tasks and has the potential to be applied to the design of other UAV‑based systems.
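The spring‑damper model named in the abstract above can be illustrated for a single net segment joining two nodes. This is a generic sketch, not mySim's implementation. The function name, the stiffness `k`, the damping coefficient `c`, and the choice to return zero force when the segment is slack (treating it as a cable that cannot push) are assumptions for this example.

```python
import math

def spring_damper_force(p_i, p_j, v_i, v_j, rest_len, k=50.0, c=0.5):
    """Force on node i from the segment joining nodes i and j."""
    delta = [b - a for a, b in zip(p_i, p_j)]
    length = math.sqrt(sum(x * x for x in delta))
    if length == 0.0:
        return [0.0] * len(delta)
    unit = [x / length for x in delta]  # unit vector pointing from i to j
    stretch = length - rest_len
    if stretch <= 0.0:
        # Assumed cable behaviour: a slack segment transmits no force.
        return [0.0] * len(delta)
    # Rate at which the segment is lengthening (positive when separating).
    rel_speed = sum((vb - va) * u for va, vb, u in zip(v_i, v_j, unit))
    # Elastic term plus damping term, never allowed to become compressive.
    magnitude = max(0.0, k * stretch + c * rel_speed)
    return [magnitude * u for u in unit]
```

By Newton's third law, node j receives the negated force, so summing these contributions over all segments attached to a node gives that node's net tether load.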
Authors: Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida, Zhu Han
Abstract: A public safety Uncrewed Aerial Vehicle (UAV) enhances situational awareness during emergency response. Its agility, mobility optimization, and ability to establish Line‑of‑Sight (LoS) communication make it increasingly important for managing emergencies such as disaster response, search and rescue, and wildfire monitoring. Although Deep Reinforcement Learning (DRL) has been used to optimize UAV navigation and control, its high training complexity, low sample efficiency, and the simulation‑to‑reality gap limit its practicality in public safety applications. Recent advances in Large Language Models (LLMs) present a promising alternative. With strong reasoning and generalization abilities, LLMs can adapt to new tasks through In‑Context Learning (ICL), enabling task adaptation via natural language prompts and example‑based guidance without retraining. Deploying LLMs at the network edge, rather than in the cloud, further reduces latency and preserves data privacy, making them suitable for real‑time, mission‑critical public safety UAVs. This paper proposes integrating LLM‑assisted ICL with public safety UAVs to address key functions such as path planning and velocity control in emergency response. We present a case study on data collection scheduling, demonstrating that the LLM‑assisted ICL framework can significantly reduce packet loss compared to conventional approaches while also mitigating potential jailbreaking vulnerabilities. Finally, we discuss LLM optimizers and outline future research directions. The ICL framework enables adaptive, context‑aware decision‑making for public safety UAVs, offering a lightweight and efficient solution to enhance UAV autonomy and responsiveness in emergencies.
Authors: Chenglou Liu, Yufeng Lu, Fangfang Xie, Tingwei Ji, Yao Zheng
Abstract: As UAV popularity soars, so does the need for the associated mission planning. Classical approaches suffer from three problems: decoupled task assignment and path planning, poor real‑time performance, and limited adaptability. To address these challenges, this paper proposes a dynamic real‑time multi‑UAV collaborative mission planning algorithm based on Dubins paths under a distributed formation structure. The Dubins path, with its multiple advantages, bridges the gap between task assignment and path planning, leading to a coupled solution for mission planning. Then, a series of acceleration techniques, including task clustering preprocessing, highly efficient distance cost functions, and low‑complexity task allocation strategies with fewer iterations, are employed to guarantee the real‑time performance of the algorithm. To cope with different emergencies and the extreme case in which they occur simultaneously, real‑time planning of emerging tasks and mission replanning due to a reduction in available UAVs are appropriately handled. Finally, the developed algorithm is comprehensively demonstrated and studied through simulations, which show that the proposed method sacrifices only 9.57% of the path length while achieving a speed improvement of 4‑5 orders of magnitude over the simulated annealing method, with a single mission planning run taking about 0.0003 s.
Authors: Mo Tian, Kolappan Chidambaranathan, Md Zubair Ebne Rafique, Neel Desai, Jing Bai, Randy Brost, Daniel Small, David Novick, Julius Yellowhair, Yu Yao
Abstract: On a Concentrated Solar Power (CSP) field, optical errors have significant impacts on the collection efficiency of heliostats. Fast, cost‑effective, labor‑efficient, and non‑intrusive autonomous field inspection remains a challenge. Approaches using imaging drones, i.e., Unmanned Aerial Vehicle (UAV) systems integrated with high‑resolution visible imaging sensors, have been developed to address these challenges; however, these approaches are often limited by insufficient imaging contrast. Here we report a polarimetry‑based method with a polarization imaging system integrated on a UAV to enhance imaging contrast for in‑situ detection of heliostat mirrors without interrupting field operation. We developed an optical model of the skylight polarization pattern to simulate the polarization images of heliostat mirrors and obtained optimized waypoints for the polarimetric imaging drone flight path to capture images with enhanced contrast. The polarimetric imaging‑based method improved the success rate of edge detection in scenarios that were challenging for mirror edge detection with conventional imaging sensors. We performed field tests that achieved a significantly enhanced heliostat edge detection success rate and investigated the feasibility of integrating the polarimetric imaging method with existing imaging‑based heliostat inspection methods, i.e., the Polarimetric Imaging Heliostat Inspection Method (PIHIM). Our preliminary field test results suggest that PIHIM holds the promise of enabling sufficient imaging contrast for real‑time autonomous imaging and detection of heliostat fields, making it suitable for non‑interruptive, fast CSP field inspection during operation.
Authors: Parakh M. Gupta, Ondřej Procházka, Jan Hřebec, Matej Novosad, Robert Pěnička, Martin Saska
Abstract: [Accepted to IROS 2025] In this paper, we address the problem of tracking high‑speed agile trajectories for Unmanned Aerial Vehicles (UAVs), where model inaccuracies can lead to large tracking errors. Existing Nonlinear Model Predictive Controller (NMPC) methods typically neglect the dynamics of the low‑level flight controllers, such as the underlying PID controller present in many flight stacks, and this results in sub‑optimal tracking performance at high speeds and accelerations. To this end, we propose a novel NMPC formulation, LoL‑NMPC, which explicitly incorporates low‑level controller dynamics and motor dynamics in order to minimize trajectory tracking errors while maintaining computational efficiency. By leveraging linear constraints inside low‑level dynamics, our approach inherently accounts for actuator constraints without requiring additional reallocation strategies. The proposed method is validated in both simulation and real‑world experiments, demonstrating improved tracking accuracy and robustness at speeds up to 98.57 km/h and accelerations of 3.5 g. Our results show an average 21.97% reduction in trajectory tracking error over the standard NMPC formulation, with LoL‑NMPC maintaining real‑time feasibility at 100 Hz on an embedded ARM‑based flight computer.
Authors: Mohammad Fakhruddin Babar, Zain A. H. Hammadeh, Mohammad Hamad, Monowar Hasan
Abstract: Leaking information about the execution behavior of critical real‑time tasks may lead to serious consequences, including violations of temporal constraints and even severe failures. We study information leakage for a special class of real‑time tasks that have two execution modes, namely, typical execution (which is invoked the majority of the time) and critical execution (which handles exceptional conditions). Data flow‑driven applications inherit such a multimode execution model. In this paper, we investigate whether a low‑priority "observer" task can infer the execution patterns of a high‑priority "victim" task (especially the critical executions). We develop a new statistical analysis technique and show that by analyzing the response times of the low‑priority task, it becomes possible to extract the execution behavior of the high‑priority task. We test our approach against a random selection technique that arbitrarily classifies a job as critical. We find that correlating the observer's response times with the victim's jobs can result in higher precision in identifying critical invocations compared to a random guess. We conduct extensive evaluations with systematically generated workloads, including a case study using the taskset parameters of a UAV autopilot (ArduPilot). We found that our inference algorithm can achieve relatively low false positive rates (less than 25%) with a relatively low footprint (1 MB memory and 50 ms timing overhead on a Raspberry Pi 4 platform). We further demonstrate the feasibility of inference on two cyber‑physical platforms: an off‑the‑shelf manufacturing robot and a custom‑built surveillance system.
Authors: Ziye Jia, Can Cui, Chao Dong, Qihui Wu, Zhuang Ling, Dusit Niyato, Zhu Han
Abstract: With the extensive growth of computation demands, aerial multi‑access edge computing (MEC), mainly based on unmanned aerial vehicles (UAVs) and high altitude platforms (HAPs), plays a significant role in future network scenarios. In detail, UAVs can be flexibly deployed, while HAPs are characterized by large capacity and stability. Hence, in this paper, we provide a hierarchical model composed of an HAP and multiple UAVs to provide aerial MEC services. Moreover, considering the errors of channel state information from unpredictable environmental conditions, we formulate the problem to minimize the total energy cost with the chance constraint, which is a mixed‑integer nonlinear problem with uncertain parameters and is intractable to solve. To tackle this issue, we optimize the UAV deployment via the weighted K‑means algorithm. Then, the chance constraint is reformulated via distributionally robust optimization (DRO). Furthermore, based on the conditional value‑at‑risk mechanism, we transform the DRO problem into a mixed‑integer second‑order cone program, which is further decomposed into two subproblems via primal decomposition. Moreover, to alleviate the complexity of the binary subproblem, we design a binary whale optimization algorithm. Finally, we conduct extensive simulations to verify the effectiveness and robustness of the proposed schemes by comparing them with baseline mechanisms.
Authors: Ze Zhang, Qian Dong
Abstract: Location information is a fundamental requirement for unmanned aerial vehicles (UAVs) and other wireless sensor networks (WSNs). However, accurately and efficiently localizing sensor nodes with diverse functionalities remains a significant challenge, particularly in a hardware‑constrained environment. To address this issue and enhance the applicability of artificial intelligence (AI), this paper proposes a localization algorithm that does not require additional hardware. Specifically, the angle between a node and the anchor nodes is estimated based on the received signal strength indication (RSSI). A subsequent localization strategy leverages the inferred angular relationships in conjunction with a bounding box. Experimental evaluations in three scenarios with varying number of nodes demonstrate that the proposed method achieves substantial improvements in localization accuracy, reducing the average error by 72.4% compared to the Min‑Max and RSSI‑based DV‑Hop algorithms, respectively.
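The Min‑Max baseline mentioned in the abstract above combines RSSI‑derived ranges with a bounding box, and can be sketched as follows. This shows only the well‑known baseline, not the paper's angle‑based method. The log‑distance path‑loss conversion and its parameters (`p0` as the RSSI at 1 m, `n` as the path‑loss exponent), along with both function names, are assumptions for illustration.

```python
def rssi_to_distance(rssi, p0=-40.0, n=2.0):
    """Estimate range in metres from RSSI with a log-distance path-loss model."""
    return 10.0 ** ((p0 - rssi) / (10.0 * n))

def min_max_localize(anchors, distances):
    """Min-Max localization: centre of the intersection of anchor boxes.

    anchors: list of (x, y) anchor positions
    distances: estimated range from the unknown node to each anchor
    """
    # Each anchor constrains the node to a square of half-width d around it.
    x_lo = max(x - d for (x, _), d in zip(anchors, distances))
    x_hi = min(x + d for (x, _), d in zip(anchors, distances))
    y_lo = max(y - d for (_, y), d in zip(anchors, distances))
    y_hi = min(y + d for (_, y), d in zip(anchors, distances))
    # The position estimate is the centre of the overlapping rectangle.
    return ((x_lo + x_hi) / 2.0, (y_lo + y_hi) / 2.0)
```

Because the estimate is simply the centre of a rectangle, its error grows when the anchors are few or poorly spread, which leaves room for methods that exploit additional geometric information such as inferred angles.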
Authors: Jingke Sun, Liang Yang, Alexandros-Apostolos A. Boulogeorgos, Theodoros A. Tsiftsis, Hongwu Liu
Abstract: To enhance both the sensing and covert communication performance, a dual‑unmanned aerial vehicle (UAV)‑aided scheme is proposed for integrated sensing and communication networks, in which one UAV maneuvers as the aerial dual‑functional base‑station (BS), while another UAV flies as the cooperative jammer. Artificial noise (AN) transmitted by the jamming UAV is utilized not only to confuse the ground warden but also to aid the aerial BS in sensing multiple ground targets by combining the target‑echoed dual‑functional waveform and AN components from the perspective of a hybrid monostatic‑bistatic radar. We employ the distance‑normalized beampattern sum‑gain to measure the sensing performance. To maximize the average covert rate (ACR) from the aerial BS to the ground user, the dual‑functional BS beamforming, jamming UAV beamforming, and dual‑UAV trajectory are co‑designed, subject to transmit power budgets, UAV maneuver constraints, the covertness requirement, and the sensing performance constraint. The effects of imperfect successive interference cancellation (SIC) on the received signal‑to‑interference‑plus‑noise ratio are also considered in maximizing the ACR. To tackle the highly complicated non‑convex ACR maximization problem, dual‑UAV beamforming and dual‑UAV trajectory are optimized in a block coordinate descent manner using the trust‑region successive convex approximation and semidefinite relaxation. To find the dual‑UAV maneuver locations suitable for sensing the ground targets, we first optimize the dual‑UAV trajectory for the covert communication purpose only and then solve a weighted distance minimization problem for the covert communication and sensing purpose.
Authors: Huayu Huang, Banglei Guan, Yang Shang, Qifeng Yu
Abstract: Photomechanics is a crucial branch of solid mechanics. The localization of point targets constitutes a fundamental problem in optical experimental mechanics, with extensive applications in various missions of UAVs. Localizing moving targets is crucial for analyzing their motion characteristics and dynamic properties. Reconstructing the trajectories of points from asynchronous cameras is a significant challenge. It encompasses two coupled sub‑problems: trajectory reconstruction and camera synchronization. Present methods typically address only one of these sub‑problems individually. This paper proposes a 3D trajectory reconstruction method for point targets based on asynchronous cameras, simultaneously solving both sub‑problems. Firstly, we extend the trajectory intersection method to asynchronous cameras to resolve the limitation of traditional triangulation that requires camera synchronization. Secondly, we develop models for camera temporal information and target motion, based on imaging mechanisms and target dynamics characteristics. The parameters are optimized simultaneously to achieve trajectory reconstruction without accurate time parameters. Thirdly, we optimize the camera rotations alongside the camera time information and target motion parameters, using tighter and more continuous constraints on moving points. The reconstruction accuracy is significantly improved, especially when the camera rotations are inaccurate. Finally, the simulated and real‑world experimental results demonstrate the feasibility and accuracy of the proposed method. The real‑world results indicate that the proposed algorithm achieved a localization error of 112.95 m at an observation range of 15 ~ 20 km.
Authors: Changyuan Zhao, Ruichen Zhang, Jiacheng Wang, Gaosheng Zhao, Dusit Niyato, Geng Sun, Shiwen Mao, Dong In Kim
Abstract: World models are emerging as a transformative paradigm in artificial intelligence, enabling agents to construct internal representations of their environments for predictive reasoning, planning, and decision‑making. By learning latent dynamics, world models provide a sample‑efficient framework that is especially valuable in data‑constrained or safety‑critical scenarios. In this paper, we present a comprehensive overview of world models, highlighting their architecture, training paradigms, and applications across prediction, generation, planning, and causal reasoning. We compare and distinguish world models from related concepts such as digital twins, the metaverse, and foundation models, clarifying their unique role as embedded cognitive engines for autonomous agents. We further propose Wireless Dreamer, a novel world model‑based reinforcement learning framework tailored for wireless edge intelligence optimization, particularly in low‑altitude wireless networks (LAWNs). Through a weather‑aware UAV trajectory planning case study, we demonstrate the effectiveness of our framework in improving learning efficiency and decision quality.
Authors: Agustín Roca, Gabriel Torre, Juan I. Giribet, Gastón Castro, Leonardo Colombo, Ignacio Mas, Javier Pereira
Abstract: This paper examines the use of Unmanned Aerial Vehicles (UAVs) and deep learning for detecting endangered deer species in their natural habitats. As traditional identification processes require trained manual labor that can be costly in resources and time, there is a need for more efficient solutions. Leveraging high‑resolution aerial imagery, advanced computer vision techniques are applied to automate the identification process of deer across two distinct projects in Buenos Aires, Argentina. The first project, Pantano Project, involves the marsh deer in the Paraná Delta, while the second, WiMoBo, focuses on the Pampas deer in Campos del Tuyú National Park. A tailored algorithm was developed using the YOLO framework, trained on extensive datasets compiled from UAV‑captured images. The findings demonstrate that the algorithm effectively identifies marsh deer with a high degree of accuracy and provides initial insights into its applicability to Pampas deer, albeit with noted limitations. This study not only supports ongoing conservation efforts but also highlights the potential of integrating AI with UAV technology to enhance wildlife monitoring and management practices.
Authors: Agustín Roca, Gastón Castro, Gabriel Torre, Leonardo J. Colombo, Ignacio Mas, Javier Pereira, Juan I. Giribet
Abstract: This study compares the performance of state‑of‑the‑art neural networks, including variants of the YOLOv11 and RT‑DETR models, for detecting marsh deer in UAV imagery, in scenarios where specimens occupy a very small portion of the image and are occluded by vegetation. We extend previous analysis by adding precise segmentation masks to our datasets, enabling fine‑grained training of a YOLO model that includes a segmentation head. Experimental results show that incorporating the segmentation head yields superior detection performance. This work contributes valuable insights for improving UAV‑based wildlife monitoring and conservation strategies through scalable and accurate AI‑driven detection systems.
Authors: Chunjie Wang, Xuhui Zhang, Wenchao Liu, Jinke Ren, Huijun Xing, Shuqiang Wang, Yanyan Shen
Abstract: Emerging as a cornerstone for next‑generation wireless networks, integrated sensing and communication (ISAC) systems demand innovative solutions to balance spectral efficiency and sensing accuracy. In this paper, we propose a coordinated beamforming framework for a reconfigurable intelligent surface (RIS)‑empowered ISAC system, where the active precoding at the dual‑functional base station (DFBS) and the passive beamforming at the RIS are jointly optimized to provide communication services for legitimate unmanned aerial vehicles (UAVs) while sensing the unauthorized UAVs. The sum‑rate of all legitimate UAVs is maximized subject to the radar sensing signal‑to‑noise ratio requirements, the transmit power constraints, and the constraints on the reflection coefficients of the RIS. To address the inherent non‑convexity from coupled variables, we propose a low‑complexity algorithm integrating fractional programming with alternating optimization, featuring convergence guarantees. Numerical results demonstrate that the proposed algorithm achieves a higher data rate than disjoint optimization benchmarks. This underscores RIS's pivotal role in harmonizing communication and target sensing functionalities for low‑altitude networks.
Authors: Steve Blandino, Nada Golmie, Anirudha Sahoo, Thao Nguyen, Tanguy Ropitault, David Griffith, Amala Sonny
Abstract: The integration of sensing capabilities into 5G New Radio (5G NR) networks offers an opportunity to enable the detection of airborne objects without the need for dedicated radars. This paper investigates the feasibility of using standardized Positioning Reference Signals (PRS) to detect UAVs in Urban Micro (UMi) and Urban Macro (UMa) propagation environments. A full 5G NR radar processing chain is implemented, including clutter suppression, angle and range estimation, and 3D position reconstruction. Simulation results show that performance strongly depends on the propagation environment. 5G NR radars exhibit the highest missed detection rate, up to 16%, in UMi, due to severe clutter. Positioning error increases with target distance, resulting in larger errors in UMa scenarios and at higher UAV altitudes. In particular, the system achieves a position error within 4m in the UMi environment and within 8m in UMa. The simulation platform has been released as open‑source software to support reproducible research in integrated sensing and communication (ISAC) systems.
Authors: Andrew P. Berg, Qian Zhang, Mia Y. Wang
Abstract: Unmanned aerial vehicle (UAV) usage is expected to surge in the coming decade, raising the need for heightened security measures to prevent airspace violations and security threats. This study investigates deep learning approaches to UAV classification, focusing on the key issue of data scarcity. To investigate this, we trained the models on a total of 4,500 seconds of audio samples, evenly distributed across a 9‑class dataset. We leveraged parameter‑efficient fine‑tuning (PEFT) and data augmentations to mitigate the data scarcity. This paper implements and compares convolutional neural networks (CNNs) and attention‑based transformers. Our results show that CNNs outperform transformers by 1‑2% in accuracy while also being more computationally efficient. These early findings, however, point to the potential of transformer models, suggesting that with more data and further optimizations they could outperform CNNs. Future work aims to upscale the dataset to better understand the trade‑offs between these approaches.
Authors: Simón Martínez-Rozas, David Alejo, José Javier Carpio, Fernando Caballero, Luis Merino
Abstract: Unmanned Aerial Vehicles (UAVs) have become essential tools in inspection and emergency response operations due to their high maneuverability and ability to access hard‑to‑reach areas. However, their limited battery life significantly restricts their use in long‑duration missions. This paper presents a tethered marsupial robotic system composed of a UAV and an Unmanned Ground Vehicle (UGV), specifically designed for autonomous, long‑duration inspection tasks in Global Navigation Satellite System (GNSS)‑denied environments. The system extends the UAV's operational time by supplying power through a tether connected to high‑capacity battery packs carried by the UGV. Our work details the hardware architecture based on off‑the‑shelf components to ensure replicability and describes our full‑stack software framework used by the system, which is composed of open‑source components and built upon the Robot Operating System (ROS). The proposed software architecture enables precise localization using a Direct LiDAR Localization (DLL) method and ensures safe path planning and coordinated trajectory tracking for the integrated UGV‑tether‑UAV system. We validate the system through three sets of field experiments involving (i) three manual flight endurance tests to estimate the operational duration, (ii) three experiments for validating the localization and the trajectory tracking systems, and (iii) three executions of an inspection mission to demonstrate autonomous inspection capabilities. The results of the experiments confirm the robustness and autonomy of the system in GNSS‑denied environments. Finally, all experimental data have been made publicly available to support reproducibility and to serve as a common open dataset for benchmarking.
Authors: Jianlin Ye, Savvas Papaioannou, Panayiotis Kolios
Abstract: Path planning is a fundamental capability of autonomous Unmanned Aerial Vehicles (UAVs), enabling them to efficiently navigate toward a target region or explore complex environments while avoiding obstacles. Traditional path‑planning methods, such as Rapidly‑exploring Random Trees (RRT), have proven effective but often encounter significant challenges. These include high search space complexity, suboptimal path quality, and slow convergence, issues that are particularly problematic in high‑stakes applications like disaster response, where rapid and efficient planning is critical. To address these limitations and enhance path‑planning efficiency, we propose Vision Language Model RRT (VLM‑RRT), a hybrid approach that integrates the pattern recognition capabilities of Vision Language Models (VLMs) with the path‑planning strengths of RRT. By leveraging VLMs to provide initial directional guidance based on environmental snapshots, our method biases sampling toward regions more likely to contain feasible paths, significantly improving sampling efficiency and path quality. Extensive quantitative and qualitative experiments with various state‑of‑the‑art VLMs demonstrate the effectiveness of this proposed approach.
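The idea of biasing RRT sampling toward a suggested direction can be illustrated with a minimal sketch. This is not the VLM‑RRT implementation: the abstract does not specify how the VLM's guidance is encoded, so the sketch assumes it has already been reduced to a single heading angle (`guide_angle`), and the names `sample_point`, `bias`, and `spread` are hypothetical.

```python
import math
import random


def sample_point(width, height, origin, guide_angle,
                 bias=0.7, spread=math.pi / 6, rng=random):
    """Draw one RRT sample in a width x height 2D workspace.

    With probability `bias`, the sample is drawn inside a cone of
    half-angle `spread` (radians) around `guide_angle`, measured from
    `origin`. Otherwise it is drawn uniformly, which keeps the usual
    exploratory behaviour of RRT. All parameters are illustrative
    assumptions, not values from the paper.
    """
    if rng.random() < bias:
        angle = guide_angle + rng.uniform(-spread, spread)
        radius = rng.uniform(0.0, math.hypot(width, height))
        x = origin[0] + radius * math.cos(angle)
        y = origin[1] + radius * math.sin(angle)
        # Clamp to the workspace so the sample is always valid.
        return (min(max(x, 0.0), width), min(max(y, 0.0), height))
    return (rng.uniform(0.0, width), rng.uniform(0.0, height))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical guidance: "head east" from the centre of a 100 x 100 map.
    print([sample_point(100, 100, (50, 50), 0.0, rng=rng) for _ in range(3)])
```

Keeping a nonzero uniform fraction (`1 - bias`) is a deliberate choice in this sketch: a planner can then still find a path when the suggested direction is wrong.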
Authors: Hanxu Jiang, Haiyue Yu, Xiaotong Xie, Qi Gao, Jiang Jiang, Jianbin Sun
Abstract: Adaptive sampling based on Gaussian process regression (GPR) has already been applied with considerable success to generate boundary test scenarios for multi‑UAV systems (MUS). One of the key techniques in such research is leveraging the accurate prediction of the MUS performance through GPR in different test scenarios. Due to the potential correlations among the multiple MUS performance metrics, current research commonly utilizes a multi‑output GPR (MOGPR) to model the multiple performance metrics simultaneously. This approach can achieve more accurate predictions than modeling each metric individually. However, MOGPR still suffers from negative transfer. When the feature of one output variable is incorrectly learned by another, the model's training process is negatively affected, leading to a decline in prediction performance. To solve this problem, this paper introduces a novel adaptive regularization approach into the conventional MOGPR training process. Unlike existing regularization approaches for mitigating negative transfer in MOGPR, our method penalizes the inconsistencies among output‑specific characteristic parameters using adaptively adjustable regularization weights. This mechanism helps each set of output parameters avoid local optima. Consequently, it yields simultaneous improvements in predictive accuracy across all outputs. Finally, we validate our approach on a numerical case and on a boundary test scenario generation case for a MUS multi‑objective search task.
Authors: Xudong Wang, Jian Zhu, Ruichen Zhang, Lei Feng, Dusit Niyato, Jiacheng Wang, Hongyang Du, Shiwen Mao, Zhu Han
Abstract: Recent advances in large language models (LLMs) have opened new possibilities for automated reasoning and decision‑making in wireless networks. However, applying LLMs to wireless communications presents challenges such as limited capability in handling complex logic, generalization, and reasoning. Chain‑of‑Thought (CoT) prompting, which guides LLMs to generate explicit intermediate reasoning steps, has been shown to significantly improve LLM performance on complex tasks. Inspired by this, this paper explores the application potential of CoT‑enhanced LLMs in wireless communications. Specifically, we first review the fundamental theory of CoT and summarize various types of CoT. We then survey key CoT and LLM techniques relevant to wireless communication and networking. Moreover, we introduce a multi‑layer intent‑driven CoT framework that bridges high‑level user intent expressed in natural language with concrete wireless control actions. Our proposed framework sequentially parses and clusters intent, selects appropriate CoT reasoning modules via reinforcement learning, then generates interpretable control policies for system configuration. Using the unmanned aerial vehicle (UAV) network as a case study, we demonstrate that the proposed framework significantly outperforms a non‑CoT baseline in both communication performance and quality of generated reasoning.
Authors: Haiquan Lu, Yong Zeng, Shaodan Ma, Bin Li, Shi Jin, Rui Zhang
Abstract: The unmanned aerial vehicle (UAV) is regarded as a key enabling platform for the low‑altitude economy, due to its advantages such as 3D maneuverability, flexible deployment, and LoS air‑to‑air/ground communication links. In particular, the intrinsic high mobility renders the UAV especially suitable for operating as a movable antenna (MA) from the sky. In this paper, by exploiting the flexible mobility of a UAV swarm and the antenna position adjustment of MAs, we propose a novel UAV swarm enabled two‑level MA system, where UAVs not only individually deploy a local MA array, but also form a larger‑scale MA system with their individual MA arrays via swarm coordination. We formulate a general optimization problem to maximize the minimum achievable rate over all ground user equipments (UEs), by jointly optimizing the 3D UAV swarm placement positions, their individual MAs' positions, and receive beamforming for different UEs. To gain useful insights, we first consider the special case where each UAV has only one antenna, under different scenarios of one single UE, two UEs, and an arbitrary number of UEs. In particular, for the two‑UE case, we derive in closed form the optimal UAV swarm placement positions that achieve IUI‑free communication when the uniform plane wave (UPW) model holds, where the UAV swarm forms a uniform sparse array (USA) satisfying the minimum safe distance constraint. For the general case with an arbitrary number of UEs, we propose an efficient alternating optimization algorithm to solve the formulated non‑convex optimization problem. Then, we extend the results to the case where each UAV is equipped with multiple antennas. Numerical results verify that the proposed low‑altitude UAV swarm enabled MA system significantly outperforms various benchmark schemes, thanks to the exploitation of two‑level mobility to create more favorable channel conditions for multi‑UE communications.
Authors: Chi Lu, Yiyang Ni, Zhe Wang, Xiaoli Shi, Jun Li, Shi Jin
Abstract: Decision Transformer (DT) has recently demonstrated strong generalizability in dynamic resource allocation within unmanned aerial vehicle (UAV) networks, compared to conventional deep reinforcement learning (DRL). However, its performance is hindered by zero‑padding for varying state dimensions, an inability to manage long‑term energy constraints, and challenges in acquiring expert samples for few‑shot fine‑tuning in new scenarios. To overcome these limitations, we propose an attention‑enhanced prompt Decision Transformer (APDT) framework to optimize trajectory planning and user scheduling, aiming to minimize the average age of information (AoI) under a long‑term energy constraint in UAV‑assisted Internet of Things (IoT) networks. Specifically, we enhance the conventional DT framework by incorporating an attention mechanism to accommodate varying numbers of terrestrial users, introducing a prompt mechanism based on short trajectory demonstrations for rapid adaptation to new scenarios, and designing a token‑assisted method to address the UAV's long‑term energy constraint. The APDT framework is first pre‑trained on offline datasets and then efficiently generalized to new scenarios. Simulations demonstrate that APDT converges twice as fast as conventional DT and reduces average AoI by 8%.
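To make the optimization target concrete, the sketch below computes an average age of information for a given user schedule. The abstract does not state the paper's exact AoI model, so this assumes a common slotted‑time convention: each user's age grows by one per slot and resets to one in the slot where that user is served. The function name `average_aoi` and the schedule encoding are hypothetical.

```python
def average_aoi(schedule, num_users):
    """Average AoI over all users and slots for a slotted-time model.

    schedule[t] is the index of the user served in slot t, or None if no
    user is served. Ages start at 1 (an assumption of this sketch), reset
    to 1 for the served user, and grow by 1 for everyone else.
    """
    if not schedule:
        raise ValueError("schedule must contain at least one slot")
    ages = [1] * num_users
    total = 0
    for served in schedule:
        ages = [1 if user == served else age + 1
                for user, age in enumerate(ages)]
        total += sum(ages)
    return total / (len(schedule) * num_users)


if __name__ == "__main__":
    # Alternating between two users keeps information fresh ...
    print(average_aoi([0, 1, 0, 1], num_users=2))
    # ... while never serving anyone lets the age grow every slot.
    print(average_aoi([None] * 4, num_users=2))
```

Under this convention, a scheduler that visits users more evenly obtains a lower average, which is the quantity a trajectory and scheduling policy would trade off against energy use.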
Authors: San Jiang, Kan You, Ruqin Zhou, Xing Zhang, Zhijun Wang, Qingquan Li
Abstract: Feature matching dominates the time costs in structure from motion (SfM). The primary contribution of this study is a GPU data schedule algorithm for efficient feature matching of unmanned aerial vehicle (UAV) images. The core idea is to divide the whole dataset into blocks based on matrix band reduction (MBR) and achieve efficient feature matching via GPU‑accelerated cascade hashing. First, match pairs are selected by using an image retrieval technique, which converts images into global descriptors and searches high‑dimension nearest neighbors with graph indexing. Second, compact image blocks are iteratively generated from an MBR‑based data schedule strategy, which exploits image connections to generate image blocks and increase the usage of GPU computing power. Third, guided by the generated image blocks, feature matching is executed sequentially within the framework of GPU‑accelerated cascade hashing, and initial candidate matches are refined by combining a local geometric constraint and RANSAC‑based global verification. For further performance improvement, these two steps are designed to execute in parallel on the GPU and CPU. Finally, the performance of the proposed solution is evaluated by using large‑scale UAV datasets. The results demonstrate that it increases the efficiency of feature matching with speedup ratios ranging from 77.0 to 100.0 compared with KD‑Tree based matching methods due to its high usage of GPU computing power. Besides, it achieves comparable accuracy in both relative and absolute bundle adjustment (BA). The proposed algorithm is an efficient solution for feature matching of large‑scale UAV images.
Authors: Mohamed Benzaghta, Sahar Ammar, David López-Pérez, Basem Shihada, Giovanni Geraci
Abstract: Mobility management in cellular networks faces increasing complexity due to network densification and heterogeneous user mobility characteristics. Traditional handover (HO) mechanisms, which rely on predefined parameters such as A3‑offset and time‑to‑trigger (TTT), often fail to optimize mobility performance across varying speeds and deployment conditions. Fixed A3‑offset and TTT configurations either delay HOs, increasing radio link failures (RLFs), or accelerate them, leading to excessive ping‑pong effects. To address these challenges, we propose two data‑driven mobility management approaches leveraging high‑dimensional Bayesian optimization (HD‑BO) and deep reinforcement learning (DRL). HD‑BO optimizes HO parameters such as A3‑offset and TTT, striking a desired trade‑off between ping‑pongs vs. RLF. DRL provides a non‑parameter‑based approach, allowing an agent to select serving cells based on real‑time network conditions. We validate our approach using a real‑world cellular deployment scenario, and employing Sionna ray tracing for site‑specific channel propagation modeling. Results show that both HD‑BO and DRL outperform 3GPP set‑1 (TTT of 480 ms and A3‑offset of 3 dB) and set‑5 (TTT of 40 ms and A3‑offset of ‑1 dB) benchmarks. We augment HD‑BO with transfer learning so it can generalize across a range of user speeds. Applying the same transfer‑learning strategy to the DRL method reduces its training time by a factor of 2.5 while preserving optimal HO performance, showing that it adapts efficiently to the mobility of aerial users such as UAVs. Simulations further reveal that HD‑BO remains more sample‑efficient than DRL, making it more suitable for scenarios with limited training data.
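The role of the A3‑offset and TTT parameters can be illustrated with a simplified trigger check. This is a sketch only: it omits hysteresis, cell‑specific offsets, and measurement filtering from the full 3GPP A3 event, and the sampling period `step_ms` and the RSRP traces are made‑up values, not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class A3Config:
    offset_db: float  # A3-offset in dB
    ttt_ms: int       # time-to-trigger in milliseconds


def first_handover_time(serving_rsrp, neighbor_rsrp, cfg, step_ms=40):
    """Return the time (ms) at which a handover would fire, or None.

    Simplified rule: the neighbour must exceed the serving cell by more
    than `offset_db` continuously for at least `ttt_ms`. Both inputs are
    equal-length lists of RSRP samples in dBm, one every `step_ms` ms.
    """
    entered_at = None
    for i, (serving, neighbor) in enumerate(zip(serving_rsrp, neighbor_rsrp)):
        t = i * step_ms
        if neighbor > serving + cfg.offset_db:
            if entered_at is None:
                entered_at = t  # condition just became true
            if t - entered_at >= cfg.ttt_ms:
                return t
        else:
            entered_at = None  # condition broken, restart the timer
    return None


if __name__ == "__main__":
    serving = [-90.0] * 20
    neighbor = [-95.0] * 5 + [-86.0] * 15  # neighbour improves at t = 200 ms
    print(first_handover_time(serving, neighbor, A3Config(3.0, 480)))   # set-1
    print(first_handover_time(serving, neighbor, A3Config(-1.0, 40)))   # set-5
```

With the two benchmark configurations quoted in the abstract, the conservative set‑1 waits much longer before handing over than the aggressive set‑5, which is the delay versus ping‑pong trade‑off that the parameter optimization targets.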
Authors: Weihang Liu, Yuhui Zhong, Yuke Li, Xi Chen, Jiadi Cui, Honglong Zhang, Lan Xu, Xin Lou, Yujiao Shi, Jingyi Yu, Yingliang Zhang
Abstract: Accurate and efficient modeling of large‑scale urban scenes is critical for applications such as AR navigation, UAV‑based inspection, and smart city digital twins. While aerial imagery offers broad coverage and complements limitations of ground‑based data, reconstructing city‑scale environments from such views remains challenging due to occlusions, incomplete geometry, and high memory demands. Recent advances like 3D Gaussian Splatting (3DGS) improve scalability and visual quality but remain limited by dense primitive usage, long training times, and poor suitability for edge devices. We propose CityGo, a hybrid framework that combines textured proxy geometry with residual and surrounding 3D Gaussians for lightweight, photorealistic rendering of urban scenes from aerial perspectives. Our approach first extracts compact building proxy meshes from MVS point clouds, then uses zero‑order SH Gaussians to generate occlusion‑free textures via image‑based rendering and back‑projection. To capture high‑frequency details, we introduce residual Gaussians placed based on proxy‑photo discrepancies and guided by depth priors. Broader urban context is represented by surrounding Gaussians, with importance‑aware downsampling applied to non‑critical regions to reduce redundancy. A tailored optimization strategy jointly refines proxy textures and Gaussian parameters, enabling real‑time rendering of complex urban scenes on mobile GPUs with significantly reduced training and memory requirements. Extensive experiments on real‑world aerial datasets demonstrate that our hybrid representation significantly reduces training time, achieving on average 1.4x speedup, while delivering comparable visual fidelity to pure 3D Gaussian Splatting approaches. Furthermore, CityGo enables real‑time rendering of large‑scale urban scenes on mobile consumer GPUs, with substantially reduced memory usage and energy consumption.
Authors: Julio de la Torre-Vanegas, Miguel Soriano-Garcia, Israel Becerra, Diego Mercado-Ravell
Abstract: Landing safely in crowded urban environments remains an essential yet challenging endeavor for Unmanned Aerial Vehicles (UAVs), especially in emergency situations. In this work, we propose a risk‑aware approach that harnesses semantic segmentation to continuously evaluate potential hazards in the drone's field of view. By using a specialized deep neural network to assign pixel‑level risk values and applying an algorithm based on risk maps, our method adaptively identifies a stable Safe Landing Zone (SLZ) despite moving critical obstacles such as vehicles, people, etc., and other visual challenges like shifting illumination. A control system then guides the UAV toward this low‑risk region, employing altitude‑dependent safety thresholds and temporal landing point stabilization to ensure robust descent trajectories. Experimental validation in diverse urban environments demonstrates the effectiveness of our approach, achieving over 90% landing success rates in very challenging real scenarios, showing significant improvements in various risk metrics. Our findings suggest that risk‑oriented vision methods can effectively help reduce the risk of accidents in emergency landing situations, particularly in complex, unstructured, urban scenarios, densely populated with moving risky obstacles, while potentiating the true capabilities of UAVs in complex urban operations.
Authors: Taimoor Ahmad
Abstract: The integration of unmanned aerial vehicles (UAVs) into smart agriculture has enabled real‑time monitoring, data collection, and automated farming operations. However, the high mobility, decentralized nature, and low‑power communication of UAVs pose significant security challenges, particularly in ensuring transaction integrity and trust. This paper presents a quantum‑resilient blockchain framework designed to secure data and resource transactions in UAV‑assisted smart agriculture networks. The proposed solution incorporates post‑quantum cryptographic primitives, specifically lattice‑based digital signatures and key encapsulation mechanisms, to achieve tamper‑proof, low‑latency consensus without relying on traditional computationally intensive proof‑of‑work schemes. A lightweight consensus protocol tailored for UAV communication constraints is developed, and transaction validation is handled through a trust‑ranked, multi‑layer ledger maintained by edge nodes. Experimental results from simulations using NS‑3 and custom blockchain testbeds show that the framework outperforms existing schemes in terms of transaction throughput, energy efficiency, and resistance to quantum attacks. The proposed system provides a scalable, secure, and sustainable solution for precision agriculture, enabling trusted automation and resilient data sharing in the post‑quantum era.
Authors: Fahrettin Emin Tiras, Hayriye Serra Altinoluk
Abstract: Radio Frequency (RF) fingerprinting offers a promising approach for drone identification and security, although it suffers from significant performance degradation when operating on different transmission channels. This paper presents CrossRF, a domain‑invariant deep learning approach that addresses the problem of cross‑channel RF fingerprinting for Unmanned Aerial Vehicle (UAV) identification. Our approach aims to minimize the domain gap between different RF channels by using adversarial learning to train a more robust model that maintains consistent identification performance despite channel variations. We validate our approach using the UAVSig dataset, comprising real‑world over‑the‑air RF signals from identical drone models operating across several frequency channels, ensuring that the findings correspond to real‑world scenarios. The experimental results show CrossRF's efficiency, achieving up to 99.03% accuracy when adapting from Channel 3 to Channel 4, compared to only 26.39% using conventional methods. The model maintains robust performance in more difficult multi‑channel scenarios (87.57% accuracy adapting from Channels 1,3 to 2,4) and achieves 89.45% accuracy with 0.9 precision for controller classification. These results confirm CrossRF's ability to significantly reduce performance degradation due to cross‑channel variations while maintaining high identification accuracy with minimal training data requirements, making it particularly suitable for practical drone security applications.
Authors: Siddhanta Parial, Sasthi C. Ghosh, Anil K. Ghosh
Abstract: In unmanned aerial vehicle (UAV) assisted millimeter wave (mmWave) communication, appropriate user‑UAV association is crucial for improving system performance. In mmWave communication, user throughput largely depends on the line of sight (LoS) connectivity with the UAV, which in turn depends on the mobility pattern of the users. Moreover, different traffic types like enhanced mobile broadband (eMBB) and ultra reliable low latency communication (URLLC) may require different types of LoS connectivity. Existing user‑UAV association policies do not consider the user mobility during a time interval and different LoS requirements of different traffic types. In this paper, we consider both of them and develop a user association policy in the presence of building blockages. First, considering a simplified scenario, we have analytically established the LoS area, which is the region where users will experience seamless LoS connectivity for eMBB traffic, and the LoS radius, which is the radius of the largest circle within which the user gets uninterrupted LoS services for URLLC traffic. Then, for a more complex scenario, we present a geometric shadow polygon‑based method to compute LoS area and LoS radius. Finally, we associate eMBB and URLLC users, with the UAVs from which they get the maximum average throughput based on LoS area and maximum LoS radius respectively. We show that our approach outperforms the existing discretization based and maximum throughput based approaches.
Authors: Minghao Lu, Xiyu Fan, Bowen Xu, Zexuan Yan, Rui Peng, Han Chen, Lixian Zhang, Peng Lu
Abstract: High‑speed obstacle avoidance of uncrewed aerial vehicles (UAVs) in cluttered environments is a significant challenge. Existing UAV planning and obstacle avoidance systems can only fly at moderate speeds or at high speeds over empty or sparse fields. In this article, we propose a hyper‑efficient perception and planning system for the high‑speed obstacle avoidance of UAVs. The system mainly consists of three modules: 1) A novel incremental robocentric mapping method with distance and gradient information, which takes 89.5% less time compared to existing methods. 2) A novel obstacle‑aware topological path search method that generates multiple distinct paths. 3) An adaptive gradient‑based high‑speed trajectory generation method with a novel time pre‑allocation algorithm. With these innovations, the system has an excellent real‑time performance with only milliseconds latency in each iteration, taking 79.24% less time than existing methods at high speeds (15 m/s in cluttered environments), allowing UAVs to fly swiftly and avoid obstacles in cluttered environments. The planned trajectory of the UAV is close to the global optimum in both temporal and spatial domains. Finally, extensive validations in both simulation and real‑world experiments demonstrate the effectiveness of our proposed system for high‑speed navigation in cluttered environments.
Authors: Mahmoud Chick Zaouali, Todd Charter, Homayoun Najjaran
Abstract: High‑fidelity 3D reconstruction is critical for aerial inspection tasks such as infrastructure monitoring, structural assessment, and environmental surveying. While traditional photogrammetry techniques enable geometric modeling, they lack semantic interpretability, limiting their effectiveness for automated inspection workflows. Recent advances in neural rendering and 3D Gaussian Splatting (3DGS) offer efficient, photorealistic reconstructions but similarly lack scene‑level understanding.
In this work, we present a UAV‑based pipeline that extends Feature‑3DGS for language‑guided 3D segmentation. We leverage LSeg‑based feature fields with CLIP embeddings to generate heatmaps in response to language prompts. These are thresholded to produce rough segmentations, and the highest‑scoring point is then used as a prompt to SAM or SAM2 for refined 2D segmentation on novel view renderings. Our results highlight the strengths and limitations of various feature field backbones (CLIP‑LSeg, SAM, SAM2) in capturing meaningful structure in large‑scale outdoor environments. We demonstrate that this hybrid approach enables flexible, language‑driven interaction with photorealistic 3D reconstructions, opening new possibilities for semantic aerial inspection and scene understanding.
Authors: Sousannah Abdalla, Sabur Baidya
Abstract: Gesture recognition presents a promising avenue for interfacing with unmanned aerial vehicles (UAVs) due to its intuitive nature and potential for precise interaction. This research conducts a comprehensive comparative analysis of vision‑based hand gesture detection methodologies tailored for UAV control. Existing gesture recognition approaches involving cropping, zooming, and color‑based segmentation do not work well for this kind of application in dynamic conditions, and their performance degrades with increasing distance and environmental noise. We propose a novel approach that leverages hand landmark drawing and classification for gesture recognition based UAV control. With experimental results, we show that our proposed method outperforms the other existing methods in terms of accuracy, noise resilience, and efficacy across varying distances, thus providing robust control decisions. However, implementing deep learning based, compute‑intensive gesture recognition algorithms on the UAV's onboard computer is significantly challenging in terms of performance. Hence, we propose to use an edge‑computing based framework to offload the heavier computing tasks, thus achieving closed‑loop real‑time performance. With implementation on the AirSim simulator as well as on a real‑world UAV, we showcase the advantage of our end‑to‑end gesture recognition based UAV control system.
Authors: Desiree Fisker, Alexander Krawciw, Sven Lilge, Melissa Greeff, Timothy D. Barfoot
Abstract: This paper presents Virtual Teach and Repeat (VirT&R): an extension of the Teach and Repeat (T&R) framework that enables GPS‑denied, zero‑shot autonomous ground vehicle navigation in untraversed environments. VirT&R leverages aerial imagery captured for a target environment to train a Neural Radiance Field (NeRF) model so that dense point clouds and photo‑textured meshes can be extracted. The NeRF mesh is used to create a high‑fidelity simulation of the environment for piloting an unmanned ground vehicle (UGV) to virtually define a desired path. The mission can then be executed in the actual target environment by using NeRF‑generated point cloud submaps associated along the path and an existing LiDAR Teach and Repeat (LT&R) framework. We benchmark the repeatability of VirT&R on over 12 km of autonomous driving data using physical markings that allow a sim‑to‑real lateral path‑tracking error to be obtained and compared with LT&R. VirT&R achieved measured root mean squared errors (RMSE) of 19.5 cm and 18.4 cm in two different environments, which are slightly less than one tire width (24 cm) on the robot used for testing, and respective maximum errors were 39.4 cm and 47.6 cm. This was done using only the NeRF‑derived teach map, demonstrating that VirT&R has similar closed‑loop path‑tracking performance to LT&R but does not require a human to manually teach the path to the UGV in the actual environment.
Authors: Anupam Mondal, Priyadarshi Mukherjee, Sasthi C. Ghosh
Abstract: Reconfigurable intelligent surfaces (RIS) enable smart wireless environments by dynamically controlling signal propagation to enhance communication and localization. Unmanned aerial vehicles (UAVs) can act as flying base stations and thus, improve system performance by avoiding signal blockages. In this paper, we propose a gradient ascent and coordinate search based method to determine the optimal location for a system that consists of a UAV and a RIS, where the UAV serves cellular users (CUs) and the RIS serves device‑to‑device (D2D) pairs. In particular, by optimizing the net throughput for both the D2D pairs and the CUs, the suggested method establishes the ideal location for the RIS‑mounted UAV. We consider both line of sight (LoS) and non‑LoS paths for the RIS and UAV to calculate the throughput while accounting for blockages in the system. The numerical results show that the proposed method performs better than the existing approaches in terms of both the net throughput and the user fairness.
Authors: Qianlei Jia, Xinliang Zhou, Ondrej Krejcar, Enrique Herrera-Viedma
Abstract: In group decision‑making (GDM) scenarios, uncertainty, dynamic social structures, and vague information present major challenges for traditional opinion dynamics models. To address these issues, this study proposes a novel social network group decision‑making (SNGDM) framework that integrates three‑way decision (3WD) theory, dynamic network reconstruction, and linguistic opinion representation. First, the 3WD mechanism is introduced to explicitly model hesitation and ambiguity in agent judgments, thereby preventing irrational decisions. Second, a connection adjustment rule based on opinion similarity is developed, enabling agents to adaptively update their communication links and better reflect the evolving nature of social relationships. Third, linguistic terms are used to describe agent opinions, allowing the model to handle subjective, vague, or incomplete information more effectively. Finally, an integrated multi‑agent decision‑making framework is constructed, which simultaneously considers individual uncertainty, opinion evolution, and network dynamics. The proposed model is applied to a multi‑UAV cooperative decision‑making scenario, where simulation results and consensus analysis demonstrate its effectiveness. Experimental comparisons further verify the advantages of the algorithm in enhancing system stability and representing realistic decision‑making behaviors.
Authors: Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan
Abstract: TAT‑VPR is a ternary‑quantized transformer that brings dynamic accuracy‑efficiency trade‑offs to visual SLAM loop‑closure. By fusing ternary weights with a learned activation‑sparsity gate, the model can control computation by up to 40% at run‑time without degrading performance (Recall@1). The proposed two‑stage distillation pipeline preserves descriptor quality, letting it run on micro‑UAV and embedded SLAM stacks while matching state‑of‑the‑art localization accuracy.
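To make "ternary weights" concrete, the sketch below maps a list of weights to codes in {-1, 0, +1} plus a single scale factor. It uses a common threshold heuristic (threshold = 0.7 × mean absolute weight); this is an assumption for illustration, and the abstract does not say how TAT‑VPR chooses its threshold or scale. The name `ternarize` is hypothetical.

```python
def ternarize(weights, ratio=0.7):
    """Quantize weights to codes in {-1, 0, +1} with one shared scale.

    Assumed heuristic (not from the abstract): weights whose magnitude is at
    most ratio * mean(|w|) become 0; the rest keep their sign.
    """
    if not weights:
        raise ValueError("need at least one weight")
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = ratio * mean_abs
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # The scale is the mean magnitude of the weights that were not zeroed.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


codes, scale = ternarize([0.9, -0.8, 0.05, -0.1])
# Dequantized weights are approximated by scale * code.
approx = [scale * c for c in codes]
```

Because each code takes only three values, a multiply-accumulate reduces to an add, a subtract, or a skip, which is where the efficiency gain comes from.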
Authors: Zimao Sheng, Hong'an Yang, Shuxiang Yang, Zirui Yu
Abstract: This paper addresses the challenging problem of robust path‑following for fixed‑wing unmanned aerial vehicles (UAVs) in complex environments with bounded external disturbances and non‑smooth predefined paths. Due to the unique aerodynamic characteristics and flight constraints of fixed‑wing UAVs, achieving accurate, fast, and stable path following remains difficult, especially in low‑altitude mountainous terrains, urban landscapes, and under wind disturbances. Most existing path‑following guidance laws struggle to ensure fast stabilization under unknown bounded disturbances while maintaining sufficient robustness, and there is a lack of research on optimizing robustness for non‑smooth paths under flight constraints. This paper addresses these issues by proposing a constraints‑based robust path‑following controller. First, from the perspective of a global random attractor, we introduce robustness metrics that quantify both the exponential convergence rate and the range of the ultimate attractor set. Second, building on these metrics, we develop a robust longitudinal‑lateral look‑ahead pursuit (RLLP) guidance law for fixed‑wing UAVs, specifically considering the flight path angle and track angle under external disturbances. Third, we derive an optimized version (Optimal‑RLLP) to enhance the robustness metrics, and elaborate on the sufficient conditions for fast finite‑time stability, ensuring the guidance law achieves finite‑time stability and robustness with reduced sensitivity to constrained uncertainties. Simulation results on a high‑fidelity simulation platform validate the proposed guidance law's feasibility, optimality, and robustness under atmospheric disturbances and provide key principles for practical deployment.
Authors: Deyu Song, Xiangyin Zhang, Zipei Yu, Kaiyu Qin
Abstract: Multi‑view Synthetic Aperture Radar (SAR) imaging can effectively enhance the performance of tasks such as automatic target recognition and image information fusion. Unmanned aerial vehicles (UAVs) have the advantages of flexible deployment and cost reduction. A swarm of UAVs equipped with synthetic aperture radar imaging equipment is well suited to meet the functional requirements of multi‑view synthetic aperture radar imaging missions. However, providing optimal paths for SAR‑UAVs from the base station to cover target viewpoints in the mission area is NP‑hard. In this work, the coverage path planning problem for multi‑view SAR‑UAV observation systems is studied. First, the coordinates of the observation viewpoints are calculated from the locations of the targets and the base station under a simple geometric model. Then, an exact problem formulation is given in order to fully describe the solution space and search for optimal paths that provide the maximum coverage rate for SAR‑UAVs. Finally, an Adaptive Density Peak Clustering (ADPC) method is proposed to overcome the additional energy consumption caused by viewpoints that are far away from the base station. The Particle Swarm Optimization (PSO) algorithm is introduced for optimal path generation. Experimental results demonstrate the effectiveness and computational efficiency of the proposed approach.
Authors: Xiangyu Wang, Donglin Yang, Yue Liao, Wenhao Zheng, Wenjun Wu, Bin Dai, Hongsheng Li, Si Liu
Abstract: Unmanned Aerial Vehicles (UAVs) are evolving into language‑interactive platforms, enabling more intuitive forms of human‑drone interaction. While prior works have primarily focused on high‑level planning and long‑horizon navigation, we shift attention to language‑guided fine‑grained trajectory control, where UAVs execute short‑range, reactive flight behaviors in response to language instructions. We formalize this problem as the Flying‑on‑a‑Word (Flow) task and introduce UAV imitation learning as an effective approach. In this framework, UAVs learn fine‑grained control policies by mimicking expert pilot trajectories paired with atomic language instructions. To support this paradigm, we present UAV‑Flow, the first real‑world benchmark for language‑conditioned, fine‑grained UAV control. It includes a task formulation, a large‑scale dataset collected in diverse environments, a deployable control framework, and a simulation suite for systematic evaluation. Our design enables UAVs to closely imitate the precise, expert‑level flight trajectories of human pilots and supports direct deployment without sim‑to‑real gap. We conduct extensive experiments on UAV‑Flow, benchmarking VLN and VLA paradigms. Results show that VLA models are superior to VLN baselines and highlight the critical role of spatial grounding in the fine‑grained Flow setting.
Authors: Alessandro dos Santos Ferreira, Ana Paula Marques Ramos, José Marcato Junior, Wesley Nunes Gonçalves
Abstract: Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Mapping and monitoring these green spaces are crucial for urban planning and conservation, yet accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. While deep learning architectures have shown promise in addressing these challenges, their effectiveness remains strongly dependent on the availability of large and manually labeled datasets, which are often expensive and difficult to obtain in sufficient quantity. In this work, we propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low‑resolution aerial images. Our proposed pipeline enhances low‑resolution imagery while preserving semantic content, enabling effective tree segmentation without requiring large volumes of manually annotated data. Leveraging models such as pix2pix, Real‑ESRGAN, Latent Diffusion, and Stable Diffusion, we generate realistic and structurally consistent synthetic samples that expand the training dataset and unify scale across domains. This approach not only improves the robustness of segmentation models across different acquisition conditions but also provides a scalable and replicable solution for remote sensing scenarios with scarce annotation resources. Experimental results demonstrated an improvement of over 50% in IoU for low‑resolution images, highlighting the effectiveness of our method compared to traditional pipelines.
Authors: Li Wang, Xin Yu, Xuxin Lv, Gangzheng Ai, Wenjun Wu
Abstract: With the rapid advancement of unmanned aerial vehicles (UAVs) and missile technologies, perimeter‑defense games between attackers and defenders for the protection of critical regions have become increasingly complex and strategically significant across a wide range of domains. However, existing studies predominantly focus on small‑scale, simplified two‑dimensional scenarios, often overlooking realistic environmental perturbations, motion dynamics, and inherent heterogeneity, factors that pose substantial challenges to real‑world applicability. To bridge this gap, we investigate large‑scale heterogeneous perimeter‑defense games in a three‑dimensional setting, incorporating realistic elements such as motion dynamics and wind fields. We derive the Nash equilibrium strategies for both attackers and defenders, characterize the victory regions, and validate our theoretical findings through extensive simulations. To tackle large‑scale heterogeneous control challenges in defense strategies, we propose an Embedded Mean‑Field Actor‑Critic (EMFAC) framework. EMFAC leverages representation learning to enable high‑level action aggregation in a mean‑field manner, supporting scalable coordination among defenders. Furthermore, we introduce a lightweight agent‑level attention mechanism based on reward representation, which selectively filters observations and mean‑field information to enhance decision‑making efficiency and accelerate convergence in large‑scale tasks. Extensive simulations across varying scales demonstrate the effectiveness and adaptability of EMFAC, which outperforms established baselines in both convergence speed and overall performance. To further validate practicality, we test EMFAC in small‑scale real‑world experiments and conduct detailed analyses, offering deeper insights into the framework's effectiveness in complex scenarios.
Authors: Hengxing Cai, Jinhan Dong, Jingjun Tan, Jingcheng Deng, Sihang Li, Zhifeng Gao, Haidong Wang, Zicheng Su, Agachai Sumalee, Renxin Zhong
Abstract: Unmanned Aerial Vehicle (UAV) Vision‑and‑Language Navigation (VLN) is vital for applications such as disaster response, logistics delivery, and urban inspection. However, existing methods often struggle with insufficient multimodal fusion, weak generalization, and poor interpretability. To address these challenges, we propose FlightGPT, a novel UAV VLN framework built upon Vision‑Language Models (VLMs) with powerful multimodal perception capabilities. We design a two‑stage training pipeline: first, Supervised Fine‑Tuning (SFT) using high‑quality demonstrations to improve initialization and structured reasoning; then, Group Relative Policy Optimization (GRPO) algorithm, guided by a composite reward that considers goal accuracy, reasoning quality, and format compliance, to enhance generalization and adaptability. Furthermore, FlightGPT introduces a Chain‑of‑Thought (CoT)‑based reasoning mechanism to improve decision interpretability. Extensive experiments on the city‑scale dataset CityNav demonstrate that FlightGPT achieves state‑of‑the‑art performance across all scenarios, with a 9.22% higher success rate than the strongest baseline in unseen environments. Our implementation is publicly available.
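The core idea of GRPO, scoring each sampled response relative to the other responses in its group, can be sketched in a few lines. The composite reward weights below are hypothetical (the abstract names the three components but not their weighting), and the population standard deviation is one possible normalization choice; implementations differ on this detail.

```python
import statistics


def composite_reward(goal, reasoning, fmt, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of goal accuracy, reasoning quality, and format compliance.

    The weights are illustrative placeholders, not values from the paper.
    """
    w_goal, w_reason, w_fmt = weights
    return w_goal * goal + w_reason * reasoning + w_fmt * fmt


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumed choice
    return [(r - mean) / (std + eps) for r in rewards]


# Two hypothetical rollouts for the same instruction.
rewards = [composite_reward(1.0, 0.5, 1.0), composite_reward(0.0, 0.5, 1.0)]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group mean receive a positive advantage and are reinforced, so no separate value network is needed to provide a baseline.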
Authors: Mia Thomas, Trevor Ablett, Jonathan Kelly
Abstract: Unmanned aerial vehicles (UAVs) enable operations in remote and hazardous environments, yet the visible‑spectrum, camera‑based navigation systems often relied upon by UAVs struggle in low‑visibility conditions. Thermal cameras, which capture long‑wave infrared radiation, are able to function effectively in darkness and smoke, where visible‑light cameras fail. This work explores learned cross‑spectral (thermal‑visible) point features as a means to integrate thermal imagery into established camera‑based navigation systems. Existing methods typically train a feature network's detection and description outputs directly, which often focuses training on image regions where thermal and visible‑spectrum images exhibit similar appearance. Aiming to more fully utilize the available data, we propose a method to train the feature network on the tasks of matching and registration. We run our feature network on thermal‑visible image pairs, then feed the network response into a differentiable registration pipeline. Losses are applied to the matching and registration estimates of this pipeline. Our selected model, trained on the task of matching, achieves a registration error (corner error) below 10 pixels for more than 75% of estimates on the MultiPoint dataset. We further demonstrate that our model can also be used with a classical pipeline for matching and registration.
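For readers unfamiliar with the corner error, the sketch below uses a common definition from homography estimation: the mean Euclidean distance between the four image corners warped by the estimated and by the ground-truth homography. This definition is an assumption here, since the abstract does not spell it out, and the function names are hypothetical.

```python
import math


def warp_point(H, x, y):
    """Apply a 3x3 homography (nested lists) to the pixel (x, y)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def mean_corner_error(H_est, H_gt, width, height):
    """Mean distance (pixels) between corners warped by H_est and H_gt."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    total = 0.0
    for x, y in corners:
        ex, ey = warp_point(H_est, x, y)
        gx, gy = warp_point(H_gt, x, y)
        total += math.hypot(ex - gx, ey - gy)
    return total / len(corners)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
error = mean_corner_error(shifted, identity, width=640, height=480)
```

Under this definition, the reported result would mean that more than 75% of estimated homographies displace the corners by less than 10 pixels on average.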
Authors: Sebastian Schroder, Yao Deng, Alice James, Avishkar Seth, Kye Morton, Subhas Mukhopadhyay, Richard Han, Xi Zheng
Abstract: Uncrewed Aerial Vehicles (UAVs) have become a focal point of research, with both established companies and startups investing heavily in their development. This paper presents our iterative process in developing a robust autonomous marker‑based landing system, highlighting the key challenges encountered and the solutions implemented. It reviews existing systems for autonomous landing processes, and through this aims to contribute to the community by sharing insights and challenges faced during development and testing.
Authors: Derek Ming Siang Tan, Shailesh, Boyang Liu, Alok Raj, Qi Xuan Ang, Weiheng Dai, Tanishq Duhan, Jimmy Chiun, Yuhong Cao, Florian Shkurti, Guillaume Sartoretti
Abstract: To perform outdoor visual navigation and search, a robot may leverage satellite imagery to generate visual priors. This can help inform high‑level search strategies, even when such images lack sufficient resolution for target recognition. However, many existing informative path planning or search‑based approaches either assume no prior information, or use priors without accounting for how they were obtained. Recent work instead utilizes large Vision Language Models (VLMs) for generalizable priors, but their outputs can be inaccurate due to hallucination, leading to inefficient search. To address these challenges, we introduce Search‑TTA, a multimodal test‑time adaptation framework with a flexible plug‑and‑play interface compatible with various input modalities (e.g., image, text, sound) and planning methods (e.g., RL‑based). First, we pretrain a satellite image encoder to align with CLIP's visual encoder to output probability distributions of target presence used for visual search. Second, our TTA framework dynamically refines CLIP's predictions during search using uncertainty‑weighted gradient updates inspired by Spatial Poisson Point Processes. To train and evaluate Search‑TTA, we curate AVS‑Bench, a visual search dataset based on internet‑scale ecological data containing 380k images and taxonomy data. We find that Search‑TTA improves planner performance by up to 30.0%, particularly in cases with poor initial CLIP predictions due to domain mismatch and limited training data. It also performs comparably with significantly larger VLMs, and achieves zero‑shot generalization via emergent alignment to unseen modalities. Finally, we deploy Search‑TTA on a real UAV via hardware‑in‑the‑loop testing, by simulating its operation within a large‑scale simulation that provides onboard sensing.
Authors: Aly Sabri Abdalla, Vuk Marojevic
Abstract: This paper studies the problem of securing task offloading transmissions from ground users against ground eavesdropping threats. Our study introduces a reconfigurable intelligent surface (RIS)‑aided unmanned aerial vehicle (UAV)‑mobile edge computing (MEC) scheme to enhance the secure task offloading while minimizing the energy consumption of the UAV subject to task completion constraints. Leveraging a data‑driven approach, we propose a comprehensive optimization strategy that jointly optimizes the aerial MEC (AMEC)'s trajectory, task offloading partitioning, UE transmission scheduling, and RIS phase shifts. Our objective centers on optimizing the secrecy energy efficiency (SEE) of UE task offloading transmissions while preserving the AMEC's energy resources and meeting the task completion time requirements. Numerical results show that the proposed solution can effectively safeguard legitimate task offloading transmissions while preserving AMEC energy.
Authors: Jianlin Guo, Haihong Xiao, Wenxiong Kang
Abstract: Efficient scene representations are essential for many real‑world applications, especially those involving spatial measurement. Although current NeRF‑based methods have achieved impressive results in reconstructing building‑scale scenes, they still suffer from slow training and inference speeds due to time‑consuming stochastic sampling. Recently, 3D Gaussian Splatting (3DGS) has demonstrated excellent performance with its high‑quality rendering and real‑time speed, especially for objects and small‑scale scenes. However, in outdoor scenes, its point‑based explicit representation lacks an effective adjustment mechanism, and the millions of Gaussian points required often lead to memory constraints during training. To address these challenges, we propose EA‑3DGS, a high‑quality real‑time rendering method designed for outdoor scenes. First, we introduce a mesh structure to regulate the initialization of Gaussian components by leveraging an adaptive tetrahedral mesh that partitions the grid and initializes Gaussian components on each face, effectively capturing geometric structures in low‑texture regions. Second, we propose an efficient Gaussian pruning strategy that evaluates each 3D Gaussian's contribution to the view and prunes accordingly. To retain geometry‑critical Gaussian points, we also present a structure‑aware densification strategy that densifies Gaussian points in low‑curvature regions. Additionally, we employ vector quantization for parameter quantization of Gaussian components, significantly reducing disk space requirements with only a minimal impact on rendering quality. Extensive experiments on 13 scenes, including eight from four public datasets (MatrixCity‑Aerial, Mill‑19, Tanks & Temples, WHU) and five self‑collected scenes acquired through UAV photogrammetry measurement from SCUT‑CA and plateau regions, further demonstrate the superiority of our method.
Authors: Ebasa Temesgen, Mario Jerez, Greta Brown, Graham Wilson, Sree Ganesh Lalitaditya Divakarla, Sarah Boelter, Oscar Nelson, Robert McPherson, Maria Gini
Abstract: Wildlife‑induced crop damage, particularly from deer, threatens agricultural productivity. Traditional deterrence methods often fall short in scalability, responsiveness, and adaptability to diverse farmland environments. This paper presents an integrated unmanned aerial vehicle (UAV) system designed for autonomous wildlife deterrence, developed as part of the Farm Robotics Challenge. Our system combines a YOLO‑based real‑time computer vision module for deer detection, an energy‑efficient coverage path planning algorithm for efficient field monitoring, and an autonomous charging station for continuous operation of the UAV. In collaboration with a local Minnesota farmer, the system is tailored to address practical constraints such as terrain, infrastructure limitations, and animal behavior. The solution is evaluated through a combination of simulation and field testing, demonstrating robust detection accuracy, efficient coverage, and extended operational time. The results highlight the feasibility and effectiveness of drone‑based wildlife deterrence in precision agriculture, offering a scalable framework for future deployment and extension.
Authors: Mitchell Rogers, Theo Thompson, Isla Duporge, Johannes Fischer, Klemens Pütz, Thomas Mattern, Bing Xue, Mengjie Zhang
Abstract: Recent advancements in deep learning and aerial imaging have transformed wildlife monitoring, enabling researchers to survey wildlife populations at unprecedented scales. Unmanned Aerial Vehicles (UAVs) provide a cost‑effective means of capturing high‑resolution imagery, particularly for monitoring densely populated seabird colonies. In this study, we assess the performance of a general‑purpose avian detection model, BirdDetector, in estimating the breeding population of Salvin's albatross (Thalassarche salvini) on the Bounty Islands, New Zealand. Using drone‑derived imagery, we evaluate the model's effectiveness in both zero‑shot and fine‑tuned settings, incorporating enhanced inference techniques and stronger augmentation methods. Our findings indicate that while applying the model in a zero‑shot setting offers a strong baseline, fine‑tuning with annotations from the target domain and stronger image augmentation leads to marked improvements in detection accuracy. These results highlight the potential of leveraging pre‑trained deep‑learning models for species‑specific monitoring in remote and challenging environments.
Authors: Azim Akhtarshenas, Ramin Toosi, David López-Pérez, Tohid Alizadeh, Alireza Hosseini
Abstract: Malicious Unmanned Aerial Vehicles (UAVs) present a significant threat to next‑generation networks (NGNs), posing risks such as unauthorized surveillance, data theft, and the delivery of hazardous materials. This paper proposes an integrated autoencoder (AE)‑classifier system to detect malicious UAVs. The proposed AE, based on a 4‑layer Tri‑orientated Spatial Mamba (TSMamba) architecture, effectively captures complex spatial relationships crucial for identifying malicious UAV activities. The first phase involves generating residual values through the AE, which are subsequently processed by a ResNet‑based classifier. This classifier leverages the residual values to achieve lower complexity and higher accuracy. Our experiments demonstrate significant improvements in both binary and multi‑class classification scenarios, achieving up to 99.8% recall compared to 96.7% in the benchmark. Additionally, our method reduces computational complexity, making it more suitable for large‑scale deployment. These results highlight the robustness and scalability of our approach, offering an effective solution for malicious UAV detection in NGN environments.
Authors: Syed Luqman Shah, Ziaul Haq Abbas, Ghulam Abbas, Nurul Huda Mahmood
Abstract: In unmanned aerial vehicle (UAV)‑assisted wake‑up radio (WuR)‑enabled internet of things (IoT) networks, UAVs can instantly activate the main radios (MRs) of the sensor nodes (SNs) with a wake‑up call (WuC) for efficient data collection in mission‑driven scenarios. However, the spontaneous response of numerous SNs to the UAV's WuC can lead to significant packet loss and collisions, as WuR does not exhibit its superiority for high‑traffic loads. To address this challenge, we propose a receiver‑initiated WuR UAV‑assisted clustering (RI‑WuR‑UAC) medium access control (MAC) protocol to achieve low latency and high reliability in ultra‑low power consumption applications. We model the proposed protocol using the M/G/1/2 queuing framework and derive expressions for key performance metrics, i.e., channel busyness probability, probability of successful clustering, average SN energy consumption, and average transmission delay. The RI‑WuR‑UAC protocol employs three distinct data flow models, tailored to different network traffic conditions, which perform three MAC mechanisms: clear channel assessment (CCA) clustering for light traffic loads, backoff plus CCA clustering for dense and heavy traffic, and adaptive clustering for variable traffic loads. Simulation results demonstrate that the RI‑WuR‑UAC protocol significantly outperforms the benchmark sub‑carrier modulation clustering protocol. By varying the network load, we capture the trade‑offs among the performance metrics, showcasing the superior efficiency and reliability of the RI‑WuR‑UAC protocol.
Authors: Emre Girgin, Arda Taha Candan, Coşkun Anıl Zaman
Abstract: The fields of autonomous systems and robotics are receiving considerable attention in civil applications such as construction, logistics, and firefighting. Nevertheless, the widespread adoption of these technologies is hindered by the necessity for robust processing units to run AI models. Edge‑AI solutions offer considerable promise, enabling low‑power, cost‑effective robotics that can automate civil services, improve safety, and enhance sustainability. This paper presents a novel Edge‑AI‑enabled drone‑based surveillance system for autonomous multi‑robot operations at construction sites. Our system integrates a lightweight MCU‑based object detection model within a custom‑built UAV platform and a 5G‑enabled multi‑agent coordination infrastructure. We specifically target the real‑time obstacle detection and dynamic path planning problem in construction environments, providing a comprehensive dataset specifically created for MCU‑based edge applications. Field experiments demonstrate practical viability and identify optimal operational parameters, highlighting our approach's scalability and computational efficiency advantages compared to existing UAV solutions. The present and future roles of autonomous vehicles on construction sites are also discussed, as well as the effectiveness of edge‑AI solutions. We share our dataset publicly at github.com/egirgin/storaige‑b950
Authors: Zachary Ravichandran, Fernando Cladera, Jason Hughes, Varun Murali, M. Ani Hsieh, George J. Pappas, Camillo J. Taylor, Vijay Kumar
Abstract: The integration of foundation models (FMs) into robotics has enabled robots to understand natural language and reason about the semantics in their environments. However, existing FM‑enabled robots primarily operate in closed‑world settings, where the robot is given a full prior map or has a full view of its workspace. This paper addresses the deployment of FM‑enabled robots in the field, where missions often require a robot to operate in large‑scale and unstructured environments. To effectively accomplish these missions, robots must actively explore their environments, navigate obstacle‑cluttered terrain, handle unexpected sensor inputs, and operate with compute constraints. We discuss recent deployments of SPINE, our LLM‑enabled autonomy framework, in field robotic settings. To the best of our knowledge, we present the first demonstration of large‑scale LLM‑enabled robot planning in unstructured environments with several kilometers of missions. SPINE is agnostic to a particular LLM, which allows us to distill small language models capable of running onboard size, weight, and power (SWaP)‑limited platforms. Via preliminary model distillation work, we then present the first language‑driven UAV planner using on‑device language models. We conclude our paper by proposing several promising directions for future research.
Authors: Yang Gao, Zezhi Zeng
Abstract: Meteorological disasters such as typhoons, forest fires, and floods can damage the communication infrastructures, which will further disable the communication capabilities of cellular networks. The multi‑hop wireless communication based on IoT devices (e.g., rescue robots, UAVs, and mobile devices) becomes an available and rapidly deployable communication approach for search and rescue operations. However, Age of Information (AoI), an emerging network performance metric, has not been comprehensively investigated in this multi‑hop model. In this paper, we first construct a UAV‑relayed wireless network model and formulate the end‑to‑end instant AoI. Then we derive the optimal location of the relay UAV to achieve the minimum instant AoI by mathematical analysis. Simulations show that the derived relay location can always guarantee the optimal AoI and outperform other schemes.
Authors: Cunlai Pu, Fangrui Wu, Rajput Ramiz Sharafat, Guangzhao Dai, Xiangbo Shu
Abstract: Link prediction in unmanned aerial vehicle (UAV) ad hoc networks (UANETs) aims to predict the potential formation of future links between UAVs. In adversarial environments where the route information of UAVs is unavailable, predicting future links must rely solely on the observed historical topological information of UANETs. However, the highly dynamic and sparse nature of UANET topologies presents substantial challenges in effectively capturing meaningful structural and temporal patterns for accurate link prediction. Most existing link prediction methods focus on temporal dynamics at a single structural scale while neglecting the effects of sparsity, resulting in insufficient information capture and limited applicability to UANETs. In this paper, we propose a multi‑scale structural‑temporal link prediction model (MUST) for UANETs. Specifically, we first employ graph attention networks (GATs) to capture structural features at multiple levels, including the individual UAV level, the UAV community level, and the overall network level. Then, we use long short‑term memory (LSTM) networks to learn the temporal dynamics of these multi‑scale structural features. Additionally, we address the impact of sparsity by introducing a sophisticated loss function during model optimization. We validate the performance of MUST using several UANET datasets generated through simulations. Extensive experimental results demonstrate that MUST achieves state‑of‑the‑art link prediction performance in highly dynamic and sparse UANETs.
Authors: Yimou Wu, Mingyang Liang, Chongfeng Liu, Zhongzhong Cao, Huihuan Qian
Abstract: Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for robot‑assisted drone recovery on a wavy surface that addresses two major tasks. The first is accurate prediction of a moving drone's position under wave‑induced disturbances using KalmanNet Plus Plus (KalmanNet++), a neural network aided Kalman filter that we propose. The second is effective motion planning for a manipulator toward the resulting desired position via Receding Horizon Model Predictive Control (RHMPC). Specifically, we compare multiple prediction methods and propose KalmanNet++ to predict the position of the UAV, thereby obtaining the desired position. KalmanNet++ predicts the drone's future position 0.1 s ahead, while the manipulator plans a capture trajectory in real time, thus handling not only wave‑induced base motions but also torque and joint constraints. For the system design, we provide a collaborative system, comprising a manipulator subsystem and a UAV subsystem, that enables drone lifting and drone recovery. Simulation and real‑world experiments using wave‑disturbed motion data demonstrate that our approach achieves a high success rate (above 95%) and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state‑of‑the‑art performance and offers a practical solution for maritime drone operations.
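As background for the prediction task, the sketch below shows the prediction step of a classical Kalman filter with a one-dimensional constant-velocity model, looking 0.1 s ahead. It is not KalmanNet++ (which augments the filter with a neural network); the motion model, the diagonal process noise, and the function name are assumptions chosen for illustration.

```python
def predict_constant_velocity(state, cov, dt, process_var=0.0):
    """One Kalman prediction step for state = (position, velocity).

    Uses F = [[1, dt], [0, 1]] and an assumed diagonal process noise
    Q = process_var * I. `cov` is a 2x2 nested list.
    """
    position, velocity = state
    new_state = (position + velocity * dt, velocity)
    (a, b), (c, d) = cov
    # F * P * F^T expanded by hand for the 2x2 case, plus Q on the diagonal.
    new_cov = [
        [a + dt * (b + c) + dt * dt * d + process_var, b + dt * d],
        [c + dt * d, d + process_var],
    ]
    return new_state, new_cov


# Hypothetical drone at 1.0 m moving at 2.0 m/s, predicted 0.1 s ahead.
state, cov = predict_constant_velocity(
    (1.0, 2.0), [[1.0, 0.0], [0.0, 1.0]], dt=0.1
)
```

The predicted position uncertainty grows with the lookahead time, which is one reason short horizons such as 0.1 s are attractive for capture planning.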
Authors: Fernando Cladera, Zachary Ravichandran, Jason Hughes, Varun Murali, Carlos Nieto-Granda, M. Ani Hsieh, George J. Pappas, Camillo J. Taylor, Vijay Kumar
Abstract: As autonomous robotic systems become increasingly mature, users will want to specify missions at the level of intent rather than in low‑level detail. Language is an expressive and intuitive medium for such mission specification. However, realizing language‑guided robotic teams requires overcoming significant technical hurdles. Interpreting and realizing language‑specified missions requires advanced semantic reasoning. Successful heterogeneous robots must effectively coordinate actions and share information across varying viewpoints. Additionally, communication between robots is typically intermittent, necessitating robust strategies that leverage communication opportunities to maintain coordination and achieve mission objectives. In this work, we present a first‑of‑its‑kind system where an unmanned aerial vehicle (UAV) and an unmanned ground vehicle (UGV) are able to collaboratively accomplish missions specified in natural language while reacting to changes in specification on the fly. We leverage a Large Language Model (LLM)‑enabled planner to reason over semantic‑metric maps that are built online and opportunistically shared between an aerial and a ground robot. We consider task‑driven navigation in urban and rural areas. Our system must infer mission‑relevant semantics and actively acquire information via semantic mapping. In both ground and air‑ground teaming experiments, we demonstrate our system on seven different natural‑language specifications at up to kilometer‑scale navigation.
Authors: Mohammad Wasil, Ahmad Drak, Brennan Penfold, Ludovico Scarton, Maximilian Johenneken, Alexander Asteroth, Sebastian Houben
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly used for reforestation and forest monitoring, including seed dispersal in hard‑to‑reach terrains. However, a detailed understanding of the forest floor remains a challenge due to high natural variability, quickly changing environmental parameters, and ambiguous annotations due to unclear definitions. To address this issue, we adapt the Segment Anything Model (SAM), a vision foundation model with strong generalization capabilities, to segment forest floor objects such as tree stumps, vegetation, and woody debris. To this end, we employ parameter‑efficient fine‑tuning (PEFT) to fine‑tune a small subset of additional model parameters while keeping the original weights fixed. We adjust SAM's mask decoder to generate masks corresponding to our dataset categories, allowing for automatic segmentation without manual prompting. Our results show that the adapter‑based PEFT method achieves the highest mean intersection over union (mIoU), while Low‑rank Adaptation (LoRA), with fewer parameters, offers a lightweight alternative for resource‑constrained UAV platforms.
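As a rough illustration of why LoRA is lightweight, the sketch below applies a low-rank update to a single frozen linear layer. It is a generic example rather than the authors' SAM adaptation; the function names, the list-of-lists matrix layout, and the `alpha / r` scaling convention are assumptions.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w, a, b, x, alpha=1.0):
    """Compute y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would be trained.
    """
    r = len(a)  # rank = number of rows in A
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + (alpha / r) * d for y, d in zip(base, delta)]


def lora_trainable_params(d_out, d_in, r):
    """Number of trainable entries in A and B together."""
    return r * (d_in + d_out)
```

For a 1024 x 1024 layer with rank 8, `lora_trainable_params(1024, 1024, 8)` gives 16,384 trainable values against 1,048,576 in the full weight matrix.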
Authors: Yatai Ji, Zhengqiu Zhu, Yong Zhao, Beidan Liu, Chen Gao, Yihao Zhao, Sihang Qiu, Yue Hu, Quanjun Yin, Yong Li
Abstract: Aerial Visual Object Search (AVOS) tasks in urban environments require Unmanned Aerial Vehicles (UAVs) to autonomously search for and identify target objects using visual and textual cues without external guidance. Existing approaches struggle in complex urban environments due to redundant semantic processing, similar object distinction, and the exploration‑exploitation dilemma. To bridge this gap and support the AVOS task, we introduce CityAVOS, the first benchmark dataset for autonomous search of common urban objects. This dataset comprises 2,420 tasks across six object categories with varying difficulty levels, enabling comprehensive evaluation of UAV agents' search capabilities. To solve the AVOS tasks, we also propose PRPSearcher (Perception‑Reasoning‑Planning Searcher), a novel agentic method powered by multi‑modal large language models (MLLMs) that mimics human three‑tier cognition. Specifically, PRPSearcher constructs three specialized maps: an object‑centric dynamic semantic map enhancing spatial perception, a 3D cognitive map based on semantic attraction values for target reasoning, and a 3D uncertainty map for balanced exploration‑exploitation search. Also, our approach incorporates a denoising mechanism to mitigate interference from similar objects and utilizes an Inspiration Promote Thought (IPT) prompting mechanism for adaptive action planning. Experimental results on CityAVOS demonstrate that PRPSearcher surpasses existing baselines in both success rate and search efficiency (on average: +37.69% SR, +28.96% SPL, ‑30.69% MSS, and ‑46.40% NE). While promising, the performance gap compared to humans highlights the need for better semantic reasoning and spatial exploration capabilities in AVOS tasks. This work establishes a foundation for future advances in embodied target search. Dataset and source code are available at https://anonymous.4open.science/r/CityAVOS‑3DF8.
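The SR and SPL figures quoted above are commonly computed as follows in embodied navigation work. This sketch assumes the widely used definition of Success weighted by Path Length; the paper may define its metrics differently, and the episode tuple layout is hypothetical.

```python
def success_rate(episodes):
    """Fraction of episodes that ended in success.

    Each episode is (success, shortest_path_length, path_length_taken).
    """
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length, averaged over all episodes."""
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # A successful episode scores 1 only if the agent took the shortest path.
            total += shortest / max(taken, shortest)
    return total / len(episodes)
```

An agent that succeeds but flies twice the shortest distance contributes 0.5 to SPL, so the metric rewards efficiency as well as success.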
Authors: Libiao Lou, Yuan Liu, Fotis Foukalas, Hongjiang Lei, Gaofeng Pan, Theodoros A. Tsiftsis, Hongwu Liu
Abstract: In this paper, we propose a dual‑unmanned aerial vehicle (UAV)‑enabled secure communication and sensing (SCS) scheme for an air‑to‑ground integrated sensing and communication (ISAC) system, in which a dual‑functional source UAV and a jamming UAV collaborate to enhance both the secure communication and target sensing performance. From a perspective of hybrid monostatic‑bistatic radar, the jamming UAV maneuvers to aid the source UAV to detect multiple ground targets by emitting artificial noise, meanwhile interfering with the ground eavesdropper. Residual interference is considered to reflect the effects of imperfect successive interference cancellation (SIC) on the receive signal‑plus‑interference‑to‑noise ratios, which results in a degraded system performance. To maximize the average secrecy rate (ASR), the dual‑UAV trajectory and dual‑UAV beamforming are jointly optimized subject to the transmit power budget, UAV maneuvering constraint, and sensing requirements. To tackle the highly complicated non‑convex ASR maximization problem, the dual‑UAV trajectory and dual‑UAV beamforming are optimized for the secure communication (SC) purpose and the SCS purpose, sequentially. In the SC phase, a block coordinate descent algorithm is proposed to optimize the dual‑UAV trajectory and dual‑UAV beamforming iteratively, using the trust‑region successive convex approximation (SCA) and semidefinite relaxation (SDR) techniques. Then, a weighted distance minimization problem is formulated to determine the dual‑UAV maneuvering positions suitable for the SCS purpose, which is solved by a heuristic greedy algorithm, followed by the joint optimization of source beamforming and jamming beamforming.
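The average secrecy rate being maximized can be sketched with the textbook definition: the legitimate link's rate minus the eavesdropper's rate, floored at zero and averaged over time slots. The paper's exact expression (including residual SIC interference) is not given in the abstract, so the function names and the per-slot ratio inputs are assumptions.

```python
import math


def secrecy_rate(sinr_user, sinr_eve):
    """Per-slot secrecy rate in bit/s/Hz: [log2(1 + g_user) - log2(1 + g_eve)]^+."""
    return max(0.0, math.log2(1.0 + sinr_user) - math.log2(1.0 + sinr_eve))


def average_secrecy_rate(slots):
    """Mean secrecy rate over a list of (sinr_user, sinr_eve) pairs, one per time slot."""
    return sum(secrecy_rate(u, e) for u, e in slots) / len(slots)
```

Jamming helps in this picture by lowering the eavesdropper's ratio, which raises the difference without touching the legitimate link.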
Authors: Yanggang Xu, Jirong Zha, Weijie Hong, Xiangmin Yi, Geng Chen, Jianfeng Zheng, Chen-Chun Hsia, Xinlei Chen
Abstract: In disaster scenarios, establishing robust emergency communication networks is critical, and unmanned aerial vehicles (UAVs) offer a promising solution to rapidly restore connectivity. However, organizing UAVs to form multi‑hop networks in large‑scale dynamic environments presents significant challenges, including limitations in algorithmic scalability and the vast exploration space required for coordinated decision‑making. To address these issues, we propose MRLMN, a novel framework that integrates multi‑agent reinforcement learning (MARL) and large language models (LLMs) to jointly optimize UAV agents toward achieving optimal networking performance. The framework incorporates a grouping strategy with reward decomposition to enhance algorithmic scalability and balance decision‑making across UAVs. In addition, behavioral constraints are applied to selected key UAVs to improve the robustness of the network. Furthermore, the framework integrates LLM agents, leveraging knowledge distillation to transfer their high‑level decision‑making capabilities to MARL agents. This enhances both the efficiency of exploration and the overall training process. In the distillation module, a Hungarian algorithm‑based matching scheme is applied to align the decision outputs of the LLM and MARL agents and define the distillation loss. Extensive simulation results validate the effectiveness of our approach, demonstrating significant improvements in network performance over the MAPPO baseline and other comparison methods, including enhanced coverage and communication quality.
Authors: Mirco Theile, Andres R. Zapata Rodriguez, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli
Abstract: Unmanned Aerial Vehicle (UAV) Coverage Path Planning (CPP) is critical for applications such as precision agriculture and search and rescue. While traditional methods rely on discrete grid‑based representations, real‑world UAV operations require power‑efficient continuous motion planning. We formulate the UAV CPP problem in a continuous environment, minimizing power consumption while ensuring complete coverage. Our approach models the environment with variable‑size axis‑aligned rectangles and UAV motion with curvature‑constrained Bézier curves. We train a reinforcement learning agent using an action‑mapping‑based Soft Actor‑Critic (AM‑SAC) algorithm employing a self‑adaptive curriculum. Experiments on both procedurally generated and hand‑crafted scenarios demonstrate the effectiveness of our method in learning energy‑efficient coverage strategies.
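A curvature constraint on a Bézier segment can be checked numerically, as in the sketch below. It assumes planar cubic segments and a sampled check; the abstract does not state the curve degree or how the constraint is enforced, and the function names are hypothetical.

```python
import math


def bezier_derivs(ctrl, t):
    """First and second derivatives of a planar cubic Bezier curve at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    u = 1.0 - t
    dx = 3 * u * u * (x1 - x0) + 6 * u * t * (x2 - x1) + 3 * t * t * (x3 - x2)
    dy = 3 * u * u * (y1 - y0) + 6 * u * t * (y2 - y1) + 3 * t * t * (y3 - y2)
    ddx = 6 * u * (x2 - 2 * x1 + x0) + 6 * t * (x3 - 2 * x2 + x1)
    ddy = 6 * u * (y2 - 2 * y1 + y0) + 6 * t * (y3 - 2 * y2 + y1)
    return dx, dy, ddx, ddy


def curvature(ctrl, t):
    """Unsigned curvature |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2)."""
    dx, dy, ddx, ddy = bezier_derivs(ctrl, t)
    speed_sq = dx * dx + dy * dy
    if speed_sq == 0.0:
        return math.inf  # degenerate point: treat as violating any finite limit
    return abs(dx * ddy - dy * ddx) / speed_sq ** 1.5


def satisfies_curvature_limit(ctrl, kappa_max, samples=101):
    """Sampled check that curvature stays at or below kappa_max along the segment."""
    return all(curvature(ctrl, i / (samples - 1)) <= kappa_max for i in range(samples))
```

Because the check is sampled, it can miss a narrow curvature spike between samples; a denser grid or an analytic bound would be needed for a hard guarantee.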
Authors: Pedro Antonio Alarcon Granadeno, Jane Cleland-Huang
Abstract: Modern coverage path planning (CPP) for holonomic UAVs in emergency response must contend with diverse environments where regions of interest (ROIs) often take the form of highly irregular polygons, characterized by asymmetric shapes, dense clusters of concavities, and multiple internal holes. Modern CPP pipelines typically rely on decomposition strategies that overfragment such polygons into numerous subregions. This increases the number of sweep segments and connectors, which in turn adds inter‑region travel and forces more frequent reorientation. These effects ultimately result in longer completion times and degraded trajectory quality. We address this with a decomposition strategy that applies a recursive dual‑axis monotonicity criterion with cuts guided by a cumulative gap severity metric. This approach distributes clusters of concavities more evenly across subregions and produces a minimal set of partitions that remain sweepable under a parallel‑track maneuver. We pair this with a global optimizer that jointly selects sweep paths and inter‑partition transitions to minimize total path length, transition overhead, and turn count. We demonstrate that our proposed approach achieves the lowest mean overhead in path length and completion time across 13 notable CPP pipelines.
Authors: Alavikunhu Panthakkan, S M Anzar, K. Sherin, Saeed Al Mansoori, Hussain Al-Ahmad
Abstract: Precision farming relies on accurate vegetation monitoring to enhance crop productivity and promote sustainable agricultural practices. This study presents a comprehensive evaluation of UAV‑based imaging for vegetation health assessment in a palm tree cultivation region in Dubai. By comparing multispectral and RGB image data, we demonstrate that RGB‑based vegetation indices offer performance comparable to more expensive multispectral indices, providing a cost‑effective alternative for large‑scale agricultural monitoring. Using UAVs equipped with multispectral sensors, indices such as NDVI and SAVI were computed to categorize vegetation into healthy, moderate, and stressed conditions. Simultaneously, RGB‑based indices like VARI and MGRVI delivered similar results in vegetation classification and stress detection. Our findings highlight the practical benefits of integrating RGB imagery into precision farming, reducing operational costs while maintaining accuracy in plant health monitoring. This research underscores the potential of UAV‑based RGB imaging as a powerful tool for precision agriculture, enabling broader adoption of data‑driven decision‑making in crop management. By leveraging the strengths of both multispectral and RGB imaging, this work advances the state of UAV applications in agriculture, paving the way for more efficient and scalable farming solutions.
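The four indices named in this abstract have standard per-pixel definitions, sketched below for reflectance values in [0, 1]. The three-way classification thresholds are purely hypothetical placeholders, since the abstract does not report the cut-offs used.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index with soil brightness factor L (0.5 is a common default)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def vari(red, green, blue):
    """Visible Atmospherically Resistant Index (RGB only)."""
    return (green - red) / (green + red - blue)


def mgrvi(red, green):
    """Modified Green Red Vegetation Index (RGB only)."""
    return (green ** 2 - red ** 2) / (green ** 2 + red ** 2)


def classify(index_value, stressed_below=0.2, healthy_above=0.5):
    """Map an index value to a condition label; thresholds are assumed, not from the paper."""
    if index_value < stressed_below:
        return "stressed"
    if index_value > healthy_above:
        return "healthy"
    return "moderate"
```

These functions do not guard against zero denominators (for example, a fully dark pixel), so real imagery would need masking before they are applied.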
Authors: Keiwan Soltani, Federico Corò, Punyasha Chatterjee, Sajal K. Das
Abstract: Unmanned Aerial Vehicles (UAVs), also known as drones, have gained popularity in various fields such as agriculture, emergency response, and search and rescue operations. UAV networks are susceptible to several security threats, such as wormhole, jamming, spoofing, and false data injection. Time Delay Attack (TDA) is a unique attack in which malicious UAVs intentionally delay packet forwarding, posing significant threats, especially in time‑sensitive applications. It is challenging to distinguish malicious delay from benign network delay due to the dynamic nature of UAV networks, intermittent wireless connectivity, or the Store‑Carry‑Forward (SCF) mechanism during multi‑hop communication. Some existing works propose machine learning‑based centralized approaches to detect TDA, which are computationally intensive and have large message overheads. This paper proposes a novel approach DATAMUt, where the temporal dynamics of the network are represented by a weighted time‑window graph (TWiG), and then two deterministic polynomial‑time algorithms are presented to detect TDA when UAVs have global and local network knowledge. Simulation studies show that the proposed algorithms have reduced message overhead by a factor of five and twelve in global and local knowledge, respectively, compared to existing approaches. Additionally, our approaches achieve approximately 860 and 1050 times less execution time in global and local knowledge, respectively, outperforming the existing methods.
Authors: Junfan Yi, Ke-ke Shang, Michael Small
Abstract: The Earth, a temporal complex system, is witnessing a shift in research on its coordinate system, moving away from conventional static positioning toward dynamic modeling. Early positioning concentrated on static natural geographic features; as the emergence of geographic information systems introduced a growing demand for spatial data, the focus turned to capturing dynamic objects. However, previous methods typically rely on expensive devices or external calibration objects for attitude measurement. We propose an applied mathematical model that utilizes time series, an inherent property of dynamic objects, to determine relative attitudes without absolute attitude measurements, and then employs SVD‑based methods for 3D coordinate recognition. The model is validated in a numerical simulation with negligible error, which is inherent in computer numerical approximations. In what follows, to assess our model in an engineering scenario, we propose a framework that integrates applied mathematics with AI, using only three cameras to capture a UAV. We enhance the YOLOv8 model by leveraging time series for accurate 2D coordinate acquisition, which is then used as input for 2D‑to‑3D conversion via our mathematical model. As a result, the framework demonstrates high precision, as evidenced by low error metrics including root mean square error, mean absolute error, maximum error, and a strong R‑squared value. It is important to note that the mathematical method itself is inherently error‑free; any observed inaccuracies are due solely to external hardware or the AI‑based 2D coordinate acquisition process, which represents an improved version of the current state‑of‑the‑art. Our framework enriches geodetic theory by providing a streamlined model for the 3D positioning of non‑cooperative targets, minimizing input attitude parameters by leveraging applied mathematics and AI.
Authors: Zeynep Galymzhankyzy, Eric Martinson
Abstract: Efficient crop‑weed segmentation is critical for site‑specific weed control in precision agriculture. Conventional CNN‑based methods struggle to generalize and rely on RGB imagery, limiting performance under complex field conditions. To address these challenges, we propose a lightweight transformer‑CNN hybrid. It processes RGB, Near‑Infrared (NIR), and Red‑Edge (RE) bands using specialized encoders and dynamic modality integration. Evaluated on the WeedsGalore dataset, the model achieves a segmentation accuracy (mean IoU) of 78.88%, outperforming RGB‑only models by 15.8 percentage points. With only 8.7 million parameters, the model offers high accuracy, computational efficiency, and potential for real‑time deployment on Unmanned Aerial Vehicles (UAVs) and edge devices, advancing precision weed management.
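The mean IoU figure reported above is a per-class average of intersection over union. A minimal sketch over flattened label sequences follows; the function name and the choice to skip classes absent from both prediction and ground truth are assumptions, as evaluation protocols differ on that point.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for two equal-length label sequences."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes that appear in neither sequence
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class from range(num_classes) appears in the inputs")
    return sum(ious) / len(ious)
```

Note that a gain stated in percentage points, as in this abstract, is a difference between two such mIoU values rather than a relative improvement.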
Authors: Oleg Sautenkov, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Faryal Batool, Jeffrin Sam, Artem Lykov, Chih-Yung Wen, Dzmitry Tsetserukou
Abstract: We present UAV‑CodeAgents, a scalable multi‑agent framework for autonomous UAV mission generation, built on large language and vision‑language models (LLMs/VLMs). The system leverages the ReAct (Reason + Act) paradigm to interpret satellite imagery, ground high‑level natural language instructions, and collaboratively generate UAV trajectories with minimal human supervision. A core component is a vision‑grounded, pixel‑pointing mechanism that enables precise localization of semantic targets on aerial maps. To support real‑time adaptability, we introduce a reactive thinking loop, allowing agents to iteratively reflect on observations, revise mission goals, and coordinate dynamically in evolving environments.
UAV‑CodeAgents is evaluated on large‑scale mission scenarios involving industrial and environmental fire detection. Our results show that a lower decoding temperature (0.5) yields higher planning reliability and reduced execution time, with an average mission creation time of 96.96 seconds and a success rate of 93%. We further fine‑tune Qwen2.5VL‑7B on 9,000 annotated satellite images, achieving strong spatial grounding across diverse visual categories. To foster reproducibility and future research, we will release the full codebase and a novel benchmark dataset for vision‑language‑based UAV planning.
Authors: Wenhao Lu, Zhengqiu Zhu, Yong Zhao, Yonglin Tian, Junjie Zeng, Jun Zhang, Zhong Liu, Fei-Yue Wang
Abstract: Mobile crowdsensing is evolving beyond traditional human‑centric models by integrating heterogeneous entities like unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). Optimizing task allocation among these diverse agents is critical, particularly in challenging emergency rescue scenarios characterized by complex environments, limited communication, and partial observability. This paper tackles the Heterogeneous‑Entity Collaborative‑Sensing Task Allocation (HECTA) problem specifically for emergency rescue, considering humans, UAVs, and UGVs. We introduce a novel ``Hard‑Cooperative'' policy where UGVs prioritize recharging low‑battery UAVs, alongside performing their sensing tasks. The primary objective is maximizing the task completion rate (TCR) under strict time constraints. We rigorously formulate this NP‑hard problem as a decentralized partially observable Markov decision process (Dec‑POMDP) to effectively handle sequential decision‑making under uncertainty. To solve this, we propose HECTA4ER, a novel multi‑agent reinforcement learning algorithm built upon a Centralized Training with Decentralized Execution architecture. HECTA4ER incorporates tailored designs, including specialized modules for complex feature extraction, utilization of action‑observation history via hidden states, and a mixing network integrating global and local information, specifically addressing the challenges of partial observability. Furthermore, theoretical analysis confirms the algorithm's convergence properties. Extensive simulations demonstrate that HECTA4ER significantly outperforms baseline algorithms, achieving an average 18.42% increase in TCR. Crucially, a real‑world case study validates the algorithm's effectiveness and robustness in dynamic sensing scenarios, highlighting its strong potential for practical application in emergency response.
Authors: Tarik Houichime, Younes EL Amrani
Abstract: This paper introduces an innovative approach for the autonomous landing of Unmanned Aerial Vehicles (UAVs) using only a front‑facing monocular camera, therefore obviating the requirement for depth estimation cameras. Drawing on the inherent human estimating process, the proposed method reframes the landing task as an optimization problem. The UAV employs variations in the visual characteristics of a specially designed lenticular circle on the landing pad, where the perceived color and form provide critical information for estimating both altitude and depth. Reinforcement learning algorithms are utilized to approximate the functions governing these estimations, enabling the UAV to ascertain ideal landing settings via training. This method's efficacy is assessed by simulations and experiments, showcasing its potential for robust and accurate autonomous landing without dependence on complex sensor setups. This research contributes to the advancement of cost‑effective and efficient UAV landing solutions, paving the way for wider applicability across various fields.
Authors: Menghao Hu, Tong Zhang, Shuai Wang, Chiya Zhang, Changyang She, Gaojie Chen, Miaowen Wen
Abstract: Low‑altitude economy includes the application of unmanned aerial vehicles (UAVs) serving ground robots. This paper investigates the 3‑dimensional (3D) trajectory and communication optimization for low‑altitude air‑ground cooperation systems, where mobile unmanned ground vehicles (UGVs) upload data to UAVs. We propose a joint optimization algorithm to maximize the minimal sum‑rate of UGVs while ensuring quality of service and navigation constraints. The proposed algorithm integrates a successive convex approximation (SCA)‑penalty method for UGV‑UAV scheduling, an SCA‑based approach for UGV transmit power control, and a novel warm‑start particle swarm optimization with cross mutation (WS‑PSO‑CM). The WS‑PSO‑CM leverages convex optimization results from a statistical channel model to initialize the particle swarm, significantly improving performance compared with the well‑known PSO‑CM. Simulation results demonstrate that the proposed algorithm achieves a 45.8% higher minimal sum‑rate compared to the baseline PSO‑CM under the same number of iterations. This gain can be translated into a 46.7% reduction in computational time relative to PSO‑CM. Furthermore, our simulation results reveal that UAVs dynamically adjust trajectories to avoid interference by buildings, and maintain proximity to UGVs to mitigate path‑loss.
Authors: Yiming Wang, Yao Fang, Jie Mei, Youmin Gong, Guangfu Ma
Abstract: This paper studies the leaderless formation flying problem with collision avoidance for a group of unmanned aerial vehicles (UAVs), which requires the UAVs to navigate through cluttered environments without colliding while maintaining the formation. The communication network among the UAVs is structured as a directed graph that includes a directed spanning tree. A novel distributed nonlinear model predictive control (NMPC) method based on the model reference adaptive consensus (MRACon) framework is proposed. Within this framework, each UAV tracks an assigned reference output generated by a linear reference model that utilizes relative measurements as input. Subsequently, the NMPC method penalizes the tracking error between the output of the reference model and that of the actual model while also establishing constraint sets for collision avoidance and physical limitations to achieve distributed and safe formation control. Finally, simulations and hardware experiments are conducted to verify the effectiveness of the proposed method.
Authors: Yu Cheng, Harun Šiljak
Abstract: Accurate, real‑time collision detection is essential for ensuring player safety and effective refereeing in high‑contact sports such as rugby, particularly given the severe risks associated with traumatic brain injuries (TBI). Traditional collision‑monitoring methods employing fixed cameras or wearable sensors face limitations in visibility, coverage, and responsiveness. Previously, we introduced a framework using unmanned aerial vehicles (UAVs) for monitoring and real‑time kinematics extraction from videos of collision events. In this paper, we show that strategies whose objective is to ensure that at least one UAV captures every incident on the pitch have an emergent property of fulfilling a stronger key condition for successful kinematics extraction. Namely, they ensure that almost all collisions are captured by multiple drones, establishing multi‑view fidelity and redundancy, while not requiring any drone‑to‑drone communication.
Authors: Chintan B. Maniyar, Minakshi Kumar, Gengchen Mai
Abstract: Accurate building segmentation from high‑resolution RGB imagery remains challenging due to spectral similarity with non‑building features, shadows, and irregular building geometries. In this study, we present a comprehensive deep learning framework for multiscale building segmentation using RGB aerial and satellite imagery with spatial resolutions ranging from 0.4m to 2.7m. We curate a diverse, multi‑sensor dataset and introduce feature‑augmented inputs by deriving secondary representations including Principal Component Analysis (PCA), Visible Difference Vegetation Index (VDVI), Morphological Building Index (MBI), and Sobel edge filters from RGB channels. These features guide a Res‑U‑Net architecture in learning complex spatial patterns more effectively. We also propose training policies incorporating layer freezing, cyclical learning rates, and SuperConvergence to reduce training time and resource usage. Evaluated on a held‑out WorldView‑3 image, our model achieves an overall accuracy of 96.5%, an F1‑score of 0.86, and an Intersection over Union (IoU) of 0.80, outperforming existing RGB‑based benchmarks. This study demonstrates the effectiveness of combining multi‑resolution imagery, feature augmentation, and optimized training strategies for robust building segmentation in remote sensing applications.
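Two of the derived input features named above, VDVI and Sobel edges, can be computed from RGB values alone. The sketch below uses the standard VDVI formula and 3x3 Sobel kernels on a grayscale grid; the function names and the list-of-lists image layout are assumptions, and the paper's own preprocessing may differ.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def vdvi(red, green, blue):
    """Visible Difference Vegetation Index for one pixel."""
    return (2 * green - red - blue) / (2 * green + red + blue)


def sobel_magnitude(gray, row, col):
    """Gradient magnitude at an interior pixel of a grayscale image (list of rows)."""
    gx = gy = 0.0
    for i in range(3):
        for j in range(3):
            px = gray[row - 1 + i][col - 1 + j]
            gx += SOBEL_X[i][j] * px
            gy += SOBEL_Y[i][j] * px
    return math.hypot(gx, gy)
```

Stacking such maps with the original RGB channels would give the network explicit vegetation and edge cues instead of leaving it to learn them from raw color.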
Authors: Zhifeng Hu, Chong Han
Abstract: Terahertz (THz) unmanned aerial vehicle (UAV) networks with flexible topologies and ultra‑high data rates are expected to empower numerous applications in security surveillance, disaster response, and environmental monitoring, among others. However, the dynamic topologies hinder the efficient long‑term joint power and antenna array resource allocation for THz links among UAVs. Furthermore, the continuous nature of power and the discrete nature of antennas cause this joint resource allocation problem to be a mixed‑integer nonlinear programming (MINLP) problem with non‑convexity and NP‑hardness. Inspired by recent rapid advancements in deep reinforcement learning (DRL), a graph neural network (GNN) aided DRL algorithm for resource allocation in the dynamic THz UAV network with an emphasis on self‑node features (GLOVE) is proposed in this paper, with the aim of resource efficiency (RE) maximization. When training the allocation policy for each UAV, GLOVE learns the relationship between this UAV and its neighboring UAVs via GNN, while also emphasizing the important self‑node features of this UAV. In addition, a multi‑task structure is leveraged by GLOVE to cooperatively train resource allocation decisions for the power and sub‑arrays of all UAVs. Experimental results illustrate that GLOVE outperforms benchmark schemes in terms of the highest RE and the lowest latency. Moreover, unlike the benchmark methods with severe packet loss, GLOVE maintains zero packet loss during the entire training process, demonstrating its better robustness under the highly dynamic THz UAV network.
Authors: Chenxu Peng, Chenxu Wang, Minrui Zou, Danyang Li, Zhengpeng Yang, Yimian Dai, Ming-Ming Cheng, Xiang Li
Abstract: Infrared object tracking plays a crucial role in Anti‑Unmanned Aerial Vehicle (Anti‑UAV) applications. Existing trackers often depend on cropped template regions and have limited motion modeling capabilities, which pose challenges when dealing with tiny targets. To address this, we propose a simple yet effective infrared tiny‑object tracker that enhances tracking performance by integrating global detection and motion‑aware learning with temporal priors. Our method is based on object detection and achieves significant improvements through two key innovations. First, we introduce frame dynamics, leveraging frame difference and optical flow to encode both prior target features and motion characteristics at the input level, enabling the model to better distinguish the target from background clutter. Second, we propose a trajectory constraint filtering strategy in the post‑processing stage, utilizing spatio‑temporal priors to suppress false positives and enhance tracking robustness. Extensive experiments show that our method consistently outperforms existing approaches across multiple metrics in challenging infrared UAV tracking scenarios. Notably, we achieve state‑of‑the‑art performance in the 4th Anti‑UAV Challenge, securing 1st place in Track 1 and 2nd place in Track 2.
Authors: Kailash A. Hambarde, Nzakiese Mbongo, Pavan Kumar MP, Satish Mekewad, Carolina Fernandes, Gökhan Silahtaroğlu, Alice Nithya, Pawan Wasnik, MD. Rashidunnabi, Pranita Samale, Hugo Proença
Abstract: Person reidentification (ReID) technology has been considered to perform relatively well under controlled, ground‑level conditions, but it breaks down when deployed in challenging real‑world settings. Evidently, this is due to extreme data variability factors such as resolution, viewpoint changes, scale variations, occlusions, and appearance shifts from clothing or session drifts. Moreover, the publicly available data sets do not realistically incorporate such kinds and magnitudes of variability, which limits the progress of this technology. This paper introduces DetReIDX, a large‑scale aerial‑ground person dataset that was explicitly designed as a stress test for ReID under real‑world conditions. DetReIDX is a multi‑session set that includes over 13 million bounding boxes from 509 identities, collected in seven university campuses from three continents, with drone altitudes between 5.8 and 120 meters. More importantly, as a key novelty, DetReIDX subjects were recorded in (at least) two sessions on different days, with changes in clothing, daylight and location, making it suitable to actually evaluate long‑term person ReID. In addition, the data were annotated with 16 soft biometric attributes and multitask labels for detection, tracking, ReID, and action recognition. In order to provide empirical evidence of DetReIDX's usefulness, we considered the specific tasks of human detection and ReID, where the performance of SOTA methods degrades catastrophically (up to 80% in detection accuracy and over 70% in Rank‑1 ReID) when exposed to DetReIDX's conditions. The dataset, annotations, and official evaluation protocols are publicly available at https://www.it.ubi.pt/DetReIDX/
Authors: Rahman Saadat Yeganeh, Hamid Behroozi
Abstract: This paper proposes an advanced non‑terrestrial communication architecture that integrates Rate‑Splitting Multiple Access (RSMA) with a Beyond‑Diagonal Active Reconfigurable Intelligent Surface (BD‑ARIS) mounted on a UAV under the coverage of a Low Earth Orbit (LEO) satellite. The BD‑ARIS adopts a group‑connected structure to enhance signal amplification and adaptability, while RSMA enables efficient multi‑user access by dividing messages into common and private components. The system jointly optimizes satellite beamforming, UAV positioning, power allocation, and rate‑splitting ratios to maximize the overall energy efficiency (EE). To solve the resulting non‑convex and high‑dimensional problem, we employ three state‑of‑the‑art deep reinforcement learning (DRL) algorithms: Trust Region Policy Optimization (TRPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Asynchronous Advantage Actor‑Critic (A3C). Moreover, realistic models for the power consumption of both the UAV and the BD‑ARIS are considered. Simulation results reveal that TRPO consistently achieves the best performance in terms of EE and sum rate, especially under high transmit powers and challenging deployment scenarios. TD3 converges faster and performs competitively in moderate settings, while A3C suffers from instability due to its high variance. Additionally, the robustness of each algorithm under channel state information (CSI) uncertainty is evaluated, confirming TRPO's resilience to imperfect observations. Overall, the proposed RSMA‑BD‑ARIS framework significantly outperforms conventional RIS‑assisted designs and provides a scalable, energy‑efficient solution for 6G and massive IoT applications in non‑terrestrial networks.
Authors: Amber Batool, Faryal Batool, Roohan Ahmed Khan, Muhammad Ahsan Mustafa, Aleksey Fedoseev, Dzmitry Tsetserukou
Abstract: Quadcopters are versatile aerial robots gaining popularity in numerous critical applications. However, their operational effectiveness is constrained by limited battery life and restricted flight range. To address these challenges, autonomous drone landing on stationary or mobile charging and battery‑swapping stations has become an essential capability. In this study, we present NMPC‑Lander, a novel control architecture that integrates Nonlinear Model Predictive Control (NMPC) with Control Barrier Functions (CBF) to achieve precise and safe autonomous landing on both static and dynamic platforms. Our approach employs NMPC for accurate trajectory tracking and landing, while simultaneously incorporating CBF to ensure collision avoidance with static obstacles. Experimental evaluations on the real hardware demonstrate high precision in landing scenarios, with an average final position error of 9.0 cm and 11 cm for stationary and mobile platforms, respectively. Notably, NMPC‑Lander outperforms the B‑spline combined with the A planning method by nearly threefold in terms of position tracking, underscoring its superior robustness and practical effectiveness.
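The role a Control Barrier Function plays in collision avoidance can be shown with a much simpler setting than the paper's NMPC formulation. The sketch below assumes single-integrator dynamics (velocity is commanded directly), one spherical obstacle, and the barrier h(p) = ||p - p_obs||^2 - r^2; it minimally corrects a desired velocity so that grad h . v + alpha * h >= 0 holds. The function name, `alpha`, and the closed-form projection are illustrative assumptions, not NMPC-Lander's implementation.

```python
def cbf_safe_velocity(pos, v_des, obstacle, radius, alpha=1.0):
    """Return the velocity closest to v_des that satisfies the barrier condition.

    Assumes pos is not exactly at the obstacle center (the gradient would vanish).
    """
    dx = [p - o for p, o in zip(pos, obstacle)]
    h = sum(d * d for d in dx) - radius ** 2      # h >= 0 means outside the obstacle
    grad = [2.0 * d for d in dx]                  # gradient of h with respect to pos
    slack = sum(g * v for g, v in zip(grad, v_des)) + alpha * h
    if slack >= 0.0:
        return list(v_des)                        # desired command is already safe
    # Project v_des onto the half-space {v : grad . v + alpha * h >= 0}.
    g_sq = sum(g * g for g in grad)
    return [v - slack / g_sq * g for v, g in zip(v_des, grad)]
```

For a vehicle at (2, 0) heading straight at a unit-radius obstacle at the origin with velocity (-5, 0), the filter slows the approach to (-0.75, 0), while a command pointing away from the obstacle passes through unchanged.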
Authors: Xinyuan Zhang, Yonglin Tian, Fei Lin, Yue Liu, Jing Ma, Kornélia Sára Szatmáry, Fei-Yue Wang
Abstract: The growing demand for intelligent logistics, particularly fine‑grained terminal delivery, underscores the need for autonomous UAV (Unmanned Aerial Vehicle)‑based delivery systems. However, most existing last‑mile delivery studies rely on ground robots, while current UAV‑based Vision‑Language Navigation (VLN) tasks primarily focus on coarse‑grained, long‑range goals, making them unsuitable for precise terminal delivery. To bridge this gap, we propose LogisticsVLN, a scalable aerial delivery system built on multimodal large language models (MLLMs) for autonomous terminal delivery. LogisticsVLN integrates lightweight Large Language Models (LLMs) and Visual‑Language Models (VLMs) in a modular pipeline for request understanding, floor localization, object detection, and action‑decision making. To support research and evaluation in this new setting, we construct the Vision‑Language Delivery (VLD) dataset within the CARLA simulator. Experimental results on the VLD dataset showcase the feasibility of the LogisticsVLN system. In addition, we conduct subtask‑level evaluations of each module of our system, offering valuable insights for improving the robustness and real‑world deployment of foundation model‑based vision‑language delivery systems.
Authors: Yue Chen, Hui Kang, Jiahui Li, Geng Sun, Boxiong Wang, Jiacheng Wang, Cong Liang, Shuang Liang, Dusit Niyato
Abstract: The integration of simultaneous wireless information and power transfer (SWIPT) technology in 6G Internet of Things (IoT) networks faces significant challenges in remote areas and disaster scenarios where ground infrastructure is unavailable. This paper proposes a novel unmanned aerial vehicle (UAV)‑assisted mobile edge computing (MEC) system enhanced by directional antennas to provide both computational resources and energy support for ground IoT terminals. However, such systems require multiple trade‑off policies to balance UAV energy consumption, terminal battery levels, and computational resource allocation under various constraints, including limited UAV battery capacity, non‑linear energy harvesting characteristics, and dynamic task arrivals. To address these challenges comprehensively, we formulate a bi‑objective optimization problem that simultaneously considers system energy efficiency and terminal battery sustainability. We then reformulate this non‑convex problem with a hybrid solution space as a Markov decision process (MDP) and propose an improved soft actor‑critic (SAC) algorithm with an action simplification mechanism to enhance its convergence and generalization capabilities. Simulation results have demonstrated that our proposed approach outperforms various baselines in different scenarios, achieving efficient energy management while maintaining high computational performance. Furthermore, our method shows strong generalization ability across different scenarios, particularly in complex environments, validating the effectiveness of our designed boundary penalty and charging reward mechanisms.
Authors: Yixuan Huang, Jie Yang, Chao-Kai Wen, Shuqiang Xia, Xiao Li, Shi Jin
Abstract: The low‑altitude economy has emerged as a critical focus for future economic development, emphasizing the urgent need for flight activity surveillance utilizing the existing sensing capabilities of mobile cellular networks. Traditional monostatic or localization‑based sensing methods, however, encounter challenges in fusing sensing results and matching channel parameters. To address these challenges, we propose an innovative approach that directly draws the radio images of the low‑altitude space, leveraging its inherent sparsity with compressed sensing (CS)‑based algorithms and the cooperation of multiple base stations. Furthermore, recognizing that unmanned aerial vehicles (UAVs) are randomly distributed in space, we introduce a physics‑embedded learning method to overcome off‑grid issues inherent in CS‑based models. Additionally, an online hard example mining method is incorporated into the design of the loss function, enabling the network to adaptively concentrate on the samples bearing significant discrepancy with the ground truth, thereby enhancing its ability to detect the rare UAVs within the expansive low‑altitude space. Simulation results demonstrate the effectiveness of the imaging‑based low‑altitude surveillance approach, with the proposed physics‑embedded learning algorithm significantly outperforming traditional CS‑based methods under off‑grid conditions.
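The idea behind online hard example mining can be illustrated with a small sketch. The abstract does not give the loss used by the authors, so this assumes a generic variant: compute a per‑sample squared error, keep only the hardest fraction of samples, and average over those. The name `ohem_mean_loss` and the parameter `keep_ratio` are hypothetical.

```python
def ohem_mean_loss(predictions, targets, keep_ratio=0.25):
    """Average squared error over only the hardest samples.

    Assumption: "hard" means largest per-sample squared error.
    """
    losses = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    k = max(1, int(len(losses) * keep_ratio))      # number of samples kept
    hardest = sorted(losses, reverse=True)[:k]     # largest errors first
    return sum(hardest) / k


if __name__ == "__main__":
    # Three empty voxels predicted correctly, one missed target.
    preds = [0.0, 0.0, 0.0, 1.0]
    truth = [0.0, 0.0, 0.0, 0.0]
    print(ohem_mean_loss(preds, truth, keep_ratio=0.25))  # only the miss counts
    print(ohem_mean_loss(preds, truth, keep_ratio=1.0))   # plain mean over all
```

When targets are rare in a large, mostly empty space, a plain mean is dominated by easy samples; restricting the average to the hardest ones keeps the gradient focused on the misses.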
Authors: Siao Wang, Zhen Dong, Hui Li, Liwei Shen, Xin Peng, Dongdong She
Abstract: Flight control programs use PID control modules with user‑configurable Proportional (P), Integral (I), and Derivative (D) parameters to manage UAV flying behaviors. Users can adjust these PID parameters during flight. However, flight control programs lack sufficient safety checks on user‑provided PID parameters, leading to a severe UAV vulnerability ‑ the input validation bug. This occurs when a user misconfigures PID parameters, causing dangerous states like deviation from the expected path, loss of control, or crash.
Prior works use random testing like fuzzing, but these are not effective in the three‑dimensional search space of PID parameters. The expensive dynamic execution of UAV tests further hinders random testing performance.
We address PID parameter misconfiguration by combining the Routh‑Hurwitz stability criterion with coordinate search, introducing RouthSearch. Instead of ad‑hoc identification, RouthSearch determines valid ranges for three‑dimensional PID parameters in a principled way. We first leverage the Routh‑Hurwitz Criterion to identify a theoretical PID parameter boundary, then refine it using efficient coordinate search. The determined valid range can filter misconfigured PID parameters from users during flight and help discover logical bugs in flight control programs.
We evaluated RouthSearch across eight flight modes in PX4 and ArduPilot. Results show RouthSearch determines valid ranges with 92.0% accuracy compared to ground truth. RouthSearch discovers 3,853 PID misconfigurations within 48 hours, while the state‑of‑the‑art work PGFuzz discovers only 449, an 8.58‑fold improvement over prior work. Our method also helped detect three bugs in ArduPilot and PX4.
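The kind of theoretical boundary that the Routh‑Hurwitz criterion yields can be sketched as follows. This is an illustration under stated assumptions, not RouthSearch itself: it assumes a hypothetical plant G(s) = 1 / (s (s + d)) under PID control, which gives the closed‑loop characteristic polynomial s^3 + (d + Kd) s^2 + Kp s + Ki. For a cubic a3 s^3 + a2 s^2 + a1 s + a0 with a3 > 0, the criterion reduces to all coefficients being positive and a2 a1 > a3 a0. The function names and the `damping` parameter are made up for the example.

```python
def cubic_is_stable(a3, a2, a1, a0):
    """Routh-Hurwitz test for a3*s^3 + a2*s^2 + a1*s + a0 (strict stability)."""
    if a3 == 0:
        raise ValueError("leading coefficient must be non-zero for a cubic")
    if a3 < 0:  # normalise the sign so that a3 > 0
        a3, a2, a1, a0 = -a3, -a2, -a1, -a0
    return a2 > 0 and a1 > 0 and a0 > 0 and a2 * a1 > a3 * a0


def pid_is_stable(kp, ki, kd, damping=1.0):
    """Check PID gains against the assumed plant G(s) = 1 / (s * (s + damping)).

    Characteristic polynomial: s^3 + (damping + kd) s^2 + kp s + ki.
    """
    return cubic_is_stable(1.0, damping + kd, kp, ki)


if __name__ == "__main__":
    print(pid_is_stable(kp=2.0, ki=1.0, kd=0.5))  # inside the stable region
    print(pid_is_stable(kp=1.0, ki=5.0, kd=0.5))  # integral gain too large
```

For this assumed plant the stable region is Kp > 0, Ki > 0, d + Kd > 0, and (d + Kd) Kp > Ki, so a user‑supplied gain triple could be screened before it is applied. A real flight controller has more complex dynamics, which is presumably why the abstract describes refining the theoretical boundary with coordinate search.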
Authors: Ivan Tan, Wei Minn, Christopher M. Poskitt, Lwin Khin Shar, Lingxiao Jiang
Abstract: UAVs, commonly referred to as drones, have witnessed a remarkable surge in popularity due to their versatile applications. These cyber‑physical systems depend on multiple sensor inputs, such as cameras, GPS receivers, accelerometers, and gyroscopes, with faults potentially leading to physical instability and serious safety concerns. To mitigate such risks, anomaly detection has emerged as a crucial safeguarding mechanism, capable of identifying the physical manifestations of emerging issues and allowing operators to take preemptive action at runtime. Recent anomaly detection methods based on LSTM neural networks have shown promising results, but three challenges persist: the need for models that can generalise across the diverse mission profiles of drones; the need for interpretability, enabling operators to understand the nature of detected problems; and the need for capturing domain knowledge that is difficult to infer solely from log data. Motivated by these challenges, this paper introduces RADD, an integrated approach to anomaly detection in drones that combines rule mining and unsupervised learning. In particular, we leverage rules (or invariants) to capture expected relationships between sensors and actuators during missions, and utilise unsupervised learning techniques to cover more subtle relationships that the rules may have missed. We implement this approach using the ArduPilot drone software in the Gazebo simulator, utilising 44 rules derived across the main phases of drone missions, in conjunction with an ensemble of five unsupervised learning models. We find that our integrated approach successfully detects 93.84% of anomalies over six types of faults with a low false positive rate (2.33%), and can be deployed effectively at runtime. Furthermore, RADD outperforms a state‑of‑the‑art LSTM‑based method in detecting the different types of faults evaluated in our study.
Authors: Chenyang Fan, Xujie Zhu, Taige Luo, Sheng Xu, Zhulin Chen, Hongxin Yang
Abstract: The pattern analysis of tree structure holds significant scientific value for genetic breeding and forestry management. Current trunk and branch extraction technologies are mainly LiDAR‑based or UAV‑based. The former obtain high‑precision three‑dimensional (3D) data, but the equipment is costly and the data processing is complex. The latter efficiently capture canopy information, but they miss the 3D structure of trees. To extract branch information under complex background interference and occlusion, this work proposes WaveInst, a novel instance segmentation framework that uses a discrete wavelet transform to enhance multi‑scale edge information and improve the accuracy of tree structure extraction. Experimental results show that the proposed model achieves superior performance on SynthTree43k, CaneTree100, Urban Street and our PoplarDataset. Moreover, we present a new phenotypic dataset, PoplarDataset, dedicated to tree structure extraction and pattern analysis in artificial forests. The proposed method achieves a mean average precision of 49.6 and 24.3 for the structure extraction of mature and juvenile trees, respectively, surpassing the existing state‑of‑the‑art method by 9.9. Furthermore, by integrating the segmentation model with a regression model, we accurately obtain key tree growth parameters, such as tree location, the diameter at breast height of individual trees, and plant height, directly from 2D images. This study provides scientific support and ample data for tree structure analysis in phenotype research, offering a platform for applications in precision forestry, ecological monitoring, and intelligent breeding.
Authors: Michael Marinaccio, Fatemeh Afghah
Abstract: High‑fidelity wildfire monitoring using Unmanned Aerial Vehicles (UAVs) typically requires multimodal sensing ‑ especially RGB and thermal imagery ‑ which increases hardware cost and power consumption. This paper introduces SAM‑TIFF, a novel teacher‑student distillation framework for pixel‑level wildfire temperature prediction and segmentation using RGB input only. A multimodal teacher network trained on paired RGB‑Thermal imagery and radiometric TIFF ground truth distills knowledge to a unimodal RGB student network, enabling thermal‑sensor‑free inference. Segmentation supervision is generated using a hybrid approach of segment anything (SAM)‑guided mask generation, and selection via TOPSIS, along with Canny edge detection and Otsu's thresholding pipeline for automatic point prompt selection. Our method is the first to perform per‑pixel temperature regression from RGB UAV data, demonstrating strong generalization on the recent FLAME 3 dataset. This work lays the foundation for lightweight, cost‑effective UAV‑based wildfire monitoring systems without thermal sensors.
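Otsu's thresholding, one of the classical steps named in the abstract, picks the gray level that maximizes the between‑class variance of the two resulting pixel groups. The sketch below is a plain‑Python illustration on a flat list of 8‑bit intensities; it is not the SAM‑TIFF pipeline, and how the authors combine it with Canny edges and SAM prompts is not specified here. The function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return threshold t maximizing between-class variance.

    Pixels with value <= t form class 0, the rest form class 1.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in class 0 so far
    sum0 = 0    # intensity sum of class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break               # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # proportional to the variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


if __name__ == "__main__":
    # Two well-separated intensity clusters, e.g. background versus flame.
    image = [10] * 50 + [200] * 50
    print(otsu_threshold(image))
```

Any threshold between the two clusters separates them equally well; this implementation returns the lowest such value.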
Authors: Mingfeng Tang, Ningna Wang, Ziyuan Xie, Jianwei Hu, Ke Xie, Xiaohu Guo, Hui Huang
Abstract: We present the first scene‑update aerial path planning algorithm specifically designed for detecting and updating change areas in urban environments. While existing methods for large‑scale 3D urban scene reconstruction focus on achieving high accuracy and completeness, they are inefficient for scenarios requiring periodic updates, as they often re‑explore and reconstruct entire scenes, wasting significant time and resources on unchanged areas. To address this limitation, our method leverages prior reconstructions and change probability statistics to guide UAVs in detecting and focusing on areas likely to have changed. Our approach introduces a novel changeability heuristic to evaluate the likelihood of changes, driving the planning of two flight paths: a prior path informed by static priors and a dynamic real‑time path that adapts to newly detected changes. The framework integrates surface sampling and candidate view generation strategies, ensuring efficient coverage of change areas with minimal redundancy. Extensive experiments on real‑world urban datasets demonstrate that our method significantly reduces flight time and computational overhead, while maintaining high‑quality updates comparable to full‑scene re‑exploration and reconstruction. These contributions pave the way for efficient, scalable, and adaptive UAV‑based scene updates in complex urban environments.
Authors: Yan Miao, Will Shen, Hang Cui, Sayan Mitra
Abstract: We introduce FalconWing, an ultra‑light (150 g) indoor fixed‑wing UAV platform for vision‑based autonomy. A controlled indoor environment enables year‑round, repeatable UAV experiments but imposes strict weight and maneuverability limits on the UAV, motivating our ultra‑light FalconWing design. FalconWing couples a lightweight hardware stack (137 g airframe with a 9 g camera) and offboard computation with a software stack featuring a photorealistic 3D Gaussian Splat (GSplat) simulator for developing and evaluating vision‑based controllers. We validate FalconWing on two challenging vision‑based aerial case studies. In the leader‑follower case study, our best vision‑based controller, trained via imitation learning on GSplat‑rendered data augmented with domain randomization, achieves 100% tracking success across 3 types of leader maneuvers over 30 trials and shows robustness to shifts in the leader's appearance in simulation. In the autonomous landing case study, our vision‑based controller trained purely in simulation transfers zero‑shot to real hardware, achieving an 80% success rate over ten landing trials. We will release hardware designs, GSplat scenes, and dynamics models upon publication to make FalconWing an open‑source flight kit for engineering students and research labs.
Authors: Taewook Park, Jinwoo Lee, Hyondong Oh, Won-Jae Yun, Kyu-Wha Lee
Abstract: As the agricultural workforce declines and labor costs rise, robotic yield estimation has become increasingly important. While unmanned ground vehicles (UGVs) are commonly used for indoor farm monitoring, their deployment in greenhouses is often constrained by infrastructure limitations, sensor placement challenges, and operational inefficiencies. To address these issues, we develop a lightweight unmanned aerial vehicle (UAV) equipped with an RGB‑D camera, a 3D LiDAR, and an IMU sensor. The UAV employs a LiDAR‑inertial odometry algorithm for precise navigation in GNSS‑denied environments and utilizes a 3D multi‑object tracking algorithm to estimate the count and weight of cherry tomatoes. We evaluate the system using two datasets: one from a harvesting row and another from a growing row. In the harvesting‑row dataset, the proposed system achieves 94.4% counting accuracy and 87.5% weight estimation accuracy within a 13.2‑meter flight completed in 10.5 seconds. For the growing‑row dataset, which consists of occluded unripened fruits, we qualitatively analyze tracking performance and highlight future research directions for improving perception in greenhouses with strong occlusions. Our findings demonstrate the potential of UAVs for efficient robotic yield estimation in commercial greenhouses.
Authors: Haocheng Meng, Shaocheng Luo, Zhenyuan Liang, Qing Huang, Amir Khazraei, Miroslav Pajic
Abstract: Unmanned Aerial Vehicles (UAVs) rely on measurements from Inertial Measurement Units (IMUs) to maintain stable flight. However, IMUs are susceptible to physical attacks, including acoustic resonant and electromagnetic interference attacks, resulting in immediate UAV crashes. Consequently, we introduce a Model‑based Anomaly detection and Recovery System (MARS) that enables UAVs to quickly detect adversarial attacks on inertial sensors and achieve dynamic flight recovery. MARS features an attack‑resilient state estimator based on the Extended Kalman Filter, which incorporates position, velocity, heading, and rotor speed measurements to reconstruct accurate attitude and angular velocity information for UAV control. Moreover, a statistical anomaly detection system monitors IMU sensor data, raising a system‑level alert if an attack is detected. Upon receiving the alert, a multi‑stage dynamic flight recovery strategy suspends the ongoing mission, stabilizes the drone in a hovering condition, and then resumes tasks under the resilient control. Experimental results in PX4 software‑in‑the‑loop environments as well as real‑world MARS‑PX4 autopilot‑equipped drones demonstrate the superiority of our approach over existing IMU‑defense frameworks, showcasing the ability of the UAVs to survive attacks and complete the missions.
Authors: Xuzhao Li, Xuchen Li, Shiyu Hu
Abstract: Nighttime UAV tracking presents significant challenges due to extreme illumination variations and viewpoint changes, which severely degrade tracking performance. Existing approaches either rely on light enhancers with high computational costs or introduce redundant domain adaptation mechanisms, failing to fully utilize the dynamic features in varying perspectives. To address these issues, we propose DARTer (Dynamic Adaptive Representation Tracker), an end‑to‑end tracking framework designed for nighttime UAV scenarios. DARTer leverages a Dynamic Feature Blender (DFB) to effectively fuse multi‑perspective nighttime features from static and dynamic templates, enhancing representation robustness. Meanwhile, a Dynamic Feature Activator (DFA) adaptively activates Vision Transformer layers based on extracted features, significantly improving efficiency by reducing redundant computations. Our model eliminates the need for complex multi‑task loss functions, enabling a streamlined training process. Extensive experiments on multiple nighttime UAV tracking benchmarks demonstrate the superiority of DARTer over state‑of‑the‑art trackers. These results confirm that DARTer effectively balances tracking accuracy and efficiency, making it a promising solution for real‑world nighttime UAV tracking applications.
Authors: Mohammed Saif, Shahrokh Valaee
Abstract: The integration of reconfigurable intelligent surfaces (RISs) and unmanned aerial vehicles (UAVs) has emerged as a promising solution for enhancing connectivity in future wireless networks. This paper designs well‑connected and resilient UAV networks by deploying and virtually partitioning multiple RISs to create multiple RIS‑aided links, focusing on a link‑layer perspective. The RIS‑aided links are created to connect user equipment (UE) to blocked and reliable UAVs, where multiple UEs can transmit to the same UAV via RIS using non‑orthogonal multiple access (NOMA), granting access to UEs and maximizing network connectivity. We first derive exact and approximated closed‑form expressions for signal‑to‑interference plus noise ratio (SINR) based on aligned and non‑aligned RIS‑aided beams. Then, we formulate the problem of maximizing network connectivity that jointly considers (i) UE NOMA clustering, (ii) RIS‑aided link selection, and (iii) virtual RIS partitioning. This problem is a computationally expensive combinatorial optimization problem. To tackle it, a two‑step iterative approach, called RIS‑aided NOMA, is proposed. In the first step, the UEs are clustered to the RISs according to their channel gains, while UAVs are associated with the generated clusters based on their reliability, which measures the criticality of UAVs. The second step optimally partitions the RISs to support each of the cluster members. In this step, we derive the closed‑form equations for the optimal partitioning of RISs within the clusters. Simulation results demonstrate that the proposed RIS‑aided NOMA yields a gain of 30% to 40% compared to the traditional UAV scheme. This finding emphasizes the potential of integrating RIS with UAV communications as a robust and reliable connectivity solution for future wireless communication systems.
Authors: Tengchao Zhang, Yonglin Tian, Fei Lin, Jun Huang, Patrik P. Süli, Qinghua Ni, Rui Qin, Xiao Wang, Fei-Yue Wang
Abstract: With the increasing demand for heterogeneous Unmanned Aerial Vehicle (UAV) swarms to perform complex tasks in urban environments, system design now faces major challenges, including efficient semantic understanding, flexible task planning, and the ability to dynamically adjust coordination strategies in response to evolving environmental conditions and continuously changing task requirements. To address the limitations of existing methods, this paper proposes CoordField, a coordination field agent system for coordinating heterogeneous drone swarms in complex urban scenarios. In this system, large language models (LLMs) are responsible for interpreting high‑level human instructions and converting them into executable commands for the UAV swarms, such as patrol and target tracking. Subsequently, a coordination field mechanism is proposed to guide UAV motion and task selection, enabling decentralized and adaptive allocation of emergent tasks. A total of 50 rounds of comparative testing were conducted across different models in a 2D simulation space to evaluate their performance. Experimental results demonstrate that the proposed system achieves superior performance in terms of task coverage, response time, and adaptability to dynamic changes.
Authors: Hossein Davoudi, Behrouz Shahgholi Ghahfarokhi, Neda Moghim, Sachin Shetty
Abstract: In future wireless networks, terrestrial, aerial, space, and maritime wireless networks are integrated into a unified network to meet the needs of a fully connected global network. Nowadays, vehicular communication has become one of the challenging applications of wireless networks. In this article, we aim to address radio resource management in Cellular Vehicle‑to‑Everything (C‑V2X) networks using Unmanned Aerial Vehicles (UAVs) and non‑orthogonal multiple access (NOMA). The goal is to maximize the spectral efficiency of vehicular users in C‑V2X networks under a fronthaul constraint. To solve this problem, a two‑stage approach is utilized. In the first stage, vehicles in dense areas are clustered based on their geographical locations, predicted locations, and speeds. Then UAVs are deployed to serve the clusters. In the second stage, NOMA groups are formed within each cluster and radio resources are allocated to vehicles based on NOMA groups. An optimization problem is formulated and a suboptimal method is used to solve it. The performance of the proposed method is evaluated through simulations, where results demonstrate the superiority of the proposed method in spectral efficiency, min point, and distance.
Authors: Yixian Wang, Geng Sun, Zemin Sun, Jiacheng Wang, Jiahui Li, Changyuan Zhao, Jing Wu, Shuang Liang, Minghao Yin, Pengfei Wang, Dusit Niyato, Sumei Sun, Dong In Kim
Abstract: The rise of the low‑altitude economy (LAE) is propelling urban development and emerging industries by integrating advanced technologies to enhance efficiency, safety, and sustainability in low‑altitude operations. The widespread adoption of unmanned aerial vehicles (UAVs) and electric vertical takeoff and landing (eVTOL) aircraft plays a crucial role in enabling key applications within LAE, such as urban logistics, emergency rescue, and aerial mobility. However, unlike traditional UAV networks, LAE networks encounter increased airspace management demands due to dense flying nodes and potential interference with ground communication systems. In addition, there are heightened and extended security risks in real‑time operations, particularly the vulnerability of low‑altitude aircraft to cyberattacks from ground‑based threats. To address these, this paper first explores related standards and core architecture that support the development of LAE networks. Subsequently, we highlight the integration of technologies such as communication, sensing, computing, positioning, navigation, surveillance, flight control, and airspace management. This synergy of multi‑technology drives the advancement of real‑world LAE applications, particularly in improving operational efficiency, optimizing airspace usage, and ensuring safety. Finally, we outline future research directions for LAE networks, such as intelligent and adaptive optimization, security and privacy protection, sustainable energy and power management, quantum‑driven coordination, generative governance, and three‑dimensional (3D) airspace coverage, which collectively underscore the potential of collaborative technologies to advance LAE networks.
Authors: Pranav Saxena, Nishant Raghuvanshi, Neena Goveas
Abstract: A core challenge in AI‑guided autonomy is enabling agents to navigate realistically and effectively in previously unseen environments based on natural language commands. We propose UAV‑VLN, a novel end‑to‑end Vision‑Language Navigation (VLN) framework for Unmanned Aerial Vehicles (UAVs) that seamlessly integrates Large Language Models (LLMs) with visual perception to facilitate human‑interactive navigation. Our system interprets free‑form natural language instructions, grounds them into visual observations, and plans feasible aerial trajectories in diverse environments.
UAV‑VLN leverages the common‑sense reasoning capabilities of LLMs to parse high‑level semantic goals, while a vision model detects and localizes semantically relevant objects in the environment. By fusing these modalities, the UAV can reason about spatial relationships, disambiguate references in human instructions, and plan context‑aware behaviors with minimal task‑specific supervision. To ensure robust and interpretable decision‑making, the framework includes a cross‑modal grounding mechanism that aligns linguistic intent with visual context.
We evaluate UAV‑VLN across diverse indoor and outdoor navigation scenarios, demonstrating its ability to generalize to novel instructions and environments with minimal task‑specific training. Our results show significant improvements in instruction‑following accuracy and trajectory efficiency, highlighting the potential of LLM‑driven vision‑language interfaces for safe, intuitive, and generalizable UAV autonomy.
Authors: Kıvanç Şerefoğlu, Önder Gürcan, Reyhan Aydoğan
Abstract: We present a simulation tool for evaluating team formation in autonomous multi‑UAV (Unmanned Aerial Vehicle) missions that operate Beyond Visual Line of Sight (BVLOS). The tool models UAV collaboration and mission execution in dynamic and adversarial conditions, where Byzantine UAVs attempt to disrupt operations. Our tool allows researchers to integrate and compare various team formation strategies in a controlled environment with configurable mission parameters and adversarial behaviors. The log of each simulation run is stored in a structured way along with performance metrics so that statistical analysis can be carried out straightforwardly. The tool is versatile for testing and improving UAV coordination strategies in real‑world applications.
Authors: Md Safwan Mondal, Subramanian Ramasamy, Luca Russo, James D. Humann, James M. Dotterweich, Pranav Bhounsule
Abstract: Efficient mission planning for cooperative systems involving Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) requires addressing energy constraints, scalability, and coordination challenges between agents. UAVs excel in rapidly covering large areas but are constrained by limited battery life, while UGVs, with their extended operational range and capability to serve as mobile recharging stations, are hindered by slower speeds. This heterogeneity makes coordination between UAVs and UGVs critical for achieving optimal mission outcomes. In this work, we propose a scalable deep reinforcement learning (DRL) framework to address the energy‑constrained cooperative routing problem for multi‑agent UAV‑UGV teams, aiming to visit a set of task points in minimal time with UAVs relying on UGVs for recharging during the mission. The framework incorporates sortie‑wise agent switching to efficiently manage multiple agents by allocating task points and coordinating actions. Using an encoder‑decoder transformer architecture, it optimizes routes and recharging rendezvous for the UAV‑UGV team in the task scenario. Extensive computational experiments demonstrate the framework's superior performance over heuristic methods and a DRL baseline, delivering significant improvements in solution quality and runtime efficiency across diverse scenarios. Generalization studies validate its robustness, while a dynamic‑scenario case study highlights its adaptability to real‑time changes. This work advances UAV‑UGV cooperative routing by providing a scalable, efficient, and robust solution for multi‑agent mission planning.
Authors: Bhavya Dixit, Ananthapadmanabhan A., Adheeba Thahsin, Saketh Pathak, Gaurav S. Kasbekar, Arnab Maity
Abstract: We present MAVShield, a novel lightweight cipher designed to secure communications in Unmanned Aerial Vehicles (UAVs) using the MAVLink protocol, which by default transmits unencrypted messages between UAVs and Ground Control Stations (GCS). While existing studies propose encryption for MAVLink, most remain theoretical or simulation‑based. We implement MAVShield alongside AES‑CTR, ChaCha20, Speck‑CTR, and Rabbit, and evaluate them on a real drone testbed. A comprehensive security analysis using statistical test suites (NIST and Diehard) demonstrates strong resistance of the novel cipher to cryptanalysis. Performance evaluation across key metrics including memory usage, CPU load, and battery power consumption, demonstrates that MAVShield outperforms existing algorithms and offers an efficient, real‑world solution for securing MAVLink communications in UAVs.
Authors: Yunbo Wang, Cong Sun, Qiaosen Liu, Bingnan Su, Zongxu Zhang, Michael Norris, Gang Tan, Jianfeng Ma
Abstract: Sensor attacks on robotic vehicles have become pervasive and manipulative. Their latest advancements exploit sensor and detector characteristics to bypass detection. Recent security efforts have leveraged the physics‑based model to detect or mitigate sensor attacks. However, these approaches are only resilient to a few sensor attacks and still need improvement in detection effectiveness. We present VIMU, an efficient sensor attack detection and resilience system for unmanned aerial vehicles. We propose a detection algorithm, CS‑EMA, that leverages low‑pass filtering to identify stealthy gyroscope attacks while achieving an overall effective sensor attack detection. We develop a fine‑grained nonlinear physical model with precise aerodynamic and propulsion wrench modeling. We also augment the state estimation with a FIFO buffer safeguard to mitigate the impact of high‑rate IMU attacks. The proposed physical model and buffer safeguard provide an effective system state recovery toward maintaining flight stability. We implement VIMU on PX4 autopilot. The evaluation results demonstrate the effectiveness of VIMU in detecting and mitigating various realistic sensor attacks, especially stealthy attacks.
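The abstract does not define the CS‑EMA algorithm, so the following is only a guess at the general shape of such a detector: it assumes that the residual between measured and model‑predicted sensor values is low‑pass filtered with an exponential moving average (EMA) and then accumulated with a one‑sided cumulative‑sum (CUSUM) test. The function name and the `smoothing`, `drift`, and `threshold` parameters are hypothetical, and none of the values come from the paper.

```python
def ema_cusum_alarm(residuals, smoothing=0.2, drift=0.05, threshold=1.0):
    """Return the index of the first alarm, or None if no alarm is raised.

    Assumed scheme: EMA low-pass filter on |residual|, then one-sided CUSUM.
    """
    ema = 0.0     # low-pass filtered residual magnitude
    cusum = 0.0   # accumulated evidence of a persistent deviation
    for i, r in enumerate(residuals):
        ema = smoothing * abs(r) + (1.0 - smoothing) * ema
        # Evidence grows only while the filtered residual exceeds the drift.
        cusum = max(0.0, cusum + ema - drift)
        if cusum > threshold:
            return i
    return None


if __name__ == "__main__":
    benign = [0.01] * 100                 # sensor noise only
    attacked = [0.01] * 20 + [0.3] * 50   # small persistent bias from step 20
    print(ema_cusum_alarm(benign))
    print(ema_cusum_alarm(attacked))
```

The filter suppresses isolated noise spikes, while the cumulative sum lets a small but persistent bias, the signature of a stealthy attack, build up until it crosses the threshold.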
Authors: Wenxuan Liu, Zhuo Zhou, Xuemei Jia, Siyuan Yang, Wenxin Huang, Xian Zhong, Chia-Wen Lin
Abstract: Action recognition in unmanned aerial vehicles (UAVs) poses unique challenges due to significant view variations along the vertical spatial axis. Unlike traditional ground‑based settings, UAVs capture actions at a wide range of altitudes, resulting in considerable appearance discrepancies. We introduce a multi‑view formulation tailored to varying UAV altitudes and empirically observe a partial order among views, where recognition accuracy consistently decreases as altitude increases. This observation motivates a novel approach that explicitly models the hierarchical structure of UAV views to improve recognition performance across altitudes. To this end, we propose the Partial Order Guided Multi‑View Network (POG‑MVNet), designed to address drastic view variations by effectively leveraging view‑dependent information across different altitude levels. The framework comprises three key components: a View Partition (VP) module, which uses the head‑to‑body ratio to group views by altitude; an Order‑aware Feature Decoupling (OFD) module, which disentangles action‑relevant and view‑specific features under partial order guidance; and an Action Partial Order Guide (APOG), which uses the partial order to transfer informative knowledge from easier views to more challenging ones. We conduct experiments on Drone‑Action, MOD20, and UAV, demonstrating that POG‑MVNet significantly outperforms competing methods. For example, POG‑MVNet achieves a 4.7% improvement on Drone‑Action and a 3.5% improvement on UAV compared to state‑of‑the‑art methods ASAT and FAR. Code will be released soon.
Authors: Trinh Van Chien, Nguyen Minh Quan, Oh-Soon Shin, Van-Dinh Nguyen
Abstract: The integration of unmanned aerial vehicles (UAVs) into wireless communication systems has emerged as a transformative approach, promising cost‑efficient connectivity. This paper addresses the optimization of the dynamic time‑splitting ratio and flight trajectory for a communication system linking a ground base station to the UAV equipped with backscatter devices (referred to as UB), and from UB to an end user. Given the inherent non‑convexity of the problem, we develop two meta‑heuristic‑based approaches inspired by genetic algorithm and particle swarm optimization to enhance the total achievable rate while reducing computational complexity. Numerical results demonstrate the effectiveness of these meta‑heuristic solutions, showcasing significant improvements in the achievable rate and computation time compared to existing benchmarks.
Authors: Kai Ye, Haidi Tang, Bowen Liu, Pingyang Dai, Liujuan Cao, Rongrong Ji
Abstract: Applications of unmanned aerial vehicles (UAVs) in logistics, agricultural automation, urban management, and emergency response are highly dependent on oriented object detection (OOD) to enhance visual perception. Although existing datasets for OOD in UAV imagery provide valuable resources, they are often designed for specific downstream tasks. Consequently, they exhibit limited generalization performance in real flight scenarios and fail to thoroughly demonstrate algorithm effectiveness in practical environments. To bridge this critical gap, we introduce CODrone, a comprehensive oriented object detection dataset for UAVs that accurately reflects real‑world conditions. It also serves as a new benchmark designed to align with downstream task requirements, ensuring greater applicability and robustness in UAV‑based OOD. Based on application requirements, we identify four key limitations in current UAV OOD datasets (low image resolution, limited object categories, single‑view imaging, and restricted flight altitudes) and propose corresponding improvements to enhance their applicability and robustness. Furthermore, CODrone contains a broad spectrum of annotated images collected from multiple cities under various lighting conditions, enhancing the realism of the benchmark. To rigorously evaluate CODrone as a new benchmark and gain deeper insights into the novel challenges it presents, we conduct a series of experiments based on 22 classical or SOTA methods. Our evaluation not only assesses the effectiveness of CODrone in real‑world scenarios but also highlights key bottlenecks and opportunities to advance OOD in UAV applications. Overall, CODrone fills the data gap in OOD from the UAV perspective and provides a benchmark with enhanced generalization capability, better aligning with practical applications and future algorithm development.
Authors: Yunting Xu, Jiacheng Wang, Ruichen Zhang, Changyuan Zhao, Dusit Niyato, Jiawen Kang, Zehui Xiong, Bo Qian, Haibo Zhou, Shiwen Mao, Abbas Jamalipour, Xuemin Shen, Dong In Kim
Abstract: Mixture of Experts (MoE) has emerged as a promising paradigm for scaling model capacity while preserving computational efficiency, particularly in large‑scale machine learning architectures such as large language models (LLMs). Recent advances in MoE have facilitated its adoption in wireless networks to address the increasing complexity and heterogeneity of modern communication systems. This paper presents a comprehensive survey of the MoE framework in wireless networks, highlighting its potential in optimizing resource efficiency, improving scalability, and enhancing adaptability across diverse network tasks. We first introduce the fundamental concepts of MoE, including various gating mechanisms and the integration with generative AI (GenAI) and reinforcement learning (RL). Subsequently, we discuss the extensive applications of MoE across critical wireless communication scenarios, such as vehicular networks, unmanned aerial vehicles (UAVs), satellite communications, heterogeneous networks, integrated sensing and communication (ISAC), and mobile edge networks. Furthermore, key applications in channel prediction, physical layer signal processing, radio resource management, network optimization, and security are thoroughly examined. Additionally, we present a detailed overview of open‑source datasets that are widely used in MoE‑based models to support diverse machine learning tasks. Finally, this survey identifies crucial future research directions for MoE, emphasizing the importance of advanced training techniques, resource‑aware gating strategies, and deeper integration with emerging 6G technologies.
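The gating mechanisms mentioned in the survey abstract above can be illustrated with sparse top-k gating, one commonly used MoE routing scheme. The following is a minimal sketch only: the function names `top_k_gate` and `moe_forward`, the scalar toy experts, and the choice of top-k routing are assumptions made for illustration and are not taken from the survey.

```python
import math

def top_k_gate(logits, k):
    """Sparse gate: softmax over the k largest logits, zero weight elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                 # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

def moe_forward(x, experts, gate_logits, k=2):
    """Combine only the selected experts, weighted by their gate values."""
    weights = top_k_gate(gate_logits, k)
    return sum(w * f(x) for w, f in zip(weights, experts) if w > 0.0)

# Hypothetical toy experts acting on a scalar input.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x]
print(moe_forward(2.0, experts, gate_logits=[0.0, 0.0, -100.0], k=2))
```

Only the k selected experts are evaluated, which is the source of the computational savings that motivates MoE in resource-constrained wireless settings.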
Authors: Qianyi Zhang, Shijian Ma, Boyi Liu, Jianhao Jiao, Dimitrios Kanoulas
Abstract: Robust and flexible leader‑following is a critical capability for robots to integrate into human society. While existing methods struggle to generalize to leaders of arbitrary form and often fail when the leader temporarily leaves the robot's field of view, this work introduces a unified framework addressing both challenges. First, traditional detection models are replaced with a segmentation model, allowing the leader to be anything. To enhance recognition robustness, a distance frame buffer is implemented that stores leader embeddings at multiple distances, accounting for the unique characteristics of leader‑following tasks. Second, a goal‑aware adaptation mechanism is designed to govern robot planning states based on the leader's visibility and motion, complemented by a graph‑based planner that generates candidate trajectories for each state, ensuring efficient following with obstacle avoidance. Simulations and real‑world experiments with a legged robot follower and various leaders (human, ground robot, UAV, legged robot, stop sign) in both indoor and outdoor environments show competitive improvements in follow success rate, reduced visual loss duration, lower collision rate, and decreased leader‑follower distance.
Authors: Khashayar Ghanizadegan, Hashim A. Hashim
Abstract: This paper introduces a geometric Quaternion‑based Unscented Particle Filter for Visual‑Inertial Navigation (QUPF‑VIN) specifically designed for a vehicle operating with six degrees of freedom (6 DoF). The proposed QUPF‑VIN technique is quaternion‑based, capturing the inherently nonlinear nature of true navigation kinematics. The filter fuses data from a low‑cost inertial measurement unit (IMU) and landmark observations obtained via a vision sensor. The QUPF‑VIN is implemented in discrete form to ensure seamless integration with onboard inertial sensing systems. Designed for robustness in GPS‑denied environments, the proposed method has been validated through experiments with a real‑world dataset involving an unmanned aerial vehicle (UAV) equipped with a 6‑axis IMU and a stereo camera, operating with 6 DoF. The numerical results demonstrate that the QUPF‑VIN provides superior tracking accuracy when evaluated against ground truth data. Additionally, a comparative analysis with a standard Kalman filter‑based navigation technique further highlights the enhanced performance of the QUPF‑VIN.
Authors: Manuel Boldrer, Vit Kratky, Viktor Walter, Martin Saska
Abstract: In this letter, we present a distributed algorithm for flocking in complex environments that operates at constant altitude, without explicit communication or a priori information about the environment, using only on‑board sensing and computation capabilities. We provide sufficient conditions to guarantee collision avoidance with obstacles and other robots without exceeding a desired maximum distance from a predefined set of neighbors (flocking or proximity maintenance constraint) during the mission. The proposed approach allows operation in crowded scenarios and explicitly accounts for tracking errors and on‑board sensing errors. The algorithm was verified through simulations with a varying number of UAVs and also through numerous real‑world experiments in a dense forest involving up to four UAVs.
Authors: Harris K. Armeniakos, Viktor Nikolaidis, Petros S. Bithas, Konstantinos Maliatsos, Athanasios G. Kanatas
Abstract: Unmanned aerial vehicle (UAV) corridor‑assisted communication networks are expected to expand significantly in the upcoming years, driven by several technological, regulatory, and societal trends. In this new type of network, accurate and realistic channel models are essential for designing reliable, efficient, and secure communication systems. In this paper, an analytical framework is presented that is based on one‑dimensional (1D) finite point processes, namely the binomial point process (BPP) and the finite homogeneous Poisson point process (HPPP), to model the spatial locations of UAV‑Base Stations (UAV‑BSs). To this end, the shadowing conditions experienced in the UAV‑BS‑to‑ground‑user links are accurately considered in a realistic maximum power‑based user association policy. Subsequently, coverage probability analysis under the two spatial models is conducted, and exact‑form expressions are derived. In an attempt to reduce the analytical complexity of the derived expressions, a dominant interferer‑based approach is also investigated. Finally, the main outcomes of this paper are extensively validated by empirical data collected in an air‑to‑ground measurement campaign. To the best of the authors' knowledge, this is the first work to experimentally verify a generic spatial model by jointly considering the random spatial and shadowing characteristics of a UAV‑assisted air‑to‑ground network.
Authors: Irshad A. Meer, Bruno Hörmann, Mustafa Ozger, Fabien Geyer, Alberto Viseras, Dominic Schupke, Cicek Cavdar
Abstract: The integration of unmanned aerial vehicles (UAVs) into cellular networks presents significant mobility management challenges, primarily due to frequent handovers caused by probabilistic line‑of‑sight conditions with multiple ground base stations (BSs). To tackle these challenges, reinforcement learning (RL)‑based methods, particularly deep Q‑networks (DQN), have been employed to optimize handover decisions dynamically. However, a major drawback of these learning‑based approaches is their black‑box nature, which limits interpretability in the decision‑making process. This paper introduces an explainable AI (XAI) framework that incorporates Shapley Additive Explanations (SHAP) to provide deeper insights into how various state parameters influence handover decisions in a DQN‑based mobility management system. By quantifying the impact of key features such as reference signal received power (RSRP), reference signal received quality (RSRQ), buffer status, and UAV position, our approach enhances the interpretability and reliability of RL‑based handover solutions. To validate and compare our framework, we utilize real‑world network performance data collected from UAV flight trials. Simulation results show that our method provides intuitive explanations for policy decisions, effectively bridging the gap between AI‑driven models and human decision‑makers.
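To make the attribution idea in the abstract above concrete, the sketch below computes exact Shapley values for a toy Q-function by enumerating feature coalitions. It is an illustration under stated assumptions: the linear `toy_q_handover` function, its coefficients, and the all-zero baseline are hypothetical stand-ins for a trained DQN, and practical SHAP tooling typically approximates these values rather than enumerating every coalition.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x, with absent features set to baseline."""
    n = len(x)

    def v(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return value_fn(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                phi[i] += w * (v(s | {i}) - v(s))
    return phi

def toy_q_handover(state):
    """Hypothetical Q-value of a handover action from (RSRP, RSRQ, buffer) features."""
    rsrp, rsrq, buffer_status = state
    return 2.0 * rsrp + 1.0 * rsrq - 0.5 * buffer_status

phi = shapley_values(toy_q_handover, x=[1.0, 2.0, 4.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The attributions sum to the difference between the Q-value at the observed state and at the baseline, which is the efficiency property that makes Shapley-based explanations additive.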
Authors: Andreas Anastasiou, Savvas Papaioannou, Panayiotis Kolios, Christos G. Panayiotou
Abstract: Unmanned aerial vehicles (UAVs) are increasingly utilized in search and rescue missions, a trend driven by technological advancements, including enhancements in automation, avionics, and the reduced cost of electronics. In this work, we introduce a collaborative model predictive control (MPC) framework aimed at addressing the joint problem of guidance and state estimation for tracking multiple castaway targets with a fleet of autonomous UAV agents. We assume that each UAV agent is equipped with a camera sensor, which has a limited sensing range and is utilized for receiving noisy observations from multiple moving castaways adrift in maritime conditions. We derive a nonlinear mixed integer programming (NMIP) based controller that facilitates the guidance of the UAVs by generating non‑myopic trajectories within a receding planning horizon. These trajectories are designed to minimize the tracking error across multiple targets by directing the UAV fleet to locations expected to yield target measurements, thereby minimizing the uncertainty of the estimated target states. Extensive simulation experiments validate the effectiveness of our proposed method in tracking multiple castaways in maritime environments.
Authors: Liugang Lu, Dabin He, Congxiang Liu, Zhixiang Deng
Abstract: With the rapid advancement of Unmanned Aerial Vehicle (UAV) and computer vision technologies, object detection from UAV perspectives has emerged as a prominent research area. However, the detection challenges posed by the extremely small proportion of target pixels, significant scale variations of objects, and complex background information in UAV images have greatly limited the practical applications of UAVs. To address these challenges, we propose a novel object detection network, Multi‑scale Context Aggregation and Scale‑adaptive Fusion YOLO (MASF‑YOLO), which is developed based on YOLOv11. Firstly, to tackle the difficulty of detecting small objects in UAV images, we design a Multi‑scale Feature Aggregation Module (MFAM), which significantly improves the detection accuracy of small objects through parallel multi‑scale convolutions and feature fusion. Secondly, to mitigate the interference of background noise, we propose an Improved Efficient Multi‑scale Attention Module (IEMA), which enhances the focus on target regions through feature grouping, parallel sub‑networks, and cross‑spatial learning. Thirdly, we introduce a Dimension‑Aware Selective Integration Module (DASI), which further enhances multi‑scale feature fusion capabilities by adaptively weighting and fusing low‑dimensional features and high‑dimensional features. Finally, we conducted extensive performance evaluations of our proposed method on the VisDrone2019 dataset. Compared to YOLOv11‑s, MASF‑YOLO‑s achieves improvements of 4.6% in mAP@0.5 and 3.5% in mAP@0.5:0.95 on the VisDrone2019 validation set. Remarkably, MASF‑YOLO‑s outperforms YOLOv11‑m while requiring only approximately 60% of its parameters and 65% of its computational cost. Furthermore, comparative experiments with state‑of‑the‑art detectors confirm that MASF‑YOLO‑s maintains a clear competitive advantage in both detection accuracy and model efficiency.
Authors: Yuqiao Yang, Yongzhao Zhang, Wenhao Liu, Jun Li, Pengtao Shi, DingYu Zhong, Jie Yang, Ting Chen, Sheng Cao, Yuntao Ren, Yongyue Wu, Xiaosong Zhang
Abstract: As modern vehicles evolve into intelligent and connected systems, their growing complexity introduces significant cybersecurity risks. Threat Analysis and Risk Assessment (TARA) has therefore become essential for managing these risks under mandatory regulations. However, existing TARA automation methods rely on static threat libraries, limiting their utility in the detailed, function‑level analyses demanded by industry. This paper introduces DefenseWeaver, the first system that automates function‑level TARA using component‑specific details and large language models (LLMs). DefenseWeaver dynamically generates attack trees and risk evaluations from system configurations described in an extended OpenXSAM++ format, then employs a multi‑agent framework to coordinate specialized LLM roles for more robust analysis. To further adapt to evolving threats and diverse standards, DefenseWeaver incorporates Low‑Rank Adaptation (LoRA) fine‑tuning and Retrieval‑Augmented Generation (RAG) with expert‑curated TARA reports. We validated DefenseWeaver through deployment in four automotive security projects, where it identified 11 critical attack paths, verified through penetration testing, and subsequently reported and remediated by the relevant automakers and suppliers. Additionally, DefenseWeaver demonstrated cross‑domain adaptability, successfully applying to unmanned aerial vehicles (UAVs) and marine navigation systems. In comparison to human experts, DefenseWeaver outperformed manual attack tree generation across six assessment scenarios. Integrated into commercial cybersecurity platforms such as UAES and Xiaomi, DefenseWeaver has generated over 8,200 attack trees. These results highlight its ability to significantly reduce processing time, and its scalability and transformative impact on cybersecurity across industries.
Authors: Huajie Wu, Wenyi Liu, Yunfan Ren, Zheng Liu, Hairuo Wei, Fangcheng Zhu, Haotian Li, Fu Zhang
Abstract: Navigating unmanned aerial vehicles (UAVs) through cluttered and dynamic environments remains a significant challenge, particularly when dealing with fast‑moving or suddenly appearing obstacles. This paper introduces a complete LiDAR‑based system designed to enable UAVs to avoid various moving obstacles in complex environments. Benefiting from the high computational efficiency of perception and planning, the system can operate in real time using onboard computing resources with low latency. For dynamic environment perception, we have integrated our previous work, M‑detector, into the system. M‑detector ensures that moving objects of different sizes, colors, and types are reliably detected. For dynamic environment planning, we incorporate dynamic object predictions into the integrated planning and control (IPC) framework, namely DynIPC. This integration allows the UAV to utilize predictions about dynamic obstacles to effectively evade them. We validate our proposed system through both simulations and real‑world experiments. In simulation tests, our system outperforms state‑of‑the‑art baselines across several metrics, including success rate, time consumption, average flight time, and maximum velocity. In real‑world trials, our system successfully navigates through forests, avoiding moving obstacles along its path.
Authors: Tianhao Shao, Bohan Feng, Yingying Zhou, Bin Guo, Kaixing Zhao
Abstract: Rapid progress in intelligent unmanned systems has presented new opportunities for mobile crowd sensing (MCS). Today, heterogeneous air‑ground collaborative multi‑agent frameworks, which comprise unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), offer superior flexibility and efficiency compared to traditional homogeneous frameworks in complex sensing tasks. Within this context, task allocation among different agents plays an important role in improving overall MCS quality. To better allocate tasks among heterogeneous collaborative agents, we investigate two representative complex multi‑agent task allocation scenarios with dual optimization objectives: (1) in the AG‑FAMT (Air‑Ground Few Agents More Tasks) scenario, the objectives are to maximize task completion while minimizing the total travel distance; (2) in the AG‑MAFT (Air‑Ground More Agents Few Tasks) scenario, where agents are allocated based on their locations, the objectives are to minimize the total travel distance while reducing travel time cost. To achieve this, we propose a Multi‑Task Minimum Cost Maximum Flow (MT‑MCMF) optimization algorithm tailored for AG‑FAMT, along with a multi‑objective optimization algorithm called W‑ILP designed for AG‑MAFT, with a particular focus on optimizing the charging path planning of UAVs. Our experiments on a large‑scale real‑world dataset demonstrate that both proposed algorithms outperform baseline approaches under varying experimental settings, including task quantity, task difficulty, and task distribution, providing a novel way to improve the overall quality of mobile crowdsensing tasks.
Authors: Jalal Arabneydi, Saiful Islam, Srijita Das, Sai Krishna Gottipati, William Duguay, Cloderic Mars, Matthew E. Taylor, Matthew Guzdial, Antoine Fagette, Younes Zerouali
Abstract: With the growing popularity of deep reinforcement learning (DRL), the human‑in‑the‑loop (HITL) approach has the potential to revolutionize the way we approach decision‑making problems and create new opportunities for human‑AI collaboration. In this article, we introduce a novel multi‑layered hierarchical HITL DRL algorithm that comprises three types of learning: self learning, imitation learning, and transfer learning. In addition, we consider three forms of human input: reward, action, and demonstration. Furthermore, we discuss the main challenges, trade‑offs, and advantages of HITL in solving complex problems and how human information can be integrated into the AI solution systematically. To verify our technical results, we present a real‑world unmanned aerial vehicle (UAV) problem wherein a number of enemy drones attack a restricted area. The objective is to design a scalable HITL DRL algorithm for ally drones to neutralize the enemy drones before they reach the area. To this end, we first implement our solution using an award‑winning open‑source HITL software called Cogment. We then demonstrate several interesting results such as (a) HITL leads to faster training and higher performance, (b) advice acts as a guiding direction for gradient methods and lowers variance, and (c) the amount of advice should neither be too large nor too small to avoid over‑training and under‑training. Finally, we illustrate the role of human‑AI cooperation in solving two real‑world complex scenarios, i.e., overloaded and decoy attacks.
Authors: Haolin Liu, Shiliang Zhang, Shangbin Jiao, Xiaohui Zhang, Xuehui Ma, Yan Yan, Wenchuan Cui, Youmin Zhang
Abstract: This paper presents a fault‑tolerant control for the trajectory tracking of autonomous underwater vehicles (AUVs) against thruster failures. We formulate faults in AUV thrusters as discrete switching events during an AUV mission, and develop a soft‑switching approach to facilitate shifting control strategies across fault scenarios. We mathematically define AUV thruster fault scenarios, and develop a fault‑tolerant control that captures the fault scenario via a Bayesian approach. In particular, when the AUV fault type switches from one to another, the developed control captures the fault states and maintains control through a linear quadratic tracking controller. With the fault states captured by the Bayesian approach, we derive the control law by aggregating the control outputs for individual fault scenarios, weighted by their Bayesian posterior probabilities. The developed fault‑tolerant control works adaptively, guarantees soft switching across fault scenarios, and requires no complicated fault detection dedicated to different types of faults. The resulting soft switching ensures stable AUV trajectory tracking when the fault type shifts, which would otherwise lead to degraded control under hard‑switching control strategies. We conduct numerical simulations with diverse AUV thruster fault settings. The results demonstrate that the proposed control can provide smooth transitions across thruster failures, and effectively sustain AUV trajectory tracking control in case of thruster failures and failure shifts.
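The posterior-weighted aggregation described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the authors' controller: the Gaussian residual likelihood, the `sigma` parameter, the scalar control outputs, and the function names are all assumptions made here, and the per-scenario controllers (linear quadratic tracking in the paper) are replaced by fixed numbers.

```python
import math

def update_posterior(prior, residuals, sigma=1.0):
    """Bayes update over fault scenarios using an assumed Gaussian residual likelihood."""
    likelihoods = [math.exp(-0.5 * (r / sigma) ** 2) for r in residuals]
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

def blended_control(posterior, controls):
    """Soft switching: posterior-weighted sum of per-scenario control outputs."""
    return sum(p * u for p, u in zip(posterior, controls))

# Two hypothetical scenarios: nominal thrusters vs. one degraded thruster.
posterior = update_posterior(prior=[0.5, 0.5], residuals=[0.2, 3.0])
print(posterior, blended_control(posterior, controls=[4.0, 8.0]))
```

Because the weights change continuously with the posterior, the applied control moves smoothly between scenario-specific outputs instead of jumping when the most likely fault changes.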
Authors: Ghofran Khalaf, May Itani, Sanaa Sharafeddine
Abstract: With the continued growth of its core technologies, including the Internet of Things (IoT), artificial intelligence (AI), Big Data and data analytics, and edge computing, digital twin (DT) technology has witnessed a significant increase in industrial applications, helping the industry become more sustainable, smart, and adaptable. Hence, DT technology has emerged as a promising link between the physical and virtual worlds, enabling simulation, prediction, and real‑time performance optimization. This work aims to explore the development of a high‑fidelity digital twin framework, focusing on synchronization and accuracy between physical and digital systems to enhance data‑driven decision making. To achieve this, we deploy several stationary UAVs in optimized locations to collect data from industrial IoT devices, which were used to monitor multiple physical entities and perform computations to evaluate their status. We consider a practical setup in which multiple IoT devices may monitor a single physical entity, and as a result, the measurements are combined and processed together to determine the status of the physical entity. The resulting status updates are subsequently uploaded from the UAVs to the base station, where the DT resides. In this work, we consider a novel metric based on the Age of Information (AoI), coined as the Age of Digital Twin (AoDT), to reflect the status freshness of the digital twin. Factoring AoDT in the problem formulation ensures that the DT reliably mirrors the physical system with high accuracy and synchronization. We formulate a mixed‑integer non‑convex program to maximize the total amount of data collected from all IoT devices while ensuring a constrained AoDT. Using successive convex approximations, we solve the problem, conduct extensive simulations and compare the results with baseline approaches to demonstrate the effectiveness of the proposed solution.
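For readers unfamiliar with the Age of Information (AoI) on which the AoDT metric above is built, the sketch below computes the standard AoI, that is, the time elapsed since the generation of the freshest update received so far. The `entity_age` helper, which takes the worst age across the devices monitoring one entity, is a hypothetical illustration of combining several devices and is not the paper's AoDT definition.

```python
def age_of_information(t, updates):
    """AoI at time t given (generation_time, delivery_time) pairs; None if nothing arrived yet."""
    delivered = [gen for gen, recv in updates if recv <= t]
    if not delivered:
        return None
    return t - max(delivered)

def entity_age(t, device_updates):
    """Hypothetical combined age: the stalest device bounds the freshness of the entity."""
    ages = [age_of_information(t, u) for u in device_updates.values()]
    if any(a is None for a in ages):
        return None
    return max(ages)

# Two hypothetical IoT devices monitoring the same physical entity.
device_updates = {"sensor_a": [(0, 2), (5, 6)], "sensor_b": [(1, 3)]}
print(age_of_information(6, device_updates["sensor_a"]), entity_age(6, device_updates))
```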
Authors: Abhishek Tyagi, Charu Gaur
Abstract: We present an autonomous aerial surveillance platform, Veg, designed as a fault‑tolerant quadcopter system that integrates visual SLAM for GPS‑independent navigation, advanced control architecture for dynamic stability, and embedded vision modules for real‑time object and face recognition. The platform features a cascaded control design with an LQR inner‑loop and PD outer‑loop trajectory control. It leverages ORB‑SLAM3 for 6‑DoF localization and loop closure, and supports waypoint‑based navigation through Dijkstra path planning over SLAM‑derived maps. A real‑time Failure Detection and Identification (FDI) system detects rotor faults and executes emergency landing through re‑routing. The embedded vision system, based on a lightweight CNN and PCA, enables onboard object detection and face recognition with high precision. The drone operates fully onboard using a Raspberry Pi 4 and Arduino Nano, validated through simulations and real‑world testing. This work consolidates real‑time localization, fault recovery, and embedded AI on a single platform suitable for constrained environments.
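The waypoint planning step in the abstract above relies on Dijkstra's algorithm. The following minimal sketch runs Dijkstra on a small 4-connected occupancy grid; the grid representation, unit step cost, and function name `dijkstra_grid` are assumptions chosen for illustration, since the abstract does not specify how the SLAM-derived map is discretized.

```python
import heapq

def dijkstra_grid(grid, start, goal):
    """Shortest 4-connected path on a grid (0 = free, 1 = occupied); None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1  # unit cost per step
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(dijkstra_grid(grid, (0, 0), (2, 0)))
```

The returned cell sequence can serve as a list of waypoints; an emergency re-route would amount to calling the planner again with a new goal.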
Authors: Liu Wenbin
Abstract: Aerial object detection using unmanned aerial vehicles (UAVs) faces critical challenges including sub‑10px targets, dense occlusions, and stringent computational constraints. Existing detectors struggle to balance accuracy and efficiency due to rigid receptive fields and redundant architectures. To address these limitations, we propose Variable Receptive Field DETR (VRF‑DETR), a transformer‑based detector incorporating three key components: 1) Multi‑Scale Context Fusion (MSCF) module that dynamically recalibrates features through adaptive spatial attention and gated multi‑scale fusion, 2) Gated Convolution (GConv) layer enabling parameter‑efficient local‑context modeling via depthwise separable operations and dynamic gating, and 3) Gated Multi‑scale Fusion (GMCF) Bottleneck that hierarchically disentangles occluded objects through cascaded global‑local interactions. Experiments on VisDrone2019 demonstrate VRF‑DETR achieves 51.4% mAP50 and 31.8% mAP50:95 with only 13.5M parameters. This work establishes a new efficiency‑accuracy Pareto frontier for UAV‑based detection tasks.
Authors: Mahmoud M. Salim, Khaled M. Rabie, Ali H. Muqaibel
Abstract: Many future Internet of Things (IoT) applications are expected to rely heavily on reconfigurable intelligent surface (RIS)‑aided unmanned aerial vehicles (UAVs). However, the endurance of such systems is constrained by the limited onboard energy, where frequent recharging or battery replacements are required. This consequently disrupts continuous operation and may be impractical in disaster scenarios. To address this challenge, we explore a dual energy harvesting (EH) framework that integrates time‑switching (TS), power‑splitting (PS), and element‑splitting (ES) EH protocols for radio frequency energy, along with solar energy as a renewable source. First, we present the proposed system architecture and EH operating protocols, introducing the proposed hybrid ES‑TS‑PS EH strategy to extend UAV‑mounted RIS endurance. Next, we outline key application scenarios and the associated design challenges. After that, a deep reinforcement learning‑based framework is introduced to maximize the EH efficiency by jointly optimizing UAV trajectory, RIS phase shifts, and EH strategies. The framework considers dual EH, hardware impairments, and channel state information imperfections to reflect real‑world deployment conditions. The optimization problem is formulated as a Markov decision process and solved using an enhanced deep deterministic policy gradient algorithm, incorporating clipped double Q‑learning and softmax‑based Q‑value estimation for improved stability and efficiency. The results demonstrate significant performance gains compared to the considered baseline approaches. Finally, possible challenges and open research directions are presented, highlighting the transformative potential of energy‑efficient UAV‑mounted RIS networks for IoT systems.
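The two critic-side techniques named in the abstract above can be sketched independently of the full DDPG agent. The code below shows the clipped double Q-learning target in its commonly used form (the minimum of two critic estimates) and a softmax-weighted average of Q-values over candidate actions. The function names, the inverse temperature `beta`, and the use of a finite set of candidate Q-values are assumptions for illustration; the abstract does not give the authors' exact estimator.

```python
import math

def clipped_double_q_target(reward, gamma, q1_next, q2_next, done=False):
    """Bootstrapped target using the smaller of two critic estimates to curb overestimation."""
    return reward + (0.0 if done else gamma * min(q1_next, q2_next))

def softmax_q_value(q_values, beta=1.0):
    """Softmax-weighted average of Q-values; beta -> infinity approaches max, beta = 0 gives the mean."""
    m = max(q_values)
    weights = [math.exp(beta * (q - m)) for q in q_values]  # shifted for numerical stability
    z = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / z

print(clipped_double_q_target(1.0, 0.5, q1_next=4.0, q2_next=2.0))
print(softmax_q_value([0.0, 10.0], beta=0.0), softmax_q_value([0.0, 10.0], beta=50.0))
```

Both operators trade some bias for stability: the minimum guards against overestimation, while the softmax average smooths the hard maximum.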
Authors: Mahmoud M. Salim, Khaled M. Rabie, Ali H. Muqaibel
Abstract: Reconfigurable intelligent surfaces (RISs) enhance unmanned aerial vehicles (UAV)‑assisted communication by extending coverage, improving efficiency, and enabling adaptive beamforming. This paper investigates a multiple‑input single‑output system where a base station (BS) communicates with multiple single‑antenna users through a UAV‑assisted RIS, dynamically adapting to user mobility to maintain seamless connectivity. To extend UAV‑RIS operational time, we propose a hybrid energy‑harvesting resource allocation (HERA) strategy that leverages the irregular RIS ON/OFF capability while adapting to BS‑RIS and RIS‑user channels. The HERA strategy dynamically allocates resources by integrating non‑linear radio frequency energy harvesting (EH) based on the time‑switching (TS) approach and renewable energy as a complementary source. A non‑convex mixed‑integer nonlinear programming problem is formulated to maximize EH efficiency while satisfying quality‑of‑service, power, and energy constraints under channel state information and hardware impairments. The optimization jointly considers BS transmit power, RIS phase shifts, TS factor, and RIS element selection as decision variables. To solve this problem, we introduce the energy‑efficient deep deterministic policy gradient (EE‑DDPG) algorithm. This deep reinforcement learning (DRL)‑based approach integrates action clipping and softmax‑weighted Q‑value estimation to mitigate estimation errors. Simulation results demonstrate that the proposed HERA method significantly improves EH efficiency, reaching up to 81.5% and 73.2% in single‑user and multi‑user scenarios, respectively, contributing to extended UAV operational time. Additionally, the proposed EE‑DDPG model outperforms existing DRL algorithms while maintaining practical computational complexity.
Authors: Nicola Taddei, Riccardo Maggioni, Jaap Eising, Giulia De Pasquale, Florian Dorfler
Abstract: We consider the problem of learning time‑varying functions in a distributed fashion, where agents collect local information to collaboratively achieve a shared estimate. This task is particularly relevant in control applications, whenever real‑time and robust estimation of dynamic cost/reward functions in safety‑critical settings has to be performed. In this paper, we adopt a finite‑dimensional approximation of a Gaussian Process, corresponding to a Bayesian linear regression in an appropriate feature space, and propose a new algorithm, DistKP, to track the time‑varying coefficients via a distributed Kalman filter. The proposed method works for arbitrary kernels and under weaker assumptions on the time evolution of the function to be learned than those in the literature. We validate our results using a simulation example in which a fleet of Unmanned Aerial Vehicles (UAVs) learns a dynamically changing wind field.
Authors: Yousef Emami, Hao Zhou, SeyedSina Nabavirazani, Luis Almeida
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly being utilized in various private and commercial applications, e.g., traffic control, parcel delivery, and Search and Rescue (SAR) missions. Machine Learning (ML) methods used in UAV‑Assisted Sensor Networks (UASNETs) and, especially, in Deep Reinforcement Learning (DRL) face challenges such as complex and lengthy model training, gaps between simulation and reality, and low sampling efficiency, which conflict with the urgency of emergencies, such as SAR missions. In this paper, an In‑Context Learning (ICL)‑Data Collection Scheduling (ICLDC) system is proposed as an alternative to DRL in emergencies. The UAV collects sensory data and transmits it to a Large Language Model (LLM), which creates a task description in natural language. From this description, the UAV receives a data collection schedule that must be executed. A verifier ensures safe UAV operations by evaluating the schedules generated by the LLM and overriding unsafe schedules based on predefined rules. The system continuously adapts by incorporating feedback into the task descriptions and using this for future decisions. This method is tested against jailbreaking attacks, where the task description is manipulated to undermine network performance, highlighting the vulnerability of LLMs to such attacks. The proposed ICLDC significantly reduces cumulative packet loss compared to both the DQN and Maximum Channel Gain baselines. ICLDC presents a promising direction for intelligent scheduling and control in UASNETs.
Authors: Sebastian Incicco, Juan Ignacio Giribet, Leonardo Colombo
Abstract: This paper presents an integrated navigation algorithm based on trident quaternions, an extension of dual quaternions. The proposed methodology provides an efficient approach for achieving precise and robust navigation by leveraging the advantages of trident quaternions. The performance of the navigation system was validated through experimental tests using a multi‑rotor UAV equipped with two navigation computers: one executing the proposed algorithm and the other running a commercial autopilot, which was used as a reference.
Authors: Ying Wang, Tingfa Xu, Jianan Li
Abstract: Anti‑UAV tracking poses significant challenges, including small target sizes, abrupt camera motion, and cluttered infrared backgrounds. Existing tracking paradigms can be broadly categorized into global‑ and local‑based methods. Global‑based trackers, such as SiamDT, achieve high accuracy by scanning the entire field of view but suffer from excessive computational overhead, limiting real‑world deployment. In contrast, local‑based methods, including OSTrack and ROMTrack, efficiently restrict the search region but struggle when targets undergo significant displacements due to abrupt camera motion. Through preliminary experiments, it is evident that a local tracker, when paired with adaptive search region adjustment, can significantly enhance tracking accuracy, narrowing the gap between local and global trackers. To address this challenge, we propose FocusTrack, a novel framework that dynamically refines the search region and strengthens feature representations, achieving an optimal balance between computational efficiency and tracking accuracy. Specifically, our Search Region Adjustment (SRA) strategy estimates the target presence probability and adaptively adjusts the field of view, ensuring the target remains within focus. Furthermore, to counteract feature degradation caused by varying search regions, the Attention‑to‑Mask (ATM) module is proposed. This module integrates hierarchical information, enriching the target representations with fine‑grained details. Experimental results demonstrate that FocusTrack achieves state‑of‑the‑art performance, obtaining 67.7% AUC on AntiUAV and 62.8% AUC on AntiUAV410, outperforming the baseline tracker by 8.5% and 9.1% AUC, respectively. In terms of efficiency, FocusTrack surpasses global‑based trackers, requiring only 30G MACs and achieving 143 fps with FocusTrack (SRA) and 44 fps with the full version, both enabling real‑time tracking.
Authors: Xin Tang, Qian Chen, Wenjie Weng, Chao Jin, Zhang Liu, Jiacheng Wang, Geng Sun, Xiaohuan Li, Dusit Niyato
Abstract: The integration of emerging uncrewed aerial vehicles (UAVs) with artificial intelligence (AI) and ground‑embedded robots (GERs) has transformed emergency rescue operations in unknown environments. However, the high computational demands often exceed a single UAV's capacity, making it difficult to continuously provide stable high‑level services. To address this, this paper proposes a cooperation framework involving UAVs, GERs, and airships. The framework enables resource pooling through UAV‑to‑GER (U2G) and UAV‑to‑airship (U2A) links, offering computing services for offloaded tasks. Specifically, we formulate the multi‑objective problem of task assignment and exploration as a dynamic long‑term optimization problem aiming to minimize task completion time and energy use while ensuring stability. Using Lyapunov optimization, we transform it into a per‑slot deterministic problem and propose HG‑MADDPG, which combines the Hungarian algorithm with a GDM‑based multi‑agent deep deterministic policy gradient. Simulations demonstrate significant improvements in offloading efficiency, latency, and system stability over baselines.
Authors: Chongyang Shi, Michael R. Dorothy, Jie Fu
Abstract: This paper studies the synthesis of a joint control and active perception policy for a stochastic system modeled as a partially observable Markov decision process (POMDP), subject to temporal logic specifications. The POMDP actions influence both system dynamics (control) and the emission function (perception). Beyond task completion, the planner seeks to maximize information gain about certain temporal events (the secret) through coordinated perception and control. To enable active information acquisition, we introduce minimizing the Shannon conditional entropy of the secret as a planning objective, alongside maximizing the probability of satisfying the temporal logic formula within a finite horizon. Using a variant of observable operators in hidden Markov models (HMMs) and POMDPs, we establish key properties of the conditional entropy gradient with respect to policy parameters. These properties facilitate efficient policy gradient computation. We validate our approach through graph‑based examples, inspired by common security applications with UAV surveillance.
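The planning objective above is the Shannon conditional entropy of the secret given the observations. As a minimal sketch of that quantity alone (not of the paper's policy‑gradient method), the function below computes H(S | Y) from a finite joint distribution. The dictionary representation and the function name are assumptions made for illustration.

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """Return H(S | Y) in bits for a joint distribution {(s, y): probability}."""
    # Marginal distribution of the observation Y.
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    # H(S | Y) = -sum over (s, y) of p(s, y) * log2 p(s | y).
    h = 0.0
    for (_, y), p in joint.items():
        if p > 0.0:
            h -= p * math.log2(p / p_y[y])
    return h
```

If the observation is independent of a uniformly random binary secret, the result is 1 bit; if the observation reveals the secret exactly, it is 0 bits. Lower values therefore mean that the observations are more informative about the secret.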
Authors: Haonan He, Yuheng Qiu, Junyi Geng
Abstract: Modeling and control of nonlinear dynamics are critical in robotics, especially in scenarios with unpredictable external influences and complex dynamics. Traditional cascaded modular control pipelines often yield suboptimal performance due to conservative assumptions and tedious parameter tuning. Pure data‑driven approaches promise robust performance but suffer from low sample efficiency, sim‑to‑real gaps, and reliance on extensive datasets. Hybrid methods combining learning‑based and traditional model‑based control in an end‑to‑end manner offer a promising alternative. This work presents a self‑supervised learning framework combining a learning‑based inertial odometry (IO) module and differentiable model predictive control (d‑MPC) for Unmanned Aerial Vehicle (UAV) attitude control. The IO denoises raw IMU measurements and predicts UAV attitudes, which are then optimized by MPC for control actions in a bi‑level optimization (BLO) setup, where the inner MPC optimizes control actions and the upper level minimizes discrepancy between real‑world and predicted performance. The framework is thus end‑to‑end and can be trained in a self‑supervised manner. This approach combines the strength of learning‑based perception with the interpretable model‑based control. Results show that the framework remains effective even under strong wind and can simultaneously enhance both the MPC parameter learning and IMU prediction performance.
Authors: Nikhil Vijay, Will C. Forte, Ishan Gajjar, Sarvesh Patham, Syon Gupta, Sahil Shah, Prathamesh Trivedi, Rishit Arora
Abstract: Unmanned aerial vehicles (UAVs) are becoming more commonly used in populated areas, raising concerns about noise pollution generated from their propellers. This study investigates the acoustic performance of unconventional propeller designs, specifically toroidal and uneven‑blade spaced propellers, for their potential in reducing psychoacoustic annoyance. Our experimental results show that these designs noticeably reduced acoustic characteristics associated with noise annoyance.
Authors: Yifei Dong, Fengyi Wu, Sanjian Zhang, Guangyu Chen, Yuzhi Hu, Masumi Yano, Jingdong Sun, Siyu Huang, Feng Liu, Qi Dai, Zhi-Qi Cheng
Abstract: Unmanned Aerial Vehicles (UAVs) are indispensable for infrastructure inspection, surveillance, and related tasks, yet they also introduce critical security challenges. This survey provides a wide‑ranging examination of the anti‑UAV domain, centering on three core objectives (classification, detection, and tracking) while detailing emerging methodologies such as diffusion‑based data synthesis, multi‑modal fusion, vision‑language modeling, self‑supervised learning, and reinforcement learning. We systematically evaluate state‑of‑the‑art solutions across both single‑modality and multi‑sensor pipelines (spanning RGB, infrared, audio, radar, and RF) and discuss large‑scale as well as adversarially oriented benchmarks. Our analysis reveals persistent gaps in real‑time performance, stealth detection, and swarm‑based scenarios, underscoring pressing needs for robust, adaptive anti‑UAV systems. By highlighting open research directions, we aim to foster innovation and guide the development of next‑generation defense strategies in an era marked by the extensive use of UAVs.
Authors: Xiaoxiao Ma, Junxiong Tong
Abstract: With the rapid development of information technology, modern warfare increasingly relies on intelligence, making small target detection critical in military applications. The growing demand for efficient, real‑time detection has created challenges in identifying small targets in complex environments due to interference. To address this, we propose a small target detection method based on multi‑modal image fusion and attention mechanisms. This method leverages YOLOv5, integrating infrared and visible light data along with a convolutional attention module to enhance detection performance. The process begins with multi‑modal dataset registration using feature point matching, ensuring accurate network training. By combining infrared and visible light features with attention mechanisms, the model improves detection accuracy and robustness. Experimental results on anti‑UAV and Visdrone datasets demonstrate the effectiveness and practicality of our approach, achieving superior detection results for small and dim targets.
Authors: Bo Ma, Yi Ji, Liyong Fang
Abstract: The traditional Artificial Potential Field (APF) method exhibits limitations in its force distribution: excessive attraction when UAVs are far from the target may cause collisions with obstacles, while insufficient attraction near the goal often results in failure to reach the target. Furthermore, APF is highly susceptible to local minima, compromising motion reliability in complex environments. To address these challenges, this paper presents a novel hybrid obstacle avoidance algorithm, Deflected Simulated Annealing‑Adaptive Artificial Potential Field (DSA‑AAPF), which combines an improved simulated annealing mechanism with an enhanced APF model. The proposed approach integrates a Leader‑Follower distributed formation strategy with the APF framework, where the resultant force formulation is redefined to smooth UAV trajectories. An adaptive gravitational gain function is introduced to dynamically adjust UAV velocity based on environmental context, and a fast‑converging controller ensures accurate and efficient convergence to the target. Moreover, a directional deflection mechanism is embedded within the simulated annealing process, enabling UAVs to escape local minima caused by semi‑enclosed obstacles through continuous rotational motion. The simulation results, covering formation reconfiguration, complex obstacle avoidance, and entrapment escape, demonstrate the feasibility, robustness, and superiority of the proposed DSA‑AAPF algorithm.
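The force‑distribution limitation of the traditional APF can be seen in a minimal 2D sketch of the textbook formulation. This is the classical baseline only, not the DSA‑AAPF algorithm; the gains `k_att` and `k_rep`, the influence radius `d0`, and the function names are assumptions chosen for illustration.

```python
import math

def attractive_force(pos, goal, k_att=1.0):
    """Classical attractive force: grows linearly with distance to the goal."""
    return (k_att * (goal[0] - pos[0]), k_att * (goal[1] - pos[1]))

def repulsive_force(pos, obstacle, k_rep=1.0, d0=2.0):
    """Classical repulsive force: nonzero only within influence radius d0."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d >= d0 or d == 0.0:
        return (0.0, 0.0)
    # Magnitude k_rep * (1/d - 1/d0) / d^2, directed away from the obstacle.
    mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 2
    return (mag * dx / d, mag * dy / d)
```

With unit gain, a UAV 10 units from the goal feels an attraction of magnitude 10, while at 1 unit it feels only 1. The far‑field pull can therefore overwhelm the repulsion near an obstacle, and the near‑goal pull can be too weak to finish the approach, which is the imbalance an adaptive gain function is meant to address.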
Authors: Hyojun Ahn, Seungcheol Oh, Gyu Seon Kim, Soyi Jung, Soohyun Park, Joongheon Kim
Abstract: This paper proposes SafeGPT, a two‑tiered framework that integrates generative pretrained transformers (GPTs) with reinforcement learning (RL) for efficient and reliable unmanned aerial vehicle (UAV) last‑mile deliveries. In the proposed design, a Global GPT module assigns high‑level tasks such as sector allocation, while an On‑Device GPT manages real‑time local route planning. An RL‑based safety filter monitors each GPT decision and overrides unsafe actions that could lead to battery depletion or duplicate visits, effectively mitigating hallucinations. Furthermore, a dual replay buffer mechanism helps both the GPT modules and the RL agent refine their strategies over time. Simulation results demonstrate that SafeGPT achieves higher delivery success rates compared to a GPT‑only baseline, while substantially reducing battery consumption and travel distance. These findings validate the efficacy of combining GPT‑based semantic reasoning with formal safety guarantees, contributing a viable solution for robust and energy‑efficient UAV logistics.
Authors: Ali Nazari, Ali Olfat
Abstract: As an emerging technology, the simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR‑RIS) can improve the spectrum efficiency (SE) of primary users (PUs) and secondary users (SUs) in cognitive radio (CR) networks by mitigating the interference of the incident signals. The STAR‑RIS‑assisted unmanned aerial vehicle (UAV) can fully cover the dynamic environment through high mobility and fast deployment. Because of the dynamic air‑to‑ground channels, the STAR‑RIS‑assisted UAV may face a challenge in configuring its elements' coefficients (i.e., the reflecting and transmitting amplitudes and phases). Hence, to meet the requirements of dynamic channel determination with the SE approach, this paper proposes the sum rate maximization of both PUs and SUs through non‑orthogonal multiple access in a CR network to jointly optimize the trajectory and transmission‑reflection beamforming design of the STAR‑RIS‑assisted UAV, and power allocation. Since the non‑convex joint optimization problem includes coupled optimization variables, we develop an alternative optimization algorithm. Simulation results study the impact of: 1) the significant parameters, 2) the performance of different intelligence surface modes and STAR‑RIS operating protocols, 3) the joint trajectory and beamforming design with fixed and mobile users, and 4) STAR‑RIS capabilities such as mitigating the interference and dynamically varying the roles of its elements.
Authors: Pei Peng, Xianfu Chen, Tianheng Xu, Celimuge Wu, Yulong Zou, Qiang Ni, Emina Soljanin
Abstract: We propose a novel covert communication system in which a ground user, Alice, transmits unauthorized message fragments to Bob, a low‑Earth orbit satellite (LEO), and an unmanned aerial vehicle (UAV) warden (Willie) attempts to detect these transmissions. The key contribution is modeling a scenario where Alice and Willie are unaware of each other's exact locations and move randomly within a specific area. Alice utilizes environmental obstructions to avoid detection and only transmits when the satellite is directly overhead. LEO satellite technology allows users to avoid transmitting messages near a base station. We introduce two key performance metrics: catch probability (Willie detects and locates Alice during a message chunk transmission) and overall catch probability over multiple message chunks. We analyze how two parameters impact these metrics: 1) the size of the detection window and 2) the number of message chunks. The paper proposes two algorithms to optimize these parameters. The simulation results show that the algorithms effectively reduce the detection risks. This work advances the understanding of covert communication under mobility and uncertainty in satellite‑aided systems.
Authors: Haotian Xu, Yue Hu, Chen Gao, Zhengqiu Zhu, Yong Zhao, Yong Li, Quanjun Yin
Abstract: Language‑goal aerial navigation requires UAVs to localize targets in complex outdoor environments, such as urban blocks, based on textual instructions. Indoor methods are often hard to scale to urban scenes due to ambiguous objects, limited visual field, and spatial reasoning. In this work, we propose GeoNav, a multi‑modal agent for long‑range aerial navigation with geospatial awareness. GeoNav operates in three phases (landmark navigation, target search, and precise localization), mimicking human coarse‑to‑fine spatial reasoning patterns. To support such reasoning, it dynamically builds dual‑scale spatial representations. The first is a global but schematic cognitive map, which fuses prior geographic knowledge and embodied visual cues into a top‑down and explicit annotated form. It enables fast navigation to the landmark region via intuitive map‑based reasoning. The second is a local but delicate scene graph representing hierarchical spatial relationships between landmarks and objects, utilized for accurate target localization. On top of the structured memory, GeoNav employs a spatial chain‑of‑thought mechanism to enable MLLMs with efficient and interpretable action‑making across stages. On the CityNav benchmark, GeoNav surpasses the current SOTA by up to 18.4% in success rate and significantly reduces navigation error. The ablation studies highlight the importance of each module, positioning structured spatial perception as the key to advanced UAV navigation. Published in Pattern Recognition, 2026.
Authors: Fei Lin, Yonglin Tian, Tengchao Zhang, Jun Huang, Sangtian Guan, Fei-Yue Wang
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly important in dynamic environments such as logistics transportation and disaster response. However, current tasks often rely on human operators to monitor aerial videos and make operational decisions. This mode of human‑machine collaboration suffers from significant limitations in efficiency and adaptability. In this paper, we present AirVista‑II, an end‑to‑end agentic system for embodied UAVs, designed to enable general‑purpose semantic understanding and reasoning in dynamic scenes. The system integrates agent‑based task identification and scheduling, multimodal perception mechanisms, and differentiated keyframe extraction strategies tailored for various temporal scenarios, enabling the efficient capture of critical scene information. Experimental results demonstrate that the proposed system achieves high‑quality semantic understanding across diverse UAV‑based dynamic scenarios under a zero‑shot setting.
Authors: Hanyu Jin, Zhefan Xu, Haoyu Shen, Xinming Han, Kanlong Ye, Kenji Shimada
Abstract: Inspecting indoor environments such as tunnels, industrial facilities, and construction sites is essential for infrastructure monitoring and maintenance. While manual inspection in these environments is often time‑consuming and potentially hazardous, Unmanned Aerial Vehicles (UAVs) can improve efficiency by autonomously handling inspection tasks. Such inspection tasks usually rely on reference maps for coverage planning. However, in industrial applications, only the floor plans are typically available. Unforeseen obstacles not included in the floor plans result in outdated reference maps and inefficient or unsafe inspection trajectories. In this work, we propose an adaptive inspection framework that integrates global coverage planning with local reactive adaptation to improve the coverage and efficiency of UAV‑based inspection in partially unknown indoor environments. Experimental results in structured indoor scenarios demonstrate the effectiveness of the proposed approach in improving inspection efficiency and achieving high coverage rates with adaptive obstacle handling, highlighting its potential for enhancing the efficiency of indoor facility inspection.
Authors: Lingyi Cai, Jiacheng Wang, Ruichen Zhang, Yu Zhang, Tao Jiang, Dusit Niyato, Xianbin Wang, Abbas Jamalipour, Xuemin Shen
Abstract: The Low‑Altitude Economy Networking (LAENet) is emerging as a transformative paradigm that enables an integrated and sophisticated communication infrastructure to support aerial vehicles in carrying out a wide range of economic activities within low‑altitude airspace. However, the physical layer communications in the LAENet face growing security threats due to inherent characteristics of aerial communication environments, such as signal broadcast nature and channel openness. These challenges highlight the urgent need for safeguarding communication confidentiality, availability, and integrity. In view of the above, this survey comprehensively reviews existing secure countermeasures for physical layer communication in the LAENet. We explore core methods focusing on anti‑eavesdropping and authentication for ensuring communication confidentiality. Subsequently, availability‑enhancing techniques are thoroughly discussed for anti‑jamming and spoofing defense. Then, we review approaches for safeguarding integrity through anomaly detection and injection protection. Furthermore, we discuss future research directions, emphasizing energy‑efficient physical layer security, multi‑drone collaboration for secure communication, AI‑driven security defense strategy, space‑air‑ground integrated security architecture, and 6G‑enabled secure UAV communication. This survey may provide valuable references and new insights for researchers in the field of secure physical layer communication for the LAENet.
Authors: Mikayel Aramyan, Anna Manucharyan, Lusine Poghosyan, Tigran Bakaryan, Naira Hovakimyan
Abstract: Coordinated missions involving Unmanned Aerial Vehicles (UAVs) in dynamic environments pose significant challenges in maintaining both coordination and agility. In this paper, relying on the cooperative path following framework and using a game‑theoretic formulation, we introduce a novel and scalable approach in which each UAV acts autonomously in different mission conditions. This formulation naturally accommodates heterogeneous and time‑varying objectives across the system. In our setting, each UAV optimizes a cost function that incorporates temporal and mission‑specific constraints. The optimization is performed within a one‑dimensional domain, significantly reducing the computational cost and enabling real‑time application to complex and dynamic scenarios. The framework is distributed in structure, enabling global, system‑wide coordination (a Nash equilibrium) by using only local information. For ideal systems, we prove the existence of the Nash equilibrium and show that it exhibits exponential convergence. Furthermore, we invoke model predictive control (MPC) for non‑ideal scenarios. In particular, we propose a discrete‑time optimization approach that tackles path‑following errors and communication failures, ensuring reliable and agile performance in dynamic and uncertain environments. Simulation results demonstrate the effectiveness and agility of the approach in ensuring successful mission execution across diverse realistic scenarios.
Authors: Colin Samplawski, Adam D. Cobb, Susmit Jha
Abstract: Computer‑aided design (CAD) is a promising application area for emerging artificial intelligence methods. Traditional workflows for cyberphysical systems create detailed digital models which can be evaluated by physics simulators in order to narrow the search space before creating physical prototypes. A major bottleneck of this approach is that the simulators are often computationally expensive and slow. Recent advancements in AI methods offer the possibility to accelerate these pipelines. We use the recently released AircraftVerse dataset, which is especially suited for developing and evaluating large language models for designs. AircraftVerse contains a diverse set of UAV designs represented via textual design trees together with detailed physics simulation results. Following the recent success of large language models (LLMs), we propose AGENT (Aircraft GENeraTor). AGENT is a comprehensive design tool built on the CodeT5+ LLM which learns powerful representations of aircraft textual designs directly from JSON files. We develop a curriculum of training tasks which imbues a single model with a suite of useful features. AGENT is able to generate designs conditioned on properties of flight dynamics (hover time, maximum speed, etc.). Additionally, AGENT can issue evaluations of designs allowing it to act as a surrogate model of the physics simulation that underlies the AircraftVerse dataset. We present a series of experiments which demonstrate our system's abilities. We are able to achieve strong performance using the smallest member of the CodeT5+ family (220M parameters). This allows for a flexible and powerful system which can be executed on a single GPU enabling a clear path toward future deployment.
Authors: Sara Habibi, Naghmeh Ivaki, João Barata
Abstract: Unmanned aerial vehicles (UAVs), initially developed for military applications, are now used in various fields. As UAVs become more common across multiple industries, it is crucial to understand how to adopt them effectively, efficiently, and safely. The utilization of UAVs in healthcare and emergency services has evolved significantly in recent years, with these aerial vehicles potentially contributing to increased survival rates and enhanced healthcare services.
This paper presents a two‑stage systematic literature review, including a tertiary study of 15 review papers and an in‑depth assessment of 136 primary publications focused on using UAVs in healthcare and emergency services. The research demonstrates how civilian UAVs have been used in numerous applications, such as healthcare emergencies, medical supply delivery, and disaster management, for diverse use cases such as Automated External Defibrillator (AED) delivery, blood delivery, and search and rescue.
The studies indicate that UAVs significantly improve response times in emergency situations, enhance survival rates by ensuring the timely delivery of critical medical supplies such as AEDs, and prove to be cost‑effective alternatives to traditional delivery methods, especially in remote or inaccessible areas. The studies also highlight the need for ongoing research and development to address existing challenges, such as regulatory frameworks, security, privacy and safety concerns, infrastructure development, and ethical and social issues. Effectively understanding and tackling these challenges is essential for maximizing the benefits of UAV technology in healthcare and emergency services, ultimately leading to safer, more resilient, and responsive systems that can better serve public health needs.
Authors: Mohamed S. Talamali, Genki Miyauchi, Thomas Watteyne, Micael S. Couceiro, Roderich Gross
Abstract: Unmanned Aerial Vehicles (UAVs) are expected to transform logistics, reducing delivery time, costs, and emissions. This study addresses an on‑demand delivery problem, in which fleets of UAVs are deployed to fulfil orders that arrive stochastically. Unlike previous work, it considers UAVs with heterogeneous, unknown energy storage capacities and assumes no knowledge of the energy consumption models. We propose a decentralised deployment strategy that combines auction‑based task allocation with online learning. Each UAV independently decides whether to bid for orders based on its energy storage charge level, the parcel mass, and delivery distance. Over time, it refines its policy to bid only for orders within its capability. Simulations using realistic UAV energy models reveal that, counter‑intuitively, assigning orders to the least confident bidders reduces delivery times and increases the number of successfully fulfilled orders. This strategy is shown to outperform threshold‑based methods which require UAVs to exceed specific charge levels at deployment. We propose a variant of the strategy which uses learned policies for forecasting. This enables UAVs with insufficient charge levels to commit to fulfilling orders at specific future times, helping to prioritise early orders. Our work provides new insights into long‑term deployment of UAV swarms, highlighting the advantages of decentralised energy‑aware decision‑making coupled with online learning in real‑world dynamic environments.
Authors: Jiawei Wang, Vincent Chau, Weiwei Wu
Abstract: We study a path‑planning problem in which an Unmanned Aerial Vehicle (UAV) must observe a specific set of objects while ensuring a quality of observation. Our goal is to determine the shortest path for the UAV. This paper proposes an offline algorithm with an approximation ratio of (2+2n)(1+ε), where ε>0 is a small constant and n is the number of objects. We then propose several online algorithms in which objects are discovered during the process. To evaluate the performance of these algorithms, we conduct experimental comparisons. Our results show that the online algorithms perform similarly to the offline algorithm, but with significantly faster execution times ranging from 0.01 seconds to 200 seconds. We also show that our methods yield solutions with costs comparable to those obtained by the Gurobi optimizer, which requires 30000 seconds of runtime.
Authors: Priyavrat Dev Sharma, Ibrahim Sorkhoh, Muthucumaru Maheswaran
Abstract: Advances in the Internet of Things are revolutionizing data acquisition, enhancing artificial intelligence and quality of service. Unmanned Aerial Vehicles (UAVs) provide an efficient data‑gathering solution across varied environments. This paper addresses challenges in integrating UAVs for large scale data operations, including mobility, multi‑hop paths, and optimized multi‑source information transfer. We propose a collaborative UAV framework that enables efficient data sharing with minimal communication overhead, featuring adaptive power control and dynamic resource allocation. Formulated as an NP‑hard Integer Linear Program, our approach uses heuristic algorithms to optimize routing through UAV hubs. Simulations show promise in terms of computation time (99% speedup) and outcome (down to 14% deviation from the optimal).
Authors: Max Beffert, Andreas Zell
Abstract: The flight time of multirotor unmanned aerial vehicles (UAVs) is typically constrained by their high power consumption. Tethered power systems present a viable solution to extend flight times while maintaining the advantages of multirotor UAVs, such as hover capability and agility. This paper addresses the critical aspect of cable selection for tether‑powered multirotor UAVs, considering both hover and forward flight. Existing research often overlooks the trade‑offs between cable mass, power losses, and system constraints. We propose a novel methodology to optimize cable selection, accounting for thrust requirements and power efficiency across various flight conditions. The approach uses physics‑informed modeling together with system identification to combine hover and forward flight dynamics, incorporating factors such as motor efficiency, tether resistance, and aerodynamic drag. This work provides an intuitive and practical framework for optimizing tethered UAV designs, ensuring efficient power transmission and flight performance and thus allowing for better, safer, and more efficient tethered drones.
Authors: Yongkang Zhang, Bin Jiang, Yajie Ma
Abstract: This paper presents a novel approach employing prescribed performance control to address the distributed fault‑tolerant formation control problem in a heterogeneous UAV‑UGV cooperative system under a directed interaction topology and communication link failures. The proposed distributed fault‑tolerant control scheme enables UAVs to accurately track a virtual leader's trajectory and achieve the desired formation, while ensuring UGVs converge within the convex hull formed by leader UAVs. By accounting for differences in system parameters and state dimensions between UAVs and UGVs, the method leverages performance functions to guarantee predefined transient and steady‑state behavior. Additionally, a variable prescribed performance boundary control strategy with an adaptive learning rate is introduced to tackle actuator saturation, ensuring reliable formation tracking in real‑world scenarios. Simulation results demonstrate the effectiveness and robustness of the proposed approach.
Authors: Mohammad Javad-Kalbasi, Shahrokh Valaee
Abstract: Unmanned Aerial Vehicles (UAVs) in networked environments face significant challenges due to energy constraints and limited battery life, which necessitate periodic replacements to maintain continuous operation. Efficiently managing the handover of data flows during these replacements is crucial to avoid disruptions in communication and to optimize energy consumption. This paper addresses the complex issue of energy‑efficient UAV replacement in software‑defined UAV networks. We introduce a novel approach based on establishing a strict total ordering relation for UAVs and data flows, allowing us to formulate the problem as an integer linear program. By utilizing the Gurobi solver, we obtain optimal handover schedules for the tested problem instances. Additionally, we propose a heuristic algorithm that significantly reduces computational complexity while maintaining near‑optimal performance. Through comprehensive simulations, we demonstrate that our heuristic offers a practical and scalable solution, ensuring energy‑efficient UAV replacement while minimizing network disruptions. Our results suggest that the proposed approach can enhance UAV battery life and improve overall network reliability in real‑world applications.
Authors: Saqib Abbas, Anurag Kumar, Arpan Chattopadhyay
Abstract: This paper addresses the problem of quickest change detection (QCD) at two spatially separated locations monitored by a single unmanned aerial vehicle (UAV) equipped with a sensor. At any location, the UAV observes i.i.d. data sequentially at discrete time instants. The distribution of the observation data changes at some unknown, arbitrary time and the UAV has to detect this change in the shortest possible time. Change can occur at most at one location over the entire infinite time horizon. The UAV switches between these two locations in order to quickly detect the change. To this end, we propose the Location Switching and Change Detection (LS‑CD) algorithm, which uses a repeated one‑sided sequential probability ratio test (SPRT) based mechanism for observation‑driven location switching and change detection. The primary goal is to minimize the worst‑case average detection delay (WADD) while meeting constraints on the average run length to false alarm (ARL2FA) and the UAV's time‑averaged energy consumption. We provide a rigorous theoretical analysis of the algorithm's performance using the theory of random walks. Specifically, we derive tight upper and lower bounds to its ARL2FA and a tight upper bound to its WADD. In the special case of a symmetrical setting, our analysis leads to a new asymptotic upper bound to the ARL2FA of the standard CUSUM algorithm, a novel contribution not available in the literature, to our knowledge. Numerical simulations demonstrate the efficacy of LS‑CD.
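For readers unfamiliar with the standard CUSUM algorithm referenced in this abstract, a minimal single‑location sketch follows. It is not the LS‑CD algorithm: there is no location switching or energy constraint, and it assumes Gaussian observations with known pre‑change mean, post‑change mean, and variance. The function name and the parameter values in the example are illustrative assumptions.

```python
def cusum_alarm_time(samples, mu0, mu1, sigma, threshold):
    """Return the 1-based index of the first CUSUM alarm, or None.

    Assumes i.i.d. N(mu0, sigma^2) data before the change and
    N(mu1, sigma^2) after it (an assumption made for this sketch).
    """
    stat = 0.0
    for k, x in enumerate(samples, start=1):
        # Log-likelihood ratio of the post-change vs. pre-change density.
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        # CUSUM recursion: accumulate evidence, reset at zero.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return k
    return None


# Deterministic toy data: 5 pre-change samples at 0, then samples at 1.
data = [0.0] * 5 + [1.0] * 20
print(cusum_alarm_time(data, mu0=0.0, mu1=1.0, sigma=1.0, threshold=3.0))  # -> 11
```

Each post‑change sample adds 0.5 to the statistic in this toy case, so the threshold of 3.0 is crossed six samples after the change. Raising the threshold lengthens the average run length to false alarm at the cost of a longer detection delay.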
Authors: Kexin Zhang, Xin Zhang, Lixin Li, Wensheng Lin, Wenchi Cheng, Qinghe Du
Abstract: Due to their flexibility and dynamic coverage capabilities, Unmanned Aerial Vehicles (UAVs) have emerged as vital platforms for emergency communication in disaster‑stricken areas. However, the complex channel conditions in high‑speed mobile scenarios significantly impact the reliability and efficiency of traditional communication systems. This paper presents an intelligent emergency communication framework that integrates Orthogonal Time Frequency Space (OTFS) modulation, semantic communication, and a diffusion‑based denoising module to address these challenges. OTFS ensures robust communication under dynamic channel conditions due to its superior anti‑fading characteristics and adaptability to rapidly changing environments. Semantic communication further enhances transmission efficiency by focusing on key information extraction and reducing data redundancy. Moreover, a diffusion‑based channel denoising module is proposed to leverage the gradual noise reduction process and statistical noise modeling, optimizing the accuracy of semantic information recovery. Experimental results demonstrate that the proposed solution significantly improves link stability and transmission performance in high‑mobility UAV scenarios, achieving at least a 3 dB SNR gain over existing methods.
Authors: Aaron Yu, Iuliia Kolotylo, Hashim A. Hashim, A. E. E. Eltoukhy
Abstract: Unmanned Aerial Vehicles (UAVs) play a pivotal role in modern autonomous air mobility, and the reliability of UAV avionics systems is critical to ensuring mission success, sustainability practices, and public safety. The success of UAV missions depends on effectively mitigating various aspects of electronic warfare, including non‑destructive and destructive cyberattacks, transponder vulnerabilities, and jamming threats, while rigorously implementing countermeasures and defensive aids. This paper provides a comprehensive review of UAV cyberattacks, countermeasures, and defensive strategies. It explores UAV‑to‑UAV coordination attacks and their associated features, such as dispatch system attacks, Automatic Dependent Surveillance‑Broadcast (ADS‑B) attacks, Traffic Alert and Collision Avoidance System (TCAS)‑induced collisions, and TCAS attacks. Additionally, the paper examines UAV‑to‑command center coordination attacks, as well as UAV functionality attacks. The review also covers various countermeasures and defensive aids designed for UAVs. Lastly, a comparison of common cyberattacks and countermeasure approaches is conducted, along with a discussion of future trends in the field. Keywords: Electronic warfare, UAVs, Avionics Systems, cyberattacks, coordination attacks, functionality attacks, countermeasure, defensive‑aids.
Authors: Uthman Olawoye, Jason N. Gross
Abstract: This paper explores applying a deep learning approach for 3D object detection to compute the relative position of an Unmanned Aerial Vehicle (UAV) from an Unmanned Ground Vehicle (UGV) equipped with a LiDAR sensor in a GPS‑denied environment. This was achieved by evaluating the LiDAR sensor's data through a 3D detection algorithm (PointPillars). The PointPillars algorithm incorporates a column voxel point‑cloud representation and a 2D Convolutional Neural Network (CNN) to generate distinctive point‑cloud features representing the object to be identified, in this case, the UAV. The current localization method utilizes point‑cloud segmentation, Euclidean clustering, and predefined heuristics to obtain the relative position of the UAV. Results from the two methods were then compared to a reference truth solution.
Authors: Yang Wang, Hai Yu, Shizhen Wu, Zhichao Yang, Jianda Han, Yongchun Fang, Xiao Liang
Abstract: The unmanned aerial manipulator system, consisting of a multirotor UAV (unmanned aerial vehicle) and a manipulator, has attracted considerable interest from researchers. Nevertheless, the operation of a dual‑arm manipulator poses a dynamic challenge, as the CoM (center of mass) of the system changes with manipulator movement, potentially impacting the multirotor UAV. Additionally, unmodeled effects, parameter uncertainties, and external disturbances can significantly degrade control performance, leading to unforeseen dangers. To tackle these issues, this paper proposes a nonlinear adaptive RISE (robust integral of the sign of the error) controller based on a DNN (deep neural network). The first step involves establishing the kinematic and dynamic model of the dual‑arm aerial manipulator. Subsequently, the adaptive RISE controller is proposed with a DNN feedforward term to effectively address both internal and external challenges. By employing Lyapunov techniques, the asymptotic convergence of the tracking error signals is guaranteed rigorously. Notably, this paper marks a pioneering effort by presenting the first DNN‑based adaptive RISE controller design accompanied by a comprehensive stability analysis. To validate the practicality and robustness of the proposed control approach, several groups of actual hardware experiments are conducted. The results confirm the efficacy of the developed methodology in handling real‑world scenarios, thereby offering valuable insights into the performance of the dual‑arm aerial manipulator system.
Authors: Viswa Narayanan Sankaranarayanan, Akshit Saradagi, Sumeet Satpute, George Nikolakopoulos
Abstract: In this article, we present a centralized approach for the control of multiple unmanned aerial vehicles (UAVs) for landing on moving unmanned ground vehicles (UGVs) using control barrier functions (CBFs). The proposed control framework employs two kinds of CBFs to impose safety constraints on the UAVs' motion. The first class of CBFs (LCBF) is a three‑dimensional exponentially decaying function centered above the landing platform, designed to safely and precisely land UAVs on the UGVs. The second set is a spherical CBF (SCBF), defined between every pair of UAVs, which avoids collisions between them. The LCBF is time‑varying and adapts to the motions of the UGVs. In the proposed CBF approach, the control input from the UAV's nominal tracking controller designed to reach the landing platform is filtered to choose a minimally‑deviating control input that ensures safety (as defined by the CBFs). As the control inputs of every UAV are shared in establishing multiple CBF constraints, we prove that the control inputs are shared without conflict in rendering the safe sets forward invariant. The performance of the control framework is validated through a simulated scenario involving three UAVs landing on three moving targets.
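The idea of filtering a nominal control input into a minimally deviating safe one can be sketched for the simplest case. The code below is an illustration only, not the paper's LCBF/SCBF design: it assumes single‑integrator dynamics (velocity is the control input), one static spherical keep‑out region, and a linear class‑K gain `alpha`. With a single affine constraint, the quadratic program has the closed‑form projection used here. The function name and example values are hypothetical.

```python
import numpy as np


def cbf_filter(u_nom, x, x_obs, radius, alpha=1.0):
    """Minimally modify u_nom so that h(x) = ||x - x_obs||^2 - radius^2 stays >= 0.

    Assumes x_dot = u. Solves min ||u - u_nom||^2 subject to
    grad_h(x) . u + alpha * h(x) >= 0 in closed form. Requires x != x_obs.
    """
    d = x - x_obs
    h = float(d @ d) - radius ** 2
    a = 2.0 * d                              # gradient of h with respect to x
    slack = float(a @ u_nom) + alpha * h     # constraint value at u_nom
    if slack >= 0.0:
        return u_nom.copy()                  # nominal input is already safe
    # Project u_nom onto the half-space boundary a . u + alpha * h = 0.
    return u_nom - slack * a / float(a @ a)


x = np.array([2.0, 0.0])
x_obs = np.array([0.0, 0.0])
u_safe = cbf_filter(np.array([-10.0, 0.0]), x, x_obs, radius=1.0)
print(u_safe)  # approach speed is reduced from 10 to 0.75
```

When several constraints are active at once, as with pairwise constraints between every pair of UAVs, the projection no longer has this one‑line form and a QP solver is typically used instead.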
Authors: Amit Kumar Singh, Prasanth Kumar Duba, P. Rajalakshmi
Abstract: The autonomy of Unmanned Aerial Vehicles (UAVs) in indoor environments poses significant challenges due to the lack of reliable GPS signals in enclosed spaces such as warehouses, factories, and indoor facilities. Micro Aerial Vehicles (MAVs) are preferred for navigating in these complex, GPS‑denied scenarios because of their agility, low power consumption, and limited computational capabilities. In this paper, we propose a Reinforcement Learning based Deep‑Proximal Policy Optimization (D‑PPO) algorithm to enhance real‑time navigation by improving computational efficiency. The end‑to‑end network is trained in 3D realistic meta‑environments created using the Unreal Engine. With these trained meta‑weights, the MAV system underwent extensive experimental trials in real‑world indoor environments. The results indicate that the proposed method reduces computational latency by 91% during the training period without significant degradation in performance. The algorithm was tested on a DJI Tello drone, yielding similar results.
Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Abstract: This work proposes a jointly optimized trajectory generation and camera control approach, enabling an autonomous agent, such as an unmanned aerial vehicle (UAV) operating in 3D environments, to plan and execute coverage trajectories that maximally cover the surface area of a 3D object of interest. Specifically, the UAV's kinematic and camera control inputs are jointly optimized over a rolling planning horizon to achieve complete 3D coverage of the object. The proposed controller incorporates ray‑tracing into the planning process to simulate the propagation of light rays, thereby determining the visible parts of the object through the UAV's camera. This integration enables the generation of precise look‑ahead coverage trajectories. The coverage planning problem is formulated as a rolling finite‑horizon optimal control problem and solved using mixed‑integer programming techniques. Extensive real‑world and synthetic experiments validate the performance of the proposed approach.
Authors: Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou, Marios M. Polycarpou
Abstract: This work proposes a coverage controller that enables an aerial team of distributed autonomous agents to collaboratively generate non‑myopic coverage plans over a rolling finite horizon, aiming to cover specific points on the surface area of a 3D object of interest. The collaborative coverage problem, formulated as a distributed model predictive control problem, optimizes the agents' motion and camera control inputs, while considering inter‑agent constraints aiming at reducing work redundancy. The proposed coverage controller integrates constraints based on light‑path propagation techniques to predict the parts of the object's surface that are visible with regard to the agents' future anticipated states. This work also demonstrates how complex, non‑linear visibility assessment constraints can be converted into logical expressions that are embedded as binary constraints into a mixed‑integer optimization framework. The proposed approach has been demonstrated through simulations and practical applications for inspecting buildings with unmanned aerial vehicles (UAVs).
Authors: Pavlo Mykytyn, Ronald Chitauro, Zoya Dyka, Peter Langendoerfer
Abstract: Networks built on the IEEE 802.11 standard have experienced rapid growth in the last decade. Their field of application is vast, including smart home applications, Internet of Things (IoT), and short‑range high throughput static and dynamic inter‑vehicular communication networks. Within such networks, Channel State Information (CSI) provides a detailed view of the state of the communication channel and represents the combined effects of multipath propagation, scattering, phase shift, fading, and power decay. In this work, we investigate the problem of jamming attack detection in static and dynamic vehicular networks. We utilize ESP32‑S3 modules to set up a communication network between an Unmanned Aerial Vehicle (UAV) and a Ground Control Station (GCS), to experimentally test the combined effects of a constant jammer on recorded CSI parameters, and the feasibility of jamming detection through CSI analysis in static and dynamic communication scenarios.
Authors: Udayanga G. W. K. N. Gamage, Xuanni Huo, Luca Zanatta, T Delbruck, Cesar Cadena, Matteo Fumagalli, Silvia Tolu
Abstract: Small unmanned aerial vehicle (UAV)‑based visual inspections are a more efficient alternative to manual methods for examining civil structural defects, offering safe access to hazardous areas and significant cost savings by reducing labor requirements. However, traditional frame‑based cameras, widely used in UAV‑based inspections, often struggle to capture defects under low or dynamic lighting conditions. In contrast, dynamic vision sensors (DVS), or event‑based cameras, excel in such scenarios by minimizing motion blur, enhancing power efficiency, and maintaining high‑quality imaging across diverse lighting conditions without saturation or information loss. Despite these advantages, existing research lacks studies exploring the feasibility of using DVS for detecting civil structural defects. Moreover, there is no dedicated event‑based dataset tailored for this purpose. Addressing this gap, this study introduces the first event‑based civil infrastructure defect detection dataset, capturing defective surfaces as a spatio‑temporal event stream using DVS. In addition to event‑based data, the dataset includes grayscale intensity image frames captured simultaneously using an active pixel sensor (APS). Both data types were collected using the DAVIS346 camera, which integrates DVS and APS sensors. The dataset focuses on two types of defects: cracks and spalling, and includes data from both field and laboratory environments. The field dataset comprises 318 recording sequences, documenting 458 distinct cracks and 121 distinct spalling instances. The laboratory dataset includes 362 recording sequences, covering 220 distinct cracks and 308 spalling instances. We evaluated the dataset using four real‑time object detection models. The results demonstrate the applicability of DVS cameras for robust detection of civil infrastructure defects under challenging lighting conditions.
Authors: Zhenteng Li, Sheng Lian, Dengfeng Pan, Youlin Wang, Wei Liu
Abstract: Object detection in Unmanned Aerial Vehicle (UAV) images poses significant challenges due to complex scale variations and class imbalance among objects. Existing methods often address these challenges separately, overlooking the intricate nature of UAV images and the potential synergy between them. In response, this paper proposes AD‑Det, a novel framework employing a coherent coarse‑to‑fine strategy that seamlessly integrates two pivotal components: Adaptive Small Object Enhancement (ASOE) and Dynamic Class‑balanced Copy‑paste (DCC). ASOE utilizes a high‑resolution feature map to identify and cluster regions containing small objects. These regions are subsequently enlarged and processed by a fine‑grained detector. On the other hand, DCC conducts object‑level resampling by dynamically pasting tail classes around the cluster centers obtained by ASOE, maintaining a dynamic memory bank for each tail class. This approach enables AD‑Det to not only extract regions with small objects for precise detection but also dynamically perform reasonable resampling for tail‑class objects. Consequently, AD‑Det enhances the overall detection performance by addressing the challenges of scale variations and class imbalance in UAV images through a synergistic and adaptive framework. We extensively evaluate our approach on two public datasets, i.e., VisDrone and UAVDT, and demonstrate that AD‑Det significantly outperforms existing competitive alternatives. Notably, AD‑Det achieves a 37.5% Average Precision (AP) on the VisDrone dataset, surpassing its counterparts by at least 3.1%.
Authors: Juan I. Giribet, Alejandro S. Ghersin, Ignacio Mas, Harrison Neves Marciano, Daniel Khede Dourado Villa, Mario Sarcinelli-Filho
Abstract: This paper presents a control strategy based on dual quaternions for the coordinated formation flying of small UAV groups. A virtual structure is employed to define the desired formation, enabling unified control of its position, orientation, and shape. This abstraction makes formation management easier by allowing a low‑level controller to compute individual UAV commands efficiently. The proposed controller integrates a pose control module with a geometry‑based adaptive strategy, ensuring precise and robust task execution. The effectiveness of the approach is demonstrated through both simulation and experimental results.
Authors: Zherong Pan, Kui Wu
Abstract: Non‑convex constrained optimizations are ubiquitous in robotic applications such as multi‑agent navigation, UAV trajectory optimization, and soft robot simulation. For this problem class, conventional optimizers suffer from small step sizes and slow convergence. We propose BC‑ADMM, a variant of the Alternating Direction Method of Multipliers (ADMM), that can solve a class of non‑convex constrained optimizations with biconvex constraint relaxation. Our algorithm allows larger step sizes by breaking the problem into small‑scale sub‑problems that can be easily solved in parallel. We show that our method has both theoretical convergence speed guarantees and practical convergence guarantees in the asymptotic sense. Through numerical experiments across four robotic applications, we show that BC‑ADMM has faster convergence than conventional gradient descent and Newton's method in terms of wall clock time.
Authors: Mohamed Benzaghta, Giovanni Geraci, David López-Pérez, Alvaro Valcarce
Abstract: We address the challenge of designing cellular networks for uncrewed aerial vehicle (UAV) corridors through a novel data‑driven approach. We assess multiple state‑of‑the‑art high‑dimensional Bayesian optimization (HD‑BO) techniques to jointly optimize the cell antenna tilts and half‑power beamwidth (HPBW). We find that some of these approaches achieve over 20 dB gains in median SINR along UAV corridors, with negligible degradation to ground user performance. Furthermore, we explore the HD‑BO's capabilities in terms of model generalization via transfer learning, where data from a previously observed scenario source is leveraged to predict the optimal solution for a new scenario target. We provide examples of scenarios where such transfer learning is successful and others where it fails. Moreover, we demonstrate that HD‑BO enables multi‑objective optimization, identifying optimal design trade‑offs between data rates on the ground versus UAV coverage reliability. We observe that aiming to provide UAV coverage across the entire sky can lower the rates for ground users compared to setups specifically optimized for UAV corridors. Finally, we validate our approach through a case study in a real‑world cellular network, where HD‑BO identifies optimal and non‑obvious antenna configurations that result in more than double the rates along 3D UAV corridors with negligible ground performance loss.
Authors: Victor Monzon Baeza, Raúl Parada, Laura Concha Salor, Carlos Monzo
Abstract: The integration of Artificial Intelligence (AI) in military communications and networking is reshaping modern defense strategies, enhancing secure data exchange, real‑time situational awareness, and autonomous decision‑making. This survey explores how AI‑driven technologies improve tactical communication networks, radar‑based data transmission, UAV‑assisted relay systems, and electronic warfare resilience. The study highlights AI applications in adaptive signal processing, multi‑agent coordination for network optimization, radar‑assisted target tracking, and AI‑driven electronic countermeasures. Our work introduces a novel three‑criteria evaluation methodology. It systematically assesses AI applications based on general system objectives, communications constraints in the military domain, and critical tactical environmental factors. We analyze key AI techniques for different types of learning applied to multi‑domain network interoperability and distributed data information fusion in military operations. We also address challenges such as adversarial AI threats, the real‑time adaptability of autonomous communication networks, and the limitations of current AI models under battlefield conditions. Finally, we discuss emerging trends in self‑healing networks, AI‑augmented decision support systems, and intelligent spectrum allocation. We provide a structured roadmap for future AI‑driven defense communications and networking research.
Authors: Yufei Jiang, Yuanzhu Zhan, Harsh Vardhan Gupta, Chinmay Borde, Junyi Geng
Abstract: While Unmanned Aerial Vehicles (UAVs) have gained significant traction across various fields, path planning in 3D environments remains a critical challenge, particularly under size, weight, and power (SWAP) constraints. Traditional modular planning systems often introduce latency and suboptimal performance due to limited information sharing and local minima issues. End‑to‑end learning approaches streamline the pipeline by mapping sensory observations directly to actions but require large‑scale datasets, face significant sim‑to‑real gaps, or lack dynamical feasibility. In this paper, we propose a self‑supervised UAV trajectory planning pipeline that integrates a learning‑based depth perception with differentiable trajectory optimization. A 3D cost map guides UAV behavior without expert demonstrations or human labels. Additionally, we incorporate a neural network‑based time allocation strategy to improve the efficiency and optimality. The system thus combines robust learning‑based perception with reliable physics‑based optimization for improved generalizability and interpretability. Both simulation and real‑world experiments validate our approach across various environments, demonstrating its effectiveness and robustness. Our method achieves a 31.33% improvement in position tracking error and 49.37% reduction in control effort compared to the state‑of‑the‑art.
Authors: Aanchal Patel, Aswani Kumar Cherukuri
Abstract: Unmanned Aerial Vehicles are increasingly utilized across various domains, necessitating robust security measures for their communication networks. The ASCON family, a NIST finalist in lightweight cryptography standards, is known for its simple yet resilient design, making it well‑suited for resource‑constrained environments characterized by limited processing capabilities and energy reservoirs. This study focuses on understanding the integration and assessment of the ASCON encryption algorithm in UAV networks, emphasizing its potential as a lightweight and efficient cryptographic solution. The research objectives aim to evaluate ASCON variants' effectiveness in providing security comparable to AES‑128 while exhibiting lower computational cost and energy consumption within simulated UAV network environments. Comparative analysis assesses performance metrics such as encryption and decryption speeds, resource utilization, and resistance to cryptographic vulnerabilities against established algorithms like AES. Performance metrics, including peak and average execution times, overall throughput, and security properties against various cryptographic attacks, are measured and analysed to determine the most suitable cryptographic algorithm for UAV communication systems. Performance results indicate that ASCON‑128a is the optimal choice for UAV communication systems requiring a balance between efficiency and security. Its superior performance metrics, robust security properties, and suitability for resource‑constrained environments position it as the preferred solution for securing UAV communication networks. By leveraging the strengths of ASCON‑128a, UAV communication systems can achieve optimal performance and security, ensuring reliable and secure communication in challenging operational environments.
Authors: Mario Rico Ibanez, Azim Akhtarshenas, David Lopez-Perez, Giovanni Geraci
Abstract: Unmanned aerial vehicle (UAV)‑based base stations offer a promising solution in emergencies where the rapid deployment of cutting‑edge networks is crucial for maximizing life‑saving potential. Optimizing the strategic positioning of these UAVs is essential for enhancing communication efficiency. This paper introduces an automated reinforcement learning approach that enables UAVs to dynamically interact with their environment and determine optimal configurations. By leveraging the radio signal sensing capabilities of communication networks, our method provides a more realistic perspective, utilizing a state‑of‑the‑art algorithm, proximal policy optimization, to learn and generalize positioning strategies across diverse user equipment (UE) movement patterns. We evaluate our approach across various UE mobility scenarios, including static, random, linear, circular, and mixed hotspot movements. The numerical results demonstrate the algorithm's adaptability and effectiveness in maintaining comprehensive coverage across all movement patterns.
Authors: Eslam Eldeeb, Hirley Alves
Abstract: The rapid growth of heterogeneous and massive wireless connectivity in 6G networks demands intelligent solutions to ensure scalability, reliability, privacy, ultra‑low latency, and effective control. Although artificial intelligence (AI) and machine learning (ML) have demonstrated their potential in this domain, traditional online reinforcement learning (RL) and deep RL methods face limitations in real‑time wireless networks. For instance, these methods rely on online interaction with the environment, which might be unfeasible, costly, or unsafe. In addition, they cannot handle the inherent uncertainties in real‑time wireless applications. We focus on offline and distributional RL, two advanced RL techniques that can overcome these challenges by training on static datasets and accounting for network uncertainties. We introduce a novel framework that combines offline and distributional RL for wireless communication applications. Through case studies on unmanned aerial vehicle (UAV) trajectory optimization and radio resource management (RRM), we demonstrate that our proposed Conservative Quantile Regression (CQR) algorithm outperforms conventional RL approaches regarding convergence speed and risk management. Finally, we discuss open challenges and potential future directions for applying these techniques in 6G networks, paving the way for safer and more efficient real‑time wireless systems.
Authors: Suman Raj, Rajdeep Singh, Kautuk Astu, Yogesh Simmhan
Abstract: The increasing adoption of UAVs with advanced sensors and GPU‑accelerated edge computing has enabled real‑time AI‑driven applications in fields such as precision agriculture, wildfire monitoring, and environmental conservation. However, integrating deep learning on UAVs remains challenging due to platform heterogeneity, real‑time constraints, and the need for seamless cloud‑edge coordination. To address these challenges, we introduce AeroDaaS, a service‑oriented framework that abstracts UAV‑based sensing complexities and provides a Drone‑as‑a‑Service (DaaS) model for intelligent decision‑making. AeroDaaS offers modular service primitives for on‑demand UAV sensing, navigation, and analytics as composable microservices, ensuring cross‑platform compatibility and scalability across heterogeneous UAV and edge‑cloud infrastructures. We implement and evaluate a preliminary version of AeroDaaS for two real‑world DaaS applications. We require <=40 lines of code for the applications and see minimal platform overhead of <=20 ms per frame and <=0.5 GB memory usage on Orin Nano. These early results are promising for AeroDaaS as an efficient, flexible and scalable UAV programming framework for autonomous aerial analytics.
Authors: Sebastian Gasche, Christian Kallies, Andreas Himmel, Rolf Findeisen
Abstract: Unmanned aerial vehicles (UAVs), especially multicopters, have recently gained popularity for use in surveillance, monitoring, inspection, and search and rescue missions. Their maneuverability and ability to operate in confined spaces make them particularly useful in cluttered environments. For advanced control and mission planning applications, accurate and resource‑efficient modeling of UAVs and their capabilities is essential. This study presents a modular approach to multicopter modeling that considers vehicle dynamics, energy consumption, and sensor integration. The power train model includes detailed descriptions of key components such as the lithium‑ion battery, electronic speed controllers, and brushless DC motors. Their models are validated with real test flight data. In addition, sensor models, including LiDAR and cameras, are integrated to describe the equipment often used in surveillance and monitoring missions. The individual models are combined into an energy‑aware multicopter model, which provides the basis for a companion study on path planning for unmanned aircraft system (UAS) swarms performing search and rescue missions in cluttered and dynamic environments. The flexible modeling approach enables easy description of different UAVs in a heterogeneous UAS swarm, allowing for energy‑efficient operations and autonomous decision making for reliable mission performance.
Authors: Yike Qiao, Xiaodong He, An Zhuo, Zhiyong Sun, Weimin Bao, Zhongkui Li
Abstract: Vector fields are advantageous in handling nonholonomic motion planning as they provide reference orientation for robots. However, additionally incorporating curvature constraints becomes challenging, due to the interconnection between the design of the curvature‑bounded vector field and the tracking controller under underactuation. In this paper, we present a novel framework to co‑develop the vector field and the control laws, guiding the nonholonomic robot to the target configuration along a curvature‑bounded trajectory. First, we formulate the problem by introducing the target positive limit set, which allows the robot to converge to or pass through the target configuration, depending on different dynamics and tasks. Next, we construct a curvature‑constrained vector field (CVF) by blending and distributing basic flow fields in the workspace and propose saturated control laws with a dynamic gain, under which the tracking error's magnitude decreases even when saturation occurs. Under the control laws, kinematically constrained nonholonomic robots are guaranteed to track the reference CVF and converge to the target positive limit set with bounded trajectory curvature. Numerical simulations show that the proposed CVF method outperforms other vector‑field‑based algorithms. Experiments on Ackermann UGVs and semi‑physical fixed‑wing UAVs demonstrate that the method can be effectively implemented in real‑world scenarios.
Authors: Van Chung Nguyen, Hung Manh La
Abstract: This study introduces a novel methodology for controlling quadrotor unmanned aerial vehicles (UAVs), focusing on Hierarchical Sliding Mode Control (HSMC) strategies and an Extended Kalman Filter (EKF). First, an EKF is proposed to enhance robustness in estimating UAV states, thereby reducing the impact of measurement noise and external disturbances. By locally linearizing UAV systems, the EKF can mitigate the disadvantages of the Kalman filter and reduce the computational cost of other nonlinear observers. Subsequently, compared with related work in terms of stability and computational cost, the HSMC framework performs better in allowing the quadrotor UAVs to track the references. Three types of HSMC (Aggregated HSMC, Incremental HSMC, and Combining HSMC) are investigated for their effectiveness in tracking reference trajectories. Moreover, the stability of the quadrotor UAVs is rigorously analyzed using the Lyapunov stability principle. Finally, experimental results and comparative analyses demonstrate the efficacy and feasibility of the proposed methodologies.
Authors: Yujun Huang, Marius Furter, Gioele Zardini
Abstract: Optimizing the design of complex systems requires navigating interdependent decisions, heterogeneous components, and multiple objectives. Our monotone theory of co‑design offers a compositional framework for addressing this challenge, modeling systems as Design Problems (DPs), representing trade‑offs between functionalities and resources within partially ordered sets. While current approaches model uncertainty using intervals, capturing worst‑ and best‑case bounds, they fail to express probabilistic notions such as risk and confidence. These limitations hinder the applicability of co‑design in domains where uncertainty plays a critical role. In this paper, we introduce a unified framework for composable uncertainty in co‑design, capturing intervals, distributions, and parametrized models. This extension enables reasoning about risk‑performance trade‑offs and supports advanced queries such as experiment design, learning, and multi‑stage decision making. We demonstrate the expressiveness and utility of the framework via a numerical case study on the uncertainty‑aware co‑design of task‑driven Unmanned Aerial Vehicles (UAVs).
Authors: Achilles Kiwanuka Machumilane, Alberto Gotta, Pietro Cassarà
Abstract: Path planning and optimization for unmanned aerial vehicle (UAV)‑assisted next‑generation wireless networks is critical for mobility management and ensuring UAV safety and ubiquitous connectivity, especially in dense urban environments with street canyons and tall buildings. Traditional statistical and model‑based techniques have been successfully used for path optimization in communication networks. However, when dynamic channel propagation characteristics such as line‑of‑sight (LOS), interference, handover, and signal‑to‑interference‑plus‑noise ratio (SINR) are included in path optimization, statistical and model‑based path planning solutions become obsolete since they cannot adapt to the dynamic and time‑varying wireless channels, especially in the mmWave bands. In this paper, we propose a novel model‑free actor‑critic deep reinforcement learning (AC‑DRL) framework for path optimization in UAV‑assisted 5G mmWave wireless networks, which combines four important aspects of UAV communication: flight time, handover, connectivity and SINR. We train an AC‑DRL agent that enables a UAV connected to a gNB to determine the optimal path to a desired destination in the shortest possible time with minimal gNB handover, while maintaining connectivity and the highest possible SINR. We train our model with data from a powerful ray tracing tool called Wireless InSite, which uses 3D images of the propagation environment and provides data that closely resembles the real propagation environment. The simulation results show that our system has superior performance in tracking high SINR compared to other selected RL algorithms.
Authors: Manzoor Ahmed, Aized Amin Soofi, Feroz Khan, Salman Raza, Wali Ullah Khan, Lina Su, Fang Xu, Zhu Han
Abstract: The integration of RIS into UAV networks presents a transformative solution for achieving energy‑efficient and reliable communication, particularly within the rapidly expanding low‑altitude economy (LAE). As UAVs facilitate diverse aerial services, ranging from logistics to smart surveillance, their limited energy reserves create significant challenges. RIS effectively addresses this issue by dynamically shaping the wireless environment to enhance signal quality, reduce power consumption, and extend UAV operation time, thus enabling sustainable and scalable deployment across various LAE applications. This survey provides a comprehensive review of RIS‑assisted UAV networks, focusing on energy‑efficient design within LAE applications. We begin by introducing the fundamentals of RIS, covering its operational modes, deployment architectures, and roles in both terrestrial and aerial environments. Next, we discuss advanced EE‑driven strategies for integrating RIS and UAVs. Techniques such as trajectory optimization, power control, beamforming, and dynamic resource management are examined. Emphasis is placed on collaborative solutions that incorporate UAV‑mounted RIS, wireless energy harvesting (EH), and intelligent scheduling frameworks. We further categorize RIS‑enabled schemes based on key performance objectives relevant to LAE scenarios. These objectives include sum rate maximization, coverage extension, QoS guarantees, secrecy rate improvement, latency reduction, and age of information (AoI) minimization. The survey also delves into RIS‑UAV synergy with emerging technologies like MEC, NOMA, V2X communication, and WPT. These technologies are crucial to the LAE ecosystem. Finally, we outline open research challenges and future directions, emphasizing the critical role of energy‑aware, RIS‑enhanced UAV networks in shaping scalable, sustainable, and intelligent infrastructures within the LAE.
Authors: Jaehoon Choi, Dongki Jung, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon
Abstract: We present UAVTwin, a method for creating digital twins from real‑world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs). Specifically, our approach focuses on synthesizing foreground components, such as various human instances in motion within complex scene backgrounds, from UAV perspectives. This is achieved by integrating 3D Gaussian Splatting (3DGS) for reconstructing backgrounds along with controllable synthetic human models that display diverse appearances and actions in multiple poses. To the best of our knowledge, UAVTwin is the first approach for UAV‑based perception that is capable of generating high‑fidelity digital twins based on 3DGS. The proposed work significantly enhances downstream models through data augmentation for real‑world environments with multiple dynamic objects and significant appearance variations, both of which typically introduce artifacts in 3DGS‑based modeling. To tackle these challenges, we propose a novel appearance modeling strategy and a mask refinement module to enhance the training of 3D Gaussian Splatting. We demonstrate the high quality of neural rendering by achieving a 1.23 dB improvement in PSNR compared to recent methods. Furthermore, we validate the effectiveness of data augmentation by showing a 2.5% to 13.7% improvement in mAP for the human detection task.
Authors: Nick Chodura, Melissa Greeff, Joshua Woods
Abstract: Rooftop 3D reconstruction using UAV‑based photogrammetry offers a promising solution for infrastructure assessment, but existing methods often require high percentages of image overlap and extended flight times to ensure model accuracy when using autonomous flight paths. This study systematically evaluates two key flight parameters, ground sampling distance (GSD) and image overlap, to optimize the 3D reconstruction of complex rooftop infrastructure. Controlled UAV flights were conducted over a multi‑segment rooftop at Queen's University using a DJI Phantom 4 Pro V2, with varied GSD and overlap settings. The collected data were processed using Reality Capture software and evaluated against ground truth models generated from UAV‑based LiDAR and terrestrial laser scanning (TLS). Experimental results indicate that a GSD range of 0.75 to 1.26 cm combined with 85% image overlap achieves a high degree of model accuracy, while minimizing images collected and flight time. These findings provide guidance for planning autonomous UAV flight paths for efficient rooftop assessments.
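For readers unfamiliar with the two flight parameters, the following is a minimal sketch of how GSD relates to flight height and how overlap sets the spacing between exposures, assuming a nadir‑pointing pinhole camera over flat ground. The camera constants are assumed nominal values for a 1‑inch sensor of the class carried by the Phantom 4 Pro; they are not taken from the abstract, and the function names are illustrative only.

```python
# Assumed nominal camera parameters (not from the abstract).
SENSOR_WIDTH_MM = 13.2
FOCAL_LENGTH_MM = 8.8
IMAGE_WIDTH_PX = 5472


def ground_sampling_distance_cm(altitude_m, sensor_width_mm=SENSOR_WIDTH_MM,
                                focal_length_mm=FOCAL_LENGTH_MM,
                                image_width_px=IMAGE_WIDTH_PX):
    """GSD in cm/pixel for a nadir camera at the given height above the surface."""
    return (sensor_width_mm * altitude_m * 100.0) / (focal_length_mm * image_width_px)


def altitude_for_gsd_m(gsd_cm, sensor_width_mm=SENSOR_WIDTH_MM,
                       focal_length_mm=FOCAL_LENGTH_MM,
                       image_width_px=IMAGE_WIDTH_PX):
    """Height above the surface (m) that yields the requested GSD (inverse of the above)."""
    return gsd_cm * focal_length_mm * image_width_px / (sensor_width_mm * 100.0)


def photo_spacing_m(footprint_m, overlap):
    """Distance between consecutive exposures for a given overlap fraction (0..1)."""
    return footprint_m * (1.0 - overlap)


if __name__ == "__main__":
    for gsd in (0.75, 1.26):  # GSD range reported in the abstract
        height = altitude_for_gsd_m(gsd)
        footprint = gsd * IMAGE_WIDTH_PX / 100.0  # image footprint width in metres
        spacing = photo_spacing_m(footprint, 0.85)  # 85% overlap from the abstract
        print(f"GSD {gsd} cm/px: height {height:.1f} m, spacing {spacing:.1f} m")
```

Under these assumptions, a smaller GSD means a lower flight height and a smaller footprint, so the same overlap percentage requires more images; this is the trade‑off the study quantifies.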
Authors: Marc Schneider, Renato Loureiro, Torbjørn Cunis, Walter Fichter
Abstract: Trajectory prediction of aerial vehicles is a key requirement in applications ranging from missile guidance to UAV collision avoidance. While most prediction methods assume deterministic target motion, real‑world targets often exhibit stochastic behaviors such as evasive maneuvers or random gliding patterns. This paper introduces a probabilistic framework based on Conditional Normalizing Flows (CNFs) to model and predict such stochastic dynamics directly from trajectory data. The learned model generates probability distributions of future target positions conditioned on initial states and dynamic parameters, enabling efficient sampling and exact density evaluation. To provide deterministic surrogates compatible with existing guidance and planning algorithms, sampled trajectories are clustered using a time series k‑means approach, yielding a set of representative "virtual target" trajectories. The method is target‑agnostic, computationally efficient, and requires only trajectory data for training, making it suitable as a drop‑in replacement for deterministic predictors. Simulated scenarios with maneuvering and ballistic targets demonstrate that the proposed approach bridges the gap between deterministic assumptions and stochastic reality, advancing guidance and control algorithms for autonomous vehicles.
Authors: Jie Xu, Yongxin Ma, Yixuan Li, Xuanxuan Zhang, Jun Zhou, Shenghai Yuan, Lihua Xie
Abstract: The accuracy of the initial state, including initial velocity, gravity direction, and IMU biases, is critical for the initialization of LiDAR‑inertial SLAM systems. Inaccurate initial values can reduce initialization speed or lead to failure. When the system faces urgent tasks, robust and fast initialization is required while the robot is moving, such as during the swift assessment of rescue environments after natural disasters, bomb disposal, and restarting LiDAR‑inertial SLAM in rescue missions. However, existing initialization methods usually require the platform to remain stationary, which is ineffective when the robot is in motion. To address this issue, this paper introduces a robust and fast dynamic initialization method for LiDAR‑inertial systems (D‑LI‑Init). This method iteratively aligns LiDAR‑based odometry with IMU measurements to achieve system initialization. To enhance the reliability of the LiDAR odometry module, the LiDAR and gyroscope are tightly integrated within the ESIKF framework. The gyroscope compensates for rotational distortion in the point cloud. Translational distortion compensation occurs during the iterative update phase, resulting in the output of LiDAR‑gyroscope odometry. The proposed method can initialize the system whether the robot is moving or stationary. Experiments on public datasets and real‑world environments demonstrate that the D‑LI‑Init algorithm can effectively serve various platforms, including vehicles, handheld devices, and UAVs. D‑LI‑Init completes dynamic initialization regardless of specific motion patterns. To benefit the research community, we have open‑sourced our code and test datasets on GitHub.
Authors: Xiao Tang, Kexin Zhao, Chao Shen, Qinghe Du, Yichen Wang, Dusit Niyato, Zhu Han
Abstract: While unmanned aerial vehicles (UAVs) with flexible mobility are envisioned to enhance physical layer security in wireless communications, the efficient security design that adapts to such high network dynamics is rather challenging. The conventional approaches extended from optimization perspectives are usually quite involved, especially when jointly considering factors in different scales such as deployment and transmission in UAV‑related scenarios. In this paper, we address the UAV‑enabled multi‑user secure communications by proposing a deep graph reinforcement learning framework. Specifically, we reinterpret the security beamforming as a graph neural network (GNN) learning task, where mutual interference among users is managed through the message‑passing mechanism. Then, the UAV deployment is obtained through soft actor‑critic reinforcement learning, where the GNN‑based security beamforming is exploited to guide the deployment strategy update. Simulation results demonstrate that the proposed approach achieves near‑optimal security performance and significantly enhances the efficiency of strategy determination. Moreover, the deep graph reinforcement learning framework offers a scalable solution, adaptable to various network scenarios and configurations, establishing a robust basis for information security in UAV‑enabled communications.
Authors: Xiangwang Hou, Xianghe Wang, Jiacheng Wang, Zekai Zhang, Jun Du, Jingjing Wang, Yong Ren
Abstract: Unmanned aerial vehicles (UAVs) with integrated sensing, communication, computation and control (ISC3) capabilities have become key enablers of next‑generation wireless networks. Federated edge learning (FEL) leverages UAVs as mobile learning agents to collect data, perform local model updates, and contribute to global model aggregation. However, existing UAV‑assisted FEL systems face critical challenges, including excessive computational demands, privacy risks, and inefficient communication, primarily due to the requirement for full‑model training on resource‑constrained UAVs. To address these challenges, we propose Split Federated Learning for UAV‑Enabled ISC3 (SFLSC3), a novel framework that integrates split federated learning (SFL) into UAV‑assisted FEL. SFLSC3 optimally partitions model training between UAVs and edge servers, significantly reducing UAVs' computational burden while preserving data privacy. We conduct a theoretical analysis of UAV deployment, split point selection, data sensing volume, and client‑side aggregation frequency, deriving closed‑form upper bounds for the convergence gap. Based on these insights, we formulate a joint optimization problem to minimize the delay required to achieve a target model accuracy. Given the non‑convex nature of the problem, we develop a low‑complexity algorithm to efficiently determine UAV deployment, split point selection, and communication frequency. Extensive simulations on a target motion recognition task validate the effectiveness of SFLSC3, demonstrating superior convergence and delay performance compared to baseline methods.
Authors: Sk Abid Hasan, Lakshmikanta Sau, Sasthi C. Ghosh
Abstract: Unmanned aerial vehicle (UAV) assisted communication is a revolutionary technology that has been recently presented as a potential candidate for beyond fifth‑generation millimeter wave (mmWave) communications. Although mmWaves can offer a notably high data rate, their high penetration and propagation losses mean that line of sight (LoS) is necessary for effective communication. Due to the presence of obstacles and user mobility, UAV trajectory planning plays a crucial role in improving system performance. In this work, we propose a novel computational geometry‑based trajectory planning scheme by considering the user mobility, the priority of the delay sensitive ultra‑reliable low‑latency communications (URLLC) and the high throughput requirements of the enhanced mobile broadband (eMBB) traffic. Specifically, we use geometric tools like Apollonius circle and minimum enclosing ball of balls to find the optimal position of the UAV that supports uninterrupted connections to the URLLC users and maximizes the aggregate throughput of the eMBB users. Finally, the numerical results demonstrate the benefits of the suggested approach over an existing state of the art benchmark scheme in terms of sum throughput obtained by URLLC and eMBB users.
Authors: Tianqi Ding, Dawei Xiang, Yijiashun Qi, Ze Yang, Zunduo Zhao, Tianyao Sun, Pengbin Feng, Haoyu Wang
Abstract: The rapid growth of industrial automation has highlighted the need for precise and efficient defect detection in large‑scale machinery. Traditional inspection techniques, involving manual procedures such as scaling tall structures for visual evaluation, are labor‑intensive, subjective, and often hazardous. To overcome these challenges, this paper introduces an automated defect detection framework built on Neural Radiance Fields (NeRF) and the concept of digital twins. The system utilizes UAVs to capture images and reconstruct 3D models of machinery, producing both a standard reference model and a current‑state model for comparison. Alignment of the models is achieved through the Iterative Closest Point (ICP) algorithm, enabling precise point cloud analysis to detect deviations that signify potential defects. By eliminating manual inspection, this method improves accuracy, enhances operational safety, and offers a scalable solution for defect detection. The proposed approach demonstrates great promise for reliable and efficient industrial applications.
Authors: Anas Shrinah, Kerstin Eder
Abstract: Simulation‑based testing provides a safe and cost‑effective environment for verifying the safety of Uncrewed Aerial Vehicles (UAVs). However, simulation can be resource‑consuming, especially when High‑Fidelity Simulators (HFS) are used. To optimise simulation resources, we propose a pseudo‑random test generator that uses a Low‑Fidelity Simulator (LFS) to estimate UAV flight paths. This work simplifies the PX4 autopilot HFS to develop an LFS, which operates one order of magnitude faster than the HFS. Test cases predicted to cause safety violations in the LFS are subsequently validated using the HFS.
Authors: Xuli Cai, Poonam Lohan, Burak Kantarci
Abstract: In critical situations such as natural disasters, network outages, battlefield communication, or large‑scale public events, Unmanned Aerial Vehicles (UAVs) offer a promising approach to maximize wireless coverage for affected users in the shortest possible time. In this paper, we propose a novel framework where multiple UAVs are deployed with the objective of maximizing the number of served user equipment (UEs) while ensuring a predefined data rate threshold. UEs are initially clustered using a K‑means algorithm, and UAVs are optimally positioned based on the UEs' spatial distribution. To optimize power allocation and mitigate inter‑cluster interference, we employ the Multi‑Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, considering both LOS and NLOS fading. Simulation results demonstrate that our method significantly enhances UE coverage and outperforms Deep Q‑Network (DQN) and equal power distribution methods, improving on their UE coverage by up to 2.07 times and 8.84 times, respectively.
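The clustering step mentioned above can be illustrated with a minimal sketch of Lloyd's K‑means in pure Python, under the assumption that each UAV is simply placed above the centroid of its UE cluster. The abstract does not specify the positioning rule or the initialization, so the centroid placement, the seeded random initialization, and all names here are illustrative assumptions; the MADDPG power allocation stage is not covered.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Cluster 2D points with Lloyd's algorithm; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k distinct UEs
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each UE joins the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c in range(k):
            if clusters[c]:
                n = len(clusters[c])
                new_centroids.append(
                    (sum(p[0] for p in clusters[c]) / n, sum(p[1] for p in clusters[c]) / n)
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    ues = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    uav_positions, groups = kmeans(ues, k=2)
    print(sorted(uav_positions))
```

In this sketch the horizontal UAV position is the cluster centroid; altitude and rate constraints would need to be handled separately.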
Authors: Weilong Sun, Yumin Zhang, Boren Wei
Abstract: Visual‑Inertial Simultaneous Localization and Mapping (VI‑SLAM) algorithms, which are mostly based on a static‑environment assumption, are widely used in fields such as robotics, UAVs, VR, and autonomous driving. To overcome the localization risks caused by dynamic landmarks in most VI‑SLAM systems, this paper proposes a robust visual‑inertial motion prior SLAM system, named IDY‑VINS, which uses an inertial motion prior to handle dynamic landmarks effectively in environments that are dynamic to varying degrees. Specifically, potential dynamic landmarks are preprocessed during the feature tracking phase using a probabilistic model of the landmarks' minimum projection errors, which are obtained from the inertial motion prior and the epipolar constraint. Subsequently, a robust and self‑adaptive bundle adjustment residual is proposed considering the minimum projection error prior for dynamic candidate landmarks. This residual is integrated into a sliding‑window‑based nonlinear optimization process to estimate camera poses, IMU states and landmark positions while minimizing the impact of dynamic candidate landmarks that deviate from the motion prior. Finally, a clean point cloud map without the "ghosting effect" is obtained that contains only static landmarks. Experimental results demonstrate that our proposed system outperforms state‑of‑the‑art methods in terms of localization accuracy and time cost by robustly mitigating the influence of dynamic landmarks.
Authors: Bisheng Wei, Ruichen Zhang, Ruihong Jiang, Mugen Peng, Dusit Niyato
Abstract: With the rapid growth of the low‑altitude economy, there is increasing demand for real‑time data collection using UAV‑assisted wireless sensor networks. This paper investigates the problem of minimizing the age of information (AoI) in UAV‑assisted wireless sensor networks by optimizing the UAV flight routing. We formulate the AoI minimization task and propose a large language model (LLM)‑assisted UAV routing algorithm (LAURA). LAURA employs an LLM as an intelligent crossover operator within an evolutionary optimization framework to efficiently explore the solution space. Simulation results show that LAURA outperforms benchmark methods in reducing the maximum AoI, especially in scenarios with a large number of sensor nodes.
Authors: Wenyi Liu, Huajie Wu, Liuyu Shi, Fangcheng Zhu, Yuying Zou, Fanze Kong, Fu Zhang
Abstract: In recent years, autonomous unmanned aerial vehicle (UAV) technology has seen rapid advancements, significantly improving operational efficiency and mitigating risks associated with manual tasks in domains such as industrial inspection, agricultural monitoring, and search‑and‑rescue missions. Despite these developments, existing UAV inspection systems encounter two critical challenges: limited reliability in complex, unstructured, and GNSS‑denied environments, and a pronounced dependency on skilled operators. To overcome these limitations, this study presents a LiDAR‑based UAV inspection system employing a dual‑phase workflow: human‑in‑the‑loop inspection and autonomous inspection. During the human‑in‑the‑loop phase, untrained pilots are supported by autonomous obstacle avoidance, enabling them to generate 3D maps, specify inspection points, and schedule tasks. Inspection points are then optimized using the Traveling Salesman Problem (TSP) to create efficient task sequences. In the autonomous phase, the quadrotor autonomously executes the planned tasks, ensuring safe and efficient data acquisition. Comprehensive field experiments conducted in various environments, including slopes, landslides, agricultural fields, factories, and forests, confirm the system's reliability and flexibility. Results reveal significant enhancements in inspection efficiency, with autonomous operations reducing trajectory length by up to 40% and flight time by 57% compared to human‑in‑the‑loop operations. These findings underscore the potential of the proposed system to enhance UAV‑based inspections in safety‑critical and resource‑constrained scenarios.
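The abstract states that inspection points are sequenced by solving a Traveling Salesman Problem but does not name a solver. As a minimal sketch of that sequencing step, the code below uses a nearest‑neighbor construction followed by 2‑opt improvement, which is a common heuristic and an assumption here, not the authors' method. It treats inspection points as 2D coordinates with straight‑line travel cost and a closed tour that returns to the start; all function names are illustrative.

```python
import math


def tour_length(points, order):
    """Length of the closed tour visiting points in the given index order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def nearest_neighbor_order(points, start=0):
    """Greedy tour: always fly to the closest unvisited inspection point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Ties are broken by index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def two_opt(points, order):
    """Reverse tour segments while doing so shortens the tour; the start stays fixed."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(points, candidate) < tour_length(points, order) - 1e-12:
                    order = candidate
                    improved = True
    return order


def plan_inspection_order(points, start=0):
    """Heuristic visiting order for the inspection points (not guaranteed optimal)."""
    return two_opt(points, nearest_neighbor_order(points, start))


if __name__ == "__main__":
    inspection_points = [(0, 0), (1, 1), (1, 0), (0, 1)]
    order = plan_inspection_order(inspection_points)
    print(order, tour_length(inspection_points, order))
```

A real system would replace the Euclidean distance with collision‑free path lengths from the 3D map, since straight lines between inspection points may pass through obstacles.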
Authors: Oscar F. Archila, Alain Vande Wouwer, Johannes Schiffer
Abstract: Collision avoidance is a widely studied problem in robotics, particularly in unmanned aerial vehicle (UAV) applications. Among the main challenges in this area are hardware limitations, the need for rapid response, and the uncertainty associated with obstacle detection. Artificial potential functions (APOFs) are a prominent method to address these challenges. However, existing solutions lack assurances regarding closed‑loop stability and may result in chattering effects. Motivated by this, we propose a control method for static obstacle avoidance based on multiple artificial potential functions (MAPOFs). We derive tuning conditions on the control parameters that ensure the stability of the final position. The stability proof is established by analyzing the closed‑loop system using tools from hybrid systems theory. Furthermore, we validate the performance of the MAPOF control through simulations, showcasing its effectiveness in avoiding static obstacles.
Authors: Yu Cheng, Harun Šiljak
Abstract: Recent advancements in unmanned aerial vehicle (UAV) technology have opened new avenues for dynamic data collection in challenging environments, such as sports fields during fast‑paced sports action. For the purposes of monitoring sport events for dangerous injuries, we envision a coordinated UAV fleet designed to capture high‑quality, multi‑view video footage of collision events in real‑time. The extracted video data is crucial for analyzing athletes' motions and investigating the probability of sports‑related traumatic brain injuries (TBI) during impacts. This research implemented a UAV fleet system on the NetLogo platform, utilizing custom collision detection algorithms to compare against traditional TV‑coverage strategies. Our system supports decentralized data capture and autonomous processing, providing resilience in the rapidly evolving dynamics of sports collisions.
The collaboration algorithm integrates both shared and local data to generate multi‑step analyses aimed at determining the efficacy of custom methods in enhancing the accuracy of TBI prediction models. Missions are simulated in real‑time within a two‑dimensional model, focusing on the strategic capture of collision events that could lead to TBI, while considering operational constraints such as rapid UAV maneuvering and optimal positioning. Preliminary results from the NetLogo simulations suggest that custom collision detection methods offer superior performance over standard TV‑coverage strategies by enabling more precise and timely data capture. This comparative analysis highlights the advantages of tailored algorithmic approaches in critical sports safety applications.
Authors: Subhadip Ghosh, Priyadarshi Mukherjee, Sasthi C. Ghosh
Abstract: These days, unmanned aerial vehicle (UAV)‑based millimeter wave (mmWave) communication systems have drawn a lot of attention due to the increasing demand for faster data rates. Given the susceptibility of mmWave signals to obstacles and their high propagation loss, ensuring line‑of‑sight (LoS) connectivity is critical for maintaining robust and efficient communication. Furthermore, UAVs have limited power resources and limited capacity in terms of the number of users they can serve. Most significantly, different users have different delay requirements, and they keep moving while interacting with the UAVs. In this paper, we first provide an efficient solution for the optimal movement of the UAVs, taking into account the energy efficiency of the UAVs as well as the mobility and delay priority of the users. Next, we propose a greedy solution for the optimal user‑UAV assignment. Finally, the numerical results show how well the suggested solution performs in comparison to the current benchmarks in terms of delay suffered by the users, number of unserved users, and energy efficiency of the UAVs.
Authors: Mahya Nikouei, Bita Baroutian, Shahabedin Nabavi, Fateme Taraghi, Atefe Aghaei, Ayoob Sajedi, Mohsen Ebrahimi Moghaddam
Abstract: Small object detection (SOD) is a critical yet challenging task in computer vision, with applications spanning surveillance, autonomous systems, medical imaging, and remote sensing. Unlike larger objects, small objects contain limited spatial and contextual information, making accurate detection difficult. Challenges such as low resolution, occlusion, background interference, and class imbalance further complicate the problem. This survey provides a comprehensive review of recent advancements in SOD using deep learning, focusing on articles published in Q1 journals during 2024‑2025. We analyze challenges, state‑of‑the‑art techniques, datasets, evaluation metrics, and real‑world applications. Recent advancements in deep learning have introduced innovative solutions, including multi‑scale feature extraction, Super‑Resolution (SR) techniques, attention mechanisms, and transformer‑based architectures. Additionally, improvements in data augmentation, synthetic data generation, and transfer learning have addressed data scarcity and domain adaptation issues. Furthermore, emerging trends such as lightweight neural networks, knowledge distillation (KD), and self‑supervised learning offer promising directions for improving detection efficiency, particularly in resource‑constrained environments like Unmanned Aerial Vehicle (UAV)‑based surveillance and edge computing. We also review widely used datasets, along with standard evaluation metrics such as mean Average Precision (mAP) and size‑specific AP scores. The survey highlights real‑world applications, including traffic monitoring, maritime surveillance, industrial defect detection, and precision agriculture. Finally, we discuss open research challenges and future directions, emphasizing the need for robust domain adaptation techniques, better feature fusion strategies, and real‑time performance optimization.
Authors: Yulu Han, Ziye Jia, Sijie He, Yu Zhang, Qihui Wu
Abstract: The unmanned aerial vehicle (UAV) network has gained significant attention in recent years due to its various applications. However, traffic security becomes a key issue threatening public safety in emergency rescue systems, due to the increasing vulnerability of UAVs to cyber attacks in highly heterogeneous environments. Hence, in this paper, we propose a novel anomaly traffic detection architecture for UAV networks based on the software‑defined networking (SDN) framework and blockchain technology. Specifically, SDN separates the control and data planes to enhance network manageability and security. Meanwhile, the blockchain provides decentralized identity authentication and data security records. Besides, a complete security architecture requires an effective mechanism to detect abnormal traffic in time series. Thus, an integrated algorithm combining convolutional neural networks (CNNs) and Transformer (CNN+Transformer) for anomaly traffic detection is developed, which is called CTranATD. Finally, the simulation results show that the proposed CTranATD algorithm is effective and outperforms the individual CNN, Transformer, and LSTM algorithms for detecting anomaly traffic.
Authors: Nesrine Cherif, Qurrat-Ul-Ain Nadeem
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly used in a plethora of applications such as shipping, surveillance, and search‑and‑rescue. For UAVs to operate safely, reliable cellular connectivity is essential. Utilizing the terrestrial networks for aerial connectivity has been proposed, but the 3D radiation pattern of base station antennas significantly affects the performance of aerial links. To address this, we evaluate the coverage probability of cellular‑connected UAVs, considering vertical antenna gain, by leveraging tools from stochastic geometry. We also analyze how the UAV hovering height, tilt angle and 3D antenna beamwidth influence the reliability of the communication link. Our results show that a down‑tilted antenna improves the connectivity not only of terrestrial users but also of their cellular‑connected UAV counterparts. Moreover, the coverage probability of the UAV‑UE becomes saturated at large down‑tilt angles at the terrestrial base stations (TBSs) due to the antenna sidelobe gain at the serving and interfering TBSs. We also find that a significant increase of the vertical antenna beamwidth improves the UAV user coverage probability, especially at relatively low hovering altitudes, thanks to the increase of the desired signal strength compared to the interference power.
Authors: Ziji Guo, Haonan Tong, Zhilong Zhang, Danpu Liu
Abstract: Recent advances in integrated sensing and communication (ISAC) unmanned aerial vehicles (UAVs) have enabled their widespread deployment in critical applications such as emergency management. This paper investigates the challenge of efficient multitask multimodal data communication in UAV‑assisted ISAC systems. In the considered system model, hyperspectral (HSI) and LiDAR data are collected by UAV‑mounted sensors for both target classification and data reconstruction at the terrestrial BS. The limited channel capacity and complex environmental conditions pose significant challenges to effective air‑to‑ground communication. To tackle this issue, we propose a perception‑enhanced multitask multimodal semantic communication (PE‑MMSC) system that strategically leverages the onboard computational and sensing capabilities of UAVs. In particular, we first propose a robust multimodal feature fusion method that adaptively combines HSI and LiDAR semantics while considering channel noise and task requirements. Then the method introduces a perception‑enhanced (PE) module incorporating attention mechanisms to perform coarse classification on the UAV side, thereby optimizing the attention‑based multimodal fusion and transmission. Experimental results demonstrate that the proposed PE‑MMSC system achieves 5% to 10% higher target classification accuracy compared to conventional systems without the PE module, while maintaining comparable data reconstruction quality with acceptable computational overheads.
Authors: Omid Esrafilian, Rakesh Mundlamuri, Florian Kaltenberger, Raymond Knopp, David Gesbert
Abstract: This paper considers the challenge of localizing ground users with the help of a radio‑equipped unmanned aerial vehicle (UAV) that collects measurements from users. We utilize time‑of‑arrival (ToA) measurements estimated from the radio signals received from users collected by a UAV at different locations. Since the UAV's location might not be perfectly known, the problem becomes about simultaneously localizing the users and tracking the UAV's position. To solve this problem, we employed a least‑squares simultaneous localization and mapping (SLAM) framework to fuse ToA data and the estimate of UAV location available from global positioning system (GPS). We verified the performance of the developed algorithm through real‑world experimentation.
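As a rough illustration of the least-squares idea behind ToA-based localization, the sketch below estimates one ground user's 2D position from ToA measurements taken at several UAV locations. It is a deliberate simplification and not the authors' method: the UAV positions are assumed to be known exactly (so there is no joint SLAM over the UAV trajectory), the geometry is flat 2D, and the names `locate_user`, `toa_to_range`, and the Gauss-Newton solver are all hypothetical choices made for this example.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def toa_to_range(toa_seconds):
    """Convert a one-way time of arrival into a distance in metres."""
    return SPEED_OF_LIGHT * toa_seconds


def locate_user(uav_positions, ranges, guess=(0.0, 0.0), iterations=20):
    """Gauss-Newton least squares for a 2D user position.

    Assumption (not from the abstract): UAV positions are known exactly,
    so only the user position is estimated.
    """
    x, y = guess
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) d = -J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ux, uy), measured in zip(uav_positions, ranges):
            dx, dy = x - ux, y - uy
            dist = math.hypot(dx, dy) or 1e-9   # avoid division by zero
            residual = dist - measured
            jx, jy = dx / dist, dy / dist       # gradient of dist w.r.t. (x, y)
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 -= jx * residual
            b2 -= jy * residual
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:                    # degenerate geometry, stop
            break
        # Solve the 2x2 system with Cramer's rule and take the step.
        x += (b1 * a22 - a12 * b2) / det
        y += (a11 * b2 - a12 * b1) / det
    return x, y


# Usage: four UAV waypoints and noise-free ToAs for a user at (30, 40).
waypoints = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
true_user = (30.0, 40.0)
toas = [math.hypot(true_user[0] - ux, true_user[1] - uy) / SPEED_OF_LIGHT
        for ux, uy in waypoints]
estimate = locate_user(waypoints, [toa_to_range(t) for t in toas],
                       guess=(50.0, 50.0))
```

With noise-free measurements and a starting guess inside the waypoint square, the estimate converges to approximately (30, 40). A joint formulation would add the UAV positions as extra unknowns, with GPS readings contributing additional residual terms.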
Authors: Salim Janji, Paweł Sroka, Adrian Kliks
Abstract: This paper addresses the crucial need for reliable wireless communication in vehicular networks, particularly vital for the safety and efficacy of (semi‑)autonomous driving amid increasing traffic. We explore the use of Reconfigurable Intelligent Surfaces (RISes) mounted on Drone Relay Stations (DRS) to enhance communication reliability. Our study formulates an optimization problem to pinpoint the optimal location and orientation of the DRS, thereby creating an additional propagation path for vehicle‑to‑everything (V2X) communications. We introduce a heuristic approach that combines trajectory optimization for DRS positioning and a Q‑learning scheme for RIS orientation. Our results not only confirm the convergence of the Q‑learning algorithm but also demonstrate significant communication improvements achieved by integrating a DRS into V2X networks.
Authors: Zifa Chen
Abstract: UAVs have been widely used in various fields. However, most of the existing object detectors used in drones are not end‑to‑end and require the design of various complex components and careful fine‑tuning. Most of the existing end‑to‑end object detectors are designed for natural scenes, so it is not ideal to apply them directly to UAV images. To address these challenges, we design a local‑global information interaction DETR for UAVs, namely LGI‑DETR. It performs cross‑layer bidirectional enhancement of low‑level and high‑level feature information, a fusion method that is especially effective for small object detection. At the initial stage of the encoder, we propose a local spatial enhancement module (LSE), which injects the rich local spatial information of low‑level features into the high‑level features and reduces the loss of local information during the transmission of high‑level information. At the final stage of the encoder, we propose a novel global information injection module (GII) designed to integrate rich high‑level global semantic representations with low‑level feature maps. This hierarchical fusion mechanism effectively addresses the inherent limitations of local receptive fields by propagating contextual information across the feature hierarchy. Experimental results on two challenging UAV image object detection benchmarks, VisDrone2019 and UAVDT, show that our proposed model outperforms the SOTA model. Compared to the baseline model, AP and AP50 improved by 1.9% and 2.4%, respectively.
Authors: Brenner S. Rego, Daniel N. Cardoso, Marco H. Terra, Guilherme V. Raffo
Abstract: This paper proposes a joint state‑parameter observer‑based controller for trajectory tracking of an octocopter unmanned aerial vehicle (OUAV), for transportation of a heavy load with unknown mass and size. The multi‑body dynamic model of the OUAV with a rigidly attached load is obtained, incorporating the effects of the load parameters into the dynamics of the system. A robust nonlinear W‑infinity control strategy is designed for optimal trajectory tracking of the OUAV, with information on the states and load parameters provided by a joint estimation unscented Kalman filter. The effectiveness of the proposed strategy is corroborated by numerical results.
Authors: Jean C. Pereira, Valter J. S. Leite, Guilherme V. Raffo
Abstract: This paper addresses the motion control problem for underactuated mechanical systems with full attitude control and one translational force input to manage the six degrees of freedom involved in the three‑dimensional Euclidean space. These systems are often classified as second‑order nonholonomic due to their completely nonintegrable acceleration constraints. To tackle this complex control problem, we propose two nonlinear model predictive control (NMPC) schemes that ensure closed‑loop stability and recursive feasibility without terminal conditions. The system dynamics are modeled on the SE(3) manifold for a global and unique description of rigid body configurations. One NMPC scheme also aims to reduce mission time as an economic criterion. The controllers' effectiveness is validated through numerical experiments on a quadrotor UAV.
Authors: Arthur Amorim, Max Taylor, Trevor Kann, Gary T. Leavens, William L. Harrison, Lance Joneckis
Abstract: Unmanned aerial vehicles (UAVs) depend on untrusted software components to automate dangerous or critical missions, making them a desirable target for attacks. Some work has been done to prevent an attacker who has either compromised a ground control station or parts of a UAV's software from sabotaging the vehicle, but not both. We present an architecture running a UAV software stack with runtime monitoring and seL4‑based software isolation that prevents attackers from both exploiting software bugs and mounting stealthy attacks. Our architecture retrofits legacy UAVs and secures the popular MAVLink protocol, making wide adoption possible.
Authors: Yan Kyaw Tun, Nway Nway Ei, Sheikh Salman Hassan, Cedomir Stefanovic, Nguyen Van Huynh, Madyan Alsenwi, Choong Seon Hong
Abstract: In this paper, we investigate beamforming design and trajectory optimization for a multi‑unmanned aerial vehicle (UAV)‑assisted integrated sensing and communication (ISAC) system. The proposed system employs multiple UAVs equipped with dual‑functional radar‑communication capabilities to simultaneously perform target sensing and provide communication services to users. We formulate a joint optimization problem that aims to maximize the sum rate of users while maintaining target sensing performance through coordinated beamforming and UAV trajectory design. To address this challenging non‑convex problem, we develop a block coordinate descent (BCD)‑based iterative algorithm that decomposes the original problem into tractable subproblems. Then, the beamforming design problem is addressed using fractional programming, while the UAV trajectory is refined through the deep deterministic policy gradient (DDPG) algorithm. The simulation results demonstrate that the proposed joint optimization approach achieves significant performance improvements in both communication throughput and sensing accuracy compared to conventional, separated designs. We also show that proper coordination of multiple UAVs through optimized trajectories and beamforming patterns can effectively balance the tradeoff between sensing and communication objectives.
Authors: Farshad Rostami Ghadi, Masoud Kaveh, Francisco Hernando-Gallego, Diego Martin, Kai-Kit Wong, Chan-Byoung Chae
Abstract: This letter studies the impact of fluid antenna system (FAS) technology on the performance of unmanned aerial vehicle (UAV)‑assisted multiuser communication networks. Specifically, we consider a scenario where a fixed‑position antenna (FPA) base station (BS) serves K FAS‑equipped users with the assistance of a UAV acting as an aerial relay. The BS employs rate‑splitting multiple access (RSMA), while the UAV operates in half‑duplex (HD) mode using the decode‑and‑forward (DF) strategy. For this system, we derive a compact analytical expression for the outage probability (OP) and its asymptotic behavior in the high signal‑to‑noise ratio (SNR) regime, leveraging the multivariate t‑distribution. Our results show how deploying FAS at ground users (GUs) in UAV‑aided communications improves overall system performance compared to using FPA GUs.
Authors: Talip Tolga Sarı, Gökhan Seçinti, Angelo Trotta
Abstract: In large‑scale UAV swarms, dynamically executing machine learning tasks can pose significant challenges due to network volatility and the heterogeneous resource constraints of each UAV. Traditional approaches often rely on centralized orchestration to partition tasks among nodes. However, these methods struggle with communication bottlenecks, latency, and reliability when the swarm grows or the topology shifts rapidly. To overcome these limitations, we propose a fully distributed, diffusive metric‑based approach for split computing in UAV swarms. Our solution introduces a new iterative measure, termed the aggregated gigaflops, capturing each node's own computing capacity along with that of its neighbors without requiring global network knowledge. By forwarding partial inferences intelligently to underutilized nodes, we achieve improved task throughput, lower latency, and enhanced energy efficiency. Further, to handle sudden workload surges and rapidly changing node conditions, we incorporate an early‑exit mechanism that can adapt the inference pathway on‑the‑fly. Extensive simulations demonstrate that our approach significantly outperforms baseline strategies across multiple performance indices, including latency, fairness, and energy consumption. These results highlight the feasibility of large‑scale distributed intelligence in UAV swarms and provide a blueprint for deploying robust, scalable ML services in diverse aerial networks.
Authors: Wali Ullah Khan, Chandan Kumar Sheemar, Eva Lagunas, Symeon Chatzinotas
Abstract: Beyond diagonal reconfigurable intelligent surfaces (BD‑RIS) have emerged as a transformative technology for enhancing wireless communication by intelligently manipulating the propagation environment. This paper explores the potential of BD‑RIS in improving cognitive radio enabled multilayer non‑terrestrial networks (NTNs). It is assumed that a high‑altitude platform station (HAPS) has set up the primary network, while an uncrewed aerial vehicle (UAV) establishes the secondary network in the HAPS footprint. We formulate a joint optimization problem to maximize the secrecy rate by optimizing BD‑RIS phase shifts and the secondary transmitter power allocation while controlling the interference temperature from the secondary network to the primary network. To solve this problem efficiently, we decouple the original problem into two sub‑problems, which are solved iteratively by relying on alternating optimization. Simulation results demonstrate the effectiveness of BD‑RIS in cognitive radio‑enabled multilayer NTNs to accommodate the secondary network while satisfying the constraints imposed from the primary network.
Authors: Ruben Queiros, Megumi Kaneko, Helder Fontes, Rui Campos
Abstract: Flying Networks (FNs) have emerged as a promising solution to provide on‑demand wireless connectivity when network coverage is insufficient or the communications infrastructure is compromised, such as in disaster management scenarios. Despite extensive research on Unmanned Aerial Vehicle (UAV) positioning and radio resource allocation, the challenge of ensuring reliable traffic relay through backhaul links in predictive FNs remains unexplored.
This work proposes Simulated Annealing for predictive FNs (SAFnet), an innovative algorithm that optimizes network performance under positioning constraints, limited bandwidth and minimum rate requirements. Our algorithm uniquely leverages prior knowledge of the first‑tier node trajectories to assign bandwidth and dynamically adjust the position of the second‑tier flying relay. Building upon Simulated Annealing, our approach enhances this well‑known AI algorithm with penalty functions, achieving performance levels comparable to exhaustive search while significantly reducing computational complexity.
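To make the combination of Simulated Annealing with penalty functions concrete, here is a minimal toy sketch. It is not SAFnet: the one-dimensional "relay position" `x`, the quadratic objective, the constraint `x <= 2`, the penalty weight, and the cooling schedule are all assumptions chosen only to show how a constraint violation can be folded into the cost that the annealer minimizes.

```python
import math
import random


def penalized_cost(x, weight=1000.0):
    """Toy objective plus a quadratic penalty for violating x <= 2.

    Both the objective and the constraint are hypothetical placeholders.
    """
    objective = (x - 3.0) ** 2          # unconstrained optimum at x = 3
    violation = max(0.0, x - 2.0)       # positive only when infeasible
    return objective + weight * violation ** 2


def simulated_annealing(cost, start, steps=5000, temp=1.0,
                        cooling=0.999, step_size=0.5, seed=0):
    """Plain simulated annealing with geometric cooling.

    Returns the best state visited and its cost.
    """
    rng = random.Random(seed)           # seeded for reproducibility
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = current + rng.gauss(0.0, step_size)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost


best_x, best_value = simulated_annealing(penalized_cost, start=0.0)
```

Because the penalty grows steeply for `x > 2`, the search settles close to the constraint boundary at `x = 2` rather than at the unconstrained optimum `x = 3`. In a real flying-network setting the state would instead hold relay coordinates and bandwidth shares, and the penalty terms would encode the positioning and minimum-rate constraints.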
Authors: Atharva Ghotavadekar, František Nekovář, Martin Saska, Jan Faigl
Abstract: Agile trajectory planning can improve the efficiency of multi‑rotor Uncrewed Aerial Vehicles (UAVs) in scenarios with combined task‑oriented and kinematic trajectory planning, such as monitoring spatio‑temporal phenomena or intercepting dynamic targets. Agile planning using existing non‑linear model predictive control methods is limited by the number of planning steps as it becomes increasingly computationally demanding. That reduces the prediction horizon length, leading to a decrease in solution quality. Besides, the fixed time‑step length limits the utilization of the available UAV dynamics in the target neighborhood. In this paper, we propose to address these limitations by introducing variable time steps and coupling them with the prediction horizon length. A simplified point‑mass motion primitive is used to leverage the differential flatness of quadrotor dynamics and the generation of feasible trajectories in the flat output space. Based on the presented evaluation results and experimentally validated deployment, the proposed method increases the solution quality by enabling planning for long flight segments but allowing tightly sampled maneuvering.
Authors: Lin Geng, Hao Li, Sidney Givigi, Bram Adams
Abstract: Heterogeneous unmanned aerial vehicle (UAV) swarms consist of dozens to hundreds of drones with different roles and varying hardware and software requirements collaborating towards a shared mission. While traditional approaches for synchronized software updates assume swarms to be unstructured and homogeneous, the heterogeneous nature of modern swarms and the emerging need of drones to update their deep learning (perception) models with new objectives or data as a mission unfolds have made efficient software update methods crucial for swarms to adapt to dynamic environments. To address these challenges, we introduce the SwarmUpdate framework for software updates in heterogeneous UAV swarms, composed of two key components: SwarmSync and SwarmModelPatch. SwarmSync is a hierarchical software update synchronization strategy to distribute a software update to the right subset of drones within a swarm, while SwarmModelPatch is a deep learning model patching method that reduces the size of a (deep learning model) update by only allowing some layers of the model to be updated (freezing the other layers). In this paper, we systematically evaluate the performance of SwarmSync through large‑scale simulations in the ARGoS swarm simulator, comparing SwarmSync to auction‑based (SOUL) and gossip‑based rebroadcasting (Gossip) baselines, and SwarmModelPatch to a non‑incremental model patching strategy.
Authors: Hubert Szolc, Mateusz Wasala, Remigiusz Mietla, Kacper Iwicki, Tomasz Kryjak
Abstract: The use of unmanned aerial vehicles (UAVs) for smart agriculture is becoming increasingly popular. This is evidenced by recent scientific works, as well as the various competitions organised on this topic. Therefore, in this work we present a system for automatic fruit counting using UAVs. To detect them, our solution uses a vision algorithm that processes streams from an RGB camera and a depth sensor using classical image operations. Our system also allows the planning and execution of flight trajectories, taking into account the minimisation of flight time and distance covered. We tested the proposed solution in simulation and obtained an average score of 87.27/100 points from a total of 500 missions. We also submitted it to the UAV Competition organised as part of the ICUAS 2024 conference, where we achieved an average score of 84.83/100 points, placing 6th in a field of 23 teams and advancing to the finals.
Authors: Thu Tran, Kenny Tsu Wei Choo, Shaohui Foong, Hitesh Bhardwaj, Shane Kyi Hla Win, Wei Jun Ang, Kenneth Goh, Rajesh Krishna Balan
Abstract: Monitoring swimmer performance is crucial for improving training and enhancing athletic techniques. Traditional methods for tracking swimmers, such as above‑water and underwater cameras, face limitations due to the need for multiple cameras and obstructions from water splashes. This paper presents a novel approach for tracking swimmers using a moving UAV. The proposed system employs a UAV equipped with a high‑resolution camera to capture aerial footage of the swimmers. The footage is then processed using computer vision algorithms to extract the swimmers' positions and movements. This approach offers several advantages, including single camera use and comprehensive coverage. The system's accuracy is evaluated with both training and in‑competition videos. The results demonstrate the system's ability to accurately track swimmers' movements, limb angles, stroke duration and velocity, with maximum errors of 0.3 seconds and 0.35 m/s for stroke duration and velocity, respectively.
Authors: Leonardo Grando, Juan Fernando Galindo Jaramillo, Jose Roberto Emiliano Leite, Edson Luiz Ursini
Abstract: The low battery autonomy of Unmanned Aerial Vehicles (UAVs or drones) can make applications such as smart farming (precision agriculture), disaster recovery, and the fight against the dengue vector difficult. This article takes two approaches, first enumerating the characteristics observed in these three IoT application types and then modeling a UAV's battery recharge coordination using the Agent‑Based Simulation (ABS) approach. In this way, we propose that the drones inside the swarm do not communicate about this recharge coordination decision, reducing energy usage and permitting remote usage. A total of 6000 simulations were run to evaluate how two proposed recharging coordination policies, BaseLine (BL) and ChargerThershold (CT), behave in 30 situations, in terms of how each simulation set concludes its runs and how long the drones work until recharging. The CT policy shows more reliable results in extreme system usage. The conclusion of this work presents the potential of these three IoT applications to achieve perpetual service without communication between drones and ground stations. This work can be a baseline for future policies and simulation parameter enhancements.
Authors: Xuan Ma, Zewen Lv, Chengcai Ma, Tao Zhang, Yuelan Xin, Kun Zhan
Abstract: Extremely degraded grassland on the Qinghai‑Tibetan Plateau (QTP) presents a significant environmental challenge due to overgrazing, climate change, and rodent activity, which degrade vegetation cover and soil quality. These extremely degraded grasslands on the QTP, commonly referred to as black‑soil areas, require accurate assessment to guide effective restoration efforts. In this paper, we present a newly created QTP black‑soil dataset, annotated under expert guidance. We introduce a novel neural network model, BS‑Mamba, specifically designed for black‑soil area detection using UAV remote sensing imagery. The BS‑Mamba model demonstrates higher accuracy in identifying black‑soil areas across two independent test datasets than the state‑of‑the‑art models. This research contributes to grassland restoration by providing an efficient method for assessing the extent of black‑soil areas on the QTP.
Authors: Neelanga Thelasingha, Agung Julius, James Humann, James Dotterweich
Abstract: In complex multi‑agent systems involving heterogeneous teams, uncertainty arises from numerous sources like environmental disturbances, model inaccuracies, and changing tasks. This causes planned trajectories to become infeasible, requiring replanning. Further, different communication architectures used in multi‑agent systems give rise to asymmetric knowledge of planned trajectories across the agents. In such systems, replanning must be done in a communication‑aware fashion. This paper establishes the conditions for synchronization and feasibility in epistemic planning scenarios introduced by opportunistic communication architectures. We also establish conditions on task satisfaction based on quantified recoverability of disturbances in an iterative planning scheme. We further validate these theoretical results experimentally in a UAV‑‑UGV task assignment problem.
Authors: Yanpeng Jia, Shiyi Wang, Shiliang Shao, Yue Wang, Fu Zhang, Ting Wang
Abstract: Ground robots play a crucial role in inspection, exploration, rescue, and other applications. In recent years, advancements in LiDAR technology have made sensors more accurate, lightweight, and cost‑effective. Therefore, researchers increasingly integrate sensors for SLAM studies, providing robust technical support for ground robots and expanding their application domains. Public datasets are essential for advancing SLAM technology. However, existing datasets for ground robots are typically restricted to flat‑terrain motion with 3 DOF and cover only a limited range of scenarios. Although handheld devices and UAVs exhibit richer and more aggressive movements, their datasets are predominantly confined to small‑scale environments due to endurance limitations. To fill these gaps, we introduce M2UD, a multi‑modal, multi‑scenario, uneven‑terrain SLAM dataset for ground robots. This dataset contains a diverse range of highly challenging environments, including cities, open fields, long corridors, and mixed scenarios. Additionally, it presents extreme weather conditions. The aggressive motion and degradation characteristics of this dataset not only pose challenges for testing and evaluating existing SLAM methods but also advance the development of more advanced SLAM algorithms. To benchmark SLAM algorithms, M2UD provides smoothed ground truth localization data obtained via RTK and introduces a novel localization evaluation metric that considers both accuracy and efficiency. Additionally, we utilize a high‑precision laser scanner to acquire ground truth maps of two representative scenes, facilitating the development and evaluation of mapping algorithms. We select 12 localization sequences and 2 mapping sequences to evaluate several classical SLAM algorithms, verifying the usability of the dataset. To enhance usability, the dataset is accompanied by a suite of development kits.
Authors: Xinglong Sun, Haijiang Sun, Shan Jiang, Jiacheng Wang, Jiasong Wang
Abstract: Trackers based on lightweight neural networks have achieved great success in the field of aerial remote sensing, most of which aggregate multi‑stage deep features to improve tracking quality. However, existing algorithms usually generate only single‑stage fusion features for state decision, ignoring that diverse kinds of features are required for identifying and locating the object, which limits the robustness and precision of tracking. In this paper, we propose a novel target‑aware Bidirectional Fusion transformer (BFTrans) for UAV tracking. Specifically, we first present a two‑stream fusion network based on linear self and cross attentions, which can combine the shallow and the deep features from both forward and backward directions, providing the adjusted local details for location and global semantics for recognition. Besides, a target‑aware positional encoding strategy is designed for the above fusion model, which helps to perceive the object‑related attributes during the fusion phase. Finally, the proposed method is evaluated on several popular UAV benchmarks, including UAV‑123, UAV20L and UAVTrack112. Extensive experimental results demonstrate that our approach can exceed other state‑of‑the‑art trackers and run with an average speed of 30.5 FPS on an embedded platform, which is appropriate for practical drone deployments.
Authors: Hussein N. Naser, Hashim A. Hashim, Mojtaba Ahmadi
Abstract: This paper presents a nonlinear control strategy for an aerial cooperative payload transportation system consisting of two quadrotor UAVs rigidly connected to a payload. The system includes human physical interaction facilitated by an admittance control. The proposed control framework integrates an adaptive Backstepping controller for the position subsystem and a Fast Nonsingular Terminal Sliding Mode Control (FNTSMC) for the attitude subsystem to ensure asymptotic stabilization. The admittance controller interprets the interaction forces from the human operator, generating reference trajectories for the position controller to ensure accurate tracking of the operator's guidance. The system aims to assist humans in payload transportation, providing both stability and responsiveness. The robustness and effectiveness of the proposed control scheme in maintaining system stability and performance under various conditions are presented.
Authors: Junning Liang, Haowen Zheng, Yuying Zhang, Yongzhuo Gao, Wei Dong, Ximin Lyu
Abstract: Turbojet‑powered VTOL UAVs have garnered increased attention in heavy‑load transport and emergency services, due to their superior power density and thrust‑to‑weight ratio compared to existing electronic propulsion systems. The main challenge with jet‑powered UAVs lies in the complexity of thrust vectoring mechanical systems, which aim to mitigate the slow dynamics of the turbojet. In this letter, we introduce a novel turbojet‑powered UAV platform named Hex‑Jet. Our concept integrates thrust vectoring and differential thrust for comprehensive attitude control. This approach notably simplifies the thrust vectoring mechanism. We utilize a predictor‑based time delay control method based on the frequency domain model in our Hex‑Jet controller design to mitigate the delay in roll attitude control caused by turbojet dynamics. Our comparative studies provide valuable insights for the UAV community, and flight tests on the scaled prototype demonstrate the successful implementation and verification of the proposed predictor‑based time delay control technique.
Authors: Yubo Yang, Tao Yang, Xiaofeng Wu, Ziyu Guo, Bo Hu
Abstract: UAV swarms are widely used in emergency communications, area monitoring, and disaster relief. Coordinated by control centers, they are ideal for federated learning (FL) frameworks. However, current UAV‑assisted FL methods primarily focus on single tasks, overlooking the need for multi‑task training. In disaster relief scenarios, UAVs perform tasks such as crowd detection, road feasibility analysis, and disaster assessment, which exhibit time‑varying demands and potential correlations. In order to meet the time‑varying requirements of tasks and complete multiple tasks efficiently under resource constraints, in this paper, we propose a UAV swarm based multi‑task FL framework, where ground emergency vehicles (EVs) collaborate with UAVs to accomplish multiple tasks efficiently under constrained energy and bandwidth resources. Through theoretical analysis, we identify key factors affecting task performance and introduce a task attention mechanism to dynamically evaluate task importance, thereby achieving efficient resource allocation. Additionally, we propose a task affinity (TA) metric to capture the dynamic correlation among tasks, thereby promoting task knowledge sharing to accelerate training and improve the generalization ability of the model in different scenarios. To optimize resource allocation, we formulate a two‑layer optimization problem to jointly optimize UAV transmission power, computation frequency, bandwidth allocation, and UAV‑EV associations. For the inner problem, we derive closed‑form solutions for transmission power, computation frequency, and bandwidth allocation and apply a block coordinate descent method for optimization. For the outer problem, a two‑stage algorithm is designed to determine optimal UAV‑EV associations. Furthermore, theoretical analysis reveals a trade‑off between UAV energy consumption and multi‑task performance.
Authors: Shibo Huang, Chenfan Shi, Jian Yang, Hanlin Dong, Jinpeng Mi, Ke Li, Jianfeng Zhang, Miao Ding, Peidong Liang, Xiong You, Xian Wei
Abstract: Autonomous navigation in open‑world outdoor environments faces challenges in integrating dynamic conditions, long‑distance spatial reasoning, and semantic understanding. Traditional methods struggle to balance local planning, global planning, and semantic task execution, while existing large language models (LLMs) enhance semantic comprehension but lack spatial reasoning capabilities. Although diffusion models excel in local optimization, they fall short in large‑scale long‑distance navigation. To address these gaps, this paper proposes KiteRunner, a language‑driven cooperative local‑global navigation strategy that combines UAV orthophoto‑based global planning with diffusion model‑driven local path generation for long‑distance navigation in open‑world scenarios. Our method innovatively leverages real‑time UAV orthophotography to construct a global probability map, providing traversability guidance for the local planner, while integrating large models like CLIP and GPT to interpret natural language instructions. Experiments demonstrate that KiteRunner achieves 5.6% and 12.8% improvements in path efficiency over state‑of‑the‑art methods in structured and unstructured environments, respectively, with significant reductions in human interventions and execution time.
Authors: Ji Zhao, Xiao Lin
Abstract: The emergence of large language models (LLMs) opens new frontiers for unmanned aerial vehicles (UAVs), yet existing systems remain confined to predefined tasks due to hardware‑software co‑design challenges. This paper presents the first aerial intelligent agent capable of open‑world task execution through tight integration of LLM‑based reasoning and robotic autonomy. Our hardware‑software co‑designed system addresses two fundamental limitations: (1) Onboard LLM operation via an edge‑optimized computing platform, achieving 5‑6 tokens/sec inference for 14B‑parameter models at 220W peak power; (2) A bidirectional cognitive architecture that synergizes slow deliberative planning (LLM task planning) with fast reactive control (state estimation, mapping, obstacle avoidance, and motion planning). Validated through preliminary results using our prototype, the system demonstrates reliable task planning and scene understanding in communication‑constrained environments, such as sugarcane monitoring, power grid inspection, mine tunnel exploration, and biological observation applications. This work establishes a novel framework for embodied aerial artificial intelligence, bridging the gap between task planning and robotic autonomy in open environments.
Authors: Hyungsoo Kang, Isaac Kaminer, Venanzio Cichella, Naira Hovakimyan
Abstract: This article presents a novel time‑coordination algorithm based on event‑triggered communication to ensure multiple UAVs progress along their desired paths in coordination with one another. In the proposed algorithm, a UAV transmits its progression information to its neighbor UAVs only when a decentralized trigger condition is satisfied. Consequently, it significantly reduces the volume of inter‑vehicle communications required to achieve the goal compared with the existing algorithms based on continuous communication. With such intermittent communications, it is shown that a decentralized coordination controller guarantees exponential convergence of the coordination error to a neighborhood of zero. Furthermore, a lower bound on the difference between two consecutive event‑triggered times is provided showing that the Zeno behavior is excluded with the proposed algorithm. Lastly, simulation results validate the efficacy of the proposed algorithm.
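The idea of transmitting only when a trigger condition holds can be illustrated with a minimal sketch. The abstract does not state the actual trigger condition, so the fixed-threshold rule below, along with the names `EventTrigger`, `step`, and `simulate`, is a hypothetical stand-in rather than the authors' algorithm: a vehicle broadcasts its progress variable only when it has drifted from the last broadcast value by more than a threshold.

```python
from dataclasses import dataclass


@dataclass
class EventTrigger:
    """Hypothetical fixed-threshold trigger (an assumption, not the paper's condition)."""
    threshold: float          # allowed drift before a new broadcast is sent
    last_sent: float = 0.0    # progress value most recently broadcast to neighbors
    sent_count: int = 0       # number of broadcasts so far

    def step(self, progress: float) -> bool:
        """Broadcast (return True) only if progress drifted past the threshold."""
        if abs(progress - self.last_sent) > self.threshold:
            self.last_sent = progress
            self.sent_count += 1
            return True
        return False


def simulate(num_steps: int = 100, rate: float = 0.01, threshold: float = 0.05) -> int:
    """Advance a single vehicle at a constant rate and count its broadcasts."""
    trigger = EventTrigger(threshold=threshold)
    progress = 0.0
    for _ in range(num_steps):
        progress += rate
        trigger.step(progress)
    return trigger.sent_count


if __name__ == "__main__":
    # Continuous communication would send once per step; the trigger sends far fewer.
    print("broadcasts:", simulate(), "of 100 steps")
```

In this toy setting the gap between consecutive broadcasts is bounded below by the time the progress variable needs to cover the threshold, which is the intuition behind excluding Zeno behavior.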
Authors: Md Sharif Hossen, Anil Gurses, Mihail Sichitiu, Ismail Guvenc
Abstract: Unmanned aerial vehicles (UAVs) enhance coverage and provide flexible deployment in 5G and next‑generation wireless networks. The performance of such wireless networks can be improved by developing new navigation and wireless adaptation approaches in digital twins (DTs). However, challenges such as complex propagation conditions and hardware complexities in real‑world scenarios introduce a realism gap with the DTs. Moreover, while using real‑time full‑stack protocols in DTs enables subsequent deployment and testing in a real‑world environment, development in DTs requires high computational complexity and involves a long development time. In this paper, to accelerate the development cycle, we develop a measurement‑calibrated Matlab‑based simulation framework to replicate performance in a full‑stack UAV wireless network DT. In particular, we use the DT from the NSF AERPAW platform and compare its reports with those generated by our developed simulation framework in wireless networks with similar settings. In both environments, we observe comparable results in terms of RSRP measurement, hence motivating iterative use of the developed simulation environment with the DT.
Authors: Lavanya Ratnabala, Robinroy Peter, Aleksey Fedoseev, Dzmitry Tsetserukou
Abstract: This paper tackles decentralized continuous task allocation in heterogeneous multi‑agent systems. We present a novel framework HIPPO‑MAT that integrates graph neural networks (GNN) employing a GraphSAGE architecture to compute independent embeddings on each agent with an Independent Proximal Policy Optimization (IPPO) approach for multi‑agent deep reinforcement learning. In our system, unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) share aggregated observation data via communication channels while independently processing these inputs to generate enriched state embeddings. This design enables dynamic, cost‑optimal, conflict‑aware task allocation in a 3D grid environment without the need for centralized coordination. A modified A* path planner is incorporated for efficient routing and collision avoidance. Simulation experiments demonstrate scalability with up to 30 agents and preliminary real‑world validation on JetBot ROS AI Robots, each running its model on a Jetson Nano and communicating through an ESP‑NOW protocol using ESP32‑S3, which confirms the practical viability of the approach that incorporates simultaneous localization and mapping (SLAM). Experimental results revealed that our method achieves a high 92.5% conflict‑free success rate, with only a 16.49% performance gap compared to the centralized Hungarian method, while outperforming the heuristic decentralized baseline based on a greedy approach. Additionally, the framework exhibits scalability with up to 30 agents with allocation processing of 0.32 simulation step time and robustness in responding to dynamically generated tasks.
Authors: Jacob Swindell, Madeleine Darbyshire, Marija Popovic, Riccardo Polvara
Abstract: Accurate agricultural weed mapping using UAVs is crucial for precision farming applications. Traditional methods rely on orthomosaic stitching from rigid flight paths, which is computationally intensive and time‑consuming. Gaussian Process (GP)‑based mapping offers continuous modelling of the underlying variable (i.e. weed distribution) but requires discretisation for practical tasks like path planning or visualisation. Current implementations often default to quadtrees or gridmaps without systematically evaluating alternatives. This study compares five discretisation methods: quadtrees, wedgelets, top‑down binary space partition (BSP) trees using least square error (LSE), bottom‑up BSP trees using graph merging, and variable‑resolution hexagonal grids. Evaluations on real‑world weed distributions measure visual similarity, mean squared error (MSE), and computational efficiency. Results show quadtrees perform best overall, but alternatives excel in specific scenarios: hexagons or BSP LSE suit fields with large, dominant weed patches, while quadtrees are optimal for dispersed small‑scale distributions. These findings highlight the need to tailor discretisation approaches to weed distribution patterns (patch size, density, coverage) rather than relying on default methods. By choosing representations based on the underlying distribution, we can improve mapping accuracy and efficiency for precision agriculture applications.
Authors: Wentao Wu, Chenglong Li, Xiao Wang, Bin Luo, Qi Liu
Abstract: Existing multimodal UAV object detection methods often overlook the impact of semantic gaps between modalities, which makes it difficult to achieve accurate semantic and spatial alignments, limiting detection performance. To address this problem, we propose a Large Language Model (LLM) guided Progressive feature Alignment Network called LPANet, which leverages the semantic features extracted from a large language model to guide the progressive semantic and spatial alignment between modalities for multimodal UAV object detection. To employ the powerful semantic representation of LLM, we generate the fine‑grained text descriptions of each object category by ChatGPT and then extract the semantic features using the large language model MPNet. Based on the semantic features, we guide the semantic and spatial alignments in a progressive manner as follows. First, we design the Semantic Alignment Module (SAM) to pull the semantic features and multimodal visual features of each object closer, alleviating the semantic differences of objects between modalities. Second, we design the Explicit Spatial alignment Module (ESM) by integrating the semantic relations into the estimation of feature‑level offsets, alleviating the coarse spatial misalignment between modalities. Finally, we design the Implicit Spatial alignment Module (ISM), which leverages the cross‑modal correlations to aggregate key features from neighboring regions to achieve implicit spatial alignment. Comprehensive experiments on two public multimodal UAV object detection datasets demonstrate that our approach outperforms state‑of‑the‑art multimodal UAV object detectors.
Authors: Xiaowei Li, Kuan Xu, Fen Liu, Ruofei Bai, Shenghai Yuan, Lihua Xie
Abstract: Traditional unmanned aerial vehicle (UAV) swarm missions rely heavily on expensive custom‑made drones with onboard perception or external positioning systems, limiting their widespread adoption in research and education. To address this issue, we propose AirSwarm. AirSwarm democratizes multi‑drone coordination using low‑cost commercially available drones such as Tello or Anafi, enabling affordable swarm aerial robotics research and education. Key innovations include a hierarchical control architecture for reliable multi‑UAV coordination, an infrastructure‑free visual SLAM system for precise localization without external motion capture, and a ROS‑based software framework for simplified swarm development. Experiments demonstrate cm‑level tracking accuracy, low‑latency control, communication failure resistance, formation flight, and trajectory tracking. By reducing financial and technical barriers, AirSwarm makes multi‑robot education and research more accessible. The complete instructions and open source code will be available at
Authors: Xiaohong Yang, Minghui Liwang, Liqun Fu, Yuhan Su, Seyyedali Hosseinalipour, Xianbin Wang, Yiguang Hong
Abstract: Hierarchical Federated Learning (HFL) extends conventional Federated Learning (FL) by introducing intermediate aggregation layers, enabling distributed learning in geographically dispersed environments, particularly relevant for smart IoT systems, such as remote monitoring and battlefield operations, where cellular connectivity is limited. In these scenarios, UAVs serve as mobile aggregators, dynamically connecting terrestrial IoT devices. This paper investigates an HFL architecture with energy‑constrained, dynamically deployed UAVs prone to communication disruptions. We propose a novel approach to minimize global training costs by formulating a joint optimization problem that integrates learning configuration, bandwidth allocation, and device‑to‑UAV association, ensuring timely global aggregation before UAV disconnections and redeployments. The problem accounts for dynamic IoT devices and intermittent UAV connectivity and is NP‑hard. To tackle this, we decompose it into three subproblems: (i) optimizing learning configuration and bandwidth allocation via an augmented Lagrangian to reduce training costs; (ii) introducing a device fitness score based on data heterogeneity (via Kullback‑Leibler divergence), device‑to‑UAV proximity, and computational resources, using a TD3‑based algorithm for adaptive device‑to‑UAV assignment; (iii) developing a low‑complexity two‑stage greedy strategy for UAV redeployment and global aggregator selection, ensuring efficient aggregation despite UAV disconnections. Experiments on diverse real‑world datasets validate the approach, demonstrating cost reduction and robust performance under communication disruptions.
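The data-heterogeneity term of the device fitness score is described as being based on Kullback‑Leibler divergence. As a rough illustration only, the sketch below computes the KL divergence between a device's empirical label distribution and a reference distribution. The choice of a uniform reference, the helper names `label_distribution` and `kl_divergence`, and the `eps` guard are all assumptions; the abstract does not specify how the score is assembled.

```python
import math


def label_distribution(labels, num_classes):
    """Empirical class frequencies of a device's local labels (ints in [0, num_classes))."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats; terms with p_i == 0 contribute zero.

    eps guards against division by zero when q_i == 0 (an assumption of this sketch).
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    num_classes = 4
    reference = [1.0 / num_classes] * num_classes   # assumed uniform reference
    balanced = label_distribution([0, 1, 2, 3], num_classes)
    skewed = label_distribution([0, 0, 0, 1], num_classes)
    # A device with a more skewed label mix yields a larger divergence.
    print(kl_divergence(balanced, reference), kl_divergence(skewed, reference))
```

A larger divergence indicates local data that is less representative of the reference, which is one plausible way such a term could penalize a device during assignment.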
Authors: Achiel Colpaert, Zhuangzhuang Cui, Sofie Pollin
Abstract: Connecting aerial and terrestrial users with a single base station (BS) is increasingly challenging due to the rising number of aerial users like unmanned aerial vehicles (UAVs). Traditional BSs, designed with down‑tilted beams, focus mainly on ground users, but massive MIMO (mMIMO) systems can significantly enhance coverage in low‑altitude airspace. This paper analyzes how a mMIMO BS serves both aerial and terrestrial users in a 3D spectrum‑sharing scheme. Using Semi‑orthogonal User Selection (SUS) and random scheduling, we assess the spectral efficiency and performance limits of these systems. Results reveal that mMIMO effectively supports more terrestrial users, influenced by channel characteristics and user scheduling strategies, providing key insights for future 3D aerial‑terrestrial networks.
Authors: Aditya Prashant Naidu, Hem Gosalia, Ishaan Gakhar, Shaurya Singh Rathore, Krish Didwania, Ujjwal Verma
Abstract: Although advances in deep learning and aerial surveillance technology are improving wildlife conservation efforts, complex and erratic environmental conditions still pose a problem, requiring innovative solutions for cost‑effective small animal detection. This work introduces DEAL‑YOLO, a novel approach that improves small object detection in Unmanned Aerial Vehicle (UAV) images by using multi‑objective loss functions like Wise IoU (WIoU) and Normalized Wasserstein Distance (NWD), which prioritize pixels near the centre of the bounding box, ensuring smoother localization and reducing abrupt deviations. Additionally, the model is optimized through efficient feature extraction with Linear Deformable (LD) convolutions, enhancing accuracy while maintaining computational efficiency. The Scaled Sequence Feature Fusion (SSFF) module enhances object detection by effectively capturing inter‑scale relationships, improving feature representation, and boosting metrics through optimized multiscale fusion. Comparison with baseline models reveals high efficacy with up to 69.5% fewer parameters compared to vanilla Yolov8‑N, highlighting the robustness of the proposed modifications. Through this approach, our paper aims to facilitate the detection of endangered species, animal population analysis, habitat monitoring, biodiversity research, and various other applications that enrich wildlife conservation efforts. DEAL‑YOLO employs a two‑stage inference paradigm for object detection, refining selected regions to improve localization and confidence. This approach enhances performance, especially for small instances with low objectness scores.
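For readers unfamiliar with the Normalized Wasserstein Distance, the sketch below follows the commonly used formulation in which each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). It is not taken from the DEAL‑YOLO code; the function name `nwd` and the default normalizing constant `c` are assumptions, and `c` is normally tuned to the typical object size of the dataset.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two boxes given as (cx, cy, w, h).

    Each box is treated as a Gaussian N((cx, cy), diag(w^2/4, h^2/4)); the squared
    2-Wasserstein distance between two such Gaussians has the closed form below.
    The default c is a placeholder assumption, not a value from the abstract.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


if __name__ == "__main__":
    small = (10.0, 10.0, 4.0, 4.0)
    shifted = (13.0, 10.0, 4.0, 4.0)   # 3 px shift: some IoU remains
    far = (20.0, 10.0, 4.0, 4.0)       # no overlap: IoU would be exactly 0
    print(nwd(small, shifted), nwd(small, far))
```

Unlike IoU, the similarity stays nonzero and varies smoothly even when two small boxes do not overlap, which is why it is attractive for tiny objects.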
Authors: Yu-Hsi Chen, Chin-Tien Wu
Abstract: Optical flow is a fundamental technique for motion estimation, widely applied in video stabilization, interpolation, and object tracking. Traditional optical flow estimation methods rely on restrictive assumptions like brightness constancy and slow motion constraints. Recent deep learning‑based flow estimations require extensive training on large domain‑specific datasets, making them computationally demanding. Also, artificial intelligence (AI) advances have enabled deep learning models to take advantage of optical flow as an important feature for object tracking and motion analysis. Since optical flow is commonly encoded in HSV for visualization, its conversion to RGB for neural network processing is nonlinear and may introduce perceptual distortions. These transformations amplify the sensitivity to estimation errors, potentially affecting the predictive accuracy of the networks. To address these challenges that are influential to the performance of downstream network models, we propose Reynolds flow, a novel training‑free flow estimation inspired by the Reynolds transport theorem, offering a principled approach to modeling complex motion dynamics. In addition to conventional HSV‑based visualization of Reynolds flow, we also introduce an RGB‑encoded representation of Reynolds flow designed to improve flow visualization and feature enhancement for neural networks. We evaluated the effectiveness of Reynolds flow in video‑based tasks. Experimental results on three benchmarks, tiny object detection on UAVDB, infrared object detection on Anti‑UAV, and pose estimation on GolfDB, demonstrate that networks trained with RGB‑encoded Reynolds flow achieve SOTA performance, exhibiting improved robustness and efficiency across all tasks.
Authors: Chiara Gabellieri, Lars Teeuwen, Yaolei Shen, Antonio Franchi
Abstract: This work considers a large class of systems composed of multiple quadrotors manipulating deformable and extensible cables. The cable is described via a discretized representation, which decomposes it into linear springs interconnected through lumped‑mass passive spherical joints. Sets of flat outputs are found for the systems. Numerical simulations support the findings by showing cable manipulation relying on flatness‑based trajectories. Finally, we present an experimental validation of the effectiveness of the proposed discretized cable model for a two‑robot example. Moreover, a closed‑loop controller based on the identified model and using cable‑output feedback is experimentally tested.
Authors: Dragos Costea, Alina Marcu, Marius Leordeanu
Abstract: Generating novel views from recorded videos is crucial for enabling autonomous UAV navigation. Recent advancements in neural rendering have facilitated the rapid development of methods capable of rendering new trajectories. However, these methods often fail to generalize well to regions far from the training data without an optimized flight path, leading to suboptimal reconstructions. We propose a self‑supervised cyclic neural‑analytic pipeline that combines high‑quality neural rendering outputs with precise geometric insights from analytical methods. Our solution improves RGB and mesh reconstructions for novel view synthesis, especially in undersampled areas and regions that are completely different from the training dataset. We use an effective transformer‑based architecture for image reconstruction to refine and adapt the synthesis process, enabling effective handling of novel, unseen poses without relying on extensive labeled datasets. Our findings demonstrate substantial improvements in both novel view rendering and 3D reconstruction, which to the best of our knowledge is a first, setting a new standard for autonomous navigation in complex outdoor environments.
Authors: Chiara Gabellieri, Yaolei Shen, Martina Paolucci, Antonio Franchi
Abstract: This work demonstrates that the non‑stop flights of three or more carriers are compatible with holding a constant pose of a cable‑suspended load. It also presents an algorithm for generating the carriers' coordinated non‑stop trajectories. The proposed method builds upon two pillars: (1) the choice of n special linearly independent directions of internal forces within the 3n‑6‑dimensional nullspace of the grasp matrix of the load, chosen as the edges of a Hamiltonian cycle on the graph that connects the cable attachment points on the load. Adjacent pairs of directions are used to generate n forces evolving on distinct 2D affine subspaces, despite the attachment points being generically in 3D; (2) the construction of elliptical trajectories within these subspaces by mapping, through appropriate graph coloring, each edge of the Hamiltonian cycle to a periodic coordinate while ensuring that no adjacent coordinates exhibit simultaneous zero derivatives. Combined with conditions for load statics and attachment point positions, these choices ensure that each of the n force trajectories projects onto the corresponding cable constraint sphere with non‑zero tangential velocity, enabling perpetual motion of the carriers while the load is still. The work provides a scalable constructive design for any n greater than or equal to 3 with tuning guidelines, quantifies sensitivity and single‑carrier failures, and provides a fixed‑wing‑compatible planner that preserves load statics under speed/bank/flight‑path constraints. The theoretical findings are validated through simulations and laboratory experiments with quadrotor UAVs.
Authors: Muhammet Hevesli, Abegaz Mohammed Seid, Aiman Erbad, Mohamed Abdallah
Abstract: Mobile edge computing (MEC)‑enabled air‑ground networks are a key component of 6G, employing aerial base stations (ABSs) such as unmanned aerial vehicles (UAVs) and high‑altitude platform stations (HAPS) to provide dynamic services to ground IoT devices (IoTDs). These IoTDs support real‑time applications (e.g., multimedia and Metaverse services) that demand high computational resources and strict quality of service (QoS) guarantees in terms of latency and task queue management. Given their limited energy and processing capabilities, IoTDs rely on UAVs and HAPS to offload tasks for distributed processing, forming a multi‑tier MEC system. This paper tackles the overall energy minimization problem in MEC‑enabled air‑ground integrated networks (MAGIN) by jointly optimizing UAV trajectories, computing resource allocation, and queue‑aware task offloading decisions. The optimization is challenging due to the nonconvex, nonlinear nature of this hierarchical system, which renders traditional methods ineffective. We reformulate the problem as a multi‑agent Markov decision process (MDP) with continuous action spaces and heterogeneous agents, and propose a novel variant of multi‑agent proximal policy optimization with a Beta distribution (MAPPO‑BD) to solve it. Extensive simulations show that MAPPO‑BD outperforms baseline schemes, achieving superior energy savings and efficient resource management in MAGIN while meeting queue delay and edge computing constraints.
Authors: Zilin Zhao, Chishui Chen, Haotian Shi, Jiale Chen, Xuanlin Yue, Zhejian Yang, Yang Liu
Abstract: Efficient path planning for unmanned aerial vehicles (UAVs) is crucial in remote sensing and information collection. As task scales expand, the cooperative deployment of multiple UAVs significantly improves information collection efficiency. However, collaborative communication and decision‑making for multiple UAVs remain major challenges in path planning, especially in noisy environments. To efficiently accomplish complex information collection tasks in 3D space and address robust communication issues, we propose a multi‑agent reinforcement learning (MARL) framework for UAV path planning based on the Counterfactual Multi‑Agent Policy Gradients (COMA) algorithm. The framework incorporates attention mechanism‑based UAV communication protocol and training‑deployment system, significantly improving communication robustness and individual decision‑making capabilities in noisy conditions. Experiments conducted on both synthetic and real‑world datasets demonstrate that our method outperforms existing algorithms in terms of path planning efficiency and robustness, especially in noisy environments, achieving a 78% improvement in entropy reduction.
Authors: Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Oleg Sautenkov, Artem Lykov, Valerii Serpiva, Dzmitry Tsetserukou
Abstract: Emergency search and rescue (SAR) operations often require rapid and precise target identification in complex environments where traditional manual drone control is inefficient. In order to address these scenarios, a rapid SAR system, UAV‑VLRR (Vision‑Language‑Rapid‑Response), is developed in this research. This system consists of two aspects: 1) A multimodal system which harnesses the power of Visual Language Model (VLM) and the natural language processing capabilities of ChatGPT‑4o (LLM) for scene interpretation. 2) A nonlinear model predictive control (NMPC) with built‑in obstacle avoidance for rapid response by a drone to fly according to the output of the multimodal system. This work aims at improving response times in emergency SAR operations by providing a more intuitive and natural approach to the operator to plan the SAR mission while allowing the drone to carry out that mission in a rapid and safe manner. When tested, our approach was on average 33.75% faster than an off‑the‑shelf autopilot and 54.6% faster than a human pilot. Video of UAV‑VLRR: https://youtu.be/KJqQGKKt1xY
Authors: Oleg Sautenkov, Aibek Akhmetkazy, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Grik Tadevosyan, Artem Lykov, Dzmitry Tsetserukou
Abstract: The UAV‑VLPA (Visual‑Language‑Planning‑and‑Action) system represents a cutting‑edge advancement in aerial robotics, designed to enhance communication and operational efficiency for unmanned aerial vehicles (UAVs). By integrating advanced planning capabilities, the system addresses the Traveling Salesman Problem (TSP) to optimize flight paths, reducing the total trajectory length by 18.5% compared to traditional methods. Additionally, the incorporation of the A* algorithm enables robust obstacle avoidance, ensuring safe and efficient navigation in complex environments. The system leverages satellite imagery processing combined with the Visual Language Model (VLM) and GPT's natural language processing capabilities, allowing users to generate detailed flight plans through simple text commands. This seamless fusion of visual and linguistic analysis empowers precise decision‑making and mission planning, making UAV‑VLPA a transformative tool for modern aerial operations. With its unmatched operational efficiency, navigational safety, and user‑friendly functionality, UAV‑VLPA sets a new standard in autonomous aerial robotics, paving the way for future innovations in the field.
Authors: Artem Lykov, Valerii Serpiva, Muhammad Haris Khan, Oleg Sautenkov, Artyom Myshlyaev, Grik Tadevosyan, Yasheerah Yaqoot, Dzmitry Tsetserukou
Abstract: This paper introduces CognitiveDrone, a novel Vision‑Language‑Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset comprising over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real‑time 4D action commands based on first‑person visual inputs and textual instructions. To further enhance performance in intricate scenarios, we propose CognitiveDrone‑R1, which integrates an additional Vision‑Language Model (VLM) reasoning module to simplify task directives prior to high‑frequency control. Experimental evaluations using our open‑source benchmark, CognitiveDroneBench, reveal that while a racing‑oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone‑R1 attains a success rate of 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state‑of‑the‑art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io
Authors: Jialei He, Zhihao Zhan, Zhituo Tu, Xiang Zhu, Jie Yuan
Abstract: Rapid generation of large‑scale orthoimages from Unmanned Aerial Vehicles (UAVs) has been a long‑standing focus of research in the field of aerial mapping. A multi‑sensor UAV system, integrating the Global Positioning System (GPS), Inertial Measurement Unit (IMU), 4D millimeter‑wave radar and camera, can provide an effective solution to this problem. In this paper, we utilize multi‑sensor data to overcome the limitations of conventional orthoimage generation methods in terms of temporal performance, system robustness, and geographic reference accuracy. A prior‑pose‑optimized feature matching method is introduced to enhance matching speed and accuracy, reducing the number of required features and providing precise references for the Structure from Motion (SfM) process. The proposed method exhibits robustness in low‑texture scenes like farmlands, where feature matching is difficult. Experiments show that our approach achieves accurate feature matching and orthoimage generation in a short time. The proposed drone system effectively aids in farmland detection and management.
Authors: Junxiao Lin, Shuhang Ji, Yuze Wu, Tianyue Wu, Zhichao Han, Fei Gao
Abstract: How to endow aerial robots with the ability to operate in close proximity remains an open problem. The core challenges lie in the propulsion system's dual‑task requirement: generating manipulation forces while simultaneously counteracting gravity. These competing demands create dynamic coupling effects during physical interactions. Furthermore, rotor‑induced airflow disturbances critically undermine operational reliability. Although fully‑actuated unmanned aerial vehicles (UAVs) alleviate dynamic coupling effects via six‑degree‑of‑freedom (6‑DoF) force‑torque decoupling, existing implementations fail to address the aerodynamic interference between drones and environments. They also suffer from oversized designs, which compromise maneuverability and limit their applications in various operational scenarios. To address these limitations, we present FLOAT Drone (FuLly‑actuated cOaxial Aerial roboT), a novel fully‑actuated UAV featuring two key structural innovations. By integrating control surfaces into fully‑actuated systems for the first time, we significantly suppress lateral airflow disturbances during operations. Furthermore, a coaxial dual‑rotor configuration enables a compact size while maintaining high hovering efficiency. Through dynamic modeling, we have developed hierarchical position and attitude controllers that support both fully‑actuated and underactuated modes. Experimental validation through comprehensive real‑world experiments confirms the system's functional capabilities in close‑proximity operations.
Authors: Baris Yamansavascilar, Atay Ozgovde, Cem Ersoy
Abstract: We are witnessing a new era where problem‑solving and cognitive tasks are being increasingly delegated to Large Language Models (LLMs) across diverse domains, ranging from code generation to holiday planning. This trend also creates a demand for the ubiquitous execution of LLM‑powered applications in a wide variety of environments in which traditional terrestrial 2D networking infrastructures may prove insufficient. A promising solution in this context is to extend edge computing into a 3D setting to include aerial platforms organized in multiple layers, a paradigm we refer to as air computing, to augment local devices for running LLM and Generative AI (GenAI) applications. This approach alleviates the strain on existing infrastructure while enhancing service efficiency by offloading computational tasks to the corresponding air units such as UAVs. Furthermore, the coordinated deployment of various air units can significantly improve the Quality of Experience (QoE) by ensuring seamless, adaptive, and resilient task execution. In this study, we investigate the synergy between LLM‑based applications and air computing, exploring their potential across various use cases. Additionally, we present a disaster response case study demonstrating how the collaborative utilization of LLMs and air computing can significantly improve outcomes in critical situations.
Authors: Jaeyoung Lim, David Rohr, Thomas Stastny, Roland Siegwart
Abstract: Due to their energy‑efficient flight characteristics, fixed‑wing type UAVs are useful robotic tools for long‑range and duration flight applications in large‑scale environments. However, flying fixed‑wing UAVs in confined environments, such as mountainous regions, can be challenging due to their limited maneuverability and sensitivity to uncertain wind conditions. In this work, we first analyze periodic trochoidal paths that can be used to define wind‑aware terminal loitering states. We then propose a wind‑invariant safe set of trochoidal paths along with a switching strategy for selecting the corresponding minimum‑extent periodic path type. Finally, we show that planning with this minimum‑extent set allows us to safely reach up to 10 times more locations in mountainous terrain compared to planning with a single, conservative loitering maneuver.
Authors: Yante Li, Hanwen Qi, Haoyu Chen, Xinlian Liang, Guoying Zhao
Abstract: In environmental protection, tree monitoring plays an essential role in maintaining and improving ecosystem health. However, precise monitoring is challenging because existing datasets fail to capture continuous fine‑grained changes in trees due to low‑resolution images and high acquisition costs. In this paper, we introduce UAVTC, a large‑scale, long‑term, high‑resolution dataset collected using UAVs equipped with cameras, specifically designed to detect individual Tree Changes (TCs). UAVTC includes rich annotations and statistics based on biological knowledge, offering a fine‑grained view for tree monitoring. To address environmental influences and effectively model the hierarchical diversity of physiological TCs, we propose a novel Hyperbolic Siamese Network (HSN) for TC detection, enabling compact and hierarchical representations of dynamic tree changes.
Extensive experiments show that HSN can effectively capture complex hierarchical changes and provide a robust solution for fine‑grained TC detection. In addition, HSN generalizes well to the cross‑domain face anti‑spoofing task, highlighting its broader significance in AI. We believe our work, combining ecological insights and interdisciplinary expertise, will benefit the community by offering a new benchmark and innovative AI technologies.
Authors: Takumi Ito, Hayato Kawashima, Riku Funada, Mitsuji Sampei
Abstract: This paper presents a method for shaping the feasible force set of a payload‑carrying platform composed of multiple Unmanned Aerial Vehicles (UAVs) and proposes a control law that leverages the advantages of this shaped force set. The UAVs are connected to the payload through passively rotatable hinge joints. The joint angles are controlled by the differential thrust produced by the rotors, while the total force generated by all the rotors is responsible for controlling the payload. The shape of the set of the total force depends on the tilt angles of the UAVs, which allows us to shape the feasible force set by adjusting these tilt angles. This paper aims to ensure that the feasible force set encompasses the required shape, enabling the platform to generate force redundantly, that is, in various directions. We then propose a control law that takes advantage of this redundancy.
Authors: Alejo Silvarrey, Pablo Negri
Abstract: Biological invasions pose a significant threat to the sustainability of water sources. Efforts are increasingly being made to prevent invasions, eradicate established invaders, or control them. Remote sensing (RS) has long been recognized as a potential tool to aid in this effort, for example, by mapping the distribution of invasive species or identifying areas at risk of invasion. This paper provides a detailed explanation of a process for mapping the actual distribution of invasive species. It presents a case study on the detection of invasive Iris pseudacorus L. using multispectral data captured by small Unmanned Aerial Vehicles (UAVs). The process involved spectral feature mapping followed by semi‑supervised classification, which produced accurate maps of this invasive species.
Authors: Shuaiang Rong, Lina He, Salih Furkan Atici, Ahmet Enis Cetin
Abstract: Power line infrastructure is a key component of the power system, and it is rapidly expanding to meet growing energy demands. Vegetation encroachment is a significant threat to the safe operation of power lines, requiring reliable and timely management to enhance the resilience and reliability of the power network. Integrating smart grid technology, especially Unmanned Aerial Vehicles (UAVs), provides substantial potential to revolutionize the management of extensive power line networks with advanced imaging techniques. However, processing the vast quantity of images captured by UAV patrols remains a significant challenge. This paper introduces an intelligent real‑time monitoring framework for detecting power lines and adjacent vegetation. It is developed based on the deep‑learning Convolutional Neural Network (CNN), You Only Look Once (YOLO), renowned for its high‑speed object detection capabilities. Unlike existing deep learning‑based methods, this framework enhances accuracy by integrating YOLOv8 with directional filters. They can extract directional features and textures of power lines and their vicinity, generating Oriented Bounding Boxes (OBB) for more precise localization. Additionally, a post‑processing algorithm is developed to create a vegetation encroachment metric for power lines, allowing for a quantitative assessment of the surrounding vegetation distribution. The effectiveness of the proposed framework is demonstrated using a widely used power line dataset.
Authors: Sabina Jangirova, Branislava Jankovic, Waseem Ullah, Latif U. Khan, Mohsen Guizani
Abstract: Wildfire catastrophes cause significant environmental degradation, human losses, and financial damage. To mitigate these severe impacts, early fire detection and warning systems are crucial. Current systems rely primarily on fixed CCTV cameras with a limited field of view, restricting their effectiveness in large outdoor environments. The fusion of intelligent fire detection with remote sensing improves coverage and mobility, enabling monitoring in remote and challenging areas. Existing approaches predominantly utilize convolutional neural networks and vision transformer models. While these architectures provide high accuracy in fire detection, their computational complexity limits real‑time performance on edge devices such as UAVs. In our work, we present a lightweight fire detection model based on MobileViT‑S, compressed through the distillation of knowledge from a stronger teacher model. The ablation study highlights the impact of a teacher model and the chosen distillation technique on the model's performance improvement. We generate activation map visualizations using Grad‑CAM to confirm the model's ability to focus on relevant fire regions. The high accuracy and efficiency of the proposed model make it well‑suited for deployment on satellites, UAVs, and IoT devices for effective fire detection. Experiments on common fire benchmarks demonstrate that our model surpasses the state‑of‑the‑art model by 0.44% and 2.00% while maintaining a compact model size. Our model delivers the highest processing speed among existing works, achieving real‑time performance on resource‑constrained devices.
Authors: Henry Lei, Joshua Aurand, Zachary S. Lippay, Sean Phillips
Abstract: With the increasingly congested and contested space environment, safe and effective satellite operation has become increasingly challenging. As a result, there is growing interest in autonomous satellite capabilities, with common machine learning techniques gaining attention for their potential to address complex decision‑making in the space domain. However, the "black‑box" nature of many of these methods results in difficulty understanding the model's input/output relationship and more specifically its sensitivity to environmental disturbances, sensor noise, and control intervention. This paper explores the use of Deep Reinforcement Learning (DRL) for satellite control in multi‑agent inspection tasks. The Local Intelligent Network of Collaborative Satellites (LINCS) Lab is used to test the performance of these control algorithms across different environments, from simulations to real‑world quadrotor UAV hardware, with a particular focus on understanding their behavior and potential degradation in performance when deployed beyond the training environment.
Authors: Marios-Nektarios Stamatopoulos, Jakub Haluska, Elias Small, Jude Marroush, Avijit Banerjee, George Nikolakopoulos
Abstract: A novel autonomous chunk‑based aerial additive manufacturing framework is presented, supported by an experimental demonstration that advances aerial 3D printing. An optimization‑based decomposition algorithm transforms structures into sub‑components, or chunks, treated as individual tasks coordinated via a dependency graph, ensuring sequential assignment to UAVs considering inter‑dependencies and printability constraints for seamless execution. A specially designed hexacopter equipped with a pressurized canister for lightweight expandable foam extrusion is utilized to deposit the material in a controlled manner. To further enhance precise execution of the printing, an offset‑free Model Predictive Control mechanism is employed, compensating reactively for disturbances and ground effect during execution. Additionally, an interlocking mechanism is introduced in the chunking process to enhance structural cohesion and improve layer adhesion. Extensive experiments demonstrate the framework's effectiveness in constructing precise structures of various shapes while seamlessly adapting to practical challenges, proving its potential for a transformative leap in aerial robotic capability for autonomous construction.
Authors: Beomyeol Yu, Taeyoung Lee
Abstract: Improving sampling efficiency and generalization capability is critical for the successful data‑driven control of quadrotor unmanned aerial vehicles (UAVs) that are inherently unstable. While various reinforcement learning (RL) approaches have been applied to autonomous quadrotor flight, they often require extensive training data, posing multiple challenges and safety risks in practice. To address these issues, we propose data‑efficient, equivariant monolithic and modular RL frameworks for quadrotor low‑level control. Specifically, by identifying the rotational and reflectional symmetries in quadrotor dynamics and encoding these symmetries into equivariant network models, we remove redundancies of learning in the state‑action space. This approach enables the optimal control action learned in one configuration to automatically generalize into other configurations via symmetry, thereby enhancing data efficiency. Experimental results demonstrate that our equivariant approaches significantly outperform their non‑equivariant counterparts in terms of learning efficiency and flight performance.
Authors: Thomas Hickling, Maxwell Hogan, Abdulla Tammam, Nabil Aouf
Abstract: This paper presents the first end‑to‑end framework that combines guidance, navigation, and centralised task allocation for multiple UAVs performing autonomous search‑and‑rescue (SAR) in GNSS‑denied indoor environments. A Twin Delayed Deep Deterministic Policy Gradient controller is trained with an Artificial Potential Field (APF) reward that blends attractive and repulsive potentials with continuous control, accelerating convergence and yielding smoother, safer trajectories than distance‑only baselines. Collaborative mission assignment is solved by a deep Graph Attention Network that, at each decision step, reasons over the drone‑task graph to produce near‑optimal allocations with negligible on‑board compute. To arrest the notorious Z‑drift of indoor LiDAR‑SLAM, we fuse depth‑camera altimetry with IMU vertical velocity in a lightweight complementary filter, giving centimetre‑level altitude stability without external beacons. The resulting system was deployed on two 1m‑class quad‑rotors and flight‑tested in a cluttered, multi‑level disaster mock‑up designed for the NATO‑Sapience Autonomous Cooperative Drone Competition. Compared with prior DRL guidance that remains largely in simulation, our framework demonstrates an ability to navigate complex indoor environments, securing first place in the 2024 event. These results demonstrate that APF‑shaped DRL and GAT‑driven cooperation can translate to reliable real‑world SAR operations.
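The altitude fusion mentioned above can be pictured with a generic first-order complementary filter. This is a minimal sketch under stated assumptions, not the authors' filter: the function name `complementary_altitude`, the blending weight `alpha`, and the sample values are all hypothetical, and the abstract does not give the actual gains or structure.

```python
def complementary_altitude(z0, samples, dt, alpha=0.98):
    """Fuse IMU vertical velocity with depth-camera altitude readings.

    samples: iterable of (imu_vertical_velocity, depth_altitude) pairs.
    alpha:   hypothetical weight on the integrated IMU prediction (0..1).
    Returns the list of altitude estimates, one per sample.
    """
    z = z0
    estimates = []
    for v_imu, z_depth in samples:
        predicted = z + v_imu * dt                       # short-term: integrate velocity
        z = alpha * predicted + (1.0 - alpha) * z_depth  # long-term: pull toward altimetry
        estimates.append(z)
    return estimates

# Hovering at 1.0 m with a wrong initial estimate of 0.0 m.
hover = [(0.0, 1.0)] * 100
print(complementary_altitude(0.0, hover, dt=0.02, alpha=0.9)[-1])
```

The velocity term tracks fast motion, while the altimetry term removes the slow drift that pure integration would accumulate.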
Authors: Zhaoxuan Wang, Yang Li, Jie Zhang, Xingshuo Han, Kangbo Liu, Lyu Yang, Yuan Zhou, Tianwei Zhang, Quan Pan
Abstract: Unmanned aerial vehicles (UAVs) are increasingly employed to perform high‑risk tasks that require minimal human intervention. However, UAVs face escalating cybersecurity threats, particularly from GNSS spoofing attacks. While previous studies have extensively investigated the impacts of GNSS spoofing on UAVs, few have focused on its effects on specific tasks. Moreover, the influence of UAV motion states on the assessment of network security risks is often overlooked. To address these gaps, we first provide a detailed evaluation of how motion states affect the effectiveness of network attacks. We demonstrate that nonlinear motion states not only enhance the effectiveness of position spoofing in GNSS spoofing attacks but also reduce the probability of speed‑related attack detection. Building upon this, we propose a state‑triggered backdoor attack method (SSD) to deceive GNSS systems and assess its risk to trajectory planning tasks. Extensive validation of SSD's effectiveness and stealthiness is conducted. Experimental results show that, with appropriately tuned hyperparameters, SSD significantly increases positioning errors and the risk of task failure, while maintaining 100% stealth across three state‑of‑the‑art detectors.
Authors: David-Alexandre Poissant, Alexis Lussier Desbiens, François Ferland, Louis Petit
Abstract: Autonomous robotic inspection missions require balancing multiple conflicting objectives while navigating near costly obstacles. Current multi‑objective path planning (MOPP) methods struggle to adapt to evolving risks like localization errors, weather, battery state, and communication issues. This letter presents an Adaptive Risk‑aware and Energy‑efficient NAvigation (ARENA) MOPP approach for UAVs in complex 3D environments. Our method enables online trajectory adaptation by optimizing safety, time, and energy using 4D NURBS representation and a genetic‑based algorithm to generate the Pareto front. A novel risk‑aware voting algorithm ensures adaptivity. Simulations and real‑world tests demonstrate the planner's ability to produce diverse, optimized trajectories covering 95% or more of the range defined by single‑objective benchmarks and its ability to estimate power consumption with a mean error representing 14% of the full power range. The ARENA framework enhances UAV autonomy and reliability in critical, evolving 3D missions.
Authors: Xiangming Du, Shuowen Zhang, Francis C.-M. Lau
Abstract: In this letter, we study a cellular‑connected unmanned aerial vehicle (UAV) which aims to complete a mission of flying between two pre‑determined locations while maintaining satisfactory communication quality with the ground base stations (GBSs). Due to the potentially long distance of the UAV's flight, frequent handovers may be incurred among different GBSs, which leads to various practical issues such as large delay and synchronization overhead. To address this problem, we investigate the trajectory optimization of the UAV to minimize the number of GBS handovers during the flight, subject to a communication quality constraint and a maximum mission completion time constraint. Although this problem is non‑convex and difficult to solve, we derive useful structures of the optimal solution, based on which we propose an efficient algorithm based on graph theory and Lagrangian relaxation for finding a high‑quality suboptimal solution in polynomial time. Numerical results validate the effectiveness of our proposed trajectory design.
Authors: D. Hareb, J. Martinet, B. Miramond
Abstract: Achieving optimal semantic segmentation with frame‑based vision sensors poses significant challenges for real‑time systems like UAVs and self‑driving cars, which require rapid and precise processing. Traditional frame‑based methods often struggle to balance latency, accuracy, and energy efficiency. To address these challenges, we leverage event streams from event‑based cameras, bio‑inspired sensors that trigger events in response to changes in the scene. Specifically, we analyze the number of events triggered between successive frames, with a high number indicating significant changes and a low number indicating minimal changes. We exploit this event information to solve the semantic segmentation task by employing a Spiking Neural Network (SNN), a bio‑inspired computing paradigm known for its low energy consumption. Our experiments on the DSEC dataset show that our approach significantly reduces latency with only a limited drop in accuracy. Additionally, by using SNNs, we achieve low power consumption, making our method suitable for energy‑constrained real‑time applications. To the best of our knowledge, our approach is the first to effectively balance reduced latency, minimal accuracy loss, and energy efficiency using event streams to enhance semantic segmentation in dynamic and resource‑limited environments.
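Counting events between successive frames, as described above, can be sketched as follows. This is only an illustration of the counting step: the helper names, the timestamps, and the activity threshold are assumptions, and the abstract does not say how the counts are used inside the SNN pipeline.

```python
from bisect import bisect_left

def events_per_interval(event_times, frame_times):
    """Count events falling in each half-open interval [frame_i, frame_i+1).

    event_times must be sorted in ascending order (seconds).
    """
    counts = []
    for start, end in zip(frame_times, frame_times[1:]):
        counts.append(bisect_left(event_times, end) - bisect_left(event_times, start))
    return counts

def high_activity(counts, threshold):
    """Flag intervals whose event count reaches a hypothetical threshold."""
    return [c >= threshold for c in counts]

event_times = [0.01, 0.02, 0.03, 0.11, 0.25]  # made-up event timestamps
frame_times = [0.0, 0.1, 0.2, 0.3]            # made-up frame timestamps
counts = events_per_interval(event_times, frame_times)
print(counts, high_activity(counts, threshold=2))
```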
Authors: Ruiqing Han, Tianxian Zhang, Han Zhong, Yuanhang Wang
Abstract: The low detectability and low cost of unmanned aerial vehicles (UAVs) allow them to swarm near the radar network for effective jamming. The key to jamming is the reasonable task assignment and resource allocation of UAVs. However, the existing allocation model is somewhat ideal, weakly adaptive to the dynamic environment, and rarely considers frequency matching, so it cannot suppress the frequency agile radar (FAR) network effectively. To solve these problems, a dynamic UAV cooperative suppressive jamming method with joint task assignment and bandwidth allocation is proposed. To represent the matching relationship between UAVs and FARs, a system model of task assignment and bandwidth allocation is established, and the problem is formulated as a dynamic mixed integer programming (D‑MIP) problem. Then, a suppressive jamming evaluation indicator is proposed, and the utility function is designed based on the Quality of Service (QoS) framework to quantify the jamming effect of UAVs. To solve the combinatorial optimization problem, a two‑step dynamic hybrid algorithm based on the Kriging model is proposed, which can obtain the task assignment and bandwidth allocation schemes of UAVs while consuming fewer computational resources in a dynamic environment. Simulation results show that the proposed method is effective in terms of jamming performance, computational resource saving, and dynamic environment adaptability.
Authors: Seon-Geun Jeong, Pham Dang Anh Duc, Quang Vinh Do, Dae-Il Noh, Nguyen Xuan Tung, Trinh Van Chien, Quoc-Viet Pham, Mikio Hasegawa, Hiroo Sekiya, Won-Joo Hwang
Abstract: In wireless communication networks, it is difficult to solve many NP‑hard problems owing to computational complexity and high cost. Recently, quantum annealing (QA) based on quantum physics was introduced as a key enabler for solving optimization problems quickly. However, only some studies consider quantum‑based approaches in wireless communications. Therefore, we investigate the performance of a QA solution to an optimization problem in wireless networks. Specifically, we aim to maximize the sum rate by jointly optimizing clustering, sub‑channel assignment, and power allocation in a multi‑unmanned aerial vehicle‑aided wireless network. We formulate the sum rate maximization problem as a combinatorial optimization problem. Then, we divide it into two sub‑problems: 1) a QA‑based clustering and 2) sub‑channel assignment and power allocation for a given clustering configuration. Subsequently, we obtain an optimized solution for the joint optimization problem by solving these two sub‑problems. For the first sub‑problem, we convert the problem into a simplified quadratic unconstrained binary optimization (QUBO) model. As for the second sub‑problem, we introduce a novel QA algorithm with optimal scaling parameters to address it. Simulation results demonstrate the effectiveness of the proposed algorithm in terms of the sum rate and running time.
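For readers unfamiliar with the QUBO form mentioned above, the sketch below evaluates the objective x^T Q x over binary vectors and minimizes it by exhaustive search. The matrix `Q` is a made-up toy, not the paper's clustering model, and brute force merely stands in for the annealer, which would sample low-energy states rather than enumerate them.

```python
from itertools import product

def qubo_energy(Q, x):
    """Return x^T Q x for a binary vector x and a square matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def solve_qubo_brute_force(Q):
    """Enumerate all 2^n binary vectors; only practical for very small n."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)

# Toy instance: diagonal terms reward selecting a variable,
# the off-diagonal term penalizes selecting both.
Q = [[-1, 2],
     [0, -2]]
print(solve_qubo_brute_force(Q))
```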
Authors: Li Dong, Feibo Jiang, Yubo Peng
Abstract: Unmanned Aerial Vehicles (UAVs) in Wireless Power Transfer (WPT)‑assisted Internet of Things (IoT) systems face the following challenges: limited resources and suboptimal trajectory planning. Reinforcement learning‑based trajectory planning schemes face issues of low search efficiency and learning instability when optimizing large‑scale systems. To address these issues, we present an Attention‑based UAV Trajectory Optimization (AUTO) framework based on the graph transformer, which consists of an Attention Trajectory Optimization Model (ATOM) and a Trajectory lEarNing Method based on Actor‑critic (TENMA). In ATOM, a graph encoder is used to calculate the self‑attention characteristics of all IoTDs, and a trajectory decoder is developed to optimize the number and trajectories of UAVs. TENMA then trains the ATOM using an improved Actor‑Critic method, in which the real reward of the system is applied as the baseline to reduce variances in the critic network. This method is suitable for high‑quality and large‑scale multi‑UAV trajectory planning. Finally, we develop numerous experiments, including a hardware experiment in the field case, to verify the feasibility and efficiency of the AUTO framework.
Authors: Cesar Briso, Cesar Calvo, Zhuangzhuang Cui, Lei Zhang, Youyun Xu
Abstract: In most countries, small (<2 kg) and medium (<25 kg) size unmanned aerial vehicles (UAVs) must fly at low altitude, below 120 m, and with permanent radio communications with ground for control and telemetry. These communication links can be provided using 4G/5G networks or dedicated links, but in either case the communications can be significantly degraded by frequent Non Line of Sight (NLoS) propagation. In this case, reflection and diffraction from ground objects are critical to maintain links, and hence accurate propagation models for these mechanisms must be considered. In this letter we present a model for path loss when the UAV is flying in NLoS conditions. The study is based on measurements made at frequencies of 1, 4, 12, and 24 GHz with a UAV flying in a suburban environment. Measurements have been used to model NLoS propagation below 4 GHz, where the dominant mechanism is diffraction, and above 4 GHz, where multipath is the dominant propagation mechanism. The model can be of use in predicting excess losses when UAVs fly in suburban NLoS conditions.
Authors: Stella Dumenčić, Luka Lanča, Karlo Jakac, Stefan Ivić
Abstract: Search and rescue (SAR) missions require reliable search methods to locate survivors, especially in challenging or inaccessible environments. This is why introducing unmanned aerial vehicles (UAVs) can be of great help to enhance the efficiency of SAR missions while simultaneously increasing the safety of everyone involved in the mission. Motivated by this, we design and experiment with autonomous UAV search for humans in a Mediterranean karst environment. The UAVs are directed using the Heat equation‑driven area coverage (HEDAC) ergodic control method according to a known probability density and detection function. The implemented sensing framework consists of a probabilistic search model, motion control system, and computer vision object detection. It enables calculation of the probability of the target being detected in the SAR mission, and this paper focuses on experimental validation of the proposed probabilistic framework and UAV control. The uniform probability density to ensure the even probability of finding the targets in the desired search area is achieved by assigning suitably thought‑out tasks to 78 volunteers. The detection model is based on YOLO and trained with a previously collected ortho‑photo image database. The experimental search is carefully planned and conducted, while as many parameters as possible are recorded. The thorough analysis covers the motion control system, object detection, and the search validation. The assessment of the detection and search performance provides a strong indication that the designed detection model in the UAV control algorithm is aligned with real‑world results.
Authors: Jikang Deng, Hui Zhou, Mohamed-Slim Alouini
Abstract: To achieve global coverage and ubiquitous connectivity, the non‑terrestrial network (NTN) has been regarded as a key enabler in the sixth generation (6G) network, which includes uncrewed aerial vehicles (UAVs), high‑altitude platforms (HAPs), and satellites. Since the unique characteristics of various NTN platforms strongly affect their implementation and lead to a highly dynamic and heterogeneous NTN scenario, achieving distributed coordination remains an important research direction. However, the explicit and systematic analysis of the individual layers' challenges and corresponding distributed coordination solutions in heterogeneous NTNs has not been proposed yet. Therefore, in this paper, we summarize the unique characteristics of each NTN platform, identify communication challenges within individual layers, and propose potential delay‑tolerant or delay‑sensitive coordinated solutions accordingly. We further analyse the feasibility of leveraging multi‑agent deep reinforcement learning (MADRL) algorithms to achieve the proposed coordinated solutions. Finally, we present a case study of the joint scheduling and trajectory optimization problem in heterogeneous NTN, where a two‑timescale multi‑agent deep deterministic policy gradient (TTS‑MADDPG) algorithm is developed to validate the effectiveness of distributed coordination.
Authors: Hamidreza Mazandarani, Masoud Shokrnezhad, Tarik Taleb
Abstract: The advancement towards 6G technology leverages improvements in aerial‑terrestrial networking, where one of the critical challenges is the efficient allocation of transmit power. Although existing studies have shown commendable performance in addressing this challenge, a revolutionary breakthrough is anticipated to meet the demands and dynamism of 6G. Potential solutions include: 1) semantic communication and orchestration, which transitions the focus from mere transmission of bits to the communication of intended meanings of data and their integration into the network orchestration process; and 2) distributed machine learning techniques to develop adaptable and scalable solutions. In this context, this paper introduces a power allocation framework specifically designed for semantic‑aware networks. The framework addresses a scenario involving multiple Unmanned Aerial Vehicles (UAVs) that collaboratively transmit observations over a multi‑channel uplink medium to a central server, aiming to maximise observation quality. To tackle this problem, we present the Semantic‑Aware Multi‑Agent Double and Dueling Deep Q‑Learning (SAMA‑D3QL) algorithm, which utilizes the data quality of observing areas as reward feedback during the training phase, thereby constituting a semantic‑aware learning mechanism. Simulation results substantiate the efficacy and scalability of our approach, demonstrating its superior performance compared to traditional bit‑oriented learning and heuristic algorithms.
Authors: Mingkun Li, Ziming Wang, Guang Huo, Wei Chen, Xiaoning Zhao
Abstract: With the expanding application scope of unmanned aerial vehicles (UAVs), the demand for stable UAV control has significantly increased. However, in complex environments, GPS signals are prone to interference, resulting in ineffective UAV positioning. Therefore, self‑positioning of UAVs in GPS‑denied environments has become a critical objective. Some methods obtain geolocation information in GPS‑denied environments by matching ground objects in the UAV viewpoint with remote sensing images. However, most of these methods only provide coarse‑level positioning, which satisfies cross‑view geo‑localization but cannot support precise UAV positioning tasks. Consequently, this paper focuses on a newer and more challenging task: precise UAV self‑positioning based on remote sensing images. This approach not only considers the features of ground objects but also accounts for the spatial distribution of objects in the images. To address this challenge, we present a deep learning framework with geographic information adaptive loss, which achieves precise localization by aligning UAV images with corresponding satellite imagery in fine detail through the integration of geographic information from multiple perspectives. To validate the effectiveness of the proposed method, we conducted a series of experiments. The results demonstrate the method's efficacy in enabling UAVs to achieve precise self‑positioning using remote sensing imagery.
Authors: Paul S. Kudyba, Haijian Sun
Abstract: In precision agriculture and plant science, there is an increasing demand for wireless sensors that are easy to deploy, maintain, and monitor. This paper investigates a novel approach that leverages recent advances in extremely low‑power wireless communication and sensing, as well as the rapidly increasing availability of unmanned aerial vehicle (UAV) platforms. By mounting a specialized wireless payload on a UAV, battery‑less sensor tags can harvest wireless beacon signals emitted from the drone, dramatically reducing the cost per sensor. These tags can measure environmental information such as temperature and humidity, then encrypt and transmit the data in the range of several meters. An experimental implementation was constructed at AERPAW, an NSF‑funded wireless aerial drone research platform. While ground‑based tests confirmed reliable sensor operation and data collection, airborne trials encountered wireless interference that impeded successfully detecting tag data. Despite these challenges, our results suggest further refinements could improve reliability and advance precision agriculture and agrarian research.
Authors: Brady Moon, Nayana Suvarna, Andrew Jong, Satrajit Chatterjee, Junbin Yuan, Muqing Cao, Sebastian Scherer
Abstract: Planning paths that maximize information gain for robotic platforms has wide‑ranging applications and significant potential impact. To effectively adapt to real‑time data collection, informative path planning must be computed online and be responsive to new observations. In this work, we present IA‑TIGRIS (Incremental and Adaptive Tree‑based Information Gathering Using Informed Sampling), which is an incremental and adaptive sampling‑based informative path planner designed for real‑time onboard execution. Our approach leverages past planning efforts through incremental refinement while continuously adapting to updated belief maps. We additionally present detailed implementation and optimization insights to facilitate real‑world deployment, along with an array of reward functions tailored to specific missions and behaviors. Extensive simulation results demonstrate IA‑TIGRIS generates higher‑quality paths compared to baseline methods. We validate our planner on two distinct hardware platforms: a hexarotor unmanned aerial vehicle (UAV) and a fixed‑wing UAV, each having different motion models and configuration spaces. Our results show up to a 38% improvement in information gain compared to baseline methods, highlighting the planner's potential for deployment in real‑world applications. Project website: https://ia‑tigris.github.io
Authors: Muhammad Ahmed Mohsin, Muhammad Umer, Amara Umar, Hatem Abou-Zeid, Syed Ali Hassan
Abstract: The rapid growth of computation‑intensive applications like augmented reality, autonomous driving, remote healthcare, and smart cities has exposed the limitations of traditional terrestrial networks, particularly in terms of inadequate coverage, limited capacity, and high latency in remote areas. This chapter explores how integrated terrestrial and non‑terrestrial networks (IT‑NTNs) can address these challenges and enable efficient computation offloading. We examine mobile edge computing (MEC) and its evolution toward multiple‑access edge computing, highlighting the critical role computation offloading plays for resource‑constrained devices. We then discuss the architecture of IT‑NTNs, focusing on how terrestrial base stations, unmanned aerial vehicles (UAVs), high‑altitude platforms (HAPs), and LEO satellites work together to deliver ubiquitous connectivity. Furthermore, we analyze various computation offloading strategies, including edge, cloud, and hybrid offloading, outlining their strengths and weaknesses. Key enabling technologies such as NOMA, mmWave/THz communication, and reconfigurable intelligent surfaces (RIS) are also explored as essential components of existing algorithms for resource allocation, task offloading decisions, and mobility management. Finally, we conclude by highlighting the transformative impact of computation offloading in IT‑NTNs across diverse application areas and discuss key challenges and future research directions, emphasizing the potential of these networks to revolutionize communication and computation paradigms.
Authors: Lucas Rey, Ana M. Bernardos, Andrzej D. Dobrzycki, David Carramiñana, Luca Bergesio, Juan A. Besada, José Ramón Casar
Abstract: Advancements in embedded systems and Artificial Intelligence (AI) have enhanced the capabilities of Unmanned Aircraft Vehicles (UAVs) in computer vision. However, the integration of AI techniques onboard drones is constrained by their processing capabilities. In this sense, this study evaluates the deployment of object detection models (YOLOv8n and YOLOv8s) on both resource‑constrained edge devices and cloud environments. The objective is to carry out a comparative performance analysis using a representative real‑time UAV image processing pipeline. Specifically, the NVIDIA Jetson Orin Nano, Orin NX, and Raspberry Pi 5 (RPI5) devices have been tested to measure their detection accuracy, inference speed, and energy consumption, and the effects of post‑training quantization (PTQ). The results show that YOLOv8n surpasses YOLOv8s in its inference speed, achieving 52 FPS on the Jetson Orin NX and 65 FPS with INT8 quantization. Conversely, the RPI5 failed to satisfy the real‑time processing needs in spite of its suitability for low‑energy consumption applications. An analysis of both the cloud‑based and edge‑based end‑to‑end processing times showed that increased communication latencies hindered real‑time applications, revealing trade‑offs between edge (low latency) and cloud processing (quick processing). Overall, these findings contribute to providing recommendations and optimization strategies for the deployment of AI models on UAVs.
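To put the reported throughput figures in terms of per-frame time, the short sketch below converts frames per second to milliseconds per frame. Only the 52 and 65 FPS values come from the abstract; the helper name `frame_time_ms` is an assumption, and the result ignores any communication or pre/post-processing latency.

```python
def frame_time_ms(fps):
    """Average time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

baseline = frame_time_ms(52)   # YOLOv8n on Jetson Orin NX, as reported
quantized = frame_time_ms(65)  # same model with INT8 quantization, as reported
print(round(baseline, 2), round(quantized, 2), 65 / 52)
```

On these numbers, INT8 quantization corresponds to a 1.25x throughput gain, or roughly 3.8 ms saved per frame.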
Authors: Ahmed Alagha, Maha Kadadha, Rabeb Mizouni, Shakti Singh, Jamal Bentahar, Hadi Otrok
Abstract: This paper addresses the challenges of selecting relay nodes and coordinating among them in UAV‑assisted Internet‑of‑Vehicles (IoV). The selection of UAV relay nodes in IoV employs mechanisms executed either at centralized servers or decentralized nodes, which have two main limitations: 1) the traceability of the selection mechanism execution and 2) the coordination among the selected UAVs, which is currently offered in a centralized manner and is not coupled with the relay selection. Existing UAV coordination methods often rely on optimization methods, which are not adaptable to different environment complexities, or on centralized deep reinforcement learning, which lacks scalability in multi‑UAV settings. Overall, there is a need for a comprehensive framework where relay selection and coordination are coupled and executed in a transparent and trusted manner. This work proposes a framework empowered by reinforcement learning and Blockchain for UAV‑assisted IoV networks. It consists of three main components: a two‑sided UAV relay selection mechanism for UAV‑assisted IoV, a decentralized Multi‑Agent Deep Reinforcement Learning (MDRL) model for autonomous UAV coordination, and a Blockchain implementation for transparency and traceability in the interactions between vehicles and UAVs. The relay selection considers the two‑sided preferences of vehicles and UAVs based on the Quality‑of‑UAV (QoU) and the Quality‑of‑Vehicle (QoV). Upon selection of relay UAVs, the decentralized coordination between them is enabled through an MDRL model trained to control their mobility and maintain the network coverage and connectivity using Proximal Policy Optimization (PPO). The evaluation results demonstrate that the proposed selection and coordination mechanisms improve the stability of the selected relays and maximize the coverage and connectivity achieved by the UAVs.
Authors: Alexandre Gemayel, Dimitrios Michael Manias, Abdallah Shami
Abstract: As smart cities begin to materialize, the role of Unmanned Aerial Vehicles (UAVs) and their reliability becomes increasingly important. One aspect of reliability relates to Condition Monitoring (CM), where Machine Learning (ML) models are leveraged to identify abnormal and adverse conditions. Given the resource‑constrained nature of next‑generation edge networks, the utilization of precious network resources must be minimized. This work explores the optimization of network resources for ML‑based UAV CM frameworks. The developed framework uses experimental data and varies the feature extraction aggregation interval to optimize ML model selection. Additionally, by leveraging dimensionality reduction techniques, there is a 99.9% reduction in network resource consumption.
Authors: Jinwen Zhu, Jun Hu, Xudong Zhao, Xiaoming Lang, Yinian Mao, Guoquan Huang
Abstract: While LiDAR and cameras are becoming ubiquitous for unmanned aerial vehicles (UAVs), they can be ineffective in challenging environments, and 4D millimeter‑wave (MMW) radars, which can provide robust 3D ranging and Doppler velocity measurements, are less exploited for aerial navigation. In this paper, we develop an efficient and robust error‑state Kalman filter (ESKF)‑based radar‑inertial navigation for UAVs. The key idea of the proposed approach is the point‑to‑distribution radar scan matching to provide motion constraints with proper uncertainty quantification, which are used to update the navigation states in a tightly coupled manner, along with the Doppler velocity measurements. Moreover, we propose a robust keyframe‑based matching scheme against the prior map (if available) to bound the accumulated navigation errors and thus provide a radar‑based global localization solution with high accuracy. Extensive real‑world experimental validations have demonstrated that the proposed radar‑aided inertial navigation outperforms state‑of‑the‑art methods in both accuracy and robustness.
Authors: S. Doodeman, Z. Tang, M. Jacinto, R. Cunha, C. Silvestre
Abstract: This work addresses the practical problem of distributed formation tracking control of a group of quadrotor vehicles in a relaxed sensing graph topology with a very limited sensor set, where only one leader vehicle can access the global position. Other vehicles in the formation are assumed to only have access to inter‑agent bearing (direction) measurements and relative velocities with respect to their neighbor agents. A hierarchical control architecture is adopted for each quadrotor, combining a high‑gain attitude inner‑loop and an outer‑loop bearing‑based formation controller with collision avoidance augmentation. The proposed method enables a group of quadrotors to track arbitrary bearing persistently exciting desired formations, including time‑varying shapes and rotational maneuvers, such that each quadrotor only requires relative measurements to at least one neighboring quadrotor. The effective performance of the control strategy is validated by numerical simulations in MATLAB and real‑world experiments with three quadrotors.
Authors: Luca Crupi, Luca Butera, Alberto Ferrante, Alessandro Giusti, Daniele Palossi
Abstract: Efficient crop production requires early detection of pest outbreaks and timely treatments; we consider a solution based on a fleet of multiple autonomous miniaturized unmanned aerial vehicles (nano‑UAVs) to visually detect pests and a single slower heavy vehicle that visits the detected outbreaks to deliver treatments. To cope with the extreme limitations aboard nano‑UAVs, e.g., low‑resolution sensors and sub‑100 mW computational power budget, we design, fine‑tune, and optimize a tiny image‑based convolutional neural network (CNN) for pest detection. Despite the small size of our CNN (i.e., 0.58 GOps/inference), on our dataset, it scores a mean average precision (mAP) of 0.79 in detecting harmful bugs, i.e., 14% lower mAP but 32x fewer operations than the best‑performing CNN in the literature. Our CNN runs in real‑time at 6.8 frame/s, requiring 33 mW on a GWT GAP9 System‑on‑Chip aboard a Crazyflie nano‑UAV. Then, to cope with in‑field unexpected obstacles, we leverage a global+local path planner based on the A* algorithm. The global path planner determines the best route for the nano‑UAV to sweep the entire area, while the local one runs up to 50 Hz aboard our nano‑UAV and prevents collision by adjusting the short‑distance path. Finally, we demonstrate with in‑simulator experiments that once a fleet of 25 nano‑UAVs has combed a 200x200 m vineyard, collected information can be used to plan the best path for the tractor, visiting all and only required hotspots. In this scenario, our efficient transportation system, compared to a traditional single‑ground vehicle performing both inspection and treatment, can save up to 20 h working time.
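For readers unfamiliar with A*-style planning, the following is a minimal, generic grid-based sketch of the search that such a planner builds on. It is not the authors' planner: the occupancy grid encoding (`#` for obstacles), 4-connected moves with unit cost, the Manhattan heuristic, and the function name `astar` are all assumptions made for illustration.

```python
import heapq

def astar(grid, start, goal):
    """Generic A* on a 4-connected grid (illustrative assumption, not the paper's planner).

    grid  : list of equal-length strings, '#' marks an obstacle cell
    start : (row, col) tuple
    goal  : (row, col) tuple
    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    g_cost = {start: 0}
    came_from = {}

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk parent links back to the start to rebuild the path.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry; a cheaper route to this cell was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = new_g
                    came_from[(nr, nc)] = cell
                    heapq.heappush(
                        open_heap, (new_g + heuristic((nr, nc)), new_g, (nr, nc))
                    )
    return None

# Hypothetical 3x5 field with a row of obstacles in the middle.
field = [
    ".....",
    ".###.",
    ".....",
]
route = astar(field, (0, 0), (2, 4))
```

In a global+local arrangement, a search like this would typically run once over the whole area and be re-run over a small local window when an unexpected obstacle appears; how the paper splits the two is not specified in the abstract.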
Authors: Saad Masrur, Ismail Guvenc
Abstract: Localization of radio frequency (RF) sources has critical applications, including search and rescue, jammer detection, and monitoring of hostile activities. Unmanned aerial vehicles (UAVs) offer significant advantages for RF source localization (RFSL) over terrestrial methods, leveraging autonomous 3D navigation and improved signal capture at higher altitudes. Recent advancements in deep learning (DL) have further enhanced localization accuracy, particularly for outdoor scenarios. DL models often face challenges in real‑world performance, as they are typically trained on simulated datasets that fail to replicate real‑world conditions fully. To address this, we first propose the Enhanced Two‑Ray propagation model, reducing the simulation‑to‑reality gap by improving the accuracy of propagation environment modeling. For RFSL, we propose the 3D Cluster‑Based RealAdaptRNet, a DL‑based method leveraging 3D clustering‑based feature extraction for robust localization. Experimental results demonstrate that the proposed Enhanced Two‑Ray model provides superior accuracy in simulating real‑world propagation scenarios compared to conventional free‑space and two‑ray models. Notably, the 3D Cluster‑Based RealAdaptRNet, trained entirely on simulated datasets, achieves exceptional performance when validated in real‑world environments using the AERPAW physical testbed, with an average localization error of 18.2 m. The proposed approach is computationally efficient, utilizing 33.5 times fewer parameters, and demonstrates strong generalization capabilities across diverse trajectories, making it highly suitable for real‑world applications.
Authors: Kangning Cui, Rongkun Zhu, Manqi Wang, Wei Tang, Gregory D. Larsen, Victor P. Pauca, Sarra Alqahtani, Fan Yang, David Segurado, David Lutz, Jean-Michel Morel, Miles R. Silman
Abstract: Palms are ecological and economic indicators of tropical forest health, biodiversity, and human impact that support local economies and global forest product supply chains. While palm detection in plantations is well‑studied, efforts to map naturally occurring palms in dense forests remain limited by overlapping crowns, uneven shading, and heterogeneous landscapes. We develop PRISM (Processing, Inference, Segmentation, and Mapping), a flexible pipeline for detecting and localizing palms in dense tropical forests using large orthomosaic images. Orthomosaics are created from thousands of aerial images and span several to hundreds of gigabytes. Our contributions are threefold. First, we construct a large UAV‑derived orthomosaic dataset collected across 21 ecologically diverse sites in western Ecuador, annotated with 8,830 bounding boxes and 5,026 palm center points. Second, we evaluate multiple state‑of‑the‑art object detectors based on efficiency and performance, integrating zero‑shot SAM 2 as the segmentation backbone, and refining the results for precise geographic mapping. Third, we apply calibration methods to align confidence scores with IoU and explore saliency maps for feature explainability. Though optimized for palms, PRISM is adaptable for identifying other natural objects, such as eastern white pines. Future work will explore transfer learning for lower‑resolution datasets (0.5 to 1 m).
Authors: Wei Zhao, Shaoxin Cui, Wen Qiu, Zhiqiang He, Zhi Liu, Xiao Zheng, Bomin Mao, Nei Kato
Abstract: Unmanned aerial vehicles (UAVs) are playing an increasingly pivotal role in modern communication networks, offering flexibility and enhanced coverage for a variety of applications. However, UAV networks pose significant challenges due to their dynamic and distributed nature, particularly when dealing with tasks such as power allocation, channel assignment, caching, and task offloading. Traditional optimization techniques often struggle to handle the complexity and unpredictability of these environments, leading to suboptimal performance. This survey provides a comprehensive examination of how deep reinforcement learning (DRL) can be applied to solve these mathematical optimization problems in UAV communications and networking. Rather than simply introducing DRL methods, the focus is on demonstrating how these methods can be utilized to solve complex mathematical models of the underlying problems. We begin by reviewing the fundamental concepts of DRL, including value‑based, policy‑based, and actor‑critic approaches. Then, we illustrate how DRL algorithms are applied to specific UAV network tasks by discussing from problem formulations to DRL implementation. By framing UAV communication challenges as optimization problems, this survey emphasizes the practical value of DRL in dynamic and uncertain environments. We also explore the strengths of DRL in handling large‑scale network scenarios and the ability to continuously adapt to changes in the environment. In addition, future research directions are outlined, highlighting the potential for DRL to further enhance UAV communications and expand its applicability to more complex, multi‑agent settings.
Authors: Jing Jin, Yutao Zhang, Ruitian Xu, Yixin Chen
Abstract: Recent advancements in large language models (LLMs) provide a more effective pathway for upgrading brain‑computer interface (BCI) technology in terms of user interaction. The widespread adoption of BCIs in daily application scenarios is still limited by factors such as their single functionality, restricted paradigm design, weak multilingual support, and low levels of intelligence. In this paper, we propose an innovative BCI system that deeply integrates a steady‑state visual evoked potential (SSVEP) speller with an LLM application programming interface (API). It allows natural language input through the SSVEP speller and dynamically calls large models to generate SSVEP paradigms. The command prompt, blinking frequency, and layout position are adjustable to meet the user's control requirements in various scenarios. More than ten languages are compatible with the multilingual support of LLM. A variety of task scenarios, such as home appliance control, robotic arm operation, and unmanned aerial vehicle (UAV) management are provided. The task interfaces of the system can be personalized according to the user's habits, usage scenarios, and equipment characteristics. By combining the SSVEP speller with an LLM, the system solves numerous challenges faced by current BCI systems and makes breakthroughs in functionality, intelligence, and multilingual support. The introduction of LLM not only enhances user experience but also expands the potential applications of BCI technology in real‑world environments.
Authors: Yuanze Xu, Ming Dai, Wenxiao Cai, Wankou Yang
Abstract: Image retrieval has been employed as a robust complementary technique to address the challenge of Unmanned Aerial Vehicle (UAV) self‑positioning. However, most existing methods primarily focus on localizing objects captured by UAVs through complex part‑based representations, often overlooking the unique challenges associated with UAV self‑positioning, such as fine‑grained spatial discrimination requirements and dynamic scene variations. To address the above issues, we propose the Context‑Enhanced method for precise UAV Self‑Positioning (CEUSP), specifically designed for UAV self‑positioning tasks. CEUSP integrates a Dynamic Sampling Strategy (DSS) to efficiently select optimal negative samples, while the Rubik's Cube Attention (RCA) module, combined with the Context‑Aware Channel Integration (CACI) module, enhances feature representation and discrimination by exploiting interdimensional interactions, inspired by the rotational mechanics of a Rubik's Cube. Extensive experiments validate the effectiveness of the proposed method, demonstrating notable improvements in feature representation and UAV self‑positioning accuracy within complex urban environments. Our approach achieves state‑of‑the‑art performance on the DenseUAV dataset, which is specifically designed for dense urban contexts, and also delivers competitive results on the widely recognized University‑1652 benchmark.
Authors: Halim Lee, Jiwon Seo
Abstract: Accurate and swift localization of the target is crucial in emergencies. However, accurate position data of a target mobile device, typically obtained from global navigation satellite systems (GNSS), cellular networks, or WiFi, may not always be accessible to first responders. For instance, 1) accuracy and availability can be limited in challenging signal reception environments, and 2) in regions where emergency location services are not mandatory, certain mobile devices may not transmit their location during emergencies. As an alternative localization method, a network of unmanned aerial vehicles (UAVs) can be employed to passively locate targets by collecting radio frequency (RF) signal measurements, such as received signal strength (RSS). In these situations, UAV trajectories play a critical role in localization performance, influencing both accuracy and search time. Previous studies optimized UAV trajectories using the determinant of the Fisher information matrix (FIM), but its performance declines under unfavorable geometric conditions, such as when UAVs start from a single base, leading to position ambiguity. To address this, our prior work introduced a rigidity‑based approach, which improved the search time compared to FIM‑based methods in our simulation case. However, the high computational cost of rigidity‑based optimization, primarily due to singular value decomposition (SVD), limits its practicality. In this paper, we applied techniques to reduce computational complexity, including randomized SVD, smooth SVD, and vertex pruning.
Authors: Neda Rahimpour Anaraki, Maryam Tahmasbi, Saeed Reza Kheradpisheh
Abstract: Finding the cadastral boundaries of farmlands is a crucial concern for land administration. Therefore, using deep learning methods to expedite and simplify the extraction of cadastral boundaries from satellite and unmanned aerial vehicle (UAV) images is critical. In this paper, we employ transfer learning to train a U‑Net model with a ResNet34 backbone to detect cadastral boundaries through three‑class semantic segmentation: "boundary", "field", and "background". We evaluate the performance on two satellite images from farmlands in Iran using "precision", "recall", and "F‑score", achieving high values of 88%, 75%, and 81%, respectively, which indicate promising results.
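As a quick consistency check on the reported metrics, and assuming the F‑score here is the standard F1 score (the harmonic mean of precision $P$ and recall $R$, which the abstract does not state explicitly), the three values agree with one another:

$$F_1 = \frac{2PR}{P + R} = \frac{2(0.88)(0.75)}{0.88 + 0.75} = \frac{1.32}{1.63} \approx 0.81$$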
Authors: Wenwen Xie, Geng Sun, Jiacheng Wang, Hongyang Du, Jiawen Kang, Dusit Niyato, Kaibin Huang, Victor C. M. Leung
Abstract: Integrated sensing and communication (ISAC) has garnered substantial research interest owing to its pivotal role in advancing the development of next‑generation (6G) wireless networks. However, achieving a performance balance between communication and sensing in the dual‑function radar communication (DFRC)‑based ISAC system remains a significant challenge. In this paper, a low‑altitude intelligent reflecting surface (IRS)‑assisted ISAC system is explored, where a base station (BS) supports dual‑functional operations, enabling both data transmission for multiple users and sensing for a blocked target, with the channel quality enhanced by an IRS mounted on the unmanned aerial vehicle (UAV). Moreover, we formulate an integrated communication, sensing, and energy efficiency multi‑objective optimization problem (CSEMOP), which aims to maximize the communication rate of the users and the sensing rate of the target, while minimizing UAV propulsion energy consumption by jointly optimizing the BS beamforming matrix, IRS phase shifts, the flight velocity and angle of the UAV. Considering the non‑convexity, trade‑off, and dynamic nature of the formulated CSEMOP, we propose a generative diffusion model‑based deep deterministic policy gradient (GDMDDPG) algorithm to solve the problem. Specifically, the diffusion model is incorporated into the actor network of DDPG to improve the action quality, with noise perturbation mechanism for better exploration and recent prioritized experience replay (RPER) sampling mechanism for enhanced training efficiency. Simulation results indicate that the GDMDDPG algorithm delivers superior performance compared to the existing methods.
Authors: Donggu Lee, Ozgur Ozdemir, Asokan Ram, Ismail Guvenc
Abstract: Unmanned aerial vehicles (UAVs) are expected to play a key role in 6G‑enabled vehicular‑to‑everything (V2X) communications requiring high data rates, low latency, and reliable connectivity for mission‑critical applications. Multi‑input multi‑output (MIMO) technology is essential for meeting these demands. However, UAV link performance is significantly affected by environmental factors such as signal attenuation, multipath propagation, and blockage from obstacles, particularly dense foliage in rural areas. In this paper, we investigate RF coverage and channel rank over UAV channels in foliage‑dominated rural environments using ray tracing (RT) simulations. We conduct RT‑based channel rank and RF coverage analysis over Lake Wheeler Field Labs at NC State University to examine the impact on UAV links. Custom‑modeled trees are integrated into the RT simulations using NVIDIA Sionna, Blender, and Open Street Map (OSM) database to capture realistic blockage effects. Results indicate that tree‑induced blockage impacts RF coverage and channel rank at lower UAV altitudes. We also propose a Kriging interpolation‑based 3D channel rank interpolation scheme, leveraging the observed spatial correlation of channel rank in the given environments. The accuracy of the proposed scheme is evaluated using the mean absolute error (MAE) metric and compared against baseline interpolation methods. Finally, we compare the RT‑based received signal strength (RSS) and channel rank results with real‑world measurements from the NSF AERPAW testbed demonstrating reasonable consistency between simulation results and the measurements.
Authors: Zifan Lang, Guixia Liu, Geng Sun, Jiahui Li, Zemin Sun, Jiacheng Wang, Victor C. M. Leung
Abstract: This paper proposes a UAV‑assisted forwarding system based on distributed beamforming to improve the age of information (AoI) in the Internet of Things (IoT). Specifically, UAVs collect and relay data between sensor nodes (SNs) and the remote base station (BS). However, flight delays increase the AoI and degrade the network performance. To mitigate this, we adopt distributed beamforming to extend the communication range, reduce the flight frequency, and ensure continuous data relay and efficient energy utilization. Then, we formulate an optimization problem to minimize AoI and UAV energy consumption by jointly optimizing the UAV trajectories and communication schedules. The problem is non‑convex and highly dynamic, and thus we propose a deep reinforcement learning (DRL)‑based algorithm to solve the problem, thereby enhancing stability and accelerating convergence. Simulation results show that the proposed algorithm effectively addresses the problem and outperforms other benchmark algorithms.
Authors: Wenhui Ma, Wenhao Li, Bo Jin, Changhong Lu, Xiangfeng Wang
Abstract: Unmanned Aerial Vehicles (UAVs) and Automated Guided Vehicles (AGVs) increasingly collaborate in logistics, surveillance, inspection, and other tasks. However, existing simulators often focus on a single domain, limiting cross‑domain study. This paper presents SkyRover, a modular simulator for UAV‑AGV multi‑agent pathfinding (MAPF). SkyRover supports realistic agent dynamics, configurable 3D environments, and convenient APIs for external solvers and learning methods. By unifying ground and aerial operations, it facilitates cross‑domain algorithm design, testing, and benchmarking. Experiments highlight SkyRover's capacity for efficient pathfinding and high‑fidelity simulations in UAV‑AGV coordination. The project is available at https://sites.google.com/view/mapf3d/home.
Authors: Kamran Shafafi, Manuel Ricardo, Rui Campos
Abstract: Unmanned Aerial Vehicles (UAVs) increasingly enhance the Quality of Service (QoS) in wireless networks due to their flexibility and cost‑effectiveness. However, optimizing UAV placement in dynamic, obstacle‑prone environments remains a significant research challenge due to their complexity. Reinforcement Learning (RL) offers adaptability and robustness in such environments, proving effective for UAV optimization.
This paper introduces RLpos‑3, a novel framework that integrates standard RL techniques and simulation libraries with Network Simulator 3 (ns‑3) to facilitate the development and evaluation of UAV positioning algorithms. RLpos‑3 serves as a supplementary tool for researchers, enabling the implementation, analysis, and benchmarking of UAV positioning strategies across diverse environmental conditions while meeting user traffic demands. To validate its effectiveness, we present use cases demonstrating RLpos‑3's performance in optimizing UAV placement under realistic conditions, such as urban and obstacle‑rich environments.
Authors: Samuel O. Folorunsho, Maggi Ni, William R. Norris
Abstract: Consider an unmanned aerial vehicle (UAV) physically connected to the ground station with a tether operating in a space, tasked with performing precise maneuvers while constrained by the physical limitation of its tether, which prevents it from flying beyond a maximum allowable length. Violating this tether constraint could lead to system failure or operational hazards, making it essential to enforce safety constraints dynamically while ensuring the drone can track desired trajectories accurately. This paper presents a Control Barrier Function Quadratic Programming Framework (CBF‑QP) for ensuring the safe and efficient operation of tethered unmanned aerial vehicles (TUAVs). The framework leverages nominal backstepping control to achieve trajectory tracking, augmented with control barrier functions to ensure compliance with the tether constraint. In this proposed method, the tether constraint is directly embedded in the control design and therefore guarantees the TUAV remains within a predefined operational region defined by the maximum tether length while achieving precise trajectory tracking. The effectiveness of the proposed framework is validated through simulations involving set‑point tracking, dynamic trajectory following, and disturbances such as incorrect user inputs. The results demonstrate that the TUAV respects the tether constraint $\|x(t)\| \le L_{\max}$, with tracking errors converging to zero and the control input remaining bounded.
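To make the idea of a CBF‑QP safety filter concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the vehicle is modeled as a single integrator (velocity is the control input) rather than a quadrotor with backstepping control, the nominal controller is a plain proportional law, the barrier is chosen as h(x) = Lmax^2 - ||x||^2, and the gain `gamma` is arbitrary. With a single affine constraint the QP has a closed-form solution, so no solver is needed. All function and parameter names are hypothetical.

```python
import math

def cbf_qp_filter(x, u_nom, l_max, gamma=1.0):
    """Closed-form solution of  min ||u - u_nom||^2  s.t.  2 x.u <= gamma * h(x).

    Assumes single-integrator dynamics x_dot = u (an illustrative simplification)
    and the barrier h(x) = l_max^2 - ||x||^2, which is >= 0 inside the tether sphere.
    """
    h = l_max ** 2 - sum(xi * xi for xi in x)
    a = [2.0 * xi for xi in x]   # constraint written as a.u <= b
    b = gamma * h
    a_dot_u = sum(ai * ui for ai, ui in zip(a, u_nom))
    if a_dot_u <= b:
        return list(u_nom)       # nominal input already satisfies the constraint
    # Otherwise project u_nom onto the half-space boundary a.u = b.
    scale = (a_dot_u - b) / sum(ai * ai for ai in a)
    return [ui - scale * ai for ui, ai in zip(u_nom, a)]

def simulate(target=(10.0, 0.0, 0.0), l_max=5.0, k=1.0, dt=0.01, steps=2000):
    """Forward-Euler rollout toward a set-point placed outside the tether radius."""
    x = [0.0, 0.0, 0.0]
    norms = []
    for _ in range(steps):
        u_nom = [k * (ti - xi) for ti, xi in zip(target, x)]  # proportional law (assumed)
        u = cbf_qp_filter(x, u_nom, l_max)
        x = [xi + dt * ui for xi, ui in zip(x, u)]
        norms.append(math.sqrt(sum(xi * xi for xi in x)))
    return norms

distances = simulate()
```

In this toy rollout the commanded set-point lies beyond the tether length, so the filter leaves the nominal input untouched near the anchor and progressively removes its outward component as the vehicle approaches the tether limit. This mirrors the "incorrect user input" scenario mentioned in the abstract only in spirit.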
Authors: Jiahao You, Ziye Jia, Chao Dong, Qihui Wu, Zhu Han
Abstract: The increasing deployment of unmanned surface vehicles (USVs) requires computational support and coverage in applications such as maritime search and rescue. Unmanned aerial vehicles (UAVs) can offer low‑cost, flexible aerial services, and ground stations (GSs) can provide powerful support, which can cooperate to help the USVs in complex scenarios. However, the collaboration between UAVs and GSs for USVs faces challenges of task uncertainties, USV trajectory uncertainties, heterogeneities, and limited computational resources. To address these issues, we propose a cooperative UAV and GS based robust multi‑access edge computing framework to assist USVs in completing computational tasks. Specifically, we formulate the optimization problem of joint task offloading and UAV trajectory to minimize the total execution time, which is in the form of mixed integer nonlinear programming and NP‑hard to tackle. Therefore, we propose the algorithm of generative artificial intelligence‑enhanced heterogeneous agent proximal policy optimization (GAI‑HAPPO). The proposed algorithm integrates GAI models to enhance the actor network's ability to model complex environments and extract high‑level features, thereby allowing the algorithm to predict uncertainties and adapt to dynamic conditions. Additionally, GAI stabilizes the critic network, addressing the instability of multi‑agent reinforcement learning approaches. Finally, extensive simulations demonstrate that the proposed algorithm outperforms the existing benchmark methods, thus highlighting its potential in tackling intricate, cross‑domain issues in the considered scenarios.
Authors: Reza Ahmadvand, Sarah Safura Sharif, Yaser Mike Banad
Abstract: This study introduces a novel distributed cloud‑edge framework for autonomous multi‑UAV systems that combines the computational efficiency of neuromorphic computing with nature‑inspired control strategies. The proposed architecture equips each UAV with an individual Spiking Neural Network (SNN) that learns to reproduce optimal control signals generated by a cloud‑based controller, enabling robust operation even during communication interruptions. By integrating spike coding with nature‑inspired control principles inspired by Tilapia fish territorial behavior, our system achieves sophisticated formation control and obstacle avoidance in complex urban environments. The distributed architecture leverages cloud computing for complex calculations while maintaining local autonomy through edge‑based SNNs, significantly reducing energy consumption and computational overhead compared to traditional centralized approaches. Our framework addresses critical limitations of conventional methods, including the dependency on pre‑modeled environments, computational intensity of traditional methods, and local minima issues in potential field approaches. Simulation results demonstrate the system's effectiveness across two different scenarios: first, the indoor deployment of a multi‑UAV system made up of 15 UAVs, and second, the collision‑free formation control of a moving flock of 6 UAVs with obstacle avoidance. Owing to the sparsity of spiking patterns and the event‑based nature of SNNs, on average for the whole group of UAVs, the framework achieves almost 90% reduction in computational burden compared to traditional von Neumann architectures implementing traditional artificial neural networks.
Authors: Zhuangkun Wei, Wenxiu Hu, Yathreb Bouazizi, Mengbang Zou, Chenguang Liu, Yunfei Chen, Hongjian Sun, Julie McCann
Abstract: Coordinated control of a large UAV swarm requires significant spectrum resources due to the need for bandwidth allocation per UAV, posing a challenge in resource‑limited environments. Over‑the‑air (OTA) control has emerged as a spectrum‑efficient approach, leveraging electromagnetic superposition to form control signals at a base station (BS). However, existing OTA controllers lack sufficient optimization variables to meet UAV swarm control objectives and fail to integrate control with other BS functions like sensing. This work proposes an integrated sensing and OTA control framework (ISAC‑OTA) for UAV swarms. The BS performs OTA signal construction (uplink) and dispatch (downlink) while simultaneously sensing objects. Two uplink post‑processing methods are developed: a control‑centric approach generating closed‑form control signals via a feedback‑looped OTA control problem, and a sensing‑centric method mitigating transmission‑induced interference for accurate object sensing. For the downlink, a non‑convex problem is formulated and solved to minimize control signal dispatch (transmission) error while maintaining a minimum sensing signal‑to‑noise ratio (SNR). Simulation results show that the proposed ISAC‑OTA controller achieves control performance comparable to the benchmark optimal control algorithm while maintaining high sensing accuracy, despite OTA transmission interference. Moreover, it eliminates the need for per‑UAV bandwidth allocation, showcasing a spectrum‑efficient method for cooperative control in future wireless systems.
Authors: Boxiong Wang, Hui Kang, Jiahui Li, Geng Sun, Zemin Sun, Jiacheng Wang, Dusit Niyato
Abstract: Unmanned aerial vehicle (UAV)‑assisted mobile edge computing (MEC) and data collection (DC) have been popular research issues. Different from existing works that consider MEC and DC scenarios separately, this paper investigates a multi‑UAV‑assisted joint MEC‑DC system. Specifically, we formulate a joint optimization problem to minimize the MEC latency and maximize the collected data volume. This problem can be classified as a non‑convex mixed integer programming problem that exhibits long‑term optimization and dynamics. Thus, we propose a deep reinforcement learning‑based approach that jointly optimizes the UAV movement, user transmit power, and user association in real time to solve the problem efficiently. Specifically, we reformulate the optimization problem into an action space‑reduced Markov decision process (MDP) and optimize the user association by using a two‑phase matching‑based association (TMA) strategy. Subsequently, we propose a soft actor‑critic (SAC)‑based approach that integrates the proposed TMA strategy (SAC‑TMA) to solve the formulated joint optimization problem collaboratively. Simulation results demonstrate that the proposed SAC‑TMA is able to coordinate the two subsystems and can effectively reduce the system latency and improve the data collection volume compared with other benchmark algorithms.
Authors: Blake A Myers, Matthew Q Hill, Veda Nandan Gandi, Thomas M Metz, Alice J O'Toole
Abstract: This study presents an investigation of four distinct approaches to long‑term person identification using body shape. Unlike short‑term re‑identification systems that rely on temporary features (e.g., clothing), we focus on learning persistent body shape characteristics that remain stable over time. We introduce a body identification model based on a Vision Transformer (ViT) (Body Identification from Diverse Datasets, BIDDS) and on a Swin‑ViT model (Swin‑BIDDS). We also expand on previous approaches based on the Linguistic and Non‑linguistic Core ResNet Identity Models (LCRIM and NLCRIM), but with improved training. All models are trained on a large and diverse dataset of over 1.9 million images of approximately 5k identities across 9 databases. Performance was evaluated on standard re‑identification benchmark datasets (MARS, MSMT17, Outdoor Gait, DeepChange) and on an unconstrained dataset that includes images at a distance (from close‑range to 1000m), at altitude (from an unmanned aerial vehicle, UAV), and with clothing change. A comparative analysis across these models provides insights into how different backbone architectures and input image sizes impact long‑term body identification performance across real‑world conditions.
Authors: Kristina Cormier, Kongwen Zhang, Joshua Padron-Uy, Albert Wong, Keona Gagnier, Ajitesh Parihar
Abstract: This research developed a prototype data warehouse to integrate multi‑source forestry data for long‑term monitoring, management, and sustainability. The data warehouse is intended to accommodate all types of imagery from various platforms, LiDAR point clouds, survey records, and paper documents, with the capability to transform these datasets into machine learning (ML) and deep learning classification and segmentation models. In this study, we pioneered the integration of unmanned aerial vehicle (UAV) imagery and paper records, testing the merged data on the YOLOv11 model. Paper records improved ground truth, and preliminary results demonstrated notable performance improvements.
This research aims to implement a data warehouse (DW) to manage data for a YOLO (You Only Look Once) model, which identifies objects in images. It does this by integrating advanced data processing pipelines. Data are also stored and easily accessible for future use, including comparing current and historical data to understand patterns of growth or decline. In addition, the design is used to optimize resource usage. It also scales easily: adding dimension tables or other fields to the fact table does not affect other parts of the data warehouse. DW performance and estimations for growing workloads are also explored in this paper.
Authors: Ashab Uddin, Ahmed Hamdi Sakr, Ning Zhang
Abstract: The increasing complexity of Intelligent Transportation Systems (ITS) has led to significant interest in computational offloading to external infrastructures such as edge servers, vehicular nodes, and UAVs. These dynamic and heterogeneous environments pose challenges for traditional offloading strategies, prompting the exploration of Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) as adaptive decision‑making frameworks. This survey presents a comprehensive review of recent advances in DRL‑based offloading for vehicular edge computing (VEC). We classify and compare existing works based on learning paradigms (e.g., single‑agent, multi‑agent), system architectures (e.g., centralized, distributed, hierarchical), and optimization objectives (e.g., latency, energy, fairness). Furthermore, we analyze how Markov Decision Process (MDP) formulations are applied and highlight emerging trends in reward design, coordination mechanisms, and scalability. Finally, we identify open challenges and outline future research directions to guide the development of robust and intelligent offloading strategies for next‑generation ITS.
Authors: Malaika Zafar, Roohan Ahmed Khan, Aleksey Fedoseev, Kumar Katyayan Jaiswal, Dzmitry Tsetserukou
Abstract: With the growing demand for efficient logistics and warehouse management, unmanned aerial vehicles (UAVs) are emerging as a valuable complement to automated guided vehicles (AGVs). UAVs enhance efficiency by navigating dense environments and operating at varying altitudes. However, their limited flight time, battery life, and payload capacity necessitate a supporting ground station. To address these challenges, we propose HetSwarm, a heterogeneous multi‑robot system that combines a UAV and a mobile ground robot for collaborative navigation in cluttered and dynamic conditions. Our approach employs an artificial potential field (APF)‑based path planner for the UAV, allowing it to dynamically adjust its trajectory in real time. The ground robot follows this path while maintaining connectivity through impedance links, ensuring stable coordination. Additionally, the ground robot establishes temporal impedance links with low‑height ground obstacles to avoid local collisions, as these obstacles do not interfere with the UAV's flight. Experimental validation of HetSwarm in diverse environmental conditions demonstrated a 90% success rate across 30 test cases. The ground robot exhibited an average deviation of 45 cm near obstacles, confirming effective collision avoidance. Extensive simulations in the Gym PyBullet environment further validated the robustness of our system for real‑world applications, demonstrating its potential for dynamic, real‑time task execution in cluttered environments.
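To make the artificial potential field (APF) idea in the abstract above concrete, the following is a minimal 2D sketch in Python. It is not the HetSwarm implementation: the function names `apf_step` and `plan`, the gains `k_att` and `k_rep`, the influence radius `d0`, and the fixed step size are all assumptions chosen for illustration, and the repulsive term follows the classic textbook form rather than anything stated in the abstract.

```python
import math

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.5, step=0.1):
    """Take one fixed-length step along the combined APF force (2D, hypothetical gains)."""
    # Attractive force pulls linearly toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            # Textbook repulsion, active only inside the influence radius d0.
            mag = k_rep * (1.0 / d - 1.0 / d0) / d**2
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # zero net force: stay in place
    # Move a fixed distance along the force direction.
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)

def plan(start, goal, obstacles, max_iters=500, tol=0.1):
    """Iterate apf_step until the goal is within tol or the iteration budget runs out."""
    path = [start]
    for _ in range(max_iters):
        if math.dist(path[-1], goal) < tol:
            break
        path.append(apf_step(path[-1], goal, obstacles))
    return path
```

Calling `plan((0.0, 0.0), (1.0, 0.0), [])` walks straight to the goal. With obstacles, a plain APF like this one can stall in local minima, which is one reason practical planners add further mechanisms on top of it.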
Authors: Mustafa Siham, Qutaiba I. Ali
Abstract: The smart metering infrastructure (SMI) may become one of the key elements in efficiently managing energy in smart cities. However, meter readings are traditionally collected manually, which raises cost, safety, and accuracy issues. This paper proposes an innovative SMI architecture based on a self‑organizing unmanned aerial vehicle (UAV) swarm that collects data autonomously, offering scalability and cost‑effectiveness while minimizing risks. We design a comprehensive system architecture comprising multiple operational phases, communication protocols, and robust failure‑handling mechanisms to ensure reliable operation. We further perform extensive simulations covering the maintenance of precise formations during flight, efficient data collection from smart meters, and adaptation to various failure scenarios. Importantly, we analyze the energy consumption of the proposed system in both drone flight operations and network communication. We also propose a battery sizing strategy and provide an estimate of the operational lifetime of the swarm, underlining the feasibility and practicality of our approach. Our results show that UAV swarms have great potential to revolutionize smart metering and to contribute to greener and more resilient smart cities.
Authors: Geng Sun, Jian Xiao, Jiahui Li, Jiacheng Wang, Jiawen Kang, Dusit Niyato, Shiwen Mao
Abstract: Unmanned aerial vehicles (UAVs) have emerged as potential aerial base stations (BSs) to improve terrestrial communications. However, the limited onboard energy and antenna power of a UAV restrict its communication range and transmission capability. To address these limitations, this work employs collaborative beamforming (CB) through a UAV‑enabled virtual antenna array to improve transmission performance from the UAV to terrestrial mobile users, under interference from non‑associated BSs and dynamic channel conditions. Specifically, we introduce a memory‑based random walk model to more accurately depict the mobility patterns of terrestrial mobile users. Following this, we formulate a multi‑objective optimization problem (MOP) focused on maximizing the transmission rate while minimizing the flight energy consumption of the UAV swarm. Given the NP‑hard nature of the formulated MOP and the highly dynamic environment, we transform this problem into a multi‑objective Markov decision process and propose an improved evolutionary multi‑objective reinforcement learning algorithm. This algorithm introduces an evolutionary learning approach to obtain the approximate Pareto set for the formulated MOP. Moreover, the algorithm incorporates a long short‑term memory network and hyper‑sphere‑based task selection method to discern the movement patterns of terrestrial mobile users and improve the diversity of the obtained Pareto set. Simulation results demonstrate that the proposed method effectively generates a diverse range of non‑dominated policies and outperforms existing methods. Additional simulations demonstrate the scalability and robustness of the proposed CB‑based method under different system parameters and various unexpected circumstances.
Authors: L. Colombo, J. Giribet, D. Martín de Diego
Abstract: Numerical methods that preserve geometric invariants of the system, such as energy, momentum, and the symplectic form, are called geometric integrators. Variational integrators are an important subclass of geometric integrators. The general idea behind variational integrators is to discretize Hamilton's principle rather than the equations of motion; as a consequence, these methods preserve some of the invariants of the original system (symplecticity, symmetry, good behavior of energy, ...). In this paper, we construct variational integrators for control‑dependent Lagrangian systems on Lie groups. These integrators are derived via a discrete‑time variational principle for discrete‑time control‑dependent reduced Lagrangians. We apply the variational integrator to optimal control problems for path planning of foldable unmanned aerial vehicles (UAVs). Simulations are shown to validate the performance of the geometric integrator.
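The idea of discretizing Hamilton's principle can be illustrated on a far simpler system than the Lie-group, control-dependent setting of the paper. The Python sketch below assumes a one-dimensional particle with Lagrangian L = (m/2)v^2 - V(q) and the trapezoidal discrete Lagrangian L_d(q_k, q_{k+1}) = h[(m/2)((q_{k+1} - q_k)/h)^2 - (V(q_k) + V(q_{k+1}))/2]. Its discrete Euler-Lagrange equations reduce to the Störmer-Verlet recursion used in the loop. The function names and default parameters are hypothetical.

```python
def verlet_from_discrete_el(q0, v0, grad_v, mass=1.0, h=0.01, steps=1000):
    """Integrate m*q'' = -V'(q) with the recursion given by the discrete Euler-Lagrange equations."""
    # Start-up step from the discrete Legendre transform:
    # p0 = m*(q1 - q0)/h + (h/2)*V'(q0), with p0 = m*v0.
    q_prev = q0
    q = q0 + h * v0 - 0.5 * h * h * grad_v(q0) / mass
    traj = [q_prev, q]
    for _ in range(steps - 1):
        # Discrete Euler-Lagrange: m*(q_next - 2q + q_prev)/h^2 = -V'(q).
        q_next = 2.0 * q - q_prev - h * h * grad_v(q) / mass
        traj.append(q_next)
        q_prev, q = q, q_next
    return traj

def energies(traj, potential, mass=1.0, h=0.01):
    """Energy along the trajectory, using a central-difference velocity estimate."""
    out = []
    for k in range(1, len(traj) - 1):
        v = (traj[k + 1] - traj[k - 1]) / (2.0 * h)
        out.append(0.5 * mass * v * v + potential(traj[k]))
    return out
```

For a unit harmonic oscillator (`grad_v = lambda q: q`, `potential = lambda q: 0.5 * q * q`) started at q0 = 1, v0 = 0, the computed energy stays close to 0.5 over long runs instead of drifting, which is the "good behavior of energy" the abstract refers to.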
Authors: Fen Liu, Shenghai Yuan, Wei Meng, Rong Su, Lihua Xie
Abstract: This paper investigates the stochastic moving target encirclement problem in a realistic setting. In contrast to typical assumptions in related works, the target in our work is non‑cooperative and capable of escaping the circle containment by boosting its speed to maximum for a short duration. Given extreme conditions such as GPS denial, weight limits, and a lack of ground guidance, the two agents can rely only on their onboard single‑modality perception tools to measure the distances to the target. The distance measurement allows for creating a position estimator by providing a target position‑dependent variable. Furthermore, the construction of the unique distributed anti‑synchronization controller (DASC) can guarantee that the two agents track and encircle the target swiftly. The convergence of the estimator and controller is rigorously evaluated using the Lyapunov technique. A real‑world UAV‑based experiment is conducted to illustrate the performance of the proposed methodology, in addition to a numerical simulation in Matlab. Our video demonstration can be found at https://youtu.be/JXu1gib99yQ.
Authors: Maneesha Wickramasuriya, Beomyeol Yu, Taeyoung Lee, Murray Snyder
Abstract: This paper proposes a vision‑in‑the‑loop simulation environment for deep monocular pose estimation of a UAV operating in an ocean environment. Recently, a deep neural network with a transformer architecture has been successfully trained to estimate the pose of a UAV relative to the flight deck of a research vessel, overcoming several limitations of GPS‑based approaches. However, validating the deep pose estimation scheme in an actual ocean environment poses significant challenges due to the limited availability of research vessels and the associated operational costs. To address these issues, we present a photo‑realistic 3D virtual environment leveraging recent advancements in Gaussian splatting, a novel technique that represents 3D scenes by modeling image pixels as Gaussian distributions in 3D space, creating a lightweight and high‑quality visual model from multiple viewpoints. This approach enables the creation of a virtual environment integrating multiple real‑world images collected in situ. The resulting simulation enables the indoor testing of flight maneuvers while verifying all aspects of flight software, hardware, and the deep monocular pose estimation scheme. This approach provides a cost‑effective solution for testing and validating the autonomous flight of shipboard UAVs, specifically focusing on vision‑based control and estimation algorithms.
Authors: Simone Martini, Margareta Stefanovic, Kimon P. Valavanis
Abstract: This paper presents a methodology to achieve lower‑dimensional Koopman quasi‑linear representations of nonlinear system dynamics using Koopman generalized eigenfunctions. The proposed approach considers the analytically derived Koopman formulation of rigid body dynamics, but it can be extended to any data‑driven or analytically derived generalized eigenfunction set. It achieves a representation for which the number of Koopman observables matches the number of inputs, allowing for Koopman linearization control solutions rather than resorting to the least squares approximation method adopted in high‑dimensional Koopman formulations. Through a linear combination of Koopman generalized eigenfunctions, a new set of Koopman generalized eigenfunctions is constructed so that the zero‑order truncation approximates a Koopman eigenfunction, which can be used to design linear control strategies to steer the dynamics of the original nonlinear system. The proposed methodology is tested by designing a linear quadratic (LQ) flight controller for a quadrotor UAV. Numerical and Hardware‑in‑the‑loop (HIL) simulations validate the applicability and real‑time implementability of the proposed approach in the presence of noise and sensor delays. The main advantage of the proposed method is the realization of a fully actuated Koopman‑based model which, in the case of the underactuated quadrotor system, allows trajectory tracking to be achieved through a single linear control loop.
Authors: Balakrishnan Dharmalingam, Rajdeep Mukherjee, Brett Piggott, Guohuan Feng, Anyi Liu
Abstract: Increased utilization of unmanned aerial vehicles (UAVs) in critical operations necessitates secure and reliable communication with Ground Control Stations (GCS). This paper introduces Aero‑LLM, a framework integrating multiple Large Language Models (LLMs) to enhance UAV mission security and operational efficiency. Unlike conventional single‑LLM approaches, Aero‑LLM leverages multiple specialized LLMs for various tasks, such as inference, anomaly detection, and forecasting, deployed across onboard systems, edge, and cloud servers. This dynamic, distributed architecture reduces performance bottlenecks and increases security capabilities. Aero‑LLM's evaluation demonstrates outstanding task‑specific metrics and robust defense against cyber threats, significantly enhancing UAV decision‑making, operational capabilities, and resilience against cyber attacks, and setting a new standard for secure, intelligent UAV operations.
Authors: David Čapek, Jan Hrnčíř, Tomáš Báča, Jakub Jirkal, Vojtěch Vonásek, Robert Pěnička, Martin Saska
Abstract: Robotic simulators play a crucial role in the development and testing of autonomous systems, particularly in the realm of Uncrewed Aerial Vehicles (UAVs). However, existing simulators often lack high‑level autonomy, hindering their immediate applicability to complex tasks such as autonomous navigation in unknown environments. This limitation stems from the challenge of integrating realistic physics, photorealistic rendering, and diverse sensor modalities into a single simulation environment. At the same time, the existing photorealistic UAV simulators use mostly hand‑crafted environments with limited environment sizes, which prevents the testing of long‑range missions. This restricts the usage of existing simulators to only low‑level tasks such as control and collision avoidance. To this end, we propose the novel FlightForge UAV open‑source simulator. FlightForge offers advanced rendering capabilities, diverse control modalities, and, foremost, procedural generation of environments. Moreover, the simulator is already integrated with a fully autonomous UAV system capable of long‑range flights in cluttered unknown environments. The key innovation lies in novel procedural environment generation and seamless integration of high‑level autonomy into the simulation environment. Experimental results demonstrate superior sensor rendering capability compared to existing simulators, as well as the ability to navigate autonomously in almost infinite environments.
Authors: Neetu R. R, Ozan Alp Topal, Özlem Tuğfe Demir, Emil Björnson, Cicek Cavdar, Gourab Ghatak, Vivek Ashok Bohara
Abstract: We consider a cell‑free massive multiple‑input multiple‑output (mMIMO) network, where unmanned aerial vehicles (UAVs) equipped with multiple antennas serve as distributed UAV‑access points (UAV‑APs). These UAV‑APs provide seamless coverage by jointly serving user equipments (UEs) without predefined cell boundaries. However, high‑capacity wireless networks face significant challenges due to fronthaul limitations in UAV‑assisted architectures. This letter proposes a novel UAV‑based cell‑free mMIMO framework that leverages distributed UAV‑APs to serve UEs while addressing the capacity constraints of wireless fronthaul links. We evaluate functional split Options 7.2 and 8 for the fronthaul links, aiming to maximize the minimum signal‑to‑interference‑plus‑noise ratio (SINR) among the UEs and minimize the power consumption by optimizing the transmit powers of UAV‑APs and selectively activating them. Our analysis compares sub‑6 GHz and millimeter wave (mmWave) bands for the fronthaul, showing that mmWave achieves superior SINR with lower power consumption, particularly under Option 8. Additionally, we determine the minimum fronthaul bandwidth required to activate a single UAV‑AP under different split options.
Authors: Serhat Sönmez, Luca Montecchio, Simone Martini, Matthew J. Rutherford, Alessandro Rizzo, Margareta Stefanovic, Kimon P. Valavanis
Abstract: A reinforcement learning (RL)‑based methodology is proposed and implemented for online fine‑tuning of PID controller gains, thereby improving the effectiveness and accuracy of quadrotor trajectory tracking. The RL agent is first trained offline on a quadrotor PID attitude controller and then validated through simulations and experimental flights. The RL agent uses the Deep Deterministic Policy Gradient (DDPG) algorithm, which is an off‑policy actor‑critic method. Training and simulation studies are performed using Matlab/Simulink and the UAV Toolbox Support Package for PX4 Autopilots. Performance evaluation and comparison studies are performed between the hand‑tuned and RL‑based tuned approaches. The results show that the RL‑based controller parameters are adjusted during flights, achieving the smallest attitude errors and thus significantly improving attitude tracking performance compared to the hand‑tuned approach.
Authors: S. Martínez-Rozas, D. Alejo, F. Caballero, L. Merino, M. A. Pérez-Cutiño, F. Rodriguez, V. Sánchez-Canales, I. Ventura, J. M. Díaz-Bañez
Abstract: This paper presents a novel approach to efficiently parameterize and estimate the state of a hanging tether for path and trajectory planning of a UGV tied to a UAV in a marsupial configuration. Most implementations in the state of the art assume a taut tether or make use of the catenary curve to model the shape of the hanging tether. The catenary model is complex to compute and must be instantiated thousands of times during the planning process, becoming a time‑consuming task, while the taut tether assumption simplifies the problem, but might overly restrict the movement of the platforms. In order to accelerate the planning process, this paper proposes defining an analytical model to efficiently compute the hanging tether state, and a method to get a tether state parameterization free of collisions. We exploit the existing similarity between the catenary and parabola curves to derive analytical expressions of the tether state.
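The similarity between catenary and parabola that the abstract above relies on can be checked numerically. The Python sketch below assumes the simplest possible case, which is not the paper's general setting: both tether ends at the same height, horizontal half-span `a`, and catenary parameter `c`. The parabola is the second-order Taylor expansion of the catenary, so the two curves agree closely when `a / c` is small and diverge as the tether becomes slack. All function names are hypothetical.

```python
import math

def catenary_y(x, a, c):
    """Catenary through (-a, 0) and (a, 0) with parameter c (hangs below zero)."""
    return c * (math.cosh(x / c) - math.cosh(a / c))

def parabola_y(x, a, c):
    """Second-order Taylor approximation of the same curve."""
    return (x * x - a * a) / (2.0 * c)

def max_gap(a, c, samples=201):
    """Largest vertical distance between the two curves over the span [-a, a]."""
    xs = [-a + 2.0 * a * i / (samples - 1) for i in range(samples)]
    return max(abs(catenary_y(x, a, c) - parabola_y(x, a, c)) for x in xs)
```

For example, `max_gap(1.0, 5.0)` is well below a millimetre-scale fraction of the span, while `max_gap(1.0, 0.5)` is a large share of the sag, so the quality of the approximation depends on how slack the tether is. The parabola avoids the hyperbolic functions, which is the kind of saving that matters when the curve is evaluated thousands of times during planning.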
Authors: Nhat-Tan Do, Nhi Ngoc-Yen Nguyen, Dieu-Phuong Nguyen, Trong-Hop Do
Abstract: Multi‑object tracking (MOT) in UAV‑based video is challenging due to variations in viewpoint, low resolution, and the presence of small objects. While other research on MOT dedicated to aerial videos primarily focuses on the academic aspect by developing sophisticated algorithms, there is a lack of attention to the practical aspect of these systems. In this paper, we propose a novel real‑time MOT framework that integrates Apache Kafka and Apache Spark for efficient and fault‑tolerant video stream processing, along with state‑of‑the‑art deep learning models YOLOv8/YOLOv10 and BYTETRACK/BoTSORT for accurate object detection and tracking. Our work highlights the importance of not only advanced algorithms but also their integration with scalable and distributed systems. By leveraging these technologies, our system achieves a HOTA of 48.14 and a MOTA of 43.51 on the VisDrone2019‑MOT test set while maintaining a real‑time processing speed of 28 FPS on a single GPU. Our work demonstrates the potential of big data technologies and deep learning for addressing the challenges of MOT in UAV applications.
Authors: Abdullahi Isa Ahmed, Jamal Bentahar, El Mehdi Amhoud
Abstract: As next‑generation Internet of Things (NG‑IoT) networks continue to grow, the number of connected devices is rapidly increasing, along with their energy demands, creating challenges for resource management and sustainability. Energy‑efficient communication, particularly for power‑limited IoT devices, is therefore a key research focus. In this paper, we study Long Range (LoRa) networks supported by multiple unmanned aerial vehicles (UAVs) in an uplink data collection scenario. Our objective is to maximize system energy efficiency by jointly optimizing transmission power, spreading factor, bandwidth, and user association. To address this challenging problem, we first model it as a partially observable stochastic game (POSG) to account for dynamic channel conditions, end device mobility, and partial observability at each UAV. We then propose a two‑stage solution: a channel‑aware matching algorithm for end device‑UAV association and a cooperative multi‑agent reinforcement learning (MARL) based multi‑agent proximal policy optimization (MAPPO) framework for resource allocation under centralized training with decentralized execution (CTDE). Simulation results show that our proposed approach significantly outperforms conventional off‑policy and on‑policy MARL algorithms.
Authors: Ryan Barker
Abstract: The integration of Artificial Intelligence (AI) and Machine Learning (ML) in next‑generation wireless communication systems has become a cornerstone for advancing intelligent, adaptive, and scalable networks. This reading report examines key innovations in dynamic spectrum sensing (DSS), beginning with the foundational DeepSense framework, which uses convolutional neural networks (CNNs) and spectrogram‑based analysis for real‑time wideband spectrum monitoring. Building on this groundwork, it highlights advancements such as DeepSweep and Wideband Signal Stitching, which address the challenges of scalability, latency, and dataset diversity through parallel processing, semantic segmentation, and robust data augmentation strategies. The report then explores Open Radio Access Networks (ORAN), focusing on AI/ML‑driven enhancements for UAV experimentation, digital twin‑based optimization, network slicing, and self‑healing xApp development. By bridging AI‑based DSS methodologies with ORAN's open, vendor‑neutral architecture, these studies underscore the potential of software‑defined, intelligent infrastructures in enabling efficient, resilient, and self‑optimizing networks for 5G/6G ecosystems. Through this synthesis, the report highlights AI's transformative role in shaping the future of wireless communication and autonomous systems.
Authors: Tae-Won Ban, Kyu-Min Kang, Bang Chul Jung
Abstract: This paper introduces a novel unmanned aerial vehicle (UAV) chasing system designed to track and chase unauthorized UAVs, significantly enhancing the effectiveness of their neutralization.
Authors: Md Safwan Mondal, Subramanian Ramasamy, Pranav Bhounsule
Abstract: Integrating Unmanned Aerial Vehicles (UAVs) with Unmanned Ground Vehicles (UGVs) provides an effective solution for persistent surveillance in disaster management. UAVs excel at covering large areas rapidly, but their range is limited by battery capacity. UGVs, though slower, can carry larger batteries for extended missions. By using UGVs as mobile recharging stations, UAVs can extend mission duration through periodic refueling, leveraging the complementary strengths of both systems. To optimize this energy‑aware UAV‑UGV cooperative routing problem, we propose a planning framework that determines optimal routes and recharging points between a UAV and a UGV. Our solution employs a deep reinforcement learning (DRL) framework built on an encoder‑decoder transformer architecture with multi‑head attention mechanisms. This architecture enables the model to sequentially select actions for visiting mission points and coordinating recharging rendezvous between the UAV and UGV. The DRL model is trained to minimize the age periods (the time gap between consecutive visits) of mission points, ensuring effective surveillance. We evaluate the framework across various problem sizes and distributions, comparing its performance against heuristic methods and an existing learning‑based model. Results show that our approach consistently outperforms these baselines in both solution quality and runtime. Additionally, we demonstrate the DRL policy's applicability in a real‑world disaster scenario as a case study and explore its potential for online mission planning to handle dynamic changes. Adapting the DRL policy for priority‑driven surveillance highlights the model's generalizability for real‑time disaster response.
Authors: Lavanya Ratnabala, Aleksey Fedoseev, Robinroy Peter, Dzmitry Tsetserukou
Abstract: This paper addresses the challenge of decentralized task allocation within heterogeneous multi‑agent systems operating under communication constraints. We introduce a novel framework that integrates graph neural networks (GNNs) with a centralized training and decentralized execution (CTDE) paradigm, further enhanced by a tailored Proximal Policy Optimization (PPO) algorithm for multi‑agent deep reinforcement learning (MARL). Our approach enables unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to dynamically allocate tasks efficiently without necessitating central coordination in a 3D grid environment. The framework minimizes total travel time while simultaneously avoiding conflicts in task assignments. For the cost calculation and routing, we employ reservation‑based A* and R* path planners. Experimental results show that our method achieves a 92.5% conflict‑free success rate, with only a 7.49% performance gap compared to the centralized Hungarian method, while outperforming the heuristic decentralized baseline based on a greedy approach. Additionally, the framework scales to up to 20 agents with an allocation processing time of 2.8 s and is robust in responding to dynamically generated tasks, underscoring its potential for real‑world applications in complex multi‑agent scenarios.
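The abstract above compares against a centralized optimal assignment and a greedy baseline. The Python sketch below shows how such a comparison can be set up on a small cost matrix. It makes several assumptions: the optimum is found by brute force over permutations as a stand-in for the Hungarian method (which reaches the same optimal cost in polynomial time), the greedy rule lets agents pick their cheapest free task in index order, and the gap is defined as relative excess cost. The abstract does not state how its performance gap is computed, and all names here are hypothetical.

```python
from itertools import permutations

def optimal_assignment(cost):
    """Brute-force minimum-cost assignment (stand-in for the Hungarian method)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))

def greedy_assignment(cost):
    """Each agent, in index order, takes its cheapest task that is still free."""
    n = len(cost)
    taken, assign = set(), []
    for i in range(n):
        j = min((t for t in range(n) if t not in taken), key=lambda t: cost[i][t])
        taken.add(j)
        assign.append(j)
    return assign, sum(cost[i][assign[i]] for i in range(n))

def gap_percent(cost):
    """Relative excess cost of greedy over the optimum, in percent."""
    _, opt = optimal_assignment(cost)
    _, grd = greedy_assignment(cost)
    return 100.0 * (grd - opt) / opt

# Rows are agents, columns are tasks, entries are travel costs (made-up values).
example_cost = [[1, 2, 3],
                [2, 4, 6],
                [3, 6, 9]]
```

On `example_cost` the greedy rule assigns tasks `[0, 1, 2]` at cost 14, while the optimum is `[2, 1, 0]` at cost 10, a 40% gap. The brute-force search is only practical for a handful of agents.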
Authors: Yuxuan Zhang, Luciano Sebastian Martinez-Rau, Quynh Nguyen Phuong Vu, Bengt Oelmann, Sebastian Bader
Abstract: Structural Health Monitoring (SHM) ensures the safety and longevity of infrastructure by enabling timely damage detection. Vision‑based crack detection, combined with UAVs, addresses the limitations of traditional sensor‑based SHM methods but requires the deployment of efficient deep learning models on resource‑constrained devices. This study evaluates two lightweight convolutional neural network models, MobileNetV1x0.25 and MobileNetV2x0.5, across TensorFlow, PyTorch, and Open Neural Network Exchange platforms using three quantization techniques: dynamic quantization, post‑training quantization (PTQ), and quantization‑aware training (QAT). Results show that QAT consistently achieves near‑floating‑point accuracy, such as an F1‑score of 0.8376 for MBNV2x0.5 with Torch‑QAT, while maintaining efficient resource usage. PTQ significantly reduces memory and energy consumption but suffers from accuracy loss, particularly in TensorFlow. Dynamic quantization preserves accuracy but faces deployment challenges on PyTorch. By leveraging QAT, this work enables real‑time, low‑power crack detection on UAVs, enhancing safety, scalability, and cost‑efficiency in SHM applications, while providing insights into balancing accuracy and efficiency across different platforms for autonomous inspections.
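The quantization techniques compared in the abstract above all build on mapping floating-point values to 8-bit integers. The Python sketch below shows one common form of that mapping, affine (asymmetric) int8 quantization with a scale and zero point derived from the observed value range. It is a stdlib-only illustration, not the API of TensorFlow, PyTorch, or ONNX, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Map a list of floats to int8 codes using a range-based scale and zero point."""
    # Include 0.0 in the range so that zero is exactly representable.
    lo, hi = min(values + [0.0]), max(values + [0.0])
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(-128 - lo / scale)
    # Round to the nearest code and clamp to the int8 range.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [scale * (x - zero_point) for x in q]
```

A round trip such as `dequantize(*quantize_int8([-0.4, 0.0, 0.1, 0.7, 1.2]))` returns values within about half a quantization step of the inputs. Post-training quantization applies a mapping of this kind after training, whereas quantization-aware training simulates the rounding during training so the weights can adapt to it.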
Authors: Eslam Eldeeb, Hirley Alves
Abstract: Reinforcement learning (RL) is a promising enabler for future beyond‑5G and 6G systems. Its main advantage lies in its robust model‑free decision‑making in complex and large‑dimension wireless environments. However, most existing RL frameworks rely on online interaction with the environment, which might not be feasible due to safety and cost concerns. Another problem with online RL is the lack of scalability of the designed algorithm with dynamic or new environments. This work proposes a novel, resilient, few‑shot meta‑offline RL algorithm combining offline RL using conservative Q‑learning (CQL) and meta‑learning using model‑agnostic meta‑learning (MAML). The proposed algorithm can train RL models using static offline datasets without any online interaction with the environments. In addition, with the aid of MAML, the proposed model can be scaled up to new unseen environments. We showcase the proposed algorithm for optimizing an unmanned aerial vehicle (UAV)'s trajectory and scheduling policy to minimize the age‑of‑information (AoI) and transmission power of limited‑power devices. Numerical results show that the proposed few‑shot meta‑offline RL algorithm converges faster than baseline schemes, such as deep Q‑networks and CQL. In addition, it is the only algorithm that can achieve optimal joint AoI and transmission power using an offline dataset with few shots of data points and is resilient to network failures due to unprecedented environmental changes.
Authors: Jiuhong Xiao, Giuseppe Loianno
Abstract: Geo‑localization is an essential component of Unmanned Aerial Vehicle (UAV) navigation systems to ensure precise absolute self‑localization in outdoor environments. To address the challenges of GPS signal interruptions or low illumination, Thermal Geo‑localization (TG) employs aerial thermal imagery to align with reference satellite maps to accurately determine the UAV's location. However, existing TG methods lack uncertainty measurement in their outputs, compromising system robustness in the presence of textureless or corrupted thermal images, self‑similar or outdated satellite maps, geometric noises, or thermal images exceeding satellite maps. To overcome these limitations, this paper presents UASTHN, a novel approach for Uncertainty Estimation (UE) in Deep Homography Estimation (DHE) tasks for TG applications. Specifically, we introduce a novel Crop‑based Test‑Time Augmentation (CropTTA) strategy, which leverages the homography consensus of cropped image views to effectively measure data uncertainty. This approach is complemented by Deep Ensembles (DE) employed for model uncertainty, offering comparable performance with improved efficiency and seamless integration with any DHE model. Extensive experiments across multiple DHE models demonstrate the effectiveness and efficiency of CropTTA in TG applications. Analysis of detected failure cases underscores the improved reliability of CropTTA under challenging conditions. Finally, we demonstrate the capability of combining CropTTA and DE for a comprehensive assessment of both data and model uncertainty. Our research provides profound insights into the broader intersection of localization and uncertainty estimation. The code and models are publicly available.
Authors: Saeed Karimi-Bidhendi, Giovanni Geraci, Hamid Jafarkhani
Abstract: We present a general mathematical framework for optimizing cell deployment and antenna configuration in wireless networks, inspired by quantization theory. Unlike traditional methods, our framework supports networks with deterministically located nodes, enabling modeling and optimization under controlled deployment scenarios. We demonstrate our framework through two applications: joint fine‑tuning of antenna parameters across base stations (BSs) to optimize network coverage, capacity, and load balancing, and the strategic deployment of new BSs, including the optimization of their locations and antenna settings. These optimizations are conducted for a heterogeneous 3D user population, comprising ground users (GUEs) and uncrewed aerial vehicles (UAVs) along aerial corridors. Our case studies highlight the framework's versatility in optimizing performance metrics such as the coverage‑capacity trade‑off and capacity per region. Our results confirm that optimizing the placement and orientation of additional BSs consistently outperforms approaches focused solely on antenna adjustments, regardless of GUE distribution. Furthermore, joint optimization for both GUEs and UAVs significantly enhances UAV service without severely affecting GUE performance.
Authors: Taneya Sharma, Seyed Ahmad Soleymani, Mohammad Shojafar, Rahim Tafazolli
Abstract: This paper introduces a secure communication architecture for Unmanned Aerial Vehicles (UAVs) and ground stations in 5G networks, addressing critical challenges in network security. The proposed solution integrates the Advanced Encryption Standard (AES) with Elliptic Curve Cryptography (ECC) and CRYSTALS‑Kyber for key encapsulation, offering a hybrid cryptographic approach. By incorporating CRYSTALS‑Kyber, the framework mitigates vulnerabilities in ECC against quantum attacks, positioning it as a quantum‑resistant alternative. The architecture is based on a server‑client model, with UAVs functioning as clients and the ground station acting as the server. The system was rigorously evaluated in both VPN and 5G environments. Experimental results confirm that CRYSTALS‑Kyber delivers strong protection against quantum threats with minimal performance overhead, making it highly suitable for UAVs with resource constraints. Moreover, the proposed architecture integrates an Artificial Intelligence (AI)‑based Intrusion Detection System (IDS) to further enhance security. In performance evaluations, the IDS demonstrated strong results across multiple models with XGBoost, particularly in more demanding scenarios, outperforming other models with an accuracy of 97.33% and an AUC of 0.94. These findings underscore the potential of combining quantum‑resistant encryption mechanisms with AI‑driven IDS to create a robust, scalable, and secure communication framework for UAV networks, particularly within the high‑performance requirements of 5G environments.
Authors: Arthur Amorim, Max Taylor, Trevor Kann, William L. Harrison, Gary T. Leavens, Lance Joneckis
Abstract: A compromised system component can issue message sequences that are legal while also leading the overall system into unsafe states. Such stealthy attacks are challenging to characterize, because message interfaces in standard languages specify each individual message separately but do not specify safe sequences of messages. We present initial results from ongoing work applying refined multiparty session types as a mechanism for expressing and enforcing proper message usage to exclude unsafe sequences. We illustrate our approach by using refined multiparty session types to mitigate safety and security issues in the MAVLink protocol commonly used in UAVs.
Authors: Xudong Wang, Yaxin Peng, Chaomin Shen
Abstract: Object detection in unmanned aerial vehicle (UAV) remote sensing images poses significant challenges due to unstable image quality, small object sizes, complex backgrounds, and environmental occlusions. Small objects, in particular, occupy small portions of images, making their accurate detection highly difficult. Existing multi‑scale feature fusion methods address these challenges to some extent by aggregating features across different resolutions. However, they often fail to effectively balance the classification and localization performance for small objects, primarily due to insufficient feature representation and imbalanced network information flow. In this paper, we propose a novel feature fusion framework specifically designed for UAV object detection tasks to enhance both localization accuracy and classification performance. The proposed framework integrates hybrid upsampling and downsampling modules, enabling feature maps from different network depths to be flexibly adjusted to arbitrary resolutions. This design facilitates cross‑layer connections and multi‑scale feature fusion, ensuring improved representation of small objects. Our approach leverages hybrid downsampling to enhance fine‑grained feature representation, improving spatial localization of small targets, even under complex conditions. Simultaneously, the upsampling module aggregates global contextual information, optimizing feature consistency across scales and enhancing classification robustness in cluttered scenes. Experimental results on two public UAV datasets demonstrate the effectiveness of the proposed framework. Integrated into the YOLO‑v10 model, our method achieves a 2% improvement in average precision (AP) compared to the baseline YOLO‑v10 model, while maintaining the same number of parameters. These results highlight the potential of our framework for accurate and efficient UAV object detection.
Authors: Zhuangzhuang Cui, Cesar Briso-Rodriguez, Ke Guan, Cesar Calvo-Ramirez, Bo Ai, Zhangdui Zhong
Abstract: In the design of unmanned aerial vehicle (UAV) wireless communications, a better understanding of propagation characteristics and an accurate channel model are required. Measurements and comprehensive analysis for the UAV‑based air‑ground (AG) propagation channel in the vertical dimension are presented in this letter. Based on the measurement data at 1 and 4 GHz, the large‑scale and small‑scale channel parameters are extracted for the line‑of‑sight (LOS) and non‑LOS cases. An altitude‑dependent path loss model is proposed. Furthermore, shadow fading and fast fading are statistically analyzed to comprehensively describe the fading behavior. Our results will be useful in the modeling of AG channels and the performance analysis for UAV‑enabled wireless communication systems.
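To make the idea of an altitude‑dependent path loss model concrete, the sketch below uses the standard log‑distance form PL(d) = PL0 + 10 n log10(d/d0) and lets the exponent n vary with UAV altitude. The linear dependence of n on altitude and every numeric default (`pl0_db`, `n0`, `slope`, `n_min`) are assumptions made for illustration only; the letter's fitted model and parameters are not given in the abstract.

```python
import math

def path_loss_db(distance_m, altitude_m, pl0_db=40.0, d0_m=1.0,
                 n0=3.0, slope=-0.01, n_min=2.0):
    """Log-distance path loss with a hypothetical altitude-dependent exponent.

    n(h) = max(n_min, n0 + slope * h) is an assumed form: the exponent
    decreases with altitude and is floored at the free-space value of 2.
    """
    n = max(n_min, n0 + slope * altitude_m)
    return pl0_db + 10.0 * n * math.log10(distance_m / d0_m)

# Same 100 m link distance evaluated at ground level and at 100 m altitude.
ground = path_loss_db(100.0, 0.0)
elevated = path_loss_db(100.0, 100.0)
```

Shadow fading would typically be added as a zero‑mean Gaussian term in dB on top of this deterministic part.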
Authors: Bazeela Banday, Chandan Kumar Sah, Jishnu Keshavan
Abstract: This paper presents an optic flow‑guided approach for achieving soft landings by resource‑constrained unmanned aerial vehicles (UAVs) on dynamic platforms. An offline data‑driven linear model based on Koopman operator theory is developed to describe the underlying (nonlinear) dynamics of optic flow output obtained from a single monocular camera that maps to vehicle acceleration as the control input. Moreover, a novel adaptation scheme within the Koopman framework is introduced online to handle uncertainties such as unknown platform motion and ground effect, which exert a significant influence during the terminal stage of the descent process. Further, to minimize computational overhead, an event‑based adaptation trigger is incorporated into an event‑driven Model Predictive Control (MPC) strategy to regulate optic flow and track a desired reference. A detailed convergence analysis ensures global convergence of the tracking error to a uniform ultimate bound while ensuring Zeno‑free behavior. Simulation results demonstrate the algorithm's robustness and effectiveness in landing on dynamic platforms under ground effect and sensor noise, which compares favorably to non‑adaptive event‑triggered and time‑triggered adaptive schemes.
Authors: Zhang Liu, Dusit Niyato, Jiacheng Wang, Geng Sun, Lianfen Huang, Zhibin Gao, Xianbin Wang
Abstract: Lyapunov optimization theory has recently emerged as a powerful mathematical framework for solving complex stochastic optimization problems by transforming long‑term objectives into a sequence of real‑time short‑term decisions while ensuring system stability. This theory is particularly valuable in unmanned aerial vehicle (UAV)‑based low‑altitude economy (LAE) networking scenarios, where it could effectively address inherent challenges of dynamic network conditions, multiple optimization objectives, and stability requirements. Recently, generative artificial intelligence (GenAI) has garnered significant attention for its unprecedented capability to generate diverse digital content. Extending beyond content generation, in this paper, we propose a framework integrating generative diffusion models with reinforcement learning to address Lyapunov optimization problems in UAV‑based LAE networking. We begin by introducing the fundamentals of Lyapunov optimization theory and analyzing the limitations of both conventional methods and traditional AI‑enabled approaches. We then examine various GenAI models and comprehensively analyze their potential contributions to Lyapunov optimization. Subsequently, we develop a Lyapunov‑guided generative diffusion model‑based reinforcement learning framework and validate its effectiveness through a UAV‑based LAE networking case study. Finally, we outline several directions for future research.
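The way Lyapunov optimization turns a long‑term objective into per‑slot decisions can be sketched with the textbook drift‑plus‑penalty rule: a virtual queue tracks the backlog, and in each slot the action minimizing V * penalty - Q * service is chosen. The action set of (power, service) pairs and the weight `v` are hypothetical; the paper's diffusion‑model‑based RL framework is not reproduced here.

```python
def update_queue(q, arrival, service):
    """Standard queue recursion: Q(t+1) = max(Q(t) + a(t) - b(t), 0)."""
    return max(q + arrival - service, 0.0)

def drift_plus_penalty_action(q, actions, v):
    """Pick the (power, service) pair minimizing v * power - q * service.

    A large backlog q favors high service; a large v favors low power.
    """
    return min(actions, key=lambda a: v * a[0] - q * a[1])

# Hypothetical action set: stay idle, or spend 1 unit of power to serve 2 units.
actions = [(0.0, 0.0), (1.0, 2.0)]
q = 0.0
history = []
for arrival in [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]:
    power, service = drift_plus_penalty_action(q, actions, v=5.0)
    q = update_queue(q, arrival, service)
    history.append((power, q))
```

With these numbers the controller idles while the queue is short and starts transmitting once the backlog outweighs the power cost, which keeps the queue bounded.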
Authors: Micah Reich
Abstract: This article presents an error‑state Linear Quadratic Regulator (LQR) formulation for robust trajectory tracking in quadrotor Unmanned Aerial Vehicles (UAVs). The proposed approach leverages error‑state dynamics and employs exponential coordinates to represent orientation errors, enabling a linearized system representation for real‑time control. The control strategy integrates an LQR‑based full‑state feedback controller for trajectory tracking, combined with a cascaded bodyrate controller to handle actuator dynamics. Detailed derivations of the error‑state dynamics, the linearization process, and the controller design are provided, highlighting the applicability of the method for precise and stable quadrotor control in dynamic environments.
Authors: Yuheng Qiu, Can Xu, Yutian Chen, Shibo Zhao, Junyi Geng, Sebastian Scherer
Abstract: Inertial odometry (IO) using only Inertial Measurement Units (IMUs) offers a lightweight and cost‑effective solution for Unmanned Aerial Vehicle (UAV) applications, yet existing learning‑based IO models often fail to generalize to UAVs due to the highly dynamic and non‑linear flight patterns that differ from pedestrian motion. In this work, we identify that the conventional practice of transforming raw IMU data to global coordinates undermines the observability of critical kinematic information in UAVs. By preserving the body‑frame representation, our method achieves substantial performance improvements, with a 66.7% average increase in accuracy across three datasets. Furthermore, explicitly encoding attitude information into the motion network results in an additional 23.8% improvement over prior results. Combined with a data‑driven IMU correction model (AirIMU) and an uncertainty‑aware Extended Kalman Filter (EKF), our approach ensures robust state estimation under aggressive UAV maneuvers without relying on external sensors or control inputs. Notably, our method also demonstrates strong generalizability to unseen data not included in the training set, underscoring its potential for real‑world UAV applications.
Authors: Jiawei Huang, Aimin Wang, Geng Sun, Jiahui Li, Jiacheng Wang, Dusit Niyato, Victor C. M. Leung
Abstract: Low Earth orbit (LEO) satellites can be used to assist maritime wireless communications for wide‑area data transmission. However, the extensive coverage of LEO satellites, combined with the openness of channels, can cause the communication process to suffer from security risks. This paper presents a LEO satellite‑maritime communication system assisted by low‑altitude unmanned aerial vehicle (UAV) friendly‑jamming to ensure data security at the physical layer. Since such a system requires balancing the conflicting performance metrics of secrecy rate and energy consumption of the UAV to meet evolving scenario demands, we formulate a secure satellite‑maritime communication multi‑objective optimization problem (SSMCMOP). In order to solve the dynamic and long‑term optimization problem, we reformulate it into a Markov decision process. We then propose a transformer‑enhanced soft actor‑critic (TransSAC) algorithm, which is a generative artificial intelligence‑enabled deep reinforcement learning approach to solve the reformulated problem, thus capturing strong temporal correlations and diversely exploring weights. Simulation results demonstrate that the TransSAC algorithm outperforms comparative approaches and algorithms, maximizing the secrecy rate while effectively minimizing the energy consumption of the UAV. Moreover, the results identify more suitable constraints for the system.
Authors: Talha Azfar, Kaicong Huang, Ruimin Ke
Abstract: Edge sensing and computing is rapidly becoming part of intelligent infrastructure architecture, leading to operational reliance on such systems in disaster or emergency situations. In such scenarios, there is a high chance of power supply failure due to power grid issues, and of communication system issues due to base stations losing power or being damaged by the elements, e.g., flooding or wildfires. Mobile edge computing in the form of unmanned aerial vehicles (UAVs) has been proposed to provide computation offloading from these devices to conserve their battery, while the use of UAVs as relay network nodes has also been investigated previously. This paper considers the use of UAVs with further constraints on power and connectivity to prolong the life of the network while also ensuring that the data is received from the edge nodes in a timely manner. Reinforcement learning is used to investigate numerous scenarios of various levels of power and communication failure. This approach is able to identify the device most likely to fail in a given scenario, thus providing priority guidance for maintenance personnel. The evacuations of a rural town and urban downtown area are also simulated to demonstrate the effectiveness of the approach at extending the life of the most critical edge devices.
Authors: Subhrajit Barick, Chetna Singhal
Abstract: Mobile edge computing (MEC) is a promising technology to meet the increasing demands and computing limitations of complex Internet of Things (IoT) devices. However, implementing MEC in urban environments can be challenging due to factors like high device density, complex infrastructure, and limited network coverage. Network congestion and connectivity issues can adversely affect user satisfaction. Hence, in this article, we use an unmanned aerial vehicle (UAV)‑assisted collaborative MEC architecture to facilitate task offloading of IoT devices in urban environments. We utilize the combined capabilities of UAVs and ground edge servers (ESs) to maximize user satisfaction and thereby also maximize the service provider's (SP) profit. We formulate IoT task offloading as a joint IoT‑UAV‑ES association and UAV‑network topology optimization problem. Because the problem is NP‑hard, we break it into two subproblems: offload strategy optimization and UAV topology optimization. We develop a Three‑sided Matching with Size and Cyclic preference (TMSC) based task offloading algorithm to find a stable association between IoT devices, UAVs, and ESs that achieves the system objective. We also propose a K‑means based iterative algorithm to decide the minimum number of UAVs and their positions needed to provide offloading services to the maximum number of IoT devices in the system. Finally, we demonstrate the efficacy of the proposed task offloading scheme over benchmark schemes through simulation‑based evaluation. The proposed scheme outperforms the benchmarks by 19%, 12%, and 25% on average in terms of percentage of served IoT devices, average user satisfaction, and SP profit, respectively, with 25% fewer UAVs, making it an effective solution to support IoT task requirements in urban environments using a UAV‑assisted MEC architecture.
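A minimal sketch of how a K‑means based iteration could choose the number and positions of UAVs: run Lloyd's algorithm for increasing k and stop at the first k for which every device lies within a coverage radius of its nearest centroid. The coverage criterion, the deterministic initialization, and the names `kmeans`, `min_uavs`, and `radius` are assumptions for illustration; the abstract does not specify the authors' stopping rule, and the TMSC matching step is not shown.

```python
def dist2(p, c):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm, initialized from the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids

def min_uavs(points, radius, k_max=10):
    """Smallest k (up to k_max) whose centroids cover every point within radius."""
    for k in range(1, min(k_max, len(points)) + 1):
        centroids = kmeans(points, k)
        if all(min(dist2(p, c) for c in centroids) <= radius ** 2 for p in points):
            return k, centroids
    return None

# Hypothetical device positions forming two well-separated groups.
devices = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]
result = min_uavs(devices, radius=2.0)
```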
Authors: Guangjin Pan, Yuan Gao, Yilin Gao, Wenjun Yu, Zhiyong Zhong, Xiaoyu Yang, Xinyu Guo, Shugong Xu
Abstract: Wireless positioning technologies hold significant value for applications in autonomous driving, extended reality (XR), unmanned aerial vehicles (UAVs), and more. With the advancement of artificial intelligence (AI), leveraging AI to enhance positioning accuracy and robustness has emerged as a field full of potential. Driven by the requirements and functionalities defined in the 3rd Generation Partnership Project (3GPP) standards, AI/machine learning (ML)‑based cellular positioning is becoming a key technology to overcome the limitations of traditional methods. This paper presents a comprehensive survey of AI‑driven cellular positioning. We begin by reviewing the fundamentals of wireless positioning and AI models, analyzing their respective challenges and synergies. We provide a comprehensive review of the evolution of 3GPP positioning standards, with a focus on the integration of AI/ML in current and upcoming standard releases. Guided by the 3GPP‑defined taxonomy, we categorize and summarize state‑of‑the‑art (SOTA) research into two major classes: AI/ML‑assisted positioning and direct AI/ML‑based positioning. The former includes line‑of‑sight (LOS)/non‑line‑of‑sight (NLOS) detection, time of arrival (TOA)/time difference of arrival (TDOA) estimation, and angle prediction; the latter encompasses fingerprinting, knowledge‑assisted learning, and channel charting. Furthermore, we review representative public datasets and conduct performance evaluations of AI‑based positioning algorithms using these datasets. Finally, we conclude by summarizing the challenges and opportunities of AI‑driven wireless positioning.
Authors: Sankani Sarathchandra, Eslam Eldeeb, Mohammad Shehab, Hirley Alves, Konstantin Mikhaylov, Mohamed-Slim Alouini
Abstract: Age‑of‑information (AoI) and transmission power are crucial performance metrics in low‑energy wireless networks, where information freshness is of paramount importance. This study examines a power‑limited internet of things (IoT) network supported by a flying unmanned aerial vehicle (UAV) that collects data. Our aim is to optimize the UAV flight trajectory and scheduling policy to minimize a varying AoI and transmission power combination. To tackle this variation, this paper proposes a meta‑deep reinforcement learning (RL) approach that integrates deep Q‑networks (DQNs) with model‑agnostic meta‑learning (MAML). DQNs determine optimal UAV decisions, while MAML enables scalability across varying objective functions. Numerical results indicate that the proposed algorithm converges faster and adapts to new objectives more effectively than traditional deep RL methods, achieving minimal AoI and transmission power overall.
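The objective described here can be illustrated with the usual slotted AoI recursion: in each slot the scheduled device's age resets while every other device's age grows by one, and the cost is a weighted combination of average age and total transmission power. The reset value of 1, the weight `lam`, and the function names are assumptions chosen for this sketch, not details taken from the paper.

```python
def step_aoi(aoi, scheduled):
    """Advance one slot: the scheduled device resets to 1, the others age by 1.

    Pass scheduled=None when no device is served in the slot.
    """
    return [1 if i == scheduled else a + 1 for i, a in enumerate(aoi)]

def weighted_cost(aoi, powers, lam):
    """Hypothetical per-slot cost: mean AoI plus lam times total transmit power."""
    return sum(aoi) / len(aoi) + lam * sum(powers)

# Three devices; the UAV serves device 1 in this slot.
aoi = step_aoi([3, 5, 2], scheduled=1)
cost = weighted_cost(aoi, powers=[0.0, 0.2, 0.0], lam=10.0)
```

Changing `lam` changes the objective function, which is the kind of variation the meta‑learning component is meant to adapt to.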
Authors: Viktor Kozák, Karel Košnar, Jan Chudoba, Miroslav Kulich, Libor Přeučil
Abstract: Inspection systems utilizing unmanned aerial vehicles (UAVs) equipped with thermal cameras are increasingly popular for the maintenance of photovoltaic (PV) power plants. However, automation of the inspection task is a challenging problem as it requires precise navigation to capture images from optimal distances and viewing angles. This paper presents a novel localization pipeline that directly integrates PV module detection with UAV navigation, allowing precise positioning during inspection. The detections are used to identify the power plant structures in the image. These are associated with the power plant model and used to infer the UAV position relative to the inspected PV installation. We define visually recognizable anchor points for the initial association and use object tracking to discern global associations. Additionally, we present three different methods for visual segmentation of PV modules and evaluate their performance in relation to the proposed localization pipeline. The presented methods were verified and evaluated using custom aerial inspection data sets, demonstrating their robustness and applicability for real‑time navigation. Additionally, we evaluate the influence of the power plant model precision on the localization methods.
Authors: Mhd Ali Shehadeh, Jakub Kudela
Abstract: The Unmanned Aerial Vehicle (UAV) path planning problem is a complex optimization problem in the field of robotics. In this paper, we investigate the possible utilization of this problem in benchmarking global optimization methods. We devise a problem instance generator and pick 56 representative instances, which we compare to established benchmarking suites through Exploratory Landscape Analysis to show their uniqueness. For the computational comparison, we select twelve well‑performing global optimization techniques from both subfields of stochastic algorithms (evolutionary computation methods) and deterministic algorithms (Dividing RECTangles, or DIRECT‑type methods). The experiments were conducted in settings with varying dimensionality and computational budgets. The results were analyzed through several criteria (number of best‑found solutions, mean relative error, Friedman ranks) and utilized established statistical tests. The best‑ranking methods for the UAV problems were almost universally the top‑performing evolutionary techniques from recent competitions on numerical optimization at the Institute of Electrical and Electronics Engineers Congress on Evolutionary Computation. Lastly, we discuss the variable dimension characteristics of the studied UAV problems, which still remain largely under‑investigated.
Authors: Yuhan Hu, Yirong Sun, Yanjun Chen, Xinghao Chen, Xiaoyu Shen, Wei Zhang
Abstract: Unmanned Aerial Vehicles (UAVs) offer significant potential in dynamic, perception‑intensive tasks such as search and rescue and environmental monitoring; however, their effectiveness is severely restricted by conventional pre‑planned routing methods, which lack the flexibility to respond in real‑time to evolving task demands, unexpected disturbances, and localized view limitations in real‑world scenarios. To address this fundamental limitation, we introduce a novel multi‑agent reinforcement learning framework named Heterogeneous Graph Attention Multi‑agent Deep Deterministic Policy Gradient (HGAM), uniquely designed to enable adaptive real‑time coordination between mission UAVs (MUAVs) and charging UAVs (CUAVs). HGAM specifically addresses the previously unsolved challenge of enabling precise, decentralized continuous‑action coordination solely based on local, heterogeneous graph‑based observations. Extensive simulations demonstrate that HGAM substantially surpasses existing methods, achieving, for example, a 30% improvement in data collection coverage and a 20% increase in charging efficiency, providing crucial insights and foundations for the future deployment of intelligent, flexible UAV networks in complex, dynamic environments.
Authors: Abdul Saboor, Zhuangzhuang Cui, Evgenii Vinogradov, Sofie Pollin
Abstract: Path Loss (PL) is vital to evaluate the performance of Unmanned Aerial Vehicles (UAVs) as Aerial Base Stations (ABSs), particularly in urban environments with complex propagation due to various obstacles. Accurately modeling PL requires a generalized Probability of Line‑of‑Sight (PLoS) that can consider multiple obstructions. While the existing PLoS models mostly assume a simplified Manhattan grid with uniform building sizes and spacing, they overlook the real‑world variability in building dimensions. Furthermore, such models do not consider other obstacles, such as trees and streetlights, which may also impact the performance, especially in millimeter‑wave (mmWave) bands. This paper introduces a Manhattan Random Simulator (MRS) to estimate PLoS for UAV‑based communications in urban areas by incorporating irregular building shapes, non‑uniform spacing, and additional random obstacles to create a more realistic environment. Lastly, we present the PL differences with and without obstacles for standard urban environments and derive the empirical PL for these environments.
Authors: Abdul Saboor, Zhuangzhuang Cui, Evgenii Vinogradov, Sofie Pollin
Abstract: Accurate Probability of Line‑of‑Sight (PLoS) modeling is important in evaluating the performance of Unmanned Aerial Vehicle (UAV)‑based communication systems in urban environments, where real‑time communication and low latency are often major requirements. Existing PLoS models often rely on simplified Manhattan grid layouts using International Telecommunication Union (ITU)‑defined built‑up parameters, which may not reflect the randomness of real cities. Therefore, this paper introduces the Urban LoS Simulator (ULS) to model PLoS for three random city layouts with varying building sizes and shapes constructed using ITU built‑up parameters. Based on the ULS simulated data, we obtained the empirical PLoS for four standard urban environments across three different city layouts. Finally, we analyze how well Manhattan grid‑based models replicate PLoS results from random and real‑world layouts, providing insights into their applicability for time‑critical communication systems in urban IoT networks.
Authors: Gavin Jager, David Cornett, Gavin Glenn, Deniz Aykac, Christi Johnson, Robert Zhang, Ryan Shivers, David Bolme, Laura Davies, Scott Dolvin, Nell Barber, Joel Brogan, Nick Burchfield, Carl Dukes, Andrew Duncan, Regina Ferrell, Austin Garrett, Jim Goddard, Jairus Hines, Bart Murphy, Sean Pharris, Brandon Stockwell, Leanne Thompson, Matthew Yohe
Abstract: The state‑of‑the‑art in biometric recognition algorithms and operational systems has advanced quickly in recent years providing high accuracy and robustness in more challenging collection environments and consumer applications. However, the technology still suffers greatly when applied to non‑conventional settings such as those seen when performing identification at extreme distances or from elevated cameras on buildings or mounted to UAVs. This paper summarizes an extension to the largest dataset currently focused on addressing these operational challenges, and describes its composition as well as methodologies of collection, curation, and annotation.
Authors: Ozlem Ceviz, Sevil Sen, Pinar Sadioglu
Abstract: Flying Ad Hoc Networks (FANETs), which primarily interconnect Unmanned Aerial Vehicles (UAVs), present distinctive security challenges due to their distributed and dynamic characteristics, necessitating tailored security solutions. Intrusion detection in FANETs is particularly challenging due to communication costs and privacy concerns. While Federated Learning (FL) holds promise for intrusion detection in FANETs with its cooperative and decentralized model training, it also faces drawbacks such as large data requirements, power consumption, and time constraints. Moreover, the high speeds of nodes in dynamic networks like FANETs may disrupt communication among Intrusion Detection Systems (IDS). In response, our study explores the use of few‑shot learning (FSL) to effectively reduce the data required for intrusion detection in FANETs. The proposed approach, called Few‑shot Federated Learning‑based IDS (FSFL‑IDS), merges FL and FSL to tackle intrusion detection challenges such as privacy, power constraints, communication costs, and lossy links, demonstrating its effectiveness in identifying routing attacks in dynamic FANETs. This approach reduces the training time and sample size of both the local models and the global model, offering insights into reduced computation and communication costs and extended battery life. Furthermore, by employing FSL, which requires less data for training, IDS could be less affected by lossy links in FANETs.
Authors: Jinhui Pang, Jinglin He, Noureldin Mohamed Abdelaal Ahmed Mohamed, Changqing Lin, Zhihui Zhang, Xiaoshuai Hao
Abstract: Multi‑UAV air combat is a complex task involving multiple autonomous UAVs, an evolving field in both aerospace and artificial intelligence. This paper aims to enhance adversarial performance through collaborative strategies. Previous approaches predominantly discretize the action space into predefined actions, limiting UAV maneuverability and complex strategy implementation. Others simplify the problem to 1v1 combat, neglecting the cooperative dynamics among multiple UAVs. To address the high‑dimensional challenges inherent in six‑degree‑of‑freedom space and improve cooperation, we propose a hierarchical framework utilizing the Leader‑Follower Multi‑Agent Proximal Policy Optimization (LFMAPPO) strategy. Specifically, the framework is structured into three levels. The top level conducts a macro‑level assessment of the environment and guides execution policy. The middle level determines the angle of the desired action. The bottom level generates precise action commands for the high‑dimensional action space. Moreover, we optimize the state‑value functions by assigning distinct roles with the leader‑follower strategy to train the top‑level policy: followers estimate the leader's utility, promoting effective cooperation among agents. Additionally, a target selector, aligned with the UAVs' posture, assesses the threat level of targets. Finally, simulation experiments validate the effectiveness of our proposed method.
Authors: Yihao Dong, Muhayyu Ud Din, Francesco Lagala, Hailiang Kuang, Jianjun Sun, Siyuan Yang, Irfan Hussain, Shaoming He
Abstract: This paper introduces an innovative drone carrier concept for maritime port security and offshore rescue. The carrier operates as part of a heterogeneous system consisting of multiple Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs) to perform inspection and intervention tasks in GNSS‑denied or interrupted environments. The carrier, an electric catamaran measuring 4m by 7m, features a 4m by 6m deck supporting automated takeoff and landing for four DJI M300 drones, along with a 10kg‑payload manipulator operable in up to level 3 sea conditions. Utilizing an offshore gimbal camera for navigation, the carrier can autonomously navigate, approach and dock with non‑cooperative vessels, guided by an onboard camera, LiDAR, and Doppler Velocity Log (DVL) over a 3 km^2 area. UAVs equipped with onboard Ultra‑Wideband (UWB) technology execute mapping, detection, and manipulation tasks using a versatile gripper designed for wet, saline conditions. Additionally, two UAVs can coordinate to transport large objects to the manipulator or interact directly with them. These procedures are fully automated and were successfully demonstrated at the Mohammed Bin Zayed International Robotic Competition (MBZIRC2024), where the drone carrier, equipped with four UAVs and one manipulator, automatically accomplished the intervention tasks in sea‑level‑3 (wave height 1.25m) based on rough target information.
Authors: Branislava Jankovic, Sabina Jangirova, Waseem Ullah, Latif U. Khan, Mohsen Guizani
Abstract: Dangerous surroundings and difficult‑to‑reach landscapes introduce significant complications for adequate disaster management and recuperation. These problems can be solved by engaging unmanned aerial vehicles (UAVs) provided with embedded platforms and optical sensors. In this work, we focus on enabling onboard aerial image processing to ensure proper and real‑time disaster detection. Such a setting usually causes challenges due to the limited hardware resources of UAVs. However, privacy, connectivity, and latency issues can be avoided. We suggest a UAV‑assisted edge framework for disaster detection, leveraging our proposed model optimized for onboard real‑time aerial image classification. The optimization of the model is achieved using post‑training quantization techniques. To address the limited number of disaster cases in existing benchmark datasets and therefore ensure real‑world adoption of our model, we construct a novel dataset, DisasterEye, featuring disaster scenes captured by UAVs and individuals on‑site. Experimental results reveal the efficacy of our model, reaching high accuracy with lowered inference latency and memory use on both traditional machines and resource‑limited devices. This shows that the scalability and adaptability of our method make it a powerful solution for real‑time disaster management on resource‑constrained UAV platforms.
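Post‑training quantization, the optimization technique mentioned in this abstract, can be illustrated with a generic affine int8 scheme: a scale and zero point are derived from the observed weight range, weights are rounded to 8‑bit integers, and approximate values are recovered by the inverse mapping. This is a textbook sketch with hypothetical function names; the abstract does not say which quantization variant or toolchain the authors used.

```python
def quantize_int8(weights):
    """Affine quantization of a list of floats to the int8 range [-128, 127]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for constant weights
    zero_point = round(-128 - lo / scale)  # integer that represents lo as -128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floating-point weights."""
    return [(v - zero_point) * scale for v in q]

# Hypothetical weights; each value is stored in 1 byte instead of 4.
weights = [-0.5, 0.0, 0.25, 1.5]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
```

The reconstruction error per weight stays within about one quantization step (`scale`), which is the accuracy cost paid for the smaller memory footprint and faster integer inference.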
Authors: Yubo Yang, Tao Yang, Xiaofeng Wu, Bo Hu
Abstract: The rapid development of unmanned aerial vehicle (UAV) technology has spawned a wide variety of applications, such as emergency communications, regional surveillance, and disaster relief. Due to their limited battery capacity and processing power, multiple UAVs are often required for complex tasks. In such cases, a control center is crucial for coordinating their activities, which fits well with the federated learning (FL) framework. However, conventional FL approaches often focus on a single task, ignoring the potential of training multiple related tasks simultaneously. In this paper, we propose a UAV‑assisted multi‑task federated learning scheme, in which data collected by multiple UAVs can be used to train multiple related tasks concurrently. The scheme facilitates the training process by sharing feature extractors across related tasks and introduces a task attention mechanism to balance task performance and encourage knowledge sharing. To provide an analytical description of training performance, a convergence analysis of the proposed scheme is performed. Additionally, the optimal bandwidth allocation for UAVs under limited bandwidth conditions is derived to minimize communication time. Meanwhile, a UAV‑EV association strategy based on a coalition formation game is proposed. Simulation results validate the effectiveness of the proposed scheme in enhancing multi‑task performance and training speed.
Authors: Panayiota Valianti, Kleanthis Malialis, Panayiotis Kolios, Georgios Ellinas
Abstract: This work considers the problem of intercepting rogue drones targeting sensitive critical infrastructure facilities. While current interception technologies focus mainly on the jamming/spoofing tasks, the challenges of effectively locating and tracking rogue drones have not received adequate attention. Solving this problem and integrating with recently proposed interception techniques will enable a holistic system that can reliably detect, track, and neutralize rogue drones. Specifically, this work considers a team of pursuer UAVs that can search, detect, and track multiple rogue drones over a sensitive facility. The joint search and track problem is addressed through a novel multiagent reinforcement learning scheme to optimize the agent mobility control actions that maximize the number of rogue drones detected and tracked. The performance of the proposed system is investigated under realistic settings through extensive simulation experiments with varying number of agents demonstrating both its performance and scalability.
Authors: Ciem Cornelissen, Sam Leroux, Pieter Simoens
Abstract: Unmanned Aerial Vehicles (UAVs) combined with Hyperspectral imaging (HSI) offer potential for environmental and agricultural applications by capturing detailed spectral information that enables the prediction of invisible features like biochemical leaf properties. However, the data‑intensive nature of HSI poses challenges for remote devices, which have limited computational resources and storage. This paper introduces an Online Hyperspectral Simple Linear Iterative Clustering algorithm (OHSLIC) framework for real‑time tree phenotype segmentation. OHSLIC reduces inherent noise and computational demands through adaptive incremental clustering and a lightweight neural network, which phenotypes trees using leaf contents such as chlorophyll, carotenoids, and anthocyanins. A hyperspectral dataset is created using a custom simulator that incorporates realistic leaf parameters and light interactions. Results demonstrate that OHSLIC achieves superior regression accuracy and segmentation performance compared to pixel‑ or window‑based methods while significantly reducing inference time. The method's adaptive clustering enables dynamic trade‑offs between computational efficiency and accuracy, paving the way for scalable edge‑device deployment in HSI applications.
Authors: Joseanne Viana, Boris Galkin, Lester Ho, Holger Claussen
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly essential in various fields such as surveillance, reconnaissance, and telecommunications. This study aims to develop a learning algorithm for the path planning of UAV wireless communication relays, which can reduce storage requirements and accelerate Deep Reinforcement Learning (DRL) convergence. Assuming the system possesses terrain maps of the area and can estimate user locations using localization algorithms or direct GPS reporting, it can input these parameters into the learning algorithms to achieve optimized path planning performance. However, higher resolution terrain maps are necessary to extract topological information such as terrain height, object distances, and signal blockages. This requirement increases memory and storage demands on UAVs while also lengthening convergence times in DRL algorithms. Similarly, defining the telecommunication coverage map in UAV wireless communication relays using these terrain maps and user position estimations demands higher memory and storage utilization for the learning path planning algorithms. Our approach reduces path planning training time by applying a dimensionality reduction technique based on Principal Component Analysis (PCA), sample combination, Prioritized Experience Replay (PER), and the combination of Mean Squared Error (MSE) and Mean Absolute Error (MAE) loss calculations in the coverage map estimates, thereby enhancing a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The proposed solution reduces the convergence episodes needed for basic training by approximately four times compared to the traditional TD3.
Authors: Ruchita Singh, Sandeep Kumar
Abstract: Unmanned Aerial Vehicles (UAVs), commonly known as drones, are among the 21st century's most transformative technologies. Originally developed for military use, drones have been catapulted by advancements in materials, electronics, and software into multipurpose tools for a wide range of industries. This paper covers their history, taxonomy, architecture, navigation systems, and the activities that have branched from them. It explores important future trends like autonomous navigation, AI integration, and obstacle avoidance systems, emphasizing how they contribute to improving the efficiency and versatility of drones. It also looks at the major challenges (technical, environmental, economic, regulatory, and ethical) that limit the actual take‑up of drones, as well as trends that are likely to mitigate these obstacles in the future. This work offers a structured synthesis of existing studies and perspectives that enable insights about how drones will transform agriculture, logistics, healthcare, disaster management, and other areas, while also identifying new opportunities for innovation and development.
Authors: Minh Tu Nguyen, Van Truong Hoang, Manh Duong Phung, Van Hoa Doan
Abstract: This paper investigates an adaptive sliding‑mode control for an integrated UAV autopilot and guidance system. First, a two‑dimensional mathematical model of the system is derived by considering the incorporated lateral dynamics and relative kinematics of the UAV and its potential target of attack. Then, a sliding surface is derived utilizing the zero‑effort miss distance. An adaptive twisting sliding mode control (ATSMC) algorithm is applied to the integrated system. Simulations and comparisons have been conducted. The results show that the proposed design performs well in interception precision, even with high nonlinearity, uncertainties, disturbances, and abrupt changes in the target's movement, thanks to the adaptation strategy.
Authors: Kenneth Bonilla-Ormachea, Horacio Cuizaga, Edwin Salcedo, Sebastian Castro, Sergio Fernandez-Testa, Misael Mamani
Abstract: Early detection of forest fires is crucial to minimizing the environmental and socioeconomic damage they cause. Indeed, a fire's duration directly correlates with the difficulty and cost of extinguishing it. For instance, a fire burning for 1 minute might require 1 liter of water to extinguish, while a 2‑minute fire could demand 100 liters, and a 10‑minute fire might necessitate 1,000 liters. On the other hand, existing fire detection systems based on novel technologies (e.g., remote sensing, PTZ cameras, UAVs) are often expensive and require human intervention, making continuous monitoring of large areas impractical. To address this challenge, this work proposes a low‑cost forest fire detection system that utilizes a central gateway device with computer vision capabilities to monitor a 360° field of view for smoke at long distances. A deep reinforcement learning agent enhances surveillance by dynamically controlling the camera's orientation, leveraging real‑time sensor data (smoke levels, ambient temperature, and humidity) from distributed IoT devices. This approach enables automated wildfire monitoring across expansive areas while reducing false positives.
Authors: Van Truong Hoang, Manh Duong Phung
Abstract: This work addresses the path planning problem for a group of unmanned aerial vehicles (UAVs) to maintain a desired formation during operation. Our approach formulates the problem as an optimization task by defining a set of fitness functions that not only ensure the formation but also include constraints for optimal and safe UAV operation. To optimize the fitness function and obtain a suboptimal path, we employ the teaching‑learning‑based optimization algorithm and then further enhance it with mechanisms such as mutation, elite strategy, and multi‑subject combination. A number of simulations and experiments have been conducted to evaluate the proposed method. The results demonstrate that the algorithm successfully generates valid paths for the UAVs to fly in a triangular formation for an inspection task.
Authors: Li-Hsiang Shen, Yi-Hsuan Chiu
Abstract: This paper investigates reconfigurable intelligent surface (RIS)‑assisted unmanned aerial vehicle (UAV) downlink networks with fluid antennas (FA), where the RIS enables non‑line‑of‑sight (NLoS) transmissions. Moreover, the FA is mounted on the UAV, offering dynamic antenna position adjustment and enhancing spatial diversity in addition to UAV deployment. We aim at total downlink rate maximization while ensuring a minimum user rate requirement. We consider joint optimization of active UAV beamforming, passive RIS beamforming, UAV deployment and FA position adjustment. To address this complex problem, we propose the beamforming for RIS/UAV and FA‑UAV deployment (BRAUD) scheme, which employs alternative optimization, successive convex approximation (SCA) and the sequential rank‑one constraint relaxation (SROCR) method for the decomposed subproblems. Simulation results demonstrate the effectiveness of RIS‑FA‑UAV, achieving the highest rate among existing architectures without FA/UAV/RIS deployment and without proper beamforming. Moreover, BRAUD achieves the highest rate among benchmarks of drop‑rank method, heuristic optimizations and conventional zero‑forcing beamforming as well as random method.
Authors: Junteng Mao, Ziye Jia, Hanzhi Gu, Chenyu Shi, Haomin Shi, Lijun He, Qihui Wu
Abstract: Unmanned aerial vehicles (UAVs) are efficient tools for diverse tasks such as electronic reconnaissance, agricultural operations and disaster relief. In complex three‑dimensional (3D) environments, path planning with obstacle avoidance for UAVs is a significant issue for security assurance. In this paper, we construct a comprehensive 3D scenario with obstacles and no‑fly zones for dynamic UAV trajectories. Moreover, a novel artificial potential field algorithm coupled with simulated annealing (APF‑SA) is proposed to tackle the robust path planning problem. APF‑SA modifies the attractive and repulsive potential functions and leverages simulated annealing to escape local minima and converge to globally optimal solutions. Simulation results demonstrate the effectiveness of APF‑SA, which enables efficient autonomous path planning for UAVs with obstacle avoidance.
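For orientation, the sketch below shows one step of a classic artificial potential field planner, the baseline that the abstract above builds on. It uses the textbook quadratic attractive potential and inverse‑distance repulsive potential, not the modified potentials or the simulated annealing stage of APF‑SA, whose details the abstract does not give. The function name `apf_step` and all gain values are assumptions for illustration.

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, rho0=2.0, step=0.1):
    """Move one fixed-length step along the negative gradient of the potential.

    Attractive potential: 0.5 * k_att * ||pos - goal||^2
    Repulsive potential:  0.5 * k_rep * (1/rho - 1/rho0)^2 for rho < rho0,
    where rho is the distance to an obstacle point and rho0 its influence radius.
    """
    pos = np.asarray(pos, dtype=float)
    goal = np.asarray(goal, dtype=float)
    force = k_att * (goal - pos)  # pulls the UAV toward the goal
    for obs in obstacles:
        diff = pos - np.asarray(obs, dtype=float)
        rho = np.linalg.norm(diff)
        if 0.0 < rho < rho0:
            # pushes the UAV away from obstacles inside the influence radius
            force += k_rep * (1.0 / rho - 1.0 / rho0) * diff / rho**3
    norm = np.linalg.norm(force)
    if norm == 0.0:
        return pos  # zero net force: a local minimum (or the goal itself)
    return pos + step * force / norm

# Usage: free space, then a nearby obstacle below the straight-line path.
p_free = apf_step([0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [])
p_obs = apf_step([0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [[0.5, -0.5, 0.0]])
```

The zero‑force branch is where a plain potential field planner can stall, which is the local‑minimum problem that the abstract says simulated annealing is used to escape.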
Authors: Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas
Abstract: This paper introduces an Unmanned Aerial Vehicle (UAV)‑enabled content management architecture that is suitable for critical content access in communities of users that are communication‑isolated during diverse types of disaster scenarios. The proposed architecture leverages a hybrid network of stationary anchor UAVs and mobile micro‑UAVs for ubiquitous content dissemination. The anchor UAVs are equipped with both vertical and lateral communication links, and they serve local users, while the mobile micro‑ferrying UAVs extend coverage across communities with increased mobility. The focus is on developing a content dissemination system that dynamically learns optimal caching policies to maximize content availability. The core innovation is an adaptive content dissemination framework based on distributed Federated Multi‑Armed Bandit learning. The goal is to optimize UAV content caching decisions based on geo‑temporal content popularity and user demand variations. A Selective Caching Algorithm is also introduced to reduce redundant content replication by incorporating inter‑UAV information sharing. This method strategically preserves the uniqueness in user preferences while amalgamating the intelligence across a distributed learning system. This approach improves the learning algorithm's ability to adapt to diverse user preferences. Functional verification and performance evaluation confirm the proposed architecture's utility across different network sizes, UAV swarms, and content popularity patterns.
Authors: Sajad Khatiri, Fatemeh Mohammadi Amin, Sebastiano Panichella, Paolo Tonella
Abstract: Despite the recent developments in obstacle avoidance and other safety features, autonomous Unmanned Aerial Vehicles (UAVs) continue to face safety challenges. No previous work investigated the relationship between the behavioral uncertainty of a UAV, characterized in this work by inconsistent or erratic control signal patterns, and the unsafety of its flight. By quantifying uncertainty, it is possible to develop a predictor for unsafety, which acts as a flight supervisor. We conducted a large‑scale empirical investigation of safety violations using PX4‑Autopilot, an open‑source UAV software platform. Our dataset of over 5,000 simulated flights, created to challenge obstacle avoidance, allowed us to explore the relation between uncertain UAV decisions and safety violations: up to 89% of unsafe UAV states exhibit significant decision uncertainty, and up to 74% of uncertain decisions lead to unsafe states. Based on these findings, we implemented Superialist (Supervising Autonomous Aerial Vehicles), a runtime uncertainty detector based on autoencoders, the state‑of‑the‑art technology for anomaly detection. Superialist achieved high performance in detecting uncertain behaviors with up to 96% precision and 93% recall. Despite the observed performance degradation when using the same approach for predicting unsafety (up to 74% precision and 87% recall), Superialist enabled early prediction of unsafe states up to 50 seconds in advance.
Authors: Raúl Arranz, David Carramiñana, Gonzalo de Miguel, Juan A. Besada, Ana M. Bernardos
Abstract: This paper summarizes in depth the state of the art of aerial swarms, covering both classical and new reinforcement‑learning‑based approaches for their management. Then, it proposes a hybrid AI system, integrating deep reinforcement learning in a multi‑agent centralized swarm architecture. The proposed system is tailored to perform surveillance of a specific area, searching and tracking ground targets, for security and law enforcement applications. The swarm is governed by a central swarm controller responsible for distributing different search and tracking tasks among the cooperating UAVs. Each UAV agent is then controlled by a collection of cooperative sub‑agents, whose behaviors have been trained using different deep reinforcement learning models, tailored for the different task types proposed by the swarm controller. More specifically, proximal policy optimization (PPO) algorithms were used to train the agents' behavior. In addition, several metrics to assess the performance of the swarm in this application were defined. The results obtained through simulation show that our system searches the operation area effectively, acquires the targets in a reasonable time, and is capable of tracking them continuously and consistently.
Authors: Giovanny Vazquez, Shengjie Zhai, Mei Yang
Abstract: Autonomous unmanned aerial vehicles (UAVs) integrated with edge computing capabilities empower real‑time data processing directly on the device, dramatically reducing latency in critical scenarios such as wildfire detection. This study underscores Transfer Learning's (TL) significance in boosting the performance of object detectors for identifying wildfire smoke and flames, especially when trained on limited datasets, and investigates the impact TL has on edge computing metrics, focusing on how TL‑enhanced You Only Look Once (YOLO) models perform in terms of inference time, power usage, and energy consumption on edge computing devices. This study utilizes the Aerial Fire and Smoke Essential (AFSE) dataset as the target, with the Flame and Smoke Detection Dataset (FASDD) and the Microsoft Common Objects in Context (COCO) dataset serving as source datasets. We explore a two‑stage cascaded TL method, utilizing D‑Fire or FASDD as initial stage target datasets and AFSE as the subsequent stage. Through fine‑tuning, TL significantly enhances detection precision, achieving up to 79.2% mean Average Precision (mAP@0.5), reduces training time, and increases model generalizability across the AFSE dataset. However, cascaded TL yielded no notable improvements and TL alone did not benefit the edge computing metrics evaluated. Lastly, this work found that YOLOv5n remains a powerful model when hardware acceleration is unavailable: it can process images nearly twice as fast as its newer counterpart, YOLO11n. Overall, the results affirm TL's role in augmenting the accuracy of object detectors while also illustrating that additional enhancements are needed to improve edge computing performance.
Authors: Shahab Ataei, Dipankar Maity, Debdipta Goswami
Abstract: Koopman‑based lifted linear identification has been widely used for data‑driven prediction and model predictive control (MPC) of nonlinear systems. It has found applications in flow control, soft robotics, and unmanned aerial vehicles (UAVs). For autonomous systems, this system identification method works by embedding the nonlinear system in a higher‑dimensional linear space and computing a finite‑dimensional approximation of the corresponding Koopman operator with the Extended Dynamic Mode Decomposition (EDMD) algorithm. EDMD is a data‑driven algorithm that estimates an approximate linear system by lifting the state data‑snapshots via nonlinear dictionary functions. For control systems, EDMD is further modified to utilize both state and control data‑snapshots to estimate a lifted linear predictor with control input. This article investigates how the estimation process is affected when the data is quantized. Specifically, we examine the fundamental connection between estimates of the linear predictor matrices obtained from unquantized data and those from quantized data via modified EDMD. Furthermore, using the law of large numbers, we demonstrate that, under a large data regime, the quantized estimate can be considered a regularized version of the unquantized estimate. We also explore the relationship between the two estimates in the finite data regime. We further analyze the effect of nonlinear lifting functions on this regularization due to quantization. The theory is validated through repeated numerical experiments conducted on several control systems. The effect of quantization on the MPC performance is also demonstrated.
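The EDMD‑with‑control estimate described in the abstract above can be sketched as a least‑squares fit on lifted snapshots. This is a minimal illustration under stated assumptions, not the article's implementation: the dictionary `lift` (state plus elementwise squares), the function names, and the plain rounding quantizer `quantize` are all hypothetical choices, and the article's actual quantization scheme may differ.

```python
import numpy as np

def lift(x):
    """Hypothetical dictionary: the state stacked with its elementwise squares."""
    return np.concatenate([x, x**2], axis=0)

def edmd_with_control(X, U, Y, dictionary=lift):
    """Least-squares fit of Psi(Y) ~ A Psi(X) + B U.

    X, Y: state snapshots (n x N) with Y the successor of X; U: inputs (m x N).
    Returns the lifted predictor matrices A and B.
    """
    PX, PY = dictionary(X), dictionary(Y)
    G = np.vstack([PX, U])            # stacked regressors [Psi(X); U]
    AB = PY @ np.linalg.pinv(G)       # minimizes ||PY - AB G||_F
    n_lift = PX.shape[0]
    return AB[:, :n_lift], AB[:, n_lift:]

def quantize(data, eps):
    """Uniform rounding quantizer with resolution eps (an assumed scheme)."""
    return eps * np.round(data / eps)

# Usage: a linear system with an identity dictionary is recovered exactly,
# and the same fit can be repeated on quantized snapshots for comparison.
rng = np.random.default_rng(1)
A0 = np.array([[0.9, 0.1], [0.0, 0.8]])
B0 = np.array([[0.0], [1.0]])
X = rng.normal(size=(2, 200))
U = rng.normal(size=(1, 200))
Y = A0 @ X + B0 @ U
A_hat, B_hat = edmd_with_control(X, U, Y, dictionary=lambda x: x)
A_q, B_q = edmd_with_control(quantize(X, 0.1), quantize(U, 0.1),
                             quantize(Y, 0.1), dictionary=lambda x: x)
```

Comparing `A_q` with `A_hat` for different values of `eps` gives a quick empirical view of how quantization perturbs the estimate, which is the relationship the article analyzes theoretically.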
Authors: Pengyu Wang, Zhaohua Yang, Jialu Li, Ling Shi
Abstract: Safety‑critical cyber‑physical systems (CPS), such as quadrotor UAVs, are particularly prone to cyber attacks, which can result in significant consequences if not detected promptly and accurately. During outdoor operations, the nonlinear dynamics of UAV systems, combined with non‑Gaussian noise, pose challenges to the effectiveness of conventional statistical and machine learning methods. To overcome these limitations, we present QUADFormer, an advanced attack detection framework for quadrotor UAVs leveraging a transformer‑based architecture. This framework features a residue generator that produces sequences sensitive to anomalies, which are then analyzed by the transformer to capture statistical patterns for detection and classification. Furthermore, an alert mechanism ensures UAVs can operate safely even when under attack. Extensive simulations and experimental evaluations highlight that QUADFormer outperforms existing state‑of‑the‑art techniques in detection accuracy.
Authors: Manzoor Ahmed, Ali Arshad Nasir, Mudassir Masood, Kamran Ali Memon, Khurram Karim Qureshi, Feroz Khan, Wali Ullah Khan, Fang Xu, Zhu Han
Abstract: Unmanned aerial vehicle (UAV)‑based integrated sensing and communication (ISAC) systems are poised to revolutionize next‑generation wireless networks by enabling simultaneous sensing and communication (S&C). This survey comprehensively reviews UAV‑ISAC systems, highlighting foundational concepts, key advancements, and future research directions. We explore recent advancements in UAV‑based ISAC systems from various perspectives and objectives, including advanced channel estimation (CE), beam tracking, and system throughput optimization under joint S&C constraints. Additionally, we examine weighted sum rate (WSR) and sensing trade‑offs, delay and age of information (AoI) minimization, energy efficiency (EE), and security enhancement. These applications highlight the potential of UAV‑based ISAC systems to improve spectrum utilization, enhance communication reliability, reduce latency, and optimize energy consumption across diverse domains, including smart cities, disaster relief, and defense operations. The survey also features summary tables for comparative analysis of existing methodologies, emphasizing performance, limitations, and effectiveness in addressing various challenges. By synthesizing recent advancements and identifying open research challenges, this survey aims to be a valuable resource for developing efficient, adaptive, and secure UAV‑based ISAC systems.
Authors: Obed Morrison Atsu, Salmane Naoumi, Roberto Bomfin, Marwa Chafii
Abstract: This paper introduces a novel Multi‑Agent Reinforcement Learning (MARL) framework to enhance integrated sensing and communication (ISAC) networks using unmanned aerial vehicle (UAV) swarms as sensing radars. By framing the positioning and trajectory optimization of UAVs as a Partially Observable Markov Decision Process, we develop a MARL approach that leverages centralized training with decentralized execution to maximize the overall sensing performance. Specifically, we implement a decentralized cooperative MARL strategy to enable UAVs to develop effective communication protocols, thereby enhancing their environmental awareness and operational efficiency. Additionally, we augment the MARL solution with a transmission power adaptation technique to mitigate interference between the communicating drones and optimize the efficiency of the learned communication protocol. Despite the increased complexity, our solution demonstrates robust performance and adaptability across various scenarios, providing a scalable and cost‑effective enhancement for future ISAC networks.
Authors: Geng Sun, Weilong Ma, Jiahui Li, Zemin Sun, Jiacheng Wang, Dusit Niyato, Shiwen Mao
Abstract: The low‑altitude economy (LAE), driven by unmanned aerial vehicles (UAVs) and other aircraft, has revolutionized fields such as transportation, agriculture, and environmental monitoring. In the upcoming sixth‑generation (6G) era, UAV‑assisted mobile edge computing (MEC) is particularly crucial in challenging environments such as mountainous or disaster‑stricken areas. The computation task offloading problem is one of the key issues in UAV‑assisted MEC, primarily addressing the trade‑off between minimizing the task delay and the energy consumption of the UAV. In this paper, we consider a UAV‑assisted MEC system where the UAV carries the edge servers to facilitate task offloading for ground devices (GDs), and formulate a calculation delay and energy consumption multi‑objective optimization problem (CDECMOP) to simultaneously improve the performance and reduce the cost of the system. Then, by modeling the formulated problem as a multi‑objective Markov decision process (MOMDP), we propose a multi‑objective deep reinforcement learning (DRL) algorithm within an evolutionary framework to dynamically adjust the weights and obtain non‑dominated policies. Moreover, to ensure stable convergence and improve performance, we incorporate a target distribution learning (TDL) algorithm. Simulation results demonstrate that the proposed algorithm can better balance multiple optimization objectives and obtain superior non‑dominated solutions compared to other methods.
Authors: Hongming Chen, Biyu Ye, Xianqi Liang, Weiliang Deng, Ximin Lyu
Abstract: Aerial Manipulators (AMs) provide a versatile platform for various applications, including 3D printing, architecture, and aerial grasping missions. However, their operational speed is often sacrificed to uphold precision. Existing control strategies for AMs often regard the manipulator as a disturbance and employ robust control methods to mitigate its influence. This research focuses on elevating the precision of the end‑effector and enhancing the agility of aerial manipulator movements. We present a composite control scheme to address these challenges. Initially, a Nonlinear Disturbance Observer (NDOB) is utilized to compensate for internal coupling effects and external disturbances. Subsequently, manipulator dynamics are processed through a high pass filter to facilitate agile movements. By integrating the proposed control method into a fully autonomous delta‑arm‑based AM system, we substantiate the controller's efficacy through extensive real‑world experiments. The outcomes illustrate that the end‑effector can achieve accuracy at the millimeter level.
Authors: Daniel Rossi, Guido Borghi, Roberto Vezzani
Abstract: Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real‑time performance, such as aerial imaging with drones and UAVs for emergency responses. In this work, we introduce TakuNet, a novel light‑weight architecture which employs techniques such as depth‑wise convolutions and an early downsampling stem to reduce computational complexity while maintaining high accuracy. It leverages dense connections for fast convergence during training and uses 16‑bit floating‑point precision for optimization on embedded hardware accelerators. Experimental evaluation on two public datasets shows that TakuNet achieves near‑state‑of‑the‑art accuracy in classifying aerial images of emergency situations, despite its minimal parameter count. Real‑world tests on embedded devices, namely Jetson Orin Nano and Raspberry Pi, confirm TakuNet's efficiency, achieving more than 650 fps on the 15W Jetson board, making it suitable for real‑time AI processing on resource‑constrained platforms and advancing the applicability of drones in emergency scenarios. The code and implementation details are publicly released.
Authors: Yousef Emami, Hao Zhou, Luis Almeida, Kai Li
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly adopted in modern communication networks. However, challenges in decision‑making and digital modeling continue to impede their rapid advancement. Reinforcement Learning (RL) algorithms face limitations such as low sample efficiency and limited data versatility, further magnified in UAV communication scenarios. Moreover, Digital Twin (DT) modeling introduces substantial decision‑making and data management complexities. RL models, often integrated into DT frameworks, require extensive training data to achieve accurate predictions. In contrast to traditional approaches that focus on class boundaries, Diffusion Models (DMs), a new class of generative AI, learn the underlying probability distribution from the training data and can generate trustworthy new patterns based on this learned distribution. This paper explores the integration of DMs with RL and DT to effectively address these challenges. By combining the data generation capabilities of DMs with the decision‑making framework of RL and the modeling accuracy of DT, the integration improves the adaptability and real‑time performance of UAV communication. Moreover, the study shows how DMs can alleviate data scarcity, improve policy networks, and optimize dynamic modeling, providing a robust solution for complex UAV communication scenarios.
Authors: Van Truong Hoang
Abstract: This paper investigates path planning for multi‑copter uncrewed aerial vehicles (UAVs) cooperating in a formation to inspect surrounding surfaces. We first describe the problem as a joint objective cost for planning a path of the formation centroid working in a complicated space. The path planning algorithm, named the generalized particle swarm optimization (GEPSO) algorithm, is then presented to construct an optimal, flyable path while avoiding obstacles and satisfying the flying mission requirements. A path‑development scheme is then incorporated to generate a relevant path for each drone to maintain its position in the formation configuration. Simulations, comparisons, and experiments have been conducted to verify the proposed approach. Results show the feasibility of the proposed path‑planning algorithm with GEPSO.
Authors: Xiaoya Zheng, Geng Sun, Jiahui Li, Jiacheng Wang, Qingqing Wu, Dusit Niyato, Abbas Jamalipour
Abstract: The low‑altitude economy (LAE) plays an indispensable role in cargo transportation, healthcare, infrastructure inspection, and especially post‑disaster communication. Specifically, unmanned aerial vehicles (UAVs), as one of the core technologies of the LAE, can be deployed to provide communication coverage, facilitate data collection, and relay data for trapped users, thereby significantly enhancing the efficiency of post‑disaster response efforts. In this paper, we design an efficient and robust UAV‑swarm enabled collaborative self‑organizing network to facilitate post‑disaster communications. In this network, a ground device transmits data to UAV swarms, which then use the collaborative beamforming (CB) technique to form virtual antenna arrays and relay the data to a remote access point (AP) efficiently. We then formulate a rescue‑oriented post‑disaster transmission rate maximization optimization problem (RPTRMOP) and propose a two‑stage optimization approach to address it. In the first stage, the optimal traffic routing and the theoretical upper bound on the transmission rate of the network are derived. In the second stage, we transform the formulated RPTRMOP into a variant named V‑RPTRMOP, and a diffusion model‑enabled particle swarm optimization (DM‑PSO) algorithm is proposed to deal with the V‑RPTRMOP. Simulation results show the effectiveness of the proposed two‑stage optimization approach in improving the transmission rate of the constructed network, which demonstrates its great potential for post‑disaster communications. Moreover, the robustness of the constructed network is also validated by evaluating the impact of two unexpected situations on the system transmission rate.
Authors: Oleg Sautenkov, Yasheerah Yaqoot, Artem Lykov, Muhammad Ahsan Mustafa, Grik Tadevosyan, Aibek Akhmetkazy, Miguel Altamirano Cabrera, Mikhail Martynov, Sausar Karaf, Dzmitry Tsetserukou
Abstract: The UAV‑VLA (Visual‑Language‑Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with the Visual Language Model (VLM) and the powerful capabilities of GPT, UAV‑VLA enables users to generate general flight path‑and‑action plans through simple text requests. This system leverages the rich contextual information provided by satellite images, allowing for enhanced decision‑making and mission planning. The combination of visual analysis by VLM and natural language processing by GPT can provide the user with the path‑and‑action set, making aerial operations more efficient and accessible. The newly developed method showed a 22% difference in the length of the created trajectory and a mean error of 34.22 m, measured by Euclidean distance, in finding the objects of interest on a map using the K‑Nearest Neighbors (KNN) approach.
Authors: Kechong Ren, Li Gao, Qi Guan
Abstract: The convergence of drone delivery systems, virtual worlds, and blockchain has transformed logistics and supply chain management, providing a fast and environmentally friendly alternative to traditional ground transportation methods. To provide users with a real‑world experience, virtual service providers need to collect up‑to‑the‑minute delivery information from edge devices. To address this challenge, 1) a reinforcement learning approach is introduced to give drones fast training capabilities and the ability to autonomously adapt to new virtual scenarios for effective resource allocation; 2) a semantic communication framework for metaverses is proposed, which extracts semantic information to reduce the communication cost and incentivize the transmission of information for metaverse services; and 3) to ensure user information security, a lightweight authentication and key agreement scheme between the drone and the user is designed by introducing blockchain technology. In our experiments, the drone adaptation performance is improved by about 35%, and the local offloading rate can reach 90% as the number of base stations increases. The semantic communication system proposed in this paper is compared with the Cross Entropy baseline model. With blockchain technology introduced, the transaction throughput remains stable across different numbers of drones.
Authors: Junjie Wang, Fang Fang, Gangtao Han, Ning Wang, Xianbin Wang
Abstract: Secure communication is crucial in many emerging systems enabled by unmanned aerial vehicle (UAV) communication networks. To protect legitimate communication in a chaotic UAV environment, where both eavesdropping and jamming become straightforward for multiple adversaries with line‑of‑sight signal propagation, a new reliable and integrated physical layer security mechanism is proposed in this paper for a massive multiple‑input‑multiple‑output (MIMO) UAV system. Particularly, a physical layer fingerprint, also called a tag, is first embedded into each message for authentication purposes. We then propose to additionally reuse the tag as a reference to encode each message, enhancing confidentiality at a low cost. Specifically, we create a new dual‑reference symmetric tag generation mechanism by inputting an encoding‑insensitive feature of the plaintext along with the key into a hash function. At a legitimate receiver, an expected tag, reliable for decoding, can be symmetrically regenerated based on the received ciphertext, and authentication can be performed by comparing the regenerated reference tag to the received tag. However, an illegitimate receiver can only receive the fuzzy tag, which cannot be used to decode the received message. Additionally, we introduce artificial noise (AN) to degrade eavesdropping and further decrease message leakage. To verify the efficiency of our proposed tag‑based encoding (TBE) scheme, we formulate two optimization problems: ergodic sum secrecy rate maximization and authentication failure probability minimization. The power allocation solutions are derived by difference‑of‑convex (DC) programming and the Lagrange method, respectively. The simulation results demonstrate the superior performance of the proposed TBE approach compared to the prior AN‑aided tag embedding scheme.
Authors: Manzoor Ahmed, Fang Xu, Yuanlin Lyu, Aized Amin Soofi, Yongxiao Li, Feroz Khan, Wali Ullah Khan, Muhammad Sheraz, Teong Chee Chuah, Min Deng
Abstract: This comprehensive survey examines how Reconfigurable Intelligent Surfaces (RIS) revolutionize resource allocation in various network frameworks. It begins by establishing a theoretical foundation with an overview of RIS technologies, including passive RIS, active RIS, and Simultaneously Transmitting and Reflecting RIS (STAR‑RIS). The core of the survey focuses on RIS's role in optimizing resource allocation within Single‑Input Multiple‑Output (SIMO), Multiple‑Input Single‑Output (MISO), and Multiple‑Input Multiple‑Output (MIMO) systems. It further explores RIS integration in complex network environments, such as Heterogeneous Wireless Networks (HetNets) and Non‑Orthogonal Multiple Access (NOMA) frameworks. Additionally, the survey investigates RIS applications in advanced communication domains like Terahertz (THz) networks, Vehicular Communication (VC), and Unmanned Aerial Vehicle (UAV) communications, highlighting the synergy between RIS and Artificial Intelligence (AI) for enhanced network efficiency. Summary tables provide comparative insights into various schemes. The survey concludes with lessons learned, future research directions, and challenges, emphasizing critical open issues.
Authors: Wenwen Xie, Geng Sun, Bei Liu, Jiahui Li, Jiacheng Wang, Hongyang Du, Dusit Niyato, Dong In Kim
Abstract: Emerging technologies in sixth generation (6G) of wireless communications, such as terahertz communication and ultra‑massive multiple‑input multiple‑output, present promising prospects. Despite their high data rate potential, millimeter wave (mmWave) communications in urban low altitude economy (LAE) environments are constrained by challenges such as signal attenuation and multipath interference. In particular, in urban environments, mmWave communication experiences significant attenuation due to buildings, owing to its short wavelength, which necessitates developing innovative approaches to improve the robustness of such communications in LAE networking. In this paper, we explore the use of an unmanned aerial vehicle (UAV)‑carried intelligent reflecting surface (IRS) to support low altitude mmWave communication. Specifically, we consider a typical urban low altitude communication scenario where a UAV‑carried IRS establishes a line‑of‑sight (LoS) channel between the mobile users and a source user (SU) despite the presence of obstacles. Subsequently, we formulate an optimization problem aimed at maximizing the transmission rates and minimizing the energy consumption of the UAV by jointly optimizing the phase shifts of the IRS and the UAV trajectory. Given the non‑convex nature of the problem and its high dynamics, we propose a deep reinforcement learning‑based approach incorporating neural episodic control, long short‑term memory, and an IRS phase shift control method to enhance the stability and accelerate the convergence. Simulation results show that the proposed algorithm effectively resolves the problem and outperforms the benchmark algorithms across various performance metrics.
Authors: Hui Lin, Nan Li, Pengjuan Yao, Kexin Dong, Yuhan Guo, Danfeng Hong, Ying Zhang, Congcong Wen
Abstract: Remote sensing object detection is particularly challenging due to the high resolution, multi‑scale features, and diverse ground object characteristics inherent in satellite and UAV imagery. These challenges necessitate more advanced approaches for effective object detection in such environments. While deep learning methods have achieved remarkable success in remote sensing object detection, they typically rely on large amounts of labeled data. Acquiring sufficient labeled data, particularly for novel or rare objects, is both challenging and time‑consuming in remote sensing scenarios, limiting the generalization capabilities of existing models. To address these challenges, few‑shot learning (FSL) has emerged as a promising approach, aiming to enable models to learn new classes from limited labeled examples. Building on this concept, few‑shot object detection (FSOD) specifically targets object detection challenges in data‑limited conditions. However, the generalization capability of FSOD models, particularly in remote sensing, is often constrained by the complex and diverse characteristics of the objects present in such environments. In this paper, we propose the Generalization‑Enhanced Few‑Shot Object Detection (GE‑FSOD) model to improve the generalization capability in remote sensing FSOD tasks. Our model introduces three key innovations: the Cross‑Level Fusion Pyramid Attention Network (CFPAN) for enhanced multi‑scale feature representation, the Multi‑Stage Refinement Region Proposal Network (MRRPN) for more accurate region proposals, and the Generalized Classification Loss (GCL) for improved classification performance in few‑shot scenarios. Extensive experiments on the DIOR and NWPU VHR‑10 datasets show that our model achieves state‑of‑the‑art performance for few‑shot object detection in remote sensing.
Authors: Gitae Park, Kanghyun Heo, Kisong Lee
Abstract: Unmanned aerial vehicles (UAVs) offer dynamic trajectory control, enabling them to avoid obstacles and establish line‑of‑sight (LoS) wireless channels with ground nodes (GNs), unlike traditional ground‑fixed base stations. This study addresses the joint optimization of scheduling and three‑dimensional (3D) trajectory planning for UAV‑assisted wireless data harvesting. The objective is to maximize the minimum uplink throughput among GNs while accounting for signal blockages and building avoidance. To achieve this, we first present mathematical models designed to avoid cuboid‑shaped buildings and to determine wireless signal blockage by buildings through rigorous mathematical proof. The optimization problem is formulated as nonconvex mixed‑integer nonlinear programming and solved using advanced techniques. Specifically, the problem is decomposed into convex subproblems via quadratic transform and successive convex approximation. Building avoidance and signal blockage constraints are incorporated using the separating hyperplane method and an approximated indicator function. These subproblems are then iteratively solved using the block coordinate descent algorithm. Simulation results validate the effectiveness of the proposed approach. The UAV dynamically adjusts its trajectory and scheduling policy to maintain LoS channels with GNs, significantly enhancing network throughput compared to existing schemes. Moreover, the trajectory of the UAV adheres to building avoidance constraints for its continuous trajectory, ensuring uninterrupted operation and compliance with safety requirements.
Authors: Milad Tatar Mamaghani, Xiangyun Zhou, Nan Yang, A. Lee Swindlehurst
Abstract: In this paper, we study a secure integrated sensing and communication (ISAC) system employing a full‑duplex base station with sensing capabilities against a mobile proactive adversarial target, namely a malicious unmanned aerial vehicle (M‑UAV). We develop a game‑theoretic model to enhance communication security, radar sensing accuracy, and power efficiency. The interaction between the legitimate network and the mobile adversary is formulated as a non‑cooperative Stackelberg game (NSG), where the M‑UAV acts as the leader and strategically adjusts its trajectory to improve its eavesdropping ability while conserving power and avoiding obstacles. In response, the legitimate network, acting as the follower, dynamically allocates resources to minimize network power usage while ensuring required secrecy rates and sensing performance. To address this challenging problem, we propose a low‑complexity successive convex approximation (SCA) method for network resource optimization combined with a deep reinforcement learning (DRL) algorithm for adaptive M‑UAV trajectory planning through sequential interactions and learning. Simulation results demonstrate the efficacy of the proposed method in addressing security challenges of dynamic ISAC systems in 6G, i.e., achieving a Stackelberg equilibrium with robust performance while mitigating the adversary's ability to intercept network signals.
Authors: Huiming Li, Hao Chen, Xiangke Wang, Zhongkui Li, Lincheng Shen
Abstract: In affine formation control problems, the construction of the framework with universal rigidity and affine localizability is a critical prerequisite, but it has not yet been well addressed, especially when additional agents join the formation or link/agent failures emerge. Motivated by this observation, we investigate the problem of constructing affine frameworks in three scenarios, including vertex addition, edge deletion and vertex deletion. Our approach starts from the original affine formation and uses geometric methods to locally adjust the structure of the weighted graph to describe the topology, so that the modified framework maintains the universal rigidity and affine localizability. Notably, the developed strategies only utilize local measurements and exhibit distributed characteristics, laying the foundation for applications in multi‑agent systems. To demonstrate the compatibility with affine formation control proposals, we present a case study on affine formation tracking in a multi‑UAV formation, demonstrating the effectiveness of our algorithms in constructing eligible frameworks in aforementioned scenarios. Moreover, a comparative simulation is also conducted to highlight the low time complexity of our distributed algorithm relative to the centralized optimization‑based method.
Authors: Mo Tian, Md Zubair Ebne Rafique, Kolappan Chidambaranathan, Randy Brost, Daniel Small, David Novick, Julius Yellowhair, Yu Yao
Abstract: The soiling level of heliostat mirrors in Concentrated Solar Power (CSP) fields is one of the key factors that significantly influences optical efficiency. State‑of‑the‑art methods of monitoring heliostat soiling levels still face various challenges, including slow speed, labor‑intensive operations, resolution and accuracy constraints, and interruptions to solar field operations. We present a rapid, cost‑effective, and non‑intrusive method for mirror soiling detection based on polarimetric imaging, referred to as Polarimetric Imaging‑based Mirror Soiling (PIMS). The compact PIMS device is designed for integration with unmanned aerial vehicles (UAVs), enabling rapid, large‑area assessments of heliostat mirrors for efficient soiling detection. Our method utilizes the correlation between the Degree of Linear Polarization (DoLP) and surface soiling level based on Mie scattering theory and Monte Carlo simulations. Field deployment of the PIMS method requires minimal device installation, and its UAV‑based operation allows for soiling detection without interrupting plant activities. The PIMS method holds the potential for mirror soiling detection across various CSP plants and can be further adapted for other types of solar fields, such as parabolic trough systems.
Authors: Suryansh Prakhar, Jung-Hee Seo, Rajat Mittal
Abstract: The application of unmanned aerial vehicles (UAVs) is surging across several industries, paralleled by growing demand for these UAVs. However, the noise emitted by UAVs remains a significant impediment to their widespread use even though in areas such as product delivery, they can be more environmentally friendly than traditional delivery methods. Nature has often been a source of inspiration for devices that are efficient and eco‑friendly. In the current study, we leverage the previous work by Seo et al. (Bioinsp. Biomimetics, 16 (4):046019, 2021) on the aeroacoustics of flapping wing flight in mosquitoes and fruit flies to propose and examine a simple strategy for reducing the aeroacoustic noise from drone rotors. In particular, inspired by these insects, we explore how an increase in the planform area of the rotor could be used to reduce the rotation rate and the associated aeroacoustic noise from small‑scale rotors. The study employs a sharp‑interface immersed boundary solver for the flow simulations and the aeroacoustic sound is predicted by the Ffowcs Williams‑Hawkings equation. Simulations indicate that the simple strategy of employing rotors with larger planform areas could lead not just to reduced aeroacoustic noise but improved power economy as well.
Authors: Hammam Salem, Mohanad Ahmed, Mohammed AlSharif, Ali Muqaibel, Tareq Al-Naffouri
Abstract: Driven by technological breakthroughs, indoor tracking and localization have gained importance in various applications including the Internet of Things (IoT), robotics, and unmanned aerial vehicles (UAVs). To tackle some of the challenges associated with indoor tracking, this study explores the potential benefits of incorporating the SO(3) manifold structure of the rotation matrix. The goal is to enhance the 3D tracking performance of the extended Kalman filter (EKF) and unscented Kalman filter (UKF) for a moving target within an indoor environment. Our results demonstrate that the proposed extended Kalman filter with Riemannian (EKFRie) and unscented Kalman filter with Riemannian (UKFRie) algorithms consistently outperform the conventional EKF and UKF in terms of position and orientation accuracy. While the conventional EKF and UKF achieved root mean square errors (RMSE) of 0.36 m and 0.43 m, respectively, for a long stair path, the proposed EKFRie and UKFRie algorithms achieved lower RMSE of 0.21 m and 0.10 m. Our results also show that the proposed algorithms outperform the EKF and UKF algorithms with the Isosceles triangle manifold. While the latter achieved RMSE of 7.26 cm and 7.27 cm, respectively, our proposed algorithms achieved RMSE of 6.73 cm and 6.16 cm. These results demonstrate the enhanced performance of the proposed algorithms.
Authors: Abdoul Karim A. H. Saliah, Hajar El Hammouti, Daniel Bonilla Licea
Abstract: Multirotor Aerial Vehicles (MRAVs), when integrated into wireless communication systems and equipped with a Reflective Intelligent Surface (RIS), enhance coverage and enable connectivity in obstructed areas. However, due to limited degrees of freedom (DoF), traditional under‑actuated MRAVs (u‑MRAVs) with RIS are unable to independently control both the RIS orientation and their location, which significantly limits network performance. A new design, omnidirectional MRAV (o‑MRAV), is introduced to address this issue. In this paper, an o‑MRAV is deployed to assist a terrestrial base station in providing connectivity to obstructed users. Our objective is to maximize the minimum data rate among users by optimizing the o‑MRAV's orientation, location, and RIS phase shift. To solve this challenging problem, we first smooth the objective function and then apply the Parallel Successive Convex Approximation (PSCA) technique to find efficient solutions. Our simulation results show significant improvements of 28% and 14% in terms of minimum and average data rates, respectively, for the o‑MRAVs compared to traditional u‑MRAVs.
Authors: Hashim A. Hashim
Abstract: Avionics systems of an Unmanned Aerial Vehicle (UAV) or drone are the critical electronic components found onboard that regulate, navigate, and control UAV travel while ensuring public safety. Contemporary UAV avionics work together to facilitate the success of UAV missions by enabling stable communication, secure identification protocols, novel energy solutions, accurate multi‑sensor perception and autonomous navigation, precise path planning that guarantees collision avoidance, reliable trajectory control, and efficient data transfer within the UAV system. Moreover, special consideration must be given to the prevention, detection, and mitigation of electronic warfare threats, and to the regulatory framework associated with UAV operations. This review presents the role and taxonomy of each UAV avionics system while covering shortcomings and benefits of available alternatives within each system. UAV communication systems, antennas, and location communication tracking are surveyed. Identification systems that respond to air‑to‑air or air‑to‑ground interrogating signals are presented. UAV classical and more innovative power sources are discussed. The rapid development of perception systems improves UAV autonomous navigation and control capabilities. The paper reviews common perception systems, navigation techniques, path planning approaches, obstacle avoidance methods, and tracking control. Modern electronic warfare uses advanced techniques and has to be counteracted by equally advanced methods to keep the public safe. Consequently, this work presents a detailed overview of common electronic warfare threats and state‑of‑the‑art countermeasures and defensive aids. UAV safety occurrences are analyzed in the context of the national regulatory framework and the certification process. Databus communication and standards for UAVs are reviewed as they enable efficient and fast real‑time data transfer.
Authors: Suman Raj, Radhika Mittal, Harshil Gupta, Yogesh Simmhan
Abstract: Drone fleets with onboard cameras coupled with computer vision and DNN inferencing models can support diverse applications. One such novel domain is for one or more buddy drones to assist Visually Impaired People (VIPs) lead an active lifestyle. Video inferencing tasks from such drones can help both navigate the drone and provide situation awareness to the VIP, and hence have strict execution deadlines. We propose a deadline‑driven heuristic, DEMS‑A, to schedule diverse DNN tasks generated continuously to perform inferencing over video segments generated by multiple drones linked to an edge, with the option to execute on the cloud. We use strategies like task dropping, work stealing and migration, and dynamic adaptation to cloud variability, to guarantee a Quality of Service (QoS), i.e. maximize the utility and the number of tasks completed. We also introduce an additional Quality of Experience (QoE) metric useful to the assistive drone domain, which values the frequency of success for task types to ensure the responsiveness and reliability of the VIP application. We extend our DEMS solution to GEMS to solve this. We evaluate these strategies, using (i) an emulated setup of a fleet of over 80 drones supporting over 25 VIPs, with real DNN models executing on pre‑recorded drone video streams, using Jetson Nano edges and AWS Lambda cloud functions, and (ii) a real‑world setup of a Tello drone and a Jetson Orin Nano edge generating drone commands to follow a VIP in real‑time. Our strategies present a task completion rate of up to 88%, up to 2.7x higher QoS utility compared to the baselines, a further 16% higher QoS utility while adapting to network variability, and up to 75% higher QoE utility. Our practical validation exhibits task completion of up to 87% for GEMS and 33% higher total utility of GEMS compared to edge‑only.
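To make the idea of deadline-driven scheduling with task dropping concrete, here is a minimal sketch of an earliest-deadline-first pass over a batch of inference tasks on a single executor. It is not the DEMS‑A or GEMS heuristic from the abstract above (it has no work stealing, migration, or cloud offloading); the `Task` fields, the function name `schedule_edf_with_dropping`, and the sample durations and deadlines are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    duration: float   # expected execution time on the edge, in seconds (assumed known)
    deadline: float   # absolute deadline, in seconds
    utility: float    # value gained if the task finishes on time


def schedule_edf_with_dropping(tasks, now=0.0):
    """Run tasks in deadline order; drop any task that cannot finish in time.

    Returns (completed_names, dropped_names, total_utility).
    """
    completed, dropped = [], []
    total_utility = 0.0
    clock = now
    for task in sorted(tasks, key=lambda t: t.deadline):
        if clock + task.duration <= task.deadline:
            clock += task.duration
            completed.append(task.name)
            total_utility += task.utility
        else:
            # Dropping frees the executor for tasks that can still meet their deadline.
            dropped.append(task.name)
    return completed, dropped, total_utility


# Hypothetical batch of DNN inference tasks over one video segment.
batch = [
    Task("obstacle_detection", duration=2.0, deadline=3.0, utility=5.0),
    Task("pose_estimation", duration=3.0, deadline=4.0, utility=3.0),
    Task("scene_captioning", duration=1.0, deadline=6.0, utility=1.0),
]
completed, dropped, total_utility = schedule_edf_with_dropping(batch)
```

In this batch, the second task would miss its deadline if it ran after the first, so it is dropped and the third task still completes on time.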
Authors: Yifei Sun, Chao Yu, Yan Luo, Tony Xiao Han, Haisheng Tan, Rui Wang, Francis C. M. Lau
Abstract: Given the prospects of the low‑altitude economy (LAE) and the popularity of unmanned aerial vehicles (UAVs), there are increasing demands on monitoring flying objects at low altitude in wide urban areas. In this work, the widely deployed long‑term evolution (LTE) base station (BS) is exploited to illuminate UAVs in bistatic trajectory tracking. Specifically, a passive sensing receiver with two digital antenna arrays is proposed and developed to capture both the line‑of‑sight (LoS) signal and the scattered signal off a target UAV. From their cross ambiguity function, the bistatic range, Doppler shift and angle‑of‑arrival (AoA) of the target UAV can be detected in a sequence of time slots. In order to address missed detections and false alarms of passive sensing, a multi‑target tracking framework is adopted to track the trajectory of the target UAV. It is demonstrated by experiments that the proposed UAV tracking system can achieve a meter‑level accuracy.
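The quantities named in the abstract above follow from standard passive radar relations, sketched below: the delay at which the cross ambiguity function peaks gives the extra length of the scattered path over the direct LoS path, and the Doppler shift at the peak gives the rate of change of the total transmitter-target-receiver path. This is a textbook illustration, not the paper's processing chain; the function names, the 1 microsecond delay, the 50 Hz Doppler shift, and the 2 GHz carrier are assumed values chosen only for the example.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def bistatic_range(delay_difference_s):
    """Excess length (m) of the scattered path over the direct LoS path.

    delay_difference_s is the delay between the LoS signal and the signal
    scattered off the target, e.g. read from the cross ambiguity peak.
    """
    return SPEED_OF_LIGHT * delay_difference_s


def bistatic_range_rate(doppler_hz, carrier_hz):
    """Rate of change (m/s) of the transmitter-target-receiver path length.

    A positive Doppler shift means the total path is shrinking, hence the sign.
    """
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return -wavelength * doppler_hz


# Illustrative numbers only (assumptions, not measurements from the paper).
excess_path_m = bistatic_range(1e-6)
range_rate_mps = bistatic_range_rate(doppler_hz=50.0, carrier_hz=2.0e9)
```

Each bistatic range constrains the target to an ellipsoid with the base station and the receiver as foci, which is why an angle-of-arrival estimate is also needed to localize the UAV in each time slot.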
Authors: Ying Zhang, Haibao Yan, Danni Zhu, Jiankun Wang, Cui-Hua Zhang, Weili Ding, Xi Luo, Changchun Hua, Max Q. -H. Meng
Abstract: Air‑ground collaborative robots have shown great potential in the field of fire and rescue, which can quickly respond to rescue needs and improve the efficiency of task execution. Mapping and navigation, as the key foundation for air‑ground collaborative robots to achieve efficient task execution, have attracted a great deal of attention. This growing interest in collaborative robot mapping and navigation is conducive to improving the intelligence of fire and rescue task execution, but there has been no comprehensive investigation of this field to highlight their strengths. In this paper, we present a systematic review of air‑ground cooperative robots for fire and rescue from a new perspective of mapping and navigation. First, an air‑ground collaborative robots framework for fire and rescue missions based on unmanned aerial vehicle (UAV) mapping and unmanned ground vehicle (UGV) navigation is introduced. Then, the research progress of mapping and navigation under this framework is systematically summarized, including UAV mapping, UAV/UGV co‑localization, and UGV navigation, with their main achievements and limitations. Based on the needs of fire and rescue missions, the collaborative robots with different numbers of UAVs and UGVs are classified, and their practicality in fire and rescue tasks is elaborated, with a focus on the discussion of their merits and demerits. In addition, the application examples of air‑ground collaborative robots in various firefighting and rescue scenarios are given. Finally, this paper emphasizes the current challenges and potential research opportunities, rounding up references for practitioners and researchers willing to engage in this vibrant area of air‑ground collaborative robots.
Authors: Songhan Zhao, Shimin Gong, Bo Gu, Lanhua Li, Bin Lyu, Dinh Thai Hoang, Changyan Yi
Abstract: In this paper, we consider an aerial reconfigurable intelligent surface (ARIS)‑assisted wireless network, where multiple unmanned aerial vehicles (UAVs) collect data from ground users (GUs) by using the non‑orthogonal multiple access (NOMA) method. The ARIS provides enhanced channel controllability to improve the NOMA transmissions and reduce the co‑channel interference among UAVs. We also propose a novel dual‑mode switching scheme, where each UAV equipped with both an ARIS and a radio frequency (RF) transceiver can adaptively perform passive reflection or active transmission. We aim to maximize the overall network throughput by jointly optimizing the UAVs' trajectory planning and operating modes, the ARIS's passive beamforming, and the GUs' transmission control strategies. We propose an optimization‑driven hierarchical deep reinforcement learning (O‑HDRL) method to decompose it into a series of subproblems. Specifically, the multi‑agent deep deterministic policy gradient (MADDPG) adjusts the UAVs' trajectory planning and mode switching strategies, while the passive beamforming and transmission control strategies are tackled by the optimization methods. Numerical results reveal that the O‑HDRL efficiently improves the learning stability and reward performance compared to the benchmark methods. Meanwhile, the dual‑mode switching scheme is verified to achieve a higher throughput performance compared to the fixed ARIS scheme.
Authors: Yaoqi Yang, Yong Chen, Jiacheng Wang, Geng Sun, Dusit Niyato
Abstract: Low altitude economy (LAE) holds immense potential to drive urban development across various sectors. However, LAE also faces challenges in data collection and processing efficiency, flight control precision, and network performance. The challenges could be solved by realizing an integration of sensing, communications, computation, and control (ISC3) for LAE. In this regard, embodied artificial intelligence (EAI), with its unique perception, planning, and decision‑making capabilities, offers a promising solution to realize ISC3. Specifically, this paper investigates the application of EAI to ISC3 to support LAE, exploring potential research focuses, solutions, and a case study. We begin by outlining the rationales and benefits of introducing EAI into LAE, followed by reviewing research directions and solutions for EAI in ISC3. We then propose a framework of an EAI‑enabled ISC3 for LAE. The framework's effectiveness is evaluated through a case study of express delivery utilizing an EAI‑enabled UAV. Finally, we discuss several future research directions for advancing EAI‑enabled LAE.
Authors: Yuxuan Song, Yong Zeng, Yuhang Yang, Zixiang Ren, Gaoyuan Cheng, Xiaoli Xu, Jie Xu, Shi Jin, Rui Zhang
Abstract: Low‑altitude unmanned aerial vehicles (UAVs) are expected to play an important role in future wireless networks, either as aerial base stations (BSs) or aerial users connected to the cellular network. In addition, integrated sensing and communication (ISAC) has been identified as one of the six usage scenarios for the forthcoming sixth‑generation (6G) mobile networks, aimed at improving network functionalities and realizing situational awareness of the physical world. While most existing research efforts focus on terrestrial two‑dimensional (2D) communication and sensing, UAV as an aerial platform offers a new degree of freedom for designing three‑dimensional (3D) air‑ground (AG) ISAC networks. In this article, we provide an overview of cellular‑connected UAV ISAC, by elaborating the UAV's roles as a target to be sensed and as an aerial anchor to provide sensing functionality, respectively. In particular, we pay attention to the network coverage issue and topics specific to UAV networking, emphasizing the new opportunities as well as unique challenges to be addressed.
Authors: Jianping Yao, Zeyu Yang, Zai Yang, Jie Xu, Tony Q. S. Quek
Abstract: In this work, we study an unmanned aerial vehicle (UAV)‑enabled secure integrated sensing and communication (ISAC) system, where a UAV serves as an aerial base station (BS) to simultaneously perform communication with a user and detect a target on the ground, while a dual‑functional eavesdropper attempts to intercept the signals for both sensing and communication. Facing the dual eavesdropping threats, we aim to enhance the average achievable secrecy rate for the communication user by jointly designing the UAV trajectory together with the transmit information and sensing beamforming, while satisfying the requirements on sensing performance and sensing security, as well as the UAV power and flight constraints. To address the non‑convex nature of the optimization problem, we employ the alternating optimization (AO) strategy, jointly with the successive convex approximation (SCA) and semidefinite relaxation (SDR) methods. Numerical results validate the proposed approach, demonstrating its ability to achieve a high secrecy rate while meeting the required sensing and security constraints.
Authors: Taohong Zhu, Adrians Skapars, Fardeen Mackenzie, Declan Kehoe, William Newton, Suzanne Embury, Youcheng Sun
Abstract: Fuzz testing effectively uncovers software vulnerabilities; however, it faces challenges with Autonomous Systems (AS) due to their vast search spaces and complex state spaces, which reflect the unpredictability and complexity of real‑world environments. This paper presents a universal framework aimed at improving the efficiency of fuzz testing for AS. At its core is SaFliTe, a predictive component that evaluates whether a test case meets predefined safety criteria. By leveraging the large language model (LLM) with information about the test objective and the AS state, SaFliTe assesses the relevance of each test case. We evaluated SaFliTe by instantiating it with various LLMs, including GPT‑3.5, Mistral‑7B, and Llama2‑7B, and integrating it into four fuzz testing tools: PGFuzz, DeepHyperion‑UAV, CAMBA, and TUMB. These tools are designed specifically for testing autonomous drone control systems, such as ArduPilot, PX4, and PX4‑Avoidance. The experimental results demonstrate that, compared to PGFuzz, SaFliTe increased the likelihood of selecting operations that triggered bug occurrences in each fuzzing iteration by an average of 93.1%. Additionally, after integrating SaFliTe, the ability of DeepHyperion‑UAV, CAMBA, and TUMB to generate test cases that caused system violations increased by 234.5%, 33.3%, and 17.8%, respectively. The benchmark for this evaluation was sourced from a UAV Testing Competition.
Authors: Simon Kohaut, Nikolas Hohmann, Sebastian Brulin, Benedict Flade, Julian Eggert, Markus Olhofer, Jürgen Adamy, Devendra Singh Dhami, Kristian Kersting
Abstract: Advanced Aerial Mobility encompasses many outstanding applications that promise to revolutionize modern logistics and pave the way for various public services and industry uses. However, throughout its history, the development of such systems has been impeded by the complexity of legal restrictions and physical constraints. While airspaces are often tightly shaped by various legal requirements, Unmanned Aerial Vehicles (UAV) must simultaneously consider, among others, energy demands, signal quality, and noise pollution. In this work, we address this challenge by presenting a novel architecture that integrates methods of Probabilistic Mission Design (ProMis) and Many‑Objective Optimization for UAV routing. Hereby, our framework is able to comply with legal requirements under uncertainty while producing effective paths that minimize various physical costs a UAV needs to consider when traversing human‑inhabited spaces. To this end, we combine hybrid probabilistic first‑order logic for spatial reasoning with mixed deterministic‑stochastic route optimization, incorporating physical objectives such as energy consumption and radio interference with a logical, probabilistic model of legal requirements. We demonstrate the versatility and advantages of our system in a large‑scale empirical evaluation over real‑world, crowd‑sourced data from a map extract from the city of Paris, France, showing how a network of effective and compliant paths can be formed.
Authors: Kamal Shayegan
Abstract: In recent years, Unmanned Aerial Vehicles (UAVs) have been utilized as effective platforms for carrying Wi‑Fi Access Points (APs) and cellular Base Stations (BSs), enabling low‑cost, agile, and flexible wireless networks with high Quality of Service (QoS). The next generation of wireless communications will rely on increasingly higher frequencies, which are easily obstructed by obstacles. One of the most critical concepts yet to be fully addressed is positioning the UAV at optimal coordinates while accounting for obstacles. To ensure a line of sight (LoS) between UAVs and user equipment (UE), improve QoS, and establish reliable wireless links with maximum coverage, obstacles must be integrated into the proposed placement algorithms. This paper introduces a simulation‑based measurement approach for characterizing an air‑to‑ground (AG) channel in a simple scenario. By considering obstacles, we present a novel perspective on channel characterization. The results, in terms of throughput, packet delivery, packet loss, and delay, are compared using the proposed positioning approach.
Authors: Hussein Naser, Hashim A. Hashim, Mojtaba Ahmadi
Abstract: This paper presents a novel approach to utilizing underactuated quadrotor Unmanned Aerial Vehicles (UAVs) as assistive devices in cooperative payload transportation tasks through human guidance and physical interaction. The proposed system consists of two underactuated UAVs rigidly connected to the transported payload. This task involves collaboration between a human and the UAVs to transport and manipulate a payload. The goal is to reduce the workload of the human and enable seamless interaction between the human operator and the aerial vehicle. An Admittance‑Nonsingular Fast Terminal Sliding Mode Control (NFTSMC) is employed to control and asymptotically stabilize the system while performing the task, where forces applied to the payload by the human operator dictate the aerial vehicle's motion. The stability of the proposed controller is confirmed using Lyapunov analysis. Extensive simulation studies were conducted using MATLAB, Robot Operating System (ROS), and Gazebo to validate the robustness and effectiveness of the proposed controller in assisting with payload transportation tasks. Results demonstrate the feasibility and potential benefits of utilizing quadrotor UAVs as assistive devices for payload transportation through intuitive human‑guided control. Keywords: Cooperative payload transportation, Admittance control, Sliding mode control, Quadrotor control
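The admittance part of such a controller maps a measured human force to a reference motion. A minimal one-dimensional sketch follows, assuming the common virtual mass-damper law M*dv/dt + D*v = F integrated with explicit Euler. The parameter values and function names are hypothetical, and the sliding mode tracking layer described in the abstract is not modeled here.

```python
def admittance_step(velocity, force, mass, damping, dt):
    """One explicit-Euler step of M*dv/dt + D*v = F; returns the new reference velocity."""
    acceleration = (force - damping * velocity) / mass
    return velocity + dt * acceleration


def simulate(force, mass=2.0, damping=4.0, dt=0.01, steps=2000):
    """Apply a constant force from rest and return the final reference velocity."""
    velocity = 0.0
    for _ in range(steps):
        velocity = admittance_step(velocity, force, mass, damping, dt)
    return velocity
```

Under a constant force the reference velocity settles at F/D, so the virtual damping sets how far a given push moves the payload, while the virtual mass sets how quickly the motion responds.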
Authors: Wen-Yu Dong, Shaoshi Yang, Wei Lin, Wei Zhao, Jia-Xing Gui, Sheng Chen
Abstract: In harsh environments such as mountainous terrain, dense vegetation areas, or urban landscapes, a single type of unmanned aerial vehicle (UAV) may encounter challenges like flight restrictions, difficulty in task execution, or increased risk. Therefore, employing multiple types of UAVs, along with satellite assistance, to collaborate becomes essential in such scenarios. In this context, we present a stochastic geometry based approach for modeling the heterogeneous non‑terrestrial networks (NTNs) by using the classical binomial point process and introducing a novel point process, called the Matérn hard‑core cluster process (MHCCP). Our MHCCP possesses both the exclusivity and the clustering properties, so it can better model the aircraft group composed of multiple clusters. Then, we derive closed‑form expressions of the outage probability (OP) for the uplink (aerial‑to‑satellite) of heterogeneous NTNs. Unlike existing studies, our analysis relies on a more advanced system configuration, which considers the integration of beamforming and frequency division multiple access as well as the shadowed‑Rician (SR) fading model for interference power. The accuracy of our theoretical derivation is confirmed by Monte Carlo simulations. Our research offers fundamental insights into the system‑level performance optimization of NTNs.
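The exclusivity (hard-core) property can be illustrated with the classical Matérn type II thinning, sketched below. This is not the MHCCP introduced in the paper: it assumes a fixed number of uniformly placed parent points in a square (a binomial point process) and omits the clustering stage. All names and parameters are illustrative.

```python
import math
import random


def matern_type_ii(num_points, radius, side=1.0, seed=0):
    """Thin uniformly placed points so that retained points are at least `radius` apart."""
    rng = random.Random(seed)
    # Parent points: uniform positions in a square, each with an independent uniform mark.
    parents = [(rng.uniform(0, side), rng.uniform(0, side), rng.random())
               for _ in range(num_points)]
    kept = []
    for i, (x, y, mark) in enumerate(parents):
        # A point is deleted if another parent closer than `radius` has a smaller mark.
        dominated = any(
            j != i and math.hypot(x - px, y - py) < radius and pmark < mark
            for j, (px, py, pmark) in enumerate(parents)
        )
        if not dominated:
            kept.append((x, y))
    return kept
```

Of any two parents closer than the radius, the one with the larger mark is always deleted, which is what enforces the minimum separation among the retained points.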
Authors: Farzan Moosavi, Bilal Farooq
Abstract: We introduce a multi‑modal autonomous delivery optimization framework as a coalition game for a fleet of UAVs and ADRs operating in two overlaying networks to address last‑mile delivery in urban environments, including high‑density areas and time‑critical applications. The problem is defined as multiple depot pickup and delivery with time windows constrained over operational restrictions, such as vehicle battery limitation, precedence time window, and building obstruction. Utilizing coalition game theory, we investigate cooperation structures among the modes to capture how strategic collaboration can improve overall routing efficiency. To do so, a generalized reinforcement learning model is designed to evaluate the cost‑sharing and allocation to different modes to learn the cooperative behaviour with respect to various realistic scenarios. Our methodology leverages an end‑to‑end deep multi‑agent policy gradient method augmented by a novel spatio‑temporal adjacency neighbourhood graph attention network using a heterogeneous edge‑enhanced attention model and transformer architecture. Several numerical experiments on last‑mile delivery applications have been conducted. The results from a case study in the city of Mississauga show that, despite the incorporation of an extensive network in the graph for two modes and a complex training structure, the model addresses realistic operational constraints and achieves high‑quality solutions compared with the existing transformer‑based and classical methods. It performs well on non‑homogeneous data distributions, generalizes well to different scales and configurations, and demonstrates robust cooperative performance under stochastic scenarios across various tasks, which is effectively reflected by coalition analysis and cost allocation to signify the advantage of cooperation.
Authors: Adeel Ahmed, Wang Xingfu, Ammar Hawbani, Weijie Yuan, Hina Tabassum, Yuanwei Liu, Muhammad Umar Farooq Qaisar, Zhiguo Ding, Naofal Al-Dhahir, Arumugam Nallanathan, Derrick Wing Kwan Ng
Abstract: Revolutionary sixth‑generation wireless communications technologies and applications, notably digital twin networks (DTN), connected autonomous vehicles (CAVs), space‑air‑ground integrated networks (SAGINs), zero‑touch networks, industry 5.0, and healthcare 5.0, are driving next‑generation wireless networks (NGWNs). These technologies generate massive data, requiring swift transmission and trillions of device connections, fueling the need for sophisticated next‑generation multiple access (NGMA) schemes. NGMA enables massive connectivity in the 6G era, optimizing NGWN operations beyond current multiple access (MA) schemes. This survey showcases non‑orthogonal multiple access (NOMA) as NGMA's frontrunner, exploring three questions: "What has NOMA delivered?", "What is NOMA providing?", and "What lies ahead?". We present NOMA variants, fundamental operations, and applicability in multi‑antenna systems, machine learning, reconfigurable intelligent surfaces (RIS), cognitive radio networks (CRN), integrated sensing and communications (ISAC), terahertz networks, and unmanned aerial vehicles (UAVs). Additionally, we explore NOMA's interplay with state‑of‑the‑art wireless technologies, highlighting its advantages and technical challenges. Finally, we unveil NOMA research trends in the 6G era and provide design recommendations and future perspectives for NOMA as the leading NGMA solution for NGWNs.
Authors: Hanfang Liang, Jinming Hu, Xiaohuan Ling, Bing Wang
Abstract: The increasing deployment of small drones as tools of conflict and disruption has amplified their threat, highlighting the urgent need for effective anti‑drone measures. However, the compact size of most drones presents a significant challenge, as traditional supervised point cloud or image‑based object detection methods often fail to identify such small objects effectively. This paper proposes a simple UAV detection method using an unsupervised pipeline. It uses spatial‑temporal sequence processing to fuse multiple lidar datasets effectively, tracking and determining the position of UAVs, so as to detect and track UAVs in challenging environments. Our method performs foreground and background segmentation of point clouds through a global‑local sequence clusterer and parses point cloud data from both the spatial‑temporal density and spatial‑temporal voxels of the point cloud. Furthermore, a scoring mechanism for point cloud moving targets is proposed, using time series detection to improve accuracy and efficiency. We used the MMAUD dataset, and our method achieved 4th place in the CVPR 2024 UG2+ Challenge, confirming the effectiveness of our method in practical applications.
Authors: Muhammad Ali Jamshed, Aryan Kaushik, Sanaullah Manzoor, Muhammad Zeeshan Shakir, Jaehyup Seong, Mesut Toka, Wonjae Shin, Malte Schellmann
Abstract: The International Mobile Telecommunications (IMT)‑2030 framework recently adopted by the International Telecommunication Union Radiocommunication Sector (ITU‑R) envisions 6G networks to deliver intelligent, seamless connectivity that supports reliable, sustainable, and resilient communications. Recent developments in the 3rd Generation Partnership Project (3GPP) Releases 17‑19, particularly within the Radio Access Network (RAN)4 working group addressing satellite and cellular spectrum sharing and RAN2 enhancing New Radio (NR)/IoT for NTN, highlight the critical role NTN is set to play in the evolution of 6G standards. The integration of advanced signal processing, edge and cloud computing, and Deep Reinforcement Learning (DRL) for Low Earth Orbit (LEO) satellites and aerial platforms, such as Uncrewed Aerial Vehicles (UAV) and high‑, medium‑, and low‑altitude platform stations, has revolutionized the convergence of space, aerial, and Terrestrial Networks (TN). Artificial Intelligence (AI)‑powered deployments for NTN and NTN‑IoT, combined with Next Generation Multiple Access (NGMA) technologies, have dramatically reshaped global connectivity. This tutorial paper provides a comprehensive exploration of emerging NTN‑based 6G wireless networks, covering vision, alignment with 5G‑Advanced and 6G standards, key principles, trends, challenges, real‑world applications, and novel problem solving frameworks. It examines essential enabling technologies like AI for NTN (LEO satellites and aerial platforms), DRL, edge computing for NTN, AI for NTN trajectory optimization, Reconfigurable Intelligent Surfaces (RIS)‑enhanced NTN, and robust Multiple‑Input‑Multiple‑Output (MIMO) beamforming. Furthermore, it addresses interference management through NGMA, including Rate‑Splitting Multiple Access (RSMA) for NTN, and the use of aerial platforms for access, relay, and fronthaul/backhaul connectivity.
Authors: Irshad A. Meer, Karl-Ludwig Besser, Mustafa Ozger, Dominic Schupke, H. Vincent Poor, Cicek Cavdar
Abstract: Multi‑connectivity involves dynamic cluster formation among distributed access points (APs) and coordinated resource allocation from these APs, highlighting the need for efficient mobility management strategies for users with multi‑connectivity. In this paper, we propose a novel mobility management scheme for unmanned aerial vehicles (UAVs) that uses dynamic cluster reconfiguration with energy‑efficient power allocation in a wireless interference network. Our objective encompasses meeting stringent reliability demands, minimizing joint power consumption, and reducing the frequency of cluster reconfiguration. To achieve these objectives, we propose a hierarchical multi‑agent deep reinforcement learning (H‑MADRL) framework, specifically tailored for dynamic clustering and power allocation. The edge cloud connected with a set of APs through low latency optical back‑haul links hosts the high‑level agent responsible for the optimal clustering policy, while low‑level agents reside in the APs and are responsible for the power allocation policy. To further improve the learning efficiency, we propose a novel action‑observation transition‑driven learning algorithm that allows the low‑level agents to use the action space from the high‑level agent as part of the local observation space. This allows the lower‑level agents to share partial information about the clustering policy and allocate the power more efficiently. The simulation results demonstrate that our proposed distributed algorithm achieves comparable performance to the centralized algorithm. Additionally, it offers better scalability, as the decision time for clustering and power allocation increases by only 10% when doubling the number of APs, compared to a 90% increase observed with the centralized approach.
Authors: Laura Weihl, Bilal Wehbe, Andrzej Wąsowski
Abstract: Autonomous inspection of infrastructure on land and in water is a quickly growing market, with applications including surveying constructions, monitoring plants, and tracking environmental changes in on‑ and off‑shore wind energy farms. For Autonomous Underwater Vehicles and Unmanned Aerial Vehicles, overfitting of controllers to simulation conditions fundamentally leads to poor performance in the operating environment. There is a pressing need for more diverse and realistic test data that accurately represents the challenges faced by these systems. We address the challenge of generating perception test data for autonomous systems by leveraging Neural Radiance Fields to generate realistic and diverse test images, and integrating them into a metamorphic testing framework for vision components such as vSLAM and object detection. Our tool, N2R‑Tester, allows training models of custom scenes and rendering test images from perturbed positions. An experimental evaluation of N2R‑Tester on eight different vision components in AUVs and UAVs demonstrates the efficacy and versatility of the approach.
Authors: Tao Zhou, Kai Ye, Zeyu Shi, Jiajing Lin, Dejun Xu, Min Jiang
Abstract: Numerous remarkable advancements have been made in accuracy, speed, and parallelism for solving the Unmanned Aerial Vehicle Route Planning (UAVRP) problem. However, existing UAVRP solvers face challenges when attempting to scale effectively and efficiently for larger instances. In this paper, we present a generalization framework that enables current UAVRP solvers to robustly extend their capabilities to larger instances, accommodating up to 10,000 points, using widely recognized test sets. The UAVRP under a large number of patrol points is a typical large‑scale TSP problem. Our proposed framework comprises three distinct steps. Firstly, we employ Delaunay triangulation to extract subgraphs from large instances while preserving global features. Secondly, we utilize an embedded TSP solver to obtain sub‑results, followed by graph fusion. Finally, we implement a decoding strategy customizable to the user's requirements, resulting in high‑quality solutions, complemented by a warming‑up process for the heatmap. To demonstrate the flexibility of our approach, we integrate two representative TSP solvers into our framework and conduct a comprehensive comparative analysis against existing algorithms using large TSP benchmark datasets. The results unequivocally demonstrate that our framework efficiently scales existing TSP solvers to handle large instances and consistently outperforms state‑of‑the‑art (SOTA) methods. Furthermore, since our proposed framework does not necessitate additional training or fine‑tuning, we believe that its generality can significantly advance research on end‑to‑end UAVRP solvers, enabling the application of a broader range of methods to real‑world scenarios.
Authors: Aneesha Guna, Parth Ganeriwala, Siddhartha Bhattacharyya
Abstract: With the advancement of deep learning methods, it is imperative that autonomous systems will increasingly become intelligent with the inclusion of advanced machine learning algorithms to execute a variety of autonomous operations. One such task involves the design and evaluation of a subsystem of the perception system for object detection and tracking. The challenge in the creation of software to solve the task is in discovering the need for a dataset, annotation of the dataset, selection of features, integration and refinement of existing algorithms, while evaluating performance metrics through training and testing. This research effort focuses on the development of a machine learning pipeline emphasizing the inclusion of assurance methods with increasing automation. In the process, a new dataset was created by collecting videos of a moving object, such as a Roomba vacuum cleaner, emulating search and rescue (SAR) in an indoor environment. Individual frames were extracted from the videos and labeled using a combination of manual and automated techniques. This annotated dataset was refined for accuracy by initially training a YOLOv4 model on it. After refinement, the dataset was used to train a second YOLOv4 model and a Mask R‑CNN model, which is deployed on a Parrot Mambo drone to perform real‑time object detection and tracking. Experimental results demonstrate the effectiveness of the models in accurately detecting and tracking the Roomba across multiple trials, achieving an average loss of 0.1942 and 96% accuracy.
Authors: Joseanne Viana, Hamed Farkhari, Pedro Sebastiao, Victor P Gil Jimenez
Abstract: Unmanned Aerial Vehicles (UAVs) face significant security risks from jamming attacks, which can compromise network functionality. Traditional detection methods often fall short when confronting AI‑powered jamming that dynamically modifies its behavior, while contemporary machine learning approaches frequently demand substantial feature engineering and struggle with temporal patterns in attack signatures. The vulnerability extends to 5G networks employing Time Division Duplex (TDD) or Frequency Division Duplex (FDD), where service quality may deteriorate due to deliberate interference. We introduce a novel U‑shaped transformer architecture that leverages Principal Component Analysis (PCA) to refine feature representations for improved wireless security. The training process is regularized by incorporating the output entropy uncertainty into the loss function, a mechanism inspired by the Soft Actor‑Critic (SAC) algorithm in Reinforcement Learning (RL) to enable robust jamming detection techniques. The architecture features a modified transformer encoder specially designed to process critical wireless signal features, including Received Signal Strength Indicator (RSSI) and Signal‑to‑Interference‑plus‑Noise Ratio (SINR) measurements. We complement this with a custom positional encoding mechanism that specifically accounts for the inherent periodicity of wireless signals, enabling a more accurate representation of temporal signal patterns. In addition, we propose a batch size scheduler and implement chunking techniques to optimize convergence for time series data. These advancements contribute to up to a ten times improvement in training speed within the advanced U‑shaped encoder‑decoder transformer model introduced in this study. Experimental evaluations demonstrate the effectiveness of our entropy‑based approach, achieving detection rates of 85.06% in NLoS scenarios.
Authors: Atharva Sagale, Tohid Kargar Tasooji, Ramviyas Parasuraman
Abstract: This paper presents a novel approach to range‑based cooperative localization for robot swarms in GPS‑denied environments, addressing the limitations of current methods in noisy and sparse settings. We propose a robust multi‑layered localization framework that combines shadow edge localization techniques with the strategic deployment of UAVs. This approach not only addresses the challenges associated with nonrigid and poorly connected graphs but also enhances the convergence rate of the localization process. We introduce two key concepts: the S1‑Edge approach in our distributed protocol to address the rigidity problem of sparse graphs and the concept of a powerful UAV node to increase the sensing and localization capability of the multi‑robot system. Our approach leverages the advantages of the distributed localization methods, enhancing scalability and adaptability in large robot networks. We establish theoretical conditions for the new S1‑Edge that ensure solutions exist even in the presence of noise, thereby validating the effectiveness of shadow edge localization. Extensive simulation experiments confirm the superior performance of our method compared to state‑of‑the‑art techniques, resulting in up to 95% reduction in localization error, demonstrating substantial improvements in localization accuracy and robustness to sparse graphs. This work provides a decisive advancement in the field of multi‑robot localization, offering a powerful tool for high‑performance and reliable operations in challenging environments.
Authors: Weiqi Wang, Jin Xu
Abstract: Motivated by the critical need for unmanned aerial vehicles (UAVs) to patrol grid systems in hazardous and dynamically changing environments, this study addresses a routing problem aimed at minimizing the time‑average Age of Information (AoI) for edges in general graphs. We establish a lower bound for all feasible patrol policies and demonstrate that this bound is tight when the graph contains an Eulerian cycle. For graphs without Eulerian cycles, it becomes challenging to identify the optimal patrol strategy due to the extensive range of feasible options. Our analysis shows that restricting the strategy to periodic sequences still results in an exponentially large number of possible strategies. To address this complexity, we introduce two polynomial‑time approximation schemes, each involving a two‑step process: constructing multigraphs first and then embedding Eulerian cycles within these multigraphs. We prove that both schemes achieve an approximation ratio of 2. Further, both analytical and numerical results suggest that evenly and sparsely distributing edge visits within a periodic route significantly reduces the average AoI compared to strategies that merely minimize the route travel distance. Building on this insight, we propose a heuristic method that not only maintains the approximation ratio of 2 but also ensures robust performance across varying random graphs.
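Since the lower bound is tight exactly when the graph contains an Eulerian cycle, it may help to see how such a cycle is detected and built. The sketch below uses the standard Hierholzer algorithm on an undirected (multi)graph given as an edge list. It is a generic textbook routine, not the paper's approximation schemes, and the function name is an assumption.

```python
from collections import defaultdict


def eulerian_cycle(edges):
    """Return a closed walk (vertex list) using every undirected edge exactly once, or None."""
    if not edges:
        return None
    adj = defaultdict(list)  # vertex -> list of (neighbor, edge_id)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    # Necessary condition: every vertex has even degree.
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return None
    used = [False] * len(edges)
    stack, cycle = [edges[0][0]], []
    while stack:
        u = stack[-1]
        # Discard edges that were already traversed from their other endpoint.
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()
        if adj[u]:
            v, eid = adj[u].pop()
            used[eid] = True
            stack.append(v)
        else:
            cycle.append(stack.pop())
    # If the edges are not all connected, some remain unused and no Eulerian cycle exists.
    return cycle if all(used) else None
```

A patrol that repeats the returned walk visits every edge once per period, which is the situation in which the abstract states the bound is attained. For graphs where the function returns None, the abstract's schemes first add edges to form a multigraph that does admit such a cycle.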
Authors: Sanghyoup Gu, Ratnesh Kumar
Abstract: Recent advances in deep learning have provided new data‑driven ways of controller design to replace the traditional manual synthesis and certification approaches. Employing neural networks (NNs) as controllers, however, presents its own challenge: that of certifying stability due to their inherent complex nonlinearity. While NN controllers have demonstrated high performance in complex systems, they often lack formal stability guarantees. This issue is further accentuated for critical nonlinear applications such as unmanned aerial vehicles (UAVs), which complicates their stability guarantees, while a lack of stability assurance raises the risk of critical damage or even complete failure under a loss of control. In this study, we improve the Robust, Optimal, Safe and Stability Guaranteed Training (ROSS‑GT) method of [1] to design an NN controller for quadcopter flight control. The approach ensures closed‑loop system stability by finding a Lyapunov function and providing a safe initial state domain that remains invariant under the control and guarantees stability to an equilibrium within it. Stability‑guaranteeing constraints are derived from the sector bound of the system nonlinearity and of its parameters and disturbance variations, in the form of a Lipschitz bound for an NN control. The control performance is further optimized by searching over the class of stability‑guaranteeing controllers to minimize the reference tracking error and the control costs.
Authors: Bin Li, Xiao Zhu, Junyi Wang
Abstract: Data compression technology is able to reduce data size, which can be applied to lower the cost of task offloading in mobile edge computing (MEC). This paper addresses the practical challenges for robust trajectory and scheduling optimization based on data compression in the unmanned aerial vehicle (UAV)‑assisted MEC, aiming to minimize the sum energy cost of terminal users while maintaining robust performance during UAV flight. Considering the non‑convexity of the problem and the dynamic nature of the scenario, the optimization problem is reformulated as a Markov decision process. Then, a randomized ensembled double Q‑learning (REDQ) algorithm is adopted to solve the problem. The algorithm allows for a higher feasible update‑to‑data ratio, enabling more effective learning from observed data. The simulation results show that the proposed scheme effectively reduces the energy consumption while ensuring flight robustness. Compared to the PPO and A2C algorithms, energy consumption is reduced by approximately 21.9% and 35.4%, respectively. This method demonstrates significant advantages in complex environments and holds great potential for practical applications.
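The core idea of REDQ that makes a high update-to-data ratio workable is its target: the minimum over a small random subset of an ensemble of Q-estimates. A minimal sketch of that target follows. It assumes the next-state Q-values are already computed and omits the entropy term used in the SAC-based formulation; the function and parameter names are hypothetical and the paper's MEC-specific state and reward are not modeled.

```python
import random


def redq_target(reward, gamma, next_q_values, subset_size=2, done=False, rng=random):
    """Bellman target using the minimum over a random subset of ensemble Q-estimates.

    next_q_values holds Q_i(s', a') for each of the N ensemble members.
    """
    subset = rng.sample(next_q_values, subset_size)
    bootstrap = 0.0 if done else gamma * min(subset)
    return reward + bootstrap
```

Taking the minimum over a subset rather than the whole ensemble tempers overestimation without making the target overly pessimistic, and all ensemble members are then regressed toward the same target.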
Authors: Jose Enrique Maese, Fernando Caballero, Luis Merino
Abstract: This paper presents a simulation framework capable of modeling the dynamics of a hanging tether with adjustable length, connecting a UAV to a UGV. The model incorporates the interaction between the UAV, UGV, and a winch, allowing for dynamic tether adjustments based on the relative motion of the robots. The accuracy and reliability of the simulator are assessed through extensive experiments, including comparisons with real‑world experiments, to evaluate its ability to reproduce the complex tether dynamics observed in physical deployments. The results demonstrate that the simulation closely aligns with real‑world behavior, particularly in constrained environments where tether effects are significant. This work provides a validated tool for studying tethered robotic systems, offering valuable insights into their motion dynamics and control strategies.
Authors: Hanfang Liang, Yizhuo Yang, Jinming Hu, Jianfei Yang, Fen Liu, Shenghai Yuan
Abstract: Compact UAV systems, while advancing delivery and surveillance, pose significant security challenges due to their small size, which hinders detection by traditional methods. This paper presents a cost‑effective, unsupervised UAV detection method using spatial‑temporal sequence processing to fuse multiple LiDAR scans for accurate UAV tracking in real‑world scenarios. Our approach segments point clouds into foreground and background, analyzes spatial‑temporal data, and employs a scoring mechanism to enhance detection accuracy. Tested on a public dataset, our solution placed 4th in the CVPR 2024 UG2+ Challenge, demonstrating its practical effectiveness. We plan to open‑source all designs, code, and sample data for the research community at github.com/lianghanfang/UnLiDAR‑UAV‑Est.
Authors: Allen Lei, Tianchen Deng, Han Wang, Jianfei Yang, Shenghai Yuan
Abstract: As small unmanned aerial vehicles (UAVs) become increasingly prevalent, there is growing concern regarding their impact on public safety and privacy, highlighting the need for advanced tracking and trajectory estimation solutions. In response, this paper introduces a novel framework that utilizes an audio array for 3D UAV trajectory estimation. Our approach incorporates a self‑supervised learning model, starting with the conversion of audio data into mel‑spectrograms, which are analyzed through an encoder to extract crucial temporal and spectral information. Simultaneously, UAV trajectories are estimated using LiDAR point clouds via unsupervised methods. These LiDAR‑based estimations act as pseudo labels, enabling the training of an Audio Perception Network without requiring labeled data. In this architecture, the LiDAR‑based system operates as the Teacher Network, guiding the Audio Perception Network, which serves as the Student Network. Once trained, the model can independently predict 3D trajectories using only audio signals, with no need for LiDAR data or external ground truth during deployment. To further enhance precision, we apply Gaussian Process modeling for improved spatiotemporal tracking. Our method delivers top‑tier performance on the MMAUD dataset, establishing a new benchmark in trajectory estimation using self‑supervised learning techniques without reliance on ground truth annotations.
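The mel-spectrogram step relies on a frequency warping that spaces filter bands evenly on the mel scale. The sketch below shows that warping with the widely used 2595*log10(1 + f/700) formula. The abstract does not state which mel variant or filter count the authors use, so the formula choice and the helper names are assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (common 2595*log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Edge frequencies (Hz) of n_bands triangular filters, equally spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

The resulting bands are narrow at low frequencies and wide at high frequencies, so a mel-spectrogram is obtained by summing short-time power spectrum bins under each triangular filter.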
Authors: Ziang Wang, Lei Wang, Qi Yi, Yimin Liu
Abstract: Unmanned aerial vehicles (UAVs) have played an increasingly important role in military operations and social life. Among all application scenarios, multi‑target tracking tasks accomplished by UAV swarms have received extensive attention. However, when UAVs use radar to track targets, the tracking performance can be severely compromised by jammers. To track targets in the presence of jammers, UAVs can use passive radar to position the jammer. This paper proposes a system where a UAV swarm selects the radar's active or passive work mode to track multiple differently located and potentially jammer‑carrying targets. After presenting the optimization problem and proving that it is hard to solve, we use a multi‑agent reinforcement learning algorithm to solve this control problem. We also propose a mechanism based on a simulated annealing algorithm to avoid cases where UAV actions violate constraints. Simulation experiments demonstrate the effectiveness of the proposed algorithm.
Authors: Rui Wang, Kaitao Meng, Deshi Li
Abstract: Unmanned aerial vehicles (UAVs) have attracted plenty of attention due to their high flexibility and enhanced communication ability. However, the limited coverage and energy of UAVs make it difficult to provide timely wireless service for large‑scale sensor networks, a problem that persists even with multiple UAVs. To this end, an advanced collaboration mechanism for UAVs urgently needs to be designed. In this paper, we propose a multi‑UAV collaborative scheme for seamless data collection and transmission, where UAVs are dispatched to collection points (CPs) to collect and transmit the time‑critical data to the ground base station (BS) simultaneously through the cooperative backhaul link. Specifically, the mission completion time is minimized by optimizing the trajectories, task allocation, collection time scheduling, and transmission topology of UAVs while ensuring a backhaul link to the BS. However, the formulated problem is non‑convex and challenging to solve directly. To tackle this problem, the CP locations and transmission topology of UAVs are obtained by sensor node (SN) clustering and region division. Next, the transmission connectivity condition between UAVs is derived to facilitate the trajectory discretization and thus reduce the dimensions of variables. This simplifies the problem to optimizing the UAV hovering locations, hovering time, and CP serving sequence. Then, we propose a point‑matching‑based trajectory planning algorithm to solve the problem efficiently. The simulation results show that the proposed scheme achieves significant performance gains over the two benchmarks.
Authors: Reza Ahmadvand, Sarah Sharif, Yaser Banad
Abstract: Recent advances in multi‑agent systems manipulation have demonstrated a rising demand for the implementation of multi‑UAV systems in urban areas, which are always subject to the presence of static and dynamic obstacles. The focus of the presented research is on the introduction of a nature‑inspired collision‑free control for a multi‑UAV system considering obstacle avoidance maneuvers. Inspired by the collective behavior of tilapia fish and pigeons, the presented framework uses a centralized controller for the optimal formation control/recovery, which is defined by the probabilistic Lloyd's algorithm, while it uses a distributed controller for inter‑vehicle collision and obstacle avoidance. Further, the presented framework has been extended to 3D space with 3D maneuvers. Finally, the presented framework has been applied to a multi‑UAV system in 2D and 3D scenarios, and the obtained results demonstrate the validity of the presented method in the presence of buildings and different types of obstacles.
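The abstract above names Lloyd's algorithm as the basis of its formation controller. As a rough illustration only, the sketch below runs the plain (deterministic, sample-based) Lloyd iteration in 2D: each sample point is assigned to its nearest agent, and each agent then moves to the centroid of its assigned samples. It is not the probabilistic variant used by the authors, and the unit-square workspace, agent count, and function names (`lloyd_step`, `coverage_cost`) are assumptions made for this example.

```python
import random


def lloyd_step(agents, samples):
    """One plain Lloyd iteration (assumed illustrative form):
    assign every sample to its nearest agent, then move each agent
    to the centroid of the samples assigned to it."""
    cells = [[] for _ in agents]
    for sx, sy in samples:
        nearest = min(
            range(len(agents)),
            key=lambda i: (agents[i][0] - sx) ** 2 + (agents[i][1] - sy) ** 2,
        )
        cells[nearest].append((sx, sy))

    new_agents = []
    for agent, cell in zip(agents, cells):
        if cell:
            cx = sum(p[0] for p in cell) / len(cell)
            cy = sum(p[1] for p in cell) / len(cell)
            new_agents.append((cx, cy))
        else:
            # An agent with an empty cell keeps its current position.
            new_agents.append(agent)
    return new_agents


def coverage_cost(agents, samples):
    """Mean squared distance from each sample to its nearest agent."""
    total = 0.0
    for sx, sy in samples:
        total += min((ax - sx) ** 2 + (ay - sy) ** 2 for ax, ay in agents)
    return total / len(samples)


# Hypothetical setup: 4 agents start bunched in one corner of a unit square.
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(2000)]
agents = [(0.1 * rng.random(), 0.1 * rng.random()) for _ in range(4)]

initial_cost = coverage_cost(agents, samples)
for _ in range(20):
    agents = lloyd_step(agents, samples)
final_cost = coverage_cost(agents, samples)

print(f"coverage cost: {initial_cost:.4f} -> {final_cost:.4f}")
```

Each iteration cannot increase the coverage cost, which is why the agents spread out from the corner toward an even partition of the workspace. Collision and obstacle avoidance, which the abstract assigns to a separate distributed controller, are not modeled here.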
Authors: Rick van Essen, Eldert van Henten, Gert Kootstra
Abstract: UAVs are becoming popular in agriculture; however, they usually use time‑consuming row‑by‑row flight paths. This paper presents a deep‑reinforcement‑learning‑based approach for path planning to efficiently localize weeds in agricultural fields using UAVs with minimal flight‑path length. The method combines prior knowledge about the field containing uncertain, low‑resolution weed locations with in‑flight weed detections. The search policy was learned using deep Q‑learning. We trained the agent in simulation, allowing a thorough evaluation of the weed distribution, typical errors in the perception system, prior knowledge, and different stopping criteria on the planner's performance. When weeds were non‑uniformly distributed over the field, the agent found them faster than a row‑by‑row path, showing its capability to learn and exploit the weed distribution. Detection errors and prior knowledge quality had a minor effect on the performance, indicating that the learned search policy was robust to detection errors and did not need detailed prior knowledge. The agent also learned to terminate the search. To test the transferability of the learned policy to a real‑world scenario, the planner was tested on real‑world image data without further training, which showed a 66% shorter path compared to a row‑by‑row path at the cost of a 10% lower percentage of found weeds. Strengths and weaknesses of the planner for practical application are comprehensively discussed, and directions for further development are provided. Overall, it is concluded that the learned search policy can improve the efficiency of finding non‑uniformly distributed weeds using a UAV and shows potential for use in agricultural practice.
Authors: Quan Chen, Tingyu Wang, Rongfeng Lu, Yu Liu, Bolun Zheng, Zhedong Zheng
Abstract: UAV Geo‑Localization faces significant challenges due to the drastic appearance discrepancy between drone‑captured images and satellite views. Existing methods typically assume a consistent scaling factor across views and rely on predefined partition alignment to extract viewpoint‑invariant representations through part‑level feature construction. However, this scaling assumption often fails in real‑world scenarios, where variations in drone flight states lead to scale mismatches between cross‑view images, resulting in severe performance degradation. To address this issue, we propose a scale‑adaptive partition learning framework that leverages known drone flight height to predict scale factors and dynamically adjust feature extraction. Our key contribution is a height‑aware adjustment strategy, which calculates the relative height ratio between drone and satellite views, dynamically adjusting partition sizes to explicitly align semantic information between partition pairs. This strategy is integrated into a Scale‑adaptive Local Partition Network (SaLPN), building upon an existing square partition strategy to extract both fine‑grained and global features. Additionally, we propose a saliency‑guided refinement strategy to enhance part‑level features, further improving retrieval accuracy. Extensive experiments validate that our height‑aware, scale‑adaptive approach achieves state‑of‑the‑art geo‑localization accuracy in various scale‑inconsistent scenarios and exhibits strong robustness against scale variations. The code will be made publicly available.
Authors: Hang Zhang, Zhuoling Li, Jun Liu
Abstract: Dynamic scenes contain intricate spatio‑temporal information, crucial for mobile robots, UAVs, and autonomous driving systems to make informed decisions. Parsing these scenes into semantic triplets <Subject‑Predicate‑Object> for accurate Scene Graph Generation (SGG) is highly challenging due to the fluctuating spatio‑temporal complexity. Inspired by the reasoning capabilities of Large Language Models (LLMs), we propose SceneLLM, a novel framework that leverages LLMs as powerful scene analyzers for dynamic SGG. Our framework introduces a Video‑to‑Language (V2L) mapping module that transforms video frames into linguistic signals (scene tokens), making the input more comprehensible for LLMs. To better encode spatial information, we devise a Spatial Information Aggregation (SIA) scheme, inspired by the structure of Chinese characters, which encodes spatial data into tokens. Using Optimal Transport (OT), we generate an implicit language signal from the frame‑level token sequence that captures the video's spatio‑temporal information. To further improve the LLM's ability to process this implicit linguistic input, we apply Low‑Rank Adaptation (LoRA) to fine‑tune the model. Finally, we use a transformer‑based SGG predictor to decode the LLM's reasoning and predict semantic triplets. Our method achieves state‑of‑the‑art results on the Action Genome (AG) benchmark, and extensive experiments show the effectiveness of SceneLLM in understanding and generating accurate dynamic scene graphs.
Authors: Peini Yi, Wenchi Cheng, Jingqing Wang, Wei Zhang
Abstract: In recent years, reconfigurable intelligent surfaces (RIS) have garnered significant attention for their ability to control the phase shifts in reflected signals. By intelligently adjusting these phases, RIS can establish seamless direct paths between communication devices obstructed by obstacles, eliminating the need for forwarding and significantly reducing system overhead associated with relaying. This capability is crucial in multi‑hop ad hoc networks requiring multiple relay steps. Consequently, the concept of incorporating multi‑hop RIS into wireless multi‑hop relay networks has emerged. In this paper, we propose a novel network model where each UAV communication node is equipped with a RIS, facilitating seamless connections in multi‑hop relay wireless networks. We analyze the performance of this model by integrating RIS‑assisted physical layer modeling into the seamless connection network framework and conducting a detailed comparative analysis of RIS‑assisted and conventional connections. At the medium access layer, we introduce a RIS‑DCF MAC protocol based on the IEEE 802.11 distributed coordination function (DCF), modeling the medium access process as a two‑hop access scenario. Our results demonstrate that the seamless connections and diversity gain provided by RIS significantly enhance the performance of multi‑hop relay wireless networks.
Authors: Nethmi S. Hewawiththi, M. Mahesha Viduranga, Vanodhya G. Warnasooriya, Tharindu Fernando, Himal A. Suraweera, Sridha Sridharan, Clinton Fookes
Abstract: Unmanned aerial vehicle‑assisted disaster recovery missions have been promoted recently due to their reliability and flexibility. Machine learning algorithms running onboard significantly enhance the utility of UAVs by enabling real‑time data processing and efficient decision‑making, despite being in a resource‑constrained environment. However, the limited bandwidth and intermittent connectivity make transmitting the outputs to ground stations challenging. This paper proposes a novel semantic extractor that can be adopted into any machine learning downstream task for identifying the critical data required for decision‑making. The semantic extractor can be executed onboard which results in a reduction of data that needs to be transmitted to ground stations. We test the proposed architecture together with the semantic extractor on two publicly available datasets, FloodNet and RescueNet, for two downstream tasks: visual question answering and disaster damage level classification. Our experimental results demonstrate the proposed method maintains high accuracy across different downstream tasks while significantly reducing the volume of transmitted data, highlighting the effectiveness of our semantic extractor in capturing task‑specific salient information.
Authors: Zhiying Wang, Gang Sun, Yuhui Wang, Hongfang Yu, Dusit Niyato
Abstract: The Space‑Air‑Ground Integrated Network (SAGIN) framework is a crucial foundation for future networks, where satellites and aerial nodes assist in computational task offloading. The low‑altitude economy, leveraging the flexibility and multifunctionality of Unmanned Aerial Vehicles (UAVs) in SAGIN, holds significant potential for development in areas such as communication and sensing. However, effective coordination is needed to streamline information exchange and enable efficient system resource allocation. In this paper, we propose a Clustering‑based Multi‑agent Deep Deterministic Policy Gradient (CMADDPG) algorithm to address the multi‑UAV cooperative task scheduling challenges in SAGIN. The CMADDPG algorithm leverages dynamic UAV clustering to partition UAVs into clusters, each managed by a Cluster Head (CH) UAV, facilitating a distributed‑centralized control approach. Within each cluster, UAVs delegate offloading decisions to the CH UAV, reducing intra‑cluster communication costs and decision conflicts, thereby enhancing task scheduling efficiency. Additionally, by employing a multi‑agent reinforcement learning framework, the algorithm leverages the extensive coverage of satellites to achieve centralized training and distributed execution of multi‑agent tasks, while maximizing overall system profit through optimized task offloading decision‑making. Simulation results reveal that the CMADDPG algorithm effectively optimizes resource allocation, minimizes queue delays, maintains balanced load distribution, and surpasses existing methods by achieving at least a 25% improvement in system profit, showcasing its robustness and adaptability across diverse scenarios.
Authors: Ashley Kline, Abirami Elangovan, Dominique Escandon, Scott Wade, Aatish Gupta
Abstract: The use of Unmanned Aerial Vehicles (UAVs) for aerial tasks and environmental manipulation is increasingly desired. This can be demonstrated via art tasks. This paper presents the development of Magnasketch, capable of translating image inputs into art on a magnetic drawing board via a Bitcraze Crazyflie 2.0 quadrotor. Optimal trajectories were generated using a Model Predictive Control (MPC) formulation newly incorporating magnetic force dynamics. A Z‑compliant magnetic drawing apparatus was designed for the quadrotor. Experimental results of the novel controller tested against the existing Position High Level Commander showed comparable performance. Although slightly outperformed in terms of error, with average errors of 3.9 cm, 4.4 cm, and 0.5 cm in x, y, and z respectively, the Magnasketch controller produced smoother drawings with the added benefit of full state control.
Authors: Chen Li, Rui Zhao, Zeyu Wang, Huiying Xu, Xinzhong Zhu
Abstract: Object detection in Unmanned Aerial Vehicle (UAV) images has emerged as a focal area of research, which presents two significant challenges: i) objects are typically small and dense within vast images; ii) computational resource constraints render most models unsuitable for real‑time deployment. Current real‑time object detectors are not optimized for UAV images, and complex methods designed for small object detection often lack real‑time capabilities. To address these challenges, we propose a novel detector, RemDet (Reparameter efficient multiplication Detector). Our contributions are as follows: 1) We rethink the challenges that existing detectors face on small and dense UAV images and propose information loss as a design guideline for efficient models. 2) We introduce the ChannelC2f module to enhance small object detection performance, demonstrating that high‑dimensional representations can effectively mitigate information loss. 3) We design the GatedFFN module to provide not only strong performance but also low latency, effectively addressing the challenges of real‑time detection. Our research reveals that GatedFFN, through the use of multiplication, is more cost‑effective than feed‑forward networks for high‑dimensional representation. 4) We propose the CED module, which combines the advantages of ViT and CNN downsampling to effectively reduce information loss. It specifically enhances context information for small and dense objects. Extensive experiments on large UAV datasets, VisDrone and UAVDT, validate the real‑time efficiency and superior performance of our methods. On the challenging UAV dataset VisDrone, our methods not only provide state‑of‑the‑art results, improving detection by more than 3.4%, but also achieve 110 FPS on a single 4090.
Authors: Bitgoeul Kim, Samuel W. Blair, Talukder Z. Jubery, Soumik Sarkar, Arti Singh, Asheesh K. Singh, Baskar Ganapathysubramanian
Abstract: Plant breeding programs require assessments of days to maturity for accurate selection and placement of entries in appropriate tests. In the early stages of the breeding pipeline, soybean breeding programs assign relative maturity ratings to experimental varieties that indicate their suitable maturity zones. Traditionally, the estimation of maturity value for breeding varieties has involved breeders manually inspecting fields and assessing maturity value visually. This approach relies heavily on rater judgment, making it subjective and time‑consuming. This study aimed to develop a machine‑learning model for evaluating soybean maturity using UAV‑based time‑series imagery. Images were captured at three‑day intervals, beginning as the earliest varieties started maturing and continuing until the last varieties fully matured. The data for this experiment comprised 22,043 plots collected across three years (2021 to 2023), representing relative maturity groups 1.6 to 3.9. We utilized contour plot images extracted from the time‑series UAV RGB imagery as input for a neural network model. This contour plot approach encoded the temporal and spatial variation within each plot into a single image. A deep learning model was trained to utilize this contour plot to predict maturity ratings. This model significantly improves accuracy and robustness, achieving up to 85% accuracy. We also evaluate the model's accuracy as we reduce the number of time points, quantifying the trade‑off between temporal resolution and maturity prediction. The predictive model offers a scalable, objective, and efficient means of assessing crop maturity, enabling phenomics and ML approaches to reduce the reliance on manual inspection and subjective assessment. This approach enables the automatic prediction of relative maturity ratings in a breeding program, saving time and resources.
Authors: Sandeep Banik, Jinrae Kim, Naira Hovakimyan, Luca Carlone, John P. Thomas, Nancy G. Leveson
Abstract: Vertical take‑off and landing (VTOL) unmanned aerial vehicles (UAVs) are versatile platforms widely used in applications such as surveillance, search and rescue, and urban air mobility. Despite their potential, the critical phases of take‑off and landing in uncertain and dynamic environments pose significant safety challenges due to environmental uncertainties, sensor noise, and system‑level interactions. This paper presents an integrated approach combining vision‑based sensor fusion with System‑Theoretic Process Analysis (STPA) to enhance the safety and robustness of VTOL UAV operations during take‑off and landing. By incorporating fiducial markers, such as AprilTags, into the control architecture, and performing comprehensive hazard analysis, we identify unsafe control actions and propose mitigation strategies. Key contributions include developing a control structure that comprises a vision system capable of identifying a fiducial marker and a multirotor controller, and deriving the corresponding unsafe control actions and mitigation strategies. The proposed solution is expected to improve the reliability and safety of VTOL UAV operations, paving the way for resilient autonomous systems.
Authors: Serhii Svystun, Oleksandr Melnychenko, Pavlo Radiuk, Oleg Savenko, Andrii Lysyi
Abstract: With the rapid development of green energy, the efficiency and reliability of wind turbines are key to sustainable renewable energy production. For that reason, this paper presents a novel intelligent system architecture designed for the dynamic collection and real‑time processing of visual data to detect defects in wind turbines. The system employs advanced algorithms within a distributed framework to enhance inspection accuracy and efficiency using unmanned aerial vehicles (UAVs) with integrated visual and thermal sensors. An experimental study conducted at the "Staryi Sambir‑1" wind power plant in Ukraine demonstrates the system's effectiveness, showing a significant improvement in defect detection accuracy (up to 94%) and a reduction in inspection time per turbine (down to 1.5 hours) compared to traditional methods. The results show that the proposed intelligent system architecture provides a scalable and reliable solution for wind turbine maintenance, contributing to the durability and performance of renewable energy infrastructure.
Authors: Jin Zhang, Xiaoran Qin, Ming Zhang
Abstract: With the increasing development of intelligent transportation systems and advancements in aviation technology, the concept of Advanced Air Mobility (AAM) is gaining attention. This study aims to improve operational safety and service quality within Urban Air Mobility (UAM) through a trajectory‑based operation (TBO). A multi‑layer operational risk assessment model is introduced to capture the effects of aircraft failure scenarios on critical urban entities, including ground personnel, vehicles, and in‑flight UAVs (unmanned aerial vehicles). Based on this, a single‑aircraft track planning model is designed to balance operational risk and transportation cost under the performance constraints of eVTOL (electric Vertical Take‑off and Landing) aircraft. A customized track planning algorithm with safety buffer zones is used to identify the most efficient flight paths. Additionally, a multi‑aircraft scheduling optimization model is proposed to minimize delays and reduce mid‑air collision risks. Experimental results show that the presented approach improves both efficiency and safety, providing practical solutions for UAM operations.
Authors: Jad Mansour, Hayat Rajani, Rafael Garcia, Nuno Gracias
Abstract: The joint use of event‑based vision and Spiking Neural Networks (SNNs) is expected to have a large impact in robotics in the near future, in tasks such as visual odometry and obstacle avoidance. While researchers have used real‑world event datasets for optical flow prediction (mostly captured with Unmanned Aerial Vehicles (UAVs)), these datasets are limited in diversity and scalability and are challenging to collect. Thus, synthetic datasets offer a scalable alternative by bridging the gap between reality and simulation. In this work, we address the lack of datasets by introducing eWiz, a comprehensive library for processing event‑based data. It includes tools for data loading, augmentation, visualization, encoding, and generation of training data, along with loss functions and performance metrics. We further present synthetic event‑based datasets and data generation pipelines for optical flow prediction tasks. Built on top of eWiz, eCARLA‑scenes makes use of the CARLA simulator to simulate self‑driving car scenarios. The ultimate goal of this dataset is the depiction of diverse environments while laying a foundation for advancing event‑based camera applications in autonomous field vehicle navigation, paving the way for using SNNs on neuromorphic hardware such as the Intel Loihi.
Authors: Shengcai Zhou, Halvin Yang, Luping Xiang, Kun Yang
Abstract: In the evolving landscape of high‑speed communication, the shift from traditional pilot‑based methods to a Sensing‑Oriented Approach (SOA) is anticipated to gain momentum. This paper delves into the development of an innovative Integrated Sensing and Communication (ISAC) framework, specifically tailored for beamforming and trajectory prediction processes. Central to this research is the exploration of an Unmanned Aerial Vehicle (UAV)‑enabled communication system, which seamlessly integrates ISAC technology. This integration underscores the synergistic interplay between sensing and communication capabilities. The proposed system initially deploys omnidirectional beams for the sensing‑focused phase, subsequently transitioning to directional beams for precise object tracking. This process incorporates an Extended Kalman Filtering (EKF) methodology for the accurate estimation and prediction of object states. A novel frame structure is introduced, employing historical sensing data to optimize beamforming in real‑time for subsequent time slots, a strategy we refer to as 'temporal‑assisted' beamforming. To refine the temporal‑assisted beamforming technique, we employ Successive Convex Approximation (SCA) in tandem with Iterative Rank Minimization (IRM), yielding high‑quality suboptimal solutions. Comparative analysis with conventional pilot‑based systems reveals that our approach yields a substantial improvement of 156% in multi‑object scenarios and 136% in single‑object scenarios.
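The abstract above relies on Extended Kalman Filtering to estimate and predict object states from sensing data. The sketch below shows the generic EKF predict/update cycle on a deliberately small, assumed problem: a scalar target position `x` on the ground, a known target speed, and a nonlinear range measurement taken by a UAV hovering at altitude `h` above the origin. The state, motion model, measurement model, and every numeric value are assumptions chosen for illustration; they are not the paper's ISAC formulation.

```python
import math
import random


def ekf_step(x, p, z, v, dt, h, q, r):
    """One scalar EKF cycle for an assumed toy model.

    x, p : prior state estimate and its variance
    z    : measured slant range from the UAV to the target
    v, dt: assumed known target speed and time step
    h    : UAV altitude above the origin
    q, r : process and measurement noise variances
    """
    # Predict: constant-velocity motion along the ground.
    x_pred = x + v * dt
    p_pred = p + q

    # Update: the range sqrt(x^2 + h^2) is nonlinear in x, so it is
    # linearized around the prediction via its derivative (the Jacobian).
    z_pred = math.hypot(x_pred, h)
    jac = x_pred / z_pred
    s = jac * p_pred * jac + r          # innovation variance
    k = p_pred * jac / s                # Kalman gain
    x_new = x_pred + k * (z - z_pred)
    p_new = (1.0 - k * jac) * p_pred
    return x_new, p_new


# Hypothetical scenario: the filter starts 10 m off the true position.
rng = random.Random(1)
v, dt, h, q, r = 2.0, 1.0, 30.0, 0.01, 1.0
x_true, x_est, p_est = 50.0, 40.0, 100.0

for _ in range(20):
    x_true += v * dt
    z = math.hypot(x_true, h) + rng.gauss(0.0, math.sqrt(r))
    x_est, p_est = ekf_step(x_est, p_est, z, v, dt, h, q, r)

print(f"true={x_true:.2f} est={x_est:.2f} var={p_est:.3f}")
```

In a beam-tracking setting, the predicted state from the first half of `ekf_step` is what a system could use to point the beam for the next time slot before the new measurement arrives.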
Authors: Ashik E Rasul, Humaira Tasnim, Hyung-Jin Yoon, Ayoosh Bansal, Duo Wang, Naira Hovakimyan, Lui Sha, Petros Voulgaris
Abstract: Learning‑based solutions have enabled incredible capabilities for autonomous systems. Autonomous vehicles, both aerial and ground, rely on DNNs for various integral tasks, including perception. The efficacy of supervised learning solutions hinges on the quality of the training data. Discrepancies between training data and operating conditions result in faults that can lead to catastrophic incidents. However, collecting vast amounts of context‑sensitive data, with broad coverage of possible operating environments, is prohibitively difficult. Synthetic data generation techniques for DNNs allow for the easy exploration of diverse scenarios. However, synthetic data generation solutions for aerial vehicles are still lacking.
This work presents a data augmentation framework for aerial vehicle perception training, leveraging photorealistic simulation integrated with high‑fidelity vehicle dynamics. Safe landing is a crucial challenge in the development of autonomous air taxis; therefore, the landing maneuver is chosen as the focus of this work. With repeated simulations of landing in varying scenarios, we assess the landing performance of a VTOL‑type UAV and gather valuable data. The landing performance is used as the objective function to optimize the DNN through retraining. Given the high computational cost of DNN retraining, we incorporated Bayesian Optimization in our framework to systematically explore the data augmentation parameter space and retrain the best‑performing models. The framework allowed us to identify high‑performing data augmentation parameters that are consistently effective across different landing scenarios. Utilizing the capabilities of this data augmentation framework, we obtained a robust perception model. The model consistently improved the perception‑based landing success rate by at least 20% under different lighting and weather conditions.
Authors: Xuhui Zhang, Wenchao Liu, Jinke Ren, Huijun Xing, Gui Gui, Yanyan Shen, Shuguang Cui
Abstract: Federated learning (FL) has become a transformative paradigm for distributed machine learning across wireless networks. However, the performance of FL is often hindered by the unreliable communication links between resource‑constrained Internet of Things (IoT) devices and the central server. To overcome this challenge, we propose a novel framework that employs an unmanned aerial vehicle (UAV) as a mobile server to enhance the FL training process. By capitalizing on the UAV's mobility, we establish strong line‑of‑sight connections with IoT devices, thereby enhancing communication reliability and capacity. To maximize training efficiency, we formulate a latency minimization problem that jointly optimizes bandwidth allocation, computing frequencies, transmit power for both the UAV and IoT devices, and the UAV's flight trajectory. Subsequently, we analyze the number of IoT device training rounds and UAV aggregation rounds required for FL convergence. Based on the convergence constraint, we transform the problem into three subproblems and develop an efficient alternating optimization algorithm to solve it. Additionally, we provide a thorough analysis of the algorithm's convergence and computational complexity. Extensive numerical results demonstrate that our proposed scheme not only surpasses existing benchmark schemes by reducing latency by up to 15.29%, but also achieves training efficiency that nearly matches the ideal scenario.
Authors: Ondřej Procházka, Filip Novák, Tomáš Báča, Parakh M. Gupta, Robert Pěnička, Martin Saska
Abstract: This paper proposes a novel trajectory generation method based on Model Predictive Control (MPC) for agile landing of an Unmanned Aerial Vehicle (UAV) onto an Unmanned Surface Vehicle (USV)'s deck in harsh conditions. The trajectory generation exploits the state predictions of the USV to create periodically updated trajectories for a multirotor UAV to precisely land on the deck of a moving USV even in cases where the deck's inclination is continuously changing. We use an MPC‑based scheme to create trajectories that consider both the UAV dynamics and the predicted states of the USV up to the first derivative of position and orientation. Compared to existing approaches, our method dynamically modifies the penalization matrices to precisely follow the corresponding states with respect to the flight phase. Especially during the landing maneuver, the UAV synchronizes attitude with the USV's, allowing for fast landing on a tilted deck. Simulations show the method's reliability in various sea conditions up to Rough sea (wave height 4 m), outperforming state‑of‑the‑art methods in landing speed and accuracy, with twice the precision on average. Finally, real‑world experiments validate the simulation results, demonstrating robust landings on a moving USV, while all computations are performed in real‑time onboard the UAV.
Authors: Hongjuan Li, Hui Kang, Geng Sun, Jiahui Li, Jiacheng Wang, Xue Wang, Dusit Niyato, Victor C. M. Leung
Abstract: Unmanned aerial vehicles (UAVs) have gained considerable attention as a platform for establishing aerial wireless networks and communications. However, the line‑of‑sight dominance in air‑to‑ground communications often leads to significant interference with terrestrial networks, reducing communication efficiency among terrestrial terminals. This paper explores a novel uplink interference mitigation approach based on the collaborative beamforming (CB) method in multi‑UAV network systems. Specifically, the UAV swarm forms a UAV‑enabled virtual antenna array (VAA) to transmit gathered data to multiple base stations (BSs) for data backup and distributed processing. However, there is a trade‑off between the effectiveness of CB‑based interference mitigation and the energy conservation of UAVs. Thus, by jointly optimizing the excitation current weights and hover positions of UAVs as well as the sequence of data transmission to various BSs, we formulate an uplink interference mitigation multi‑objective optimization problem (MOOP) to simultaneously reduce interference, enhance transmission efficiency, and improve energy efficiency. In response to the computational demands of the formulated problem, we introduce an evolutionary computation method, namely chaotic non‑dominated sorting genetic algorithm II (CNSGA‑II) with multiple improved operators. The proposed CNSGA‑II efficiently addresses the formulated MOOP, outperforming several other comparative algorithms, as evidenced by the outcomes of the simulations. Moreover, the proposed CB‑based uplink interference mitigation approach can significantly reduce the interference caused by UAVs to non‑receiving BSs.
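The abstract above optimizes excitation current weights and hover positions of a UAV virtual antenna array. To make the role of those two quantities concrete, the sketch below evaluates the textbook far-field array factor of a planar array in azimuth only, with phases pre-compensated so that all contributions add coherently toward one target direction. The 2D geometry, unit weights, wavelength, and the function name `array_factor` are assumptions for illustration; the sketch does not reproduce the paper's channel model or its CNSGA‑II optimizer.

```python
import cmath
import math
import random


def array_factor(positions, weights, wavelength, phi, phi_target):
    """Far-field array factor in azimuth for an assumed planar array.

    positions  : list of (x, y) element (UAV) positions in meters
    weights    : excitation amplitude of each element
    phi        : azimuth angle (rad) at which the field is evaluated
    phi_target : azimuth angle (rad) toward which phases are aligned
    """
    k = 2.0 * math.pi / wavelength  # wavenumber
    total = 0j
    for (x, y), w in zip(positions, weights):
        # Geometric phase of this element as seen from direction phi.
        geometric = k * (x * math.cos(phi) + y * math.sin(phi))
        # Phase offset that cancels the geometric phase at phi_target.
        compensation = -k * (x * math.cos(phi_target) + y * math.sin(phi_target))
        total += w * cmath.exp(1j * (geometric + compensation))
    return total


# Hypothetical swarm: 8 UAVs scattered over a 10 m x 10 m area.
rng = random.Random(0)
positions = [(10.0 * rng.random(), 10.0 * rng.random()) for _ in range(8)]
weights = [1.0] * len(positions)
wavelength = 0.125  # meters, roughly a 2.4 GHz carrier
phi_target = math.radians(30.0)

gain_target = abs(array_factor(positions, weights, wavelength, phi_target, phi_target))
gain_off = abs(array_factor(positions, weights, wavelength, phi_target + 1.0, phi_target))

print(f"|AF| toward target: {gain_target:.3f}, off target: {gain_off:.3f}")
```

Toward the target direction every term has zero phase, so the magnitude equals the sum of the weights. In other directions the terms generally do not align, and that leftover radiation is the sidelobe interference that the choice of weights and hover positions can shape.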
Authors: Junqiao Wang, Zhongliang Yu, Dong Zhou, Jiaqi Shi, Runran Deng
Abstract: The capability of UAVs for efficient autonomous navigation and obstacle avoidance in complex and unknown environments is critical for applications in agricultural irrigation, disaster relief and logistics. In this paper, we propose the DPRL (Distributed Privileged Reinforcement Learning) navigation algorithm, an end‑to‑end policy designed to address the challenge of high‑speed autonomous UAV navigation under partially observable environmental conditions. Our approach combines deep reinforcement learning with privileged learning to overcome the impact of observation data corruption caused by partial observability. We leverage an asymmetric Actor‑Critic architecture to provide the agent with privileged information during training, which enhances the model's perceptual capabilities. Additionally, we present a multi‑agent exploration strategy across diverse environments to accelerate experience collection, which in turn expedites model convergence. We conducted extensive simulations across various scenarios, benchmarking our DPRL algorithm against the state‑of‑the‑art navigation algorithms. The results consistently demonstrate the superior performance of our algorithm in terms of flight efficiency, robustness and overall success rate.
Authors: Leon Fernando, Billy Pik Lik Lau, Chau Yuen, U-Xuan Tan
Abstract: The rapid advancements in unmanned aerial vehicles (UAVs) have unlocked numerous applications, including environmental monitoring, disaster response, and agricultural surveying. Enhancing the collective behavior of multiple decentralized UAVs can significantly improve these applications through more efficient and coordinated operations. In this study, we explore a Recurrent PPO model for target localization in perceptually degraded environments like places without GNSS/GPS signals. We first developed a single‑drone approach for target identification, followed by a decentralized two‑drone model. Our approach can utilize two types of sensors on the UAVs, a detection sensor and a target signal sensor. The single‑drone model achieved an accuracy of 93%, while the two‑drone model achieved an accuracy of 86%, with the latter requiring fewer average steps to locate the target. This demonstrates the potential of our method in UAV swarms, offering efficient and effective localization of radiant targets in complex environmental conditions.
Authors: Spencer Folk
Abstract: Small unmanned aerial vehicles (UAVs) have become standard tools in reconnaissance and surveying for both civilian and defense applications. In the future, UAVs will likely play a pivotal role in autonomous package delivery, but current multi‑rotor candidates suffer from poor energy efficiency leading to insufficient endurance and range. In order to reduce the power demands of package delivery UAVs while still maintaining necessary hovering capabilities, companies like Amazon are experimenting with hybrid Vertical Take‑Off and Landing (VTOL) platforms. Tailsitter VTOLs offer a mechanically simple and cost‑effective solution compared to other hybrid VTOL configurations, and while advances in hardware and microelectronics have optimized the tailsitter for package delivery, the software behind its operation has largely remained a critical barrier to industry adoption. Tailsitters currently lack a generic, computationally efficient method of control that can provide strong safety and robustness guarantees over the entire flight domain. Further, tailsitters lack a closed‑form method of designing dynamically feasible transition maneuvers between hover and cruise. In this paper, we survey the modeling and control methods currently implemented on small‑scale tailsitter UAVs, and attempt to leverage a nonlinear dynamic model to design physically realizable, continuous‑pitch transition maneuvers at constant altitude. Primary results from this paper isolate potential barriers to constant‑altitude transition, and a novel approach to bypassing these barriers is proposed. While initial results are unsuccessful at providing feasible transition, this work acts as a stepping stone for future efforts to design new transition maneuvers that are safe, robust, and computationally efficient.
Authors: Jiawei Huang, Aimin Wang, Geng Sun, Jiahui Li, Jiacheng Wang, Hongyang Du, Dusit Niyato
Abstract: Unmanned aerial vehicles (UAVs) can be utilized as relay platforms to assist maritime wireless communications. However, complex channels and multipath effects at sea can adversely affect the quality of UAV transmitted signals. Collaborative beamforming (CB) can enhance the signal strength and range to assist the UAV relay for remote maritime communications. However, due to the open nature of UAV channels, security issue requires special consideration. This paper proposes a dual UAV cluster‑assisted system via CB to achieve physical layer security in maritime wireless communications. Specifically, one UAV cluster forms a maritime UAV‑enabled virtual antenna array (MUVAA) relay to forward data signals to the remote legitimate vessel, and the other UAV cluster forms an MUVAA jammer to send jamming signals to the remote eavesdropper. In this system, we formulate a secure and energy‑efficient maritime communication multi‑objective optimization problem (SEMCMOP) to maximize the signal‑to‑interference‑plus‑noise ratio (SINR) of the legitimate vessel, minimize the SINR of the eavesdropping vessel and minimize the total flight energy consumption of UAVs. Since the SEMCMOP is an NP‑hard and large‑scale optimization problem, we propose an improved swarm intelligence optimization algorithm with chaotic solution initialization and hybrid solution update strategies to solve the problem. Simulation results indicate that the proposed algorithm outperforms other comparison algorithms, and it can achieve more efficient signal transmission by using the CB‑based method.
Authors: Chong Huang, Xuyang Chen, Gaojie Chen, Pei Xiao, Geoffrey Ye Li, Wei Huang
Abstract: In this paper, we introduce a novel framework consisting of hybrid bit‑level and generative semantic communications for efficient downlink image transmission within space‑air‑ground integrated networks (SAGINs). The proposed model comprises multiple low Earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users. Because limitations in signal coverage and receiver antennas make direct communication between satellites and ground users infeasible in many scenarios, UAVs serve as relays and forward images from satellites to the ground users. Our hybrid communication framework effectively combines bit‑level transmission with several semantic‑level image generation modes, optimizing bandwidth usage to meet stringent satellite link budget constraints and ensure communication reliability and low latency under low signal‑to‑noise ratio (SNR) conditions. To reduce the transmission delay while ensuring reconstruction quality for the ground user, we propose a novel metric to measure delay and reconstruction quality in the proposed system, and employ a deep reinforcement learning (DRL)‑based strategy to optimize resource allocation in the proposed network. Simulation results demonstrate the superiority of the proposed framework in terms of communication resource conservation, reduced latency, and maintaining high image quality, significantly outperforming traditional solutions. Therefore, the proposed framework can ensure that real‑time image transmission requirements in SAGINs are met, even under dynamic network conditions and user demand.
Authors: Serhii Svystun, Oleksandr Melnychenko, Pavlo Radiuk, Oleg Savenko, Anatoliy Sachenko, Andrii Lysyi
Abstract: The inspection of wind turbine blades (WTBs) is crucial for ensuring their structural integrity and operational efficiency. Traditional inspection methods can be dangerous and inefficient, prompting the use of unmanned aerial vehicles (UAVs) that access hard‑to‑reach areas and capture high‑resolution imagery. In this study, we address the challenge of enhancing defect detection on WTBs by integrating thermal and RGB images obtained from UAVs. We propose a multispectral image composition method that combines thermal and RGB imagery through spatial coordinate transformation, key point detection, binary descriptor creation, and weighted image overlay. Using a benchmark dataset of WTB images annotated for defects, we evaluated several state‑of‑the‑art object detection models. Our results show that composite images significantly improve defect detection efficiency. Specifically, the YOLOv8 model's accuracy increased from 91% to 95%, precision from 89% to 94%, recall from 85% to 92%, and F1‑score from 87% to 93%. The number of false positives decreased from 6 to 3, and missed defects reduced from 5 to 2. These findings demonstrate that integrating thermal and RGB imagery enhances defect detection on WTBs, contributing to improved maintenance and reliability.
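As a quick consistency check on the figures quoted above, F1 is the harmonic mean of precision and recall. The following minimal sketch recomputes F1 from the precision and recall values stated in the abstract. The function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall quoted in the abstract for YOLOv8.
f1_rgb = f1_score(0.89, 0.85)        # before image composition
f1_composite = f1_score(0.94, 0.92)  # thermal + RGB composite images

print(round(f1_rgb * 100), round(f1_composite * 100))  # prints "87 93"
```

Both recomputed values agree with the F1‑scores reported in the abstract (87% and 93%).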
Authors: Xiaowen Ye, Yuyi Mao, Xianghao Yu, Shu Sun, Liqun Fu, Jie Xu
Abstract: This paper studies an integrated sensing and communications (ISAC) system for low‑altitude economy (LAE), where a ground base station (GBS) provides communication and navigation services for authorized unmanned aerial vehicles (UAVs), while sensing the low‑altitude airspace to monitor the unauthorized mobile target. The expected communication sum‑rate over a given flight period is maximized by jointly optimizing the beamforming at the GBS and UAVs' trajectories, subject to the constraints on the average signal‑to‑noise ratio requirement for sensing, the flight mission and collision avoidance of UAVs, as well as the maximum transmit power at the GBS. Typically, this is a sequential decision‑making problem with the given flight mission. Thus, we transform it to a specific Markov decision process (MDP) model called episode task. Based on this modeling, we propose a novel LAE‑oriented ISAC scheme, referred to as Deep LAE‑ISAC (DeepLSC), by leveraging the deep reinforcement learning (DRL) technique. In DeepLSC, a reward function and a new action selection policy termed constrained noise‑exploration policy are judiciously designed to fulfill various constraints. To enable efficient learning in episode tasks, we develop a hierarchical experience replay mechanism, where the gist is to employ all experiences generated within each episode to jointly train the neural network. Besides, to enhance the convergence speed of DeepLSC, a symmetric experience augmentation mechanism, which simultaneously permutes the indexes of all variables to enrich available experience sets, is proposed. Simulation results demonstrate that compared with benchmarks, DeepLSC yields a higher sum‑rate while meeting the preset constraints, achieves faster convergence, and is more robust against different settings.
Authors: Bin Li, Huimin Shan
Abstract: Traditional video transmission systems assisted by multiple Unmanned Aerial Vehicles (UAVs) are often limited by computing resources, making it challenging to meet the demands for efficient video processing. To solve this challenge, this paper presents a multi‑UAV‑assisted Device‑to‑Device (D2D) mobile edge computing system for the maximization of task offloading profits in video stream transmission. In particular, the system enables UAVs to collaborate with idle user devices to process video computing tasks by introducing D2D communications. To maximize the system efficiency, the paper jointly optimizes power allocation, video transcoding strategies, computing resource allocation, and UAV trajectory. The resulting non‑convex optimization problem is formulated as a Markov decision process and solved relying on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm. Numerical results indicate that the proposed TD3 algorithm achieves a significant advantage over other traditional algorithms in enhancing the overall system efficiency.
Authors: Alejandro Puente-Castro, Enrique Fernandez-Blanco, Daniel Rivero
Abstract: Path Planning methods for autonomously controlling swarms of unmanned aerial vehicles (UAVs) are gaining momentum due to their operational advantages. An increasing number of scenarios now require autonomous control of multiple UAVs, as autonomous operation can significantly reduce labor costs. Additionally, obtaining optimal flight paths can lower energy consumption, thereby extending battery life for other critical operations. Many of these scenarios, however, involve obstacles such as power lines and trees, which complicate Path Planning. This paper presents an evolutionary computation‑based system employing genetic algorithms to address this problem in environments with obstacles. The proposed approach aims to ensure complete coverage of areas with fixed obstacles, such as in field exploration tasks, while minimizing flight time regardless of map size or the number of UAVs in the swarm. No specific goal points or prior information beyond the provided map is required. The experiments conducted in this study used five maps of varying sizes and obstacle densities, as well as a control map without obstacles, with different numbers of UAVs. The results demonstrate that this method can determine optimal paths for all UAVs during full map traversal, thus minimizing resource consumption. A comparative analysis with other state‑of‑the‑art approaches is presented to highlight the advantages and potential limitations of the proposed method.
Authors: Thulio Amorim, Tiago Nascimento, Akash Chaudhary, Eliseo Ferrante, Martin Saska
Abstract: In this work, we propose a minimalistic swarm flocking approach for multirotor unmanned aerial vehicles (UAVs). Our approach allows the swarm to achieve cohesive and aligned flocking (collective motion), in a random direction, without externally provided directional information exchange (alignment control). The method relies on minimalistic sensory requirements as it uses only the relative range and bearing of swarm agents in local proximity obtained through onboard sensors on the UAV. Thus, our method is able to stabilize and control the flock of a general shape above a steep terrain without any explicit communication between swarm members. To implement proximal control in a three‑dimensional manner, the Lennard‑Jones potential function is used to maintain cohesiveness and avoid collisions between robots. The performance of the proposed approach was tested in real‑world conditions by experiments with a team of nine UAVs. Experiments also present the usage of our approach on UAVs that are independent of external positioning systems such as the Global Navigation Satellite System (GNSS). Relying only on a relative visual localization through the ultraviolet direction and ranging (UVDAR) system, previously proposed by our group, the experiments verify that our system can be applied in GNSS‑denied environments. The achieved degree of alignment and cohesiveness was evaluated using the metrics of order and steady‑state value.
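To illustrate how a Lennard‑Jones potential can yield both cohesion and collision avoidance, the sketch below uses the textbook form V(r) = 4ε[(σ/r)^12 − (σ/r)^6] and sums the resulting radial forces over neighbors in 3D. The parameter values, the function names, and the use of absolute positions as input are assumptions made for illustration; the paper's exact formulation and gains may differ.

```python
import math

def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dV/dr for V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6).

    Positive values mean repulsion, negative values mean attraction.
    """
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

def proximal_vector(own, neighbors, epsilon=1.0, sigma=1.0):
    """Sum the Lennard-Jones forces exerted by neighbors (3D positions)."""
    total = [0.0, 0.0, 0.0]
    for n in neighbors:
        d = [o - p for o, p in zip(own, n)]   # vector from neighbor to self
        r = math.sqrt(sum(c * c for c in d))  # assumes r > 0
        f = lj_force(r, epsilon, sigma)
        for i in range(3):
            total[i] += f * d[i] / r          # push away if f > 0, pull if f < 0
    return total

# The force vanishes at the equilibrium distance r = 2**(1/6) * sigma.
r_eq = 2.0 ** (1.0 / 6.0)
```

Neighbors closer than `r_eq` repel and those farther away attract, which is what lets a single potential maintain spacing without explicit communication.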
Authors: Davi Santos, Martin Saska, Tiago Nascimento
Abstract: This paper addresses the problem of thrust estimation and control for the rotors of small‑sized multirotor Uncrewed Aerial Vehicles (UAVs). Accurate control of the thrust generated by each rotor during flight is one of the main challenges for robust control of quadrotors. The most common approach is to approximate the mapping of rotor speed to thrust with a simple quadratic model. This model is known to fail under non‑hovering flight conditions, introducing errors into the control pipeline. One of the approaches to modeling the aerodynamics around the propellers is the Blade Element Momentum Theory (BEMT). Here, we propose a novel BEMT‑based closed‑loop thrust estimator and control to eliminate the laborious calibration step of finding several aerodynamic coefficients. We aim to reuse known values as a baseline and fit the thrust estimate to values closest to the real ones with a simple test bench experiment, resulting in a single scaling value. A feedforward PID thrust control was implemented for each rotor, and the methods were validated by outdoor experiments with two multirotor UAV platforms: 250mm and 500mm. A statistical analysis of the results showed that the thrust estimation and control provided better robustness under aerodynamically varying flight conditions compared to the quadratic model.
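For reference, the quadratic baseline mentioned above is commonly written as T = k_T · ω², where ω is the rotor speed. The sketch below shows that mapping and its inverse. The coefficient value `K_T` is a hypothetical placeholder, not a value from the paper, and the BEMT‑based estimator itself is not reproduced here.

```python
import math

K_T = 1.5e-6  # hypothetical thrust coefficient [N / (rad/s)^2]

def thrust_from_speed(omega: float, k_t: float = K_T) -> float:
    """Quadratic model: thrust [N] produced at rotor speed omega [rad/s]."""
    return k_t * omega ** 2

def speed_for_thrust(thrust: float, k_t: float = K_T) -> float:
    """Inverse of the quadratic model: rotor speed needed for a given thrust."""
    return math.sqrt(thrust / k_t)

hover_speed = speed_for_thrust(2.0)  # speed for 2 N under the assumed K_T
```

Because `k_t` is a single constant, this model ignores inflow and airspeed effects, which is the limitation under non‑hovering conditions that the abstract points to.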
Authors: Lucas Nogueira Nobrega, Ewerton de Oliveira, Martin Saska, Tiago Nascimento
Abstract: Human‑robot interaction (HRI) is a growing area of research. In HRI, complex command (action) classification is still an open problem that usually prevents the real applicability of such a technique. The literature presents some works that use neural networks to detect these actions. However, occlusion is still a major issue in HRI, especially when using uncrewed aerial vehicles (UAVs), since, during the robot's movement, the human operator is often out of the robot's field of view. Furthermore, in multi‑robot scenarios, distributed training is also an open problem. In this sense, this work proposes an action recognition and control approach based on Long Short‑Term Memory (LSTM) Deep Neural Networks with two layers in association with three densely connected layers and Federated Learning (FL) embedded in multiple drones. The FL enabled our approach to be trained in a distributed fashion, i.e., access to data without the need for cloud or other repositories, which facilitates the multi‑robot system's learning. Furthermore, our multi‑robot approach also prevented occlusion situations, with experiments with real robots achieving an accuracy greater than 96%.
Authors: Bryce Hopkins, Leo ONeill, Michael Marinaccio, Eric Rowell, Russell Parsons, Sarah Flanary, Irtija Nazim, Carl Seielstad, Fatemeh Afghah
Abstract: The increasing accessibility of radiometric thermal imaging sensors for unmanned aerial vehicles (UAVs) offers significant potential for advancing AI‑driven aerial wildfire management. Radiometric imaging provides per‑pixel temperature estimates, a valuable improvement over non‑radiometric data that requires irradiance measurements to be converted into visible images using RGB color palettes. Despite its benefits, this technology has been underutilized largely due to a lack of available data for researchers. This study addresses this gap by introducing methods for collecting and processing synchronized visual spectrum and radiometric thermal imagery using UAVs at prescribed fires. The included imagery processing pipeline drastically simplifies and partially automates each step from data collection to neural network input. Further, we present the FLAME 3 dataset, the first comprehensive collection of side‑by‑side visual spectrum and radiometric thermal imagery of wildland fires. Building on our previous FLAME 1 and FLAME 2 datasets, FLAME 3 includes radiometric thermal Tag Image File Format (TIFFs) and nadir thermal plots, providing a new data type and collection method. This dataset aims to spur a new generation of machine learning models utilizing radiometric thermal imagery, potentially trivializing tasks such as aerial wildfire detection, segmentation, and assessment. A single‑burn subset of FLAME 3 for computer vision applications is available on Kaggle with the full 6 burn set available to readers upon request.
Authors: Reek Majumder, Gurcan Comert, David Werth, Adrian Gale, Mashrur Chowdhury, M Sabbir Salek
Abstract: The network of services, including delivery, farming, and environmental monitoring, has experienced exponential expansion in the past decade with Unmanned Aerial Vehicles (UAVs). Yet, UAVs are not robust enough against cyberattacks, especially on the Controller Area Network (CAN) bus. The CAN bus is a general‑purpose vehicle‑bus standard to enable microcontrollers and in‑vehicle computers to interact, primarily connecting different Electronic Control Units (ECUs). In this study, we focus on solving some of the most critical security weaknesses in UAVs by developing a novel graph‑based intrusion detection system (IDS) leveraging the Uncomplicated Application‑level Vehicular Communication and Networking (UAVCAN) protocol. First, we decode CAN messages based on UAVCAN protocol specification; second, we present a comprehensive method of transforming tabular UAVCAN messages into graph structures. Lastly, we apply various graph‑based machine learning models for detecting cyber‑attacks on the CAN bus, including graph convolutional neural networks (GCNNs), graph attention networks (GATs), Graph Sample and Aggregate Networks (GraphSAGE), and graph structure‑based transformers. Our findings show that inductive models such as GATs, GraphSAGE, and graph‑based transformers can achieve competitive and even better accuracy than transductive models like GCNNs in detecting various types of intrusions, with minimum information on protocol specification, thus providing a generic robust solution for CAN bus security for the UAVs. We also compared our results with baseline single‑layer Long Short‑Term Memory (LSTM) and found that all our graph‑based models perform better without using any decoded features based on the UAVCAN protocol, highlighting higher detection performance with protocol‑independent capability.
Authors: Jan Quenzel, Linus T. Mallwitz, Benedikt T. Arnold, Sven Behnke
Abstract: Modern unmanned aerial vehicles (UAVs) are irreplaceable in search and rescue (SAR) missions to obtain a situational overview or provide closeups without endangering personnel. However, UAVs heavily rely on global navigation satellite system (GNSS) for localization, which works well in open spaces, but the precision drastically degrades in the vicinity of buildings. These inaccuracies hinder aggregation of diverse data from multiple sources in a unified georeferenced frame for SAR operators. In contrast, CityGML models provide approximate building shapes with accurate georeferenced poses. Besides, LiDAR works best in the vicinity of 3D structures. Hence, we refine coarse GNSS measurements by registering LiDAR maps against CityGML and digital elevation map (DEM) models as a prior for allocentric mapping. An intuitive plausibility score selects the best hypothesis based on occupancy using a 2D height map. Afterwards, we integrate the registration results in a continuous‑time spline‑based pose graph optimizer with LiDAR odometry and further sensing modalities to obtain globally consistent, georeferenced trajectories and maps. We evaluate the viability of our approach on multiple flights captured at two distinct testing sites. Our method successfully reduced GNSS offset errors from up to 16 m to below 0.5 m on multiple flights. Furthermore, we obtain globally consistent maps w.r.t. prior 3D geospatial models.
Authors: Martin Křížek, Matouš Vrba, Antonella Barišić Kulaš, Stjepan Bogdan, Martin Saska
Abstract: We propose a new approach to visual perception for relative localization of agents within large‑scale swarms of UAVs. Inspired by biological perception utilized by schools of sardines, swarms of bees, and other large groups of animals capable of moving in a decentralized yet coherent manner, our method does not rely on detecting individual neighbors by each agent and estimating their relative position, but rather we propose to regress a neighbor density over distance. This allows for a more accurate distance estimation as well as better scalability with respect to the number of neighbors. Additionally, a novel swarm control algorithm is proposed to make it compatible with the new relative localization method. We provide a thorough evaluation of the presented methods and demonstrate that the regressing approach to distance estimation is more robust to varying relative pose of the targets and that it is suitable to be used as the main source of relative localization for swarm stabilization.
Authors: Junzhi Li, Jingliang Sun, Teng Long, Zhenlin Zhou
Abstract: Due to the strong nonlinearity and nonholonomic dynamics, despite the various general trajectory optimization methods presented, few of them can guarantee efficient computation and physical feasibility for relatively complicated fixed‑wing UAV dynamics. Aiming at this issue, this paper investigates a differential flatness‑based trajectory optimization method for fixed‑wing UAVs (DFTO‑FW). The customized trajectory representation is presented through differential flat characteristics analysis and polynomial parameterization, eliminating equality constraints to avoid the heavy computational burdens of solving complex dynamics. Through the design of integral performance costs and derivation of analytical gradients, the original trajectory optimization is transcribed into a lightweight, unconstrained, gradient‑analytical optimization with linear time complexity to improve efficiency further. The simulation experiments illustrate the superior efficiency of the DFTO‑FW, which takes sub‑second CPU time (on a personal desktop) against other competitors by orders of magnitude to generate fixed‑wing UAV trajectories in randomly generated obstacle environments.
Authors: Sina Kazemdehbashi
Abstract: Unmanned aerial vehicles (UAVs) are increasingly utilized in search and rescue (SAR) operations to enhance efficiency by enabling rescue teams to cover large search areas in a shorter time. Reducing coverage time directly increases the likelihood of finding the target quickly, thereby improving the chances of a successful SAR operation. In this context, UAVs require path planning to determine the optimal flight path that fully covers the search area in the least amount of time. A common approach involves decomposing the search area into a grid, where the UAV must visit all cells to achieve complete coverage. In this paper, we propose an Adaptive Grid‑based Decomposition (AGD) algorithm that efficiently partitions polygonal search areas into grids with fewer cells. Additionally, we utilize a Mixed‑Integer Programming (MIP) model, compatible with the AGD algorithm, to determine a flight path that ensures complete cell coverage while minimizing overall coverage time. Experimental results highlight the efficiency of the AGD algorithm in reducing coverage time (by up to 20%) across various scenarios.
Authors: Vit Kratky, Robert Penicka, Jiri Horyna, Petr Stibinger, Tomas Baca, Matej Petrlik, Petr Stepan, Martin Saska
Abstract: In this paper, we introduce an algorithm designed to address the problem of time‑optimal formation reshaping in three‑dimensional environments while preventing collisions between agents. The utility of the proposed approach is particularly evident in mobile robotics, where agents benefit from being organized and navigated in formation for a variety of real‑world applications requiring frequent alterations in formation shape for efficient navigation or task completion. Given the constrained operational time inherent to battery‑powered mobile robots, the time needed to complete the formation reshaping process is crucial for their efficient operation, especially in case of multi‑rotor Unmanned Aerial Vehicles (UAVs). The proposed Collision‑Aware Time‑Optimal formation Reshaping Algorithm (CAT‑ORA) builds upon the Hungarian algorithm for the solution of the robot‑to‑goal assignment implementing the inter‑agent collision avoidance through direct constraints on mutually exclusive robot‑goal pairs combined with a trajectory generation approach minimizing the duration of the reshaping process. Theoretical validations confirm the optimality of CAT‑ORA, with its efficacy further showcased through simulations, and a real‑world outdoor experiment involving 19 UAVs. Thorough numerical analysis shows the potential of CAT‑ORA to decrease the time required to perform complex formation reshaping tasks by up to 49%, and 12% on average compared to commonly used methods in randomly generated scenarios.
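The robot‑to‑goal assignment step can be illustrated with a toy example. The sketch below is not CAT‑ORA and does not use the Hungarian algorithm: it brute‑forces all assignments for a handful of robots and picks the one whose longest straight‑line distance is smallest, assuming that distance governs the reshaping time. It also ignores inter‑agent collisions, which the paper handles through additional constraints. All names and coordinates are illustrative.

```python
from itertools import permutations
import math

def bottleneck_assignment(robots, goals):
    """Return (assignment, longest_distance) minimizing the longest leg.

    assignment[i] is the goal index given to robot i. Brute force over all
    permutations, so only practical for a small number of robots.
    """
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(len(goals))):
        cost = max(math.dist(robots[i], goals[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost

# Illustrative 3D positions (x, y, z) in meters.
robots = [(0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 3.0, 1.0)]
goals = [(4.0, 1.0, 1.0), (0.0, 4.0, 1.0), (1.0, 0.0, 1.0)]

assignment, longest = bottleneck_assignment(robots, goals)
print(assignment, longest)  # prints "[2, 0, 1] 1.0"
```

Here every robot travels exactly 1 m, whereas a naive index‑order assignment would send robot 0 more than 4 m, which is why the choice of assignment matters for total reshaping time.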
Authors: Chiya Zhang, Ting Wang, Rubing Han, Yuanxiang Gong
Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly utilized in wireless communication, yet accurate channel loss prediction remains a significant challenge, limiting resource optimization performance. To address this issue, this paper leverages Artificial Intelligence Generated Content (AIGC) for the efficient construction of Channel Knowledge Maps (CKM) and UAV trajectory design. Given the time‑consuming nature of channel data collection, AI techniques are employed in a Wasserstein Generative Adversarial Network (WGAN) to extract environmental features and augment the data. Experiment results demonstrate the effectiveness of the proposed framework in improving CKM construction accuracy. Moreover, integrating CKM into UAV trajectory planning reduces channel gain uncertainty, demonstrating its potential to enhance wireless communication efficiency.
Authors: Songqiying Yang, Ania Adil, Eric Feron
Abstract: With advancements in technology, commercial aircraft formation flying is becoming increasingly feasible as an efficient and environmentally friendly flight method. However, gaps remain in practical implementation, particularly in collision avoidance for aircraft formations. Existing avoidance algorithms mainly focus on single aircraft or UAV swarms, lacking comprehensive studies on the complex interactions within commercial aircraft formations. To address this, this paper proposes an optimization model designed to generate safe and effective collision avoidance solutions for commercial aircraft formations. This model demonstrates avoidance paths for formations facing intruders and offers insights for developing formation flight strategies. This study explores response strategies for commercial aircraft formations encountering intruders, considering the difficulty of pilot maneuvers. The findings provide theoretical support for the practical implementation of commercial formation flying and may advance the adoption of this technology.
Authors: Thiviyathinesvaran Palani, Hiroaki Fukushima, Shunsuke Izuhara
Abstract: This paper presents a novel control method for a group of UAVs in obstacle‑laden environments while preserving sensing network connectivity without data transmission between the UAVs. By leveraging constraints rooted in control barrier functions (CBFs), the proposed method aims to overcome the limitations, such as oscillatory behaviors and frequent constraint violations, of the existing method based on artificial potential fields (APFs). More specifically, the proposed method first determines desired control inputs by considering CBF‑based constraints rather than repulsive APFs. The desired inputs are then minimally modified by solving a numerical optimization problem with soft constraints. In addition to the optimization‑based method, we present an approximate method without numerical optimization. The effectiveness of the proposed methods is evaluated by extensive simulations to compare the performance of the CBF‑based methods with an APF‑based approach. Experimental results using real quadrotors are also presented.
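The idea of minimally modifying a desired input subject to a CBF constraint can be sketched for the simplest case. The example below assumes single‑integrator dynamics (velocity is the control input), one circular obstacle, and a single hard constraint, for which the quadratic program has a closed‑form solution. The paper uses soft constraints and also handles connectivity, neither of which is reproduced here; all names and parameters are illustrative.

```python
def cbf_filter(x, u_des, obstacle, radius, alpha=1.0):
    """Minimally modify u_des so that dh/dt + alpha * h >= 0.

    Barrier: h(x) = ||x - obstacle||^2 - radius^2 (h >= 0 means safe).
    Dynamics: x_dot = u (single integrator, an assumption of this sketch).
    Assumes x is not exactly at the obstacle center.
    """
    d = [xi - oi for xi, oi in zip(x, obstacle)]
    h = sum(di * di for di in d) - radius ** 2
    a = [2.0 * di for di in d]   # gradient of h
    b = -alpha * h               # constraint reads: a . u >= b
    au = sum(ai * ui for ai, ui in zip(a, u_des))
    if au >= b:
        return list(u_des)       # desired input is already safe
    # Project u_des onto the half-space {u : a . u >= b}.
    scale = (b - au) / sum(ai * ai for ai in a)
    return [ui + scale * ai for ui, ai in zip(u_des, a)]

# Heading straight at an obstacle of radius 1 m from 2 m away.
u_safe = cbf_filter((2.0, 0.0), (-5.0, 0.0), (0.0, 0.0), 1.0)
print(u_safe)  # prints "[-0.75, 0.0]"
```

The approach speed is reduced from 5 m/s to 0.75 m/s, just enough to satisfy the constraint, while inputs that already satisfy it pass through unchanged.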
Authors: Alexander Vavoulas, Nicholas Vaiopoulos, Konstantinos K. Delibasis, Harilaos G. Sandalidis
Abstract: The integration of unmanned aerial vehicles (UAVs) into next‑generation wireless networks is a promising solution for providing flexible, efficient coverage. This paper explores the optimal deployment of a single UAV to cover an arbitrary convex quadrilateral region, utilizing a directional antenna with a tiltable beam that produces an elliptical coverage footprint. We examine two distinct coverage scenarios: (i) the largest inscribed ellipse, which maximizes coverage within the quadrilateral while excluding the boundary, and (ii) the smallest circumscribed ellipse, ensuring complete coverage of the entire area. The study formulates an optimization framework that accounts for path loss, signal‑to‑noise ratio (SNR), and energy consumption to determine the optimal altitude of the UAV. By employing a simplified path loss model, we derive the altitude that minimizes maximum path loss, while also analyzing the impact of antenna directivity on maximizing the minimum SNR at the coverage boundary. Additionally, the UAV's energy consumption is evaluated, considering the power demands during hovering, forward flight, and vertical takeoff. Numerical simulations are presented to illustrate the trade‑offs between coverage effectiveness, communication performance, and energy efficiency across various environmental conditions and antenna configurations.
Authors: Xinsong Feng, Ian P. Roberts
Abstract: This paper leverages stochastic geometry to model, analyze, and optimize multi‑band unmanned aerial vehicle (UAV) communication networks operating across low‑frequency and millimeter‑wave (mmWave) bands. We introduce a novel approach to modeling mmWave antenna gain in such networks, which allows us to better capture and account for interference in our analysis and optimization. We then propose a simple yet effective user‑UAV association policy, which strategically biases users towards mmWave UAVs to take advantage of lower interference and wider bandwidths compared to low‑frequency UAVs. Under this scheme, we analytically derive the corresponding association probability, coverage probability, and spectral efficiency. We conclude by assessing our proposed association policy through simulation and analysis, demonstrating its effectiveness based on coverage probability and per‑user data rates, as well as the alignment between analytical and simulation results.
Authors: Muhammad Waseem Akram, Marco Vannucci, Giorgio Buttazzo, Valentina Colla, Stefano Roccella, Andrea Vannini, Giovanni Caruso, Simone Nesi, Alessandra Francini, Luca Sebastiani
Abstract: The leaf area index determines crop health and growth. Traditional methods for calculating it are time‑consuming, destructive, costly, and limited to a scale. In this study, we automate the index estimation method using drone image data of grapevine plants and a machine learning model. Traditional feature extraction and deep learning methods are used to obtain helpful information from the data and enhance the performance of the different machine learning models employed for the leaf area index prediction. The results showed that deep learning based feature extraction is more effective than traditional methods. The new approach is a significant improvement over old methods, offering a faster, non‑destructive, and cost‑effective leaf area index calculation, which enhances precision agriculture practices.
Authors: Yaosheng Deng, Mengtao Lyu, Junjie Gao, Jiaping Xiao, Mir Feroskhan
Abstract: Reinforcement Learning (RL) enables autonomous aerial vehicles to adapt quickly and make efficient decisions, making it well‑suited for dynamic urban air mobility operations. However, the lack of safety guarantees and transparency hinders the airworthiness certification of RL‑based flight control systems, particularly in low‑altitude urban environments with human presence. This paper proposes a trustworthy reinforcement learning algorithm that utilizes safe techniques to address the AI trustworthiness requirements for aviation safety, ensuring the transparent and certifiable deployment of RL in safety‑critical aerial operations. Specifically, we propose Trustworthy Reinforcement learning Using Safe Techniques for UAV Pursuit (TRUST‑UP), which consists of two key components: a safety filter constructed from Control Barrier Functions (CBFs) that transforms unsafe RL actions into provably safe flight commands, and a switching strategy that enhances feasibility while maintaining operational transparency. These components enable trustworthy AI deployment in urban airspace, satisfying technical robustness and transparency requirements for aviation certification. Simulation results demonstrate that TRUST‑UP enables autonomous UAVs to safely navigate congested urban environments while maintaining human‑interpretable decision logic. This work contributes toward certifiable and explainable AI frameworks for low‑altitude aviation, addressing the critical need for trustworthy autonomous flight systems in future urban air mobility.
Authors: Serhii Svystun, Oleksandr Melnychenko, Pavlo Radiuk, Oleg Savenko, Anatoliy Sachenko, Andrii Lysyi
Abstract: The research presents an automated method for determining the trajectory of an unmanned aerial vehicle (UAV) for wind turbine inspection. The proposed method enables efficient data collection from multiple wind installations using UAV optical sensors, considering the spatial positioning of blades and other components of the wind energy installation. It includes component segmentation of the wind energy unit (WEU), determination of the blade pitch angle, and generation of optimal flight trajectories, considering safe distances and optimal viewing angles. The results of computational experiments have demonstrated the advantage of the proposed method in monitoring WEU, achieving a 78% reduction in inspection time, a 17% decrease in total trajectory length, and a 6% increase in average blade surface coverage compared to traditional methods. Furthermore, the process minimizes the average deviation from the optimal trajectory by 68%, indicating its high accuracy and ability to compensate for external influences.
Authors: Viswa Narayanan Sankaranarayanan, Achilleas Santi Seisa, Akshit Saradagi, Sumeet Satpute, George Nikolakopoulos
Abstract: In this article, we propose a control architecture for the safe, coordinated operation of a multi‑agent system with aerial (UAVs) and ground (UGVs) robots in a confined task space. We consider the case where the aerial and ground operations are coupled, enabled by the capability of the aerial robots to land on moving ground robots. The proposed method uses time‑varying Control Barrier Functions (CBFs) to impose safety constraints associated with (i) collision avoidance between agents, (ii) landing of UAVs on mobile UGVs, and (iii) task space restriction. Further, this article addresses the challenge induced by the rapid increase in the number of CBF constraints with the increasing number of agents through a hybrid centralized‑distributed coordination approach that determines the set of CBF constraints that is relevant for every aerial and ground agent at any given time. A centralized node (Watcher), hosted by an edge computing cluster, activates the relevant constraints, thus reducing the network complexity and the need for high onboard processing on the robots. The CBF constraints are enforced in a distributed manner by individual robots that run a nominal controller and safety filter locally to overcome latency and other network nonidealities.
Authors: Ankit Shaw
Abstract: This paper presents a comprehensive overview of exploration strategies utilized in both 2D and 3D environments, focusing on autonomous multi‑robot systems designed for building exploration and fire detection. We explore the limitations of traditional algorithms that rely on prior knowledge and predefined maps, emphasizing the challenges faced when environments undergo changes that invalidate these maps. Our modular approach integrates localization, mapping, and trajectory planning to facilitate effective exploration using an OctoMap framework generated from point cloud data. The exploration strategy incorporates obstacle avoidance through potential fields, ensuring safe navigation in dynamic settings. Additionally, we propose future research directions, including decentralized map creation, coordinated exploration among unmanned aerial vehicles (UAVs), and adaptations to time‑varying environments. This work serves as a foundation for advancing coordinated multi‑robot exploration algorithms, enhancing their applicability in real‑world scenarios.
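As a rough illustration of obstacle avoidance through potential fields, the sketch below computes the classic attractive‑plus‑repulsive force in 2D. The abstract does not give the paper's exact formulation, so the gains `k_att` and `k_rep`, the influence distance, and the function name `potential_field_step` are all assumptions made for this example.

```python
import math


def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, influence=2.0):
    """Return the 2D force from a textbook artificial potential field.

    Attractive term: k_att * (goal - pos).
    Repulsive term (per obstacle closer than `influence`):
    k_rep * (1/d - 1/influence) / d^2, directed away from the obstacle.
    """
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Obstacles outside the influence radius contribute nothing.
        if 0.0 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy


if __name__ == "__main__":
    # Robot at the origin, goal at (3, 0), one obstacle at (1, 0).
    print(potential_field_step((0.0, 0.0), (3.0, 0.0), [(1.0, 0.0)]))
```

A planner would typically move a small step along the returned force and repeat; the same formula extends to 3D by adding a third coordinate.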
Authors: Guofeng Yang, Yu Li, Yong He, Zhenjiang Zhou, Lingzhen Ye, Hui Fang, Yiqi Luo, Xuping Feng
Abstract: UAV remote sensing technology has become a key technology in crop breeding, which can achieve high‑throughput and non‑destructive collection of crop phenotyping data. However, the multidisciplinary nature of breeding has brought technical barriers and efficiency challenges to knowledge mining. Therefore, it is important to develop a smart breeding goal tool to mine cross‑domain multimodal data. Based on different pre‑trained open‑source multimodal large language models (MLLMs) (e.g., Qwen‑VL, InternVL, Deepseek‑VL), this study used supervised fine‑tuning (SFT), retrieval‑augmented generation (RAG), and reinforcement learning from human feedback (RLHF) technologies to inject cross‑domain knowledge into MLLMs, thereby constructing multiple multimodal large language models for wheat breeding (WBLMs). The above WBLMs were evaluated using the newly created evaluation benchmark in this study. The results showed that the WBLM constructed using SFT, RAG and RLHF technologies and InternVL2‑8B has leading performance. Then, subsequent experiments were conducted using the WBLM. Ablation experiments indicated that the combination of SFT, RAG, and RLHF technologies can improve the overall generation performance, enhance the generated quality, balance the timeliness and adaptability of the generated answer, and reduce hallucinations and biases. The WBLM performed best in wheat yield prediction using cross‑domain data (remote sensing, phenotyping, weather, germplasm) simultaneously, with R2 and RMSE of 0.821 and 489.254 kg/ha, respectively. Furthermore, the WBLM can generate professional decision support answers for phenotyping estimation, environmental stress assessment, target germplasm screening, cultivation technique recommendation, and seed price query tasks.
Authors: Haoyuan Li, Chang Xu, Wen Yang, Li Mi, Huai Yu, Haijian Zhang
Abstract: Unmanned Aerial Vehicle (UAV) Cross‑View Geo‑Localization (CVGL) presents significant challenges due to the view discrepancy between oblique UAV images and overhead satellite images. Existing methods heavily rely on the supervision of labeled datasets to extract viewpoint‑invariant features for cross‑view retrieval. However, these methods have expensive training costs and tend to overfit the region‑specific cues, showing limited generalizability to new regions. To overcome this issue, we propose an unsupervised solution that lifts the scene representation to 3D space from UAV observations for satellite image generation, providing robust representation against view distortion. By generating orthogonal images that closely resemble satellite views, our method reduces view discrepancies in feature representation and mitigates shortcuts in region‑specific image pairing. To further align the rendered image's perspective with the real one, we design an iterative camera pose updating mechanism that progressively modulates the rendered query image with potential satellite targets, eliminating spatial offsets relative to the reference images. Additionally, this iterative refinement strategy enhances cross‑view feature invariance through view‑consistent fusion across iterations. As such, our unsupervised paradigm naturally avoids the problem of region‑specific overfitting, enabling generic CVGL for UAV images without feature fine‑tuning or data‑driven training. Experiments on the University‑1652 and SUES‑200 datasets demonstrate that our approach significantly improves geo‑localization accuracy while maintaining robustness across diverse regions. Notably, without model fine‑tuning or paired training, our method achieves competitive performance with recent supervised methods.
Authors: Jun Xiang, Jun Chen
Abstract: Safety is extremely important for urban flights of autonomous Unmanned Aerial Vehicles (UAVs). Risk‑aware path planning is one of the most effective methods to guarantee the safety of UAVs. This type of planning can be represented as a Constrained Shortest Path (CSP) problem, which seeks to find the shortest route that meets a predefined safety constraint. Solving CSP problems is NP‑hard, presenting significant computational challenges. Although traditional methods can accurately solve CSP problems, they tend to be very slow. Previously, we introduced an additional safety dimension to the traditional A* algorithm, known as ASD A*, to effectively handle CSP problems. Then, we developed a custom learning‑based heuristic using transformer‑based neural networks, which significantly reduced computational load and enhanced the performance of the ASD A* algorithm. In this paper, we expand our dataset to include more risk maps and tasks, improve the proposed model, and increase its performance. We also introduce a new heuristic strategy and a novel neural network, which enhance the overall effectiveness of our approach.
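The idea of adding a safety dimension to the search state can be sketched on a toy graph. The example below is not the authors' algorithm: it uses a zero heuristic (so it behaves like a Dijkstra‑style label search rather than a learned‑heuristic A*), and the graph format, the function name `constrained_shortest_path`, and the risk budget are assumptions for illustration. Each search state carries both the path length and the accumulated risk, and states that exceed the budget are discarded.

```python
import heapq


def constrained_shortest_path(graph, start, goal, risk_budget):
    """Shortest path whose accumulated risk stays within risk_budget.

    graph maps node -> list of (neighbor, length, risk), all non-negative.
    Returns (length, risk, path) or None if no feasible path exists.
    """
    heap = [(0.0, 0.0, start, [start])]
    # Non-dominated (length, risk) labels already expanded at each node.
    labels = {}
    while heap:
        length, risk, node, path = heapq.heappop(heap)
        if node == goal:
            return length, risk, path
        # Skip a label that is no better in both length and risk.
        if any(l <= length and r <= risk for l, r in labels.get(node, [])):
            continue
        labels.setdefault(node, []).append((length, risk))
        for nxt, edge_len, edge_risk in graph.get(node, []):
            new_risk = risk + edge_risk
            if new_risk <= risk_budget:  # enforce the safety constraint
                heapq.heappush(
                    heap, (length + edge_len, new_risk, nxt, path + [nxt])
                )
    return None


if __name__ == "__main__":
    # Short but risky route via B, longer but safer route via C.
    toy = {
        "A": [("B", 1, 5), ("C", 2, 1)],
        "B": [("D", 1, 5)],
        "C": [("D", 2, 1)],
    }
    print(constrained_shortest_path(toy, "A", "D", 10))
    print(constrained_shortest_path(toy, "A", "D", 5))
```

Tightening the risk budget forces the search onto the longer route, which is the trade‑off a CSP formulation is meant to capture.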
Authors: Jun Xiang, Drake Essick, Luiz Gonzalez Bautista, Junfei Xie, Jun Chen
Abstract: Models for trajectory prediction are an essential component of many advanced air mobility studies. These models help aircraft detect conflicts and plan avoidance maneuvers, which is especially important in Unmanned Aircraft Systems (UAS) landing management due to the congested airspace near vertiports. In this paper, we propose a landing trajectory prediction model for UAS based on a Generative Adversarial Network (GAN). The GAN is a well‑established neural network architecture that has been developed over many years and has achieved state‑of‑the‑art results in many generation tasks. A GAN consists of a neural network generator and a neural network discriminator. Because of the learning capacity of neural networks, the generator is capable of learning the features of the sample trajectories. The generator takes the previous trajectory as input and outputs some random status of a flight. According to the experimental results, the proposed model outputs more accurate predictions than the baseline method (GMR) on various datasets. To evaluate the proposed model, we also create a real UAV landing dataset that includes more than 2600 trajectories of drones controlled manually by real pilots.
Authors: Maryam Ghaffari Saadat, Angelo Ferrando, Louise A. Dennis, Michael Fisher
Abstract: Formal verification of robotic applications presents challenges due to their hybrid nature and distributed architecture. This paper introduces ROSMonitoring 2.0, an extension of ROSMonitoring designed to facilitate the monitoring of both topics and services while considering the order in which messages are published and received. The framework has been enhanced to support these novel features for ROS1 environments, and partially for ROS2 environments, offering improved real‑time support, security, scalability, and interoperability. We discuss the modifications made to accommodate these advancements and present results obtained from a case study involving the runtime monitoring of specific components of a fire‑fighting Uncrewed Aerial Vehicle (UAV).
Authors: Fei Song, Zhe Wang, Jun Li, Long Shi, Wen Chen, Shi Jin
Abstract: In ultra‑dense unmanned aerial vehicle (UAV) networks, it is challenging to coordinate the resource allocation and interference management among large‑scale UAVs, for providing flexible and efficient service coverage to the ground users (GUs). In this paper, we propose a learning‑based resource allocation scheme in an ultra‑dense UAV communication network, where the GUs' service demands are time‑varying with unknown distributions. We formulate the non‑cooperative game among multiple co‑channel UAVs as a stochastic game, where each UAV jointly optimizes its trajectory, user association, and downlink power control to maximize the expectation of its locally cumulative energy efficiency under the interference and energy constraints. To cope with the scalability issue in a large‑scale network, we further formulate the problem as a mean‑field game (MFG), which simplifies the interactions among the UAVs into a two‑player game between a representative UAV and a mean‑field. We prove the existence and uniqueness of the equilibrium for the MFG, and propose a model‑free mean‑field reinforcement learning algorithm named maximum entropy mean‑field deep Q network (ME‑MFDQN) to solve the mean‑field equilibrium in both fully and partially observable scenarios. The simulation results reveal that the proposed algorithm improves the energy efficiency compared with the benchmark algorithms. Moreover, the performance can be further enhanced if the GUs' service demands exhibit higher temporal correlation or if the UAVs have wider observation capabilities over their nearby GUs.
Authors: Yu Bai, Boxuan Xie, Ruifan Zhu, Zheng Chang, Riku Jantti
Abstract: Backscatter communication (BC) has become a promising energy‑efficient solution for future wireless sensor networks (WSNs). Unmanned aerial vehicles (UAVs) enable flexible data collection from remote backscatter devices (BDs), yet conventional UAVs rely on omni‑directional fixed‑position antennas (FPAs), limiting channel gain and prolonging data collection time. To address this issue, we consider equipping a UAV with a directional movable antenna (MA) with high directivity and flexibility. The MA enhances channel gain by precisely aiming its main lobe at each BD, focusing transmission power for efficient communication. Our goal is to minimize the total data collection time by jointly optimizing the UAV's trajectory and the MA's orientation. We develop a deep reinforcement learning (DRL)‑based strategy using the azimuth angle and distance between the UAV and each BD to simplify the agent's observation space. To ensure stability during training, we adopt the Soft Actor‑Critic (SAC) algorithm, which balances exploration with reward maximization for efficient and reliable learning. Simulation results demonstrate that our proposed MA‑equipped UAV with SAC outperforms both FPA‑equipped UAVs and other RL methods, achieving significant reductions in both data collection time and energy consumption.
Authors: Yunuo Zhang, Baiting Luo, Ayan Mukhopadhyay, Daniel Stojcsics, Daniel Elenius, Anirban Roy, Susmit Jha, Miklos Maroti, Xenofon Koutsoukos, Gabor Karsai, Abhishek Dubey
Abstract: Efficient path optimization for drones in search and rescue operations faces challenges, including limited visibility, time constraints, and complex information gathering in urban environments. We present a comprehensive approach to optimize UAV‑based search and rescue operations in neighborhood areas, utilizing both a 3D AirSim‑ROS2 simulator and a 2D simulator. The path planning problem is formulated as a partially observable Markov decision process (POMDP), and we propose a novel "Shrinking POMCP" approach to address time constraints. In the AirSim environment, we integrate our approach with a probabilistic world model for belief maintenance and a neurosymbolic navigator for obstacle avoidance. The 2D simulator employs surrogate ROS2 nodes with equivalent functionality. We compare trajectories generated by different approaches in the 2D simulator and evaluate performance across various belief types in the 3D AirSim‑ROS2 simulator. Experimental results from both simulators demonstrate that our proposed Shrinking POMCP solution achieves significant improvements in search times compared to alternative methods, showcasing its potential for enhancing the efficiency of UAV‑assisted search and rescue operations.
Authors: Shima Salar Hosseini, Paeiz Azmi, Ali Nazari
Abstract: Unmanned aerial vehicles (UAVs) have the potential for time‑sensitive applications. Due to wireless channel variation, received data may have an expiration time, particularly in critical situations such as rescue operations, natural disasters, or the military. Age of Information (AoI) is a metric that measures the freshness of received packets to specify the validity period of information. In addition, it is necessary to guarantee the privacy of confidential information transmission through air‑to‑ground links against eavesdroppers. This paper investigates, for the first time, UAV‑assisted covert communication to minimize AoI in the presence of an aerial eavesdropper. To guarantee the eavesdropper's detection error rate, UAV‑enabled beamforming employs the power‑domain non‑orthogonal multiple access (PD‑NOMA) technique to cover the covert user with a public user. The PD‑NOMA technique also significantly improves the user's AoI. The joint optimization problem contains non‑convex constraints and coupled optimization variables, including UAV trajectory, beamforming design, and the user's AoI, which makes a direct solution challenging to derive. We develop an efficient alternating optimization technique to address the formulated optimization problem. Numerical results demonstrate the impact of the main parameters on the performance of the proposed communication system.
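For readers unfamiliar with the AoI metric, the sketch below computes a time‑averaged AoI from a list of packet generation and delivery times. It uses the standard definition (age at time t equals t minus the generation time of the freshest delivered packet) and is unrelated to the paper's optimization. The function name `average_aoi` and the input format are assumptions for this example.

```python
def average_aoi(updates, horizon, initial_age=0.0):
    """Time-averaged Age of Information over [0, horizon].

    updates: list of (generation_time, delivery_time) pairs.
    The age grows linearly with time and drops at each delivery to
    (delivery_time - generation_time of the freshest packet received).
    """
    events = sorted(updates, key=lambda p: p[1])  # process by delivery time
    area = 0.0
    t = 0.0
    freshest = -initial_age  # generation time of the info held at t = 0
    for gen, dlv in events:
        if dlv > horizon:
            break
        # Trapezoid under the age curve between two consecutive deliveries.
        area += 0.5 * ((t - freshest) + (dlv - freshest)) * (dlv - t)
        t = dlv
        freshest = max(freshest, gen)
    # Remaining segment up to the horizon.
    area += 0.5 * ((t - freshest) + (horizon - freshest)) * (horizon - t)
    return area / horizon


if __name__ == "__main__":
    # Two packets: generated at t=1 and t=3, delivered at t=2 and t=4.
    print(average_aoi([(1.0, 2.0), (3.0, 4.0)], horizon=4.0))
```

Shorter delivery delays and more frequent updates both shrink the area under the sawtooth age curve, which is why trajectory and transmission design affect AoI.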
Authors: Kevin Weinberger, David Müller, Martin Mönnigmann, Aydin Sezgin
Abstract: Reconfigurable Intelligent Surfaces (RIS) are emerging as a key technology for sixth‑generation (6G) wireless networks, leveraging adjustable reflecting elements to dynamically control electromagnetic wave propagation and optimize wireless connectivity. By positioning the RIS on an unmanned aerial vehicle (UAV), it can maintain line‑of‑sight and proximity to both the transmitter and receiver, critical factors that mitigate path loss and enhance signal strength. The lightweight, power‑efficient nature of RIS makes UAV integration feasible, yet the setup faces significant disturbances from UAV motion, which can degrade RIS alignment and link performance. In this study, we address these challenges using both experimental measurements and analytical methods. Using an extended Kalman filter (EKF), we estimate the UAV's orientation in real time during experimental flights to capture real disturbance effects. The resulting orientation uncertainty is then propagated to the RIS's channel estimates by applying the Guide to the Expression of Uncertainty in Measurement (GUM) framework as well as complex‑valued propagation techniques to accurately assess and minimize the impact of UAV orientation uncertainties on RIS performance. This method enables us to systematically trace and quantify how orientation uncertainties affect channel gain and phase stability in real‑time. Through numerical simulations, we find that the uncertainty of the RIS channel link is influenced by the RIS's configuration. Furthermore, our results demonstrate that the uncertainty area is most accurately represented by an annular section, enabling a 58% reduction in the uncertainty area while maintaining a 95% coverage probability.
Authors: Ke Zhang, Zhaoye Zheng, Yurong Guo, Jiacun Wang, Jiyuan Yang, Yangjie Xiao
Abstract: Unmanned aerial vehicle (UAV) patrol inspection has emerged as a predominant approach in transmission line monitoring owing to its cost‑effectiveness. Detecting defects in transmission lines is a critical task during UAV patrol inspection. However, due to imaging distance and shooting angles, UAV patrol images often suffer from insufficient defect‑related visual information, which has an adverse effect on detection accuracy. In this article, we propose a novel method for detecting defects in UAV patrol images, which is based on vision‑language pretraining for transmission line (VLP‑TL) and a progressive transfer strategy (PTS). Specifically, VLP‑TL contains two novel pretraining tasks tailored for the transmission line scenario, aiming at pretraining an image encoder with abundant knowledge acquired from both visual and linguistic information. Transferring the pretrained image encoder to the defect detector as its backbone can effectively alleviate the insufficient visual information problem. In addition, the PTS further improves transfer performance by progressively bridging the gap between pretraining and downstream defect detection. Experimental results demonstrate that the proposed method significantly improves defect detection accuracy by jointly utilizing multimodal information, overcoming the limitations of insufficient defect‑related visual information provided by UAV patrol images.
Authors: Huan Lin, Lianghui Ding
Abstract: Unmanned aerial vehicle (UAV) swarm networks face severe challenges of communication network split (CNS) issues caused by massive damage in hostile environments. In this paper, we propose a new paradigm to restore network connectivity by repositioning remaining UAVs based on damage information within local topologies. Particularly, the locations of destroyed UAVs distributed in gaps between disconnected sub‑nets are considered for recovery trajectory planning. Specifically, we construct the multi‑hop differential sub‑graph (MDSG) to represent local damage‑varying topologies. Based on this, we develop two distinct algorithms to address CNS issues. The first approach leverages an artificial potential field algorithm to calculate the recovery velocities via MDSG, enabling simple deployment on low‑intelligence UAVs. In the second approach, we design an MDSG‑based graph convolution framework to find the recovery topology for high‑intelligence swarms. Given the unique topology of MDSG, we propose a novel bipartite graph convolution operation, enhanced with a batch‑processing mechanism to improve graph convolution efficiency. Simulation results show that the proposed algorithms expedite the recovery by a significant margin while improving the spatial coverage and topology degree uniformity after recovery.
Authors: Kürşat Tekbıyık, Güneş Karabulut Kurt, Antoine Lesage-Landry
Abstract: The increasing demand for data usage in wireless communications requires using wider bands in the spectrum, especially for backhaul links. Yet, allocations in the spectrum for non‑communication systems inhibit merging bands to achieve wider bandwidth. To overcome this issue, spectrum‑sharing or opportunistic spectrum utilization by secondary users stands out as a promising solution. However, both approaches must minimize interference to primary users. Therefore, spectrum sensing becomes vital for such opportunistic usage, ensuring the proper operation of the primary users. Although this problem has been investigated for 2D networks, unmanned aerial vehicle (UAV) networks need different points of view concerning 3D space, its challenges, and opportunities. For this purpose, we propose a federated learning (FL)‑based method for spectrum sensing in UAV networks to account for their distributed nature and limited computational capacity. FL enables local training without sharing raw data while guaranteeing the privacy of local users, lowering communication overhead, and increasing data diversity. Furthermore, we develop a federated aggregation method, namely FedSNR, that considers the signal‑to‑noise ratio observed by UAVs to acquire a global model. The numerical results show that the proposed architecture and the aggregation method outperform traditional methods.
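One way an SNR‑aware aggregation step could look is sketched below. The abstract does not state FedSNR's actual weighting rule, so the choice here (aggregation coefficients proportional to each UAV's linear‑scale SNR) is purely an assumption, as are the function name `snr_weighted_average` and the flat‑list model representation.

```python
def snr_weighted_average(client_weights, snrs_db):
    """Aggregate client model parameters with SNR-proportional coefficients.

    client_weights: list of equal-length parameter lists, one per client.
    snrs_db: SNR observed by each client, in dB.
    Assumption: coefficient_i = linear_snr_i / sum(linear_snr), which is
    an illustrative rule, not necessarily the one used by FedSNR.
    """
    linear = [10.0 ** (s / 10.0) for s in snrs_db]  # dB -> linear scale
    total = sum(linear)
    coeffs = [value / total for value in linear]
    n_params = len(client_weights[0])
    return [
        sum(c * w[i] for c, w in zip(coeffs, client_weights))
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # The client observing 10 dB dominates the one observing 0 dB.
    print(snr_weighted_average([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0]))
```

With equal SNRs this reduces to a plain unweighted average of the client models, so the SNR term only matters when sensing conditions differ across UAVs.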
Authors: Dimitria Silveria, Kleber Cabral, Peter Jardine, Sidney Givigi
Abstract: This work investigates the self‑organization of multi‑agent systems into closed trajectories, a common requirement in unmanned aerial vehicle (UAV) surveillance tasks. In such scenarios, smooth, unbiased control signals save energy and mitigate mechanical strain. We propose a decentralized control system architecture that produces a globally stable emergent structure from local observations only; there is no requirement for agents to share a global plan or follow prescribed trajectories. Central to our approach is the formulation of an injective virtual embedding induced by rotations from the actual agent positions. This embedding serves as a structure‑preserving map around which all agents stabilize their relative positions and permits the use of well‑established linear control techniques. We construct the embedding such that it is topologically equivalent to the desired trajectory (i.e., a homeomorphism), thereby preserving the stability characteristics. We demonstrate the versatility of this approach through implementation on a swarm of Quanser QDrone quadcopters. Results demonstrate that the quadcopters self‑organize into the desired trajectory while maintaining even separation.
Authors: Derek Fan, David A. Copp
Abstract: Online trajectory optimization and optimal control methods are crucial for enabling sustainable unmanned aerial vehicle (UAV) services, such as agriculture, environmental monitoring, and transportation, where available actuation and energy are limited. However, optimal controllers are highly sensitive to model mismatch, which can occur due to loaded equipment, packages to be delivered, or pre‑existing variability in fundamental structural and thrust‑related parameters. To circumvent this problem, optimal controllers can be paired with parameter estimators to improve their trajectory planning performance and perform adaptive control. However, UAV platforms are limited in terms of onboard processing power, oftentimes making nonlinear parameter estimation too computationally expensive to consider. To address these issues, we propose a relaxed, affine‑in‑parameters multirotor model along with an efficient optimal parameter estimator. We convexify the nominal Moving Horizon Parameter Estimation (MHPE) problem into a linear‑quadratic form (LQ‑MHPE) via an affine‑in‑parameter relaxation on the nonlinear dynamics, resulting in fast quadratic programs (QPs) that facilitate adaptive Model Predictive Control (MPC) in real time. We compare this approach to the equivalent nonlinear estimator in Monte Carlo simulations, demonstrating a decrease in average solve time and trajectory optimality cost by 98.2% and 23.9‑56.2%, respectively.
Authors: Abuzar B. M. Adam, Elhadj Moustapha Diallo, Mohammed A. M. Elhassan
Abstract: In this work, we explore UAV‑assisted reconfigurable intelligent surface (RIS) technology to enhance downlink communications in wireless networks. By integrating RIS on both UAVs and ground infrastructure, we aim to boost network coverage, fairness, and resilience against challenges such as UAV jitter. To maximize the minimum achievable user rate, we formulate a joint optimization problem involving beamforming, phase shifts, and UAV trajectory. To address this problem, we propose an adaptive soft actor‑critic (ASAC) framework. In this approach, agents are built using adaptive sparse transformers with attentive feature refinement (ASTAFER), enabling dynamic feature processing that adapts to real‑time network conditions. The ASAC model learns optimal solutions to the coupled subproblems in real time, delivering an end‑to‑end solution without relying on iterative or relaxation‑based methods. Simulation results demonstrate that our ASAC‑based approach achieves better performance compared to the conventional SAC. This makes it a robust, adaptable solution for real‑time, fair, and efficient downlink communication in UAV‑RIS networks.
Authors: Ziqi Rong, Qiushi Zheng, Zhishu Shen, Xiaolong Li, Tiehua Zhang, Zheng Lei, Jiong Jin
Abstract: With the rapid advancement of the Internet of Things (IoT) and Artificial Intelligence (AI), intelligent information services are being increasingly integrated across various sectors, including healthcare, industry, and transportation. Traditional solutions rely on centralized cloud processing, which encounters considerable challenges in fulfilling the Quality of Service (QoS) requirements of Computer Vision (CV) tasks generated in resource‑constrained, infrastructure‑less environments. In this paper, we introduce a distributed framework called CoUAV‑Pro for multi‑task video processing powered by Unmanned Aerial Vehicles (UAVs). This framework empowers multiple UAVs to meet the service demands of various CV tasks in infrastructure‑less environments, thereby eliminating the need for centralized processing. Specifically, we develop a novel task allocation algorithm that leverages enhanced distributed actor‑critic networks within CoUAV‑Pro, aiming to optimize task processing efficiency while contending with constraints associated with the UAVs' energy, computational, and communication resources. Comprehensive experiments demonstrate that our proposed solution achieves satisfactory performance relative to centralized methods across key metrics including task acquisition rates, task latency, and energy consumption.
Authors: Zijian Ge, Jingjing Jiang, Matthew Coombes
Abstract: The application of Multiple Unmanned Aerial Vehicles (Multi‑UAV) in Wilderness Search and Rescue (WiSAR) significantly enhances mission success due to their rapid coverage of search areas from high altitudes and their adaptability to complex terrains. This capability is particularly crucial because time is a critical factor in searching for a lost person in the wilderness; as time passes, survival rates decrease and the search area expands. The probability of success in such searches can be further improved if UAVs leverage terrain features to predict the lost person's position. In this paper, we aim to enhance search missions by proposing a smart agent‑based probability model that combines Monte Carlo simulations with an agent strategy list, mimicking the behavior of a lost person in wilderness areas. Furthermore, we develop a distributed Multi‑UAV receding horizon search strategy with dynamic partitioning, utilizing the generated probability density model as prior information to prioritize locations where the lost person is most likely to be found. Simulated search experiments across different terrains have been conducted to validate the search efficiency of the proposed methods compared to other benchmark methods.
Authors: Long He, Geng Sun, Zemin Sun, Jiacheng Wang, Hongyang Du, Dusit Niyato, Jiangchuan Liu, Victor C. M. Leung
Abstract: The emergence of space‑air‑ground integrated multi‑access edge computing (SAGIMEC) networks opens a significant opportunity for the rapidly growing low altitude economy (LAE), facilitating the development of various applications by offering efficient communication and computing services. However, the heterogeneous nature of SAGIMEC networks, coupled with the stringent computational and communication requirements of diverse applications in the LAE, introduces considerable challenges in integrating SAGIMEC into the LAE. In this work, we first present a digital twin‑assisted SAGIMEC paradigm for LAE, where digital twin enables reliable network monitoring and management, while SAGIMEC provides efficient computing offloading services for Internet of Things sensor devices (ISDs). Then, a joint satellite selection, computation offloading, communication resource allocation, computation resource allocation and UAV trajectory control optimization problem (JSC4OP) is formulated to maximize the quality of service (QoS) of ISDs. Given the complexity of JSC4OP, we propose an online decentralized optimization approach (ODOA) to address the problem. Specifically, JSC4OP is first transformed into a real‑time decision‑making optimization problem (RDOP) by leveraging Lyapunov optimization. Then, to solve the RDOP, we introduce an online learning‑based latency prediction method to predict the uncertain system environment and a game theoretic decision‑making method to make real‑time decisions. Finally, theoretical analysis confirms the effectiveness of the ODOA, while the simulation results demonstrate that the proposed ODOA outperforms other alternative approaches in terms of overall system performance.
Authors: Barrios-Munoz Ricardo, Bernabe Matteo, Lopez-Perez David, Gomez-Barquero David, Quintanilla-Garcia Israel
Abstract: Following the burgeoning interest in unmanned aerial vehicle (UAV) utilization within human‑inhabited spaces, critical challenges arise in ensuring reliable, low‑latency communication, which is particularly important given the safety‑critical nature of such operations in densely populated urban environments. Therefore, adequate cellular communication capabilities are essential to enable safe and effective operations within the so‑called U‑Spaces. In this context, this paper investigates the communication performance of cellular‑connected UAVs in dense urban environments. In particular, the analysis is based on a comprehensive measurement campaign conducted in the city of Benidorm, Spain, an urban area well known for its high concentration of tall buildings and overall urban density. More specifically, we evaluated key performance indicators (KPIs) related to received signal strength and quality, data rate, and latency across various altitudes, mobile network operators, access technologies, and frequency bands, using multiple types of measurement equipment. The results highlight significant challenges, primarily due to the lack of dedicated planning for aerial coverage and interference management, revealing that current cellular networks may fall short in supporting reliable and ubiquitous UAV communication. Thus, this paper calls for improved network solutions to ensure the reliability of UAV operations in urban airspace, thereby contributing to the integration of UAVs into urban logistics and mobility.
Authors: Shaba Shaon, Tien Nguyen, Lina Mohjazi, Aryan Kaushik, Dinh C. Nguyen
Abstract: This paper studies a new latency optimization problem in unmanned aerial vehicle (UAV)‑enabled federated learning (FL) with integrated sensing and communication. In this setup, distributed UAVs participate in model training using sensed data and collaborate with a base station (BS) serving as the FL aggregator to build a global model. The objective is to minimize the FL system latency over UAV networks by jointly optimizing the UAVs' trajectories and the resource allocation of both the UAVs and the BS. The formulated optimization problem is difficult to solve due to its non‑convexity. Hence, we develop a simple yet efficient iterative algorithm to find a high‑quality approximate solution by leveraging block coordinate descent and successive convex approximation techniques. Simulation results demonstrate the effectiveness of our proposed joint optimization strategy under practical parameter settings, reducing the system latency by up to 68.54% compared to benchmark schemes.
Authors: Haoyang Di, Xiaodong Zhu, Yulin Shao
Abstract: Unmanned aerial vehicles (UAVs) have become key enablers in relay‑assisted wireless communications thanks to their flexibility and line‑of‑sight channel advantage. However, most existing trajectory optimization frameworks assume ideal Gaussian inputs, overlooking the fact that practical wireless systems rely on structured, finite‑alphabet constellations. This mismatch can lead to suboptimal, and sometimes misleading, design choices. In this paper, we challenge that convention by introducing a finite‑alphabet‑aware framework for joint trajectory and precoder optimization in UAV‑assisted relay systems. We formulate a non‑convex design problem that directly accounts for discrete signal structures and propose an efficient solution based on alternating optimization and successive convex approximation. Simulation results reveal that strategies optimized under Gaussian assumptions can waste energy and degrade throughput in real deployments. In contrast, our approach adapts both the UAV's trajectory and transmission strategy to the underlying modulation format, delivering consistent performance gains under practical system constraints. This work takes a key step toward aligning UAV communication design with the realities of modern wireless systems: discrete signals, power limits, and intelligent mobility.
Authors: Youzhi Liu, Fanglong Yao, Yuanchang Yue, Guangluan Xu, Xian Sun, Kun Fu
Abstract: Vision‑and‑Language Navigation (VLN), as a widely discussed research direction in embodied intelligence, aims to enable embodied agents to navigate in complicated visual environments through natural language commands. Most existing VLN methods focus on indoor ground robot scenarios. However, when applied to UAV VLN in outdoor urban scenes, they face two significant challenges. First, urban scenes contain numerous objects, which makes it challenging to match fine‑grained landmarks in images with complex textual descriptions of these landmarks. Second, overall environmental information encompasses multiple modal dimensions, and the diversity of representations significantly increases the complexity of the encoding process. To address these challenges, we propose NavAgent, the first urban UAV embodied navigation model driven by a large Vision‑Language Model. NavAgent undertakes navigation tasks by synthesizing multi‑scale environmental information, including topological maps (global), panoramas (medium), and fine‑grained landmarks (local). Specifically, we utilize GLIP to build a visual recognizer for landmarks capable of identifying and linguisticizing fine‑grained landmarks. Subsequently, we develop a dynamically growing scene topology map that integrates environmental information and employ Graph Convolutional Networks to encode global environmental data. In addition, to train the visual recognizer for landmarks, we develop NavAgent‑Landmark2K, the first fine‑grained landmark dataset for real urban street scenes. In experiments conducted on the Touchdown and Map2seq datasets, NavAgent outperforms strong baseline models. The code and dataset will be released to the community to facilitate the exploration and development of outdoor VLN.
Authors: Yutao Shen, Hongyu Zhou, Xin Yang, Xuqi Lu, Ziyue Guo, Lixi Jiang, Yong He, Haiyan Cen
Abstract: Biomass estimation of oilseed rape is crucial for optimizing crop productivity and breeding strategies. While UAV‑based imaging has advanced high‑throughput phenotyping, current methods often rely on orthophoto images, which struggle with overlapping leaves and incomplete structural information in complex field environments. This study integrates 3D Gaussian Splatting (3DGS) with the Segment Anything Model (SAM) for precise 3D reconstruction and biomass estimation of oilseed rape. UAV multi‑view oblique images from 36 angles were used to perform 3D reconstruction, with the SAM module enhancing point cloud segmentation. The segmented point clouds were then converted into point cloud volumes, which were fitted to ground‑measured biomass using linear regression. The results showed that 3DGS (7k and 30k iterations) provided high accuracy, with peak signal‑to‑noise ratios (PSNR) of 27.43 and 29.53 and training times of 7 and 49 minutes, respectively. This performance exceeded that of structure from motion (SfM) and mipmap Neural Radiance Fields (Mip‑NeRF), demonstrating superior efficiency. The SAM module achieved high segmentation accuracy, with a mean intersection over union (mIoU) of 0.961 and an F1‑score of 0.980. Additionally, a comparison of biomass extraction models found the point cloud volume model to be the most accurate, with a coefficient of determination (R²) of 0.976, a root mean square error (RMSE) of 2.92 g/plant, and a mean absolute percentage error (MAPE) of 6.81%, outperforming both the plot crop volume and individual crop volume models. This study highlights the potential of combining 3DGS with multi‑view UAV imaging for improved biomass phenotyping.
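For readers unfamiliar with the three regression metrics quoted in this abstract, the following is a minimal sketch of how R², RMSE, and MAPE are conventionally computed from paired measured and predicted values. The function name `regression_metrics` and the sample numbers are illustrative assumptions; they are not taken from the paper or its data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAPE in percent) for paired observations.

    Hypothetical helper for illustration; uses the textbook definitions only.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # MAPE requires non-zero true values (e.g. biomass in g/plant).
    mape = 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return r2, rmse, mape


if __name__ == "__main__":
    # Made-up example values, not measurements from the study.
    print(regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

RMSE carries the unit of the target (here g/plant), while R² and MAPE are unitless, which is why the abstract reports all three side by side.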
Authors: Xin Tang, Qian Chen, Wenjie Weng, Binhan Liao, Jiacheng Wang, Xianbin Cao, Xiaohuan Li
Abstract: Unmanned Aerial Vehicles (UAVs) possess high mobility and flexible deployment capabilities, prompting the development of UAVs for various application scenarios within the Internet of Things (IoT). The unique capabilities of UAVs give rise to increasingly critical and complex tasks in uncertain and potentially harsh environments. The substantial amount of data generated from these applications necessitates processing and analysis through deep neural networks (DNNs). However, UAVs encounter challenges due to their limited computing resources when managing DNN models. This paper presents a joint approach that combines multi‑agent reinforcement learning (MARL) and generative diffusion models (GDM) for assigning DNN tasks to a UAV swarm, aimed at reducing latency from task capture to result output. To address these challenges, we first consider the task size of the target area to be inspected and the shortest flying path as optimization constraints, employing a greedy algorithm to resolve the subproblem with a focus on minimizing the UAV's flying path and the overall system cost. In the second stage, we introduce a novel DNN task assignment algorithm, termed GDM‑MADDPG, which utilizes the reverse denoising process of GDM to replace the actor network in multi‑agent deep deterministic policy gradient (MADDPG). This approach generates specific DNN task assignment actions based on agents' observations in a dynamic environment. Simulation results indicate that our algorithm performs favorably compared to benchmarks in terms of path planning, Age of Information (AoI), energy consumption, and task load balancing.
Authors: Dong Yang, Wei Dong, Wei Lu, Yanqi Dong, Sirui Liu
Abstract: Complex Cyber‑Physical Systems (CPSs) such as Unmanned Aerial Systems (UASs) have developed rapidly in recent years, but they have also become vulnerable to GPS spoofing, packet injection, buffer overflow, and other malicious attacks. Ensuring that the behavior of a UAS remains secure no matter how the environment changes is a promising direction for UAS security. This paper introduces a pattern‑based framework to describe the security properties of UASs and presents a reactive synthesis‑based approach to automatically generate secure UAS controllers. First, we study the operating mechanism of UASs and construct a high‑level model consisting of an actuator and a monitor. We then analyze the security threats to UASs from the hardware, software, and cyber‑physical perspectives, and summarize the corresponding specification patterns of security properties as LTL formulas. With the UAS model and security specification patterns, controller automata can be constructed by the General Reactivity of Rank 1 (GR(1)) synthesis algorithm, which is a two‑player game between the Unmanned Aerial Vehicle (UAV) and its environment. Finally, we conduct experiments on the ArduPilot simulation platform to test the effectiveness of our method.
Authors: Juan P. Martinez-Esteso, Francisco J. Castellanos, Jorge Calvo-Zaragoza, Antonio Javier Gallego
Abstract: The speed of response by search and rescue teams at sea is of vital importance, as survival may depend on it. Recent technological advancements have led to the development of more efficient systems for locating individuals involved in a maritime incident, such as the use of Unmanned Aerial Vehicles (UAVs) equipped with cameras and other integrated sensors. Over the past decade, several researchers have contributed to the development of automatic systems capable of detecting people using aerial images, particularly by leveraging the advantages of deep learning. In this article, we provide a comprehensive review of the existing literature on this topic. We analyze the methods proposed to date, including both traditional techniques and more advanced approaches based on machine learning and neural networks. Additionally, we take into account the use of synthetic data to cover a wider range of scenarios without the need to deploy a team to collect data, which is one of the major obstacles for these systems. Overall, this paper situates the reader in the field of detecting people at sea using aerial images by quickly identifying the most suitable methodology for each scenario, as well as providing an in‑depth discussion and direction for future trends.
Authors: William Smith, Xinhua Wang
Abstract: Tilt‑rotor aircraft combine the benefits of both helicopters and fixed‑wing aircraft, which makes them popular for a variety of applications, including search and rescue and VVIP transport. However, the multiple flight modes pose significant challenges for control system design. The main challenge with VTOL aircraft arises during the dynamic phase (mode transition), in which the aircraft transitions from a hover state to full forward flight. In this transition phase, the aerodynamic lift and torque generated by the wing and control surfaces increase, so the rotor thrust and the tilt rate must be carefully controlled such that the height and attitude remain invariant during the mode transition. In this paper, a digital PID controller with the applicable digital filter and data hold functions is designed so that a successful mode transition between hover and forward flight can be achieved. Finally, the presented control system for the tilt‑rotor UAV is demonstrated through simulations using the MATLAB software suite. The performance obtained from the simulations confirms the success of the implemented methods, with full stability in all three degrees of freedom being demonstrated.
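As a rough illustration of what a digital (sampled) PID loop looks like, the sketch below implements a textbook discrete PID with output saturation and holds the command constant between samples. Everything here is an assumption made for illustration: the class name `DiscretePID`, the gains, the sample time, and the first‑order toy plant are hypothetical and do not represent the tilt‑rotor model, the digital filter, or the controller designed in the paper.

```python
class DiscretePID:
    """Textbook discrete PID with clamped output (illustrative only)."""

    def __init__(self, kp, ki, kd, dt, out_min, out_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        # Rectangular (backward Euler) integration of the error.
        self.integral += error * self.dt
        # Backward-difference derivative of the error.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Saturate the command to the actuator range.
        return max(self.out_min, min(self.out_max, u))


def simulate(steps=2000, dt=0.01, setpoint=1.0):
    """Drive a toy first-order plant x' = -x + u toward the setpoint."""
    pid = DiscretePID(kp=2.0, ki=1.0, kd=0.0, dt=dt, out_min=-10.0, out_max=10.0)
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, x)  # command is held constant for one sample
        x += dt * (-x + u)           # forward-Euler step of the toy plant
    return x


if __name__ == "__main__":
    print(simulate())
```

The integral term is what removes the steady‑state error in this toy loop; a real transition controller would additionally need filtering of the derivative term and anti‑windup, which are omitted here.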
Authors: Fen Liu, Shenghai Yuan, Wei Meng, Rong Su, Lihua Xie
Abstract: From prehistoric encirclement for hunting to GPS satellites orbiting the earth for positioning, target encirclement has numerous real‑world applications. However, encircling multiple non‑cooperative targets in GPS‑denied environments remains challenging. In this work, we consider the encirclement of multiple targets using a minimum of two tasking agents, where the relative distance measurements between the agents and the targets can be obtained using onboard sensors. Based on the measurements, the center of all the targets is estimated directly by a fuzzy wavelet neural network (FWNN) and the least squares fit method. Then, a new distributed anti‑synchronization controller (DASC) is designed so that the two tasking agents are able to encircle all targets while staying opposite to each other. In particular, the radius of the desired encirclement trajectory can be dynamically determined to avoid potential collisions between the two agents and all targets. Based on the Lyapunov stability analysis method, convergence proofs are given for the neural network prediction error, the target‑center position estimation error, and the controller error, respectively. Finally, both numerical simulations and UAV flight experiments are conducted to demonstrate the validity of the encirclement algorithms. The recorded flight‑test video and other simulation results can be found at https://youtu.be/B8uTorBNrl4.
Authors: Shadman Tajwar Shahid, Shah Md. Ahasan Siddique, Md. Mahidul Alam
Abstract: This article addresses the challenge of UAV survey coverage path planning for areas that are complex concave polygons, containing exclusion zones or obstacles. While standard drone path planners typically generate coverage paths for simple convex polygons, this study proposes a method to manage more intricate regions, including boundary splits, merges, and interior holes. To achieve this, polygonal decomposition techniques are used to partition the target area into convex sub‑regions. The sub‑polygons are then merged using a depth‑first search algorithm, followed by the generation of continuous Boustrophedon paths based on connected components. Polygonal offset by the straight skeleton method was used to ensure a constant safe distance from the exclusion zones. This approach allows UAV path planning in environments with complex geometric constraints.
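To make the Boustrophedon ("back‑and‑forth") pattern concrete, here is a minimal sketch that generates sweep waypoints over a single convex cell, which is the kind of sub‑region the decomposition step produces. The function name `boustrophedon_path`, the horizontal sweep direction, and the `spacing` parameter are assumptions chosen for illustration; the decomposition, merging, and straight‑skeleton offsetting described in the abstract are not reproduced here.

```python
def boustrophedon_path(polygon, spacing):
    """Back-and-forth sweep waypoints over one convex polygon.

    polygon: list of (x, y) vertices in order; spacing: distance between
    sweep lines. Hypothetical helper: horizontal sweeps only.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    ys = [y for _, y in polygon]
    y_min, y_max = min(ys), max(ys)
    n = len(polygon)
    waypoints = []
    left_to_right = True
    y = y_min + spacing / 2.0  # first sweep line sits half a spacing inside
    while y < y_max:
        xs = []
        for i in range(n):
            (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
            # Half-open test skips horizontal edges and avoids double-counting
            # a vertex shared by two edges.
            if (y1 <= y < y2) or (y2 <= y < y1):
                t = (y - y1) / (y2 - y1)
                xs.append(x1 + t * (x2 - x1))
        if len(xs) >= 2:
            row = [(min(xs), y), (max(xs), y)]
            if not left_to_right:
                row.reverse()  # alternate direction on every sweep line
            waypoints.extend(row)
            left_to_right = not left_to_right
        y += spacing
    return waypoints


if __name__ == "__main__":
    rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
    print(boustrophedon_path(rectangle, 1.0))
```

In practice `spacing` would be derived from the camera footprint and the required image overlap, and each convex cell would get its own sweep before the cells are chained into one continuous path.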
Authors: Hemal Naik, Junran Yang, Dipin Das, Margaret C Crofoot, Akanksha Rathore, Vivek Hari Sridhar
Abstract: Understanding animal behaviour is central to predicting, understanding, and mitigating impacts of natural and anthropogenic changes on animal populations and ecosystems. However, the challenges of acquiring and processing long‑term, ecologically relevant data in wild settings have constrained the scope of behavioural research. The increasing availability of Unmanned Aerial Vehicles (UAVs), coupled with advances in machine learning, has opened new opportunities for wildlife monitoring using aerial tracking. However, limited availability of datasets with wild animals in natural habitats has hindered progress in automated computer vision solutions for long‑term animal tracking. Here we introduce BuckTales, the first large‑scale UAV dataset designed to solve multi‑object tracking (MOT) and re‑identification (Re‑ID) problems in wild animals, specifically in the mating behaviour (or lekking) of blackbuck antelopes. Collected in collaboration with biologists, the MOT dataset includes over 1.2 million annotations including 680 tracks across 12 high‑resolution (5.4K) videos, each averaging 66 seconds and featuring 30 to 130 individuals. The Re‑ID dataset includes 730 individuals captured with two UAVs simultaneously. The dataset is designed to drive scalable, long‑term animal behaviour tracking using multiple camera sensors. By providing baseline performance with two detectors, and benchmarking several state‑of‑the‑art tracking methods, our dataset reflects the real‑world challenges of tracking wild animals in socially and ecologically relevant contexts. In making these data widely available, we hope to catalyze progress in MOT and Re‑ID for wild animals, fostering insights into animal behaviour, conservation efforts, and ecosystem dynamics through automated, long‑term monitoring.
Authors: Huy-Hoang Ngo, Thanh Nguyen Canh, Xiem HoangVan
Abstract: Intelligent aerial platforms such as Unmanned Aerial Vehicles (UAVs) are expected to revolutionize various fields, including transportation, traffic management, field monitoring, industrial production, and agricultural management. Among these, precise control is a critical task that determines the performance and capabilities of UAV systems. However, current research primarily focuses on trajectory tracking and minimizing flight errors, with limited attention to improving flight time. In this paper, we propose a Model Predictive Control (MPC) approach aimed at minimizing flight time while addressing the limitations of the commonly used classical MPC controllers. Furthermore, the MPC method and its application for UAV control are presented in detail. Finally, the results demonstrate that the proposed controller outperforms the standard MPC in terms of efficiency. Moreover, this approach shows potential to become a foundation for integrating intelligent algorithms into basic controllers.
Authors: Minjie Tang, Chenyuan Feng, Tony Q. S. Quek
Abstract: This paper investigates the semantic communication and cooperative tracking control for a UAV swarm comprising a leader UAV and a group of follower UAVs, all interconnected via unreliable wireless multiple‑input‑multiple‑output (MIMO) channels. Initially, we develop a dynamic model for the UAV swarm that accounts for both the internal interactions among the cooperative follower UAVs and the imperfections inherent in the MIMO channels that interlink the leader and follower UAVs. Building on this model, we incorporate the power costs of the UAVs and formulate the communication and cooperative tracking control challenge as a drift‑plus‑penalty optimization problem. We then derive a closed‑form optimal solution that maintains a decentralized semantic architecture, dynamically adjusting to the tracking error costs and local channel conditions within the swarm. Employing Lyapunov drift analysis, we establish closed‑form sufficient conditions for the stabilization of the UAV swarm's tracking performance. Numerical results demonstrate significant improvements of our proposed scheme over various state‑of‑the‑art methods.
Authors: Hubert Szolc, Karol Desnos, Tomasz Kryjak
Abstract: Deep reinforcement learning (DRL) is currently the most popular AI‑based approach to autonomous vehicle control. An agent, trained for this purpose in simulation, can interact with the real environment with human‑level performance. Despite very good results in terms of selected metrics, this approach has some significant drawbacks: high computational requirements and low explainability. Because of that, a DRL‑based agent cannot be used in some control tasks, especially when safety is the key issue. Therefore, we propose to use Tangled Program Graphs (TPGs) as an alternative to deep reinforcement learning in control‑related tasks. In this approach, input signals are processed by simple programs that are combined in a graph structure. As a result, TPGs are less computationally demanding and their actions can be explained based on the graph structure. In this paper, we present our studies on the use of TPGs as an alternative to DRL in control‑related tasks. In particular, we consider the problem of navigating an unmanned aerial vehicle (UAV) through an unknown environment based solely on the on‑board LiDAR sensor. The results of our work show promising prospects for the use of TPGs in control‑related tasks.
Authors: Zongcheng Zuo, Yuanxiang Li, Tongtong Zhang
Abstract: Due to differences in season, illumination, and atmospheric conditions, the photometric properties of acquired images vary greatly, which leads to obvious stitching seams at the edges of the mosaic image. Traditional methods can be divided into two categories: absolute radiometric correction and relative radiometric normalization. We propose a NeRF‑based method of color consistency correction for multi‑view images, which weaves image features together using implicit expressions and then re‑illuminates the feature space to generate a fused image from a new perspective. We chose Superview‑1 satellite images and UAV images with large range and time differences for the experiment. Experimental results show that the synthesized image generated by our method has excellent visual quality and smooth color transitions at the edges.
Authors: James Mordaunt, Xinhua Wang
Abstract: This paper presents an agile Unmanned Aerial Vehicle (UAV) landing control system that accounts for the ship's oscillation and motion as well as disturbances (i.e., crosswind). The presented control system enables the quadrotor UAV to land autonomously whilst overcoming these adverse conditions, and a rudder is added beneath each propeller to increase the yaw authority, which is found to be lacking in heavy‑lift quadrotor UAVs. The PID flight control system is based on reference‑point tracking, allowing the UAV to follow any desired path in 3D space whilst simultaneously yawing to face any desired heading. Realistic saturation limits are imposed on actuator outputs to reflect the real‑world performance of the actuators. Disturbances include randomised gusting wind in 3 axes, and sensor noise on translation and rotation signals to represent noise from the GPS and accelerometer respectively. The results from the simulations demonstrate that the UAV is capable of landing on a ship that is moving with varying heading and oscillating vertically on ocean waves, and that it can time its descent such that it meets the ship at the peak of a wave to minimise the relative velocity.
Authors: Bowei Li, Yang Xu, Ran Zhang, Jiang Xie, Miao Wang
Abstract: Deep reinforcement learning (DRL) has been extensively applied to Multi‑Unmanned Aerial Vehicle (UAV) networks (MUNs) to effectively enable real‑time adaptation to complex, time‑varying environments. Nevertheless, most of the existing works assume a stationary user distribution (UD) or a dynamic one with predicted patterns. Such considerations may make the UD‑specific strategies insufficient when a MUN is deployed in unknown environments. To this end, this paper investigates the distributed user connectivity maximization problem in a MUN with generalization to arbitrary UDs. Specifically, the problem is first formulated as a time‑coupled combinatorial nonlinear non‑convex optimization with arbitrary underlying UDs. To make the optimization tractable, a multi‑agent CNN‑enhanced deep Q learning (MA‑CDQL) algorithm is proposed. The algorithm integrates a ResNet‑based CNN into the policy network to analyze the input UD in real time and obtain optimal decisions based on the extracted high‑level UD features. To improve the learning efficiency and avoid local optima, a heatmap algorithm is developed to transform the raw UD into a continuous density map. The map will be part of the true input to the policy network. Simulations are conducted to demonstrate the efficacy of UD heatmaps and the proposed algorithm in maximizing user connectivity as compared to K‑means methods.
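The idea of turning raw user positions into a continuous density map can be sketched as follows. The abstract does not describe how the paper's heatmap algorithm works, so this example assumes a simple Gaussian kernel summed over a regular grid and normalized to [0, 1]; the function name `user_heatmap` and the parameters `cells` and `sigma` are hypothetical.

```python
import math


def user_heatmap(users, width, height, cells, sigma):
    """Return a cells x cells grid of Gaussian-smoothed user density in [0, 1].

    users: list of (x, y) positions inside a width x height area.
    Assumed kernel-density sketch, not the algorithm from the paper.
    """
    if cells <= 0 or sigma <= 0:
        raise ValueError("cells and sigma must be positive")
    grid = [[0.0] * cells for _ in range(cells)]
    for row in range(cells):
        cy = (row + 0.5) * height / cells  # cell-center y coordinate
        for col in range(cells):
            cx = (col + 0.5) * width / cells  # cell-center x coordinate
            grid[row][col] = sum(
                math.exp(-((cx - ux) ** 2 + (cy - uy) ** 2) / (2.0 * sigma ** 2))
                for ux, uy in users
            )
    peak = max(max(r) for r in grid)
    if peak > 0:
        # Normalize so the densest cell equals 1.0.
        grid = [[v / peak for v in r] for r in grid]
    return grid


if __name__ == "__main__":
    for r in user_heatmap([(1.0, 1.0), (3.0, 3.5)], 4.0, 4.0, 4, 1.0):
        print(["%.2f" % v for v in r])
```

A smooth map like this gives a convolutional network spatially correlated input even when users are sparse, which is one plausible reason a density map would be easier to learn from than raw coordinates.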
Authors: Geng Sun, Jiaxu Wu, Zemin Sun, Long He, Jiacheng Wang, Dusit Niyato, Abbas Jamalipour, Shiwen Mao
Abstract: In the era of the sixth generation (6G) and industrial Internet of Things (IIoT), an industrial cyber‑physical system (ICPS) drives the proliferation of sensor devices and computing‑intensive tasks. To address the limited resources of IIoT sensor devices, unmanned aerial vehicle (UAV)‑assisted mobile edge computing (MEC) has emerged as a promising solution, providing flexible and cost‑effective services in close proximity to IIoT sensor devices (ISDs). However, leveraging aerial MEC to meet the delay‑sensitive and computation‑intensive requirements of the ISDs could face several challenges, including the limited communication, computation and caching (3C) resources, stringent offloading requirements for 3C services, and constrained on‑board energy of UAVs. To address these issues, we first present a collaborative aerial MEC‑assisted ICPS architecture by incorporating the computing capabilities of the macro base station (MBS) and UAVs. We then formulate a service delay minimization optimization problem (SDMOP). Since the SDMOP is proved to be an NP‑hard problem, we propose a joint computation offloading, caching, communication resource allocation, computation resource allocation, and UAV trajectory control approach (JC5A). Specifically, JC5A consists of a block successive upper bound minimization method of multipliers (BSUMM) for computation offloading and service caching, a convex optimization‑based method for communication and computation resource allocation, and a successive convex approximation (SCA)‑based method for UAV trajectory control. Moreover, we theoretically prove the convergence and polynomial complexity of JC5A. Simulation results demonstrate that the proposed approach can achieve superior system performance compared to the benchmark approaches and algorithms.
Authors: Trong-Nhan Phan, Hoang-Hai Nguyen, Thi-Thu-Hien Ha, Huy-Tan Thai, Kim-Hung Le
Abstract: Visual inspections of bridges are critical to ensure their safety and identify potential failures early. This inspection process can be rapidly and accurately automated by using unmanned aerial vehicles (UAVs) integrated with deep learning models. However, choosing an appropriate model that is lightweight enough to integrate into the UAV and fulfills the strict requirements for inference time and accuracy is challenging. Therefore, our work contributes to the advancement of this model selection process by conducting a benchmark of 23 models belonging to the four newest YOLO variants (YOLOv5, YOLOv6, YOLOv7, YOLOv8) on COCO‑Bridge‑2021+, a dataset for bridge details detection. Through comprehensive benchmarking, we identify YOLOv8n, YOLOv7tiny, YOLOv6m, and YOLOv6m6 as the models offering an optimal balance between accuracy and processing speed, with mAP@50 scores of 0.803, 0.837, 0.853, and 0.872, and inference times of 5.3ms, 7.5ms, 14.06ms, and 39.33ms, respectively. Our findings accelerate the model selection process for UAVs, enabling more efficient and reliable bridge inspections.
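The selection trade‑off described in this abstract can be illustrated with the four reported operating points. The sketch below picks the most accurate model that fits a latency budget; the mAP@50 and inference‑time values are the ones quoted in the abstract, while the function name `pick_model` and the example budgets are assumptions introduced for illustration.

```python
# (name, mAP@50, inference time in ms) as quoted in the abstract.
CANDIDATES = [
    ("YOLOv8n", 0.803, 5.3),
    ("YOLOv7tiny", 0.837, 7.5),
    ("YOLOv6m", 0.853, 14.06),
    ("YOLOv6m6", 0.872, 39.33),
]


def pick_model(candidates, max_latency_ms):
    """Return the name of the highest-mAP model within the latency budget.

    Returns None when no candidate meets the budget. Hypothetical helper.
    """
    feasible = [c for c in candidates if c[2] <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[1])[0]


if __name__ == "__main__":
    # The budgets below are made-up examples, not requirements from the paper.
    for budget in (5.0, 10.0, 40.0):
        print(budget, pick_model(CANDIDATES, budget))
```

Because accuracy rises monotonically with inference time across these four models, each one is the best choice for some range of latency budgets, which is consistent with the abstract presenting all four as balanced options.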
Authors: Jiawen Kang, Yongju Tong, Yue Zhong, Junlong Chen, Minrui Xu, Dusit Niyato, Runrong Deng, Shiwen Mao
Abstract: The rise of 6G‑enabled Vehicular Metaverses is transforming the automotive industry by integrating immersive, real‑time vehicular services through ultra‑low latency and high bandwidth connectivity. In 6G‑enabled Vehicular Metaverses, vehicles are represented by Vehicle Twins (VTs), which serve as digital replicas of physical vehicles to support real‑time vehicular applications such as large Artificial Intelligence (AI) model‑based Augmented Reality (AR) navigation, called VT tasks. VT tasks are resource‑intensive and need to be offloaded to ground Base Stations (BSs) for fast processing. However, high demand for VT tasks and limited resources of ground BSs pose significant resource allocation challenges, particularly in densely populated urban areas like intersections. As a promising solution, Unmanned Aerial Vehicles (UAVs) act as aerial edge servers to dynamically assist ground BSs in handling VT tasks, relieving resource pressure on ground BSs. However, due to the high mobility of UAVs, there exists information asymmetry regarding VT task demands between UAVs and ground BSs, resulting in inefficient resource allocation of UAVs. To address these challenges, we propose a learning‑based Modified Second‑Bid (MSB) auction mechanism to optimize resource allocation between ground BSs and UAVs by accounting for VT task latency and accuracy. Moreover, we design a diffusion‑based reinforcement learning algorithm to optimize the price scaling factor, maximizing the total surplus of resource providers and minimizing VT task latency. Finally, simulation results demonstrate that the proposed diffusion‑based MSB auction outperforms traditional baselines, providing better resource distribution and enhanced service quality for vehicular users.
Authors: Joshua Moore, Aly Sabri Abdalla, Charles Ueltschey, Vuk Marojevic
Abstract: The Open Radio Access Network (O‑RAN) architecture is reshaping the telecommunications landscape by enhancing network flexibility, openness, and intelligence. This paper establishes the requirements, evaluates the design tradeoffs, and introduces a scalable architecture and prototype of an open‑source O‑RAN experimentation platform within the Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW), an at‑scale testbed that integrates unmanned aerial vehicles (UAVs) with advanced wireless network technologies, offering experimentation in both an outdoor testbed and emulation via a custom digital twin (DT). Through a series of aerial experiments, we evaluate FlexRIC, an open‑source RAN Intelligent Controller, within the AERPAW hardware‑software platform for network data monitoring, providing valuable insights into the proposed integration and revealing opportunities for leveraging O‑RAN to create custom service‑based optimizations for cellular‑connected UAVs. We discuss the challenges and potential use cases of this integration and demonstrate the use of a generative artificial intelligence model for generating realistic data based on collected real‑world data to support AERPAW's DT.
Authors: Sanku Kumar Roy, Mohamed Samshad, Ketan Rajawat
Abstract: The rapid growth of UAV applications necessitates a robust communication and networking architecture capable of addressing the diverse requirements of various applications concurrently, rather than relying on application‑specific solutions. This paper proposes a generic and reliable multi‑UAV communication and networking architecture designed to support the varying demands of heterogeneous applications, including short‑range and long‑range communication, star and mesh topologies, different data rates, and multiple wireless standards. Our architecture accommodates both ad hoc and infrastructure networks, ensuring seamless connectivity throughout the network. Additionally, we present the design of a multi‑protocol UAV gateway that enables interoperability among various communication protocols. Furthermore, we introduce a data processing and service layer framework with a graphical user interface for a ground control station that facilitates remote control and monitoring from any location at any time. We implemented the proposed architecture in practice and evaluated its performance using different metrics, demonstrating its effectiveness.
Authors: Yang Zhao, Zidong Nie, Kangsheng Dong, Qinghua Huang, Xuelong Li
Abstract: The application of intelligent decision‑making in unmanned aerial vehicles (UAVs) is increasing, and with the development of the UAV 1v1 pursuit‑evasion game, the multi‑UAV cooperative game has emerged as a new challenge. This paper proposes a deep reinforcement learning‑based model for decision‑making in the multi‑role UAV cooperative pursuit‑evasion game, to address the challenge of enabling UAVs to autonomously make decisions in complex game environments. In order to enhance the training efficiency of the reinforcement learning algorithm in a UAV pursuit‑evasion game environment that has a high‑dimensional state‑action space, this paper proposes a multi‑environment asynchronous double deep Q‑network with priority experience replay algorithm to effectively train the UAV's game policy. Furthermore, aiming to improve cooperation ability and task completion efficiency, as well as to minimize the cost of UAVs in the pursuit‑evasion game, this paper focuses on the allocation of roles and targets within the multi‑UAV environment. Cooperative game decision models with varying numbers of UAVs are obtained by assigning diverse tasks and roles to the UAVs in different scenarios. The simulation results demonstrate that the proposed method enables autonomous decision‑making of the UAVs in pursuit‑evasion game scenarios and exhibits significant capabilities in cooperation.
Authors: Francisco Giral, Ignacio Gómez, Ricardo Vinuesa, Soledad Le Clainche
Abstract: This study presents a transformer‑based approach for fault‑tolerant control in fixed‑wing Unmanned Aerial Vehicles (UAVs), designed to adapt in real time to dynamic changes caused by structural damage or actuator failures. Unlike traditional Flight Control Systems (FCSs) that rely on classical control theory and struggle under severe alterations in dynamics, our method directly maps outer‑loop reference values ‑‑ altitude, heading, and airspeed ‑‑ into control commands using the in‑context learning and attention mechanisms of transformers, thus bypassing inner‑loop controllers and fault‑detection layers. Employing a teacher‑student knowledge distillation framework, the proposed approach trains a student agent with partial observations by transferring knowledge from a privileged expert agent with full observability, enabling robust performance across diverse failure scenarios. Experimental results demonstrate that our transformer‑based controller outperforms industry‑standard FCS and state‑of‑the‑art reinforcement learning (RL) methods, maintaining high tracking accuracy and stability in nominal conditions and extreme failure cases, highlighting its potential for enhancing UAV operational safety and reliability.
Authors: Mohamed Samshad, Ketan Rajawat
Abstract: This paper presents a communication and energy‑aware multi‑UAV Coverage Path Planning (mCPP) method for scenarios requiring continuous inter‑UAV communication, such as cooperative search and rescue and surveillance missions. Unlike existing mCPP solutions that focus on energy, time, or coverage efficiency, the proposed method generates coverage paths that minimize a specified combination of energy and inter‑UAV connectivity radius. Key features of the proposed algorithm include a simplified and validated energy consumption model, an efficient connectivity radius estimator, and an optimization framework that enables us to search for the optimal paths over irregular and obstacle‑rich regions. The effectiveness and utility of the proposed algorithm are validated through simulations on various test regions with and without no‑fly zones. Real‑world experiments on a three‑UAV system demonstrate a remarkably high 99% match between the estimated and actual communication range requirements.
Authors: Kuan Jia, Dingcheng Yang, Yapeng Wang, Tianyun Shui, Chenji Liu
Abstract: This paper considers a patrol inspection scenario where multiple unmanned aerial vehicles (UAVs) are adopted to traverse multiple predetermined cruise points for data collection. The UAVs are connected to cellular networks and they would offload the collected data to the ground base stations (GBSs) for data processing within the constrained duration. This paper proposes a balanced task assignment strategy among patrol UAVs and an energy‑efficient trajectory design method. Through jointly optimizing the cruise point assignment, communication scheduling, computational allocation, and UAV trajectory, a novel solution can be obtained to balance the multiple UAVs' task completion time and minimize the total energy consumption. Firstly, we propose a novel clustering method that considers geometry topology, communication rate, and offload volume; it can determine each UAV's cruise points and balance the UAVs' patrol task. Secondly, a hybrid Time‑Energy traveling salesman problem is formulated to analyze the cruise point traversal sequence, and the energy‑efficient UAV trajectory can be designed by adopting the successive convex approximation (SCA) technique and block coordinate descent (BCD) scheme. The numerical results demonstrate that the proposed balanced task assignment strategy can efficiently balance the multiple UAVs' tasks. Moreover, the min‑max task completion time and total energy consumption performance of the proposed solution outperform that of the current conventional approach.
Authors: Juanqin Liu, Leonardo Plotegher, Eloy Roura, Cristino de Souza Junior, Shaoming He
Abstract: Unmanned Aerial Vehicle (UAV) detection technology plays a critical role in mitigating security risks and safeguarding privacy in both military and civilian applications. However, traditional detection methods face significant challenges in identifying UAV targets with extremely small pixels at long distances. To address this issue, we propose the Global‑Local YOLO‑Motion (GL‑YOMO) detection algorithm, which combines You Only Look Once (YOLO) object detection with multi‑frame motion detection techniques, markedly enhancing the accuracy and stability of small UAV target detection. The YOLO detection algorithm is optimized through multi‑scale feature fusion and attention mechanisms, while the integration of the Ghost module further improves efficiency. Additionally, a motion detection approach based on template matching is being developed to augment detection capabilities for minute UAV targets. The system utilizes a global‑local collaborative detection strategy to achieve high precision and efficiency. Experimental results on a self‑constructed fixed‑wing UAV dataset demonstrate that the GL‑YOMO algorithm significantly enhances detection accuracy and stability, underscoring its potential in UAV detection applications.
Authors: Eunhyuk Park, Junbeom Kim, Seok-Hwan Park, Osvaldo Simeone, Shlomo Shamai
Abstract: This work investigates a collaborative sensing and data collection system in which multiple unmanned aerial vehicles (UAVs) sense an area of interest and transmit images to a cloud server (CS) for processing. To accelerate the completion of sensing missions, including data transmission, the sensing task is divided into individual private sensing tasks for each UAV and a common sensing task that is executed by all UAVs to enable cooperative transmission. Unlike existing studies, we explore the use of an advanced cell‑free multiple‑input multiple‑output (MIMO) network, which effectively manages inter‑UAV interference. To further optimize wireless channel utilization, we propose a hybrid transmission strategy that combines time‑division multiple access (TDMA), non‑orthogonal multiple access (NOMA), and cooperative transmission. The problem of jointly optimizing task splitting ratios and the hybrid TDMA‑NOMA‑cooperative transmission strategy is formulated with the objective of minimizing mission completion time. Extensive numerical results demonstrate the effectiveness of the proposed task allocation and hybrid transmission scheme in accelerating the completion of sensing missions.
Authors: Aiman Munir, Ayan Dutta, Ramviyas Parasuraman
Abstract: We propose a distributed control law for a heterogeneous multi‑robot coverage problem, where the robots could have different energy characteristics, such as capacity and depletion rates, due to their varying sizes, speeds, capabilities, and payloads. Existing energy‑aware coverage control laws consider capacity differences but assume the battery depletion rate to be the same for all robots. In realistic scenarios, however, some robots can consume energy much faster than other robots; for instance, UAVs hover at different altitudes, and these changes could be dynamically updated based on their assigned tasks. Robots' energy capacities and depletion rates need to be considered to maximize the performance of a multi‑robot system. To this end, we propose a new energy‑aware controller based on Lloyd's algorithm to adapt the weights of the robots based on their energy dynamics and divide the area of interest among the robots accordingly. The controller is theoretically analyzed and extensively evaluated through simulations and real‑world demonstrations in multiple realistic scenarios and compared with three baseline control laws to validate its performance and efficacy.
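The abstract does not state how the energy dynamics enter the partition. As a hedged illustration of the general idea, the sketch below runs one weighted Lloyd iteration on a sampled region: each sample goes to the robot with the smallest power distance (squared distance minus weight), and each robot then moves toward the centroid of its cell. The weights are simply taken as given; how they would be derived from capacity and depletion rate is an assumption left out here, and the function names are hypothetical.

```python
# Minimal sketch (assumptions labeled): one weighted Lloyd step on a
# discretized region. A larger weight yields a larger cell. The mapping
# from energy state to weight is NOT taken from the paper.

def power_cells(robots, weights, samples):
    """Assign each sample point to the robot minimizing ||q - p||^2 - w."""
    cells = [[] for _ in robots]
    for q in samples:
        costs = [
            (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 - w
            for p, w in zip(robots, weights)
        ]
        cells[costs.index(min(costs))].append(q)
    return cells


def lloyd_step(robots, weights, samples, gain=1.0):
    """Move each robot toward the centroid of its weighted cell."""
    cells = power_cells(robots, weights, samples)
    moved = []
    for p, cell in zip(robots, cells):
        if not cell:
            # A robot with an empty cell stays where it is.
            moved.append(p)
            continue
        cx = sum(q[0] for q in cell) / len(cell)
        cy = sum(q[1] for q in cell) / len(cell)
        moved.append((p[0] + gain * (cx - p[0]), p[1] + gain * (cy - p[1])))
    return moved, cells


def unit_square_grid(n):
    """Cell-center samples of the unit square, n x n points."""
    return [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
```

With equal weights this reduces to the standard Lloyd iteration on a Voronoi partition; raising one robot's weight shifts the cell boundary away from it, so that robot is assigned more area.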
Authors: Daniel Bonilla Licea, Giuseppe Silano, Hajar El Hammouti, Mounir Ghogho, Martin Saska
Abstract: A new class of Multi‑Rotor Aerial Vehicles (MRAVs), known as omnidirectional MRAVs (o‑MRAVs), has attracted significant interest in the robotics community. These MRAVs have the unique capability of independently controlling their 3D position and 3D orientation. In the context of aerial communication networks, this translates into the ability to control the position and orientation of the antenna mounted on the MRAV without any additional devices tasked with antenna orientation. These additional Degrees of Freedom (DoF) add a new dimension to aerial communication systems, creating various research opportunities in communications‑aware trajectory planning and positioning. This paper presents this new class of MRAVs and discusses use cases in areas such as physical layer security and optical communications. Furthermore, the benefits of these MRAVs are illustrated with realistic simulation scenarios. Finally, new research problems and opportunities introduced by this advanced robotics technology are discussed.
Authors: Thanh Nguyen Canh, Huy-Hoang Ngo, Xiem HoangVan, Nak Young Chong
Abstract: Localization is one of the most crucial tasks for Unmanned Aerial Vehicle (UAV) systems and directly impacts overall performance. It can be achieved with various sensors and supports numerous tasks such as search and rescue operations, object tracking, and construction. However, due to the negative effects of challenging environments, UAVs may lose signals for localization. In this paper, we present an effective path‑planning system leveraging semantic segmentation information to navigate around texture‑less and problematic areas like lakes, oceans, and high‑rise buildings using a monocular camera. We introduce a real‑time semantic segmentation architecture and a novel keyframe decision pipeline to optimize image inputs based on pixel distribution, reducing processing time. A hierarchical planner based on the Dynamic Window Approach (DWA) algorithm, integrated with a cost map, is designed to facilitate efficient path planning. The system is implemented in a photo‑realistic simulation environment using Unity, aligning with segmentation model parameters. Comprehensive qualitative and quantitative evaluations validate the effectiveness of our approach, showing significant improvements in the reliability and efficiency of UAV localization in challenging environments.
Authors: Jianjun Sun, Zhenwei Niu, Yihao Dong, Fenglin Zhang, Muhayy Ud Din, Lakmal Seneviratne, Defu Lin, Irfan Hussain, Shaoming He
Abstract: This paper presents an autonomous aerial system specifically engineered for operation in challenging marine GNSS‑denied environments, aimed at transporting small cargo from a target vessel. In these environments, characterized by weakly textured sea surfaces with few feature points, chaotic deck oscillations due to waves, and significant wind gusts, conventional navigation methods often prove inadequate. Leveraging the DJI M300 platform, our system is designed to autonomously navigate and transport cargo while overcoming these environmental challenges. In particular, this paper proposes an anchor‑based localization method using ultrawideband (UWB) and QR code facilities, which decouples the UAV's attitude from that of the moving landing platform, thus reducing control oscillations caused by platform movement. Additionally, a motor‑driven attachment mechanism for cargo is designed, which enhances the UAV's field of view during descent and ensures a reliable attachment to the cargo upon landing. The system's reliability and effectiveness were progressively enhanced through multiple outdoor experimental iterations and were validated by the successful cargo transport during the 2024 Mohamed Bin Zayed International Robotics Challenge (MBZIRC2024) competition. Crucially, the system addresses uncertainties and interferences inherent in maritime transportation missions without prior knowledge of cargo locations on the deck and with strict limitations on intervention throughout the transportation.
Authors: Vu Khanh Quy, Nguyen Minh Quy, Tran Thi Hoai, Shaba Shaon, Md Raihan Uddin, Tien Nguyen, Dinh C. Nguyen, Aryan Kaushik, Periklis Chatzimisios
Abstract: 6G wireless networks are expected to provide seamless and data‑based connections that cover space‑air‑ground and underwater networks. As a core part of future 6G networks, Space‑Air‑Ground Integrated Networks (SAGINs) have been envisioned to provide countless real‑time intelligent applications. To realize this, integrating AI techniques into SAGINs is an inevitable trend. Due to the distributed and heterogeneous architecture of SAGINs, federated learning (FL) and, more recently, quantum FL (QFL) are emerging AI model training techniques for enabling future privacy‑enhanced and computation‑efficient SAGINs. In this work, we explore the vision of using FL/QFL in SAGINs. We present a few representative applications enabled by the integration of FL and QFL in SAGINs. A case study of QFL over UAV networks is also given, showing the merit of the quantum‑enabled training approach over the conventional FL benchmark. Research challenges along with standardization for QFL adoption in future SAGINs are also highlighted.
Authors: Caroline M. Gevaert, Alexandra Aguiar Pedro, Ou Ku, Hao Cheng, Pranav Chandramouli, Farzaneh Dadrass Javan, Francesco Nattino, Sonja Georgievska
Abstract: Deep Learning methods are notorious for relying on extensive labeled datasets to train and assess their performance. This can cause difficulties in practical situations where models should be trained for new applications for which very little data is available. While few‑shot learning algorithms can address the first problem, they still lack sufficient explanations for the results. This research presents a workflow that tackles both challenges by proposing an explainable few‑shot learning workflow for detecting invasive and exotic tree species in the Atlantic Forest of Brazil using Unmanned Aerial Vehicle (UAV) images. By integrating a Siamese network with explainable AI (XAI), the workflow enables the classification of tree species with minimal labeled data while providing visual, case‑based explanations for the predictions. Results demonstrate the effectiveness of the proposed workflow in identifying new tree species, even in data‑scarce conditions. With a lightweight backbone, e.g., MobileNet, it achieves an F1‑score of 0.86 in 3‑shot learning, outperforming a shallow CNN. A set of explanation metrics, i.e., correctness, continuity, and contrastivity, accompanied by visual cases, provides further insights into the prediction results. This approach opens new avenues for using AI and UAVs in forest management and biodiversity conservation, particularly concerning rare or under‑studied species.
Authors: Bryan S. Guevara, Viviana Moya, Luis F. Recalde, David Pozo-Espin, Daniel C. Gandolfo, Juan M. Toibero
Abstract: In this study, we propose a novel method that integrates Nonlinear Model Predictive Contour Control (NMPCC) with an Exponentially Stabilizing Control Lyapunov Function (ES‑CLF) and Exponential Higher‑Order Control Barrier Functions to achieve stable path‑following and obstacle avoidance in UAV systems. This framework enables unmanned aerial vehicles (UAVs) to safely navigate around both static and dynamic obstacles while strictly adhering to desired paths. The quaternion‑based formulation ensures precise orientation and attitude control, while a robust optimization solver enforces the constraints imposed by the Control Lyapunov Function (CLF) and Control Barrier Functions (CBF), ensuring reliable real‑time performance. The method was validated in a Model‑in‑the‑Loop (MiL) environment, demonstrating effective path tracking and obstacle avoidance. The results highlight the framework's ability to minimize both orthogonal and tangential errors, ensuring stability and safety in complex environments.
Authors: Francisco M. F. R. Gonçalves, Ryan M. Bena, Néstor O. Pérez-Arancibia
Abstract: We present a new Lyapunov‑based switching attitude controller for energy‑efficient real‑time selection of the torque inputted to an uncrewed aerial vehicle (UAV) during flight. The proposed method, using quaternions to describe the attitude of the controlled UAV, interchanges the stability properties of the two fixed points (one locally asymptotically stable and the other unstable) of the resulting closed‑loop (CL) switching dynamics of the system. In this approach, the switching events are triggered by the value of a compound energy‑based function. To analyze and ensure the stability of the CL switching dynamics, we use classical nonlinear Lyapunov techniques, in combination with switching‑systems theory. For this purpose, we introduce a new compound Lyapunov function (LF) that not only enables us to derive the conditions for CL asymptotic and exponential stability, but also provides us with an estimate of the CL system's region of attraction. This new estimate is considerably larger than those previously reported for systems of the type considered in this paper. To test and demonstrate the functionality, suitability, and performance of the proposed method, we present and discuss experimental data obtained using a 31‑g quadrotor during the execution of high‑speed yaw‑tracking maneuvers. Also, we provide empirical evidence indicating that all the initial conditions chosen for these maneuvers, as estimated, lie inside the system's region of attraction. Last, experimental data obtained through these flight tests show that the proposed switching controller reduces the control effort by about 53%, on average, with respect to that corresponding to a commonly used benchmark control scheme, when executing a particular type of high‑speed yaw‑tracking maneuver.
Authors: Arjun Ramesh Kaushik, Charanjit Jutla, Nalini Ratha
Abstract: In safeguarding mission‑critical systems, such as Unmanned Aerial Vehicles (UAVs), preserving the privacy of path trajectories during navigation is paramount. While the combination of Reinforcement Learning (RL) and Fully Homomorphic Encryption (FHE) holds promise, the computational overhead of FHE presents a significant challenge. This paper proposes an innovative approach that leverages Knowledge Distillation to enhance the practicality of secure UAV navigation. By integrating RL and FHE, our framework addresses vulnerabilities to adversarial attacks while enabling real‑time processing of encrypted UAV camera feeds, ensuring data security. To mitigate FHE's latency, Knowledge Distillation is employed to compress the network, resulting in an impressive 18x speedup without compromising performance, as evidenced by an R‑squared score of 0.9499 compared to the original model's score of 0.9631. Our methodology underscores the feasibility of processing encrypted data for UAV navigation tasks, emphasizing security alongside performance efficiency and timely processing. These findings pave the way for deploying autonomous UAVs in sensitive environments, bolstering their resilience against potential security threats.
Authors: Benjamin J. Marshall, Yunda Yan, James Knowles, Chenguang Yang, Cunjia Liu
Abstract: A new disturbance observer based control scheme is developed for a quadrotor under the concurrent disturbances from a lightweight elastic tether cable and a lumped vertical disturbance. This elastic tether is unusual as it creates a disturbance proportional to the multicopter's translational movement. This paper takes an observer‑based approach to estimate the stiffness coefficient of the cable and uses the system model to update the estimates of the external forces, which are then compensated in the control action. Given that the tethered cable force affects both horizontal channels of the quadrotor and is also coupled with the vertical channel, the proposed disturbance observer is constructed to exploit the redundant measurements across all three channels to jointly estimate the cable stiffness and the vertical disturbance. A pseudo‑inverse method is used to determine the observer gain functions, such that the estimation of the two quantities is decoupled and stable. Compared to standard disturbance observers which assume nearly constant disturbances, the proposed approach can quickly adjust its total force estimate as the tethered quadrotor changes its position or tautness of the tether. This is applied to two experiments ‑ a tracking performance test where the multicopter moves under a constant tether strain, and an object extraction test. In the second test, the multicopter manipulates a nonlinear mechanism mimicking the extraction of a wedged object. In both cases, the proposed approach shows significant improvement over standard Disturbance Observer and Extended State Observer approaches. A video summary of the experiments can be found at https://youtu.be/9gKr13WTj‑k.
Authors: Faisal Mehmood, Enqing Chen, Touqeer Abbas, Samah M. Alzanin
Abstract: Human Action Recognition (HAR) is an interesting research area in human‑computer interaction used to monitor the activities of elderly and disabled individuals affected by physical and mental health conditions. Recently, skeleton‑based HAR has received much attention because skeleton data has been shown to handle changes in striking, body size, camera views, and complex backgrounds. One key characteristic of ST‑GCN is that it automatically learns spatial and temporal patterns from skeleton sequences. However, its limited receptive field restricts it to short‑range correlations, whereas understanding human action requires long‑range interconnection. To address this issue, we developed a spatial‑temporal relative transformer (ST‑RTR) model. The ST‑RTR includes joint and relay nodes, which allow efficient communication and data transmission within the network. These nodes help to break the inherent spatial and temporal skeleton topologies, which enables the model to understand long‑range human action better. Furthermore, we combine ST‑RTR with a fusion model for further performance improvements. To assess the performance of the ST‑RTR method, we conducted experiments on three skeleton‑based HAR benchmarks: NTU RGB+D 60, NTU RGB+D 120, and UAV‑Human. It improved CS and CV by 2.11% and 1.45% on NTU RGB+D 60, and by 1.25% and 1.05% on NTU RGB+D 120. On the UAV‑Human dataset, accuracy improved by 2.54%. The experimental outcomes show that the proposed ST‑RTR model significantly improves action recognition compared with the standard ST‑GCN method.
Authors: Shaswat Garg, Houman Masnavi, Baris Fidan, Farrokh Janabi-Sharifi
Abstract: This paper presents a novel reinforcement learning framework for trajectory tracking of unmanned aerial vehicles in cluttered environments using a dual‑agent architecture. Traditional optimization methods for trajectory tracking face significant computational challenges and lack robustness in dynamic environments. Our approach employs deep reinforcement learning (RL) to overcome these limitations, leveraging 3D pointcloud data to perceive the environment without relying on memory‑intensive obstacle representations like occupancy grids. The proposed system features two RL agents: one for predicting UAV velocities to follow a reference trajectory and another for managing collision avoidance in the presence of obstacles. This architecture ensures real‑time performance and adaptability to uncertainties. We demonstrate the efficacy of our approach through simulated and real‑world experiments, highlighting improvements over state‑of‑the‑art RL and optimization‑based methods. Additionally, a curriculum learning paradigm is employed to scale the algorithms to more complex environments, ensuring robust trajectory tracking and obstacle avoidance in both static and dynamic scenarios.
Authors: Omer Nacar, Mohamed Abdelkader, Lahouari Ghouti, Kahled Gabr, Abdulrahman S. Al-Batati, Anis Koubaa
Abstract: This paper tackles the challenge of real‑time 3D trajectory prediction for UAVs, which is critical for applications such as aerial surveillance and defense. Existing prediction models that rely primarily on position data struggle with accuracy, especially when UAV movements fall outside the position domain used in training. Our research identifies a gap in utilizing velocity estimates, first‑order dynamics, to better capture the dynamics and enhance prediction accuracy and generalizability in any position domain. To bridge this gap, we propose a new trajectory prediction method using Gated Recurrent Units (GRUs) within sequence‑based neural networks. Unlike traditional methods that rely on RNNs or transformers, this approach forecasts future velocities and positions based on historical velocity data instead of positions. This is designed to enhance prediction accuracy and scalability, overcoming challenges faced by conventional models in handling complex UAV dynamics. The methodology employs both synthetic and real‑world 3D UAV trajectory data, capturing a wide range of flight patterns, speeds, and agility. Synthetic data is generated using the Gazebo simulator and PX4 Autopilot, while real‑world data comes from the UZH‑FPV and Mid‑Air drone racing datasets. The GRU‑based models significantly outperform state‑of‑the‑art RNN approaches, with a mean square error (MSE) as low as 2 x 10^‑8. Overall, our findings confirm the effectiveness of incorporating velocity data in improving the accuracy of UAV trajectory predictions across both synthetic and real‑world scenarios, in and out of position data distributions. Finally, we open‑source our 5000 trajectories dataset and a ROS 2 package to facilitate the integration with existing ROS‑based UAV systems.
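The abstract says future positions are forecast from velocity data but does not describe how the two are linked. One simple way to connect them, shown here purely as an assumption, is forward Euler integration of predicted velocities from the last known position. The function name, the fixed time step `dt`, and the integration scheme are illustrative choices and are not taken from the paper.

```python
# Minimal sketch (assumption, not the authors' method): recover a 3D
# position track from predicted velocities by forward Euler integration.

def integrate_velocities(start, velocities, dt):
    """Return positions after each step, starting from `start`.

    start      -- (x, y, z) last known position
    velocities -- iterable of (vx, vy, vz) predictions, one per step
    dt         -- time step between predictions, in seconds
    """
    x, y, z = start
    positions = []
    for vx, vy, vz in velocities:
        x += vx * dt
        y += vy * dt
        z += vz * dt
        positions.append((x, y, z))
    return positions
```

Because the velocity sequence is the same wherever the flight takes place, shifting `start` shifts every output position by the same offset, which is one intuition for why velocity inputs may generalize outside the position range seen in training.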
Authors: Alaa Awad Abdellatif, Ali Elmancy, Amr Mohamed, Ahmed Massoud, Wadha Lebda, Khalid K. Naji
Abstract: This paper introduces a comprehensive framework for Post‑Disaster Search and Rescue (PDSR), aiming to optimize search and rescue operations leveraging Unmanned Aerial Vehicles (UAVs). The primary goal is to improve the precision and availability of sensing capabilities, particularly in various catastrophic scenarios. Central to this concept is the rapid deployment of UAV swarms equipped with diverse sensing, communication, and intelligence capabilities, functioning as an integrated system that incorporates multiple technologies and approaches for efficient detection of individuals buried beneath rubble or debris following a disaster. Within this framework, we propose an architectural solution and address associated challenges to ensure optimal performance in real‑world disaster scenarios. The proposed framework aims to achieve complete coverage of damaged areas significantly faster than traditional methods using a multi‑tier swarm architecture. Furthermore, integrating multi‑modal sensing data with machine learning for data fusion could enhance detection accuracy, ensuring precise identification of survivors.
Authors: Tong Hui, Matteo Fumagalli
Abstract: As aerial robots gain traction in industrial applications, there is growing interest in enhancing their physical interaction capabilities. Pushing tasks performed by aerial manipulators have been successfully demonstrated in contact‑based inspections. However, more complex industrial applications require these systems to support higher‑DoF (Degree of Freedom) manipulators and generate larger forces while pushing (e.g., drilling, grinding). This paper builds on our previous work, where we introduced an aerial vehicle that can dynamically vary its CoM (Center of Mass) location to improve force exertion during interactions. We propose a novel approach to further enhance this system's force generation by optimizing its CoM location during interactions. Additionally, we study the case of this aerial vehicle equipped with a 2‑DoF manipulation arm to extend the system's functionality in tool‑based tasks. The effectiveness of the proposed methods is validated through simulations, demonstrating the potential of this system for advanced aerial manipulation in practical settings.
Authors: Andrea Vaiuso, Marcello Righi, Oier Coretti, Moreno Apicella
Abstract: Unmanned Aerial Vehicles (UAVs) have become widely used in various fields and industrial applications thanks to their low operational cost, compact size and wide accessibility. However, the noise generated by drone propellers has emerged as a significant concern. This may affect the public's willingness to accept these vehicles in services that require operation in proximity to residential areas. The standard approaches to address this challenge include sound pressure measurements and noise characteristic analyses. The integration of Artificial Intelligence models in recent years has further streamlined the process by enhancing complex feature detection in drone acoustics data. This study builds upon prior research by examining the efficacy of various Deep Learning models in predicting Psychoacoustic Annoyance, an effective index for measuring perceived annoyance by human ears, based on multiple drone characteristics as input. This is accomplished by constructing a training dataset using precise measurements of various drone models with multiple microphones and analyzing flight data, maneuvers, drone physical characteristics, and perceived annoyance under realistic conditions. The aim of this research is to improve our understanding of drone noise, aid in the development of noise reduction techniques, and encourage the acceptance of drone usage in public spaces.
Authors: Jess Stephenson, William S. Stewart, Melissa Greeff
Abstract: Landing a multirotor unmanned aerial vehicle (UAV) on an uncrewed surface vessel (USV) extends the operational range and offers recharging capabilities for maritime and limnology applications, such as search‑and‑rescue and environmental monitoring. However, autonomous UAV landings on USVs are challenging due to the unpredictable tilt and motion of the vessel caused by waves. This movement introduces spatial and temporal uncertainties, complicating safe, precise landings. Existing autonomous landing techniques on unmanned ground vehicles (UGVs) rely on shared state information, often causing time delays due to communication limits. This paper introduces a learning‑based distributed Model Predictive Control (MPC) framework for autonomous UAV landings on USVs in wave‑like conditions. Each vehicle's MPC optimizes for an artificial goal and input, sharing only the goal with the other vehicle. These goals are penalized by coupling and platform tilt costs, learned as a Gaussian Process (GP). We validate our framework in comprehensive indoor experiments using a custom‑designed platform attached to a UGV to simulate USV tilting motion. Our approach achieves a 53% increase in landing success compared to an approach that neglects the impact of tilt motion on landing.
Authors: Manjunath D, Prajwal Gurunath, Sumanth Udupa, Aditya Gandhamal, Shrikar Madhu, Aniruddh Sikdar, Suresh Sundaram
Abstract: Deep neural networks (DNNs) have shown exceptional performance when trained on well‑illuminated images captured by Electro‑Optical (EO) cameras, which provide rich texture details. However, in critical applications like aerial perception, it is essential for DNNs to maintain consistent reliability across all conditions, including low‑light scenarios where EO cameras often struggle to capture sufficient detail. Additionally, UAV‑based aerial object detection faces significant challenges due to scale variability from varying altitudes and slant angles, adding another layer of complexity. Existing methods typically address only illumination changes or style variations as domain shifts, but in aerial perception, correlation shifts also impact DNN performance. In this paper, we introduce the IndraEye dataset, a multi‑sensor (EO‑IR) dataset designed for various tasks. It includes 5,612 images with 145,666 instances, encompassing multiple viewing angles, altitudes, seven backgrounds, and different times of the day across the Indian subcontinent. The dataset opens up several research opportunities, such as multimodal learning, domain adaptation for object detection and segmentation, and exploration of sensor‑specific strengths and weaknesses. IndraEye aims to advance the field by supporting the development of more robust and accurate aerial perception systems, particularly in challenging conditions. IndraEye dataset is benchmarked with object detection and semantic segmentation tasks. Dataset and source codes are available at https://bit.ly/indraeye.
Authors: Rajiv Ranjan, Tejasavi Birdh, Nandan Mandal, Dinesh Kumar, Shashank Tamaskar
Abstract: This study investigates the relationship between sugarcane yield and cane height under different water and nitrogen conditions, with cane height derived from a pre‑harvest Digital Surface Model (DSM) obtained via Unmanned Aerial Vehicle (UAV) flights over a sugarcane test farm. The farm was divided into 62 blocks based on three water levels (low, medium, and high) and three nitrogen levels (low, medium, and high), with repeated treatments. The pixel distribution of the DSM for each block was bimodal, with two peaks corresponding to ground level (gaps within the canopy) and the top of the canopy, respectively. Using this bimodal distribution, the mean cane height was extracted for each block by applying a trimmed mean to the pixel distribution, focusing on the top canopy points. Similarly, the mean elevation of the base was derived from the bottom points, representing ground level. The Derived Cane Height Model (DCHM) was generated by taking the difference between the mean canopy height and the mean base elevation for each block. Yield measurements (tons/acre) were recorded post‑harvest for each block. By aggregating the data into nine treatment zones (e.g., high water‑low nitrogen, low water‑high nitrogen), the DCHM and median yield were calculated for each zone. The regression analysis between the DCHM and the corresponding yields for the different treatment zones yielded an R² of 0.95. This study demonstrates the significant impact of water and nitrogen treatments on sugarcane height and yield, utilizing one‑time UAV‑derived DSM data.
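The height extraction described in this abstract can be illustrated with a minimal sketch. It assumes the two modes of a block's DSM pixel distribution are separated by a known elevation threshold; the `split_elevation` parameter, the 10% trim fraction, and both function names are hypothetical, since the abstract does not state how the modes are separated or how much is trimmed.

```python
from statistics import mean


def trimmed_mean(values, trim_fraction=0.1):
    """Mean of the values after dropping trim_fraction of them from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def derived_cane_height(dsm_pixels, split_elevation, trim_fraction=0.1):
    """Canopy-minus-ground height for one block of DSM elevations (metres).

    split_elevation is an assumed threshold between the ground mode and the
    canopy mode of the bimodal pixel distribution.
    """
    ground = [z for z in dsm_pixels if z < split_elevation]
    canopy = [z for z in dsm_pixels if z >= split_elevation]
    if not ground or not canopy:
        raise ValueError("both modes must contain at least one pixel")
    return trimmed_mean(canopy, trim_fraction) - trimmed_mean(ground, trim_fraction)


# Illustrative values only: ground near 100 m, canopy near 103 m elevation.
block = [100.0] * 10 + [103.0] * 10
height = derived_cane_height(block, split_elevation=101.5)
```

Trimming both tails of each mode limits the influence of stray pixels, such as isolated tall stalks or DSM noise, on the per‑block height.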
Authors: Antonio Sojo, Iván Maza, Aníbal Ollero
Abstract: In this paper we address the optimal planning of autonomous teams for general‑purpose tasks covering a wide spectrum of situations, from project management of human teams to the coordination of automated assembly lines, with a focus on the automated inspection of power grids. There exist many methods for task planning. However, the vast majority of such methods are conceived for very specific problems or situations and are often based on certain assumptions and simplifications. Consider, for example, all the different algorithms developed to solve the Vehicle Routing Problem (VRP) for different vehicle and environment characteristics. This means that no robust general planning method exists and that extending any of them to a more general situation is often not a trivial task. To address this, we propose a new, truly general method ultimately based on a generalization of the Traveling Salesman Problem (TSP). We call this new model the Heterogeneous Multi‑worker Task Planning Problem (HMWTPP). It provides a natural framework to model many situations typical of task planning of all kinds. Task‑worker compatibility, precedence/order, and time‑window constraints are already encoded into the HMWTPP, and it can be easily extended in an intuitive manner to include weight capacity or per‑node battery constraints. Several classical TSP instances included in the TSPLIB library are solved for validation and performance analysis of the HMWTPP, showing numerical performance comparable to that of existing models. In addition, a synthetic example modeling an automated assembly line is analyzed to demonstrate the potential capabilities of the HMWTPP in real‑life scenarios. Finally, we focus on the computation of the optimal plan for Unmanned Aerial Vehicles (UAVs), specifically in the context of automated inspection of electrical power grids.
Authors: Alice James, Avishkar Seth, Endrowednes Kuantama, Subhas Mukhopadhyay, Richard Han
Abstract: In this paper, we address the challenge of navigating through unknown indoor environments using autonomous aerial robots within confined spaces. The core of our system involves the integration of key sensor technologies, including depth sensing from the ZED 2i camera, IMU data, and LiDAR measurements, facilitated by the Robot Operating System (ROS) and RTAB‑Map. Through custom designed experiments, we demonstrate the robustness and effectiveness of this approach. Our results showcase a promising navigation accuracy, with errors as low as 0.4 meters, and mapping quality characterized by a Root Mean Square Error (RMSE) of just 0.13 m. Notably, this performance is achieved while maintaining energy efficiency and balanced resource allocation, addressing a crucial concern in UAV applications. Flight tests further underscore the precision of our system in maintaining desired flight orientations, with a remarkable error rate of only 0.1%. This work represents a significant stride in the development of autonomous indoor UAV navigation systems, with potential applications in search and rescue, facility inspection, and environmental monitoring within GPS‑denied indoor environments.
Authors: InPyo Song, Jangwon Lee
Abstract: This paper addresses the problem of multi‑object tracking in Unmanned Aerial Vehicle (UAV) footage. It plays a critical role in various UAV applications, including traffic monitoring systems and real‑time suspect tracking by the police. However, this task is highly challenging due to the fast motion of UAVs, as well as the small size of target objects in the videos caused by the high‑altitude and wide‑angle views of drones. In this study, we therefore introduce a simple yet more effective method than previous work to overcome these challenges. Our approach involves a new tracking strategy, which initiates the tracking of target objects from low‑confidence detections commonly encountered in UAV application scenarios. Additionally, we propose revisiting traditional appearance‑based matching algorithms to improve the association of low‑confidence detections. To evaluate the effectiveness of our method, we conducted benchmark evaluations on two UAV‑specific datasets (VisDrone2019, UAVDT) and one general object tracking dataset (MOT17). The results demonstrate that our approach surpasses current state‑of‑the‑art methods, highlighting its robustness and adaptability in diverse tracking environments. Furthermore, we have improved the annotation of the UAVDT dataset by rectifying several errors and addressing omissions found in the original annotations. We will provide this refined version of the dataset to facilitate better benchmarking in the field.
Authors: Mehdi Maboudi, Jan Backhaus, Inka Mai, Yahya Ghassoun, Yogesh Khedar, Dirk Lowke, Bjoern Riedel, Ulf Bestmann, Markus Gerke
Abstract: Accurate and efficient structural health monitoring (SHM) of infrastructure objects such as bridges is a vital task, as many existing constructions have already reached or are approaching their planned service life. In this contribution, we address the question of the suitability of UAV‑based monitoring for SHM, in particular focusing on the geometric deformation under load. Such an advanced technology is becoming increasingly popular due to its ability to decrease the cost and risk of tedious traditional inspection methods. To this end, we performed extensive tests employing a research reinforced concrete bridge that can be exposed to a predefined load via ground anchors. Very high‑resolution image blocks have been captured before, during, and after the application of controlled loads. From those images, the motion of distinct points on the bridge has been monitored, and in addition, dense image point clouds were computed to evaluate the performance of surface‑based data acquisition. Moreover, a geodetic control network in stable regions is used as control information for bundle adjustment. We applied different sensing technologies in order to be able to judge the image‑based deformation results: displacement transducers, tachymetry, and laser profiling. As a platform for the photogrammetric measurements, a multi‑rotor UAV DJI Matrice 600 Pro was employed, equipped with two RTK‑GNSS receivers. The mounted camera was a PhaseOne iXM‑100 (100MP) with an 80 mm lens. With a flying height of 30 m above the terrain, this resulted in a GSD of 1.3 mm while a forward and sideward overlap of 80% was maintained. The comparison with reference data (displacement transducers) reveals a difference of less than 1 mm. We show that by employing the introduced UAV‑based monitoring approach, a full area‑wide quantification of deformation is possible in contrast to classical point or profile measurements.
Authors: Elias J. R. Freitas, Miri Weiss Cohen, Frederico G. Guimarães, Luciano C. A. Pimenta
Abstract: This research presents an online path planner for Unmanned Aerial Vehicles (UAVs) that can handle dynamic obstacles and UAV motion constraints, including maximum curvature and desired orientations. Our proposed planner uses a NURBS path representation and a Differential Evolution algorithm, incorporating concepts from the Velocity Obstacle approach in a constraint function. Initial results show that our approach is feasible and provides a foundation for future extensions to three‑dimensional (3D) environments.
Authors: Yuqing Xie, Chao Yu, Hongzhi Zang, Feng Gao, Wenhao Tang, Jingyi Huang, Jiayu Chen, Botian Xu, Yi Wu, Yu Wang
Abstract: This paper tackles the challenging task of maintaining formation among multiple unmanned aerial vehicles (UAVs) while avoiding both static and dynamic obstacles during directed flight. The complexity of the task arises from its multi‑objective nature, the large exploration space, and the sim‑to‑real gap. To address these challenges, we propose a two‑stage reinforcement learning (RL) pipeline. In the first stage, we randomly search for a reward function that balances key objectives: directed flight, obstacle avoidance, formation maintenance, and zero‑shot policy deployment. The second stage applies this reward function to more complex scenarios and utilizes curriculum learning to accelerate policy training. Additionally, we incorporate an attention‑based observation encoder to improve formation maintenance and adaptability to varying obstacle densities. Experimental results in both simulation and real‑world environments demonstrate that our method outperforms both planning‑based and RL‑based baselines in terms of collision‑free rates and formation maintenance across static, dynamic, and mixed obstacle scenarios. Ablation studies further confirm the effectiveness of our curriculum learning strategy and attention‑based encoder. Animated demonstrations are available at: https://sites.google.com/view/uav‑formation‑with‑avoidance/.
Authors: Xinran Fang, Chengleyang Lei, Wei Feng, Yunfei Chen, Ming Xiao, Ning Ge, Chengxiang Wang
Abstract: Rapid advancements in field robots have brought a new kind of cyber physical system (CPS), the unmanned robotic system, under the spotlight. In the upcoming sixth‑generation (6G) era, these systems hold great potential to replace humans in hazardous tasks. This paper investigates an unmanned robotic system comprising a multi‑functional unmanned aerial vehicle (UAV), sensors, and actuators. The UAV carries communication and computing modules, acting as an edge information hub (EIH) that transfers and processes information. During the task execution, the EIH gathers sensing data, calculates control commands, and transmits commands to actuators, leading to reflex‑arc‑like sensing‑communication‑computing‑control ($\mathbf{SC}^3$) loops. Unlike existing studies that design $\mathbf{SC}^3$ loop components separately, we take each $\mathbf{SC}^3$ loop as an integrated structure and propose a goal‑oriented closed‑loop optimization scheme. This scheme jointly optimizes uplink and downlink (UL&DL) communication and computing within and across the $\mathbf{SC}^3$ loops to minimize the total linear quadratic regulator (LQR) cost. We derive optimal closed‑form solutions for intra‑loop allocation and propose an efficient iterative algorithm for inter‑loop optimization. Under the condition of adequate CPU frequency availability, we derive an approximate closed‑form solution for inter‑loop bandwidth allocation. Simulation results demonstrate that the proposed scheme achieves a two‑tier task‑level balance within and across $\mathbf{SC}^3$ loops.
Authors: Abhishek Phadke, Alihan Hadimlioglu, Tianxing Chu, Chandra N Sekharan
Abstract: The intersection of LLM (Large Language Model) and UAV (Unoccupied Aerial Vehicle) technology represents a promising field of research with the potential to enhance UAV capabilities significantly. This study explores the application of LLMs in UAV control, focusing on the opportunities for integrating advanced natural language processing into autonomous aerial systems. By enabling UAVs to interpret and respond to natural language commands, LLMs simplify UAV control and usage, making UAVs accessible to a broader user base and facilitating more intuitive human‑machine interactions. The paper discusses several key areas where LLMs can impact UAV technology, including autonomous decision‑making, dynamic mission planning, enhanced situational awareness, and improved safety protocols. Through a comprehensive review of current developments and potential future directions, this study aims to highlight how LLMs can transform UAV operations, making them more adaptable, responsive, and efficient in complex environments. A template development framework for integrating LLMs in UAV control is also described. Proof‑of‑concept results that integrate existing LLMs and popular robotic simulation platforms are demonstrated. The findings suggest that while there are substantial technical and ethical challenges to address, integrating LLMs into UAV control holds promising implications for advancing autonomous aerial systems.
Authors: Nicolas Michel, Ayush Patnaik, Zhaodan Kong, Xinfan Lin
Abstract: The multirotor unmanned aerial vehicle is a prevalent type of aerial robot with wide real‑world applications. The energy efficiency of the robot is a critical aspect of its performance, determining the range and duration of the missions that can be performed. This paper studies the energy‑optimal planning of the multirotor, which aims at finding the optimal ordering of waypoints with the minimum energy consumption for missions in 3D space. The study is performed based on a previously developed model capturing first‑principle energy dynamics of the multirotor. We found that in the majority of cases (up to 95%) the solutions of the energy‑optimal planning are different from those of the traditional traveling salesman problem, which minimizes the total distance. The difference can be as high as 14.9%, with the average at 1.6%‑3.3% and the 90th percentile at 3.7%‑6.5%, depending on the range and number of waypoints in the mission. We then identified and explained the key features of the minimum‑energy order by correlating them to the underlying flight energy dynamics. It is shown that instead of minimizing the distance, coordination of vertical and horizontal motion to promote aerodynamic efficiency is the key to optimizing energy consumption.
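To make the contrast between distance‑optimal and energy‑optimal waypoint ordering concrete, here is a brute‑force sketch. The energy function is a toy stand‑in, not the first‑principle model the abstract refers to: it simply assumes climbing costs more per metre than descending. The functions `toy_energy`, `path_length`, and `best_order`, and the cost coefficients, are all hypothetical.

```python
from itertools import permutations
from math import dist


def path_length(start, order):
    """Total 3D Euclidean length of the open path start -> order[0] -> ..."""
    pts = [start, *order]
    return sum(dist(a, b) for a, b in zip(pts, pts[1:]))


def toy_energy(start, order, climb_cost=3.0, descent_cost=0.5):
    """Assumed direction-dependent cost: horizontal metres plus weighted vertical metres."""
    pts = [start, *order]
    total = 0.0
    for a, b in zip(pts, pts[1:]):
        horizontal = dist(a[:2], b[:2])
        dz = b[2] - a[2]
        vertical = climb_cost * dz if dz > 0 else descent_cost * -dz
        total += horizontal + vertical
    return total


def best_order(start, waypoints, cost):
    """Exhaustive search over visiting orders; only practical for a few waypoints."""
    return min(permutations(waypoints), key=lambda order: cost(start, order))


start = (0.0, 0.0, 0.0)
waypoints = [(10.0, 0.0, 5.0), (0.0, 10.0, 0.0), (10.0, 10.0, 8.0)]
distance_order = best_order(start, waypoints, path_length)
energy_order = best_order(start, waypoints, toy_energy)
```

Because the cost of a leg depends on its direction, the energy problem is asymmetric, which is one reason its optimal order need not coincide with the shortest‑distance order.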
Authors: Mohammed Saif, Shahrokh Valaee
Abstract: The reconfigurable intelligent surface (RIS) is pivotal for beyond‑5G networks, given the surging demand for reliable communication in unmanned aerial vehicle (UAV) networks. This paper presents an innovative approach to maximize the connectivity of UAV networks using RIS deployment and virtual partitioning, wherein an RIS is deployed to assist communication between a user equipment (UE) and blocked UAVs. Closed‑form (CF) expressions for the signal‑to‑noise ratio (SNR) of the two‑UAV setup are derived and validated. Then, an optimization problem is formulated to maximize network connectivity by optimizing the 3D deployment of the RIS and its partitioning subject to predefined quality‑of‑service (QoS) constraints. To tackle this problem, we propose a method of virtually partitioning the RIS given a fixed 3D location, such that the partition phase shifts are configured to create cascaded channels between the UE and the two blocked UAVs. Then, a simulated‑annealing (SA) method is used to find the 3D location of the RIS. Simulation results demonstrate that the proposed joint RIS deployment and partitioning framework can significantly improve network connectivity compared to benchmarks, including RIS‑free and RIS with a single narrow‑beam link.
Authors: Mahmoud Ali, Di Yang, François Brémond
Abstract: Current vision‑language foundation models, such as CLIP, have recently shown significant improvement in performance across various downstream tasks. However, whether such foundation models significantly improve more complex fine‑grained action recognition tasks is still an open question. To answer this question and better identify future research directions for human behavior analysis in the wild, this paper provides a large‑scale study of, and insight into, current state‑of‑the‑art vision foundation models by comparing their transfer ability on zero‑shot and frame‑wise action recognition tasks. Extensive experiments, including action classification and segmentation, are conducted on recent fine‑grained, human‑centric action recognition datasets (e.g., Toyota Smarthome, Penn Action, UAV‑Human, TSU, Charades).
Authors: Nikos Sakellariou, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras
Abstract: The unique cost, flexibility, speed, and efficiency of modern UAVs make them an attractive choice in many applications in contemporary society. This, however, causes an ever‑increasing number of reported malicious or accidental incidents, making the development of UAV detection and classification mechanisms essential. We propose a methodology for developing a system that fuses already‑processed multi‑sensor data into a new Deep Neural Network (DNN) to increase its classification accuracy for UAV detection. The DNN model fuses high‑level features extracted from individual object detection and classification models associated with thermal, optronic, and radar data. Additionally, emphasis is given to the model's Convolutional Neural Network (CNN) based architecture that combines the features of the three sensor modalities by stacking the extracted image features of the thermal and optronic sensors, achieving higher classification accuracy than each sensor alone.
Authors: Chen Hu, Hanchi Ren, Jingjing Deng, Xianghua Xie
Abstract: Unmanned Aerial Vehicle (UAV) swarms are increasingly deployed in dynamic, data‑rich environments for applications such as environmental monitoring and surveillance. These scenarios demand efficient data processing while maintaining privacy and security, making Federated Learning (FL) a promising solution. FL allows UAVs to collaboratively train global models without sharing raw data, but challenges arise due to the non‑Independent and Identically Distributed (non‑IID) nature of the data collected by UAVs. In this study, we integrate state‑of‑the‑art FL methods into a UAV swarm application and investigate the performance of multiple aggregation methods (namely FedAvg, FedProx, FedOpt, and MOON), with a particular focus on tackling non‑IID data on a variety of datasets: MNIST for baseline performance, CIFAR10 for natural object classification, EuroSAT for environment monitoring, and CelebA for surveillance. These algorithms were selected to cover improved techniques on both client‑side updates and global aggregation. Results show that while all algorithms perform comparably on IID data, their performance deteriorates significantly under non‑IID conditions. FedProx demonstrated the most stable overall performance, emphasising the importance of regularising local updates in non‑IID environments to mitigate drastic deviations in local models.
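As a rough illustration of two of the methods named in this abstract, the sketch below shows the standard FedAvg aggregation rule (a sample‑count‑weighted average of client parameters) and the proximal penalty that FedProx adds to each client's local objective. Model parameters are represented as flat Python lists for simplicity; the function names and the example values are assumptions and are not taken from the paper's code.

```python
def fedavg(client_weights, client_sizes):
    """FedAvg: average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def fedprox_penalty(local_weights, global_weights, mu):
    """FedProx proximal term: (mu / 2) * ||w_local - w_global||^2.

    Added to the local loss, it discourages a client from drifting far from
    the current global model, which matters most under non-IID data.
    """
    squared_distance = sum((l - g) ** 2 for l, g in zip(local_weights, global_weights))
    return 0.5 * mu * squared_distance


# Two hypothetical UAV clients holding 1 and 3 samples respectively.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
penalty = fedprox_penalty([1.0, 2.0], [0.0, 0.0], mu=0.1)
```

Setting `mu` to zero removes the penalty, in which case local training reduces to the plain FedAvg client update.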
Authors: Andrea Berra, Viswa Narayanan Sankaranarayanan, Achilleas Santi Seisa, Julien Mellet, Udayanga G. W. K. N. Gamage, Sumeet Gajanan Satpute, Fabio Ruggiero, Vincenzo Lippiello, Silvia Tolu, Matteo Fumagalli, George Nikolakopoulos, Miguel Ángel Trujillo Soto, Guillermo Heredia
Abstract: The paper introduces a novel framework for safe and autonomous aerial physical interaction in industrial settings. It comprises two main components: a neural network‑based target detection system enhanced with edge computing for reduced onboard computational load, and a control barrier function (CBF)‑based controller for safe and precise maneuvering. The target detection system is trained on a dataset under challenging visual conditions and evaluated for accuracy across various unseen data with changing lighting conditions. Depth features are utilized for target pose estimation, with the entire detection framework offloaded into low‑latency edge computing. The CBF‑based controller enables the UAV to converge safely to the target for precise contact. Simulated evaluations of both the controller and target detection are presented, alongside an analysis of real‑world detection performance.
Authors: Anning Wei, Jintao Liang, Kaiyuan Lin, Ziyue Li, Rui Zhao
Abstract: Existing multi‑agent deep reinforcement learning (MADRL) methods for multi‑UAV navigation face challenges in generalization, particularly when applied to unseen complex environments. To address these limitations, we propose a Dual‑Transformer Encoder‑based Proximal Policy Optimization (DTPPO) method. DTPPO enhances multi‑UAV collaboration through a Spatial Transformer, which models inter‑agent dynamics, and a Temporal Transformer, which captures temporal dependencies to improve generalization across diverse environments. This architecture allows UAVs to navigate new, unseen environments without retraining. Extensive simulations demonstrate that DTPPO outperforms current MADRL methods in terms of transferability, obstacle avoidance, and navigation efficiency across environments with varying obstacle densities. The results confirm DTPPO's effectiveness as a robust solution for multi‑UAV navigation in both known and unseen scenarios.
Authors: Maciej Adamiak, Yulia Grinblat, Julian Psotta, Nir Fulman, Himshikhar Mazumdar, Shiyu Tang, Alexander Zipf
Abstract: This paper presents a method for detecting and estimating vehicle speeds using PlanetScope SuperDove satellite imagery, offering a scalable solution for global vehicle traffic monitoring. Conventional methods such as stationary sensors and mobile systems like UAVs are limited in coverage and constrained by high costs and legal restrictions. Satellite‑based approaches provide broad spatial coverage but face challenges, including high costs, low frame rates, and difficulty detecting small vehicles in high‑resolution imagery. We propose a Keypoint R‑CNN model to track vehicle trajectories across RGB bands, leveraging band timing differences to estimate speed. Validation is performed using drone footage and GPS data covering highways in Germany and Poland. Our model achieved a Mean Average Precision of 0.53 and velocity estimation errors of approximately 3.4 m/s compared to GPS data. Results from drone comparison reveal underestimations, with average speeds of 112.85 km/h for satellite data versus 131.83 km/h from drone footage. While challenges remain with high‑speed accuracy, this approach demonstrates the potential for scalable, daily traffic monitoring across vast areas, providing valuable insights into global traffic dynamics.
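The speed estimate described here rests on simple kinematics: a moving vehicle appears at slightly different positions in bands captured at slightly different times, so speed is displacement divided by the inter‑band delay. The sketch below illustrates that relation and the unit conversion needed to compare the reported figures; the function names, the ground sample distance, and the delay value are illustrative assumptions, not values from the paper.

```python
def speed_from_band_offset(pixel_shift, gsd_m, band_delay_s):
    """Speed in m/s from a vehicle's apparent shift between two spectral bands.

    pixel_shift  : displacement between bands, in pixels
    gsd_m        : ground sample distance in metres per pixel (assumed known)
    band_delay_s : acquisition time difference between the bands, in seconds
    """
    if band_delay_s <= 0:
        raise ValueError("band_delay_s must be positive")
    return pixel_shift * gsd_m / band_delay_s


def ms_to_kmh(speed_ms):
    """Convert metres per second to kilometres per hour."""
    return speed_ms * 3.6


# Illustrative inputs only: 2 px shift, 3 m/px, 0.5 s between bands.
example_speed = speed_from_band_offset(2.0, 3.0, 0.5)

# Figures quoted in the abstract, expressed in common units.
gps_error_kmh = ms_to_kmh(3.4)
relative_underestimate = (131.83 - 112.85) / 131.83
```

With the abstract's numbers, the 3.4 m/s error against GPS corresponds to about 12.2 km/h, and the satellite average is roughly 14% below the drone average.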
Authors: Ala Souissi, Hajer Fradi, Panagiotis Papadakis
Abstract: In this paper, we present our proposed approach for active tracking to increase the autonomy of Unmanned Aerial Vehicles (UAVs) using event cameras, low‑energy imaging sensors that offer significant advantages in speed and dynamic range. The proposed tracking controller is designed to respond to visual feedback from the mounted event sensor, adjusting the drone movements to follow the target. To leverage the full motion capabilities of a quadrotor and the unique properties of event sensors, we propose an end‑to‑end deep‑reinforcement learning (DRL) framework that maps raw sensor data from event streams directly to control actions for the UAV. To learn an optimal policy under highly variable and challenging conditions, we opt for a simulation environment with domain randomization for effective transfer to real‑world environments. We demonstrate the effectiveness of our approach through experiments in challenging scenarios, including fast‑moving targets and changing lighting conditions, which result in improved generalization capabilities.
Authors: Xueming Liu, Dengyu Zhang, Qingrui Zhang, Tianjiang Hu
Abstract: This paper proposes an integrated framework for coordinating multiple unmanned aerial vehicles (UAVs) in a distributed manner to persistently enclose and track a moving target without relying on external localization systems. The proposed framework consists of three modules: cooperative state estimators, circular formation pattern generators, and formation tracking controllers. In the cooperative state estimation module, a recursive least squares estimator (RLSE) for estimating the relative positions between UAVs is integrated with a distributed Kalman filter (DKF), enabling a persistent estimation of the target's state. When a UAV loses direct measurements of the target due to environmental occlusion, measurements from neighbors are aligned into the UAV's local frame to provide indirect measurements. The second module focuses on planning a desired circular formation pattern using a coupled oscillator model. This pattern ensures an even distribution of UAVs around a circle that encloses the moving target. The persistent excitation property of the circular formation is crucial for achieving convergence in the first module. Finally, a consensus‑based formation controller is designed to enable multiple UAVs to asymptotically track the planned circular formation pattern while ensuring bounded control inputs. Theoretical analysis demonstrates that the proposed framework ensures asymptotic tracking of a target with constant velocity. For a target with varying velocity, the tracking error converges to a bounded region related to the target's maximum acceleration. Simulations and experiments validate the effectiveness of the proposed algorithm.
Authors: Midhun E K, Ashwini Ratnoo
Abstract: This paper focuses on developing a bearings‑only measurement‑based three‑dimensional window traversal guidance method for quadrotor Uninhabited Aerial Vehicles (UAVs). The desired flight path and heading angles of the quadrotor are proposed as functions of the bearing angle information of the four vertices of the window. These angular guidance inputs employ a bearing angle bisector term and an elliptic shaping angle term, which direct the quadrotor towards the centroid of the window. Detailed stability analysis of the resulting kinematics demonstrates that all quadrotor trajectories lead to the centroid of the window along a direction which is normal to the window plane. A qualitative comparison with existing traversal methodologies showcases the superiority of the proposed guidance approach with regard to the nature of information, computations for generating the guidance commands, and flexibility of replanning the traversal path. Realistic simulations considering a six‑degree‑of‑freedom quadrotor model and Monte Carlo studies validate the effectiveness, accuracy, and robustness of the proposed guidance solution. Representative flight validation trials are carried out using an indoor motion capture system.
Authors: Demetris Shianios, Panayiotis Kolios, Christos Kyrkou
Abstract: The integration of Unmanned Aerial Vehicles (UAVs) with artificial intelligence (AI) models for aerial imagery processing in disaster assessment necessitates models that demonstrate exceptional accuracy, computational efficiency, and real‑time processing capabilities. Traditionally, Convolutional Neural Networks (CNNs) demonstrate efficiency in local feature extraction but are limited in their capacity for global context interpretation. On the other hand, Vision Transformers (ViTs) show promise for improved global context interpretation through the use of attention mechanisms, although they remain underinvestigated in UAV‑based disaster response applications. Bridging this research gap, we introduce DiRecNetV2, an improved hybrid model that utilizes convolutional and transformer layers. It merges the inductive biases of CNNs for robust feature extraction with the global context understanding of Transformers, maintaining a low computational load ideal for UAV applications. Additionally, we introduce a new, compact multi‑label dataset of disasters to set an initial benchmark for future research, exploring how models trained on single‑label data perform on a multi‑label test set. The study assesses lightweight CNNs and ViTs on the AIDERSv2 dataset, based on the frames per second (FPS) for efficiency and the weighted F1 scores for classification performance. DiRecNetV2 not only achieves a weighted F1 score of 0.964 on a single‑label test set but also demonstrates adaptability, with a score of 0.614 on a complex multi‑label test set, while functioning at 176.13 FPS on the Nvidia Orin Jetson device.
Authors: Kristina Telegraph, Christos Kyrkou
Abstract: This work presents advancements in multi‑class vehicle detection using UAV cameras through the development of spatiotemporal object detection models. The study introduces a Spatio‑Temporal Vehicle Detection Dataset (STVD) containing 6,600 annotated sequential frame images captured by UAVs, enabling comprehensive training and evaluation of algorithms for holistic spatiotemporal perception. A YOLO‑based object detection algorithm is enhanced to incorporate temporal dynamics, resulting in improved performance over single frame models. The integration of attention mechanisms into spatiotemporal models is shown to further enhance performance. Experimental validation demonstrates significant progress, with the best spatiotemporal model exhibiting a 16.22% improvement over single frame models, while it is demonstrated that attention mechanisms hold the potential for additional performance gains.
Authors: Kejun Ren, Xin Wu, Lianming Xu, Li Wang
Abstract: Unmanned Aerial Vehicle (UAV) remote sensing, with its advantages of rapid information acquisition and low cost, has been widely applied in scenarios such as emergency response. However, due to the long imaging distance and complex imaging mechanisms, targets in remote sensing images often face challenges such as small object size, dense distribution, and low inter‑class discriminability. To address these issues, this paper proposes a multi‑modal remote sensing object detection network called RemoteDet‑Mamba, which is based on a patch‑level four‑direction selective scanning fusion strategy. This method simultaneously learns unimodal local features and fuses cross‑modal patch‑level global semantic information, thereby enhancing the distinguishability of small objects and improving inter‑class discrimination. Furthermore, the designed lightweight fusion mechanism effectively decouples densely packed targets while reducing computational complexity. Experimental results on the DroneVehicle dataset demonstrate that RemoteDet‑Mamba achieves superior detection performance compared to current mainstream methods, while maintaining low parameter count and computational overhead, showing promising potential for practical applications.
Authors: Sotiris Papatheodorou, Anthony Tzes
Abstract: The objective of this article is to develop a control strategy for the coverage of a convex region by a fleet of Mobile Aerial Agents (MAAs). Each MAA is equipped with a downward‑facing camera that senses a convex portion of the area while its flight altitude is constrained. Rather than relying on typical Voronoi‑like tessellations of the area to be covered, a scheme focusing on the assignment to each MAA of certain parts of the mosaic of the current covered area is proposed. A gradient ascent algorithm is then employed to monotonically increase the area covered by the MAA fleet. Simulation studies are offered to illustrate the effectiveness of the proposed scheme.
Authors: Fei Chen, S. Hamid Rezatofighi, Damith C. Ranasinghe
Abstract: Autonomous aerial vehicles can provide efficient and effective solutions for radio frequency (RF) source tracking and localization problems, with applications ranging from wildlife conservation to search and rescue operations. Existing lightweight, low‑cost, bearing‑measurement‑based methods with single antenna‑receiver sensor system configurations necessitate in situ rotations, leading to substantial measurement acquisition times that restrict the searchable area and the number of measurements. We propose a GyroCopter for the task. Our approach plans the trajectory of a multi‑rotor unmanned aerial vehicle (UAV) whilst utilizing UAV flight dynamics to execute a constant gyration motion to derive "pseudo‑bearing" measurements to track RF sources. The gyration‑based pseudo‑bearing approach: i) significantly reduces the limitations associated with in situ rotation bearing; while ii) capitalizing on the simplicity, affordability, and lightweight nature of signal strength measurement acquisition hardware to estimate bearings. This method distinguishes itself from other pseudo‑bearing approaches by eliminating the need for additional hardware, thereby maintaining simplicity, low weight, and cost‑effectiveness. To validate our approach, we derived the optimal rotation speed and conducted extensive simulations and field missions with our GyroCopter to track and localize multiple RF sources. The results confirm the effectiveness of our method, highlighting its potential as a practical and rapid solution for RF source localization tasks.
Authors: Jesús Alejandro Loera-Ponce, Diego A. Mercado-Ravell, Israel Becerra-Durán, Luis Manuel Valentin-Coronado
Abstract: In this paper, we address the vision‑based autonomous landing problem in complex urban environments using deep neural networks for semantic segmentation and risk assessment. We propose employing the SegFormer, a state‑of‑the‑art visual transformer network, for the semantic segmentation of complex, unstructured urban environments. This approach yields valuable information that can be utilized in smart autonomous landing missions, particularly in emergency landing scenarios resulting from system failures or human errors. The assessment is done in real time during flight, when images from an RGB camera on the Unmanned Aerial Vehicle (UAV) are segmented with the SegFormer into the most common classes found in urban environments. These classes are then mapped to a level of risk that considers, in general, potential material damage, damage to the drone itself, and danger to people. The proposed strategy is validated through several case studies, demonstrating the huge potential of semantic segmentation‑based strategies for determining the safest landing areas for autonomous emergency landing, which we believe will help unleash the full potential of UAVs in civil applications within urban areas.
Authors: Haechan Mark Bong, Ricardo de Azambuja, Giovanni Beltrame
Abstract: Real‑time aerial image segmentation plays an important role in the environmental perception of Uncrewed Aerial Vehicles (UAVs). We introduce BlabberSeg, an optimized Vision‑Language Model built on CLIPSeg for on‑board, real‑time processing of aerial images by UAVs. BlabberSeg improves the efficiency of CLIPSeg by reusing prompt and model features, reducing computational overhead while achieving real‑time open‑vocabulary aerial segmentation. We validated BlabberSeg in a safe landing scenario using the Dynamic Open‑Vocabulary Enhanced SafE‑Landing with Intelligence (DOVESEI) framework, which uses visual servoing and open‑vocabulary segmentation. BlabberSeg reduces computational costs significantly, with a speed increase of 927.41% (16.78 Hz) on an NVIDIA Jetson Orin AGX (64 GB) compared with the original CLIPSeg (1.81 Hz), achieving real‑time aerial segmentation with negligible loss in accuracy (2.1% as the ratio of the correctly segmented area with respect to CLIPSeg). BlabberSeg's source code is open and available online.
Authors: Rushikesh Nalamothu, Puneet Sontha, Janardhan Karravula, Ankit Agrawal
Abstract: In the high‑stakes domain of search‑and‑rescue missions, the deployment of Unmanned Aerial Vehicles (UAVs) has become increasingly pivotal. These missions require seamless, real‑time communication among diverse roles within response teams, particularly between Remote Operators (ROs) and On‑Site Operators (OSOs). Traditionally, ROs and OSOs have relied on radio communication to exchange critical information, such as the geolocation of victims, hazardous areas, and points of interest. However, radio communication lacks information visualization, suffers from noise, and requires mental effort to interpret information, leading to miscommunications and misunderstandings. To address these challenges, this paper presents VizCom‑AR, an Augmented Reality system designed to facilitate visual communication between ROs and OSOs and to improve their situational awareness during UAV‑driven search‑and‑rescue missions. Our experiments, focus group sessions with police officers, and field study showed that VizCom‑AR enhances the spatial awareness of both ROs and OSOs, facilitates geolocation information exchange, and effectively complements existing communication tools in UAV‑driven emergency response missions. Overall, VizCom‑AR offers a fundamental framework for designing Augmented Reality systems for large‑scale UAV‑driven rescue missions.
Authors: Wenhao Zhang, Ji He, Yuanyu Zhang
Abstract: This paper delves into time‑efficient covert multicast in a wireless communication system facilitated by an Unmanned Aerial Vehicle (UAV), in which the UAV aims to disseminate common covert information to multiple ground users (GUs) while facing the risk of detection by a ground warden (Willie). We propose one‑hop (OH) and two‑hop (TH) transmission schemes, and first develop a theoretical framework for modeling both the detection error probability at Willie and the transmission time at the UAV. The optimization problems subject to the covertness constraint for the two transmission schemes are then formulated to gain insights into how the UAV's prior transmit probability, transmit power, and horizontal location affect the minimum transmission time. The optimization problems are non‑convex and challenging to solve numerically. We thus explore the optimal setting of the transmit power and the prior transmit probability for the UAV separately under specific parameters for the two schemes. We further propose a particle swarm optimization (PSO) based algorithm and an exhaustive algorithm to provide the joint solutions for the optimization problem with the OH transmission scheme and the TH scheme, respectively. Finally, the efficiency of the proposed PSO‑based algorithm is substantiated through extensive numerical results.
Authors: Francisco M. F. R. Goncalves, Ryan M. Bena, Konstantin I. Matveev, Nestor O. Perez-Arancibia
Abstract: We present a switching scheme, which uses both the attitude‑error quaternion (AEQ) and the angular‑velocity error, for controlling the rotational degrees of freedom of an uncrewed aerial vehicle (UAV) during flight. In this approach, the proposed controller continually selects the stable closed‑loop (CL) equilibrium AEQ corresponding to the smallest cost between those computed with two energy‑based Lyapunov functions. To analyze and enforce the stability of the CL switching dynamics, we use basic nonlinear theory. This research problem is relevant because the selection of the stable CL equilibrium AEQ directly determines the power and energy requirements of the controlled UAV during flight. To test and demonstrate the implementation, suitability, functionality, and performance of the proposed approach, we present experimental results obtained using a 31‑gram quadrotor, which was controlled to execute high‑speed yaw maneuvers in flight. These flight tests show that the proposed switching controller can respectively reduce the control effort and rotational power by as much as 49.75 % and 28.14 %, on average, compared to those corresponding to an often‑used benchmark controller.
Authors: Bryan S. Guevara, Viviana Moya, Daniel C. Gandolfo, Juan M. Toibero
Abstract: This paper presents a comprehensive approach to nonlinear dynamics identification for UAVs using a combination of data‑driven techniques and theoretical modeling. Two key methodologies are explored: Proportional‑Derivative (PD) approximation and Sparse Identification of Nonlinear Dynamics (SINDy). The UAV dynamics are first modeled using the Euler‑Lagrange formulation, providing a set of generalized coordinates. However, platform constraints limit the control inputs to attitude angles, and linear and angular velocities along the z‑axis. To accommodate these limitations, thrust and torque inputs are approximated using a PD controller, serving as the foundation for nonlinear system identification. In parallel, SINDy, a data‑driven method, is employed to derive a compact and interpretable model of the UAV dynamics from experimental data. Both identified models are then integrated into a Model Predictive Control (MPC) framework for accurate trajectory tracking, where model accuracy, informed by data‑driven insights, plays a critical role in optimizing control performance. This fusion of data‑driven approaches and theoretical modeling enhances the system's robustness and adaptability in real‑world conditions, offering a detailed analysis of the UAV's dynamic behavior.
Authors: Olalekan Akindele, Joshua Atolagbe
Abstract: Existing detection methods for insulator defect identification from unmanned aerial vehicles (UAVs) struggle with complex background scenes and small objects, leading to suboptimal accuracy and a high number of false‑positive detections. Using the concept of local attention modeling, this paper proposes a new attention‑based foundation architecture, YOLO‑ELA, to address this issue. The Efficient Local Attention (ELA) blocks were added into the neck part of the one‑stage YOLOv8 architecture to shift the model's attention from background features towards features of insulators with defects. The SCYLLA Intersection‑Over‑Union (SIoU) criterion function was used to reduce detection loss, accelerate model convergence, and increase the model's sensitivity towards small insulator defects, yielding higher true positive outcomes. Due to a limited dataset, data augmentation techniques were utilized to increase the diversity of the dataset. In addition, we leveraged the transfer learning strategy to improve the model's performance. Experimental results on high‑resolution UAV images show that our method achieved a state‑of‑the‑art performance of 96.9% mAP0.5 and a real‑time detection speed of 74.63 frames per second, outperforming the baseline model. This further demonstrates the effectiveness of attention‑based convolutional neural networks (CNNs) in object detection tasks.
Authors: Muhammad Morshed Alam, Sangman Moh
Abstract: Millimeter wave (mmWave)‑enabled unmanned aerial vehicle (UAV) swarm networks (UAVSNs) can utilize a large spectrum of resources to provide low latency and high data transmission rates. Additionally, owing to the short wavelength, UAVs equipped with large antenna arrays can form secure, narrow directive beams to establish communication with less interference. However, due to high UAV mobility, limited beam coverage, beam misalignment, and high path loss, it is very challenging to adopt mmWave communication in UAVSNs. In this article, we present a comprehensive survey on neighbor discovery and beam alignment techniques for directional communication in mmWave‑enabled UAVSNs. The existing techniques are reviewed and compared with each other. We also discuss key open issues and challenges with potential research directions.
Authors: Alessandro Erba, John H. Castellanos, Sahil Sihag, Saman Zonouz, Nils Ole Tippenhauer
Abstract: Unmanned Aerial Vehicles autonomously perform tasks with the use of state‑of‑the‑art control algorithms. These control algorithms rely on the freshness and correctness of sensor readings. Incorrect control actions lead to catastrophic destabilization of the process.
In this work, we propose multi‑part Sensor Deprivation Attacks (SDAs), aiming to stealthily impact process control via sensor reconfiguration. In the first part, the attacker injects messages on local buses that connect to the sensor. The injected message reconfigures the sensors, e.g., to suspend the sensing. In the second part, those manipulation primitives are selectively used to cause adversarial sensor values at the controller, transparently to the data consumer. In the third part, the manipulated sensor values lead to unwanted control actions (e.g., a drone crash). We experimentally investigate all three parts of our proposed attack. Our findings show that i) reconfiguring sensors can have surprising effects on reported sensor values, and ii) the attacker can stall the overall Kalman Filter state estimation, leading to a complete stop of control computations. As a result, the UAV becomes destabilized, leading to a crash or significant deviation from its planned trajectory (over 30 meters). We also propose an attack synthesis methodology that optimizes the timing of these SDA manipulations, maximizing their impact. Notably, our results demonstrate that these SDAs evade detection by state‑of‑the‑art UAV anomaly detectors.
Our work shows that attacks on sensors are not limited to continuously inducing random measurements, and demonstrates that sensor reconfiguration can completely stall the drone controller. In our experiments, state‑of‑the‑art UAV controller software and countermeasures are unable to handle such manipulations. Hence, we also discuss new corresponding countermeasures.
Authors: Kangning Cui, Wei Tang, Rongkun Zhu, Manqi Wang, Gregory D. Larsen, Victor P. Pauca, Sarra Alqahtani, Fan Yang, David Segurado, Paul Fine, Jordan Karubian, Raymond H. Chan, Robert J. Plemmons, Jean-Michel Morel, Miles R. Silman
Abstract: Understanding the spatial distribution of palms within tropical forests is essential for effective ecological monitoring, conservation strategies, and the sustainable integration of natural forest products into local and global supply chains. However, the analysis of remotely sensed data in these environments faces significant challenges, such as overlapping palm and tree crowns, uneven shading across the canopy surface, and the heterogeneous nature of the forest landscapes, which often affect the performance of palm detection and segmentation algorithms. To overcome these issues, we introduce PalmDSNet, a deep learning framework for real‑time detection, segmentation, and counting of canopy palms. Additionally, we employ a bimodal reproduction algorithm that simulates palm spatial propagation to further enhance the understanding of these point patterns using PalmDSNet's results. We used UAV‑captured imagery to create orthomosaics from 21 sites across western Ecuadorian tropical forests, covering a gradient from the everwet Chocó forests near Colombia to the drier forests of southwestern Ecuador. Expert annotations were used to create a comprehensive dataset, including 7,356 bounding boxes on image patches and 7,603 palm centers across five orthomosaics, encompassing a total area of 449 hectares. By combining PalmDSNet with the bimodal reproduction algorithm, which optimizes parameters for both local and global spatial variability, we effectively simulate the spatial distribution of palms in diverse and dense tropical environments, validating its utility for advanced applications in tropical forest monitoring and remote sensing analysis.
Authors: Niloufar Mehrabi, Sayed Pedram Haeri Boroujeni, Jenna Hofseth, Abolfazl Razi, Long Cheng, Manveen Kaur, James Martin, Rahul Amin
Abstract: Unmanned Aerial Vehicles (UAVs) play an increasingly critical role in Intelligence, Surveillance, and Reconnaissance (ISR) missions such as border patrolling and criminal detection, thanks to their ability to access remote areas and transmit real‑time imagery to processing servers. However, UAVs are highly constrained by payload size, power limits, and communication bandwidth, necessitating the development of highly selective and efficient data transmission strategies. This has driven the development of various compression and optimal transmission technologies for UAVs. Nevertheless, most methods strive to preserve maximal information in transferred video frames, missing the fact that only certain parts of images/video frames might offer meaningful contributions to the ultimate mission objectives in ISR scenarios involving moving object detection and tracking (OD/OT). This paper adopts a different perspective and offers an alternative AI‑driven scheduling policy that prioritizes selecting regions of the image that significantly contribute to the mission objective. The key idea is tiling the image into small patches and developing a deep reinforcement learning (DRL) framework that assigns higher transmission probabilities to patches that present higher overlaps with the detected object of interest, while penalizing sharp transitions over consecutive frames to promote smooth scheduling shifts. Although we used YOLOv8 object detection and the UDP transmission protocol as a benchmark testing scenario, the idea is general and applicable to different transmission protocols and OD/OT methods. To further boost the system's performance and avoid OD errors for cluttered image patches, we integrate it with interframe interpolations.
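The tiling step described in the abstract above can be illustrated with a minimal sketch. It only computes, for each square patch, the fraction of the patch covered by one detected bounding box, which is the kind of overlap signal a scheduling policy could prioritize. The function name `patch_overlaps`, the `(x0, y0, x1, y1)` box format, and the square non-overlapping tiling are assumptions made for illustration; the DRL policy and the smoothness penalty from the paper are not modeled here.

```python
def patch_overlaps(img_w, img_h, patch, box):
    """Return {(row, col): fraction of that patch covered by the box}.

    Hypothetical helper: the image is tiled into non-overlapping square
    patches of side `patch` pixels; `box` is (x0, y0, x1, y1) in pixels.
    """
    bx0, by0, bx1, by1 = box
    scores = {}
    for row in range(img_h // patch):
        for col in range(img_w // patch):
            # Pixel extent of this patch.
            px0, py0 = col * patch, row * patch
            px1, py1 = px0 + patch, py0 + patch
            # Width and height of the intersection rectangle (0 if disjoint).
            w = max(0, min(px1, bx1) - max(px0, bx0))
            h = max(0, min(py1, by1) - max(py0, by0))
            scores[(row, col)] = (w * h) / (patch * patch)
    return scores


# Toy 4x4 image, 2x2 patches, one detection covering the top-left strip.
scores = patch_overlaps(img_w=4, img_h=4, patch=2, box=(0, 0, 2, 1))
ranked = sorted(scores, key=scores.get, reverse=True)  # highest overlap first
```

Patches at the front of `ranked` would be the natural candidates for higher transmission probability under this simplified view.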
Authors: Lihao Qiu, Ming Zhu, JeeWoong Park, Yingtao Jiang, Hualiang Teng
Abstract: The safety of train operations is largely dependent on the health of rail tracks, necessitating regular and meticulous inspection and maintenance. A significant part of such inspections involves geometric measurements of the tracks to detect any potential problems. Traditional methods for track geometry measurements, while proven to be accurate, require track closures during inspections, and consume a considerable amount of time as the inspection area grows, causing significant disruptions to regular operations. To address this challenge, this paper proposes a track geometry measurement system (TGMS) that utilizes an unmanned aerial vehicle (UAV) platform equipped with a light detection and ranging (LiDAR) sensor. Integrated with a state‑of‑the‑art machine‑learning‑based computer vision algorithm, and a simultaneous localization and mapping (SLAM) algorithm, this platform can conduct rail geometry inspections seamlessly over a larger area without interrupting rail operations. In particular, this semi‑ or fully automated measurement is found capable of measuring critical rail geometry irregularities in gauge, curvature, and profile with sub‑inch accuracy. Cross‑level and warp are not measured due to the absence of gravity data. By eliminating operational interruptions, our system offers a more streamlined, cost‑effective, and safer solution for inspecting and maintaining rail infrastructure.
Authors: Duy-Nam Bui, Thu Hang Khuat, Manh Duong Phung, Thuan-Hoang Tran, Dong LT Tran
Abstract: Motion planning is an essential process for the navigation of unmanned aerial vehicles (UAVs), which need to adapt to obstacles and different structures of their operating environment to reach the goal. This paper presents an optimal motion planner for UAVs operating in unknown complex environments. The motion planner receives point cloud data from a local range sensor and then converts it into a voxel grid representing the surrounding environment. A local trajectory guiding the UAV to the goal is then generated based on the voxel grid. This trajectory is further optimized using model predictive control (MPC) to enhance the safety, speed, and smoothness of UAV operation. The optimization is carried out via the definition of several cost functions and constraints, taking into account the UAV's dynamics and requirements. A number of simulations and comparisons with a state‑of‑the‑art method have been conducted in a complex environment with many obstacles to evaluate the performance of our method. The results show that our method provides not only shorter and smoother trajectories but also faster and more stable speed profiles. It is also energy efficient, making it suitable for various UAV applications.
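As a rough illustration of the point-cloud-to-voxel-grid step mentioned in the abstract above, the sketch below maps each 3D point to an integer voxel index by flooring its coordinates divided by the voxel size. The function name `voxelize`, the sparse set representation, and the 0.5 m voxel size are assumptions for illustration, not details taken from the paper.

```python
import math


def voxelize(points, voxel_size):
    """Map 3D points (x, y, z) in metres to a set of occupied voxel indices.

    Hypothetical helper: each voxel is an axis-aligned cube of side
    `voxel_size`; a voxel counts as occupied if any point falls inside it.
    """
    occupied = set()
    for x, y, z in points:
        occupied.add((
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        ))
    return occupied


# Two nearby points share one voxel; the third lands in a different voxel.
points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.0, -0.1)]
grid = voxelize(points, voxel_size=0.5)
```

A sparse set keeps memory proportional to the number of occupied cells, which is one common choice for a local map; a dense array would be an equally valid alternative.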
Authors: Joshua Moore, Aly Sabri Abdalla, Charles Ueltschey, Anıl Gürses, Özgür Özdemir, Mihail L. Sichitiu, İsmail Güvenç, Vuk Marojevic
Abstract: The rapid evolution of 5G and beyond has advanced space‑air‑terrestrial networks, with unmanned aerial vehicles (UAVs) offering enhanced coverage, flexible configurations, and cost efficiency. However, deploying UAV‑based systems presents challenges including varying propagation conditions and hardware limitations. While simulators and theoretical models have been developed, real‑world experimentation is critically important to validate the research. Digital twins, virtual replicas of physical systems, enable emulation that bridges theory and practice. This paper presents our experimental results from AERPAW's digital twin, showcasing its ability to simulate UAV communication scenarios and providing insights into system performance and reliability.
Authors: Seungwook Lee, Maulana Bisyir Azhari, Gyuree Kang, Ozan Günes, Donghun Han, David Hyunchul Shim
Abstract: We present an integrated UAV‑hexapod robotic system designed for GNSS‑denied maritime operations, capable of autonomous deployment and retrieval of a hexapod robot via a winch mechanism installed on a UAV. This system is intended to address the challenges of localization, control, and mobility in dynamic maritime environments. Our solution leverages sensor fusion techniques, combining optical flow, LiDAR, and depth data for precise localization. Experimental results demonstrate the effectiveness of this system in real‑world scenarios, validating its performance during field tests in both controlled and operational conditions in the MBZIRC 2023 Maritime Challenge.
Authors: Yan Li, Deke Guo, Lailong Luo, Minghua Xia
Abstract: Air‑to‑ground (A2G) networks, using unmanned aerial vehicles (UAVs) as base stations to serve terrestrial user equipments (UEs), are promising for extending the spatial coverage capability in future communication systems. Coordinated transmission among multiple UAVs significantly improves network coverage and throughput compared to a single UAV transmission. However, implementing coordinated multi‑point (CoMP) transmission for UAV mobility requires complex cooperation procedures, regardless of the handoff mechanism involved. This paper designs a novel CoMP transmission strategy that enables terrestrial UEs to achieve reliable and seamless connections with mobile UAVs. Specifically, a computationally efficient CoMP transmission method based on the theory of Poisson‑Delaunay triangulation is developed, where an efficient subdivision search strategy for a CoMP UAV set is designed to minimize search overhead by a divide‑and‑conquer approach. For concrete performance evaluation, the cooperative handoff probability of the typical UE is analyzed, and the coverage probability with handoffs is derived. Simulation results demonstrate that the proposed scheme outperforms the conventional Voronoi scheme with the nearest serving UAV regarding coverage probabilities with handoffs. Moreover, each UE has a fixed and unique serving UAV set to avoid real‑time dynamic UAV searching and achieve effective load balancing, significantly reducing system resource costs and enhancing network coverage performance.
Authors: Meriem Ouadah, Fatiha Merazka
Abstract: Recent technological advancements have seen the integration of unmanned aerial vehicles (UAVs) into various sectors, from civilian missions to military operations. In this context, ensuring security, specifically authentication, is essential to prevent data theft and manipulation. A Man‑in‑the‑Middle attack not only compromises network integrity but also threatens the original data, potentially leading to theft or alteration. In this work, we propose an authentication method to secure UAV data exchange over an insecure communication channel. Our solution combines Diffie‑Hellman (DH) key exchange and Hash‑based Message Authentication Code (HMAC) within ROS communication channels to authenticate exchanged UAV data. We evaluated our method by measuring transmission time and simulating key tampering, finding acceptable performance for DH key sizes below 4096 bits but longer times for larger sizes due to increased complexity. Both drones successfully detected tampered keys, affirming our method's efficacy in protecting UAV communication. However, scalability challenges in resource‑constrained environments warrant further research.
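The combination of DH key exchange and HMAC described in the abstract above can be sketched with the Python standard library alone. This is a toy illustration under stated assumptions, not the authors' implementation: the modulus `P` (the Mersenne prime 2^127 - 1) and generator `G` are far too small for real use, the ROS transport is omitted entirely, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy parameters for illustration only (assumption): 2**127 - 1 is a known
# Mersenne prime, but real deployments need much larger, vetted groups.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Return (private, public) for one party."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return private, pow(G, private, P)


def dh_shared_key(private, peer_public):
    """Derive a 32-byte symmetric key from the DH shared secret."""
    shared = pow(peer_public, private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def sign(key, payload):
    """Compute an HMAC-SHA256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key, payload, tag):
    """Check the tag in constant time."""
    return hmac.compare_digest(sign(key, payload), tag)


# Two drones exchange public values and derive the same key independently.
a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()
key_a = dh_shared_key(a_priv, b_pub)
key_b = dh_shared_key(b_priv, a_pub)

message = b"lat=35.0;lon=3.0;alt=120"
tag = sign(key_a, message)
```

Note that unauthenticated DH by itself does not stop an active Man‑in‑the‑Middle; the sketch only shows how a shared key, once established, lets the receiver reject altered payloads.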
Authors: Viviane Potocnik, Alfio Di Mauro, Lorenzo Lamberti, Victor Kartsch, Moritz Scherer, Francesco Conti, Luca Benini
Abstract: Embodied artificial intelligence (AI) requires pushing complex multi‑modal models to the extreme edge for time‑constrained tasks such as autonomous navigation of robots and vehicles. On small form‑factor devices, e.g., nano‑sized unmanned aerial vehicles (UAVs), such challenges are exacerbated by stringent constraints on energy efficiency and weight. In this paper, we explore embodied multi‑modal AI‑based perception for Nano‑UAVs with the Kraken shield, a 7g multi‑sensor (frame‑based and event‑based imagers) board based on Kraken, a 22 nm SoC featuring multiple acceleration engines for multi‑modal event and frame‑based inference based on spiking (SNN) and ternary (TNN) neural networks, respectively. Kraken can execute SNN real‑time inference for depth estimation at 1.02k inf/s, 18 μJ/inf, TNN real‑time inference for object classification at 10k inf/s, 6 μJ/inf, and real‑time inference for obstacle avoidance at 221 frame/s, 750 μJ/inf.
Authors: Mirko Baglioni, Apurva Patil, Luis Sentis, Anahita Jamshidnejad
Abstract: Wildfire suppression is a complex task that poses high risks to humans. Using robotic teams for wildfire suppression enhances the safety and efficiency of detecting, monitoring, and extinguishing fires. We propose a control architecture based on task hierarchical control for the autonomous steering of a system of flying robots in wildfire suppression. We incorporate a novel line‑of‑sight obstacle avoidance method that calculates the best viewpoints and ensures an occlusion‑free view for the suppression robot during the mission. Path integral control generates optimal trajectories towards the goals. To validate the approach, we conduct an ablation study in simulations using Matlab and Unity, comparing it to scenarios where these key components are excluded. The results demonstrate significant performance improvements, with a 44.0 % increase in effectiveness with the new line‑of‑sight obstacle avoidance task and up to a 39.6 % improvement when using path integral control.
Authors: Yunpeng Gao, Zhigang Wang, Pengfei Han, Linglin Jing, Dong Wang, Bin Zhao
Abstract: Aerial Vision‑and‑Language Navigation (VLN) is a novel task enabling Unmanned Aerial Vehicles (UAVs) to navigate in outdoor environments through natural language instructions and visual cues. However, it remains challenging due to the complex spatial relationships in aerial scenes. In this paper, we propose a training‑free, zero‑shot framework for aerial VLN tasks, where the large language model (LLM) is leveraged as the agent for action prediction. Specifically, we develop a novel Semantic‑Topo‑Metric Representation (STMR) to enhance the spatial reasoning capabilities of LLMs. This is achieved by extracting and projecting instruction‑related semantic masks onto a top‑down map, which presents spatial and topological information about surrounding landmarks and grows during the navigation process. At each step, a local map centered at the UAV is extracted from the growing top‑down map, and transformed into a matrix representation with distance metrics, serving as the text prompt to the LLM for action prediction in response to the given instruction. Experiments conducted in real and simulation environments have proved the effectiveness and robustness of our method, achieving absolute success rate improvements of 26.8% and 5.8% over current state‑of‑the‑art methods on simple and complex navigation tasks, respectively. The dataset and code will be released soon.
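The step of extracting a UAV‑centred local map and turning it into a matrix‑style text prompt can be pictured with a small sketch. This is not the STMR implementation: the grid of string labels, the `local_map_text` helper, the padding symbol, and the header wording are all assumptions chosen to show the general idea of cropping a window and serialising it as text.

```python
def local_map_text(grid, center, radius, cell_m, pad="."):
    """Crop a (2*radius+1)-square window around `center` and render it as text.

    Hypothetical helper: `grid` is a list of rows of short semantic labels,
    `center` is the UAV's (row, col), and `cell_m` is the metric size of one
    cell. Cells outside the known map are rendered as `pad`.
    """
    r0, c0 = center
    rows = []
    for r in range(r0 - radius, r0 + radius + 1):
        cells = []
        for c in range(c0 - radius, c0 + radius + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            cells.append(grid[r][c] if inside else pad)
        rows.append(" ".join(cells))
    header = f"Local map, {cell_m} m per cell, UAV at centre:"
    return "\n".join([header] + rows)


# Toy top-down map: B = building, R = road, T = tree (labels are made up).
top_down = [
    ["B", "R", "T"],
    ["R", "R", "T"],
    ["T", "R", "B"],
]
prompt = local_map_text(top_down, center=(0, 0), radius=1, cell_m=5)
```

The resulting string could be appended to the navigation instruction before querying a language model; how the paper encodes distances and topology in its matrix is not specified in the abstract and is not reproduced here.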
Authors: Ang He, Ximei Wu, Xing Xu, Jing Chen, Xiaobin Guo, Sheng Xu
Abstract: Precise segmentation of Unmanned Aerial Vehicle (UAV)‑captured images plays a vital role in tasks such as crop yield estimation and plant health assessment in banana plantations. By identifying and classifying planted areas, crop area can be calculated, which is indispensable for accurate yield predictions. However, segmenting banana plantation scenes requires a substantial amount of annotated data, and manual labeling of these images is both time‑consuming and labor‑intensive, limiting the development of large‑scale datasets. Furthermore, challenges such as changing target sizes, complex ground backgrounds, limited computational resources, and correct identification of crop categories make segmentation even more difficult. To address these issues, we proposed a comprehensive solution. Firstly, we designed an iterative optimization annotation pipeline leveraging SAM2's zero‑shot capabilities to generate high‑quality segmentation annotations, thereby reducing the cost and time associated with data annotation significantly. Secondly, we developed ALSS‑YOLO‑Seg, an efficient lightweight segmentation model optimized for UAV imagery. The model's backbone includes an Adaptive Lightweight Channel Splitting and Shuffling (ALSS) module to improve information exchange between channels and optimize feature extraction, aiding accurate crop identification. Additionally, a Multi‑Scale Channel Attention (MSCA) module combines multi‑scale feature extraction with channel attention to tackle challenges of varying target sizes and complex ground backgrounds.
Authors: Liang Liu, Xiao Hu, Wei Jiang, Guanglei Meng, Zhujun Wang, Taining Zhang
Abstract: Recent advancements in UAV technology have spurred interest in developing multi‑UAV aerial surveying systems for use in confined environments where GNSS signals are blocked or jammed. This paper focuses on airborne magnetic surveying scenarios. To obtain clean magnetic measurements reflecting the Earth's magnetic field, the magnetic sensor must be isolated from other electronic devices, creating a significant localization challenge. We propose a visual cooperative localization solution. The solution incorporates a visual processing module and an improved manifold‑based sensor fusion algorithm, delivering reliable and accurate positioning information. Real flight experiments validate the approach, demonstrating single‑axis centimeter‑level accuracy and decimeter‑level overall 3D positioning accuracy.
Authors: Xiangyu Wang, Donglin Yang, Ziqin Wang, Hohin Kwan, Jinyu Chen, Wenjun Wu, Hongsheng Li, Yue Liao, Si Liu
Abstract: Developing agents capable of navigating to a target location based on language instructions and visual information, known as vision‑language navigation (VLN), has attracted widespread interest. Most research has focused on ground‑based agents, while UAV‑based VLN remains relatively underexplored. Recent efforts in UAV vision‑language navigation predominantly adopt ground‑based VLN settings, relying on predefined discrete action spaces and neglecting the inherent disparities in agent movement dynamics and the complexity of navigation tasks between ground and aerial environments. To address these disparities and challenges, we propose solutions from three perspectives: platform, benchmark, and methodology. To enable realistic UAV trajectory simulation in VLN tasks, we propose the OpenUAV platform, which features diverse environments, realistic flight control, and extensive algorithmic support. We further construct a target‑oriented VLN dataset consisting of approximately 12k trajectories on this platform, serving as the first dataset specifically designed for realistic UAV VLN tasks. To tackle the challenges posed by complex aerial environments, we propose an assistant‑guided UAV object search benchmark called UAV‑Need‑Help, which provides varying levels of guidance information to help UAVs better accomplish realistic VLN tasks. We also propose a UAV navigation LLM that, given multi‑view images, task descriptions, and assistant instructions, leverages the multimodal understanding capabilities of the MLLM to jointly process visual and textual information, and performs hierarchical trajectory generation. The evaluation results of our method significantly outperform the baseline models, while there remains a considerable gap between our results and those achieved by human operators, underscoring the challenge presented by the UAV‑Need‑Help task.
Authors: Bhola, Yu-Jia Chen, Ashutosh Balakrishnan, Swades De, Li-Chun Wang
Abstract: In the post‑fifth generation (5G) era, escalating user quality of service (QoS) demands strain terrestrial network capacity, especially in urban areas with dynamic traffic distributions. This paper introduces a novel cooperative unmanned aerial vehicle relay‑based deployment (CUD) framework in satellite air‑ground integrated networks (SAGIN). The CUD strategy deploys an unmanned aerial vehicle‑based relay (UAVr) in an amplify‑and‑forward (AF) mode to enhance user QoS when terrestrial base stations fall short of network capacity. By combining low earth orbit (LEO) satellite and UAVr signals using cooperative diversity, the CUD framework enhances the signal‑to‑noise ratio (SNR) at the user. Comparative evaluations against existing frameworks reveal performance improvements, demonstrating the effectiveness of the CUD framework in addressing the evolving demands of next‑generation networks.
Authors: Qihan Qi, Xinsong Yang, Gang Xia, Daniel W. C. Ho, Pengyang Tang
Abstract: This paper proposes a safety modulator actor‑critic (SMAC) method to satisfy safety constraints and mitigate overestimation in model‑free safe reinforcement learning (RL). A safety modulator is developed to satisfy safety constraints by modulating actions, allowing the policy to ignore safety constraints and focus on maximizing reward. Additionally, a distributional critic with a theoretical update rule for SMAC is proposed to mitigate the overestimation of Q‑values with safety constraints. Both simulation and real‑world experiments on unmanned aerial vehicle (UAV) hovering confirm that SMAC can effectively maintain safety constraints and outperform mainstream baseline algorithms.
Authors: Michal Werner, Tomáš Báča, Petr Štibinger, Daniela Doubravová, Jaroslav Šolc, Jan Rusňák, Martin Saska
Abstract: A novel method for autonomous localization of multiple sources of gamma radiation using a group of Micro Aerial Vehicles (MAVs) is presented in this paper. The method utilizes an extremely lightweight (44 g) Compton camera MiniPIX TPX3. The compact size of the detector allows for deployment onboard safe and agile small‑scale Unmanned Aerial Vehicles (UAVs). The proposed radiation mapping approach fuses measurements from multiple distributed Compton camera sensors to accurately estimate the positions of multiple radioactive sources in real time. Unlike commonly used intensity‑based detectors, the Compton camera reconstructs the set of possible directions towards a radiation source from just a single ionizing particle. Therefore, the proposed approach can localize radiation sources without having to estimate the gradient of a radiation field or contour lines, which require longer measurements. The instant estimation is able to fully exploit the potential of highly mobile MAVs. The radiation mapping method is combined with an active search strategy, which coordinates the future actions of the MAVs in order to improve the quality of the estimate of the sources' positions, as well as to explore the area of interest faster. The proposed solution is evaluated in simulation and real world experiments with multiple Cesium‑137 radiation sources.
Authors: Muhammad Morshed Alam, Muhammad Yeasir Aarafat, Tamim Hossain
Abstract: Autonomous unmanned aerial vehicle (UAV) swarm networks (UAVSNs) can effectively provide surveillance, connectivity, and computing services to ground users (GUs). These missions require trajectory planning, UAV‑GU association, task offloading, next‑hop selection, and allocation of resources such as transmit power, bandwidth, caching, and computing to improve network performance. Owing to the highly dynamic topology, limited resources, and non‑availability of global knowledge, optimizing network performance in UAVSNs is very intricate. Hence, it requires an adaptive joint optimization framework that can tackle both discrete and continuous decision variables to ensure optimal network performance under dynamic constraints. A multi‑agent deep reinforcement learning‑based adaptive actor‑critic framework can efficiently address these problems. This paper investigates the recent evolution of actor‑critic frameworks for joint optimization problems in UAVSNs. In addition, challenges and potential solutions are addressed as research directions.
Authors: Thomas Jantos, Martin Scheiber, Christian Brommer, Eren Allak, Stephan Weiss, Jan Steinbrener
Abstract: Object‑relative mobile robot navigation is essential for a variety of tasks, e.g. autonomous critical infrastructure inspection, but requires the capability to extract semantic information about the objects of interest from raw sensory data. While deep learning‑based (DL) methods excel at inferring semantic object information from images, such as class and relative 6 degree of freedom (6‑DoF) pose, they are computationally demanding and thus often not suitable for payload constrained mobile robots. In this letter we present a real‑time capable unmanned aerial vehicle (UAV) system for object‑relative, closed‑loop navigation with a minimal sensor configuration consisting of an inertial measurement unit (IMU) and RGB camera. Utilizing a DL‑based object pose estimator, solely trained on synthetic data and optimized for companion board deployment, the object‑relative pose measurements are fused with the IMU data to perform object‑relative localization. We conduct multiple real‑world experiments to validate the performance of our system for the challenging use case of power pole inspection. An example closed‑loop flight is presented in the supplementary video.
Authors: Pei-Fa Sun, Yujae Song, Kang-Yu Gao, Yu-Kai Wang, Changjun Zhou, Sang-Woon Jeon, Jun Zhang
Abstract: UAVs are increasingly becoming vital tools in various wireless communication applications including internet of things (IoT) and sensor networks, thanks to their rapid and agile non‑terrestrial mobility. Despite recent research, planning three‑dimensional (3D) UAV trajectories over a continuous temporal‑spatial domain remains challenging due to the need to solve computationally intensive optimization problems. In this paper, we study UAV‑assisted IoT data collection aimed at minimizing total energy consumption while accounting for the UAV's physical capabilities, the heterogeneous data demands of IoT nodes, and 3D terrain. We propose a matrix‑based differential evolution with constraint handling (MDE‑CH), a computation‑efficient evolutionary algorithm designed to address non‑convex constrained optimization problems with several different types of constraints. Numerical evaluations demonstrate that the proposed MDE‑CH algorithm provides a continuous 3D temporal‑spatial UAV trajectory capable of efficiently minimizing energy consumption under various practical constraints and outperforms the conventional fly‑hover‑fly model for both two‑dimensional (2D) and 3D trajectory planning.
Authors: Ashish Kumar, Laxmidhar Behera
Abstract: In this paper, we present a comprehensive UAV system design to perform the highly complex task of off‑centered aerial grasping. This task has several interdisciplinary research challenges which need to be addressed at once. The main design challenges are GPS‑denied functionality, solely onboard computing, and avoiding off‑the‑shelf costly positioning systems. In terms of algorithms, visual perception, localization, control, and grasping are the leading research problems. Hence in this paper, we make interdisciplinary contributions: (i) a detailed description of the fundamental challenges in indoor aerial grasping, (ii) a novel lightweight gripper design, (iii) a complete aerial platform design and in‑lab fabrication, and (iv) localization, perception, control, grasping systems, and an end‑to‑end flight autonomy state‑machine. Finally, we demonstrate the resulting aerial grasping system Drone‑Bee achieving a high grasping rate for a highly challenging agricultural task of apple‑like fruit harvesting, indoors in a vertical farming setting (Fig. 1). To our knowledge, such a system has not been previously discussed in the literature, and with its capabilities, this system pushes aerial manipulation towards the 4th generation.
Authors: Ashish Kumar, Laxmidhar Behera
Abstract: In this work, we propose an end‑to‑end Thrust Microstepping and Decoupled Control (TMDC) of quadrotors. TMDC focuses on precise, dynamic off‑centered aerial grasping of payloads that are attached rigidly to the UAV body via a gripper, in contrast to a swinging payload. Dynamic payload grasping quickly changes the UAV's mass, inertia, etc., causing instability while performing a grasping operation in the air. We identify that the role of the thrust controller is crucial for handling unknown payload grasping. Hence, we focus on thrust control without involving system parameters such as mass. TMDC is based on our novel Thrust Microstepping via Acceleration Feedback (TMAF) thrust controller and Decoupled Motion Control (DMC). TMAF precisely estimates the desired thrust even at smaller loop rates while DMC decouples the horizontal and vertical motion to counteract disturbances in the case of dynamic payloads. We prove the controller's efficacy via exhaustive experiments in practically interesting and adverse real‑world cases, such as fully onboard state estimation without any positioning sensor, narrow and indoor flying workspaces with intense wind turbulence, heavy payloads, non‑uniform loop rates, etc. Our TMDC outperforms the recent direct acceleration feedback thrust controller (DA) and geometric tracking control (GT) in flying stably for aerial grasping and achieves an RMSE below 0.04 m, in contrast to 0.15 m for DA and 0.16 m for GT.
Authors: Filip Novák, Tomáš Báča, Ondřej Procházka, Martin Saska
Abstract: A novel approach for robust state estimation of marine vessels in rough water is proposed in this paper to enable tight collaboration between Unmanned Aerial Vehicles (UAVs) and a marine vessel, such as cooperative landing or object manipulation, regardless of weather conditions. Our study of marine vessel (in our case Unmanned Surface Vehicle (USV)) dynamics influenced by strong wave motion has resulted in a novel nonlinear mathematical USV model with 6 degrees of freedom (DOFs), which is required for precise USV state estimation and motion prediction. The proposed state estimation and prediction approach fuses data from multiple sensors onboard the UAV and the USV to enable redundancy and robustness under varying weather conditions of real‑world applications. The proposed approach provides estimated states of the USV with 6 DOFs and predicts its future states to enable tight control of both vehicles on a receding control horizon. The proposed approach was extensively tested in the realistic Gazebo simulator and successfully experimentally validated in many real‑world experiments representing different application scenarios, including agile landing on an oscillating and moving USV. A comparative study indicates that the proposed approach significantly surpassed the current state‑of‑the‑art.
Authors: Haoyun Li, Ming Xiao, Kezhi Wang, Dong In Kim, Merouane Debbah
Abstract: This letter investigates an unmanned aerial vehicle (UAV) network with integrated sensing and communication (ISAC) systems, where multiple UAVs simultaneously sense the locations of ground users and provide communication services with radars. To find the trade‑off between communication and sensing (C&S) in the system, we formulate a multi‑objective optimization problem (MOP) to maximize the total network utility and the localization Cramér‑Rao bounds (CRB) of ground users, which jointly optimizes the deployment and power control of UAVs. Inspired by the huge potential of large language models (LLMs) for prediction and inference, we propose an LLM‑enabled decomposition‑based multi‑objective evolutionary algorithm (LEDMA) for solving the highly non‑convex MOP. First, we adopt a decomposition‑based scheme to decompose the MOP into a series of optimization sub‑problems. Second, we integrate LLMs as black‑box search operators, with prompt engineering designed specifically for the MOP, into the framework of MOEA to solve the optimization sub‑problems simultaneously. Numerical results demonstrate that the proposed LEDMA can find a clear trade‑off between C&S and outperforms baseline MOEAs in terms of obtained Pareto fronts and convergence.
Authors: Sicong Peng, Bin Li, Lei Liu, Zesong Fei, Dusit Niyato
Abstract: In this paper, we propose a multi‑unmanned aerial vehicle (UAV)‑assisted integrated sensing, communication, and computation network. Specifically, the treble‑functional UAVs are capable of offering communication and edge computing services to mobile users (MUs) in proximity, alongside their target sensing capabilities by using multi‑input multi‑output arrays. To enhance computation efficiency, we consider task compression, where each MU can partially compress its offloaded data prior to transmission to trim its size. The objective is to minimize the weighted energy consumption by jointly optimizing the transmit beamforming, the UAVs' trajectories, the compression and offloading partition, and the computation resource allocation, while fulfilling the causal‑effect correlation between communication and computation as well as adhering to the constraints on sensing quality. To tackle it, we first reformulate the original problem as a multi‑agent Markov decision process (MDP), which involves heterogeneous agents to decompose the large state spaces and action spaces of the MDP. Then, we propose a multi‑agent proximal policy optimization algorithm with an attention mechanism to handle the decision‑making problem. Simulation results validate the effectiveness of the proposed method in reducing energy consumption. Moreover, it demonstrates superior performance compared to the baselines in terms of resource utilization and convergence speed.
Authors: Unmesh Patil, Akshith Gunasekaran, Rakesh Bobba, Houssam Abbas
Abstract: We present a new simulator of Uncrewed Aerial Vehicles (UAVs) that is tailored to the needs of testing cyber‑physical security attacks and defenses. Recent investigations into UAV safety have unveiled various attack surfaces and some defense mechanisms. However, due to escalating regulations imposed by aviation authorities on security research on real UAVs, and the substantial costs associated with hardware test‑bed configurations, there arises a necessity for a simulator capable of substituting for hardware experiments, and/or narrowing down their scope to the strictly necessary. The study of different attack mechanisms requires specific features in a simulator. We propose a simulation framework based on ROS2, leveraging some of its key advantages, including modularity, replicability, customization, and the utilization of open‑source tools such as Gazebo. Our framework has a built‑in motion planner, controller, communication models and attack models. We share examples of research use cases that our framework can enable, demonstrating its utility.
Authors: Yen-Cheng Chu, Kai-Cheng Fang, Feng-Li Lian
Abstract: Traditional vertical take‑off and landing (VTOL) aircraft cannot achieve optimal efficiency for various payload weights and have limited mobility due to their under‑actuation. With the thrust‑vectoring mechanism, the proposed modular team UAV is fully actuated at certain attitudes. However, the attainable force space (AFS) differs according to the team configuration, which makes the controller design difficult. We propose an approximation to the AFS and a full‑pose tracking controller with an attitude planner and a force projection, which guarantees the control force is feasible. The proposed approach can be applied to UAVs having multiple thrust‑vectoring effectors with homogeneous agents. The simulation and experiment demonstrate a tilting motion during hovering for a 4‑agent team.
Authors: Shouthiri Partheepan, Farzad Sanati, Jahan Hassan
Abstract: Bushfire is one of the major natural disasters that cause huge losses to livelihoods and the environment. Understanding and analyzing the severity of bushfires is crucial for effective management and mitigation strategies, helping to prevent the extensive damage and loss caused by these natural disasters. This study presents an in‑depth analysis of bushfire severity in Australia over the last twelve years, combining remote sensing data and machine learning techniques to predict future fire trends. By utilizing Landsat imagery and integrating spectral indices like NDVI, NBR, and Burn Index, along with topographical and climatic factors, we developed a robust predictive model using XGBoost. The model achieved high accuracy, 86.13%, demonstrating its effectiveness in predicting fire severity across diverse Australian ecosystems. By analyzing historical trends and integrating factors such as population density and vegetation cover, we identify areas at high risk of future severe bushfires. Additionally, this research identifies key regions at risk, providing data‑driven recommendations for targeted firefighting efforts. The findings contribute valuable insights into fire management strategies, enhancing resilience to future fire events in Australia. Also, we propose future work on developing a UAV‑based swarm coordination model to enhance fire prediction in real‑time and firefighting capabilities in the most vulnerable regions.
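As a brief illustration of the spectral indices named in the abstract above, the following sketch computes NDVI and NBR from per‑pixel reflectances using their standard normalized‑difference definitions, plus the commonly used pre/post‑fire difference (dNBR). The function names and sample values are hypothetical, and nothing here is taken from the paper's own pipeline, whose band selection and preprocessing are not described in the abstract.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b); 0.0 when the denominator is zero."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def nbr(nir: float, swir2: float) -> float:
    """Normalized Burn Ratio: (NIR - SWIR2) / (NIR + SWIR2)."""
    return normalized_difference(nir, swir2)


def dnbr(nbr_pre_fire: float, nbr_post_fire: float) -> float:
    """Differenced NBR; larger values generally indicate more severe burns."""
    return nbr_pre_fire - nbr_post_fire


if __name__ == "__main__":
    # Hypothetical reflectances for one pixel before and after a fire.
    pre = nbr(nir=0.45, swir2=0.15)
    post = nbr(nir=0.20, swir2=0.30)
    print(round(ndvi(0.45, 0.08), 3), round(dnbr(pre, post), 3))
```

Indices like these would typically be computed per pixel and then used as input features, alongside topographic and climatic variables, for a tabular model such as the gradient‑boosted trees mentioned in the abstract.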
Authors: Tuan-Cuong Vuong, Cong Chi Nguyen, Van-Cuong Pham, Thi-Thanh-Huyen Le, Xuan-Nam Tran, Thien Van Luong
Abstract: This paper proposes a novel intrusion detection method for unmanned aerial vehicles (UAVs) using a recent real UAV intrusion dataset. In particular, in the first stage of our method, we design an autoencoder architecture for effectively extracting important features, which are then fed into various machine learning models in the second stage for detecting and classifying attack types. To the best of our knowledge, this is the first attempt to propose such an autoencoder‑based machine learning intrusion detection method for UAVs using a real dataset, while most existing works only consider either simulated datasets or datasets irrelevant to UAV communications. Our experimental results show that the proposed method outperforms baselines such as feature selection schemes in both binary and multi‑class classification tasks.
Authors: Mingliang Wei, Ruoguang Li, Li Wang, Lianming Xu, Zhu Han
Abstract: In this letter, we propose an airborne maneuverable bi‑static integrated sensing and communication (ISAC) system where both the transmitter and receiver are unmanned aerial vehicles (UAVs). By timely forming a dynamic bi‑static range based on the motion information of the target, such a system can provide adaptive two‑dimensional tracking and communication services. To this end, a trajectory optimization problem for both the transmit and receive UAVs is formulated to achieve highly accurate motion state estimation by minimizing the time‑variant Cramér‑Rao bound, subject to a sufficient communication signal‑to‑noise ratio to maintain the communication channel prediction error. Then we develop an efficient approach based on the successive convex approximation technique and the S‑procedure to address the problem. Numerical results demonstrate that our proposed airborne maneuverable bi‑static ISAC system is able to obtain higher tracking accuracy compared with the static or semi‑dynamic ISAC system.
Authors: Giulio Rigoni, Nicola Scremin, Mauro Conti
Abstract: There has been substantial growth in the UAV market along with an expansion in their applications. However, the successful execution of a UAV mission is very often dependent on the use of a GNSS. Unfortunately, the vulnerability of GNSS signals, due to their lack of encryption and authentication, poses a significant cybersecurity issue. This vulnerability makes various attacks, particularly "GNSS spoofing" and "GNSS jamming" attacks, easily executable. Generally speaking, during such an attack, the drone is manipulated into altering its path, usually resulting in an immediate forced landing or crash. As far as we know, we are the first to propose a lightweight solution that enables a drone to autonomously rescue itself, assuming it is under GNSS attack and the GNSS is no longer available, and return safely to its initial takeoff position, thereby preventing any potential crashes. During the flight, wind plays a critical role as it can instantaneously alter the drone's position. To solve this problem, we have devised a highly effective two‑phase solution: (i) a Forward Phase, for monitoring and recording the forward journey, and (ii) a Backward Phase, which generates a backward route based on the Forward Phase and wind presence. The final solution ensures strong performance in consistently returning the drone to the original position, even in windy conditions, while maintaining a very fast computation time.
Authors: Chang Hou, Luigi Marra, Guy Y. Cornejo Maceda, Peng Jiang, Jingguo Chen, Yutong Liu, Gang Hu, Jialong Chen, Andrea Ianiro, Stefano Discetti, Andrea Meilán-Vila, Bernd R. Noack
Abstract: We propose a physics‑informed data‑driven framework for urban wind estimation. This framework validates and incorporates the Reynolds number independence for flows under various working conditions, thus allowing the extrapolation for wind conditions far beyond the training data. Another key enabler is a machine‑learned non‑dimensionalized manifold from snapshot data. The velocity field is modeled using a double encoder‑decoder approach. The first encoder normalizes data using the oncoming wind speed, while the second encoder projects this normalized data onto the isometric feature mapping manifold. The decoders reverse this process, with k‑nearest neighbor performing the first decoding and the second undoing the normalization. The manifold is coarse‑grained by clustering to reduce the computational load for de‑ and encoding. The sensor‑based flow estimation is based on the estimate of the oncoming wind speed and a mapping from sensor signal to the manifold latent variables. The proposed machine‑learned flow estimation framework is exemplified for the flow above an Unmanned Aerial Vehicle vertiport. The wind estimation is shown to generalize well for rare wind conditions, not included in the original database.
Authors: Gabriel C. M. da Silva, Victor F. Monteiro, Diego A. Sousa, Darlan C. Moreira, Tarcisio F. Maciel, Fco. Rafael M. Lima, Behrooz Makki
Abstract: As the number of user equipments increases in fifth generation (5G) and beyond, it is desired to densify the cellular network with auxiliary nodes assisting the base stations. Examples of these nodes are integrated access and backhaul (IAB) nodes, network‑controlled repeaters (NCRs) and reconfigurable intelligent surfaces (RISs). In this context, this work presents a system‑level overview of these three nodes. Moreover, this work evaluates through simulations the impact of network planning aimed at enhancing the performance of a network used to cover an outdoor sport event. We show that, in the considered scenario, IAB nodes generally provide an improved signal‑to‑interference‑plus‑noise ratio and throughput compared to NCRs and RISs. However, there are situations where NCRs outperform IAB nodes due to the higher level of interference caused by the latter. Finally, we show that the deployment of these nodes on unmanned aerial vehicles (UAVs) also achieves performance gains due to their aerial mobility. However, UAV constraints related to aerial deployment may prevent these nodes from reaching results as good as the ones achieved by their stationary deployment.
Authors: Hongliang Ma, Jie Ding, Zhe Zhang, Qiang Gao, Quan Liu, Gaohan Wang, Wendong Zhang, Xuge Fan
Abstract: The advent of the 5G era means that the concepts of robots, VR/AR, UAVs, smart homes, and smart healthcare based on the IoT (Internet of Things) have gradually entered human life. Since then, intelligent life has become the dominant direction of social development. Humidity sensors, as humidity detection tools, not only convey the comfort of the human living environment, but also are of great significance in the fields of meteorology, medicine, agriculture and industry. Graphene‑based materials exhibit tremendous potential in humidity sensing owing to their ultra‑high specific surface area and excellent electron mobility at room temperature. This review begins with the introduction of examples of various synthesis strategies of graphene, followed by the device structure and working mechanism of graphene‑based humidity sensors. In addition, several different structural design methods of graphene are summarized, demonstrating that the structural design of graphene can not only optimize the performance of graphene, but also bring significant advantages in humidity sensing. Finally, key challenges hindering the further development and practical application of high‑performance graphene‑based humidity sensors are discussed, followed by future perspectives.
Authors: Yufeng Zheng, Lixin Li, Wensheng Lin, Wei Liang, Qinghe Du, Zhu Han
Abstract: This paper investigates the resource allocation optimization for cooperative communication with non‑cooperative localization in integrated sensing and communications (ISAC)‑enabled multi‑unmanned aerial vehicle (UAV) cooperative networks. Our goal is to maximize the weighted sum of the system's average sum rate and the localization quality of service (QoS) by jointly optimizing cell association, communication power allocation, and sensing power allocation. Since the formulated problem is a mixed‑integer nonconvex problem, we propose the alternating iteration algorithm based on optimal transport theory (AIBOT) to solve the optimization problem more effectively. Simulation results demonstrate that the AIBOT can improve the system sum rate by nearly 12% and reduce the localization Cramér‑Rao bound (CRB) by almost 29% compared to benchmark algorithms.
Authors: Ya Lian, Wensheng Lin, Lixin Li, Fucheng Yang, Zhu Han, Tad Matsumoto
Abstract: In this paper, the performance of a lossy cooperative unmanned aerial vehicle (UAV) relay communication system is analyzed. In this system, the UAV relay adopts a lossy forward (LF) strategy and the receiver has certain distortion requirements for the received information. For this system, we first derive the achievable rate‑distortion region. Then, on the basis of the region analysis, the system outage probability when the channel suffers Nakagami‑m fading is analyzed. Finally, we design an optimal relay position identification algorithm based on the Soft Actor‑Critic (SAC) algorithm, which determines the optimal UAV position to minimize the outage probability. The simulation results show that the proposed algorithm can optimize the UAV position and reduce the system outage probability effectively.
Authors: S. Parisa Dajkhosh, Peter M. Le, Orges Furxhi, Eddie L. Jacobs
Abstract: Capturing real‑world aerial images for vision‑based navigation (VBN) is challenging due to limited availability and conditions that make it nearly impossible to access all desired images from any location. The complexity increases when multiple locations are involved. State‑of‑the‑art solutions, such as deploying UAVs (unmanned aerial vehicles) for aerial imaging or relying on existing research databases, come with significant limitations. TerrAInav Sim offers a compelling alternative by simulating a UAV to capture bird's‑eye view map‑based images at zero yaw with real‑world visible‑band specifications. This open‑source tool allows users to specify the bounding box (top‑left and bottom‑right) coordinates of any region on a map. Without the need to physically fly a drone, the virtual Python UAV performs a raster search to capture images. Users can define parameters such as the flight altitude, aspect ratio, diagonal field of view of the camera, and the overlap between consecutive images. TerrAInav Sim's capabilities range from capturing a few low‑altitude images for basic applications to generating extensive datasets of entire cities for complex tasks like deep learning. This versatility makes TerrAInav a valuable tool for not only VBN but also other applications, including environmental monitoring, construction, and city management. The open‑source nature of the tool also allows for the extension of the raster search to other missions. A dataset of Memphis, TN, has been provided along with this simulator. A supplementary dataset is also provided, which includes data from a 3D world generation package for comparison.
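The parameters listed in the abstract above (flight altitude, aspect ratio, diagonal field of view, and overlap) are enough to sketch how a raster search could space its capture points. The example below is a minimal geometric illustration, not TerrAInav Sim's actual code: it assumes a nadir‑pointing pinhole camera over flat ground and a region given in local metres instead of map coordinates, and all function names are hypothetical.

```python
import math


def ground_footprint(altitude_m, diagonal_fov_deg, aspect_ratio):
    """Return (width, height) in metres of the ground area seen by one image."""
    diagonal = 2.0 * altitude_m * math.tan(math.radians(diagonal_fov_deg) / 2.0)
    height = diagonal / math.sqrt(aspect_ratio ** 2 + 1.0)
    return aspect_ratio * height, height


def raster_centers(region_w, region_h, altitude_m, diagonal_fov_deg,
                   aspect_ratio, overlap):
    """List image-centre (x, y) positions, row by row, covering the region."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    width, height = ground_footprint(altitude_m, diagonal_fov_deg, aspect_ratio)
    step_x = width * (1.0 - overlap)   # distance between neighbouring columns
    step_y = height * (1.0 - overlap)  # distance between neighbouring rows
    n_x = max(1, math.ceil((region_w - width) / step_x) + 1)
    n_y = max(1, math.ceil((region_h - height) / step_y) + 1)
    return [(width / 2.0 + col * step_x, height / 2.0 + row * step_y)
            for row in range(n_y) for col in range(n_x)]


if __name__ == "__main__":
    # Hypothetical mission: 300 m x 300 m region, 100 m altitude, 50% overlap.
    centers = raster_centers(300.0, 300.0, 100.0, 90.0, 1.0, 0.5)
    print(len(centers), centers[0])
```

Raising the overlap shrinks the step between captures, so the number of images grows roughly with 1 / (1 - overlap)^2 for a fixed region and altitude.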
Authors: Fiifi Dawson, Zainab Mosunmola, Sahil Pocker, Raj Abhijit Dandekar, Rajat Dandekar, Sreedath Panat
Abstract: Although LLMs have been extremely effective in a large number of complex tasks, their understanding and functionality for regional languages and cultures are not well studied. In this paper, we explore the ability of various LLMs to comprehend the cultural aspects of two regional languages: Malayalam (state of Kerala, India) and Yoruba (West Africa). Using Hofstede's six cultural dimensions: Power Distance (PDI), Individualism (IDV), Motivation towards Achievement and Success (MAS), Uncertainty Avoidance (UAV), Long Term Orientation (LTO), and Indulgence (IVR), we quantify the cultural awareness of LLM‑based responses. We demonstrate that although LLMs show a high cultural similarity for English, they fail to capture the cultural nuances across these 6 metrics for Malayalam and Yoruba. We also highlight the need for large‑scale regional language LLM training with culturally enriched datasets. This will have huge implications for enhancing the user experience of chat‑based LLMs and also improving the validity of large‑scale LLM agent‑based market research.
Authors: Wentao Wang, Yi Shen, Kaiyang Chen, Kaifan Lu
Abstract: Search‑based motion planning algorithms have been widely utilized for unmanned aerial vehicles (UAVs). However, deploying these algorithms on real UAVs faces challenges due to limited onboard computational resources. The algorithms struggle to find solutions in high‑dimensional search spaces and require considerable time to ensure that the trajectories are dynamically feasible. This paper incorporates the lazy search concept into search‑based planning algorithms to address the critical issue of real‑time planning for collision‑free and dynamically feasible trajectories on UAVs. We demonstrate that the lazy search motion planning algorithm can efficiently find optimal trajectories and significantly improve computational efficiency.
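To illustrate the general lazy search idea mentioned in the abstract above, the sketch below shows a generic lazy A* in which the expensive edge check (for example, a collision or dynamic‑feasibility test) is deferred until a candidate is popped from the open list. This is an assumption‑laden illustration of the concept, not the paper's algorithm; the function and parameter names are hypothetical.

```python
import heapq
import itertools


def lazy_astar(start, goal, neighbors, heuristic, edge_is_valid):
    """A* that validates an edge only when its target is about to be expanded.

    neighbors(node) yields (next_node, cost); edge_is_valid(a, b) is the
    expensive check. Returns (path, cost), or (None, inf) if no path exists.
    """
    tie = itertools.count()  # tie-breaker so heap entries never compare nodes
    open_heap = [(heuristic(start), next(tie), 0.0, start, None)]
    parent_of = {}           # closed set: node -> validated parent
    while open_heap:
        _, _, g, node, parent = heapq.heappop(open_heap)
        if node in parent_of:
            continue         # already reached through a cheaper valid edge
        if parent is not None and not edge_is_valid(parent, node):
            continue         # deferred check failed; drop this candidate
        parent_of[node] = parent
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent_of[node]
            return path[::-1], g
        for nxt, cost in neighbors(node):
            if nxt not in parent_of:
                f = g + cost + heuristic(nxt)
                heapq.heappush(open_heap, (f, next(tie), g + cost, nxt, node))
    return None, float("inf")
```

Because candidates that are never popped are never validated, the number of expensive checks can be much smaller than the number of edges generated, which is the source of the computational savings that lazy approaches aim for.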
Authors: Vlatko Spasev, Ivica Dimitrovski, Ivan Chorbev, Ivan Kitanovski
Abstract: The escalating use of Unmanned Aerial Vehicles (UAVs) as remote sensing platforms has garnered considerable attention, proving invaluable for ground object recognition. While satellite remote sensing images face limitations in resolution and weather susceptibility, UAV remote sensing, employing low‑speed unmanned aircraft, offers enhanced object resolution and agility. The advent of advanced machine learning techniques has propelled significant strides in image analysis, particularly in semantic segmentation for UAV remote sensing images. This paper evaluates the effectiveness and efficiency of SegFormer, a semantic segmentation framework, for the semantic segmentation of UAV images. SegFormer variants, ranging from real‑time (B0) to high‑performance (B5) models, are assessed using the UAVid dataset tailored for semantic segmentation tasks. The research details the architecture and training procedures specific to SegFormer in the context of UAV semantic segmentation. Experimental results showcase the model's performance on the benchmark dataset, highlighting its ability to accurately delineate objects and land cover features in diverse UAV scenarios, leading to both high efficiency and performance.
Authors: Vytautas Paura, Virginijus Marcinkevičius
Abstract: The hyperspectral unmixing method is an algorithm that extracts material (usually called endmember) data from hyperspectral data cube pixels along with their abundances. Due to the lower spatial resolution of hyperspectral sensors, the data in each pixel may contain mixed information from multiple endmembers. In this paper, we build a hyperspectral unmixing dataset from blueberry field data gathered by a hyperspectral camera mounted on a UAV. We also propose a hyperspectral unmixing algorithm based on the U‑Net network architecture to achieve more accurate unmixing results on existing and newly created hyperspectral unmixing datasets.
Authors: Gong Chen, Malika Meghjani, Marcel Bartholomeus Prasetyo
Abstract: We present a viewpoint‑based non‑linear Model Predictive Control (MPC) for evacuation guiding robots. Specifically, the proposed MPC algorithm enables evacuation guiding robots to track and guide cooperative human targets in emergency scenarios. Our algorithm accounts for the environment layout as well as the distance between the robot and the human target and the distance to the goal location. A key challenge for evacuation guiding robots is the trade‑off between the planned motion for leading the target toward a goal position and staying in the target's viewpoint while maintaining line‑of‑sight for guiding. We illustrate the effectiveness of our proposed evacuation guiding algorithm in both simulated and real‑world environments with an Unmanned Aerial Vehicle (UAV) guiding a human. Our results suggest that using the contextual information from the environment for motion planning increases the visibility of the guiding UAV to the human while achieving faster total evacuation time.
Authors: Ethan Davies, Pranav Kalidindi
Abstract: Mission planning often involves optimising the use of ISR (Intelligence, Surveillance and Reconnaissance) assets in order to achieve a set of mission objectives within allowed parameters subject to constraints. The missions of interest here involve routing multiple UAVs visiting multiple targets, utilising sensors to capture data relating to each target. Finding such solutions is often an NP‑Hard problem and cannot be solved efficiently on classical computers. Furthermore, during the mission new constraints and objectives may arise, requiring a new solution to be computed within a short time period. To achieve this, we investigate near term quantum algorithms that have the potential to offer speed‑ups against current classical methods. We demonstrate how a large family of these problems can be formulated as a Mixed Integer Linear Program (MILP) and then converted to a Quadratic Unconstrained Binary Optimisation (QUBO). The formulation provided is versatile and can be adapted for many different constraints with clear qubit scaling provided. We discuss the results of solving the QUBO formulation using commercial quantum annealers and compare the solutions to current edge classical solvers. We also analyse the results from solving the QUBO using Quantum Approximate Optimisation Algorithms (QAOA) and discuss their results. Finally, we also provide efficient methods to encode the problem into the Variational Quantum Eigensolver (VQE) formalism, where we have tailored the ansatz to the problem, making efficient use of the qubits available.
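To make the MILP-to-QUBO step in this abstract more concrete, here is a minimal sketch of the standard penalty technique for a single equality constraint. It is not the paper's formulation: the toy problem (choose exactly one of three options, e.g. one UAV per target), the cost values, and the penalty weight are hypothetical, and the QUBO is solved by brute force rather than on a quantum device.

```python
from itertools import product

def build_qubo(costs, penalty):
    """QUBO for: minimise sum(c_i * x_i) subject to sum(x_i) == 1, x_i in {0, 1}.

    The constraint is replaced by penalty * (sum(x_i) - 1)**2. Expanding with
    x_i**2 == x_i gives diagonal terms (c_i - penalty), off-diagonal terms
    2 * penalty for i < j, and a constant offset equal to penalty.
    """
    n = len(costs)
    Q = {}
    for i in range(n):
        Q[(i, i)] = costs[i] - penalty
        for j in range(i + 1, n):
            Q[(i, j)] = 2 * penalty
    return Q, penalty

def energy(Q, offset, x):
    """Evaluate the QUBO objective for a binary assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items()) + offset

def brute_force(Q, offset, n):
    """Exhaustively search all 2**n assignments (only viable for tiny n)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(Q, offset, x))

# Hypothetical costs of assigning each of three UAVs to one target.
costs = [4.0, 2.5, 3.0]
Q, offset = build_qubo(costs, penalty=10.0)
best = brute_force(Q, offset, len(costs))
print(best, energy(Q, offset, best))
```

The penalty weight has to exceed the cost differences it competes with; otherwise an assignment that violates the constraint could have lower energy than every feasible one.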
Authors: Ruiqi Xian, Xiyang Wu, Tianrui Guan, Xijun Wang, Boqing Gong, Dinesh Manocha
Abstract: We introduce FALCON, a unified self‑supervised video pretraining approach for UAV action recognition from raw RGB aerial footage, requiring no additional preprocessing at inference. UAV videos exhibit severe spatial imbalance: large, cluttered backgrounds dominate the field of view, causing reconstruction‑based pretraining to waste capacity on uninformative regions and under‑learn action‑relevant human/object cues. FALCON addresses this by integrating object‑aware masked autoencoding with object‑centric dual‑horizon future reconstruction. Using detections only during pretraining, we construct objectness priors that (i) enforce balanced token visibility during masking and (ii) concentrate reconstruction supervision on action‑relevant regions, preventing learning from being dominated by background appearance. To promote temporal dynamics learning, we further reconstruct short‑ and long‑horizon future content within an object‑centric supervision region, injecting anticipatory temporal supervision that is robust to noisy aerial context. Across UAV benchmarks, FALCON improves top‑1 accuracy by 2.9% on NEC‑Drone and 5.8% on UAV‑Human with a ViT‑B backbone, while achieving 2× to 5× faster inference than supervised approaches that rely on heavy test‑time augmentation.
Authors: Teaya Yang, Roman Ibrahimov, Mark W. Mueller
Abstract: We present an autonomous aerial system for safe and efficient through‑the‑canopy fruit counting. Aerial robot applications in large‑scale orchards face significant challenges due to the complexity of fine‑tuning flight paths based on orchard layouts, canopy density, and plant variability. Through‑the‑canopy navigation is crucial for minimizing occlusion by leaves and branches but is more challenging due to the complex and dense environment compared to traditional over‑the‑canopy flights. Our system addresses these challenges by integrating: i) a high‑fidelity simulation framework for optimizing flight trajectories, ii) a low‑cost autonomy stack for canopy‑level navigation and data collection, and iii) a robust workflow for fruit detection and counting using RGB images. We validate our approach through fruit counting with canopy‑level aerial images and by demonstrating the autonomous navigation capabilities of our experimental vehicle.
Authors: Jean-Michel Fortin, Olivier Gamache, William Fecteau, Effie Daum, William Larrivée-Hardy, François Pomerleau, Philippe Giguère
Abstract: Terrain awareness is an essential milestone to enable truly autonomous off‑road navigation. Accurately predicting terrain characteristics allows optimizing a vehicle's path against potential hazards. Recent methods use deep neural networks to predict traversability‑related terrain properties in a self‑supervised manner, relying on proprioception as a training signal. However, onboard cameras are inherently limited by their point‑of‑view relative to the ground, suffering from occlusions and vanishing pixel density with distance. This paper introduces a novel approach for self‑supervised terrain characterization using an aerial perspective from a hovering drone. We capture terrain‑aligned images while sampling the environment with a ground vehicle, effectively training a simple predictor for vibrations, bumpiness, and energy consumption. Our dataset includes 2.8 km of off‑road data collected in a forest environment, comprising 13 484 ground‑based images and 12 935 aerial images. Our findings show that drone imagery improves terrain property prediction by 21.37 % on the whole dataset and 37.35 % in high vegetation, compared to ground robot images. We conduct ablation studies to identify the main causes of these performance improvements. We also demonstrate the real‑world applicability of our approach by scouting an unseen area with a drone, planning and executing an optimized path on the ground.
Authors: Fernando Cladera, Kenneth Chaney, M. Ani Hsieh, Camillo J. Taylor, Vijay Kumar
Abstract: Traditionally, unmanned aerial vehicles (UAVs) rely on CMOS‑based cameras to collect images about the world below. One of the most successful applications of UAVs is to generate orthomosaics or orthomaps, in which a series of images are integrated together to develop a larger map. However, the use of CMOS‑based cameras with global or rolling shutters means that orthomaps are vulnerable to challenging light conditions, motion blur, and high‑speed motion of independently moving objects under the camera. Event cameras are less sensitive to these issues, as their pixels are able to trigger asynchronously on brightness changes. This work introduces the first orthomosaic approach using event cameras. In contrast to existing methods relying only on CMOS cameras, our approach enables map generation even in challenging light conditions, including direct sunlight and after sunset.
Authors: Federica Tonti, Jean Rabault, Ricardo Vinuesa
Abstract: The increasing number of unmanned aerial vehicles (UAVs) in urban environments requires a strategy to minimize their environmental impact, both in terms of energy efficiency and noise reduction. To address these concerns, novel strategies for developing prediction models and optimization of flight planning, for instance through deep reinforcement learning (DRL), are needed. Our goal is to develop DRL algorithms capable of enabling the autonomous navigation of UAVs in urban environments, taking into account the presence of buildings and other UAVs, optimizing the trajectories in order to reduce both energy consumption and noise. This is achieved using fluid‑flow simulations which represent the environment in which UAVs navigate and training the UAV as an agent interacting with an urban environment. In this work, we consider a domain represented by a two‑dimensional flow field with obstacles, ideally representing buildings, extracted from a three‑dimensional high‑fidelity numerical simulation. The presented methodology, using PPO+LSTM cells, was validated by reproducing a simple but fundamental problem in navigation, namely Zermelo's problem, which deals with a vessel navigating in a turbulent flow, travelling from a starting point to a target location, optimizing the trajectory. The current method shows a significant improvement with respect to both a simple PPO and a TD3 algorithm, with a success rate (SR) of the PPO+LSTM trained policy of 98.7%, and a crash rate (CR) of 0.1%, outperforming both PPO (SR = 75.6%, CR = 18.6%) and TD3 (SR = 77.4%, CR = 14.5%). This is the first step towards DRL strategies which will guide UAVs in a three‑dimensional flow field using real‑time signals, making the navigation efficient in terms of flight time and avoiding damage to the vehicle.
Authors: David Olivares, Pierre Fournier, Pavan Vasishta, Julien Marzat
Abstract: This paper evaluates and compares the performance of model‑free and model‑based reinforcement learning for the attitude control of fixed‑wing unmanned aerial vehicles using PID as a reference point. The comparison focuses on their ability to handle varying flight dynamics and wind disturbances in a simulated environment. Our results show that the Temporal Difference Model Predictive Control agent outperforms both the PID controller and other model‑free reinforcement learning methods in terms of tracking accuracy and robustness over different reference difficulties, particularly in nonlinear flight regimes. Furthermore, we introduce actuation fluctuation as a key metric to assess energy efficiency and actuator wear, and we test two different approaches from the literature: action variation penalty and conditioning for action policy smoothness. We also evaluate all control methods when subject to stochastic turbulence and gusts separately, so as to measure their effects on tracking performance, observe their limitations and outline their implications on the Markov decision process formalism.
Authors: Zhiying Wang, Tianxi Wei, Gang Sun, Xinyue Liu, Hongfang Yu, Dusit Niyato
Abstract: Mobile Edge Computing (MEC) reduces the computational burden on terminal devices by shortening the distance between these devices and computing nodes. Integrating Unmanned Aerial Vehicles (UAVs) with enhanced MEC networks can leverage the high mobility of UAVs to flexibly adjust network topology, further expanding the applicability of MEC. However, in highly dynamic and complex real‑world environments, it is crucial to balance task offloading effectiveness with algorithm performance. This paper investigates a multi‑UAV communication network equipped with edge computing nodes to assist terminal users in task computation. Our goal is to reduce the task processing delay for users through the joint optimization of discrete computation modes, continuous 3D trajectories, and resource assignment. To address the challenges posed by the mixed action space, we propose a Multi‑UAV Edge Computing Resource Scheduling (MUECRS) algorithm, which comprises two key components: 1) trajectory optimization, and 2) computation mode and resource management. Experimental results demonstrate our method effectively designs the 3D flight trajectories of UAVs, enabling rapid terminal coverage. Furthermore, the proposed algorithm achieves efficient resource deployment and scheduling, outperforming comparative algorithms by at least 16.7%, demonstrating superior adaptability and robustness.
Authors: Fangcheng Zhu, Yunfan Ren, Longji Yin, Fanze Kong, Qingbo Liu, Ruize Xue, Wenyi Liu, Yixi Cai, Guozheng Lu, Haotian Li, Fu Zhang
Abstract: Aerial swarm systems possess immense potential in various aspects, such as cooperative exploration, target tracking, search and rescue. Efficient, accurate self and mutual state estimation are the critical preconditions for completing these swarm tasks, which remain challenging research topics. This paper proposes Swarm‑LIO2: a fully decentralized, plug‑and‑play, computationally efficient, and bandwidth‑efficient LiDAR‑inertial odometry for aerial swarm systems. Swarm‑LIO2 uses a decentralized, plug‑and‑play network as the communication infrastructure. Only bandwidth‑efficient and low‑dimensional information is exchanged, including identity, ego‑state, mutual observation measurements, and global extrinsic transformations. To support the plug‑and‑play of new teammate participants, Swarm‑LIO2 detects potential teammate UAVs and initializes the temporal offset and global extrinsic transformation all automatically. To enhance the initialization efficiency, novel reflectivity‑based UAV detection, trajectory matching, and factor graph optimization methods are proposed. For state estimation, Swarm‑LIO2 fuses LiDAR, IMU, and mutual observation measurements within an efficient ESIKF framework, with careful compensation of temporal delay and modeling of measurements to enhance the accuracy and consistency.
Authors: Ran Zhang, Bowei Li, Liyuan Zhang, Jiang Xie, Miao Wang
Abstract: Unmanned Aerial Vehicle (UAV) based communication networks (UCNs) are a key component in future mobile networking. To handle the dynamic environments in UCNs, reinforcement learning (RL) has been a promising solution attributed to its strong capability of adaptive decision‑making free of the environment models. However, most existing RL‑based research focuses on control strategy design assuming a fixed set of UAVs. Few works have investigated how UCNs should be adaptively regulated when the serving UAVs change dynamically. This article discusses RL‑based strategy design for adaptive UCN regulation given a dynamic UAV set, addressing both reactive strategies in general UCNs and proactive strategies in solar‑powered UCNs. An overview of the UCN and the RL framework is first provided. Potential research directions with key challenges and possible solutions are then elaborated. Some of our recent works are presented as case studies to inspire innovative ways to handle a dynamic UAV crew with different RL algorithms.
Authors: Sivaram Krishnan, Jihong Park, Gregory Sherman, Benjamin Campbell, Jinho Choi
Abstract: Low Probability of Detection (LPD) communication aims to obscure the presence of radio frequency (RF) signals to evade surveillance. In the context of mobile surveillance utilizing unmanned aerial vehicles (UAVs), achieving LPD communication presents significant challenges due to the UAVs' rapid and continuous movements, which are characterized by unknown nonlinear dynamics. Therefore, accurately predicting future locations of UAVs is essential for enabling real‑time LPD communication. In this paper, we introduce a novel framework termed predictive covert communication, aimed at minimizing detectability in terrestrial ad‑hoc networks under multi‑UAV surveillance. Our data‑driven method synergistically integrates graph neural networks (GNN) with Koopman theory to model the complex interactions within a multi‑UAV network and to facilitate long‑term predictions by linearizing the dynamics, even with limited historical data. Extensive simulation results substantiate that the predicted trajectories using our method result in at least 63%‑75% lower probability of detection when compared to well‑known state‑of‑the‑art baseline approaches, showing promise in enabling low‑latency covert operations in practical scenarios.
Authors: Sotirios N. Aspragkathos, Panagiotis Rousseas, George C. Karras, Kostas J. Kyriakopoulos
Abstract: This article presents a Visual Servoing Nonlinear Model Predictive Control (NMPC) scheme for autonomously tracking a moving target using multirotor Unmanned Aerial Vehicles (UAVs). The scheme is developed for surveillance and tracking of contour‑based areas with evolving features. NMPC is used to manage input and state constraints, while additional barrier functions are incorporated in order to ensure system safety and optimal performance. The proposed control scheme is designed based on the extraction and implementation of the full dynamic model of the features describing the target and the state variables. Real‑time simulations and experiments using a quadrotor UAV equipped with a camera demonstrate the effectiveness of the proposed strategy.
Authors: Yannik Blei, Michael Krawez, Nisarga Nilavadi, Tanja Katharina Kaiser, Wolfram Burgard
Abstract: Nowadays, unmanned aerial vehicles (UAVs) are commonly used in search and rescue scenarios to gather information in the search area. The automatic identification of the person searched for in aerial footage could increase the autonomy of such systems, reduce the search time, and thus increase the missing person's chances of survival. In this paper, we present a novel approach to perform semantically conditioned open vocabulary object tracking that is specifically designed to cope with the limitations of UAV hardware. Our approach has several advantages. It can run with verbal descriptions of the missing person (e.g., the color of the shirt), does not require dedicated training to execute the mission, and can efficiently track a potentially moving person. Our experimental results demonstrate the versatility and efficacy of our approach.
Authors: Gabriele Magrini, Federico Becattini, Pietro Pala, Alberto Del Bimbo, Antonio Porta
Abstract: In recent years, drone detection has quickly become a subject of extreme interest: the potential for fast‑moving objects of contained dimensions to be used for malicious intent or even terrorist attacks has drawn attention to the necessity for precise and resilient systems for detecting and identifying such elements. While extensive literature and works exist on object detection based on RGB data, it is also critical to recognize the limits of such modality when applied to UAV detection. Detecting drones indeed poses several challenges such as fast‑moving objects and scenes with a high dynamic range or, even worse, scarce illumination levels. Neuromorphic cameras, on the other hand, can retain precise and rich spatio‑temporal information in situations that are challenging for RGB cameras. They are resilient to both high‑speed moving objects and scarce illumination settings, while prone to suffer a rapid loss of information when the objects in the scene are static. In this context, we present a novel model for integrating both domains together, leveraging multimodal data to take advantage of the best of both worlds. To this end, we also release NeRDD (Neuromorphic‑RGB Drone Detection), a novel spatio‑temporally synchronized Event‑RGB Drone detection dataset of more than 3.5 hours of multimodal annotated recordings.
Authors: Krystof Teissing, Matej Novosad, Robert Penicka, Martin Saska
Abstract: We address the challenge of real‑time planning of minimum‑time trajectories over multiple waypoints, onboard multirotor UAVs. Previous works demonstrated that achieving a truly time‑optimal trajectory is computationally too demanding to enable frequent replanning during agile flight, especially on less powerful flight computers. Our approach overcomes this stumbling block by utilizing a point‑mass model with a novel iterative thrust decomposition algorithm, enabling the UAV to use all of its collective thrust, something previous point‑mass approaches could not achieve. The approach enables gravity and drag modeling integration, significantly reducing tracking errors in high‑speed trajectories, which is proven through an ablation study. When combined with a new multi‑waypoint optimization algorithm, which uses a gradient‑based method to converge to optimal velocities in waypoints, the proposed method generates minimum‑time multi‑waypoint trajectories within milliseconds. The proposed approach, which we provide as an open‑source package, is validated both in simulation and in the real world, using Nonlinear Model Predictive Control. With accelerations of up to 3.5g and speeds over 100 km/h, trajectories generated by the proposed method yield similar or even smaller tracking errors than the trajectories generated for a full multirotor model.
Authors: Vandita Shukla, Luca Morelli, Pawel Trybala, Fabio Remondino, Wentian Gan, Yifei Yu, Xin Wang
Abstract: UAV‑based biodiversity conservation applications have exhibited many data acquisition advantages for researchers. UAV platforms with embedded data processing hardware can support conservation challenges through 3D habitat mapping, surveillance and monitoring solutions. High‑quality real‑time scene reconstruction as well as real‑time UAV localization can optimize the exploration vs exploitation balance of single or collaborative missions. In this work, we explore the potential of two collaborative frameworks ‑ Visual Simultaneous Localization and Mapping (V‑SLAM) and Structure‑from‑Motion (SfM) for 3D mapping purposes and compare results with standard offline approaches.
Authors: Jiayu Chen, Chao Yu, Guosheng Li, Wenhao Tang, Shilong Ji, Xinyi Yang, Botian Xu, Huazhong Yang, Yu Wang
Abstract: Multi‑UAV pursuit‑evasion, where pursuers aim to capture evaders, poses a key challenge for UAV swarm intelligence. Multi‑agent reinforcement learning (MARL) has demonstrated potential in modeling cooperative behaviors, but most RL‑based approaches remain constrained to simplified simulations with limited dynamics or fixed scenarios. Previous attempts to deploy RL policies to real‑world pursuit‑evasion are largely restricted to two‑dimensional scenarios, such as ground vehicles or UAVs at fixed altitudes. In this paper, we address multi‑UAV pursuit‑evasion by considering UAV dynamics and physical constraints. We introduce an evader prediction‑enhanced network to tackle partial observability in cooperative strategy learning. Additionally, we propose an adaptive environment generator within MARL training, enabling higher exploration efficiency and better policy generalization across diverse scenarios. Simulations show our method significantly outperforms all baselines in challenging scenarios, generalizing to unseen scenarios with a 100% capture rate. Finally, we derive a feasible policy via a two‑stage reward refinement and deploy the policy on real quadrotors in a zero‑shot manner. To our knowledge, this is the first work to derive and deploy an RL‑based policy using collective thrust and body rates control commands for multi‑UAV pursuit‑evasion in unknown environments. The open‑source code and videos are available at https://sites.google.com/view/pursuit‑evasion‑rl.
Authors: Zhong Yin, Hailong Pei
Abstract: In recent years, research on aerial grasping, manipulation, and transportation of objects has garnered significant attention. These tasks often require UAVs to operate safely close to environments or objects and to efficiently grasp payloads. However, current widely adopted flying platforms pose safety hazards: unprotected high‑speed rotating propellers can cause harm to the surroundings. Additionally, the space for carrying payloads on the fuselage is limited, and the restricted position of the payload also hinders efficient grasping. To address these issues, this paper presents a coaxial ducted fan UAV which is equipped with electromagnets mounted externally on the fuselage, enabling safe grasping and transfer of multiple loads in midair without complex additional actuators. It also has the capability to achieve direct human‑UAV cargo transfer in the air. The forces acting on the loads during magnetic attachment and their influencing factors were analyzed. An ADRC controller is utilized to counteract disturbances during grasping and achieve attitude control. Finally, flight tests are conducted to verify the UAV's ability to directly grasp multiple loads from human hands in flight while maintaining attitude tracking.
Authors: Chiya Zhang, Ting Wang, Chunlong He
Abstract: When Unmanned Aerial Vehicles (UAVs) perform high‑precision communication tasks, such as searching for users and providing emergency coverage, positioning errors between base stations and users make it challenging to deploy trajectory planning algorithms. To address the challenges caused by position errors, a framework is proposed to compensate for them using a Channel Knowledge Map (CKM), which stores channel state information (CSI). By taking the positions with errors as input, the generated CKM can predict a signal attenuation close to that at the true positions. The predictions are then used to calculate the received power, and a PPO‑based algorithm is applied to optimize the compensation. After training, the framework is able to find a strategy that minimizes the flight time under communication constraints and positioning error. In addition, the confidence interval is calculated to assist the allocation of power, and the update of the CKM is studied to adapt to the dynamic environment. Simulation results show the robustness of CKM to positioning error and environmental changes, and the superiority of CKM‑assisted UAV communication design.
Authors: Yingchao Jiao, Xuhui Zhang, Wenchao Liu, Yinyu Wu, Jinke Ren, Yanyan Shen, Bo Yang, Xinping Guan
Abstract: Unmanned aerial vehicle (UAV)‑enabled Internet of Things (IoT) systems have become an important part of future wireless communications. To achieve a higher communication rate, the joint design of UAV trajectory and resource allocation is crucial. In this paper, a multi‑antenna UAV is dispatched to simultaneously collect data from multiple ground IoT nodes (GNs) within a time interval. To improve the sum data collection (SDC) volume from the GNs, the UAV trajectory, the UAV receive beamforming, the scheduling of the GNs, and the transmit power of the GNs are jointly optimized. Since the problem is non‑convex and the variables are highly coupled, it is hard to solve using traditional methods. To find a near‑optimal solution, a double‑loop structured optimization‑driven deep reinforcement learning (DRL) algorithm, called rainbow learning based algorithm (RLA), and a fully DRL‑based algorithm are proposed to solve the problem effectively. Specifically, the outer‑loop of the RLA utilizes a fusion deep Q‑network to optimize the UAV trajectory, GN scheduling, and power allocation, while the inner‑loop optimizes receive beamforming by successive convex approximation. Simulation results verify that the proposed algorithms outperform two benchmarks with significant improvement in SDC volumes, energy efficiency, and fairness.
Authors: Hiu Ching Cheung, Bailun Jiang, Yang Hu, Henry K. Chu, Chih-Yung Wen, Ching-Wei Chang
Abstract: Aerial grasping, particularly soft aerial grasping, holds significant promise for drone delivery and harvesting tasks. However, controlling UAV dynamics during aerial grasping presents considerable challenges. The increased mass during payload grasping adversely affects thrust prediction, while unpredictable environmental disturbances further complicate control efforts. In this study, we aim to enhance the control of the Soft Aerial Vehicle (SAV) during aerial grasping by incorporating a disturbance observer into a Nonlinear Model Predictive Control (NMPC) SAV controller. By integrating the disturbance observer into the NMPC SAV controller, we aim to compensate for dynamic model idealization and uncertainties arising from additional payloads and unpredictable disturbances. Our approach combines a disturbance observer‑based NMPC with the SAV controller, effectively minimizing tracking errors and enabling precise aerial grasping along all three axes. The proposed SAV equipped with Disturbance Observer‑based Nonlinear Model Predictive Control (DOMPC) demonstrates remarkable capabilities in handling both static and non‑static payloads, leading to the successful grasping of various objects. Notably, our SAV achieves an impressive payload‑to‑weight ratio, surpassing previous investigations in the domain of soft grasping. Using the proposed soft aerial vehicle weighing 1.002 kg, we achieve a maximum payload of 337 g by grasping.
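The payload-to-weight ratio implied by the two figures in this abstract can be checked with a few lines. The sketch assumes the ratio is defined as maximum payload mass divided by vehicle mass; the abstract does not spell out its definition, so this reading is an assumption.

```python
# Figures taken from the abstract above.
vehicle_mass_g = 1002.0  # 1.002 kg soft aerial vehicle
max_payload_g = 337.0    # maximum grasped payload

# Assumed definition: payload mass over vehicle mass.
payload_to_weight = max_payload_g / vehicle_mass_g
print(f"payload-to-weight ratio: {payload_to_weight:.3f}")
```

Under that definition the vehicle carries roughly one third of its own mass.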
Authors: Wenyi Liu, Yunfan Ren, Rui Guo, Vickie W. W. Kong, Anthony S. P. Hung, Fangcheng Zhu, Yixi Cai, Yuying Zou, Fu Zhang
Abstract: This work presents a LiDAR‑based quadrotor system for slope inspection in dense vegetation environments. Cities like Hong Kong are vulnerable to climate hazards, which often result in landslides. To mitigate the landslide risks, the Civil Engineering and Development Department (CEDD) has constructed steel flexible debris‑resisting barriers on vulnerable natural catchments to protect residents. However, it is necessary to carry out regular inspections to identify any anomalies, which may affect the proper functioning of the barriers. Traditional manual inspection methods face challenges and high costs due to steep terrain and dense vegetation. Compared to manual inspection, unmanned aerial vehicles (UAVs) equipped with LiDAR sensors and cameras have advantages such as maneuverability in complex terrain, and access to narrow areas and high spots. However, conducting slope inspections using UAVs in dense vegetation poses significant challenges. First, in terms of hardware, the overall design of the UAV must carefully consider its maneuverability in narrow spaces, flight time, and the types of onboard sensors required for effective inspection. Second, regarding software, navigation algorithms need to be designed to enable obstacle avoidance flight in dense vegetation environments. To overcome these challenges, we develop a LiDAR‑based quadrotor, accompanied by a comprehensive software system. The goal is to deploy our quadrotor in field environments to achieve efficient slope inspection. To assess the feasibility of our hardware and software system, we conduct functional tests in non‑operational scenarios. Subsequently, invited by CEDD, we deploy our quadrotor in six field environments, including five flexible debris‑resisting barriers located in dense vegetation and one slope that experienced a landslide. These experiments demonstrated the superiority of our quadrotor in slope inspection.
Authors: Yusi Long, Shimin Gong, Sumei Sun, Gary Lee, Lanhua Li, Dusit Niyato
Abstract: This paper investigates an unmanned aerial vehicle (UAV)‑assisted semantic network where the ground users (GUs) periodically capture and upload the sensing information to a base station (BS) via UAVs' relaying. Both the GUs and the UAVs can extract semantic information from large‑size raw data and transmit it to the BS for recovery. Smaller‑size semantic information reduces latency and improves information freshness, while larger‑size semantic information enables more accurate data reconstruction at the BS, preserving the value of original information. We introduce a novel semantic‑aware age‑of‑information (SAoI) metric to capture both information freshness and semantic importance, and then formulate a time‑averaged SAoI minimization problem by jointly optimizing the UAV‑GU association, the semantic extraction, and the UAVs' trajectories. We decouple the original problem into a series of subproblems via the Lyapunov framework and then use hierarchical deep reinforcement learning (DRL) to solve each subproblem. Specifically, the UAV‑GU association is determined by DRL, followed by the optimization module updating the semantic extraction strategy and UAVs' deployment. Simulation results show that the hierarchical structure improves learning efficiency. Moreover, it achieves low AoI through semantic extraction while ensuring minimal loss of original information, outperforming the existing baselines.
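For readers unfamiliar with the age-of-information quantity this abstract builds on, the following sketch computes the plain time-averaged AoI in discrete time slots. It does not implement the paper's semantic-aware SAoI metric, whose definition is not given in the abstract; the slot model, the `deliveries` mapping, and the example values are assumptions for illustration.

```python
def average_aoi(num_slots, deliveries, initial_age=0):
    """Time-averaged age of information over slots 1..num_slots.

    `deliveries` maps a delivery slot to the slot in which that update was
    generated. In a slot with a delivery, the age resets to the time elapsed
    since generation; otherwise it grows by one.
    """
    age = initial_age
    total = 0
    for t in range(1, num_slots + 1):
        if t in deliveries:
            age = t - deliveries[t]
        else:
            age += 1
        total += age
    return total / num_slots

# Hypothetical schedule: an update generated in slot 2 arrives in slot 3,
# and one generated in slot 5 arrives in the same slot.
with_updates = average_aoi(6, {3: 2, 5: 5})
without_updates = average_aoi(6, {})
print(with_updates, without_updates)
```

The comparison shows why fresher and more frequent deliveries lower the time average, which is the freshness side of the trade-off the abstract describes against reconstruction accuracy.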
Authors: Bishoy Gerges, Barbara Bazzana, Nicolò Botteghi, Youssef Aboudorra, Antonio Franchi
Abstract: In this paper, we present a novel visual servoing (VS) approach based on latent Denoising Diffusion Probabilistic Models (DDPMs) that explores the application of generative models for vision‑based navigation of UAVs (Uncrewed Aerial Vehicles). In contrast to classical VS methods, the proposed approach allows reaching the desired target view even when the target is initially not visible. This is possible thanks to the learning of a latent representation that the DDPM uses for planning and a dataset of trajectories encompassing target‑invisible initial views. A compact representation is learned from raw images using a Cross‑Modal Variational Autoencoder. Given the current image, the DDPM generates trajectories in the latent space driving the robotic platform to the desired visual target. The approach has been validated in simulation using two generic multi‑rotor UAVs (a quadrotor and a hexarotor). The results show that we can successfully reach the visual target, even if not visible in the initial view.
Authors: Andreas Anastasiou, Savvas Papaioannou, Panayiotis Kolios, Christos G. Panayiotou
Abstract: Over the past few years, a plethora of advancements in Unmanned Aerial Vehicle (UAV) technology has paved the way for UAV‑based search and rescue operations with transformative impact on the outcome of critical life‑saving missions. This paper dives into the challenging task of multiple castaway tracking using an autonomous UAV agent. Leveraging the computing power of modern embedded devices, we propose a Model Predictive Control (MPC) framework for tracking multiple castaways assumed to drift afloat in the aftermath of a maritime accident. We consider a stationary radar sensor that is responsible for signaling the search mission by providing noisy measurements of each castaway's initial state. The UAV agent aims at detecting and tracking the moving targets with its onboard camera sensor, which has a limited sensing range. In this work, we also experimentally determine the probability of target detection from real‑world data by training and evaluating various Convolutional Neural Networks (CNNs). Extensive qualitative and quantitative evaluations demonstrate the performance of the proposed approach.
Authors: Angelos Zacharia, Savvas Papaioannou, Panayiotis Kolios, Christos Panayiotou
Abstract: Cooperative control of multi‑UAV systems has attracted substantial research attention due to its significance in various application sectors such as emergency response, search and rescue missions, and critical infrastructure inspection. This paper proposes a distributed control algorithm to generate collision‑free trajectories that drive the multi‑UAV system to completely inspect a set of 3D points on the surface of an object of interest. The objective of the UAVs is to cooperatively inspect the object of interest in the minimum amount of time. Extensive numerical simulations for a team of quadrotor UAVs inspecting a real 3D structure illustrate the validity and effectiveness of the proposed approach.
Authors: Edward Yao
Abstract: This research investigates the efficiency of the Floyd algorithm for obstacle‑free path planning for autonomous aerial vehicles (UAVs) or drones. The Floyd algorithm is used to generate the shortest paths for UAVs to fly from any place to the destination in a large‑scale field with obstacles which UAVs cannot fly over. The simulation results demonstrated that the Floyd algorithm effectively plans the shortest obstacle‑free paths for UAVs to fly to a destination. It is verified that the Floyd algorithm has a time complexity of O(n^3). This research revealed a cubic polynomial relationship between the time cost and the size of the field, no correlation between the time cost and the number of obstacles, and no correlation between the time cost and the number of UAVs in the tested field. The applications of the research results are discussed in the paper as well.
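As an illustration of the technique named in the abstract above, the following is a minimal sketch of the Floyd (Floyd-Warshall) all-pairs shortest-path algorithm on a grid field with blocked cells. The 4-connected grid model, unit step cost, and the function name `grid_shortest_paths` are assumptions made for this example and are not taken from the paper.

```python
import math


def grid_shortest_paths(n_rows, n_cols, obstacles):
    """Floyd-Warshall on a 4-connected grid (assumed field model).

    obstacles: set of (row, col) cells that cannot be flown over.
    Returns (index, dist): index maps a free cell to its node id,
    dist[i][j] is the shortest number of unit steps between nodes.
    """
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)
             if (r, c) not in obstacles]
    index = {cell: i for i, cell in enumerate(cells)}
    n = len(cells)

    # Initialise: 0 on the diagonal, 1 between adjacent free cells, inf elsewhere.
    dist = [[math.inf] * n for _ in range(n)]
    for (r, c), i in index.items():
        dist[i][i] = 0
        for dr, dc in ((1, 0), (0, 1)):
            j = index.get((r + dr, c + dc))
            if j is not None:
                dist[i][j] = dist[j][i] = 1

    # Triple loop over intermediate node k: this is the O(n^3) part.
    for k in range(n):
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return index, dist


if __name__ == "__main__":
    # 3x3 field with a wall covering the left and centre cells of the middle row.
    index, dist = grid_shortest_paths(3, 3, {(1, 0), (1, 1)})
    print(dist[index[(0, 0)]][index[(2, 0)]])
```

In this sketch the triple loop runs over the n free cells regardless of where the obstacles sit, so the running time is governed by the field size; obstacles only remove nodes and edges.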
Authors: Yifan Sun, Rang Liu, Zhiping Lu, Honghao Luo, Ming Li, Qian Liu
Abstract: Synthetic Aperture Radar (SAR) utilizes the movement of the radar antenna over a specific area of interest to achieve higher spatial resolution imaging. In this paper, we aim to investigate the realization of SAR imaging for a stationary radar system with the assistance of active reconfigurable intelligent surface (ARIS) mounted on an unmanned aerial vehicle (UAV). As the UAV moves along the stationary trajectory, the ARIS can not only build a high‑quality virtual line‑of‑sight (LoS) propagation path, but its mobility can also effectively create a much larger virtual aperture, which can be utilized to realize a SAR system. In this paper, we first present a range‑Doppler (RD) imaging algorithm to obtain imaging results for the proposed ARIS‑empowered SAR system. Then, to further improve the SAR imaging performance, we attempt to optimize the reflection coefficients of ARIS to maximize the signal‑to‑noise ratio (SNR) at the stationary radar receiver under the constraints of ARIS maximum power and amplification factor. An effective algorithm based on fractional programming (FP) and majorization minimization (MM) methods is developed to solve the resulting non‑convex problem. Simulation results validate the effectiveness of ARIS‑assisted SAR imaging and our proposed RD imaging and ARIS optimization algorithms.
Authors: Nathan Boyer
Abstract: It is possible to distribute the Internet to users via drones. However, it is then necessary to place the drones according to the positions of the users. Moreover, the 5th Generation (5G) New Radio (NR) technology is designed to accommodate a wide range of applications and industries. The NGMN 5G White Paper groups these vertical use cases into three categories:
‑ enhanced Mobile Broadband (eMBB)
‑ massive Machine Type Communication (mMTC)
‑ Ultra‑Reliable Low‑latency Communication (URLLC).
Partitioning the physical network into multiple virtual networks appears to be the best way to provide a customised service for each application and limit operational costs. This design is well known as network slicing. Each drone must thus slice its bandwidth between the three user classes. This whole problem (placement + bandwidth) can be defined as an optimization problem, but since it is very hard to solve efficiently, it is almost always addressed by AI in the literature. In my internship, I wanted to prove that viewing the problem as an optimization problem can still be useful, by building a hybrid solution involving AI on the one hand and optimization on the other. I use it to achieve better results than approaches that use only AI, although at the cost of slightly larger (but still reasonable) computation times.
Authors: Amir Khazraei, Haocheng Meng, Miroslav Pajic
Abstract: This work focuses on analyzing the vulnerability of unmanned aerial vehicles (UAVs) to stealthy black‑box false data injection attacks on GPS measurements. We assume that the quadcopter is equipped with IMU and GPS sensors, and an arbitrary sensor fusion and controller are used to estimate and regulate the system's states, respectively. We consider the notion of stealthiness in the most general form, where the attack is defined to be stealthy if it cannot be detected by any existing anomaly detector. Then, we show that if the closed‑loop control system is incrementally exponentially stable, the attacker can cause arbitrarily large deviation in the position trajectory by compromising only the GPS measurements. We also show that to conduct such stealthy impactful attacks, the attacker does not need to have access to the model of the system. Finally, we illustrate our results in a UAV case study.
Authors: Aryo Jamshidpey, Hugh H. -T. Liu
Abstract: This paper addresses multi‑UAV uniform sweep coverage in an unknown convex environment, where a homogeneous UAV swarm must evenly visit every portion of the environment for a sampling task without access to their position and orientation. Random walk exploration is practical in this scenario because it requires no localization and is easy to implement on swarms. We demonstrate that the Self‑Organizing Nervous System (SoNS) framework, which enables a robot swarm to self‑organize into a hierarchical ad‑hoc communication network using local communication, is a promising control approach for random exploration in such environments. To this end, we propose a SoNS‑based random walk method in which UAVs self‑organize into a line formation and then perform a random walk to cover the environment while maintaining that formation. We evaluate our approach in simulations against several decentralized random walk strategies. Results show that our SoNS‑based random walk achieves full coverage faster and with greater coverage uniformity than these benchmark strategies, both globally and in local regions.
Authors: Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
Abstract: Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study aims to address this challenge by utilizing on‑policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two‑dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other and perform the exploration in a distributed manner. The proposed solution includes actor‑critic networks using deep convolutional neural networks (CNN) and long short‑term memory (LSTM) for identifying the UAVs and areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor‑critic (A3C), the simulation results demonstrate the superiority of the proposed PPO approach. The results also show that combining LSTM with CNN in the critic can improve exploration. Since the proposed exploration has to work in unknown environments, the results show that the proposed setup can complete the coverage on new maps that differ from the training maps. Finally, we show how tuning hyperparameters may affect the overall performance.
Authors: John Lewis, Meysam Basiri, Pedro U. Lima
Abstract: Efficient exploration of large‑scale environments remains a critical challenge in robotics, with applications ranging from environmental monitoring to search and rescue operations. This article proposes Frontier Shepherding (FroShe), a bio‑inspired multi‑robot framework for large‑scale exploration. The framework heuristically models frontier exploration based on the shepherding behavior of herding dogs, where frontiers are treated as a swarm of sheep reacting to robots modeled as shepherding dogs. FroShe is robust across varying environment sizes and obstacle densities, requiring minimal parameter tuning for deployment across multiple agents. Simulation results demonstrate that the proposed method performs consistently, regardless of environment complexity, and outperforms state‑of‑the‑art exploration strategies by an average of 20% with three UAVs. The approach was further validated in real‑world experiments using single‑ and dual‑drone deployments in a forest‑like environment.
Authors: Agnaldo Batista, Aldri Santos
Abstract: Unmanned aerial vehicles (UAVs) have been recognized as a versatile platform for a wide range of services. During flight, these vehicles must avoid collisions to operate safely. They therefore need to maintain spatial awareness, i.e., to know of other vehicles in their coverage area. However, mobility and positioning aspects hamper building UAV network infrastructure to support reliable basic services. Thus, such vehicles call for a location service with up‑to‑date information that is resilient to false location injection threats. This work proposes FlySafe, a resilient UAV location sharing service that employs opportunistic approaches to deliver UAVs' locations. FlySafe takes into account the freshness of UAVs' locations to maintain their spatial awareness. Further, it relies on the age of the UAV's location information to trigger device discovery. Simulation results showed that FlySafe achieved spatial awareness in up to 94.15% of UAV operations, being resilient to false locations injected in the network. Moreover, the accuracy in device discovery reached 94.53% with a location error of less than 2 m.
Authors: George Rapakoulias, Panagiotis Tsiotras
Abstract: Safe and accurate control of unmanned aerial vehicles in the presence of winds is a challenging control problem due to the hard‑to‑model and highly stochastic nature of the disturbance forces acting upon the vehicle. To meet performance constraints, state‑of‑the‑art control methods such as Incremental Nonlinear Dynamic Inversion (INDI) or other adaptive control techniques require high control gains to mitigate the effects of uncertainty entering the system. While achieving good tracking performance, INDI requires excessive control effort and results in high actuator strain and reduced flight smoothness due to constant and aggressive corrective actions commanded by the controller. In this paper, we propose a novel control architecture that allows the user to systematically address the trade‑off between high authority control and performance constraint satisfaction. Our approach consists of two parts. To cancel out biases introduced by unmodelled aerodynamic effects, we propose a hybrid, model‑based disturbance force estimator augmented with a neural network that can adapt to external wind conditions using a Kalman Filter. We then utilize state‑of‑the‑art results from Covariance Steering theory, which offers a principled way of controlling the uncertainty of the tracking error dynamics. We first analyze the properties of the combined system and then provide extensive experimental results to verify the advantages of the proposed approach over existing methods.
Authors: Zhixi Cai, Cristian Rojas Cardenas, Kevin Leo, Chenyuan Zhang, Kal Backman, Hanbing Li, Boying Li, Mahsa Ghorbanali, Stavya Datta, Lizhen Qu, Julian Gutierrez Santiago, Alexey Ignatiev, Yuan-Fang Li, Mor Vered, Peter J Stuckey, Maria Garcia de la Banda, Hamid Rezatofighi
Abstract: This paper addresses the problem of autonomous UAV search missions, where a UAV must locate specific Entities of Interest (EOIs) within a time limit, based on brief descriptions in large, hazard‑prone environments with keep‑out zones. The UAV must perceive, reason, and make decisions with limited and uncertain information. We propose NEUSIS, a compositional neuro‑symbolic system designed for interpretable UAV search and navigation in realistic scenarios. NEUSIS integrates neuro‑symbolic visual perception, reasoning, and grounding (GRiD) to process raw sensory inputs, maintains a probabilistic world model for environment representation, and uses a hierarchical planning component (SNaC) for efficient path planning. Experimental results from simulated urban search missions using AirSim and Unreal Engine show that NEUSIS outperforms a state‑of‑the‑art (SOTA) vision‑language model and a SOTA search planning model in success rate, search efficiency, and 3D localization. These results demonstrate the effectiveness of our compositional neuro‑symbolic approach in handling complex, real‑world scenarios, making it a promising solution for autonomous UAV systems in search missions.
Authors: Farzad Sanati
Abstract: One of the most useful applications of intelligent aerial robots, sometimes called Unmanned Aerial Vehicles (UAVs), in Australia is known to be in bushfire monitoring and prediction operations. A swarm of autonomous drones/UAVs programmed to work in real‑time observing the fire parameters using their onboard sensors would be valuable in reducing the life‑threatening impact of that fire. However, autonomous UAVs face serious challenges in their positioning and navigation in critical bushfire conditions such as remoteness and severe weather conditions where GPS signals could also be unreliable. This paper tackles one of the most important factors in autonomous UAV navigation, namely Initial Positioning, sometimes called Localisation. The solution provided by this paper will enable a team of autonomous UAVs to establish a relative position to their base of operation to be able to commence a team search and reconnaissance in a bushfire‑affected area and find their way back to their base without the help of GPS signals.
Authors: Zheyu Zhou, Yaqing Wang, Elliot W. Hawkes, Chen Li
Abstract: The need for fast response and safe operation after natural and man‑made disasters in urban environments has spurred the development of robotic systems designed to assist in search and rescue operations within complex rubble sites. Traditional Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) face significant limitations in such confined and obstructed environments. This paper introduces a novel vine robot designed to navigate dense rubble, drawing inspiration from natural growth mechanisms found in plants. Unlike conventional robots, vine robots are soft robots that can grow by everting their material, allowing them to navigate through narrow spaces and obstacles. The prototype presented in this study incorporates pneumatic muscles for steering and oscillation, and an equation‑based robot length control plus feedback pressure regulating system for extending and retracting the robot body. We conducted a series of controlled experiments in an artificial rubble testbed to assess the robot performance under varying environmental conditions and robot parameters, including volume ratio, environmental weight, oscillation, and steering. The results show that the vine robot can achieve significant penetration depths in cluttered environments with mixed obstacle sizes and weights, and can maintain repeated trajectories, demonstrating potential for mapping and navigating complex underground paths. Our findings highlight the suitability of the vine robot for urban search and rescue missions, with further research planned to enhance its robustness and deployability in real‑world scenarios.
Authors: Hongjiang Lei, Mingxu Yang, Ki-Hong Park, Gaofeng Pan
Abstract: Mobile edge computing (MEC) technology can reduce user latency and energy consumption by offloading computationally intensive tasks to the edge servers. Unmanned aerial vehicles (UAVs) and non‑orthogonal multiple access (NOMA) technology enable the MEC networks to provide offloaded computing services for massively accessed terrestrial users conveniently. However, the broadcast nature of signal propagation in NOMA‑based UAV‑MEC networks makes it vulnerable to eavesdropping by malicious eavesdroppers. In this work, a secure offload scheme is proposed for NOMA‑based UAV‑MEC systems with the existence of an aerial eavesdropper. The long‑term average network computational cost is minimized by jointly designing the UAV's trajectory, the terrestrial users' transmit power, and computational frequency while ensuring the security of users' offloaded data. Due to the eavesdropper's location uncertainty, the worst‑case security scenario is considered through the estimated eavesdropping range. Due to the high‑dimensional continuous action space, the deep deterministic policy gradient algorithm is utilized to solve the non‑convex optimization problem. Simulation results validate the effectiveness of the proposed scheme.
Authors: Zihan Wang, Nina Mahmoudian
Abstract: In this study, we conduct a comprehensive benchmark of Safe Reinforcement Learning (Safe RL) algorithms for the task of vision‑driven river following by an Unmanned Aerial Vehicle (UAV) in a Unity‑based photo‑realistic simulation environment. We empirically validate the effectiveness of a semantic‑augmented image encoding method, assessing its superiority based on Relative Entropy and the quality of water pixel reconstruction. The determination of the encoding dimension, guided by reconstruction loss, contributes to a more compact state representation, facilitating the training of Safe RL policies. Across all benchmarked Safe RL algorithms, we find that First Order Constrained Optimization in Policy Space achieves the optimal balance between reward acquisition and safety compliance. Notably, our results reveal that on‑policy algorithms consistently outperform both off‑policy and model‑based counterparts in both training and testing environments. Importantly, the benchmarking outcomes and the vision encoding methodology extend beyond UAVs, and are applicable to Autonomous Surface Vehicles (ASVs) engaged in autonomous navigation in confined waters.
Authors: Saurabh Kumar, Shashi Ranjan Kumar, Abhinav Sinha
Abstract: In this paper, we consider the tracking of arbitrary curvilinear geometric paths in three‑dimensional output spaces of unmanned aerial vehicles (UAVs) without pre‑specified timing requirements, commonly referred to as path‑following problems, subjected to bounded inputs. Specifically, we propose a novel nonlinear path‑following guidance law for a UAV that enables it to follow any smooth curvilinear path in three dimensions while accounting for the bounded control authority in the design. The proposed solution offers a general treatment of the path‑following problem by removing the dependency on the path's geometry, which makes it applicable to paths with varying levels of complexity and smooth curvatures. Additionally, the proposed strategy draws inspiration from the pursuit guidance approach, which is known for its simplicity and ease of implementation. Theoretical analysis guarantees that the UAV converges to its desired path within a fixed time and remains on it irrespective of its initial configuration with respect to the path. Finally, the simulations demonstrate the merits and effectiveness of the proposed guidance strategy through a wide range of engagement scenarios, showcasing the UAV's ability to follow diverse curvilinear paths accurately.
Authors: Harris K. Armeniakos, Petros S. Bithas, Konstantinos Maliatsos, Athanasios G. Kanatas
Abstract: This letter studies the joint energy and signal‑to‑interference‑plus‑noise ratio (SINR)‑based coverage probability in Unmanned Aerial Vehicle (UAV)‑assisted radio frequency (RF)‑powered Internet of Things (IoT) networks. The UAVs are spatially distributed in an aerial corridor that is modeled as a one‑dimensional (1D) binomial point process (BPP). By accurately capturing the line‑of‑sight (LoS) probability of a UAV through large‑scale fading: i) an exact form expression for the energy coverage probability is derived, and ii) a tight approximation for the overall coverage performance is obtained. Among several key findings, numerical results reveal the optimal number of deployed UAV‑BSs that maximizes the joint coverage probability, as well as the optimal length of the UAV corridors when designing such UAV‑assisted IoT networks.
Authors: Vishal Choudhary, Shashi Kant Gupta, Shaohui Foong, Hock Beng Lim
Abstract: The localization of Unmanned aerial vehicles (UAVs) in deep tunnels is extremely challenging due to their inaccessibility and hazardous environment. Conventional outdoor localization techniques (such as using GPS) and indoor localization techniques (such as those based on WiFi, Infrared (IR), Ultra‑Wideband, etc.) do not work in deep tunnels. We are developing a UAV‑based system for the inspection of defects in the Deep Tunnel Sewerage System (DTSS) in Singapore. To enable the UAV localization in the DTSS, we have developed a distance measurement module based on the optical flow technique. However, the standard optical flow technique does not work well in tunnels with poor lighting and a lack of features. Thus, we have developed an enhanced optical flow algorithm with prediction, to improve the distance measurement for UAVs in deep hazardous tunnels.
Authors: Xuexue Li
Abstract: Most recent UAV (Unmanned Aerial Vehicle) detectors focus primarily on general challenges such as uneven distribution and occlusion. However, the neglect of scale challenges, which encompass scale variation and small objects, continues to hinder object detection in UAV images. Although existing works propose solutions, they model the problem implicitly and involve redundant steps, so detection performance remains limited. A dedicated approach to these scale challenges can therefore help improve the performance of UAV image detectors. Compared to natural scenes, scale challenges in UAV images come with problems of limited perception across comprehensive scales and poor robustness to small objects. We found that complementary learning helps the detection model address the scale challenges. Therefore, the paper introduces it to form our scale‑robust complementary learning network (SCLNet) in conjunction with the object detection model. The SCLNet consists of two implementations and a cooperation method. In detail, one implementation, named comprehensive‑scale complementary learning (CSCL), is based on our proposed scale‑complementary decoder and scale‑complementary loss function to explicitly extract complementary information as a complement. The other implementation, named inter‑scale contrastive complementary learning (ICCL), is based on our proposed contrastive complement network and contrastive complement loss function to explicitly guide the learning of small objects with the rich texture detail information of the large objects. In addition, an end‑to‑end cooperation (ECoop) between the two implementations and with the detection model is proposed to exploit the potential of each.
Authors: Volodymyr Rizun
Abstract: This paper presents a neural network that effectively removes visual defects from UAV‑captured images. It features an enhanced Pix2Pix GAN, specifically engineered to address visual defects in UAV imagery. The method incorporates advanced modifications to the Pix2Pix architecture, targeting prevalent issues such as mode collapse. The suggested method facilitates significant improvements in the quality of defective UAV images, yielding cleaner and more precise visual results. The effectiveness of the proposed approach is demonstrated through evaluation on a custom dataset of aerial photographs, highlighting its capability to refine and restore UAV imagery effectively.
Authors: Hambisa Keno, Nicholas J. Pioch, Christopher Guagliano, Timothy H. Chung
Abstract: Application of Unmanned Aerial Vehicles (UAVs) in search and rescue, emergency management, and law enforcement has gained traction with the advent of low‑cost platforms and sensor payloads. The emergence of hybrid neural and symbolic AI approaches for complex reasoning is expected to further push the boundaries of these applications with decreasing levels of human intervention. However, current UAV simulation environments lack semantic context suited to this hybrid approach. To address this gap, HAMERITT (Hybrid Ai Mission Environment for RapId Training and Testing) provides a simulation‑based autonomy software framework that supports the training, testing and assurance of neuro‑symbolic algorithms for autonomous maneuver and perception reasoning. HAMERITT includes scenario generation capabilities that offer mission‑relevant contextual symbolic information in addition to raw sensor data. Scenarios include symbolic descriptions for entities of interest and their relations to scene elements, as well as spatial‑temporal constraints in the form of time‑bounded areas of interest with prior probabilities and restricted zones within those areas. HAMERITT also features support for training distinct algorithm threads for maneuver vs. perception within an end‑to‑end mission run. Future work includes improving scenario realism and scaling symbolic context generation through automated workflow.
Authors: Julien Yuuki Burkhard, Jesse Ray Murray Lahaye, Laurent Valentin Jospin, Jan Skaloud
Abstract: Hyperspectral cameras have recently been miniaturized for operation on lightweight airborne platforms such as UAV or small aircraft. Unlike frame cameras (RGB or Multispectral), many hyperspectral sensors use a linear array or 'push‑broom' scanning design. This design presents significant challenges for image rectification and the calibration of the intrinsic and extrinsic camera parameters. Typically, methods employed to address such tasks rely on a precise GPS/INS estimate of the airborne platform trajectory and a detailed terrain model. However, inaccuracies in the trajectory or surface model information can introduce systematic errors and complicate geometric modeling which ultimately degrade the quality of the rectification. To overcome these challenges, we propose a method for tie point extraction and camera calibration for 'push‑broom' hyperspectral sensors using only the raw spectral imagery and raw, possibly low quality, GPS/INS trajectory. We demonstrate that our approach allows for the automatic calibration of airborne systems with hyperspectral cameras, outperforms other state‑of‑the‑art automatic rectification methods and reaches an accuracy on par with manual calibration methods.
Authors: Khaoula Hidawi
Abstract: Non‑Fungible Tokens (NFTs) have emerged as a revolutionary method for managing digital assets, providing transparency and secure ownership records on a blockchain. In this paper, we present a theoretical framework for leveraging NFTs to manage UAV (Unmanned Aerial Vehicle) flight data. Our approach focuses on ensuring data integrity, ownership transfer, and secure data sharing among stakeholders. This framework utilizes cryptographic methods, smart contracts, and access control mechanisms to enable a tamper‑proof and privacy‑preserving management system for UAV flight data.
Authors: Yu-Hsi Chen
Abstract: Accurate detection of Unmanned Aerial Vehicles (UAVs) is critical for surveillance, security, and airspace monitoring. However, existing datasets remain limited in scale, resolution, and the ability to capture objects across extreme size variations. To address these challenges, we present UAVDB, a benchmark dataset for UAV detection and segmentation, constructed via a point‑guided weak supervision pipeline. We introduce Patch Intensity Convergence (PIC), a lightweight annotation method that converts trajectory points into bounding boxes, eliminating the need for manual labeling while preserving precise spatial localization. Building upon these annotations, we further generate segmentation masks using SAM2, enriching the dataset with multi‑task labels. UAVDB consists of RGB frames from a fixed‑camera multi‑view video dataset, capturing UAVs across scales ranging from clearly visible objects to near single‑pixel instances under diverse conditions. Quantitative results show that PIC combined with SAM2 outperforms existing annotation techniques in terms of IoU. Furthermore, we benchmark YOLO‑based detectors on UAVDB, establishing baselines for future research.
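For readers unfamiliar with the IoU metric mentioned in the abstract above, the following is a minimal sketch of intersection-over-union for two axis-aligned bounding boxes. The `(x1, y1, x2, y2)` corner format and the function name `box_iou` are assumptions made for this example; the sketch does not reproduce the paper's PIC method or its evaluation code.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2 (assumed format).
    Returns a value in [0, 1]; 0.0 if both boxes have zero area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

For near single-pixel targets, a shift of one pixel in a predicted box changes this ratio sharply, which is one reason IoU is a demanding measure for small-object annotation.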
Authors: Wali Ullah Khan, Eva Lagunas, Asad Mahmood, Muhammad Asif, Manzoor Ahmed, Symeon Chatzinotas
Abstract: The reconfigurable intelligent surface (RIS) technology shows great potential in sixth‑generation (6G) terrestrial and non‑terrestrial networks (NTNs) since it can effectively change wireless settings to improve connectivity. Extensive research has been conducted on traditional RIS systems with diagonal phase response matrices. This straightforward RIS architecture, while cost‑effective, has restricted capabilities in manipulating the wireless channels. The beyond diagonal reconfigurable intelligent surface (BD‑RIS) greatly improves control over the wireless environment by utilizing interconnected phase response elements. This work proposes the integration of unmanned aerial vehicle (UAV) communications and BD‑RIS in 6G NTNs, which has the potential to further enhance wireless coverage and spectral efficiency. We begin with the preliminaries of UAV communications and then discuss the fundamentals of BD‑RIS technology. Subsequently, we discuss the potential of BD‑RIS and UAV communications integration. We then present a case study based on UAV‑mounted transmissive BD‑RIS communication. Finally, we highlight future research directions and conclude this work.
Authors: Bowei Li, Saugat Tripathi, Salman Hosain, Ran Zhang, Jiang, Xie, Miao Wang
Abstract: Distributed management over Unmanned Aerial Vehicle (UAV) based communication networks (UCNs) has attracted increasing research attention. In this work, we study a distributed user connectivity maximization problem in a UCN. The work features a horizontal study over different levels of information exchange during the distributed iteration and a consideration of dynamics in UAV set and user distribution, which are not well addressed in the existing works. Specifically, the studied problem is first formulated into a time‑coupled mixed‑integer non‑convex optimization problem. A heuristic two‑stage UAV‑user association policy is proposed to determine the user connectivity faster. To tackle the NP‑hard problem in a scalable manner, the distributed user connectivity maximization algorithm 1 (DUCM‑1) is proposed under the multi‑agent deep Q learning (MA‑DQL) framework. DUCM‑1 emphasizes designing different information exchange levels and evaluating how they impact the learning convergence with stationary and dynamic user distribution. To comply with the UAV dynamics, the DUCM‑2 algorithm is developed, which autonomously handles arbitrary quits and join‑ins of UAVs in a considered time horizon. Extensive simulations are conducted i) to conclude that exchanging state information with a deliberate task‑specific reward function design yields the best convergence performance, and ii) to show the efficacy and robustness of DUCM‑2 against the dynamics.
Authors: Vladislav Semenyuk, Ildar Kurmashev, Alberto Lupidi, Dmitriy Alyoshin, Liliya Kurmasheva, Alessandro Cantelli-Forti
Abstract: This review provides a detailed analysis of the advancements in unmanned aerial vehicle (UAV) detection and classification systems from 2020 to today. It covers various detection methodologies such as radar, radio frequency, optical, and acoustic sensors, and emphasizes their integration via sophisticated sensor fusion techniques. The fundamental technologies driving UAV detection and classification are thoroughly examined, with a focus on their accuracy and range. Additionally, the paper discusses the latest innovations in artificial intelligence and machine learning, illustrating their impact on improving the accuracy and efficiency of these systems. The review concludes by predicting further technological developments in UAV detection, which are expected to enhance both performance and reliability.
Authors: Qiuchen Qian, Yanran Wang, David Boyle
Abstract: The Orienteering Problem (OP) is a well‑studied routing problem that has been extended to incorporate uncertainties, reflecting stochastic or dynamic travel costs, prize‑collection costs, and prizes. Existing approaches may, however, be inefficient in real‑world applications due to insufficient modeling knowledge and initially unknowable parameters in online scenarios. Thus, we propose the Uncertain and Dynamic Orienteering Problem (UDOP), modeling travel costs as distributions with unknown and time‑variant parameters. UDOP also associates uncertain travel costs with dynamic prizes and prize‑collection costs for its objective and budget constraints. To address UDOP, we develop an ADaptive Approach for Probabilistic paThs ‑ ADAPT, that iteratively performs 'execution' and 'online planning' based on an initial 'offline' solution. The execution phase updates system status and records online cost observations. The online planner employs a Bayesian approach to adaptively estimate power consumption and optimize path sequence based on safety beliefs. We evaluate ADAPT in a practical Unmanned Aerial Vehicle (UAV) charging scheduling problem for Wireless Rechargeable Sensor Networks. The UAV must optimize its path to recharge sensor nodes efficiently while managing its energy under uncertain conditions. ADAPT maintains comparable solution quality and computation time while offering superior robustness. Extensive simulations show that ADAPT achieves a 100% Mission Success Rate (MSR) across all tested scenarios, outperforming comparable heuristic‑based and frequentist approaches that fail up to 70% (under challenging conditions) and averaging 67% MSR, respectively. This work advances the field of OP with uncertainties, offering a reliable and efficient approach for real‑world applications in uncertain and dynamic environments.
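The abstract above states that the online planner uses a Bayesian approach to adaptively estimate power consumption from online cost observations. A minimal sketch of one such update follows, assuming (this is not stated in the abstract) a Gaussian belief over the mean cost of a path segment and a known observation variance; the names `GaussianBelief` and `update_belief` are hypothetical and do not come from ADAPT.

```python
from dataclasses import dataclass


@dataclass
class GaussianBelief:
    """Belief over the mean energy cost of one path segment (assumed Gaussian)."""
    mean: float
    var: float


def update_belief(belief: GaussianBelief, observation: float, obs_var: float) -> GaussianBelief:
    """Conjugate normal update with a known observation variance (an assumption).

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the new observation.
    """
    precision = 1.0 / belief.var + 1.0 / obs_var
    mean = (belief.mean / belief.var + observation / obs_var) / precision
    return GaussianBelief(mean=mean, var=1.0 / precision)


# Start from an "offline" prior and refine it with costs observed during execution.
belief = GaussianBelief(mean=10.0, var=4.0)
for observed_cost in [12.0, 13.0, 12.5]:
    belief = update_belief(belief, observed_cost, obs_var=1.0)
```

Each observation shrinks the variance, so a planner built on such a belief could become less conservative as evidence accumulates.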
Authors: Jorge Bes, Juan Dendarieta, Luis Riazuelo, Luis Montano
Abstract: Despite the growing impact of Unmanned Aerial Vehicles (UAVs) across various industries, most currently available solutions lack a robust autonomous navigation system that can deal safely with the appearance of obstacles. This work presents an approach to perform autonomous UAV planning and navigation in scenarios in which safe and highly maneuverable flight is required, due to the cluttered environment and the narrow rooms to move in. The system combines an RRT global planner with a newly proposed reactive planner, DWA‑3D, which is the extension of the well known DWA method for 2D robots. We provide a theoretical‑empirical method for adjusting the parameters of the objective function to optimize, easing the classical difficulty of tuning them. An onboard LiDAR provides a 3D point cloud, which is projected on an Octomap in which the planning and navigation decisions are made. There is no prior map; the system builds and updates the map online, from the current and the past LiDAR information included in the Octomap. Extensive real‑world experiments were conducted to validate the system and to obtain a fine tuning of the involved parameters. These experiments allowed us to provide a set of values that ensure safe operation across all the tested scenarios. Just by weighting two parameters, it is possible to prioritize either horizontal path alignment or vertical (height) tracking, resulting in enhancing vertical or lateral avoidance, respectively. Additionally, our DWA‑3D proposal is able to navigate successfully even in absence of a global planner or with one that does not consider the drone's size. Finally, the conducted experiments show that computation time with the proposed parameters is not only bounded but also remains stable around 40 ms, regardless of the scenario complexity.
Authors: Horatiu Florea, Sergiu Nedevschi
Abstract: Aerial scene understanding systems face stringent payload restrictions and must often rely on monocular depth estimation for modeling scene geometry, which is an inherently ill‑posed problem. Moreover, obtaining accurate ground truth data required by learning‑based methods raises significant additional challenges in the aerial domain. Self‑supervised approaches can bypass this problem, at the cost of providing only up‑to‑scale results. Similarly, recent supervised solutions which make good progress towards zero‑shot generalization also provide only relative depth values. This work presents TanDepth, a practical scale recovery method for obtaining metric depth results from relative estimations at inference‑time, irrespective of the type of model generating them. Tailored for Unmanned Aerial Vehicle (UAV) applications, our method leverages sparse measurements from Global Digital Elevation Models (GDEM) by projecting them to the camera view using extrinsic and intrinsic information. An adaptation to the Cloth Simulation Filter is presented, which allows selecting ground points from the estimated depth map to then correlate with the projected reference points. We evaluate and compare our method against alternate scaling methods adapted for UAVs, on a variety of real‑world scenes. Considering the limited availability of data for this domain, we construct and release a comprehensive, depth‑focused extension to the popular UAVid dataset to further research.
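To illustrate the general idea of recovering metric scale from relative depth using sparse reference measurements, the sketch below fits a single scale factor as the median ratio between reference depths and the relative depths at the same pixels. This is a common baseline and is not claimed to be TanDepth's procedure; the GDEM projection and the Cloth Simulation Filter step described in the abstract are omitted, and the function names are hypothetical.

```python
from statistics import median


def recover_scale(relative_depths, reference_depths):
    """Return one global scale factor as the median of reference/relative ratios.

    Both inputs are equal-length sequences sampled at the same (ground) pixels.
    The median is used here because it is less sensitive to outliers than the mean.
    """
    ratios = [ref / rel for rel, ref in zip(relative_depths, reference_depths) if rel > 0]
    if not ratios:
        raise ValueError("no valid depth pairs to estimate a scale from")
    return median(ratios)


def to_metric(relative_depth_map, scale):
    """Apply the scale factor to every pixel of a 2D relative depth map."""
    return [[scale * d for d in row] for row in relative_depth_map]


# Three sparse reference points; the third ratio is a mild outlier.
scale = recover_scale([1.0, 2.0, 4.0], [10.0, 20.0, 44.0])
metric_map = to_metric([[1.0, 2.0], [3.0, 4.0]], scale)
```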
Authors: Jianing Chen, Sichen Qian, Chuangyin Dang, Sitian Qin
Abstract: This paper mainly investigates a class of distributed Variational Generalized Nash Equilibrium (VGNE) seeking problems for both online noncooperative games and online aggregative games with time‑varying coupling inequality constraints. Two novel continuous‑time distributed VGNE seeking algorithms are proposed, which realize the constant regret bound and sublinear fit bound, superior to those of the criteria for online optimization problems and online games. Furthermore, to reduce unnecessary communication among players, a dynamic event‑triggered mechanism involving internal variables is introduced into the distributed VGNE seeking algorithm, while the constant regret bound and sublinear fit bound are still maintained. Also, the Zeno behavior is strictly prohibited. Moreover, we further investigate the impact of communication noise on the player's measurement of its neighbors' relative states. It is demonstrated that both the regret and fit bounds remain valid as long as the noise level is not excessively large. This result reveals, to some extent, the proposed algorithm's noise‑resilient capability. Finally, an online Uncrewed Aerial Vehicle (UAV) swarm game and an online Nash‑Cournot game are given to demonstrate the validity of the theoretical results.
Authors: Kangtong Mo, Linyue Chu, Xingyu Zhang, Xiran Su, Yang Qian, Yining Ou, Wian Pretorius
Abstract: Autonomous indoor navigation of UAVs presents numerous challenges, primarily due to the limited precision of GPS in enclosed environments. Additionally, UAVs' limited capacity to carry heavy or power‑intensive sensors, such as overheight packages, exacerbates the difficulty of achieving autonomous navigation indoors. This paper introduces an advanced system in which a drone autonomously navigates indoor spaces to locate a specific target, such as an unknown Amazon package, using only a single camera. Employing a deep learning approach, a deep reinforcement adaptive learning (DRAL) algorithm is trained to develop a control strategy that emulates the decision‑making process of an expert pilot. We demonstrate the efficacy of our system through real‑time simulations conducted in various indoor settings. We apply multiple visualization techniques to gain deeper insights into our trained network. Furthermore, we extend our approach to include an adaptive control algorithm for coordinating multiple drones to lift an object in an indoor environment collaboratively. Integrating our DRAL algorithm enables multiple UAVs to learn optimal control strategies that adapt to dynamic conditions and uncertainties. This innovation enhances the robustness and flexibility of indoor navigation and opens new possibilities for complex multi‑drone operations in confined spaces. The proposed framework highlights significant advancements in adaptive control and deep reinforcement learning, offering robust solutions for complex multi‑agent systems in real‑world applications.
Authors: Md. Mahfuzur Rahman, Sunzida Siddique, Marufa Kamal, Rakib Hossain Rifat, Kishor Datta Gupta
Abstract: Unmanned Aerial Vehicles (UAVs) have greatly revolutionized the process of gathering and analyzing data in diverse research domains, providing unmatched adaptability and effectiveness. This paper presents a thorough examination of Unmanned Aerial Vehicle (UAV) datasets, emphasizing their wide range of applications and progress. UAV datasets consist of various types of data, such as satellite imagery, images captured by drones, and videos. These datasets can be categorized as either unimodal or multimodal, offering a wide range of detailed and comprehensive information. These datasets play a crucial role in disaster damage assessment, aerial surveillance, object recognition, and tracking. They facilitate the development of sophisticated models for tasks like semantic segmentation, pose estimation, vehicle re‑identification, and gesture recognition. By leveraging UAV datasets, researchers can significantly enhance the capabilities of computer vision models, thereby advancing technology and improving our understanding of complex, dynamic environments from an aerial perspective. This review aims to encapsulate the multifaceted utility of UAV datasets, emphasizing their pivotal role in driving innovation and practical applications in multiple domains.
Authors: Weichao Pan, Xu Wang, Wenqing Huan
Abstract: Unmanned Aerial Vehicle (UAV)‑based Road Damage Detection (RDD) is important for daily maintenance and safety in cities, especially in terms of significantly reducing labor costs. However, current UAV‑based RDD research still faces many challenges. For example, the irregular size and direction of damage, the masking of damage by the background, and the difficulty of distinguishing damage from the background significantly affect the ability of UAVs to detect road damage in daily inspection. To solve these problems and improve the performance of UAVs in real‑time road damage detection, we design and propose three corresponding modules: a feature extraction module that flexibly adapts to shape and background; a module that fuses multiscale perception and adapts to shape and background; and an efficient downsampling module. Based on these modules, we designed a multi‑scale, adaptive road damage detection model with the ability to automatically remove background interference, called Dynamic Scale‑Aware Fusion Detection Model (RT‑DSAFDet). Experimental results on the UAV‑PDD2023 public dataset show that our model RT‑DSAFDet achieves a mAP50 of 54.2%, which is 11.1% higher than that of YOLOv10‑m, an efficient variant of the latest real‑time object detection model YOLOv10, while the number of parameters is reduced to 1.8M and FLOPs to 4.6G, decreases of 88% and 93%, respectively. Furthermore, results on the large generalized object detection public dataset MS COCO2017 also show the superiority of our model: its mAP50‑95 is the same as that of YOLOv9‑t, but with 0.5% higher mAP50, 10% fewer parameters, and 40% fewer FLOPs.
Authors: Xiao-Wei Tang, Yunmei Shi, Yi Huang, Qingqing Wu
Abstract: Recently, movable antennas (MAs) have garnered immense attention due to their capability to favorably alter channel conditions through agile movement. In this letter, we delve into a spectrum sharing system enabled by unmanned aerial vehicle (UAV) mounted MAs, thereby introducing a new degree of freedom vertically alongside the horizontal local mobility for MAs. Our objective is to maximize the minimum beamforming gain for secondary users (SUs) while ensuring that interference to the primary users (PUs) remains below a predefined threshold, which necessitates a joint optimization involving the UAV's height, the antenna weight vector (AWV), and the antenna position vector (APV). However, the formulated optimization problem is non‑convex and challenging to solve optimally. To tackle this issue, we propose an alternating optimization algorithm that optimizes the UAV's height, APV and AWV in an iterative manner, thus yielding a near‑optimal solution. Numerical results demonstrate the superiority of the proposed scheme as well as its ability to deliver full beamforming gain to SUs with reduced computational complexity.
Authors: Sourav Raxit, Simant Bahadur Singh, Abdullah Al Redwan Newaz
Abstract: By harnessing fiducial markers as visual landmarks in the environment, Unmanned Aerial Vehicles (UAVs) can rapidly build precise maps and navigate spaces safely and efficiently, unlocking their potential for fluent collaboration and coexistence with humans. Existing fiducial marker methods rely on handcrafted feature extraction, which sacrifices accuracy. On the other hand, deep learning pipelines for marker detection fail to meet real‑time runtime constraints crucial for navigation applications. In this work, we propose YoloTag, a real‑time fiducial marker‑based localization system. YoloTag uses a lightweight YOLO v8 object detector to accurately detect fiducial markers in images while meeting the runtime constraints needed for navigation. The detected markers are then used by an efficient perspective‑n‑point algorithm to estimate UAV states. However, this localization system introduces noise, causing instability in trajectory tracking. To suppress noise, we design a higher‑order Butterworth filter that effectively eliminates noise through frequency domain analysis. We evaluate our algorithm through real‑robot experiments in an indoor environment, comparing the trajectory tracking performance of our method against other approaches in terms of several distance metrics.
Authors: Dexin Duan, Peilin Liu, Bingwei Hui, Fei Wen
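The noise-suppression step mentioned above can be illustrated with a small low-pass Butterworth filter applied to one coordinate of a noisy position estimate. The sketch below is a second-order filter derived with the bilinear transform and written in pure Python; the abstract does not give the filter order, cutoff frequency, or sampling rate used in YoloTag, so all of those values and the class name `ButterworthLowPass2` are assumptions chosen for illustration.

```python
import math


class ButterworthLowPass2:
    """Second-order Butterworth low-pass filter (bilinear transform, direct form I)."""

    def __init__(self, cutoff_hz: float, sample_hz: float):
        if not 0.0 < cutoff_hz < sample_hz / 2.0:
            raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
        k = math.tan(math.pi * cutoff_hz / sample_hz)  # pre-warped cutoff
        norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
        # Feed-forward (b) and feedback (a) coefficients; a0 is normalized to 1.
        self.b0 = k * k * norm
        self.b1 = 2.0 * self.b0
        self.b2 = self.b0
        self.a1 = 2.0 * (k * k - 1.0) * norm
        self.a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x: float) -> float:
        """Filter one sample and return the smoothed value."""
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


# Hypothetical usage: smooth a jittery x-coordinate sampled at 100 Hz.
lowpass = ButterworthLowPass2(cutoff_hz=5.0, sample_hz=100.0)
noisy_x = [1.0 + (0.05 if i % 2 else -0.05) for i in range(200)]
smooth_x = [lowpass.step(x) for x in noisy_x]
```

A higher-order filter, as named in the abstract, could be obtained by cascading several such second-order sections with appropriately chosen coefficients.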
Abstract: On‑device computing, or edge computing, is becoming increasingly important for remote sensing, particularly in applications like deep network‑based perception on on‑orbit satellites and unmanned aerial vehicles (UAVs). In these scenarios, two brain‑like capabilities are crucial for remote sensing models: (1) high energy efficiency, allowing the model to operate on edge devices with limited computing resources, and (2) online adaptation, enabling the model to quickly adapt to environmental variations, weather changes, and sensor drift. This work addresses these needs by proposing an online adaptation framework based on spiking neural networks (SNNs) for remote sensing. Starting with a pretrained SNN model, we design an efficient, unsupervised online adaptation algorithm, which adopts an approximation of the BPTT algorithm and only involves forward‑in‑time computation that significantly reduces the computational complexity of SNN adaptation learning. Besides, we propose an adaptive activation scaling scheme to boost online SNN adaptation performance, particularly in low time‑steps. Furthermore, for the more challenging remote sensing detection task, we propose a confidence‑based instance weighting scheme, which substantially improves adaptation performance in the detection task. To our knowledge, this work is the first to address the online adaptation of SNNs. Extensive experiments on seven benchmark datasets across classification, segmentation, and detection tasks demonstrate that our proposed method significantly outperforms existing domain adaptation and domain generalization approaches under varying weather conditions. The proposed method enables energy‑efficient and fast online adaptation on edge devices, and has much potential in applications such as remote perception on on‑orbit satellites and UAV.
Authors: Matteo Bernabe, David Lopez-Perez, Nicola Piovesan, Giovanni Geraci, David Gesbert
Abstract: In this article, we introduce a method to optimize 5G massive multiple‑input multiple‑output (mMIMO) connectivity for unmanned aerial vehicles (UAVs) on aerial highways through strategic cell association. UAVs operating in 3D space encounter distinct channel conditions compared to traditional ground user equipment (gUE); under the typical line of sight (LoS) condition, UAVs perceive strong reference signal received power (RSRP) from multiple cells within the network, resulting in a large set of suitable serving cell candidates and in low signal‑to‑interference‑plus‑noise ratio (SINR) due to high interference levels. Additionally, a downside of aerial highways is to pack possibly many UAVs along a small portion of space which, when taking into account typical LoS propagation conditions, results in high channel correlation and severely limits spatial multiplexing capabilities.
In this paper, we propose a solution to both problems that selects serving cells using a new metric, which differs from the classical terrestrial approaches based on maximum RSRP. We then introduce an algorithm for optimal planning of synchronization signal block (SSB) beams for this set of cells, ensuring maximum coverage and effective management of UAV cell associations. Simulation results demonstrate that our approach significantly improves the rates of UAVs on aerial highways, up to four times in achievable data rates, without impacting ground user performance.
Authors: Ayodeji O. Abioye, Lisa Bidgood, Sarvapali D. Ramchurn, Mohammad D. Soorati
Abstract: Recent advances in robotics bring us closer to the reality of living, co‑habiting, and sharing personal spaces with robots. However, it is not clear how close a co‑located robot can be to a human in a shared environment without making the human uncomfortable or anxious. This research aims to map safe and comfortable zones for co‑located aerial robots. The objective is to identify the distances at which a drone causes discomfort to a co‑located human and to create a map showing no‑fly, moderate‑fly, and safe‑fly zones. We recruited a total of 18 participants and conducted two indoor laboratory experiments, one with a single drone and the other set with two drones. Our results show that multiple drones cause more discomfort when close to a co‑located human than a single drone. We observed that distances below 200 cm caused discomfort, the moderate fly zone was 200 ‑ 300 cm, and the safe‑fly zone was any distance greater than 300 cm in single drone experiments. The safe zones were pushed further away by 100 cm for the multiple drone experiments. In this paper, we present the preliminary findings on safe‑fly zones for multiple drones. Further work would investigate the impact of a higher number of aerial robots, the speed of approach, direction of travel, and noise level on co‑located humans, and autonomously develop 3D models of trust zones and safe zones for co‑located aerial swarms.
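The distance thresholds reported above can be written as a small lookup. The sketch below treats distances below 200 cm as no-fly, 200 to 300 cm as moderate-fly, and anything beyond 300 cm as safe-fly for a single drone. For multiple drones it shifts both boundaries outward by 100 cm, which is one reading of "pushed further away by 100 cm" and should be taken as an assumption; the function name `classify_zone` is hypothetical.

```python
def classify_zone(distance_cm: float, multiple_drones: bool = False) -> str:
    """Map a human-drone distance to a comfort zone using the reported thresholds."""
    # Assumption: with multiple drones, both zone boundaries move out by 100 cm.
    offset = 100.0 if multiple_drones else 0.0
    if distance_cm < 200.0 + offset:
        return "no-fly"
    if distance_cm <= 300.0 + offset:
        return "moderate-fly"
    return "safe-fly"


zones_single = [classify_zone(d) for d in (150.0, 250.0, 350.0)]
zones_multi = [classify_zone(d, multiple_drones=True) for d in (150.0, 350.0, 450.0)]
```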
Authors: Haowen Yu, Xianqi Liang, Ximin Lyu
Abstract: Unmanned Aerial Vehicles (UAVs) play a crucial role in meteorological research, particularly in environmental wind field measurements. However, several challenges exist in current wind measurement methods using UAVs that need to be addressed. Firstly, the accuracy of measurement is low, and the measurement range is limited. Secondly, the algorithms employed lack robustness and adaptability across different UAV platforms. Thirdly, there are limited approaches available for wind estimation during dynamic flight. Finally, while horizontal plane measurements are feasible, vertical direction estimation is often missing. To tackle these challenges, we present and implement a comprehensive wind estimation algorithm. Our algorithm offers several key features, including the capability to estimate the 3‑D wind vector, enabling wind estimation even during dynamic flight of the UAV. Furthermore, our algorithm exhibits adaptability across various UAV platforms. Experimental results in the wind tunnel validate the effectiveness of our algorithm, showcasing improvements such as wind speed accuracy of 0.11 m/s and wind direction errors of less than 2.8°. Additionally, our approach extends the measurement range to 10 m/s.
Authors: David S. Bolme, Deniz Aykac, Ryan Shivers, Joel Brogan, Nell Barber, Bob Zhang, Laura Davies, David Cornett
Abstract: This paper examines covariate effects on fused whole body biometrics performance in the IARPA BRIAR dataset, specifically focusing on UAV platforms, elevated positions, and distances up to 1000 meters. The dataset includes outdoor videos compared with indoor images and controlled gait recordings. Normalized raw fusion scores relate directly to predicted false accept rates (FAR), offering an intuitive means for interpreting model results. A linear model is developed to predict biometric algorithm scores, analyzing their performance to identify the most influential covariates on accuracy at altitude and range. Weather factors like temperature, wind speed, solar loading, and turbulence are also investigated in this analysis. The study found that resolution and camera distance best predicted accuracy. The findings can guide future research and development efforts in long‑range/elevated/UAV biometrics and support the creation of more reliable and robust systems for national security and other critical domains.
Authors: Till M. Blaha, Ewoud J. J. Smeur, Bart D. W. Remes, Coen C. de Visser
Abstract: Though control algorithms for multirotor Unmanned Air Vehicles (UAVs) are well understood, the configuration, parameter estimation, and tuning of flight control algorithms take considerable time and resources. In previous work, we have shown that it is possible to identify the control effectiveness and motor dynamics of a multirotor fast enough for it to recover to a stable hover after being thrown 4 meters in the air. In this paper, we extend this to include estimation of the position of the Inertial Measurement Unit (IMU) relative to the Center of Gravity (CoG), estimation of the IMU rotation, the thrust direction of all motors and the optimal combined thrust direction. In order to guarantee a correct IMU position estimation, two prior throw‑and‑catches of the vehicle with spin around different axes are required. For these throws, a height as low as 1 meter is sufficient. Quadrotor flight experimentation confirms the efficacy of the approach, and a simulation shows its applicability to fully‑actuated craft with multiple possible hover orientations.
Authors: Animesh Nema, Christopher Grontkowski, Derek Calzada, Sanjuksha Nirgude
Abstract: This project aimed to develop an automated cinematography platform using an unmanned aerial vehicle. Quadcopters are a great platform for shooting aerial scenes but are difficult to maneuver smoothly and can require expertise to pilot. We aim to design an algorithm to enable automated cinematography of a desired object of interest. Given the location of an object and other obstacles in the environment, the drone is able to plan its trajectory while simultaneously keeping the desired object in the video frame and avoiding obstacles. The high maneuverability of quadcopter platforms coupled with the desire for smooth movement and stability from camera platforms means a robust motion planning algorithm must be developed which can take advantage of the quadcopter's abilities while creating motion paths which satisfy the ultimate goal of capturing aerial video. This project aims to research, develop, simulate, and test such an algorithm.
Authors: Baris Yamansavascilar, Atay Ozgovde, Cem Ersoy
Abstract: Air components, including UAVs, planes, balloons, and satellites, have been widely utilized since the fixed capacity of ground infrastructure cannot meet the dynamic load of the users. However, since those air components should be coordinated in order to achieve the desired quality of service, several next‑generation paradigms have been defined, including air computing. Nevertheless, even though many studies and open research issues exist for air computing, the few available test environments cannot satisfy the performance evaluation requirements of the dynamic environment. Therefore, in this study, we introduce our discrete event simulator, AirCompSim, which models an air computing environment considering dynamically changing requirements, loads, and capacities through its modular structure. To show its capabilities, a dynamic capacity enhancement scenario is used for investigating the effect of the number of users, UAVs, and requirements of different application types on the average task success rate, service time, and server utilization. The results demonstrate that AirCompSim can be used for experiments in air computing.
Authors: Lemeng Zhao, Junjie Hu, Jianchao Bi, Yanbing Bai, Erick Mas, Shunichi Koshimura
Abstract: In recent years, unmanned aerial vehicles (UAVs) have played an increasingly crucial role in supporting disaster emergency response efforts by analyzing aerial images. While current deep‑learning models focus on improving accuracy, they often overlook the limited computing resources of UAVs. This study recognizes the imperative for real‑time data processing in disaster response scenarios and introduces a lightweight and efficient approach for aerial video understanding. Our methodology identifies redundant portions within the video through policy networks and eliminates this excess information using frame compression techniques. Additionally, we introduced the concept of a 'station point,' which leverages future information in the sequential policy network, thereby enhancing accuracy. To validate our method, we employed the wildfire FLAME dataset. Compared to the baseline, our approach reduces computation costs by more than 13 times while boosting accuracy by 3%. Moreover, our method can intelligently select salient frames from the video, refining the dataset. This feature enables sophisticated models to be effectively trained on a smaller dataset, significantly reducing the time spent during the training process.
Authors: Wenhao Zhuang, Xinyu He, Yuyi Mao, Juan Liu
Abstract: Future wireless networks are envisioned to support both sensing and artificial intelligence (AI) services. However, conventional integrated sensing and communication (ISAC) networks may not be suitable because they ignore the diverse task‑specific data utilities in different AI applications. In this letter, a full‑duplex unmanned aerial vehicle (UAV)‑enabled wireless network providing sensing and edge learning services is investigated. To maximize the learning performance while ensuring sensing quality, a convergence‑guaranteed iterative algorithm is developed to jointly determine the uplink time allocation, as well as UAV trajectory and transmit power. Simulation results show that the proposed algorithm significantly outperforms the baselines and demonstrate the critical tradeoff between sensing and learning performance.
Authors: Sachithra Atapattu, Oscar De Silva, Thumeera R Wanasinghe, George K I Mann, Raymond G Gosine
Abstract: This study presents a machine learning‑aided approach to accurately estimate the region of attraction (ROA) of a multi‑rotor unmanned aerial vehicle (UAV) controlled using a linear quadratic regulator (LQR) controller. Conventional ROA estimation approaches rely on a nominal dynamic model for ROA calculation, leading to inaccurate estimation due to unknown dynamics and disturbances associated with the physical system. To address this issue, our study utilizes a neural network to predict these unknown disturbances of a planar quadrotor. The nominal model integrated with the learned disturbances is then employed to calculate the ROA of the planar quadrotor using a graphical technique. The estimated ROA is then compared with the ROA calculated using Lyapunov analysis and the graphical approach without incorporating the learned disturbances. The results illustrate that the proposed method provides a more accurate estimation of the ROA, while the conventional Lyapunov‑based estimation tends to be more conservative.
Authors: Yuhang Yang, Xiaoli Xu, Yong Zeng, Haijian Sun, Rose Qingyang Hu
Abstract: Channel knowledge map (CKM) is a promising technology to enable environment‑aware wireless communications and sensing. Link state map (LSM) is one particular type of CKM that aims to learn the location‑specific line‑of‑sight (LoS) link probability between the transmitter and the receiver at all possible locations, which provides the prior information to enhance the communication quality of dynamic networks. This paper investigates the LSM construction for cellular‑connected unmanned aerial vehicles (UAVs) by utilizing both the expert empirical mathematical model and the measurement data. Specifically, we first model the LSM as a binary spatial random field and its initial distribution is obtained by the empirical model. Then we propose an effective binary Bayesian filter to sequentially update the LSM by using the channel measurement. To efficiently update the LSM, we establish the spatial correlation models of LoS probability on the location pairs in both the distance and angular domains, which are adopted in the Bayesian filter for updating the probabilities at locations without measurements. Simulation results demonstrate the effectiveness of the proposed algorithm for LSM construction, which significantly outperforms the benchmark scheme, especially when the measurements are sparse.
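A binary Bayesian filter of the kind named above can be sketched for a single map cell as follows. The sketch assumes each channel measurement has already been reduced to a binary LoS/NLoS decision with known detection and false-alarm probabilities; those two parameters, their default values, and the function name `update_los_probability` are illustrative assumptions. The spatial correlation models that the abstract uses to update locations without measurements are not included.

```python
def update_los_probability(prior: float, observed_los: bool,
                           p_detect: float = 0.9, p_false_alarm: float = 0.2) -> float:
    """Bayes update of P(LoS) at one location from one binary observation.

    p_detect      = P(observe LoS | true LoS)   (assumed value)
    p_false_alarm = P(observe LoS | true NLoS)  (assumed value)
    """
    if observed_los:
        like_los, like_nlos = p_detect, p_false_alarm
    else:
        like_los, like_nlos = 1.0 - p_detect, 1.0 - p_false_alarm
    evidence = like_los * prior + like_nlos * (1.0 - prior)
    return like_los * prior / evidence


# Start from an empirical-model prior and fold in measurements sequentially.
p_los = 0.5
for observation in (True, True, False):
    p_los = update_los_probability(p_los, observation)
```

Applying the update repeatedly implements the sequential filtering described in the abstract for the measured cell only.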