arXiv Papers on Multi-Robot Systems
Authors:Jiahao Wang, Zhongwei Jiang, Wenchao Sun, Jiaru Zhong, Haibao Yu, Yuner Zhang, Chenyang Lu, Chuang Zhang, Lei He, Shaobing Xu, Jianqiang Wang
Abstract:
Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields-of-view. However, current approaches sharing dense Bird's-Eye-View (BEV) features are constrained by quadratically-scaling communication costs and the lack of flexibility and interpretability for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features a trio of innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module for robust fusion; and a cooperative instance denoising task to accelerate and stabilize training. Experiments on V2X-Seq and Griffin datasets show SparseCoop achieves state-of-the-art performance. Notably, it delivers this with superior computational efficiency, low transmission cost, and strong robustness to communication latency. Code is available at https://github.com/wang-jh18-SVM/SparseCoop.
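As an illustration of the kind of alignment an explicit kinematic state makes possible, the sketch below propagates a remote instance query (3D center plus velocity) across a known communication latency with a constant-velocity model and maps it into the ego frame. The function name, the constant-velocity assumption, and the known relative transform are illustrative assumptions, not SparseCoop's actual implementation.

```python
import numpy as np

def align_remote_query(center, velocity, latency_s, T_remote_to_ego):
    """Propagate a remote instance query's explicit state across a known latency
    with a constant-velocity model, then map it into the ego frame.

    center, velocity: (3,) arrays in the remote agent's frame
    T_remote_to_ego: (4, 4) homogeneous transform (assumed known from localization)
    """
    # Constant-velocity temporal compensation (an assumption; the paper's
    # kinematic grounding may use a richer motion model).
    predicted = center + velocity * latency_s

    # Spatial alignment into the ego frame.
    p_h = np.append(predicted, 1.0)
    center_ego = (T_remote_to_ego @ p_h)[:3]
    velocity_ego = T_remote_to_ego[:3, :3] @ velocity
    return center_ego, velocity_ego

# Example: a remote detection 0.2 s old, moving at 5 m/s along x.
T = np.eye(4); T[:3, 3] = [10.0, -2.0, 0.0]   # hypothetical relative pose
c, v = align_remote_query(np.array([3.0, 1.0, 0.5]),
                          np.array([5.0, 0.0, 0.0]), 0.2, T)
print(c, v)
```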
Authors:Yue Pan, Tao Sun, Liyuan Zhu, Lucas Nunes, Iro Armeni, Jens Behley, Cyrill Stachniss
Abstract:
Point cloud registration aligns multiple unposed point clouds into a common frame, and is a core step for 3D reconstruction and robot localization. In this work, we cast registration as conditional generation: a learned continuous, point-wise velocity field transports noisy points to a registered scene, from which the pose of each view is recovered. Unlike previous methods that conduct correspondence matching to estimate the transformation between a pair of point clouds and then optimize the pairwise transformations to realize multi-view registration, our model directly generates the registered point cloud. With a lightweight local feature extractor and test-time rigidity enforcement, our approach achieves state-of-the-art results on pairwise and multi-view registration benchmarks, particularly with low overlap, and generalizes across scales and sensor modalities. It further supports downstream tasks including relocalization, multi-robot SLAM, and multi-session map merging. Source code available at: https://github.com/PRBonn/RAP.
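Because the model generates the registered point cloud rather than a transformation directly, each view's pose can afterwards be recovered with a closed-form rigid fit between the view's input points and their transported counterparts. The sketch below shows such a Kabsch-style fit; it is an assumed illustration of that recovery step, not code from the paper.

```python
import numpy as np

def kabsch_pose(src, dst):
    """Closed-form rigid fit (R, t) minimizing ||R @ src_i + t - dst_i||.
    src, dst: (N, 3) corresponding points (original view vs. generated scene)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # reflection-safe rotation
    t = mu_d - R @ mu_s
    return R, t

# Toy check: recover a known rotation and translation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
moved = pts @ R_true.T + np.array([1.0, -0.5, 2.0])
R, t = kabsch_pose(pts, moved)
print(np.allclose(R, R_true, atol=1e-6), t)
```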
Authors:Arthur Ji Sung Baek, Geoffrey Martin
Abstract:
We present X-SYCON, a xylem-inspired multi-agent architecture in which coordination emerges from passive field dynamics rather than explicit planning or communication. Incidents (demands) and obstructions (hazards) continually write diffusing and decaying scalar fields, and agents greedily ascend a local utility $U=\phi_{\mathrm{DE}}-\kappa\,\phi_{\mathrm{HZ}}$ with light anti-congestion and separation. A beaconing rule triggered on first contact temporarily deepens the local demand sink, accelerating completion without reducing time-to-first-response. Across dynamic, partially blocked simulated environments, we observe low miss rates and stable throughput with interpretable, tunable trade-offs over carrier count, arrival rate, hazard density, and hazard sensitivity $\kappa$. We derive that a characteristic hydraulic length scale $\ell\approx\sqrt{D/\lambda}$ predicts recruitment range in a continuum approximation, and we provide a work-conservation (Ohm-law) bound consistent with sublinear capacity scaling with team size. Empirically: (i) soft hazard penalties yield fewer misses when obstacles already block motion; (ii) throughput saturates sublinearly with carriers while reliability improves sharply; (iii) stronger arrivals can reduce misses by sustaining sinks that recruit help; and (iv) phase-stability regions shrink with hazard density but are recovered by more carriers or higher arrivals. We refer to X-SYCON as an instance of Distributed Passive Computation and Control, and we evaluate it in simulations modeling communication-denied disaster response and other constrained sensing-action regimes.
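A minimal sketch of the passive field mechanism described above: demand and hazard fields diffuse and decay on a grid, and an agent greedily ascends the local utility $U=\phi_{\mathrm{DE}}-\kappa\,\phi_{\mathrm{HZ}}$. The grid discretization, periodic boundaries, and parameter values are assumptions made only for illustration.

```python
import numpy as np

def step_field(phi, D=0.2, lam=0.05, source=None):
    """One explicit diffusion-decay update: phi <- phi + D*Laplacian(phi) - lam*phi (+ source).
    Periodic boundaries via np.roll are a simplification for this sketch."""
    lap = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
           np.roll(phi, 1, 1) + np.roll(phi, -1, 1) - 4.0 * phi)
    phi = phi + D * lap - lam * phi
    return phi if source is None else phi + source

def greedy_step(pos, phi_de, phi_hz, kappa=1.0):
    """Move to the 4-neighbour (or stay) with the highest local utility U = phi_DE - kappa*phi_HZ."""
    rows, cols = phi_de.shape
    best, best_u = pos, -np.inf
    for dr, dc in [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]:
        r, c = (pos[0] + dr) % rows, (pos[1] + dc) % cols
        u = phi_de[r, c] - kappa * phi_hz[r, c]
        if u > best_u:
            best, best_u = (r, c), u
    return best

# Demand written by an incident at (5, 5); hazard field left empty in this toy run.
phi_de, phi_hz = np.zeros((20, 20)), np.zeros((20, 20))
src = np.zeros((20, 20)); src[5, 5] = 1.0
agent = (15, 15)
for _ in range(200):
    phi_de = step_field(phi_de, source=src)
    agent = greedy_step(agent, phi_de, phi_hz)
print("agent position after 200 steps:", agent)
print("characteristic length ell ~ sqrt(D/lambda) =", np.sqrt(0.2 / 0.05))
```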
Authors:Weining Lu, Deer Bin, Lian Ma, Ming Ma, Zhihao Ma, Xiangyang Chen, Longfei Wang, Yixiao Feng, Zhouxian Jiang, Yongliang Shi, Bin Liang
Abstract:
Efficient, accurate, and flexible relative localization is crucial in air-ground collaborative tasks. However, current approaches for robot relative localization are primarily realized in the form of distributed multi-robot SLAM systems with the same sensor configuration, which are tightly coupled with the state estimation of all robots, limiting both flexibility and accuracy. To this end, we fully leverage the high capacity of Unmanned Ground Vehicle (UGV) to integrate multiple sensors, enabling a semi-distributed cross-modal air-ground relative localization framework. In this work, both the UGV and the Unmanned Aerial Vehicle (UAV) independently perform SLAM while extracting deep learning-based keypoints and global descriptors, which decouples the relative localization from the state estimation of all agents. The UGV employs a local Bundle Adjustment (BA) with LiDAR, camera, and an IMU to rapidly obtain accurate relative pose estimates. The BA process adopts sparse keypoint optimization and is divided into two stages: First, optimizing camera poses interpolated from LiDAR-Inertial Odometry (LIO), followed by estimating the relative camera poses between the UGV and UAV. Additionally, we implement an incremental loop closure detection algorithm using deep learning-based descriptors to maintain and retrieve keyframes efficiently. Experimental results demonstrate that our method achieves outstanding performance in both accuracy and efficiency. Unlike traditional multi-robot SLAM approaches that transmit images or point clouds, our method only transmits keypoint pixels and their descriptors, effectively constraining the communication bandwidth under 0.3 Mbps. Codes and data will be publicly available on https://github.com/Ascbpiac/cross-model-relative-localization.git.
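A back-of-envelope check of why transmitting only keypoint pixels and descriptors can stay within a sub-0.3 Mbps budget; every number below (keypoint count, descriptor size, keyframe rate) is an illustrative assumption rather than a value reported by the authors.

```python
# Back-of-envelope bandwidth for sending only keypoint pixels + descriptors.
# All numbers are illustrative assumptions, not values from the paper.
n_keypoints = 400            # keypoints per transmitted keyframe
bytes_per_coord = 4          # two 16-bit pixel coordinates
bytes_per_descriptor = 32    # e.g., a compact binary descriptor
keyframes_per_second = 2

bits_per_second = n_keypoints * (bytes_per_coord + bytes_per_descriptor) * 8 * keyframes_per_second
print(f"{bits_per_second / 1e6:.2f} Mbps")   # 0.23 Mbps, comfortably below 0.3 Mbps
```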
Authors:Karthikeyan Chandra Sekaran, Markus Geisler, Dominik Rößle, Adithya Mohan, Daniel Cremers, Wolfgang Utschick, Michael Botsch, Werner Huber, Torsten Schön
Abstract:
Recent cooperative perception datasets have played a crucial role in advancing smart mobility applications by enabling information exchange between intelligent agents, helping to overcome challenges such as occlusions and improving overall scene understanding. While some existing real-world datasets incorporate both vehicle-to-vehicle and vehicle-to-infrastructure interactions, they are typically limited to a single intersection or a single vehicle. A comprehensive perception dataset featuring multiple connected vehicles and infrastructure sensors across several intersections remains unavailable, limiting the benchmarking of algorithms in diverse traffic environments. Consequently, overfitting can occur, and models may demonstrate misleadingly high performance due to similar intersection layouts and traffic participant behavior. To address this gap, we introduce UrbanIng-V2X, the first large-scale, multi-modal dataset supporting cooperative perception involving vehicles and infrastructure sensors deployed across three urban intersections in Ingolstadt, Germany. UrbanIng-V2X consists of 34 temporally aligned and spatially calibrated sensor sequences, each lasting 20 seconds. All sequences contain recordings from one of three intersections, involving two vehicles and up to three infrastructure-mounted sensor poles operating in coordinated scenarios. In total, UrbanIng-V2X provides data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs. All sequences are annotated at a frequency of 10 Hz with 3D bounding boxes spanning 13 object classes, resulting in approximately 712k annotated instances across the dataset. We provide comprehensive evaluations using state-of-the-art cooperative perception methods and publicly release the codebase, dataset, HD map, and a digital twin of the complete data collection environment.
Authors:Sai Krishna Ghanta, Ramviyas Parasuraman
Abstract:
We consider the distributed pose-graph optimization (PGO) problem, which is fundamental in accurate trajectory estimation in multi-robot simultaneous localization and mapping (SLAM). Conventional iterative approaches linearize a highly non-convex optimization objective, requiring repeated solving of normal equations, which often converge to local minima and thus produce suboptimal estimates. We propose a scalable, outlier-robust distributed planar PGO framework using Multi-Agent Reinforcement Learning (MARL). We cast distributed PGO as a partially observable Markov game defined on local pose-graphs, where each action refines a single edge's pose estimate. A graph partitioner decomposes the global pose graph, and each robot runs a recurrent edge-conditioned Graph Neural Network (GNN) encoder with adaptive edge-gating to denoise noisy edges. Robots sequentially refine poses through a hybrid policy that utilizes prior action memory and graph embeddings. After local graph correction, a consensus scheme reconciles inter-robot disagreements to produce a globally consistent estimate. Our extensive evaluations on a comprehensive suite of synthetic and real-world datasets demonstrate that our learned MARL-based actors reduce the global objective by an average of 37.5% more than the state-of-the-art distributed PGO framework, while enhancing inference efficiency by at least 6X. We also demonstrate that actor replication allows a single learned policy to scale effortlessly to substantially larger robot teams without any retraining. Code is publicly available at https://github.com/herolab-uga/policies-over-poses.
Authors:Rohan Gupta, Trevor Asbery, Zain Merchant, Abrar Anwar, Jesse Thomason
Abstract:
Coordinating heterogeneous robot fleets to achieve multiple goals is challenging in multi-robot systems. We introduce RobotFleet, an open-source and extensible framework for centralized multi-robot task planning and scheduling that leverages LLMs to enable fleets of heterogeneous robots to accomplish multiple tasks. RobotFleet provides abstractions for planning, scheduling, and execution across robots deployed as containerized services to simplify fleet scaling and management. The framework maintains a shared declarative world state and two-way communication for task execution and replanning. By modularizing each layer of the autonomy stack and using LLMs for open-world reasoning, RobotFleet lowers the barrier to building scalable multi-robot systems. The code can be found here: https://github.com/therohangupta/robot-fleet.
Authors:Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith
Abstract:
Current state-of-the-art autonomous vehicles could face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving has further adopted an approach that incorporates a Multimodal Large Language Model (MLLM) to integrate cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes our proposed novel ideas of occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks. Our project website: https://eddyhkchiu.github.io/v2vgot.github.io/ .
Authors:Seth Z. Zhao, Huizhi Zhang, Zhaowei Li, Juntong Peng, Anthony Chui, Zewei Zhou, Zonglin Meng, Hao Xiang, Zhiyu Huang, Fujia Wang, Ran Tian, Chenfeng Xu, Bolei Zhou, Jiaqi Ma
Abstract:
Cooperative perception through Vehicle-to-Everything (V2X) communication offers significant potential for enhancing vehicle perception by mitigating occlusions and expanding the field of view. However, past research has predominantly focused on improving accuracy metrics without addressing the crucial system-level considerations of efficiency, latency, and real-world deployability. Notably, most existing systems rely on full-precision models, which incur high computational and transmission costs, making them impractical for real-time operation in resource-constrained environments. In this paper, we introduce QuantV2X, the first fully quantized multi-agent system designed specifically for efficient and scalable deployment of multi-modal, multi-agent V2X cooperative perception. QuantV2X introduces a unified end-to-end quantization strategy across both neural network models and transmitted message representations that simultaneously reduces computational load and transmission bandwidth. Remarkably, despite operating under low-bit constraints, QuantV2X achieves accuracy comparable to full-precision systems. More importantly, when evaluated under deployment-oriented metrics, QuantV2X reduces system-level latency by 3.2× and achieves a +9.5 improvement in mAP30 over full-precision baselines. Furthermore, QuantV2X scales more effectively, enabling larger and more capable models to fit within strict memory budgets. These results highlight the viability of a fully quantized multi-agent intermediate fusion system for real-world deployment. The system will be publicly released to promote research in this field: https://github.com/ucla-mobility/QuantV2X.
Authors:Yuxiao Zhu, Junfeng Chen, Xintong Zhang, Meng Guo, Zhongkui Li
Abstract:
Online coordination of multi-robot systems in open and unknown environments faces significant challenges, particularly when semantic features detected during operation dynamically trigger new tasks. Recent large language model (LLM)-based approaches for scene reasoning and planning primarily focus on one-shot, end-to-end solutions in known environments, lacking both dynamic adaptation capabilities for online operation and explainability in the planning process. To address these issues, we propose DEXTER-LLM, a novel framework for dynamic task planning in unknown environments that integrates four modules: (i) a mission comprehension module that resolves the partial ordering of tasks specified by natural language or linear temporal logic (LTL) formulas; (ii) an online subtask generator based on LLMs that improves the accuracy and explainability of task decomposition via multi-stage reasoning; (iii) an optimal subtask assigner and scheduler that allocates subtasks to robots via search-based optimization; and (iv) a dynamic adaptation and human-in-the-loop verification module that implements multi-rate, event-based updates for both subtasks and their assignments to cope with new features and tasks detected online. The framework effectively combines LLMs' open-world reasoning capabilities with the optimality of model-based assignment methods, simultaneously addressing the critical issues of online adaptability and explainability. Experimental evaluations demonstrate exceptional performance, with 100% success rates across all scenarios, 160 tasks and 480 subtasks completed on average (3 times the baselines), 62% fewer queries to LLMs during adaptation, and superior plan quality (2 times higher) for compound tasks. Project page at https://tcxm.github.io/DEXTER-LLM/
Authors:Zhuoli Tian, Yuyang Zhang, Jinsheng Wei, Meng Guo
Abstract:
Fleets of autonomous robots are increasingly deployed alongside multiple human operators to explore unknown environments, identify salient features, and perform complex tasks in scenarios such as subterranean exploration, reconnaissance, and search-and-rescue missions. In these contexts, communication is often severely limited to short-range exchanges via ad-hoc networks, posing challenges to coordination. While recent studies have addressed multi-robot exploration under communication constraints, they largely overlook the essential role of human operators and their real-time interaction with robotic teams. Operators may demand timely updates on the exploration progress and robot status, reprioritize or cancel tasks dynamically, or request live video feeds and control access. Conversely, robots may seek human confirmation for anomalous events or require help recovering from motion or planning failures. To enable such bilateral, context-aware interactions under restricted communication, this work proposes MoRoCo, a unified framework for online coordination and exploration in multi-operator, multi-robot systems. MoRoCo enables the team to adaptively switch among three coordination modes: spread mode for parallelized exploration with intermittent data sharing, migrate mode for coordinated relocation, and chain mode for maintaining high-bandwidth connectivity through multi-hop links. These transitions are managed through distributed algorithms via only local communication. Extensive large-scale human-in-the-loop simulations and hardware experiments validate the necessity of incorporating human-robot interactions and demonstrate that MoRoCo enables efficient, reliable coordination under limited communication, marking a significant step toward robust human-in-the-loop multi-robot autonomy in challenging environments.
Authors:Jiaru Zhong, Jiahao Wang, Jiahui Xu, Xiaofan Li, Zaiqing Nie, Haibao Yu
Abstract:
Cooperative perception aims to address the inherent limitations of single-vehicle autonomous driving systems through information exchange among multiple agents. Previous research has primarily focused on single-frame perception tasks. However, the more challenging cooperative sequential perception tasks, such as cooperative 3D multi-object tracking, have not been thoroughly investigated. Therefore, we propose CoopTrack, a fully instance-level end-to-end framework for cooperative tracking, featuring learnable instance association, which fundamentally differs from existing approaches. CoopTrack transmits sparse instance-level features that significantly enhance perception capabilities while maintaining low transmission costs. Furthermore, the framework comprises two key components: Multi-Dimensional Feature Extraction, and Cross-Agent Association and Aggregation, which collectively enable comprehensive instance representation with semantic and motion features, and adaptive cross-agent association and fusion based on a feature graph. Experiments on both the V2X-Seq and Griffin datasets demonstrate that CoopTrack achieves excellent performance. Specifically, it attains state-of-the-art results on V2X-Seq, with 39.0% mAP and 32.8% AMOTA. The project is available at https://github.com/zhongjiaru/CoopTrack.
Authors:Kostas Karakontis, Thanos Petsanis, Athanasios Ch. Kapoutsis, Pavlos Ch. Kapoutsis, Elias B. Kosmatopoulos
Abstract:
Multi-UAV Coverage Path Planning (mCPP) algorithms in popular commercial software typically treat a Region of Interest (RoI) only as a 2D plane, ignoring important 3D structure characteristics. This leads to incomplete 3D reconstructions, especially around occluded or vertical surfaces. In this paper, we propose a modular algorithm that can extend commercial two-dimensional path planners to facilitate terrain-aware planning by adjusting altitude and camera orientations. To demonstrate it, we extend the well-known DARP (Divide Areas for Optimal Multi-Robot Coverage Path Planning) algorithm and produce DARP-3D. We present simulation results in multiple 3D environments and a real-world flight test using DJI hardware. Compared to the baseline, our approach consistently captures improved 3D reconstructions, particularly in areas with significant vertical features. An open-source implementation of the algorithm is available here: https://github.com/konskara/TerraPlan
Authors:Seokhwan Jeong, Hogyun Kim, Younggun Cho
Abstract:
This paper presents a novel spherical target-based LiDAR-camera extrinsic calibration method designed for outdoor environments with multi-robot systems, considering both target and sensor corruption. The method extracts the 2D ellipse center from the image and the 3D sphere center from the pointcloud, which are then paired to compute the transformation matrix. Specifically, the image is first decomposed using the Segment Anything Model (SAM). Then, a novel algorithm extracts an ellipse from a potentially corrupted sphere, and the extracted ellipse center is corrected for errors caused by the perspective projection model. For the LiDAR pointcloud, points on the sphere tend to be highly noisy due to the absence of flat regions. To accurately extract the sphere from these noisy measurements, we apply a hierarchical weighted sum to the accumulated pointcloud. Through experiments, we demonstrated that the sphere can be robustly detected even under both types of corruption, outperforming other targets. We evaluated our method using three different types of LiDARs (spinning, solid-state, and non-repetitive) with cameras positioned in three different locations. Furthermore, we validated the robustness of our method to target corruption by experimenting with spheres subjected to various types of degradation. These experiments were conducted in both a planetary test environment and a field environment. Our code is available at https://github.com/sparolab/MARSCalib.
Authors:Sai Krishna Ghanta, Ramviyas Parasuraman
Abstract:
Relative localization is a crucial capability for multi-robot systems operating in GPS-denied environments. Existing approaches for multi-robot relative localization often depend on costly or short-range sensors like cameras and LiDARs. Consequently, these approaches face challenges such as high computational overhead (e.g., map merging) and difficulties in disjoint environments. To address these limitations, this paper introduces MGPRL, a novel distributed framework for multi-robot relative localization using the convex hull of multiple Wi-Fi access points (APs). To accomplish this, we employ co-regionalized multi-output Gaussian Processes for efficient Radio Signal Strength Indicator (RSSI) field prediction and perform uncertainty-aware multi-AP localization, which is further coupled with weighted convex hull-based alignment for robust relative pose estimation. Each robot predicts the RSSI field of the environment by an online scan of APs in its environment, which are utilized for position estimation of multiple APs. To perform relative localization, each robot aligns the convex hull of its predicted AP locations with those of neighboring robots. This approach is well-suited for devices with limited computational resources and operates solely on widely available Wi-Fi RSSI measurements without necessitating any dedicated pre-calibration or offline fingerprinting. We rigorously evaluate the performance of the proposed MGPRL in ROS simulations and demonstrate it with real-world experiments, comparing it against multiple state-of-the-art approaches. The results showcase that MGPRL outperforms existing methods in terms of localization accuracy and computational efficiency. Finally, we open source MGPRL as a ROS package: https://github.com/herolab-uga/MGPRL.
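The relative-pose step can be illustrated with a small weighted rigid alignment between two robots' matched AP position estimates (hull vertices). The confidence weighting and the assumption of known vertex correspondences are simplifications for illustration and may differ from MGPRL's actual alignment procedure.

```python
import numpy as np

def weighted_align_2d(A, B, w):
    """Weighted rigid alignment (R, t) mapping robot A's AP estimates onto robot B's.
    A, B: (N, 2) matched AP positions (hull vertices); w: (N,) confidence weights
    (e.g., inverse predictive variance). Returns rotation R (2x2) and translation t (2,)."""
    w = w / w.sum()
    mu_a, mu_b = w @ A, w @ B
    H = (A - mu_a).T @ np.diag(w) @ (B - mu_b)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T        # reflection-safe 2D rotation
    t = mu_b - R @ mu_a
    return R, t

# Toy example: the same three APs seen from two robot frames.
A = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0]])
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
B = A @ R_true.T + np.array([2.0, -1.0])
R, t = weighted_align_2d(A, B, w=np.array([1.0, 0.5, 2.0]))
print(np.round(R, 3), np.round(t, 3))         # recovers the relative pose
```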
Authors:Guobin Zhu, Rui Zhou, Wenkang Ji, Shiyu Zhao
Abstract:
Although Multi-Agent Reinforcement Learning (MARL) is effective for complex multi-robot tasks, it suffers from low sample efficiency and requires iterative manual reward tuning. Large Language Models (LLMs) have shown promise in single-robot settings, but their application in multi-robot systems remains largely unexplored. This paper introduces a novel LLM-Aided MARL (LAMARL) approach, which integrates MARL with LLMs, significantly enhancing sample efficiency without requiring manual design. LAMARL consists of two modules: the first module leverages LLMs to fully automate the generation of prior policy and reward functions. The second module is MARL, which uses the generated functions to guide robot policy training effectively. On a shape assembly benchmark, both simulation and real-world experiments demonstrate the unique advantages of LAMARL. Ablation studies show that the prior policy improves sample efficiency by an average of 185.9% and enhances task completion, while structured prompts based on Chain-of-Thought (CoT) and basic APIs improve LLM output success rates by 28.5%-67.5%. Videos and code are available at https://windylab.github.io/LAMARL/
Authors:Doncey Albin, Miles Mena, Annika Thomas, Harel Biggie, Xuefei Sun, Dusty Woods, Steve McGuire, Christoffer Heckman
Abstract:
Multi-robot systems (MRSs) are valuable for tasks such as search and rescue due to their ability to coordinate over shared observations. A central challenge in these systems is aligning independently collected perception data across space and time, i.e., multi-robot data association. While recent advances in collaborative SLAM (C-SLAM), map merging, and inter-robot loop closure detection have significantly progressed the field, evaluation strategies still predominantly rely on splitting a single trajectory from single-robot SLAM datasets into multiple segments to simulate multiple robots. Without careful consideration of how a single trajectory is split, this approach will fail to capture realistic pose-dependent variation in observations of a scene inherent to multi-robot systems. To address this gap, we present CU-Multi, a multi-robot dataset collected over multiple days at two locations on the University of Colorado Boulder campus. Using a single robotic platform, we generate four synchronized runs with aligned start times and deliberate percentages of trajectory overlap. CU-Multi includes RGB-D, GPS with accurate geospatial heading, and semantically annotated LiDAR data. By introducing controlled variations in trajectory overlap and dense LiDAR annotations, CU-Multi offers a compelling alternative for evaluating methods in multi-robot data association. Instructions on accessing the dataset, support code, and the latest updates are publicly available at https://arpg.github.io/cumulti
Authors:Hogyun Kim, Jiwon Choi, Juwon Kim, Geonmo Yang, Dongjin Cho, Hyungtae Lim, Younggun Cho
Abstract:
Distributed LiDAR SLAM is crucial for achieving efficient robot autonomy and improving the scalability of mapping. However, two issues need to be considered when applying it in field environments: one is resource limitation, and the other is inter/intra-robot association. The resource limitation issue arises when the data size exceeds the processing capacity of the network or memory, especially when utilizing communication systems or onboard computers in the field. The inter/intra-robot association issue occurs due to the narrow convergence region of ICP under large viewpoint differences, triggering many false positive loops and ultimately resulting in an inconsistent global map for multi-robot systems. To tackle these problems, we propose a distributed LiDAR SLAM framework designed for versatile field applications, called SKiD-SLAM. Extending our previous work that solely focused on lightweight place recognition and fast and robust global registration, we present a multi-robot mapping framework that focuses on robust and lightweight inter-robot loop closure in distributed LiDAR SLAM. Through various environmental experiments, we demonstrate that our method is more robust and lightweight compared to other state-of-the-art distributed SLAM approaches, overcoming resource limitation and inter/intra-robot association issues. Also, we validated the field applicability of our approach through mapping experiments in real-world planetary emulation terrain and cave environments, which are in-house datasets. Our code will be available at https://sparolab.github.io/research/skid_slam/.
Authors:Shalin Anand Jain, Jiazhen Liu, Siva Kailas, Harish Ravichandar
Abstract:
Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluation of their individual contributions. The Multi-Agent RL Benchmark and Learning Environment for the Robotarium (MARBLER) is an exciting recent step in providing a standardized robotics-relevant platform for MARL, by bridging the Robotarium testbed with existing MARL software infrastructure. However, MARBLER lacks support for parallelization and GPU/TPU execution, making the platform prohibitively slow compared to modern MARL environments and hindering adoption. We contribute JaxRobotarium, a Jax-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot RL (MRRL) policies with realistic robot dynamics and safety constraints, supporting parallelization and hardware acceleration. Our generalizable learning interface integrates easily with SOTA MARL libraries (e.g., JaxMARL). In addition, JaxRobotarium includes eight standardized coordination scenarios, including four novel scenarios that bring established MARL benchmark tasks (e.g., RWARE and Level-Based Foraging) to a robotics setting. We demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baseline (20x in training and 150x in simulation), and provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation. Our code is available at https://github.com/GT-STAR-Lab/JaxRobotarium.
Authors:Wenkang Ji, Huaben Chen, Mingyang Chen, Guobin Zhu, Lufeng Xu, Roderich Groß, Rui Zhou, Ming Cao, Shiyu Zhao
Abstract:
The development of control policies for multi-robot systems traditionally follows a complex and labor-intensive process, often lacking the flexibility to adapt to dynamic tasks. This has motivated research on methods to automatically create control policies. However, these methods require iterative processes of manually crafting and refining objective functions, thereby prolonging the development cycle. This work introduces GenSwarm, an end-to-end system that leverages large language models to automatically generate and deploy control policies for multi-robot tasks based on simple user instructions in natural language. As a multi-language-agent system, GenSwarm achieves zero-shot learning, enabling rapid adaptation to altered or unseen tasks. The white-box nature of the code policies ensures strong reproducibility and interpretability. With its scalable software and hardware architectures, GenSwarm supports efficient policy deployment on both simulated and real-world multi-robot systems, realizing an instruction-to-execution end-to-end functionality that could prove valuable for robotics specialists and non-specialists alike. The code of the proposed GenSwarm system is available online: https://github.com/WindyLab/GenSwarm.
Authors:Heng Zhang, Guoxiang Zhao, Xiaoqiang Ren
Abstract:
The pursuit-evasion (PE) problem is a critical challenge in multi-robot systems (MRS). While reinforcement learning (RL) has shown its promise in addressing PE tasks, research has primarily focused on single-target pursuit, with limited exploration of multi-target encirclement, particularly in large-scale settings. This paper proposes a Transformer-Enhanced Reinforcement Learning (TERL) framework for large-scale multi-target encirclement. By integrating a transformer-based policy network with target selection, TERL enables robots to adaptively prioritize targets and coordinate safely. Results show that TERL outperforms existing RL-based methods in terms of encirclement success rate and task completion time, while maintaining good performance in large-scale scenarios. Notably, TERL, trained on small-scale scenarios (15 pursuers, 4 targets), generalizes effectively to large-scale settings (80 pursuers, 20 targets) without retraining, achieving a 100% success rate. The code and demonstration video are available at https://github.com/ApricityZ/TERL.
Authors:Yoshiki Yano, Kazuki Shibata, Maarten Kokshoorn, Takamitsu Matsubara
Abstract:
Recent advances in Large Language Models (LLMs) have permitted the development of language-guided multi-robot systems, which allow robots to execute tasks based on natural language instructions. However, achieving effective coordination in distributed multi-agent environments remains challenging due to (1) misalignment between instructions and task requirements and (2) inconsistency in robot behaviors when they independently interpret ambiguous instructions. To address these challenges, we propose Instruction-Conditioned Coordinator (ICCO), a Multi-Agent Reinforcement Learning (MARL) framework designed to enhance coordination in language-guided multi-robot systems. ICCO consists of a Coordinator agent and multiple Local Agents, where the Coordinator generates Task-Aligned and Consistent Instructions (TACI) by integrating language instructions with environmental states, ensuring task alignment and behavioral consistency. The Coordinator and Local Agents are jointly trained to optimize a reward function that balances task efficiency and instruction following. A Consistency Enhancement Term is added to the learning objective to maximize mutual information between instructions and robot behaviors, further improving coordination. Simulation and real-world experiments validate the effectiveness of ICCO in achieving language-guided task-aligned multi-robot control. The demonstration can be found at https://yanoyoshiki.github.io/ICCO/.
Authors:Runze Xiao, Yongdong Wang, Yusuke Tsunoda, Koichi Osuka, Hajime Asama
Abstract:
Navigating unknown three-dimensional (3D) rugged environments is challenging for multi-robot systems. Traditional discrete systems struggle with rough terrain due to limited individual mobility, while modular systems--where rigid, controllable constraints link robot units--improve traversal but suffer from high control complexity and reduced flexibility. To address these limitations, we propose the Multi-Robot System with Controllable Weak Constraints (MRS-CWC), where robot units are connected by constraints with dynamically adjustable stiffness. This adaptive mechanism softens or stiffens in real-time during environmental interactions, ensuring a balance between flexibility and mobility. We formulate the system's dynamics and control model and evaluate MRS-CWC against six baseline methods and an ablation variant in a benchmark dataset with 100 different simulation terrains. Results show that MRS-CWC achieves the highest navigation completion rate and ranks second in success rate, efficiency, and energy cost in the highly rugged terrain group, outperforming all baseline methods without relying on environmental modeling, path planning, or complex control. Even where MRS-CWC ranks second, its performance is only slightly behind a more complex ablation variant with environmental modeling and path planning. Finally, we develop a physical prototype and validate its feasibility in a constructed rugged environment. For videos, simulation benchmarks, and code, please visit https://wyd0817.github.io/project-mrs-cwc/.
Authors:Jiahao Wang, Xiangyu Cao, Jiaru Zhong, Yuner Zhang, Haibao Yu, Lei He, Shaobing Xu
Abstract:
Despite significant advancements, autonomous driving systems continue to struggle with occluded objects and long-range detection due to the inherent limitations of single-perspective sensing. Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations. However, progress in this emerging field has been hindered by the absence of public datasets and standardized evaluation benchmarks. To address this gap, this paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions: (1) Griffin, a large-scale multi-modal dataset featuring over 200 dynamic scenes (30k+ frames) with varied UAV altitudes (20-60m), diverse weather conditions, and occlusion-aware 3D annotations, enhanced by CARLA-AirSim co-simulation for realistic UAV dynamics; (2) A unified benchmarking framework for aerial-ground cooperative detection and tracking tasks, including protocols for evaluating communication efficiency, latency tolerance, and altitude adaptability; (3) AGILE, an instance-level intermediate fusion baseline that dynamically aligns cross-view features through query-based interaction, achieving an advantageous balance between communication overhead and perception accuracy. Extensive experiments prove the effectiveness of aerial-ground cooperative perception and demonstrate the direction of further research. The dataset and codes are available at https://github.com/wang-jh18-SVM/Griffin.
Authors:Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen
Abstract:
Current autonomous driving vehicles rely mainly on their individual sensors to understand surrounding scenes and plan for future trajectories, which can be unreliable when the sensors are malfunctioning or occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on perception tasks like detection or tracking. How those approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates a Multi-Modal LLM into cooperative autonomous driving, with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose our baseline method Vehicle-to-Vehicle Multi-Modal Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer various types of driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and outperforms other baseline methods that use different fusion approaches. Our work also creates a new research direction that can improve the safety of future autonomous driving systems. The code and data will be released to the public to facilitate open-source research in this field. Our project website: https://eddyhkchiu.github.io/v2vllm.github.io/ .
Authors:Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun
Abstract:
Cooperative perception enhances autonomous driving by leveraging Vehicle-to-Everything (V2X) communication for multi-agent sensor fusion. However, most existing methods rely on single-modal data sharing, limiting fusion performance, particularly in heterogeneous sensor settings involving both LiDAR and cameras across vehicles and roadside units (RSUs). To address this, we propose Radian Glue Attention (RG-Attn), a lightweight and generalizable cross-modal fusion module that unifies intra-agent and inter-agent fusion via transformation-based coordinate alignment and a unified sampling/inversion strategy. RG-Attn efficiently aligns features through a radian-based attention constraint, operating column-wise on geometrically consistent regions to reduce overhead and preserve spatial coherence, thereby enabling accurate and robust fusion. Building upon RG-Attn, we propose three cooperative architectures. The first, Paint-To-Puzzle (PTP), prioritizes communication efficiency but assumes all agents have LiDAR, optionally paired with cameras. The second, Co-Sketching-Co-Coloring (CoS-CoCo), offers maximal flexibility, supporting any sensor setup (e.g., LiDAR-only, camera-only, or both) and enabling strong cross-modal generalization for real-world deployment. The third, Pyramid-RG-Attn Fusion (PRGAF), aims for peak detection accuracy with the highest computational overhead. Extensive evaluations on simulated and real-world datasets show our framework delivers state-of-the-art detection accuracy with high flexibility and efficiency. GitHub Link: https://github.com/LantaoLi/RG-Attn
Authors:Qiuyi Gu, Zhaocheng Ye, Jincheng Yu, Jiahao Tang, Tinghao Yi, Yuhan Dong, Jian Wang, Jinqiang Cui, Xinlei Chen, Yu Wang
Abstract:
Collaborative perception in unknown environments is crucial for multi-robot systems. With the emergence of foundation models, robots can now not only perceive geometric information but also achieve open-vocabulary scene understanding. However, existing map representations that support open-vocabulary queries often involve large data volumes, which becomes a bottleneck for multi-robot transmission in communication-limited environments. To address this challenge, we develop a method to construct a graph-structured 3D representation called COGraph, where nodes represent objects with semantic features and edges capture their spatial adjacency relationships. Before transmission, a data-driven feature encoder is applied to compress the feature dimensions of the COGraph. Upon receiving COGraphs from other robots, the semantic features of each node are recovered using a decoder. We also propose a feature-based approach for place recognition and translation estimation, enabling the merging of local COGraphs into a unified global map. We validate our framework on two realistic datasets and in a real-world environment. The results demonstrate that, compared to existing baselines for open-vocabulary map construction, our framework reduces the data volume by over 80% while maintaining mapping and query performance without compromise. For more details, please visit our website at https://github.com/efc-robot/MR-COGraphs.
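A minimal sketch of what a COGraph-style representation and its pre-transmission compression could look like; the dataclass layout, feature dimensions, and the random linear projection standing in for the learned encoder/decoder are all assumptions, not the authors' implementation.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class CONode:
    obj_id: int
    centroid: np.ndarray          # (3,) object position
    feature: np.ndarray           # (D,) open-vocabulary semantic feature

@dataclass
class COGraph:
    nodes: dict = field(default_factory=dict)    # obj_id -> CONode
    edges: set = field(default_factory=set)      # spatial-adjacency pairs (id_a, id_b)

# Stand-in for the learned encoder/decoder: a fixed random linear projection.
# The actual framework trains a data-driven autoencoder; this only shows the
# dimensionality reduction before transmission and recovery after reception.
rng = np.random.default_rng(0)
D, d = 512, 64
W = rng.normal(size=(d, D)) / np.sqrt(D)

def encode(f):  return W @ f                 # compress before transmission
def decode(z):  return W.T @ z               # approximate recovery on receipt

g = COGraph()
g.nodes[0] = CONode(0, np.array([1.0, 2.0, 0.5]), rng.normal(size=D))
g.nodes[1] = CONode(1, np.array([1.5, 2.2, 0.5]), rng.normal(size=D))
g.edges.add((0, 1))
z = encode(g.nodes[0].feature)
print(z.shape, decode(z).shape)              # (64,) (512,)
```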
Authors:Zhixuan Shen, Haonan Luo, Kexun Chen, Fengmao Lv, Tianrui Li
Abstract:
Understanding how humans cooperatively utilize semantic knowledge to explore unfamiliar environments and decide on navigation directions is critical for house service multi-robot systems. Previous methods primarily focused on single-robot centralized planning strategies, which severely limited exploration efficiency. Recent research has considered decentralized planning strategies for multiple robots, assigning separate planning models to each robot, but these approaches often overlook communication costs. In this work, we propose Multimodal Chain-of-Thought Co-Navigation (MCoCoNav), a modular approach that utilizes multimodal Chain-of-Thought to plan collaborative semantic navigation for multiple robots. MCoCoNav combines visual perception with Vision Language Models (VLMs) to evaluate exploration value through probabilistic scoring, thus reducing time costs and achieving stable outputs. Additionally, a global semantic map is used as a communication bridge, minimizing communication overhead while integrating observational results. Guided by scores that reflect exploration trends, robots utilize this map to assess whether to explore new frontier points or revisit history nodes. Experiments on HM3D_v0.2 and MP3D demonstrate the effectiveness of our approach. Our code is available at https://github.com/FrankZxShen/MCoCoNav.git.
Authors:Hanchu Zhou, Edward Xie, Wei Shao, Dechen Gao, Michelle Dong, Junshan Zhang
Abstract:
The growing interest in autonomous driving calls for realistic simulation platforms capable of accurately simulating the cooperative perception process in realistic traffic scenarios. Existing studies of cooperative perception often have not accounted for transmission latency and errors in real-world environments. To address this gap, we introduce EI-Drive, an edge-AI based autonomous driving simulation platform that integrates advanced cooperative perception with more realistic communication models. Built on the CARLA framework, EI-Drive features new modules for cooperative perception while taking into account transmission latency and errors, providing a more realistic platform for evaluating cooperative perception algorithms. In particular, the platform enables vehicles to fuse data from multiple sources, improving situational awareness and safety in complex environments. With its modular design, EI-Drive allows for detailed exploration of sensing, perception, planning, and control in various cooperative driving scenarios. Experiments using EI-Drive demonstrate significant improvements in vehicle safety and performance, particularly in scenarios with complex traffic flow and network conditions. All code and documents are accessible on our GitHub page: https://ucd-dare.github.io/eidrive.github.io/.
Authors:Harin Park, Inha Lee, Minje Kim, Hyungyu Park, Kyungdon Joo
Abstract:
As service environments have become diverse, they have started to demand complicated tasks that are difficult for a single robot to complete. This change has led to an interest in multiple robots instead of a single robot. C-SLAM, as a fundamental technique for multiple service robots, needs to handle diverse challenges such as homogeneous scenes and dynamic objects to ensure that robots operate smoothly and perform their tasks safely. However, existing C-SLAM datasets do not include the various indoor service environments with the aforementioned challenges. To close this gap, we introduce a new multi-modal C-SLAM dataset for multiple service robots in various indoor service environments, called C-SLAM dataset in Service Environments (CSE). We use NVIDIA Isaac Sim to generate data in various indoor service environments with the challenges that may occur in real-world service environments. By using simulation, we can provide accurate and precisely time-synchronized sensor data, such as stereo RGB, stereo depth, IMU, and ground truth (GT) poses. We configure three common indoor service environments (Hospital, Office, and Warehouse), each of which includes various dynamic objects that perform motions suitable to each environment. In addition, we drive three robots to mimic the actions of real service robots. Through these factors, we generate a more realistic C-SLAM dataset for multiple service robots. We demonstrate our dataset by evaluating diverse state-of-the-art single-robot SLAM and multi-robot SLAM methods. Our dataset is available at https://github.com/vision3d-lab/CSE_Dataset.
Authors:Yinsong Wang, Siwei Chen, Ziyi Song, Sheng Zhou
Abstract:
Cooperative perception research is hindered by the limited availability of datasets that capture the complexity of real-world Vehicle-to-Everything (V2X) interactions, particularly under dynamic communication constraints. To address this gap, we introduce WHALES (Wireless enhanced Autonomous vehicles with Large number of Engaged agents), the first large-scale V2X dataset explicitly designed to benchmark communication-aware agent scheduling and scalable cooperative perception. WHALES introduces a new benchmark that enables state-of-the-art (SOTA) research in communication-aware cooperative perception, featuring an average of 8.4 cooperative agents per scene and 2.01 million annotated 3D objects across diverse traffic scenarios. It incorporates detailed communication metadata to emulate real-world communication bottlenecks, enabling rigorous evaluation of scheduling strategies. To further advance the field, we propose the Coverage-Aware Historical Scheduler (CAHS), a novel scheduling baseline that selects agents based on historical viewpoint coverage, improving perception performance over existing SOTA methods. WHALES bridges the gap between simulated and real-world V2X challenges, providing a robust framework for exploring perception-scheduling co-design, cross-data generalization, and scalability limits. The WHALES dataset and code are available at https://github.com/chensiweiTHU/WHALES.
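A greedy marginal-coverage selection is one way to read the coverage-aware scheduling idea; the sketch below is an assumed illustration of such a scheduler, not the CAHS algorithm itself.

```python
def schedule_agents(candidate_coverage, k):
    """Greedy coverage-aware selection: pick k agents whose historical viewpoint
    coverage (sets of covered map cells) adds the most new area for the ego vehicle.
    candidate_coverage: dict agent_id -> set of covered cell indices."""
    covered, chosen = set(), []
    for _ in range(min(k, len(candidate_coverage))):
        best_id, best_gain = None, -1
        for aid, cells in candidate_coverage.items():
            if aid in chosen:
                continue
            gain = len(cells - covered)          # marginal coverage this agent would add
            if gain > best_gain:
                best_id, best_gain = aid, gain
        chosen.append(best_id)
        covered |= candidate_coverage[best_id]
    return chosen

# Example: pick 2 of 3 cooperators under a per-frame communication budget.
cov = {1: {(0, 0), (0, 1)}, 2: {(0, 1), (1, 1)}, 3: {(2, 2)}}
print(schedule_agents(cov, 2))   # [1, 2] with these toy coverage sets
```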
Authors:Lei Yang, Xinyu Zhang, Chen Wang, Jun Li, Jiaqi Ma, Zhiying Song, Tong Zhao, Ziying Song, Li Wang, Mo Zhou, Yang Shen, Kai Wu, Chen Lv
Abstract:
Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar, a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data encompasses sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K frames of 4D Radar data, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets. We will release all datasets and the benchmark codebase at http://openmpd.com/column/V2X-Radar and https://github.com/yanglei18/V2X-Radar.
Authors:Yongdong Wang, Runze Xiao, Jun Younes Louhi Kasahara, Ryosuke Yajima, Keiji Nagatani, Atsushi Yamashita, Hajime Asama
Abstract:
Large Language Models (LLMs) have demonstrated promising reasoning capabilities in robotics; however, their application in multi-robot systems remains limited, particularly in handling task dependencies. This paper introduces DART-LLM, a novel framework that employs Directed Acyclic Graphs (DAGs) to model task dependencies, enabling the decomposition of natural language instructions into well-coordinated subtasks for multi-robot execution. DART-LLM comprises four key components: a Question-Answering (QA) LLM module for dependency-aware task decomposition, a Breakdown Function module for robot assignment, an Actuation module for execution, and a Vision-Language Model (VLM)-based object detector for environmental perception, achieving end-to-end task execution. Experimental results across three task complexity levels demonstrate that DART-LLM achieves state-of-the-art performance, significantly outperforming the baseline across all evaluation metrics. Among the tested models, DeepSeek-r1-671B achieves the highest success rate, whereas Llama-3.1-8B exhibits superior response time reliability. Ablation studies further confirm that explicit dependency modeling notably enhances the performance of smaller models, facilitating efficient deployment on resource-constrained platforms. Please refer to the project website https://wyd0817.github.io/project-dart-llm/ for videos and code.
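To illustrate the DAG-based dependency modeling, the sketch below groups subtasks into waves that can execute concurrently once their prerequisites finish (Kahn-style topological ordering). The example decomposition and task names are hypothetical.

```python
from collections import defaultdict, deque

def dependency_waves(deps):
    """Group subtasks into waves that can run concurrently while respecting a DAG
    of dependencies. deps: dict subtask -> set of prerequisite subtasks."""
    indeg = {t: len(p) for t, p in deps.items()}
    children = defaultdict(list)
    for t, parents in deps.items():
        for p in parents:
            children[p].append(t)
    ready = deque(t for t, d in indeg.items() if d == 0)
    waves = []
    while ready:
        wave = list(ready); ready.clear()
        waves.append(wave)
        for t in wave:
            for c in children[t]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
    return waves

# Hypothetical decomposition of "clear the rubble, then inspect and map the area":
deps = {"clear_rubble": set(),
        "inspect_site": {"clear_rubble"},
        "map_area": {"clear_rubble"},
        "report": {"inspect_site", "map_area"}}
print(dependency_waves(deps))
# [['clear_rubble'], ['inspect_site', 'map_area'], ['report']]
```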
Authors:Minghao Ning, Yaodong Cui, Yufeng Yang, Shucheng Huang, Zhenan Liu, Ahmad Reza Alghooneh, Ehsan Hashemi, Amir Khajepour
Abstract:
This paper presents a novel real-time, delay-aware cooperative perception system designed for intelligent mobility platforms operating in dynamic indoor environments. The system contains a network of multi-modal sensor nodes and a central node that collectively provide perception services to mobility platforms. The proposed Hierarchical Clustering Considering the Scanning Pattern and Ground Contacting Feature based LiDAR-Camera Fusion improves intra-node perception in crowded environments. The system also features delay-aware global perception to synchronize and aggregate data across nodes. To validate our approach, we introduced the Indoor Pedestrian Tracking dataset, compiled from data captured by two indoor sensor nodes. Our experiments demonstrate significant improvements over baselines in detection accuracy and robustness against delays. The dataset is available in the repository: https://github.com/NingMingHao/MVSLab-IndoorCooperativePerception
Authors:Brendan Dijkstra, Ninad Jadhav, Alex Sloot, Matteo Marcantoni, Bayu Jayawardhana, Stephanie Gil, Bahar Haghighat
Abstract:
Development and testing of multi-robot systems employing wireless signal-based sensing requires access to suitable hardware, such as channel-monitoring WiFi transceivers, which can pose significant limitations. The WiFi Sensor for Robotics (WSR) toolbox, introduced by Jadhav et al. in 2022, provides a novel solution by using WiFi Channel State Information (CSI) to compute relative bearing between robots. The toolbox leverages the amplitude and phase of WiFi signals and creates virtual antenna arrays by exploiting the motion of mobile robots, eliminating the need for physical antenna arrays. However, the WSR toolbox's reliance on now-obsolete WiFi transceiver hardware has limited its operability and accessibility, hindering broader application and development of relevant tools. We present an open-source simulation framework that replicates the WSR toolbox's capabilities using Gazebo and Matlab. By simulating WiFi-CSI data collection, our framework emulates the behavior of mobile robots equipped with the WSR toolbox, enabling precise bearing estimation without physical hardware. We validate the framework through experiments with both simulated and real Turtlebot3 robots, showing a close match between the obtained CSI data and the resulting bearing estimates. This work provides a virtual environment for developing and testing WiFi-CSI-based multi-robot localization without relying on physical hardware. All code and experimental setup information are publicly available at https://github.com/BrendanxP/CSI-Simulation-Framework
Authors:Apoorva Vashisth, Manav Kulshrestha, Damon Conover, Aniket Bera
Abstract:
Autonomous robots are widely utilized for mapping and exploration tasks due to their cost-effectiveness. Multi-robot systems offer scalability and efficiency, especially in terms of the number of robots deployed in more complex environments. These tasks belong to the set of Multi-Robot Informative Path Planning (MRIPP) problems. In this paper, we propose a deep reinforcement learning approach for the MRIPP problem. We aim to maximize the number of discovered stationary targets in an unknown 3D environment while operating under resource constraints (such as path length). Here, each robot aims to maximize discovered targets, avoid unknown static obstacles, and prevent inter-robot collisions while operating under communication and resource constraints. We utilize the centralized training and decentralized execution paradigm to train a single policy neural network. A key aspect of our approach is our coordination graph that prioritizes visiting regions not yet explored by other robots. Our learned policy can be copied onto any number of robots for deployment in more complex environments not seen during training. Our approach outperforms state-of-the-art approaches by at least 26.2% in terms of the number of discovered targets while requiring a planning time of less than 2 sec per step. We present results for more complex environments with up to 64 robots and compare success rates against baseline planners. Our code and trained model are available at https://github.com/AccGen99/marl_ipp
Authors:Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Xuelong Li, Bin Zhao
Abstract:
Leveraging the powerful reasoning capabilities of large language models (LLMs), recent LLM-based robot task planning methods yield promising results. However, they mainly focus on single or multiple homogeneous robots on simple tasks. Practically, complex long-horizon tasks always require collaboration among multiple heterogeneous robots especially with more complex action spaces, which makes these tasks more challenging. To this end, we propose COHERENT, a novel LLM-based task planning framework for collaboration of heterogeneous multi-robot systems including quadrotors, robotic dogs, and robotic arms. Specifically, a Proposal-Execution-Feedback-Adjustment (PEFA) mechanism is designed to decompose and assign actions for individual robots, where a centralized task assigner makes a task planning proposal to decompose the complex task into subtasks, and then assigns subtasks to robot executors. Each robot executor selects a feasible action to implement the assigned subtask and reports self-reflection feedback to the task assigner for plan adjustment. The PEFA loops until the task is completed. Moreover, we create a challenging heterogeneous multi-robot task planning benchmark encompassing 100 complex long-horizon tasks. The experimental results show that our work surpasses the previous methods by a large margin in terms of success rate and execution efficiency. The experimental videos, code, and benchmark are released at https://github.com/MrKeee/COHERENT.
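A minimal sketch of a PEFA-style loop with the LLM-based assigner and robot executors stubbed out as callables; the signatures and the round limit are assumptions for illustration, not COHERENT's interface.

```python
def pefa_loop(task, robots, propose, execute, max_rounds=5):
    """Minimal Proposal-Execution-Feedback-Adjustment loop. `propose` and `execute`
    stand in for the LLM-based task assigner and per-robot executors; the signatures
    here are assumptions for illustration.

    propose(task, robots, feedback) -> {robot: subtask}
    execute(robot, subtask) -> (done: bool, feedback_note: str)
    """
    feedback = {}
    for _ in range(max_rounds):
        assignment = propose(task, robots, feedback)      # Proposal
        feedback, all_done = {}, True
        for robot, subtask in assignment.items():
            done, note = execute(robot, subtask)          # Execution
            feedback[robot] = note                        # Feedback (self-reflection)
            all_done = all_done and done
        if all_done:
            return True                                   # plan succeeded; loop ends
    return False                                          # Adjustment budget exhausted

# Toy run with trivially succeeding stubs.
ok = pefa_loop("move the box to the shelf", ["arm", "quadrotor"],
               propose=lambda t, rs, fb: {r: f"do part of '{t}'" for r in rs},
               execute=lambda r, s: (True, f"{r}: completed {s}"))
print(ok)   # True
```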
Authors:Jinlong Li, Xinyu Liu, Baolu Li, Runsheng Xu, Jiachen Li, Hongkai Yu, Zhengzhong Tu
Abstract:
Cooperative perception systems play a vital role in enhancing the safety and efficiency of vehicular autonomy. Although recent studies have highlighted the efficacy of vehicle-to-everything (V2X) communication techniques in autonomous driving, a significant challenge persists: how to efficiently integrate multiple high-bandwidth features across an expanding network of connected agents such as vehicles and infrastructure. In this paper, we introduce CoMamba, a novel cooperative 3D detection framework designed to leverage state-space models for real-time onboard vehicle perception. Compared to prior state-of-the-art transformer-based models, CoMamba is a more scalable 3D model that uses bidirectional state-space models, bypassing the quadratic-complexity pain point of attention mechanisms. Through extensive experimentation on V2X/V2V datasets, CoMamba achieves superior performance compared to existing methods while maintaining real-time processing capabilities. The proposed framework not only enhances object detection accuracy but also significantly reduces processing time, making it a promising solution for next-generation cooperative perception systems in intelligent transportation networks.
Authors:Gang Xu, Yuchen Wu, Sheng Tao, Yifan Yang, Tao Liu, Tao Huang, Huifeng Wu, Yong Liu
Abstract:
This letter presents a novel multi-robot task allocation and path planning method that considers robots' maximum range constraints in large-sized workspaces, enabling robots to complete the assigned tasks within their range limits. First, we develop a fast path planner to solve global paths efficiently. Subsequently, we propose an innovative auction-based approach that integrates our path planner into the auction phase for reward computation while considering the robots' range limits. This method accounts for extra obstacle-avoiding travel distances rather than ideal straight-line distances, resolving the coupling between task allocation and path planning. Additionally, to avoid redundant computations during iterations, we implement a lazy auction strategy to speed up the convergence of the task allocation. Finally, we validate the proposed method's effectiveness and application potential through extensive simulation and real-world experiments. The implementation code for our method will be available at https://github.com/wuuya1/RangeTAP.
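The lazy-evaluation idea in the auction phase can be sketched as follows: obstacle-aware path costs are computed only when a robot-task pair is actually examined, then cached for later rounds. The Euclidean path-cost stand-in and the greedy assignment rule below are simplifying assumptions, not the RangeTAP planner or auction.

```python
def path_length(start, goal):
    """Stand-in for the fast path planner; real costs would include obstacle detours."""
    return ((start[0] - goal[0]) ** 2 + (start[1] - goal[1]) ** 2) ** 0.5

def lazy_auction(robots, tasks, ranges):
    """Assign tasks greedily under range limits; path costs are evaluated lazily
    (only when a robot-task pair is examined) and cached for reuse."""
    cache = {}

    def cost(r, t):
        if (r, t) not in cache:
            cache[(r, t)] = path_length(robots[r], tasks[t])   # lazy evaluation
        return cache[(r, t)]

    assignment, free_tasks = {}, set(tasks)
    for r in robots:
        feasible = [(cost(r, t), t) for t in free_tasks if cost(r, t) <= ranges[r]]
        if feasible:
            _, best = min(feasible)
            assignment[r] = best
            free_tasks.remove(best)
    return assignment, cache

robots = {"r1": (0.0, 0.0), "r2": (5.0, 0.0)}
tasks = {"t1": (1.0, 1.0), "t2": (6.0, 1.0)}
ranges = {"r1": 10.0, "r2": 10.0}
print(lazy_auction(robots, tasks, ranges))
```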
Authors:Rongsong Li, Xin Pei
Abstract:
Cooperative perception through vehicle-to-everything (V2X) has garnered significant attention in recent years due to its potential to overcome occlusions and enhance long-distance perception. Great achievements have been made in both datasets and algorithms. However, existing real-world datasets are limited by the presence of few communicable agents, while synthetic datasets typically cover only vehicles. More importantly, the penetration rate of connected and autonomous vehicles (CAVs), a critical factor for the deployment of cooperative perception technologies, has not been adequately addressed. To tackle these issues, we introduce Multi-V2X, a large-scale, multi-modal, multi-penetration-rate dataset for V2X perception. By co-simulating SUMO and CARLA, we equip a substantial number of cars and roadside units (RSUs) in simulated towns with sensor suites, and collect comprehensive sensing data. Datasets with specified CAV penetration rates can be obtained by masking some equipped cars as normal vehicles. In total, our Multi-V2X dataset comprises 549k RGB frames, 146k LiDAR frames, and 4,219k annotated 3D bounding boxes across six categories. The highest possible CAV penetration rate reaches 86.21%, with up to 31 agents in communication range, posing new challenges in selecting agents to collaborate with. We provide comprehensive benchmarks for cooperative 3D object detection tasks. Our data and code are available at https://github.com/RadetzkyLi/Multi-V2X.
Authors:An Guo, Shuoxiao Zhang, Enyi Tang, Xinyu Gao, Haomin Pang, Haoxiang Tian, Yanzhou Mu, Wu Wen, Chunrong Fang, Zhenyu Chen
Abstract:
With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) cooperative perception has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. V2X cooperative perception systems are software systems characterized by diverse sensor types and cooperative agents, varying fusion schemes, and operation under different communication conditions. Therefore, their complex composition gives rise to numerous operational challenges. Furthermore, when cooperative perception systems produce erroneous predictions, the types of errors and their underlying causes remain insufficiently explored. To bridge this gap, we take an initial step by conducting an empirical study of V2X cooperative perception. To systematically evaluate the impact of cooperative perception on the ego vehicle's perception performance, we identify and analyze six prevalent error patterns in cooperative perception systems. We further conduct a systematic evaluation of the critical components of these systems through our large-scale study and identify the following key findings: (1) The LiDAR-based cooperation configuration exhibits the highest perception performance; (2) Vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communication exhibit distinct cooperative perception performance under different fusion schemes; (3) Increased cooperative perception errors may result in a higher frequency of driving violations; (4) Cooperative perception systems are not robust against communication interference when running online. Our results reveal potential risks and vulnerabilities in critical components of cooperative perception systems. We hope that our findings can better promote the design and repair of cooperative perception systems.
Authors:An Guo, Xinyu Gao, Chunrong Fang, Haoxiang Tian, Weisong Sun, Yanzhou Mu, Shuncheng Tang, Lei Ma, Zhenyu Chen
Abstract:
Accurately perceiving complex driving environments is essential for ensuring the safe operation of autonomous vehicles. With the tremendous progress in deep learning and communication technologies, cooperative perception with Vehicle-to-Everything (V2X) technologies has emerged as a solution to overcome the limitations of single-agent perception systems in perceiving distant objects and occlusions. Despite the considerable advancements, V2X cooperative perception systems require thorough testing and continuous enhancement of system performance. Given that V2X driving scenes entail intricate communications with multiple vehicles across various geographic locations, creating V2X test scenes for these systems poses a significant challenge. Moreover, current testing methodologies rely on manual data collection and labeling, which are both time-consuming and costly.
In this paper, we design and implement V2XGen, an automated testing generation tool for V2X cooperative perception systems. V2XGen utilizes a high-fidelity approach to generate realistic cooperative object instances and strategically place them within the background data in crucial positions. Furthermore, V2XGen adopts a fitness-guided V2X scene generation strategy for the transformed scene generation process and improves testing efficiency. We conduct experiments on V2XGen using multiple cooperative perception systems with different fusion schemes to assess its performance on various tasks. The experimental results demonstrate that V2XGen is capable of generating realistic test scenes and effectively detecting erroneous behaviors in different V2X-oriented driving conditions. Furthermore, the results validate that retraining systems under test with the generated scenes can enhance average detection precision while reducing occlusion and long-range perception errors.
Authors:Zhehan Li, Zheng Wang, Jiadong Lu, Qi Liu, Zhiren Xun, Yue Wang, Fei Gao, Chao Xu, Yanjun Cao
Abstract:
Relative localization is critical for cooperation in autonomous multi-robot systems. Existing approaches either rely on shared environmental features or inertial assumptions, or suffer from non-line-of-sight degradation and outliers in complex environments. Robust and efficient fusion of inter-robot measurements such as bearings, distances, and inertial data for tens of robots remains challenging. We present CREPES-X (Cooperative RElative Pose Estimation System with multiple eXtended features), a hierarchical relative localization framework that enhances speed, accuracy, and robustness under challenging conditions, without requiring any global information. CREPES-X starts with a compact hardware design: InfraRed (IR) LEDs, an IR camera, an ultra-wideband module, and an IMU housed in a cube no larger than 6cm on each side. Then CREPES-X implements a two-stage hierarchical estimator to meet different requirements, considering speed, accuracy, and robustness. First, we propose a single-frame relative estimator that provides instant relative poses for multi-robot setups through a closed-form solution and robust bearing outlier rejection. Then a multi-frame relative estimator is designed to offer accurate and robust relative states by exploring IMU pre-integration via robocentric relative kinematics with loosely- and tightly-coupled optimization. Extensive simulations and real-world experiments validate the effectiveness of CREPES-X, showing robustness to up to 90% bearing outliers, proving resilience in challenging conditions, and achieving RMSE of 0.073m and 1.817° in real-world datasets.
Authors:Faryal Batool, Malaika Zafar, Yasheerah Yaqoot, Roohan Ahmed Khan, Muhammad Haris Khan, Aleksey Fedoseev, Dzmitry Tsetserukou
Abstract:
Swarm robotics plays a crucial role in enabling autonomous operations in dynamic and unpredictable environments. However, a major challenge remains ensuring safe and efficient navigation in environments filled with both dynamic alive (e.g., humans) and dynamic inanimate (e.g., non-living objects) obstacles. In this paper, we propose ImpedanceGPT, a novel system that combines a Vision-Language Model (VLM) with retrieval-augmented generation (RAG) to enable real-time reasoning for adaptive navigation of mini-drone swarms in complex environments.
The key innovation of ImpedanceGPT lies in the integration of VLM and RAG, which provides the drones with enhanced semantic understanding of their surroundings. This enables the system to dynamically adjust impedance control parameters in response to obstacle types and environmental conditions. Our approach not only ensures safe and precise navigation but also improves coordination between drones in the swarm.
Experimental evaluations demonstrate the effectiveness of the system. The VLM-RAG framework achieved an obstacle detection and retrieval accuracy of 80% under optimal lighting. In static environments, drones navigated dynamic inanimate obstacles at 1.4 m/s but slowed to 0.7 m/s with increased separation around humans. In dynamic environments, speed adjusted to 1.0 m/s near hard obstacles, while reducing to 0.6 m/s with higher deflection to safely avoid moving humans.
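The following sketch shows, under assumed parameter values, how a semantic obstacle label (e.g., from a VLM) could select virtual spring-damper impedance parameters that shape a repulsive velocity correction around an obstacle; the preset table, gains, and function are hypothetical and not ImpedanceGPT's actual controller.

```python
import numpy as np

# Hypothetical impedance presets per obstacle class: stiffness k, damping d, margin [m].
IMPEDANCE_PRESETS = {
    "human":     {"k": 4.0, "d": 2.5, "margin": 1.5},   # soft, wide berth
    "inanimate": {"k": 2.0, "d": 1.0, "margin": 0.6},   # firmer, tighter pass
}

def impedance_velocity(drone_pos, drone_vel, obstacle_pos, obstacle_class):
    """Virtual spring-damper pushing the drone away from an obstacle.
    Returns a velocity correction to add to the nominal command."""
    p = np.asarray(drone_pos, float) - np.asarray(obstacle_pos, float)
    dist = np.linalg.norm(p) + 1e-9
    cfg = IMPEDANCE_PRESETS[obstacle_class]
    if dist > cfg["margin"]:
        return np.zeros_like(p)
    direction = p / dist
    penetration = cfg["margin"] - dist
    # Spring pushes the drone out of the safety margin; damper resists the closing speed.
    closing_speed = -float(np.dot(np.asarray(drone_vel, float), direction))
    return (cfg["k"] * penetration + cfg["d"] * max(closing_speed, 0.0)) * direction

print(impedance_velocity([0.5, 0.0], [-0.3, 0.0], [0.0, 0.0], "human"))      # strong correction
print(impedance_velocity([0.5, 0.0], [-0.3, 0.0], [0.0, 0.0], "inanimate"))  # milder correction
```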
Authors:Yi Dong, Zhongguo Li, Sarvapali D. Ramchurn, Xiaowei Huang
Abstract:
This paper develops a distributed Nash Equilibrium seeking algorithm for heterogeneous multi-robot systems. The algorithm utilises distributed optimisation and output control to achieve the Nash equilibrium by leveraging information shared among neighbouring robots. Specifically, we propose a distributed optimisation algorithm that calculates the Nash equilibrium as a tailored reference for each robot and designs output control laws for heterogeneous multi-robot systems to track it in an aggregative game. We prove that our algorithm is guaranteed to converge and result in efficient outcomes. The effectiveness of our approach is demonstrated through numerical simulations and empirical testing with physical robots.
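As a concrete illustration of Nash equilibrium seeking in an aggregative game, the numpy sketch below has each agent track the aggregate (the mean decision) via dynamic average consensus over a ring graph and descend a local pseudo-gradient. The quadratic costs, step sizes, and graph are assumptions for illustration, not the paper's control laws or convergence analysis.

```python
import numpy as np

N = 5
r = np.arange(N, dtype=float)      # each agent's preferred decision
c = 0.3                            # coupling strength through the aggregate
alpha, iters = 0.05, 500

# Ring communication graph with a doubly stochastic weight matrix.
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

x = np.zeros(N)                    # decisions
s = x.copy()                       # local estimates of the aggregate mean(x)

for _ in range(iters):
    # Pseudo-gradient of J_i = 0.5*(x_i - r_i)^2 + c * x_i * mean(x),
    # with the local estimate s_i standing in for the true mean.
    grad = (x - r) + c * s + (c / N) * x
    x_new = x - alpha * grad
    s = W @ s + (x_new - x)        # dynamic average consensus tracking mean(x)
    x = x_new

print("decisions:", np.round(x, 3))
print("aggregate estimates:", np.round(s, 3), "true mean:", round(float(x.mean()), 3))
```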
Authors:Shulan Ruan, Rongwei Wang, Xuchen Shen, Huijie Liu, Baihui Xiao, Jun Shi, Kun Zhang, Zhenya Huang, Yu Liu, Enhong Chen, You He
Abstract:
Multi-sensor fusion perception (MSFP) is a key technology for embodied AI, which can serve a variety of downstream tasks (e.g., 3D object detection and semantic segmentation) and application scenarios (e.g., autonomous driving and swarm robotics). Recently, impressive achievements on AI-based MSFP methods have been reviewed in relevant surveys. However, we observe that the existing surveys have some limitations after a rigorous and detailed investigation. For one thing, most surveys are oriented to a single task or research field, such as 3D object detection or autonomous driving. Therefore, researchers in other related tasks often find it difficult to benefit directly. For another, most surveys only introduce MSFP from a single perspective of multi-modal fusion, while lacking consideration of the diversity of MSFP methods, such as multi-view fusion and time-series fusion. To this end, in this paper, we hope to organize MSFP research from a task-agnostic perspective, where methods are reported from various technical views. Specifically, we first introduce the background of MSFP. Next, we review multi-modal and multi-agent fusion methods. A step further, time-series fusion methods are analyzed. In the era of LLM, we also investigate multimodal LLM fusion methods. Finally, we discuss open challenges and future directions for MSFP. We hope this survey can help researchers understand the important progress in MSFP and provide possible insights for future research.
Authors:Baiqing Wang, Helei Cui, Bo Zhang, Xiaolong Zheng, Bin Guo, Zhiwen Yu
Abstract:
Multi-robot systems have been widely deployed in real-world applications, providing significant improvements in efficiency and reductions in labor costs. However, most existing multi-robot collaboration methods rely on extensive task-specific training, which limits their adaptability to new or diverse scenarios. Recent research leverages the language understanding and reasoning capabilities of large language models (LLMs) to enable more flexible collaboration without specialized training. Yet, current LLM-empowered approaches remain inefficient: when confronted with identical or similar tasks, they must replan from scratch because they omit task-level similarities. To address this limitation, we propose MeCo, a similarity-aware multi-robot collaboration framework that applies the principle of ``cache and reuse'' (a.k.a., memoization) to reduce redundant computation. Unlike simple task repetition, identifying and reusing solutions for similar but not identical tasks is far more challenging, particularly in multi-robot settings. To this end, MeCo introduces a new similarity testing method that retrieves previously solved tasks with high relevance, enabling effective plan reuse without re-invoking LLMs. Furthermore, we present MeCoBench, the first benchmark designed to evaluate performance on similar-task collaboration scenarios. Experimental results show that MeCo substantially reduces planning costs and improves success rates compared with state-of-the-art approaches.
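A toy sketch of the "cache and reuse" principle follows: solved tasks are retrieved by similarity to the new task description and their plans are reused when the similarity clears a threshold, so no new LLM call is needed. The bag-of-words cosine similarity below is a stand-in for a learned similarity test, and all names are hypothetical rather than MeCo's implementation.

```python
from collections import Counter
import math

class PlanCache:
    """Memoization cache: reuse a stored plan when a new task description is
    sufficiently similar to an already-solved one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []                      # list of (task_text, plan)

    @staticmethod
    def _similarity(a, b):
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
        norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
        return dot / norm if norm else 0.0

    def lookup(self, task):
        best = max(self.entries, key=lambda e: self._similarity(task, e[0]), default=None)
        if best and self._similarity(task, best[0]) >= self.threshold:
            return best[1]                     # cache hit: reuse, skip the LLM
        return None                            # cache miss: plan from scratch, then store()

    def store(self, task, plan):
        self.entries.append((task, plan))

cache = PlanCache()
cache.store("carry the red box from room A to room B",
            ["r1: pick red box", "r1: go to B", "r1: place box"])
print(cache.lookup("carry the red box from room A to room C"))   # similar -> reused plan
print(cache.lookup("inspect the solar panel on the roof"))       # dissimilar -> None
```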
Authors:Haonan An, Zhengru Fang, Yuang Zhang, Senkang Hu, Xianhao Chen, Guowen Xu, Yuguang Fang
Abstract:
Connected and autonomous vehicles (CAVs) have garnered significant attention due to their extended perception range and enhanced sensing coverage. To address challenges such as blind spots and obstructions, CAVs employ vehicle-to-vehicle (V2V) communications to aggregate sensory data from surrounding vehicles. However, cooperative perception is often constrained by the limitations of achievable network throughput and channel quality. In this paper, we propose a channel-aware throughput maximization approach to facilitate CAV data fusion, leveraging a self-supervised autoencoder for adaptive data compression. We formulate the problem as a mixed integer programming (MIP) model, which we decompose into two sub-problems to derive optimal data rate and compression ratio solutions under given link conditions. An autoencoder is then trained to minimize bitrate with the determined compression ratio, and a fine-tuning strategy is employed to further reduce spectrum resource consumption. Experimental evaluation on the OpenCOOD platform demonstrates the effectiveness of our proposed algorithm, showing more than 20.19% improvement in network throughput and a 9.38% increase in average precision (AP@IoU) compared to state-of-the-art methods, with an optimal latency of 19.99 ms.
Authors:Rebekah Rousi, Niko Makitalo, Hooman Samani, Kai-Kristian Kemell, Jose Siqueira de Cerqueira, Ville Vakkuri, Tommi Mikkonen, Pekka Abrahamsson
Abstract:
The emergence of generative artificial intelligence (GAI) and large language models (LLMs) such as ChatGPT has enabled the realization of long-harbored desires in software and robotic development. The technology, however, has brought with it novel ethical challenges. These challenges are compounded by the application of LLMs in other machine learning systems, such as multi-robot systems. The objectives of the study were to examine novel ethical issues arising from the application of LLMs in multi-robot systems. The unfolding of ethical issues in GPT agent behavior (deliberation over ethical concerns) was observed, and GPT output was compared with that of human experts. The article also advances a model for ethical development of multi-robot systems. A qualitative workshop-based method was employed in three workshops for the collection of ethical concerns: two human expert workshops (N=16 participants) and one GPT-agent-based workshop (N=7 agents; two teams of 6 agents plus one judge). Thematic analysis was used to analyze the qualitative data. The results reveal differences between the human-produced and GPT-based ethical concerns. Human experts placed greater emphasis on new themes related to deviance, data privacy, bias and unethical corporate conduct. GPT agents emphasized concerns present in existing AI ethics guidelines. The study contributes to a growing body of knowledge in context-specific AI ethics and GPT application. It demonstrates the gap between human expert thinking and LLM output, while emphasizing new ethical concerns emerging in novel technology.
Authors:Yucheng Sheng, Le Liang, Hao Ye, Shi Jin, Geoffrey Ye Li
Abstract:
Cooperative perception, offering a wider field of view than standalone perception, is becoming increasingly crucial in autonomous driving. This perception is enabled through vehicle-to-vehicle (V2V) communication, allowing connected automated vehicles (CAVs) to exchange sensor data, such as light detection and ranging (LiDAR) point clouds, thereby enhancing the collective understanding of the environment. In this paper, we leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework that employs intermediate fusion. To counter the challenges posed by time-varying multipath fading, our approach incorporates the use of orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies. Furthermore, recognizing the necessity for reliable transmission, especially in low-SNR scenarios, we introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeat request (HARQ). Simulation results show that our model surpasses the traditional separate source-channel coding methods in perception performance, both with and without HARQ. Additionally, in terms of throughput, our proposed HARQ schemes demonstrate superior efficiency to the conventional coding approaches.
Authors:Nikolaos Stathoulopoulos, Vidya Sumathy, Christoforos Kanellakis, George Nikolakopoulos
Abstract:
Recent advances in robotics are driving real-world autonomy for long-term and large-scale missions, where loop closures via place recognition are vital for mitigating pose estimation drift. However, achieving real-time performance remains challenging for resource-constrained mobile robots and multi-robot systems due to the computational burden of high-density sampling, which increases the complexity of comparing and verifying query samples against a growing map database. Conventional methods often retain redundant information or miss critical data by relying on fixed sampling intervals or operating in 3-D space instead of the descriptor feature space. To address these challenges, we introduce the concept of sample space and propose a novel keyframe sampling approach for LiDAR-based place recognition. Our method minimizes redundancy while preserving essential information in the hyper-dimensional descriptor space, supporting both learning-based and handcrafted descriptors. The proposed approach incorporates a sliding window optimization strategy to ensure efficient keyframe selection and real-time performance, enabling seamless integration into robotic pipelines. In sum, our approach demonstrates robust performance across diverse datasets, with the ability to adapt seamlessly from indoor to outdoor scenarios without parameter tuning, reducing loop closure detection times and memory requirements.
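The sketch below illustrates redundancy-aware keyframe sampling in descriptor space rather than 3-D space: a scan becomes a keyframe only if its descriptor is sufficiently far from every retained keyframe. The greedy rule, cosine distance, and threshold are simplifying assumptions, not the paper's sliding-window optimization.

```python
import numpy as np

def select_keyframes(descriptors, min_dist=0.3):
    """Greedy keyframe sampling in the place-recognition descriptor space.
    descriptors: (N, D) per-scan descriptors. A scan is kept only if its cosine
    distance to every previously kept keyframe is at least `min_dist`."""
    normed = descriptors / (np.linalg.norm(descriptors, axis=1, keepdims=True) + 1e-12)
    kept = []
    for i, d in enumerate(normed):
        if not kept:
            kept.append(i)
            continue
        sims = normed[kept] @ d            # cosine similarity to all kept keyframes
        if 1.0 - sims.max() >= min_dist:   # far enough from everything kept so far
            kept.append(i)
    return kept

rng = np.random.default_rng(1)
# Simulated stream: long stretches of near-identical scans from four distinct places.
base = rng.standard_normal((4, 64))
descs = np.repeat(base, 25, axis=0) + 0.01 * rng.standard_normal((100, 64))
print(select_keyframes(descs))             # roughly one keyframe per distinct place
```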
Authors:Victoria Marie Tuck, Hardik Parwana, Pei-Wei Chen, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, S. Shankar Sastry, Sanjit A. Seshia
Abstract:
This paper introduces MRTA-Sim, a Python/ROS2/Gazebo simulator for testing approaches to Multi-Robot Task Allocation (MRTA) problems on simulated robots in complex, indoor environments. Grid-based approaches to MRTA problems can be too restrictive for use in complex, dynamic environments such as warehouses, department stores, hospitals, etc. However, approaches that operate in free-space often operate at a layer of abstraction above the control and planning layers of a robot and make an assumption on approximate travel time between points of interest in the system. These abstractions can neglect the impact of the tight space and multi-agent interactions on the quality of the solution. Therefore, MRTA solutions should be tested with the navigation stacks of the robots in mind, taking into account robot planning, conflict avoidance between robots, and human interaction and avoidance. This tool connects the allocation output of MRTA solvers to individual robot planning using the NAV2 stack and local, centralized multi-robot deconfliction using Control Barrier Function-Quadratic Programs (CBF-QPs), creating a platform closer to real-world operation for more comprehensive testing of these approaches. The simulation architecture is modular so that users can swap out methods at different levels of the stack. We show the use of our system with a Satisfiability Modulo Theories (SMT)-based approach to dynamic MRTA on a fleet of indoor delivery robots.
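For readers unfamiliar with CBF-QPs, the sketch below shows the core idea for a single-integrator robot avoiding one neighbor: the desired velocity is minimally modified so a distance barrier stays nonnegative, which for a single linear constraint reduces to a closed-form projection. This is an illustrative primer only, not MRTA-Sim's centralized multi-robot deconfliction module.

```python
import numpy as np

def cbf_safe_velocity(p_i, p_j, v_des, d_safe=0.5, gamma=1.0):
    """Minimally modify robot i's desired velocity so the barrier
    h = ||p_i - p_j||^2 - d_safe^2 stays nonnegative against a static robot j.
    Single-integrator CBF condition: 2*(p_i - p_j)^T v >= -gamma * h.
    The QP  min ||v - v_des||^2  s.t.  a^T v >= b  is solved in closed form by
    projecting v_des onto the half-space whenever the constraint is violated."""
    diff = np.asarray(p_i, float) - np.asarray(p_j, float)
    h = float(diff @ diff) - d_safe ** 2
    a, b = 2.0 * diff, -gamma * h
    v = np.asarray(v_des, float)
    if a @ v < b:                              # constraint active: project onto its boundary
        v = v + (b - a @ v) / (a @ a) * a
    return v

# Robot i heading straight at robot j: the approach speed is cut so h stays >= 0.
print(cbf_safe_velocity(p_i=[0.6, 0.0], p_j=[0.0, 0.0], v_des=[-1.0, 0.0]))
```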
Authors:Martin Schuck, Dinushka Orrin Dahanaggamaarachchi, Ben Sprenger, Vedant Vyas, Siqi Zhou, Angela P. Schoellig
Abstract:
Drone swarm performances -- synchronized, expressive aerial displays set to music -- have emerged as a captivating application of modern robotics. Yet designing smooth, safe choreographies remains a complex task requiring expert knowledge. We present SwarmGPT, a language-based choreographer that leverages the reasoning power of large language models (LLMs) to streamline drone performance design. The LLM is augmented by a safety filter that ensures deployability by making minimal corrections when safety or feasibility constraints are violated. By decoupling high-level choreographic design from low-level motion planning, our system enables non-experts to iteratively refine choreographies using natural language without worrying about collisions or actuator limits. We validate our approach through simulations with swarms up to 200 drones and real-world experiments with up to 20 drones performing choreographies to diverse types of songs, demonstrating scalable, synchronized, and safe performances. Beyond entertainment, this work offers a blueprint for integrating foundation models into safety-critical swarm robotics applications.
Authors:Ashish Bastola, Hao Wang, Abolfazl Razi
Abstract:
Anomaly detection is a critical requirement for ensuring safety in autonomous driving. In this work, we leverage Cooperative Perception to share information across nearby vehicles, enabling more accurate identification and consensus of anomalous behaviors in complex traffic scenarios. To account for the real-world challenge of imperfect communication, we propose a cooperative-perception-based anomaly detection framework (CPAD), which is a robust architecture that remains effective under communication interruptions, thereby facilitating reliable performance even in low-bandwidth settings. Since no multi-agent anomaly detection dataset exists for vehicle trajectories, we introduce a benchmark dataset of 15,000 scenarios with 90,000 trajectories generated through rule-based vehicle dynamics analysis. Empirical results demonstrate that our approach outperforms standard anomaly classification methods in F1-score and AUC, and shows strong robustness to agent connection interruptions.
Authors:Xiaofan Yu, Yuwei Wu, Katherine Mao, Ye Tian, Vijay Kumar, Tajana Rosing
Abstract:
Multi-robot target tracking is a fundamental problem that requires coordinated monitoring of dynamic entities in applications such as precision agriculture, environmental monitoring, disaster response, and security surveillance. While Federated Learning (FL) has the potential to enhance learning across multiple robots without centralized data aggregation, its use in multi-Unmanned Aerial Vehicle (UAV) target tracking remains largely underexplored. Key challenges include limited onboard computational resources, significant data heterogeneity in FL due to varying targets and the fields of view, and the need for tight coupling between trajectory prediction and multi-robot planning. In this paper, we introduce DroneFL, the first federated learning framework specifically designed for efficient multi-UAV target tracking. We design a lightweight local model to predict target trajectories from sensor inputs, using a frozen YOLO backbone and a shallow transformer for efficient onboard training. The updated models are periodically aggregated in the cloud for global knowledge sharing. To alleviate the data heterogeneity that hinders FL convergence, DroneFL introduces a position-invariant model architecture with altitude-based adaptive instance normalization. Finally, we fuse predictions from multiple UAVs in the cloud and generate optimal trajectories that balance target prediction accuracy and overall tracking performance. Our results show that DroneFL reduces prediction error by 6%-83% and tracking distance by 0.4%-4.6% compared to a distributed non-FL framework. In terms of efficiency, DroneFL runs in real time on a Raspberry Pi 5 and has on average just 1.56 KBps data rate to the cloud.
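The altitude-conditioned normalization idea can be sketched in PyTorch as follows: backbone features are instance-normalized and then scaled and shifted by parameters predicted from the UAV's altitude, which is one way to make features from different flight heights look alike. The module name, MLP, and layer sizes are assumptions for illustration, not DroneFL's actual architecture.

```python
import torch
import torch.nn as nn

class AltitudeAdaIN(nn.Module):
    """Adaptive instance normalization whose affine parameters are predicted from
    the UAV altitude, so features can be re-scaled consistently across heights."""

    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * channels),       # per-channel gamma and beta
        )

    def forward(self, feats: torch.Tensor, altitude: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W); altitude: (B, 1), e.g. meters or a normalized value.
        gamma, beta = self.mlp(altitude).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1.0 + gamma) * self.norm(feats) + beta

layer = AltitudeAdaIN(channels=16)
x = torch.randn(2, 16, 32, 32)
alt = torch.tensor([[20.0], [55.0]])
print(layer(x, alt).shape)                         # torch.Size([2, 16, 32, 32])
```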
Authors:Zhehui Huang, Guangyao Shi, Yuwei Wu, Vijay Kumar, Gaurav S. Sukhatme
Abstract:
Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB transforms natural language (NL) mission descriptions into executable Python code for multi-robot systems through two core modules: (1) Mission Analysis, which parses mission descriptions into behavior trees, and (2) Code Generation, which leverages the behavior tree and a structured knowledge base to generate robot control code. We further introduce a dataset of natural language mission descriptions to support development and benchmarking. Experiments in both simulation and real-world environments demonstrate that LAN2CB enables robust and flexible multi-robot coordination from natural language, significantly reducing manual engineering effort and supporting broad generalization across diverse mission types. Website: https://sites.google.com/view/lan-cb
Authors:Yuwei Wu, Yuezhan Tao, Peihan Li, Guangyao Shi, Gaurav S. Sukhatme, Vijay Kumar, Lifeng Zhou
Abstract:
Real-time multi-robot coordination in hazardous and adversarial environments requires fast, reliable adaptation to dynamic threats. While Large Language Models (LLMs) offer strong high-level reasoning capabilities, the lack of safety guarantees limits their direct use in critical decision-making. In this paper, we propose a hierarchical optimization framework that integrates LLMs into the decision loop for multi-robot target tracking in dynamic and hazardous environments. Rather than generating control actions directly, LLMs are used to generate task configuration and adjust parameters in a bi-level task allocation and planning problem. We formulate multi-robot coordination for tracking tasks as a bi-level optimization problem, with LLMs to reason about potential hazards in the environment and the status of the robot team and modify both the inner and outer levels of the optimization. This hierarchical approach enables real-time adjustments to the robots' behavior. Additionally, a human supervisor can offer broad guidance and assessments to address unexpected dangers, model mismatches, and performance issues arising from local minima. We validate our proposed framework in both simulation and real-world experiments with comprehensive evaluations, demonstrating its effectiveness and showcasing its capability for safe LLM integration for multi-robot systems.
Authors:Peihan Li, Yuwei Wu, Jiazhen Liu, Gaurav S. Sukhatme, Vijay Kumar, Lifeng Zhou
Abstract:
Multi-robot collaboration for target tracking in adversarial environments poses significant challenges, including system failures, dynamic priority shifts, and other unpredictable factors. These challenges become even more pronounced when the environment is unknown. In this paper, we propose a resilient coordination framework for multi-robot, multi-target tracking in environments with unknown sensing and communication danger zones. We consider scenarios where failures caused by these danger zones are probabilistic and temporary, allowing robots to escape from danger zones to minimize the risk of future failures. We formulate this problem as a nonlinear optimization with soft chance constraints, enabling real-time adjustments to robot behaviors based on varying types of dangers and failures. This approach dynamically balances target tracking performance and resilience, adapting to evolving sensing and communication conditions in real-time. To validate the effectiveness of the proposed method, we assess its performance across various tracking scenarios, benchmark it against methods without resilient adaptation and collaboration, and conduct several real-world experiments.
Authors:Mengmeng Zhu, Yuxuan Sun, Yukuan Jia, Wei Chen, Bo Ai, Sheng Zhou
Abstract:
Collaborative perception (CP) is a critical technology in applications like autonomous driving and smart cities. It involves the sharing and fusion of information among sensors to overcome the limitations of individual perception, such as blind spots and range limitations. However, CP faces two primary challenges. First, due to the dynamic nature of the environment, the timeliness of the transmitted information is critical to perception performance. Second, with limited computational power at the sensors and constrained wireless bandwidth, the communication volume must be carefully designed to ensure feature representations are both effective and sufficient. This work studies the dynamic scheduling problem in a multi-region CP scenario, and presents a Timeliness-Aware Multi-region Prioritized (TAMP) scheduling algorithm to trade off perception accuracy and communication resource usage. Timeliness reflects the utility of information that decays as time elapses, which is manifested by the perception performance in CP tasks. We propose an empirical penalty function that maps the joint impact of Age of Information (AoI) and communication volume to perception performance. Aiming to minimize this timeliness-oriented penalty in the long term, and recognizing that scheduling decisions have a cumulative effect on subsequent system states, we propose the TAMP scheduling algorithm. TAMP is a Lyapunov-based optimization policy that decomposes the long-term average objective into a per-slot prioritization problem, balancing the scheduling worth against resource cost. We validate our algorithm in both intersection and corridor scenarios with the real-world Roadside Cooperative perception (RCooper) dataset. Extensive simulations demonstrate that TAMP outperforms the best-performing baseline, achieving an Average Precision (AP) improvement of up to 27% across various configurations.
Authors:Zhe Wang, Shaocong Xu, Xucai Zhuang, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang
Abstract:
Cooperative perception enhances the individual perception capabilities of autonomous vehicles (AVs) by providing a comprehensive view of the environment. However, balancing perception performance and transmission costs remains a significant challenge. Current approaches that transmit region-level features across agents are limited in interpretability and demand substantial bandwidth, making them unsuitable for practical applications. In this work, we propose CoopDETR, a novel cooperative perception framework that introduces object-level feature cooperation via object query. Our framework consists of two key modules: single-agent query generation, which efficiently encodes raw sensor data into object queries, reducing transmission cost while preserving essential information for detection; and cross-agent query fusion, which includes Spatial Query Matching (SQM) and Object Query Aggregation (OQA) to enable effective interaction between queries. Our experiments on the OPV2V and V2XSet datasets demonstrate that CoopDETR achieves state-of-the-art performance and significantly reduces transmission costs to 1/782 of previous methods.
Authors:Jiazhao Liang, Hao Huang, Yu Hao, Geeta Chandra Raju Bethala, Congcong Wen, John-Ross Rizzo, Yi Fang
Abstract:
Recent advancements in Large Language Models (LLMs) have demonstrated substantial capabilities in enhancing communication and coordination in multi-robot systems. However, existing methods often struggle to achieve efficient collaboration and decision-making in dynamic and uncertain environments, which are common in real-world multi-robot scenarios. To address these challenges, we propose a novel retrospective actor-critic framework for multi-robot collaboration. This framework integrates two key components: (1) an actor that performs real-time decision-making based on observations and task directives, and (2) a critic that retrospectively evaluates the outcomes to provide feedback for continuous refinement, such that the proposed framework can adapt effectively to dynamic conditions. Extensive experiments conducted in simulated environments validate the effectiveness of our approach, demonstrating significant improvements in task performance and adaptability. This work offers a robust solution to persistent challenges in robotic collaboration.
Authors:Jiaxun Zhang, Qian Xu, Zhenning Li, Chengzhong Xu, Keqiang Li
Abstract:
Vehicle-to-Everything (V2X) cooperation is reshaping traffic safety from an ego-centric sensing problem into one of collective intelligence. This survey structures recent progress within a unified Sensor-Perception-Decision (SPD) framework that formalizes how safety emerges from the interaction of distributed sensing, cooperative perception, and coordinated decision-making across vehicles and infrastructure. Rather than centering on link protocols or message formats, we focus on how shared evidence, predictive reasoning, and human-aligned interventions jointly enable proactive risk mitigation. Within this SPD lens, we synthesize advances in cooperative perception, multi-modal forecasting, and risk-aware planning, emphasizing how cross-layer coupling turns isolated detections into calibrated, actionable understanding. Timing, trust, and human factors are identified as cross-cutting constraints that determine whether predictive insights are delivered early enough, with reliable confidence, and in forms that humans and automated controllers can use. Compared with prior V2X safety surveys, this work (i) organizes the literature around a formal SPD safety loop and (ii) systematically analyzes research evolution and evaluation gaps through a PRISMA-guided bibliometric study of hundreds of publications from 2016-2025. The survey concludes with a roadmap toward cooperative safety intelligence, outlining SPD-based design principles and evaluation practices for next-generation V2X safety systems.
Authors:Damian Owerko, Frederic Vatnsdal, Saurav Agarwal, Vijay Kumar, Alejandro Ribeiro
Abstract:
This article presents a novel multi-agent spatial transformer (MAST) for learning communication policies in large-scale decentralized and collaborative multi-robot systems (DC-MRS). Challenges in collaboration in DC-MRS arise from: (i) partially observable states, as robots make only localized perception, (ii) limited communication range with no central server, and (iii) independent execution of actions. The robots need to optimize a common task-specific objective, which, under the restricted setting, must be done using a communication policy that exhibits the desired collaborative behavior. The proposed MAST is a decentralized transformer architecture that learns communication policies to compute abstract information to be shared with other agents and processes the received information with the robot's own observations. The MAST extends the standard transformer with new positional encoding strategies and attention operations that employ windowing to limit the receptive field for MRS. These are designed for local computation, shift-equivariance, and permutation equivariance, making it a promising approach for DC-MRS. We demonstrate the efficacy of MAST on decentralized assignment and navigation (DAN) and decentralized coverage control. Efficiently trained using imitation learning in a centralized setting, the decentralized MAST policy is robust to communication delays, scales to large teams, and performs better than the baselines and other learning-based approaches.
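A minimal numpy sketch of distance-windowed attention follows: attention scores between robots farther apart than a communication radius are masked out, so each robot aggregates information only from its local neighborhood; the operation is permutation-equivariant by construction. The single-head formulation and random matrices are illustrative, not MAST's positional encodings or attention operators.

```python
import numpy as np

def windowed_attention(features, positions, radius, W_q, W_k, W_v):
    """Single-head self-attention over a robot team, restricted to a spatial window:
    robot i only attends to robots within `radius` of it (including itself).
    features: (N, D), positions: (N, 2), W_q/W_k/W_v: (D, D)."""
    q, k, v = features @ W_q, features @ W_k, features @ W_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    scores = np.where(dists <= radius, scores, -np.inf)   # mask robots out of range
    scores -= scores.max(axis=1, keepdims=True)           # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
N, D = 6, 8
feats = rng.standard_normal((N, D))
pos = rng.uniform(0, 10, size=(N, 2))
Wq, Wk, Wv = (0.1 * rng.standard_normal((D, D)) for _ in range(3))
print(windowed_attention(feats, pos, 4.0, Wq, Wk, Wv).shape)   # (6, 8)
```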
Authors:Mingyue Lei, Zewei Zhou, Hongchen Li, Jia Hu, Jiaqi Ma
Abstract:
Risk quantification is a critical component of safe autonomous driving; however, it is constrained by the limited perception range and occlusions of single-vehicle systems in complex and dense scenarios. The vehicle-to-everything (V2X) paradigm is a promising solution for sharing complementary perception information; nevertheless, how to ensure risk interpretability while understanding multi-agent interaction with V2X remains an open question. In this paper, we introduce the first V2X-enabled risk quantification pipeline, CooperRisk, to fuse perception information from multiple agents and quantify the scenario driving risk at multiple future timestamps. The risk is represented as a scenario risk map to ensure interpretability based on risk severity and exposure, and the multi-agent interaction is captured by the learning-based cooperative prediction model. We carefully design a risk-oriented transformer-based prediction model with multi-modality and multi-agent considerations. It aims to ensure scene-consistent future behaviors of multiple agents and avoid conflicting predictions that could lead to overly conservative risk quantification and cause the ego vehicle to become overly hesitant to drive. Then, the temporal risk maps could serve to guide a model predictive control planner. We evaluate the CooperRisk pipeline on the real-world V2X dataset V2XPnP, and the experiments demonstrate its superior performance in risk quantification, showing a 44.35% decrease in conflict rate between the ego vehicle and background traffic participants.
Authors:Hao Xiang, Zhaoliang Zheng, Xin Xia, Seth Z. Zhao, Letian Gao, Zewei Zhou, Tianhui Cai, Yun Zhang, Jiaqi Ma
Abstract:
Cooperative perception enabled by Vehicle-to-Everything (V2X) communication holds significant promise for enhancing the perception capabilities of autonomous vehicles, allowing them to overcome occlusions and extend their field of view. However, existing research predominantly relies on simulated environments or static datasets, leaving the feasibility and effectiveness of V2X cooperative perception, especially for intermediate fusion in real-world scenarios, largely unexplored. In this work, we introduce V2X-ReaLO, an open online cooperative perception framework deployed on real vehicles and smart infrastructure that integrates early, late, and intermediate fusion methods within a unified pipeline and provides the first practical demonstration of online intermediate fusion's feasibility and performance under genuine real-world conditions. Additionally, we present an open benchmark dataset specifically designed to assess the performance of online cooperative perception systems. This new dataset extends the V2X-Real dataset to dynamic, synchronized ROS bags and provides 25,028 test frames with 6,850 annotated key frames in challenging urban scenarios. By enabling real-time assessments of perception accuracy and communication latency under dynamic conditions, V2X-ReaLO sets a new benchmark for advancing and optimizing cooperative perception systems in real-world applications. The codes and datasets will be released to further advance the field.
Authors:Zewei Zhou, Hao Xiang, Zhaoliang Zheng, Seth Z. Zhao, Mingyue Lei, Yun Zhang, Tianhui Cai, Xinyi Liu, Johnson Liu, Maheswari Bajji, Xin Xia, Zhiyu Huang, Bolei Zhou, Jiaqi Ma
Abstract:
Vehicle-to-everything (V2X) technologies offer a promising paradigm to mitigate the limitations of constrained observability in single-vehicle systems. Prior work primarily focuses on single-frame cooperative perception, which fuses agents' information across different spatial locations but ignores temporal cues and temporal tasks (e.g., temporal perception and prediction). In this paper, we focus on the spatio-temporal fusion in V2X scenarios and design one-step and multi-step communication strategies (when to transmit) as well as examine their integration with three fusion strategies - early, late, and intermediate (what to transmit), providing comprehensive benchmarks with 11 fusion models (how to fuse). Furthermore, we propose V2XPnP, a novel intermediate fusion framework within one-step communication for end-to-end perception and prediction. Our framework employs a unified Transformer-based architecture to effectively model complex spatio-temporal relationships across multiple agents, frames, and high-definition maps. Moreover, we introduce the V2XPnP Sequential Dataset that supports all V2X collaboration modes and addresses the limitations of existing real-world datasets, which are restricted to single-frame or single-mode cooperation. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods in both perception and prediction tasks.
Authors:Junkai Jiang, Yihe Chen, Yibin Yang, Ruochen Li, Shaobing Xu, Jianqiang Wang
Abstract:
Multi-vehicle trajectory planning (MVTP) is one of the key challenges in multi-robot systems (MRSs) and has broad applications across various fields. This paper presents ESCoT, an enhanced step-based coordinate trajectory planning method for multiple car-like robots. ESCoT incorporates two key strategies: collaborative planning for local robot groups and replanning for duplicate configurations. These strategies effectively enhance the performance of step-based MVTP methods. Through extensive experiments, we show that ESCoT 1) in sparse scenarios, significantly improves solution quality compared to the baseline step-based method, achieving up to 70% improvement in typical conflict scenarios and 34% in randomly generated scenarios, while maintaining high solving efficiency; and 2) in dense scenarios, outperforms all baseline methods, maintaining a success rate of over 50% even in the most challenging configurations. The results demonstrate that ESCoT effectively solves MVTP, further extending the capabilities of step-based methods. Finally, practical robot tests validate the algorithm's applicability in real-world scenarios.
Authors:Junho Choi, Kihwan Ryoo, Jeewon Kim, Taeyun Kim, Eungchang Lee, Myeongwoo Jeong, Kevin Christiansen Marsim, Hyungtae Lim, Hyun Myung
Abstract:
Multi-robot localization is a crucial task for implementing multi-robot systems. Numerous researchers have proposed optimization-based multi-robot localization methods that use camera, IMU, and UWB sensors. Nevertheless, the characteristics of the individual robot odometry estimates and of the inter-robot distance measurements used in the optimization are not sufficiently considered. In addition, previous research has been heavily influenced by the accuracy of the odometry estimated by individual robots. Consequently, long-term drift error caused by error accumulation is potentially inevitable. In this paper, we propose a novel visual-inertial-range-based multi-robot localization method, named SaWa-ML, which enables geometric structure-aware pose correction and weight adaptation-based robust multi-robot localization. Our contributions are twofold: (i) we leverage UWB sensor data, whose range error does not accumulate over time, to first estimate the relative positions between robots and then correct the positions of each robot, thus reducing long-term drift errors, (ii) we design adaptive weights for robot pose correction by considering the characteristics of the sensor data and visual-inertial odometry estimates. The proposed method has been validated in real-world experiments, showing a substantial performance increase compared with state-of-the-art algorithms.
Authors:Haejoon Lee, Dimitra Panagou
Abstract:
Ensuring resilient consensus in multi-robot systems with misbehaving agents remains a challenge, as many existing network resilience properties are inherently combinatorial and globally defined. While previous works have proposed control laws to enhance or preserve resilience in multi-robot networks, they often assume a fixed topology with known resilience properties, or require global state knowledge. These assumptions may be impractical in physically-constrained environments, where safety and resilience requirements are conflicting, or when misbehaving agents share inaccurate state information. In this work, we propose a distributed control law that enables each robot to guarantee resilient consensus and safety during its navigation without fixed topologies using only locally available information. To this end, we establish a sufficient condition for resilient consensus in time-varying networks based on the degree of non-misbehaving or normal agents. Using this condition, we design a Control Barrier Function (CBF)-based controller that guarantees resilient consensus and collision avoidance without requiring estimates of global state and/or control actions of all other robots. Finally, we validate our method through simulations.
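For context on the resilience notion used above, the sketch below implements a classic W-MSR update, in which each normal robot discards up to F extreme neighbor values before averaging. This is a standard baseline related to the degree-based condition mentioned in the abstract, not the paper's CBF-based controller.

```python
import numpy as np

def wmsr_step(values, adjacency, F):
    """One synchronous W-MSR update for scalar states: each robot discards up to F
    neighbor values strictly above and up to F strictly below its own value,
    then averages the remaining values together with its own."""
    new = values.copy()
    for i in range(len(values)):
        nbr_vals = [values[j] for j in range(len(values)) if adjacency[i][j]]
        above = sorted(v for v in nbr_vals if v > values[i])
        below = sorted(v for v in nbr_vals if v < values[i])
        equal = [v for v in nbr_vals if v == values[i]]
        kept = above[:max(len(above) - F, 0)] + below[F:] + equal + [values[i]]
        new[i] = sum(kept) / len(kept)
    return new

# Complete graph of 5 robots; robot 4 misbehaves and always broadcasts 100.
N, F = 5, 1
A = [[i != j for j in range(N)] for i in range(N)]
x = np.array([0.0, 1.0, 2.0, 3.0, 100.0])
for _ in range(30):
    x = wmsr_step(x, A, F)
    x[4] = 100.0                     # the misbehaving robot never cooperates
print(np.round(x[:4], 3))            # normal robots agree on a value within [0, 3]
```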
Authors:Jason Hughes, Marcel Hussing, Edward Zhang, Shenbagaraj Kannapiran, Joshua Caswell, Kenneth Chaney, Ruichen Deng, Michaela Feehery, Agelos Kratimenos, Yi Fan Li, Britny Major, Ethan Sanchez, Sumukh Shrote, Youkang Wang, Jeremy Wang, Daudi Zein, Luying Zhang, Ruijun Zhang, Alex Zhou, Tenzi Zhouga, Jeremy Cannon, Zaffir Qasim, Jay Yelon, Fernando Cladera, Kostas Daniilidis, Camillo J. Taylor, Eric Eaton
Abstract:
This report presents a heterogeneous robotic system designed for remote primary triage in mass-casualty incidents (MCIs). The system employs a coordinated air-ground team of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to locate victims, assess their injuries, and prioritize medical assistance without risking the lives of first responders. The UAVs identify and provide overhead views of casualties, while UGVs equipped with specialized sensors measure vital signs and detect and localize physical injuries. Unlike previous work that focused on exploration or limited medical evaluation, this system addresses the complete triage process: victim localization, vital sign measurement, injury severity classification, mental status assessment, and data consolidation for first responders. Developed as part of the DARPA Triage Challenge, this approach demonstrates how multi-robot systems can augment human capabilities in disaster response scenarios to maximize lives saved.
Authors:Kaleb Ben Naveed, Devansh R. Agrawal, Rahul Kumar, Dimitra Panagou
Abstract:
Autonomous robots are increasingly deployed for long-term information-gathering tasks, which pose two key challenges: planning informative trajectories in environments that evolve across space and time, and ensuring persistent operation under energy constraints. This paper presents a unified framework, mEclares, that addresses both challenges through adaptive ergodic search and energy-aware scheduling in multi-robot systems. Our contributions are two-fold: (1) we model real-world variability using stochastic spatiotemporal environments, where the underlying information evolves unpredictably due to process uncertainty. To guide exploration, we construct a target information spatial distribution (TISD) based on clarity, a metric that captures the decay of information in the absence of observations and highlights regions of high uncertainty; and (2) we introduce Robustmesch (Rmesch), an online scheduling method that enables persistent operation by coordinating rechargeable robots sharing a single mobile charging station. Unlike prior work, our approach avoids reliance on preplanned schedules, static or dedicated charging stations, and simplified robot dynamics. Instead, the scheduler supports general nonlinear models, accounts for uncertainty in the estimated position of the charging station, and handles central node failures. The proposed framework is validated through real-world hardware experiments, and feasibility guarantees are provided under specific assumptions.
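The clarity-based target information distribution can be illustrated with a toy grid update: clarity decays where no observations arrive and rises where the robot's field of view covers a cell, and the exploration target is simply the normalized complement of clarity. The discrete update and the decay/gain constants are assumptions for illustration, not mEclares' stochastic environment model.

```python
import numpy as np

def clarity_update(clarity, observed, decay=0.05, gain=0.6):
    """Discrete-time clarity update over a grid.
    clarity  : (H, W) values in [0, 1]; 1 = fully known, 0 = unknown
    observed : (H, W) boolean mask of cells measured this step
    Unobserved cells decay toward 0 (information goes stale); observed cells
    move toward 1."""
    clarity = (1.0 - decay) * clarity
    clarity = np.where(observed, clarity + gain * (1.0 - clarity), clarity)
    return np.clip(clarity, 0.0, 1.0)

grid = np.zeros((4, 4))
fov = np.zeros((4, 4), dtype=bool)
fov[1:3, 1:3] = True                        # the robot observes the center cells
for _ in range(5):
    grid = clarity_update(grid, fov)
tisd = 1.0 - grid                           # target information spatial distribution
tisd /= tisd.sum()                          # normalize so it sums to one
print(np.round(grid, 2))
print(np.round(tisd, 3))
```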
Authors:Irving Solis, James Motes, Mike Qin, Marco Morales, Nancy M. Amato
Abstract:
Multi-robot systems enhance efficiency and productivity across various applications, from manufacturing to surveillance. While single-robot motion planning has improved by using databases of prior solutions, extending this approach to multi-robot motion planning (MRMP) presents challenges due to the increased complexity and diversity of tasks and configurations. Recent discrete methods have attempted to address this by focusing on relevant lower-dimensional subproblems, but they are inadequate for complex scenarios like those involving manipulator robots. To overcome this, we propose a novel experience-based planning approach that constructs and utilizes databases of solutions for smaller sub-problems. By focusing on interactions between fewer robots, our method reduces the need for exhaustive database growth, allowing for efficient handling of more complex MRMP scenarios. We validate our approach with experiments involving both mobile and manipulator robots, demonstrating significant improvements over existing methods in scalability and planning efficiency. Our contributions include a rapidly constructed database for low-dimensional MRMP problems, a framework for applying these solutions to larger problems, and experimental validation with up to 32 mobile and 16 manipulator robots.
Authors:Walker Gosrich, Saurav Agarwal, Kashish Garg, Siddharth Mayya, Matthew Malencia, Mark Yim, Vijay Kumar
Abstract:
We propose a new formulation for the multi-robot task allocation problem that incorporates (a) complex precedence relationships between tasks, (b) efficient intra-task coordination, and (c) cooperation through the formation of robot coalitions. A task graph specifies the tasks and their relationships, and a set of reward functions models the effects of coalition size and preceding task performance. Maximizing task rewards is NP-hard; hence, we propose network flow-based algorithms to approximate solutions efficiently. A novel online algorithm performs iterative re-allocation, providing robustness to task failures and model inaccuracies to achieve higher performance than offline approaches. We comprehensively evaluate the algorithms in a testbed with random missions and reward functions and compare them to a mixed-integer solver and a greedy heuristic. Additionally, we validate the overall approach in an advanced simulator, modeling reward functions based on realistic physical phenomena and executing the tasks with realistic robot dynamics. Results establish efficacy in modeling complex missions and efficiency in generating high-fidelity task plans while leveraging task relationships.
Authors:Xiaoxiao Li, Zhirui Sun, Hongpeng Wang, Shuai Li, Jiankun Wang
Abstract:
Control barrier function (CBF)-based methods provide the minimum modification necessary to formally guarantee safety in the context of quadratic programming, offering strict safety guarantees for safety-critical systems. However, most CBF-based variants myopically focus on present safety at each time step; reasoning over a look-ahead horizon is missing. In this paper, a predictive safety matrix is constructed. We then consolidate the safety condition based on the smallest eigenvalue of the proposed safety matrix. A predefined deconfliction strategy of motion paths is embedded into the trajectory tracking module to manage deadlock conflicts, which computes the deadlock escape velocity with the minimum attitude angle. Comparison results show that the introduction of the predictive term is robust to measurement uncertainty and immune to oscillations. The proposed deadlock avoidance method avoids large detours, without obvious stagnation.
Authors:Xiaoxiao Li, Zhirui Sun, Mansha Zheng, Hongpeng Wang, Shuai Li, Jiankun Wang
Abstract:
One of the pivotal challenges in a multi-robot system is how to balance accuracy and efficiency while ensuring safety. Prior work either cannot strictly guarantee collision-free motion for an arbitrarily large number of robots or yields considerably conservative results. The smoothness of the avoidance trajectory also needs to be further optimized. This paper proposes an acceleration-actuated simultaneous obstacle avoidance and trajectory tracking method for arbitrarily large teams of robots, which provides a non-conservative collision avoidance strategy and approaches for deadlock avoidance. We propose two deadlock resolution strategies; one incorporates an auxiliary velocity vector into the error function of the trajectory tracking module, which is proven to have no influence on the global convergence of the tracking error. Furthermore, unlike traditional methods that address conflicts only after a deadlock occurs, our decision-making mechanism avoids near-zero velocities, which is safer and more efficient in crowded environments. Extensive comparisons show that the proposed method is superior to existing studies when deployed in large-scale robot systems, with minimal invasiveness.
Authors:Ike Obi, Ruiqi Wang, Wonse Jo, Byung-Cheol Min
Abstract:
Trust is essential in human-robot collaboration, particularly in multi-human, multi-robot (MH-MR) teams, where it plays a crucial role in maintaining team cohesion in complex operational environments. Despite its importance, trust is rarely incorporated into task allocation and reallocation algorithms for MH-MR collaboration. While prior research in single-human, single-robot interactions has shown that integrating trust significantly enhances both performance outcomes and user experience, its role in MH-MR task allocation remains underexplored. In this paper, we introduce the Expectation Confirmation Trust (ECT) Model, a novel framework for modeling trust dynamics in MH-MR teams. We evaluate the ECT model against five existing trust models and a no-trust baseline to assess its impact on task allocation outcomes across different team configurations (2H-2R, 5H-5R, and 10H-10R). Our results show that the ECT model improves task success rate, reduces mean completion time, and lowers task error rates. These findings highlight the complexities of trust-based task allocation in MH-MR teams. We discuss the implications of incorporating trust into task allocation algorithms and propose future research directions for adaptive trust mechanisms that balance efficiency and performance in dynamic, multi-agent environments.
Authors:Zefeng Zhang, Xiangzhao Hao, Hengzhu Tang, Zhenyu Zhang, Jiawei Sheng, Xiaodong Li, Zhenyang Li, Li Gao, Daiting Shi, Dawei Yin, Tingwen Liu
Abstract:
Visual Spatial Reasoning is crucial for enabling Multimodal Large Language Models (MLLMs) to understand object properties and spatial relationships, yet current models still struggle with 3D-aware reasoning. Existing approaches typically enhance either perception, by augmenting RGB inputs with auxiliary modalities such as depth and segmentation, or reasoning, by training on spatial VQA datasets and applying reinforcement learning, and thus treat these two aspects in isolation. In this work, we investigate whether a unified MLLM can develop an intrinsic ability to enhance spatial perception and, through adaptive interleaved reasoning, achieve stronger spatial intelligence. We propose \textbf{COOPER}, a unified MLLM that leverages depth and segmentation as auxiliary modalities and is trained in two stages to acquire auxiliary modality generation and adaptive, interleaved reasoning capabilities. COOPER achieves an average \textbf{6.91\%} improvement in spatial reasoning while maintaining general performance. Moreover, even a variant trained only for auxiliary modality generation attains a \textbf{7.92\%} gain on distance and size estimation, suggesting that learning to generate auxiliary modalities helps internalize spatial knowledge and strengthen spatial understanding.
Authors:Phuc Nguyen Xuan, Thanh Nguyen Canh, Huu-Hung Nguyen, Nak Young Chong, Xiem HoangVan
Abstract:
This survey comprehensively reviews the evolving field of multi-robot collaborative Simultaneous Localization and Mapping (SLAM) using 3D Gaussian Splatting (3DGS). As an explicit scene representation, 3DGS has enabled unprecedented real-time, high-fidelity rendering, ideal for robotics. However, its use in multi-robot systems introduces significant challenges in maintaining global consistency, managing communication, and fusing data from heterogeneous sources. We systematically categorize approaches by their architecture -- centralized or distributed -- and analyze core components such as multi-agent consistency and alignment, communication-efficient Gaussian representation, semantic distillation, fusion and pose optimization, and real-time scalability. In addition, a summary of critical datasets and evaluation metrics is provided to contextualize performance. Finally, we identify key open challenges and chart future research directions, including lifelong mapping, semantic association and mapping, multi-model approaches for robustness, and bridging the Sim2Real gap.
Authors:Zijie Zhao, Honglei Guo, Shengqian Chen, Kaixuan Xu, Bo Jiang, Yuanheng Zhu, Dongbin Zhao
Abstract:
Model-based reinforcement learning (MBRL) has shown significant potential in robotics due to its high sample efficiency and planning capability. However, extending MBRL to multi-robot cooperation remains challenging due to the complexity of joint dynamics and the reliance on synchronous communication. To address this, SeqWM employs independent, autoregressive, agent-wise world models to represent joint dynamics, where each agent generates its future trajectory and plans its actions based on the predictions of its predecessors. This design lowers modeling complexity, alleviates the reliance on communication synchronization, and enables the emergence of advanced cooperative behaviors through explicit intention sharing. Experiments in challenging simulated environments (Bi-DexHands and Multi-Quad) demonstrate that SeqWM outperforms existing state-of-the-art model-based and model-free baselines in both overall performance and sample efficiency, while exhibiting advanced cooperative behaviors such as predictive adaptation, temporal alignment, and role division. Furthermore, SeqWM has been successfully deployed on physical quadruped robots, demonstrating its effectiveness in real-world multi-robot systems. Demos and code are available at: https://sites.google.com/view/seqwm-marl
Authors:Yunshuang Yuan, Yan Xia, Daniel Cremers, Monika Sester
Abstract:
Cooperative perception can widen the field of view and reduce the occlusion of an ego vehicle, hence improving the perception performance and safety of autonomous driving. Despite the success of previous works on cooperative object detection, they mostly operate on dense Bird's Eye View (BEV) feature maps, which are computationally demanding and can hardly be extended to long-range detection problems. More efficient fully sparse frameworks are rarely explored. In this work, we design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features. Extensive experimental results on both the OPV2V and DairV2X datasets show that our framework, despite its sparsity, outperforms the state of the art with lower communication bandwidth requirements. In addition, experiments on the OPV2Vt and DairV2Xt datasets for time-aligned cooperative object detection also show a significant performance gain compared to the baseline works.
Authors:Tim Felix Lakemann, Daniel Bonilla Licea, Viktor Walter, Tomáš Báča, Martin Saska
Abstract:
A novel onboard tracking approach enabling vision-based relative localization and communication using Active blinking Marker Tracking (AMT) is introduced in this article. Active blinking markers on multi-robot team members improve the robustness of relative localization for aerial vehicles in tightly coupled multi-robot systems during real-world deployments, while also serving as a resilient communication system. Traditional tracking algorithms struggle with fast-moving blinking markers due to their intermittent appearance in camera frames and the complexity of associating multiple of these markers across consecutive frames. AMT addresses this by using weighted polynomial regression to predict the future appearance of active blinking markers while accounting for uncertainty in the prediction. In outdoor experiments, the AMT approach outperformed state-of-the-art methods in tracking density, accuracy, and complexity. The experimental validation of this novel tracking approach for relative localization and optical communication involved testing motion patterns motivated by our research on agile multi-robot deployment.
Authors:Shreyas Muthusamy, Damian Owerko, Charilaos I. Kanatsoulis, Saurav Agarwal, Alejandro Ribeiro
Abstract:
Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance, aiming to minimize the total distance traveled. The problem forms an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation. We address this problem in a decentralized setting where each robot knows only the positions of its $k$-nearest robots and $k$-nearest targets. This scenario combines elements of combinatorial assignment and continuous-space motion planning, posing significant scalability challenges for traditional centralized approaches. To overcome these challenges, we propose a decentralized policy learned via a Graph Neural Network (GNN). The GNN enables robots to determine (1) what information to communicate to neighbors and (2) how to integrate received information with local observations for decision-making. We train the GNN using imitation learning with the centralized Hungarian algorithm as the expert policy, and further fine-tune it using reinforcement learning to avoid collisions and enhance performance. Extensive empirical evaluations demonstrate the scalability and effectiveness of our approach. The GNN policy trained on 100 robots generalizes to scenarios with up to 500 robots, outperforming state-of-the-art solutions by 8.6\% on average and significantly surpassing greedy decentralized methods. This work lays the foundation for solving multi-robot coordination problems in settings where scalability is important.
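The following numpy sketch illustrates, under assumptions, one message-passing step over a k-nearest-neighbor robot graph of the kind the GNN policy operates on: neighbor features are aggregated (what to communicate) and combined with local observations (how to integrate). The paper's actual architecture, feature design, and imitation/RL training are not reproduced; the weights here are random.

```python
# Illustrative single GNN message-passing step on a k-NN robot graph;
# not the paper's trained policy.
import numpy as np

def knn_graph(positions, k):
    """Each robot is connected to its k nearest robots."""
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]              # (n_robots, k) neighbor indices

def gnn_layer(features, neighbors, W_self, W_msg):
    """Mean-aggregate neighbor features and combine with local observations
    via shared linear maps and a ReLU nonlinearity."""
    msgs = features[neighbors].mean(axis=1)           # aggregate neighbor messages
    out = features @ W_self + msgs @ W_msg
    return np.maximum(out, 0.0)

# Toy usage with random weights (no training):
pos = np.random.rand(10, 2)                           # 10 robot positions
feats = np.random.rand(10, 8)                         # local observation features
nbrs = knn_graph(pos, k=3)
W1, W2 = np.random.randn(8, 8), np.random.randn(8, 8)
hidden = gnn_layer(feats, nbrs, W1, W2)               # per-robot hidden states
```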
Authors:Jiaxi Huang, Yan Huang, Yixian Zhao, Wenchao Meng, Jinming Xu
Abstract:
Collaborative learning enhances the performance and adaptability of multi-robot systems in complex tasks but faces significant challenges due to the high communication overhead and data heterogeneity inherent in multi-robot tasks. To this end, we propose CoCoL, a Communication-efficient decentralized Collaborative Learning method tailored for multi-robot systems with heterogeneous local datasets. Leveraging a mirror descent framework, CoCoL achieves remarkable communication efficiency with approximate Newton-type updates by capturing the similarity between the objective functions of robots, and reduces computational costs through inexact sub-problem solutions. Furthermore, the integration of a gradient tracking scheme ensures its robustness against data heterogeneity. Experimental results on three representative multi-robot collaborative learning tasks show the superiority of the proposed CoCoL in significantly reducing both the number of communication rounds and total bandwidth consumption while maintaining state-of-the-art accuracy. These benefits are particularly evident in challenging scenarios involving non-IID (non-independent and identically distributed) data distribution, streaming data, and time-varying network topologies.
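For intuition on how decentralized collaborative learning copes with heterogeneous data, here is an illustrative numpy sketch of plain decentralized gradient tracking over a mixing matrix; CoCoL's mirror-descent framework with approximate Newton-type updates is more sophisticated and is not reproduced here.

```python
# Textbook decentralized gradient tracking (illustrative only, not CoCoL).
import numpy as np

def gradient_tracking(grads, W, x0, steps=200, alpha=0.05):
    """grads: list of per-robot gradient functions grad_i(x);
    W: doubly stochastic mixing matrix matching the communication graph."""
    n = W.shape[0]
    x = np.tile(x0, (n, 1))                                # local models
    y = np.stack([g(x[i]) for i, g in enumerate(grads)])   # gradient trackers
    for _ in range(steps):
        x_new = W @ x - alpha * y                          # consensus + descent
        y = W @ y + np.stack([g(x_new[i]) - g(x[i]) for i, g in enumerate(grads)])
        x = x_new
    return x.mean(axis=0)

# Toy quadratic objectives f_i(x) = ||x - c_i||^2 on a ring of 4 robots:
cs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
      np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
grads = [lambda x, c=c: 2 * (x - c) for c in cs]
W = np.array([[.5, .25, 0, .25], [.25, .5, .25, 0],
              [0, .25, .5, .25], [.25, 0, .25, .5]])
print(gradient_tracking(grads, W, np.zeros(2)))            # -> near the average of c_i
```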
Authors:Zhaoliang Zheng, Yun Zhang, Zongling Meng, Johnson Liu, Xin Xia, Jiaqi Ma
Abstract:
Infrastructure sensing is vital for traffic monitoring at safety hotspots (e.g., intersections) and serves as the backbone of cooperative perception in autonomous driving. While vehicle sensing has been extensively studied, infrastructure sensing has received little attention, especially given the unique challenges of diverse intersection geometries, complex occlusions, varying traffic conditions, and ambient environments like lighting and weather. To address these issues and ensure cost-effective sensor placement, we propose Heterogeneous Multi-Modal Infrastructure Sensor Placement Evaluation (InSPE), a perception surrogate metric set that rapidly assesses perception effectiveness across diverse infrastructure and environmental scenarios with combinations of multi-modal sensors. InSPE systematically evaluates perception capabilities by integrating three carefully designed metrics, i.e., sensor coverage, perception occlusion, and information gain. To support large-scale evaluation, we develop a data generation tool within the CARLA simulator and also introduce Infra-Set, a dataset covering diverse intersection types and environmental conditions. Benchmarking experiments with state-of-the-art perception algorithms demonstrate that InSPE enables efficient and scalable sensor placement analysis, providing a robust solution for optimizing intelligent intersection infrastructure.
Authors:Zonglin Meng, Yun Zhang, Zhaoliang Zheng, Zhihao Zhao, Jiaqi Ma
Abstract:
Cooperative perception has attracted wide attention given its capability to leverage shared information across connected automated vehicles (CAVs) and smart infrastructures to address sensing occlusion and range limitation issues. However, existing research overlooks the fragile multi-sensor correlations in multi-agent settings, as the heterogeneous agent sensor measurements are highly susceptible to environmental factors, leading to weakened inter-agent sensor interactions. The varying operational conditions and other real-world factors inevitably introduce multifactorial noise and consequentially lead to multi-sensor misalignment, making the deployment of multi-agent multi-modality perception particularly challenging in the real world. In this paper, we propose AgentAlign, a real-world heterogeneous agent cross-modality feature alignment framework, to effectively address these multi-modality misalignment issues. Our method introduces a cross-modality feature alignment space (CFAS) and heterogeneous agent feature alignment (HAFA) mechanism to harmonize multi-modality features across various agents dynamically. Additionally, we present a novel V2XSet-noise dataset that simulates realistic sensor imperfections under diverse environmental conditions, facilitating a systematic evaluation of our approach's robustness. Extensive experiments on the V2X-Real and V2XSet-Noise benchmarks demonstrate that our framework achieves state-of-the-art performance, underscoring its potential for real-world applications in cooperative autonomous driving. The controllable V2XSet-Noise dataset and generation pipeline will be released in the future.
Authors:Serafino Cicerone, Alessia Di Fonso, Gabriele Di Stefano, Alfredo Navarra
Abstract:
In the field of swarm robotics, one of the most studied problems is Gathering. It asks for a distributed algorithm that brings the robots to a common location that is not known in advance. We consider the case of robots constrained to move along the edges of a graph under the well-known OBLOT model. Gathering is then accomplished once all the robots occupy the same vertex. Differently from classical settings, we assume: i) the initial configuration may contain multiplicities, i.e., more than one robot may occupy the same vertex; ii) robots cannot detect multiplicities; iii) robots move along the edges of vertex- and edge-transitive graphs, i.e., graphs where all the vertices (and the edges, resp.) belong to the same equivalence class. To somewhat balance such a `hostile' setting, we adopt the round-robin scheduler for robot activation, where robots are cyclically activated one at a time. We provide some basic impossibility results and design two different algorithms that achieve Gathering on two specific topologies belonging to the class of edge- and vertex-transitive graphs: infinite grids and hypercubes. Both algorithms are time-optimal and heavily exploit the properties of the underlying topologies. Because of this, we conjecture that no general algorithm can exist for all the solvable cases.
Authors:Serafino Cicerone, Alessia Di Fonso, Gabriele Di Stefano, Alfredo Navarra
Abstract:
The OBLOT model has been extensively studied in theoretical swarm robotics. It assumes weak capabilities for the involved mobile robots: they are anonymous, disoriented, oblivious (no memory of past events), and silent. Their only means of (implicit) communication is their positioning, i.e., stigmergic information. These limited capabilities make the design of distributed algorithms a challenging task. Over the last two decades, numerous research papers have addressed the question of which tasks can be accomplished within this model. Nevertheless, as usually happens in distributed computing, in OBLOT the computational power available to the robots is neglected, as the main cost measures for the designed algorithms are the number of movements or the number of rounds required. In this paper, we prove that for synchronous robots moving on finite graphs, unlimited computational power (other than finite time) has a significant impact. In fact, by exploiting it, we provide a definitive resolution algorithm that applies to a wide class of problems while guaranteeing the minimum number of moves and rounds.
Authors:Peihan Li, Jiazhen Liu, Yuwei Wu, Lifeng Zhou
Abstract:
Multi-robot coordination is crucial for autonomous systems, yet real-world deployments often encounter various failures. These include both temporary and permanent disruptions in sensing and communication, which can significantly degrade system robustness and performance if not explicitly modeled. Despite its practical importance, failure-aware coordination remains underexplored in the literature. To bridge the gap between idealized conditions and the complexities of real-world environments, we propose a unified failure-aware coordination framework designed to enable resilient and adaptive multi-robot target tracking under both temporary and permanent failure conditions. Our approach systematically distinguishes between two classes of failures: (1) probabilistic and temporary disruptions, where robots recover from intermittent sensing or communication losses by dynamically adapting paths and avoiding inferred danger zones, and (2) permanent failures, where robots lose sensing or communication capabilities irreversibly, requiring sustained, decentralized behavioral adaptation. To handle these scenarios, the robot team is partitioned into subgroups. Robots that remain connected form a communication group and collaboratively plan using partially centralized nonlinear optimization. Robots experiencing permanent disconnection or failure continue to operate independently through decentralized or individual optimization, allowing them to contribute to the task within their local context. We extensively evaluate our method across a range of benchmark variations and conduct a comprehensive assessment under diverse real-world failure scenarios. Results show that our framework consistently achieves robust performance in realistic environments with unknown danger zones, offering a practical and generalizable solution for the multi-robot systems community.
Authors:Peihan Li, Lifeng Zhou
Abstract:
Large Language Models (LLMs) have advanced rapidly in recent years, demonstrating strong capabilities in problem comprehension and reasoning. Inspired by these developments, researchers have begun exploring the use of LLMs as decentralized decision-makers for multi-robot formation control. However, prior studies reveal that directly applying LLMs to such tasks often leads to unstable and inconsistent behaviors, where robots may collapse to the centroid of their positions or diverge entirely due to hallucinated reasoning, logical inconsistencies, and limited coordination awareness. To overcome these limitations, we propose a novel framework that integrates LLMs with an influence-based plan consensus protocol. In this framework, each robot independently generates a local plan toward the desired formation using its own LLM. The robots then iteratively refine their plans through a decentralized consensus protocol that accounts for their influence on neighboring robots. This process drives the system toward a coherent and stable flocking formation in a fully decentralized manner. We evaluate our approach through comprehensive simulations involving both state-of-the-art closed-source LLMs (e.g., o3-mini, Claude 3.5) and open-source models (e.g., Llama3.1-405b, Qwen-Max, DeepSeek-R1). The results show notable improvements in stability, convergence, and adaptability over previous LLM-based methods. We further validate our framework on a physical team of Crazyflie drones, demonstrating its practical viability and effectiveness in real-world multi-robot systems.
Authors:Zhiying Song, Lei Yang, Fuxi Wen, Jun Li
Abstract:
Cooperative perception presents significant potential for enhancing the sensing capabilities of individual vehicles, however, inter-agent latency remains a critical challenge. Latencies cause misalignments in both spatial and semantic features, complicating the fusion of real-time observations from the ego vehicle with delayed data from others. To address these issues, we propose TraF-Align, a novel framework that learns the flow path of features by predicting the feature-level trajectory of objects from past observations up to the ego vehicle's current time. By generating temporally ordered sampling points along these paths, TraF-Align directs attention from the current-time query to relevant historical features along each trajectory, supporting the reconstruction of current-time features and promoting semantic interaction across multiple frames. This approach corrects spatial misalignment and ensures semantic consistency across agents, effectively compensating for motion and achieving coherent feature fusion. Experiments on two real-world datasets, V2V4Real and DAIR-V2X-Seq, show that TraF-Align sets a new benchmark for asynchronous cooperative perception.
Authors:Peihan Li, Zijian An, Shams Abrar, Lifeng Zhou
Abstract:
The rapid advancement of Large Language Models (LLMs) has opened new possibilities in Multi-Robot Systems (MRS), enabling enhanced communication, task planning, and human-robot interaction. Unlike traditional single-robot and multi-agent systems, MRS poses unique challenges, including coordination, scalability, and real-world adaptability. This survey provides the first comprehensive exploration of LLM integration into MRS. It systematically categorizes their applications across high-level task allocation, mid-level motion planning, low-level action generation, and human intervention. We highlight key applications in diverse domains, such as household robotics, construction, formation control, target tracking, and robot games, showcasing the versatility and transformative potential of LLMs in MRS. Furthermore, we examine the challenges that limit the adoption of LLMs in MRS, including mathematical reasoning limitations, hallucination, latency issues, and the need for robust benchmarking systems. Finally, we outline opportunities for future research, emphasizing advancements in fine-tuning, reasoning techniques, and task-specific models. This survey aims to guide researchers toward intelligent, real-world deployment of MRS powered by LLMs. Given the fast-evolving nature of research in the field, we keep updating the paper list in the open-source GitHub repository.
Authors:Xuekai Qiu, Pengming Zhu, Yiming Hu, Zhiwen Zeng, Huimin Lu
Abstract:
This paper presents a consensus-based payload algorithm (CBPA) to handle decreasing robot capabilities in multi-robot task allocation. During the execution of complex tasks, robots' capabilities can decrease as payloads are consumed, so the robot coalition may no longer meet the tasks' requirements in real time. The proposed CBPA is an enhanced version of the consensus-based bundle algorithm (CBBA) and comprises two core phases: payload bundle construction and consensus. In the payload bundle construction phase, CBPA introduces a payload assignment matrix to track, in real time, the payloads carried by the robots and the demands of multi-robot tasks. Robots then share their respective payload assignment matrices in the consensus phase. These two phases are iterated to dynamically adjust the number of robots performing multi-robot tasks and the number of tasks each robot performs, obtaining conflict-free results that ensure the robot coalition meets the demands and completes all tasks as quickly as possible. Physical experiments show that CBPA is appropriate in complex and dynamic scenarios where robots need to collaborate and task requirements are tightly coupled to the robots' payloads. Numerical experiments show that CBPA achieves higher total task gains than CBBA.
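A toy numpy sketch of the bookkeeping a payload assignment matrix could provide is shown below: rows are robots, columns are tasks, and entries record payload committed to each task so demand satisfaction can be checked in real time. CBPA's bundle construction and consensus rules are not reproduced; the greedy fill is purely illustrative.

```python
# Illustrative payload-vs-demand bookkeeping; not CBPA's bundle/consensus logic.
import numpy as np

payload = np.array([4, 3, 2])            # remaining payload per robot
demand = np.array([5, 3])                # payload demanded per task
A = np.zeros((3, 2))                     # assignment matrix (robots x tasks)

# Greedy fill: commit payload to tasks until each demand is met.
for t in range(len(demand)):
    for r in np.argsort(-payload):
        need = demand[t] - A[:, t].sum()
        if need <= 0:
            break
        give = min(payload[r], need)
        A[r, t] += give
        payload[r] -= give

print(A)                                 # per-robot payload committed to each task
print(A.sum(axis=0) >= demand)           # are all task demands satisfied?
```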
Authors:Yansong Qu, Zixuan Xu, Zilin Huang, Zihao Sheng, Tiantian Chen, Sikai Chen
Abstract:
Semantic scene completion (SSC) is essential for achieving comprehensive perception in autonomous driving systems. However, existing SSC methods often overlook the high deployment costs in real-world applications. Traditional architectures, such as 3D Convolutional Neural Networks (3D CNNs) and self-attention mechanisms, face challenges in efficiently capturing long-range dependencies within 3D voxel grids, limiting their effectiveness. To address these issues, we introduce MetaSSC, a novel meta-learning-based framework for SSC that leverages deformable convolution, large-kernel attention, and the Mamba (D-LKA-M) model. Our approach begins with a voxel-based semantic segmentation (SS) pretraining task, aimed at exploring the semantics and geometry of incomplete regions while acquiring transferable meta-knowledge. Using simulated cooperative perception datasets, we supervise the perception training of a single vehicle using aggregated sensor data from multiple nearby connected autonomous vehicles (CAVs), generating richer and more comprehensive labels. This meta-knowledge is then adapted to the target domain through a dual-phase training strategy that does not add extra model parameters, enabling efficient deployment. To further enhance the model's capability in capturing long-sequence relationships within 3D voxel grids, we integrate Mamba blocks with deformable convolution and large-kernel attention into the backbone network. Extensive experiments demonstrate that MetaSSC achieves state-of-the-art performance, significantly outperforming competing models while also reducing deployment costs.
Authors:Kui Wang, Kazuma Nonomura, Zongdian Li, Tao Yu, Kei Sakaguchi, Omar Hashash, Walid Saad, Changyang She, Yonghui Li
Abstract:
Vehicle-road collaboration is a promising approach for enhancing the safety and efficiency of autonomous driving by extending the intelligence of onboard systems to smart roadside infrastructures. The introduction of digital twins (DTs), particularly local DTs (LDTs) at the edge, in smart mobility presents a new embodiment of augmented intelligence, which could enhance information exchange and extract human driving expertise to improve onboard intelligence. This paper presents a novel LDT-assisted hybrid autonomous driving system for improving safety and efficiency in traffic intersections. By leveraging roadside units (RSUs) equipped with sensory and computing capabilities, the proposed system continuously monitors traffic, extracts human driving knowledge, and generates intersection-specific local driving agents through an offline reinforcement learning (RL) framework. When connected and automated vehicles (CAVs) pass through RSU-equipped intersections, RSUs can provide local agents to support safe and efficient driving in local areas. Meanwhile, they provide real-time cooperative perception (CP) to broaden onboard sensory horizons. The proposed LDT-assisted hybrid system is implemented with state-of-the-art products, e.g., CAVs and RSUs, and technologies, e.g., millimeter-wave (mmWave) communications. Hardware-in-the-loop (HiL) simulations and proof-of-concept (PoC) tests validate system performance from two standpoints: (i) The peak latency for CP and local agent downloading are 8.51 ms and 146 ms, respectively, aligning with 3GPP requirements for vehicle-to-everything (V2X) and model transfer use cases. Moreover, (ii) local driving agents can improve safety measures by 10% and reduce travel time by 15% compared with conventional onboard systems. The implemented prototype also demonstrates reliable real-time performance, fulfilling the targets of the proposed system design.
Authors:Zhiqing Luo, Yi Wang, Yingying He, Wei Wang
Abstract:
Cooperative perception enables vehicles to share sensor readings and has become a new paradigm for improving driving safety, where the key enabling technology is accurate, real-time alignment and fusion of the perceptions. Recent advances in view alignment rely on high-density LiDAR data or fine-grained image feature representations, which fail to meet the accuracy, real-time, and adaptability requirements of autonomous driving. To this end, we present MMatch, a lightweight system that enables accurate and real-time perception fusion with mmWave radar point clouds. The key insight is that the fine-grained spatial information provided by the radar presents unique associations with all the vehicles, even across two separate views. As a result, by capturing and understanding the unique local and global positions of the targets in this association, we can quickly identify all the co-visible vehicles for view alignment. We implement MMatch on datasets collected from the CARLA platform and from real-world traffic, with over 15,000 radar point cloud pairs. Experimental results show that MMatch achieves decimeter-level accuracy within 59 ms, significantly improving reliability for autonomous driving.
Authors:Baolu Li, Zongzhe Xu, Jinlong Li, Xinyu Liu, Jianwu Fang, Xiaopeng Li, Hongkai Yu
Abstract:
LiDAR-based Vehicle-to-Everything (V2X) cooperative perception has demonstrated its impact on the safety and effectiveness of autonomous driving. Since current cooperative perception algorithms are trained and tested on the same dataset, the generalization ability of cooperative perception systems remains underexplored. This paper is the first work to study the Domain Generalization problem of LiDAR-based V2X cooperative perception (V2X-DG) for 3D detection based on four widely-used open source datasets: OPV2V, V2XSet, V2V4Real and DAIR-V2X. Our research seeks to sustain high performance not only within the source domain but also across other unseen domains, achieved solely through training on the source domain. To this end, we propose Cooperative Mixup Augmentation based Generalization (CMAG) to improve the model generalization capability by simulating unseen cooperation, designed compactly for the domain gaps in cooperative perception. Furthermore, we propose a constraint for the regularization of robust generalized feature representation learning: Cooperation Feature Consistency (CFC), which aligns the intermediately fused features of the generalized cooperation produced by CMAG with the early fused features of the original cooperation in the source domain. Extensive experiments demonstrate that our approach achieves significant performance gains when generalizing to other unseen datasets while also maintaining strong performance on the source dataset.
Authors:Bingyi Liu, Jian Teng, Hongfei Xue, Enshu Wang, Chuanhui Zhu, Pu Wang, Libing Wu
Abstract:
Collaborative perception significantly enhances individual vehicle perception performance through the exchange of sensory information among agents. However, real-world deployment faces challenges due to bandwidth constraints and inevitable calibration errors during information exchange. To address these issues, we propose mmCooper, a novel multi-agent, multi-stage, communication-efficient, and collaboration-robust cooperative perception framework. Our framework leverages a multi-stage collaboration strategy that dynamically and adaptively balances intermediate- and late-stage information to share among agents, enhancing perceptual performance while maintaining communication efficiency. To support robust collaboration despite potential misalignments and calibration errors, our framework prevents misleading low-confidence sensing information from transmission and refines the received detection results from collaborators to improve accuracy. The extensive evaluation results on both real-world and simulated datasets demonstrate the effectiveness of the mmCooper framework and its components.
Authors:Weizheng Wang, Aniket Bera, Byung-Cheol Min
Abstract:
A team of multiple robots seamlessly and safely working in human-filled public environments requires adaptive task allocation and socially-aware navigation that account for dynamic human behavior. Current approaches struggle with highly dynamic pedestrian movement and the need for flexible task allocation. We propose Hyper-SAMARL, a hypergraph-based system for multi-robot task allocation and socially-aware navigation, leveraging multi-agent reinforcement learning (MARL). Hyper-SAMARL models the environmental dynamics between robots, humans, and points of interest (POIs) using a hypergraph, enabling adaptive task assignment and socially-compliant navigation through a hypergraph diffusion mechanism. Our framework, trained with MARL, effectively captures interactions between robots and humans, adapting tasks based on real-time changes in human activity. Experimental results demonstrate that Hyper-SAMARL outperforms baseline models in terms of social navigation, task completion efficiency, and adaptability in various simulated scenarios.
Authors:Wanli Ni, Ruyu Luo, Xinran Zhang, Peng Wang, Wen Wang, Hui Tian
Abstract:
With the rapid development of artificial intelligence, robotics, and the Internet of Things, multi-robot systems are progressively acquiring human-like environmental perception and understanding capabilities, empowering them to complete complex tasks through autonomous decision-making and interaction. However, the Internet of Robotic Things (IoRT) faces significant challenges in terms of spectrum resources, sensing accuracy, communication latency, and energy supply. To address these issues, a reconfigurable intelligent surface (RIS)-aided IoRT network is proposed to enhance the overall performance of robotic communication, sensing, computation, and energy harvesting. In the case studies, by jointly optimizing parameters such as transceiver beamforming, robot trajectories, and RIS coefficients, solutions based on multi-agent deep reinforcement learning and multi-objective optimization are proposed to solve problems such as beamforming design, path planning, target sensing, and data aggregation. Numerical results are provided to demonstrate the effectiveness of the proposed solutions in improving the communication quality, sensing accuracy, and energy efficiency of RIS-aided IoRT networks while reducing computation error.
Authors:Kazuma Obata, Tatsuya Aoki, Takato Horii, Tadahiro Taniguchi, Takayuki Nagai
Abstract:
This study proposes LiP-LLM, which integrates linear programming and dependency graphs with large language models (LLMs) for multi-robot task planning. For multiple robots to perform tasks efficiently, the precedence dependencies between tasks must be managed. Although multi-robot decentralized and centralized task planners using LLMs have been proposed, none of these studies focus on precedence dependencies from the perspective of task efficiency or leverage traditional optimization methods. LiP-LLM addresses key challenges in managing dependencies between skills and optimizing task allocation. It consists of three steps: skill list generation and dependency graph generation by LLMs, and task allocation using linear programming. The LLMs are used to generate a comprehensive list of skills and to construct a dependency graph that maps the relationships and sequential constraints among these skills. To ensure the feasibility and efficiency of skill execution, the skill list is generated based on calculated likelihoods, and linear programming is used to optimally allocate tasks to each robot. Experimental evaluations in simulated environments demonstrate that this method outperforms existing task planners, achieving higher success rates and efficiency in executing complex multi-robot tasks. The results indicate the potential of combining LLMs with optimization techniques to enhance the capabilities of multi-robot systems in executing coordinated tasks accurately and efficiently. In an environment with two robots, a maximum success rate difference of 0.82 is observed in the language-instruction group when the object name is changed.
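As a minimal sketch of the linear-programming allocation step, the snippet below assigns LLM-generated skills to robots by minimizing a made-up execution cost under the constraint that each skill is assigned exactly once, using `scipy.optimize.linprog`. The dependency-graph ordering and the likelihood-based skill generation of LiP-LLM are omitted, and the cost matrix is invented for illustration.

```python
# Illustrative LP allocation of skills to robots; not LiP-LLM's full pipeline.
import numpy as np
from scipy.optimize import linprog

cost = np.array([[3.0, 1.0],   # cost[skill, robot]
                 [2.0, 4.0],
                 [1.5, 1.0]])
n_skills, n_robots = cost.shape

c = cost.flatten()                                   # variables x[skill, robot] in [0, 1]
A_eq = np.zeros((n_skills, n_skills * n_robots))
for s in range(n_skills):
    A_eq[s, s * n_robots:(s + 1) * n_robots] = 1     # each skill assigned exactly once
b_eq = np.ones(n_skills)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * len(c), method="highs")
assignment = res.x.reshape(n_skills, n_robots).argmax(axis=1)
print(assignment)                                    # robot index chosen for each skill
```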
Authors:Martin Huber, Nicola A. Cavalcanti, Ayoob Davoodi, Ruixuan Li, Christopher E. Mower, Fabio Carrillo, Christoph J. Laux, Francois Teyssere, Thibault Chandanson, Antoine Harlé, Elie Saghbiny, Mazda Farshad, Guillaume Morel, Emmanuel Vander Poorten, Philipp Fürnstahl, Sébastien Ourselin, Christos Bergeles, Tom Vercauteren
Abstract:
Despite their mechanical sophistication, surgical robots remain blind to their surroundings. This lack of spatial awareness causes collisions, system recoveries, and workflow disruptions, issues that will intensify with the introduction of distributed robots with independent interacting arms. Existing tracking systems rely on bulky infrared cameras and reflective markers, providing only limited views of the surgical scene and adding hardware burden in crowded operating rooms. We present a marker-free proprioception method that enables precise localisation of surgical robots under their sterile draping despite associated obstruction of visual cues. Our method solely relies on lightweight stereo-RGB cameras and novel transformer-based deep learning models. It builds on the largest multi-centre spatial robotic surgery dataset to date (1.4M self-annotated images from human cadaveric and preclinical in vivo studies). By tracking the entire robot and surgical scene, rather than individual markers, our approach provides a holistic view robust to occlusions, supporting surgical scene understanding and context-aware control. We demonstrate an example of potential clinical benefits during in vivo breathing compensation with access to tissue dynamics unobservable under state-of-the-art tracking, and show accurate localisation in multi-robot systems for future intelligent interaction. In addition, compared with existing systems, our method eliminates markers and improves tracking visibility by 25%. To our knowledge, this is the first demonstration of marker-free proprioception for fully draped surgical robots, reducing setup complexity, enhancing safety, and paving the way toward modular and autonomous robotic surgery.
Authors:Chaoran Wang, Jingyuan Sun, Yanhui Zhang, Mingyu Zhang, Changju Wu
Abstract:
We introduce a novel framework for automatic behavior tree (BT) construction in heterogeneous multi-robot systems, designed to address the challenges of adaptability and robustness in dynamic environments. Traditional robots are limited by fixed functional attributes and cannot efficiently reconfigure their strategies in response to task failures or environmental changes. To overcome this limitation, we leverage large language models (LLMs) to generate and extend BTs dynamically, combining the reasoning and generalization power of LLMs with the modularity and recovery capability of BTs. The proposed framework consists of four interconnected modules -- task initialization, task assignment, BT update, and failure node detection -- which operate in a closed loop. Robots tick their BTs during execution, and upon encountering a failure node, they can either extend the tree locally or invoke a centralized virtual coordinator (Alex) to reassign subtasks and synchronize BTs across peers. This design enables long-term cooperative execution in heterogeneous teams. We validate the framework on 60 tasks across three simulated scenarios and in a real-world cafe environment with a robotic arm and a wheeled-legged robot. Results show that our method consistently outperforms baseline approaches in task success rate, robustness, and scalability, demonstrating its effectiveness for multi-robot collaboration in complex scenarios.
Authors:Ruiyang Wang, Hao-Lun Hsu, David Hunt, Shaocheng Luo, Jiwoo Kim, Miroslav Pajic
Abstract:
Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient exploration and target object search. Our approach combines real-time LiDAR scan processing for frontier cluster extraction and doorway detection with multimodal LLM reasoning (e.g., GPT-4o) to generate coordinated waypoint assignments based on shared environment maps and robot states. LLM-MCoX demonstrates superior performance compared to existing methods, including greedy and Voronoi-based planners, achieving 22.7% faster exploration times and 50% improved search efficiency in large environments with 6 robots. Notably, LLM-MCoX enables natural language-based object search capabilities, allowing human operators to provide high-level semantic guidance that traditional algorithms cannot interpret.
Authors:Quan Quan, Jiwen Xu, Runxiao Liu, Yi Ding, Jiaxing Che, Kai-Yuan Cai
Abstract:
Cooperative aerial transportation via robot swarms holds transformative potential for logistics and disaster response, yet existing approaches struggle with scalability, communication dependency, and robustness against dynamic failures. Here, we present a physics-inspired cooperative transportation approach for flying robot swarms that imitates the dissipative mechanics of table-leg load distribution. By developing a decentralized dissipative force model, our approach enables autonomous formation stabilization and adaptive load allocation without requiring explicit communication. Each robot dynamically adjusts its position based on its local neighbors and the suspended payload, similar to the energy-dissipating reactions of table legs. The stability of the resultant control system is rigorously proved. Simulations demonstrate that the tracking errors of the proposed approach are 20%, 68%, 55.5%, and 21.9% of those of existing approaches under capability variation, cable uncertainty, limited vision, and payload variation, respectively. In real-world experiments with six flying robots, the cooperative aerial transportation system achieved a 94% success rate under single-robot failures, disconnection events, 25% payload variation, and 40% cable length uncertainty, demonstrating strong robustness under outdoor winds up to Beaufort scale 4. Overall, this physics-inspired approach bridges swarm intelligence and mechanical stability principles, offering a scalable framework for heterogeneous aerial systems to collectively handle complex transportation tasks in communication-constrained environments.
Authors:Gian Carlo Maffettone, Alain Boldini, Mario di Bernardo, Maurizio Porfiri
Abstract:
The design of control systems for the spatial self-organization of mobile agents is an open challenge across several engineering domains, including swarm robotics and synthetic biology. Here, we propose a bio-inspired leader-follower solution, which is aware of energy constraints of mobile agents and is apt to deal with large swarms. Akin to many natural systems, control objectives are formulated for the entire collective, and leaders and followers are allowed to plastically switch their role in time. We frame a density control problem, modeling the agents' population via a system of nonlinear partial differential equations. This approach allows for a compact description that inherently avoids the curse of dimensionality and improves analytical tractability. We derive analytical guarantees for the existence of desired steady-state solutions and their local stability for one-dimensional and higher-dimensional problems. We numerically validate our control methodology, offering support to the effectiveness, robustness, and versatility of our proposed bio-inspired control strategy.
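For context, density-control formulations of this kind typically build on a mass-conservation (continuity) equation for the agent density $\rho(x,t)$ driven by a controlled velocity field $v(x,t)$; a generic textbook form is shown below. The paper's specific nonlinear PDE system for plastic leader-follower roles is not reproduced here.

$$\frac{\partial \rho}{\partial t} + \nabla \cdot \bigl(\rho\, v\bigr) = 0, \qquad \rho(x,0) = \rho_0(x),$$

where shaping $v$ (e.g., through the leaders' actions) is what steers the steady-state density toward a desired profile $\rho^*(x)$.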
Authors:Chuheng Wei, Ziye Qin, Walter Zimmer, Guoyuan Wu, Matthew J. Barth
Abstract:
Real-world Vehicle-to-Everything (V2X) cooperative perception systems often operate under heterogeneous sensor configurations due to cost constraints and deployment variability across vehicles and infrastructure. This heterogeneity poses significant challenges for feature fusion and perception reliability. To address these issues, we propose HeCoFuse, a unified framework designed for cooperative perception across mixed sensor setups where nodes may carry Cameras (C), LiDARs (L), or both. By introducing a hierarchical fusion mechanism that adaptively weights features through a combination of channel-wise and spatial attention, HeCoFuse can tackle critical challenges such as cross-modality feature misalignment and imbalanced representation quality. In addition, an adaptive spatial resolution adjustment module is employed to balance computational cost and fusion effectiveness. To enhance robustness across different configurations, we further implement a cooperative learning strategy that dynamically adjusts fusion type based on available modalities. Experiments on the real-world TUMTraf-V2X dataset demonstrate that HeCoFuse achieves 43.22% 3D mAP under the full sensor configuration (LC+LC), outperforming the CoopDet3D baseline by 1.17%, and reaches an even higher 43.38% 3D mAP in the L+LC scenario, while maintaining 3D mAP in the range of 21.74% to 43.38% across nine heterogeneous sensor configurations. These results, validated by our first-place finish in the CVPR 2025 DriveX challenge, establish HeCoFuse as the current state-of-the-art on the TUMTraf-V2X dataset while demonstrating robust performance across diverse sensor deployments.
Authors:Elim Kwan, Rehman Qureshi, Liam Fletcher, Colin Laganier, Victoria Nockles, Richard Walters
Abstract:
Cooperative autonomous robotic systems have significant potential for executing complex multi-task missions across space, air, ground, and maritime domains. But they commonly operate in remote, dynamic and hazardous environments, requiring rapid in-mission adaptation without reliance on fragile or slow communication links to centralised compute. Fast, on-board replanning algorithms are therefore needed to enhance resilience. Reinforcement Learning shows strong promise for efficiently solving mission planning tasks when formulated as Travelling Salesperson Problems (TSPs), but existing methods: 1) are unsuitable for replanning, where agents do not start at a single location; 2) do not allow cooperation between agents; 3) are unable to model tasks with variable durations; or 4) lack practical considerations for on-board deployment. Here we define the Cooperative Mission Replanning Problem as a novel variant of multiple TSP with adaptations to overcome these issues, and develop a new encoder/decoder-based model using Graph Attention Networks and Attention Models to solve it effectively and efficiently. Using a simple example of cooperative drones, we show our replanner consistently (90% of the time) maintains performance within 10% of the state-of-the-art LKH3 heuristic solver, whilst running 85-370 times faster on a Raspberry Pi. This work paves the way for increased resilience in autonomous multi-agent systems.
Authors:Songyuan Zhang, Oswin So, Mitchell Black, Zachary Serlin, Chuchu Fan
Abstract:
Tasks for multi-robot systems often require the robots to collaborate and complete a team goal while maintaining safety. This problem is usually formalized as a constrained Markov decision process (CMDP), which targets minimizing a global cost and bringing the mean of constraint violation below a user-defined threshold. Inspired by real-world robotic applications, we define safety as zero constraint violation. While many safe multi-agent reinforcement learning (MARL) algorithms have been proposed to solve CMDPs, these algorithms suffer from unstable training in this setting. To tackle this, we use the epigraph form for constrained optimization to improve training stability and prove that the centralized epigraph form problem can be solved in a distributed fashion by each agent. This results in a novel centralized training distributed execution MARL algorithm named Def-MARL. Simulation experiments on 8 different tasks across 2 different simulators show that Def-MARL achieves the best overall performance, satisfies safety constraints, and maintains stable training. Real-world hardware experiments on Crazyflie quadcopters demonstrate the ability of Def-MARL to safely coordinate agents to complete complex collaborative tasks compared to other methods.
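For readers unfamiliar with the epigraph trick mentioned above, the textbook reformulation of a constrained problem is sketched below; Def-MARL's exact objective, constraint handling, and distributed decomposition may differ.

$$\min_{\pi} \; J(\pi) \;\; \text{s.t.} \;\; C(\pi) \le 0 \qquad\Longleftrightarrow\qquad \min_{\pi,\, z} \; z \;\; \text{s.t.} \;\; J(\pi) \le z, \;\; C(\pi) \le 0,$$

where the auxiliary scalar $z$ upper-bounds the cost, so both the objective and the safety requirement appear as inequality constraints of the same form.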
Authors:Jihao Huang, Jun Zeng, Xuemin Chi, Koushil Sreenath, Zhitao Liu, Hongye Su
Abstract:
Designing safety-critical controllers for acceleration-controlled unicycle robots is challenging, as control inputs may not appear in the constraints of control Lyapunov functions (CLFs) and control barrier functions (CBFs), leading to invalid controllers. Existing methods often rely on state-feedback-based CLFs and high-order CBFs (HOCBFs), which are computationally expensive to construct and fail to maintain effectiveness in dynamic environments with fast-moving, nearby obstacles. To address these challenges, we propose constructing velocity obstacle-based CBFs (VOCBFs) in the velocity space to enhance dynamic collision avoidance capabilities, instead of relying on distance-based CBFs that require the introduction of HOCBFs. Additionally, by extending VOCBFs using variants of VO, we enable reactive collision avoidance between robots. We formulate a safety-critical controller for acceleration-controlled unicycle robots as a mixed-integer quadratic programming (MIQP), integrating state-feedback-based CLFs for navigation and VOCBFs for collision avoidance. To enhance the efficiency of solving the MIQP, we split the MIQP into multiple sub-optimization problems and employ a decision network to reduce computational costs. Numerical simulations demonstrate that our approach effectively guides the robot to its target while avoiding collisions. Compared to HOCBFs, VOCBFs exhibit significantly improved dynamic obstacle avoidance performance, especially when obstacles are fast-moving and close to the robot. Furthermore, we extend our method to distributed multi-robot systems.
Authors:Beniamino Di Lorenzo, Gian Carlo Maffettone, Mario di Bernardo
Abstract:
In this paper, we address the large-scale shepherding control problem using a continuification-based strategy. We consider a scenario in which a large group of follower agents (targets) must be confined within a designated goal region through indirect interactions with a controllable set of leader agents (herders). Our approach transforms the microscopic agent-based dynamics into a macroscopic continuum model via partial differential equations (PDEs). This formulation enables efficient, scalable control design for the herders' behavior, with guarantees of global convergence. Numerical and experimental validations in a mixed-reality swarm robotics framework demonstrate the method's effectiveness.
Authors:Quan Quan, Shuhan Huang, Kai-Yuan Cai
Abstract:
With the rapid development of robotics swarm technology, there are more tasks that require the swarm to pass through complicated environments safely and efficiently. Virtual tube technology is a novel way to achieve this goal. Virtual tubes are free spaces connecting two places that provide safety boundaries and direction of motion for swarm robotics. How to determine the design quality of a virtual tube is a fundamental problem. For such a purpose, this paper presents a degree of flowability (DOF) for two-dimensional virtual tubes according to a minimum energy principle. After that, methods to calculate DOF are proposed with a feasibility analysis. Simulations of swarm robotics in different kinds of two-dimensional virtual tubes are performed to demonstrate the effectiveness of the proposed method of calculating DOF.
Authors:Stepan Dergachev, Artem Pshenitsyn, Aleksandr Panov, Alexey Skrynnik, Konstantin Yakovlev
Abstract:
Decentralized collision avoidance remains a core challenge for scalable multi-robot systems. One of the promising approaches to tackle this problem is Model Predictive Path Integral (MPPI) -- a framework that is naturally suited to handle any robot motion model and provides strong theoretical guarantees. Still, in practice an MPPI-based controller may produce suboptimal trajectories, as its performance relies heavily on uninformed random sampling. In this work, we introduce CoRL-MPPI, a novel fusion of Cooperative Reinforcement Learning and MPPI that addresses this limitation. We train an action policy (approximated by a deep neural network) in simulation that learns local cooperative collision avoidance behaviors. This learned policy is then embedded into the MPPI framework to guide its sampling distribution, biasing it towards more intelligent and cooperative actions. Notably, CoRL-MPPI preserves all the theoretical guarantees of regular MPPI. We evaluate our approach in dense, dynamic simulation environments against state-of-the-art baselines, including ORCA, BVC, and a multi-agent MPPI implementation. Our results demonstrate that CoRL-MPPI significantly improves navigation efficiency (measured by success rate and makespan) and safety, enabling agile and robust multi-robot navigation.
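A minimal numpy sketch of MPPI with a policy-informed sampling distribution is given below: the learned policy proposes a nominal action sequence, Gaussian perturbations of it are rolled out, and the first action is recomposed with the usual exponential-weighting rule. The paper's policy network, dynamics, and cost terms are not reproduced; `policy`, `dynamics`, and `cost` are placeholders.

```python
# Illustrative policy-biased MPPI step; not the CoRL-MPPI implementation.
import numpy as np

def mppi_step(x0, policy, dynamics, cost, horizon=20, samples=256, sigma=0.3, lam=1.0):
    rng = np.random.default_rng(0)

    # Nominal action sequence proposed by the learned cooperative policy.
    u_nom, x = [], np.array(x0, dtype=float)
    for _ in range(horizon):
        u = policy(x)
        u_nom.append(u)
        x = dynamics(x, u)
    u_nom = np.array(u_nom)                                   # (H, u_dim)

    # Sample perturbed rollouts around the policy's proposal.
    noise = rng.normal(0.0, sigma, size=(samples, *u_nom.shape))
    costs = np.zeros(samples)
    for k in range(samples):
        x = np.array(x0, dtype=float)
        for t in range(horizon):
            x = dynamics(x, u_nom[t] + noise[k, t])
            costs[k] += cost(x)

    # Exponentially weight samples by cost and return the first action.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return u_nom[0] + np.tensordot(w, noise[:, 0], axes=1)
```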
Authors:Ricardo Vega, Connor Mattson, Kevin Zhu, Daniel S. Brown, Cameron Nowzari
Abstract:
Swarm robotics has potential for a wide variety of applications, but real-world deployments remain rare due to the difficulty of predicting emergent behaviors arising from simple local interactions. Traditional engineering approaches design controllers to achieve desired macroscopic outcomes under idealized conditions, while agent-based and artificial life studies explore emergent phenomena in a bottom-up, exploratory manner. In this work, we introduce Analytical Swarm Chemistry, a framework that integrates concepts from engineering, agent-based and artificial life research, and chemistry. This framework combines macrostate definitions with phase diagram analysis to systematically explore how swarm parameters influence emergent behavior. Inspired by concepts from chemistry, the framework treats parameters like thermodynamic variables, enabling visualization of regions in parameter space that give rise to specific behaviors. Applying this framework to agents with minimally viable capabilities, we identify sufficient conditions for behaviors such as milling and diffusion and uncover regions of the parameter space that reliably produce these behaviors. Preliminary validation on real robots demonstrates that these regions correspond to observable behaviors in practice. By providing a principled, interpretable approach, this framework lays the groundwork for predictable and reliable emergent behavior in real-world swarm systems.
Authors:Connor Mattson, Varun Raveendra, Ellen Novoseller, Nicholas Waytowich, Vernon J. Lawhern, Daniel S. Brown
Abstract:
Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
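The round-robin protocol can be summarized by the pseudocode-style sketch below, in which the operator teleoperates one agent per turn while the others execute their current cloned policies, and each agent is refit only on its own demonstrations. The environment interface (`env.get_human_action`, `env.step`) and the training routine `train_bc` are hypothetical placeholders, not the paper's implementation.

```python
# Pseudocode-style sketch of round-robin behavior cloning (R2BC-like loop);
# interfaces below are assumed, not from the paper.
def round_robin_bc(env, agents, policies, train_bc, episodes_per_turn=5, rounds=4):
    datasets = {a: [] for a in agents}
    for _ in range(rounds):
        for controlled in agents:                     # the human's turn for this agent
            for _ in range(episodes_per_turn):
                obs = env.reset()
                done = False
                while not done:
                    actions = {}
                    for a in agents:
                        if a == controlled:
                            actions[a] = env.get_human_action(a)   # teleoperation
                            datasets[a].append((obs[a], actions[a]))
                        else:
                            actions[a] = policies[a](obs[a])       # current cloned policy
                    obs, done = env.step(actions)
            policies[controlled] = train_bc(datasets[controlled])  # refit this agent only
    return policies
```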
Authors:Yongkai Tian, Yirong Qi, Xin Yu, Wenjun Wu, Jie Luo
Abstract:
In robotic systems, the performance of reinforcement learning depends on the rationality of predefined reward functions. However, manually designed reward functions often lead to policy failures due to inaccuracies. Inverse Reinforcement Learning (IRL) addresses this problem by inferring implicit reward functions from expert demonstrations. Nevertheless, existing methods rely heavily on large amounts of expert demonstrations to accurately recover the reward function. The high cost of collecting expert demonstrations in robotic applications, particularly in multi-robot systems, severely hinders the practical deployment of IRL. Consequently, improving sample efficiency has emerged as a critical challenge in multi-agent inverse reinforcement learning (MIRL). Inspired by the symmetry inherent in multi-agent systems, this work theoretically demonstrates that leveraging symmetry enables the recovery of more accurate reward functions. Building upon this insight, we propose a universal framework that integrates symmetry into existing multi-agent adversarial IRL algorithms, thereby significantly enhancing sample efficiency. Experimental results from multiple challenging tasks have demonstrated the effectiveness of this framework. Further validation in physical multi-robot systems has shown the practicality of our method.
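The symmetry assumption referred to above is typically stated as invariance of the underlying Markov game under a group of transformations; a generic form is sketched below, while the paper's precise theoretical statement and its integration into adversarial IRL are not reproduced.

$$P(g \cdot s' \mid g \cdot s,\; g \cdot a) = P(s' \mid s, a), \qquad R(g \cdot s,\; g \cdot a) = R(s, a) \quad \forall\, g \in G,$$

so, informally, every demonstration $(s, a)$ carries the same information as its transformed copies $(g \cdot s,\; g \cdot a)$.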
Authors:Ruijie Du, Ruoyu Lin, Yanning Shen, Magnus Egerstedt
Abstract:
This paper proposes a framework for multi-robot systems to perform simultaneous learning and coverage of the domain of interest characterized by an unknown and potentially time-varying density function. To overcome the limitations of Gaussian Process (GP) regression, we employ Random Feature GP (RFGP) and its online variant (O-RFGP) that enables online and incremental inference. By integrating these with Voronoi-based coverage control and Upper Confidence Bound (UCB) sampling strategy, a team of robots can adaptively focus on important regions while refining the learned spatial field for efficient coverage. Under mild assumptions, we provide theoretical guarantees and evaluate the framework through simulations in time-invariant scenarios. Furthermore, its effectiveness in time-varying settings is demonstrated through additional simulations and a physical experiment.
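One Lloyd-style coverage step with a UCB-weighted density, echoing the learn-and-cover loop described above, is sketched in numpy below. The RFGP/O-RFGP estimators are not implemented; `mu` and `std` stand in for their predictive mean and uncertainty, and the gains are arbitrary.

```python
# Illustrative Voronoi coverage step with a UCB-weighted density estimate;
# not the paper's RFGP/O-RFGP pipeline.
import numpy as np

def coverage_step(robots, grid, mu, std, beta=2.0, gain=0.5):
    """robots: (n, 2) positions; grid: (m, 2) sample points of the domain;
    mu, std: (m,) predictive mean and std of the density at the grid points."""
    phi = mu + beta * std                                    # UCB-weighted density
    # Assign each grid point to its nearest robot (discretized Voronoi cells).
    owner = np.argmin(np.linalg.norm(grid[:, None] - robots[None, :], axis=-1), axis=1)
    new_robots = robots.copy()
    for i in range(len(robots)):
        cell, w = grid[owner == i], phi[owner == i]
        if w.sum() > 0:
            centroid = (cell * w[:, None]).sum(axis=0) / w.sum()
            new_robots[i] += gain * (centroid - robots[i])   # move toward weighted centroid
    return new_robots
```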
Authors:Ajay Shankar, Keisuke Okumura, Amanda Prorok
Abstract:
We propose a multi-robot control paradigm to solve point-to-point navigation tasks for a team of holonomic robots with access to the full environment information. The framework invokes two processes asynchronously at high frequency: (i) a centralized, discrete, and full-horizon planner for computing collision- and deadlock-free paths rapidly, leveraging recent advances in multi-agent pathfinding (MAPF), and (ii) dynamics-aware, robot-wise optimal trajectory controllers that ensure all robots independently follow their assigned paths reliably. This hierarchical shift in planning representation from (i) discrete and coupled to (ii) continuous and decoupled domains enables the framework to maintain long-term scalable motion synthesis. As an instantiation of this idea, we present LF, which combines a fast state-of-the-art MAPF solver (LaCAM), and a robust feedback control stack (Freyja) for executing agile robot maneuvers. LF provides a robust and versatile mechanism for lifelong multi-robot navigation even under asynchronous and partial goal updates, and adapts to dynamic workspaces simply by quick replanning. We present various multirotor and ground robot demonstrations, including the deployment of 15 real multirotors with random, consecutive target updates while a person walks through the operational workspace.
Authors:Tenghui Xie, Zhiying Song, Fuxi Wen, Jun Li, Guangzhao Liu, Zijian Zhao
Abstract:
Autonomous trucking offers significant benefits, such as improved safety and reduced costs, but faces unique perception challenges due to trucks' large size and dynamic trailer movements. These challenges include extensive blind spots and occlusions that hinder the truck's perception and the capabilities of other road users. To address these limitations, cooperative perception emerges as a promising solution. However, existing datasets predominantly feature light vehicle interactions or lack multi-agent configurations for heavy-duty vehicle scenarios. To bridge this gap, we introduce TruckV2X, the first large-scale truck-centered cooperative perception dataset featuring multi-modal sensing (LiDAR and cameras) and multi-agent cooperation (tractors, trailers, CAVs, and RSUs). We further investigate how trucks influence collaborative perception needs, establishing performance benchmarks while suggesting research priorities for heavy vehicle perception. The dataset provides a foundation for developing cooperative perception systems with enhanced occlusion handling capabilities, and accelerates the deployment of multi-agent autonomous trucking systems. The TruckV2X dataset is available at https://huggingface.co/datasets/XieTenghu1/TruckV2X.
Authors:Zhiying Song, Tenghui Xie, Fuxi Wen, Jun Li
Abstract:
Cooperative perception extends the perception capabilities of autonomous vehicles by enabling multi-agent information sharing via Vehicle-to-Everything (V2X) communication. Unlike traditional onboard sensors, V2X acts as a dynamic "information sensor" characterized by limited communication, heterogeneity, mobility, and scalability. This survey provides a comprehensive review of recent advancements from the perspective of information-centric cooperative perception, focusing on three key dimensions: information representation, information fusion, and large-scale deployment. We categorize information representation into data-level, feature-level, and object-level schemes, and highlight emerging methods for reducing data volume and compressing messages under communication constraints. In information fusion, we explore techniques under both ideal and non-ideal conditions, including those addressing heterogeneity, localization errors, latency, and packet loss. Finally, we summarize system-level approaches to support scalability in dense traffic scenarios. Compared with existing surveys, this paper introduces a new perspective by treating V2X communication as an information sensor and emphasizing the challenges of deploying cooperative perception in real-world intelligent transportation systems.
Authors:Lingpeng Chen, Siva Kailas, Srujan Deolasee, Wenhao Luo, Katia Sycara, Woojun Kim
Abstract:
We introduce a novel distributed source seeking framework, DIAS, designed for multi-robot systems in scenarios where the number of sources is unknown and potentially exceeds the number of robots. Traditional robotic source seeking methods typically focus on directing each robot to a specific strong source and may fall short in comprehensively identifying all potential sources. DIAS addresses this gap by introducing a hybrid controller that identifies the presence of sources and then alternates between exploration for data gathering and exploitation for guiding robots to identified sources. It further enhances search efficiency by dividing the environment into Voronoi cells and approximating source density functions based on Gaussian process regression. Additionally, DIAS can be integrated with existing source seeking algorithms. We compare DIAS with existing algorithms, including DoSS and GMES, in simulated gas leakage scenarios where the number of sources outnumbers or is equal to the number of robots. The numerical results show that DIAS outperforms the baseline methods in both the efficiency of source identification by the robots and the accuracy of the estimated environmental density function.
Authors:Kenta Tsukahara, Kanji Tanaka, Daiki Iwata, Jonathan Tay Yu Liang
Abstract:
In the context of visual place recognition (VPR), continual learning (CL) techniques offer significant potential for avoiding catastrophic forgetting when learning new places. However, existing CL methods often focus on knowledge transfer from a known model to a new one, overlooking the existence of unknown black-box models. We explore a novel multi-robot CL approach that enables knowledge transfer from black-box VPR models (teachers), such as those of local robots encountered by traveler robots (students) in unknown environments. Specifically, we introduce Membership Inference Attack, or MIA, the only major privacy attack applicable to black-box models, and leverage it to reconstruct pseudo training sets, which serve as the key knowledge to be exchanged between robots, from black-box VPR models. Furthermore, we aim to overcome the inherently low sampling efficiency of MIA by leveraging insights on place class prediction distribution and un-learned class detection imported from the VPR literature as a prior distribution. We also analyze both the individual effects of these methods and their combined impact. Experimental results demonstrate that our black-box MIA (BB-MIA) approach is remarkably powerful despite its simplicity, significantly enhancing the VPR capability of lower-performing robots through brief communication with other robots. This study contributes to optimizing knowledge sharing between robots in VPR and enhancing autonomy in open-world environments with multi-robot systems that are fault-tolerant and scalable.
Authors:Zhenwei Yang, Jilei Mao, Wenxian Yang, Yibo Ai, Yu Kong, Haibao Yu, Weidong Zhang
Abstract:
Temporal perception, defined as the capability to detect and track objects across temporal sequences, serves as a fundamental component in autonomous driving systems. While single-vehicle perception systems encounter limitations, stemming from incomplete perception due to object occlusion and inherent blind spots, cooperative perception systems present their own challenges in terms of sensor calibration precision and positioning accuracy. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for Vehicle-Infrastructure Cooperation (VIC). First, we employ Temporal Self-Attention and VIC Cross-Attention modules to effectively integrate temporal and spatial information from both vehicle and infrastructure perspectives. Then, we develop a novel Calibration Error Compensation (CEC) module to mitigate sensor misalignment issues and facilitate accurate feature alignment. Experiments on the V2X-Seq-SPD dataset demonstrate that LET-VIC significantly outperforms baseline models. Compared to LET-V, LET-VIC achieves +15.0% improvement in mAP and a +17.3% improvement in AMOTA. Furthermore, LET-VIC surpasses representative Tracking by Detection models, including V2VNet, FFNet, and PointPillars, with at least a +13.7% improvement in mAP and a +13.1% improvement in AMOTA without considering communication delays, showcasing its robust detection and tracking performance. The experiments demonstrate that the integration of multi-view perspectives, temporal sequences, or CEC in end-to-end training significantly improves both detection and tracking performance. All code will be open-sourced.
Authors:Junting Chen, Checheng Yu, Xunzhe Zhou, Tianqi Xu, Yao Mu, Mengkang Hu, Wenqi Shao, Yikai Wang, Guohao Li, Lin Shao
Abstract:
Heterogeneous multi-robot systems (HMRS) have emerged as a powerful approach for tackling complex tasks that single robots cannot manage alone. Current large-language-model-based multi-agent systems (LLM-based MAS) have shown success in areas like software development and operating systems, but applying these systems to robot control presents unique challenges. In particular, the capabilities of each agent in a multi-robot system are inherently tied to the physical composition of the robots, rather than predefined roles. To address this issue, we introduce a novel multi-agent framework designed to enable effective collaboration among heterogeneous robots with varying embodiments and capabilities, along with a new benchmark named Habitat-MAS. One of our key designs is $\textit{Robot Resume}$: Instead of adopting human-designed role play, we propose a self-prompted approach, where agents comprehend robot URDF files and call robot kinematics tools to generate descriptions of their physics capabilities to guide their behavior in task planning and action execution. The Habitat-MAS benchmark is designed to assess how a multi-agent framework handles tasks that require embodiment-aware reasoning, which includes 1) manipulation, 2) perception, 3) navigation, and 4) comprehensive multi-floor object rearrangement. The experimental results indicate that the robot's resume and the hierarchical design of our multi-agent system are essential for the effective operation of the heterogeneous multi-robot system within this intricate problem context.
Authors:Le Liu, Yu Kawano, Ming Cao
Abstract:
Privacy has become a critical concern in modern multi-robot systems, driven by both ethical considerations and operational constraints. As a result, growing attention has been directed toward privacy-preserving coordination in dynamical multi-robot systems. This work introduces a randomized scheduling mechanism for privacy-preserving robot rendezvous. The proposed approach achieves improved privacy even at lower communication rates, where privacy is quantified via pointwise maximal leakage. We show that lower transmission rates provide stronger privacy guarantees and prove that rendezvous is still achieved under the randomized scheduling mechanism. Numerical simulations are provided to demonstrate the effectiveness of the method.
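An illustrative (assumed) version of a randomized transmission schedule for rendezvous: each robot broadcasts its state only with probability `p_transmit`, so fewer disclosures trade off against convergence speed; the consensus-style update and the gains below are placeholders, not the paper's protocol.

```python
import numpy as np

def randomized_rendezvous_step(positions, p_transmit=0.3, gain=0.5, dt=0.1, rng=None):
    """One rendezvous step under randomized scheduling (illustrative sketch).
    positions: (n, 2) array of robot positions."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(positions)
    transmitted = rng.random(n) < p_transmit      # who broadcasts this round
    new_positions = positions.copy()
    for i in range(n):
        received = [positions[j] for j in range(n) if j != i and transmitted[j]]
        if received:                              # move toward the received states
            u = gain * (np.mean(received, axis=0) - positions[i])
            new_positions[i] = positions[i] + dt * u
    return new_positions
```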
Authors:Jiaxi Liu, Chengyuan Ma, Hang Zhou, Weizhe Tang, Shixiao Liang, Haoyang Ding, Xiaopeng Li, Bin Ran
Abstract:
Cooperative perception (CP) offers significant potential to overcome the limitations of single-vehicle sensing by enabling information sharing among connected vehicles (CVs). However, existing generic CP approaches need to transmit large volumes of perception data that are irrelevant to driving safety, exceeding available communication bandwidth. Moreover, most CP frameworks rely on pre-defined communication partners, making them unsuitable for dynamic traffic environments. This paper proposes a Spontaneous Risk-Aware Selective Cooperative Perception (SRA-CP) framework to address these challenges. SRA-CP introduces a decentralized protocol where connected agents continuously broadcast lightweight perception coverage summaries and initiate targeted cooperation only when risk-relevant blind zones are detected. A perceptual risk identification module enables each CV to locally assess the impact of occlusions on its driving task and determine whether cooperation is necessary. When CP is triggered, the ego vehicle selects appropriate peers based on shared perception coverage and engages in selective information exchange through a fusion module that prioritizes safety-critical content and adapts to bandwidth constraints. We evaluate SRA-CP on a public dataset against several representative baselines. Results show that SRA-CP achieves less than 1% average precision (AP) loss for safety-critical objects compared to generic CP, while using only 20% of the communication bandwidth. Moreover, it improves the perception performance by 15% over existing selective CP methods that do not incorporate risk awareness.
Authors:Hangyu Li, Bofeng Cao, Zhaohui Liang, Wuzhen Li, Juyoung Oh, Yuxuan Chen, Shixiao Liang, Hang Zhou, Chengyuan Ma, Jiaxi Liu, Zheng Li, Peng Zhang, KeKe Long, Maolin Liu, Jackson Jiang, Chunlei Yu, Shengxiang Liu, Hongkai Yu, Xiaopeng Li
Abstract:
Vehicle-to-Vehicle (V2V) cooperative perception has great potential to enhance autonomous driving performance by overcoming perception limitations in complex adverse traffic scenarios (CATS). Meanwhile, data serves as the fundamental infrastructure for modern autonomous driving AI. However, due to stringent data collection requirements, existing datasets focus primarily on ordinary traffic scenarios, constraining the benefits of cooperative perception. To address this challenge, we introduce CATS-V2V, the first-of-its-kind real-world dataset for V2V cooperative perception under complex adverse traffic scenarios. The dataset was collected by two hardware time-synchronized vehicles, covering 10 weather and lighting conditions across 10 diverse locations. The 100-clip dataset includes 60K frames of 10 Hz LiDAR point clouds and 1.26M multi-view 30 Hz camera images, along with 750K anonymized yet high-precision RTK-fixed GNSS and IMU records. Correspondingly, we provide time-consistent 3D bounding box annotations for objects, as well as static scenes to construct a 4D BEV representation. On this basis, we propose a target-based temporal alignment method, ensuring that all objects are precisely aligned across all sensor modalities. We hope that CATS-V2V, the largest-scale, most supportive, and highest-quality dataset of its kind to date, will benefit the autonomous driving community in related tasks.
Authors:Chongyang Shi, Wesley A. Suttle, Michael Dorothy, Jie Fu
Abstract:
We study the problem of jointly selecting sensing agents and synthesizing decentralized active perception policies for the chosen subset of agents within a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) framework. Our approach employs a two-layer optimization structure. In the inner layer, we introduce information-theoretic metrics, defined by the mutual information between the unknown trajectories or some hidden property in the environment and the collective partial observations in the multi-agent system, as a unified objective for active perception problems. We employ various optimization methods to obtain optimal sensor policies that maximize mutual information for distinct active perception tasks. In the outer layer, we prove that under certain conditions, the information-theoretic objectives are monotone and submodular with respect to the subset of observations collected from multiple agents. We then exploit this property to design an IMAS$^2$ (Information-theoretic Multi-Agent Selection and Sensing) algorithm for joint sensing agent selection and sensing policy synthesis. However, since the policy search space is infinite, we adapt the classical Nemhauser-Wolsey argument to prove that the proposed IMAS$^2$ algorithm can provide a tight $(1 - 1/e)$-guarantee on the performance. Finally, we demonstrate the effectiveness of our approach on a multi-agent cooperative perception task in a grid-world environment.
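The outer-layer selection can be illustrated with the standard greedy rule for monotone submodular maximization, which is what underlies the $(1 - 1/e)$-style guarantee; the `objective` callable standing in for the mutual-information value of a subset is an assumption.

```python
def greedy_select(candidates, k, objective):
    """Greedy maximization of a monotone submodular set function over agent subsets.
    objective(subset) -> float, e.g. mutual information of the collected observations."""
    selected = []
    for _ in range(min(k, len(candidates))):
        best, best_gain = None, float('-inf')
        for c in candidates:
            if c in selected:
                continue
            gain = objective(selected + [c]) - objective(selected)  # marginal gain
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
    return selected
```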
Authors:Taehyeon Kim, Vishnunandan L. N. Venkatesh, Byung-Cheol Min
Abstract:
In this paper, we propose a novel few-shot learning framework for multi-robot systems that integrate both spatial and temporal elements: Few-Shot Demonstration-Driven Task Coordination and Trajectory Execution (DDACE). Our approach leverages temporal graph networks for learning task-agnostic temporal sequencing and Gaussian Processes for spatial trajectory modeling, ensuring modularity and generalization across various tasks. By decoupling temporal and spatial aspects, DDACE requires only a small number of demonstrations, significantly reducing data requirements compared to traditional learning from demonstration approaches. To validate our proposed framework, we conducted extensive experiments in task environments designed to assess various aspects of multi-robot coordination-such as multi-sequence execution, multi-action dynamics, complex trajectory generation, and heterogeneous configurations. The experimental results demonstrate that our approach successfully achieves task execution under few-shot learning conditions and generalizes effectively across dynamic and diverse settings. This work underscores the potential of modular architectures in enhancing the practicality and scalability of multi-robot systems in real-world applications. Additional materials are available at https://sites.google.com/view/ddace.
Authors:Doncey Albin, Daniel McGann, Miles Mena, Annika Thomas, Harel Biggie, Xuefei Sun, Steve McGuire, Jonathan P. How, Christoffer Heckman
Abstract:
A central challenge for multi-robot systems is fusing independently gathered perception data into a unified representation. Despite progress in Collaborative SLAM (C-SLAM), benchmarking remains hindered by the scarcity of dedicated multi-robot datasets. Many evaluations instead partition single-robot trajectories, a practice that may only partially reflect true multi-robot operations and, more critically, lacks standardization, leading to results that are difficult to interpret or compare across studies. While several multi-robot datasets have recently been introduced, they mostly contain short trajectories with limited inter-robot overlap and sparse intra-robot loop closures. To overcome these limitations, we introduce CU-Multi, a dataset collected over multiple days at two large outdoor sites on the University of Colorado Boulder campus. CU-Multi comprises four synchronized runs with aligned start times and controlled trajectory overlap, replicating the distinct perspectives of a robot team. It includes RGB-D sensing, RTK GPS, semantic LiDAR, and refined ground-truth odometry. By combining overlap variation with dense semantic annotations, CU-Multi provides a strong foundation for reproducible evaluation in multi-robot collaborative perception tasks.
Authors:Usman A. Khan, Mouhacine Benosman, Wenliang Liu, Federico Pecora, Joseph W. Durham
Abstract:
In this paper, we propose a novel methodology for path planning and scheduling for multi-robot navigation that is based on optimal transport theory and model predictive control. We consider a setup where $N$ robots are tasked to navigate to $M$ targets in a common space with obstacles. Mapping robots to targets first and then planning paths can result in overlapping paths that lead to deadlocks. We derive a strategy based on optimal transport that not only provides minimum cost paths from robots to targets but also guarantees non-overlapping trajectories. We achieve this by discretizing the space of interest into $K$ cells and by imposing a $K\times K$ cost structure that describes the cost of transitioning from one cell to another. Optimal transport then provides \textit{optimal and non-overlapping} cell transitions for the robots to reach the targets that can be readily deployed without any scheduling considerations. The proposed solution requires $\mathcal{O}(K^3\log K)$ computations in the worst case and $\mathcal{O}(K^2\log K)$ for well-behaved problems. To further accommodate potentially overlapping trajectories (unavoidable in certain situations) as well as robot dynamics, we show that a temporal structure can be integrated into optimal transport with the help of \textit{replans} and \textit{model predictive control}.
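For intuition, the discrete optimal transport step can be posed as a linear program over the $K\times K$ cell-transition cost; below is a small sketch using scipy's LP solver (illustrative, not the paper's implementation).

```python
import numpy as np
from scipy.optimize import linprog

def optimal_transport_plan(cost, supply, demand):
    """Solve the discrete optimal transport LP between robot cells and target cells.
    cost: (n, m) matrix of cell-to-cell travel costs;
    supply: (n,) mass at robot cells; demand: (m,) mass at target cells (equal totals)."""
    n, m = cost.shape
    c = cost.ravel()                              # x_ij flattened row-major
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0          # all of robot cell i's mass is shipped
    for j in range(m):
        A_eq[n + j, j::m] = 1.0                   # target cell j's demand is met
    b_eq = np.concatenate([supply, demand])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.x.reshape(n, m)                    # transport plan (cell transitions)
```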
Authors:Kun Song, Shentao Ma, Gaoming Chen, Ninglong Jin, Guangbao Zhao, Mingyu Ding, Zhenhua Xiong, Jia Pan
Abstract:
A central research topic in robotics is how robotic systems can interact with the physical world. Traditional manipulation tasks primarily focus on small objects. However, in factory or home environments, there is often a need to move large objects, such as tables. These tasks typically require multi-robot systems to work collaboratively. Previous research lacks a framework that can scale to arbitrary numbers of robots and generalize to various kinds of tasks. In this work, we propose CollaBot, a generalist framework for simultaneous collaborative manipulation. First, we use SEEM for scene segmentation and point cloud extraction of the target object. Then, we propose a collaborative grasping framework, which decomposes the task into local grasp pose generation and global collaboration. Finally, we design a 2-stage planning module that can generate collision-free trajectories to achieve this task. Experiments show a success rate of 52% across different numbers of robots, objects, and tasks, indicating the effectiveness of the proposed framework.
Authors:Haokun Liu, Zhaoqi Ma, Yunong Li, Junichiro Sugihara, Yicheng Chen, Jinjie Li, Moju Zhao
Abstract:
Heterogeneous multi-robot systems show great potential in complex tasks requiring hybrid cooperation. However, traditional approaches relying on static models often struggle with task diversity and dynamic environments. This highlights the need for generalizable intelligence that can bridge high-level reasoning with low-level execution across heterogeneous agents. To address this, we propose a hierarchical framework integrating a prompted Large Language Model (LLM) and a GridMask-enhanced fine-tuned Vision Language Model (VLM). The LLM decomposes tasks and constructs a global semantic map, while the VLM extracts task-specified semantic labels and 2D spatial information from aerial images to support local planning. Within this framework, the aerial robot follows an optimized global semantic path and continuously provides bird-view images, guiding the ground robot's local semantic navigation and manipulation, including target-absent scenarios where implicit alignment is maintained. Experiments on real-world cube or object arrangement tasks demonstrate the framework's adaptability and robustness in dynamic environments. To the best of our knowledge, this is the first demonstration of an aerial-ground heterogeneous system integrating VLM-based perception with LLM-driven task reasoning and motion planning.
Authors:Lei Wan, Prabesh Gupta, Andreas Eich, Marcel Kettelgerdes, Hannan Ejaz Keen, Michael Klöppel-Gersdorf, Alexey Vinel
Abstract:
Perception is a core capability of automated vehicles and has been significantly advanced through modern sensor technologies and artificial intelligence. However, perception systems still face challenges in complex real-world scenarios. To improve robustness against various external factors, multi-sensor fusion techniques are essential, combining the strengths of different sensor modalities. With recent developments in Vehicle-to-Everything (V2X) communication, sensor fusion can now extend beyond a single vehicle to a cooperative multi-agent system involving Connected Automated Vehicles (CAVs) and intelligent infrastructure. This paper presents VALISENS, an innovative multi-sensor system distributed across multiple agents. It integrates onboard and roadside LiDARs, radars, thermal cameras, and RGB cameras to enhance situational awareness and support cooperative automated driving. The thermal camera adds critical redundancy for perceiving Vulnerable Road Users (VRUs), while fusion with roadside sensors mitigates visual occlusions and extends the perception range beyond the limits of individual vehicles. We introduce the corresponding perception module built on this sensor system, which includes object detection, tracking, motion forecasting, and high-level data fusion. The proposed system demonstrates the potential of cooperative perception in real-world test environments and lays the groundwork for future Cooperative Intelligent Transport Systems (C-ITS) applications.
Authors:Wenliang Liu, Nathalie Majcherczyk, Federico Pecora
Abstract:
Motion planning with simple objectives, such as collision-avoidance and goal-reaching, can be solved efficiently using modern planners. However, the complexity of the allowed tasks for these planners is limited. On the other hand, signal temporal logic (STL) can specify complex requirements, but STL-based motion planning and control algorithms often face scalability issues, especially in large multi-robot systems with complex dynamics. In this paper, we propose an algorithm that leverages the best of the two worlds. We first use a single-robot motion planner to efficiently generate a set of alternative reference paths for each robot. Then coordination requirements are specified using STL, which is defined over the assignment of paths and robots' progress along those paths. We use a Mixed Integer Linear Program (MILP) to compute task assignments and robot progress targets over time such that the STL specification is satisfied. Finally, a local controller is used to track the target progress. Simulations demonstrate that our method can handle tasks with complex constraints and scales to large multi-robot teams and intricate task allocation scenarios.
Authors:Shuhao Qi, Zengjie Zhang, Zhiyong Sun, Sofie Haesaert
Abstract:
This paper is devoted to the analysis and resolution of a pathological phenomenon in airplane encounters called blocking mode. As autonomy in airplane systems increases, a pathological phenomenon can be observed in two-aircraft encounter scenarios, where airplanes stick together and fly in parallel for an extended period. This parallel flight results in a temporary blocking that significantly delays progress. In contrast to widely studied deadlocks in multi-robot systems, such transient blocking is often overlooked in existing literature. Since such prolonged parallel flying places high-speed airplanes at elevated risks of near-miss collisions, encounter conflicts must be resolved as quickly as possible in the context of aviation. We develop a mathematical model for a two-airplane encounter system that replicates this blocking phenomenon. Using this model, we analyze the conditions under which blocking occurs, quantify the duration of the blocking period, and demonstrate that the blocking condition is significantly less restrictive than that of deadlock. Based on these analytical insights, we propose an intention-aware strategy with an adaptive priority mechanism that enables efficient resolution of ongoing blocking phenomena while also incidentally eliminating deadlocks. Notably, the developed strategy does not rely on central coordination and communications that can be unreliable in harsh situations. The analytical findings and the proposed resolution strategy are validated through extensive simulations.
Authors:Qingquan Lin, Weining Lu, Litong Meng, Chenxi Li, Bin Liang
Abstract:
For tasks conducted in unknown environments with efficiency requirements, real-time navigation of multi-robot systems remains challenging due to unfamiliarity with the surroundings. In this paper, we propose a novel multi-robot collaborative planning method that leverages the perception of different robots to intelligently select search directions and improve planning efficiency. Specifically, a foundational planner is employed to ensure reliable exploration towards targets in unknown environments, and we introduce a Graph Attention Architecture with Information Gain Weight (GIWT) to synthesize information from the target robot and its teammates and facilitate effective navigation around obstacles. In GIWT, after regionally encoding the relative positions of the robots along with their perceptual features, we compute the shared attention scores and incorporate the information gain obtained from neighboring robots as a supplementary weight. We design a corresponding expert data generation scheme to simulate real-world decision-making conditions for network training. Simulation experiments and real robot tests demonstrate that the proposed method significantly improves efficiency and enables collaborative planning for multiple robots. Our method achieves approximately 82% accuracy on the expert dataset and reduces the average path length by about 8% and 6% across two types of tasks compared to the foundational planner in ROS tests, and achieves a path length reduction of over 6% in real-world experiments.
Authors:Mason B. Peterson, Yixuan Jia, Yulun Tian, Annika Thomas, Jonathan P. How
Abstract:
Global localization is a fundamental capability required for long-term and drift-free robot navigation. However, current methods fail to relocalize when faced with significantly different viewpoints. We present ROMAN (Robust Object Map Alignment Anywhere), a global localization method capable of localizing in challenging and diverse environments by creating and aligning maps of open-set and view-invariant objects. ROMAN formulates and solves a registration problem between object submaps using a unified graph-theoretic global data association approach with a novel incorporation of a gravity direction prior and object shape and semantic similarity. This work's open-set object mapping and information-rich object association algorithm enables global localization, even in instances when maps are created from robots traveling in opposite directions. Through a set of challenging global localization experiments in indoor, urban, and unstructured/forested environments, we demonstrate that ROMAN achieves higher relative pose estimation accuracy than other image-based pose estimation methods or segment-based registration methods. Additionally, we evaluate ROMAN as a loop closure module in large-scale multi-robot SLAM and show a 35% improvement in trajectory estimation error compared to standard SLAM systems using visual features for loop closures. Code and videos can be found at https://acl.mit.edu/roman.
Authors:Chenxi Li, Weining Lu, Qingquan Lin, Litong Meng, Haolu Li, Bin Liang
Abstract:
This paper proposes a lightweight systematic solution for multi-robot coordinated navigation with decentralized cooperative perception. An information flow is first created to facilitate real-time observation sharing over unreliable ad-hoc networks. Then, the environmental uncertainties of each robot are reduced by interaction fields that deliver complementary information. Finally, path optimization is achieved, enabling self-organized coordination with effective convergence, divergence, and collision avoidance. Our method is fully interpretable and ready for deployment without gaps. Comprehensive simulations and real-world experiments demonstrate reduced path redundancy, robust performance across various tasks, and minimal demands on computation and communication.
Authors:Mark Gonzales, Ethan Oh, Joseph Moore
Abstract:
In this paper, we present a receding-horizon, sampling-based planner capable of reasoning over multimodal policy distributions. By using the cross-entropy method to optimize a multimodal policy under a common cost function, our approach increases robustness against local minima and promotes effective exploration of the solution space. We show that our approach naturally extends to multi-robot collision-free planning, enables agents to share diverse candidate policies to avoid deadlocks, and allows teams to minimize a global objective without incurring the computational complexity of centralized optimization. Numerical simulations demonstrate that employing multiple modes significantly improves success rates in trap environments and in multi-robot collision avoidance. Hardware experiments further validate the approach's real-time feasibility and practical performance.
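The cross-entropy update the planner builds on can be sketched as follows for a single mode; the multimodal version would maintain several such distributions and share elite candidates across robots. The quadratic-noise sampling and the placeholder `cost_fn` are assumptions for illustration.

```python
import numpy as np

def cross_entropy_plan(cost_fn, horizon=10, dim=2, n_samples=200, n_elite=20,
                       n_iters=20, seed=0):
    """Single-mode cross-entropy method over an action sequence (illustrative sketch).
    cost_fn(actions) -> float, where actions has shape (horizon, dim)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, dim))
    std = np.ones((horizon, dim))
    for _ in range(n_iters):
        samples = mean + std * rng.standard_normal((n_samples, horizon, dim))
        costs = np.array([cost_fn(s) for s in samples])
        elite = samples[np.argsort(costs)[:n_elite]]   # keep the lowest-cost samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean                                        # receding-horizon action sequence
```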
Authors:Ioana Hustiu, Roozbeh Abolpour, Cristian Mahulea, Marius Kloetzer
Abstract:
This paper presents a novel path-planning and task assignment algorithm for multi-robot systems that should fulfill a global Boolean specification. The proposed method is based on Integer Linear Programming (ILP) formulations, which are combined with structural insights from Petri nets to improve scalability and computational efficiency. By proving that the \emph{constraint matrix} is totally unimodular (TU) for certain classes of problems, the ILP formulation can be relaxed into a Linear Programming (LP) problem without losing the integrality of the solution. This relaxation eliminates complex combinatorial techniques, significantly reducing computational overhead and thus ensuring scalability for large-scale systems. Using the approach proposed in this paper, we can solve path-planning problems for teams of up to 500 robots. The method guarantees computational tractability, handles collision avoidance, and reduces computational demands through iterative LP optimization techniques. Case studies demonstrate the efficiency of the algorithm in generating scalable, collision-free paths for large robot teams navigating in complex environments. While the conservative nature of collision avoidance introduces additional constraints, and thus computational requirements, the solution remains practical and impactful for diverse applications. The algorithm is particularly applicable to real-world scenarios, including warehouse logistics where autonomous robots must efficiently coordinate tasks, or search-and-rescue operations in various environments. This work contributes both theoretically and practically to scalable multi-robot path planning and task allocation, offering an efficient framework for coordinating autonomous agents in shared environments.
Authors:Tohid Kargar Tasooji, Sakineh Khodadadi
Abstract:
This paper addresses the problem of distributed coordination control for multi-robot systems (MRSs) in the presence of localization uncertainty using a Linear Quadratic Gaussian (LQG) approach. We introduce a stochastic LQG control strategy that ensures the coordination of mobile robots while optimizing a performance criterion. The proposed control framework accounts for the inherent uncertainty in localization measurements, enabling robust decision-making and coordination. We analyze the stability of the system under the proposed control protocol, deriving conditions for the convergence of the multi-robot network. The effectiveness of the proposed approach is demonstrated through experimental validation using Robotarium simulation experiments, showcasing the practical applicability of the control strategy in real-world scenarios with localization uncertainty.
Authors:Tohid Kargar Tasooji, Sakineh Khodadadi
Abstract:
Multi-robot coordination is fundamental to various applications, including autonomous exploration, search and rescue, and cooperative transportation. This paper presents an optimal consensus framework for multi-robot systems (MRSs) that ensures efficient rendezvous while minimizing energy consumption and addressing actuator constraints. A critical challenge in real-world deployments is actuator limitations, particularly wheel velocity saturation, which can significantly degrade control performance. To address this issue, we incorporate Pontryagin Minimum Principle (PMP) into the control design, facilitating constrained optimization while ensuring system stability and feasibility. The resulting optimal control policy effectively balances coordination efficiency and energy consumption, even in the presence of actuation constraints. The proposed framework is validated through extensive numerical simulations and real-world experiments conducted using a team of Robotarium mobile robots. The experimental results confirm that our control strategies achieve reliable and efficient coordinated rendezvous while addressing real-world challenges such as communication delays, sensor noise, and packet loss.
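A toy rendezvous step with explicit velocity saturation illustrates the actuator constraint that the PMP-based design accounts for; the gains and the simple norm-scaling saturation below are assumptions, not the paper's optimal control policy.

```python
import numpy as np

def saturated_consensus_step(positions, adjacency, u_max=0.5, gain=1.0, dt=0.05):
    """One consensus (rendezvous) step with wheel-velocity saturation (illustrative).
    positions: (n, 2) array; adjacency: (n, n) 0/1 communication graph."""
    n = len(positions)
    new_positions = positions.copy()
    for i in range(n):
        # standard consensus input: move toward communicating neighbors
        u = gain * sum(adjacency[i][j] * (positions[j] - positions[i]) for j in range(n))
        speed = np.linalg.norm(u)
        if speed > u_max:                         # saturate the commanded velocity
            u = u * (u_max / speed)
        new_positions[i] = positions[i] + dt * u
    return new_positions
```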
Authors:Juwon Kim, Hogyun Kim, Seokhwan Jeong, Youngsik Shin, Younggun Cho
Abstract:
We encounter large-scale environments where both structured and unstructured spaces coexist, such as on campuses. In these environments, lighting conditions and dynamic objects change constantly. To tackle the challenges of large-scale mapping under such conditions, we introduce DiTer++, a diverse terrain and multi-modal dataset designed for multi-robot SLAM in multi-session environments. In our dataset's scenarios, Agent-A and Agent-B scan the area designated for efficient large-scale mapping during the day and at night, respectively. We also utilize legged robots for terrain-agnostic traversal. To generate the ground truth for each robot, we first build a survey-grade prior map. Then, we remove the dynamic objects and outliers from the prior map and extract each trajectory through scan-to-map matching. Our dataset and supplementary materials are available at https://sites.google.com/view/diter-plusplus/.
Authors:Ruoyu Zhou, Zhiwei Zhang, Haocheng Han, Xiaodong Zhang, Zehan Chen, Jun Sun, Yulong Shen, Dehai Xu
Abstract:
Multi-robot swarms play an essential role in complex missions including battlefield reconnaissance, agricultural pest monitoring, and disaster search and rescue. Unfortunately, given the complexity of swarm algorithms, logical vulnerabilities are inevitable and often lead to severe safety and security consequences. Although various methods have been presented for detecting logical vulnerabilities through software testing, when they are used in swarm environments, these techniques face significant challenges: 1) Due to the swarm's vast composable parameter space, it is extremely difficult to generate failure-triggering scenarios, which is crucial to effectively expose logical vulnerabilities; 2) Because of the swarm's high flexibility and dynamism, it is challenging to model and evaluate the global swarm state, particularly in terms of cooperative behaviors, which makes it difficult to detect logical vulnerabilities. In this work, we propose RSFuzz, a robustness-guided swarm fuzzing framework designed to detect logical vulnerabilities in multi-robot systems. It leverages the robustness of behavioral constraints to quantitatively evaluate the swarm state and guide the generation of failure-triggering scenarios. In addition, RSFuzz identifies and targets key swarm nodes for perturbations, effectively reducing the input space. Building on the RSFuzz framework, we construct two swarm fuzzing schemes, Single Attacker Fuzzing (SA-Fuzzing) and Multiple Attacker Fuzzing (MA-Fuzzing), which employ single and multiple attackers, respectively, during fuzzing to disturb swarm mission execution. We evaluated RSFuzz's performance with three popular swarm algorithms in simulated environments. The results show that RSFuzz outperforms the state-of-the-art with an average improvement of 17.75\% in effectiveness and a 38.4\% increase in efficiency. We also validated several of the discovered vulnerabilities in the real world.
Authors:Alexandre Pacheco, Sébastien De Vos, Andreagiovanni Reina, Marco Dorigo, Volker Strobel
Abstract:
Federated learning is a new approach to distributed machine learning that offers potential advantages such as reducing communication requirements and distributing the costs of training algorithms. Therefore, it could hold great promise in swarm robotics applications. However, federated learning usually requires a centralized server for the aggregation of the models. In this paper, we present a proof-of-concept implementation of federated learning in a robot swarm that does not compromise decentralization. To do so, we use blockchain technology to enable our robot swarm to securely synchronize a shared model that is the aggregation of the individual models without relying on a central server. We then show that introducing a single malfunctioning robot can, however, heavily disrupt the training process. To prevent such situations, we devise protection mechanisms that are implemented through secure and tamper-proof blockchain smart contracts. Our experiments are conducted in ARGoS, a physics-based simulator for swarm robotics, using the Ethereum blockchain protocol which is executed by each simulated robot.
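For reference, the aggregation that the swarm performs via blockchain smart contracts rather than a central server can be illustrated by plain FedAvg, with a coordinate-wise median as a simple outlier-resistant variant in the spirit of the protection mechanisms; both are generic sketches, not the paper's contracts.

```python
import numpy as np

def federated_average(local_models, weights=None):
    """FedAvg-style aggregation of per-robot parameter vectors (illustrative sketch).
    local_models: list of 1D parameter arrays, one per robot."""
    W = np.asarray(local_models, dtype=float)
    if weights is None:
        weights = np.ones(len(W)) / len(W)        # equal weighting by default
    return weights @ W

def robust_median_aggregate(local_models):
    """Coordinate-wise median: a simple aggregation that tolerates a malfunctioning robot."""
    return np.median(np.asarray(local_models, dtype=float), axis=0)
```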
Authors:Kanghoon Lee, Hyeonjun Kim, Jiachen Li, Jinkyoo Park
Abstract:
Multi-robot systems are widely used for coverage tasks that require efficient coordination across large environments. In Multi-Robot Coverage Path Planning (MCPP), the objective is typically to minimize the makespan by generating non-overlapping paths for full-area coverage. However, most existing methods assume uniform importance across regions, limiting their effectiveness in scenarios where some zones require faster attention. We introduce the Priority-Aware MCPP (PA-MCPP) problem, where a subset of the environment is designated as prioritized zones with associated weights. The goal is to minimize, in lexicographic order, the total priority-weighted latency of zone coverage and the overall makespan. To address this, we propose a scalable two-phase framework combining (1) greedy zone assignment with local search, spanning-tree-based path planning, and (2) Steiner-tree-guided residual coverage. Experiments across diverse scenarios demonstrate that our method significantly reduces priority-weighted latency compared to standard MCPP baselines, while maintaining competitive makespan. Sensitivity analyses further show that the method scales well with the number of robots and that zone coverage behavior can be effectively controlled by adjusting priority weights.
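A rough sketch of the phase-1 idea: greedily route prioritized zones, heaviest weights first, to whichever robot can reach them soonest; the `travel_time` callback and the ordering heuristic are assumptions for illustration, not the paper's exact assignment or local search.

```python
def greedy_zone_assignment(zones, robots, travel_time):
    """Greedy assignment of prioritized zones to robots (illustrative sketch).
    zones: list of (zone_id, weight); travel_time(route, zone_id) -> added travel time."""
    routes = {r: [] for r in robots}
    finish = {r: 0.0 for r in robots}
    # cover heavier (higher-priority) zones earlier
    for zone_id, weight in sorted(zones, key=lambda z: -z[1]):
        best_r = min(robots, key=lambda r: finish[r] + travel_time(routes[r], zone_id))
        finish[best_r] += travel_time(routes[best_r], zone_id)
        routes[best_r].append(zone_id)
    return routes, finish
```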
Authors:Yutong Wang, Yichun Qu, Tengxiang Wang, Lishuo Pan, Nora Ayanian
Abstract:
Maintaining connectivity is crucial in many multi-robot applications, yet fragile to obstacles and visual occlusions. We present a real-time distributed framework for multi-robot navigation certified by high-order control barrier functions (HOCBFs) that controls inter-robot proximity to maintain connectivity while avoiding collisions. We incorporate control Lyapunov functions to enable connectivity recovery from initial disconnected configurations and temporary losses, providing robust connectivity during navigation in obstacle-rich environments. Our trajectory generation framework concurrently produces planning and control through a Bezier-parameterized trajectory, which naturally provides smooth curves with arbitrary degree of derivatives. The main contribution is the unified MPC-CLF-CBF framework, a continuous-time trajectory generation and control method for connectivity maintenance and recovery of multi-robot systems. We validate the framework through extensive simulations and a physical experiment with 4 Crazyflie nano-quadrotors.
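The certification machinery can be illustrated with a first-order CBF quadratic program for a single-integrator robot, a simplified stand-in for the paper's high-order CBFs and CLF terms; the barrier choice, gains, and cvxpy formulation are assumptions.

```python
import numpy as np
import cvxpy as cp

def cbf_safety_filter(x, u_nom, neighbors, d_min=0.5, alpha=1.0):
    """Minimal CBF-QP sketch: stay close to the nominal input u_nom while keeping
    squared-clearance barriers h_j = ||x - x_j||^2 - d_min^2 nonnegative.
    x, u_nom: (2,) arrays; neighbors: list of (2,) positions to avoid."""
    u = cp.Variable(2)
    constraints = []
    for x_j in neighbors:
        diff = x - x_j
        h = float(diff @ diff) - d_min**2
        # hdot = 2 * diff^T u ; enforce hdot + alpha * h >= 0
        constraints.append(2 * diff @ u + alpha * h >= 0)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), constraints)
    prob.solve()
    return u.value

# Example: nudge the nominal command away from a nearby neighbor.
# u_safe = cbf_safety_filter(np.zeros(2), np.array([1.0, 0.0]), [np.array([0.6, 0.0])])
```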
Authors:Lantao Li, Kang Yang, Rui Song, Chen Sun
Abstract:
Cooperative perception enabled by Vehicle-to-Everything communication has shown great promise in enhancing situational awareness for autonomous vehicles and other mobile robotic platforms. Despite recent advances in perception backbones and multi-agent fusion, real-world deployments remain challenged by hard detection cases, exemplified by partial detections and noise accumulation which limit downstream detection accuracy. This work presents Diffusion on Reinforced Cooperative Perception (DRCP), a real-time deployable framework designed to address aforementioned issues in dynamic driving environments. DRCP integrates two key components: (1) Precise-Pyramid-Cross-Modality-Cross-Agent, a cross-modal cooperative perception module that leverages camera-intrinsic-aware angular partitioning for attention-based fusion and adaptive convolution to better exploit external features; and (2) Mask-Diffusion-Mask-Aggregation, a novel lightweight diffusion-based refinement module that encourages robustness against feature perturbations and aligns bird's-eye-view features closer to the task-optimal manifold. The proposed system achieves real-time performance on mobile platforms while significantly improving robustness under challenging conditions. Code will be released in late 2025.
Authors:Zhiheng Chen, Wei Wang
Abstract:
Micro Autonomous Surface Vehicles (MicroASVs) offer significant potential for operations in confined or shallow waters and swarm robotics applications. However, achieving precise and robust control at such small scales remains highly challenging, mainly due to the complexity of modeling nonlinear hydrodynamic forces and the increased sensitivity to self-motion effects and environmental disturbances, including waves and boundary effects in confined spaces. This paper presents a physics-driven dynamics model for an over-actuated MicroASV and introduces a data-driven optimal control framework that leverages a weak formulation-based online model learning method. Our approach continuously refines the physics-driven model in real time, enabling adaptive control that adjusts to changing system parameters. Simulation results demonstrate that the proposed method substantially enhances trajectory tracking accuracy and robustness, even under unknown payloads and external disturbances. These findings highlight the potential of data-driven online learning-based optimal control to improve MicroASV performance, paving the way for more reliable and precise autonomous surface vehicle operations.
Authors:Markus Buchholz, Ignacio Carlucho, Zebin Huang, Michele Grimaldi, Pierre Nicolay, Sumer Tuncay, Yvan R. Petillot
Abstract:
This paper introduces CoralGuide, a novel framework designed for path planning and trajectory optimization for tethered multi-robot systems. We focus on marine robotics, which commonly involves tethered configurations of an Autonomous Surface Vehicle (ASV) and an Autonomous Underwater Vehicle (AUV). CoralGuide provides safe navigation in marine environments by enhancing the A* algorithm with specialized heuristics tailored for tethered ASV-AUV systems. Our method integrates catenary curve modelling for tether management and employs Bezier curve interpolation for smoother trajectory planning, ensuring efficient and synchronized operations without compromising safety. Through simulations and real-world experiments, we have validated CoralGuide's effectiveness in improving path planning and trajectory optimization, demonstrating its potential to significantly enhance operational capabilities in marine research and infrastructure inspection.
Authors:Markus Buchholz, Ignacio Carlucho, Michele Grimaldi, Yvan R. Petillot
Abstract:
This paper introduces a novel simulation framework for evaluating motion control in tethered multi-robot systems within dynamic marine environments. Specifically, it focuses on the coordinated operation of an Autonomous Underwater Vehicle (AUV) and an Autonomous Surface Vehicle (ASV). The framework leverages GazeboSim, enhanced with realistic marine environment plugins and ArduPilot's Software-In-The-Loop (SITL) mode, to provide a high-fidelity simulation platform. A detailed tether model, combining catenary equations and physical simulation, is integrated to accurately represent the dynamic interactions between the vehicles and the environment. This setup facilitates the development and testing of advanced control strategies under realistic conditions, demonstrating the framework's capability to analyze complex tether interactions and their impact on system performance.
Authors:Ricardo Vega, Cameron Nowzari
Abstract:
Emergence and swarms are widely discussed topics, yet no consensus exists on their formal definitions. This lack of agreement makes it difficult not only for new researchers to grasp these concepts, but also for experts who may use the same terms to mean different things. Many attempts have been made to objectively define 'swarm' or 'emergence,' with recent work highlighting the role of the external observer. Still, several researchers argue that once an observer's vantage point (e.g., scope, resolution, context) is established, the terms can be made objective or measured quantitatively. In this note, we propose a framework to discuss these ideas rigorously by separating externally observable states from latent, unobservable ones. This allows us to compare and contrast existing definitions of swarms and emergence on common ground. We argue that these concepts are ultimately subjective, shaped less by the system itself than by the perception and tacit knowledge of the observer. Specifically, we suggest that a 'swarm' is not defined by its group behavior alone, but by the process generating that behavior. Our broader goal is to support the design and deployment of robotic swarm systems, highlighting the critical distinction between multi-robot systems and true swarms.
Authors:Jianhong Wang, Yang Li, Samuel Kaski, Jonathan Lawry
Abstract:
Open multi-agent systems are increasingly important in modeling real-world applications, such as smart grids, swarm robotics, etc. In this paper, we aim to investigate a recently proposed problem for open multi-agent systems, referred to as n-agent ad hoc teamwork (NAHT), where only a number of agents are controlled. Existing methods tend to be based on heuristic design and consequently lack theoretical rigor and suffer from ambiguous credit assignment among agents. To address these limitations, we model and solve NAHT through the lens of cooperative game theory. More specifically, we first model an open multi-agent system, characterized by its value, as an instance situated in a space of cooperative games, generated by a set of basis games. We then extend this space, along with the state space, to accommodate dynamic scenarios, thereby characterizing NAHT. Exploiting the justifiable assumption that basis game values correspond to a sequence of n-step returns with different horizons, we represent the state values for NAHT in a form similar to $λ$-returns. Furthermore, we derive Shapley values to allocate state values to the controlled agents, as credits for their contributions to the ad hoc team. Different from the conventional approach to shaping Shapley values in an explicit form, we shape Shapley values by fulfilling the three axioms uniquely describing them, well defined on the extended game space describing NAHT. To estimate Shapley values in dynamic scenarios, we propose a TD($λ$)-like algorithm. The resulting reinforcement learning (RL) algorithm is referred to as Shapley Machine. To the best of our knowledge, this is the first time that concepts from cooperative game theory are directly related to RL concepts. In experiments, we demonstrate the effectiveness of Shapley Machine and verify the reasonableness of our theory.
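For a small team, the Shapley credit assignment that this line of work builds on can be computed exactly from a coalition value function; the `value` callable (e.g., an estimated sub-team state value) is an assumption, and the exact enumeration below is only practical for a handful of agents.

```python
import itertools
import math

def shapley_values(agents, value):
    """Exact Shapley values for a small coalition game (illustrative sketch).
    value(coalition: frozenset) -> float, e.g. an estimated state value of the sub-team."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        others = [b for b in agents if b != a]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                # standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                          / math.factorial(n))
                phi[a] += weight * (value(S | {a}) - value(S))
    return phi
```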
Authors:Khaled Wahba, Wolfgang Hönig
Abstract:
Motion planning problems for physically-coupled multi-robot systems in cluttered environments are challenging due to their high dimensionality. Existing methods combining sampling-based planners with trajectory optimization produce suboptimal results and lack theoretical guarantees. We propose Physically-coupled discontinuity-bounded Conflict-Based Search (pc-dbCBS), an anytime kinodynamic motion planner, that extends discontinuity-bounded CBS to rigidly-coupled systems. Our approach proposes a tri-level conflict detection and resolution framework that includes the physical coupling between the robots. Moreover, pc-dbCBS alternates iteratively between state space representations, thereby preserving probabilistic completeness and asymptotic optimality while relying only on single-robot motion primitives. Across 25 simulated and six real-world problems involving multirotors carrying a cable-suspended payload and differential-drive robots linked by rigid rods, pc-dbCBS solves up to 92% more instances than a state-of-the-art baseline and plans trajectories that are 50-60% faster while reducing planning time by an order of magnitude.
Authors:Siwei Cai, Yuwei Wu, Lifeng Zhou
Abstract:
Autonomous landing is essential for drones deployed in emergency deliveries, post-disaster response, and other large-scale missions. By enabling self-docking on charging platforms, it facilitates continuous operation and significantly extends mission endurance. However, traditional approaches often fall short in dynamic, unstructured environments due to limited semantic awareness and reliance on fixed, context-insensitive safety margins. To address these limitations, we propose a hybrid framework that integrates large language models (LLMs) with model predictive control (MPC). Our approach begins with a vision-language encoder (VLE) (e.g., BLIP), which transforms real-time images into concise textual scene descriptions. These descriptions are processed by a lightweight LLM (e.g., Qwen 2.5 1.5B or LLaMA 3.2 1B) equipped with retrieval-augmented generation (RAG) to classify scene elements and infer context-aware safety buffers, such as 3 meters for pedestrians and 5 meters for vehicles. The resulting semantic flags and unsafe regions are then fed into an MPC module, enabling real-time trajectory replanning that avoids collisions while maintaining high landing precision. We validate our framework in the ROS-Gazebo simulator, where it consistently outperforms conventional vision-based MPC baselines. Our results show a significant reduction in near-miss incidents with dynamic obstacles, while preserving accurate landings in cluttered environments.
Authors:Ze Zhang, Yifan Xue, Nadia Figueroa, Knut Åkesson
Abstract:
For safe and flexible navigation in multi-robot systems, this paper presents an enhanced and predictive sampling-based trajectory planning approach in complex environments, the Gradient Field-based Dynamic Window Approach (GF-DWA). Building upon the dynamic window approach, the proposed method utilizes gradient information of obstacle distances as a new cost term to anticipate potential collisions. This enhancement enables the robot to improve awareness of obstacles, including those with non-convex shapes. The gradient field is derived from the Gaussian process distance field, which generates both the distance field and gradient field by leveraging Gaussian process regression to model the spatial structure of the environment. Through several obstacle avoidance and fleet collision avoidance scenarios, the proposed GF-DWA is shown to outperform other popular trajectory planning and control methods in terms of safety and flexibility, especially in complex environments with non-convex obstacles.
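A toy dynamic-window step with an added gradient-field cost term conveys the idea: candidate velocities that drive the robot against the obstacle-distance gradient (i.e., toward obstacles) are penalized. The weights, sampling grid, and the `dist_grad` oracle (returning distance and its gradient at a point) are assumptions, not the paper's Gaussian process distance field.

```python
import numpy as np

def gf_dwa_step(pose, goal, dist_grad, v_max=1.0, w_max=1.5, dt=0.1, horizon=10,
                w_goal=1.0, w_grad=0.5):
    """One illustrative dynamic-window step with a gradient-field cost term.
    pose: (x, y, theta); goal: (x, y); dist_grad(p) -> (distance, gradient at p)."""
    best_cmd, best_cost = (0.0, 0.0), np.inf
    for v in np.linspace(0.0, v_max, 5):
        for w in np.linspace(-w_max, w_max, 11):
            x, y, th = pose
            cost = 0.0
            for _ in range(horizon):                      # forward-simulate the candidate
                x += v * np.cos(th) * dt
                y += v * np.sin(th) * dt
                th += w * dt
                d, g = dist_grad(np.array([x, y]))
                heading = np.array([np.cos(th), np.sin(th)])
                # penalize motion against the distance gradient, more so near obstacles
                cost += w_grad * max(0.0, -heading @ g) / (d + 1e-3)
            cost += w_goal * np.hypot(goal[0] - x, goal[1] - y)
            if cost < best_cost:
                best_cost, best_cmd = cost, (v, w)
    return best_cmd
```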
Authors:Xinyu Zhang, Zewei Zhou, Zhaoyi Wang, Yangjie Ji, Yanjun Huang, Hong Chen
Abstract:
Vehicle-to-everything (V2X) technologies have become an ideal paradigm to extend the perception range and see through occlusions. Existing efforts focus on single-frame cooperative perception; however, how to capture the temporal cues between frames with V2X to facilitate the prediction task, or even the planning task, remains underexplored. In this paper, we introduce Co-MTP, a general cooperative trajectory prediction framework with multi-temporal fusion for autonomous driving, which leverages the V2X system to fully capture the interactions among agents in both the history and future domains to benefit planning. In the history domain, V2X can complement the incomplete history trajectory in single-vehicle perception, and we design a heterogeneous graph transformer to learn the fusion of history features from multiple agents and capture the history interactions. Moreover, the goal of prediction is to support future planning. Thus, in the future domain, V2X can provide the prediction results of surrounding objects, and we further extend the graph transformer to capture the future interaction among the ego planning and the other vehicles' intentions and obtain the final future scenario state under a certain planning action. We evaluate the Co-MTP framework on the real-world dataset V2X-Seq, and the results show that Co-MTP achieves state-of-the-art performance and that both history and future fusion can greatly benefit prediction.
Authors:Davide Peron, Victor Nan Fernandez-Ayala, Eleftherios E. Vlahakis, Dimos V. Dimarogonas
Abstract:
We consider multi-robot systems under recurring tasks formalized as linear temporal logic (LTL) specifications. To solve the planning problem efficiently, we propose a bottom-up approach combining offline plan synthesis with online coordination, dynamically adjusting plans via real-time communication. To address action delays, we introduce a synchronization mechanism ensuring coordinated task execution, leading to a multi-agent coordination and synchronization framework that is adaptable to a wide range of multi-robot applications. The software package is developed in Python and ROS2 for broad deployment. We validate our findings through lab experiments involving nine robots showing enhanced adaptability compared to previous methods. Additionally, we conduct simulations with up to ninety agents to demonstrate the reduced computational complexity and the scalability features of our work.
Authors:Lishuo Pan, Mattia Catellani, Lorenzo Sabattini, Nora Ayanian
Abstract:
Many approaches to multi-robot coordination are susceptible to failure due to communication loss and uncertainty in estimation. We present a real-time, communication-free distributed algorithm, certified by control barrier functions, for navigating robots to their desired goals; it models and controls the onboard sensing behavior to keep neighbors within the limited field of view for position estimation. The approach is robust to temporary tracking loss and directly synthesizes control in real time to stabilize visual contact through control Lyapunov-barrier functions. The main contributions of this paper are a continuous-time robust trajectory generation and control method certified by control barrier functions for distributed multi-robot systems and a discrete optimization procedure, namely MPC-CBF, to approximate the certified controller. In addition, we propose a linear surrogate of high-order control barrier function constraints and use sequential quadratic programming to solve MPC-CBF efficiently. We demonstrate results in simulation with 10 robots and physical experiments with 2 custom-built UAVs. To the best of our knowledge, this work is the first of its kind to generate a robust continuous-time trajectory and controller concurrently, certified by control barrier functions, utilizing piecewise splines.
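Since this abstract revolves around control-barrier-function certification, a minimal, hedged sketch of the generic single-constraint CBF safety filter may help. It is the closed-form projection for a single-integrator robot, not the paper's MPC-CBF with piecewise splines or its sequential quadratic programming solver, and all names are illustrative.

```python
import numpy as np

def cbf_filter(u_nom, x, h, grad_h, alpha=1.0):
    """Minimal single-constraint CBF safety filter for a single-integrator robot.

    Solves  min ||u - u_nom||^2  s.t.  grad_h(x) . u + alpha * h(x) >= 0
    in closed form (project u_nom onto the half-space when it is violated).
    """
    a = grad_h(x)                     # row vector of the linear constraint
    slack = a @ u_nom + alpha * h(x)  # >= 0 means u_nom is already safe
    if slack >= 0:
        return u_nom
    return u_nom - (slack / (a @ a)) * a

# toy usage: keep at least 1 m from an obstacle at the origin
h = lambda x: np.linalg.norm(x) - 1.0
grad_h = lambda x: x / np.linalg.norm(x)
x = np.array([1.5, 0.0])
u_nom = np.array([-1.0, 0.0])         # nominal command drives straight at the obstacle
print(cbf_filter(u_nom, x, h, grad_h))  # filtered command slows the approach
```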
Authors:Valerio Brunacci, Alberto Dionigi, Alessio De Angelis, Gabriele Costante
Abstract:
In multi-robot systems, relative localization between platforms plays a crucial role in many tasks, such as leader following, target tracking, or cooperative maneuvering. State-of-the-Art (SotA) approaches rely either on infrastructure-based or on infrastructure-less setups. The former typically achieve high localization accuracy but require fixed external structures. The latter provide more flexibility; however, most of these works use cameras or lidars that require Line-of-Sight (LoS) to operate. Ultra Wide Band (UWB) devices are emerging as a viable alternative to build infrastructure-less solutions that do not require LoS. These approaches directly deploy the UWB sensors on the robots. However, they require that at least one of the platforms is static, limiting the advantages of an infrastructure-less setup. In this work, we remove this constraint and introduce an active method for infrastructure-less relative localization. Our approach allows the robot to adapt its position to minimize the relative localization error of the other platform. To this aim, we first design a specialized anchor placement for the active localization task. Then, we propose a novel UWB Relative Localization Loss that adapts the Geometric Dilution Of Precision metric to the infrastructure-less scenario. Lastly, we leverage this loss function to train an active Deep Reinforcement Learning-based controller for UWB relative localization. An extensive simulation campaign and real-world experiments validate our method, showing up to a 60% reduction of the localization error compared to current SotA approaches.
Authors:Kai Xiong, Xingyu Wu, Anna Duan, Supeng Leng, Jianhua He
Abstract:
The efficacy of UAV swarm cooperative perception fundamentally depends on three-dimensional (3D) formation geometry, which governs target observability and sensor complementarity. In the literature, the exploitation of formation geometry and its impact on UAV sensing have rarely been studied; this neglect can significantly degrade multimodal cooperative perception in scenarios where heterogeneous payloads (vision cameras and LiDAR) must be geometrically arranged to exploit their complementary strengths while managing communication interference and hardware budgets. To bridge this critical gap, we propose an information-theoretic optimization framework that allocates UAVs and multimodal sensors, configures formation geometries, and governs flight control. The UAV-sensor allocation is optimized by maximizing the Fisher Information Matrix (FIM) determinant. Under this framework, we introduce an equivalent formation transition strategy that enhances field-of-view (FOV) coverage without compromising perception accuracy or aggravating communication interference. Furthermore, we design a novel Lyapunov-stable flight control scheme with logarithmic potential fields to generate energy-efficient trajectories for formation transitions. Extensive simulations demonstrate that our formation-aware design achieves a 25.0\% improvement in FOV coverage, a 104.2\% enhancement in communication signal strength, and a 47.2\% reduction in energy consumption compared to conventional benchmarks. This work establishes that task-driven geometric configuration represents a foundational rather than incidental component in next-generation UAV swarm systems.
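The abstract optimizes the UAV-sensor allocation by Fisher Information Matrix determinant maximization. As a toy, hedged illustration of that D-optimality criterion (not the paper's framework), the sketch below selects bearing-only observer positions for a single 2-D target; the noise level, candidate ring, and team size are assumptions.

```python
import itertools
import numpy as np

def bearing_fim(sensors, target, sigma=0.05):
    """Fisher information matrix of a 2-D target position from bearing-only sensors."""
    fim = np.zeros((2, 2))
    for s in sensors:
        d = target - s
        r2 = d @ d
        J = np.array([-d[1], d[0]]) / r2   # d(bearing)/d(target position)
        fim += np.outer(J, J) / sigma**2
    return fim

# choose 3 observer positions out of 6 candidates by maximizing det(FIM) (D-optimality)
target = np.array([0.0, 0.0])
candidates = [np.array([np.cos(a), np.sin(a)]) * 5.0
              for a in np.linspace(0, 2 * np.pi, 6, endpoint=False)]
best = max(itertools.combinations(range(6), 3),
           key=lambda idx: np.linalg.det(bearing_fim([candidates[i] for i in idx], target)))
print("selected candidate indices:", best)
```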
Authors:Peng Chen, Jing Liang, Kang-Jia Qiao, Hui Song, Cai-Tong Yue, Kun-Jie Yu, Ponnuthurai Nagaratnam Suganthan, Witold Pedrycz
Abstract:
The continuous innovation of smart robotic technologies is driving the development of smart orchards, significantly enhancing the potential for automated harvesting systems. While multi-robot systems offer promising solutions to address labor shortages and rising costs, the efficient scheduling of these systems presents complex optimization challenges. This research investigates the multi-trip picking robot task scheduling (MTPRTS) problem. The problem is characterized by its provision for robot redeployment while maintaining strict adherence to makespan constraints, and encompasses the interdependencies among robot weight, robot load, and energy consumption, thus introducing substantial computational challenges that demand sophisticated optimization algorithms. To effectively tackle this complexity, metaheuristic approaches, which often utilize local search mechanisms, are widely employed. Despite the critical role of local search in vehicle routing problems, most existing algorithms are hampered by redundant local operations, leading to slower search processes and higher risks of local optima, particularly in large-scale scenarios. To overcome these limitations, we propose an adaptive experience-based discrete genetic algorithm (AEDGA) that introduces three key innovations: (1) an integrated load-distance balancing initialization method, (2) a clustering-based local search mechanism, and (3) an experience-based adaptive selection strategy. To ensure solution feasibility under makespan constraints, we develop a solution repair strategy implemented through three distinct frameworks. Comprehensive experiments on 18 proposed test instances and 24 existing test problems demonstrate that AEDGA significantly outperforms eight state-of-the-art algorithms.
Authors:Peng Chen, Jing Liang, Kang-Jia Qiao, Hui Song, Tian-lei Ma, Kun-Jie Yu, Cai-Tong Yue, Ponnuthurai Nagaratnam Suganthan, Witold Pedrycz
Abstract:
Multi-robot systems have emerged as a key technology for addressing the efficiency and cost challenges in labor-intensive industries. In the representative scenario of smart farming, planning efficient harvesting schedules for a fleet of electric robots presents a highly challenging frontier problem. The complexity arises not only from the need to find Pareto-optimal solutions for the conflicting objectives of makespan and transportation cost, but also from the necessity to simultaneously manage payload constraints and finite battery capacity. When robot loads are dynamically updated during planned multi-trip operations, a mandatory recharge triggered by energy constraints introduces an unscheduled load reset. This interaction creates a complex cascading effect that disrupts the entire schedule and renders traditional optimization methods ineffective. To address this challenge, this paper proposes the segment anchoring-based balancing algorithm (SABA). The core of SABA lies in the organic combination of two synergistic mechanisms: the sequential anchoring and balancing mechanism, which leverages charging decisions as `anchors' to systematically reconstruct disrupted routes, while the proportional splitting-based rebalancing mechanism is responsible for the fine-grained balancing and tuning of the final solutions' makespans. Extensive comparative experiments, conducted on a real-world case study and a suite of benchmark instances, demonstrate that SABA comprehensively outperforms 6 state-of-the-art algorithms in terms of both solution convergence and diversity. This research provides a novel theoretical perspective and an effective solution for the multi-robot task allocation problem under energy constraints.
Authors:Xuan-Thuan Nguyen, Khac Nam Nguyen, Ngoc Duy Tran, Thi Thoa Mac, Anh Nguyen, Hoang Hiep Ly, Tung D. Ta
Abstract:
Multi-robot systems, particularly mobile manipulators, face challenges in control coordination and dynamic stability when working together. To address this issue, this study proposes MobiDock, a modular self-reconfigurable mobile manipulator system that allows two independent robots to physically connect and form a unified mobile bimanual platform. This process helps transform a complex multi-robot control problem into the management of a simpler, single system. The system utilizes an autonomous docking strategy based on computer vision with AprilTag markers and a new threaded screw-lock mechanism. Experimental results show that the docked configuration demonstrates better performance in dynamic stability and operational efficiency compared to two independently cooperating robots. Specifically, the unified system has lower Root Mean Square (RMS) Acceleration and Jerk values, higher angular precision, and completes tasks significantly faster. These findings confirm that physical reconfiguration is a powerful design principle that simplifies cooperative control, improving stability and performance for complex tasks in real-world environments.
Authors:Mohamed Manzour, Catherine M. Elias, Omar M. Shehata, Rubén Izquierdo, Miguel Ángel Sotelo
Abstract:
Research on lane change prediction has gained attention in the last few years. Most existing works in this area have been conducted in simulation environments or with pre-recorded datasets; these works often rely on simplified assumptions about sensing, communication, and traffic behavior that do not always hold in practice. Real-world deployments of lane-change prediction systems are relatively rare, and when they are reported, the practical challenges, limitations, and lessons learned are often under-documented. This study explores cooperative lane-change prediction through a real hardware deployment in mixed traffic and shares the insights that emerged during implementation and testing. We highlight the practical challenges we faced, including bottlenecks, reliability issues, and operational constraints that shaped the behavior of the system. By documenting these experiences, the study provides guidance for others working on similar pipelines.
Authors:Aymeric Vellinger, Nemanja Antonic, Elio Tuci
Abstract:
Swarm intelligence emerges from decentralised interactions among simple agents, enabling collective problem-solving. This study establishes a theoretical equivalence between pheromone-mediated aggregation in C. elegans and reinforcement learning (RL), demonstrating how stigmergic signals function as distributed reward mechanisms. We model engineered nematode swarms performing foraging tasks, showing that pheromone dynamics mathematically mirror cross-learning updates, a fundamental RL algorithm. Experimental validation with data from the literature confirms that our model accurately replicates empirical C. elegans foraging patterns under static conditions. In dynamic environments, persistent pheromone trails create positive feedback loops that hinder adaptation by locking swarms into obsolete choices. Through computational experiments in multi-armed bandit scenarios, we reveal that introducing a minority of exploratory agents insensitive to pheromones restores collective plasticity, enabling rapid task switching. This behavioural heterogeneity balances exploration-exploitation trade-offs, implementing swarm-level extinction of outdated strategies. Our results demonstrate that stigmergic systems inherently encode distributed RL processes, where environmental signals act as external memory for collective credit assignment. By bridging synthetic biology with swarm robotics, this work advances programmable living systems capable of resilient decision-making in volatile environments.
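The abstract states that pheromone dynamics mirror cross-learning updates. For readers unfamiliar with that rule, here is a small, hedged sketch of cross-learning on a multi-armed bandit (the pheromone field and swarm aspects of the paper are not modeled); the arm means, reward scale, and step count are arbitrary.

```python
import numpy as np

def cross_learning_bandit(means, steps=3000, scale=0.1, seed=0):
    """Cross-learning on a multi-armed bandit: after choosing arm a and receiving
    a reward r in [0, 1], update p_a <- p_a + r * (1 - p_a) and p_b <- p_b - r * p_b
    for every other arm b. With small rewards the policy tends to concentrate on
    the best arm, loosely mimicking pheromone build-up at the richest site."""
    rng = np.random.default_rng(seed)
    p = np.ones(len(means)) / len(means)
    for _ in range(steps):
        a = rng.choice(len(means), p=p)
        r = scale * float(rng.random() < means[a])   # small Bernoulli-style reward
        p = p * (1.0 - r)                            # decay all arms
        p[a] += r                                    # reinforce the chosen arm
    return p

print(np.round(cross_learning_bandit([0.2, 0.5, 0.8]), 3))
```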
Authors:Peng Chen, Jing Liang, Hui Song, Kang-Jia Qiao, Cai-Tong Yue, Kun-Jie Yu, Ponnuthurai Nagaratnam Suganthan, Witold Pedrycz
Abstract:
The increasing labor costs in agriculture have accelerated the adoption of multi-robot systems for orchard harvesting. However, efficiently coordinating these systems is challenging due to the complex interplay between makespan and energy consumption, particularly under practical constraints like load-dependent speed variations and battery limitations. This paper defines the multi-objective agricultural multi-electrical-robot task allocation (AMERTA) problem, which systematically incorporates these often-overlooked real-world constraints. To address this problem, we propose a hybrid hierarchical route reconstruction algorithm (HRRA) that integrates several innovative mechanisms, including a hierarchical encoding structure, a dual-phase initialization method, task sequence optimizers, and specialized route reconstruction operators. Extensive experiments on 45 test instances demonstrate HRRA's superior performance against seven state-of-the-art algorithms. Statistical analysis, including the Wilcoxon signed-rank and Friedman tests, empirically validates HRRA's competitiveness and its unique ability to explore previously inaccessible regions of the solution space. In general, this research contributes to the theoretical understanding of multi-robot coordination by offering a novel problem formulation and an effective algorithm, thereby also providing practical insights for agricultural automation.
Authors:Leo Cazenille, Loona Macabre, Nicolas Bredeche
Abstract:
Pogobots are a new type of open-source/open-hardware robots specifically designed for swarm robotics research. Their cost-effective and modular design, complemented by vibration-based and wheel-based locomotion, fast infrared communication and an extensive software architecture, facilitates the implementation of swarm intelligence algorithms. However, testing even simple distributed algorithms directly on robots is particularly labor-intensive, and scaling to more complex problems or calibrating user code parameters places a prohibitively high strain on available resources. In this article we present Pogosim, a fast and scalable simulator for Pogobots, designed to reduce algorithm development costs as much as possible. The exact same code is used both in simulation and to drive real robots experimentally. This article details the software architecture of Pogosim, explains how to write configuration files and user programs, and describes how simulations approximate or differ from experiments. We describe how a large set of simulations can be launched in parallel, how to retrieve and analyze the simulation results, and how to optimize user code parameters using optimization algorithms.
Authors:Kartik A. Pant, Jaehyeok Kim, James M. Goppert, Inseok Hwang
Abstract:
The problem of multi-robot coverage control becomes significantly challenging when multiple robots leave the mission space simultaneously to charge their batteries, disrupting the underlying network topology for communication and sensing. To address this, we propose a resilient network design and control approach that allows robots to achieve the desired coverage performance while satisfying energy constraints and maintaining network connectivity throughout the mission. We model the combined motion, energy, and network dynamics of the multi-robot system (MRS) as a hybrid system with three modes, i.e., coverage, return-to-base, and recharge. We show that ensuring the energy constraints can be transformed into designing appropriate guard conditions for the mode transitions between the three modes. Additionally, we present a systematic procedure to design, maintain, and reconfigure the underlying network topology using an energy-aware bearing rigid network design, enhancing the structural resilience of the MRS even when a subset of robots departs to charge their batteries. Finally, we validate our proposed method using numerical simulations.
Authors:Rathin Chandra Shit, Sharmila Subudhi
Abstract:
In this paper, a novel framework is presented that jointly addresses Multi-Robot Task Allocation (MRTA) and collision avoidance for homogeneous measurement tasks in industrial environments. The proposed spatial clustering simultaneously solves the task allocation problem and mitigates collision risks by partitioning the workspace into distinct operational zones, one per robot. To divide task sites and to schedule robot routes within the corresponding clusters, we use K-means clustering and the 2-Opt algorithm. The presented framework shows satisfactory performance, demonstrating up to a 93\% reduction in computation time (1.24 s against 17.62 s) together with a solution quality improvement of up to 7\% compared to the best-performing method. Our method also completely eliminates the collision points that persist in comparative methods. Theoretical analysis supports the claim that spatial partitioning unifies the apparently disjoint task allocation and collision avoidance problems when many identical tasks are distributed over sparse geographical areas. Ultimately, the findings of this work are of substantial importance for real-world applications where both computational efficiency and collision-free operation are paramount.
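Because the framework combines K-means clustering with 2-Opt route improvement, a compact, hedged sketch of that generic cluster-then-route pipeline is given below; the task positions, cluster count, and greedy 2-opt variant are illustrative choices, not the paper's exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def route_length(order, pts):
    """Length of an open route visiting pts in the given order."""
    return sum(np.linalg.norm(pts[order[i]] - pts[order[i + 1]])
               for i in range(len(order) - 1))

def two_opt(pts):
    """Greedy 2-opt improvement of a visiting order over points pts (n, 2)."""
    order = list(range(len(pts)))
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if route_length(cand, pts) < route_length(order, pts):
                    order, improved = cand, True
    return order

# toy usage: 30 task sites split among 3 robots, one route per cluster (zone)
rng = np.random.default_rng(0)
tasks = rng.uniform(0, 100, size=(30, 2))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(tasks)
for r in range(3):
    pts = tasks[labels == r]
    print(f"robot {r}: {len(pts)} tasks, route length {route_length(two_opt(pts), pts):.1f}")
```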
Authors:Alessia Loi, Loona Macabre, Jérémy Fersula, Keivan Amini, Leo Cazenille, Fabien Caura, Alexandre Guerre, Stéphane Gourichon, Olivier Dauchot, Nicolas Bredeche
Abstract:
This paper describes the Pogobot, an open-source and open-hardware platform specifically designed for research involving swarm robotics. Pogobot features vibration-based locomotion, infrared communication, and an array of sensors in a cost-effective package (approx. 250 euros/unit). The platform's modular design, comprehensive API, and extensible architecture facilitate the implementation of swarm intelligence algorithms and distributed online reinforcement learning algorithms. Pogobots offer an accessible alternative to existing platforms while providing advanced capabilities including directional communication between units. More than 200 Pogobots are already being used on a daily basis at Sorbonne Université and PSL to study self-organizing systems, programmable active matter, discrete reaction-diffusion-advection systems as well as models of social learning and evolution.
Authors:Ahmad Sarlak, Rahul Amin, Abolfazl Razi
Abstract:
Autonomous Vehicles (AVs) rely on individual perception systems to navigate safely. However, these systems face significant challenges in adverse weather conditions, complex road geometries, and dense traffic scenarios. Cooperative Perception (CP) has emerged as a promising approach to extending the perception quality of AVs by jointly processing shared camera feeds and sensor readings across multiple vehicles. This work presents a novel CP framework designed to optimize vehicle selection and networking resource utilization under imperfect communications. Our optimized CP formation considers critical factors such as the helper vehicles' spatial position, visual range, motion blur, and available communication budgets. Furthermore, our resource optimization module allocates communication channels while adjusting power levels to maximize data flow efficiency between the ego and helper vehicles, considering realistic models of modern vehicular communication systems, such as LTE and 5G NR-V2X. We validate our approach through extensive experiments on pedestrian detection in challenging scenarios, using synthetic data generated by the CARLA simulator. The results demonstrate that our method significantly improves upon the perception quality of individual AVs with about 10% gain in detection accuracy. This substantial gain uncovers the unleashed potential of CP to enhance AV safety and performance in complex situations.
Authors:Di Meng, Tianhao Zhao, Chaoyu Xue, Jun Wu, Qiuguo Zhu
Abstract:
Multi-robot autonomous exploration in an unknown environment is an important application in robotics. Traditional exploration methods only use information around frontier points or viewpoints, ignoring spatial information of unknown areas. Moreover, finding the exact optimal solution for multi-robot task allocation is NP-hard, resulting in significant computational time consumption. To address these issues, we present a hierarchical multi-robot exploration framework using a new modeling method called RegionGraph. The proposed approach makes two main contributions: 1) A new modeling method for unexplored areas that preserves their spatial information across the entire space in a weighted graph called RegionGraph. 2) A hierarchical multi-robot exploration framework that decomposes the global exploration task into smaller subtasks, reducing the frequency of global planning and enabling asynchronous exploration. The proposed method is validated through both simulation and real-world experiments, demonstrating a 20% improvement in efficiency compared to existing methods.
Authors:Kartik A. Pant, Vishnu Vijay, Minhyun Cho, Inseok Hwang
Abstract:
The problem of multi-robot coverage control has been widely studied to efficiently coordinate a team of robots to cover a desired area of interest. However, this problem faces significant challenges when some robots are lost or deviate from their desired formation during the mission due to faults or cyberattacks. Since a majority of multi-robot systems (MRSs) rely on communication and relative sensing for their efficient operation, a failure in one robot could result in a cascade of failures in the entire system. In this work, we propose a hierarchical framework for area coverage, combining centralized coordination by leveraging Voronoi partitioning with decentralized reference tracking model predictive control (MPC) for control design. In addition to reference tracking, the decentralized MPC also performs bearing maintenance to enforce a rigid MRS network, thereby enhancing the structural resilience, i.e., the ability to detect and mitigate the effects of localization errors and robot loss during the mission. Furthermore, we show that the resulting control architecture guarantees the recovery of the MRS network in the event of robot loss while maintaining a minimally rigid structure. The effectiveness of the proposed algorithm is validated through numerical simulations.
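The centralized coordination layer in the abstract above relies on Voronoi partitioning. A hedged, discretized sketch of the classic Lloyd update, which could serve as a reference generator ahead of a decentralized tracking controller, is shown below; the grid resolution and uniform density are assumptions, and the paper's bearing-maintenance MPC is not modeled.

```python
import numpy as np

def lloyd_step(robots, grid_pts, density=None):
    """One centroidal-Voronoi (Lloyd) update on a discretized area of interest.

    robots   : (m, 2) robot positions.
    grid_pts : (n, 2) sample points of the area.
    density  : optional (n,) importance weights over the area.
    Returns the centroid of each robot's Voronoi cell, usable as a tracking reference.
    """
    if density is None:
        density = np.ones(len(grid_pts))
    # assign each grid point to the nearest robot (discrete Voronoi partition)
    dists = np.linalg.norm(grid_pts[:, None, :] - robots[None, :, :], axis=2)
    owner = np.argmin(dists, axis=1)
    refs = robots.copy()
    for i in range(len(robots)):
        mask = owner == i
        if mask.any():
            w = density[mask]
            refs[i] = (grid_pts[mask] * w[:, None]).sum(0) / w.sum()
    return refs

# toy usage: 4 robots covering the unit square
xs, ys = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
robots = np.array([[0.1, 0.1], [0.2, 0.1], [0.1, 0.2], [0.9, 0.9]])
for _ in range(20):
    robots = lloyd_step(robots, grid)
print(np.round(robots, 2))
```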
Authors:Giorgio Cignarale, Stephan Felber, Eric Goubault, Bernardo Hummes Flores, Hugo Rincon Galeana
Abstract:
In this paper, we provide a framework integrating distributed multi-robot systems and temporal epistemic logic. We show that continuous-discrete hybrid systems are compatible with logical models of knowledge already used in distributed computing, and demonstrate its usefulness by deriving sufficient epistemic conditions for exploration and gathering robot tasks to be solvable. We provide a separation of the physical and computational aspects of a robotic system, allowing us to decouple the problems related to each and directly use methods from control theory and distributed computing, fields that are traditionally distant in the literature. Finally, we demonstrate a novel approach for reasoning about the knowledge in multi-robot systems through a principled method of converting a switched hybrid dynamical system into a temporal-epistemic logic model, passing through an abstract state machine representation. This creates space for methods and results to be exchanged across the fields of control theory, distributed computing and temporal-epistemic logic, while reasoning about multi-robot systems.
Authors:Prajit KrisshnaKumar, Steve Paul, Souma Chowdhury
Abstract:
Interesting and efficient collective behavior observed in multi-robot or swarm systems emerges from the individual behavior of the robots. The functional space of individual robot behaviors is in turn shaped or constrained by the robot's morphology or physical design. Thus the full potential of multi-robot systems can be realized by concurrently optimizing the morphology and behavior of individual robots, informed by the environment's feedback about their collective performance, as opposed to treating morphology and behavior choices disparately or in sequence (the classical approach). This paper presents an efficient concurrent design or co-design method to explore this potential and understand how morphology choices impact collective behavior, particularly in an MRTA problem focused on a flood response scenario, where the individual behavior is designed via graph reinforcement learning. Computational efficiency in this case is attributed to a new way of near exact decomposition of the co-design problem into a series of simpler optimization and learning problems. This is achieved through i) the identification and use of the Pareto front of Talent metrics that represent morphology-dependent robot capabilities, and ii) learning the selection of Talent best trade-offs and individual robot policy that jointly maximizes the MRTA performance. Applied to a multi-unmanned aerial vehicle flood response use case, the co-design outcomes are shown to readily outperform sequential design baselines. Significant differences in morphology and learned behavior are also observed when comparing co-designed single robot vs. co-designed multi-robot systems for similar operations.
Authors:Pei Liu, Nanfang Zheng, Yiqun Li, Junlan Chen, Ziyuan Pu
Abstract:
With the development of AI-assisted driving, numerous methods have emerged for ego-vehicle 3D perception tasks, but there has been limited research on roadside perception. With its ability to provide a global view and a broader sensing range, the roadside perspective is worth developing. LiDAR provides precise three-dimensional spatial information, while cameras offer semantic information. These two modalities are complementary in 3D detection. However, adding camera data does not increase accuracy in some studies since the information extraction and fusion procedure is not sufficiently reliable. Recently, Kolmogorov-Arnold Networks (KANs) have been proposed as replacements for MLPs, which are better suited for high-dimensional, complex data. Both the camera and the LiDAR provide high-dimensional information, and employing KANs should enhance the extraction of valuable features to produce better fusion outcomes. This paper proposes Kaninfradet3D, which optimizes the feature extraction and fusion modules. To extract features from complex high-dimensional data, the model's encoder and fuser modules were improved using KAN Layers. Cross-attention was applied to enhance feature fusion, and visual comparisons verified that camera features were more evenly integrated. This addressed the issue of camera features being abnormally concentrated, negatively impacting fusion. Compared to the benchmark, our approach shows improvements of +9.87 mAP and +10.64 mAP in the two viewpoints of the TUMTraf Intersection Dataset and an improvement of +1.40 mAP in the roadside end of the TUMTraf V2X Cooperative Perception Dataset. The results indicate that Kaninfradet3D can effectively fuse features, demonstrating the potential of applying KANs in roadside perception tasks.
Authors:Abhinav Chakraborty, Pritam Goswami, Satakshi Ghosh
Abstract:
Gathering is a fundamental coordination problem in swarm robotics, where the objective is to bring robots together at a point not known to them at the beginning. While most research focuses on continuous domains, some studies also examine the discrete domain. This paper addresses the optimal gathering problem on an infinite grid, aiming to improve energy efficiency by minimizing the maximum distance any robot must travel. The robots are autonomous, anonymous, homogeneous, identical, and oblivious. We identify all initial configurations where the optimal gathering problem is unsolvable. For the remaining configurations, we introduce a deterministic distributed algorithm that effectively gathers $n$ robots ($n\ge 9$). The algorithm ensures that the robots gather at one of the designated min-max nodes in the grid. Additionally, we provide a comprehensive characterization of the subgraph formed by the min-max nodes in this infinite grid model.
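The min-max objective in this abstract can be illustrated with a small, hedged computation: under the Manhattan metric, clamping a candidate node coordinate-wise into the robots' bounding box never increases any distance, so an exhaustive search over that box finds nodes minimizing the maximum travel distance. The paper's exact grid and movement model may differ; this is only a toy illustration.

```python
import numpy as np

def min_max_nodes(robots):
    """Grid nodes minimizing the maximum Manhattan distance to all robots.

    robots : list of (x, y) integer grid positions. Candidates are restricted to
    the bounding box of the robots, which always contains a min-max node under
    the Manhattan metric (clamping a coordinate never increases any distance).
    """
    robots = np.asarray(robots)
    xs = range(robots[:, 0].min(), robots[:, 0].max() + 1)
    ys = range(robots[:, 1].min(), robots[:, 1].max() + 1)
    best, best_cost = [], None
    for x in xs:
        for y in ys:
            cost = np.abs(robots - np.array([x, y])).sum(axis=1).max()
            if best_cost is None or cost < best_cost:
                best, best_cost = [(x, y)], cost
            elif cost == best_cost:
                best.append((x, y))
    return best, best_cost

nodes, cost = min_max_nodes([(0, 0), (4, 1), (2, 5), (6, 6), (1, 3)])
print("min-max nodes:", nodes, "worst-case travel:", cost)
```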
Authors:Han Liu, Yu Jin, Tianjiang Hu, Kai Huang
Abstract:
Collision avoidance and trajectory planning are crucial in multi-robot systems, particularly in environments with numerous obstacles. Although extensive research has been conducted in this field, the challenge of rapid traversal through such environments has not been fully addressed. This paper addresses this problem by proposing a novel real-time scheduling scheme designed to optimize the passage of multi-robot systems through complex, obstacle-rich maps. Inspired by network flow optimization, our scheme decomposes the environment into a network structure, enabling the efficient allocation of robots to paths based on real-time congestion data. The proposed scheduling planner operates on top of existing collision avoidance algorithms, focusing on minimizing traversal time by balancing robot detours and waiting times. Our simulation results demonstrate the efficiency of the proposed scheme. Additionally, we validated its effectiveness through real-world flight tests using ten quadrotors. This work contributes a lightweight, effective scheduling planner capable of meeting the real-time demands of multi-robot systems in obstacle-rich environments.
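To make the congestion-aware allocation idea concrete, here is a toy, hedged sketch that routes robots one by one through a small region graph, inflating edge costs with the current load; the graph, capacities, and cost model are invented for illustration and are not the paper's scheduler.

```python
import networkx as nx

# toy corridor network: nodes are regions, edge weights are nominal traversal times
G = nx.Graph()
G.add_weighted_edges_from([("S", "A", 2), ("S", "B", 3), ("A", "T", 2),
                           ("B", "T", 2), ("A", "B", 1)], weight="time")
capacity = {e: 2 for e in G.edges}           # robots an edge can carry without slowdown
load = {e: 0 for e in G.edges}

def congestion_weight(u, v, d):
    """Edge cost inflated by current load, trading detours against waiting."""
    e = (u, v) if (u, v) in load else (v, u)
    return d["time"] * (1.0 + load[e] / capacity[e])

routes = []
for robot in range(5):
    path = nx.shortest_path(G, "S", "T", weight=congestion_weight)
    routes.append(path)
    for u, v in zip(path, path[1:]):          # update congestion seen by later robots
        e = (u, v) if (u, v) in load else (v, u)
        load[e] += 1
print(routes)
```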
Authors:Robinroy Peter, Lavanya Ratnabala, Eugene Yugarajah Andrew Charles, Dzmitry Tsetserukou
Abstract:
This paper addresses the challenges of exploration and navigation in unknown environments from the perspective of evolutionary swarm robotics. A key focus is on path formation, which is essential for enabling cooperative swarm robots to navigate effectively. We designed the task allocation and path formation process based on a finite state machine, ensuring systematic decision-making and efficient state transitions. The approach is decentralized, allowing each robot to make decisions independently based on local information, which enhances scalability and robustness. We present a novel subgoal-based path formation method that establishes paths between locations by leveraging visually connected subgoals. Simulation experiments conducted in the Argos simulator show that this method successfully forms paths in the majority of trials. However, inter-robot collisions (traffic) among numerous robots during path formation can negatively impact performance. To address this issue, we propose a task allocation strategy that uses local communication protocols and light signal-based communication to manage robot deployment. This strategy assesses the distance between points and determines the optimal number of robots needed for the path formation task, thereby reducing unnecessary exploration and traffic congestion. The performance of both the subgoal-based path formation method and the task allocation strategy is evaluated by comparing the path length, time, and resource usage against the A* algorithm. Simulation results demonstrate the effectiveness of our approach, highlighting its scalability, robustness, and fault tolerance.
Authors:Ali Azarbahram, Shenyu Liu, Gian Paolo Incremona
Abstract:
This paper presents a unified and scalable framework for predictive and safe autonomous navigation in dynamic transportation environments by integrating model predictive control (MPC) with distributed Koopman operator learning. High-dimensional sensory data are employed to model and forecast the motion of surrounding dynamic obstacles. A consensus-based distributed Koopman learning algorithm enables multiple computational agents or sensing units to collaboratively estimate the Koopman operator without centralized data aggregation, thereby supporting large-scale and communication-efficient learning across a networked system. The learned operator predicts future spatial densities of obstacles, which are subsequently represented through Gaussian mixture models. Their confidence ellipses are approximated by convex polytopes and embedded as linear constraints in the MPC formulation to guarantee safe and collision-free navigation. The proposed approach not only ensures obstacle avoidance but also scales efficiently with the number of sensing or computational nodes, aligning with cooperative perception principles in intelligent transportation system (ITS) applications. Theoretical convergence guarantees and predictive constraint formulations are established, and extensive simulations demonstrate reliable, safe, and computationally efficient navigation performance in complex environments.
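As a hedged, minimal illustration of Koopman operator learning from snapshot data (without the consensus-based distributed estimation, Gaussian mixture forecasting, or MPC constraints of the paper above), the sketch below performs an EDMD-style least-squares fit with a small polynomial dictionary; the dictionary and toy dynamics are assumptions.

```python
import numpy as np

def lift(x):
    """Hypothetical dictionary of observables: constant, state, and pairwise products."""
    x = np.atleast_2d(x)
    quad = np.einsum('ni,nj->nij', x, x).reshape(len(x), -1)
    return np.hstack([np.ones((len(x), 1)), x, quad])

def edmd(X, Y):
    """Least-squares Koopman estimate K with lift(Y) ~ lift(X) @ K."""
    PX, PY = lift(X), lift(Y)
    return np.linalg.lstsq(PX, PY, rcond=None)[0]

# toy usage: learn the map of a slowly decaying, rotating 2-D obstacle motion
rng = np.random.default_rng(1)
theta = 0.1
A = 0.99 * np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
X = rng.normal(size=(500, 2))
Y = X @ A.T
K = edmd(X, Y)
# one-step prediction of a new obstacle state through the lifted space
x0 = np.array([[1.0, 0.0]])
print("predicted:", (lift(x0) @ K)[0, 1:3], "true:", (x0 @ A.T)[0])
```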
Authors:Zili Tang, Ying Zhang, Meng Guo
Abstract:
Many robots are not equipped with a manipulator, and many objects are not suitable for prehensile manipulation (such as large boxes and cylinders). In these cases, pushing is a simple yet effective non-prehensile skill for robots to interact with and further change the environment. Existing work often assumes a set of predefined pushing modes and fixed-shape objects. This work tackles the general problem of controlling a robotic fleet to collaboratively push numerous arbitrary objects to their respective destinations within complex environments of cluttered and movable obstacles. It incorporates several characteristic challenges of multi-robot systems, such as online task coordination under large uncertainties in cost and duration, and of contact-rich tasks, such as hybrid switching among different contact modes and under-actuation due to constrained contact forces. The proposed method is based on combinatorial hybrid optimization over dynamic task assignments and hybrid execution via sequences of pushing modes and associated forces. It consists of three main components: (I) the decomposition, ordering and rolling assignment of pushing subtasks to robot subgroups; (II) the keyframe-guided hybrid search to optimize the sequence of parameterized pushing modes for each subtask; (III) the hybrid control to execute these modes and transition among them. Last but not least, a diffusion-based accelerator is adopted to predict the keyframes and pushing modes that should be prioritized during hybrid search, further improving planning efficiency. The framework is complete under mild assumptions. Its efficiency and effectiveness under different numbers of robots and general-shaped objects are validated extensively in simulations and hardware experiments, along with generalizations to heterogeneous robots, planar assembly, and 6D pushing.
Authors:Guibin Sun, Jinhu Lü, Kexin Liu, Zhenqian Wang, Guanrong Chen
Abstract:
Swarms evolving from collective behaviors among multiple individuals are commonly seen in nature, which enables biological systems to exhibit more efficient and robust collaboration. Creating similar swarm intelligence in engineered robots poses challenges to the design of collaborative algorithms that can be programmed at large scales. The assignment-based method has played an eminent role for a very long time in solving collaboration problems of robot swarms. However, it faces fundamental limitations in terms of efficiency and robustness due to its unscalability to swarm variants. This article presents a tutorial review on recent advances in assignment-free collaboration of robot swarms, focusing on the problem of shape formation. A key theoretical component is the recently developed \emph{mean-shift exploration} strategy, which improves the collaboration efficiency of large-scale swarms by dozens of times. Further, the efficiency improvement is more significant as the swarm scale increases. Finally, this article discusses three important applications of the mean-shift exploration strategy, including precise shape formation, area coverage formation, and maneuvering formation, as well as their corresponding industrial scenarios in smart warehousing, area exploration, and cargo transportation.
Authors:Fernando Salanova, Jesús Roche, Cristian Mahuela, Eduardo Montijano
Abstract:
The reliable execution of high-level missions in multi-robot systems with heterogeneous agents requires robust methods for detecting spurious behaviors. In this paper, we address the challenge of identifying spurious executions of plans specified as a Linear Temporal Logic (LTL) formula, such as incorrect task sequences, violations of spatial constraints, timing inconsistencies, or deviations from intended mission semantics. To tackle this, we introduce a structured data generation framework based on the Nets-within-Nets (NWN) paradigm, which coordinates robot actions with LTL-derived global mission specifications. We further propose a Transformer-based anomaly detection pipeline that classifies robot trajectories as normal or anomalous. Experimental evaluations show that our method achieves high accuracy (91.3%) in identifying execution inefficiencies, and demonstrates robust detection capabilities for core mission violations (88.3%) and constraint-based adaptive anomalies (66.8%). An ablation study of the embedding and the architecture shows that our proposed representation outperforms simpler alternatives.
Authors:Jesús Roche, Eduardo Sebastián, Eduardo Montijano
Abstract:
Learning control policies for multi-robot systems (MRS) remains a major challenge due to long-term coordination and the difficulty of obtaining realistic training data. In this work, we address both limitations within an imitation learning framework. First, we shift the typical role of Curriculum Learning in MRS, from scalability with the number of robots, to focus on improving long-term coordination. We propose a curriculum strategy that gradually increases the length of expert trajectories during training, stabilizing learning and enhancing the accuracy of long-term behaviors. Second, we introduce a method to approximate the egocentric perception of each robot using only third-person global state demonstrations. Our approach transforms idealized trajectories into locally available observations by filtering neighbors, converting reference frames, and simulating onboard sensor variability. Both contributions are integrated into a physics-informed technique to produce scalable, distributed policies from observations. We conduct experiments across two tasks with varying team sizes and noise levels. Results show that our curriculum improves long-term accuracy, while our perceptual estimation method yields policies that are robust to realistic uncertainty. Together, these strategies enable the learning of robust, distributed controllers from global demonstrations, even in the absence of expert actions or onboard measurements.
Authors:Yuyang Zhang, Zhuoli Tian, Jinsheng Wei, Meng Guo
Abstract:
Fleets of autonomous robots have been deployed for exploration of unknown scenes for features of interest, e.g., subterranean exploration, reconnaissance, search and rescue missions. During exploration, the robots may encounter unidentified targets, blocked passages, interactive objects, temporary failure, or other unexpected events, all of which require consistent human assistance with reliable communication for a time period. This, however, can be particularly challenging if the communication among the robots is severely restricted to only close-range exchange via ad-hoc networks, especially in extreme environments like caves and underground tunnels. This paper presents a novel human-centric interactive exploration and assistance framework called FlyKites, for multi-robot systems under limited communication. It consists of three interleaved components: (I) the distributed exploration and intermittent communication (called the "spread mode"), where the robots collaboratively explore the environment and exchange local data among the fleet and with the operator; (II) the simultaneous optimization of the relay topology, the operator path, and the assignment of robots to relay roles (called the "relay mode"), such that all requested assistance can be provided with minimum delay; (III) the human-in-the-loop online execution, where the robots switch between different roles and interact with the operator adaptively. Extensive human-in-the-loop simulations and hardware experiments are performed over numerous challenging scenes.
Authors:Min Deng, Bo Fu, Lingyao Li, Xi Wang
Abstract:
Multi-robot systems are emerging as a promising solution to the growing demand for productivity, safety, and adaptability across industrial sectors. However, effectively coordinating multiple robots in dynamic and uncertain environments, such as construction sites, remains a challenge, particularly due to unpredictable factors like material delays, unexpected site conditions, and weather-induced disruptions. To address these challenges, this study proposes an adaptive task allocation framework that strategically leverages the synergistic potential of Digital Twins, Integer Programming (IP), and Large Language Models (LLMs). The multi-robot task allocation problem is formally defined and solved using an IP model that accounts for task dependencies, robot heterogeneity, scheduling constraints, and re-planning requirements. A mechanism for narrative-driven schedule adaptation is introduced, in which unstructured natural language inputs are interpreted by an LLM, and optimization constraints are autonomously updated, enabling human-in-the-loop flexibility without manual coding. A digital twin-based system has been developed to enable real-time synchronization between physical operations and their digital representations. This closed-loop feedback framework ensures that the system remains dynamic and responsive to ongoing changes on site. A case study demonstrates both the computational efficiency of the optimization algorithm and the reasoning performance of several LLMs, with top-performing models achieving over 97% accuracy in constraint and parameter extraction. The results confirm the practicality, adaptability, and cross-domain applicability of the proposed methods.
Authors:Aditya Bhatt, Mary Katherine Corra, Franklin Merlo, Prajit KrisshnaKumar, Souma Chowdhury
Abstract:
Signal source localization has been a problem of interest in the multi-robot systems domain given its applications in search & rescue and hazard localization in various industrial and outdoor settings. A variety of multi-robot search algorithms exist that usually formulate and solve the associated autonomous motion planning problem as a heuristic model-free or belief model-based optimization process. Most of these algorithms, however, remain tested only in simulation, thereby losing the opportunity to generate knowledge about how such algorithms would compare/contrast in a real physical setting in terms of search performance and real-time computing performance. To address this gap, this paper presents a new lab-scale physical setup and associated open-source software pipeline to evaluate and benchmark multi-robot search algorithms. The presented physical setup innovatively uses an acoustic source (that is safe and inexpensive) and small ground robots (e-pucks) operating in a standard motion-capture environment. This setup can be easily recreated and used by most robotics researchers. The acoustic source also presents interesting uncertainty in terms of its noise-to-signal ratio, which is useful to assess sim-to-real gaps. The overall software pipeline is designed to readily interface with any multi-robot search algorithm with minimal effort and is executable in parallel asynchronous form. This pipeline includes a framework for distributed implementation of multi-robot or swarm search algorithms, integrated with a ROS (Robot Operating System)-based software stack for motion capture supported localization. The utility of this novel setup is demonstrated by using it to evaluate two state-of-the-art multi-robot search algorithms, based on swarm optimization and batch-Bayesian Optimization (called Bayes-Swarm), as well as a random walk baseline.
Authors:Luca Ballotta, Áron Vékássy, Stephanie Gil, Michal Yemini
Abstract:
Wireless communication-based multi-robot systems open the door to cyberattacks that can disrupt safety and performance of collaborative robots. The physical channel supporting inter-robot communication offers an attractive opportunity to decouple the detection of malicious robots from task-relevant data exchange between legitimate robots. Yet, trustworthiness indications coming from physical channels are uncertain and must be handled with this in mind. In this paper, we propose a resilient protocol for multi-robot operation wherein a parameter λ_t accounts for how confident a robot is about the legitimacy of nearby robots that the physical channel indicates. Analytical results prove that our protocol achieves resilient coordination with arbitrarily many malicious robots under mild assumptions. Tuning λ_t allows a designer to trade between near-optimal inter-robot coordination and quick task execution; see Fig. 1. This is a fundamental performance tradeoff and must be carefully evaluated based on the task at hand. The effectiveness of our approach is numerically verified with experiments involving platoons of autonomous cars where some vehicles are maliciously spoofed.
Authors:Zhaoxing Li, Wenbo Wu, Yue Wang, Yanran Xu, William Hunt, Sebastian Stein
Abstract:
Rapid advancements in artificial intelligence (AI) have enabled robots to perform complex tasks autonomously with increasing precision. However, multi-robot systems (MRSs) face challenges in generalization, heterogeneity, and safety, especially when scaling to large-scale deployments like disaster response. Traditional approaches often lack generalization, requiring extensive engineering for new tasks and scenarios, and struggle with managing diverse robots. To overcome these limitations, we propose a Human-in-the-loop Multi-Robot Collaboration Framework (HMCF) powered by large language models (LLMs). LLMs enhance adaptability by reasoning over diverse tasks and robot capabilities, while human oversight ensures safety and reliability, intervening only when necessary. Our framework seamlessly integrates human oversight, LLM agents, and heterogeneous robots to optimize task allocation and execution. Each robot is equipped with an LLM agent capable of understanding its capabilities, converting tasks into executable instructions, and reducing hallucinations through task verification and human supervision. Simulation results show that our framework outperforms state-of-the-art task planning methods, achieving higher task success rates with an improvement of 4.76%. Real-world tests demonstrate its robust zero-shot generalization feature and ability to handle diverse tasks and environments with minimal human intervention.
Authors:Zhongqi Wei, Xusheng Luo, Changliu Liu
Abstract:
Task and motion planning (TAMP) for multi-robot systems, which integrates discrete task planning with continuous motion planning, remains a challenging problem in robotics. Existing TAMP approaches often struggle to scale effectively for multi-robot systems with complex specifications, leading to infeasible solutions and prolonged computation times. This work addresses the TAMP problem in multi-robot settings where tasks are specified using expressive hierarchical temporal logic and task assignments are not pre-determined. Our approach leverages the efficiency of hierarchical temporal logic specifications for task-level planning and the optimization-based graph of convex sets method for motion-level planning, integrating them within a product graph framework. At the task level, we convert hierarchical temporal logic specifications into a single graph, embedding task allocation within its edges. At the motion level, we represent the feasible motions of multiple robots through convex sets in the configuration space, guided by a sampling-based motion planner. This formulation allows us to define the TAMP problem as a shortest path search within the product graph, where efficient convex optimization techniques can be applied. We prove that our approach is both sound and complete under mild assumptions. Additionally, we extend our framework to cooperative pick-and-place tasks involving object handovers between robots. We evaluate our method across various high-dimensional multi-robot scenarios, including simulated and real-world environments with quadrupeds, robotic arms, and automated conveyor systems. Our results show that our approach outperforms existing methods in execution time and solution optimality while effectively scaling with task complexity.
Authors:Wonjong Lee, Joonyeol Sim, Joonkyung Kim, Siwon Jo, Wenhao Luo, Changjoo Nam
Abstract:
We propose a hybrid approach for decentralized multi-robot navigation that ensures both safety and deadlock prevention. Building on a standard control formulation, we add a lightweight deadlock prevention mechanism by forming temporary "roundabouts" (circular reference paths). Each robot relies only on local, peer-to-peer communication and a controller for base collision avoidance; a roundabout is generated or joined on demand to avert deadlocks. Robots in the roundabout travel in one direction until an escape condition is met, allowing them to return to goal-oriented motion. Unlike classical decentralized methods that lack explicit deadlock resolution, our roundabout maneuver ensures system-wide forward progress while preserving safety constraints. Extensive simulations and physical robot experiments show that our method consistently outperforms or matches the success and arrival rates of other decentralized control approaches, particularly in cluttered or high-density scenarios, all with minimal centralized coordination.
Authors:Dengyu Zhang, Chenghao, Feng Xue, Qingrui Zhang
Abstract:
Flocking control is essential for multi-robot systems in diverse applications, yet achieving efficient flocking in congested environments poses challenges regarding computation burdens, performance optimality, and motion safety. This paper addresses these challenges through a multi-agent reinforcement learning (MARL) framework built on Gibbs Random Fields (GRFs). With GRFs, a multi-robot system is represented by a set of random variables conforming to a joint probability distribution, thus offering a fresh perspective on flocking reward design. A decentralized training and execution mechanism, which enhances the scalability of MARL with respect to robot quantity, is realized using a GRF-based credit assignment method. An action attention module is introduced to implicitly anticipate the motion intentions of neighboring robots, consequently mitigating potential non-stationarity issues in MARL. The proposed framework enables learning an efficient distributed control policy for multi-robot systems in challenging environments with a success rate of around $99\%$, as demonstrated through thorough comparisons with state-of-the-art solutions in simulations and experiments. Ablation studies are also performed to validate the efficiency of different framework modules.
Authors:Kevin Fu, Shalin Anand Jain, Pierce Howell, Harish Ravichandar
Abstract:
Recent advances have enabled heterogeneous multi-robot teams to learn complex and effective coordination skills. However, existing neural architectures that support heterogeneous teaming tend to force a trade-off between expressivity and efficiency. Shared-parameter designs prioritize sample efficiency by enabling a single network to be shared across all or a pre-specified subset of robots (via input augmentations), but tend to limit behavioral diversity. In contrast, recent designs employ a separate policy for each robot, enabling greater diversity and expressivity at the cost of efficiency and generalization. Our key insight is that such tradeoffs can be avoided by viewing these design choices as ends of a broad spectrum. Inspired by recent work in transfer and meta learning, and building on prior work in multi-robot task allocation, we propose Capability-Aware Shared Hypernetworks (CASH), a soft weight sharing architecture that uses hypernetworks to efficiently learn a flexible shared policy that dynamically adapts to each robot post-training. By explicitly encoding the impact of robot capabilities (e.g., speed and payload) on collective behavior, CASH enables zero-shot generalization to unseen robots or team compositions. Our experiments involve multiple heterogeneous tasks, three learning paradigms (imitation learning, value-based, and policy-gradient RL), and SOTA multi-robot simulation (JaxMARL) and hardware (Robotarium) platforms. Across all conditions, we find that CASH generates appropriately-diverse behaviors and consistently outperforms baseline architectures in terms of performance and sample efficiency during both training and zero-shot generalization, all with 60%-80% fewer learnable parameters.
Authors:Dolev Mutzari, Yonatan Aumann, Sarit Kraus
Abstract:
Multi-Robot Coverage problems have been extensively studied in robotics, planning and multi-agent systems. In this work, we consider the coverage problem when there are constraints on the proximity (e.g., maximum distance between the agents, or a blue agent must be adjacent to a red agent) and the movement (e.g., terrain traversability and material load capacity) of the robots. Such constraints naturally arise in many real-world applications, e.g. in search-and-rescue and maintenance operations. Given such a setting, the goal is to compute a covering tour of the graph with a minimum number of steps that adheres to the proximity and movement constraints. For this problem, our contributions are fourfold: (i) a formal formulation of the problem, (ii) an exact algorithm that is FPT in F, d, and tw, which denote the set of robot formations encoding the proximity constraints, the maximum node degree, and the tree-width of the graph, respectively, (iii) for the case that the graph is a tree: a PTAS that, given an approximation parameter epsilon, produces a tour that is within an epsilon times error(||F||, d) of the optimal one, with the computation running in time poly(n) times h(1/epsilon, ||F||), and (iv) for the case that the graph is a tree, with $k=3$ robots, and the constraint is that all agents are connected: a PTAS scheme with multiplicative approximation error of 1+O(epsilon), independent of the maximal degree d.
Authors:Toby Godfrey, William Hunt, Mohammad D. Soorati
Abstract:
Multi-agent reinforcement learning is a key method for training multi-robot systems over a series of episodes in which robots are rewarded or punished according to their performance; only once the system is trained to a suitable standard is it deployed in the real world. If the system is not trained enough, the task will likely not be completed and could pose a risk to the surrounding environment. We introduce Multi-Agent Reinforcement Learning guided by Language-based Inter-Robot Negotiation (MARLIN), in which the training process requires fewer training episodes to reach peak performance. Robots are equipped with large language models that negotiate and debate a task, producing plans used to guide the policy during training. The approach dynamically switches between using reinforcement learning and large language model-based action negotiation throughout training. This reduces the number of training episodes required, compared to standard multi-agent reinforcement learning, and hence allows the system to be deployed to physical hardware earlier. The performance of this approach is evaluated against multi-agent reinforcement learning, showing that our hybrid method achieves comparable results with significantly reduced training time.
Authors:Yupeng Yang, Yiwei Lyu, Yanze Zhang, Ian Gao, Wenhao Luo
Abstract:
This paper proposes a novel data-driven control strategy for maintaining connectivity in networked multi-robot systems. Existing approaches often rely on a pre-determined communication model specifying whether pairwise robots can communicate given their relative distance to guide the connectivity-aware control design, which may not capture real-world communication conditions. To relax that assumption, we present the concept of Data-driven Connectivity Barrier Certificates, which utilize Control Barrier Functions (CBF) and Gaussian Processes (GP) to characterize the admissible control space for pairwise robots based on communication performance observed online. This allows robots to maintain a satisfying level of pairwise communication quality (measured by the received signal strength) while in motion. Then we propose a Data-driven Connectivity Maintenance (DCM) algorithm that combines (1) online learning of the communication signal strength and (2) a bi-level optimization-based control framework for the robot team to enforce global connectivity of the realistic multi-robot communication graph and minimally deviate from their task-related motions. We provide theoretical proofs to justify the properties of our algorithm and demonstrate its effectiveness through simulations with up to 20 robots.
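A hedged toy version of the online-learning half of this approach is sketched below: fit a Gaussian process to received-signal-strength samples versus inter-robot distance and read off a conservative admissible distance from the lower confidence bound. The path-loss simulator, kernel, and threshold are assumptions, and the CBF-based bi-level controller is not shown.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# simulated online RSSI measurements (dBm) versus inter-robot distance (m)
rng = np.random.default_rng(2)
d = rng.uniform(1, 30, size=80)[:, None]
rssi = -40 - 20 * np.log10(d[:, 0]) + rng.normal(0, 2, size=80)   # log-distance path loss + noise

gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(1.0),
                              normalize_y=True).fit(d, rssi)

# admissible distance: largest distance whose lower confidence bound stays above a threshold
d_grid = np.linspace(1, 30, 300)[:, None]
mu, std = gp.predict(d_grid, return_std=True)
threshold = -68.0                                   # required signal strength (assumed)
ok = (mu - 2 * std) >= threshold
d_max = d_grid[ok, 0].max() if ok.any() else None
print("conservative admissible inter-robot distance:", d_max)
```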
Authors:Kaige Qu, Zixiong Qin, Weihua Zhuang
Abstract:
To accommodate high network dynamics in real-time cooperative perception (CP), reinforcement learning (RL) based adaptive CP schemes have been proposed, allowing adaptive switching between CP and stand-alone perception modes among connected and autonomous vehicles. The traditional offline-training online-execution RL framework suffers from performance degradation under nonstationary network conditions. To achieve fast and efficient model adaptation, we formulate a set of Markov decision processes for adaptive CP decisions in each stationary local vehicular network (LVN). A meta RL solution is proposed, which trains a meta RL model that captures the general features among LVNs, thus facilitating fast model adaptation for each LVN with the meta RL model as an initial point. Simulation results show the superiority of meta RL in terms of convergence speed, without reward degradation. The impact of the customization level of meta models on the model adaptation performance is also evaluated.
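The "meta model as an initial point" idea can be illustrated with a toy Reptile-style meta-learning loop. The paper's actual meta RL algorithm, rewards, and MDP structure are not given in the abstract, so the quadratic toy tasks, learning rates, and function names below are assumptions made purely for illustration.

```python
# Illustrative Reptile-style meta-update, used as a stand-in for the meta RL idea above;
# the paper's actual algorithm and losses are not specified here.
import numpy as np

def adapt(theta, task_grad_fn, steps=5, lr=0.05):
    """Fine-tune parameters on one local vehicular network (LVN) task."""
    phi = theta.copy()
    for _ in range(steps):
        phi -= lr * task_grad_fn(phi)
    return phi

def meta_train(theta, tasks, meta_lr=0.1, iters=100):
    """Move the meta-parameters toward each task's adapted solution (Reptile)."""
    for _ in range(iters):
        task_grad_fn = tasks[np.random.randint(len(tasks))]
        phi = adapt(theta, task_grad_fn)
        theta += meta_lr * (phi - theta)
    return theta

# toy tasks: quadratic "losses" whose optima differ per LVN
rng = np.random.default_rng(0)
optima = [rng.normal(size=3) for _ in range(4)]
tasks = [lambda p, o=o: 2 * (p - o) for o in optima]          # gradient of ||p - o||^2
theta_meta = meta_train(np.zeros(3), tasks)
theta_new_lvn = adapt(theta_meta, tasks[0])                   # fast adaptation from the meta init
print(theta_meta, theta_new_lvn)
```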
Authors:Haoying Li, Qingcheng Zeng, Haoran Li, Yanglin Zhang, Junfeng Wu
Abstract:
Cooperative localization and target tracking are essential for multi-robot systems to perform high-level tasks. To this end, we propose a distributed invariant Kalman filter based on covariance intersection (CI) for effective multi-robot pose estimation. The approach utilizes object-level measurement models, which condense information and further reduce the communication burden. Moreover, by modeling states on special Lie groups, the better linearity and consistency of the invariant Kalman filter structure can be exploited. We use a combination of CI and the Kalman filter to avoid overly confident or conservative estimates in multi-robot systems with intricate and unknown correlations, and the multi-robot collaboration tolerates some degree of degradation of individual robots. Simulation and real-data experiments validate the practicability and superiority of the proposed algorithm.
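Covariance intersection is a standard tool for fusing estimates whose cross-correlation is unknown, which is why it appears here alongside the invariant filter. The sketch below shows only the generic CI fusion rule on Euclidean states; the paper's Lie-group machinery is omitted, and the simple grid search over the mixing weight is an assumption made for brevity.

```python
# Minimal covariance intersection (CI) sketch for fusing two estimates with unknown
# cross-correlation; Euclidean states and a grid search over omega are simplifications.
import numpy as np

def covariance_intersection(x1, P1, x2, P2):
    """Return the CI fusion of (x1, P1) and (x2, P2), choosing omega to minimize the trace."""
    best = None
    for w in np.linspace(0.01, 0.99, 99):
        P_inv = w * np.linalg.inv(P1) + (1 - w) * np.linalg.inv(P2)
        P = np.linalg.inv(P_inv)
        if best is None or np.trace(P) < best[0]:
            x = P @ (w * np.linalg.inv(P1) @ x1 + (1 - w) * np.linalg.inv(P2) @ x2)
            best = (np.trace(P), x, P)
    return best[1], best[2]

x_a, P_a = np.array([1.0, 2.0]), np.diag([0.5, 2.0])
x_b, P_b = np.array([1.2, 1.8]), np.diag([2.0, 0.4])
x_ci, P_ci = covariance_intersection(x_a, P_a, x_b, P_b)
print(x_ci, np.diag(P_ci))   # fused estimate remains consistent for any unknown correlation
```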
Authors:Shyalan Ramesh, Scott Mann, Alex Stumpf
Abstract:
The increasing complexity of marine operations has intensified the need for intelligent robotic systems to support ocean observation, exploration, and resource management. Underwater swarm robotics offers a promising framework that extends the capabilities of individual autonomous platforms through collective coordination. Inspired by natural systems, such as fish schools and insect colonies, bio-inspired swarm approaches enable distributed decision-making, adaptability, and resilience under challenging marine conditions. Yet research in this field remains fragmented, with limited integration across algorithmic, communication, and hardware design perspectives. This review synthesises bio-inspired coordination mechanisms, communication strategies, and system design considerations for underwater swarm robotics. It examines key marine-specific algorithms, including the Artificial Fish Swarm Algorithm, Whale Optimisation Algorithm, Coral Reef Optimisation, and Marine Predators Algorithm, highlighting their applications in formation control, task allocation, and environmental interaction. The review also analyses communication constraints unique to the underwater domain and emerging acoustic, optical, and hybrid solutions that support cooperative operation. Additionally, it examines hardware and system design advances that enhance system efficiency and scalability. A multi-dimensional classification framework evaluates existing approaches across communication dependency, environmental adaptability, energy efficiency, and swarm scalability. Through this integrated analysis, the review unifies bio-inspired coordination algorithms, communication modalities, and system design approaches. It also identifies converging trends, key challenges, and future research directions for real-world deployment of underwater swarm systems.
Authors:Ardalan Tajbakhsh, Augustinos Saravanos, James Zhu, Evangelos A. Theodorou, Lorenz T. Biegler, Aaron M. Johnson
Abstract:
This paper addresses the challenge of coordinating multi-robot systems under realistic communication delays using distributed optimization. We focus on consensus ADMM as a scalable framework for generating collision-free, dynamically feasible motion plans in both trajectory optimization and receding-horizon control settings. In practice, however, these algorithms are sensitive to penalty tuning or adaptation schemes (e.g. residual balancing and adaptive parameter heuristics) that do not explicitly consider delays. To address this, we introduce a Delay-Aware ADMM (DA-ADMM) variant that adapts penalty parameters based on real-time delay statistics, allowing agents to down-weight stale information and prioritize recent updates during consensus and dual updates. Through extensive simulations in 2D and 3D environments with double-integrator, Dubins-car, and drone dynamics, we show that DA-ADMM significantly improves robustness, success rate, and solution quality compared to fixed-parameter, residual-balancing, and fixed-constraint baselines. Our results highlight that performance degradation is not solely determined by delay length or frequency, but by the optimizer's ability to contextually reason over delayed information. The proposed DA-ADMM achieves consistently better coordination performance across a wide range of delay conditions, offering a principled and efficient mechanism for resilient multi-robot motion planning under imperfect communication.
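The abstract states that DA-ADMM down-weights stale information via delay-dependent penalty parameters, but it does not give the adaptation law. The following toy sketch shows one plausible shape of that idea on a simplified consensus-averaging step; the functional form of the penalty rule and the averaging step are assumptions, not the paper's DA-ADMM updates.

```python
# Hypothetical sketch of the delay-aware penalty idea: each neighbor's consensus penalty is
# scaled down with the measured staleness of its last message.
import numpy as np

def delay_aware_penalty(rho_nominal, delay_steps, delay_scale=2.0, rho_min=1e-2):
    """Down-weight the ADMM penalty for neighbors whose latest update is stale."""
    return max(rho_min, rho_nominal / (1.0 + delay_steps / delay_scale))

def local_consensus_step(x, neighbors, rho_nominal=5.0):
    """One simplified consensus step: weighted average of the robot's own iterate and the
    (possibly delayed) neighbor copies, with weights given by the delay-aware penalties."""
    weights = [1.0]          # weight on the robot's own iterate
    values = [x]
    for z_delayed, delay in neighbors:
        weights.append(delay_aware_penalty(rho_nominal, delay))
        values.append(z_delayed)
    w = np.array(weights) / np.sum(weights)
    return float(np.dot(w, values))

# toy usage: a fresh neighbor pulls harder than a stale one
print(local_consensus_step(0.0, neighbors=[(1.0, 0), (1.0, 8)]))
```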
Authors:Jin Huang, Yingqiang Wang, Ying Chen
Abstract:
Precise underwater positioning remains a fundamental challenge for underwater robotics since global navigation satellite system (GNSS) signals cannot penetrate the sea surface. This paper presents Raspi$^2$USBL, an open-source, Raspberry Pi-based passive inverted ultra-short baseline (piUSBL) positioning system designed to provide a low-cost and accessible solution for underwater robotic research. The system comprises a passive acoustic receiver and an active beacon. The receiver adopts a modular hardware architecture that integrates a hydrophone array, a multichannel preamplifier, an oven-controlled crystal oscillator (OCXO), a Raspberry Pi 5, and an MCC-series data acquisition (DAQ) board. Apart from the Pi 5, OCXO, and MCC board, the beacon comprises an impedance-matching network, a power amplifier, and a transmitting transducer. An open-source C++ software framework provides high-precision clock synchronization and triggering for one-way travel-time (OWTT) messaging, while performing real-time signal processing, including matched filtering, array beamforming, and adaptive gain control, to estimate the time of flight (TOF) and direction of arrival (DOA) of received signals. The Raspi$^2$USBL system was experimentally validated in an anechoic tank, freshwater lake, and open-sea trials. Results demonstrate a slant-range accuracy better than 0.1%, a bearing accuracy within 0.1$^\circ$, and stable performance over operational distances up to 1.3 km. These findings confirm that low-cost, reproducible hardware can deliver research-grade underwater positioning accuracy. By releasing both the hardware and software as open-source, Raspi$^2$USBL provides a unified reference platform that lowers the entry barrier for underwater robotics laboratories, fosters reproducibility, and promotes collaborative innovation in underwater acoustic navigation and swarm robotics.
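The matched-filtering stage of the signal chain described above can be illustrated in a few lines of NumPy: cross-correlate the received signal with the known transmitted waveform and read the time of flight off the correlation peak. The waveform, sample rate, noise level, and sound speed below are made-up values, and the system's beamforming and DOA estimation are omitted.

```python
# Simplified matched-filter time-of-flight (TOF) estimate; parameters are illustrative only.
import numpy as np

fs = 100_000                 # sample rate [Hz]
c = 1500.0                   # nominal sound speed in water [m/s]
t = np.arange(0, 0.01, 1 / fs)
chirp = np.sin(2 * np.pi * (20_000 + 1e6 * t) * t)        # transmitted reference waveform

true_tof = 0.8               # seconds (about 1.2 km slant range)
rx = np.zeros(int(1.0 * fs))
start = int(true_tof * fs)
rx[start:start + len(chirp)] += chirp
rx += 0.2 * np.random.default_rng(0).standard_normal(len(rx))   # additive noise

# matched filter = cross-correlation with the known waveform; the peak location gives the TOF
corr = np.correlate(rx, chirp, mode="valid")
tof_est = np.argmax(np.abs(corr)) / fs
print("TOF estimate:", tof_est, "s  ->  slant range:", tof_est * c, "m")
```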
Authors:Tyler M. Paine, Anastasia Bizyaeva, Michael R. Benjamin
Abstract:
Collaborating teams of robots show promise due to their ability to complete missions more efficiently and with improved robustness, attributes that are particularly useful for systems operating in marine environments. A key issue is how to model, analyze, and design these multi-robot systems to realize the full benefits of collaboration, a challenging task since the domain of multi-robot autonomy encompasses both collective and individual behaviors. This paper introduces a layered model of multi-robot autonomy that uses the principle of census, or a weighted count of the inputs from neighbors, for collective decision-making about teaming, coupled with multi-objective behavior optimization for individual decision-making about actions. The census component is expressed as a nonlinear opinion dynamics model, and the multi-objective behavior optimization is accomplished using interval programming. This model can be reduced to recover foundational algorithms in distributed optimization and control, while the full model enables new types of collective behaviors that are useful in real-world scenarios. To illustrate these points, a new method for distributed optimization of subgroup allocation is introduced in which robots use a gradient descent algorithm to minimize the portions of the cost functions that are locally known, while being influenced by the opinion states of neighbors to account for the unobserved costs. With this method the group can collectively use the information contained in the Hessian matrix of the total global cost. The utility of this model is experimentally validated in three categorically different experiments with fleets of autonomous surface vehicles: an adaptive sampling scenario, a high value unit protection scenario, and a competitive game of capture the flag.
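A generic single-option nonlinear opinion dynamics update illustrates how the "census" layer can reach a collective teaming decision from purely local neighbor inputs. The saturation function, gains, network, and bias used below are assumptions; the paper's specific model and its coupling to interval programming are not reproduced here.

```python
# A generic nonlinear opinion dynamics (NOD) update, used only to illustrate the census idea.
import numpy as np

def opinion_step(z, A, d=1.0, u=2.0, alpha=0.5, gamma=1.0, b=None, dt=0.05):
    """Euler step of z_i' = -d*z_i + u*tanh(alpha*z_i + gamma*sum_j A_ij z_j) + b_i."""
    if b is None:
        b = np.zeros_like(z)
    return z + dt * (-d * z + u * np.tanh(alpha * z + gamma * (A @ z)) + b)

# 4 robots on a ring; a small input bias on robot 0 nudges the team toward one teaming option
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
z = np.zeros(4)
bias = np.array([0.3, 0.0, 0.0, 0.0])
for _ in range(400):
    z = opinion_step(z, A, b=bias)
print(z)   # opinions settle on a common sign: a collective "census" decision
```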
Authors:Shyalan Ramesh, Scott Mann, Alex Stumpf
Abstract:
Autonomous underwater vehicles must operate under strong currents, limited acoustic bandwidth, and persistent sensing requirements, conditions under which conventional swarm optimisation methods are unreliable. This paper presents NOAH, a novel nature-inspired swarm optimisation algorithm that combines current-aware drift, irreversible settlement in persistent sensing nodes, and colony-based communication. Drawing inspiration from the behaviour of barnacle nauplii, NOAH addresses the critical limitations of existing swarm algorithms by providing hydrodynamic awareness, irreversible anchoring mechanisms, and colony-based communication capabilities essential for underwater exploration missions. The algorithm establishes a comprehensive foundation for scalable and energy-efficient underwater swarm robotics with validated performance analysis. Validation studies demonstrate an 86% success rate for permanent anchoring scenarios, providing a unified formulation for hydrodynamic constraints and irreversible settlement behaviours with an empirical study under flow.
Authors:Jun Chen, Mingjia Chen, Shinkyu Park
Abstract:
Multi-robot Coverage Path Planning (MCPP) addresses the problem of computing paths for multiple robots to effectively cover a large area of interest. Conventional approaches to MCPP typically assume that robots move at fixed velocities, which is often unrealistic in real-world applications where robots must adapt their speeds based on the specific coverage tasks assigned to them. Consequently, conventional approaches often lead to imbalanced workload distribution among robots and increased completion time for coverage tasks. To address this, we introduce a novel Multi-robot Dynamic Coverage Path Planning (MDCPP) algorithm for complete coverage in two-dimensional environments. MDCPP dynamically estimates each robot's remaining workload by approximating the target distribution with Gaussian mixture models, and assigns coverage regions using a capacity-constrained Voronoi diagram. We further develop a distributed implementation of MDCPP for range-constrained robotic networks. Simulation results validate the efficacy of MDCPP, showing qualitative improvements and superior performance compared to an existing sweeping algorithm, and a quantifiable impact of communication range on coverage efficiency.
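The two ingredients named in the abstract, a GMM workload model and a capacity-constrained (weighted) Voronoi assignment, can be sketched with scikit-learn and a simple dual-ascent-style weight update. The heuristic iteration, the capacity model based on robot speeds, and all numeric values are assumptions for illustration and not the MDCPP algorithm itself.

```python
# Rough sketch: GMM-based workload density + a capacity-adjusted weighted Voronoi assignment.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
targets = rng.uniform(0, 10, size=(400, 2))            # uncovered points of the area of interest
robots = np.array([[1.0, 1.0], [9.0, 1.0], [5.0, 9.0]])
speeds = np.array([1.0, 0.5, 1.5])                      # heterogeneous coverage speeds

gmm = GaussianMixture(n_components=3, random_state=0).fit(targets)
density = np.exp(gmm.score_samples(targets))            # per-point workload weight from the GMM
cap_frac = speeds / speeds.sum()                        # desired workload fraction per robot

weights = np.zeros(len(robots))                         # additive site weights (power-diagram style)
for _ in range(200):
    d2 = ((targets[:, None, :] - robots[None, :, :]) ** 2).sum(-1) + weights[None, :]
    assign = d2.argmin(axis=1)
    load = np.bincount(assign, weights=density, minlength=len(robots))
    # dual-ascent style update: overloaded robots get a larger weight and shed points
    weights += 10.0 * (load / density.sum() - cap_frac)
print("workload fractions:", np.round(load / density.sum(), 2), "targets:", cap_frac)
```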
Authors:Sijiang Li, Rongqing Zhang, Xiang Cheng, Jian Tang
Abstract:
To support cooperative perception (CP) of networked mobile agents in dynamic scenarios, the efficient and robust transmission of sensory data is a critical challenge. Deep learning-based joint source-channel coding (JSCC) has demonstrated promising results for image transmission under adverse channel conditions, outperforming traditional rule-based codecs. While recent works have explored combining JSCC with the widely adopted multiple-input multiple-output (MIMO) technology, these approaches are still limited to the discrete-time analog transmission (DTAT) model and simple tasks. Given the limited performance of existing MIMO JSCC schemes in supporting complex CP tasks for networked mobile agents with digital MIMO communication systems, this paper presents a Synesthesia of Machines (SoM)-based task-driven MIMO system for image transmission, referred to as SoM-MIMO. By leveraging the structural properties of the feature pyramid for perceptual tasks and the channel properties of the closed-loop MIMO communication system, SoM-MIMO enables efficient and robust digital MIMO transmission of images. Experimental results show that, compared with two JSCC baseline schemes, our approach achieves average mAP improvements of 6.30 and 10.48 across all SNR levels, while maintaining identical communication overhead.
Authors:Chao Ning, Han Wang, Longyan Li, Yang Shi
Abstract:
This paper develops a novel COllaborative-Online-Learning (COOL)-enabled motion control framework for multi-robot systems to avoid collision amid randomly moving obstacles whose motion distributions are partially observable through decentralized data streams. To address the notable challenge of data acquisition due to occlusion, a COOL approach based on the Dirichlet process mixture model is proposed to efficiently extract motion distribution information by exchanging among robots selected learning structures. By leveraging the fine-grained local-moment information learned through COOL, a data-stream-driven ambiguity set for obstacle motion is constructed. We then introduce a novel ambiguity set propagation method, which theoretically admits the derivation of the ambiguity sets for obstacle positions over the entire prediction horizon by utilizing obstacle current positions and the ambiguity set for obstacle motion. Additionally, we develop a compression scheme with its safety guarantee to automatically adjust the complexity and granularity of the ambiguity set by aggregating basic ambiguity sets that are close in a measure space, thereby striking an attractive trade-off between control performance and computation time. Then the probabilistic collision-free trajectories are generated through distributionally robust optimization problems. The distributionally robust obstacle avoidance constraints based on the compressed ambiguity set are equivalently reformulated by deriving separating hyperplanes through tractable semi-definite programming. Finally, we establish the probabilistic collision avoidance guarantee and the long-term tracking performance guarantee for the proposed framework. The numerical simulations are used to demonstrate the efficacy and superiority of the proposed approach compared with state-of-the-art methods.
Authors:Chang Liu, Yang Xu, Tamas Sziranyi
Abstract:
Extracting narrow roads from high-resolution remote sensing imagery remains a significant challenge due to their limited width, fragmented topology, and frequent occlusions. To address these issues, we propose D3FNet, a Dilated Dual-Stream Differential Attention Fusion Network designed for fine-grained road structure segmentation in remote perception systems. Built upon the encoder-decoder backbone of D-LinkNet, D3FNet introduces three key innovations: (1) a Differential Attention Dilation Extraction (DADE) module that enhances subtle road features while suppressing background noise at the bottleneck; (2) a Dual-stream Decoding Fusion Mechanism (DDFM) that integrates original and attention-modulated features to balance spatial precision with semantic context; and (3) a multi-scale dilation strategy (rates 1, 3, 5, 9) that mitigates gridding artifacts and improves continuity in narrow road prediction. Unlike conventional models that overfit to generic road widths, D3FNet specifically targets fine-grained, occluded, and low-contrast road segments. Extensive experiments on the DeepGlobe and CHN6-CUG benchmarks show that D3FNet achieves superior IoU and recall on challenging road regions, outperforming state-of-the-art baselines. Ablation studies further verify the complementary synergy of attention-guided encoding and dual-path decoding. These results confirm D3FNet as a robust solution for fine-grained narrow road extraction in complex remote and cooperative perception scenarios.
Authors:Seabin Lee, Joonyeol Sim, Changjoo Nam
Abstract:
We consider the Multi-Robot Task Allocation (MRTA) problem, which aims to optimize the assignment of multiple robots to multiple tasks in challenging environments with densely populated obstacles and narrow passages. In such environments, conventional methods that optimize the sum-of-cost are often ineffective because conflicts between robots incur additional costs (e.g., collision avoidance, waiting). Moreover, an allocation that does not incorporate the actual robot paths can cause deadlocks, which significantly degrade the collective performance of the robots.
We propose a scalable MRTA method that considers the paths of the robots to avoid collisions and deadlocks, resulting in fast completion of all tasks (i.e., minimizing the \textit{makespan}). To incorporate robot paths into task allocation, the proposed method constructs a roadmap using a Generalized Voronoi Diagram. The method partitions the roadmap into several components to determine how to redistribute robots so that all tasks are achieved with fewer conflicts between the robots. In the redistribution process, robots are transferred to their final destinations according to a push-pop mechanism following the first-in first-out principle. Through extensive experiments, we show that our method can handle instances with hundreds of robots in dense clutter, while competitors are unable to compute a solution within the time limit.
Authors:Akshaya Agrawal, Evan Palmer, Zachary Kingston, Geoffrey A. Hollinger
Abstract:
Deploying multi-robot systems in underwater environments is expensive and lengthy; testing algorithms and software in simulation improves development by decoupling software and hardware. However, this requires a simulation framework that closely resembles the real world. Angler is an open-source framework that simulates low-level communication protocols for an onboard autopilot, such as ArduSub, providing a framework that is close to reality but, unfortunately, lacks support for simulating multiple robots. We present an extension to Angler that supports multi-robot simulation and motion planning. Our extension has a modular architecture that creates non-conflicting communication channels between Gazebo, ArduSub Software-in-the-Loop (SITL), and MAVROS to operate multiple robots simultaneously in the same environment. Our multi-robot motion planning module interfaces with cascaded controllers via a JointTrajectory controller in ROS 2. We also provide an integration with the Open Motion Planning Library (OMPL), a collision avoidance module, and tools for procedural environment generation. Our work enables the development and benchmarking of underwater multi-robot motion planning in dynamic environments.
Authors:Basit Muhammad Imran, Jeeseop Kim, Taizoon Chunawala, Alexander Leonessa, Kaveh Akbari Hamed
Abstract:
This paper presents a novel hierarchical, safety-critical control framework that integrates distributed nonlinear model predictive controllers (DNMPCs) with control barrier functions (CBFs) to enable cooperative locomotion of multi-agent quadrupedal robots in complex environments. While NMPC-based methods are widely adopted for enforcing safety constraints and navigating multi-robot systems (MRSs) through intricate environments, ensuring the safety of MRSs requires a formal definition grounded in the concept of invariant sets. CBFs, typically implemented via quadratic programs (QPs) at the planning layer, provide formal safety guarantees. However, their zero-control horizon limits their effectiveness for extended trajectory planning in inherently unstable, underactuated, and nonlinear legged robot models. Furthermore, the integration of CBFs into real-time NMPC for sophisticated MRSs, such as quadrupedal robot teams, remains underexplored. This paper develops computationally efficient, distributed NMPC algorithms that incorporate CBF-based collision safety guarantees within a consensus protocol, enabling longer planning horizons for safe cooperative locomotion under disturbances and rough terrain conditions. The optimal trajectories generated by the DNMPCs are tracked using full-order, nonlinear whole-body controllers at the low level. The proposed approach is validated through extensive numerical simulations with up to four Unitree A1 robots and hardware experiments involving two A1 robots subjected to external pushes, rough terrain, and uncertain obstacle information. Comparative analysis demonstrates that the proposed CBF-based DNMPCs achieve a 27.89% higher success rate than conventional NMPCs without CBF constraints.
Authors:Lauren Bramblett, Jonathan Reasoner, Nicola Bezzo
Abstract:
A multi-robot system (MRS) provides significant advantages for intricate tasks such as environmental monitoring, underwater inspections, and space missions. However, addressing potential communication failures or the lack of communication infrastructure in these fields remains a challenge. A significant portion of MRS research presumes that the system can maintain communication under proximity constraints, but this approach does not address situations where communication is non-existent, unreliable, or poses a security risk. Some approaches tackle this issue using predictions about other robots while not communicating, but these methods generally only permit agents to use first-order reasoning, i.e., reasoning based purely on their own observations. In contrast, to deal with this problem, our proposed framework utilizes Theory of Mind (ToM), employing higher-order reasoning by shifting a robot's perspective to reason about its beliefs of others' observations. Our approach has two main phases: i) an efficient runtime plan adaptation using active inference to signal intentions and reason about a robot's own belief and the beliefs of others in the system, and ii) a hierarchical epistemic planning framework to iteratively reason about the current MRS mission state. The proposed framework outperforms greedy and first-order reasoning approaches and is validated using simulations and experiments with heterogeneous robotic systems.
Authors:Jingtao Tang, Zining Mao, Hang Ma
Abstract:
We study Multi-Robot Coverage Path Planning (MCPP) on a 4-neighbor 2D grid G, which aims to compute paths for multiple robots to cover all cells of G. Traditional approaches are limited as they first compute coverage trees on a quadrant coarsened grid H and then employ the Spanning Tree Coverage (STC) paradigm to generate paths on G, making them inapplicable to grids with partially obstructed 2x2 blocks. To address this limitation, we reformulate the problem directly on G, revolutionizing grid-based MCPP solving and establishing new NP-hardness results. We introduce Extended-STC (ESTC), a novel paradigm that extends STC to ensure complete coverage with bounded suboptimality, even when H includes partially obstructed blocks. Furthermore, we present LS-MCPP, a new algorithmic framework that integrates ESTC with three novel types of neighborhood operators within a local search strategy to optimize coverage paths directly on G. Unlike prior grid-based MCPP work, our approach also incorporates a versatile post-processing procedure that applies Multi-Agent Path Finding (MAPF) techniques to MCPP for the first time, enabling a fusion of these two important fields in multi-robot coordination. This procedure effectively resolves inter-robot conflicts and accommodates turning costs by solving a MAPF variant, making our MCPP solutions more practical for real-world applications. Extensive experiments demonstrate that our approach significantly improves solution quality and efficiency, managing up to 100 robots on grids as large as 256x256 within minutes of runtime. Validation with physical robots confirms the feasibility of our solutions under real-world conditions.
Authors:Jumman Hossain, Emon Dey, Snehalraj Chugh, Masud Ahmed, MS Anwar, Abu-Zaher Faridee, Jason Hoppes, Theron Trout, Anjon Basak, Rafidh Chowdhury, Rishabh Mistry, Hyun Kim, Jade Freeman, Niranjan Suri, Adrienne Raglin, Carl Busart, Timothy Gregory, Anuradha Ravi, Nirmalya Roy
Abstract:
The increasing deployment of autonomous systems in complex environments necessitates efficient communication and task completion among multiple agents. This paper presents SERN (Simulation-Enhanced Realistic Navigation), a novel framework integrating virtual and physical environments for real-time collaborative decision-making in multi-robot systems. SERN addresses key challenges in asset deployment and coordination through our bi-directional SERN ROS Bridge communication framework. Our approach advances the SOTA through: accurate real-world representation in virtual environments using the Unity high-fidelity simulator; synchronization of physical and virtual robot movements; efficient ROS data distribution between remote locations; and integration of SOTA semantic segmentation for enhanced environmental perception. Additionally, we introduce a Multi-Metric Cost Function (MMCF) that dynamically balances latency, reliability, computational overhead, and bandwidth consumption to optimize system performance in contested environments. We further provide theoretical justification for synchronization accuracy by proving that the positional error between physical and virtual robots remains bounded under varying network conditions. Our evaluations show a 15% to 24% improvement in latency and up to a 15% increase in processing efficiency compared to traditional ROS setups. Real-world and virtual simulation experiments with multiple robots (Clearpath Jackal and Husky) demonstrate synchronization accuracy, achieving less than $5\text{ cm}$ positional error and under $2^\circ$ rotational error. These results highlight SERN's potential to enhance situational awareness and multi-agent coordination in diverse, contested environments.
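The abstract names the four terms that the Multi-Metric Cost Function balances but not its exact form, so the snippet below shows one plausible shape: a weighted sum of normalized latency, reliability, compute, and bandwidth terms. The weights, normalization constants, and route-selection usage are illustrative assumptions only.

```python
# A plausible form of the MMCF idea: a dynamically weighted sum of four normalized metrics.
def mmcf(latency_ms, reliability, cpu_load, bandwidth_mbps,
         w_lat=0.4, w_rel=0.3, w_cpu=0.2, w_bw=0.1,
         lat_max=500.0, cpu_max=1.0, bw_max=100.0):
    """Lower is better; each term is normalized to [0, 1]."""
    return (w_lat * min(latency_ms / lat_max, 1.0)
            + w_rel * (1.0 - reliability)              # reliability in [0, 1], higher is better
            + w_cpu * min(cpu_load / cpu_max, 1.0)
            + w_bw * min(bandwidth_mbps / bw_max, 1.0))

# pick the cheaper of two candidate communication routes
routes = {"direct": mmcf(120, 0.95, 0.3, 40), "relayed": mmcf(260, 0.99, 0.5, 15)}
print(min(routes, key=routes.get), routes)
```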
Authors:Xintong Zhang, Junfeng Chen, Yuxiao Zhu, Bing Luo, Meng Guo
Abstract:
Multi-robot systems can greatly enhance efficiency through coordination and collaboration, yet in practice, full-time communication is rarely available and interactions are constrained to close-range exchanges. Existing methods either maintain all-time connectivity, rely on fixed schedules, or adopt pairwise protocols, but none adapt effectively to dynamic spatio-temporal task distributions under limited communication, resulting in suboptimal coordination. To address this gap, we propose CoCoPlan, a unified framework that co-optimizes collaborative task planning and team-wise intermittent communication. Our approach integrates a branch-and-bound architecture that jointly encodes task assignments and communication events, an adaptive objective function that balances task efficiency against communication latency, and a communication event optimization module that strategically determines when, where, and how global connectivity should be re-established. Extensive experiments demonstrate that CoCoPlan outperforms state-of-the-art methods by achieving a 22.4% higher task completion rate, reducing communication overhead by 58.6%, and improving scalability by supporting up to 100 robots in dynamic environments. Hardware experiments include a complex 2D office environment and a large-scale 3D disaster-response scenario.
Authors:Si Wu, Zhengyan Qin, Tengfei Liu, Zhong-Ping Jiang
Abstract:
This paper investigates the control problem of steering a group of spherical mobile robots to cooperatively transport a spherical object. By controlling the movements of the robots to exert appropriate contact (pushing) forces, it is desired that the object follows a velocity command. To solve the problem, we first treat the robots' positions as virtual control inputs of the object, and propose a velocity-tracking controller based on quadratic programming (QP), enabling the robots to cooperatively generate desired contact forces while minimizing the sum of the contact-force magnitudes. Then, we design position-tracking controllers for the robots. By appropriately designing the objective function and the constraints for the QP, it is guaranteed that the QP admits a unique solution and the QP-based velocity-tracking controller is Lipschitz continuous. Finally, we consider the closed-loop system as an interconnection of two subsystems, corresponding to the velocity-tracking error of the object and the position-tracking error of the robots, and employ nonlinear small-gain techniques for stability analysis. The effectiveness of the proposed design is demonstrated through numerical simulations.
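The allocation step described above, distributing a commanded net force over nonnegative pushing forces while minimizing the sum of the force magnitudes, can be written as a small convex program. The toy below uses cvxpy with fixed inward contact normals in 2D; the contact geometry, the force command, and the fixed-normal simplification are assumptions, and the paper's treatment of robot positions as virtual inputs is not reproduced.

```python
# Toy version of the force-allocation idea: nonnegative pushing forces along fixed inward
# contact normals whose sum of magnitudes is minimized while matching a net force command.
import numpy as np
import cvxpy as cp

angles = np.deg2rad([0, 90, 180, 270])                  # contact points around the object
normals = -np.stack([np.cos(angles), np.sin(angles)])   # 2 x 4, inward pushing directions
f_des = np.array([1.0, 0.5])                            # net force requested by the object controller

f = cp.Variable(4, nonneg=True)                         # pushing only: f_i >= 0
problem = cp.Problem(cp.Minimize(cp.sum(f)), [normals @ f == f_des])
problem.solve()
print("contact forces:", np.round(f.value, 3))          # only the two useful contacts push
```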
Authors:Lianhao Yin, Haiping Yu, Pascal Spino, Daniela Rus
Abstract:
Biological swarms, such as ant colonies, achieve collective goals through decentralized and stochastic individual behaviors. Similarly, physical systems composed of gases, liquids, and solids exhibit random particle motion governed by entropy maximization, yet do not achieve collective objectives. Despite this analogy, no unified framework exists to explain the stochastic behavior in both biological and physical systems. Here, we present empirical evidence from \textit{Formica polyctena} ants that reveals a shared statistical mechanism underlying both systems: entropy maximization under different energy-function constraints. We further demonstrate that robotic swarms governed by this principle can exhibit scalable, decentralized cooperation, mimicking physical phase-like behaviors with minimal individual computation. These findings establish a unified stochastic model linking biological, physical, and robotic swarms, offering a scalable principle for designing robust and intelligent swarm robotics.
Authors:Jinyue Song, Hansol Ku, Jayneel Vora, Nelson Lee, Ahmad Kamari, Prasant Mohapatra, Parth Pathak
Abstract:
Automotive FMCW radars remain reliable in rain and glare, yet their sparse, noisy point clouds constrain 3-D object detection. We therefore release CoVeRaP, a 21 k-frame cooperative dataset that time-aligns radar, camera, and GPS streams from multiple vehicles across diverse manoeuvres. Built on this data, we propose a unified cooperative-perception framework with middle- and late-fusion options. Its baseline network employs a multi-branch PointNet-style encoder enhanced with self-attention to fuse spatial, Doppler, and intensity cues into a common latent space, which a decoder converts into 3-D bounding boxes and per-point depth confidence. Experiments show that middle fusion with intensity encoding boosts mean Average Precision by up to 9x at IoU 0.9 and consistently outperforms single-vehicle baselines. CoVeRaP thus establishes the first reproducible benchmark for multi-vehicle FMCW-radar perception and demonstrates that affordable radar sharing markedly improves detection robustness. Dataset and code are publicly available to encourage further research.
Authors:Hadush Hailu, Bruk Gebregziabher, Prudhvi Raj
Abstract:
The Iterative Forecast Planner (IFP) is a geometric planning approach that offers lightweight computation and scalable, reactive solutions for multi-robot path planning in decentralized, communication-free settings. However, it struggles in symmetric configurations, where mirrored interactions often lead to collisions and deadlocks. We introduce eIFP-MPC, an optimized and extended version of IFP that improves robustness and path consistency in dense, dynamic environments. The method refines threat prioritization using a time-to-collision heuristic, stabilizes path generation through cost-based via-point selection, and ensures dynamic feasibility by incorporating model predictive control (MPC) into the planning process. These enhancements are tightly integrated into the IFP to preserve its efficiency while improving its adaptability and stability. Extensive simulations across symmetric and high-density scenarios show that eIFP-MPC significantly reduces oscillations, ensures collision-free motion, and improves trajectory efficiency. The results demonstrate that geometric planners can be strengthened through optimization, enabling robust performance at scale in complex multi-agent environments.
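A common closed-form time-to-collision (TTC) heuristic of the kind used for threat prioritization is sketched below. It assumes constant relative velocity and circular footprints, which is a simplification; the eIFP-MPC prioritization details are not given in the abstract.

```python
# Closed-form TTC under constant relative velocity and circular footprints (illustrative).
import numpy as np

def time_to_collision(p_rel, v_rel, radius_sum):
    """Smallest t >= 0 with ||p_rel + t*v_rel|| = radius_sum, or inf if no collision."""
    a = float(v_rel @ v_rel)
    b = 2.0 * float(p_rel @ v_rel)
    c = float(p_rel @ p_rel) - radius_sum**2
    if a < 1e-9:
        return 0.0 if c <= 0 else np.inf
    disc = b * b - 4 * a * c
    if disc < 0:
        return np.inf
    t = (-b - np.sqrt(disc)) / (2 * a)
    return t if t >= 0 else (0.0 if c <= 0 else np.inf)

# prioritize neighbors by TTC: the most imminent threat is handled first
neighbors = {"A": (np.array([4.0, 0.0]), np.array([-1.0, 0.0])),
             "B": (np.array([3.0, 3.0]), np.array([0.0, -0.2]))}
ttc = {k: time_to_collision(p, v, radius_sum=0.6) for k, (p, v) in neighbors.items()}
print(sorted(ttc, key=ttc.get), ttc)
```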
Authors:Prithvi Poddar, Ehsan Tarkesh Esfahani, Karthik Dantu, Souma Chowdhury
Abstract:
Operations in disaster response, search \& rescue, and military missions that involve multiple agents demand automated processes to support the planning of the courses of action (COA). Moreover, traverse-affecting changes in the environment (rain, snow, blockades, etc.) may impact the expected performance of a COA, making it desirable to have a pool of COAs that are diverse in task distributions across agents. Further, variations in agent capabilities, which could be human crews and/or autonomous systems, present practical opportunities and computational challenges to the planning process. This paper presents a new theoretical formulation and computational framework to generate such diverse pools of COAs for operations with soft variations in agent-task compatibility. Key to the problem formulation is a graph abstraction of the task space and the pool of COAs itself to quantify its diversity. Formulating the COAs as a centralized multi-robot task allocation problem, a genetic algorithm is used for (order-ignoring) allocations of tasks to each agent that jointly maximize diversity within the COA pool and overall compatibility of the agent-task mappings. A graph neural network is trained using a policy gradient approach to then perform single agent task sequencing in each COA, which maximizes completion rates adaptive to task features. Our tests of the COA generation process in a simulated environment demonstrate significant performance gain over a random walk baseline, small optimality gap in task sequencing, and execution time of about 50 minutes to plan up to 20 COAs for 5 agent/100 task operations.
Authors:Chenguang Liu, Jianjun Chen, Yunfei Chen, Yubei He, Zhuangkun Wei, Hongjian Sun, Haiyan Lu, Qi Hao
Abstract:
Cooperative perception, leveraging shared information from multiple vehicles via vehicle-to-vehicle (V2V) communication, plays a vital role in autonomous driving to alleviate the limitation of single-vehicle perception. Existing works have explored the effects of V2V communication impairments on perception precision, but they lack generalization to different levels of impairments. In this work, we propose a joint weighting and denoising framework, Coop-WD, to enhance cooperative perception subject to V2V channel impairments. In this framework, the self-supervised contrastive model and the conditional diffusion probabilistic model are adopted hierarchically for vehicle-level and pixel-level feature enhancement. An efficient variant model, Coop-WD-eco, is proposed to selectively deactivate denoising to reduce processing overhead. Rician fading, non-stationarity, and time-varying distortion are considered. Simulation results demonstrate that the proposed Coop-WD outperforms conventional benchmarks in all types of channels. Qualitative analysis with visual examples further proves the superiority of our proposed method. The proposed Coop-WD-eco achieves up to 50% reduction in computational cost under severe distortion while maintaining comparable accuracy as channel conditions improve.
Authors:Mahboubeh Zarei, Robin Chhabra
Abstract:
Consistent localization of cooperative multi-robot systems during navigation presents substantial challenges. This paper proposes a fault-tolerant, multi-modal localization framework for multi-robot systems on matrix Lie groups. We introduce novel stochastic operations to perform composition, differencing, inversion, averaging, and fusion of correlated and non-correlated estimates on Lie groups, enabling pseudo-pose construction for filter updates. The method integrates a combination of proprioceptive and exteroceptive measurements from inertial, velocity, and pose (pseudo-pose) sensors on each robot in an Extended Kalman Filter (EKF) framework. The prediction step is conducted on the Lie group $\mathbb{SE}_2(3) \times \mathbb{R}^3 \times \mathbb{R}^3$, where each robot's pose, velocity, and inertial measurement biases are propagated. The proposed framework uses body velocity, relative pose measurements from fiducial markers, and inter-robot communication to provide scalable EKF update across the network on the Lie group $\mathbb{SE}(3) \times \mathbb{R}^3$. A fault detection module is implemented, allowing the integration of only reliable pseudo-pose measurements from fiducial markers. We demonstrate the effectiveness of the method through experiments with a network of wheeled mobile robots equipped with inertial measurement units, wheel odometry, and ArUco markers. The comparison results highlight the proposed method's real-time performance, superior efficiency, reliability, and scalability in multi-robot localization, making it well-suited for large-scale robotic systems.
Authors:Qizhen Wu, Lei Chen, Kexin Liu, Jinhu Lu
Abstract:
In swarm robotics, confrontation scenarios, including strategic confrontations, require efficient decision-making that integrates discrete commands and continuous actions. Traditional task and motion planning methods separate decision-making into two layers, but their unidirectional structure fails to capture the interdependence between these layers, limiting adaptability in dynamic environments. Here, we propose a novel bidirectional approach based on hierarchical reinforcement learning, enabling dynamic interaction between the layers. This method effectively maps commands to task allocation and actions to path planning, while leveraging cross-training techniques to enhance learning across the hierarchical framework. Furthermore, we introduce a trajectory prediction model that bridges abstract task representations with actionable planning goals. In our experiments, it achieves over 80% in confrontation win rate and under 0.01 seconds in decision time, outperforming existing approaches. Demonstrations through large-scale tests and real-world robot experiments further emphasize the generalization capabilities and practical applicability of our method.
Authors:Takumi Ito, Riku Funada, Mitsuji Sampei, Gennaro Notomista
Abstract:
This work proposes a novel multi-robot task allocation framework for robots that can switch between multiple modes, e.g., flying, driving, or walking. We first provide a method to encode the multi-mode property of robots as a graph, where the mode of each robot is represented by a node. Next, we formulate a constrained optimization problem to decide both the task to be allocated to each robot as well as the mode in which the latter should execute the task. The robot modes are optimized based on the state of the robot and the environment, as well as the energy required to execute the allocated task. Moreover, the proposed framework is able to encompass kinematic and dynamic models of robots alike. Furthermore, we provide sufficient conditions for the convergence of task execution and allocation for both robot models.
Authors:Jonas Friemel, David Liedtke, Christian Scheffer
Abstract:
Shape formation is one of the most thoroughly studied problems in programmable matter and swarm robotics. However, in many models, the class of shapes that can be formed is highly restricted due to the particles' limited memory. In the hybrid model, an active agent with the computational power of a deterministic finite automaton can form shapes by lifting and placing passive tiles on the triangular lattice. We study the shape reconfiguration problem where the agent additionally has the ability to distinguish so-called target nodes from non-target nodes and needs to form a target shape from the initial tile configuration. We present a worst-case optimal $O(mn)$ algorithm for simply connected target shapes, where $m$ is the initial number of unoccupied target nodes and $n$ is the total number of tiles. Furthermore, we show how an agent can reconfigure a large class of target shapes with holes in $O(n^4)$ steps.
Authors:James Gao, Jacob Lee, Yuting Zhou, Yunze Hu, Chang Liu, Pingping Zhu
Abstract:
Swarm robotics, or very large-scale robotics (VLSR), has many meaningful applications for complicated tasks. However, the complexity of motion control and energy costs stack up quickly as the number of robots increases. In addressing this problem, our previous studies have formulated various methods employing macroscopic and microscopic approaches. These methods enable microscopic robots to adhere to a reference Gaussian mixture model (GMM) distribution observed at the macroscopic scale. As a result, optimizing the macroscopic level will result in an optimal overall result. However, all these methods require systematic and global generation of Gaussian components (GCs) within obstacle-free areas to construct the GMM trajectories. This work utilizes centroidal Voronoi tessellation to generate GCs methodically. Consequently, it demonstrates performance improvement while also ensuring consistency and reliability.
Authors:Jiazhen Liu, Glen Neville, Jinwoo Park, Sonia Chernova, Harish Ravichandar
Abstract:
Complex multi-robot missions often require heterogeneous teams to jointly optimize task allocation, scheduling, and path planning to improve team performance under strict constraints. We formalize these complexities into a new class of problems, dubbed Spatio-Temporal Efficacy-optimized Allocation for Multi-robot systems (STEAM). STEAM builds upon trait-based frameworks that model robots using their capabilities (e.g., payload and speed), but goes beyond the typical binary success-failure model by explicitly modeling the efficacy of allocations as trait-efficacy maps. These maps encode how the aggregated capabilities assigned to a task determine performance. Further, STEAM accommodates spatio-temporal constraints, including a user-specified time budget (i.e., maximum makespan). To solve STEAM problems, we contribute a novel algorithm named Efficacy-optimized Incremental Task Allocation Graph Search (E-ITAGS) that simultaneously optimizes task performance and respects time budgets by interleaving task allocation, scheduling, and path planning. Motivated by the fact that trait-efficacy maps are difficult, if not impossible, to specify, E-ITAGS efficiently learns them using a realizability-aware active learning module. Our approach is realizability-aware since it explicitly accounts for the fact that not all combinations of traits are realizable by the robots available during learning. Further, we derive experimentally-validated bounds on E-ITAGS' suboptimality with respect to efficacy. Detailed numerical simulations and experiments using an emergency response domain demonstrate that E-ITAGS generates allocations of higher efficacy compared to baselines, while respecting resource and spatio-temporal constraints. We also show that our active learning approach is sample efficient and establishes a principled tradeoff between data and computational efficiency.
Authors:Kewei Chen, Yayu Long, Mingsheng Shang
Abstract:
Multi-robot systems in complex physical collaborations face a "shared brain dilemma": transmitting high-dimensional multimedia data (e.g., video streams at ~30MB/s) creates severe bandwidth bottlenecks and decision-making latency. To address this, we propose PIPHEN, an innovative distributed physical cognition-control framework. Its core idea is to replace "raw data communication" with "semantic communication" by performing "semantic distillation" at the robot edge, reconstructing high-dimensional perceptual data into compact, structured physical representations. This idea is primarily realized through two key components: (1) a novel Physical Interaction Prediction Network (PIPN), derived from large model knowledge distillation, to generate this representation; and (2) a Hamiltonian Energy Network (HEN) controller, based on energy conservation, to precisely translate this representation into coordinated actions. Experiments show that, compared to baseline methods, PIPHEN can compress the information representation to less than 5% of the original data volume and reduce collaborative decision-making latency from 315ms to 76ms, while significantly improving task success rates. This work provides a fundamentally efficient paradigm for resolving the "shared brain dilemma" in resource-constrained multi-robot systems.
Authors:Utkarsh U. Chavan, Prashant Trivedi, Nandyala Hemachandra
Abstract:
Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While the problem of learning in SSP has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored. In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies. Our main contribution is the first regret lower bound for this setting based on the construction of hard-to-learn instances for any number of agents, $n$. Our regret lower bound of $Ω(\sqrt{K})$, over $K$ episodes, highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can further guide the design of efficient learning algorithms in multi-agent systems.
Authors:Nengbo Zhang, Hann Woei Ho
Abstract:
Recognizing the motion of Micro Aerial Vehicles (MAVs) is crucial for enabling cooperative perception and control in autonomous aerial swarms. Yet, vision-based recognition models relying only on RGB data often fail to capture the complex spatial temporal characteristics of MAV motion, which limits their ability to distinguish different actions. To overcome this problem, this paper presents MAVR-Net, a multi-view learning-based MAV action recognition framework. Unlike traditional single-view methods, the proposed approach combines three complementary types of data, including raw RGB frames, optical flow, and segmentation masks, to improve the robustness and accuracy of MAV motion recognition. Specifically, ResNet-based encoders are used to extract discriminative features from each view, and a multi-scale feature pyramid is adopted to preserve the spatiotemporal details of MAV motion patterns. To enhance the interaction between different views, a cross-view attention module is introduced to model the dependencies among various modalities and feature scales. In addition, a multi-view alignment loss is designed to ensure semantic consistency and strengthen cross-view feature representations. Experimental results on benchmark MAV action datasets show that our method clearly outperforms existing approaches, achieving 97.8\%, 96.5\%, and 92.8\% accuracy on the Short MAV, Medium MAV, and Long MAV datasets, respectively.
Authors:Ben Rossano, Jaein Lim, Jonathan P. How
Abstract:
This paper proposes a task allocation algorithm for teams of heterogeneous robots in environments with uncertain task requirements. We model these requirements as probability distributions over capabilities and use this model to allocate tasks such that robots with complementary skills naturally position near uncertain tasks, proactively mitigating task failures without wasting resources. We introduce a market-based approach that optimizes the joint team objective while explicitly capturing coupled rewards between robots, offering a polynomial-time solution in decentralized settings with strict communication assumptions. Comparative experiments against benchmark algorithms demonstrate the effectiveness of our approach and highlight the challenges of incorporating coupled rewards in a decentralized formulation.
Authors:Victor V. Puche, Kashish Verma, Matteo Fumagalli
Abstract:
The growing global demand for critical raw materials (CRMs) has highlighted the need to access difficult and hazardous environments such as abandoned underground mines. These sites pose significant challenges for conventional machinery and human operators due to confined spaces, structural instability, and lack of infrastructure. To address this, we propose a modular multi-robot system designed for autonomous operation in such environments, enabling sequential mineral extraction tasks. Unlike existing work that focuses primarily on mapping and inspection through global behavior or central control, our approach incorporates physical interaction capabilities using specialized robots coordinated through local high-level behavior control. Our proposed system utilizes Hierarchical Finite State Machine (HFSM) behaviors to structure complex task execution across heterogeneous robotic platforms. Each robot has its own HFSM behavior to perform sequential autonomy while maintaining overall system coordination, achieved by triggering behavior execution through inter-robot communication. This architecture effectively integrates software and hardware components to support collaborative, task-driven multi-robot operation in confined underground environments.
Authors:Maryam Kazemi Eskeri, Ville Kyrki, Dominik Baumann, Tomasz Piotr Kucner
Abstract:
Multi-robot systems are increasingly deployed in applications, such as intralogistics or autonomous delivery, where multiple robots collaborate to complete tasks efficiently. One of the key factors enabling their efficient cooperation is Multi-Robot Task Allocation (MRTA). Algorithms solving this problem optimize task distribution among robots to minimize the overall execution time. In shared environments, apart from the relative distance between the robots and the tasks, the execution time is also significantly impacted by the delay caused by navigating around moving people. However, most existing MRTA approaches are dynamics-agnostic, relying on static maps and neglecting human motion patterns, which leads to inefficiencies and delays. In this paper, we introduce a dynamics-aware MRTA method that leverages Maps of Dynamics (MoDs), spatio-temporal queryable models designed to capture historical human movement patterns, to estimate the impact of humans on the task execution time during deployment. The method utilizes a stochastic cost function that incorporates MoDs. Experimental results show that integrating MoDs enhances task allocation performance, reducing mission completion times by up to $26\%$ compared to the dynamics-agnostic method and up to $19\%$ compared to the baseline. This work underscores the importance of considering human dynamics in MRTA within shared environments and presents an efficient framework for deploying multi-robot systems in environments populated by humans.
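A minimal sketch of a dynamics-aware allocation cost helps make the MoD idea concrete: nominal travel time plus an expected delay queried from a spatio-temporal occupancy model. The GridMoD class, the hour-indexed grid, and the linear delay model below are hypothetical constructions for illustration, not the paper's cost function.

```python
# Sketch: task cost = nominal travel time + expected human-induced delay from a toy MoD.
import numpy as np

class GridMoD:
    """Toy Map of Dynamics: expected human occupancy (0..1) per cell and hour of day."""
    def __init__(self, occupancy):                 # occupancy[hour, row, col]
        self.occupancy = occupancy

    def expected_delay(self, path_cells, hour, delay_per_unit=4.0):
        occ = self.occupancy[hour]
        return delay_per_unit * sum(occ[r, c] for r, c in path_cells)

def task_cost(robot_xy, task_xy, path_cells, mod, hour, speed=1.0):
    nominal = np.linalg.norm(np.subtract(task_xy, robot_xy)) / speed
    return nominal + mod.expected_delay(path_cells, hour)

occ = np.zeros((24, 5, 5)); occ[9, 2, :] = 0.8      # a corridor that is busy at 09:00
mod = GridMoD(occ)
path = [(2, 0), (2, 1), (2, 2), (2, 3)]             # cells the robot would traverse
print("cost at 09:00:", task_cost((0, 0), (4, 3), path, mod, hour=9))
print("cost at 03:00:", task_cost((0, 0), (4, 3), path, mod, hour=3))
```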
Authors:Aalok Patwardhan, Andrew J. Davison
Abstract:
Robot swarms require cohesive collective behaviour to address diverse challenges, including shape formation and decision-making. Existing approaches often treat consensus in discrete and continuous decision spaces as distinct problems. We present DANCeRS, a unified, distributed algorithm leveraging Gaussian Belief Propagation (GBP) to achieve consensus in both domains. By representing a swarm as a factor graph our method ensures scalability and robustness in dynamic environments, relying on purely peer-to-peer message passing. We demonstrate the effectiveness of our general framework through two applications where agents in a swarm must achieve consensus on global behaviour whilst relying on local communication. In the first, robots must perform path planning and collision avoidance to create shape formations. In the second, we show how the same framework can be used by a group of robots to form a consensus over a set of discrete decisions. Experimental results highlight our method's scalability and efficiency compared to recent approaches to these problems making it a promising solution for multi-robot systems requiring distributed consensus. We encourage the reader to see the supplementary video demo.
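The peer-to-peer consensus mechanism referenced above can be illustrated with a minimal scalar Gaussian Belief Propagation loop: each robot holds a noisy prior and pairwise agreement factors couple ring neighbors, with only local message passing. Scalar variables, the ring topology, and the single smoothness factor are assumptions made for brevity; DANCeRS's factor graphs for path planning and discrete decisions are not reproduced.

```python
# Minimal scalar GBP sketch on a ring of robots: beliefs are pulled toward a consensus
# using purely local (mean, variance) messages.
import numpy as np

n = 6
rng = np.random.default_rng(0)
prior_mean = 5.0 + rng.normal(0.0, 1.0, n)         # each robot's noisy local estimate
prior_var = np.full(n, 1.0)
factor_var = 0.5                                    # strength of the "neighbours should agree" factor
edges = [(i, (i + 1) % n) for i in range(n)]        # ring communication topology

messages = {}                                       # messages[(i, j)] = (mean, var) sent from i to j
for i, j in edges:
    messages[(i, j)] = (0.0, 1e6)
    messages[(j, i)] = (0.0, 1e6)

def senders_to(k):
    """Nodes that send a message to node k."""
    return [i for (i, j) in messages if j == k]

for _ in range(30):                                 # synchronous peer-to-peer sweeps
    new = {}
    for (i, j) in messages:
        # belief at i excluding the message from j, in information (precision) form
        lam = 1.0 / prior_var[i]
        eta = prior_mean[i] / prior_var[i]
        for k in senders_to(i):
            if k == j:
                continue
            m_mean, m_var = messages[(k, i)]
            lam += 1.0 / m_var
            eta += m_mean / m_var
        # propagate through the pairwise agreement factor: variance grows by factor_var
        new[(i, j)] = (eta / lam, 1.0 / lam + factor_var)
    messages = new

beliefs = []
for i in range(n):
    lam = 1.0 / prior_var[i] + sum(1.0 / messages[(k, i)][1] for k in senders_to(i))
    eta = prior_mean[i] / prior_var[i] + sum(messages[(k, i)][0] / messages[(k, i)][1] for k in senders_to(i))
    beliefs.append(eta / lam)
print(np.round(prior_mean, 2))
print(np.round(beliefs, 2))                         # posterior means are pulled toward each other
```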
Authors:Tianfu Wu, Jiaqi Fu, Wugang Meng, Sungjin Cho, Huanzhe Zhan, Fumin Zhang
Abstract:
Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator in maintaining the stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting the human operator.
Authors:Pengda Mao, Shuli Lv, Chen Min, Zhaolong Shen, Quan Quan
Abstract:
Swarm robotics navigation through unknown obstacle environments is an emerging research area that faces considerable challenges. Performing tasks in such environments requires swarms to achieve autonomous localization, perception, decision-making, control, and planning. The limited computational resources of onboard platforms present significant challenges for planning and control. Reactive planners offer low computational demands and high re-planning frequencies but lack predictive capabilities, often resulting in local minima. Long-horizon planners, on the other hand, can perform multi-step predictions to reduce deadlocks, but they are computationally expensive, leading to lower re-planning frequencies. This paper proposes a real-time optimal virtual tube planning method for swarm robotics in unknown environments, which generates approximate solutions for optimal trajectories through affine functions. As a result, the computational complexity of the approximate solutions is $O(n_t)$, where $n_t$ is the number of parameters in the trajectory, thereby significantly reducing the overall computational burden. By integrating reactive methods, the proposed method enables low-computation, safe swarm motion in unknown environments. The effectiveness of the proposed method is validated through several simulations and experiments.
Authors:Sai Krishna Ghanta, Ramviyas Parasuraman
Abstract:
In indoor environments, multi-robot visual (RGB-D) mapping and exploration hold immense potential for application in domains such as domestic service and logistics, where deploying multiple robots in the same environment can significantly enhance efficiency. However, there are two primary challenges: (1) the "ghosting trail" effect, which occurs due to overlapping views of robots impacting the accuracy and quality of point cloud reconstruction, and (2) the oversight of visual reconstructions in selecting the most effective frontiers for exploration. Given these challenges are interrelated, we address them together by proposing a new semi-distributed framework (SPACE) for spatial cooperation in indoor environments that enables enhanced coverage and 3D mapping. SPACE leverages geometric techniques, including "mutual awareness" and a "dynamic robot filter," to overcome spatial mapping constraints. Additionally, we introduce a novel spatial frontier detection system and map merger, integrated with an adaptive frontier assigner for optimal coverage balancing the exploration and reconstruction objectives. In extensive ROS-Gazebo simulations, SPACE demonstrated superior performance over state-of-the-art approaches in both exploration and mapping metrics.
Authors:Ethan Schneider, Daniel Wu, Devleena Das, Sonia Chernova
Abstract:
As the complexity of multi-robot systems grows to incorporate a greater number of robots, more complex tasks, and longer time horizons, the solutions to such problems often become too complex to be fully intelligible to human users. In this work, we introduce an approach for generating natural language explanations that justify the validity of the system's solution to the user, or else aid the user in correcting any errors that led to a suboptimal system solution. Toward this goal, we first contribute a generalizable formalism of contrastive explanations for multi-robot systems, and then introduce a holistic approach to generating contrastive explanations for multi-robot scenarios that selectively incorporates data from multi-robot task allocation, scheduling, and motion-planning to explain system behavior. Through user studies with human operators we demonstrate that our integrated contrastive explanation approach leads to significant improvements in user ability to identify and solve system errors, leading to significant improvements in overall multi-robot team performance.
Authors:Rayan Bahrami, Hamidreza Jafarnejadsani
Abstract:
This paper concerns the consensus and formation of a network of mobile autonomous agents in adversarial settings where a group of malicious (compromised) agents are subject to deception attacks. In addition, the communication network is arbitrarily time-varying and subject to intermittent connections, possibly imposed by denial-of-service (DoS) attacks. We provide explicit bounds for network connectivity in an integral sense, enabling the characterization of the system's resilience to specific classes of adversarial attacks. We also show that under the condition of connectivity in an integral sense uniformly in time, the system is finite-gain $\mathcal{L}_{p}$ stable and uniformly exponentially fast consensus and formation are achievable, provided malicious agents are detected and isolated from the network. We present a distributed and reconfigurable framework with theoretical guarantees for detecting malicious agents, allowing for the resilient cooperation of the remaining cooperative agents. Simulation studies are provided to illustrate the theoretical findings.
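To make the detect-and-isolate idea above concrete, here is a minimal sketch (not the paper's protocol) of a discrete-time consensus update over a time-varying graph in which neighbors flagged as malicious are simply excluded; the step size and detection set are illustrative assumptions.

```python
import numpy as np

def resilient_consensus_step(x, adjacency, detected_malicious, eps=0.1):
    """One synchronous consensus iteration that ignores flagged neighbors.

    x: (n, d) agent states; adjacency: (n, n) 0/1 matrix for the current
    (possibly time-varying) graph; detected_malicious: set of agent indices.
    """
    x_next = x.copy()
    for i in range(len(x)):
        if i in detected_malicious:
            continue  # compromised agents are isolated from the update
        for j in range(len(x)):
            if adjacency[i, j] and j not in detected_malicious:
                x_next[i] += eps * (x[j] - x[i])
    return x_next
```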
Authors:Lei Shi, Qichao Liu, Cheng Zhou, Xiong Li
Abstract:
This paper proposes a fair control framework for multi-robot systems, which integrates the newly introduced Alternative Authority Control (AAC) and Flexible Control Barrier Function (F-CBF). Control authority refers to the single robot that plans its trajectory while treating the others as moving obstacles; robots without authority do not plan their own paths. The AAC method dynamically distributes the control authority, enabling fair and coordinated movement across the system. This approach significantly improves computational efficiency, scalability, and robustness in complex environments. The proposed F-CBF extends traditional CBFs by incorporating obstacle shape, velocity, and orientation, enhancing safety through accurate dynamic obstacle avoidance. The framework is validated through simulations in multi-robot scenarios, demonstrating its safety, robustness, and computational efficiency.
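For readers unfamiliar with the CBF machinery referenced above, the sketch below shows a generic control-barrier-function safety filter for a single-integrator robot and one circular moving obstacle, solved in closed form because there is a single linear constraint. This illustrates the general idea only; it is not the paper's F-CBF, which additionally accounts for obstacle shape and orientation.

```python
import numpy as np

def cbf_safety_filter(p, u_nom, p_obs, v_obs, radius, alpha=1.0):
    """Minimally modify a nominal velocity command so the robot stays outside
    a circular moving obstacle (single-integrator dynamics: p_dot = u).

    Barrier: h = ||p - p_obs||^2 - radius^2
    Constraint: h_dot = 2 (p - p_obs) . (u - v_obs) >= -alpha * h
    One linear constraint, so the QP reduces to a closed-form projection.
    """
    d = p - p_obs
    h = d @ d - radius**2
    a = 2.0 * d                              # gradient of h_dot w.r.t. u
    b = -2.0 * d @ v_obs + alpha * h         # constraint: a . u + b >= 0
    slack = a @ u_nom + b
    if slack >= 0.0:
        return u_nom                         # nominal command is already safe
    return u_nom - (slack / (a @ a)) * a     # project onto the constraint boundary

# Hypothetical usage: robot at the origin heading right, obstacle approaching head-on.
u_safe = cbf_safety_filter(p=np.array([0.0, 0.0]),
                           u_nom=np.array([1.0, 0.0]),
                           p_obs=np.array([1.5, 0.0]),
                           v_obs=np.array([-0.5, 0.0]),
                           radius=1.0)
print(u_safe)
```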
Authors:Jeremy Fersula, Nicolas Bredeche, Olivier Dauchot
Abstract:
Morphological computing, the use of a robot's physical design to ease the realization of a given task, has been proven to be a relevant concept in the context of swarm robotics. Here we demonstrate, both experimentally and numerically, that the success of such a strategy may heavily rely on the type of policy adopted by the robots, as well as on the details of the physical design. To do so, we consider a swarm of robots composed of Kilobots embedded in an exoskeleton, the design of which controls the propensity of the robots to align or anti-align with the direction of the external force they experience. We find experimentally that the contrast observed between the two morphologies in the success rate of a simple phototactic task, in which the robots were programmed to stop when entering a light region, becomes dramatic if the robots are not allowed to stop and can only slow down. Building on a faithful physical model of the self-aligning dynamics of the robots, we perform numerical simulations and demonstrate, on the one hand, that a precise tuning of the self-aligning strength around a sweet spot is required to achieve efficient phototactic behavior and, on the other hand, that exploring a range of self-alignment strengths allows for a rich expressivity of collective behaviors.
Authors:Bangya Liu, Chengpo Yan, Chenghao Jiang, Suman Banerjee, Akarsh Prabhakara
Abstract:
Cooperative perception between vehicles is poised to offer robust and reliable scene understanding. Recently, experimental systems research has begun building testbeds that share raw spatial sensor data for cooperative perception. While this approach yields a marked improvement in accuracy and is the natural way forward, we take a moment to consider the problems it raises for eventual adoption by automakers. In this paper, we first argue that new forms of privacy concerns arise and discourage stakeholders from sharing raw sensor data. Next, we present SHARP, a research framework to minimize privacy leakage and drive stakeholders towards the ambitious goal of raw-data-based cooperative perception. Finally, we discuss open questions for networked systems, mobile computing, and perception researchers, as well as industry and government, in realizing our proposed framework.
Authors:Themistoklis Charalambous, Nikolaos Pappas, Nikolaos Nomikos, Risto Wichman
Abstract:
As multi-agent systems (MAS) become increasingly prevalent in autonomous systems, distributed control, and edge intelligence, efficient communication under resource constraints has emerged as a critical challenge. Traditional communication paradigms often emphasize message fidelity or bandwidth optimization, overlooking the task relevance of the exchanged information. In contrast, goal-oriented communication prioritizes the importance of information with respect to the agents' shared objectives. This review provides a comprehensive survey of goal-oriented communication in MAS, bridging perspectives from information theory, communication theory, and machine learning. We examine foundational concepts alongside learning-based approaches and emergent protocols. Special attention is given to coordination under communication constraints, as well as applications in domains such as swarm robotics, federated learning, and edge computing. The paper concludes with a discussion of open challenges and future research directions at the intersection of communication theory, machine learning, and multi-agent decision making.
Authors:Fuda van Diggelen, Tugay Alperen Karagüzel, Andres Garcia Rincon, A. E. Eiben, Dario Floreano, Eliseo Ferrante
Abstract:
In this paper, we introduce Hebbian learning as a novel method for swarm robotics, enabling the automatic emergence of heterogeneity. Hebbian learning presents a biologically inspired form of neural adaptation that relies solely on local information. By doing so, we resolve several major challenges for learning heterogeneous control: 1) Hebbian learning removes the complexity of attributing emergent phenomena to single agents through local learning rules, thus circumventing the micro-macro problem; 2) uniform Hebbian learning rules across all swarm members limit the number of parameters needed, mitigating the curse of dimensionality as swarm sizes scale; and 3) evolving Hebbian learning rules based on swarm-level behaviour minimises the need for the extensive prior knowledge typically required for optimising heterogeneous swarms. This work demonstrates that, with Hebbian learning, heterogeneity naturally emerges, resulting in swarm-level behavioural switching and in significantly improved swarm capabilities. It also demonstrates how the evolution of Hebbian learning rules can be a valid alternative to Multi-Agent Reinforcement Learning in standard benchmarking tasks.
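As a minimal sketch of the kind of local rule involved, a generalized Hebbian update touches only locally available pre- and post-synaptic activations. The specific ABCD form and parameter values below are illustrative assumptions, not the paper's evolved rules.

```python
import numpy as np

def hebbian_update(w, pre, post, eta=0.01, A=1.0, B=0.0, C=0.0, D=0.0):
    """Generalized ABCD Hebbian rule applied per connection:
    dw_ij = eta * (A * pre_j * post_i + B * pre_j + C * post_i + D).

    w: (n_post, n_pre) weights; pre: (n_pre,) and post: (n_post,) activations
    from one controller step. Only local quantities are used.
    """
    return w + eta * (A * np.outer(post, pre) + B * pre + C * post[:, None] + D)
```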
Authors:Kehinde O. Aina, Hosain Bagheri, Daniel I. Goldman
Abstract:
As robots are increasingly deployed to collaborate on tasks within shared workspaces and resources, the failure of an individual robot can critically affect the group's performance. This issue is particularly challenging when robots lack global information or direct communication, relying instead on social interaction for coordination and to complete their tasks. In this study, we propose a novel fault-tolerance technique leveraging physical contact interactions in multi-robot systems, specifically under conditions of limited sensing and spatial confinement. We introduce the "Active Contact Response" (ACR) method, where each robot modulates its behavior based on the likelihood of encountering an inoperative (faulty) robot. Active robots are capable of collectively repositioning stationary and faulty peers to reduce obstructions and maintain optimal group functionality. We implement our algorithm in a team of autonomous robots, equipped with contact-sensing and collision-tolerance capabilities, tasked with collectively excavating cohesive model pellets. Experimental results indicate that the ACR method significantly improves the system's recovery time from robot failures, enabling continued collective excavation with minimal performance degradation. Thus, this work demonstrates the potential of leveraging local, social, and physical interactions to enhance fault tolerance and coordination in multi-robot systems operating in constrained and extreme environments.
Authors:Nithish Kumar Saravanan, Varun Jammula, Yezhou Yang, Jeffrey Wishart, Junfeng Zhao
Abstract:
Perception is a key component of automated vehicles (AVs). However, sensors mounted on AVs often encounter blind spots due to obstructions from other vehicles, infrastructure, or objects in the surrounding area. While recent advancements in planning and control algorithms help AVs react to sudden object appearances from blind spots at low speeds and in less complex scenarios, challenges remain at high speeds and complex intersections. Vehicle-to-Infrastructure (V2I) technology promises to enhance scene representation for AVs in complex intersections, providing sufficient time and distance to react to adversary vehicles violating traffic rules. Most existing methods for infrastructure-based vehicle detection and tracking rely on LIDAR, RADAR, or sensor fusion methods such as LIDAR-Camera and RADAR-Camera. Although LIDAR and RADAR provide accurate spatial information, the sparsity of point cloud data limits their ability to capture the detailed contours of distant objects, resulting in inaccurate 3D object detection. Furthermore, the absence of LIDAR or RADAR at every intersection increases the cost of implementing V2I technology. To address these challenges, this paper proposes a V2I framework that utilizes monocular traffic cameras at road intersections to detect 3D objects. The results from the roadside unit (RSU) are then combined with the on-board system using an asynchronous late fusion method to enhance scene representation. Additionally, the proposed framework provides a time delay compensation module to compensate for the processing and transmission delay from the RSU. Lastly, the V2I framework is tested by simulating and validating a scenario similar to one described in an industry report by Waymo. The results show that the proposed method improves the scene representation and the AV's perception range, giving enough time and space to react to adversary vehicles.
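A minimal sketch of a delay-compensation step of the kind described, assuming roughly constant velocity over the short delay horizon; the function name and numbers are illustrative, not the paper's module.

```python
def compensate_delay(x, y, vx, vy, processing_delay_s, transmission_delay_s):
    """Propagate an RSU detection forward by the total pipeline delay,
    under a constant-velocity assumption for the delay horizon."""
    dt = processing_delay_s + transmission_delay_s
    return x + vx * dt, y + vy * dt

# e.g., a vehicle detected at (10 m, 2 m) moving at 15 m/s, 120 ms total delay
print(compensate_delay(10.0, 2.0, 15.0, 0.0, 0.08, 0.04))  # -> (11.8, 2.0)
```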
Authors:Khai Yi Chin, Carlo Pinciroli
Abstract:
The collective perception problem -- where a group of robots perceives its surroundings and comes to a consensus on an environmental state -- is a fundamental problem in swarm robotics. Past works studying collective perception use either an entire robot swarm with perfect sensing or a swarm with only a handful of malfunctioning members. A related study proposed an algorithm that does account for an entire swarm of unreliable robots but assumes that the sensor faults are known and remain constant over time. To that end, we build on that study by proposing the Bayes Collective Perception Filter (BayesCPF), which enables robots with continuously degrading sensors to accurately estimate the fill ratio -- the rate at which an environmental feature occurs. Our main contribution is the Extended Kalman Filter within the BayesCPF, which helps swarm robots calibrate for their time-varying sensor degradation. We validate our method across different degradation models, initial conditions, and environments in simulated and physical experiments. Our findings show that, regardless of degradation model assumptions, fill ratio estimation using the BayesCPF is competitive with the case in which the true sensor accuracy is known, especially when assumptions regarding the model and initial sensor accuracy levels hold.
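As a toy illustration of filtering a drifting sensor-accuracy state, the scalar Kalman update below tracks an accuracy value assumed to degrade linearly over time. This is a simplification for intuition, not the paper's EKF; the drift and noise parameters are made up.

```python
def kalman_accuracy_step(acc_est, var_est, measured_acc,
                         drift=-0.001, process_var=1e-5, meas_var=1e-3):
    """One predict/update cycle for a scalar sensor-accuracy state that is
    assumed to drift downward each step (linear degradation model)."""
    # Predict: accuracy degrades by `drift` per step, uncertainty grows.
    acc_pred = acc_est + drift
    var_pred = var_est + process_var
    # Update with a (noisy) accuracy observation, e.g. from self-calibration.
    gain = var_pred / (var_pred + meas_var)
    acc_new = acc_pred + gain * (measured_acc - acc_pred)
    var_new = (1.0 - gain) * var_pred
    return acc_new, var_new
```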
Authors:Namo Asavisanu, Tina Khezresmaeilzadeh, Rohan Sequeira, Hang Qiu, Fawad Ahmad, Konstantinos Psounis, Ramesh Govindan
Abstract:
With cooperative perception, autonomous vehicles can wirelessly share sensor data and representations to overcome sensor occlusions, improving situational awareness. Securing such data exchanges is crucial for connected autonomous vehicles. Existing automated reputation-based approaches often suffer from a delay between detection and exclusion of misbehaving vehicles, while majority-based approaches have communication overheads that limit scalability. In this paper, we introduce CATS, a novel automated system that blends together the best traits of reputation-based and majority-based detection mechanisms to secure vehicle-to-everything (V2X) communications for cooperative perception, while preserving the privacy of cooperating vehicles. Our evaluation with city-scale simulations on realistic traffic data shows CATS's effectiveness in rapidly identifying and isolating misbehaving vehicles, with a low false negative rate and low overheads, proving its suitability for real-world deployments.
Authors:Pascal Goldschmid, Aamir Ahmad
Abstract:
Simulation frameworks are essential for the safe development of robotic applications. However, different components of a robotic system are often best simulated in different environments, making full integration challenging. This is particularly true for partially-open or closed-source simulators, which commonly suffer from two limitations: (i) lack of runtime control over scene actors via interfaces like ROS, and (ii) restricted access to real-time state data (e.g., pose, velocity) of scene objects. In the first part of this work, we address these issues by integrating aerial drones simulated in Parrot's Sphinx environment (used for Anafi drones) into the Gazebo simulator. Our approach uses a mirrored drone instance embedded within Gazebo environments to bridge the two simulators. One key application is aerial target tracking, a common task in multi-robot systems. However, Parrot's default PID-based controller lacks the agility needed for tracking fast-moving targets. To overcome this, in the second part of this work we develop a model predictive controller (MPC) that leverages cumulative error states to improve tracking accuracy. Our MPC significantly outperforms the built-in PID controller in dynamic scenarios, increasing the effectiveness of the overall system. We validate our integrated framework by incorporating the Anafi drone into an existing Gazebo-based airship simulation and rigorously test the MPC against a custom PID baseline in both simulated and real-world experiments.
Authors:Lucas C. D. Bezerra, Ataíde M. G. dos Santos, Shinkyu Park
Abstract:
We propose a decentralized, learning-based framework for dynamic coalition formation in Multi-Robot Task Allocation (MRTA). Our approach extends MAPPO by integrating spatial action maps, robot motion planning, intention sharing, and task allocation revision to enable effective and adaptive coalition formation. Extensive simulation studies confirm the effectiveness of our model, enabling each robot to rely solely on local information to learn timely revisions of task selections and form coalitions with other robots to complete collaborative tasks. The results also highlight the proposed framework's ability to handle large robot populations and adapt to scenarios with diverse task sets.
Authors:Susu Fang, Hao Li
Abstract:
SLAMMOT, i.e., simultaneous localization, mapping, and moving object (detection and) tracking, represents an emerging technology for autonomous vehicles in dynamic environments. Such single-vehicle systems still have inherent limitations, such as occlusion issues. Inspired by SLAMMOT and rapidly evolving cooperative technologies, it is natural to explore cooperative simultaneous localization, mapping, and moving object (detection and) tracking (C-SLAMMOT) to enhance state estimation for ego-vehicles and moving objects. C-SLAMMOT can significantly improve on single-vehicle performance by integrating information shared through communication among multiple vehicles. This inevitably leads to a fundamental trade-off between performance and communication cost, especially as the number of collaborating vehicles increases. To address this challenge, we propose a LiDAR-based communication-efficient C-SLAMMOT (CE C-SLAMMOT) method that determines the number of collaborating vehicles. In CE C-SLAMMOT, we adopt descriptor-based methods for enhancing ego-vehicle pose estimation and spatial-confidence-map-based methods for cooperative object perception, allowing for the continuous and dynamic selection of the critical collaborating vehicles and interaction content. This approach avoids wasting precious communication resources on sharing information from collaborating vehicles that contribute little or no performance gain, compared to the baseline method of exchanging raw observations among all vehicles. Comparative experiments in various aspects confirm that the proposed method achieves a good trade-off between performance and communication cost, while also outperforming previous state-of-the-art methods in cooperative perception performance.
Authors:Alvaro Calvo, Jesus Capitan
Abstract:
We present a framework for Multi-Robot Task Allocation (MRTA) in heterogeneous teams performing long-endurance missions in dynamic scenarios. Given the limited battery of robots, especially for aerial vehicles, we allow for robot recharges and the possibility of fragmenting and/or relaying certain tasks. We also address tasks that must be performed by a coalition of robots in a coordinated manner. Given these features, we introduce a new class of heterogeneous MRTA problems which we analyze theoretically and optimally formulate as a Mixed-Integer Linear Program. We then contribute a heuristic algorithm to compute approximate solutions and integrate it into a mission planning and execution architecture capable of reacting to unexpected events by repairing or recomputing plans online. Our experimental results show the relevance of our newly formulated problem in a realistic use case for inspection with aerial robots. We assess the performance of our heuristic solver in comparison with other variants and with exact optimal solutions in small-scale scenarios. In addition, we evaluate the ability of our replanning framework to repair plans online.
Authors:Khai Yi Chin, Carlo Pinciroli
Abstract:
Collective perception is a fundamental problem in swarm robotics, often cast as best-of-$n$ decision-making. Past studies involve robots with perfect sensing or with small numbers of faulty robots. We previously addressed these limitations by proposing an algorithm, here referred to as Minimalistic Collective Perception (MCP) [arxiv:2209.12858], to reach correct decisions despite the entire swarm having severely damaged sensors. However, this algorithm assumes that sensor accuracy is known, which may be infeasible in reality. In this paper, we eliminate this assumption to (i) investigate the decline of estimation performance and (ii) introduce an Adaptive Sensor Degradation Filter (ASDF) to mitigate the decline. We combine the MCP algorithm and a hypothesis test to enable adaptive self-calibration of robots' assumed sensor accuracy. We validate our approach across several parameters of interest. Our findings show that estimation performance by a swarm with correctly known accuracy is superior to that by a swarm unaware of its accuracy. However, the ASDF drastically mitigates the damage, even reaching the performance levels of robots aware a priori of their correct accuracy.
Authors:Leonardo Santos, Caio C. G. Ribeiro, Douglas G. Macharet
Abstract:
The exchange of information is key in applications that involve multiple agents, such as search and rescue, military operations, and disaster response. In this work, we propose a simple and effective trajectory planning framework that tackles the design, deployment, and reconfiguration of a communication backbone by reframing the problem of networked multi-agent motion planning as a manipulator motion planning problem. Our approach works for backbones of variable configurations, both in the number of robots utilized and in the distance limit between robots. While research has been conducted on connection-restricted navigation for multi-robot systems in recent years, the field of manipulators is arguably more developed in both theory and practice. Hence, our methodology facilitates practical applications built on top of widely available motion planning algorithms and frameworks for manipulators.
Authors:Edward Andert, Francis Mendoza, Hans Walter Behrens, Aviral Shrivastava
Abstract:
Connected Autonomous Vehicles have great potential to improve automobile safety and traffic flow, especially in cooperative applications where perception data is shared between vehicles. However, this cooperation must be secured from malicious intent and unintentional errors that could cause accidents. Previous works typically address singular security or reliability issues for cooperative driving in specific scenarios rather than the set of errors together. In this paper, we propose CONClave, a tightly coupled authentication, consensus, and trust scoring mechanism that provides comprehensive security and reliability for cooperative perception in autonomous vehicles. CONClave benefits from the pipelined nature of the steps such that faults can be detected significantly faster and with less compute. Overall, CONClave shows huge promise in preventing security flaws, detecting even relatively minor sensing faults, and increasing the robustness and accuracy of cooperative perception in CAVs while adding minimal overhead.
Authors:Ruixing Ren, Minqi Tao, Junhui Zhao, Xiaoke Sun, Qiuping Li
Abstract:
As a core technology of intelligent transportation systems, vehicular ad-hoc networks support latency-sensitive services such as safety warning and cooperative perception via vehicle-to-everything communications. However, their highly dynamic topology increases average path length, raises latency, and reduces throughput, severely limiting communication performance. Existing topology optimization methods lack capabilities in multi-objective coordination, dynamic adaptation, and global-local synergy. To address these limitations, this paper proposes a two-layer dynamic topology regulation scheme combining local feature aggregation and global adjustment. The scheme constructs a dynamic multi-objective optimization model integrating average path length, end-to-end latency, and network throughput, and achieves multi-index coordination via link adaptability metrics and a dynamic normalization mechanism. It responds quickly to local link changes via feature fusion of local node feature extraction and dynamic neighborhood sensing, and balances optimization accuracy and real-time performance using a dual-mode adaptive solving strategy for global topology adjustment. It reduces network oscillation risks by introducing a performance improvement threshold and a topology validity verification mechanism. Simulation results on real urban road networks via the SUMO platform show that the proposed scheme outperforms traditional methods in average path length (stabilizing at ~4 hops), end-to-end latency (remaining ~0.01 s), and network throughput.
Authors:Simay Atasoy Bingöl, Tobias Töpfer, Sven Kosub, Heiko Hamann, Andreagiovanni Reina
Abstract:
In collective systems, the available agents are a limited resource that must be allocated among tasks to maximize collective performance. Computing the optimal allocation of several agents to numerous tasks through a brute-force approach can be infeasible, especially when each task's performance scales differently as agents are added. For example, difficult tasks may require more agents to achieve performance similar to simpler tasks, but performance may saturate nonlinearly as the number of allocated agents increases. We propose a computationally efficient algorithm, based on marginal performance gains, for optimally allocating agents to tasks with concave scalability functions, including linear, saturating, and retrograde scaling, to achieve maximum collective performance. We test the algorithm by allocating a simulated robot swarm among collective decision-making tasks, where embodied agents sample their environment and exchange information to reach a consensus on spatially distributed environmental features. We vary task difficulty through different geometrical arrangements of environmental features in space (patchiness). In this scenario, decision performance in each task scales either as a saturating curve (following Condorcet's Jury Theorem in an interference-free setup) or as a retrograde curve (when physical interference among robots restricts their movement). Using simple robot simulations, we show that our algorithm can be useful in allocating robots among tasks. Our approach aims to advance the deployment of future real-world multi-robot systems.
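A compact sketch of the marginal-gain idea: agents are handed out one at a time to the task whose performance curve improves most from one extra agent. The example performance functions below are illustrative, not the paper's task models.

```python
import math

def allocate_by_marginal_gain(performance_fns, n_agents):
    """Greedy allocation: assign agents one at a time to the task with the
    largest marginal performance gain. For concave (diminishing-returns)
    per-task performance functions, this greedy choice yields an optimal split."""
    counts = [0] * len(performance_fns)
    for _ in range(n_agents):
        gains = [f(c + 1) - f(c) for f, c in zip(performance_fns, counts)]
        counts[gains.index(max(gains))] += 1
    return counts

# Two illustrative concave curves: a quickly saturating task and a nearly linear one.
tasks = [lambda k: 1.0 - math.exp(-0.5 * k),   # saturates quickly
         lambda k: 0.08 * k]                   # scales almost linearly
print(allocate_by_marginal_gain(tasks, 10))
```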
Authors:Zhenwei Yang, Yibo Ai, Weidong Zhang
Abstract:
Multi-view cooperative perception and multimodal fusion are essential for reliable 3D spatiotemporal understanding in autonomous driving, especially under occlusions, limited viewpoints, and communication delays in V2X scenarios. This paper proposes XET-V2X, a multimodal fused end-to-end tracking framework for V2X collaboration that unifies multi-view multimodal sensing within a shared spatiotemporal representation. To efficiently align heterogeneous viewpoints and modalities, XET-V2X introduces a dual-layer spatial cross-attention module based on multi-scale deformable attention. Multi-view image features are first aggregated to enhance semantic consistency, followed by point cloud fusion guided by the updated spatial queries, enabling effective cross-modal interaction while reducing computational overhead. Experiments on the real-world V2X-Seq-SPD dataset and the simulated V2X-Sim-V2V and V2X-Sim-V2I benchmarks demonstrate consistent improvements in detection and tracking performance under varying communication delays. Both quantitative results and qualitative visualizations indicate that XET-V2X achieves robust and temporally stable perception in complex traffic scenarios.
Authors:Zikun Guo, Adeyinka P. Adedigba, Rammohan Mallipeddi, Heoncheol Lee
Abstract:
Multi-robot path planning is a fundamental yet challenging problem due to its combinatorial complexity and the need to balance global efficiency with fair task allocation among robots. Traditional swarm intelligence methods, although effective on small instances, often converge prematurely and struggle to scale to complex environments. In this work, we present a structure-induced exploration framework that integrates structural priors into the search process of the ant colony optimization (ACO). The approach leverages the spatial distribution of the task to induce a structural prior at initialization, thereby constraining the search space. The pheromone update rule is then designed to emphasize structurally meaningful connections and incorporates a load-aware objective to reconcile the total travel distance with individual robot workload. An explicit overlap suppression strategy further ensures that tasks remain distinct and balanced across the team. The proposed framework was validated on diverse benchmark scenarios covering a wide range of instance sizes and robot team configurations. The results demonstrate consistent improvements in route compactness, stability, and workload distribution compared to representative metaheuristic baselines. Beyond performance gains, the method also provides a scalable and interpretable framework that can be readily applied to logistics, surveillance, and search-and-rescue applications where reliable large-scale coordination is essential.
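A bare-bones pheromone update of the general kind such a framework modifies: evaporate, then deposit along each constructed route. The load-aware down-weighting shown here is a simplified stand-in for the paper's objective, not its actual update rule.

```python
import numpy as np

def update_pheromone(tau, routes, route_lengths, workloads, rho=0.1, q=1.0):
    """Evaporate pheromone, then deposit along each ant's route.

    Deposits are scaled down for long routes and, as a simple load-aware
    stand-in, for robots whose workload exceeds the team average.
    """
    tau *= (1.0 - rho)                      # evaporation
    mean_load = np.mean(workloads)
    for route, length, load in zip(routes, route_lengths, workloads):
        deposit = q / length
        deposit *= min(1.0, mean_load / max(load, 1e-9))  # penalize overloaded robots
        for a, b in zip(route[:-1], route[1:]):
            tau[a, b] += deposit
            tau[b, a] += deposit            # symmetric task graph
    return tau
```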
Authors:Connor York, Zachary R Madin, Paul O'Dowd, Edmund R Hunt
Abstract:
Multi-robot systems performing continuous tasks face a performance trade-off when interrupted by urgent, time-critical sub-tasks. We investigate this trade-off in a scenario where a team must balance area patrolling with locating an anomalous radio signal. To address this trade-off, we evaluate both behavioral heterogeneity through agent role specialization ("patrollers" and "searchers") and sensing heterogeneity (i.e., only the searchers can sense the radio signal). Through simulation, we identify the Pareto-optimal trade-offs under varying team compositions, with behaviorally heterogeneous teams demonstrating the most balanced trade-offs in the majority of cases. When sensing capability is restricted, heterogeneous teams with half of the sensing-capable agents perform comparably to homogeneous teams, providing cost-saving rationale for restricting sensor payload deployment. Our findings demonstrate that pre-deployment role and sensing specialization are powerful design considerations for multi-robot systems facing time-conflicting tasks, where varying the degree of behavioral heterogeneity can tune system performance toward either task.
Authors:Gabriel Aguirre, Simay Atasoy Bingöl, Heiko Hamann, Jonas Kuckling
Abstract:
Effective collective decision-making in swarm robotics often requires balancing exploration, communication and individual uncertainty estimation, especially in hazardous environments where direct measurements are limited or costly. We propose a decentralized Bayesian framework that enables a swarm of simple robots to identify the safer of two areas, each characterized by an unknown rate of hazardous events governed by a Poisson process. Robots employ a conjugate prior to gradually predict the times between events and derive confidence estimates to adapt their behavior. Our simulation results show that the robot swarm consistently chooses the correct area while reducing exposure to hazardous events by being sample-efficient. Compared to baseline heuristics, our proposed approach shows better performance in terms of safety and speed of convergence. The proposed scenario has potential to extend the current set of benchmarks in collective decision-making and our method has applications in adaptive risk-aware sampling and exploration in hazardous, dynamic environments.
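The conjugate update underlying such a scheme is compact: with a Gamma prior on a Poisson event rate, observing k hazardous events over a sampling time T gives a closed-form Gamma posterior. The prior parameters and decision rule below are illustrative, not the paper's exact configuration.

```python
def gamma_posterior(alpha0, beta0, events_observed, observation_time):
    """Gamma(alpha, beta) is conjugate to a Poisson-process rate: after seeing
    `events_observed` events in `observation_time`, the posterior is
    Gamma(alpha0 + events_observed, beta0 + observation_time)."""
    return alpha0 + events_observed, beta0 + observation_time

def expected_rate(alpha, beta):
    return alpha / beta  # posterior mean hazard rate

# A robot compares two areas after sampling each for the same duration.
a1 = gamma_posterior(1.0, 1.0, events_observed=2, observation_time=50.0)
a2 = gamma_posterior(1.0, 1.0, events_observed=9, observation_time=50.0)
safer = "area 1" if expected_rate(*a1) < expected_rate(*a2) else "area 2"
print(safer, expected_rate(*a1), expected_rate(*a2))
```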
Authors:Adrian Schönnagel, Michael Dubé, Christoph Steup, Felix Keppler, Sanaz Mostaghim
Abstract:
This paper presents a novel approach to avoiding jackknifing and mutual collisions in Heavy Articulated Vehicles (HAVs) by leveraging decentralized swarm intelligence. In contrast to typical swarm robotics research, our robots are elongated and exhibit complex kinematics, introducing unique challenges. Despite its relevance to real-world applications such as logistics automation, remote mining, airport baggage transport, and agricultural operations, this problem has not been addressed in the existing literature. To tackle this new class of swarm robotics problems, we propose a purely reaction-based, decentralized swarm intelligence strategy tailored to automate elongated, articulated vehicles. The method presented in this paper prioritizes jackknifing avoidance and establishes a foundation for mutual collision avoidance. We validate our approach through extensive simulation experiments and provide a comprehensive analysis of its performance. In experiments with a single HAV, jackknifing was successfully avoided in 99.8% of runs, and the first and second goals were reached in 86.7% and 83.4% of runs, respectively. With two HAVs interacting, the corresponding figures are 98.9%, 79.4%, and 65.1%, while 99.7% of the HAVs experience no mutual collisions.
Authors:Ismail Zrigui, Samira Khoulji, Mohamed Larbi Kerkeb
Abstract:
Intelligent transportation systems (ITS) still struggle to accurately predict traffic in cities, especially in large, multimodal settings with complicated spatiotemporal dynamics. This paper presents HybridST, a hybrid architecture that integrates Graph Neural Networks (GNNs), multi-head temporal Transformers, and supervised ensemble learning methods (XGBoost or Random Forest) to collectively capture spatial dependencies, long-range temporal patterns, and exogenous signals, including weather, calendar, or control states. We test our model on METR-LA, PEMS-BAY, and Seattle Loop, three public benchmark datasets covering situations ranging from freeway sensor networks to vehicle-infrastructure cooperative perception. Experimental results show that HybridST consistently outperforms classical baselines (LSTM, GCN, DCRNN, PDFormer) on key metrics such as MAE and RMSE, while remaining highly scalable and interpretable. The proposed framework presents a promising avenue for real-time urban mobility planning, energy optimization, and congestion alleviation strategies, especially within the framework of smart cities and significant events such as the 2030 FIFA World Cup.
Authors:Malintha Fernando, Petter Ögren, Silun Zhang
Abstract:
The Team Orienteering Problem (TOP) generalizes many real-world multi-robot scheduling and routing tasks that occur in autonomous mobility, aerial logistics, and surveillance applications. While many flavors of the TOP exist for planning in multi-robot systems, they assume that all the robots cooperate toward a single objective; thus, they do not extend to settings where the robots compete in reward-scarce environments. We propose Stochastic Prize-Collecting Games (SPCG) as an extension of the TOP to plan in the presence of self-interested robots operating on a graph, under energy constraints and stochastic transitions. A theoretical study on complete and star graphs establishes that there is a unique pure Nash equilibrium in SPCGs that coincides with the optimal routing solution of an equivalent TOP given a rank-based conflict resolution rule. This work proposes two algorithms: Ordinal Rank Search (ORS), which obtains the "ordinal rank" -- one's effective rank in temporarily formed local neighborhoods during the game's stages -- and Fictitious Ordinal Response Learning (FORL), which obtains best-response policies against one's senior-rank opponents. Empirical evaluations conducted on road networks and synthetic graphs under both dynamic and stationary prize distributions show that 1) the state-aliasing induced by OR-conditioning enables learning policies that scale more efficiently to large team sizes than those trained with the global index, and 2) policies trained with FORL generalize better to imbalanced prize distributions than those trained with other multi-agent methods. Finally, the learned policies in the SPCG achieve between 87% and 95% optimality compared to an equivalent TOP solution obtained by mixed-integer linear programming.
Authors:Longchen Niu, Gennaro Notomista
Abstract:
In this paper, we introduce a decentralized optimization-based density controller designed to enforce set invariance constraints in multi-robot systems. By designing a decentralized control barrier function, we derive sufficient conditions under which local safety constraints guarantee global safety. We account for localization and motion noise explicitly by modeling robots as spatial probability density functions governed by the Fokker-Planck equation. Compared to traditional centralized approaches, our controller requires less computational and communication power, making it more suitable for deployment in situations where perfect communication and localization are impractical. The controller is validated through simulations and experiments with four quadcopters.
Authors:Takahiro Suzuki, Keisuke Okumura
Abstract:
We consider Connected Unlabeled Multi-Agent Pathfinding (CUMAPF), a variant of MAPF in which the agents must maintain connectivity at all times. This problem is fundamental to swarm robotics applications like self-reconfiguration and marching, where standard MAPF is insufficient as it does not guarantee the required connectivity between agents. While unlabeled MAPF is tractable to solve optimally, CUMAPF is NP-hard even on highly restricted graph classes. To tackle this challenge, we propose PULL, a complete and polynomial-time algorithm with a simple design. It is based on a rule-based one-step function that computes a subsequent configuration that preserves connectivity and advances towards the target configuration. PULL is lightweight and runs in $O(n^2)$ time per step on a 2D grid, where $n$ is the number of agents. Our experiments further demonstrate its practical performance: PULL finds solutions of competitive quality relative to trivial baselines for hundreds of agents in randomly generated instances. Furthermore, we develop an eventually optimal solver that integrates PULL into an existing search-based MAPF algorithm, providing a valuable tool for small-scale instances.
Authors:Tianheng Zhu, Yiheng Feng
Abstract:
As autonomous driving (AD) technology advances, increasing research has focused on leveraging cooperative perception (CP) data collected from multiple AVs to enhance traffic applications. Due to the impracticality of large-scale real-world AV deployments, simulation has become the primary approach in most studies. While game-engine-based simulators like CARLA generate high-fidelity raw sensor data (e.g., LiDAR point clouds) which can be used to produce realistic detection outputs, they face scalability challenges in multi-AV scenarios. In contrast, microscopic traffic simulators such as SUMO scale efficiently but lack perception modeling capabilities. To bridge this gap, we propose MIDAR, a LiDAR detection mimicking model that approximates realistic LiDAR detections using vehicle-level features readily available from microscopic traffic simulators. Specifically, MIDAR predicts true positives (TPs) and false negatives (FNs) from ideal LiDAR detection results based on the spatial layouts and dimensions of surrounding vehicles. A Refined Multi-hop Line-of-Sight (RM-LoS) graph is constructed to encode the occlusion relationships among vehicles, upon which MIDAR employs a GRU-enhanced APPNP architecture to propagate features from the ego AV and occluding vehicles to the prediction target. MIDAR achieves an AUC of 0.909 in approximating the detection results generated by CenterPoint, a mainstream 3D LiDAR detection model, on the nuScenes AD dataset. Two CP-based traffic applications further validate the necessity of such realistic detection modeling, particularly for tasks requiring accurate individual vehicle observations (e.g., position, speed, lane index). As demonstrated in the applications, MIDAR can be seamlessly integrated into traffic simulators and trajectory datasets and will be open-sourced upon publication.
Authors:Tyrone Justin Sta Maria, Faith Griffin, Jordan Aiko Deja
Abstract:
Gestures are an expressive input modality for controlling multiple robots, but their use is often limited by rigid mappings and recognition constraints. To move beyond these limitations, we propose roleplaying metaphors as a scaffold for designing richer interactions. By introducing three roles: Director, Puppeteer, and Wizard, we demonstrate how narrative framing can guide the creation of diverse gesture sets and interaction styles. These roles enable a variety of scenarios, showing how roleplay can unlock new possibilities for multi-robot systems. Our approach emphasizes creativity, expressiveness, and intuitiveness as key elements for future human-robot interaction design.
Authors:Chenyi Wang, Ruoyu Song, Raymond Muller, Jean-Philippe Monteuuis, Z. Berkay Celik, Jonathan Petit, Ryan Gerdes, Ming Li
Abstract:
Cooperative perception (CP) enhances situational awareness of connected and autonomous vehicles by exchanging and combining messages from multiple agents. While prior work has explored adversarial integrity attacks that degrade perceptual accuracy, little is known about CP's robustness against attacks on timeliness (or availability), a safety-critical requirement for autonomous driving. In this paper, we present CP-FREEZER, the first latency attack that maximizes the computation delay of CP algorithms by injecting adversarial perturbations via V2V messages. Our attack resolves several unique challenges, including the non-differentiability of point cloud preprocessing and asynchronous knowledge of the victim's input due to transmission delays, and it uses a novel loss function that effectively maximizes the execution time of the CP pipeline. Extensive experiments show that CP-FREEZER increases end-to-end CP latency by over $90\times$, pushing per-frame processing time beyond 3 seconds with a 100% success rate on our real-world vehicle testbed. Our findings reveal a critical threat to the availability of CP systems, highlighting the urgent need for robust defenses.
Authors:Samratul Fuady, Danesh Tarapore, Mohammad D. Soorati
Abstract:
Collective decision-making is a key function of autonomous robot swarms, enabling them to reach a consensus on actions based on environmental features. Existing strategies require the participation of all robots in the decision-making process, which is resource-intensive and prevents the swarm from allocating the robots to any other tasks. We propose Subset-Based Collective Decision-Making (SubCDM), which enables decisions using only a swarm subset. The construction of the subset is dynamic and decentralized, relying solely on local information. Our method allows the swarm to adaptively determine the size of the subset for accurate decision-making, depending on the difficulty of reaching a consensus. Simulation results using one hundred robots show that our approach achieves accuracy comparable to using the entire swarm while reducing the number of robots required to perform collective decision-making, making it a resource-efficient solution for collective decision-making in swarm robotics.
Authors:Tohid Kargar Tasooji, Ramviyas Parasuraman
Abstract:
In multi-robot systems (MRS), cooperative localization is a crucial task for enhancing system robustness and scalability, especially in GPS-denied or communication-limited environments. However, adversarial attacks, such as sensor manipulation and communication jamming, pose significant challenges to the performance of traditional localization methods. In this paper, we propose a novel distributed fault-tolerant cooperative localization framework to enhance resilience against sensor and communication disruptions in adversarial environments. We introduce an adaptive event-triggered communication strategy that dynamically adjusts communication thresholds based on real-time sensing and communication quality. This strategy ensures optimal performance even in the presence of sensor degradation or communication failure. Furthermore, we conduct a rigorous analysis of the convergence and stability properties of the proposed algorithm, demonstrating its resilience against bounded adversarial zones while maintaining accurate state estimation. Robotarium-based experimental results show that our proposed algorithm significantly outperforms traditional methods in terms of localization accuracy and communication efficiency, particularly in adversarial settings. Our approach offers improved scalability, reliability, and fault tolerance for MRS, making it suitable for large-scale deployments in real-world, challenging environments.
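A minimal sketch of an event-triggered transmission rule whose threshold adapts to link quality: a robot broadcasts its state only when it has drifted sufficiently from the last broadcast value. The specific adaptation law is an illustrative assumption, not the paper's strategy.

```python
import numpy as np

def should_transmit(x_current, x_last_sent, link_quality,
                    base_threshold=0.2, quality_gain=0.5):
    """Transmit only when the state has drifted enough from the last broadcast
    value. When link quality (0..1) drops, the threshold is raised so the
    robot communicates more sparingly over the degraded channel."""
    threshold = base_threshold * (1.0 + quality_gain * (1.0 - link_quality))
    return np.linalg.norm(x_current - x_last_sent) > threshold
```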
Authors:Felipe Valle Quiroz, Johan Elfing, Joel Pålsson, Elena Haller, Oscar Amador Molina
Abstract:
This paper introduces a novel intention-sharing mechanism for Electrically Power-Assisted Cycles (EPACs) within V2X communication frameworks, enhancing the ETSI VRU Awareness Message (VAM) protocol. The method replaces discrete predicted trajectory points with a compact elliptical geographical area representation derived via quadratic polynomial fitting and Least Squares Method (LSM). This approach encodes trajectory predictions with fixed-size data payloads, independent of the number of forecasted points, enabling higher-frequency transmissions and improved network reliability. Simulation results demonstrate superior inter-packet gap (IPG) performance compared to standard ETSI VAMs, particularly under constrained communication conditions. A physical experiment validates the feasibility of real-time deployment on embedded systems. The method supports scalable, low-latency intention sharing, contributing to cooperative perception and enhanced safety for vulnerable road users in connected and automated mobility ecosystems. Finally, we discuss the viability of LSM and open the door to other methods for prediction.
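A minimal sketch of the least-squares step named above: a quadratic polynomial fit of the predicted path yields a fixed-size payload regardless of the number of forecast points. The ellipse-like summary shown here is a simplified illustration, not the protocol's exact encoding.

```python
import numpy as np

# Predicted trajectory points (hypothetical, in a local ENU frame, metres).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([0.0, 0.3, 0.9, 1.9, 3.2, 5.1])

# Quadratic least-squares fit y = c2*x^2 + c1*x + c0 (three coefficients,
# fixed payload size regardless of how many points were predicted).
c2, c1, c0 = np.polyfit(xs, ys, deg=2)

# One simplified "elliptical area" summary: centre of the fitted segment plus
# semi-axes covering its along-path extent and the fit-residual spread.
y_fit = np.polyval([c2, c1, c0], xs)
center = (xs.mean(), y_fit.mean())
semi_major = (xs.max() - xs.min()) / 2.0
semi_minor = max(np.abs(ys - y_fit).max(), 0.5)  # floor at 0.5 m lateral uncertainty
print(center, semi_major, semi_minor)
```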
Authors:Ashish Verma, Avinash Gautam, Tanishq Duhan, V. S. Shekhawat, Sudeept Mohan
Abstract:
Coordinating time-sensitive deliveries in environments like hospitals poses a complex challenge, particularly when managing multiple online pickup and delivery requests within strict time windows using a team of heterogeneous robots. Traditional approaches fail to address dynamic rescheduling or diverse service requirements, typically restricting robots to single-task types. This paper tackles the Multi-Pickup and Delivery Problem with Time Windows (MPDPTW), where autonomous mobile robots are capable of handling varied service requests. The objective is to minimize late delivery penalties while maximizing task completion rates. To achieve this, we propose a novel framework leveraging a heterogeneous robot team and an efficient dynamic scheduling algorithm that supports dynamic task rescheduling. Users submit requests with specific time constraints, and our decentralized algorithm, Heterogeneous Mobile Robots Online Diverse Task Allocation (HMR-ODTA), optimizes task assignments to ensure timely service while addressing delays or task rejections. Extensive simulations validate the algorithm's effectiveness. For smaller task sets (40-160 tasks), penalties were reduced by nearly 63%, while for larger sets (160-280 tasks), penalties decreased by approximately 50%. These results highlight the algorithm's effectiveness in improving task scheduling and coordination in multi-robot systems, offering a robust solution for enhancing delivery performance in structured, time-critical environments.
Authors:Sheikh A. Tahmid, Gennaro Notomista
Abstract:
Many modern robotic systems such as multi-robot systems and manipulators exhibit redundancy, a property owing to which they are capable of executing multiple tasks. This work proposes a novel method, based on the Reinforcement Learning (RL) paradigm, to train redundant robots to be able to execute multiple tasks concurrently. Our approach differs from typical multi-objective RL methods insofar as the learned tasks can be combined and executed in possibly time-varying prioritized stacks. We do so by first defining a notion of task independence between learned value functions. We then use our definition of task independence to propose a cost functional that encourages a policy, based on an approximated value function, to accomplish its control objective while minimally interfering with the execution of higher priority tasks. This allows us to train a set of control policies that can be executed simultaneously. We also introduce a version of fitted value iteration to learn to approximate our proposed cost functional efficiently. We demonstrate our approach on several scenarios and robotic systems.
Authors:Mingming Peng, Zhendong Chen, Jie Yang, Jin Huang, Zhengqi Shi, Qihao Liu, Xinyu Li, Liang Gao
Abstract:
With the accelerated development of Industry 4.0, intelligent manufacturing systems increasingly require efficient task allocation and scheduling in multi-robot systems. However, existing methods rely on domain expertise and face challenges in adapting to dynamic production constraints. Additionally, enterprises have high privacy requirements for production scheduling data, which prevents the use of cloud-based large language models (LLMs) for solution development. To address these challenges, there is an urgent need for an automated modeling solution that meets data privacy requirements. This study proposes a knowledge-augmented mixed integer linear programming (MILP) automated formulation framework, integrating local LLMs with domain-specific knowledge bases to generate executable code from natural language descriptions automatically. The framework employs a knowledge-guided DeepSeek-R1-Distill-Qwen-32B model to extract complex spatiotemporal constraints (82% average accuracy) and leverages a supervised fine-tuned Qwen2.5-Coder-7B-Instruct model for efficient MILP code generation (90% average accuracy). Experimental results demonstrate that the framework successfully achieves automatic modeling in the aircraft skin manufacturing case while ensuring data privacy and computational efficiency. This research provides a low-barrier and highly reliable technical path for modeling in complex industrial scenarios.
Authors:Kai Li, Zhao Ma, Liang Li, Shiyu Zhao
Abstract:
In this paper, we propose a framework, collective behavioral cloning (CBC), to learn the underlying interaction mechanism and control policy of a swarm system. Given the trajectory data of a swarm system, we propose a graph variational autoencoder (GVAE) to learn the local interaction graph. Based on the interaction graph and swarm trajectory, we use behavioral cloning to learn the control policy of the swarm system. To demonstrate the practicality of CBC, we deploy it on a real-world decentralized vision-based robot swarm system. A visual attention network is trained based on the learned interaction graph for online neighbor selection. Experimental results show that our method outperforms previous approaches in predicting both the interaction graph and swarm actions with higher accuracy. This work offers a promising approach for understanding interaction mechanisms and swarm dynamics in future swarm robotics research. Code and data are available.
Authors:Aritra Pal, Anandsingh Chauhan, Mayank Baranwal
Abstract:
Efficient task allocation among multiple robots is crucial for optimizing productivity in modern warehouses, particularly in response to the increasing demands of online order fulfillment. This paper addresses the real-time multi-robot task allocation (MRTA) problem in dynamic warehouse environments, where tasks emerge with specified start and end locations. The objective is to minimize both the total travel distance of robots and delays in task completion, while also considering practical constraints such as battery management and collision avoidance. We introduce MRTAgent, a dual-agent Reinforcement Learning (RL) framework inspired by self-play, designed to optimize task assignments and robot selection to ensure timely task execution. For safe navigation, a modified linear quadratic controller (LQR) approach is employed. To the best of our knowledge, MRTAgent is the first framework to address all critical aspects of practical MRTA problems while supporting continuous robot movements.
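For context on the navigation layer mentioned above, a standard infinite-horizon discrete-time LQR gain can be computed by Riccati iteration as sketched below. This is the textbook controller, not the paper's modified variant, and the double-integrator model is an assumption for illustration.

```python
import numpy as np

def discrete_lqr(A, B, Q, R, iters=500):
    """Infinite-horizon discrete-time LQR gain via Riccati iteration.
    Returns K such that u = -K x minimizes sum(x'Qx + u'Ru)."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Hypothetical double-integrator robot axis (position, velocity), dt = 0.1 s.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
K = discrete_lqr(A, B, Q=np.diag([10.0, 1.0]), R=np.array([[0.1]]))
print(K)
```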
Authors:Tengfei Lyu, Md Noor-A-Rahim, Dirk Pesch, Aisling O'Driscoll
Abstract:
This paper provides an in-depth review and discussion of the state of the art in redundancy mitigation for the vehicular Collective Perception Service (CPS). We focus on the evolutionary differences between the redundancy mitigation rules proposed in 2019 in ETSI TR 103 562 versus the 2023 technical specification ETSI TS 103 324, which uses a Value of Information (VoI) based mitigation approach. We also critically analyse the academic literature that has sought to quantify the communication challenges posed by the CPS and present a unique taxonomy of the redundancy mitigation approaches proposed using three distinct classifications: object inclusion filtering, data format optimisation, and frequency management. Finally, this paper identifies open research challenges that must be adequately investigated to satisfactorily deploy CPS redundancy mitigation measures. Our critical and comprehensive evaluation serves as a point of reference for those undertaking research in this area.
Authors:Atharva Sagale, Tohid Kargar Tasooji, Ramviyas Parasuraman
Abstract:
This paper presents a novel approach to range-based cooperative localization for robot swarms in GPS-denied environments, addressing the limitations of current methods in noisy and sparse settings. We propose a robust multi-layered localization framework that combines shadow edge localization techniques with the strategic deployment of UAVs. This approach not only addresses the challenges associated with nonrigid and poorly connected graphs but also enhances the convergence rate of the localization process. We introduce two key concepts: the S1-Edge approach in our distributed protocol to address the rigidity problem of sparse graphs and the concept of a powerful UAV node to increase the sensing and localization capability of the multi-robot system. Our approach leverages the advantages of the distributed localization methods, enhancing scalability and adaptability in large robot networks. We establish theoretical conditions for the new S1-Edge that ensure solutions exist even in the presence of noise, thereby validating the effectiveness of shadow edge localization. Extensive simulation experiments confirm the superior performance of our method compared to state-of-the-art techniques, resulting in up to 95\% reduction in localization error, demonstrating substantial improvements in localization accuracy and robustness to sparse graphs. This work provides a decisive advancement in the field of multi-robot localization, offering a powerful tool for high-performance and reliable operations in challenging environments.
Authors:Aiman Munir, Ayan Dutta, Ramviyas Parasuraman
Abstract:
We propose a distributed control law for a heterogeneous multi-robot coverage problem, where the robots could have different energy characteristics, such as capacity and depletion rates, due to their varying sizes, speeds, capabilities, and payloads. Existing energy-aware coverage control laws consider capacity differences but assume the battery depletion rate to be the same for all robots. In realistic scenarios, however, some robots can consume energy much faster than other robots; for instance, UAVs hover at different altitudes, and these changes could be dynamically updated based on their assigned tasks. Robots' energy capacities and depletion rates need to be considered to maximize the performance of a multi-robot system. To this end, we propose a new energy-aware controller based on Lloyd's algorithm to adapt the weights of the robots based on their energy dynamics and divide the area of interest among the robots accordingly. The controller is theoretically analyzed and extensively evaluated through simulations and real-world demonstrations in multiple realistic scenarios and compared with three baseline control laws to validate its performance and efficacy.
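A stripped-down sketch of an energy-weighted Lloyd iteration: robots own the sample points closest to them under a power-diagram distance whose weight grows with estimated remaining operating time (battery divided by depletion rate), so faster-draining robots cover smaller regions. The weight law and step size are illustrative, not the paper's control law.

```python
import numpy as np

def weighted_lloyd_step(positions, weights, grid_points, step=0.3):
    """Assign sample points to robots by power-diagram distance
    (||q - p_i||^2 - w_i), then move each robot toward its cell centroid."""
    d2 = ((grid_points[:, None, :] - positions[None, :, :]) ** 2).sum(-1) - weights[None, :]
    owner = d2.argmin(axis=1)
    new_positions = positions.copy()
    for i in range(len(positions)):
        cell = grid_points[owner == i]
        if len(cell):
            new_positions[i] += step * (cell.mean(axis=0) - positions[i])
    return new_positions

# Hypothetical: weights derived from remaining operating time = energy / depletion rate.
energy = np.array([1.0, 0.6, 0.3])        # remaining battery (normalized)
rates = np.array([0.01, 0.02, 0.05])      # per-robot depletion rates
remaining_time = energy / rates
weights = 2.0 * remaining_time / remaining_time.max()
positions = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.8]])
xs, ys = np.meshgrid(np.linspace(0, 1, 30), np.linspace(0, 1, 30))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
print(weighted_lloyd_step(positions, weights, grid))
```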
Authors:Di Ni, Hungtang Ko, Radhika Nagpal
Abstract:
The schooling behavior of fish is hypothesized to confer many survival benefits, including foraging success, safety from predators, and energy savings through hydrodynamic interactions when swimming in formation. Underwater robot collectives may be able to achieve similar benefits in future applications, e.g., using formation control to achieve efficient spatial sampling for environmental monitoring. Although many theoretical algorithms exist for multi-robot formation control, they have not been tested in the underwater domain due to the fundamental challenges of underwater communication. Here we introduce a leader-follower strategy for underwater formation control that allows us to realize complex 3D formations, using purely vision-based perception and a reactive control algorithm with low computational cost. We use a physical platform, BlueSwarm, to demonstrate for the first time an experimental realization of inline, side-by-side, and staggered 3D swimming formations. More complex formations are studied in a physics-based simulator, providing new insights into the convergence and stability of formations given underwater inertial/drag conditions. Our findings lay the groundwork for future applications of underwater robot swarms in aquatic environments with minimal communication.
Authors:Pengxi Zeng, Alberto Presta, Jonah Reinis, Dinesh Bharadia, Hang Qiu, Pamela Cosman
Abstract:
Efficient point cloud (PC) compression is crucial for streaming applications such as augmented reality and cooperative perception. Classic PC compression techniques encode all the points in a frame. Tailoring compression towards perception tasks at the receiver side, we ask the question: "Can we remove the ground points during transmission without sacrificing the detection performance?" Our study reveals a strong dependency on the ground from state-of-the-art (SOTA) 3D object detection models, especially on those points below and around the object. In this work, we propose a lightweight obstacle-aware Pillar-based Ground Removal (PGR) algorithm. PGR filters out ground points that do not provide context for object recognition, significantly improving the compression ratio without sacrificing receiver-side perception performance. Because it does not rely on heavy object detection or semantic segmentation models, PGR is lightweight, highly parallelizable, and effective. Our evaluations on KITTI and the Waymo Open Dataset show that SOTA detection models work equally well with PGR removing 20-30% of the points, while running at 86 FPS.
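A naive pillar-style ground filter sketch, without the obstacle-aware retention the abstract describes; the grid size and height margin are assumed values.

```python
import numpy as np

def pillar_ground_removal(points, pillar_size=0.5, height_margin=0.25):
    """Drop points lying within `height_margin` of the lowest point of their
    x-y pillar; everything above the local floor (obstacles, walls) is kept.
    points: (N, 3) array of x, y, z coordinates in metres."""
    ij = np.floor(points[:, :2] / pillar_size).astype(np.int64)
    _, pillar_id = np.unique(ij, axis=0, return_inverse=True)
    pillar_id = pillar_id.ravel()
    floor = np.full(pillar_id.max() + 1, np.inf)
    np.minimum.at(floor, pillar_id, points[:, 2])   # per-pillar lowest z
    keep = points[:, 2] > floor[pillar_id] + height_margin
    return points[keep]

# Illustrative call on random points; a real scan would come from a LiDAR frame.
pts = np.random.default_rng(1).uniform([-20, -20, -1.8], [20, 20, 2.0], size=(5000, 3))
print(len(pillar_ground_removal(pts)))
```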
Authors:Zachary Fuge, Benjamin Beiter, Alexander Leonessa
Abstract:
This work presents the design and development of the quadruped robot Squeaky, intended as a research and learning platform for single- and multi-robot SLAM, computer vision, and reinforcement learning. Affordable robots are becoming necessary when expanding from single- to multi-robot applications, as the cost can increase exponentially with fleet size. SLAM is essential for a robot to perceive and localize within its environment in order to perform applications such as cave exploration, disaster assistance, and remote inspection. For improved efficiency, a fleet of robots can be employed to combine maps for multi-robot SLAM. Squeaky is an affordable quadrupedal robot designed to have easily adaptable hardware and software, capable of creating a merged map over a shared network from multiple robots, and available open-source for the benefit of the research community.
Authors:Chao Huang, Wenshuo Zang, Carlo Pinciroli, Zhi Jane Li, Taposh Banerjee, Lili Su, Rui Liu
Abstract:
Compared with single robots, Multi-Robot Systems (MRS) can perform missions more efficiently due to the presence of multiple members with diverse capabilities. However, deploying an MRS in large real-world environments is still challenging due to varied and uncertain obstacles (e.g., building clusters and trees). Without a clear understanding of how environmental uncertainty affects performance, an MRS cannot flexibly adjust its behaviors (e.g., teaming, load sharing, trajectory planning) to ensure both environment adaptation and task accomplishment. In this work, a novel joint preference landscape learning and behavior adjusting framework (PLBA) is designed. PLBA efficiently integrates real-time human guidance into MRS coordination and utilizes Sparse Variational Gaussian Processes with Varying Output Noise to quickly assess human preferences by leveraging spatial correlations between environment characteristics. An optimization-based behavior-adjusting method then safely adapts MRS behaviors to environments. To validate PLBA's effectiveness in MRS behavior adaptation, a flood disaster search and rescue task was designed. Twenty human users provided 1,764 feedback instances reflecting their preferences over MRS behaviors related to "task quality", "task progress", and "robot safety". The prediction accuracy and adaptation speed results show the effectiveness of PLBA in preference learning and MRS behavior adaptation.
Authors:Shree Charran R, Rahul Kumar Dubey
Abstract:
Rapid motorization in emerging economies such as India has created severe enforcement asymmetries, with over 11 million recorded violations in 2023 against a human policing density of roughly one officer per 4000 vehicles. Traditional surveillance and manual ticketing cannot scale to this magnitude, motivating the need for an autonomous, cooperative, and energy-efficient edge AI perception infrastructure. This paper presents a real-time roadside perception node for multi-class traffic violation analytics and safety event dissemination within a connected and intelligent vehicle ecosystem. The node integrates YOLOv8 Nano for high-accuracy multi-object detection, DeepSORT for temporally consistent vehicle tracking, and a rule-guided OCR post-processing engine capable of recognizing degraded or multilingual license plates compliant with MoRTH AIS 159 and ISO 7591 visual contrast standards. Deployed on an NVIDIA Jetson Nano with a 128-core Maxwell GPU and optimized via TensorRT FP16 quantization, the system sustains 28 to 30 frames per second inference at 9.6 W, achieving 97.7 percent violation detection accuracy and 84.9 percent OCR precision across five violation classes, namely signal jumping, zebra crossing breach, wrong-way driving, illegal U-turn, and speeding, without manual region-of-interest calibration. Comparative benchmarking against YOLOv4-Tiny, PP-YOLOE-S, and NanoDet-Plus demonstrates a 10.7 percent mean average precision gain and a 1.4 times accuracy-per-watt improvement. Beyond enforcement, the node publishes standardized CAM- and DENM-type safety events to connected vehicles and intelligent transportation system backends via V2X protocols, demonstrating that roadside edge AI analytics can augment cooperative perception and proactive road safety management within the IEEE Intelligent Vehicles ecosystem.
Authors:Shyam prasad reddy Kaitha, Hongrui Yu
Abstract:
Multi-robot task allocation in construction automation has traditionally relied on optimization methods such as Dynamic Programming and Reinforcement Learning. This research introduces the LangGraph-based Task Allocation Agent (LTAA), an LLM-driven framework that integrates phase-adaptive allocation strategies, multi-stage validation with hierarchical retries, and dynamic prompting for efficient robot coordination. Although recent LLM approaches show potential for construction robotics, they largely lack rigorous validation and benchmarking against established algorithms. This paper presents the first systematic comparison of LLM-based task allocation with traditional methods in construction scenarios. The study validates LLM feasibility through SMART-LLM replication and addresses implementation challenges using a Self-Corrective Agent Architecture. LTAA leverages natural-language reasoning combined with structured validation mechanisms, achieving major computational gains, reducing token usage by 94.6% and allocation time by 86% through dynamic prompting. The framework adjusts its strategy across phases: emphasizing execution feasibility early and workload balance in later allocations. The authors evaluate LTAA against Dynamic Programming, Q-learning, and Deep Q-Network (DQN) baselines using construction operations from the TEACh human-robot collaboration dataset. In the Heavy Excels setting, where robots have strong task specializations, LTAA achieves 77% task completion with superior workload balance, outperforming all traditional methods. These findings show that LLM-based reasoning with structured validation can match established optimization algorithms while offering additional advantages such as interpretability, adaptability, and the ability to update task logic without retraining.
Authors:Jonathan S. Kent, Eliana Stefani, Brian K. Plancher
Abstract:
Fast, efficient, robust communication during wildfire and other emergency responses is critical. One way to achieve this is by coordinating swarms of autonomous aerial vehicles carrying communications equipment to form an ad-hoc network connecting emergency response personnel to both each other and central command. However, operating in such extreme environments may lead to individual networking agents being damaged or rendered inoperable, which could bring down the network and interrupt communications. To overcome this challenge and enable multi-agent UAV networking in difficult environments, this paper introduces and formalizes the problem of Robust Task Networking Under Attrition (RTNUA), which extends connectivity maintenance in multi-robot systems to explicitly address proactive redundancy and attrition recovery. We introduce Physics-Informed Robust Employment of Multi-Agent Networks ($Φ$IREMAN), a topological algorithm leveraging physics-inspired potential fields to solve this problem. Through simulation across 25 problem configurations, $Φ$IREMAN consistently outperforms the DCCRS baseline, and on large-scale problems with up to 100 tasks and 500 drones, maintains $>99.9\%$ task uptime despite substantial attrition, demonstrating both effectiveness and scalability.
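The generic potential-field idea behind such physics-inspired placement can be sketched as follows (an illustrative attractive-repulsive field only, not the paper's $Φ$IREMAN algorithm; all gains and names are assumptions): each relay drone feels a pull toward the tasks it serves and a push away from nearby teammates.

import numpy as np

def potential_field_step(drones, tasks, k_att=0.1, k_rep=0.05, rep_radius=1.0):
    """One gradient step on a simple attractive/repulsive potential.

    drones: (M, 2) relay positions; tasks: (T, 2) task positions.
    Simplified illustration; the paper's topological redundancy and
    attrition-recovery logic are not modeled here.
    """
    new = drones.copy()
    for i, d in enumerate(drones):
        force = k_att * (tasks - d).sum(axis=0) / len(tasks)  # pull toward tasks
        for j, other in enumerate(drones):
            if i == j:
                continue
            diff = d - other
            dist = np.linalg.norm(diff) + 1e-9
            if dist < rep_radius:                              # push apart when crowded
                force += k_rep * diff / dist**2
        new[i] = d + force
    return new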
Authors:Chenyi Wang, Zhaowei Li, Ming F. Li, Wujie Wen
Abstract:
Multi-agent cooperative perception (CP) promises to overcome the inherent occlusion and sensing-range limitations of single-agent systems (e.g., autonomous driving). However, its practicality is severely constrained by the limited communication bandwidth. Existing approaches attempt to improve bandwidth efficiency via compression or heuristic message selection, without considering the semantic relevance or cross-agent redundancy of sensory data. We argue that a practical CP system must maximize the contribution of every transmitted bit to the final perception task, by extracting and transmitting semantically essential and non-redundant data. In this paper, we formulate a joint semantic feature encoding and transmission problem, which aims to maximize CP accuracy under limited bandwidth. To solve this problem, we introduce JigsawComm, an end-to-end trained, semantic-aware, and communication-efficient CP framework that learns to ``assemble the puzzle'' of multi-agent feature transmission. It uses a regularized encoder to extract semantically-relevant and sparse features, and a lightweight Feature Utility Estimator to predict the contribution of each agent's features to the final perception task. The resulting meta utility maps are exchanged among agents and leveraged to compute a provably optimal transmission policy, which selects features from agents with the highest utility score for each location. This policy inherently eliminates redundancy and achieves a scalable $\mathcal{O}(1)$ communication cost as the number of agents increases. On the benchmarks OPV2V and DAIR-V2X, JigsawComm reduces the total data volume by up to $>$500$\times$ while achieving matching or superior accuracy compared to state-of-the-art methods.
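The selection policy described above can be illustrated with a toy sketch: given per-agent utility maps, keep, for every spatial location, features only from the single highest-utility agent. Names and array shapes are assumptions; the actual JigsawComm encoder and utility estimator are learned and operate under a bandwidth budget.

import numpy as np

def transmission_mask(utility):
    """utility: (A, H, W) per-agent meta utility maps.

    Returns an (A, H, W) boolean mask that keeps, for each spatial cell,
    only the features of the highest-utility agent, eliminating cross-agent
    redundancy in what gets transmitted. Toy illustration of the rule only.
    """
    best = utility.argmax(axis=0)                        # (H, W) winning agent per cell
    agents = np.arange(utility.shape[0])[:, None, None]  # (A, 1, 1)
    return agents == best[None, :, :]                    # broadcast to (A, H, W)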
Authors:Jaskirat Singh, Rohan Chandra
Abstract:
Multi-robot systems are increasingly being used for critical applications such as rescuing injured people, delivering food and medicines, and monitoring key areas. These applications usually involve navigating at high speeds through constrained spaces such as small gaps. Navigating such constrained spaces becomes particularly challenging when the space is crowded with multiple heterogeneous agents, all of which have urgent priorities. What makes the problem even harder is that during an active response situation, roles and priorities can change on a dime without the other agents being informed. In order to complete missions in such environments, robots must not only be safe, but also agile, able to dodge and change course at a moment's notice. In this paper, we propose FACA, a fair and agile collision avoidance approach where robots coordinate their tasks by talking to each other via natural language (just as people do). In FACA, robots balance safety with agility via a novel artificial potential field algorithm that creates an automatic ``roundabout'' effect whenever a conflict arises. Our experiments show that FACA achieves a substantial improvement in efficiency, completing missions more than 3.5X faster than baselines with a time reduction of over 70%, while maintaining robust safety margins.
Authors:Dave van der Meer, Loïck P. Chovet, Gabriel M. Garcia, Abhishek Bera, Miguel A. Olivares-Mendez
Abstract:
The European Space Agency (ESA) and the European Space Resources Innovation Centre (ESRIC) created the Space Resources Challenge to invite researchers and companies to propose innovative solutions for Multi-Robot Systems (MRS) space prospection. This paper proposes the Resilient Exploration And Lunar Mapping System 2 (REALMS2), an MRS framework for planetary prospection and mapping. Based on Robot Operating System version 2 (ROS 2) and enhanced with Visual Simultaneous Localisation And Mapping (vSLAM) for map generation, REALMS2 uses a mesh network for a robust ad hoc network. A single graphical user interface (GUI) controls all the rovers, providing a simple overview of the robotic mission. This system is designed for heterogeneous multi-robot exploratory missions, tackling the challenges presented by extraterrestrial environments. REALMS2 was used during the second field test of the ESA-ESRIC Challenge and allowed the team to map around 60% of the area using three homogeneous rovers while handling communication delays and blackouts.
Authors:Hadas C. Kuzmenko, David Ehevich, Oren Gal
Abstract:
Marine oil spills pose grave environmental and economic risks, threatening marine ecosystems, coastlines, and dependent industries. Predicting and managing oil spill trajectories is highly complex, due to the interplay of physical, chemical, and environmental factors such as wind, currents, and temperature, which makes timely and effective response challenging. Accurate real-time trajectory forecasting and coordinated mitigation are vital for minimizing the impact of these disasters. This study introduces an integrated framework combining a multi-agent swarm robotics system built on the MOOS-IvP platform with Liquid Time-Constant Neural Networks (LTCNs). The proposed system fuses adaptive machine learning with autonomous marine robotics, enabling real-time prediction, dynamic tracking, and rapid response to evolving oil spills. By leveraging LTCNs--well-suited for modeling complex, time-dependent processes--the framework achieves real-time, high-accuracy forecasts of spill movement. Swarm intelligence enables decentralized, scalable, and resilient decision-making among robot agents, enhancing collective monitoring and containment efforts. Our approach was validated using data from the Deepwater Horizon spill, where the LTC-RK4 model achieved 0.96 spatial accuracy, surpassing LSTM approaches by 23%. The integration of advanced neural modeling with autonomous, coordinated robotics demonstrates substantial improvements in prediction precision, flexibility, and operational scalability. Ultimately, this research advances the state-of-the-art for sustainable, autonomous oil spill management and environmental protection by enhancing both trajectory prediction and response coordination.
Authors:Federica Di Lauro, Domenico G. Sorrenti, Miguel Angel Sotelo
Abstract:
Multi-robot SLAM aims at localizing and building a map with multiple robots interacting with each other. In the work described in this article, we analyze the pipeline of a decentralized LiDAR SLAM system to study the current limitations of the state of the art, and we identify a significant source of failures: loop detection produces too many false positives. We therefore develop and propose a new heuristic to overcome these limitations. The environment taken as reference in this work is the highly challenging case of underground tunnels. We also highlight potential new research areas that remain under-explored.
Authors:Maria Eduarda Silva de Macedo, Ana Paula Chiarelli de Souza, Roberto Silvio Ubertino Rosso, Yuri Kaszubowski Lopes
Abstract:
The deployment of simple emergent behaviors in swarm robotics has been well-rehearsed in the literature. A recent study has shown how self-aggregation is possible in a multitask approach -- where multiple self-aggregation task instances occur concurrently in the same environment. The multitask approach poses new challenges, in particular, how the dynamics of each group impact the performance of the others. So far, multitask self-aggregation of groups of robots either generates a circular formation that is not fully compact or is not fully autonomous. In this paper, we present a multitask self-aggregation approach in which groups of homogeneous robots sort themselves into different compact clusters, relying solely on a line-of-sight sensor. Our multitask self-aggregation behavior was able to scale well and achieve a compact formation. We report scalability results from a series of simulation trials with different configurations of the number of groups and the number of robots per group. We were able to improve the performance of the multitask self-aggregation behavior in terms of the compactness of the clusters, while keeping the proportion of clustered robots found in other studies.
Authors:Thayanne França da Silva, José Everardo Bessa Maia
Abstract:
Understanding leadership dynamics in collective behavior is a key challenge in animal ecology, swarm robotics, and intelligent transportation. Traditional information-theoretic approaches, including Transfer Entropy (TE) and Time-Lagged Mutual Information (TLMI), have been widely used to infer leader-follower relationships but face critical limitations in noisy or short-duration datasets due to their reliance on robust probability estimations. This study proposes a method based on dynamic network inference using time-lagged correlations across multiple kinematic variables: velocity, acceleration, and direction. Our approach constructs directed influence graphs over time, enabling the identification of leadership patterns without the need for large volumes of data or parameter-sensitive discretization. We validate our method through two multi-agent simulations in NetLogo: a modified Vicsek model with informed leaders and a predator-prey model featuring coordinated and independent wolf groups. Experimental results demonstrate that the network-based method outperforms TE and TLMI in scenarios with limited spatiotemporal observations, ranking true leaders at the top of influence metrics more consistently than TE and TLMI.
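A stripped-down version of the lagged-correlation idea (illustrative only; the study combines velocity, acceleration, and direction into full directed influence graphs over time): score an i-leads-j relationship by the peak cross-correlation of their kinematic series when j is lagged behind i.

import numpy as np

def lagged_influence(v_i, v_j, max_lag=10):
    """Peak correlation of v_j lagged behind v_i over positive lags.

    v_i, v_j: 1D time series (e.g., speed) of two agents.
    A large positive score at some lag tau suggests i leads j by tau steps;
    computing this for all ordered pairs yields a directed influence graph.
    """
    scores = []
    for tau in range(1, max_lag + 1):
        a, b = v_i[:-tau], v_j[tau:]
        if a.std() == 0 or b.std() == 0:
            scores.append(0.0)
            continue
        scores.append(np.corrcoef(a, b)[0, 1])
    return max(scores)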
Authors:YR Darr, MA Niazi
Abstract:
The self-organization of robots to form structures and shapes is a compelling application of swarm robotic systems. It involves a large number of autonomous robots with heterogeneous behaviors, coordination among them, and their interaction with a dynamic environment. This process of structure formation constitutes a complex system that needs to be modeled with a suitable modeling approach. The formal specification approach, along with other formal methods, has been used to model the behavior of robots in a swarm; however, to the best of our knowledge, it has not been used to model the self-organization process in swarm robotic systems for shape formation. In this paper, we use a formal specification approach to model the shape formation task of swarm robots. We use the Z (Zed) formal specification language, which is state-based, to model the states of the entities of the system. We demonstrate the effectiveness of Z for self-organized shape formation. The presented formal specification model provides an outline for designing and implementing swarm robotic systems for the formation of complex shapes and structures. It also provides a foundation for modeling the complex shape formation process for swarm robotics using a multi-agent system in a simulation-based environment. Keywords: Swarm robotics, Self-organization, Formal specification, Complex systems
Authors:Jose Fernando Contreras-Monsalvo, Victor Dossetti, Blanca Susana Soto-Cruz
Abstract:
In this work, we fabricated and studied two designs for omnidirectional vision sensors for swarm robotics, based on catadioptric systems consisting of a mirror with rotational symmetry, eight discrete infrared photodiodes and a single LED, in order to provide localization and navigation abilities for mobile robotic agents. We considered two arrangements for the photodiodes: one in which they point upward into the mirror, and one in which they point outward, perpendicular to the mirror. To determine which design offers a better field of view on the plane, as well as detection of distance and orientation between two agents, we developed a test rail with three degrees of freedom to experimentally and systematically measure the signal registered by the photodiodes of a given sensor (in a single readout) from the light emitted by another as functions of the distance and orientation. Afterwards, we processed and analyzed the experimental data to develop mathematical models for the mean response of a photodiode in each design. Finally, by numerically inverting the models, we compared the two designs in terms of their accuracy. Our results show that the design with the photodiodes pointing upward resolves better the distance, while the other resolves better the orientation of the emitting agent, both providing an omnidirectional field of view.
Authors:Norah K. Alghamdi, Shinkyu Park
Abstract:
We propose an opinion-driven navigation framework for multi-robot traversal through a narrow corridor. Our approach leverages a multi-agent decision-making model known as the Nonlinear Opinion Dynamics (NOD) to address the narrow corridor passage problem, formulated as a multi-robot navigation game. By integrating the NOD model with a multi-robot path planning algorithm, we demonstrate that the framework effectively reduces the likelihood of deadlocks during corridor traversal. To ensure scalability with an increasing number of robots, we introduce a game reduction technique that enables efficient coordination in larger groups. Extensive simulation studies are conducted to validate the effectiveness of the proposed approach.
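For orientation, one commonly cited single-option form of nonlinear opinion dynamics from the NOD literature (an illustrative form only; the paper's multi-option, game-coupled formulation is richer) is $\dot{z}_i = -d_i z_i + u_i \tanh\big(\alpha_i z_i + \gamma_i \sum_{k \neq i} a_{ik} z_k\big) + b_i$, where $z_i$ encodes robot $i$'s opinion (e.g., pass first versus yield), $d_i$ is a damping rate, $u_i$ an attention gain, $\alpha_i$ and $\gamma_i$ self- and social-coupling strengths over the interaction graph $a_{ik}$, and $b_i$ a bias. Deadlock corresponds to the neutral opinion $z_i = 0$; sufficiently large attention destabilizes this state so that robots commit to distinct passing orders.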
Authors:Pranav Kedia, Madhav Rao
Abstract:
GenGrid is a novel, comprehensive, open-source, distributed platform intended for conducting extensive swarm robotic experiments. The modular platform is designed to run swarm robotics experiments compatible with different types of mobile robots, such as Colias, Kilobot, and E-puck. The platform offers programmable control over the experimental setup and its parameters and acts as a tool to collect swarm robot data, including localization, sensory feedback, messaging, and interaction. GenGrid is designed as a modular grid of attachable computing nodes that offers bidirectional communication between the robotic agent and grid nodes and within grids. The paper describes the hardware and software architecture design of the GenGrid system. Further, it discusses some common experimental studies covering multi-robot and swarm robotics to showcase the platform's use. A GenGrid of 25 homogeneous cells, each with identical sensing and communication characteristics and a footprint of 37.5 cm x 37.5 cm, exhibits multiple capabilities with minimal resources. The open-source hardware platform is handy for running swarm experiments, including robot hopping based on multiple gradients, collective transport, shepherding, continuous pheromone deposition, and subsequent evaporation. The low-cost, modular, and open-source platform is significant in the swarm robotics research community, which is currently driven by commercial platforms that allow minimal modifications.
Authors:Juan Bravo-Arrabal, Ricardo Vázquez-Martín, J. J. Fernández-Lozano, Alfonso García-Cerezo
Abstract:
This paper presents field-tested use cases from Search and Rescue (SAR) missions, highlighting the co-design of mobile robots and communication systems to support Edge-Cloud architectures based on 5G Standalone (SA). The main goal is to contribute to the effective cooperation of multiple robots and first responders. Our field experience includes the development of Hybrid Wireless Sensor Networks (H-WSNs) for risk and victim detection, smartphones integrated into the Robot Operating System (ROS) as Edge devices for mission requests and path planning, real-time Simultaneous Localization and Mapping (SLAM) via Multi-Access Edge Computing (MEC), and implementation of Uncrewed Ground Vehicles (UGVs) for victim evacuation in different navigation modes. These experiments, conducted in collaboration with actual first responders, underscore the need for intelligent network resource management, balancing low-latency and high-bandwidth demands. Network slicing is key to ensuring critical emergency services are performed despite challenging communication conditions. The paper identifies architectural needs, lessons learned, and challenges to be addressed by 6G technologies to enhance emergency response capabilities.
Authors:Xinxin Feng, Haoran Sun, Haifeng Zheng
Abstract:
Vehicle-to-Infrastructure (V2I) collaborative perception leverages data collected by the infrastructure's sensors to enhance vehicle perceptual capabilities. LiDAR, as a commonly used sensor in cooperative perception, is widely equipped in intelligent vehicles and infrastructure. However, its superior performance comes with a correspondingly high cost. To achieve low-cost V2I, reducing the cost of LiDAR is crucial. Therefore, we study adopting low-resolution LiDAR on the vehicle to minimize cost as much as possible. However, simply reducing the resolution of the vehicle's LiDAR results in sparse point clouds, making distant small objects even more blurred. Additionally, traditional communication methods have relatively low bandwidth utilization efficiency. These factors pose challenges for us. To balance cost and perceptual accuracy, we propose a new collaborative perception framework, namely LCV2I. LCV2I uses data collected from cameras and low-resolution LiDAR as input. It also employs feature offset correction modules and regional feature enhancement algorithms to improve feature representation. Finally, we use a regional difference map and a regional score map to assess the value of collaboration content, thereby improving communication bandwidth efficiency. In summary, our approach achieves high perceptual performance while substantially reducing the demand for high-resolution sensors on the vehicle. To evaluate this algorithm, we conduct 3D object detection in the real-world scenario of DAIR-V2X, demonstrating that the performance of LCV2I consistently surpasses that of currently existing algorithms.
Authors:Xiaoshan Lin, Roberto Tron
Abstract:
This work addresses the problem of multi-robot coordination under unknown robot transition models, ensuring that tasks specified by Time Window Temporal Logic are satisfied with user-defined probability thresholds. We present a bi-level framework that integrates (i) high-level task allocation, where tasks are assigned based on the robots' estimated task completion probabilities and expected rewards, and (ii) low-level distributed policy learning and execution, where robots independently optimize auxiliary rewards while fulfilling their assigned tasks. To handle uncertainty in robot dynamics, our approach leverages real-time task execution data to iteratively refine expected task completion probabilities and rewards, enabling adaptive task allocation without explicit robot transition models. We theoretically validate the proposed algorithm, demonstrating that the task assignments meet the desired probability thresholds with high confidence. Finally, we demonstrate the effectiveness of our framework through comprehensive simulations.
Authors:Federico Pratissoli, Mattia Mantovani, Amanda Prorok, Lorenzo Sabattini
Abstract:
Multi-robot systems are essential for environmental monitoring, particularly for tracking spatial phenomena such as pollution, soil minerals, and water salinity. This study addresses the challenge of deploying a multi-robot team for optimal coverage in environments where the density distribution, describing areas of interest, is unknown and changes over time. We propose a fully distributed control strategy that uses Gaussian Processes (GPs) to model the spatial field and balance the trade-off between learning the field and optimally covering it. Unlike existing approaches, we address a more realistic scenario by handling time-varying spatial fields, where the exploration-exploitation trade-off is dynamically adjusted over time. Each robot operates locally, using only its own collected data and the information shared by neighboring robots. To address the computational limits of GPs, the algorithm efficiently manages the volume of data by selecting only the most relevant samples for the process estimation. The performance of the proposed algorithm is evaluated through several simulations and experiments, incorporating real-world data phenomena to validate its effectiveness.
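A toy sketch of the learn-then-cover loop described above (not the authors' distributed algorithm; function names, the RBF length scale, and the exploration bonus are illustrative assumptions): fit a GP to the collected field samples, then weight a Voronoi-centroid step by the GP mean plus an uncertainty bonus.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def weighted_centroid_step(i, positions, X_obs, y_obs, grid, beta=1.0, gain=0.5):
    """One exploration-exploitation coverage step for robot i.

    X_obs, y_obs: field samples collected so far (own and neighbors' data).
    grid: (G, 2) discretization of the environment.
    beta trades off covering the estimated field vs. exploring uncertain areas.
    Toy illustration; the paper handles time-varying fields and data pruning.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_obs, y_obs)
    mu, std = gp.predict(grid, return_std=True)
    density = np.clip(mu, 0, None) + beta * std            # importance of each grid cell
    d2 = ((grid[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    mine = d2.argmin(axis=1) == i                           # Voronoi cell of robot i
    cell, w = grid[mine], density[mine]
    centroid = (cell * w[:, None]).sum(0) / (w.sum() + 1e-9)
    return positions[i] + gain * (centroid - positions[i])  # move toward weighted centroid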
Authors:Seyed Amir Tafrishi, Mikhail Svinin, Kenji Tahara
Abstract:
This paper explores an eclectic range of path-planning methodologies engineered for rolling surfaces. Our focus is on the kinematic intricacies of rolling contact systems, which are investigated through a motion planning lens. Beyond summarizing the approaches to single-contact rotational surfaces, we explore the challenging domain of spin-rolling multi-contact systems. Our work proposes solutions for the higher-dimensional problem of multiple rotating objects in contact. Venturing beyond kinematics, these methodologies find application across a spectrum of domains, including rolling robots, reconfigurable swarm robotics, micro/nano manipulation, and nonprehensile manipulations. Through meticulously examining established planning strategies, we unveil their practical implementations in various real-world scenarios, from intricate dexterous manipulation tasks to the nimble manoeuvring of rolling robots and even shape planning of multi-contact swarms of particles. This study highlights the persistent challenges and unexplored frontiers of robotics, intricately linked to both path planning and mechanism design. As we illuminate existing solutions, we also set the stage for future breakthroughs in this dynamic and rapidly evolving field by highlighting the critical importance of addressing rolling contact problems.
Authors:Sumit Paul, Danh Lephuoc, Manfred Hauswirth
Abstract:
In the autonomous vehicle and self-driving paradigm, cooperative perception, or exchanging sensor information among vehicles over wireless communication, has added a new dimension. Generally, an autonomous vehicle is a special type of robot that requires real-time, highly reliable sensor inputs due to functional safety. Autonomous vehicles are equipped with a considerable number of sensors to provide the different sensor data required to make driving decisions and to share with other surrounding vehicles. The inclusion of the Data Distribution Service (DDS) as a communication middleware in ROS2 has proved its potential capability as a reliable real-time distributed system. DDS comes with a scoping mechanism known as a domain. Whenever a ROS2 process is initiated, it creates a DDS participant. It is important to note that there is a limit to the number of participants allowed in a single domain.
The efficient handling of numerous in-vehicle sensors and their messages demands the use of multiple ROS2 nodes in a single vehicle. Additionally, in the cooperative perception paradigm, a significant number of ROS2 nodes can be required when a vehicle functions as a single ROS2 node. These ROS2 nodes cannot all be part of a single domain due to the DDS participant limitation; thus, cross-domain communication is unavoidable. Moreover, there are different vendor-specific implementations of DDS, and each vendor has its own configurations, which inevitably shape communication between ROS2 nodes. The communication between vehicles, robots, or ROS2 nodes depends directly on the vendor-specific configuration, data type, data size, and the DDS implementation used as middleware; in our study, we evaluate and investigate the limitations, capabilities, and prospects of cross-domain communication for various vendor-specific DDS implementations and diverse sensor data types.
Authors:Xin Ye, Karl Handwerker, Sören Hohmann
Abstract:
The physical coupling between robots has the potential to improve the capabilities of multi-robot systems in challenging manufacturing processes. However, the path tracking accuracy of physically coupled robots is not studied adequately, especially considering the uncertain kinematic parameters, the mechanical elasticity, and the built-in controllers of off-the-shelf robots. This paper addresses these issues with a novel differential-algebraic system model which is verified against measurement data from real execution. The uncertain kinematic parameters are estimated online to adapt the model. Consequently, an adaptive model predictive controller is designed as a coordinator between the robots. The controller achieves a path tracking error reduction of 88.6% compared to the state-of-the-art benchmark in the simulation.
Authors:Guilherme S. Y. Giardini, John F. Hardy, Carlo R. da Cunha
Abstract:
Understanding the mechanisms behind emergent behaviors in multi-agent systems is critical for advancing fields such as swarm robotics and artificial intelligence. In this study, we investigate how neural networks evolve to control agents' behavior in a dynamic environment, focusing on the relationship between the network's complexity and collective behavior patterns. By performing quantitative and qualitative analyses, we demonstrate that the degree of network non-linearity correlates with the complexity of emergent behaviors. Simpler behaviors, such as lane formation and laminar flow, are characterized by more linear network operations, while complex behaviors like swarming and flocking show highly non-linear neural processing. Moreover, specific environmental parameters, such as moderate noise, broader field of view, and lower agent density, promote the evolution of non-linear networks that drive richer, more intricate collective behaviors. These results highlight the importance of tuning evolutionary conditions to induce desired behaviors in multi-agent systems, offering new pathways for optimizing coordination in autonomous swarms. Our findings contribute to a deeper understanding of how neural mechanisms influence collective dynamics, with implications for the design of intelligent, self-organizing systems.
Authors:Gengyuan Cai, Luosong Guo, Xiangmao Chang
Abstract:
The autonomous exploration of environments by multi-robot systems is a critical task with broad applications in rescue missions, exploration endeavors, and beyond. Current approaches often rely on either greedy frontier selection or end-to-end deep reinforcement learning (DRL) methods, yet these methods are frequently hampered by limitations such as short-sightedness, overlooking long-term implications, and convergence difficulties stemming from the intricate high-dimensional learning space. To address these challenges, this paper introduces an innovative integration strategy that combines the low-dimensional action space efficiency of frontier-based methods with the far-sightedness and optimality of DRL-based approaches. We propose a three-tiered planning framework that first identifies frontiers in free space, creating a sparse map representation that lightens data transmission burdens and reduces the DRL action space's dimensionality. Subsequently, we develop a multi-graph neural network (mGNN) that incorporates states of potential targets and robots, leveraging policy-based reinforcement learning to compute affinities, thereby superseding traditional heuristic utility values. Lastly, we implement local routing planning through subsequence search, which avoids exhaustive sequence traversal. Extensive validation across diverse scenarios and comprehensive simulation results demonstrate the effectiveness of our proposed method. Compared to baseline approaches, our framework achieves environmental exploration with fewer time steps and a notable reduction of over 30% in data transmission, showcasing its superiority in terms of efficiency and performance.
Authors:Shiba Biswal, Karthik Elamvazhuthi, Rishi Sonthalia
Abstract:
This paper investigates the use of transformers to approximate the mean-field dynamics of interacting particle systems exhibiting collective behavior. Such systems are fundamental in modeling phenomena across physics, biology, and engineering, including opinion formation, biological networks, and swarm robotics. The key characteristic of these systems is that the particles are indistinguishable, leading to permutation-equivariant dynamics. First, we empirically demonstrate that transformers are well-suited for approximating a variety of mean field models, including the Cucker-Smale model for flocking and milling, and the mean-field system for training two-layer neural networks. We validate our numerical experiments via mathematical theory. Specifically, we prove that if a finite-dimensional transformer effectively approximates the finite-dimensional vector field governing the particle system, then the $L_2$ distance between the \textit{expected transformer} and the infinite-dimensional mean-field vector field can be uniformly bounded by a function of the number of particles observed during training. Leveraging this result, we establish theoretical bounds on the distance between the true mean-field dynamics and those obtained using the transformer.
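For reference, the finite-particle Cucker-Smale flocking model mentioned above reads, in a standard form from the literature (the milling variant and the paper's exact normalization may differ): $\dot{x}_i = v_i, \quad \dot{v}_i = \frac{1}{N}\sum_{j=1}^{N}\psi(\lVert x_j - x_i\rVert)\,(v_j - v_i), \quad \psi(r) = \frac{K}{(1+r^2)^{\beta}}$. Its mean-field limit transports a particle density under the averaged interaction kernel, and the transformer is trained to approximate the resulting velocity field from observed particle configurations.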
Authors:Ang He, Xi-mei Wu, Xiao-bin Guo, Li-bin Liu
Abstract:
The evolving field of mobile robotics has indeed increased the demand for simultaneous localization and mapping (SLAM) systems. To augment the localization accuracy and mapping efficacy of SLAM, we refined the core module of the SLAM system. Within the feature matching phase, we introduced cross-validation matching to filter out mismatches. In the keyframe selection strategy, an exponential threshold function is constructed to quantify the keyframe selection process. Compared with a single robot, the multi-robot collaborative SLAM (CSLAM) system substantially improves task execution efficiency and robustness. By employing a centralized structure, we formulate a multi-robot SLAM system and design a coarse-to-fine matching approach for multi-map point cloud registration. Our system, built upon ORB-SLAM3, underwent extensive evaluation utilizing the TUM RGB-D, EuRoC MAV, and TUM_VI datasets. The experimental results demonstrate a significant improvement in the positioning accuracy and mapping quality of our enhanced algorithm compared to those of ORB-SLAM3, with a 12.90% reduction in the absolute trajectory error.
Authors:Rishabh Dev Yadav
Abstract:
Multi-robot formation control has various applications in domains such as vehicle troops, platoons, payload transportation, and surveillance. Maintaining formation in a vehicle platoon requires designing a suitable control scheme that can tackle external disturbances and uncertain system parameters while maintaining a predefined safe distance between the robots. A crucial challenge in this context is dealing with the unknown/uncertain friction forces between wheels and the ground, which vary with changes in road surface, wear in tires, and speed of the vehicle. Although state-of-the-art adaptive controllers can handle a priori bounded uncertainties, they struggle with accurately modeling and identifying frictional forces, which are often state-dependent and cannot be a priori bounded. This thesis proposes a new adaptive sliding mode controller for wheeled mobile robot-based vehicle platoons that can handle the unknown and complex behavior of frictional forces without prior knowledge of their parameters and structures. The controller uses the adaptive sliding mode control techniques to regulate the platoon's speed and maintain a predefined inter-robot distance, even in the presence of external disturbances and uncertain system parameters. This approach involves a two-stage process: first, the kinematic controller calculates the desired velocities based on the desired trajectory; and second, the dynamics model generates the commands to achieve the desired motion. By separating the kinematics and dynamics of the robot, this approach can simplify the control problem and allow for more efficient and robust control of the wheeled mobile robot.
Authors:Arun Muthukkumar
Abstract:
Pose estimation is essential for many applications within computer vision and robotics. Despite its uses, few works provide rigorous uncertainty quantification for poses under dense or learned models. We derive a closed-form lower bound on the covariance of camera pose estimates by treating a differentiable renderer as a measurement function. Linearizing image formation with respect to a small pose perturbation on the manifold yields a render-aware Cramér-Rao bound. Our approach reduces to classical bundle-adjustment uncertainty, ensuring continuity with vision theory. It also naturally extends to multi-agent settings by fusing Fisher information across cameras. Our statistical formulation has downstream applications for tasks such as cooperative perception and novel view synthesis without requiring explicit keypoint correspondences.
Authors:Hossein B. Jond
Abstract:
Collective behaviors such as swarming and flocking emerge from simple, decentralized interactions in biological systems. Existing models, such as Vicsek and Cucker-Smale, lack collision avoidance, whereas the Olfati-Saber model imposes rigid formations, limiting their applicability in swarm robotics. To address these limitations, this paper proposes a minimal yet expressive model that governs agent dynamics using relative positions, velocities, and local density, modulated by two tunable parameters: the spatial offset and kinetic offset. The model achieves spatially flexible, collision-free behaviors that reflect naturalistic group dynamics. Furthermore, we extend the framework to cognitive autonomous systems, enabling energy-aware phase transitions between swarming and flocking through adaptive control parameter tuning. This cognitively inspired approach offers a robust foundation for real-world applications in multi-robot systems, particularly autonomous aerial swarms.
Authors:Massoud Pourmandi
Abstract:
The proposal introduces an innovative drone swarm perception system that aims to address computational limitations, low-bandwidth communication, and real-time scene reconstruction. The framework enables efficient multi-agent 3D/4D scene synthesis through federated learning of a shared diffusion model, lightweight YOLOv12-based semantic extraction, and local NeRF updates, while maintaining privacy and scalability. The framework redesigns generative diffusion models for joint scene reconstruction and improves cooperative scene understanding while adding semantic-aware compression protocols. The approach can be validated through simulations and potential real-world deployment on drone testbeds, positioning it as a disruptive advancement in multi-agent AI for autonomous systems.
Authors:Ashish Kumar
Abstract:
Efficient exploration is a well-known problem in deep reinforcement learning, and it is exacerbated in multi-agent reinforcement learning due to the intrinsic complexities of such algorithms. There are several approaches by which multiple agents operating in an environment can explore it efficiently and learn to solve tasks; among these, the idea of expert exploration is investigated in this work. More specifically, this work investigates the application of large language models as expert planners for efficient exploration in planning-based tasks for multiple agents.
Authors:Teng Guo
Abstract:
The labeled MRPP (Multi-Robot Path Planning) problem involves routing robots from start to goal configurations efficiently while avoiding collisions. Despite progress in solution quality and runtime, its complexity and industrial relevance continue to drive research.
This dissertation introduces scalable MRPP methods with provable guarantees and practical heuristics. First, we study dense MRPP on 2D grids, relevant to warehouse and parcel systems. We propose the Rubik Table method, achieving $(1 + δ)$-optimal makespan (with $δ\in (0, 0.5]$) for up to $\frac{m_1 m_2}{2}$ robots, solving large instances efficiently and setting a new theoretical benchmark.
Next, we address real-world MRPP. We design optimal layouts for structured environments (e.g., warehouses, parking systems) and propose a puzzle-based system for dense, deadlock-free autonomous vehicle parking. We also extend MRPP to Reeds-Shepp robots, introducing motion primitives and smoothing techniques to ensure feasible, efficient paths under nonholonomic constraints. Simulations and real-world tests validate the approach in urban driving and robotic transport scenarios.
Authors:Michael P. Wozniak
Abstract:
Multi-agent systems seeking consensus may also have other objective functions to optimize, motivating research on multi-objective optimization in consensus. Several recent publications have explored this domain using methods such as weighted-sum optimization and penalization. This paper reviews the state of the art in consensus-based multi-objective optimization, poses a multi-agent lunar rover exploration problem seeking consensus and maximization of explored area, and obtains optimal edge weights and steering angles by applying SQP algorithms.
Authors:Logan Beaver
Abstract:
In this article we propose a game-theoretic approach to the multi-robot task allocation problem using the framework of global games. Each task is associated with a global signal, a real-valued number that captures the task execution progress and/or urgency. We propose a linear objective function for each robot in the system, which, for each task, increases with the global signal and decreases with the number of assigned robots. We provide conditions on the objective function hyperparameters that induce a mixed Nash equilibrium, i.e., solutions in which the robots are not all assigned to a single task. The resulting algorithm only requires inverting a matrix to determine a probability distribution over the robot assignments. We demonstrate the performance of our algorithm in simulation and provide directions for applications and future work.
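A toy version of the mixed-equilibrium computation, under an assumed linear objective (the paper's exact objective, hyperparameter conditions, and matrix form may differ): impose indifference across tasks and solve the resulting linear relations for the assignment probabilities.

import numpy as np

def mixed_assignment(signals, n_robots, a=1.0, b=0.5):
    """Toy mixed-strategy task distribution via an indifference condition.

    Assume each robot values task k as a*s_k - b*(expected robots on k)
    = a*s_k - b*n_robots*p_k. At a mixed equilibrium every task yields the
    same value c, so p_k = (a*s_k - c) / (b*n_robots) with sum(p) = 1.
    Illustrative linear solve only, not the paper's formulation.
    """
    s = np.asarray(signals, dtype=float)
    T = len(s)
    # From sum_k (a*s_k - c) / (b*n_robots) = 1, solve for the common value c.
    c = (a * s.sum() - b * n_robots) / T
    p = (a * s - c) / (b * n_robots)
    return p  # entries may be negative if the hyperparameter conditions are violated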
Authors:Andrea Giusti
Abstract:
Large-Scale Multi-Agent Systems (LS-MAS) consist of several autonomous components, interacting in a non-trivial way, so that the emerging behaviour of the ensemble depends on the individual dynamics of the components and their reciprocal interactions. These models can describe a rich variety of natural systems, as well as artificial ones, characterised by unparalleled scalability, robustness, and flexibility. Indeed, a crucial objective is devising efficient strategies to model and control the spatial behaviours of LS-MAS to achieve specific goals. However, the inherent complexity of these systems and the wide spectrum of their emerging behaviours pose significant challenges. The overarching goal of this thesis is, therefore, to advance methods for modelling, analyzing and controlling the spatial behaviours of LS-MAS, with applications to cellular populations and swarm robotics. The thesis begins with an overview of the existing literature, and is then organized into two distinct parts. In the context of swarm robotics, Part I deals with distributed control algorithms to spatially organize agents on geometric patterns. The contribution is twofold, encompassing both the development of original control algorithms, and providing a novel formal analysis, which makes it possible to guarantee the emergence of specific geometric patterns. In Part II, looking at the spatial behaviours of biological agents, experiments are carried out to study the movement of microorganisms and their response to light stimuli. This allows the derivation and parametrization of mathematical models that capture these behaviours, and pave the way for the development of innovative approaches for the spatial control of microorganisms. The results presented in the thesis were developed by leveraging formal analytical tools, simulations, and experiments, using innovative platforms and original computational frameworks.
Authors:James O'Keeffe
Abstract:
An active approach to fault tolerance, the combined processes of fault detection, diagnosis, and recovery, is essential for long term autonomy in robots -- particularly multi-robot systems and swarms. Previous efforts have primarily focussed on spontaneously occurring electro-mechanical failures in the sensors and actuators of a minority sub-population of robots. While the systems that enable this function are valuable, they have not yet considered that many failures arise from gradual wear and tear with continued operation, and that this may be more challenging to detect than sudden step changes in performance. This paper presents the Artificial Antibody Population Dynamics (AAPD) model -- an immune-inspired model for the detection and diagnosis of gradual degradation in robot swarms. The AAPD model is demonstrated to reliably detect and diagnose gradual degradation, as well as spontaneous changes in performance, among swarms of robots of varying sizes while remaining tolerant of normally behaving robots. The AAPD model is distributed, offers supervised and unsupervised configurations, and demonstrates promising scalable properties. Deploying the AAPD model on a swarm of foraging robots undergoing gradual degradation enables the swarm to operate on average at between 70% and 97% of its performance in perfect conditions and is able to prevent instances of robots failing in the field during experiments in most of the cases tested.
Authors:Ankit Shaw
Abstract:
This paper presents a comprehensive overview of exploration strategies utilized in both 2D and 3D environments, focusing on autonomous multi-robot systems designed for building exploration and fire detection. We explore the limitations of traditional algorithms that rely on prior knowledge and predefined maps, emphasizing the challenges faced when environments undergo changes that invalidate these maps. Our modular approach integrates localization, mapping, and trajectory planning to facilitate effective exploration using an OctoMap framework generated from point cloud data. The exploration strategy incorporates obstacle avoidance through potential fields, ensuring safe navigation in dynamic settings. Additionally, we propose future research directions, including decentralized map creation, coordinated exploration among unmanned aerial vehicles (UAVs), and adaptations to time-varying environments. This work serves as a foundation for advancing coordinated multi-robot exploration algorithms, enhancing their applicability in real-world scenarios.
Authors:Elena Tonini
Abstract:
Building upon the foundational work of the Bachelor's Degree Thesis titled "Analysis and Characterization of Wi-Fi Channel State Information'', this thesis significantly advances the research by conducting an in-depth analysis of CSIs, offering new insights that extend well beyond the original study. The goal of this work is to broaden the mathematical and statistical representation of a wireless channel through the study of CSI behavior and evolution over time and frequency.
CSI provides a high-level description of the behavior of a signal propagating from a transmitter to a receiver, thereby representing the structure of the environment where the signal propagates. This knowledge can be used to perform ambient sensing, a technique that extracts relevant information about the surroundings of the receiver from the properties of the received signal, which are affected by interactions with the surfaces of the objects within the analyzed environment. Ambient sensing already plays an essential role in new wireless networks such as 5G and Beyond 5G; its use in Joint Communication and Sensing applications and in the optimization of signal propagation through beamforming could also enable the implementation of efficient cooperative ambient sensing in vehicular networks, facilitating Cooperative Perception and, consequently, increasing road safety.
Due to the lack of research on CSI characterization, this study aims to begin analyzing the structure of CSI traces collected in a controlled experimental environment and to describe their statistical properties. The results of such characterization could provide mathematical support for environment classification and movement recognition tasks that are currently performed only through Machine Learning techniques, introducing instead efficient, dedicated algorithms.
Authors:Jushan Chen
Abstract:
This paper presents a tutorial on the Consensus Alternating Direction Method of Multipliers (Consensus ADMM) for distributed optimization, with a specific focus on applications in multi-robot systems. In this tutorial, we derive the consensus ADMM algorithm, highlighting its connections to the augmented Lagrangian and primal-dual methods. Finally, we apply Consensus ADMM to an example problem for trajectory optimization of a multi-agent system.
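The standard scaled-form consensus ADMM updates covered in such tutorials can be sketched as follows for quadratic local costs (a toy centralized simulation of the distributed updates; problem data, the penalty value, and the iteration count are illustrative assumptions):

import numpy as np

def consensus_admm(A_list, b_list, rho=1.0, iters=100):
    """Consensus ADMM for min sum_i 0.5*||A_i x - b_i||^2 subject to x_i = z.

    Each agent keeps a local copy x_i and a scaled dual u_i; only the
    consensus variable z (an average) needs to be shared. The x-update is
    closed-form because the local costs are quadratic.
    """
    n = A_list[0].shape[1]
    N = len(A_list)
    x = np.zeros((N, n)); u = np.zeros((N, n)); z = np.zeros(n)
    for _ in range(iters):
        for i in range(N):
            # x_i update: argmin f_i(x) + (rho/2)||x - z + u_i||^2
            H = A_list[i].T @ A_list[i] + rho * np.eye(n)
            x[i] = np.linalg.solve(H, A_list[i].T @ b_list[i] + rho * (z - u[i]))
        z = (x + u).mean(axis=0)   # z update: averaging step (the "consensus")
        u += x - z                 # dual update on the consensus constraint
    return z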
Authors:Inmo Jang
Abstract:
Swarm robotics explores the coordination of multiple robots to achieve collective goals, with collective decision-making being a central focus. This process involves decentralized robots autonomously making local decisions and communicating them, which influences the overall emergent behavior. Testing such decentralized algorithms in real-world scenarios with hundreds or more robots is often impractical, underscoring the need for effective simulation tools. We propose SPACE (Swarm Planning and Control Evaluation), a Python-based simulator designed to support the research, evaluation, and comparison of decentralized Multi-Robot Task Allocation (MRTA) algorithms. SPACE streamlines core algorithmic development by allowing users to implement decision-making algorithms as Python plug-ins, easily construct agent behavior trees via an intuitive GUI, and leverage built-in support for inter-agent communication and local task awareness. To demonstrate its practical utility, we implement and evaluate CBBA and GRAPE within the simulator, comparing their performance across different metrics, particularly in scenarios with dynamically introduced tasks. This evaluation shows the usefulness of SPACE in conducting rigorous and standardized comparisons of MRTA algorithms, helping to support future research in the field.
Authors:Wei Liu
Abstract:
This paper investigates the impact of cooperative perception on autonomous driving decision making on urban roads. The extended perception range contributed by cooperative perception can be properly leveraged to address the implicit dependencies among vehicles, thereby improving vehicle decision-making performance. Meanwhile, we acknowledge the inherent limitations of wireless communication and propose a Cooperative Perception on Demand (CPoD) strategy, where cooperative perception is only activated when the extended perception range is necessary for proper situation awareness. The situation-aware decision making with CPoD is modeled as a Partially Observable Markov Decision Process (POMDP) and solved in an online manner. The evaluation results demonstrate that the proposed approach can function safely and efficiently for autonomous driving on urban roads.