RSS2021

Abstract:
Performing a number of motion patterns, referred to as skills (e.g., wave, spiral, and sweeping motions), during teleoperation is an integral part of many industrial processes such as spraying, welding, and wiping (cleaning, polishing). Maintaining these motions whilst simultaneously avoiding obstacles and traversing complex terrain requires expert operators. In this work, we propose a novel skill-based shared control framework that incorporates the notion of skill assistance to help novice operators sustain these motion patterns whilst adhering to environmental constraints. Our shared control method uses streaming joystick data to estimate the model parameters that describe the operator's intention. We introduce a novel parametrization for state and control that combines skill and underlying trajectory models, leveraging a special type of curve known as Clothoids. This new parametrization allows for efficient computation of skill-based short-horizon plans, enabling the use of a Model Predictive Control (MPC) loop. We perform experiments on a hardware mock-up, validating the effectiveness of our method in recognizing a switch of intended skill and showing an improved quality of output motion, even under dynamically changing obstacles. See our accompanying video here: https://youtu.be/TwhsgA6fw6M .
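Clothoids (Euler spirals) are curves whose curvature changes linearly with arc length, which makes them convenient building blocks for smooth short-horizon plans. The sketch below is not the authors' skill parametrization; it only samples a single clothoid segment by integrating heading and position numerically with the midpoint rule. The function name and arguments (`clothoid_points`, `kappa0`, `sharpness`) are hypothetical.

```python
import math


def clothoid_points(x0, y0, theta0, kappa0, sharpness, length, n=1000):
    """Sample a clothoid whose curvature is kappa(s) = kappa0 + sharpness * s.

    Heading is the integral of curvature; position is integrated numerically
    with the midpoint rule (the Fresnel integrals have no elementary closed form).
    Hypothetical helper for illustration only.
    """
    ds = length / n
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for i in range(n):
        s_mid = (i + 0.5) * ds
        # theta(s) = theta0 + kappa0 * s + sharpness * s^2 / 2
        theta = theta0 + kappa0 * s_mid + 0.5 * sharpness * s_mid ** 2
        x += math.cos(theta) * ds
        y += math.sin(theta) * ds
        xs.append(x)
        ys.append(y)
    return xs, ys
```

With `sharpness=0` the segment degenerates to a circular arc (or a straight line when `kappa0=0`), which is a quick sanity check on the integration.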

Abstract:
High-level declarative constraints provide a powerful (and popular) way to define and construct control policies; however, most synthesis algorithms do not support specifying the degree of randomness (unpredictability) of the resulting controller. In many contexts, e.g., patrolling, testing, behavior prediction, and planning on idealized models, predictable or biased controllers are undesirable. To address these concerns, we introduce the Entropic Reactive Control Improvisation (ERCI) framework and algorithm that supports synthesizing control policies for stochastic games that are declaratively specified by (i) a hard constraint specifying what must occur, (ii) a soft constraint specifying what typically occurs, and (iii) a randomization constraint specifying the unpredictability and variety of the controller, as quantified using causal entropy. This framework, which extends the state of the art by supporting arbitrary combinations of adversarial and probabilistic uncertainty in the environment, enables a flexible modeling formalism that we argue, theoretically and empirically, remains tractable.

Abstract:
Deep learning has had a far-reaching impact in robotics. Specifically, deep reinforcement learning algorithms have been highly effective in synthesizing neural-network controllers for a wide range of tasks. However, despite this empirical success, these controllers still lack theoretical guarantees on their performance, such as Lyapunov stability (i.e., all trajectories of the closed-loop system are guaranteed to converge to a goal state under the control policy). This is in stark contrast to traditional model-based controller design, where principled approaches (like LQR) can synthesize stable controllers with provable guarantees. To address this gap, we propose a generic method to synthesize a Lyapunov-stable neural-network controller, together with a neural-network Lyapunov function to simultaneously certify its stability. Our approach formulates the Lyapunov condition verification as a mixed-integer linear program (MIP). Our MIP verifier either certifies the Lyapunov condition or generates counterexamples that can help improve the candidate controller and the Lyapunov function. We also present an optimization program to compute an inner approximation of the region of attraction for the closed-loop system. We apply our approach to robots including an inverted pendulum and 2D and 3D quadrotors, and show that our neural-network controller outperforms a baseline LQR controller. The code is open sourced at https://github.com/StanfordASL/neural-network-lyapunov .
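The Lyapunov conditions can be illustrated on a toy discrete-time linear closed loop x_{k+1} = A x_k with a quadratic candidate V(x) = xᵀPx. The sketch below uses a swapped-in technique: it falsifies the conditions V(x) > 0 and V(Ax) - V(x) < 0 by random sampling, so it can only find counterexamples and never certify stability, unlike the MIP verifier the abstract describes. The matrices, sampling radius, and function names are assumptions for illustration.

```python
import random


def quad_form(P, x):
    """V(x) = x^T P x for a list-of-lists matrix P."""
    n = len(x)
    return sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))


def mat_vec(A, x):
    """Closed-loop step A x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def find_counterexamples(A, P, n_samples=10000, radius=1.0, eps=1e-9, seed=0):
    """Sample states in a box and return those violating the Lyapunov conditions.

    A violation is V(x) <= eps (positivity) or V(Ax) - V(x) >= 0 (decrease).
    Sampling-based falsifier only; an empty result is NOT a certificate.
    """
    rng = random.Random(seed)
    bad = []
    for _ in range(n_samples):
        x = [rng.uniform(-radius, radius) for _ in range(len(A))]
        v = quad_form(P, x)
        if v <= eps or quad_form(P, mat_vec(A, x)) - v >= 0.0:
            bad.append(x)
    return bad
```

Counterexamples found this way play the same role as the MIP verifier's counterexamples in the abstract: extra training points for improving the controller and the Lyapunov candidate.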

Abstract:
We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. Nonlinear dynamical system-based methods have successfully demonstrated dynamic robot behaviors but have difficulty generalizing to unseen configurations and learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but they still struggle in domains with diverse configurations of image goals and hence find it difficult to generalize. In this paper, we address this dichotomy by embedding the structure of dynamical systems in a hierarchical deep policy learning framework called Hierarchical Neural Dynamical Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state space and then distilling them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and in simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation and reinforcement learning setups and achieve state-of-the-art results. Video results at https://shikharbahl.github.io/hierarchical-ndps .

Abstract:
Imitation learning is an effective tool for robotic learning tasks where specifying a reinforcement learning (RL) reward is not feasible or where the exploration problem is particularly difficult. Imitation methods, typically behavior cloning or inverse RL, derive a policy from a collection of first-person action-state trajectories. This is contrary to how humans and other animals imitate: we observe a behavior, even from other species, understand its perceived effect on the state of the environment, and figure out what actions our body can perform to reach a similar outcome. In this work, we explore the possibility of third-person visual imitation of manipulation trajectories, only from vision and without access to actions, demonstrated by embodiments different from those of our imitating agent. Specifically, we investigate what would be an appropriate representation method with which an RL agent can visually track trajectories of complex manipulation behavior (non-planar, with multiple-object interactions) demonstrated by experts with different embodiments. We present a way to train manipulator-independent representations (MIR) that primarily focus on the change in the environment and have all the characteristics that make them suitable for cross-embodiment visual imitation with RL: they are domain-invariant, temporally smooth, and actionable. We show that with our proposed method our agents are able to imitate, with complex robot control, trajectories from a variety of embodiments and with significant visual and dynamics differences, e.g., the simulation-to-reality gap.

Abstract:
Loops are pervasive in robotics problems, appearing in mapping and localization, where one is interested in finding loop closure constraints to better approximate robot poses or other estimated quantities, as well as in planning and prediction, where one is interested in the homotopy classes of the space through which a robot is moving. We generalize the standard topological definition of a loop to cases where a trajectory passes close to itself but doesn't necessarily touch, giving a definition that is more practical for real robotics problems. This relaxation leads to new and useful properties of inexact loops, such as their ability to be partitioned into topologically connected sets closely matching the concept of a "loop closure", and the existence of simple and nonsimple loops. Building from these ideas, we introduce several ways to measure properties and quantities of inexact loops on a trajectory, such as the trajectory's "loop area" and "loop density", and use them to compare strategies for sampling representative inexact loops to build constraints in mapping and localization problems.
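A simple, hypothetical reading of "passes close to itself" is a pair of trajectory samples that are far apart in time but within a distance threshold in space. The sketch below only enumerates such pairs; the threshold `eps`, the index gap `min_gap`, and the brute-force O(n²) search are assumptions, and the paper's partitioning into connected sets and its loop area and density measures are not reproduced.

```python
import math


def inexact_loop_pairs(traj, eps, min_gap):
    """Return index pairs (i, j) where the trajectory comes within eps of itself.

    traj: list of points (tuples of floats). min_gap excludes trivially close
    pairs of neighbouring samples. Brute force; a KD-tree would scale better.
    """
    pairs = []
    for i in range(len(traj)):
        for j in range(i + min_gap, len(traj)):
            if math.dist(traj[i], traj[j]) < eps:
                pairs.append((i, j))
    return pairs
```

For example, a circle sampled at 100 points that stops one step short of closing yields the single pair (0, 99): the trajectory nearly touches itself without an exact topological loop.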

Abstract:
Quadrotors are extremely agile, so much so that classic first-principles models reach their limits. Aerodynamic effects, while insignificant at low speeds, become the dominant model defect during high speeds or agile maneuvers. Accurate modeling is needed to design robust high-performance control systems and enable flying close to the platform's physical limits. We propose a hybrid approach fusing first principles and learning to model quadrotors and their aerodynamic effects with unprecedented accuracy. First principles fail to capture such aerodynamic effects, rendering traditional approaches inaccurate when used for simulation or controller tuning. Data-driven approaches try to capture aerodynamic effects with black-box modeling, such as neural networks; however, they struggle to robustly generalize to arbitrary flight conditions. Our hybrid approach unifies and outperforms both first-principles blade-element momentum theory and learned residual dynamics. It is evaluated in one of the world's largest motion-capture systems, using autonomous-quadrotor-flight data at speeds up to 65 km/h. The resulting model captures the aerodynamic thrust, torques, and parasitic effects with astonishing accuracy, outperforming existing models with 50% reduced prediction errors, and shows strong generalization capabilities beyond the training set.

Abstract:
Real-time adaptation is imperative to the control of robots operating in complex, dynamic environments. Adaptive control laws can endow even nonlinear systems with good trajectory tracking performance, provided that any uncertain dynamics terms are linearly parameterizable with known nonlinear features. However, it is often difficult to specify such features a priori, such as for aerodynamic disturbances on rotorcraft or interaction forces between a manipulator arm and various objects. In this paper, we turn to data-driven modeling with neural networks to learn, offline from past data, an adaptive controller with an internal parametric model of these nonlinear features. Our key insight is that we can better prepare the controller for deployment with control-oriented meta-learning of features in closed-loop simulation, rather than regression-oriented meta-learning of features to fit input-output data. Specifically, we meta-learn the adaptive controller with closed-loop tracking simulation as the base-learner and the average tracking error as the meta-objective. With a nonlinear planar rotorcraft subject to wind, we demonstrate that our adaptive controller outperforms other controllers trained with regression-oriented meta-learning when deployed in closed loop for trajectory tracking control.
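For context, the classical setting the abstract starts from, uncertain dynamics that are linear in a known feature, can be sketched as a scalar adaptive controller. This is not the meta-learning method: the feature phi(x) = sin(x), the true parameter, the gains, and the reference trajectory are all assumptions, and the adaptation law is the standard Lyapunov-based gradient update.

```python
import math


def simulate_adaptive(a_true=2.0, k=5.0, gamma=10.0, dt=1e-3, t_end=20.0, x0=1.0):
    """Scalar plant x' = u + a * phi(x) with unknown a and a known feature phi.

    Returns the tracking-error magnitudes over time and the final estimate a_hat.
    All numbers are illustrative assumptions.
    """
    phi = math.sin  # known nonlinear feature (assumed for this sketch)
    x, a_hat, t = x0, 0.0, 0.0
    errors = []
    while t < t_end:
        r, r_dot = math.sin(t), math.cos(t)  # reference trajectory and its derivative
        e = x - r
        u = r_dot - k * e - a_hat * phi(x)  # certainty-equivalence control law
        a_hat += dt * gamma * phi(x) * e  # gradient adaptation law
        x += dt * (u + a_true * phi(x))  # explicit Euler step of the true dynamics
        t += dt
        errors.append(abs(e))
    return errors, a_hat
```

With V = e²/2 + (a - a_hat)²/(2γ), this update gives dV/dt = -k e² in continuous time, which is why the tracking error decays even if a_hat does not converge exactly. The paper's contribution is learning the features themselves, which this sketch takes as given.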

Abstract:
This paper presents Particle-based Object Manipulation (PROMPT), a new approach to robot manipulation of novel objects ab initio, without prior object models or pre-training on a large object data set. The key element of PROMPT is a particle-based object representation, in which each particle represents a point in the object, the local geometric, physical, and other features of the point, and also its relation with other particles. Like the model-based analytic approaches to manipulation, the particle representation enables the robot to reason about the object's geometry and dynamics in order to choose suitable manipulation actions. Like the data-driven approaches, the particle representation is inferred online in real time from visual sensor input, specifically, multi-view RGB images. The particle representation thus connects visual perception with robot control. PROMPT combines the benefits of both model-based reasoning and data-driven learning. We show empirically that PROMPT successfully handles a variety of everyday objects, some of which are transparent. It handles various manipulation tasks, including grasping, pushing, etc. Our experiments also show that PROMPT outperforms a state-of-the-art data-driven grasping method on the daily objects, even though it does not use any offline training data.

Abstract:
We address the problem of planning robot motions in constrained configuration spaces where the constraints change throughout the motion. The problem is formulated as a fixed sequence of intersecting manifolds, which the robot needs to traverse in order to solve the task. We specify a class of sequential motion planning problems that fulfill a particular property of the change in the free configuration space when transitioning between manifolds. For this problem class, the algorithm Planning on Sequenced Manifolds (PSM) is developed, which searches for optimal intersection points between manifolds by using RRT in an inner loop with a novel steering strategy. We provide a theoretical analysis regarding PSM's probabilistic completeness and asymptotic optimality. Further, we evaluate its planning performance on multi-robot object transportation tasks.

Abstract:
Long-horizon sequential manipulation tasks are effectively addressed hierarchically: at a high level of abstraction, the planner searches over abstract action sequences, and when a plan is found, lower-level motion plans are generated. Such a strategy hinges on the ability to reliably predict that a feasible low-level plan will be found that satisfies the abstract plan. However, computing Abstract Plan Feasibility (APF) is difficult because the outcome of a plan depends on complex real-world phenomena that are computationally costly to model, such as noise in estimation and plan execution. In this work, we present an active learning approach to efficiently acquire an APF predictor through curious exploration on a robot. The robot identifies plans whose outcomes would be informative about APF, executes those plans, and learns from their subsequent successes or failures. We evaluate our strategy in simulation and on a real Franka Emika Panda robot with integrated perception, experimentation, planning, and execution. In a stacking domain where objects have non-uniform mass distributions, we show that our system permits real-robot learning of an APF model in four hundred self-supervised interactions, and that our learned model can be used effectively in different downstream tasks (e.g., constructing the tallest tower or the tower with the longest overhang).
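The abstract does not say how informativeness is scored. One common choice, shown here purely as an assumption, is BALD: given an ensemble of feasibility predictors, pick the candidate plan that maximizes the mutual information between its predicted success and the ensemble member, i.e., where members confidently disagree rather than all being unsure. The predictors and plan identifiers below are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (nats) of a success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bald_score(member_probs):
    """Mutual information between outcome and ensemble member (BALD)."""
    mean = sum(member_probs) / len(member_probs)
    expected = sum(entropy(p) for p in member_probs) / len(member_probs)
    return entropy(mean) - expected


def pick_plan(candidates, ensemble):
    """Return the candidate plan with the highest BALD score.

    ensemble: list of callables mapping a plan to a predicted success probability.
    """
    return max(candidates, key=lambda plan: bald_score([m(plan) for m in ensemble]))
```

A plan that every member rates at 0.5 scores zero here: executing it would mostly reveal execution noise, not resolve disagreement about the feasibility model.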

Abstract:
Robots deployed at orders-of-magnitude different size scales, and that retain the same desired behavior at any of those scales, would greatly expand the environments in which the robots could operate. However, it is currently not known whether such robots exist and, if they do, how to design them. Since self-similar structures in nature often exhibit self-similar behavior at different scales, we hypothesize that there may exist robot designs that have the same property. Here we demonstrate that this is indeed the case for some, but not all, modular soft robots: there are robot designs that exhibit a desired behavior at a small size scale, and if copies of that robot are attached together to realize the same design at higher scales, those larger robots exhibit similar behavior. We show how to find such designs in simulation using an evolutionary algorithm. Further, when fractal attachment is not assumed and attachment geometries must thus be evolved along with the design of the base robot unit, scale-invariant behavior is not achieved, demonstrating that structural self-similarity, when combined with appropriate designs, is a useful path to realizing scale-invariant robot behavior. We validate our findings by demonstrating successful transfer of self-similar structure and behavior to pneumatically controlled soft robots. Finally, we show that biobots can spontaneously exhibit self-similar attachment geometries, thereby suggesting that self-similar behavior via self-similar structure may be realizable across a wide range of robot platforms in the future.

Abstract:
We present a learning-based approach to prove infeasibility of kinematic motion planning problems. Sampling-based motion planners are effective in high-dimensional spaces but are only probabilistically complete. Consequently, these planners cannot provide a definite answer if no plan exists, which is important for high-level scenarios such as task-motion planning. We propose a combination of bidirectional sampling-based planning (such as RRT-connect) and machine learning to construct an infeasibility proof alongside the two search trees. An infeasibility proof is a closed manifold in the obstacle region of the configuration space that separates the start and goal into disconnected components of the free configuration space. We train the manifold using common machine learning techniques and then triangulate the manifold into a polytope to prove containment in the obstacle region. Under assumptions about learning hyper-parameters and robustness of configuration space optimization, the output is either an infeasibility proof or a motion plan. We demonstrate proof construction for 3-DOF and 4-DOF manipulators and show improvement over previous algorithms.
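In a 2D configuration space, an infeasibility proof reduces to a closed curve. The sketch below checks a candidate polygon against the two requirements in the definition: it separates start from goal (exactly one lies inside, via even-odd ray casting), and its boundary lies in the obstacle region. Boundary containment is only checked at sampled points here, which is not a proof; the abstract instead triangulates the learned manifold into a polytope. The obstacle predicate and function names are assumptions.

```python
def point_in_polygon(pt, poly):
    """Even-odd ray casting: True if pt lies inside the closed polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def separates(poly, start, goal, in_obstacle, samples_per_edge=50):
    """Check a candidate proof polygon.

    It must separate start from goal and, checked only at sampled boundary
    points in this sketch, lie inside the obstacle region.
    """
    if point_in_polygon(start, poly) == point_in_polygon(goal, poly):
        return False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        for k in range(samples_per_edge):
            s = k / samples_per_edge
            if not in_obstacle((x1 + s * (x2 - x1), y1 + s * (y2 - y1))):
                return False
    return True
```

For instance, with an annular obstacle 1 ≤ r ≤ 2, a square of half-width 1.2 around a start at the origin separates it from a goal at (3, 0), while a smaller square fails because its boundary is in free space.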

Abstract:
In this paper, we provide a generalized framework for Variational Inference-Stochastic Optimal Control by using the non-extensive Tsallis divergence. By incorporating the deformed exponential function into the optimality likelihood function, a novel Tsallis Variational Inference-Model Predictive Control algorithm is derived, which includes prior works such as Variational Inference-Model Predictive Control, Model Predictive Path Integral Control, Cross Entropy Method, and Stein Variational Inference Model Predictive Control as special cases. The proposed algorithm allows for effective control of the cost/reward transform and is characterized by superior performance in terms of mean and variance reduction of the associated cost. The aforementioned features are supported by a theoretical and numerical analysis on the level of risk sensitivity of the proposed algorithm, as well as simulation experiments on 5 different robotic systems with 3 different policy parameterizations.
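The Tsallis deformed exponential is exp_q(x) = [1 + (1 - q) x]_+^{1/(1 - q)}, which recovers exp(x) as q → 1. The sketch below plugs it into MPPI-style importance weights over sampled rollout costs; for q < 1 the bracket truncates high-cost samples to exactly zero weight. How the paper places exp_q in the optimality likelihood may differ. In particular, shifting costs by their minimum is an assumption here, and unlike for the ordinary exponential it changes the weights when q ≠ 1.

```python
import math


def q_exp(x, q):
    """Tsallis deformed exponential; equals exp(x) when q == 1."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def sample_weights(costs, lam=1.0, q=1.0):
    """Normalized importance weights for sampled rollouts (MPPI-style).

    Costs are shifted so the best sample has argument 0 (an assumption);
    the best sample therefore always gets unnormalized weight 1.
    """
    c_min = min(costs)
    w = [q_exp(-(c - c_min) / lam, q) for c in costs]
    total = sum(w)
    return [wi / total for wi in w]
```

For costs [0, 1, 10] with λ = 1 and q = 0.5, the weights are [0.8, 0.2, 0.0]: the worst rollout is discarded outright, an elite-selection effect reminiscent of the Cross Entropy Method.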

Abstract:
When transferring a control policy from simulation to a physical system, this policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state distribution. Therefore, the policy fails when transferred to the physical system. In this paper, we present robust value iteration. This approach uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations of the system dynamics. The adversarial perturbations cause the resulting optimal policy to be robust to changes in the dynamics. Utilizing the continuous-time perspective of reinforcement learning, we derive the optimal perturbations for the states, actions, observations, and model parameters in closed form. The resulting algorithm does not require discretization of states or actions. Therefore, the optimal adversarial perturbations can be efficiently incorporated in the min-max value function update. We apply the resulting algorithm to the physical Furuta pendulum and cartpole. By changing the masses of the systems, we evaluate the quantitative and qualitative performance across different model parameters. We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.

Abstract:
In this paper, we first prove an interesting result for point-feature-based SLAM: "When the covariance matrices of feature observation errors are isotropic, the robot poses and feature positions obtained in each Gauss-Newton iteration (when solving a reformulated least squares optimisation based SLAM) are independent of the feature positions in the previous step". That is, even if we reset the feature positions to different random values before each iteration, the results after the iteration never change. Building on this finding, we propose an algorithm to solve for the robot poses only ("localisation") and show that the algorithm generates exactly the same robot poses in each iteration as the Gauss-Newton method (SLAM). The optimal feature positions can be easily recovered using a closed-form formula after the optimal robot poses are obtained. Similarly, when the covariance matrices of odometry translation errors are also isotropic, we can prove that the SLAM results are independent of both the feature positions and the robot positions. Thus, we can have a "rotation-only algorithm" which generates the same robot rotations as the full SLAM. Again, the optimal robot positions and the optimal feature positions can be computed from the obtained optimal robot rotations using a closed-form formula. We have used multiple 2D and 3D SLAM datasets to demonstrate our research findings. A video showing the interesting convergence results can be found at https://youtu.be/j1T8toqGtDE . We expect the findings in this paper can help SLAM researchers to further understand the special structure of SLAM problems and to develop more efficient and reliable SLAM algorithms.
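The closed-form feature recovery can be seen directly when all observation covariances equal the same isotropic matrix σ²I: because rotations preserve norms, ‖R_iᵀ(p - t_i) - z_i‖ = ‖p - (t_i + R_i z_i)‖, so the least-squares feature position is the mean of the observations mapped into the world frame. The 2D sketch below assumes equal variances (unequal isotropic variances would give a weighted mean) and is not necessarily the paper's exact formula.

```python
import math


def recover_feature(poses, observations):
    """Closed-form 2D feature position given known robot poses.

    poses: list of (x, y, theta); observations: list of (zx, zy) in the robot
    frame. Assumes identical isotropic observation covariances, so the optimum
    is the mean of t_i + R_i z_i.
    """
    sx = sy = 0.0
    for (x, y, th), (zx, zy) in zip(poses, observations):
        c, s = math.cos(th), math.sin(th)
        sx += x + c * zx - s * zy  # world-frame x of this observation
        sy += y + s * zx + c * zy  # world-frame y of this observation
    n = len(poses)
    return sx / n, sy / n
```

With noise-free observations of a feature at (2, 1) from poses (0, 0, 0) and (1, 0, π/2), the function returns (2, 1).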

Abstract:
Successful real-world deployment of legged robots would require them to adapt in real time to unseen scenarios like changing terrains, changing payloads, and wear and tear. This paper presents the Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge like reference trajectories or predefined foot trajectory generators, and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains, including rocky, slippery, and deformable surfaces in environments with grass, long vegetation, concrete, pebbles, stairs, sand, etc. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Project Webpage and Videos: https://ashish-kmr.github.io/rma-legged-robots/

Abstract:
This paper describes a system for visually guided autonomous navigation of under-canopy farm robots. Low-cost under-canopy robots can drive between crop rows under the plant canopy and accomplish tasks that are infeasible for over-the-canopy drones or larger agricultural equipment. However, autonomously navigating them under the canopy presents a number of challenges: unreliable GPS and LiDAR, high cost of sensing, challenging farm terrain, clutter due to leaves and weeds, and large variability in appearance over the season and across crop types. We address these challenges by building a modular system that leverages machine learning for robust and generalizable perception from monocular RGB images from low-cost cameras, and model predictive control for accurate control in challenging terrain. Our system, CropFollow, is able to autonomously drive 485 meters per intervention on average, outperforming a state-of-the-art LiDAR-based system (286 meters per intervention) in extensive field testing spanning over 25 km.

Abstract:
The partially observable Markov decision process (POMDP) is a principled general framework for robot decision making under uncertainty, but POMDP planning suffers from high computational complexity when long-term planning is required. While temporally extended macro-actions help to cut down the effective planning horizon and significantly improve computational efficiency, how do we acquire good macro-actions? This paper proposes Macro-Action Generator-Critic (MAGIC), which performs offline learning of macro-actions optimized for online POMDP planning. Specifically, MAGIC learns a macro-action generator end-to-end, using an online planner's performance as the feedback. During online planning, the generator produces situation-aware macro-actions on the fly, conditioned on the robot's belief and the environment context. We evaluated MAGIC on several long-horizon planning tasks both in simulation and on a real robot. The experimental results show that the learned macro-actions offer significant benefits in online planning performance, compared with primitive actions and handcrafted macro-actions.

Abstract:
In this paper, we propose a novel task, Manipulation Question Answering (MQA), where the robot performs manipulation actions to change the environment in order to answer a given question. To solve this problem, a framework consisting of a QA module and a manipulation module is proposed. For the QA module, we adopt the method for the Visual Question Answering (VQA) task. For the manipulation module, a Deep Q Network (DQN) model is designed to generate manipulation actions for the robot to interact with the environment. We consider the situation where the robot continuously manipulates objects inside a bin until the answer to the question is found. In addition, a novel dataset that contains a variety of object models, scenarios, and corresponding question-answer pairs is established in a simulation environment. Extensive experiments have been conducted to validate the effectiveness of the proposed framework.

Abstract:
Natural language is perhaps the most flexible and intuitive way for humans to communicate tasks to a robot. Prior work in imitation learning typically requires each task be specified with a task id or goal image, something that is often impractical in open-world environments. On the other hand, previous approaches in instruction following allow agent behavior to be guided by language, but typically assume structure in the observations, actuators, or language that limits their applicability to complex settings like robotics. In this work, we present a method for incorporating free-form natural language conditioning into imitation learning. Our approach learns perception from pixels, natural language understanding, and multitask continuous control end-to-end as a single neural network. Unlike prior work in imitation learning, our method is able to incorporate unlabeled and unstructured demonstration data (i.e., no task or language labels). We show this dramatically improves language-conditioned performance, while reducing the cost of language annotation to less than 1% of total data. At test time, a single language-conditioned visuomotor policy trained with our method can perform a wide variety of robotic manipulation skills in a 3D environment, specified only with natural language descriptions of each task (e.g., "open the drawer...now pick up the block...now press the green button...") (see video). To scale up the number of instructions an agent can follow, we propose combining text-conditioned policies with large pretrained neural language models. We find this allows a policy to be robust to many out-of-distribution synonym instructions, without requiring new demonstrations. See videos of a human typing live text commands to our agent at https://groundinglanguage.github.io

Abstract:
The locomotion for many modern robotic systems is optimized for a single target domain: aerial, surface, or underwater. In this work, we address the challenge of developing a robotic system capable of controlled motion in air and underwater. Further, we explore the particular challenge of dynamic transitions between air and water. We propose Dipper, an aerial-aquatic hybrid vehicle. Dipper is a lightweight fixed-wing UAV with actively swept wings. The bio-inspired system is not only capable of maneuvering efficiently during flight and underwater, but can also perform dynamic aerial-aquatic transitions. We describe the design, construction, and testing of the Dipper prototype, and demonstrate repeatability and robustness, especially during the transition phases.

Abstract:
Recent work on decision-making and planning for autonomous driving has made use of game-theoretic methods to model interaction between agents. We demonstrate that methods based on the Stackelberg game formulation of this problem are susceptible to an issue that we refer to as Conflict. Our results show that when Conflict occurs, it can cause sub-optimal and potentially dangerous behaviour. In response, we develop a theoretical framework for analysing the extent to which such methods are impacted by Conflict, and apply this framework to several existing approaches modelling interaction between agents. Moreover, we propose Augmented Altruism, a novel approach to modelling interaction between players in a Stackelberg game, and show that it is less prone to Conflict than previous techniques. Finally, we investigate the behavioural assumptions that underpin our approach by performing experiments with human participants. The results show that our model approximates human decision-making more accurately than existing game-theoretic models of interactive driving.
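For readers unfamiliar with the Stackelberg formulation, the sketch below computes a pure-strategy Stackelberg outcome of a two-player bimatrix game: the leader commits to an action, the follower best-responds, and the leader chooses the commitment that is best for itself given that response. The payoffs in the usage example are invented, and neither the Conflict analysis nor Augmented Altruism is shown.

```python
def stackelberg(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg outcome of a bimatrix game.

    Rows are leader actions, columns follower actions. The follower
    best-responds to the committed row (ties go to the lowest column index,
    an assumption). Returns (leader_action, follower_action, leader_value).
    """
    best = None
    for i, row in enumerate(follower_payoff):
        j = max(range(len(row)), key=lambda c: row[c])  # follower best response
        value = leader_payoff[i][j]
        if best is None or value > best[2]:
            best = (i, j, value)
    return best
```

In a toy merge where action 0 is "yield" and 1 is "go", with leader payoffs [[-1, 0], [2, -10]] and follower payoffs [[-1, 2], [0, -10]], the leader commits to going and the follower yields, giving (1, 0, 2).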

Abstract:
We present a novel approach to motion planning for autonomous ground vehicles by formulating motion primitives as probabilistic distributions of trajectories (aka probabilistic motion primitives, ProMPs) and performing stochastic optimisation on them to find an optimal path. We show that, compared to the traditional approach of using discrete motion primitives or direct stochastic optimisation of the whole path, incorporating ProMPs yields higher-quality paths by enabling constraint conditioning, combination, and blending of probability distributions. We present two motion planners utilizing this approach: feasibility-based trajectory sampling (PROMPT-S) and stochastic gradient-based trajectory optimisation (PROMPT-O). We show simulation results of our approach outperforming state-of-the-art optimisation as well as discrete motion primitive-based planners. We additionally illustrate the versatility of our approach by showing PROMPT's ability to handle significantly skewed motion primitives, e.g., as induced by steering failure in AGVs, as well as the composition of motion primitives to perform complex maneuvers. Finally, we demonstrate the practicality of these planners by implementing them on a real self-driving vehicle navigating on structured and unstructured off-road terrains.
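In a ProMP, a one-dimensional trajectory is y(t) = φ(t)ᵀw with weights w ~ N(μ, Σ), so conditioning on a via-point (constraint conditioning) is ordinary Gaussian conditioning on a linear observation. The sketch below assumes normalized Gaussian basis functions over a phase variable t ∈ [0, 1]; the basis count, bandwidth, and function names are hypothetical, and neither PROMPT-S nor PROMPT-O is implemented.

```python
import math


def rbf_features(t, n_basis=10, bandwidth=0.02):
    """Normalized Gaussian basis functions on phase t in [0, 1] (assumed basis)."""
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    raw = [math.exp(-((t - c) ** 2) / (2 * bandwidth)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def condition(mu, sigma, phi, y, obs_var=1e-6):
    """Condition w ~ N(mu, sigma) on observing phi^T w = y with noise obs_var."""
    n = len(mu)
    s_phi = [sum(sigma[i][j] * phi[j] for j in range(n)) for i in range(n)]  # sigma @ phi
    denom = obs_var + sum(phi[i] * s_phi[i] for i in range(n))
    innov = y - sum(phi[i] * mu[i] for i in range(n))
    gain = [v / denom for v in s_phi]  # Kalman-style gain
    new_mu = [mu[i] + gain[i] * innov for i in range(n)]
    # sigma - sigma phi phi^T sigma / denom (sigma is symmetric)
    new_sigma = [[sigma[i][j] - gain[i] * s_phi[j] for j in range(n)] for i in range(n)]
    return new_mu, new_sigma


def mean_at(mu, t):
    """Mean trajectory value at phase t."""
    return sum(p * m for p, m in zip(rbf_features(t), mu))
```

After conditioning a zero-mean, unit-covariance prior on y(0.5) = 1, the mean trajectory passes through 1 at t = 0.5 and the variance there collapses to roughly the observation noise, while the rest of the distribution stays usable for sampling or blending.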

Abstract:
Robots will be expected to manipulate a wide variety of objects in complex and arbitrary ways as they become more widely used in human environments. As such, the rearrangement of objects has been noted to be an important benchmark for AI capabilities in recent years. We propose NeRP (Neural Rearrangement Planning), a deep learning based approach for multi-step neural object rearrangement planning which works with never-before-seen objects, is trained on simulation data, and generalizes to the real world. We compare NeRP to several naive and model-based baselines, demonstrating that our approach is measurably better and can efficiently arrange unseen objects in fewer steps and with less planning time. Finally, we demonstrate it on several challenging rearrangement problems in the real world.

Abstract:
This paper presents a novel method for continuous integration over the 3D rotation group SO(3). The key idea is to model the system's dynamics with Gaussian Processes (GPs) and learn the GP training data to match the system's instantaneous angular velocity measurements. This is formulated as the minimisation of a simple cost function that leverages the application of linear operators over GP kernels. The proposed integration method over SO(3) is applied to the preintegration of inertial data provided by a 6-DoF Inertial Measurement Unit (IMU). Unlike standard integration, which requires recomputing the integrals every time the estimate changes, preintegration combines the IMU readings into pseudo-measurements that are independent of the pose estimate and allows for efficient multi-modal sensor fusion. The pseudo-measurements generated by the proposed method are named Unified Gaussian Preintegrated Measurements (UGPMs). UGPMs rely on GP regression and linear operators to analytically integrate the acceleration data. Moreover, a mechanism for IMU bias and time-shift correction is introduced to allow for seamless multi-modal state estimation. In quantitative experiments, we show that the UGPMs outperform the current state-of-the-art preintegration methods.
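For comparison with the GP-based approach, conventional discrete rotation preintegration treats gyroscope readings as piecewise constant and chains exponential maps, ΔR = ∏_k Exp(ω_k Δt). The sketch below implements only that rotation part with Rodrigues' formula; it is not UGPM and ignores bias and time-shift correction. Sample values and function names are assumptions.

```python
import math


def so3_exp(w):
    """Rodrigues' formula: rotation matrix for rotation vector w."""
    theta = math.sqrt(sum(comp * comp for comp in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (comp / theta for comp in w)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # skew-symmetric axis matrix
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    s, one_minus_cos = math.sin(theta), 1.0 - math.cos(theta)
    # R = I + sin(theta) K + (1 - cos(theta)) K^2
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + one_minus_cos * K2[i][j]
             for j in range(3)] for i in range(3)]


def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][m] * B[m][j] for m in range(3)) for j in range(3)] for i in range(3)]


def preintegrate_rotation(gyro, dt):
    """Delta R = prod_k Exp(omega_k * dt) for piecewise-constant gyro samples."""
    delta = so3_exp([0.0, 0.0, 0.0])
    for w in gyro:
        delta = matmul(delta, so3_exp([wi * dt for wi in w]))
    return delta
```

One hundred samples of ω = (0, 0, π/2) rad/s at Δt = 0.01 s compose to a 90° rotation about z. Because ΔR depends only on the gyro samples, it can be reused when the pose estimate changes, which is the property the abstract attributes to preintegration.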

Abstract:
For robots to work alongside humans and perform in unstructured environments, they must learn new motion skills and adapt them to unseen situations on the fly. This demands learning models that capture relevant motion patterns while offering enough flexibility to adapt the encoded skills to new requirements, such as dynamic obstacle avoidance. We introduce a Riemannian manifold perspective on this problem and propose to learn a Riemannian manifold from human demonstrations on which geodesics are natural motion skills. We realize this with a variational autoencoder (VAE) over the space of positions and orientations of the robot end-effector. Geodesic motion skills let a robot plan movements from and to arbitrary points on the data manifold. They also provide a straightforward method to avoid obstacles by redefining the ambient metric in an online fashion. Moreover, geodesics naturally exploit the manifold resulting from multiple-solution settings to design motions that were not demonstrated previously. We test our learning framework using a 7-DoF robotic manipulator, where the robot satisfactorily learns and reproduces realistic skills featuring elaborate motion patterns, avoids previously unseen obstacles, and generates novel movements in multiple-solution settings.

Abstract:
Accurate system modeling and identification gain importance as tasks executed by autonomously acting unmanned aerial vehicles (UAVs) become more complex and demanding. This paper presents a Bayesian filter approach to identify, online and continuously, the system parameters, sensor suite calibration states, and vehicle navigation states in a holistic framework. Previous work only tackles subsets of the overall state vector during dedicated phases (e.g., motionless, online during flight, post-processing). These works often introduce an artificial, so-called body frame, which forces assumptions on system states such as the orientation of the inertia matrix's principal axes. Our approach estimates the entire state vector in the (usually not precisely known) center of mass, eliminating several assumptions caused by the artificially introduced body frame in other work. Since our approach also estimates geometric states such as the rotor and sensor placements, no hand measurements relative to the unknown center of mass are required: the system is fully self-calibrating. A detailed discussion of the system's observability reveals the additional (and differing) measurements required for a theoretical and a real N-arm multicopter. We show that quantities that are easy to measure precisely by hand in real applications can provide the required information. Statistically relevant simulations in Gazebo/RotorS, which provide ground truth for all states while retaining realistic physics, validate all our findings.

Abstract:
We present in-hand manipulation skills on a dexterous, compliant, anthropomorphic hand. Even though these skills were derived in a simplistic manner, they exhibit surprising robustness to variations in shape, size, weight, and placement of the manipulated object. They are also very insensitive to variation of execution speeds, ranging from highly dynamic to quasi-static. The robustness of the skills leads to compositional properties that enable extended and robust manipulation programs. To explain the surprising robustness of the in-hand manipulation skills, we performed a detailed, empirical analysis of the skills' performance. From this analysis, we identify three principles for skill design: 1) Exploiting the hardware's innate ability to drive hard-to-model contact dynamics. 2) Taking actions to constrain these interactions, funneling the system into a narrow set of possibilities. 3) Composing such action sequences into complex manipulation programs. We believe that these principles constitute an important foundation for robust robotic in-hand manipulation, and possibly for manipulation in general.

Abstract:
The current dominant paradigm for robotic manipulation involves two separate stages: manipulator design and control. Because the robot's morphology and how it can be controlled are intimately linked, joint optimization of design and control can significantly improve performance. Existing methods for co-optimization are limited and fail to explore a rich space of designs. The primary reason is the trade-off between the design complexity necessary for contact-rich tasks and the practical constraints of manufacturing, optimization, contact handling, etc. We overcome several of these challenges by building an end-to-end differentiable framework for contact-aware robot design. The two key components of this framework are a novel deformation-based parameterization that allows for the design of articulated rigid robots with arbitrary, complex geometry, and a differentiable rigid body simulator that can handle contact-rich scenarios and computes analytical gradients for a full spectrum of kinematic and dynamic parameters. On multiple manipulation tasks, our framework outperforms existing methods that optimize only for control, optimize only for design using alternate representations, or co-optimize using gradient-free methods.

Abstract:
We present Ruckig, an algorithm for Online Trajectory Generation (OTG) respecting third-order constraints and complete kinematic target states. Given any initial state of a system with multiple Degrees of Freedom (DoFs), Ruckig calculates a time-optimal trajectory to an arbitrary target state, defined by its position, velocity, and acceleration, subject to velocity, acceleration, and jerk limits. The proposed algorithm and implementation make three contributions: (1) To the best of our knowledge, we derive the first time-optimal OTG algorithm for arbitrary, multi-dimensional target states, in particular including non-zero target acceleration. (2) This is the first open-source prototype of time-optimal OTG with limited jerk and complete time synchronization for multiple DoFs. (3) Ruckig allows for directional velocity and acceleration limits, enabling robots to better use their dynamical resources. We evaluate the robustness and real-time capability of the proposed algorithm on a test suite with over 1 000 000 000 random trajectories as well as in real-world applications.

Abstract:
Control of robots with kinematic constraints like loop-closure constraints or interactions with the environment requires solving the underlying constrained equations of motion. Several approaches have been proposed in the literature to solve these constrained optimization problems, for instance by partially exploiting the sparsity of the kinematic tree or by considering an explicit formulation of the constraints in the problem resolution. Yet not all constraints admit an explicit formulation, and in general, state-of-the-art approaches suffer from singularity issues, especially in the context of redundant or singular constraints. In this paper, we propose a unified approach to solve forward dynamics equations involving constraints in an efficient, generic, and robust manner. To this aim, we first (i) propose a proximal formulation of the constrained dynamics which converges to an optimal solution in the least-squares sense even in the presence of singularities. Based on this proximal formulation, we introduce (ii) a sparse Cholesky factorization of the underlying Karush–Kuhn–Tucker matrix related to the constrained dynamics, which exploits the sparsity of the robot's kinematic structure as fully as possible. We also show (iii) that it is possible to extract from this factorization the Cholesky decomposition associated with the so-called Operational Space Inertia Matrix, inherent to task-based control frameworks or physics simulations. The new formulation and factorization, implemented within the Pinocchio library, are benchmarked on various robotic platforms, ranging from classic robotic arms or quadrupeds to humanoid robots with closed kinematic chains, and are shown to significantly outperform state-of-the-art alternatives by a factor of 2 or more.
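
To make step (i) concrete, the sketch below applies a proximal-point iteration to the constrained forward dynamics $M\ddot{q} = \tau - b + J^\top\lambda$, $J\ddot{q} = c$ (with $c = -\dot{J}\dot{q}$). It is a minimal illustration under our own assumptions. The sign conventions, the parameter `rho`, and the function name are ours. It also uses a dense `numpy.linalg.solve` rather than the sparse Cholesky factorization described in (ii). The regularized system stays invertible even when $J$ has redundant rows, which is the robustness property the abstract emphasizes.

```python
import numpy as np


def proximal_forward_dynamics(M, b, tau, J, c, rho=1e-2, iters=50, tol=1e-10):
    """Solve M a = tau - b + J^T lam subject to J a = c by proximal-point updates of lam.

    Each iteration solves the regularized system
        [ M   -J^T  ] [a  ]   [ tau - b         ]
        [ J   rho I ] [lam] = [ c + rho * lam_k ]
    whose Schur complement rho I + J M^-1 J^T is positive definite even if J is rank deficient.
    """
    n, m = M.shape[0], J.shape[0]
    K = np.block([[M, -J.T], [J, rho * np.eye(m)]])
    a, lam = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        rhs = np.concatenate([tau - b, c + rho * lam])
        sol = np.linalg.solve(K, rhs)
        a, lam_prev, lam = sol[:n], lam, sol[n:]
        if np.linalg.norm(lam - lam_prev) < tol:  # multipliers have settled
            break
    return a, lam


# Usage: the second constraint row duplicates the first (a redundant, singular constraint set).
M = np.diag([2.0, 1.0, 3.0])
J = np.array([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0]])
a, lam = proximal_forward_dynamics(M, np.zeros(3), np.array([1.0, 2.0, 3.0]), J, np.array([0.5, 1.0]))
```

A plain KKT solve would fail here because $J M^{-1} J^\top$ is singular. The proximal term keeps every linear solve well posed while the iterates still converge to a constraint-satisfying acceleration.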

Abstract:
This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE addresses several challenges: (i) inferring the target object among other occluding objects from input language expressions and RGB images, (ii) inferring object blocking relationships (OBRs) from the images, and (iii) synthesizing a multi-step plan to ask questions that disambiguate the target object and to grasp it successfully. We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping. They allow for unrestricted object categories and language expressions, subject to the training datasets. However, errors in visual perception and ambiguity in human language are inevitable and negatively impact the robot's performance. To overcome these uncertainties, we build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules. Through approximate POMDP planning, the robot tracks the history of observations and asks disambiguation questions in order to achieve a near-optimal sequence of actions that identify and grasp the target object. INVIGORATE combines the benefits of model-based POMDP planning and data-driven deep learning. Preliminary experiments with INVIGORATE on a Fetch robot show significant benefits of this integrated approach to object grasping in clutter with natural language interactions. A demonstration video is available online: https://youtu.be/zYakh80SGcU .

Abstract:
Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety, e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy: it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the backup policy is guaranteed to be safe. Our algorithm, statistical model predictive shielding (SMPS), uses sampling-based verification and linear systems analysis to perform this check. We prove that SMPS ensures safety with high probability, and empirically evaluate its performance on several benchmarks.
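
The switching logic described above can be sketched on a toy problem. Everything concrete here is a hypothetical assumption: a scalar system $x_{t+1} = x_t + u_t + w_t$ with bounded disturbance, a safe set $|x| \le 1$, a linear backup controller, and a plain Monte Carlo rollout check that stands in for SMPS's sampling-based verification. This empirical check does not carry the high-probability guarantee proven for SMPS.

```python
import random

SAFE_BOUND = 1.0  # hypothetical safe set: |x| <= 1
NOISE = 0.05      # hypothetical disturbance bound: |w| <= 0.05


def step(x, u, w):
    """Hypothetical scalar dynamics x' = x + u + w."""
    return x + u + w


def backup_policy(x):
    """Hypothetical stabilizing backup controller that pulls x toward 0."""
    return -0.5 * x


def backup_recovers(x, horizon, rng):
    """Roll out the backup policy from x under one sampled disturbance sequence."""
    for _ in range(horizon):
        if abs(x) > SAFE_BOUND:
            return False
        x = step(x, backup_policy(x), rng.uniform(-NOISE, NOISE))
    return abs(x) <= SAFE_BOUND


def shielded_action(x, learned_action, samples=200, horizon=20, seed=0):
    """Keep the learned action only if every sampled successor is recoverable by the backup."""
    rng = random.Random(seed)
    for _ in range(samples):
        x_next = step(x, learned_action, rng.uniform(-NOISE, NOISE))
        if not backup_recovers(x_next, horizon, rng):
            return backup_policy(x)  # near the boundary: hand control to the backup policy
    return learned_action
```

For example, `shielded_action(0.0, 0.1)` keeps the learned action `0.1`. In contrast, `shielded_action(0.9, 0.5)` would push the state outside the safe set, so it falls back to the backup action `-0.45`.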

Abstract:
We address the problem of calculating complex Jacobian matrices that can arise from optimization problems. An example is inverse optimal control in human motion analysis, whose cost function depends on the second-order time derivative of torque, $\ddot{\tau}$. Thus, its gradient decomposes into, among others, the Jacobian $\partial \ddot{\tau} / \partial q$. We propose a new concept called the N-order Comprehensive Motion Transformation Matrix (N-CMTM) to provide an exact analytical solution for several Jacobians. The computational complexity of the basic Jacobian and its N-order time derivatives computed from the N-CMTM is experimentally shown to be linear in the number of joints $N_j$. The N-CMTM is based on well-known spatial algebra, which makes it applicable to any type of robot. Moreover, it can be used alongside classical algorithms. The computational complexity of the construction of the N-CMTM itself is experimentally shown to be $N^2$.

Abstract:
High-capacity end-to-end approaches for human motion (behavior) prediction have the ability to represent subtle nuances in human behavior, but struggle with robustness to out-of-distribution inputs and tail events. Planning-based prediction, on the other hand, can reliably output decent-but-not-great predictions: it is much more stable in the face of distribution shift (as we verify in this work), but it has high inductive bias, missing important aspects that drive human decisions and ignoring cognitive biases that make human behavior suboptimal. In this work, we analyze one family of approaches that strive to get the best of both worlds: use the end-to-end predictor on common cases, but do not rely on it for tail events or out-of-distribution inputs, and switch to the planning-based predictor there instead. We contribute an analysis of different approaches for detecting when to make this switch, using an autonomous driving domain. We find that promising approaches based on ensembling or generative modeling of the training distribution might not be reliable, but that there are very simple methods that can perform surprisingly well, including training a classifier to pick up on tell-tale issues in predicted trajectories.

Abstract:
When studying robots collaborating with humans, much of the focus has been on robot policies that coordinate fluently with human teammates in collaborative tasks. However, less emphasis has been placed on the effect of the environment on coordination behaviors. To thoroughly explore environments that result in diverse behaviors, we propose a framework for procedural generation of environments that are (1) stylistically similar to human-authored environments, (2) guaranteed to be solvable by the human-robot team, and (3) diverse with respect to coordination measures. We analyze the procedurally generated environments in the Overcooked benchmark domain via simulation and an online user study. Results show that the environments result in qualitatively different emerging behaviors and statistically significant differences in collaborative fluency metrics, even when the robot runs the same planning algorithm.

Abstract:
This paper presents a first low-cost autonomous robotic system for underwater assembly of mortarless structures. The long-term goal is to enable the construction of large-scale underwater structures, such as retaining walls and artificial reefs. The approach follows the principle of co-design: the 2-DOF manipulator and blocks are designed to complement the localization and control strategies. The blocks and gripper are designed with a connector geometry that corrects errors during block pickup and drop assembly. This error correction feature allows a simplification of localization and control, which are based on fiducial markers on custom platforms. We developed the proposed system on a low-cost, heavily modified BlueROV2 autonomous vehicle, which we call Droplet, with a two-degree-of-freedom hand that can open and close a gripper and rotate about the yaw axis. We performed extensive experiments in a pool to evaluate each component and the system as a whole. Results showed a 100% success rate in dropping blocks in the presence of some localization and control errors, and the assembly of several different 3D structures composed of up to eight blocks.

Abstract:
This paper explores a system for assembling structures by dropping block components into place. During and after assembly, the blocks are held together by geometric interlock, so that fasteners or mortar are only needed to bind the final block to one of its neighbors. Drop assembly is a promising strategy for assembly by swimming or flying robots, as it may allow structures to be built without requiring close contact with the existing structure. The current paper explores a mathematical model of interlock, and presents a particular block design that allows interlock to be achieved using only gravity. Proof-of-concept demonstrations of the system are presented using a low-cost and relatively low-precision robot arm. The paper finally analyzes some of the potential limitations of the approach, particularly including flexing of the structure due to manufacturing tolerance limitations.

Abstract:
Tool use requires reasoning about the fit between an object's affordances and the demands of a task. Visual affordance learning can benefit from goal-directed interaction experience, but current techniques rely on human labels or expert demonstrations to generate this data. In this paper, we describe a method that grounds affordances in physical interactions instead, thus removing the need for human labels or expert policies. We use an efficient sampling-based method to generate successful trajectories that provide contact data, which are then used to reveal affordance representations. Our framework, GIFT, operates in two phases: first, we discover visual affordances from goal-directed interaction with a set of procedurally generated tools; second, we train a model to predict new instances of the discovered affordances on novel tools in a self-supervised fashion. In our experiments, we show that GIFT can leverage a sparse keypoint representation to predict grasp and interaction points to accommodate multiple tasks, such as hooking, reaching, and hammering. GIFT outperforms baselines on all tasks and matches a human oracle on two of three tasks using novel tools.

Abstract:
Accurate and precise terrain estimation is a difficult problem for robot locomotion in real-world environments. Thus, it is useful to have systems that do not depend on accurate estimation to the point of fragility. In this paper, we explore the limits of such an approach by investigating the problem of traversing stair-like terrain without any external perception or terrain models on a bipedal robot. For such blind bipedal platforms, the problem appears difficult (even for humans) due to the unexpected elevation changes. Our main contribution is to show that sim-to-real reinforcement learning (RL) can achieve robust locomotion over stair-like terrain on the bipedal robot Cassie using only proprioceptive feedback. Importantly, this only requires modifying an existing RL training framework for flat terrain to include stair-like terrain randomization, without any changes to the reward function. To our knowledge, this is the first controller for a bipedal, human-scale robot capable of reliably traversing a variety of real-world stairs and other stair-like disturbances using only proprioception.

Abstract:
Autonomous vehicles interacting with other traffic participants heavily rely on the perception and prediction of other agents' behaviors to plan safe trajectories. However, as occlusions limit the vehicle's perception ability, reasoning about potential hazards beyond the field-of-view is one of the most challenging issues in developing autonomous driving systems. This paper introduces a novel analytical approach that poses the problem of safe trajectory planning under occlusions as a hybrid zero-sum dynamic game between the autonomous vehicle (evader) and an initially hidden traffic participant (pursuer). Due to occlusions, the pursuer's state is initially unknown to the evader and may later be discovered by the vehicle's sensors. The analysis yields optimal strategies for both players as well as the set of initial conditions from which the autonomous vehicle is guaranteed to avoid collisions. We leverage this theoretical result to develop a novel trajectory planning framework that provides worst-case safety guarantees while minimizing conservativeness by accounting for the autonomous vehicle's ability to actively avoid other road users as soon as they are detected in future observations. Our framework is agnostic to the driving environment and suitable for various motion planners. We demonstrate our algorithm on challenging urban and highway driving scenarios using the open-source CARLA simulator.

Abstract:
Highly constrained manipulation tasks continue to be challenging for autonomous robots as they require high levels of precision, typically less than 1 mm, which is often incompatible with what can be achieved by traditional perception systems. This paper demonstrates that the combination of state-of-the-art object tracking with passively adaptive mechanical hardware can be leveraged to complete precision manipulation tasks with tight, industrially relevant tolerances (0.25 mm). The proposed control method closes the loop through vision by tracking the relative 6D pose of objects in the relevant workspace. It adjusts the control reference of both the compliant manipulator and the hand to complete object insertion tasks via within-hand manipulation. Contrary to previous efforts for insertion, our method does not require expensive force sensors, precision manipulators, or time-consuming, data-hungry online learning. Instead, this effort leverages mechanical compliance and utilizes an object-agnostic manipulation model of the hand learned offline, off-the-shelf motion planning, and an RGBD-based object tracker trained solely with synthetic data. These features allow the proposed system to easily generalize and transfer to new tasks and environments. This paper describes in detail the system components and showcases its efficacy with extensive experiments involving tight-tolerance peg-in-hole insertion tasks of various geometries as well as open-world constrained placement tasks.

Abstract:
Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in the use of reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective (cumulative costs or rewards over time) used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum but a minimum over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, by treating their approximate solutions as untrusted oracles in a supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation.
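
For intuition, consider a discounted reach-avoid backup of the assumed form $V(x) = (1-\gamma)\max\{\ell(x), g(x)\} + \gamma \max\{g(x), \min\{\ell(x), \min_u V(f(x,u))\}\}$, where $\ell(x) < 0$ inside the target set and $g(x) > 0$ inside the failure set. Such a backup can be run as tabular value iteration. The sketch below is a toy illustration under our own assumptions: a 1-D grid with a target at cell 9, a hazard at cell 5, and hypothetical margin functions. It uses exact value iteration rather than Q-learning. States with $V(x) < 0$ approximate the reach-avoid set.

```python
def reach_avoid_value_iteration(n=10, target=9, hazard=5, gamma=0.99, iters=3000):
    """Tabular value iteration with a discounted reach-avoid backup on a 1-D grid."""
    # Assumed sign convention: l(x) < 0 inside the target, g(x) > 0 inside the failure set.
    l = [abs(x - target) - 0.5 for x in range(n)]
    g = [0.5 - abs(x - hazard) for x in range(n)]
    V = [max(l[x], g[x]) for x in range(n)]
    for _ in range(iters):
        new_V = []
        for x in range(n):
            # Actions move left, stay, or right, clipped at the grid edges.
            best_next = min(V[min(max(x + u, 0), n - 1)] for u in (-1, 0, 1))
            new_V.append((1 - gamma) * max(l[x], g[x])
                         + gamma * max(g[x], min(l[x], best_next)))
        V = new_V
    return V


V = reach_avoid_value_iteration()
reach_avoid_set = [x for x, v in enumerate(V) if v < 0]
```

On this grid, cells left of the hazard cannot reach the target without entering cell 5. Only cells 6 through 9 end up with negative values. Because `max` and `min` are non-expansive, each sweep is a $\gamma$-contraction, which is why plain iteration converges.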

Abstract:
We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: to maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs) formally model such problems. Yet, even for a single objective, the task of computing suitable policies for POSGs is theoretically hard and computationally intractable in practice. Using a factored state-space representation, we define a decoupling scheme for the POSG state space that, under certain assumptions on the observability and the reward structure, separates the state components relevant for the reward from those relevant for safety. This decoupling makes it possible to compute provably safe and reward-optimal policies in a tractable two-stage approach. In particular, on the fully observable components related to safety, we exactly compute the set of policies that captures all possible safe choices against the opponent. We restrict the agent's behavior to these safe policies and project the POSG to a partially observable Markov decision process (POMDP). Any reward-maximal policy for the POMDP is then guaranteed to be safe and reward-maximal for the POSG. We showcase our approach's feasibility using high-fidelity simulations of two case studies that concern UAV path planning and autonomous driving. Moreover, to demonstrate the practical applicability, we design a physical experiment involving a robot decision-making problem under energy constraints that is motivated by the helicopter paired with NASA's Perseverance Mars rover.
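
The first stage computes every safe choice against the opponent on the fully observable, safety-relevant part of the state. In spirit, this is a safety game. A textbook way to solve one is a greatest-fixed-point iteration that repeatedly discards states from which the opponent can force a step outside the current candidate set. The toy transition table below is hypothetical: states `A` to `D`, a `crash` state, and actions whose successor sets are chosen by the opponent. The code is a generic sketch, not the paper's construction.

```python
def safe_region(transitions, unsafe):
    """Greatest fixed point of states where some action keeps every opponent outcome safe.

    transitions maps (state, action) -> set of successors the opponent may choose.
    Returns the winning states and, for each, the actions that stay inside them.
    """
    states = {s for s, _ in transitions} | {t for succ in transitions.values() for t in succ}
    winning = states - set(unsafe)
    while True:
        keep = {s for s in winning
                if any(succ <= winning for (t, _), succ in transitions.items() if t == s)}
        if keep == winning:
            break
        winning = keep
    allowed = {s: {a for (t, a), succ in transitions.items() if t == s and succ <= winning}
               for s in winning}
    return winning, allowed


# Hypothetical game: the opponent picks which successor occurs.
TRANSITIONS = {
    ("A", "stay"): {"A"},
    ("A", "go"): {"B"},
    ("B", "go"): {"C", "crash"},
    ("B", "back"): {"A"},
    ("C", "stay"): {"C"},
    ("C", "jump"): {"D"},
    ("D", "go"): {"C", "crash"},
    ("D", "wait"): {"D", "crash"},
    ("crash", "stay"): {"crash"},
}
winning, allowed = safe_region(TRANSITIONS, {"crash"})
```

Here `D` is discarded in the first sweep, which in turn makes `C`'s `jump` action unsafe. Restricting the agent to `allowed[s]` in every state keeps it inside `winning` no matter what the opponent does. Any reward-maximizing second stage can therefore only choose among safe actions.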

Abstract:
Mapping and localization using surface features is prone to failure due to environment changes such as inclement weather. Recently, Localizing Ground Penetrating Radar (LGPR) has been proposed as an alternative means of localizing using underground features that are stable over time and less affected by surface conditions. However, due to the lack of commercially available LGPR sensors, the wider research community has been largely unable to replicate this work or build new and innovative solutions. We present GROUNDED, an open dataset of LGPR scans collected in a variety of environments and weather conditions. By labeling this data with ground truth localization from an RTK-GPS / Inertial Navigation System, and carefully calibrating and time synchronizing the radar scans with ground truth positions, camera imagery, and Lidar data, we enable researchers to build novel localization solutions that are resilient to changing surface conditions. We include 108 individual runs totalling 450 km of driving with LGPR, GPS, Odometry, Camera, and Lidar measurements. We also present two new evaluation benchmarks for 1) Localizing in Weather and 2) Multi-lane Mapping, to enable comparisons of future work supported by the dataset. The dataset can be accessed at http://lgprdata.com .

Abstract:
Medical steerable needles can move along 3D curvilinear trajectories to avoid anatomical obstacles and reach clinically significant targets inside the human body. Automating steerable needle procedures can enable physicians and patients to harness the full potential of steerable needles by maximally leveraging their steerability to safely and accurately reach targets for medical procedures such as biopsies and localized therapy delivery for cancer. For the automation of medical procedures to be clinically accepted, it is critical from a patient care, safety, and regulatory perspective to certify the correctness and effectiveness of the motion planning algorithms involved in procedure automation. In this paper, we take an important step toward creating a certifiable motion planner for steerable needles. We introduce the first motion planner for steerable needles that offers a guarantee, under clinically appropriate assumptions, that it will, in finite time, compute an exact, obstacle-avoiding motion plan to a specified target, or notify the user that no such plan exists. We present an efficient, resolution-complete motion planner for steerable needles based on a novel adaptation of multi-resolution planning. Compared to state-of-the-art steerable needle motion planners (none of which provide any completeness guarantees), we demonstrate that our new resolution-complete motion planner computes plans faster and with a higher success rate.

Abstract:
In this work, we analyze and present an algorithm to find shortest paths for generic rigid bodies. We derive the necessary conditions for optimality using Lagrange multipliers and compare them to the conditions derived from Pontryagin's Maximum Principle. We derive the equations of the necessary conditions using the geometric Jacobian, drawing inspiration from the similarity between rigid-body systems and arm-like systems. In previous work, the analysis focused on finding shortest paths to reach goals in position only. This work extends the analysis to find the shortest path to a goal specified by a complete configuration in 3D. We show that the algorithm is resolution complete even when orientations are included. To overcome the complexity of 3D orientations, we describe the system using three points in the robot frame, and show that this parameterization is redundant but yields the same necessary conditions as those derived using the minimal parameters (configuration). We use a 3D Dubins system to demonstrate the correctness of the analysis and the algorithm.

Abstract:
Robots frequently need to perceive object attributes, such as "red," "heavy," and "empty," using multimodal exploratory actions, such as "look," "lift," and "shake." Robot attribute learning algorithms aim to learn an observation model for each perceivable attribute given an exploratory action. Once the attribute models are learned, they can be used to identify attributes of new objects, answering questions such as "Is this object red and empty?" Attribute learning and identification have been treated as two separate problems in the literature. In this paper, we first define a new problem called online robot attribute learning (On-RAL), where the robot works on attribute learning and attribute identification simultaneously. Then we develop an algorithm called information-theoretic reward shaping (ITRS) that actively addresses the trade-off between exploration and exploitation in On-RAL problems. ITRS was compared with competitive robot attribute learning baselines, and experimental results demonstrate ITRS' superiority in learning efficiency and identification accuracy.
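
As a rough illustration of an information-theoretic reward for choosing exploratory actions, the sketch below scores each action by the expected reduction in entropy of a binary attribute belief, minus an action cost. The sensor accuracies, costs, and function names are hypothetical assumptions. This simple one-step information gain is not necessarily the shaping term ITRS uses.

```python
import math


def entropy(p):
    """Binary entropy (bits) of a belief P(attribute) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def posterior(p, accuracy, says_yes):
    """Bayes update of P(attribute) after a yes/no observation from a sensor of given accuracy."""
    like_true = accuracy if says_yes else 1 - accuracy
    like_false = 1 - accuracy if says_yes else accuracy
    num = like_true * p
    return num / (num + like_false * (1 - p))


def expected_info_gain(p, accuracy):
    """Prior entropy minus the expected posterior entropy over both possible observations."""
    p_yes = accuracy * p + (1 - accuracy) * (1 - p)
    expected_h = (p_yes * entropy(posterior(p, accuracy, True))
                  + (1 - p_yes) * entropy(posterior(p, accuracy, False)))
    return entropy(p) - expected_h


def best_action(p, actions):
    """actions maps name -> (sensor accuracy, cost); pick the largest gain-minus-cost score."""
    return max(actions, key=lambda a: expected_info_gain(p, actions[a][0]) - actions[a][1])


# Usage: a reliable but costly "look" versus a cheap but noisy "shake" (hypothetical numbers).
choice = best_action(0.5, {"look": (0.9, 0.1), "shake": (0.6, 0.0)})
```

With a maximally uncertain belief (`p = 0.5`), the 90%-accurate action gains about 0.53 bits against roughly 0.03 bits for the 60%-accurate one. The cost penalty is therefore not enough to change the choice.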

Abstract:
The learning efficiency of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover generalizable skills via automated generation of a diverse set of tasks. As opposed to prior work on unsupervised discovery of skills, which incentivizes the skills to produce different outcomes in the same environment, our method pairs each skill with a unique task produced by a trainable task generator. To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks. A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective. The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks. We demonstrate that the proposed method can effectively learn a variety of robot skills in two tabletop manipulation domains. Our results suggest that the learned skills can effectively improve the robot's performance in various unseen target tasks compared to existing reinforcement learning and skill learning methods.

Abstract:
We are motivated by the goal of generalist robots that can complete a wide range of tasks across many environments. Critical to this is the robot's ability to acquire some metric of task success or reward, which is necessary for reinforcement learning, planning, or knowing when to ask for help. For a general-purpose robot operating in the real world, this reward function must also be able to generalize broadly across environments, tasks, and objects, while depending only on on-board sensor observations (e.g., RGB images). While deep learning on large and diverse datasets has shown promise as a path towards such generalization in computer vision and natural language, collecting high-quality datasets of robotic interaction at scale remains an open challenge. In contrast, “in-the-wild” videos of humans (e.g., YouTube) contain an extensive collection of people doing interesting tasks across a diverse range of settings. In this work, we propose a simple approach, Domain-agnostic Video Discriminator (DVD), that learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task, and that can generalize by virtue of learning from a small amount of robot data together with a broad dataset of human videos. We find that by leveraging diverse human datasets, this reward function (a) can generalize zero-shot to unseen environments, (b) can generalize zero-shot to unseen tasks, and (c) can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.

Abstract:
We present a method for localizing a single camera with respect to a point cloud map in indoor and outdoor scenes. The problem is challenging because correspondences of local invariant features are inconsistent across the image and 3D domains. It is even more challenging because the method must handle various environmental conditions such as illumination, weather, and seasonal changes. Our method matches equirectangular images to 3D range projections by extracting cross-domain symmetric place descriptors. Our key insight is to retain condition-invariant 3D geometry features from limited data samples while eliminating condition-related features with a purpose-designed Generative Adversarial Network. Based on such features, we further design a spherical convolution network to learn viewpoint-invariant symmetric place descriptors. We evaluate our method on extensive self-collected datasets covering Long-term (varying appearance conditions), Large-scale (up to 2 km of structured/unstructured environment), and Multistory (four-floor confined space) settings. Our method outperforms current state-of-the-art methods, achieving around 3 times higher place retrieval in inconsistent environments and more than 3 times higher accuracy in online localization. To highlight our method's generalization capabilities, we also evaluate recognition across different datasets. With a single trained model, i3dLoc achieves reliable visual localization under random conditions and viewpoints.

Abstract:
This paper presents a radar odometry method that combines probabilistic trajectory estimation and deep-learned features without needing ground-truth pose information. The feature network is trained unsupervised, using only the on-board radar data. With its theoretical foundation based on a data likelihood objective, our method leverages a deep network for processing rich radar data and a non-differentiable classic estimator for probabilistic inference. We provide extensive experimental results on both the publicly available Oxford Radar RobotCar Dataset and an additional 100 km of driving collected in an urban setting. Our sliding-window implementation of radar odometry outperforms most hand-crafted methods and approaches the current state of the art without requiring a ground-truth trajectory for training. We also demonstrate the effectiveness of radar odometry under adverse weather conditions. Code for this project can be found at: https://github.com/utiasASRL/hero_radar_odometry

Abstract:
To safely navigate unknown environments, robots must accurately perceive dynamic obstacles. Instead of directly measuring the scene depth with a LiDAR sensor, we explore the use of a much cheaper and higher-resolution sensor: programmable light curtains. Light curtains are controllable depth sensors that sense only along a surface that a user selects. We use light curtains to estimate the safety envelope of a scene: a hypothetical surface that separates the robot from all obstacles. We show that generating light curtains that sense random locations (from a particular distribution) can quickly discover the safety envelope for scenes with unknown objects. Importantly, we derive theoretical safety guarantees on the probability of detecting an obstacle using random curtains. We combine random curtains with a machine-learning-based model that efficiently forecasts and tracks the motion of the safety envelope. Our method accurately estimates safety envelopes while providing probabilistic safety guarantees that can be used to certify the efficacy of a robot perception system in detecting and avoiding dynamic obstacles. We evaluate our approach in a simulated urban driving environment and in a real-world environment with moving pedestrians using a light curtain device, and show that we can estimate safety envelopes efficiently and effectively.
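
To give intuition for why repeated random curtains quickly become reliable, the sketch below uses a deliberately simplified model. It is our assumption, not the paper's derivation: each curtain independently detects a given obstacle with probability `p_single`. Under that assumption, the probability that at least one of n curtains detects the obstacle is 1 - (1 - p)^n, and inverting this gives the number of curtains needed to reach a target detection probability. The function names are hypothetical.

```python
import math


def detection_probability(p_single: float, n_curtains: int) -> float:
    """Probability that at least one of n independent random curtains detects an obstacle."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_curtains < 0:
        raise ValueError("n_curtains must be non-negative")
    # Complement of "every curtain misses", assuming independence across curtains.
    return 1.0 - (1.0 - p_single) ** n_curtains


def curtains_needed(p_single: float, target: float) -> int:
    """Smallest n such that detection_probability(p_single, n) >= target."""
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must lie in (0, 1)")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must lie in [0, 1)")
    # Closed form: n >= log(1 - target) / log(1 - p_single).
    n = max(0, math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single)))
    # Guard against floating-point rounding in the closed form.
    while detection_probability(p_single, n) < target:
        n += 1
    while n > 0 and detection_probability(p_single, n - 1) >= target:
        n -= 1
    return n


print(curtains_needed(0.5, 0.9))  # 4, since 1 - 0.5**4 = 0.9375 >= 0.9 > 1 - 0.5**3
```

The independence assumption is what makes the bound multiplicative. Real curtains sampled from a shared distribution over a scene may be correlated, which is one reason a dedicated analysis is needed.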

Abstract:
Today, even the most compute- and power-constrained robots can measure complex, high-data-rate video and LIDAR sensory streams. Often, such robots, ranging from low-power drones to space and subterranean rovers, need to transmit high-bitrate sensory data to a remote compute server if they are uncertain or cannot scalably run complex perception or mapping tasks locally. However, today's representations for sensory data are mostly designed for human, not robotic, perception, and thus often waste precious compute or wireless network resources transmitting parts of a scene that are unnecessary for a high-level robotic task. This paper presents an algorithm to learn task-relevant representations of sensory data that are co-designed with a pre-trained robotic perception model's ultimate objective. Our algorithm aggressively compresses robotic sensory data by up to 11x more than competing methods. Further, it achieves high accuracy and robust generalization on diverse tasks including Mars terrain classification with low-power deep learning accelerators, neural motion planning, and environmental timeseries classification.

Abstract:
Cooperatively avoiding collision is a critical functionality for robots navigating in dense human crowds, and failing at it can lead to either overaggressive or overcautious behavior. A necessary condition for cooperative collision avoidance is to couple the prediction of the agents' trajectories with the planning of the robot's trajectory. However, it is unclear whether trajectory-based cooperative collision avoidance captures the correct agent attributes. In this work, we move from trajectory-based coupling to a formalism that couples agent preference distributions. In particular, we show that preference distributions (probability density functions representing agents' intentions) can capture higher-order statistics of agent behaviors, such as willingness to cooperate. Thus, coupling in distribution space exploits more information about inter-agent cooperation than coupling in trajectory space. We therefore introduce a general objective for coupled prediction and planning in distribution space, and propose an iterative best-response optimization method based on variational analysis with guaranteed sufficient decrease. Based on this analysis, we develop a sampling-based motion planning framework called DistNav that runs in real time on a laptop CPU. We evaluate our approach on challenging scenarios from both real-world datasets and simulation environments, and benchmark against a wide variety of model-based and machine-learning-based approaches. Our approach outperforms all other models in both safety and efficiency. Finally, we find that DistNav is competitive with human safety and efficiency performance.

Abstract:
In this paper, our aim is to highlight Tactile Perceptual Aliasing as a problem when using deep neural networks and other discriminative models. Perceptual aliasing will arise wherever a physical variable extracted from tactile data is ambiguous between physically distinct stimuli. Here we address this problem using a probabilistic discriminative model implemented as a 5-component mixture density network, consisting of a deep neural network that predicts the parameters of a Gaussian mixture model. We show that discriminative regression models such as deep neural networks and Gaussian process regression perform poorly on aliased data, giving accurate predictions only when the sources of aliasing are removed. In contrast, the mixture density network identifies aliased data with improved prediction accuracy. The uncertain predictions of the model form patterns that are consistent with the various sources of perceptual ambiguity. In our view, perceptual aliasing will become an unavoidable issue for robot touch as the field progresses to training robots that act in uncertain and unstructured environments, such as with deep reinforcement learning.
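
Why a single-output regressor struggles with aliased data can be seen from the mixture itself. A regressor trained with squared error converges to the conditional mean. When two distinct stimuli produce the same tactile input, that mean can lie between them and match neither. A mixture density network instead outputs mixture weights, means, and standard deviations, and it is scored by the mixture log-likelihood. The sketch below evaluates that log-likelihood for a 1D Gaussian mixture. The two-component "aliased" example values are hypothetical, and the network that would predict the parameters is omitted.

```python
import math
from typing import Sequence


def gmm_log_likelihood(y: float, weights: Sequence[float], means: Sequence[float],
                       stds: Sequence[float]) -> float:
    """Log-density of a 1D Gaussian mixture at y, computed with log-sum-exp for stability."""
    if not (len(weights) == len(means) == len(stds)) or not weights:
        raise ValueError("weights, means and stds must be non-empty and equally long")
    total = sum(weights)
    log_terms = []
    for w, mu, s in zip(weights, means, stds):
        if w <= 0.0 or s <= 0.0:
            raise ValueError("weights and stds must be positive")
        z = (y - mu) / s
        log_terms.append(math.log(w / total) - math.log(s)
                         - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z)
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def mixture_mean(weights: Sequence[float], means: Sequence[float]) -> float:
    """Mean of the mixture, which is what a squared-error regressor converges to."""
    return sum(w * mu for w, mu in zip(weights, means)) / sum(weights)


# Hypothetical aliased reading: two physically distinct stimuli (-1 and +1) give the same input.
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [0.1, 0.1]
print(mixture_mean(weights, means))  # 0.0: a value neither stimulus produced
```

Training an MDN amounts to minimizing the negative of `gmm_log_likelihood` over the dataset. The predicted spread across components then signals the ambiguity described in the abstract.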

Abstract:
Robotic cutting of soft materials is critical for applications such as food processing, household automation, and surgical manipulation. As in other areas of robotics, simulators can facilitate controller verification, policy learning, and dataset generation. Moreover, differentiable simulators can enable gradient-based optimization, which is invaluable for calibrating simulation parameters and optimizing controllers. In this work, we present DiSECt: the first differentiable simulator for cutting soft materials. The simulator augments the finite element method (FEM) with a continuous contact model based on signed distance fields (SDF), as well as a continuous damage model that inserts springs on opposite sides of the cutting plane and allows them to weaken until they reach zero stiffness, enabling crack formation. Through various experiments, we evaluate the performance of the simulator. We first show that the simulator can be calibrated to match resultant forces and deformation fields from a state-of-the-art commercial solver and real-world cutting datasets, with generality across cutting velocities and object instances. We then show that Bayesian inference can be performed efficiently by leveraging the differentiability of the simulator, estimating posteriors over hundreds of parameters in a fraction of the time of derivative-free methods. Finally, we illustrate that control parameters in the simulation can be optimized to minimize cutting forces via lateral slicing motions. We publish videos and additional results on our project website at https://diff-cutting-sim.github.io .

Abstract:
Model predictive control (MPC) schemes have a proven track record for delivering aggressive and robust performance in many challenging control tasks, coping with nonlinear system dynamics, constraints, and observational noise. Despite their success, these methods often rely on simple control distributions, which can limit their performance in highly uncertain and complex environments. MPC frameworks must be able to accommodate changing distributions over system parameters, based on the most recent measurements. In this paper, we devise an implicit variational inference algorithm able to estimate distributions over model parameters and control inputs on the fly. The method incorporates Stein variational gradient descent to approximate the target distributions as a collection of particles, and performs updates based on a Bayesian formulation. This enables the approximation of complex multi-modal posterior distributions, which typically occur in challenging and realistic robot navigation tasks. We demonstrate our approach in both simulated and real-world experiments requiring real-time execution in the face of dynamically changing environments.
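
The particle approximation mentioned above can be illustrated with plain Stein variational gradient descent (SVGD) on a 1D target. This is a generic SVGD sketch, not the paper's MPC algorithm. It assumes an RBF kernel with a fixed bandwidth, whereas SVGD is often used with a median-heuristic bandwidth, and a hypothetical Gaussian target whose score (gradient of the log-density) is known in closed form. Each particle moves along a kernel-weighted average of the scores, which pulls particles toward high density, plus a kernel-gradient term, which pushes particles apart so they do not collapse onto the mode.

```python
import math
from typing import Callable, List


def svgd_step(particles: List[float], score: Callable[[float], float],
              step_size: float, bandwidth: float) -> List[float]:
    """One Stein variational gradient descent update with an RBF kernel (1D)."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / bandwidth)
            # Driving term: kernel-weighted score pulls particles toward high density.
            phi += k * score(xj)
            # Repulsive term: gradient of k w.r.t. xj pushes xi away from xj.
            phi += 2.0 * (xi - xj) / bandwidth * k
        updated.append(xi + step_size * phi / n)
    return updated


def run_svgd(particles: List[float], score: Callable[[float], float],
             step_size: float = 0.1, bandwidth: float = 1.0,
             iterations: int = 1000) -> List[float]:
    for _ in range(iterations):
        particles = svgd_step(particles, score, step_size, bandwidth)
    return particles


# Hypothetical target: N(mu=2, sigma^2=1), whose score is -(x - mu) / sigma^2.
def target_score(x: float) -> float:
    return -(x - 2.0)


initial = [-3.0 + 6.0 * i / 9 for i in range(10)]
final = run_svgd(initial, target_score)
print(round(sum(final) / len(final), 2))  # particle mean, close to the target mean 2
```

Because the update needs only the score, the same loop applies to unnormalized, multi-modal posteriors. That property is what makes particle methods attractive in the setting the abstract describes.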

Abstract:
Place recognition is the task of recognizing the current scene from a database of known places. The currently dominant algorithmic paradigm is to describe each place with a (deep-learning-based) holistic feature vector and to use fast vector query methods to find matches. We propose a novel type of image descriptor, Vector Semantic Representations (VSR), that encodes the spatial semantic layout from a semantic segmentation together with appearance properties in a single vector (for example, 4,096-dimensional) for place recognition. We leverage operations from the established class of Vector Symbolic Architectures to combine symbolic (e.g., class label) and numeric (e.g., feature map response) information in a common vector representation. We evaluate the proposed semantic descriptor on 13 standard mobile robotic place recognition datasets and compare it to six descriptors from the literature. VSR is on par with the best compared descriptor (NetVLAD) in terms of mean average precision and superior in terms of recall and worst-case average precision. This makes the approach particularly interesting for candidate selection. For a more detailed investigation, we discuss and evaluate recall integrity as an additional criterion. Further, we demonstrate that the semantic descriptor is particularly well suited for combination with existing appearance descriptors, indicating that semantics provide complementary information for image matching.
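
The Vector Symbolic Architecture operations referred to above can be illustrated with the common bipolar, multiply-add style. This is a generic sketch and not the exact VSR encoding, which also folds in numeric feature-map responses. Binding is an elementwise product, which is its own inverse for ±1 vectors. Bundling is an elementwise sum. A place is encoded as a bundle of "class bound to image region" pairs. The class and region symbols and the three places below are hypothetical; only the 4,096 dimensionality comes from the abstract.

```python
import math
import random
from typing import List

Vector = List[int]
DIM = 4096  # matches the 4,096-dimensional example in the abstract


def random_bipolar(rng: random.Random, dim: int = DIM) -> Vector:
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a: Vector, b: Vector) -> Vector:
    """Elementwise product; self-inverse for bipolar vectors, so it also unbinds."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors: Vector) -> Vector:
    """Elementwise sum; the result stays similar to each of its inputs."""
    return [sum(values) for values in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)
# Hypothetical symbols: semantic classes and coarse image regions.
road, building, tree = (random_bipolar(rng) for _ in range(3))
left, center, right = (random_bipolar(rng) for _ in range(3))

# A place descriptor bundles "class bound to region" pairs.
place_a = bundle(bind(road, center), bind(building, left), bind(tree, right))
place_b = bundle(bind(road, center), bind(building, left), bind(tree, center))
place_c = bundle(bind(road, left), bind(building, right), bind(tree, center))

# Query: where is the building in place_a? Unbind and compare with each region.
query = bind(place_a, building)
scores = {name: cosine(query, vec)
          for name, vec in [("left", left), ("center", center), ("right", right)]}
print(max(scores, key=scores.get))  # expected: left
```

Places that share most class-region pairs (`place_a`, `place_b`) have high cosine similarity, while places that share none (`place_a`, `place_c`) are nearly orthogonal. This is what lets a single fixed-length vector support fast nearest-neighbour retrieval.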

Abstract:
We consider the problem of learning motion policies for acceleration-based robotic systems with a structured policy class. We leverage a multi-task control framework called RMPflow, which has been successfully applied to many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge, and the ability to transfer policies between robots. However, implementing a system for end-to-end learning of RMPflow policies faces several computational challenges. In this work, we re-examine the RMPflow algorithm and propose a more practical alternative, called RMP2, that uses modern automatic differentiation tools (such as TensorFlow and PyTorch) to compute RMPflow policies. Our new design retains the strengths of RMPflow while bringing in advantages from automatic differentiation, including 1) simple programming interfaces for designing complex transformations; 2) support for general directed acyclic graph (DAG) transformation structures; 3) end-to-end differentiability for policy learning; and 4) improved computational efficiency. Because of these features, RMP2 can be treated as a structured policy class for efficient robot learning that is suitable for encoding domain knowledge. Our experiments show that using the structured policy class given by RMP2 can improve policy performance and safety in reinforcement learning tasks for goal reaching in cluttered space. The video for our experimental results can be found at https://youtu.be/dliQ-jsYhgI and the code is available at https://github.com/UWRobotLearning/rmp2 .

Abstract:
To provide adaptive and user-friendly solutions to robotic manipulation, it is important that the agent can learn to accomplish tasks even if it is only provided with very sparse instruction signals. To address the issues reinforcement learning algorithms face when task rewards are sparse, this paper proposes an intrinsic motivation approach that can be easily integrated into any standard reinforcement learning algorithm and allows robotic manipulators to learn useful manipulation skills with only sparse extrinsic rewards. By integrating and balancing empowerment and curiosity, this approach shows superior performance compared to other state-of-the-art intrinsic exploration approaches during extensive empirical testing. Qualitative analysis also shows that, when combined with diversity-driven intrinsic motivations, this approach can help manipulators learn a set of diverse skills that could potentially be applied to other, more complicated manipulation tasks and accelerate their learning process.

Abstract:
We propose and study a class of multi-object rearrangement problems in which a robotic manipulator, capable of carrying an item and making item swaps, is tasked to sort items stored in lattices in a time-optimal manner. We systematically analyze the intrinsic optimality structure, which is fairly rich and intriguing, under different levels of item distinguishability (fully labeled, where each item has a unique label, or partially labeled, where multiple items may be of the same type) and different lattice dimensions. Focusing on the most practical settings of one and two dimensions, we develop efficient (low polynomial time) algorithms that optimally perform rearrangements on 1D lattices under both fully- and partially-labeled settings. On the other hand, we prove that rearrangement on 2D and higher-dimensional lattices is computationally intractable to solve optimally. Despite this NP-hardness, we are able to develop efficient algorithms for the 2D fully- and partially-labeled settings that are asymptotically optimal in expectation, assuming that the initial configuration is randomly selected. Simulation studies confirm the effectiveness of our algorithms in comparison to greedy best-first approaches. Source code of the Python implementation: https://github.com/rutgers-arc-lab/lattice-rearrangement/

Abstract:
We present a method for autonomous exploration in complex three-dimensional (3D) environments. Our method explores faster than the current state of the art by using a hierarchical framework: one level maintains data densely and computes a detailed path within a local planning horizon, while another level maintains data sparsely and computes a coarse path at the global scale. Such a framework builds on the insight that detailed processing is most effective close to the robot, and gains computational speed by trading off the computation of details far away from the robot. The method optimizes an overall exploration path with respect to the length of the path. In addition, the path in the local area is kinodynamically feasible for the vehicle to follow at high speed. In experiments, our systems autonomously explore indoor and outdoor environments with a high degree of complexity, using ground and aerial robots. The method achieves 80% higher exploration efficiency, defined as the average explored volume per second through a run, and consumes less than 50% of the computation of the state of the art.

Abstract:
Although ground robotic autonomy has gained widespread usage in structured and controlled environments, autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, and rubble pose unique and challenging problems for autonomous navigation. To tackle these problems, we propose an approach for assessing traversability and planning a safe, feasible, and fast trajectory in real time. Our approach, which we name STEP (Stochastic Traversability Evaluation and Planning), relies on: 1) rapid uncertainty-aware mapping and traversability evaluation; 2) tail risk assessment using the Conditional Value-at-Risk (CVaR); and 3) efficient risk- and constraint-aware kinodynamic motion planning using sequential quadratic programming (SQP) based model predictive control (MPC). We analyze our method in simulation and validate its efficacy on wheeled and legged robotic platforms exploring extreme terrains, including an abandoned subway and an underground lava tube.
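
For readers unfamiliar with CVaR as a tail-risk measure, the sketch below computes it for an empirical distribution of cost samples, such as hypothetical traversal costs for one map cell. Conventions differ across papers. Here we assume `alpha` is the probability mass of the worst-case tail, so `alpha = 1` recovers the ordinary mean and small `alpha` approaches the worst sample. The boundary sample gets fractional weight so that the tail holds exactly `alpha` of the mass. This is a generic estimator, not STEP's specific risk model.

```python
import math
from typing import Sequence


def empirical_cvar(costs: Sequence[float], alpha: float) -> float:
    """CVaR of an empirical cost distribution: mean of the worst alpha-fraction of outcomes.

    Convention (an assumption; papers differ): alpha is the tail probability mass, so
    alpha = 1 gives the plain mean and alpha -> 0 approaches the worst case.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    worst_first = sorted(costs, reverse=True)
    n = len(worst_first)
    tail_mass = alpha * n          # number of samples (possibly fractional) in the tail
    m = math.floor(tail_mass)
    total = sum(worst_first[:m])
    if m < n:
        # Include the boundary sample with fractional weight so the tail has mass exactly alpha.
        total += (tail_mass - m) * worst_first[m]
    return total / tail_mass


# Hypothetical traversal-cost samples for one map cell.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(empirical_cvar(samples, 1.0))   # 5.5 (the mean)
print(empirical_cvar(samples, 0.2))   # 9.5 (mean of the two worst samples)
```

Unlike the mean, CVaR grows when rare but severe outcomes such as a tip-over become more likely. This is why it suits risk-aware traversability assessment.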

Abstract:
Correspondence identification, which aims to identify the same objects so that a group of robots/agents refers consistently to the objects in their own fields of view, is essential for multi-robot collaborative perception. Although recent deep learning methods have shown encouraging performance on correspondence identification, they suffer from two shortcomings: the inability to address non-covisibility in collaborative perception caused by occlusion and the agents' limited fields of view, and the inability to quantify and reduce uncertainty to improve correspondence identification. To address both issues, we propose a novel uncertainty-aware deep graph matching method for correspondence identification in collaborative perception. Our method formulates correspondence identification as a deep graph matching problem, which identifies correspondences based on graph representations constructed from robot observations. We propose new deep graph matching networks in the Bayesian framework to explicitly quantify uncertainty in identified correspondences. In addition, we design novel loss functions to explicitly reduce correspondence uncertainty and perceptual non-covisibility during learning. We evaluate our method in the robotics applications of collaborative assembly and multi-robot coordination using high-fidelity simulations and physical robots. Experiments validate that, by addressing uncertainty and non-covisibility, our proposed approach achieves state-of-the-art performance in correspondence identification.

Abstract:
We consider a category-level perception problem, where one is given 3D sensor data picturing an object of a given category (e.g., a car) and has to reconstruct the pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model where, for an object category, we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation in which pose and shape estimation is formulated as a non-convex optimization. Our first contribution is to provide the first certifiably optimal solver for pose and shape estimation. In particular, we show that rotation estimation can be decoupled from the estimation of the object translation and shape, and we demonstrate that (i) the optimal object rotation can be computed via a tight (small-size) semidefinite relaxation, and (ii) the translation and shape parameters can be computed in closed form given the rotation. Our second contribution is to add an outlier rejection layer to our solver, hence making it robust to a large number of misdetections. Towards this goal, we wrap our optimal solver in a robust estimation scheme based on graduated non-convexity. To further enhance robustness to outliers, we also develop the first graph-theoretic formulation to prune outliers in category-level perception, which removes outliers via convex hull and maximum clique computations; the resulting approach is robust to 70–90% outliers. Our third contribution is an extensive experimental evaluation. Besides providing an ablation study on a simulated dataset and on the PASCAL3D+ dataset, we combine our solver with a deep-learned keypoint detector and show that the resulting approach improves over the state of the art in vehicle pose estimation on the ApolloScape driving datasets.

Abstract:
For tabletop rearrangement problems with overhand grasps, storage space outside the tabletop workspace, or buffers, can temporarily hold objects, which greatly facilitates the resolution of a given rearrangement task. This brings forth the natural question of how many running buffers are required so that certain classes of tabletop rearrangement problems are feasible. In this work, we examine the problem for both the labeled (where each object has a specific goal pose) and the unlabeled (where goal poses of objects are interchangeable) settings. On the structural side, we observe that finding the minimum number of running buffers (MRB) can be carried out on a dependency graph abstracted from a problem instance, and show that computing MRB on dependency graphs is NP-hard. We then prove that under both labeled and unlabeled settings, even for uniform cylindrical objects, the number of required running buffers may grow unbounded as the number of objects to be rearranged increases; we further show that the bound for the unlabeled case is tight. On the algorithmic side, we develop highly effective algorithms for finding MRB for both labeled and unlabeled tabletop rearrangement problems, scalable to over a hundred objects under very high object density. Employing these algorithms, empirical evaluations show that random labeled and unlabeled instances, which more closely mimic real-world setups, have much smaller MRB.
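
To make the notion of running buffers on a dependency graph concrete, the sketch below uses our own simplified reading of the labeled setting, not necessarily the paper's exact definition. `blockers[o]` lists the objects whose start poses overlap o's goal pose. Objects leave their starts in some order. An object whose goal is still occupied is parked in a buffer. Buffered objects move to their goals as soon as their blockers have left. The running buffer count of an order is the peak buffer occupancy. MRB is computed here by brute force over all orders, which is exponential and only meant for tiny hypothetical instances; the paper's algorithms scale much further.

```python
from itertools import permutations
from typing import Dict, Hashable, Sequence, Set

Blockers = Dict[Hashable, Set[Hashable]]


def running_buffer_count(order: Sequence[Hashable], blockers: Blockers) -> int:
    """Peak number of objects held in buffers when objects leave their starts in `order`."""
    vacated: Set[Hashable] = set()
    in_buffer: Set[Hashable] = set()
    peak = 0

    def goal_free(obj: Hashable) -> bool:
        # Goal is free once every other object whose start overlaps it has been picked up.
        return (blockers[obj] - {obj}) <= vacated

    for obj in order:
        vacated.add(obj)                      # obj is picked up from its start pose
        if not goal_free(obj):
            in_buffer.add(obj)                # goal still occupied: park it in a buffer
            peak = max(peak, len(in_buffer))
        # Any buffered object whose goal has just been cleared goes to its goal.
        in_buffer = {o for o in in_buffer if not goal_free(o)}
    return peak


def minimum_running_buffers(blockers: Blockers) -> int:
    """Brute-force MRB over all orders; exponential, so only for tiny illustrative instances."""
    objects = list(blockers)
    referenced = set().union(*blockers.values()) if blockers else set()
    if not referenced <= set(objects):
        raise ValueError("every blocker must itself be a key of `blockers`")
    return min(running_buffer_count(order, blockers) for order in permutations(objects))


# Hypothetical instances: a 2-cycle and a 3-cycle in the dependency graph.
two_cycle = {"a": {"b"}, "b": {"a"}}
three_cycle = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
print(minimum_running_buffers(two_cycle), minimum_running_buffers(three_cycle))  # 1 1
```

The 3-cycle shows why the order matters. Picking a, b, c in that sequence peaks at two buffered objects, while starting with b needs only one.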

Abstract:
Ensuring human safety without unnecessarily impacting task efficiency during human-robot interactive manipulation tasks is a critical challenge. In this work, we formally define human physical safety as collision avoidance or safe impact in the event of a collision. We develop a motion planner that theoretically guarantees safety, with high probability, under uncertainty in human dynamic models. Our two-pronged definition of safety unlocks the planner's potential to find efficient plans even when collision avoidance is nearly impossible. The improved efficiency is empirically demonstrated in both a simulated goal-reaching domain and a real-world robot-assisted dressing domain. We provide a unified view of two approaches to safe human-robot interaction: human-aware motion planners that use predictive human models, and reactive controllers that compliantly handle collisions.

Abstract:
Reactive motion generation problems are usually solved by computing actions as a sum of policies. However, these policies are independent of each other and can therefore have conflicting behaviors when their contributions are summed. We introduce Composable Energy Policies (CEP), a novel framework for modular reactive motion generation. CEP computes the control action by optimization over the product of a set of stochastic policies. This product of policies assigns high probability to actions that satisfy all the components and low probability to the others. Optimizing over the product of the policies avoids the detrimental effect of conflicting behaviors between policies by choosing an action that satisfies all the objectives. Furthermore, we show that CEP naturally adapts to the reinforcement learning problem, allowing us to integrate, in a hierarchical fashion, any distribution as a prior, from multimodal distributions to non-smooth distributions, and to learn a new policy given them.
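
The difference between summing policies and multiplying them is easiest to see in the special case where every component policy is a 1D Gaussian over the action. In that case the normalized product is again Gaussian, with precision equal to the sum of the component precisions and a precision-weighted mean. CEP itself handles general energy-based policies by optimization. This closed form is only an illustrative special case, and the two example policies are hypothetical.

```python
from typing import Sequence, Tuple


def product_of_gaussians(means: Sequence[float],
                         variances: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of the (normalised) product of 1D Gaussian densities."""
    if len(means) != len(variances) or not means:
        raise ValueError("means and variances must be non-empty and equally long")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    # Confident (low-variance) policies dominate the combined mean.
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return mean, 1.0 / total_precision


# Hypothetical 1D velocity command: a confident obstacle-avoidance policy says "go left"
# (mean -1, variance 0.1) and a loose goal-reaching policy says "go right" (mean +1, variance 1).
summed_action = -1.0 + 1.0  # summing mean actions cancels out: 0.0
prod_mean, prod_var = product_of_gaussians([-1.0, 1.0], [0.1, 1.0])
print(summed_action, round(prod_mean, 3), round(prod_var, 3))  # 0.0 -0.818 0.091
```

The summed action of 0.0 lies more than three standard deviations from what the confident obstacle policy wants. The product instead stays close to that policy while still leaning slightly toward the goal.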

Abstract:
Simulation provides a safe and efficient way to generate useful data for learning complex robotic tasks. However, matching simulation and real-world dynamics can be quite challenging, especially for systems that have a large number of unobserved or unmeasurable parameters, which may lie in the robot dynamics itself or in the environment with which the robot interacts. We introduce a novel approach to tackle such a sim-to-real problem by developing policies capable of adapting to new environments in a zero-shot manner. Key to our approach is an error-aware policy (EAP) that is explicitly made aware of the effect of unobservable factors during training. An EAP takes as input the predicted future state error in the target environment, which is provided by an error-prediction function trained simultaneously with the EAP. We validate our approach on an assistive walking device trained to help the human user recover from external pushes. We show that a trained EAP for a hip-torque assistive device can be transferred to different human agents with unseen biomechanical characteristics. In addition, we show that our method can be applied to other standard RL control tasks.

Abstract:
In this paper, we address the trajectory planning problem in uncertain nonconvex static and dynamic environments that contain obstacles with probabilistic location, size, and geometry. To address this problem, we provide a risk-bounded trajectory planning method that looks for continuous-time trajectories with guaranteed bounded risk over the planning time horizon. Risk is defined as the probability of collision with uncertain obstacles. Existing approaches to risk-bounded trajectory planning are either limited to Gaussian uncertainties and convex obstacles or rely on sampling-based methods that need uncertainty samples and time discretization. To address the risk-bounded trajectory planning problem, we leverage the notion of risk contours to transform it into a deterministic optimization problem. Risk contours are the set of all points in the uncertain environment with guaranteed bounded risk. The resulting deterministic optimization is, in general, a nonlinear and nonconvex time-varying optimization. We provide convex methods based on sum-of-squares optimization to efficiently solve this nonconvex time-varying optimization problem and obtain continuous-time risk-bounded trajectories without time discretization. The proposed approach handles arbitrary probabilistic uncertainties and nonconvex, nonlinear, static, and dynamic obstacles, and is suitable for online trajectory planning problems.

Abstract:
In this paper, we address the problem of steering a team of agents under stochastic linear dynamics to prescribed final state means and covariances. The agents operate in a common environment where inter-agent constraints may also be present. To make our method scalable to large-scale systems and computationally efficient, we approach the problem in a distributed control framework using the Alternating Direction Method of Multipliers (ADMM). Each agent solves its own covariance steering problem in parallel, while additional copy variables for its closest neighbors are introduced to ensure that the inter-agent constraints will be satisfied. The inclusion of these additional variables creates a requirement for consensus between original and copy variables that involve the same agent. For this reason, we employ a variation of ADMM for consensus optimization. Simulation results on multi-vehicle systems under uncertainty with collision avoidance constraints illustrate the effectiveness of our algorithm. We also demonstrate the substantially improved scalability of our distributed approach with respect to the number of agents, compared with an equivalent centralized scheme.
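
The consensus mechanism behind the copy variables can be illustrated with textbook global-consensus ADMM on a toy problem. This is not the paper's covariance steering formulation. Each hypothetical agent i has a scalar quadratic cost 0.5 * a_i * (x - b_i)^2 and keeps a local copy x_i of the shared variable. The agents alternate three steps: independent local minimizations, which could run in parallel, an averaging step that forms the consensus value z, and a dual update that penalizes each agent's disagreement with z. The iterates converge to the centralized optimum, the a-weighted mean of the b_i.

```python
from typing import List, Sequence


def consensus_admm(a: Sequence[float], b: Sequence[float], rho: float = 1.0,
                   iterations: int = 500) -> float:
    """Global-consensus ADMM for min_z sum_i 0.5 * a_i * (z - b_i)^2.

    Each agent i keeps a local copy x_i of the shared variable and a scaled dual u_i;
    the x-updates are independent and could run in parallel on each agent.
    """
    n = len(a)
    x: List[float] = [0.0] * n
    u: List[float] = [0.0] * n
    z = 0.0
    for _ in range(iterations):
        # Local step (closed form for a quadratic): argmin_x f_i(x) + rho/2 (x - z + u_i)^2
        x = [(a_i * b_i + rho * (z - u_i)) / (a_i + rho) for a_i, b_i, u_i in zip(a, b, u)]
        # Consensus step: average of local copies plus scaled duals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each agent's disagreement with the consensus value.
        u = [u_i + x_i - z for u_i, x_i in zip(u, x)]
    return z


# Hypothetical agents; the centralized optimum is (1*0 + 2*3 + 3*6) / (1 + 2 + 3) = 4.
print(round(consensus_admm([1.0, 2.0, 3.0], [0.0, 3.0, 6.0]), 6))  # 4.0
```

The penalty parameter `rho` trades off how quickly agents are pulled toward consensus against how faithfully each local step follows its own cost. It is a tuning choice rather than a value taken from the paper.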

Abstract:
Many robotic applications involve interactions between multiple agents where an agent's decisions affect the behavior of other agents. Such behaviors can be captured by the equilibria of differential games, which provide an expressive framework for modeling the agents' mutual influence. However, finding the equilibria of differential games is in general challenging, as it involves solving a set of coupled optimal control problems. In this work, we propose to leverage the special structure of multi-agent interactions to generate interactive trajectories by simply solving a single optimal control problem, namely, the optimal control problem associated with minimizing the potential function of the differential game. Our key insight is that, for a certain class of multi-agent interactions, the underlying differential game is indeed a potential differential game, for which equilibria can be found by solving a single optimal control problem. We introduce such an optimal control problem and build on single-agent trajectory optimization methods to develop a computationally tractable and scalable algorithm for planning multi-agent interactive trajectories. We demonstrate the performance of our algorithm in simulation and show that it outperforms state-of-the-art game solvers. To further show the real-time capabilities of our algorithm, we demonstrate its application in a set of experiments involving interactive trajectories for two quadcopters.
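
The key property of potential games can be checked on a tiny static, discrete example. This is far simpler than the differential games in the paper. If each player's cost equals a shared potential plus a term that depends only on the other players' actions, then any joint action minimizing the potential is a Nash equilibrium. No player can lower their own cost by deviating alone, because such a deviation changes their cost exactly as it changes the potential. The action set, potential, and extra cost terms below are all hypothetical.

```python
from itertools import product
from typing import Tuple

ACTIONS = (0, 1, 2)  # hypothetical discrete choices, e.g. lateral lanes


def potential(a1: int, a2: int) -> float:
    # Hypothetical shared potential: each agent's preferred lane plus a coupling term.
    return (a1 - 2) ** 2 + (a2 - 1) ** 2 + 0.5 * (a1 - a2) ** 2


# Each cost = potential + a term that depends only on the *other* agent's action,
# which makes this an exact potential game.
def cost1(a1: int, a2: int) -> float:
    return potential(a1, a2) + 3.0 * a2


def cost2(a1: int, a2: int) -> float:
    return potential(a1, a2) - 2.0 * a1


def is_nash(a1: int, a2: int) -> bool:
    """No agent can lower its own cost by deviating unilaterally."""
    best1 = min(cost1(d, a2) for d in ACTIONS)
    best2 = min(cost2(a1, d) for d in ACTIONS)
    return cost1(a1, a2) <= best1 and cost2(a1, a2) <= best2


def minimize_potential() -> Tuple[int, int]:
    """Solve the single joint problem instead of coupled per-agent problems."""
    return min(product(ACTIONS, ACTIONS), key=lambda joint_action: potential(*joint_action))


joint = minimize_potential()
print(joint, is_nash(*joint))  # (2, 1) True
```

The paper's contribution lies in identifying when continuous-time multi-agent interactions have this structure. In those cases, one trajectory optimization over the potential replaces the coupled optimal control problems.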

Abstract:
Complex mission specifications can often be expressed in temporal logics, such as Linear Temporal Logic and its syntactically co-safe fragment, scLTL. Finding trajectories that satisfy such specifications becomes hard if the robot is to fulfil the mission in an initially unknown environment, where neither the locations of regions or objects of interest nor the obstacle space are known a priori. We propose an algorithm that, while exploring the environment, learns important semantic dependencies in the form of a semantic abstraction, and uses it to bias the growth of a rapidly-exploring random graph towards faster mission completion. Our approach finds trajectories that are much shorter than those found by the sequential approach, which first explores and then plans. Simulations comparing our solution to the sequential approach, carried out in 100 randomized office-like environments, show more than 50% reduction in trajectory length.

Abstract:
The ability to transfer a policy from one environment to another is a promising avenue for efficient robot learning in realistic settings where task supervision is not available. This can allow us to take advantage of environments well suited for training, such as simulators or laboratories, to learn a policy for a real robot in a home or office. To succeed, such policy transfer must overcome both the visual domain gap (e.g., different illumination or background) and the dynamics domain gap (e.g., different robot calibration or modelling error) between source and target environments. However, prior policy transfer approaches either cannot handle a large domain gap or can only address one type of domain gap at a time. In this paper, we propose IDAPT, a novel policy transfer method with iterative "environment grounding" that alternates between (1) directly minimizing both visual and dynamics domain gaps by grounding the source environment in the target environment domains, and (2) training a policy on the grounded source environment. This iterative training progressively aligns the domains between the two environments and adapts the policy to the target environment. Once trained, the policy can be directly executed in the target environment. The empirical results on locomotion and robotic manipulation tasks demonstrate that our approach can effectively transfer a policy across visual and dynamics domain gaps with minimal supervision and interaction with the target environment. Videos and code are available at https://clvrai.com/idapt

Abstract:
Robot manipulation for untangling 1D deformable structures such as ropes, cables, and wires is challenging due to their infinite-dimensional configuration space, complex dynamics, and tendency to self-occlude. Analytical controllers often fail in the presence of dense configurations, due to the difficulty of grasping between adjacent cable segments. We present two algorithms that make cable untangling more robust, LOKI and SPiDERMan, which operate alongside HULK, a high-level planner from prior work. LOKI uses a learned model of manipulation features to refine a coarse grasp keypoint prediction to a precise, optimized location and orientation, while SPiDERMan uses a learned model to sense task progress and apply recovery actions. We evaluate these algorithms in physical cable untangling experiments with 336 knots and over 1500 actions on real cables using the da Vinci surgical robot. We find that the combination of HULK, LOKI, and SPiDERMan is able to untangle dense overhand, figure-eight, double-overhand, square, bowline, granny, stevedore, and triple-overhand knots. The composition of these methods successfully untangles a cable from a dense initial configuration in 68.3% of 60 physical experiments and achieves 50% higher success rates than baselines from prior work. Supplementary material, code, and videos can be found at https://tinyurl.com/rssuntangling .

Abstract:
Robots deployed in human-populated spaces often need human help to effectively complete their tasks. Yet a robot that asks for help too frequently or at the wrong times may cause annoyance, and a robot that asks too infrequently may be unable to complete its tasks. In this paper, we present a model of humans' helpfulness towards a robot in an office environment, learnt from online user study data. Our key insight is that effectively planning for a task that involves bystander help requires disaggregating individual and contextual factors and explicitly reasoning about uncertainty over individual factors. Our model incorporates the individual factor of latent helpfulness and the contextual factors of human busyness and robot frequency of asking. We integrate the model into a Bayes-Adaptive Markov Decision Process (BAMDP) framework and run a user study that compares it to baseline models that do not incorporate individual or contextual factors. The results show that our model significantly outperforms baseline models by a factor of 1.5, and it does so by asking for help more effectively: it asks 1.2 times less often while still receiving more human help on average.
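To make "explicitly reasoning about uncertainty over individual factors" concrete, here is a minimal Python sketch that tracks a belief over one person's latent helpfulness. It assumes, purely for illustration, a Beta prior over the probability that a person helps and Bernoulli help outcomes. The class `HelpfulnessBelief` and the threshold rule `should_ask` are hypothetical: they ignore the contextual factors (busyness, asking frequency) and stand in for, rather than reproduce, the learned model and BAMDP planner described above.

```python
from dataclasses import dataclass


@dataclass
class HelpfulnessBelief:
    """Hypothetical Beta(alpha, beta) belief over one person's probability of helping."""
    alpha: float = 1.0  # pseudo-count of "helped" outcomes (uniform prior)
    beta: float = 1.0   # pseudo-count of "declined" outcomes

    def update(self, helped: bool) -> "HelpfulnessBelief":
        # Conjugate Beta-Bernoulli update: each observation increments one pseudo-count.
        if helped:
            return HelpfulnessBelief(self.alpha + 1.0, self.beta)
        return HelpfulnessBelief(self.alpha, self.beta + 1.0)

    def mean(self) -> float:
        # Posterior mean probability that this person helps when asked.
        return self.alpha / (self.alpha + self.beta)


def should_ask(belief: HelpfulnessBelief, threshold: float = 0.5) -> bool:
    # Hypothetical myopic rule; a BAMDP planner would instead weigh the value of
    # information and the cost of annoying the person over a planning horizon.
    return belief.mean() >= threshold


belief = HelpfulnessBelief()
for outcome in [True, False, False, False]:  # one help, three refusals
    belief = belief.update(outcome)
print(belief.mean(), should_ask(belief))
```

Because the belief is per person, a refusal from one bystander lowers the estimate for that individual only, which is the kind of individual/contextual disaggregation the abstract argues for.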

Abstract:
Grasp detection in clutter requires the robot to reason about the 3D scene from incomplete and noisy perception. In this work, we draw on the insight that 3D reconstruction and grasp learning are two intimately connected tasks, both of which require a fine-grained understanding of local geometric details. We thus propose to exploit the synergies between grasp affordance and 3D reconstruction through multi-task learning of a shared representation. Our model takes advantage of deep implicit functions, a continuous and memory-efficient representation, to enable differentiable training of both tasks. We train the model on self-supervised grasp trial data collected in simulation. Evaluation is conducted on a clutter removal task, where the robot clears cluttered objects by grasping them one at a time. Experimental results in simulation and on the real robot demonstrate that the use of implicit neural representations and the joint learning of grasp affordance and 3D reconstruction lead to state-of-the-art grasping results. Our method outperforms baselines by over 10% in terms of grasp success rate.

Abstract:
Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desired behavior. In this paper, we propose a method for inferring parametric objective models of multiple agents based on observed interactions. Our inverse game solver jointly optimizes player objectives and continuous-state estimates by coupling them through Nash equilibrium constraints. Hence, our method is able to directly maximize the observation likelihood rather than other non-probabilistic surrogate criteria. Our method does not require full observations of game states or player strategies to identify player objectives. Instead, it robustly recovers this information from noisy, partial state observations. As a byproduct of estimating player objectives, our method computes a Nash equilibrium trajectory corresponding to those objectives. Thus, it is suitable for downstream trajectory forecasting tasks. We demonstrate our method in several simulated traffic scenarios. Results show that it reliably estimates player objectives from a short sequence of noise-corrupted partial state observations. Furthermore, using the estimated objectives, our method makes accurate predictions of each player's trajectory.

Abstract:
The rapidly increasing accessibility of unmanned aerial vehicles, or drones, poses a threat to general security and confidentiality. Most commercially available or custom-built drones are multi-rotors with multiple propellers. Since these propellers rotate at high speed, they are generally the fastest moving parts of an image and cannot be directly "seen" by a classical camera without severe motion blur. We utilize a class of sensors that are particularly suitable for such scenarios, called event cameras, which have high temporal resolution, low latency, and high dynamic range. In this paper, we model the geometry of a propeller and use it to generate simulated events, which are used to train a deep neural network called EVPropNet to detect propellers from event camera data. EVPropNet transfers directly to the real world without any fine-tuning or retraining. We present two applications of our network: (a) tracking and following an unmarked drone and (b) landing on a near-hover drone. We successfully evaluate and demonstrate the proposed approach in many real-world experiments with different propeller shapes and sizes. Our network can detect propellers at a rate of 85.1% even when 60% of the propeller is occluded and can run at up to 35Hz on a 2W power budget. To our knowledge, this is the first deep learning-based solution for detecting propellers (to detect drones). Finally, our applications also show impressive success rates of 92% and 90% for the tracking and landing tasks, respectively.

Abstract:
Traditional parallel-jaw grippers are insufficient for delicate object manipulation due to their stiffness and lack of dexterity. Other dexterous robotic hands often have bulky fingers, rely on complex time-varying cable drives, or are prohibitively expensive. In this paper, we introduce a novel low-cost compliant gripper with two centimeter-scale 3-DOF delta robots using off-the-shelf linear actuators and 3D-printed soft materials. To model the kinematics of delta robots with soft compliant links, which diverge from typical rigid links, we train neural networks using a perception system. Furthermore, we analyze the delta robot's force profile by varying the starting position in its workspace and measuring the resulting force from a push action. Finally, we demonstrate the compliance and dexterity of our gripper through six dexterous manipulation tasks involving small and delicate objects. Thus, we lay the groundwork for creating modular multi-fingered hands that can execute precise and low-inertia manipulations.

Abstract:
Contact planning is crucial to the locomotion performance of limbless robots. Typically, the pattern by which contact is made and broken between the mechanism and its environment determines the motion of the robot. The design of these patterns, often called contact patterns, is a difficult problem. In previous work, contact patterns were prescribed from observations of biological systems or determined empirically by black-box optimization algorithms. However, such heuristic-based contact pattern prescription is only applicable to specific mechanisms and is challenging to generalize. For example, a stable and effective contact pattern prescribed for a 12-link limbless robot can be neither stable nor effective for a 6-link limbless robot. In this paper, using a geometric motion planning scheme, we develop a framework to design, optimize, and analyze contact patterns to generate effective motion in desired directions. Inspired by prior work in geometric mechanics, we separate the configuration space into a shape space (the internal joint angles), a contact state space, and a position space; we then optimize the function that couples the contact state space and the shape space. Our framework provides physical insights into contact pattern design and reveals the principles behind empirically derived contact pattern prescriptions. Applying this framework, we can not only control the direction of motion of a 12-link limbless robot by modulating its contact patterns, but also design effective sidewinding gaits for robots with fewer motors (e.g., a 6-link robot). We test our designed gaits in robophysical experiments and obtain excellent agreement. We expect our scheme to be broadly applicable to robots that make and break contact.

Abstract:
Visual place recognition is the task of finding the same places in a set of database images for a given set of query images. This task becomes particularly challenging if the environmental conditions change between database and query, for example from day to night. In this paper, we build upon our recent work on graph optimization for place recognition, where a graph was used to model additional structural knowledge such as sequences. A subsequent non-linear least squares optimization (NLSQ) improved the place recognition performance. While this approach achieves very high performance, it is quite slow and memory-inefficient. This paper addresses the long runtime and high memory usage in order to obtain the same or better place recognition performance faster and on larger problems. We propose a novel graph optimization procedure based on Iterated Conditional Modes (ICM). In addition, we investigate a new cost function for an edge in the graph. Our novel ICM-based approach achieves a maximum runtime of 9.1 ms per query, which is 260x faster than the minimum runtime with NLSQ. Moreover, with ICM we can optimize problems that are not feasible with NLSQ on a full graph due to memory limitations. To demonstrate the superior performance of our ICM-based method, we provide extensive experimental evaluations distilled from 987 precision-recall curves: our proposed ICM-based method is compared to the NLSQ-based method as well as to six sequence-based approaches from the literature on 21 sequence combinations from five datasets with four different image descriptors. Our experiments show that our ICM-based method with sequence exploitation not only improves on the NLSQ-based performance by 10% on average while being 385x faster and using more than 60x less memory, but also significantly outperforms all six sequence-based methods from the literature by at least 32% on average with the NetVLAD descriptor, while using comparable runtime and memory. Code is available online.
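Iterated Conditional Modes itself is simple: starting from an initial labeling, it repeatedly sets each node to the value that minimizes the energy with all of its neighbors held fixed, and stops when no node changes. The Python sketch below applies generic ICM to a toy sequence-matching problem of our own construction, in which each query picks a database index and a hypothetical pairwise term favors consecutive queries matching consecutive database images. The graph structure and edge costs used in the paper are different and are not reproduced here.

```python
from typing import List, Sequence


def icm_sequence_matching(
    cost: Sequence[Sequence[float]],
    smooth_weight: float = 1.0,
    max_iters: int = 20,
) -> List[int]:
    """Assign each query q a database index labels[q] with Iterated Conditional Modes.

    cost[q][d] is a (hypothetical) unary cost for matching query q to database
    image d, e.g. a descriptor distance. The pairwise term rewards consecutive
    queries matching consecutive database images, a toy stand-in for sequences.
    """
    num_q, num_d = len(cost), len(cost[0])

    def pairwise(d_prev: int, d_next: int) -> float:
        # Penalize any deviation from a one-image forward step through the database.
        return smooth_weight * abs((d_next - d_prev) - 1)

    def local_energy(q: int, d: int) -> float:
        # Energy terms that involve node q, with every other label held fixed.
        e = cost[q][d]
        if q > 0:
            e += pairwise(labels[q - 1], d)
        if q < num_q - 1:
            e += pairwise(d, labels[q + 1])
        return e

    # Initialize each query independently with its best unary match.
    labels = [min(range(num_d), key=lambda d: cost[q][d]) for q in range(num_q)]

    for _ in range(max_iters):
        changed = False
        for q in range(num_q):
            best = min(range(num_d), key=lambda d: local_energy(q, d))
            if best != labels[q]:
                labels[q] = best
                changed = True
        if not changed:  # no single-node move lowers the energy: local minimum
            break
    return labels


# Toy costs: query 2 is ambiguous, and its best unary match (image 0) breaks the sequence.
cost = [
    [1.0, 0.1, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.1, 1.0, 1.0],
    [0.4, 1.0, 1.0, 0.5, 1.0],
    [1.0, 1.0, 1.0, 1.0, 0.1],
]
print(icm_sequence_matching(cost))
```

With `smooth_weight=0` each query keeps its independent best match, so the third query stays matched to image 0; with the sequence term it moves to image 3. ICM only guarantees a local minimum, which is the price it pays for touching one node at a time with very little memory.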

Abstract:
We present Nimble (nimblephysics.org), a fast and feature-complete differentiable physics engine that supports Lagrangian dynamics and hard contact constraints for articulated rigid body simulation. Our differentiable physics engine offers a complete set of features that are typically only available in the non-differentiable physics simulators commonly used in robotics applications. We solve contact constraints precisely using linear complementarity problems (LCPs). We present efficient and novel analytical gradients through the LCP formulation of inelastic contact that exploit the sparsity of the LCP solution. We support complex contact geometry and gradients approximating continuous-time elastic collision. We also introduce a novel method to compute complementarity-aware gradients that help downstream optimization tasks avoid stalling in saddle points. We show that an implementation of this combination in an existing physics engine (DART) achieves an 87x single-core speedup over finite differencing in computing analytical Jacobians for a single timestep, while preserving all the expressiveness of the original DART.
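For readers unfamiliar with the LCP formulation, the Python sketch below solves a small LCP, find z with w = M z + q, z >= 0, w >= 0, and z · w = 0, using projected Gauss-Seidel, one common iterative solver for contact problems. Choosing this solver is our assumption for illustration; it is not claimed to be Nimble's solver, and it computes no gradients.

```python
from typing import List, Sequence


def solve_lcp_pgs(
    M: Sequence[Sequence[float]],
    q: Sequence[float],
    iters: int = 100,
    tol: float = 1e-10,
) -> List[float]:
    """Solve the LCP  w = M z + q,  z >= 0,  w >= 0,  z . w = 0  by projected Gauss-Seidel.

    Convergence is guaranteed for symmetric positive-definite M. In a typical
    velocity-level contact formulation, M is the effective inverse-mass (Delassus)
    matrix, z the contact impulses, and w the resulting separating velocities.
    """
    n = len(q)
    z = [0.0] * n
    for _ in range(iters):
        max_delta = 0.0
        for i in range(n):
            # Residual w_i using the most recent values of z (Gauss-Seidel sweep).
            w_i = q[i] + sum(M[i][j] * z[j] for j in range(n))
            # Unconstrained Gauss-Seidel step for z_i, then project onto z_i >= 0.
            new_zi = max(0.0, z[i] - w_i / M[i][i])
            max_delta = max(max_delta, abs(new_zi - z[i]))
            z[i] = new_zi
        if max_delta < tol:
            break
    return z


M = [[2.0, 1.0], [1.0, 2.0]]
q = [-1.0, 1.0]
z = solve_lcp_pgs(M, q)
w = [q[i] + sum(M[i][j] * z[j] for j in range(2)) for i in range(2)]
print(z, w)
```

Here the solution is z = (0.5, 0) with w = (0, 1.5): the first index is active (z > 0, w = 0) and the second is separating. If that split stays fixed under a small perturbation, the active entries satisfy z_A = -M_AA^{-1} q_A, so their derivatives come from a linear system only as large as the active set. This is one way to see why the sparsity of the LCP solution helps; the paper's exact gradient derivation is not reproduced here.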

Abstract:
Robots operating in everyday environments need to effectively perceive, model, and infer semantic properties of objects. Existing knowledge reasoning frameworks only model binary relations between an object's class label and its semantic properties, and are thus unable to collectively reason about object properties detected by different perception algorithms and grounded in diverse sensory modalities. We bridge the gap between multimodal perception and knowledge reasoning by introducing an n-ary representation that models complex, inter-related object properties. To tackle the problem of collecting n-ary semantic knowledge at scale, we propose a transformer neural network that directly generalizes knowledge from observations of object instances. The learned model can reason at different levels of abstraction, effectively predicting unknown properties of objects in different environmental contexts given different amounts of observed information. We quantitatively validate our approach against five prior methods on LINK, a unique dataset we contribute that contains 1457 situated object instances with 15 multimodal property types and 200 total properties. Compared to the prior state-of-the-art Markov Logic Network, our model obtains a 10% improvement in predicting unknown properties of novel object instances while reducing training and inference time by a factor of 150. Additionally, we apply our work to a mobile manipulation robot, demonstrating its ability to leverage n-ary reasoning to retrieve objects and actively detect object properties.

Abstract:
The growing scale and complexity of interactions between humans and robots highlight the need for new computational methods to automatically evaluate novel algorithms and applications. Exploring diverse scenarios of humans and robots interacting in simulation can improve understanding of the robotic system and avoid potentially costly failures in real-world settings. We formulate this problem as a quality diversity (QD) problem, where the goal is to discover diverse failure scenarios by simultaneously exploring both environments and human actions. We focus on the shared autonomy domain, where the robot attempts to infer the goal of a human operator, and adopt the QD algorithm MAP-Elites to generate scenarios for two published algorithms in this domain: shared autonomy via hindsight optimization and linear policy blending. Some of the generated scenarios confirm previous theoretical findings, while others are surprising and bring about a new understanding of state-of-the-art implementations. Our experiments show that MAP-Elites outperforms Monte Carlo simulation and optimization-based methods in effectively searching the scenario space, highlighting its promise for automatic evaluation of algorithms in human-robot interaction.
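The core MAP-Elites loop is short: keep an archive with one elite per cell of a discretized behavior-descriptor space, repeatedly mutate a randomly chosen elite, and replace a cell's occupant only when the offspring is fitter. The Python sketch below implements that generic loop. The scenario parameterization, the descriptors, and the "failure score" in `toy_scenario` are hypothetical stand-ins for the environment and human-action parameters and the failure measures searched over in the paper.

```python
import random
from typing import Callable, Dict, List, Tuple

Solution = List[float]
Cell = Tuple[int, int]


def map_elites(
    evaluate: Callable[[Solution], Tuple[float, Tuple[float, float]]],
    dim: int,
    bins: int = 10,
    init_samples: int = 50,
    iterations: int = 2000,
    sigma: float = 0.1,
    seed: int = 0,
) -> Dict[Cell, Tuple[float, Solution]]:
    """Generic MAP-Elites over solutions in [0, 1]^dim with a 2-D descriptor in [0, 1]^2.

    The archive maps each grid cell to its elite: the highest-fitness solution seen so far.
    """
    rng = random.Random(seed)
    archive: Dict[Cell, Tuple[float, Solution]] = {}

    def to_cell(desc: Tuple[float, float]) -> Cell:
        # Discretize each descriptor into one of `bins` equal-width intervals.
        return (min(int(desc[0] * bins), bins - 1), min(int(desc[1] * bins), bins - 1))

    def try_insert(x: Solution) -> None:
        fitness, desc = evaluate(x)
        cell = to_cell(desc)
        # A new solution replaces the current elite only if it is strictly fitter.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, x)

    # Seed the archive with uniformly random solutions.
    for _ in range(init_samples):
        try_insert([rng.random() for _ in range(dim)])
    # Main loop: pick a random elite, mutate it, and try to insert the offspring.
    for _ in range(iterations):
        _, parent = archive[rng.choice(list(archive))]
        child = [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in parent]
        try_insert(child)
    return archive


def toy_scenario(x: Solution) -> Tuple[float, Tuple[float, float]]:
    """Hypothetical scenario: x[0] and x[1] are scenario features used as descriptors
    (imagine goal spacing and operator noise); the made-up failure score is highest
    when they are similar and x[2] is large."""
    failure = x[2] * (1.0 - abs(x[0] - x[1]))
    return failure, (x[0], x[1])


archive = map_elites(toy_scenario, dim=3)
print(f"{len(archive)} of 100 cells filled")
```

The result is not a single worst case but a map of the highest-scoring failure found in each region of the descriptor space, which is what makes the search useful for discovering diverse failure scenarios.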

Abstract:
We present a novel method for estimation of 3D human poses from a multi-camera setup, employing distributed smart edge sensors coupled with a backend through a semantic feedback loop. 2D joint detection for each camera view is performed locally on a dedicated embedded inference processor. Only the semantic skeleton representation is transmitted over the network, and raw images remain on the sensor board. 3D poses are recovered from 2D joints on a central backend, based on triangulation and a body model which incorporates prior knowledge of the human skeleton. A feedback channel from backend to individual sensors is implemented on a semantic level. The allocentric 3D pose is backprojected into the sensor views where it is fused with 2D joint detections. The local semantic model on each sensor can thus be improved by incorporating global context information. The whole pipeline is capable of real-time operation. We evaluate our method on three public datasets, where we achieve state-of-the-art results and show the benefits of our feedback architecture, as well as in our own setup for multi-person experiments. Using the feedback signal improves the 2D joint detections and in turn the estimated 3D poses.
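The triangulation step can be illustrated with the standard Direct Linear Transform (DLT), which stacks two linear constraints per view and takes the SVD null vector. This NumPy sketch triangulates a single joint from two synthetic, made-up cameras. It omits the body model, the skeleton prior, and the semantic feedback loop that the method above adds on top of plain triangulation.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Triangulate one 3-D point from N >= 2 views with the Direct Linear Transform.

    projections: sequence of 3x4 camera projection matrices P_i.
    points_2d:   sequence of (u, v) observations of the same joint, one per view.
    Each view contributes two rows to A, and the homogeneous point X solves A X = 0.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        rows.append(u * P[2] - P[0])  # from u = (P[0] . X) / (P[2] . X)
        rows.append(v * P[2] - P[1])  # from v = (P[1] . X) / (P[2] . X)
    A = np.stack(rows)
    # Least-squares null vector: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Pinhole projection of a 3-D point X with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Two made-up cameras with identity intrinsics; the second is centered at x = 1.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
observations = [project(P1, X_true), project(P2, X_true)]
print(triangulate_dlt([P1, P2], observations))
```

With noisy 2D detections from real sensors, the SVD gives the algebraic least-squares point, which is one reason a skeleton prior on top of per-joint triangulation can help.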

Abstract:
In this work, we present a planning and control method for a quadrotor in an autonomous drone race. Our method combines the advantages of both model-based optimal control and model-free deep reinforcement learning. We consider a single drone racing on a track marked by a series of gates, through which it must maneuver in minimum time. First, we solve the discretized Hamilton-Jacobi-Bellman (HJB) equation to produce a closed-loop policy for a simplified, reduced-order model of the drone. Next, we train a deep network policy in a supervised fashion to mimic the HJB policy. Finally, we further train this network using policy gradient reinforcement learning on the full drone dynamics model with a low-level feedback controller in the loop. This gives a deep network policy for controlling the drone to pass through a single gate. On a race course, this policy is applied successively to each new oncoming gate to guide the drone through the course. The resulting policy completes a high-fidelity AirSim drone race with 12 gates in 34.89s (on average), outracing a model-based HJB policy by 33.20s, a supervised learning policy by 1.24s, and a trajectory planning policy by 12.99s, while a model-free RL policy was never able to complete the race.
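The first stage, solving a discretized HJB equation for a closed-loop minimum-time policy, can be sketched on a deliberately simple reduced model. The Python below assumes a 1-D kinematic point x' = u with a few admissible speeds and a "gate" at |x| <= 0.06. It performs value iteration on the semi-Lagrangian discretization V(x) = dt + min_u V(x + u dt) and reads off the policy greedily. The model, grid, and constants are our own assumptions, not the paper's reduced-order drone model.

```python
from typing import List, Sequence, Tuple

DT = 0.05                               # time step of the discretization (assumed)
CONTROLS = (-1.0, -0.5, 0.0, 0.5, 1.0)  # admissible speeds of the toy model x' = u
BIG = 1e6                               # stands in for "not yet known to reach the gate"


def interp(xs: Sequence[float], values: Sequence[float], x: float) -> float:
    """Linearly interpolate grid values at x (x is clamped to the uniform grid)."""
    dx = xs[1] - xs[0]
    s = (min(max(x, xs[0]), xs[-1]) - xs[0]) / dx
    i = min(int(s), len(xs) - 2)
    a = s - i
    return (1.0 - a) * values[i] + a * values[i + 1]


def solve_min_time_hjb(x_max: float = 2.0, n: int = 81, goal_tol: float = 0.06,
                       iters: int = 1000) -> Tuple[List[float], List[float]]:
    """Value iteration on the discretized minimum-time HJB equation for x' = u.

    The gate is modeled as the set |x| <= goal_tol, where the time-to-go is 0.
    """
    xs = [-x_max + i * (2.0 * x_max / (n - 1)) for i in range(n)]
    V = [0.0 if abs(x) <= goal_tol else BIG for x in xs]
    for _ in range(iters):
        # Bellman backup outside the goal set: one step of cost DT plus best successor.
        new_V = [0.0 if abs(x) <= goal_tol else
                 min(BIG, DT + min(interp(xs, V, x + u * DT) for u in CONTROLS))
                 for x in xs]
        converged = max(abs(a - b) for a, b in zip(new_V, V)) < 1e-9
        V = new_V
        if converged:
            break
    return xs, V


def policy(xs: Sequence[float], V: Sequence[float], x: float) -> float:
    """Closed-loop policy: the control whose successor state has the least time-to-go."""
    return min(CONTROLS, key=lambda u: interp(xs, V, x + u * DT))


xs, V = solve_min_time_hjb()
print(round(interp(xs, V, 1.0), 3), policy(xs, V, 1.0), policy(xs, V, -1.0))
```

Away from the gate the computed time-to-go is |x| - 0.05, i.e. full speed straight toward the gate. In the pipeline described above, a policy of this kind would then serve as the supervision target for the deep network before reinforcement-learning fine-tuning.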

Abstract:
Accurate models of robot dynamics are critical for safe and stable control and for generalization to novel operational conditions. Hand-designed models, however, may be insufficiently accurate, even after careful parameter tuning. This motivates the use of machine learning techniques to approximate the robot dynamics over a training set of state-control trajectories. The dynamics of many robots, including ground, aerial, and underwater vehicles, are described in terms of their SE(3) pose and generalized velocity, and satisfy conservation of energy principles. This paper proposes a Hamiltonian formulation over the SE(3) manifold that structures a neural ordinary differential equation (ODE) network to approximate the dynamics of a rigid body. In contrast to a black-box ODE network, our formulation guarantees total energy conservation by construction. We develop energy shaping and damping injection control for the learned, potentially under-actuated SE(3) Hamiltonian dynamics, enabling a unified approach to stabilization and trajectory tracking for various platforms, including pendulum, rigid-body, and quadrotor systems.
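The structural guarantee is easiest to see in a 1-DOF case. The Python sketch below uses a known, analytic pendulum Hamiltonian H(q, p), with assumed mass, length, and gains, in place of the learned SE(3) network. The dynamics are generated from H via q' = dH/dp and p' = -dH/dq + u, so with u = 0 energy is conserved up to integration error. An energy-shaping plus damping-injection controller then moves the stable equilibrium to the upright position q* = pi. This illustrates the structure only; it is not the paper's method.

```python
import math
from typing import Callable, Tuple

M, L, G = 1.0, 1.0, 9.81  # assumed pendulum mass [kg], length [m], gravity [m/s^2]
INERTIA = M * L * L


def hamiltonian(q: float, p: float) -> float:
    """Total energy H(q, p) = kinetic + potential for a 1-DOF pendulum."""
    return p * p / (2.0 * INERTIA) + M * G * L * (1.0 - math.cos(q))


def dH_dq(q: float) -> float:
    return M * G * L * math.sin(q)


def dH_dp(p: float) -> float:
    return p / INERTIA


def simulate(q: float, p: float, control: Callable[[float, float], float],
             dt: float = 1e-3, steps: int = 10_000) -> Tuple[float, float]:
    """Symplectic-Euler integration of q' = dH/dp, p' = -dH/dq + u for steps * dt seconds."""
    for _ in range(steps):
        p = p + dt * (-dH_dq(q) + control(q, p))
        q = q + dt * dH_dp(p)
    return q, p


def no_control(q: float, p: float) -> float:
    return 0.0


def energy_shaping(q: float, p: float, q_star: float = math.pi,
                   kp: float = 20.0, kd: float = 5.0) -> float:
    """Energy shaping (cancel the gravity-potential gradient and add that of
    (kp/2)(q - q*)^2) plus damping injection (-kd * q')."""
    return dH_dq(q) - kp * (q - q_star) - kd * dH_dp(p)


q0, p0 = 1.0, 0.0
qf, pf = simulate(q0, p0, no_control)
print(abs(hamiltonian(qf, pf) - hamiltonian(q0, p0)))  # small: energy is nearly conserved
qc, pc = simulate(0.0, 0.0, energy_shaping)
print(qc, pc)  # close to the upright equilibrium (pi, 0)
```

The symplectic integrator is a deliberate choice: a naive explicit Euler step would let the simulated energy drift even though the continuous-time Hamiltonian dynamics conserve it exactly.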

Abstract:
This paper is about localising a robot in overhead images using lidar. Specifically, we show how to solve both place recognition and metric localisation of a lidar using only publicly available overhead imagery as a map proxy. This is in contrast to current approaches that rely on prior sensor maps. To handle the drastic modality difference (overhead image vs. on-the-ground lidar), our method learns a representation that transforms a given overhead image into a collection of 2D points, allowing for direct comparison against lidar scans. After both modalities are expressed as points, point-based methods can then be leveraged to learn the registration and place recognition tasks. Our method is the first to learn place recognition of a lidar using only overhead imagery, and it outperforms prior work for metric localisation with large initial pose offsets.
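Once both modalities are expressed as 2D points, metric localisation reduces to estimating a rigid transform between point sets. The paper learns this registration. As a baseline illustration only, the Python sketch below computes the classical closed-form least-squares rotation and translation (the 2D Kabsch/Procrustes solution), assuming point correspondences are already known, which real lidar-to-overhead-image matching does not provide for free. The point sets and transform in the example are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def register_2d(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[float, Point]:
    """Least-squares rigid alignment: find theta, t with dst[i] ~= R(theta) src[i] + t.

    Assumes src[i] corresponds to dst[i]. Maximizing sum_i b_i . R a_i over the
    centered points a_i, b_i gives theta = atan2(sum a x b, sum a . b).
    """
    n = len(src)
    cx_s, cy_s = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cx_d, cy_d = sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n
    s_cos = s_sin = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - cx_s, sy - cy_s  # centered source point
        bx, by = dx - cx_d, dy - cy_d  # centered destination point
        s_cos += ax * bx + ay * by     # dot product term
        s_sin += ax * by - ay * bx     # 2D cross product term
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated source centroid onto the destination centroid.
    t = (cx_d - (c * cx_s - s * cy_s), cy_d - (s * cx_s + c * cy_s))
    return theta, t


def transform(theta: float, t: Point, p: Point) -> Point:
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


lidar_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
map_pts = [transform(0.3, (5.0, -2.0), p) for p in lidar_pts]
theta, t = register_2d(lidar_pts, map_pts)
print(round(theta, 6), tuple(round(v, 6) for v in t))
```

With large initial pose offsets and no known correspondences, this closed form must be wrapped in a correspondence search, which is where a learned point-based approach can pay off.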

Abstract:
Over the past several years, there has been considerable research investment in learning-based approaches to industrial assembly, but despite significant progress these techniques have yet to be adopted by industry. We argue that it is the prohibitively large design space of Deep Reinforcement Learning (DRL), rather than algorithmic limitations per se, that is truly responsible for this lack of adoption. Pushing these techniques into the industrial mainstream requires an industry-oriented paradigm that differs significantly from the academic mindset. In this paper, we define criteria for industry-oriented DRL and, according to these criteria, perform a thorough comparison of one family of learning approaches, DRL from demonstration, against a professional industrial integrator on the recently established NIST assembly benchmark. We explain the design choices, representing several years of investigation, that enabled our DRL system to consistently outperform the integrator baseline in terms of both speed and reliability. Finally, we conclude with a competition between our DRL system and a human on a challenge task of insertion into a randomly moving target. This study suggests that DRL is capable of outperforming not only established engineered approaches but also the human motor system, and that there remains significant room for improvement. Videos can be found on our project website: https://sites.google.com/view/shield-nist .