RSS2020

Abstract:
Performing acrobatic maneuvers with quadrotors is extremely challenging. Acrobatic flight requires high thrust and extreme angular accelerations that push the platform to its physical limits. Professional drone pilots often measure their level of mastery by flying such maneuvers in competitions. In this paper, we propose to learn a sensorimotor policy that enables an autonomous quadrotor to fly extreme acrobatic maneuvers with only onboard sensing and computation. We train the policy entirely in simulation by leveraging demonstrations from an optimal controller that has access to privileged information. We use appropriate abstractions of the visual input to enable transfer to a real quadrotor. We show that the resulting policy can be directly deployed in the physical world without any fine-tuning on real data. Our methodology has several favorable properties: it does not require a human expert to provide demonstrations, it cannot harm the physical system during training, and it can be used to learn maneuvers that are challenging even for the best human pilots. Our approach enables a physical quadrotor to fly maneuvers such as the Power Loop, the Barrel Roll, and the Matty Flip, during which it incurs accelerations of up to 3g.

Abstract:
Embodiment is an important characteristic for all intelligent agents (creatures and robots), while existing scene description tasks mainly focus on analyzing images passively and the semantic understanding of the scenario is separated from the interaction between the agent and the environment. In this work, we propose the Embodied Scene Description, which exploits the embodiment ability of the agent to find an optimal viewpoint in its environment for scene description tasks. A learning framework with the paradigms of imitation learning and reinforcement learning is established to teach the intelligent agent to generate corresponding sensorimotor activities. The proposed framework is tested on both the AI2Thor dataset and a real world robotic platform demonstrating the effectiveness and extendability of the developed method.

Abstract:
This paper presents a novel system for autonomous, vision-based drone racing combining learned data abstraction, nonlinear filtering, and time-optimal trajectory planning. The system has successfully been deployed at the first autonomous drone racing world championship: the 2019 AlphaPilot Challenge. Contrary to traditional drone racing systems, which only detect the next gate, our approach makes use of any visible gate and takes advantage of multiple, simultaneous gate detections to compensate for drift in the state estimate and build a global map of the gates. The global map and drift-compensated state estimate allow the drone to navigate through the race course even when the gates are not immediately visible and further enable to plan a near time-optimal path through the race course in real time based on approximate drone dynamics. The proposed system has been demonstrated to successfully guide the drone through tight race courses reaching speeds up to 8m/s and ranked second at the 2019 AlphaPilot Challenge.

Abstract:
A shared grasp is a grasp formed by contacts between the manipulated object and both the robot hand and the environment. By trading off hand contacts for environmental contacts, a shared grasp requires fewer contacts with the hand, and enables manipulation even when a full grasp is not possible. Previous research has used shared grasps for non-prehensile manipulation such as pivoting and tumbling. This paper treats the problem more generally, with methods to select the best shared grasp and robot actions for a desired object motion. The central issue is to evaluate the feasible contact modes: for each contact, whether that contact will remain active, and whether slip will occur. Robustness is important. When a contact mode fails, e.g., when a contact is lost, or when unintentional slip occurs, the operation will fail, and in some cases damage may occur. In this work, we enumerate all feasible contact modes, calculate corresponding controls, and select the most robust candidate. We can also optimize the contact geometry for robustness. This paper employs quasi-static analysis of planar rigid bodies with Coulomb friction to derive the algorithms and controls. Finally, we demonstrate the robustness of shared grasping and the use of our methods in representative experiments and examples.

Abstract:
State-of-the-art dense mapping approaches cannot be deployed on Size, Weight, and Power (SWaP) constrained platforms because of their large memory and compute requirements. In this paper, we present an accurate, and efficient approach to dense multi-fidelity 3D mapping using Gaussian distributions as volumetric primitives. The proposed mapping approach supports both high fidelity dense surface reconstruction and lower fidelity volumetric environment representation for fundamental robotics applications. We exploit the inherent working characteristics of an off-the-shelf depth sensor and approximate the distribution of approximately planar points using Gaussian distributions. Explicit modeling of the sensor noise characteristics enable us to incrementally update the map representation in real-time with high accuracy. We present the advantages of our proposed map representation over other well known state-of-the-art representations by highlighting its superior performance in terms of reconstruction accuracy, completeness and map compression properties via quantitative and qualitative metrics.

Abstract:
Simultaneous localization and mapping (SLAM) is a fundamental capability required by most autonomous systems. In this paper, we address the problem of loop closing for SLAM based on 3D laser scans recorded by autonomous cars. Our approach utilizes a deep neural network exploiting different cues generated from LiDAR data for finding loop closures. It estimates an image overlap generalized to range images and provides a relative yaw angle estimate between pairs of scans. Based on such predictions, we tackle loop closure detection and integrate our approach into an existing SLAM system to improve its mapping results. We evaluate our approach on sequences of the KITTI odometry benchmark and the Ford campus dataset. We show that our method can effectively detect loop closures surpassing the detection performance of state-of-the-art methods. To highlight the generalization capabilities of our approach, we evaluate our model on the Ford campus dataset while using only KITTI for training. The experiments show that the learned representation is able to provide reliable loop closure candidates, also in unseen environments.

Abstract:
Assistive robots enable people with disabilities to conduct everyday tasks on their own. However, these tasks can be complex, containing both coarse reaching motions and fine-grained manipulation. For example, when eating, not only does one need to move to the correct food item, but they must also precisely manipulate the food in different ways (e.g., cutting, stabbing, scooping). Shared autonomy methods make robot teleoperation safer and more precise by arbitrating user inputs with robot controls. However, these works have focused mainly on the high-level task of reaching a goal from a discrete set, while largely ignoring manipulation of objects at that goal. Meanwhile, dimensionality reduction techniques for teleoperation map useful high-dimensional robot actions into an intuitive low-dimensional controller, but it is unclear if these methods can achieve the requisite precision for tasks like eating. Our insight is that---by combining intuitive embeddings from learned latent actions with robotic assistance from shared autonomy---we can enable precise assistive manipulation. In this work, we adopt learned latent actions for shared autonomy by proposing a new model structure that changes the meaning of the human's input based on the robot's confidence of the goal. We show convergence bounds on the robot's distance to the most likely goal, and develop a training procedure to learn a controller that is able to move between goals even in the presence of shared autonomy. We evaluate our method in simulations and an eating user study. See videos of our experiments here: https://youtu.be/7BouKojzVyk.

Abstract:
Uncooled microbolometers can enable robots to see in the absence of visible illumination by imaging the “heat” radiated from the scene. Despite this ability to see in the dark, these sensors suffer from significant motion blur. This has limited their application on robotic systems. As described in this paper, this motion blur arises due to the thermal inertia of each pixel. This has meant that traditional motion deblurring techniques, which rely on identifying an appropriate spatial blur kernel to perform spatial deconvolution, are unable to reliably perform motion deblurring on thermal camera images. To address this problem, this paper formulates reversing the effect of thermal inertia at a single pixel as a Least Absolute Shrinkage and Selection Operator (LASSO) problem which we can solve rapidly using a quadratic programming solver. By leveraging sparsity and a high frame rate, this pixel-wise LASSO formulation is able to recover motion deblurred frames of thermal videos without using any spatial information. To compare its quality against state-of-the-art visible camera based deblurring methods, this paper evaluated the performance of a family of pre-trained object detectors on a set of images restored by different deblurring algorithms. All evaluated object detectors performed systematically better on images restored by the proposed algorithm rather than any other tested, state-of-the-art methods.

Abstract:
In this paper, we present an integrated, model-based system for state estimation and control in dynamic manipulation tasks with partial observability. We track a belief over the system state using a particle filter from which we extract a Gaussian Mixture Model (GMM). This compressed representation of the belief is used to automatically create a discrete set of goal-directed motion controllers. A reinforcement learning agent then switches between these motion controllers in real-time to accomplish the manipulation task. The proposed system closes the loop from joint sensor feedback to high-frequency, acceleration-limited position commands, thus eliminating the need for pre- and post-processing. We evaluate our approach with respect to five distinct manipulation tasks from the domains of active localization, grasping under uncertainty, assembly, and non-prehensile object manipulation. Extensive simulations demonstrate that the hierarchical policy actively exploits the uncertainty information encoded in the compressed belief. Finally, we validate the proposed method on a real-world robot.

Abstract:
Cables are complex, high dimensional, and dynamic objects. Standard approaches to manipulate them often rely on conservative strategies that involve long series of very slow and incremental deformations, or various mechanical fixtures such as clamps, pins or rings. We are interested in manipulating freely moving cables, in real time, with a pair of robotic grippers, and with no added mechanical constraints. The main contribution of this paper is a perception and control framework that moves in that direction, and uses real-time tactile feedback to accomplish the task of following a dangling cable. The approach relies on a vision-based tactile sensor, GelSight, that estimates the pose of the cable in the grip, and the friction forces during cable sliding. We achieve the behavior by combining two tactile-based controllers: 1) Cable grip controller, where a PD controller combined with a leaky integrator regulates the gripping force to maintain the frictional sliding forces close to a suitable value; and 2) Cable pose controller, where an LQR controller based on a learned linear model of the cable sliding dynamics keeps the cable centered and aligned on the fingertips to prevent the cable from falling from the grip. This behavior is possible by a reactive gripper fitted with GelSight-based high-resolution tactile sensors. The robot can follow one meter of cable in random configurations within 2-3 hand regrasps, adapting to cables of different materials and thicknesses. We demonstrate a robot grasping a headphone cable, sliding the fingers to the jack connector, and inserting it. To the best of our knowledge, this is the first implementation of real-time cable following without the aid of mechanical fixtures.

Abstract:
Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.) from images of the current state (e.g., a bird's-eye view of a SLAM reconstruction). Instead, we show that it can be advantageous to learn with dense action representations defined in the same domain as the state. In this work, we present "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location. Using ConvNets to infer spatial action maps from state images, action predictions are thereby spatially anchored on local visual features in the scene, enabling significantly faster learning of complex behaviors for mobile manipulation tasks with reinforcement learning. In our experiments, we task a robot with pushing objects to a goal location, and find that policies learned with spatial action maps achieve much better performance than traditional alternatives.

Abstract:
To represent motions from a mechanical point of view, this paper explores motion embedding using the motion taxonomy. With this taxonomy, manipulations can be described and represented as binary strings called motion codes. Motion codes capture mechanical properties, such as contact type and trajectory, that should be used to define suitable distance metrics between motions or loss functions for deep learning and reinforcement learning. Motion codes can also be used to consolidate aliases or cluster motion types that share similar properties. Using existing data sets as a reference, we discuss how motion codes can be created and assigned to actions that are commonly seen in activities of daily living based on intuition as well as real data. Motion codes are compared to vectors from pre-trained Word2Vec models, and we show that motion codes maintain distances that closely match the reality of manipulation.

Abstract:
The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks as well as scheduling of tasks. The presented algorithm enables stable and fast learning for complex, real-world domains in the parallel multitask and sequential transfer case. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference and evaluate the benefits of additional incentives for efficient, compositional task solutions in single task domains. Finally, we demonstrate substantial data-efficiency and final performance gains over competitive baselines in a week-long, physical robot stacking experiment.

Abstract:
Shared autonomy provides an effective framework for human-robot collaboration that takes advantage of the complementary strengths of humans and robots to achieve common goals. Many existing approaches to shared autonomy make restrictive assumptions that the goal space, environment dynamics, or human policy are known a priori, or are limited to discrete action spaces, preventing those methods from scaling to complicated real world environments. We propose a model-free, residual policy learning algorithm for shared autonomy that alleviates the need for these assumptions. Our agents are trained to minimally adjust the human’s actions such that a set of goal-agnostic constraints are satisfied. We test our method in two continuous control environments: Lunar Lander, a 2D flight control domain, and a 6-DOF quadrotor reaching task. In experiments with human and surrogate pilots, our method significantly improves task performance without any knowledge of the human’s goal beyond the constraints. These results highlight the ability of model-free deep reinforcement learning to realize assistive agents suited to continuous control settings with little knowledge of user intent.

Abstract:
Learning-based control aims to construct models of a system to use for planning or trajectory optimization, e.g. in model-based reinforcement learning. In order to obtain guarantees of safety in this context, uncertainty must be accurately quantified. This uncertainty may come from errors in learning (due to a lack of data, for example), or may be inherent to the system. Propagating uncertainty forward in learned dynamics models is a difficult problem. In this work we use deep learning to obtain expressive and flexible models of how distributions of trajectories behave, which we then use for nonlinear Model Predictive Control (MPC). We introduce a deep quantile regression framework for control that enforces probabilistic quantile bounds and quantifies epistemic uncertainty. Using our method we explore three different approaches for learning tubes that contain the possible trajectories of the system, and demonstrate how to use each of them in a Tube MPC scheme. We prove these schemes are recursively feasible and satisfy constraints with a desired margin of probability. We present experiments in simulation on a nonlinear quadrotor system, demonstrating the practical efficacy of these ideas.

Abstract:
Creating accurate spatial representations that take into account uncertainty is critical for autonomous robots to safely navigate in unstructured environments. Although recent LIDAR based mapping techniques can produce robust occupancy maps, learning the parameters of such models demand considerable computational time, discouraging them from being used in real-time and large-scale applications such as autonomous driving. Recognizing the fact that real-world structures exhibit similar geometric features across a variety of urban environments, in this paper, we argue that it is redundant to learn all geometry dependent parameters from scratch. Instead, we propose a theoretical framework building upon the theory of optimal transport to adapt model parameters to account for changes in the environment, significantly amortizing the training cost. Further, with the use of high-fidelity driving simulators and real-world datasets, we demonstrate how parameters of 2D and 3D occupancy maps can be automatically adapted to accord with local spatial changes. We validate various domain adaptation paradigms through a series of experiments, ranging from inter-domain feature transfer to simulation-to-real-world feature transfer. Experiments verified the possibility of estimating parameters with a negligible computational and memory cost, enabling large-scale probabilistic mapping in urban environments.

Abstract:
Truly intelligent agents need to capture the interplay of all their senses to build a rich physical understanding of their world. In robotics, we have seen tremendous progress in using visual and tactile perception; however we have often ignored a key sense: sound. This is primarily due to lack of data that captures the interplay of action and sound. In this work, we perform the first large-scale study of the interactions between sound and robotic action. To do this, we create the largest available sound-action-vision dataset with 15,000 interactions on60 objects using our robotic platform Tilt-Bot. By tilting objects and allowing them to crash into the walls of a robotic tray, we collect rich four-channel audio information. Using this data, we explore the synergies between sound and action, and present three key insights. First, sound is indicative of fine-grained object class information, e.g., sound can differentiate a metal screwdriver from a metal wrench. Second, sound also contains information about the causal effects of an action, i.e. given the sound produced, we can predict what action was applied on the object. Finally, object representations derived from audio embeddings are indicative of implicit physical properties. We demonstrate that on previously unseen objects, audio embeddings generated through interactions can predict forward models 24%better than passive visual embeddings.

Abstract:
We consider the problem of completely covering an unknown discrete environment with a swarm of asynchronous, frequently-crashing autonomous mobile robots. We represent the environment by a discrete graph, and task the robots with occupying every vertex and with constructing an implicit distributed spanning tree of the graph. The robotic agents activate independently at random exponential waiting times of mean 1 and enter the graph environment over time from a source location. They grow the environment's coverage by `settling' at empty locations and aiding other robots' navigation from these locations. The robots are identical and make decisions driven by the same simple and local rule of behaviour. The local rule is based only on the presence of neighbouring robots, and on whether a settled robot points to the current location. Whenever a robot moves, it may crash and disappear from the environment. Each vertex in the environment has limited physical space, so robots frequently obstruct each other. Our goal is to show that even under conditions of asynchronicity, frequent crashing, and limited physical space, the simple mobile robots complete their mission almost surely in linear time, and time to completion degrades gracefully with the frequency of the crashes. Our model and analysis are based on the well-studied ``totally asymmetric simple exclusion process'' in statistical mechanics.

Abstract:
Demand for high-performance, robust, and safe autonomous systems has grown substantially in recent years. These objectives motivate the desire for efficient risk estimation that can be embedded in core decision-making tasks such as motion planning. On one hand, Monte-Carlo (MC) and other sampling-based techniques provide accurate solutions for a wide variety of motion models but are cumbersome in the context of continuous optimization. On the other hand, “direct” approximations aim to compute (or upper-bound) the failure probability as a smooth function of the decision variables, and thus are convenient for optimization. However, existing direct approaches fundamentally assume discrete-time dynamics and can perform unpredictably when applied to continuous-time systems ubiquitous in the real world, often manifesting as severe conservatism. State-of-the-art attempts to address this within a conventional discrete-time framework require additional Gaussianity approximations that ultimately produce inconsistency of their own. In this paper we take a fundamentally different approach, deriving a risk approximation framework directly in continuous time and producing a lightweight estimate that actually converges as the underlying discretization is refined. Our approximation is shown to significantly outperform state-of-the-art techniques in replicating the MC estimate while maintaining the functional and computational benefits of a direct method. This enables robust, risk-aware, continuous motion-planning for a broad class of nonlinear, partially-observable systems.

Abstract:
This work contributes an event-driven visual-tactile perception system, comprising a novel biologically-inspired tactile sensor and multi-modal spike-based learning. Our neuromorphic fingertip tactile sensor, NeuTouch, scales well with the number of taxels thanks to its event-based nature. Likewise, our Visual-Tactile Spiking Neural Network (VT-SNN) enables fast perception when coupled with event sensors. We evaluate our visual-tactile system (using the NeuTouch and Prophesee event camera) on two robot tasks: container classification and rotational slip detection. On both tasks, we observe good accuracies relative to standard deep learning methods. We have made our visual-tactile datasets freely-available to encourage research on multi-modal event-driven robot perception, which we believe is a promising approach towards intelligent power-efficient robot systems.

Abstract:
This paper addresses the problem of visual-inertial self-calibration while focusing on the necessity of online IMU intrinsic calibration. To this end, we perform observability analysis for visual-inertial navigation systems (VINS) with four different inertial model variants containing intrinsic parameters that encompass one commonly used IMU model for low-cost inertial sensors. The analysis theoretically confirms what is intuitively believed in the literature, that is, the IMU intrinsics are observable given fully-excited 6-axis motion. Moreover, we, for the first time, identify 6 primitive degenerate motions for IMU intrinsic calibration. Each degenerate motion profile will cause a set of intrinsic parameters to be unobservable and any combinations of these degenerate motions are still degenerate. This result holds for all four inertial model variants and has significant implications on the necessity to perform online IMU intrinsic calibration in many robotic applications. Extensive simulations and real-world experiments are performed to validate both our observability analysis and degenerate motion analysis.

Abstract:
Controlling a non-statically stable biped is a difficult problem largely due to the complex hybrid dynamics involved. Recent work has demonstrated the effectiveness of reinforcement learning (RL) for simulation-based training of neural network controllers that successfully transfer to real bipeds. The existing work, however, has primarily used simple memoryless network architectures, even though more sophisticated architectures, such as those including memory, often yield superior performance in other RL domains. In this work, we consider recurrent neural networks (RNNs) for sim-to-real biped locomotion, allowing for policies that learn to use internal memory to model important physical properties. We show that while RNNs are able to significantly outperform memoryless policies in simulation, they do not exhibit superior behavior on the real biped due to overfitting to the simulation physics unless trained using dynamics randomization to prevent overfitting; this leads to consistently better sim-to-real transfer. We also show that RNNs could use their learned memory states to perform online system identification by encoding parameters of the dynamics into memory.

Abstract:
We consider the problem of generating a time-optimal quadrotor trajectory that attains a set of prescribed waypoints. This problem is challenging since the optimal trajectory is located on the boundary of the set of dynamically feasible trajectories. This boundary is hard to model as it involves limitations of the entire system, including hardware and software, in agile high-speed flight. In this work, we propose a multi-fidelity Bayesian optimization framework that models the feasibility constraints based on analytical approximation, numerical simulation, and real-world flight experiments. By combining evaluations at different fidelities, trajectory time is optimized while keeping the number of required costly flight experiments to a minimum. The algorithm is thoroughly evaluated in both simulation and real-world flight experiments at speeds up to 11 m/s. Resulting trajectories were found to be significantly faster than those obtained through minimum-snap trajectory planning.

Abstract:
Robotic fabric manipulation has applications in home robotics, textiles, senior care and surgery. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We extend the Visual Foresight framework to learn fabric dynamics that can be efficiently reused to accomplish different fabric manipulation tasks with a single goal-conditioned policy. We introduce VisuoSpatial Foresight (VSF), which builds on prior work by learning visual dynamics on domain randomized RGB images and depth maps simultaneously and completely in simulation. We experimentally evaluate VSF on multi-step fabric smoothing and folding tasks against 5 baseline methods in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. Furthermore, we find that leveraging depth significantly improves performance. RGBD data yields an 80% improvement in fabric folding success rate over pure RGB data. Code, data, videos, and supplementary material are available at https://sites.google.com/view/fabric-vsf/.

Abstract:
In this paper we tackle the problem of deformable object manipulation through model-free visual reinforcement learning (RL). In order to circumvent the sample inefficiency of RL, we propose two key ideas that accelerate learning. First, we propose an iterative pick-place action space that encodes the conditional relationship between picking and placing on deformable objects. The explicit structural encoding enables faster learning under complex object dynamics. Second, instead of jointly learning both the pick and the place locations, we only explicitly learn the placing policy conditioned on random pick points. Then, by selecting the pick point that has Maximal Value under Placing (MVP), we obtain our picking policy. Using this learning framework, we obtain an order of magnitude faster learning compared to independent action-spaces on our suite of deformable object manipulation tasks. Finally, using domain randomization, we transfer our policies to a real PR2 robot for challenging cloth and rope manipulation.

Abstract:
We present an end-to-end algorithm for training deep neural networks to grasp novel objects. Our algorithm builds all the essential components of a grasping system using a forward-backward automatic differentiation approach, including the forward kinematics of the gripper, the collision between the gripper and the target object, and the metric for grasp poses. In particular, we show that a generalized Q1 grasp metric is defined and differentiable for inexact grasps generated by a neural network, and the derivatives of our generalized Q1 metric can be computed from a sensitivity analysis of the induced optimization problem. We show that the derivatives of the (self-)collision terms can be efficiently computed from a watertight triangle mesh of low-quality. Altogether, our algorithm allows for the computation of grasp poses for high-DOF grippers in an unsupervised mode with no ground truth data, or it improves the results in a supervised mode using a small dataset. Our new learning algorithm significantly simplifies the data preparation for learning-based grasping systems and leads to higher qualities of learned grasps on common 3D shape datasets [7, 49, 26, 25], achieving a 22% higher success rate on physical hardware and a 0.12 higher value on the Q1 grasp quality metric.

Abstract:
The distributional perspective on reinforcement learning (RL) has given rise to a series of successful Q-learning algorithms, resulting in state-of-the-art performance in arcade game environments. However, it has not yet been analyzed how these findings from a discrete setting translate to complex practical applications characterized by noisy, high dimensional and continuous state-action spaces. In this work, we propose Quantile QT-Opt (Q2-Opt), a distributional variant of the recently introduced distributed Q-learning algorithm for continuous domains, and examine its behaviour in a series of simulated and real vision-based robotic grasping tasks. The absence of an actor in Q2-Opt allows us to directly draw a parallel to the previous discrete experiments in the literature without the additional complexities induced by an actor-critic architecture. We demonstrate that Q2-Opt achieves a superior vision-based object grasping success rate, while also being more sample efficient. The distributional formulation also allows us to experiment with various risk distortion metrics that give us an indication of how robots can concretely manage risk in practice using a Deep RL control policy. As an additional contribution, we perform batch RL experiments in our virtual environment and compare them with the latest findings from discrete settings. Surprisingly, we find that the previous batch RL findings from the literature obtained on arcade game environments do not generalise to our setup.

Abstract:
This paper presents a reinforcement learning approach to synthesizing task-driven control policies for robotic systems equipped with rich sensory modalities (e.g., vision or depth). Standard reinforcement learning algorithms typically produce policies that tightly couple control actions to the entirety of the system's state and rich sensor observations. As a consequence, the resulting policies can often be sensitive to changes in task-irrelevant portions of the state or observations (e.g., changing background colors). In contrast, the approach we present here learns to create a task-driven representation that is used to compute control actions. Formally, this is achieved by deriving a policy gradient-style algorithm that creates an information bottleneck between the states and the task-driven representation; this constrains actions to only depend on task-relevant information. We demonstrate our approach in a thorough set of simulation results on multiple examples including a grasping task that utilizes depth images and a ball-catching task that utilizes RGB images. Comparisons with a standard policy gradient approach demonstrate that the task-driven policies produced by our algorithm are often significantly more robust to sensor noise and task-irrelevant changes in the environment.

Abstract:
Most current methods for learning from demonstrations assume that those demonstrations alone are sufficient to learn the underlying task. This is often untrue, especially if extra safety specifications exist which were not present in the original demonstrations. In this paper, we allow an expert to elaborate on their original demonstration with additional specification information using linear temporal logic (LTL). Our system converts LTL specifications into a differentiable loss. This loss is then used to learn a dynamic movement primitive that satisfies the underlying specification, while remaining close to the original demonstration. Further, by leveraging adversarial training, our system learns to robustly satisfy the given LTL specification on unseen inputs, not just those seen in training. We show our method is expressive enough to work across a variety of common movement specification patterns such as obstacle avoidance, patrolling, keeping steady, and speed limitation. In addition, we show our system can modify a base demonstration with complex specifications by incrementally composing multiple simpler specifications. We also implement our system on a PR-2 robot to show how a demonstrator can start with an initial (sub-optimal) demonstration, then interactively improve task success by including additional specifications enforced with our differentiable LTL loss.

Abstract:
Deep convolutional neural networks (CNNs) have shown outstanding performance in the task of semantically segmenting images. Applying the same methods on 3D data still poses challenges due to the heavy memory requirements and the lack of structured data. Here, we propose LatticeNet, a novel approach for 3D semantic segmentation, which takes raw point clouds as input. A PointNet describes the local geometry which we embed into a sparse permutohedral lattice. The lattice allows for fast convolutions while keeping a low memory footprint. Further, we introduce DeformSlice, a novel learned data-dependent interpolation for projecting lattice features back onto the point cloud. We present results of 3D segmentation on multiple datasets where our method achieves state-of-the-art performance.

Abstract:
In this paper, we study the resilient diffusion problem in a network of robots aiming to perform a task by optimizing a global cost function in a cooperative manner. In distributed diffusion, robots combine the information collected from their local neighbors and incorporate this aggregated information to update their states. If some robots are adversarial, this cooperation can disrupt the convergence of robots to the desired state. We propose a resilient aggregation rule based on the notion of \emphcenterpoint, which is a generalization of the median in the higher dimensional Euclidean space. Robots exchange their d-dimensional state vectors with neighbors. We show that if a normal robot implements the centerpoint-based aggregation rule and has n neighbors, of which at most \lceil\fracnd+1\rceil - 1 are adversarial, then the aggregated state always lies in the convex hull of the states of the normal neighbors of the robot. Consequently, all normal robots implementing the distributed diffusion algorithm converge resiliently to the true target state. We also show that commonly used aggregation rules based on the coordinate-wise median and geometric median are, in fact, not resilient to certain attacks. We numerically evaluate our results on mobile multi-robot networks and demonstrate the cases where diffusion with the weighted average, coordinate-wise median, and geometric median-based aggregation rules fail to converge to the true target state, whereas diffusion with the centerpoint-based rule is resilient in the same scenario.

Abstract:
Designing reward functions is a challenging problem in AI and robotics. Humans usually have a difficult time directly specifying all the desirable behaviors that a robot needs to optimize. One common approach is to learn reward functions from collected expert demonstrations. However, learning reward functions from demonstrations introduces many challenges: some methods require highly structured models, e.g. reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that on the other hand require tremendous amount of data. In addition, humans tend to have a difficult time providing demonstrations on robots with high degrees of freedom, or even quantifying reward values for given demonstrations. To address these challenges, we present a preference-based learning approach, where as an alternative, the human feedback is only in the form of comparisons between trajectories. Furthermore, we do not assume highly constrained structures on the reward function. Instead, we model the reward function using a Gaussian Process (GP) and propose a mathematical formulation to actively find a GP using only human preferences. Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework. Our results in simulations and a user study suggest that our approach can efficiently learn expressive reward functions for robotics tasks.

Abstract:
Self-reconfigurable modular robots are composed of many modules that can be rearranged into various structures with respect to different activities and tasks. The variable topology truss (VTT) is a class of modular truss robot. These robots are able to change their shape by not only controlling joint positions which is similar to robots with fixed morphologies, but also reconfiguring the connections among modules in order to change their morphologies. Motion planning for VTT robots is difficult due to their non-fixed morphologies, high-dimensionality, potential for self-collision, and complex motion constraints. In this paper, a new motion planning algorithm to dramatically alter the structure of a VTT is presented, as well as some comparative tests to show its effectiveness.

Abstract:
Publicly available satellite imagery can be an ubiquitous, cheap, and powerful tool for vehicle localisation when a prior sensor map is unavailable. However, satellite images are not directly comparable to data from ground range sensors because of their starkly different modalities. We present a learned metric localisation method that not only handles the modality difference, but is cheap to train, learning in a self-supervised fashion without metrically accurate ground truth. By evaluating across multiple real-world datasets, we demonstrate the robustness and versatility of our method for various sensor configurations. We pay particular attention to the use of millimetre wave radar, which, owing to its complex interaction with the scene and its immunity to weather and lighting, makes for a compelling and valuable use case.

Abstract:
This work presents a significantly improved strategy for coordinated multi-agent weeding under conditions of partial environmental information. We show that by using Entropic value-at-risk (EVaR) together with the Gittins index, agents can make intelligent decisions about whether to exploit the estimated distribution of weeds in the environment or to explore new areas of the environment. The use of this method improves the performance of agents in comparison to previous methods, resulting in a system which can weed denser fields using fewer robots. Furthermore, we show that for the reward function and environmental dynamics which represent the weeding problem, our system is able to perform comparably to the fully observed case over the real-world range of seed bank densities, while operating under partial observability.

Abstract:
Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually-designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise of the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire behaviors for legged robots. By incorporating sample efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.

Abstract:
Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying the desired object's type such as "scissors" and/or visual attributes such as "red," thus limiting the robot to only known object classes. We develop a model to retrieve objects based on descriptions of their usage. The model takes in a language command containing a verb, for example "Hand me something to cut," and RGB images of candidate objects and selects the object that best satisfies the task specified by the verb. Our model directly predicts an object's appearance from the object's use specified by a verb phrase. We do not need to explicitly specify an object's class label. Our approach allows us to predict high level concepts like an object's utility based on the language query. Based on contextual information present in the language commands, our model can generalize to unseen object classes and unknown nouns in the commands. Our model correctly selects objects out of sets of five candidates to fulfill natural language commands, and achieves an average accuracy of 62.3% on a held-out test set of unseen ImageNet object classes and 53.0% on unseen object classes and unknown nouns. Our model also achieves an average accuracy of 54.7% on unseen YCB object classes, which have a different image distribution from ImageNet objects. We demonstrate our model on a KUKA LBR iiwa robot arm, enabling the robot to retrieve objects based on natural language descriptions of their usage. We also present a new dataset of 655 verb-object pairs denoting object usage over 50 verbs and 216 object classes.

Abstract:
Dynamic games are an effective paradigm for dealing with the control of multiple interacting actors. This paper introduces ALGAMES (Augmented Lagrangian GAME-theoretic Solver), a solver that handles trajectory optimization problems with multiple actors and general nonlinear state and input constraints. Its novelty resides in satisfying the first order optimality conditions with a quasi-Newton root-finding algorithm and rigorously enforcing constraints using an augmented Lagrangian formulation. We evaluate our solver in the context of autonomous driving on scenarios with a strong level of interactions between the vehicles. We assess the robustness of the solver using Monte Carlo simulations. It is able to reliably solve complex problems like ramp merging with three vehicles three times faster than a state-of-the-art DDP-based approach. A model predictive control (MPC) implementation of the algorithm, running at more than 60Hz, demonstrates ALGAMES' ability to mitigate the "frozen robot" problem on complex autonomous driving scenarios like merging onto a crowded highway.

Abstract:
The multiple-path orienteering problem asks for paths for a team of robots that maximize the total reward collected while satisfying budget constraints on the path length. This problem models many multi-robot routing tasks such as exploring unknown environments and information gathering for environmental monitoring. In this paper, we focus on how to make the robot team robust to failures when operating in adversarial environments. We introduce the Robust Multiple path Orienteering Problem (RMOP) where we seek worst-case guarantees against an adversary that is capable of attacking at most \alpha robots. Our main contribution is a general approximation scheme with bounded approximation guarantee that depends on \alpha and the approximation factor for single robot orienteering. In particular, we show that the algorithm yields a (i) constant factor approximation when the cost function is modular; (ii) log factor approximation when the cost function is submodular; and (iii) constant-factor approximation when the cost function is submodular but the robots are allowed to exceed their path budgets by a bounded amount. In addition to theoretical analysis, we perform simulation study for an ocean monitoring application to demonstrate the efficacy of our approach.

Abstract:
For robotic arms to operate in arbitrary environments, especially near people, it is critical to certify the safety of their motion planning algorithms. However, there is often a trade-off between safety and real-time performance; one can either carefully design safe plans, or rapidly generate potentially-unsafe plans. This work presents a receding-horizon, real-time trajectory planner with safety guarantees, called ARMTD (Autonomous Reachability-based Manipulator Trajectory Design). The method first computes (offline) a reachable set of parameterized trajectories for each joint of an arm. Each trajectory includes a fail-safe maneuver (braking to a stop). At runtime, in each receding-horizon planning iteration, ARMTD constructs a parameterized reachable set of the full arm in workspace and intersects it with obstacles to generate sub-differentiable, provably-conservative collision-avoidance constraints on the trajectory parameters. ARMTD then performs trajectory optimization over the parameters, subject to these constraints. On a 6 degree-of-freedom arm, ARMTD outperforms CHOMP in simulation, never crashes, and completes a variety of real-time planning tasks on hardware.

Abstract:
Models used in modern planning problems to simulate outcomes of real world action executions are becoming increasingly complex, ranging from simulators that do physics-based reasoning to precomputed analytical motion primitives. However, robots operating in the real world often face situations not modeled by these models before execution. This imperfect modeling can lead to highly suboptimal or even incomplete behavior during execution. In this paper, we propose CMAX an approach for interleaving planning and execution. CMAX adapts its planning strategy online during real-world execution to account for any discrepancies in dynamics during planning, without requiring updates to the dynamics of the model. This is achieved by biasing the planner away from transitions whose dynamics are discovered to be inaccurately modeled, thereby leading to robot behavior that tries to complete the task despite having an inaccurate model. We provide provable guarantees on the completeness and efficiency of the proposed planning and execution framework under specific assumptions on the model, for both small and large state spaces. Our approach CMAX is shown to be efficient empirically in simulated robotic tasks including 4D planar pushing, and in real robotic experiments using PR2 involving a 3D pick-and-place task where the mass of the object is incorrectly modeled, and a 7D arm planning task where one of the joints is not operational leading to discrepancy in dynamics

Abstract:
A theoretically complete solution to the optimal Non-revisiting Coverage Path Planning (NCPP) problem of any arbitrarily-shaped object with a non-redundant manipulator is proposed in this work. Given topological graphs of surface cells corresponding to feasible and continuous manipulator configurations, the scheme is aimed at ensuring optimality with respect to the number of surface discontinuities, and extends the existing provable solution attained for simply-connected configuration cell topologies to any arbitrary shape. This is typically classified through their genus, or the number of "holes" which appear increasingly as configurations are further constrained with the introduction of additional metrics for the task at hand, e.g., manipulability thresholds, clearance from obstacles, end-effector orientations, tooling force/torque magnitudes, etc. The novel contribution of this paper is to show that no matter what the resulting topological shapes from such quality cell constraints may be, the graph is finitely solvable, and a multi-stage iterative solver is designed to find all such optimal solutions.

Abstract:
Effective multi-agent teaming requires knowledgeable robots to have the capability of influencing their teammates. Robots are able to possess information that their human and other agent teammates do not, such as by scouting ahead in dangerous areas. To work as an effective team, robots must be able to influence their teammates when necessary and adapt to changing situations in order to move to goal positions that only they may be aware of, while remaining connected as a team. In this paper, we propose the problem of multiple robot teammates tasked with leading a multi-agent team to multiple goal positions while maintaining the ability to communicate with one another. We define utilities of making progress towards goals, maintaining communications with followers, and maintaining communications with fellow leaders. In addition, we introduce a novel regularized optimization formulation that balances these utilities and utilizes structured sparsity inducing norms to focus the leaders' attention on specific goals and followers over time. The dynamically learned utility allows our approach to generate an action for each leader at each time step, which allows the leaders to reach goals without sacrificing communication. We show through extensive synthetic and high-fidelity simulations that our method effectively enables multiple robotic leaders to guide a multi-agent team to different goals while maintaining communication.

Abstract:
Modular robotic systems comprise groups of physically connected modules which can be reconfigured to create morphologies that suit an environment or task. One method of reconfiguration is via subtraction, where extraneous modules disconnect from an initial configuration, before being removed by external intervention. In this paper, we consider an approach to reconfiguration in two dimensions, here termed active subtraction, in which unwanted modules traverse a configuration in order to remove themselves safely, without the need for external intervention, making it a form of self-reconfiguration. We present a sequential solution that selects suitable extraneous modules that then remove themselves, one by one. We also present a parallel solution that, while being more computationally demanding, allows multiple modules to move simultaneously. Both solutions are proven to (i) be correct for any given non-hollow structure, and (ii) require, in the worst case, quadratic time proportionally to the number of modules. Simulation studies demonstrate that both solutions work effectively for specified and randomly generated desired configurations with hundreds of modules, and reveal a non-monotonic dependence between the performance and the percentage of modules to be removed. This work demonstrates active subtraction as a viable method of self-reconfiguration, without the need for heuristics or stochasticity, and suggests its potential for application in real-world systems.

Abstract:
In robot manipulation, planning the motion of a robot manipulator to grasp an object is a fundamental problem. A manipulation planner needs to generate a trajectory of the manipulator to avoid obstacles in the environment and plan an end-effector pose for grasping. While trajectory planning and grasp planning are often tackled separately, how to efficiently integrate the two planning problems remains a challenge. In this work, we present a novel method for joint motion and grasp planning. Our method integrates manipulation trajectory optimization with online grasp synthesis and selection, where we apply online learning techniques to select goal configurations for grasping, and introduce a new grasp synthesis algorithm to generate grasps online. We evaluate our planning approach and demonstrate that our method generates robust and efficient motion plans for grasping in cluttered scenes.

Abstract:
Affordance models are widely used in robotics to represent a robot's possible interactions with its environment. However, robot affordance models are inherently quantitative, making them difficult for humans to understand and interact with. To address this problem, previous works have constructed affordance models by grounding (connecting) them to natural language, but primarily used expert-defined actions, effects, or labels to do so. In this paper, we use short text responses provided by humans and simple randomized robot manipulation actions to construct a labeled affordance model that defines a relationship between English-language labels and robots' internal affordance representations. We first collect label data from a combination of crowdsourced real-world human-robot interactions and online user studies. We then use this data to train classifiers predicting whether or not a particular quantitative affordance will receive a specific label from a person, achieving an average affordance prediction score of 0.87 (area under Receiver Operating Characteristic curve). Our results also show that labels are more accurately predicted by affordance effects than affordance actions---a result that has been hypothesized in prior work but has never been directly tested. Finally, we develop a technique for automatically constructing a hierarchy of labels from crowdsourced data, discovering structure within the learned labels and suggesting the existence of a more universal set of affordance primitives.

Abstract:
Given a desired object trajectory, how should a robot make contact to achieve it? This paper proposes a global optimization model for this problem with alternated-sticking contact, referred to as Contact-Trajectory Optimization. We achieve this by reasoning on simplified geometric environments with a quasi-dynamic relaxation of the physics. These relaxations are the result of approximating bilinear torque effects and deprecating high-order forces and impacts. Moreover, we apply convex approximations that retain the fundamental properties of rigid multi-contact interaction. As result, we derive a mixed-integer convex model that provides global optimality, infeasibility detection and convergence guarantees. This approach does not require seeding and accounts for the shapes of the object and environment. We validate this approach with extensive simulated and real-robot experiments, demonstrating its ability to quickly and reliably optimize multi-contact manipulation behaviors.

Abstract:
Reinforcement learning provides a general framework for learning robotic skills while minimizing engineering effort. However, most reinforcement learning algorithms assume that a well-designed reward function is provided, and learn a single behavior for that single reward function. Such reward functions can be difficult to design in practice. Can we instead develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then re-purpose these skills for downstream tasks? In this paper, we demonstrate that a recently proposed unsupervised skill discovery algorithm can be extended into an efficient off-policy method, making it suitable for performing unsupervised reinforcement learning in the real world. Firstly, we show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible. Secondly, we move beyond the simulation environments and evaluate the algorithm on real physical hardware. On quadrupeds, we observe that locomotion skills with diverse gaits and different orientations emerge without any rewards or demonstrations. We also demonstrate that the learnt skills can be composed using model predictive control for goal-oriented navigation, without any additional training.

Abstract:
As the number of agents comprising a swarm increases, individual-agent-based control techniques for collective task completion become computationally intractable. We study a setting in which the agents move along the nodes of a graph, and the high-level task specifications for the swarm are expressed in a recently proposed language called graph temporal logic (GTL). By constraining the distribution of the swarm over the nodes of the graph, GTL specifies a wide range of properties, including safety, progress, and response. In contrast to the individual-agent-based control techniques, we develop an algorithm to control, in a decentralized and probabilistic manner, a collective property of the swarm: its density distribution. The algorithm, agnostic to the number of agents in the swarm, synthesizes a time-varying Markov chain modeling the time evolution of the density distribution of a swarm subject to GTL. We first formulate the synthesis of such a Markov chain as a mixed-integer nonlinear program (MINLP). Then, to address the intractability of MINLPs, we present an iterative scheme alternating between two relaxations of the MINLP: a linear program and a mixed-integer linear program. We evaluate the algorithm in several scenarios, including a rescue mission in a high-fidelity ROS-Gazebo simulation.

Abstract:
We investigate the problem of using mobile robots equipped with 2D range sensors to optimally guard perimeters or regions, i.e., 1D or 2D sets. Given such a set of arbitrary shape to be guarded, and k mobile sensors where the i-th sensor can guard a circular region with a variable radius r_i, we seek the optimal strategy to deploy the k sensors to fully cover the set such that max r_i is minimized. On the side of computational complexity, we show that computing a 1.155-optimal solution for guarding a perimeter or a region is NP-hard, i.e., the problem is hard to approximate. The hardness result on perimeter guarding holds when each sensor may guard at most two disjoint perimeter segments. On the side of computational methods, for the guarding perimeters, we develop a fully polynomial time approximation scheme (FPTAS) for the special setting where each sensor may only guard a single continuous perimeter segment, suggesting that the aforementioned hard-to-approximate result on the two-disjoint-segment sensing model is tight. For the general problem, we first describe a polynomial-time (2+\epsilon)-approximation algorithm as an upper bound, applicable to both perimeter guarding and region guarding. This is followed by a high-performance integer linear programming (ILP) based method that computes near-optimal solutions. Thorough computational benchmarks as well as evaluation on potential application scenarios demonstrate the effectiveness of these algorithmic solutions.

Abstract:
We consider the problem of dynamically allocating tasks to multiple agents under time window constraints and task completion uncertainty. Our objective is to minimize the number of unsuccessful tasks at the end of the operation horizon. We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination and addresses them in a hierarchical manner. The lower layer computes policies for individual agents using dynamic programming with tree search, and the upper layer resolves conflicts in individual plans to obtain a valid multi-agent allocation. Our algorithm, Stochastic Conflict-Based Allocation (SCoBA), is optimal in expectation and complete under some reasonable assumptions. In practice, SCoBA is computationally efficient enough to interleave planning and execution online. On the metric of successful task completion, SCoBA consistently outperforms a number of baseline methods and shows strong competitive performance against an oracle with complete lookahead. It also scales well with the number of tasks and agents. We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.

Abstract:
We propose NH-TTC, a general method for fast, anticipatory collision avoidance for autonomous robots with arbitrary equations of motions. Our approach exploits implicit differentiation and subgradient descent to locally optimize the non-convex and non-smooth cost functions that arise from planning over the anticipated future positions of nearby obstacles. The result is a flexible framework capable of supporting high-quality, collision-free navigation with a wide variety of robot motion models in various challenging scenarios. We show results for different navigating tasks, with various numbers of agents (with and without reciprocity), on both physical differential drive robots, and simulated robots with different motion models and kinematic and dynamic constraints, including acceleration-controlled agents, differential-drive agents, and smooth car-like agents. The resulting paths are high quality and collision-free, while needing only a few milliseconds of computation as part of an integrated sense-plan-act navigation loop. For a video of further results and reference code, please see the corresponding webpage: http://motion.cs.umn.edu/r/NH-TTC/

Abstract:
We aim to endow a robot with the ability to learn manipulation concepts that link natural language instructions to motor skills. Our goal is to learn a single multi-task policy that takes as input a natural language instruction and an image of the initial scene and outputs a robot motion trajectory to achieve the specified task. This policy has to generalize over different instructions and environments. Our insight is that we can approach this problem through Learning from Demonstration by leveraging large-scale video datasets of humans performing manipulation actions. Thereby, we avoid more time-consuming processes such as teleoperation or kinesthetic teaching. We also avoid having to manually design task-specific rewards. We propose a two-stage learning process where we first learn single-task policies through reinforcement learning. The reward is provided by scoring how well the robot visually appears to perform the task. This score is given by a video-based action classifier trained on a large-scale human activity dataset. In the second stage, we train a multi-task policy through imitation learning to imitate all the single-task policies. In extensive simulation experiments, we show that the multi-task policy learns to perform a large percentage of the 78 different manipulation tasks on which it was trained. The tasks are of greater variety and complexity than previously considered robot manipulation tasks. We show that the policy generalizes over variations of the environment. We also show examples of successful generalization over novel but similar instructions.

Abstract:
There is increasing demand for automated systems that can fabricate 3D structures. Robotic spatial extrusion has become an attractive alternative to traditional layer-based 3D printing due to a manipulator's flexibility to print large, directionally-dependent structures. However, existing extrusion planning algorithms require a substantial amount of human input, do not scale to large instances, and lack theoretical guarantees. In this work, we present a rigorous formalization of robotic spatial extrusion planning and provide several efficient and probabilistically complete planning algorithms. The key planning challenge is, throughout the printing process, satisfying both stiffness constraints that limit the deformation of the structure and geometric constraints that ensure the robot does not collide with the structure. We show that, although these constraints often conflict with each other, a greedy backward state-space search guided by a stiffness-aware heuristic is able to successfully balance both constraints. We empirically compare our methods on a benchmark of over 40 simulated extrusion problems. Finally, we apply our approach to 3 real-world extrusion problems.

Abstract:
We propose a new technique for pushing an unknown object from an initial configuration to a goal configuration with stability constraints. The proposed method leverages recent progress in differentiable physics models to learn unknown mechanical properties of pushed objects, such as their distributions of mass and coefficients of friction. The proposed learning technique computes the gradient of the distance between predicted poses of objects and their actual observed poses, and utilizes that gradient to search for values of the mechanical properties that reduce the reality gap. The proposed approach is also utilized to optimize a policy to efficiently push an object toward the desired goal configuration. Experiments with real objects using a real robot to gather data show that the proposed approach can identify mechanical properties of heterogeneous objects from a small number of pushing actions.

Abstract:
Robotic reinforcement learning (RL) holds the promise of enabling robots to learn complex behaviors through experience. However, realizing this promise for long-horizon tasks in the real world requires mechanisms to reduce human burden in terms of defining the task and scaffolding the learning process. In this paper, we study how these challenges can be alleviated with an automated robotic learning framework, in which multi-stage tasks are defined simply by providing videos of a human demonstrator and then learned autonomously by the robot from raw image observations. A central challenge in imitating human videos is the difference in appearance between the human and robot, which typically requires manual correspondence. We instead take an automated approach and perform pixel-level image translation via CycleGAN to convert the human demonstration into a video of a robot, which can then be used to construct a reward function for a model-based RL algorithm. The robot then learns the task one stage at a time, automatically learning how to reset each stage to retry it multiple times without human-provided resets. This makes the learning process largely automatic, from intuitive task specification via a video to automated training with minimal human intervention. We demonstrate that our approach is capable of learning complex tasks, such as operating a coffee machine, directly from raw image observations, requiring only 20 minutes to provide human demonstrations and about 180 minutes of robot interaction.

Abstract:
Autonomous driving has achieved significant progress in recent years, but autonomous cars are still unable to tackle high-risk situations where a potential accident is likely. In such near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences. To avoid unsafe actions in near-accident scenarios, we need to fully explore the environment. However, reinforcement learning (RL) and imitation learning (IL), two widely-used policy learning methods, cannot model rapid phase transitions and are not scalable to fully cover all the states. To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes. Our approach exploits the advantages of both IL and RL by integrating them into a unified learning framework. Experimental results and user studies suggest our approach can achieve higher efficiency and safety compared to other methods. Analyses of the policies demonstrate our high-level policy appropriately switches between different low-level policies in near-accident driving situations.

Abstract:
We propose a principled kernel-based policy iteration algorithm to solve the continuous-state Markov Decision Processes (MDPs). In contrast to most decision-theoretic planning frameworks, which assume fully known state transition models, we design a method that eliminates such a strong assumption which is oftentimes extremely difficult to engineer in reality. To achieve this, we first apply the second-order Taylor expansion of the kernelized value function. The Bellman equation is then approximated by a partial differential equation, which only relies on the first and second moments of the transition model. By combining the kernel representation of value function, we then design an efficient policy iteration algorithm whose policy evaluation step can be represented as a linear system of equations evaluated at a finite set of supporting states. We have validated the proposed method through extensive simulations on both simplified and realistic planning scenarios, and the experiments show that our proposed approach leads to a much superior performance over several baseline methods.

Abstract:
Traditional dense volumetric representations for robotic mapping make simplifying assumptions about sensor noise characteristics due to computational constraints. We present a framework that, unlike conventional occupancy grid maps, explicitly models the sensor ray formation for a depth sensor via a Markov Random Field and performs loopy belief propagation to infer the marginal probability of occupancy at each voxel in a map. By explicitly reasoning about occlusions our approach models the correlations between adjacent voxels in the map. Further, by incorporating learnt sensor noise characteristics we perform accurate inference even with noisy sensor data without ad-hoc definitions of sensor uncertainty. We propose a new metric for evaluating probabilistic volumetric maps and demonstrate the higher fidelity of our approach on simulated as well as real-world datasets.

Abstract:
Imitation learning is an effective and safe technique to train robot policies in the real world because it does not depend on an expensive random exploration process. However, due to the lack of exploration, learning policies that generalize beyond the demonstrated behaviors is still an open challenge. We present a novel imitation learning framework to enable robots to 1) learn complex real world manipulation tasks efficiently from a small number of human demonstrations, and 2) synthesize new behaviors not contained in the collected demonstrations. Our key insight is that multi-task domains often present a latent structure, where demonstrated trajectories for different tasks intersect at common regions of the state space. We present Generalization Through Imitation (GTI), a two-stage offline imitation learning algorithm that exploits this intersecting structure to train goal-directed policies that generalize to unseen start and goal state combinations. In the first stage of GTI, we train a stochastic policy that leverages trajectory intersections to have the capacity to compose behaviors from different demonstration trajectories together. In the second stage of GTI, we collect a small set of rollouts from the unconditioned stochastic policy of the first stage, and train a goal-directed agent to generalize to novel start and goal configurations. We validate GTI in both simulated domains and a challenging long-horizon robotic manipulation domain in real world. Additional results and videos are available at https://sites.google.com/view/gti2020/.

Abstract:
Untethered small-scale soft robots have promising applications in minimally invasive surgery, targeted drug delivery, and bioengineering applications as they can access confined spaces in the human body. However, due to highly nonlinear soft continuum deformation kinematics, inherent variability during fabrication on the miniature scale, and lack of accurate models, the conventional control methods cannot be easily applied. Adaptivity of the robot control is additionally crucial for medical operations, as operation environments show large variability and robot materials may degrade or change over time, which would have deteriorating factors on the robot motion and task performance. In this work, we propose using a probabilistic learning approach for millimeter-scale magnetic walking soft robots using Bayesian optimization (BO) and Gaussian processes (GPs). Our approach provides a data-efficient learning scheme to find controller parameters while optimizing the stride length performance of the walking soft millirobot. We demonstrate adaptation to fabrication variabilities and different walking surfaces by adopting our controller learning system to three robots within a small number of physical experiments.

Abstract:
In this work, we present a semi-supervised learning method to transfer human motion data to humanoid robots with varying kinematic configurations while avoiding self-collisions.To this end, we propose a data-driven motion retargeting named locally weighted latent learning which possesses the benefits of both nonparametric regression and deep latent variable modeling.The method can leverage both paired and domain-specific datasets and can maintain robot motion feasibility owing to the nonparametric regression and graph-based heuristics it uses. The proposed method is evaluated using two different humanoid robots,the Robotis ThorMang and COMAN, in simulation environments with diverse motion capture datasets. Furthermore, online puppeteering of a real humanoid robot is implemented.

Abstract:
By harnessing a growing dataset of robot experience, we learn control policies for a diverse and increasing set of related manipulation tasks. To make this possible, we introduce reward sketching: an effective way of eliciting human preferences to learn the reward function for a new task. This reward function is then used to retrospectively annotate all historical data, collected for different tasks, with predicted rewards for the new task. The resulting massive annotated dataset can then be used to learn manipulation policies with batch reinforcement learning (RL) from visual input in a completely off-line way, i.e., without interactions with the real robot. This approach makes it possible to scale up RL in robotics, as we no longer need to run the robot for each step of learning. We show that the trained batch RL agents, when deployed in real robots, can perform a variety of challenging tasks involving multiple interactions among rigid or deformable objects. Moreover, they display a significant degree of robustness and generalization. In some cases, they even outperform human teleoperators.

Abstract:
Training robotic policies in simulation suffers from the sim-to-real gap, as simulated dynamics can be different from real-world dynamics. Past works tackled this problem through domain randomization and online system-identification. The former is sensitive to the manually-specified training distribution of dynamics parameters and can result in behaviors that are overly conservative. The latter requires learning policies that concurrently perform the task and generate useful trajectories for system identification. In this work, we propose and analyze a framework for learning exploration policies that explicitly perform task-oriented exploration actions to identify task-relevant system parameters. These parameters are then used by model-based trajectory optimization algorithms to perform the task in the real world. We instantiate the framework in simulation with the Linear Quadratic Regulator as well as in the real world with pouring and object dragging tasks. Experiments show that task-oriented exploration helps model-based policies adapt to systems with initially unknown parameters, and it leads to better task performance than task-agnostic exploration.

Abstract:
Correspondence identification is a critical capability for multi-robot collaborative perception, which allows a group of robots to consistently refer to the same objects in their own fields of view. Correspondence identification is a challenging problem, especially due to the non-covisible objects that cannot be observed by all robots and the uncertainty in robot perception, which have not been well studied yet in collaborative perception. In this work, we propose a principled approach of regularized graph matching that addresses perception uncertainties and non-covisible objects in a unified mathematical framework to perform correspondence identification in collaborative perception. Our method formulates correspondence identification as a graph matching problem in the regularized constrained optimization framework. We introduce a regularization term to explicitly address perception uncertainties by penalizing the object correspondence with a high uncertainty. We also design a second regularization term to explicitly address non-covisible objects by penalizing the correspondences built by the non-covisible objects. The formulated constrained optimization problem is difficulty to solve, because it is not convex and it contains regularization terms. Thus, we develop a new sampling-based algorithm to solve our formulated regularized constrained optimization problem. We evaluate our approach in the scenarios of connected autonomous driving and multi-robot coordination in simulations and using real robots. Experimental results show that our method is able to address correspondence identification under uncertainty and non-covisibility, and it outperforms the previous techniques and achieves the state-of-the-art performance.

Abstract:
Sidewinder rattlesnakes generate movement through coordinated lateral and vertical traveling waves of body curvature. Previous biological and robotic studies have demonstrated that proper control and coordination of these two waves enables robust and versatile locomotion in complex environments. However, the propagation of the vertical wave, which sets the body-environment contact state, can affect static stability and cause undesirable locomotion behaviors, especially when for movement at low speeds. Here, we propose to stabilize gaits by modulations of the spatial frequency of the vertical wave, which can be used to tune the number of distinct body-environment contact patches (while maintaining a constant overall contact area). These modulations act to stabilize configurations that were previously statically unstable and therefore, by eliminating dynamic effects such as undesired turning, broaden the range of movements and behaviors accessible to limbless locomotors at a variety of speeds. Specifically, our approach identifies, for a given lateral wave, the spatial frequency of the vertical wave that statically stabilizes the locomotor and then uses geometric mechanics tools to identify the coordination (i.e., the phase shift) between the vertical and lateral waves that produces a desired motion. We demonstrate the effectiveness of our technique on the locomotion of both robotic and robophysical systems.

Abstract:
Natural language instructions often exhibit sequential constraints rather than being simply goal-oriented, for example ``go around the lake and then travel north until the intersection''. Existing approaches map these kinds of natural language expressions to Linear Temporal Logic expressions but require an expensive dataset of LTL expressions paired with English sentences. We introduce an approach that can learn to map from English to LTL expressions given only pairs of English sentences and trajectories, enabling a robot to understand commands with sequential constraints. We use formal methods of LTL progression to reward the produced logical forms by progressing each LTL logical form against the ground-truth trajectory, represented as a sequence of states, so that no LTL expressions are needed during training. We evaluate in two ways: on the SAIL dataset, a benchmark artificial environment of 3,266 trajectories and language commands as well as on 10 newly-collected real-world environments of roughly the same size. We show that our model correctly interprets natural language commands with 76.9% accuracy on average. We demonstrate the end-to-end process in real-time in simulation, starting with only a natural language instruction and an initial robot state, producing a logical form from the model trained with trajectories, and finding a trajectory that satisfies sequential constraints with an LTL planner in the environment.

Abstract:
In this paper, we introduce and tackle the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision and provide an efficient solution for near real-time applications. We present Deep SESR, a residual-in-residual network-based generative model that can learn to restore perceptual image qualities at 2x, 3x, or 4x higher spatial resolution. We supervise its training by formulating a multi-modal objective function that addresses the chrominance-specific underwater color degradation, lack of image sharpness, and loss in high-level feature representation. It is also supervised to learn salient foreground regions in the image, which in turn guides the network to learn global contrast enhancement. We design an end-to-end training pipeline to jointly learn the saliency prediction and SESR on a shared hierarchical feature space for fast inference. Moreover, we present UFO-120, the first dataset to facilitate large-scale SESR learning; it contains over 1500 training samples and a benchmark test set of 120 samples. By thorough experimental evaluation on UFO-120 and several other standard datasets, we demonstrate that Deep SESR outperforms the existing solutions for underwater image enhancement and super-resolution. We also validate its generalization performance on several test cases that include underwater images with diverse spectral and spatial degradation levels, and also terrestrial images with unseen natural objects. Lastly, we analyze its computational feasibility for single-board deployments and demonstrate its operational benefits for visually-guided underwater robots.

Abstract:
We consider shared workspace scenarios with humans and robots acting to achieve independent goals, termed as parallel play. We model these as general-sum games and construct a framework that utilizes the Nash equilibrium solution concept to consider the interactive effect of both agents while planning. We find multiple Pareto-optimal equilibria in these tasks. We hypothesize that people act by choosing an equilibrium based on social norms and their personalities. To enable coordination, we infer the equilibrium online using a probabilistic model that includes these two factors and use it to select the robot's action. We apply our approach to a close-proximity pick-and-place task involving a robot and a simulated human with three potential behaviors - defensive, selfish, and norm-following. We showed that using a Bayesian approach to infer the equilibrium enables the robot to complete the task with less than half the number of collisions while also reducing the task execution time as compared to the best baseline. We also performed a study with human participants interacting either with other humans or with different robot agents and observed that our proposed approach performs similar to human-human parallel play interactions. The code is available at https://github.com/shray/bayes-nash.

Abstract:
This paper presents a game-theoretic path-following formulation where the opponent is an adversary road model. This formulation allows us to compute safe sets using tools from viability theory, that can be used as terminal constraints in an optimization-based motion planner. Based on the adversary road model, we first derive an analytical discriminating domain, which even allows guaranteeing safety in the case when steering rate constraints are considered. Second, we compute the discriminating kernel and show that the output of the gridding based algorithm can be accurately approximated by a fully connected neural network, which can again be used as a terminal constraint. Finally, we show that by using our proposed safe sets, an optimization-based motion planner can successfully drive on city and country roads with prediction horizons too short for other baselines to complete the task.

Abstract:
We present Nav2Goal, a data-efficient and end-to-end learning method for goal-conditioned visual navigation. Our technique is used to train a navigation policy that enables a robot to navigate close to sparse geographic waypoints provided by a user without any prior map, all while avoiding obstacles and choosing paths that cover user-informed regions of interest. Our approach is based on recent advances in conditional imitation learning. General-purpose safe and informative actions are demonstrated by a human expert. The learned policy is subsequently extended to be goal-conditioned by training with hindsight relabelling, guided by the robot's relative localization system, which requires no additional manual annotation. We deployed our method on an underwater vehicle in the open ocean to collect scientifically relevant data of coral reefs, which allowed our robot to operate safely and autonomously, even at very close proximity to the coral. Our field deployments have demonstrated over a kilometer of autonomous visual navigation, where the robot reaches on the order of 40 waypoints, while collecting scientifically relevant data. This is done while travelling within 0.5 m altitude from sensitive corals and exhibiting significant learned agility to overcome turbulent ocean conditions and to actively avoid collisions.

Abstract:
There is a rising interest in Spatio-temporal systems described by Partial Differential Equations (PDEs) among the control community. Not only are these systems challenging to control, but the sizing and placement of their actuation is an NP-hard problem on its own. Recent methods either discretize the space before optimziation, or apply tools from linear systems theory under restrictive linearity assumptions. In this work we consider control and actuator placement as a coupled optimization problem, and derive an optimization algorithm on Hilbert spaces for nonlinear PDEs with an additive spatio-temporal description of white noise. We study first and second order systems and in doing so, extend several results to the case of second order PDEs. The described approach is based on variational optimization, and performs joint RL-type optimization of the feedback control law and the actuator design over episodes. We demonstrate the efficacy of the proposed approach with several simulated experiments on a variety of SPDEs.

Abstract:
We present a novel approach to generate collision-free trajectories for a robot operating in close proximity with a human obstacle in an occluded environment. The self-occlusions of the robot can significantly reduce the accuracy of human motion prediction, and we present a novel deep learning-based prediction algorithm. Our formulation uses CNNs and LSTMs and we augment human-action datasets with synthetically generated occlusion information for training. We also present an occlusion-aware planner that uses our motion prediction algorithm to compute collision-free trajectories. We highlight performance of the overall approach (HMPO) in complex scenarios and observe upto 68% performance improvement in motion prediction accuracy, and 38% improvement in terms of error distance between the ground-truth and the predicted human joint positions.

Abstract:
Scalable robot learning from seamless human-robot interaction is critical if robots are to solve a multitude of tasks in the real world. Current approaches to imitation learning suffer from one of two drawbacks. On the one hand, they rely solely on off-policy human demonstration, which in some cases leads to a mismatch in train-test distribution. On the other, they burden the human to label every state the learner visits, rendering it impractical in many applications. We argue that learning interactively from expert interventions enjoys the best of both worlds. Our key insight is that any amount of expert feedback, whether by intervention or non-intervention, provides information about the quality of the current state, the optimality of the action, or both. We formalize this as a constraint on the learner's value function, which we can efficiently learn using no regret, online learning techniques. We call our approach Expert Intervention Learning (EIL), and evaluate it on a real and simulated driving task with a human expert, where it learns collision avoidance from scratch with just a few hundred samples (about one minute) of expert control.

Abstract:
Autonomous agents are limited in their ability to observe the world state. Partially observable Markov decision processes (POMDPs) model planning under world state uncertainty, but POMDPs with multimodal beliefs, continuous actions, and nonlinear dynamics suitable for robotics applications are challenging to solve. We present a dynamic programming algorithm for planning in the belief space over discrete latent states in POMDPs with continuous states, actions, observations, and nonlinear dynamics. Unlike prior belief space motion planning approaches which assume unimodal Gaussian uncertainty, our approach constructs a novel tree-structured representation of possible observations and multimodal belief space trajectories, and optimizes a contingency plan over this structure. We apply our method to problems with uncertainty over the reward or cost function (e.g., the configuration of goals or obstacles), uncertainty over the dynamics, and uncertainty about interactions, where other agents' behavior is conditioned on latent intentions. Three experiments show that our algorithm outperforms strong baselines for planning under uncertainty, and results from an autonomous lane changing task demonstrate that our algorithm can synthesize robust interactive trajectories.

Abstract:
High-density afferents in the human hand have long been regarded as essential for human grasping and manipulation abilities. In contrast, robotic tactile sensors are typically used to provide low-density contact data, such as center-of-pressure and resultant force. Although useful, this data does not exploit the rich information content that some tactile sensors (e.g., the SynTouch BioTac) naturally provide. This research extends robotic tactile sensing beyond reduced-order models through 1) the automated creation of a precise tactile dataset for the BioTac over diverse physical interactions, 2) a 3D finite element (FE) model of the BioTac, which complements the experimental dataset with high-resolution, distributed contact data, and 3) neural-network-based mappings from raw BioTac signals to low-dimensional experimental data, and more importantly, high-density FE deformation fields. These data streams can provide a far greater quantity of interpretable information for grasping and manipulation algorithms than previously accessible.

Abstract:
We introduce a reconfigurable underactuated robot hand able to perform systematic prehensile in-hand manipulations regardless of object size or shape. The hand utilises a two-degree-of-freedom five-bar linkage as the palm of the gripper, with three three-phalanx underactuated fingers---jointly controlled by a single actuator---connected to the mobile revolute joints of the palm. Three actuators are used in the robot hand system, one for controlling the force exerted on objects by the fingers and two for changing the configuration of the palm. This novel layout allows decoupling grasping and manipulation, facilitating the planning and execution of in-hand manipulation operations. The reconfigurable palm provides the hand with large grasping versatility, and allows easy computation of a map between task space and joint space for manipulation based on distance-based linkage kinematics. The motion of objects of different sizes and shapes from one pose to another is then straightforward and systematic, provided the objects are kept grasped. This is guaranteed independently and passively by the underactuated fingers using a custom tendon routing method, which allows no tendon length variation when the relative finger base position changes with palm reconfigurations. We analyse the theoretical grasping workspace and manipulation capability of the hand, present algorithms for computing the manipulation map and in-hand manipulation planning, and evaluate all these experimentally. Numerical and empirical results of several manipulation trajectories with objects of different size and shape clearly demonstrate the viability of the proposed concept.

Abstract:
Robot teams are increasingly being deployed in environments, such as manufacturing facilities and warehouses, to save cost and improve productivity. To efficiently coordinate multi-robot teams, fast, high-quality scheduling algorithms are essential to satisfy the temporal and spatial constraints imposed by dynamic task specification and part and robot availability. Traditional solutions include exact methods, which are intractable for large-scale problems, or application-specific heuristics, which require expert domain knowledge to develop. In this paper, we propose a novel heterogeneous graph attention network model, called ScheduleNet. By introducing robot- and proximity-specific nodes into the simple temporal network encoding temporal constraints, we obtain a heterogeneous graph structure that is nonparametric in the number of tasks, robots and task resources or locations. We show that our model is end-to-end trainable via imitation learning on small-scale problems, generalizing to large, unseen problems. Empirically, our method outperforms the existing state-of-the-art methods in a variety of testing scenarios.

Abstract:
A framework is presented for handling a potential loss of observability of a dynamical system in a provably safe way. Inspired by the fragility of data-driven perception systems used by autonomous vehicles, we formulate the problem that arises when a sensing modality fails or is found to be untrustworthy during autonomous operation. We cast this problem as a differen- tial game played between the dynamical system being controlled and the external system factor(s) for which observations are lost. The game is a zero-sum Stackelberg game in which the controlled system (leader) is trying to find a trajectory which maximizes a function representing the safety of the system, and the unobserved factor (follower) is trying to minimize the same function. The set of winning initial configurations of this game for the controlled system represents the set of all states in which safety can be maintained with respect to the external factor, even if observability of that factor is lost. This is the set we refer to as the Eyes-Closed Safety Kernel. In practical use, the policy defined by the winning strategy of the controlled system is only needed to be executed whenever observability of the external system is lost or the system deviates from the Eyes-Closed Safety Kernel due to other, non-safety oriented control schemes. We present a means for solving this game offline, such that the resulting winning strategy can be used for computationally efficient, provably-safe, online control when needed. The solution approach presented is based on representing the game using the solutions of two Hamilton-Jacobi partial differential equations. We illustrate the applicability of our framework by working through a realistic example in which an autonomous car must avoid a dynamic obstacle despite potentially losing observability.

Abstract:
We present a method for learning to perform multi-stage tasks from demonstrations by learning the logical structure and atomic propositions of a consistent linear temporal logic (LTL) formula. The learner is given successful but potentially suboptimal demonstrations, where the demonstrator is optimizing a cost function while satisfying the LTL formula, and the cost function is uncertain to the learner. Our algorithm uses the Karush-Kuhn-Tucker (KKT) optimality conditions of the demonstrations together with a counterexample-guided falsification strategy to learn the atomic proposition parameters and logical structure of the LTL formula, respectively. We provide theoretical guarantees on the conservativeness of the recovered atomic proposition sets, as well as completeness in the search for finding an LTL formula consistent with the demonstrations. We evaluate our method on high-dimensional nonlinear systems by learning LTL formulas explaining multi-stage tasks on 7-DOF arm and quadrotor systems and show that it outperforms competing methods for learning LTL formulas from positive examples.

Abstract:
The theoretical unification of Nonlinear Model Predictive Control (NMPC) with Control Lyapunov Functions (CLFs) provides a framework for achieving optimal control performance while ensuring stability guarantees. In this paper we present the first real-time realization of a unified NMPC and CLF controller on a robotic system with limited computational resources. These limitations motivate a set of approaches for efficiently incorporating CLF stability constraints into a general NMPC formulation. We evaluate the performance of the proposed methods compared to baseline CLF and NMPC controllers with a robotic Segway platform both in simulation and on hardware. The addition of a prediction horizon provides a performance advantage over CLF based controllers, which operate optimally point-wise in time. Moreover, the explicitly imposed stability constraints remove the need for difficult cost function and parameter tuning required by NMPC. Therefore the unified controller improves the performance of each isolated controller and simplifies the overall design process.

Abstract:
Formation of subgroups and thereby the problem of intergroup bias is well-studied in psychology. Already from the age of five, children can show ingroup preferences. We developed a social robot mediator to explore how a robot could help overcome these intergroup biases, especially for children newly arrived to a country. By utilizing an online evaluation of collaboration levels, we allow the robot to perceive and act upon the current group dynamics. We investigated the effectiveness of the robot's mediating behavior in a between-subject study with 39 children, of whom 13 children had arrived in Sweden within the last 2 years. Results indicate that the robot could help the process of inclusion by mediating the activity. The robot succeeds in encouraging the newly arrived children to act more outgoing and in increasing collaboration among ingroup children. Further, children show a higher level of prosociality after interacting with the robot. In line with prior work, this study demonstrates the ability of social robotic technology to assist group processes.

Abstract:
We present a numerical method to compute singularity sets in the configuration space of free-floating robots, comparing two different criteria based on formal methods. By exploiting specific properties of free-floating systems and an alternative formulation of the generalized Jacobian, the search space and computational complexity of the algorithm is reduced. It is shown that the resulting singularity maps can be applied in the context of trajectory planning to guarantee feasibility with respect to singularity avoidance. The proposed approach is validated on a space robot composed of a six degrees-of-freedom (DOF) arm mounted on a body with six DOF.

Abstract:
In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter, called an entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and a different entropy results in a different class of the optimal policy in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs. Our theoretical result enables us to use any positive entropic index in RL. To handle complex and large-scale problems such as learning a controller for soft mobile robot, we also propose a Tsallis actor-critic (TAC). For a different type of RL problems, we find that a different value of the entropic index is desirable and empirically show that TAC with a proper entropic index outperforms the state-of-the-art actor-critic methods. Furthermore, to alleviate the effort for finding the proper entropic index, we propose a linear scheduling method where an entropic index linearly increases as the number of interactions increases. In simulations, the linear scheduling shows the fast convergence speed and a similar performance to TAC with the optimal entropic index, which is a useful property for real robot applications. We also apply TAC with the linear scheduling to learn a feedback controller of a soft mobile robot and shows the best performance compared to other existing actor critic methods in terms of convergence speed and the sum of rewards. Consequently, we empirically show that the proposed method efficiently learns a controller of soft mobile robots.

Abstract:
Aerial manipulation aims at combining the maneuverability of aerial vehicles with the manipulation capabilities of robotic arms. This, however, comes at the cost of the additional control complexity due to the coupling of the dynamics of the two systems. In this paper we present a Nonlinear Model Predictive Controller (NMPC) specifically designed for Micro Aerial Vehicles (MAVs) equipped with a robotic arm. We formulate a hybrid control model for the combined MAV-arm system which incorporates interaction forces acting on the end effector. We explain the practical implementation of our algorithm and show extensive experimental results of our custom built system performing multiple `aerial-writing' tasks on a whiteboard, revealing accuracy in the order of millimetres.

Abstract:
Whether in factory or household scenarios, rhythmic movements play a crucial role in many daily-life tasks. In this paper we propose a Fourier movement primitive (FMP) representation to learn such type of skills from human demonstrations. Our approach takes inspiration from the probabilistic movement primitives (ProMP) framework, and is grounded in signal processing theory through the Fourier transform. It works with minimal preprocessing, as it does not require demonstration alignment nor finding the frequency of demonstrated signals. Additionally, it does not entail the careful choice/parameterization of basis functions, that typically occurs in most forms of movement primitive representations. Indeed, its basis functions are the Fourier series, which can approximate any periodic signal. This makes FMP an excellent choice for tasks that involve a superposition of different frequencies. We show that it is successful for tasks that involve a superposition of different frequencies. Finally, FMP shows interesting extrapolation capabilities as the system has the property of smoothly returning back to the demonstrations (e.g. the limit cycle) when faced with a completely new situation, being safe for real-world robotic tasks. We validate FMP in several experimental cases with real-world data from polishing and 8-letter tasks as well as on a 7-DoF, torque-controlled, Panda robot.

Abstract:
This paper presents a formulation for swarm control and high-level task planning that is dynamically responsive to user commands and adaptable to environmental changes. We design an end-to-end pipeline from a tactile tablet interface for user commands to onboard control of robotic agents based on decentralized ergodic coverage. Our approach demonstrates reliable and dynamic control of a swarm collective through the use of ergodic specifications for planning and executing agent trajectories as well as responding to user and external inputs. We validate our approach in a virtual reality simulation environment and in real-world experiments at the DARPA OFFSET Urban Swarm Challenge FX3 field tests with a robotic swarm where user-based control of the swarm and mission-based tasks require a dynamic and flexible response to changing conditions and objectives in real-time.

Abstract:
In this work, we present a spiking neural network (SNN) based PID controller on a neuromorphic chip. On-chip SNNs are currently being explored in low-power AI applications. Due to potentially ultra-low power consumption, low latency, and high processing speed, on-chip SNNs are a promising tool for control of power-constrained platforms, such as Unmanned Aerial Vehicles (UAV). To obtain highly efficient and fast end-to-end neuromorphic controllers, the SNN-based AI architectures must be seamlessly integrated with motor control. Towards this goal, we present here the first implementation of a fully neuromorphic PID controller. We interfaced Intel's neuromorphic research chip Loihi to a UAV, constrained to a single degree of freedom. We developed an SNN control architecture using populations of spiking neurons, in which each spike carries information about the measured, control, or error value, defined by the identity of the spiking neuron. Using this sparse code, we realize a precise PID controller. The P, I, and D gains of the controller are implemented as synaptic weights that can adapt according to an on-chip plasticity rule. In future work, these plastic synapses can be used to tune and adapt the controller autonomously.

Abstract:
We present a unified representation for actionable spatial perception: 3D Dynamic Scene Graphs. Scene graphs are directed graphs where nodes represent entities in the scene (e.g., objects, walls, rooms), and edges represent relations (e.g., inclusion, adjacency) among nodes. Dynamic scene graphs (DSGs) extend this notion to represent dynamic scenes with moving agents (e.g., humans, robots), and to include actionable information to support planning and decision-making (e.g., spatio-temporal relations, topology at different levels of abstraction). Our second contribution is to provide the first end-to-end fully automatic Spatial PerceptIon eNgine (SPIN) to build a DSG from visual-inertial data. We integrate state-of-the-art techniques for object and human detection and pose estimation, and we describe how to robustly infer object, robot, and human nodes in crowded scenes. To the best of our knowledge, this is the first paper that reconciles visual-inertial SLAM and dense human mesh tracking. Moreover, we provide algorithms to obtain hierarchical representations of indoor environments (e.g., places, structures, rooms) and their relations. Our third contribution is to demonstrate the proposed spatial perception engine in a photo-realistic Unity-based simulator, where we assess its robustness and expressiveness. Finally, we discuss the implications of our proposal on modern robotics applications. We believe 3D Dynamic Scene Graphs can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction.

Abstract:
This paper presents fast non-sampling based methods to assess the risk of trajectories for autonomous vehicles when probabilistic predictions of other agents’ futures are generated by deep neural networks (DNNs). The presented methods address a wide range of representations for uncertain predictions including both Gaussian and non-Gaussian mixture models for predictions of both agent positions and controls. We show that the problem of risk assessment when Gaussian mixture models (GMMs) of agent positions are learned can be solved rapidly to arbitrary levels of accuracy with existing numerical methods. To address the problem of risk assessment for non-Gaussian mixture models of agent position, we propose finding upper bounds on risk using Chebyshev’s Inequality and sums-of-squares (SOS) programming; they are both of interest as the former is much faster while the latter can be arbitrarily tight. These approaches only require statistical moments of agent positions to determine upper bounds on risk. To perform risk assessment when models are learned for agent controls as opposed to positions, we develop TreeRing, an algorithm analogous to tree search over the ring of polynomials that can be used to exactly propagate moments of control distributions into position distributions through nonlinear dynamics. The presented methods are demonstrated on realistic predictions from DNNs trained on the Argoverse and CARLA datasets and are shown to be effective for rapidly assessing the probability of low probability events.

Abstract:
Accurate rotation estimation is at the heart of robot perception tasks such as visual odometry and object pose estimation. Deep neural networks have provided a new way to perform these tasks, and the choice of rotation representation is an important part of network design. In this work, we present a novel symmetric matrix representation of the 3D rotation group, SO(3), with two important properties that make it particularly suitable for learned models: (1) it satisfies a smoothness property that improves convergence and generalization when regressing large rotation targets, and (2) it encodes a symmetric Bingham belief over the space of unit quaternions, permitting the training of uncertainty-aware models. We empirically validate the benefits of our formulation by training deep neural rotation regressors on two data modalities. First, we use synthetic point-cloud data to show that our representation leads to superior predictive accuracy over existing representations for arbitrary rotation targets. Second, we use image data collected onboard ground and aerial vehicles to demonstrate that our representation is amenable to an effective out-of-distribution (OOD) rejection technique that significantly improves the robustness of rotation estimates to unseen environmental effects and corrupted input images, without requiring the use of an explicit likelihood loss, stochastic sampling, or an auxiliary classifier. This capability is key for safety-critical applications where detecting novel inputs can prevent catastrophic failure of learned models.

Abstract:
In the past years, research on the embodiment of interactive social agents has been focused on comparisons between robots and virtually-displayed agents. Our work contributes to this line of research by providing a comparison between social robots and disembodied agents exploring the role of embodiment within group interactions. We conducted a user study where participants formed a team with two agents to play a Collective Risk Dilemma (CRD). Besides having two levels of embodiment as between-subjects ---physically-embodied and disembodied---, we also manipulated the agents' degree of cooperation as a within-subjects variable ---one of the agents used a prosocial strategy and the other used selfish strategy. Our results show that while trust levels were similar between the two conditions of embodiment, participants identified more with the team of embodied agents. Surprisingly, when the agents were disembodied, the prosocial agent was rated more positively and the selfish agent was rated more negatively, compared to when they were embodied. The obtained results support that embodied interactions might improve how humans relate with agents in team settings. However, if the social aspects can positively mask selfish behaviours, as our results suggest, a dark side of embodiment may emerge.

Abstract:
In warehousing and manufacturing environments, manipulation platforms are frequently deployed at conveyor belts to perform pick and place tasks. Because objects on the conveyor belts are moving, robots have limited time to pick them up. This brings the requirement for fast and reliable motion planners that could provide provable real-time planning guarantees, which the existing algorithms do not provide. Besides the planning efficiency, the success of manipulation tasks relies heavily on the accuracy of the perception system which often is noisy, especially if the target objects are perceived from a distance. For fast moving conveyor belts, the robot cannot wait for a perfect estimate before it starts execution. In order to be able to reach the object in time it must start moving early on (relying on the initial noisy estimates) and adjust its motion on-the-fly in response to the pose updates from perception. We propose an approach that meets these requirements by providing provable constant-time planning and replanning guarantees. We present it, give its analytical properties and show experimental analysis in simulation and on a real robot.

Abstract:
Creating natural and autonomous interactions with social robots requires rich, multi-modal sensory input from the user. Writing interactive robot programs that make use of this input can demand tedious and error-prone tuning of program parameters, such as tuning thresholds on noisy sensory streams for detecting whether the robot's user is engaged or not. This tuning process dealing with low-level streams and parameters makes programming of social robots time-consuming and inaccessible for people who could benefit the most from unique use cases of social robots. To address this challenge, we propose the use of iterative program repair, where programmers create an initial program sketch in our new Social Robot Program Transition Sketch Language (SoRTSketch), a domain-specific language that supports expressing uncertainties related to thresholds in transition functions. The program is then iteratively repaired using Bayesian inference based on corrections of interaction traces that are either provided by the programmer or derived from implicit feedback given by the user during the interaction. Based on experiments with a human simulator and with 10 human users, we demonstrate the ease and effectiveness of this approach in improving social robot programming and program outputs that represent three common human-robot interaction patterns. We also show how our approach helps programs adapt to environment changes over time.

Abstract:
In this work, we describe an end-to-end system for automatically synthesizing correct-by-construction structure and controls for modular manipulators from high-level task specifications. We define specifications that include both continuous trajectories the end-effector must follow and constraints on the physical space (obstacles and possible locations of the base of the manipulator). In our formulation, trajectories are composed of basic shape primitives (lines, arcs, and circles) and we avoid discretizing the desired trajectory, as other approaches in the literature do. We encode the task as a set of constraints on the manipulator’s kinematics and return the manipulator’s structure and associated control to the user, if a solution is found. By solving for the continuous trajectory, as opposed to a discretized one, we ensure that the original task is satisfied. We demonstrate our approach on three different specifications, and present the resulting physical robots tracing complex trajectories in the presence of obstacles.

Abstract:
Flapping wing aerial vehicles rely heavily on accurate models for a variety of different tasks. There have been significant efforts in creating both analytical and data-driven models for many of these types of vehicles including ornithopters and small aerial vehicles mimicking insects. However, very few works have explored modeling for aerial vehicles with a skeletal structure throughout the wings and a single flexible membrane that covers the wings and tail such as is found in robots with bat morphology. In this paper, we build upon previous efforts to model a bat robot using a combination of first-principles and data-driven tools. We record a series of load cell tests and free-flight experiments, and we optimize the model parameters to improve long-term flight prediction. We introduce several extra terms in the model including a term explaining the coupling between wings and tail in order to maximize the effectiveness of collected flight data. The result is a model that performs well in prediction for a range of different tail actuator configurations as demonstrated by our flight results using a bat robot.

Abstract:
The effectiveness of Socially Assistive Robots (SAR) relies on their ability to motivate particular user behaviours, e.g. engagement with a task, requiring complex social interactions tailored to the needs and motivations of the user. Professionals from human-centred domains such as healthcare are experts in such interactions, but their ability to contribute to SAR development has traditionally been limited to the identification of applications and key design requirements. In this work we demonstrate how interactive machine learning offers a way for such experts to be involved at every stage of design and automation of a robot, as well as the value of taking this approach. We present a novel technical framework for in-situ, online interactive machine learning that can be used in ecologically-valid human robot interactions. Using this framework, we were able generate fully autonomous, appropriate and personalised robot behaviour in a high-dimensional application of assistive robotics.

Abstract:
This work introduces the so-called Modular Passive Tracking Controller (MPTC), a generic passivity-based controller, which aims at independently fulfilling several subtask objectives. These are combined in a stack of tasks (SoT) that serves as a basis for the synthesis of an overall system controller. The corresponding analysis and controller design are based on Lyapunov theory. An important contribution of this work is the design of a specific optimization weighting matrix that ensures passivity of an overdetermined and thus conflicting task setup. The proposed framework is validated through simulations and experiments for both fixed-base and free-floating robots.

Abstract:
Enabling robots to learn tasks and follow instructions as easily as humans is important for many real-world robot applications. Previous approaches have applied machine learning to teach the mapping from language to low dimensional symbolic representations constructed by hand, using demonstration trajectories paired with accompanying instructions. These symbolic methods lead to data efficient learning. Other methods map language directly to high-dimensional control behavior, which requires less design effort but is data-intensive. We propose to first learning symbolic abstractions from demonstration data and then mapping language to those learned abstractions. These symbolic abstractions can be learned with significantly less data than end-to-end approaches, and support partial behavior specification via natural language since they permit planning using traditional planners. During training, our approach requires only a small number of demonstration trajectories paired with natural language—without the use of a simulator—and results in a representation capable of planning to fulfill natural language instructions specifying a goal or partial plan. We apply our approach to two domains, including a mobile manipulator, where a small number of demonstrations enable the robot to follow navigation commands like “Take left at the end of the hallway,” in environments it has not encountered before.

Abstract:
We present a hybrid rigid-soft arm and manipulator for performing tasks requiring dexterity and reach in cluttered environments. Our system combines the benefit of the dexterity of a variable length soft manipulator and the rigid support capability of a hard arm. The hard arm positions the extendable soft manipulator close to the target, and the soft arm manipulator navigates the last few centimeters to reach and grab the target. A novel magnetic sensor and reinforcement learning based control is developed for end effector position control of the robot. A compliant gripper with an IR reflectance sensing system is designed, and a k-nearest neighbor classifier is used to detect target engagement. The system is evaluated in several challenging berry picking scenarios.

Abstract:
In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-ouput linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model-based CBF-CLF-QP, resulting in the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty.

Abstract:
In this paper, we propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image. Typical TAMP problems are formalized by combining reasoning on a symbolic, discrete level (e.g. first-order logic) with continuous motion planning such as nonlinear trajectory optimization. Due to the great combinatorial complexity of possible discrete action sequences, a large number of optimization/motion planning problems have to be solved to find a solution, which limits the scalability of these approaches. To circumvent this combinatorial complexity, we develop a neural network which, based on an initial image of the scene, directly predicts promising discrete action sequences such that ideally only one motion planning problem has to be solved to find a solution to the overall TAMP problem. A key aspect is that our method generalizes to scenes with many and varying number of objects, although being trained on only two objects at a time. This is possible by encoding the objects of the scene in images as input to the neural network, instead of a fixed feature vector. Results show runtime improvements of several magnitudes.

Abstract:
The use of simple control schemes with only a few basic sensors and no feedback allows improved stability when traversing unforeseen rough terrain by applying a single controller. Exploiting multiple controllers simultaneously can further improve robustness but is often mechanically hard to implement, especially when stiffness modulation is a controller. To overcome this limitation, we investigate and simulate a leg shape that applies variable leg stiffness and free-leg length. The leg shape couples the physical parameters with the leg angle of a monopod, while the leg orientation is governed with only a simple control law during the flight phase. We study the usage of an optimal controller coupling and show that it can increase robustness to perturbations in the initial horizontal velocity when traversing unknown rough terrain. This work presents the process of obtaining the optimal coupled parameters and demonstrates its benefits. This work also lays the foundations for a conceptual leg shape to exhibit the controllers physically.