B. Tovar, L. Freda and S. M. LaValle. Learning Combinatorial Map Information from Permutations of Landmarks. To appear in the International Journal of Robotics Research, 2010. [pdf]
Abstract: This paper considers a robot that moves in the plane and is only able to sense the cyclic order of landmarks with respect to its current position. No metric information is available regarding the robot or landmark positions; moreover, the robot does not have a compass or odometers (i.e., knowledge of coordinates). We carefully study the information space of the robot, and establish its capabilities in terms of mapping the environment and accomplishing tasks, such as navigation and patrolling. When the robot moves exclusively inside the convex hull of the set of landmarks, the information space can be succinctly characterized as an order type, which provides information powerful enough to determine which points lie inside the convex hulls of subsets of landmarks. Additionally, if the robot is allowed to move outside the convex hull of the set of landmarks, the information space is described with a swap cell decomposition, which is an aspect graph in which each aspect is a cyclic permutation of landmarks. We show how to construct such decomposition through its dual, using two kinds of feedback motion commands based on the landmarks sensed.[show]
F. Duvallet and A. Stentz. Imitation Learning for Task Allocation. To appear in the Proc. IEEE International Conference on Intelligent Robotics and Systems (IROS), 2010. [pdf]
Abstract: At the heart of multi-robot task allocation lies the ability to compare multiple options in order to select the best. In some domains, this utility computation is not straightforward, for example due to complex and unmodeled underlying dynamics or an adversary in the environment. Explicitly modeling these extrinsic in?uences well enough so that they can be accounted for in utility computation (and thus task allocation) may be intractable, but a human expert may be able to quickly gain some intuition about the form of the desired solution.
We propose to harness the expert’s intuition by applying imitation learning to the multi-robot task allocation domain. Using a market-based task allocation method, we steer the allocation process by biasing prices in the market according to a policy which we learn using a set of demonstrated allocations that represent the expert’s solutions (accounting for external in?uences). We present results in two distinct domains: a disaster response domain where a team of agents must put out fires that are spreading between buildings, and an adversarial game in which teams must make complex strategic decisions to score more points than the opposite team.
[show]
D. Munoz, J. A. Bagnell and M. Hebert. Stacked Hierarchical Labeling. Proc. of the European Conference on Computer Vision (ECCV), 2010. [pdf]
Abstract: In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence and leverage the stacking approach developed by Cohen et al.[show]
B. D. Ziebart, J. A. Bagnell and A. K. Dey. Modeling Interaction via the Principle of Maximum Causal Entropy. Proc. of the International Conference on Machine Learning (ICML), 2010. [pdf]
Abstract: The principle of maximum entropy provides a powerful framework for statistical models of joint, conditional, and marginal distributions. However, there are many important distributions with elements of interaction and feedback where its applicability has not been established. This work presents the principle of maximum causal entropy–an approach based on causally conditioned probabilities that can appropriately model the availability and influence of sequentially revealed side information. Using this principle, we derive models for sequential data with revealed information, interaction, and feedback, and demonstrate their applicability for statistically framing inverse optimal control and decision prediction tasks.[show]
L. Song, B. Boots, S. M. Siddiqi, G. J. Gordon and A. J. Smola. Hilbert space embeddings of hidden Markov models. Proc. of the International Conference on Machine Learning (ICML), 2010. [pdf]
Abstract: Hidden Markov Models (HMMs) are important tools for modeling sequence data. However, they are restricted to discrete latent states, and are largely restricted to Gaussian and discrete observations. And,learning algorithms for HMMs have predominantly relied on local search heuristics, with the exception of spectral methods such as those described below. We propose a non parametric HMM that extends traditional HMMs to structured and non-Gaussian continuous distributions. Furthermore, we derive a local-minimum-free kernel spectral algorithm for learning these HMMs. We apply our method to robot vision data, slot car inertial sensor data and audio event classification data, and show that in these applications, embedded HMMs exceed the previous state-of-the-art performance.[show]
A. Grubb and J. A. Bagnell. Boosted Backpropagation Learning for Training Deep Modular Networks. Proc. of the International Conference on Machine Learning (ICML), 2010. [pdf]
Abstract: Divide-and-conquer is key to building sophisticated learning machines: hard problems are solved by composing a network of modules that solve simpler problems (LeCun et al., 1998; Rohde, 2002; Bradley, 2009). Many such existing systems rely on learning algorithms which are based on simple parametric gradient descent where the parametrization must be predetermined, or more specialized per-application algorithms which are usually ad-hoc and complicated. We present a novel approach for training generic modular networks that uses two existing techniques: the error propagation strategy of backpropagation and more recent research on descent in spaces of functions (Mason et al., 1999; Scholkopf & Smola, 2001). Combining these two methods of optimization gives a simple algorithm for training heterogeneous networks of functional modules using simple gradient propagation mechanics and established learning algorithms. The resulting separation of concerns between learning individual modules and error propagation mechanics eases implementation, enables a larger class of modular learning strategies, and allows per-module control of complexity/regularization. We derive and demonstrate this functional backpropagation and contrast it with traditional gradient descent in parameter space, observing that in our example domain the method is significantly more robust to local optima. [show]
S. M. Siddiqi, B. Boots and G. J. Gordon. Reduced-rank hidden Markov models. Proc. of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010. [pdf]
Abstract: Hsu et al.(2009) recently proposed an efficient, accurate spectral learning algorithm for Hidden Markov Models (HMMs). In this paper we relax their assumptions and prove a tighter finite-sample error bound for the case of Reduced-Rank HMMs, i.e., HMMs with low-rank transition matrices. Since rank-k RR-HMMs are a larger class of models than k-state HMMs while being equally efficient to work with, this relaxation greatly increases the learning algorithm’s scope. In addition, we generalize the algorithm and bounds to models where multiple observations are needed to disambiguate state, and to models that emit multivariate real-valued observations. Finally we prove consistency for learning Predictive State Representations, an even larger class of models. Experiments on synthetic data and a toy video, as well as on difficult robot vision data, yield accurate models that compare favorably with alternatives in simulation quality and prediction accuracy. [show]
S. Ross and J. A. Bagnell. Efficient Reductions for Imitation Learning. Proc. of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010. [pdf] [supplementary material]
Abstract: Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the training and testing data are i.i.d.. This is not true in imitation learning as the learned policy influences the future test inputs (states) upon which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner’s policy is slowly modified from executing the expert’s policy to the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros.; given input images from the games and corresponding actions taken by a human expert and near-optimal planner respectively.[show]
A. Yershova, B. Tovar, M. Katsev, R. Ghrist, and S. M. LaValle. Mapping and pursuit-evasion strategies for a simple wall-following robot. Under review for IEEE Transactions on Robotics, 2010. [pdf]
Abstract: This paper defines and analyzes a simple robot with local sensors that moves in an unknown polygonal environment. The robot can execute wall-following motions and can traverse the interior of the environment only when following parallel to an edge. The robot has no global sensors that would allow precise mapping or localization. Special information spaces are introduced for this particular model. Using these, strategies are presented for solving several tasks: 1) counting vertices, 2) computing the path winding number, 3) learning a combinatorial map, called the cut ordering, that encodes partial geometric information, and 4) solving pursuit-evasion problems. [show]
J. Yu, D. Liberzon, and S. M. LaValle. Rendezvous without coordinates. Under review for IEEE Transactions on Automatic Control, 2010. [pdf]
Abstract: We study minimalism in sensing and control by considering a multi-agent system in which each agent moves like a Dubins car and has a limited sensor that reports only the presence of another agent within some sector of its windshield. Using a very simple quantized control law with three values, each agent tracks another agent assigned to it by maintaining that agent within this windshield sector. We use Lyapunov analysis to show that by acting autonomously in this way, the agents will achieve rendezvous if the initial assignment graph is connected. We then proceed to show that, under a slightly different control law, the sensing model can be weakened further and a connected assignment graph is not necessary. A distinguishing feature of our approach is that it does not involve any estimation procedure aimed at reconstructing coordinate information. Our scenario thus appears to be the first example in which an interesting task is performed with extremely coarse sensing and control, and without state estimation. The system was implemented in computer simulation, accessible through the Web, of which the results are presented in the paper. [show]
B. Boots, G. Gordon, and S. Siddiqi. Closing the Learning-Planning Loop with Predictive State Representations. Proc. of Robotics: Science and Systems VI (RSS), 2010. [pdf]
Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learner, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is nearoptimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned model. This experiment shows that the learned PSR captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test. [show]
B. Boots, G. Gordon, and S. Siddiqi. Closing the Learning-Planning Loop with Predictive State Representations (extended abstract). Proc. of the 9th Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), 2010. [pdf]
P. Henry, C. Vollmer, B. Ferris, and D. Fox. Learning to navigate through crowded environments. Proc. of the IEEE International Conference on Robotics & Automation (ICRA), 2010. [pdf]
Abstract: The goal of this research is to enable mobile robots to navigate through crowded environments such as indoor shopping malls, airports, or downtown side walks. The key research question addressed in this paper is how to learn planners that generate human-like motion behavior. Our approach uses inverse reinforcement learning (IRL) to learn human-like navigation behavior based on example paths. Since robots have only limited sensing, we extend existing IRL methods to the case of partially observable environments. We demonstrate the capabilities of our approach using a realistic crowd flow simulator in which we modeled multiple scenarios in crowded environments. We show that our planner learned to guide the robot along the flow of people when the environment is crowded, and along the shortest path if no people are around. [show]
B. Tovar and S. M. LaValle. Searching and mapping among indistinguishable convex obstacles. Proc. of the IEEE International Conference on Robotics & Automation (ICRA), 2010. [pdf]
Abstract: This paper considers a robot that moves in the plane and is only able to sense the cyclic order of landmarks with respect to its current position. No metric information is available regarding the robot or landmark positions; moreover, the robot does not have a compass or odometers (e.g., coordinates). We carefully study the information space of the robot, and establish its capabilities in terms of mapping the environment and accomplishing tasks, such as navigation and patrolling. When the robot moves exclusively inside the convex hull of the set of landmarks, the information space can be nicely characterized as an order type, which provides information powerful enough to determine which points lie inside the convex hulls of subsets of landmarks. Additionally, if the robot is allowed to move outside the convex hull of the set of landmarks, the information space is described with a swap cell decomposition, which is an aspect graph in which each aspect is a cyclic permutation of landmarks. We show how to construct such decomposition through its dual, using two kinds of feedback motion commands based on the landmarks sensed. [show]
J. Yu and S. M. LaValle. Probabilistic shadow information spaces. Proc. of the IEEE International Conference on Robotics & Automation (ICRA), 2010. [pdf]
Abstract: This paper introduces a Bayesian filter that is specifically designed for counting targets that move outside of the field of view while performing a sensor sweep. Information space concepts are used to dramatically reduce the filter complexity so that information is processed only when the shadow region (all points invisible to the sensors) changes combinatorially or targets pass in and out of view. Previous work assumed perfect observations; however, this paper extends the approach to enable probabilistic disturbances. Practical algorithms are introduced, implemented, and demonstrated for computing the filter outputs based on realistic data. [show]
J. Ramos, S. Siddiqi, A. Dubrawski, and G. Gordon. Automatic State Discovery for Unstructured Audio Scene Classification. Proc. in 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010. [pdf]
Abstract: In this paper we present a novel scheme for unstructured audio scene classification that possesses three highly desirable and powerful features: autonomy, scalability, and robustness. Our scheme is based on our recently introduced machine learning algorithm called Simultaneous Temporal And Contextual Splitting (STACS) that discovers the appropriate number of states and efficiently learns accurate Hidden Markov Model (HMM) parameters for the given data. STACS-based algorithms train HMMs up to five times faster than Baum-Welch, avoid the overfitting problem commonly encountered in learning large state-space HMMs using Expectation Maximization (EM) methods such as Baum-Welch, and achieve superior classification results on a very diverse dataset with minimal pre-processing. Furthermore, our scheme has proven to be highly effective for building real-world applications and has been integrated into a commercial surveillance system as an event detection component. [show]
J. J. Kuffner and S. M. LaValle. Space-filling trees. Technical Report CMU-RI-TR-09-47, Robotics Institute, Carnegie Mellon University, 2009. [pdf]
Abstract: This paper introduces the notion of space-filling trees, which are analogous to space-filling curves, but have a branching, tree-like structure and are rooted. A space-filling tree is defined by an incremental process that results in a tree for which every point in the space has a finite-length path that converges to it. In contrast to space-filling curves, individual paths in the tree are short, allowing any part of the space to be quickly reached from the root. These structures have interesting parallels in nature, including fluid distribution systems, vascular networks, and fractal plant growth, and may have many uses in engineering and computer science. We provide basic examples, general definitions and constructions, characterizations of some tree properties, and leave many open mathematical questions. [show]
S. Siddiqi, B. Boots, and G. Gordon. Reduced-Rank Hidden Markov Models. Technical Report, 2009. [arxiv]
Abstract: We introduce the Reduced-Rank Hidden Markov Model (RR-HMM), a generalization of HMMs that can model smooth state evolution as in Linear Dynamical Systems (LDSs) as well as non-log-concave predictive distributions as in continuous-observation HMMs. RR-HMMs assume an m-dimensional latent state and n discrete observations, with a transition matrix of rank k <= m. This implies the dynamics evolve in a k-dimensional subspace, while the shape of the set of predictive distributions is determined by m. Latent state belief is represented with a k-dimensional state vector and inference is carried out entirely in R^k, making RR-HMMs as computationally efficient as k-state HMMs yet more expressive. To learn RR-HMMs, we relax the assumptions of a recently proposed spectral learning algorithm for HMMs (Hsu, Kakade and Zhang 2009) and apply it to learn k-dimensional observable representations of rank-k RR-HMMs. The algorithm is consistent and free of local optima, and we extend its performance guarantees to cover the RR-HMM case. We show how this algorithm can be used in conjunction with a kernel density estimator to efficiently model high-dimensional multivariate continuous data. We also relax the assumption that single observations are sufficient to disambiguate state, and extend the algorithm accordingly. Experiments on synthetic data and a toy video, as well as on a difficult robot vision modeling problem, yield accurate models that compare favorably with standard alternatives in simulation quality and prediction capability. [show]
S. M. LaValle. Filtering and planning in information spaces. Technical report, Department of Computer Science, University of Illinois, 2009. [pdf]
Abstract: This tutorial presents a fresh perspective on modeling sensors and then using them for filtering and planning. The concepts and tools are motivated by many problems of current interest, such as tracking, monitoring, navigation, pursuit-evasion, exploration, and mapping. First, an overview of sensors that appear in numerous systems is presented. Following this, the notion of a virtual sensor is explained, which provides a mathematical way to model numerous sensors while abstracting away their particular physical implementation. Dozens of useful models are given. In the next part, a new perspective on filtering is given based on information spaces. This includes classics such as the Kalman and Bayesian filters; however, it also opens up a new family of reduced-complexity filters that try to maintain as little information as possible while performing their required task. Finally, the planning problem is presented in terms of filters and information spaces. [show]
J. Ko and D. Fox. Learning GP-BayesFilters via Gaussian Process Latent Variable Models. Proc. Robotics: Science and Systems V, 2009. [pdf]
Abstract: GP-BayesFilters are a general framework for integrating Gaussian process prediction and observation models into Bayesian filtering techniques, including particle filters and extended and unscented Kalman filters. GP-BayesFilters learn nonparametric filter models from training data containing sequences of control inputs, observations, and ground truth states. The need for ground truth states limits the applicability of GP-BayesFilters to systems for which the ground truth can be estimated without prohibitive overhead. In this paper we introduce GPBF-LEARN, a framework for training GP-BayesFilters without any ground truth states. Our approach extends Gaussian Process Latent Variable Models to the setting of dynamical robotics systems. We show how weak labels for the ground truth states can be incorporated into the GPBF-LEARN framework. The approach is evaluated using a difficult tracking task, namely tracking a slotcar based on IMU measurements only. [show]