Aravind Sundaresan

Publications

The copyright of these papers is with the respective publishers. They are reproduced here for the timely dissemination of scholarly information.

Ph.D. Thesis

[1] Aravind Sundaresan. Towards Markerless Motion Capture: Model Estimation, Initialization and Tracking. PhD thesis, University of Maryland, College Park, MD 20740, 2007. [ .pdf ] [abstract]
Motion capture is an important application in diverse areas such as bio-mechanics, computer animation, and human-computer interaction. Current motion capture methods use markers that are attached to the body of the subject and are therefore intrusive. In applications such as pathological human movement analysis, these markers may introduce unknown artifacts in the motion and are, in general, cumbersome. We present a computer vision based system for markerless human motion capture that uses images obtained from multiple synchronized and calibrated cameras. We model the human body as a set of rigid segments connected in articulated chains, and compute a volumetric (voxel) representation of the subject from the camera images. We propose a novel, bottom-up approach to segment the voxels into different articulated chains based on their mutual connectivity, by mapping the voxels into Laplacian Eigenspace. We prove properties of the mapping which show that it is ideal for mapping voxels on non-rigid chains in normal space to nodes that lie on smooth 1D curves in Laplacian Eigenspace. We then use a 1D spline fitting procedure to segment the nodes according to the 1D curve they belong to. The segmentation is followed by a top-down approach that uses our knowledge of the structure of the human body to register the segmented voxels to the different articulated chains such as the head, trunk, and limbs. We propose a hierarchical algorithm to simultaneously initialize and estimate the pose and body model parameters for the subject. Finally, we propose a tracking algorithm that uses the estimated human body model and the initialized pose for a single frame of a given sequence to track the pose for the remainder of the frames. The tracker uses an iterative algorithm that combines motion and shape cues in a predictor-corrector framework to estimate the pose; the motion and shape cues complement each other and overcome drift and local-minima problems. We provide results on 3D laser scans, synthetic data, and real video sequences with different subjects for our segmentation, model estimation, and pose estimation algorithms.
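
To make the Laplacian Eigenspace step concrete, the sketch below builds a 6-connected voxel neighbourhood graph and embeds its nodes using the smallest non-trivial eigenvectors of the normalized graph Laplacian. It is a minimal reading of the abstract under stated assumptions (6-connectivity, NumPy/SciPy routines), not the thesis implementation.

    import numpy as np
    from scipy.sparse import lil_matrix
    from scipy.sparse.csgraph import laplacian
    from scipy.sparse.linalg import eigsh

    def laplacian_eigenspace(voxels, dim=6):
        """Map N voxel coordinates (integer 3-tuples) to N points in a
        dim-dimensional Laplacian Eigenspace."""
        index = {tuple(v): i for i, v in enumerate(voxels)}
        n = len(index)
        W = lil_matrix((n, n))
        # Adjacency from 6-connectivity of the voxel grid (an assumption).
        for v, i in index.items():
            for d in [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]:
                j = index.get((v[0]+d[0], v[1]+d[1], v[2]+d[2]))
                if j is not None:
                    W[i, j] = 1.0
        L = laplacian(W.tocsr(), normed=True)
        # Smallest eigenvectors; the first (constant) one is discarded.
        vals, vecs = eigsh(L, k=dim + 1, which='SM')
        return vecs[:, 1:]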

Slides from the talk [ .pdf ] With videos [ .tgz ]

Book chapters

[1] Amit Kale, Aravind Sundaresan, Amit Roy-Chowdhury, and Rama Chellappa. Gait-based human identification from a monocular video sequence. In C. H. Chen and P. S. P. Wang, editors, Handbook on Pattern Recognition and Computer Vision. World Scientific Publishing Company Pvt. Ltd., 2005. [ .pdf ] [abstract]
Human gait is a spatio-temporal phenomenon that characterizes the manner in which an individual moves. It is possible to detect and measure gait even in low-resolution video. In this chapter, we discuss algorithms for identifying people by their gait from a monocular video sequence. Human identification using gait, similar to text-dependent speaker identification, involves different individuals performing the same task, and a template-matching approach is suitable for such problems. In situations where the amount of training data is limited, we demonstrate the utility of a simple width feature for gait recognition. By virtue of their deterministic nature, template-matching methods have limited noise resilience. In order to deal with noise, we introduce a systematic approach to gait recognition by building representations for the structural and dynamic components of gait using exemplars and hidden Markov models (HMMs). The above methods assume that an exact side view of the subject is available in the probe sequence. For the case where the person walks at an arbitrary angle far away from the camera, we present a view-invariant gait recognition algorithm based on synthesizing a side view of the person from an arbitrary monocular view.
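
As a toy illustration of the width feature mentioned above (the chapter's exact normalization is not reproduced; this is an assumption-laden sketch): for each row of a binarized, background-subtracted silhouette, record the span between the outermost foreground pixels.

    import numpy as np

    def width_feature(silhouette):
        """silhouette: 2D boolean array (rows x cols) -> 1D width vector."""
        widths = np.zeros(silhouette.shape[0])
        for r, row in enumerate(silhouette):
            cols = np.flatnonzero(row)
            if cols.size:
                widths[r] = cols[-1] - cols[0] + 1   # outer-contour width
        return widths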

[2] Aravind Sundaresan and Rama Chellappa. Markerless motion capture using multiple cameras. In Christopher Jaynes and Robert Collins, editors, Computer Vision for Interactive and Intelligent Environments. IEEE Press, 2005. [ .pdf ] [abstract]
Motion capture has important applications in different areas such as biomechanics, computer animation, and human-computer interaction. Current motion capture methods use passive markers that are attached to different body parts of the subject and are therefore intrusive in nature. In applications such as pathological human movement analysis, these markers may introduce an unknown artifact in the motion, and are, in general, cumbersome. We present computer vision based methods for performing markerless human motion capture. We model the human body as a set of super-quadrics connected in an articulated structure and propose algorithms to estimate the parameters of the model from video sequences. We compute a volumetric (voxel) representation from the images and combine a bottom-up approach with a top-down approach guided by our knowledge of the model. We propose a tracking algorithm that uses this model to track human pose. The tracker uses an iterative framework akin to an Iterated Extended Kalman Filter to estimate articulated human motion using multiple cues that combine both spatial and temporal information in a novel manner. We provide preliminary results using data collected from 8-16 cameras. The emphasis of our work is on models and algorithms that can scale with the required accuracy. Our ultimate objective is to build an end-to-end system that integrates the above components into a completely automated markerless motion capture system.

Journal articles

[1] Aravind Sundaresan and Rama Chellappa. Multi-camera tracking of articulated human motion using shape and motion cues. IEEE Transactions on Image Processing, 18(9):2114-2126, September 2009. [ .pdf ] [abstract]
We present a completely automatic algorithm for initializing and tracking the articulated motion of humans using image sequences obtained from multiple cameras. A detailed articulated human body model composed of sixteen rigid segments that allows both translation and rotation at joints is used. Voxel data of the subject obtained from the images is segmented into the different articulated chains using Laplacian Eigenmaps. The segmented chains are registered in a subset of the frames using a single-frame registration technique and subsequently used to initialize the pose in the sequence. A temporal registration method is proposed to identify the partially segmented or unregistered articulated chains in the remaining frames in the sequence. The proposed tracker uses motion cues such as pixel displacement as well as 2D and 3D shape cues such as silhouettes, motion residue and skeleton curves. The tracking algorithm consists of a predictor that uses motion cues and a corrector that uses shape cues. The use of complementary cues in the tracking alleviates the twin problems of drift and convergence to local minima. The use of multiple cameras also allows us to deal with the problems due to self-occlusion and kinematic singularity. We present tracking results on sequences with different kinds of motion to illustrate the effectiveness of our approach. The pose of the subject is correctly tracked for the duration of the sequence as can be verified by inspection.
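
The predictor-corrector structure reads naturally as the loop sketched below. The two callables stand in for the paper's motion-cue and shape-cue estimators, whose internals are not reproduced here; only the control flow comes from the abstract.

    def track_sequence(pose0, frames, predict_from_motion, correct_with_shape):
        """Schematic tracker: predict_from_motion and correct_with_shape are
        caller-supplied stand-ins for the motion- and shape-cue estimators."""
        poses = [pose0]
        for t in range(len(frames) - 1):
            # Predictor: pixel-displacement (motion) cues between frames t, t+1.
            predicted = predict_from_motion(poses[-1], frames[t], frames[t + 1])
            # Corrector: silhouettes, motion residues and skeleton curves at
            # frame t+1 pull the prediction back before drift accumulates.
            poses.append(correct_with_shape(predicted, frames[t + 1]))
        return poses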

[2] Radu Bogdan Rusu, Aravind Sundaresan, Benoit Morisset, Kris Hauser, Motilal Agrawal, Jean-Claude Latombe, and Michael Beetz. Leaving Flatland: Efficient real-time three-dimensional perception and motion planning. Journal of Field Robotics: Special Issue on Three-Dimensional Mapping, 26(10), September 2009. [ .pdf ] [abstract]
In this article we present the complete details of the architecture and implementation of Leaving Flatland, an exploratory project that attempts to surmount the challenges of closing the loop between autonomous perception and action on challenging terrain. The proposed system includes comprehensive localization, mapping, path planning, and visualization techniques for a mobile robot to operate autonomously in complex 3D indoor and outdoor environments. In doing so, we integrate robust Visual Odometry localization techniques with real-time 3D mapping methods from stereo data to obtain consistent global models annotated with semantic labels. These models are used by a multi-region motion planner which adapts existing 2D planning techniques to operate in 3D terrain. All the system components are evaluated on a variety of real-world data sets, and their computational performance is shown to be favorable for high-speed autonomous navigation.

[3] Kurt Konolige, Motilal Agrawal, Morten Rufus Blas, Robert C. Bolles, Brian Gerkey, Joan Sola, and Aravind Sundaresan. Mapping, Navigation, and Learning for Off-Road Traversal. Journal of Field Robotics: Special Issue on LAGR Program, 26(1), December 2008. [ .pdf ]

[4] Aravind Sundaresan and Rama Chellappa. Model driven segmentation and registration of articulating humans in Laplacian Eigenspace. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10):1771-1785, October 2008. [ .pdf ] [abstract]
We propose a general approach using Laplacian Eigenmaps and a graphical model of the human body to segment 3D voxel data of humans into different articulated chains. In the bottom-up stage, the voxels are transformed into a high-dimensional (6D or less) Laplacian Eigenspace (LE) of the voxel neighborhood graph. We show that LE is effective at mapping voxels on long articulated chains to nodes on smooth 1D curves that can be easily discriminated, and prove these properties using representative graphs. We fit 1D splines to voxels belonging to different articulated chains such as the limbs, head and trunk, and determine the boundary between splines using the spline fitting error. A top-down probabilistic approach is then used to register the segmented chains, utilizing their mutual connectivity and individual properties. Our approach enables us to deal with complex poses such as those where the limbs form loops. We use the segmentation results to automatically estimate the human body models. While we use human subjects in our experiments, the method is fairly general and can be applied to voxel-based segmentation of any articulated object composed of long chains. We present results on real and synthetic data that illustrate the usefulness of this approach.
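
The boundary-by-fitting-error idea can be sketched as follows: grow a 1D spline along nodes in eigenspace (assumed here to be ordered along the chain) and stop when the residual jumps. SciPy's splprep is a real routine; the greedy growing rule and the jump threshold are illustrative assumptions.

    import numpy as np
    from scipy.interpolate import splprep

    def chain_end(points, error_jump=5.0, k=3):
        """points: (N, d) eigenspace nodes ordered along a chain; returns the
        index where the spline fitting error jumps (a chain boundary)."""
        prev_fp = None
        for n in range(k + 2, len(points)):
            seg = points[:n]
            # full_output=1 also returns fp, the weighted sum of squared
            # residuals of the spline fit.
            (tck, u), fp, ier, msg = splprep(seg.T, s=float(n), k=k,
                                             full_output=1)
            if prev_fp is not None and fp > error_jump * max(prev_fp, 1e-9):
                return n - 1          # residual jumped: boundary reached
            prev_fp = fp
        return len(points)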

[5] Amit A. Kale, Aravind Sundaresan, A. N. Rajagopalan, Naresh P. Cuntoor, Amit K. Roy-Chowdhury, Volker Krüger, and Rama Chellappa. Identification of humans using gait. IEEE Transactions on Image Processing, 13(9):1163-1173, September 2004. [ .pdf ]

Conference articles

[1] Benoit Morisset, Radu Bogdan Rusu, Aravind Sundaresan, Kris Hauser, Motilal Agrawal, Jean-Claude Latombe, and Michael Beetz. Leaving Flatland: Toward Real-Time 3D Navigation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 2009. [ .pdf ] [abstract]
We report our first experiences with Leaving Flatland, an exploratory project that studies the key challenges of closing the loop between autonomous perception and action on challenging terrain. We propose a comprehensive system for localization, mapping, and planning for the RHex mobile robot in fully 3D indoor and outdoor environments. This system integrates Visual Odometry-based localization with new techniques in real-time 3D mapping from stereo data. The motion planner uses a new decomposition approach to adapt existing 2D planning techniques to operate in 3D terrain. We test the map-building and motion-planning subsystems on real and synthetic data, and show that they have favorable computational performance for use in high-speed autonomous navigation.

[2] Radu Bogdan Rusu, Aravind Sundaresan, Benoit Morisset, Motilal Agrawal, and Michael Beetz. Leaving Flatland: Realtime 3D Stereo Semantic Reconstruction. In Proceedings of the International Conference on Intelligent Robotics and Applications, volume 5314, pages 921-932, Wuhan, China, October 2008. [ .pdf ] [abstract]
We report our first experiences with Leaving Flatland, an exploratory project that studies the key challenges in closing the loop between autonomous perception and action on challenging terrain. A primary objective of the project is to demonstrate the acquisition and processing of robust 3D geometric model maps from stereo data and Visual Odometry techniques. The 3D geometric model is used to infer different terrain types and to construct a 3D semantic model that can be used for path planning or teleoperation. This paper presents the set of methods and techniques used for building such a model, and provides insight into the mathematical optimizations used to obtain real-time processing. To validate our approach, we show results obtained on multiple datasets and compare against other similar initiatives.
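
The pipeline begins with stereo range data. As a minimal illustration of that first step (standard pinhole reprojection, not the project's code), a disparity map converts to a metric point cloud as below.

    import numpy as np

    def disparity_to_points(disp, f, baseline, cx, cy):
        """disp: (H, W) disparity in pixels; f: focal length in pixels;
        baseline in metres; (cx, cy): principal point. Returns (N, 3)."""
        v, u = np.nonzero(disp > 0)
        z = f * baseline / disp[v, u]     # depth from disparity
        x = (u - cx) * z / f
        y = (v - cy) * z / f
        return np.column_stack([x, y, z])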

[3] Morten Rufus Blas, Motilal Agrawal, Aravind Sundaresan, and Kurt Konolige. Fast color/texture segmentation for outdoor robots. In IEEE International Conference on Intelligent Robots and Systems, pages 4078-4085, Nice, France, September 2008. [ .pdf ] [abstract]
We present a fast integrated approach for online segmentation of images for outdoor robots. A compact descriptor has been developed to capture local color and texture variations in an image. This descriptor is then used in a fast two-stage K-means clustering framework to perform online segmentation of natural images. We present results of applying our descriptor to segmenting a synthetic image and compare it against other state-of-the-art descriptors. We also apply our segmentation algorithm to the task of detecting natural paths in outdoor images. The whole system has been demonstrated to work online alongside localization, 3D obstacle detection, and planning.
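
A toy version of the idea, with scikit-learn's KMeans standing in for the paper's fast two-stage clustering: the descriptor below (smoothed colour channels plus a local gradient-energy channel) only gestures at the compact descriptor's flavour and is an assumption, as is the subsample-then-assign split across the two stages.

    import numpy as np
    from scipy.ndimage import uniform_filter, sobel
    from sklearn.cluster import KMeans

    def segment(image_rgb, k=6):
        img = image_rgb.astype(float)
        # Per-pixel descriptor: smoothed colour plus local gradient energy.
        chans = [uniform_filter(img[..., c], size=5) for c in range(3)]
        gray = img.mean(axis=-1)
        energy = uniform_filter(np.hypot(sobel(gray, 0), sobel(gray, 1)),
                                size=5)
        X = np.stack(chans + [energy], axis=-1).reshape(-1, 4)
        # Stage 1: cluster a pixel subsample; stage 2: assign every pixel.
        km = KMeans(n_clusters=k, n_init=4).fit(X[::16])
        return km.predict(X).reshape(image_rgb.shape[:2])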

[4] Aravind Sundaresan and Rama Chellappa. Segmentation and probabilistic registration of articulated body model. In Proc. of the International Conference on Pattern Recognition, volume 2, pages 92-96, Hong Kong, China, August 2006. [Best Student Paper Award in Computer Vision and Image Analysis]. [ .pdf ] [abstract]
There are various approaches to pose estimation and registration of different body parts using voxel data. We propose a general bottom-up approach to segment the voxels into different body parts. The voxels are first transformed into a high-dimensional space, the eigenspace of the Laplacian of the neighbourhood graph. We exploit the properties of this transformation and fit splines to the voxels belonging to different body segments in eigenspace. The boundaries between splines are determined by examining the spline fitting error. We then use a probabilistic approach to register the segmented body segments, utilizing their connectivity and prior knowledge of the general structure of the subjects. We present results on real data, containing both simple and complex poses. While we use human subjects in our experiments, the method is fairly general and can be applied to voxel-based registration of any articulated or non-rigid object composed of primarily 1-D parts.

[5] Aravind Sundaresan and Rama Chellappa. Acquisition of articulated human body models using multiple cameras. In Proc. of the Conference on Articulated Motion and Deformable Objects, pages 78-89, Port d'Andratx, Mallorca, Spain, July 2006. [ .pdf ] [abstract]
Motion capture is an important application in different areas such as biomechanics, computer animation, and human-computer interaction. Current motion capture methods typically use human body models in order to guide pose estimation and tracking. We model the human body as a set of tapered super-quadrics connected in an articulated structure and propose an algorithm to automatically estimate the parameters of the model using video sequences obtained from multiple calibrated cameras. Our method is based on the fact that the human body is constructed of several articulated chains that can be visualised as essentially 1-D segments embedded in 3-D space and connected at specific joint locations. The proposed method first computes a voxel representation from the images and maps the voxels to a high dimensional space in order to extract the 1-D structure. A bottom-up approach is then suggested in order to build a parametric (spline-based) representation of a general articulated body in the high dimensional space followed by a top-down probabilistic approach that registers the segments to the known human body model. We then present an algorithm to estimate the parameters of our model using the segmented and registered voxels.
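
For concreteness, a body-segment primitive of this kind can be generated as below: a standard superquadric surface with a linear taper along its axis (Barr's formulation). Parameter names follow common superquadric notation rather than the paper's.

    import numpy as np

    def spow(x, e):
        """Signed power, the usual superquadric helper."""
        return np.sign(x) * np.abs(x) ** e

    def tapered_superquadric(a1, a2, a3, e1, e2, taper=0.0, n=40):
        """Surface samples of a superquadric with semi-axes a1, a2, a3,
        shape exponents e1, e2, and a linear taper along z."""
        eta = np.linspace(-np.pi / 2, np.pi / 2, n)[:, None]
        omega = np.linspace(-np.pi, np.pi, n)[None, :]
        z = a3 * spow(np.sin(eta), e1) * np.ones_like(omega)
        scale = 1.0 + taper * z / a3       # taper in [-1, 1] shrinks one end
        x = scale * a1 * spow(np.cos(eta), e1) * spow(np.cos(omega), e2)
        y = scale * a2 * spow(np.cos(eta), e1) * spow(np.sin(omega), e2)
        return x, y, z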

[6] Aravind Sundaresan and Rama Chellappa. Multi-camera tracking of articulated human motion using motion and shape. In Proc. of the Asian Conference on Computer Vision, volume 2, pages 131-140, Hyderabad, India, January 2006. [ .pdf ] [abstract]
We present a framework and algorithm for tracking articulated human motion. We use multiple calibrated cameras and an articulated human shape model. Tracking is performed using motion cues as well as image-based cues (such as silhouettes and “motion residues”, hereafter referred to as spatial cues), as opposed to constructing a 3D volume image or visual hulls. Our algorithm consists of a predictor and a corrector: the predictor estimates the pose at time t + 1 using motion information between the images at times t and t + 1; the error in the estimated pose is then corrected using spatial cues from the images at time t + 1. In the predictor, we use robust multi-scale parametric optimisation to estimate the pixel displacement for each body segment, and then use an iterative procedure to estimate the change in pose from the pixel displacements of points on the individual body segments. We present a method for fusing information from different spatial cues, such as silhouettes and “motion residues”, into a single energy function. We then express this energy function in terms of the pose parameters and find the optimum pose for which the energy is minimised.
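
The fusion step reduces to minimising one energy over the pose parameters. The sketch below takes the per-cue energies as caller-supplied callables, since the paper's silhouette and motion-residue terms are not reproduced; only the weighted-sum structure is from the abstract.

    import numpy as np
    from scipy.optimize import minimize

    def correct_pose(pose0, cue_terms, weights):
        """cue_terms: callables f(pose) -> scalar energy; weights: floats."""
        def energy(pose):
            return sum(w * f(pose) for w, f in zip(weights, cue_terms))
        # Derivative-free minimisation keeps the sketch agnostic to the cues.
        return minimize(energy, np.asarray(pose0, float),
                        method='Nelder-Mead').x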

[7] Lars Mündermann, Stefano Corazza, Ajit M. Chaudhari, Thomas P. Andriacchi, Aravind Sundaresan, and Rama Chellappa. Measuring human movement for biomechanical applications using markerless motion capture. In Proc. of SPIE Three-Dimensional Image Capture and Applications, volume 6056, January 2006. [ .pdf ] [abstract]
Modern biomechanical and clinical applications require the accurate capture of normal and pathological human movement without the artifacts associated with standard marker-based motion capture techniques, such as soft tissue artifacts and the risk of artificial stimulus from taped-on or strapped-on markers. In this study, the need for new markerless human motion capture methods is discussed in view of biomechanical applications. Three different approaches for estimating human movement from multiple image sequences were explored. The first two approaches tracked a 3D articulated model in 3D representations constructed from the image sequences, while the third approach tracked a 3D articulated model in multiple 2D image planes. The three methods are systematically evaluated and results on real data are presented. Choosing appropriate technical equipment and algorithms is critical for accurate markerless motion capture. The implementation of this new methodology offers the promise of simple, time-efficient, and potentially more meaningful assessments of human movement in research and clinical practice.

[8] Aravind Sundaresan, Amit Roy-Chowdhury, and Rama Chellappa. Multiple view tracking of human motion modelled by kinematic chains. In Proc. of the IEEE International Conference on Image Processing, volume 2, pages 1009-1012, Singapore, October 2004. [ .pdf ] [abstract]
We model human body motion using a kinematic chain. To perform tracking, we estimate the kinematic chain motion parameters from pixel displacements calculated from video sequences obtained from multiple calibrated cameras. We derive a linear relation between the 2D motion of pixels and the 3D motion parameters of the various body parts, using a perspective projection model for the cameras, a rigid body motion model for the base body, and the kinematic chain model for the body parts. An error analysis of the estimator is provided, leading to an iterative algorithm for calculating the motion parameters from the pixel displacements. We provide experimental results to demonstrate the accuracy of our formulation, and we compare our iterative algorithm to the non-iterative algorithm and discuss its robustness in the presence of noise.
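
Because the derived relation is linear in the motion parameters, each iteration reduces to a least-squares solve. The refinement loop below is an illustrative simplification, and jacobian() is a caller-supplied stand-in for the derived pixel-motion model, not the paper's formulation.

    import numpy as np

    def estimate_motion(displacements, jacobian, theta0, iters=10, tol=1e-8):
        """displacements: (2N,) stacked pixel motions; jacobian(theta) returns
        the (2N, P) matrix of the linear relation at the current estimate."""
        theta = np.asarray(theta0, dtype=float)
        for _ in range(iters):
            J = jacobian(theta)
            # Solve the linear relation d = J * delta in the least-squares
            # sense, then update the parameter estimate.
            delta, *_ = np.linalg.lstsq(J, displacements, rcond=None)
            theta = theta + delta
            if np.linalg.norm(delta) < tol:
                break
        return theta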

[9] Aravind Sundaresan, Amit Roy-Chowdhury, and Rama Chellappa. 3D Modeling of Human Motion using Kinematic Chains and Multiple Cameras for Tracking. In Proc. of the Eighth International Symposium on the 3-D Analysis of Human Movement, Tampa, Florida, March 2004. [ .pdf ]
[10] Aravind Sundaresan, Rama Chellappa, and Amit Roy-Chowdhury. A hidden Markov model based framework for recognition of humans from gait sequences. In Proc. of the IEEE International Conference on Image Processing, volume 2, pages 93-96, Barcelona, Spain, September 2003. [ .pdf ] [abstract]
In this paper we propose a generic framework based on hidden Markov models (HMMs) for recognizing individuals from their gait. The HMM framework is suitable because the gait of an individual can be visualized as a sequence of postures, adopted from a finite set, with an underlying structured probabilistic nature. The postures that the individual adopts can be regarded as the states of the HMM; they are characteristic of that individual and provide a means of discrimination. The framework assumes that, during gait, the individual transitions between N discrete postures or states, but it does not depend on the particular feature vector used to represent the gait information contained in the postures. The framework thus provides flexibility in the selection of the feature vector, and the statistical nature of the HMM lends robustness to the model. In this paper we use the binarized background-subtracted image as the feature vector and use different distance metrics, such as those based on the L1 and L2 norms of the vector difference and the normalized inner product of the vectors, to measure the similarity between feature vectors. The results we obtain are better than the baseline recognition rates reported previously.
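
A compact illustration of the framework: exemplar postures define the HMM states, a silhouette distance defines the observation model, and the scaled forward algorithm scores a probe sequence against a subject's model. The exponential observation model and its scale beta are assumptions, not the paper's exact choice.

    import numpy as np

    def l1(a, b): return np.abs(a - b).sum()
    def l2(a, b): return np.sqrt(((a - b) ** 2).sum())

    def log_likelihood(frames, exemplars, A, dist=l1, beta=1e-3):
        """frames: probe silhouettes; exemplars: one per HMM state;
        A: (N, N) row-stochastic transition matrix.
        Returns log P(frames | model)."""
        N = len(exemplars)
        alpha, loglik = None, 0.0
        for f in frames:
            # Observation likelihoods from distances to the exemplar postures.
            b = np.exp(-beta * np.array([dist(f, e) for e in exemplars]))
            # Forward recursion with a uniform initial state distribution.
            alpha = b / N if alpha is None else b * (alpha @ A)
            s = alpha.sum()
            loglik += np.log(s)
            alpha /= s                    # rescale to avoid underflow
        return loglik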