
Efficient Online Structured Output Learning for Keypoint-Based Object Tracking


Efficient keypoint-based object detection methods are used in many real-time computer vision applications. These approaches often model an object as a collection of keypoints and associated descriptors, and detection then involves first constructing a set of correspondences between object and image keypoints via descriptor matching, and subsequently using these correspondences as input to a robust geometric estimation algorithm such as RANSAC to find the transformation of the object in the image. In such approaches, the object model is generally constructed offline, and does not adapt to a given environment at runtime. Furthermore, the feature matching and transformation estimation stages are treated entirely separately. In this paper, we introduce a new approach to address these problems by combining the overall pipeline of correspondence generation and transformation estimation into a single structured output learning framework.
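For concreteness, a minimal sketch of this conventional two-stage pipeline (descriptor matching followed by RANSAC), assuming OpenCV with ORB binary descriptors; this illustrates the baseline being improved upon, not the learning-based method proposed in the paper, and the function name detectObject is ours:

// Conventional matching-then-RANSAC pipeline (baseline, not the paper's method).
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Returns the homography mapping the object image into the frame,
// or an empty matrix if too few correspondences are found.
cv::Mat detectObject(const cv::Mat& objectImage, const cv::Mat& frame)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create();

    // 1. Keypoints and binary descriptors for the (offline) object model and the frame.
    std::vector<cv::KeyPoint> objectKps, frameKps;
    cv::Mat objectDesc, frameDesc;
    orb->detectAndCompute(objectImage, cv::noArray(), objectKps, objectDesc);
    orb->detectAndCompute(frame, cv::noArray(), frameKps, frameDesc);

    // 2. Correspondences via Hamming-distance descriptor matching.
    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    std::vector<cv::DMatch> matches;
    matcher.match(objectDesc, frameDesc, matches);
    if (matches.size() < 4) return cv::Mat();

    // 3. Robust geometric estimation (RANSAC) on the matched point pairs,
    //    performed entirely separately from the matching stage.
    std::vector<cv::Point2f> objectPts, framePts;
    for (const cv::DMatch& m : matches)
    {
        objectPts.push_back(objectKps[m.queryIdx].pt);
        framePts.push_back(frameKps[m.trainIdx].pt);
    }
    return cv::findHomography(objectPts, framePts, cv::RANSAC, 3.0);
}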

Following the recent trend of using efficient binary descriptors for feature matching, we also introduce an approach to approximate the learned object model as a collection of binary basis functions which can be evaluated very efficiently at runtime. Experiments on challenging video sequences show that our algorithm significantly improves over state-of-the-art descriptor matching techniques using a range of descriptors, as well as recent online learning based approaches.
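The following is a simplified sketch of the general idea behind such a binary basis approximation, assuming a learned weight vector w is approximated as a weighted sum of basis vectors b_k with entries in {-1,+1}, stored as bit masks, so that scoring a binary descriptor reduces to AND and popcount operations; the structure names (BinaryBasis, positiveMask, beta, score) are ours and the details differ from the paper's exact scheme:

// Evaluate a score w^T x with w ~= sum_k beta_k * b_k, b_k in {-1,+1}^D,
// and x a D-bit binary descriptor. Illustrative sketch only.
#include <cstdint>
#include <vector>

struct BinaryBasis
{
    std::vector<uint64_t> positiveMask; // bit i set  <=>  b_k[i] = +1
    double beta;                        // real-valued coefficient
};

// Popcount over a multi-word bit vector (GCC/Clang builtin).
static int popcount(const std::vector<uint64_t>& bits)
{
    int c = 0;
    for (uint64_t w : bits) c += __builtin_popcountll(w);
    return c;
}

// Score a binary descriptor against the approximated weight vector
// using only bitwise AND and popcount.
double score(const std::vector<BinaryBasis>& bases,
             const std::vector<uint64_t>& descriptor)
{
    const int bitsSet = popcount(descriptor);
    double s = 0.0;
    for (const BinaryBasis& b : bases)
    {
        int agree = 0; // set descriptor bits where b_k is +1
        for (size_t w = 0; w < descriptor.size(); ++w)
            agree += __builtin_popcountll(b.positiveMask[w] & descriptor[w]);
        // b_k^T x = (+1)*agree + (-1)*(bitsSet - agree) = 2*agree - bitsSet
        s += b.beta * (2 * agree - bitsSet);
    }
    return s;
}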


Efficient Online Structured Output Learning for Keypoint-Based Object Tracking
Sam Hare, Amir Saffari, Philip H. S. Torr
Computer Vision and Pattern Recognition (CVPR), 2012


Code is available here.


The sequences used for the experiments in the paper can be downloaded here: