VAMPIRE: Visual Active Memory Processes and Interactive REtrieval

Visual tracking

An integral part of the Vampire system is the visual tracking component. Visual tracking divides into two main tasks:

Object Tracking

Which regions in an image sequence are worth tracking? In VAMPIRE, object tracking is initiated from object recognition results. Tracking the object region makes it possible to compute a trajectory that is fed into the action recognition modules, and to acquire different views of an object for learning new objects. Several approaches that cope with different requirements are studied in VAMPIRE.

Colour based object tracking

The object tracking system of Vampire has to fulfil several requirements. It has to

  • be real-time capable (i.e. process at least 15 frames per second),
  • be robust against illumination changes, occlusions, and other kinds of appearance changes,
  • tolerate moving cameras,
  • cope with strong object movement in the image plane,
  • be completely data-driven (i.e. work without model knowledge of the object).

Several different object tracking approaches were investigated. Colour-histogram-based techniques in particular proved to be robust and accurate. For the Vampire system, we use HS-V colour histogram features: weakly saturated pixels are accumulated in a V histogram, and all other pixels are inserted into an HS histogram. An example of HS-V histograms is given in the following illustration.
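The HS-V feature described above can be sketched as follows. This is a minimal illustration rather than the Vampire implementation: the function name `hsv_histogram`, the saturation threshold, the bin counts, and the assumption that H, S and V are scaled to [0, 1] are all chosen for the example.

```python
import numpy as np

def hsv_histogram(hsv_pixels, sat_threshold=0.1, hs_bins=(8, 8), v_bins=8):
    """Build a normalised HS-V histogram from pixels given as (H, S, V) in [0, 1].

    sat_threshold and the bin counts are assumed values for illustration.
    """
    hsv = np.asarray(hsv_pixels, dtype=float).reshape(-1, 3)
    h, s, v = hsv[:, 0], hsv[:, 1], hsv[:, 2]

    # Weakly saturated pixels carry no reliable hue, so only their V value is used.
    weak = s < sat_threshold
    v_hist, _ = np.histogram(v[weak], bins=v_bins, range=(0.0, 1.0))

    # All other pixels go into the two-dimensional HS histogram.
    hs_hist, _, _ = np.histogram2d(
        h[~weak], s[~weak], bins=hs_bins, range=[[0.0, 1.0], [0.0, 1.0]]
    )

    # Concatenate both parts and normalise to a discrete distribution.
    hist = np.concatenate([hs_hist.ravel(), v_hist]).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

With the default settings the feature vector has 8 × 8 + 8 = 72 entries; the first 64 hold the HS part and the last 8 the V part.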

Histogram-based tracking algorithms detect the region whose colour histogram is most similar to the object's colour histogram. This is an optimisation problem, and different optimisation techniques were evaluated. One of them is a probabilistic approach that applies a particle filter. This method proved to be very robust even in the case of strong object movements. Methods based on local optimisation techniques (e.g. the mean-shift algorithm) proved to be computationally efficient (less than 9 milliseconds per frame) and very accurate. In the video below, the Vampire helmet was tracked by the colour histogram-based particle filter.
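One iteration of a particle filter for this kind of histogram-based search might look like the sketch below. It is a generic predict, weight and resample step, not the tracker used in Vampire: the random-walk motion model, the parameters `motion_std` and `sigma`, and the `similarity` callback (assumed to return a histogram similarity in [0, 1] for a candidate position) are all assumptions for illustration.

```python
import numpy as np

def particle_filter_step(particles, similarity, motion_std=5.0, sigma=0.2, rng=None):
    """One predict / weight / resample step for 2-D position particles.

    particles  : (N, 2) array of candidate object positions
    similarity : hypothetical callback mapping a position to a value in [0, 1]
    """
    rng = np.random.default_rng() if rng is None else rng

    # Predict: random-walk motion model (an assumption of this sketch).
    predicted = particles + rng.normal(0.0, motion_std, size=particles.shape)

    # Weight: high histogram similarity gives a high particle weight.
    sims = np.array([similarity(p) for p in predicted])
    weights = np.exp(-(1.0 - sims) / (2.0 * sigma ** 2))
    weights /= weights.sum()

    # Estimate: weighted mean of the predicted particle positions.
    estimate = weights @ predicted

    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx], estimate
```

In a real tracker, `similarity` would compare the object's colour histogram with the histogram of the image region centred at the particle.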

Knowing the 2-D coordinates of the object in the image plane is not enough: 3-D information about the object's location is also a vital contribution to the whole system. For this, we investigated multi-ocular object tracking approaches. One approach provides very fast and accurate 3-D estimates, with an average error of less than one centimetre in the conducted experiments.



Examples of HS-V histograms

Video: AVI (640×480 pixels), MPG (640×480 pixels)

Model based 3D object tracking

Several different algorithms for model-based tracking were evaluated. A probabilistic extension of the hyperplane tracker, which uses a large number of reference templates obtained in a training step, was implemented. Another approach, the 3-D hyperplane tracker, uses a 3-D model of the tracked object to estimate the object's position and orientation. The point features of the 3-D model are acquired with the scale-invariant feature transform (SIFT), which is also used to initialise the 3-D tracking algorithm. Finally, we developed a tracker that combines a 3-D model of SIFT features with a data-driven feature point tracker for robust real-time estimation of the object's pose. All of these approaches can estimate all six pose parameters in real time. The last approach proved to be the most robust in our experimental evaluations. In the following video sequence, a package of juice was tracked with this approach. Red dots mark points that are tracked independently by the feature point tracker. Small green dots represent reprojected model points.

If the camera is at a fixed position (i.e. confined to rotating and zooming), the global motion in the image domain can be represented by a simpler geometric model: a homography. The points tracked by the low-level tracker can be used to determine the homography and hence to track the motion of the whole scene.
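Determining a homography from tracked points can be illustrated with the standard direct linear transform (DLT). The sketch below assumes at least four non-degenerate point correspondences and a homography whose bottom-right entry is non-zero; the function names are chosen for the example, and the method actually used in Vampire is not specified here.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H (3x3) with dst ~ H * src from N >= 4 point pairs using the DLT."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale (assumes H[2, 2] != 0)

def apply_homography(H, pts):
    """Map 2-D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

With noisy tracks from real images, the point coordinates should be normalised before solving, and a robust estimator such as RANSAC is usually wrapped around this step to reject mistracked points.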

Video: AVI (640×480 pixels), MPG (640×480 pixels)

Tennis ball tracking

Tracking the ball in tennis video is a great challenge, because the ball is very fast compared to the usual 25 Hz frame rate of common video material. The large distance the ball covers from one frame to the next therefore complicates visual tracking.

In foreground blob recognition, tennis ball candidates are detected, possibly with false positives and false negatives. Since a trajectory is a collection of ball candidates, tennis ball tracking can be treated as a search problem: find, among all possible combinations of candidates, the trajectory that minimises some cost function. Such a cost function can be constructed from the mismatch between the predicted and observed positions of the object, i.e. the innovation. A simple choice, for instance, is the cumulative likelihood defined in the standard Kalman filter. In practice, however, performing such an optimal search is very difficult: the number of possible trajectories grows exponentially as new frames are acquired, so the required computational power soon becomes astronomical. We propose a sub-optimal search algorithm that trades optimality for computational efficiency. The standard Kalman filter is extended to accommodate multiple hypotheses.
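To make the innovation-based cost concrete, the following sketch scores one candidate trajectory by the cumulative negative log-likelihood of its innovations under a standard Kalman filter. The constant-velocity motion model, the noise levels `q` and `r`, and the function name `trajectory_cost` are assumptions for illustration; the sketch covers only the cost of a single hypothesis, not the multiple-hypothesis search itself.

```python
import numpy as np

def trajectory_cost(positions, dt=1.0, q=1.0, r=4.0):
    """Cumulative negative log-likelihood of the Kalman innovations.

    positions : (T, 2) sequence of ball candidates forming one trajectory
    q, r      : assumed process and measurement noise variances
    """
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)
    R = r * np.eye(2)

    z = np.asarray(positions, dtype=float)
    x = np.array([z[0, 0], z[0, 1], 0.0, 0.0])  # start at the first candidate
    P = np.diag([r, r, 1e3, 1e3])               # velocity is initially unknown
    cost = 0.0
    for zk in z[1:]:
        # Predict the state and its covariance.
        x = F @ x
        P = F @ P @ F.T + Q
        # Innovation: mismatch between observed and predicted position.
        nu = zk - H @ x
        S = H @ P @ H.T + R
        cost += 0.5 * (nu @ np.linalg.solve(S, nu) + np.log(np.linalg.det(2.0 * np.pi * S)))
        # Update the state with the observation.
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ nu
        P = (np.eye(4) - K @ H) @ P
    return cost
```

A trajectory that contains a false positive far from the predicted position receives a higher cost than a smooth one, which is what a search over candidate combinations would exploit.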


Making multiple hypotheses

We also developed a tennis ball tracking algorithm based on a particle filter. Separating object detection from object tracking makes it possible to draw samples directly from the posterior density, which improves sampling efficiency. By incorporating the tennis player tracking result, the tracker switches automatically between two dynamic models according to the distance between a particle and the tennis players. This increases tracking robustness against abrupt motion changes. The resulting trajectory is sufficiently accurate for key event detection.
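The idea of switching between two dynamic models could be sketched as below. The concrete models are assumptions made for the example: particles far from every player follow a constant-velocity model with small noise, while particles within `near_radius` of a player receive a large velocity perturbation to allow for a hit. The names and parameter values are hypothetical and are not taken from the actual tracker.

```python
import numpy as np

def propagate(particles, players, near_radius=50.0, smooth_std=1.0, abrupt_std=15.0, rng=None):
    """Propagate ball particles [x, y, vx, vy] with a distance-dependent model.

    particles : (N, 4) array of particle states
    players   : (M, 2) array of tracked player positions
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = np.asarray(particles, dtype=float)
    players = np.asarray(players, dtype=float)

    # Distance from each particle to its nearest player.
    diffs = particles[:, None, :2] - players[None, :, :]
    dists = np.linalg.norm(diffs, axis=2).min(axis=1)
    near = dists < near_radius

    # Near a player the velocity may change abruptly; elsewhere it stays smooth.
    vel_std = np.where(near, abrupt_std, smooth_std)[:, None]
    out = particles.copy()
    out[:, 2:] += rng.normal(size=(len(particles), 2)) * vel_std
    out[:, :2] += out[:, 2:]
    return out, near
```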

Particle filter based ball tracker

Tennis player tracking

A relatively simple method for tracking the players on a tennis court, which is nevertheless quite efficient in this context, is to detect the players in the first frame at any plausible court position (preferably close to the court lines, where they always stand at the beginning of play) and then allow them to move only a small distance from one frame to the next. We also adopt an adaptive colour-based particle filter to track the tennis players. Moving foreground objects are extracted using background subtraction. The tracker is initialised by detecting smoothly moving foreground objects. After initialisation, a colour histogram of the pixels inside the player's bounding box is constructed and serves as a template. The Bhattacharyya distance between the template and each particle is used to weight the particle. The histogram template is updated online to handle appearance shift.
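The weighting and template update steps can be sketched as follows, assuming normalised histograms. The Gaussian weighting with `sigma` and the exponential update with rate `alpha` are common choices used here as assumptions; the exact weighting and update rule of the player tracker are not given in this text.

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two normalised histograms."""
    bc = np.sum(np.sqrt(np.asarray(p, dtype=float) * np.asarray(q, dtype=float)))
    return float(np.sqrt(max(0.0, 1.0 - bc)))

def particle_weights(template, particle_hists, sigma=0.2):
    """Weight each particle by the distance of its histogram to the template."""
    d = np.array([bhattacharyya_distance(template, h) for h in particle_hists])
    w = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    return w / w.sum()

def update_template(template, observed, alpha=0.1):
    """Blend the template with the histogram at the current estimate."""
    updated = (1.0 - alpha) * np.asarray(template, dtype=float) + alpha * np.asarray(observed, dtype=float)
    return updated / updated.sum()
```

A small `alpha` lets the template follow gradual appearance shift while limiting drift onto the background when the estimate is briefly wrong.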


Tracking of tennis players

Feature point tracking

Another component of visual tracking is the feature point tracker. It is used for real-time self localisation, augmented reality, and mosaicing, and it supports the object tracker described above. The feature point tracker is based on the Shi-Tomasi-Kanade tracker. In order to meet the requirements with respect to computational efficiency and robustness, this approach was enhanced in several key areas.

Applying a linear illumination model reduced the sensitivity of the feature point tracker to illumination changes. This is a critical point, as a change in the angle between the object's surface and the light source can lead to large changes in intensity. The camera's auto-exposure also causes fluctuations in brightness.
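A linear illumination model can be illustrated by fitting a gain and an offset between a reference patch and the current patch, so that the residual is measured after brightness compensation. The parameterisation as `current ≈ gain * reference + offset` and the function name are assumptions for this sketch; in a full tracker these parameters would be estimated together with the motion parameters.

```python
import numpy as np

def fit_linear_illumination(reference_patch, current_patch):
    """Least-squares fit of current ~ gain * reference + offset."""
    ref = np.asarray(reference_patch, dtype=float).ravel()
    cur = np.asarray(current_patch, dtype=float).ravel()
    # Design matrix with one column for the gain and one for the offset.
    A = np.column_stack([ref, np.ones_like(ref)])
    (gain, offset), *_ = np.linalg.lstsq(A, cur, rcond=None)
    return gain, offset
```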

Several improvements were developed to reduce the computation time. One of them is an efficient hierarchical search for new features at run time. In addition, for ego-motion estimation, the traditional gradient descent algorithm for translation estimation was replaced with a block matching algorithm, which requires much less overhead per frame and therefore allows much higher frame rates. We reach a frame rate of about 160 frames per second for 30 features on a personal computer with a 2.4 GHz Intel P4 CPU.
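Block matching for translation estimation can be sketched as an exhaustive search over integer displacements that minimises the sum of absolute differences (SAD). The block size, search radius and SAD criterion are assumptions for the example, and the sketch does not reproduce the optimisations that make the actual tracker fast.

```python
import numpy as np

def block_match(prev, curr, center, half=4, radius=6):
    """Find the integer shift (dy, dx) of the block around `center` from prev to curr.

    Assumes the block lies fully inside `prev`.
    """
    cy, cx = center
    size = 2 * half + 1
    block = prev[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)

    best, best_sad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = cy + dy - half, cx + dx - half
            if y0 < 0 or x0 < 0:
                continue  # candidate window starts outside the image
            cand = curr[y0:y0 + size, x0:x0 + size]
            if cand.shape != block.shape:
                continue  # candidate window is clipped by the image border
            sad = np.abs(cand - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best
```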

Video: AVI (640×480 pixels), MPG (640×480 pixels)

Selected Publications

  • Bajramovic, F. and Graeßl, Ch. and Denzler, J.
    Efficient Combination of Histograms for Real-Time Tracking Using Mean-Shift and Trust-Region Optimization
    Proc. Pattern Recognition Symposium (DAGM), Springer, 2005.
  • Deutsch, B. and Graessl, Ch. and Bajramovic, F. and Denzler, J.
    A Comparative Evaluation of Template and Histogram Based 2-d Tracking Algorithms
    Proc. Pattern Recognition Symposium (DAGM), Springer, 2005.
  • Gräßl, C.; Zinßer, T. & Niemann, H.
    A Probabilistic Model-Based Template Matching Approach for Robust Object Tracking in Real-Time
    Girod, B.; Magnor, M. & Seidel, H. (ed.) Vision, Modeling, and Visualization, Aka / IOS Press, Berlin, Amsterdam, 81-88, 2004.
  • Zinßer, T.; Gräßl, C. & Niemann, H.
    Efficient Feature Tracking for Long Video Sequences
    Rasmussen, C.E.; Bülthoff, H.H.; Giese, M.A. & Schölkopf, B. (ed.) Pattern Recognition, 26th DAGM Symposium, Springer-Verlag, Berlin, Heidelberg, New York, 326-333, 2004.
  • Yan, F.; Christmas, W.; Kittler, J.
    A Tennis Ball Tracking Algorithm for Automatic Annotation of Tennis Match
    The British Machine Vision Conference, to appear, 2005.