VAMPIRE: Visual Active Memory Processes and Interactive REtrieval

Interaction is fundamentally important for assistive technologies such as the one investigated in the Scenario "Mobile Augmented Reality". Here, the user is part of the processing loop, since he or she controls the system's view and retrieves information via the head-mounted display.

Following the "Human in the Loop" paradigm in VAMPIRE, augmented reality techniques are applied in several use cases. Spatial referencing allows the system to indicate positions of objects or parts of the (three-dimensional) scene, which is beneficial for system learning and for dialogues ("What is the name of this object?"). Furthermore, the system often has to direct the user's attention to certain places. Online feedback about the referenced positions increases the efficiency of human-system interaction.

Visual Interaction Server

For interaction and augmentation, a Visual Interaction Server (VIS) has been implemented and integrated as part of the visualisation subsystem. It allows all other VAMPIRE components to share the AR gear for interaction and augmentation purposes. Any visual interaction between components and the user is mediated by the VIS. It therefore not only carries out the interaction chains, but also prioritises interaction requests to avoid mental overload of the user. Some of the available augmentation primitives are shown in the screenshot.

The VIS has an open event architecture that allows different interaction modalities, such as (pointing) gestures, the 3D cursor (see below), head gestures, mouse actions and speech, to be integrated.
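As an illustration of this mediation, the following sketch shows one possible way to queue and prioritise interaction requests; the names (VisualInteractionServer, InteractionRequest, submit, next_request) and the overload guard are chosen for the example and do not reflect the actual VIS interface.

    import heapq
    import itertools
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class InteractionRequest:
        priority: int                          # lower value = more urgent
        seq: int                               # tie-breaker, keeps FIFO order per priority
        component: str = field(compare=False)  # VAMPIRE component issuing the request
        primitive: str = field(compare=False)  # augmentation primitive, e.g. "arrow", "label"
        payload: dict = field(compare=False)   # parameters of the primitive

    class VisualInteractionServer:
        """Toy mediator: components submit requests, the display loop pops them one at a time."""

        def __init__(self, max_pending=5):
            self._queue = []                   # heap ordered by (priority, seq)
            self._counter = itertools.count()
            self._max_pending = max_pending    # crude guard against overloading the user

        def submit(self, component, primitive, payload, priority=10):
            if len(self._queue) >= self._max_pending:
                return False                   # reject rather than flood the head-mounted display
            heapq.heappush(self._queue, InteractionRequest(
                priority, next(self._counter), component, primitive, payload))
            return True

        def next_request(self):
            return heapq.heappop(self._queue) if self._queue else None

    # Example: vis = VisualInteractionServer()
    #          vis.submit("object_recognition", "label", {"text": "cup"}, priority=5)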


Interaction elements of the VIS

3D Cursor

The stereoscopic 3D cursor is a tool to determine the position and size of objects in 3D. Provided that the head pose with respect to the artificial CR targets (obtained from the tracking subsystem, see self-localisation), the translation and rotation between the tracking camera and the FireWire cameras used for the video loop, and the vector between the FireWire camera and the object are known (the relations between the camera coordinate systems are determined during an off-line calibration stage), we can estimate the object's position in 3D.
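As an illustrative sketch of this chain of transformations (the variable names T_world_track, T_track_fw and p_fw are chosen here and are not taken from the system), the object's world position can be obtained by composing the calibrated transforms with the measured head pose:

    import numpy as np

    def make_pose(R, t):
        """Build a 4x4 homogeneous transform from a 3x3 rotation R and a translation t."""
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = t
        return T

    def object_position_world(T_world_track, T_track_fw, p_fw):
        """Chain the transforms of the 3D cursor set-up (illustrative names only).

        T_world_track : pose of the tracking camera in world coordinates (head pose w.r.t. CR targets)
        T_track_fw    : fixed transform tracking camera -> FireWire camera (off-line calibration)
        p_fw          : vector from the FireWire camera to the object, in camera coordinates
        """
        p_h = np.append(np.asarray(p_fw, float), 1.0)   # homogeneous coordinates
        return (T_world_track @ T_track_fw @ p_h)[:3]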

In order to estimate the vector between the FireWire camera and the object, we have implemented two different approaches: a manual mode for pointing at arbitrary positions in 3D, even where no object is present, and an automated mode based on stereo reconstruction.

In the manual mode, the user manipulates the disparity of a cursor primitive to create the impression of distance. In the automated mode, the disparity is determined by pattern matching.
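The depth corresponding to a disparity value follows the standard rectified-stereo relation Z = f * b / d; the sketch below is illustrative only and assumes that the focal length and baseline of the stereo pair are known from calibration.

    def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
        """Standard pinhole stereo relation Z = f * b / d (illustrative sketch).

        disparity_px    : horizontal pixel offset, set by the user (manual mode)
                          or found by pattern matching (automated mode)
        focal_length_px : focal length of the (rectified) cameras in pixels
        baseline_m      : distance between the two camera centres in metres
        """
        if disparity_px <= 0:
            raise ValueError("disparity must be positive for a finite depth")
        return focal_length_px * baseline_m / disparity_px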

Basic idea of the 3D cursor

Snapshot of the 3D cursor in action

3D Guide

If the VAM has been taught the position of an object (e.g. the VAMPIRE cup), the AR user can be guided to the object with an AR compass, using the head pose of the user (see self-localisation) and the object position stored in the VAM.
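A minimal sketch of how such a compass direction could be computed from the head pose and the stored object position is given below; the frame conventions and names (R_world_head, compass_direction) are assumptions made for this example, not the project's implementation.

    import numpy as np

    def compass_direction(head_pos, R_world_head, object_pos):
        """Direction from the user's head to a stored object, in head coordinates (sketch).

        head_pos     : 3-vector, head position in world coordinates (self-localisation)
        R_world_head : 3x3 rotation taking head-frame vectors into world coordinates
        object_pos   : 3-vector, object position retrieved from the VAM
        Returns a unit vector in the head frame and a horizontal bearing in degrees,
        which can drive the compass / arrow primitive in the head-mounted display.
        """
        v_world = np.asarray(object_pos, float) - np.asarray(head_pos, float)
        v_head = R_world_head.T @ v_world                 # rotate into the head frame
        v_head /= np.linalg.norm(v_head)
        bearing_deg = np.degrees(np.arctan2(v_head[0], v_head[2]))  # x = right, z = forward (assumed)
        return v_head, bearing_deg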

Below, snapshots of a guiding session are shown. The user asks for the position of the VAMPIRE cup and is directed towards the last known position of the cup. If the cup is in the field of view, it is marked with a cursor primitive.

Snapshots of the 3D Guide (fixed position compass mode)

Pointing Gestures

Pointing gestures can serve as a more natural input modality for referencing spatial positions and selecting items. Based on skin colour models, the hand is detected in the images and gestures are classified. In this way, the user can reference objects for learning and retrieval, press buttons on the augmented-reality user interface, or direct the system's attention to certain spatial positions.
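As a simplified stand-in for the learned skin colour model, the sketch below segments skin-coloured pixels with a fixed HSV threshold (using OpenCV) and picks a fingertip candidate; the thresholds and the function name are illustrative only and do not represent the project's classifier.

    import cv2
    import numpy as np

    def detect_fingertip(frame_bgr, hsv_low=(0, 40, 60), hsv_high=(20, 170, 255)):
        """Rough skin-colour segmentation and fingertip estimate (illustrative sketch)."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, np.array(hsv_low, np.uint8), np.array(hsv_high, np.uint8))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        hand = max(contours, key=cv2.contourArea)          # assume the largest blob is the hand
        m = cv2.moments(hand)
        if m["m00"] == 0:
            return None
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid of the hand blob
        # take the contour point farthest from the centroid as a fingertip candidate
        tip = max(hand[:, 0, :], key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
        return int(tip[0]), int(tip[1])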


Object referencing by pointing gestures. Here the user references a misclassified object for re-training.

Head Gestures

The digital motion tracker that is part of the AR gear registers 3D rate-of-turn and acceleration. It therefore provides a convenient means for accurately measuring head motions in real time and thus enables the detection of a wide range of head gestures as a modality for human-machine interaction. We consider three different types of head gestures to operate the system: Spatial head gestures are head motions of moderate speed where the head steadily moves in one of the general directions left, right, up or down; they allow for spatial references and the indication of directions. Semantic head gestures like nodding and shaking the head are used in many cultures to express agreement or denial. In combination with the Visual Interaction Server (VIS), nodding the head signals the wish to select a button.
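For illustration only, the sketch below distinguishes nodding, shaking and steady (spatial) head motion from the rate-of-turn samples of such a tracker; the thresholds, the axis convention and the function name are assumptions made for this example.

    import numpy as np

    def classify_head_gesture(gyro_samples, rate_thresh=0.8, min_reversals=2):
        """Crude head gesture classifier from rate-of-turn data (illustrative sketch).

        gyro_samples : (N, 3) angular velocities in rad/s, axes assumed (pitch, yaw, roll).
        A nod oscillates mainly around the pitch axis, a shake around the yaw axis;
        sustained motion in one direction is treated as a spatial gesture.
        """
        g = np.asarray(gyro_samples, float)
        energy = np.sum(g ** 2, axis=0)                    # per-axis motion energy
        axis = int(np.argmax(energy[:2]))                  # 0 = pitch, 1 = yaw
        signal = g[:, axis]
        strong = signal[np.abs(signal) > rate_thresh]      # ignore small tremors
        reversals = int(np.sum(np.diff(np.sign(strong)) != 0))
        if reversals >= min_reversals:
            return "nod" if axis == 0 else "shake"         # semantic gesture
        if strong.size:
            return "spatial"                               # steady motion in one direction
        return "none"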

Navigating a menu by head gesture

Visualisation (lightfields)

For the visualisation of objects, we apply an image-based rendering technique. As a starting point, a video sequence of the object of interest has to be captured. Feature points are either tracked with the feature point tracker or matched with the help of local SIFT features. Then, a sparse 3D reconstruction of the object is computed with a structure-from-motion technique. For the efficient incorporation of the scene geometry, a triangle mesh is generated. The illustration below shows the process of object reconstruction.

Steps in the creation of a lightfield for image-based rendering
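As a hedged sketch of the SIFT matching step that supplies point correspondences for the structure-from-motion reconstruction, the snippet below uses OpenCV; it is an example under assumed names and parameters, not the project's implementation.

    import cv2

    def match_sift(frame_a, frame_b, ratio=0.75):
        """Match local SIFT features between two frames of the object video (sketch only)."""
        sift = cv2.SIFT_create()
        kp_a, des_a = sift.detectAndCompute(frame_a, None)
        kp_b, des_b = sift.detectAndCompute(frame_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        # Lowe's ratio test discards ambiguous matches before the reconstruction step
        good = [m for m, n in matcher.knnMatch(des_a, des_b, k=2)
                if m.distance < ratio * n.distance]
        return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in good]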
