VAMPIRE: Visual Active Memory Processes and Interactive REtrieval

1st Joint VAMPIRE Workshop with Industry

May 26/27, 2004, Pommersfelden near Erlangen, Germany

The workshop was organised as a dialogue between Cognitive Computer Vision research as conducted by the VAMPIRE consortium and industry experience, applications and demands. 45 people participated. Besides 23 researchers from the VAMPIRE partners, the workshop was attended by 8 representatives from medium-sized companies (DTS Medien GmbH, Herford, Germany; Imagination Computer Services, Wien, Austria; Sira Ltd., Kent, UK; VRVis GmbH für Virtual Reality und Visualisierung, Wien, Austria; Busch-Jaeger Elektro GmbH, Lüdenscheid, Germany; Frequentis Innovations Graz, Austria; Ars Electronica Futurelab, Linz, Austria; Alicona Imaging GmbH, Graz, Austria) and 11 people in leading research positions at large corporations (DaimlerChrysler AG, Siemens AG, Honda Research Institute Europe GmbH). Additionally, Patrick Courtney represented the ECVision research network and Cécile Huet the European Commission. Joachim Denzler, University of Jena, participated as a university cooperation partner.

The project appears to be on a good track towards industry-relevant research. Applications are expected in the mid and long term, while partial results might also be usable in the short term. The report summarises the results of the workshop, points to coverage in press, radio and television, and presents the results of a questionnaire completed by the industrial participants.


Technical Program

Thu May 26, VAMPIRE Presentations I, 13:00-15:00

Real-Time Tracking and Image-Based Modeling

Heinrich Niemann, Friedrich Alexander Universität Erlangen-Nürnberg

We present data-driven and model-based methods for robust real-time object tracking in a real environment. These processes are vital for the VAMPIRE system, since they are used by the object recognition and the motion analysis module, and provide low-level functionality. After the detection of motion in a scene, data-driven methods are the only possible way to perform object tracking. For this purpose, we use the hyperplane approach, enhanced to cope with illumination changes and partial occlusions. If the object is known, the robustness of tracking can be further increased by employing model-based methods. We focus on image-based modeling with light fields, which can also be used for visualisation and augmented reality, as they provide highly realistic images.



Hybrid Tracking for Mobile Augmented Reality

Axel Pinz, Technische Universität Graz

Mobile Augmented Reality (AR) applications require the tracking of the user's head pose in 3D scene coordinates based on visual landmarks. This talk presents in detail the tracking subsystem of the mobile VAMPIRE AR gear. We use a high-speed CMOS camera together with inertial sensors. During an initialisation phase, the camera detects visual landmarks and matches them with a scene model. Next, these landmarks are tracked over time, and the camera pose is continuously calculated. Fusion of the vision-based and inertial pose estimates makes the combined system more robust against temporary occlusion and mismatches. So-called "simultaneous structure and motion estimation" has only recently become feasible and will open a wide field of high-potential applications in navigation, robotics, safety and edutainment.
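The abstract does not state how the vision-based and inertial poses are fused. As a rough illustration only, the sketch below assumes a single yaw angle and a simple complementary filter, which is one common way to combine a fast inertial prediction with an intermittent visual measurement; it is not necessarily the method used in the VAMPIRE gear. The function name `fuse_yaw` and the parameters `alpha` and `dt` are hypothetical.

```python
from typing import List, Optional, Sequence


def fuse_yaw(
    gyro_rates: Sequence[float],
    vision_yaws: Sequence[Optional[float]],
    dt: float,
    alpha: float = 0.9,
    initial_yaw: float = 0.0,
) -> List[float]:
    """Fuse gyro rates (rad/s) with intermittent vision yaw angles (rad).

    vision_yaws[i] is None when the landmarks are occluded in frame i.
    alpha weights the inertial prediction, (1 - alpha) the vision value.
    Angle wrap-around is ignored to keep the sketch short.
    """
    yaw = initial_yaw
    fused = []
    for rate, seen in zip(gyro_rates, vision_yaws):
        predicted = yaw + rate * dt  # inertial dead reckoning
        if seen is None:
            yaw = predicted  # occlusion: rely on the inertial prediction
        else:
            yaw = alpha * predicted + (1.0 - alpha) * seen
        fused.append(yaw)
    return fused
```

During the frames where the vision entry is `None`, the estimate keeps following the gyro instead of freezing, which is the robustness against temporary occlusion that the abstract describes.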


Contextual Scene Interpretation for Video Annotation

Josef Kittler, University of Surrey

At the University of Surrey, we are using the scenario of a person watching a tennis match to explore the idea of Visual Active Memory. The spatial, temporal, and combined spatio-temporal contexts of tennis video material are explored in order to interpret the progress of the match. To achieve this, there are a number of separate processes that populate the memory, of which some of the more significant are:

  • parsing the video
  • tracking the ball and the players in the image sequence
  • interpreting behaviour of ball and players in the real world
  • recognising players from a variety of poses
  • interpreting the rules of tennis
  • interpreting the progress of the game by matching observations with the rules
There is a sense of both short-term and long-term memory. The short-term memory provides a unified communication interface between the various processes, and contains the processing results in a flat structure, until such time as they are no longer needed. Significant results are retained in the long-term memory, in a more structured form suitable for subsequent browsing.
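The split between a flat short-term store and a structured long-term store can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the Surrey implementation: the class and field names are hypothetical, "no longer needed" is modelled as a fixed frame horizon, and significance is a flag set by the producing process.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    process: str  # producing process, e.g. "ball_tracker" (hypothetical name)
    frame: int  # video frame the result refers to
    payload: dict  # process-specific data
    significant: bool = False


class VisualMemory:
    def __init__(self, horizon: int):
        self.horizon = horizon  # frames a result stays useful (assumption)
        self.short_term = []  # flat list shared by all processes
        self.long_term = defaultdict(list)  # structured: process -> results

    def post(self, result: Result) -> None:
        """Any process writes its result to the shared short-term store."""
        self.short_term.append(result)

    def tick(self, current_frame: int) -> None:
        """Retire expired results, moving significant ones to long-term."""
        keep = []
        for r in self.short_term:
            if current_frame - r.frame < self.horizon:
                keep.append(r)
            elif r.significant:
                self.long_term[r.process].append(r)
        self.short_term = keep
```

Grouping the long-term store by process is only one possible structure; grouping by point, game or set would fit the browsing use case equally well.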



Thu May 26, VAMPIRE Demonstrations, 15:30-17:00

Active Object Recognition

Friedrich Alexander Universität Erlangen-Nürnberg

Complex cognitive vision systems like VAMPIRE are based on low-level processes, which include active localisation, tracking, and recognition of objects. In this demonstration, efficient and robust approaches for these processes are shown in the context of an application which helps a user to identify objects in a real environment. The user wears a helmet with a camera and a head-mounted display, which are used for choosing the object that has to be recognised. A second system with a camera placed on a pan-tilt unit locates, magnifies, tracks, and recognises the object using robust and highly efficient methods, such as the enhanced hyperplane approach for data-driven object tracking in real-time, and SIFT features for object recognition. The result of the recognition process is visualised in the display of the helmet.

Virtual Pointing for Active Image Acquisition

Technische Universität Graz

The VAMPIRE mobile AR gear is used to teach object appearances to the Visual Active Memory (VAM). This demonstration shows the hybrid tracking of the user's head pose relative to a scene coordinate system, which is defined by a visual landmark. A 3D cursor can be operated interactively using the video-see-through head-mounted display. The user points at an object in 3D, and the system calculates the object position and approximate object size. An active camera takes a close-up view of the object. The camera is positioned on a tripod at an arbitrary location in the scene and uses a motorised pan/tilt/zoom to focus on the object. This close-up view, the images acquired by the mobile gear and the determined 3D position can be used to create a representation of the selected object for further processing in the VAM.
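The step from a pointed-at 3D position to a pan/tilt/zoom setting is plain geometry. The sketch below is an illustration under simplifying assumptions rather than the demonstrated system: it assumes the active camera's position is known in scene coordinates, that its zero pose looks along +x with z pointing up, and that zoom can be expressed as a field of view. The function `aim_camera` and the parameter `fill` are hypothetical.

```python
import math


def aim_camera(camera_pos, object_pos, object_size, fill=0.5):
    """Return (pan, tilt, fov) in radians for a pan/tilt/zoom camera.

    camera_pos and object_pos are (x, y, z) in the same scene coordinates.
    fill is the fraction of the view the object should occupy.
    """
    dx = object_pos[0] - camera_pos[0]
    dy = object_pos[1] - camera_pos[1]
    dz = object_pos[2] - camera_pos[2]
    ground = math.hypot(dx, dy)  # distance projected onto the floor plane
    distance = math.hypot(ground, dz)
    if distance == 0.0:
        raise ValueError("object coincides with camera position")
    pan = math.atan2(dy, dx)  # rotation about the vertical axis
    tilt = math.atan2(dz, ground)  # elevation above the floor plane
    # Field of view in which the object spans `fill` of the image.
    fov = 2.0 * math.atan2(object_size / (2.0 * fill), distance)
    return pan, tilt, fov
```

For a tripod placed at an arbitrary location, the camera's own position and orientation in the scene would first have to be calibrated; that step is outside this sketch.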


Attention-Based Object Recognition

Bielefeld University, Neuroinformatics

The operation of mobile vision systems in natural environments introduces new constraints regarding the interaction between the user and the system. Classical devices like keyboards, mice and screens can no longer be used. Instead, natural means of communication like gesture and speech have to be incorporated. We present the experimental prototype of an augmented-reality-based object recognition system that is capable of detecting and naming objects in the visual field. The system can be controlled and adapted online by using natural hand gestures to reference objects and to teach new objects. This is achieved by coupling the core recognition system with an attentional subsystem that fuses low-level feature information with the results of a pointing gesture recognition module.

Context-Aware Scene Augmentation

Bielefeld University, Applied Computer Science

We present a context-aware AR system that displays additional information about scene objects, triggered by user interaction. The system combines object recognition with gestural reference and contextual analysis to anticipate the user's intention and present only the information the user is interested in. The focus of this demonstrator is on the contextual and technical integration of different memory processes. It is designed as a distributed system running on four laptops that exchange data through active memories. It is a first prototype of the integrated cognitive computer vision system that VAMPIRE is aiming for.

Mosaicing (Video presentation)

Bielefeld University, Applied Computer Science

Mosaics are well established as a compact and non-redundant representation of image sequences and have been applied successfully in VAMPIRE to the tennis video application. However, their usual restriction to cameras that only zoom and rotate makes them inapplicable to the mobile scenario with head-mounted cameras. Because the user moves arbitrarily, it is impossible to integrate all views into one mosaic without parallax errors. Since no such errors occur when creating mosaics of planar regions, our approach first uses stereo vision to decompose the scene into planar sub-scenes and then creates a mosaic for each plane individually. The presentation shows results from scene decomposition and tracking of the planar sub-scenes.

Robot Companion

Bielefeld University, Applied Computer Science

This demonstration shows the dialog and attentional systems of a mobile robot developed within the cooperating projects BIRON and COGNIRON in Bielefeld.

Thu May 26, Industry Presentations, 17:00-19:00

Topomicroscopy: 3D surface measurement in the light and electron microscope

Manfred Prantl, Alicona Imaging

VRVis Research Center for Virtual Reality and Visualization

Konrad Karner, VRVis

Sira: an independent Research and Technology Organisation

John Gilby, Sira Ltd.

Applications of VAMPIRE in Manufacturing

Christian Wöhler, DaimlerChrysler

First steps to an active vision system

Edgar Körner, Honda Research

ECVision's industrial liaison activities

Patrick Courtney, Perkin Elmer

Cognition in IST

Cécile Huet, European Commission

Fri May 27, VAMPIRE Presentations II, 9:00-10:20

Learning on the Fly - Towards Attentive Computer Vision Systems

Helge Ritter, Bielefeld University (Neuroinformatics)

Despite tremendous advances in recent years, computer vision systems still differ greatly from natural vision performance as seen in animals or humans. While the field has made significant progress in achieving isolated capabilities, such as object recognition, tracking, or learning, the smooth integration of these into a natural interaction cycle has received much less attention so far.

In particular, learning is still often rather rigidly and unnaturally separated from the actual operation of the system. We argue that this deprives applications of one of the most significant benefits of learning, namely the capability to rapidly acquire new object concepts in a natural fashion when co-operating with humans.

We show how flexible and natural on-the-fly acquisition of new object categories can be achieved by combining an attention subsystem with a neurally inspired object recognition architecture that permits a natural split of learning into a fast on-line and a slower off-line contribution. We give some examples and results on the system in the context of the VAMPIRE scenario and discuss some of the issues and possibilities connected with that approach.



An Active Memory Model for Integrated Vision Systems

Sven Wachsmuth, Bielefeld University (Applied Computer Science)

Computer vision applications are becoming more and more embedded in our daily lives. Content-based video retrieval, home robotics, and Augmented Reality applications will be future markets for integrated computer vision systems. These systems need to perform robustly and interact naturally with humans in highly dynamic environments. Information fusion, contextual reasoning, learning on different levels of abstraction, and dealing with hypotheses from different categorial domains are mandatory prerequisites for such cognitive vision systems. Following some principles of human cognition, we present a computational framework that closely couples reasoning and representation.

We will discuss how processes like probabilistic contextual reasoning as well as functional and non-functional requirements in storing data from different sources can be integrated by a distributable XML-based memory. The core of the memory consists of an embedded XML database and is augmented with support for event-condition-action style data processing. Due to the interaction between active processes and data storage, we call our approach an active memory. Performance results of an implemented system as well as an evaluation of data fusion from contextual inference will be presented.
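The event-condition-action idea can be illustrated with a toy in-process store. This is only a sketch under stated assumptions and does not reflect the project's actual database or API: the class `ActiveMemory`, the element names `object` and `context`, and all attribute names and values are hypothetical, and Python's standard `xml.etree.ElementTree` stands in for the embedded XML database.

```python
import xml.etree.ElementTree as ET


class ActiveMemory:
    """Toy stand-in for an XML store with event-condition-action rules."""

    def __init__(self):
        self.root = ET.Element("memory")
        self.rules = []  # (event name, condition, action) triples

    def on(self, event, condition, action):
        """Register a rule: on `event`, if condition(element), run action."""
        self.rules.append((event, condition, action))

    def insert(self, xml_text):
        element = ET.fromstring(xml_text)
        self.root.append(element)
        self._fire("insert", element)
        return element

    def query(self, path):
        # ElementTree supports only a limited XPath subset.
        return self.root.findall(path)

    def _fire(self, event, element):
        for rule_event, condition, action in list(self.rules):
            if rule_event == event and condition(element):
                action(self, element)


def is_confident_object(element):
    # Condition: a recognised object with a high (hypothetical) score.
    return element.tag == "object" and float(element.get("score", "0")) > 0.8


def add_context_hypothesis(memory, element):
    # Action: store a derived hypothesis. The nested insert fires the rules
    # again, but the condition ignores <context> elements, so it terminates.
    memory.insert('<context about="%s" scene="office"/>' % element.get("label"))


memory = ActiveMemory()
memory.on("insert", is_confident_object, add_context_hypothesis)
memory.insert('<object label="cup" score="0.93"/>')
memory.insert('<object label="phone" score="0.41"/>')
```

After these two inserts the memory holds both objects but only one derived context element, for the cup, because the second object fails the condition. Storing the action's output in the same memory is what lets further processes react to it in turn.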