ECCV 2008 Videos
Videos of computer vision systems, tools and techniques were presented at the 10th European Conference on Computer Vision (ECCV 2008) and published in its electronic proceedings. The list is as follows:
- Analysis of Player Motion in Sport Games
Janez Pers, Matej Kristan, Matej Perse and Stanislav Kovacic (Faculty of Electrical Engineering, University of Ljubljana)
The video contains a narrated presentation of an application for the analysis of player motion in sports games. The application is the result of several years of research and development. It implements state-of-the-art computer vision algorithms, packaged in a highly user-friendly and ergonomic GUI. The software is used at the University of Ljubljana for sports research in (European) handball, basketball, tennis and squash. It runs under Windows and Linux (and possibly other Unix-like operating systems).
- GeoS: Geodesic Image Segmentation
Antonio Criminisi, Toby Sharp and Andrew Blake (Microsoft Research Cambridge)
This video demonstrates GeoS, a new algorithm for the efficient segmentation of n-dimensional image and video data.
The segmentation problem is cast as approximate energy minimization in a conditional random field. A new, parallel filtering operator built upon efficient geodesic distance computation is used to propose a set of spatially smooth, contrast-sensitive segmentation hypotheses. An economical search algorithm then finds the solution with minimum energy within a sensible and highly restricted subset of all possible labellings.
Advantages include: i) computational efficiency with high segmentation accuracy; ii) the ability to estimate an approximation to the posterior over segmentations; iii) the ability to handle general, complex energy models.
Comparison with min-cut indicates up to 60 times greater computational efficiency, as well as better memory efficiency.
GeoS is validated quantitatively and qualitatively by thorough comparative experiments on existing and novel ground-truth data. Numerous results on interactive and automatic segmentation of photographs, video and volumetric medical image data are presented.
- Hyperlinking Reality via Camera Phones
Dusan Omercevic and Ales Leonardis (University of Ljubljana)
Camera phones have become a ubiquitous companion in our daily lives. Although they possess substantial processing power, high-speed data connections, (multi-)touch screens and cameras with decent image quality, their use as context-aware information search devices is still limited. In this video, we demonstrate a novel user interface concept for camera phones based on state-of-the-art computer vision techniques. Instead of typing keywords on a small and clumsy keypad, the user simply snaps a photo of his surroundings, and objects in the photo become hyperlinks to information. The photo of the user's environment on the camera phone's screen thus becomes a natural interaction device, allowing intuitive access to information with a simple tap of a finger. To demonstrate the performance of the system in a real-world scenario of a pedestrian exploring a city, we captured a sequence of still images during a stroll through the city of Graz, Austria. Each image was processed independently of the others and annotated with hyperlinks to interesting stories about buildings and other objects in the user's environment.
- Segmentation and Recognition using Structure from Motion Point Clouds
Gabriel J. Brostow (ETHZ & UCL), Jamie Shotton (Microsoft Research Cambridge), Julien Fauqueur and Roberto Cipolla (University of Cambridge)
We propose an algorithm for semantic segmentation based on 3D point clouds derived from ego-motion. We motivate five simple cues designed to model specific patterns of motion and 3D world structure that vary with object category. We introduce features that project the 3D cues back to the 2D image plane while modeling spatial layout and context. A randomized decision forest combines many such features to achieve a coherent 2D segmentation and recognize the object categories present. Our main contribution is to show how semantic segmentation is possible based solely on motion-derived 3D world structure. Our method works well on sparse, noisy point clouds, and unlike existing approaches, does not need appearance-based descriptors.
Experiments were performed on a challenging new video database containing sequences filmed from a moving car in daylight and at dusk. The results confirm that, indeed, accurate segmentation and recognition are possible using only motion and 3D world structure. Further, we show that the motion-derived information complements an existing state-of-the-art appearance-based method, improving both qualitative and quantitative performance.
- The Grimage platform: Interaction and Telepresence using Image Based 3D modeling
Benjamin Petit, Jean-Denis Lesage, Edmond Boyer and Bruno Raffin (INRIA)
This video presents an immersive environment that allows collaborative and remote 3D interactions. The environment uses multiple camera systems and 3D modeling tools to build virtual models of users in real time. Users can be at distant sites and still share the same virtual environment. Their 3D models are embedded into the virtual environment, where they can interact with shared virtual objects. The models encode geometric information, which is plugged into a physical simulation for interaction. They also encode photometric information, which is applied through textures to ensure a good sense of presence.
- Unwrap Mosaics for Video Editing
Alex Rav-Acha (Weizmann Institute), Pushmeet Kohli, Carsten Rother and Andrew Fitzgibbon (Microsoft Research)
We demonstrate the use of a new representation of video for facilitating a number of common editing tasks. The representation has some of the power of a full reconstruction of 3D surface models from video, but is designed to be easy to recover from a priori unseen and uncalibrated footage. By modelling the image-formation process as a 2D-to-2D transformation from an object's texture map to the image, modulated by an object-space occlusion mask, we can recover a representation which we term the unwrap mosaic. Many editing operations can be performed on the unwrap mosaic and then re-composited into the original sequence, for example resizing objects, repainting textures, copying/cutting/pasting objects, and attaching effects layers to deforming objects.
- VideOlympics: Real-Time Evaluation of Video Search Engines
Cees Snoek, Ork De Rooij, Koen Van De Sande and Marcel Worring (University of Amsterdam)
Demo sessions of computer vision systems are ideal venues to disseminate scientific results. Existing demo sessions, however, fail to engage the audience fully. We argue that simultaneous, real-time evaluation of several computer vision systems in a single showcase increases impact. This movie gives an impression of such a showcase, namely the VideOlympics, a competition for video search engines. The main aim of the VideOlympics is to promote research in video retrieval. A further goal is to give the audience a good perspective on the possibilities and limitations of current state-of-the-art visual search engines. Whereas traditional evaluation campaigns such as the Pascal Visual Object Classes Challenge and TRECVID focus primarily on the effectiveness of the collected retrieval results, the VideOlympics also takes into account the influence of interaction mechanisms and advanced visualizations in the interface. In the end, all scientists go home with a golden retriever award, but the real winner is the audience.
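As a rough illustration of the geodesic distance computation that the GeoS entry builds on, the sketch below computes geodesic distances from seed pixels on a small grayscale image and then gives each pixel the label of the geodesically nearer seed set. It is a hypothetical example, not the GeoS implementation. It assumes a 4-connected pixel grid and a step cost of sqrt(1 + (gamma * intensity difference)^2), where `gamma` is a hypothetical contrast weight. It also uses exact Dijkstra shortest paths instead of the efficient parallel filtering operator described for GeoS, and it omits the CRF energy and the hypothesis search entirely. The names `geodesic_distance` and `segment_two_labels` are illustrative only.

```python
import heapq
import math


def geodesic_distance(image, seeds, gamma=1.0):
    """Geodesic distance from seed pixels on a 2D grayscale image (list of lists).

    Assumption for illustration: moving between 4-connected neighbours costs
    sqrt(1 + (gamma * intensity_difference)^2), so paths that cross strong
    contrast edges become long. Solved exactly with Dijkstra's algorithm.
    """
    rows, cols = len(image), len(image[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry; a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                diff = image[nr][nc] - image[r][c]
                nd = d + math.sqrt(1.0 + (gamma * diff) ** 2)
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


def segment_two_labels(image, fg_seeds, bg_seeds, gamma=1.0):
    """Label 1 where a pixel is strictly closer to the foreground seeds, else 0."""
    d_fg = geodesic_distance(image, fg_seeds, gamma)
    d_bg = geodesic_distance(image, bg_seeds, gamma)
    return [[1 if f < b else 0 for f, b in zip(row_fg, row_bg)]
            for row_fg, row_bg in zip(d_fg, d_bg)]


if __name__ == "__main__":
    # Dark left half, bright right half; one seed on each side of the edge.
    img = [[0, 0, 10, 10]] * 3
    print(segment_two_labels(img, fg_seeds=[(1, 0)], bg_seeds=[(1, 3)]))
    # → [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
```

Because crossing the dark/bright boundary costs about sqrt(101) per step here, each pixel takes the label of the seed on its own side of the edge. This is the contrast-sensitive behaviour the GeoS description refers to, shown here only in its simplest exact form.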
Video Chairs: James Crowley and Edmond Boyer