Free-hand sketching provides a natural and expressive modality for interacting with computers. This project explores methods for intuitively searching video databases using sketches. Although video search is typically performed using keywords that specify content, text is cumbersome for describing scene appearance; a sketched depiction of a scene instead offers an orthogonal channel for constraining search. Although sketch-based image retrieval (SBIR) has received much attention, the related problem of sketch-based video retrieval (SBVR) is only sparsely researched, especially the fusion of text and sketch.
Building upon prior SBIR (Hu, 2011, ICIP, 17, 1025-1028) and SBVR (Collomosse, 2009, ICCV, 12) work, we describe intermediate results from such a hybrid system. Our sketches describe objects by their semantics (e.g. horse), a sketch of their motion trajectory, and their colour, collectively providing a natural interface for conveying multiple facets of an event. We detect salient features in a scene, compensate for camera motion, and project the features onto a stable reference frame. Feature trajectories are clustered into trails, which are classified according to object class and colour distribution. A “fingerprint” (descriptor) is extracted for each object; the same form of descriptor is also extracted from the sketch. Following dimensionality reduction, descriptors are matched to identify relevant videos.
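The final matching stage described above can be sketched in a few lines. The snippet below is a minimal illustration, not the system's actual implementation: it assumes each video object and each query sketch has already been summarised as a fixed-length "fingerprint" vector, reduces dimensionality with PCA (a stand-in for whatever reduction the system uses), and ranks database fingerprints by Euclidean distance to the query. The functions `pca_project` and `rank_videos` and the toy data are hypothetical names introduced here for illustration.

```python
import numpy as np

def pca_project(X, k):
    """Reduce descriptor dimensionality via PCA (SVD on mean-centred data).

    Returns the projected data plus the mean and basis needed to map a
    query descriptor into the same reduced space.
    """
    mu = X.mean(axis=0)
    # Right singular vectors of the centred data give the principal axes.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    basis = Vt[:k]
    return (X - mu) @ basis.T, mu, basis

def rank_videos(db_desc, query_desc, k=8):
    """Rank database fingerprints by distance to the sketch fingerprint
    in the k-dimensional reduced space (smallest distance first)."""
    Z, mu, basis = pca_project(db_desc, k)
    q = (query_desc - mu) @ basis.T
    dists = np.linalg.norm(Z - q, axis=1)
    return np.argsort(dists)

# Toy example: 50 hypothetical 64-d fingerprints; the "sketch" query is a
# slightly perturbed copy of fingerprint #17, so it should rank first.
rng = np.random.default_rng(0)
db = rng.normal(size=(50, 64))
query = db[17] + 0.01 * rng.normal(size=64)
ranking = rank_videos(db, query)
print(ranking[0])  # index of the best-matching video object
```

In practice the reduced space, distance metric, and descriptor contents (object class, colour distribution, trajectory shape) would all be tuned to the retrieval task; the point here is only the shape of the pipeline: reduce, project the query with the same transform, then rank by distance.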
Our results show annotated sketches retrieving clips from a database of sports footage. Natural user interfaces such as sketch will have significant near-term impact, given the trend toward non-classical platforms such as tablets, mobiles and large-scale touch-screen devices. SBVR also motivates cross-disciplinary studies of how users perceive and depict events (Collomosse, 2008, ICPR, 19).