Tuesday, September 29 • 11:30 - 12:20
Deriving Business Value From Large Image Collections on Hadoop - Michael Natusch, Pivotal

Image collections are growing rapidly in size, and efficient image management is necessary to keep large collections easy to search and browse. In this talk, we describe how large image collections can be managed efficiently by presenting a content-based image retrieval (CBIR) system built on Hadoop. A CBIR system takes a query image as input and returns the images whose content is most similar to it. Putting together a CBIR system involves building several components: the image collection, a feature extractor, and machine learning models for mining similar images. We show how such a system can be realized using Hadoop and SQL-on-Hadoop technologies. The system we present discovers latent visual topics associated with each image and retrieves images based on the similarity between their visual topics.
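
As a rough illustration of the retrieval step only, the sketch below ranks images by the similarity of their visual-topic vectors. It is not the system presented in the talk: the abstract does not say which topic model or similarity measure is used, so cosine similarity, the function names, the file names, and the three-topic vectors are all assumptions made for the example, and everything runs in memory rather than on Hadoop.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def retrieve_similar(query_topics, collection, top_k=3):
    """Return the top_k (image_id, score) pairs, most similar first."""
    scored = [
        (image_id, cosine_similarity(query_topics, topics))
        for image_id, topics in collection.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical per-image topic distributions (made-up values, three topics).
collection = {
    "beach.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "harbor.jpg": [0.6, 0.1, 0.3],
    "city.jpg": [0.1, 0.2, 0.7],
}

# Hypothetical topic distribution inferred for the query image.
query = [0.7, 0.1, 0.2]

results = retrieve_similar(query, collection, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

In a deployment like the one the abstract describes, the topic vectors would presumably be stored as table rows and the ranking expressed as a SQL query; the in-memory dictionary here only stands in for that table.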


