Monday, September 28 • 15:00 - 15:50
Apache Tika for Enabling Metadata Interoperability - Michael Starch, NASA Jet Propulsion Laboratory and Nick Burch

Apache Tika is the de facto standard technology for textual content and metadata extraction from over a thousand different file types. Given the growing importance of metadata, Tika has become a fundamental tool, providing support for many metadata models. However, enabling uniform access to very large sets of heterogeneous documents requires accurate interoperability techniques, such as metadata mapping. In this talk, Michael and Nick will review existing Tika-based solutions that make it possible to obtain consistent metadata across file formats (e.g., TikaCoreProperties, Solr’s ExtractingRequestHandler) and then present a new component for Tika. This component extends the Metadata object to achieve metadata interoperability through a highly configurable, fine-grained mapping technique that subsumes schema mapping and instance transformation.

This work has been proposed by Giuseppe Totaro (“Sapienza” University of Rome) and Chris Mattmann (NASA JPL).


Nick Burch

CTO, Quanticate
Nick began contributing to Apache projects in 2003, and hasn't looked back since! He's mostly involved in "Content" projects like Apache POI, Apache Tika and Apache Chemistry, as well as foundation-wide activities like Conferences and Travel Assistance. Nick is CTO at Quanticate, a...

Michael Starch

Computer Engineer in Applications, NASA Jet Propulsion Laboratory
Michael Starch has been employed by the Jet Propulsion Laboratory for the past 5 years. His primary responsibilities include engineering big data processing systems for handling scientific data, researching the next generation of big data technologies, and helping infuse these systems...

Monday September 28, 2015 15:00 - 15:50 CEST
