Monday, September 28 • 15:00 - 15:50
Apache Tika for Enabling Metadata Interoperability - Michael Starch, NASA Jet Propulsion Laboratory and Nick Burch


Apache Tika is the de facto standard technology for extracting textual content and metadata from over a thousand different file types. Given the growing importance of metadata, Tika has become a fundamental tool, providing support for many metadata models. However, enabling uniform access to very large sets of heterogeneous documents requires accurate interoperability techniques, such as metadata mapping. In this talk, Michael and Nick will review existing Tika-based solutions that make it possible to obtain consistent metadata across file formats (i.e., TikaCoreProperties, Solr’s ExtractingRequestHandler) and then present a new component for Tika. This component extends Tika's Metadata object to achieve metadata interoperability through a highly configurable, fine-grained mapping technique that subsumes schema mapping and instance transformation.

This work has been proposed by Giuseppe Totaro (“Sapienza” University of Rome) and Chris Mattmann (NASA JPL).


Nick Burch

CTO, Apache Software Foundation
Nick began contributing to Apache projects in 2003, and hasn't looked back since! He's mostly involved in "Content" projects like Apache POI, Apache Tika and Apache Chemistry, as well as foundation-wide activities like Conferences and Travel Assistance. Nick is CTO at Quanticate, a Clinical Research Organisation (CRO) with a strong focus on data and statistics. Nick has spoken at most ApacheCons since 2007, and as well as many...

Michael Starch

Computer Engineer in Applications, NASA Jet Propulsion Laboratory
Michael Starch has been employed by the Jet Propulsion Laboratory for the past 5 years. His primary responsibilities include engineering big data processing systems for handling scientific data, researching the next generation of big data technologies, and helping infuse these systems into the mission world. He is a committer and PMC member on Apache OODT and has spoken about his work at the Southern California Linux Expo and ApacheCon North America.

Monday September 28, 2015 15:00 - 15:50
