Monday, September 28 • 16:00 - 16:50
Integrating Apache Spark with an Enterprise Data Warehouse - Michael Wurst, IBM

This session will discuss the challenges and opportunities of integrating Apache Spark with enterprise data warehouses, especially the impact of columnar storage, using IBM DB2 and IBM dashDB as examples. We will show how columnar storage can help to increase scalability and reduce response time, especially when the processing of projections and aggregates is pushed down to the database instead of being performed natively in Spark. Key takeaways from the session are: 1. How to benefit from the features of closed-source data warehouses from Spark without access to internal data structures, 2. The role of storage when working with large warehouses from Spark, 3. Opportunities of columnar storage vs. row-based storage, 4. How such an integration impacts end-to-end analytics based on Spark MLlib.
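
The pushdown idea in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: an in-memory SQLite database stands in for the warehouse, plain Python stands in for Spark, and the `sales` table, its columns, and the helper `pushdown_subquery` are hypothetical names chosen for illustration. The sketch compares fetching every row and column and aggregating on the client with sending the projection and aggregate to the database as a subquery, so that only the small result crosses the connection.

```python
import sqlite3


def pushdown_subquery(table, group_col, agg_col):
    """Build an aliased subquery that projects two columns and aggregates.

    Hypothetical helper. A string of this shape is what Spark's JDBC reader
    accepts in its `dbtable` option, e.g.
    spark.read.format("jdbc").option("dbtable", subquery)
    """
    return (
        f"(SELECT {group_col}, SUM({agg_col}) AS total "
        f"FROM {table} GROUP BY {group_col}) AS pushed"
    )


# Stand-in "warehouse": an in-memory SQLite table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, amount REAL, note TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("EU", "a", 10.0, "x"),
        ("EU", "b", 20.0, "y"),
        ("US", "a", 5.0, "z"),
        ("US", "c", 7.5, "w"),
    ],
)

# Without pushdown: transfer every row and every column, aggregate client-side.
all_rows = conn.execute("SELECT * FROM sales").fetchall()
client_totals = {}
for region, _product, amount, _note in all_rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# With pushdown: the database evaluates the projection and the aggregate,
# and only one row per group is transferred.
subquery = pushdown_subquery("sales", "region", "amount")
pushed_rows = conn.execute(f"SELECT * FROM {subquery}").fetchall()
pushed_totals = dict(pushed_rows)

print(len(all_rows), "rows transferred without pushdown")
print(len(pushed_rows), "rows transferred with pushdown")
```

Both paths compute the same totals. The difference lies in how much data leaves the database, and a columnar store can additionally answer the pushed-down query by reading only the `region` and `amount` columns.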

Michael Wurst

Architect / Senior Software Developer, IBM Research & Development
Michael Wurst, Ph.D. is a senior software engineer and architect at the IBM Research & Development Lab in Germany. He holds a Ph.D. in computer science and is responsible for the integration of open source analytics based on R, Python or Spark into IBM's data warehouse portfolio. Prior...

Monday September 28, 2015 16:00 - 16:50 CEST