Monday, September 28 • 16:00 - 16:50
Integrating Apache Spark with an Enterprise Data Warehouse - Michael Wurst, IBM


This session will discuss the challenges and opportunities of integrating Apache Spark with enterprise data warehouses, with a focus on the impact of columnar storage, using IBM DB2 and IBM dashDB as examples. We will show how columnar storage can help increase scalability and reduce response time, especially when the processing of projections and aggregates is pushed down to the database instead of being performed natively in Spark. Key takeaways from the session:
1. How to benefit from the features of closed-source data warehouses from Spark without access to their internal data structures.
2. The role of storage when working with large warehouses from Spark.
3. The opportunities of columnar storage versus row-based storage.
4. How such an integration affects end-to-end analytics based on Spark MLlib.
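The abstract contains no code, so the following is only a toy Python model of the two effects it describes, not IBM's or Spark's implementation. It assumes a hypothetical three-column table and hypothetical classes `RowStore` and `ColumnStore`, and counts (a) how many stored values a projection has to touch in each layout and (b) how many rows reach the client (standing in for Spark) when a GROUP BY aggregate runs client-side versus pushed down to the "database".

```python
from collections import defaultdict

# Hypothetical sample table with columns (region, product, amount).
ROWS = [
    ("EU", "A", 10.0),
    ("EU", "B", 5.0),
    ("US", "A", 7.0),
    ("US", "B", 3.0),
    ("EU", "A", 2.0),
]
REGION, PRODUCT, AMOUNT = 0, 1, 2


class RowStore:
    """Hypothetical row layout: every scan reads all columns of every row."""

    def __init__(self, rows):
        self.rows = [tuple(r) for r in rows]
        self.values_read = 0

    def scan(self, columns):
        # Whole rows are read from storage, even if only some columns are projected.
        self.values_read += sum(len(r) for r in self.rows)
        return [tuple(r[c] for c in columns) for r in self.rows]


class ColumnStore:
    """Hypothetical columnar layout: a scan reads only the projected columns."""

    def __init__(self, rows):
        self.columns = [list(col) for col in zip(*rows)]
        self.values_read = 0

    def scan(self, columns):
        selected = [self.columns[c] for c in columns]
        self.values_read += sum(len(col) for col in selected)
        return list(zip(*selected))


def aggregate_in_client(store, key_col, value_col):
    """No pushdown: ship every projected row to the client and sum there."""
    shipped = store.scan([key_col, value_col])
    totals = defaultdict(float)
    for key, value in shipped:
        totals[key] += value
    return dict(totals), len(shipped)


def aggregate_pushed_down(store, key_col, value_col):
    """Pushdown: the 'database' groups and sums, shipping one row per group."""
    totals = defaultdict(float)
    for key, value in store.scan([key_col, value_col]):
        totals[key] += value
    result = dict(totals)
    return result, len(result)


row_store, col_store = RowStore(ROWS), ColumnStore(ROWS)
print(aggregate_in_client(row_store, REGION, AMOUNT))    # ({'EU': 17.0, 'US': 10.0}, 5)
print(aggregate_pushed_down(col_store, REGION, AMOUNT))  # ({'EU': 17.0, 'US': 10.0}, 2)
print(row_store.values_read, col_store.values_read)      # 15 10
```

Both paths produce the same totals, but in this sketch the columnar scan touches 10 stored values instead of 15, and pushing the aggregate down ships 2 rows instead of 5. In a real deployment the pushed-down work would be expressed as SQL sent to the warehouse; the counts here only illustrate the direction of the effect, not measured results.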

Speakers

Michael Wurst

Architect / Senior Software Developer, IBM Research & Development
Michael Wurst, Ph.D., is a senior software engineer and architect at the IBM Research & Development Lab in Germany. He holds a Ph.D. in computer science and is responsible for integrating open-source analytics based on R, Python, or Spark into IBM's data warehouse portfolio. Before joining IBM, Michael was a co-developer of the RapidMiner open-source data mining software. Michael has presented at a wide range of conferences, including...


Monday September 28, 2015 16:00 - 16:50
Dery/Mikszath
