Wednesday, September 30 • 15:30 - 16:20
Deploying Spark Streaming with Kafka: Gotchas and Performance Analysis - Nishkam Ravi, Cloudera


Apache Spark is an in-memory compute engine that supports real-time data processing through its streaming API. Kafka is a popular publish-subscribe messaging system used for data ingest and distribution. The performance of Spark Streaming with Kafka is not well understood. In this talk, we will discuss the different Spark Streaming APIs that can be used for receiving data from Kafka and evaluate their performance for complex event processing. We will also highlight some caveats and the corresponding workarounds for best performance. We find that Spark+Kafka yields high throughput and sub-second latencies for complex events when configured properly.


Nishkam Ravi

Software Engineer, Cloudera
Nishkam is a Software Engineer at Cloudera. His current focus is Spark and MapReduce performance. Nishkam received his B.Tech from IIT-Bombay and his PhD from Rutgers. His first job was with Intel as a compiler engineer. Prior to joining Cloudera, Nishkam was a Research Staff Member at NEC...

Wednesday September 30, 2015 15:30 - 16:20 CEST
