Wednesday, September 30 • 15:30 - 16:20
Deploying Spark Streaming with Kafka: Gotchas and Performance Analysis - Nishkam Ravi, Cloudera


Apache Spark is an in-memory compute engine that supports real-time data processing through its streaming API. Kafka is a popular publish-subscribe messaging system used for data ingestion and distribution. The performance of Spark Streaming with Kafka is not well understood. In this talk, we will discuss the different Spark Streaming APIs that can be used to receive data from Kafka and evaluate their performance for complex event processing. We will also highlight some caveats and the corresponding workarounds for best performance. We find that Spark+Kafka yields high throughput and sub-second latencies for complex events when configured properly.


Nishkam Ravi

Software Engineer, Cloudera
Nishkam is a Software Engineer at Cloudera. His current focus is Spark and MapReduce performance. Nishkam received his B.Tech from IIT-Bombay and his PhD from Rutgers. His first job was with Intel as a compiler engineer. Prior to joining Cloudera, Nishkam was a Research Staff Member at NEC Labs, where he developed an optimizing compiler for MapReduce. He has presented at numerous peer-reviewed AI and systems conferences.

Hari is a...

Wednesday, September 30, 2015 • 15:30 - 16:20
