Wednesday, September 30 • 16:30 - 17:20
Near Real Time Indexing Kafka Messages to Apache Blur using Spark Streaming - Dibyendu Bhattacharya, Pearson North America

Pearson is building a next-generation adaptive learning platform whose Near Real Time (NRT) architecture is powered by Kafka and Spark Streaming. Pearson is also building a search infrastructure to index various learner data into Apache Blur, a Lucene-based distributed search solution on Hadoop. To support NRT indexing into Apache Blur, Pearson has designed a fault-tolerant, reliable, low-level Kafka consumer for Spark Streaming. This talk will cover why Pearson chose Apache Blur and how they designed this Kafka consumer for Spark to enable NRT indexing into Blur. It will also cover the implementation details of the Spark-to-Blur connector, which performs bulk indexing into Apache Blur using Spark's Hadoop API. The Spark-Blur connector has been contributed to the Apache Blur project (http://bit.ly/1HVWk7G), and the Kafka-Spark consumer has been contributed to spark-packages (http://bit.ly/1PRNNtM).

Dibyendu Bhattacharya

Big Data Architect, Pearson North America
Holds an MS in Software Systems and a B.Tech in Computer Science. Experienced in building applications and products that leverage distributed computing and big data technologies. Working as a Big Data Architect at Pearson, building an adaptive learning platform to capture behavioral data across...

Wednesday September 30, 2015 16:30 - 17:20 CEST