Name: Near Real Time Indexing Kafka Messages to Apache Blur using Spark Streaming - Dibyendu Bhattacharya, Pearson North America
Start: 2015-09-30T16:30:00+0200
End: 2015-09-30T17:20:00+0200

Near Real Time Indexing Kafka Messages to Apache Blur using Spark Streaming - Dibyendu Bhattacharya, Pearson North America

Pearson is building a next-generation adaptive learning platform whose near-real-time (NRT) architecture is powered by Kafka and Spark Streaming. Pearson is also building a search infrastructure that indexes various kinds of learner data into Apache Blur, a Lucene-based distributed search solution on Hadoop. To support NRT indexing into Apache Blur, Pearson has designed a fault-tolerant, reliable low-level Kafka consumer for Spark Streaming. This talk will cover why Pearson chose Apache Blur and how they designed this Kafka consumer for Spark, which enabled NRT indexing into Blur. It will also cover the implementation details of the Spark-to-Blur connector, which performs bulk indexing into Apache Blur using the Spark Hadoop API. The Spark-Blur connector has been contributed to the Apache Blur project (http://bit.ly/1HVWk7G), and the Kafka-Spark consumer has been contributed to spark-packages (http://bit.ly/1PRNNtM).

Speakers

Dibyendu Bhattacharya

Big Data Architect, Pearson North America

Holds an MS in Software Systems and a B.Tech in Computer Science. Experienced in building applications and products that leverage distributed computing and big data technologies. Working as Big Data Architect at Pearson, building an adaptive learning platform to capture behavioral data across...

Apache Big Data talk pdf

Wednesday September 30, 2015 16:30 - 17:20 CEST
Krudy/Jokai

Streaming - Pipelining - IoT

Apache: Big Data 2015
