
SQL
Tuesday, September 29


Apache Phoenix: The Evolution of a Relational Database Layer over HBase - Nick Dimiduk, Hortonworks
This presentation will begin with a "State of the Union" of Apache Phoenix, a relational database layer on top of HBase for low-latency applications, including a brief overview of new and existing features. Next, the approach for transaction support, a work in progress, will be discussed. Lastly, the current means of integrating with the rest of the Hadoop ecosystem will be examined, including the vision for how this will evolve going forward.


Nick Dimiduk

Nick Dimiduk is a committer and PMC member on both Apache HBase and Apache Phoenix. He is the Release Manager for the HBase 1.1 branch and an author of the book HBase in Action, published by Manning. Nick has also contributed to a number of Apache projects around HBase, including HTrace...

Tuesday September 29, 2015 10:30 - 11:20


Adding Insert, Update, and Delete to Apache Hive - Owen O'Malley, Hortonworks
Apache Hive provides a convenient SQL query engine and table abstraction for data stored in Hadoop. Hive uses Hadoop to provide highly scalable bandwidth to the data, but until recently did not support updates, deletes, or transaction isolation. This has prevented many desirable use cases such as updating of dimension tables or doing data cleanup. We have implemented the standard SQL commands insert, update, and delete allowing users to insert new records as they become available, update changing dimension tables, repair incorrect data, and remove individual records. This also allows very low latency ingestion of streaming data from tools like Storm and Flume. Additionally, we have added ACID-compliant snapshot isolation between queries so that queries will see a consistent view of the committed transactions when they are launched.
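The commands described above are standard SQL DML. The following is an illustration only, using an in-memory SQLite database as a stand-in for Hive (Hive itself requires a transactional, ORC-backed table for these operations); the table and data are invented:

```python
import sqlite3

# SQLite stands in for Hive here; dim_customer is a made-up dimension table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT new records as they become available
conn.execute("INSERT INTO dim_customer VALUES (1, 'Ada', 'London'), (2, 'Grace', 'NYC')")

# UPDATE a changing dimension table
conn.execute("UPDATE dim_customer SET city = 'Paris' WHERE id = 1")

# DELETE an individual, incorrect record
conn.execute("DELETE FROM dim_customer WHERE id = 2")

print(conn.execute("SELECT id, name, city FROM dim_customer").fetchall())
# [(1, 'Ada', 'Paris')]
```

What the talk adds on top of these familiar statements is making them work on HDFS-resident data with snapshot isolation between concurrent queries.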


Owen O’Malley

Co-founder & Sr Architect, Hortonworks
Owen O’Malley is a co-founder and architect at Hortonworks, which develops the completely open source Hortonworks Data Platform (HDP). HDP includes Hadoop and the large ecosystem of big data tools that enterprises need for data analytics. Owen has been working on Hadoop since 2006...

Tuesday September 29, 2015 11:30 - 12:20


Drilling into Data with Apache Drill - Tugdual Grall, MapR Technologies
Apache Drill is a next-generation SQL engine for Hadoop and NoSQL. Its unique schema-free approach enables self-service data exploration with the agility that organizations need in this new era of rapidly growing and evolving data.

In this demonstration-driven talk, you will learn the key features and architecture of Apache Drill. You will also see how to get started with Drill and how to query, using SQL, data sources such as HBase, Hive, Parquet, and Avro, as well as more complex data structures stored in JSON documents.


Tugdual Grall

Technical Evangelist, MapR
Tugdual Grall is Chief Technical Evangelist EMEA at MapR. He works with European customers and developer communities to facilitate the adoption of MapR, Hadoop, and NoSQL. Before working at MapR, "Tug" was a Technical Evangelist...

Tuesday September 29, 2015 14:00 - 14:50


Hive on Spark: What It Means to You? - Xuefu Zhang, Cloudera
Apache Hive has wide use cases for batch-oriented SQL workloads for ETL and data analytics in the Hadoop ecosystem. Up to now, most of these workloads have still been executed by a 10-year-old technology, MapReduce. On the other hand, Apache Spark, a general-purpose, open-source data processing framework, is positioned to replace MapReduce with faster data processing and efficient memory utilization.

The Hive on Spark initiative introduced Spark as Hive's new execution engine, providing faster SQL on Hadoop while maintaining Hive's feature richness. With a joint effort from the Hive community and feedback from early adopters and beta users, Hive on Spark is ready for production deployment!

This presentation will share the motivation, architecture, deployment practices, and performance tuning of Hive on Spark. A live demo will conclude the presentation.


Xuefu Zhang

Software Engineer, Uber Technologies
Xuefu Zhang has over 10 years' experience in software development. Earlier this year he joined Uber as a software engineer from Cloudera, where he spent his main efforts on Apache Hive and Pig. He also worked on the Hadoop team at Yahoo when the majority of the development on...

Tuesday September 29, 2015 15:00 - 15:50


Help Build the Most Advanced SQL Database on Hadoop: HAWQ - Lei Chang, Pivotal
HAWQ is a massively parallel processing SQL engine sitting on top of HDFS. As a hybrid of an MPP database and Hadoop, it inherits the merits of both. It is standard-SQL compliant, extremely fast and scalable, and, unlike other SQL engines on Hadoop, fully transactional. HAWQ is currently being proposed as an Apache incubating project.

In this talk, Dr. Lei Chang will give an overview of the HAWQ architecture and the major areas soliciting contributions from the open source community. He will also introduce the easiest way for contributors to work with HAWQ developers to bring their innovative ideas into the HAWQ kernel.


Lei Chang

Engineering Director for Apache HAWQ, Pivotal
Dr. Lei Chang is the Engineering Director for Apache HAWQ at Pivotal Inc. He is the co-creator and architect of HAWQ. Before joining Pivotal, he was a senior research scientist at EMC. His main research areas include parallel databases, data analytics, and cloud computing. He has published...

Tuesday September 29, 2015 16:00 - 16:50
Wednesday, September 30


Apache Kylin - Extreme OLAP engine for Hadoop - Seshu Adunuthula, eBay Cloud Services
Apache Kylin is an open source distributed analytics engine, contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. Kylin’s pre-built MOLAP cubes, distributed architecture, and high concurrency help users analyze multidimensional queries through Kylin’s SQL interface as well as via other BI tools such as Tableau and MicroStrategy. Kylin is successfully deployed and used at eBay for a variety of production use cases, including web traffic analysis and geographical expansion analysis. It was open sourced on Oct 1, 2014, has 320 stars and 125 forks, and was accepted as an Apache Incubator project on Nov 25, 2014.
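The pre-built MOLAP cubes mentioned above can be sketched in a few lines: for each subset of the dimensions (each subset is a "cuboid"), the measure is pre-aggregated so that group-by queries become simple lookups. This is a toy illustration, not Kylin code; the dimensions and fact rows are invented:

```python
from itertools import combinations
from collections import defaultdict

# Invented fact rows: two dimensions (country, device) and one measure (clicks).
facts = [
    {"country": "US", "device": "mobile", "clicks": 10},
    {"country": "US", "device": "desktop", "clicks": 5},
    {"country": "DE", "device": "mobile", "clicks": 7},
]
dims = ("country", "device")

# Pre-compute every cuboid: one aggregate table per subset of dimensions.
cube = {}
for r in range(len(dims) + 1):
    for cuboid in combinations(dims, r):
        agg = defaultdict(int)
        for row in facts:
            key = tuple(row[d] for d in cuboid)
            agg[key] += row["clicks"]
        cube[cuboid] = dict(agg)

# "SELECT country, SUM(clicks) ... GROUP BY country" is answered from a cuboid:
print(cube[("country",)])  # {('US',): 15, ('DE',): 7}
print(cube[()])            # grand total: {(): 22}
```

The cost of this approach is the space and build time for all cuboids, which is why Kylin distributes the cube build over Hadoop.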


Seshu Adunuthula

Sr. Director of Analytics Infrastructure, eBay
Seshu Adunuthula is Sr Director of Analytics Infrastructure at eBay responsible for managing some of the world’s largest deployments of Hadoop, Teradata and ETL Ingest platforms. He is an industry veteran with over 20 years of Distributed Computing and Analytics Experience. Prior...

Qianhao Zhou

Qianhao Zhou is a Sr. Software Engineer at eBay CCOE and a core developer of Apache Kylin at eBay, working on different components of Kylin, including the Job Engine and Streaming Engine. He is now working on Kylin on Spark to enable fast cubing on Spark for the Kylin cube build process.

Wednesday September 30, 2015 10:00 - 10:50


Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA - Christian Tzolov, Pivotal
In the Big Data space, two powerful data processing tools complement each other: HAWQ and Geode. HAWQ is a scalable OLAP SQL-on-Hadoop system, while Geode is an OLTP-like, in-memory data grid and event processing system. This presentation will show different approaches that allow integration and data exchange between HAWQ and Geode. It will walk you through the implementation of the different integration strategies, demonstrating the power of combining various OSS technologies for processing big and fast data, and will touch upon OSS technologies such as HAWQ, Geode, Spring XD, Hadoop, and Spring Boot.


Christian Tzolov

Pivotal Inc
Christian Tzolov is a technical architect at Pivotal and a Big Data and Hadoop specialist contributing to various open source projects. In addition to being an Apache® Committer and Apache Crunch PMC Member, he has spent over a decade working with various Java and Spring projects and has led...

Wednesday September 30, 2015 11:00 - 11:50


Introduction to Pivotal HAWQ[1]: A Deep Dive Into the Architecture of an Advanced SQL Engine - Caleb Welton, Pivotal
The Pivotal HAWQ[1] project, planned for incubation as an Apache project, is designed to provide a highly performant, ANSI SQL-compliant query engine supporting a sophisticated resource management model, transactional DML and DDL operations, window functions, grouping sets, complex sub-queries, common table expressions, and strong extensibility capabilities for customized analytics and machine learning.

In this session we will present an overview of the product features, describe the key architectural components, and walk through the overall project structure.

[1] Project incubation and Apache project name pending approval by the Apache Foundation.
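Two of the features named in the abstract, common table expressions and window functions, are standard SQL and can be illustrated without HAWQ itself. The sketch below uses SQLite as a neutral stand-in (window functions need SQLite 3.25+, which recent Python builds bundle), with an invented sales table:

```python
import sqlite3

# SQLite stands in for HAWQ here; this is not HAWQ code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 10), ('east', 30), ('west', 20);
""")

rows = conn.execute("""
    WITH regional AS (                                 -- common table expression
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total,
           RANK() OVER (ORDER BY total DESC) AS rnk    -- window function
    FROM regional
    ORDER BY rnk
""").fetchall()
print(rows)  # [('east', 40, 1), ('west', 20, 2)]
```

HAWQ's contribution is executing such queries in parallel across an MPP cluster over HDFS-resident data, not the SQL syntax itself.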


Caleb Welton

Director, Pivotal
Caleb Welton is Director for SQL on Hadoop at Pivotal covering the Pivotal HAWQ database. He has spent the last 18 years developing database technology for Oracle, Greenplum, EMC and Pivotal. In addition to his contributions in database technology he is one of the founding members...

Wednesday September 30, 2015 12:00 - 12:50


Apache Trafodion (incubating) brings operational workloads to Hadoop - Rohit Jain, Esgyn

Trafodion is a world-class transactional SQL RDBMS running on HBase/Hadoop, currently in Apache incubation.

In this talk we will discuss:

  • How operational workloads are different from BI and analytical workloads
  • The operational (OLTP & Operational Data Store) use cases Trafodion addresses
  • Why Trafodion is the right solution for these use cases: what the recipe is for a world-class database engine, and how Trafodion implements the ingredients that make up that recipe:
  1. Time, money, and talent!
  2. World-class query optimizer
  3. World-class parallel data flow execution engine
  4. World-class distributed transaction management system
  • Other important aspects such as performance, scale, availability, and future directions


Rohit Jain

CTO, Esgyn
Rohit Jain is Co-Founder and CTO at Esgyn, an open source database company. Rohit provided the vision behind Apache Trafodion, an enterprise-class MPP SQL Database for Big Data, donated to the Apache Software Foundation by HP in 2015. A veteran database technologist over the past...

Wednesday September 30, 2015 14:30 - 15:20