
SQL
Tuesday, September 29


Apache Phoenix: The Evolution of a Relational Database Layer over HBase - Nick Dimiduk, Hortonworks
This presentation will begin with a "State of the Union" of Apache Phoenix, a relational database layer on top of HBase for low-latency applications, including a brief overview of new and existing features. Next, the approach for transaction support, a work in progress, will be discussed. Lastly, the current means of integrating with the rest of the Hadoop ecosystem will be examined, including the vision for how this will evolve going forward.


Nick Dimiduk

Nick Dimiduk is a committer and PMC member on both Apache HBase and Apache Phoenix. He is the Release Manager for the HBase 1.1 branch and an author of the book HBase in Action, published by Manning. Nick has also contributed to a number of Apache projects around HBase, including HTrace...

Tuesday September 29, 2015 10:30 - 11:20


Adding Insert, Update, and Delete to Apache Hive - Owen O'Malley, Hortonworks
Apache Hive provides a convenient SQL query engine and table abstraction for data stored in Hadoop. Hive uses Hadoop to provide highly scalable bandwidth to the data, but until recently did not support updates, deletes, or transaction isolation. This has prevented many desirable use cases such as updating of dimension tables or doing data cleanup. We have implemented the standard SQL commands insert, update, and delete allowing users to insert new records as they become available, update changing dimension tables, repair incorrect data, and remove individual records. This also allows very low latency ingestion of streaming data from tools like Storm and Flume. Additionally, we have added ACID-compliant snapshot isolation between queries so that queries will see a consistent view of the committed transactions when they are launched.
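The commands described above are standard SQL DML. The following is an illustration only, using an in-memory SQLite database as a stand-in for Hive (Hive itself requires a transactional, ORC-backed table for these operations); the table and data are invented:

```python
import sqlite3

# SQLite stands in for Hive here; dim_customer is a made-up dimension table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT new records as they become available
conn.execute("INSERT INTO dim_customer VALUES (1, 'Ada', 'London'), (2, 'Grace', 'NYC')")

# UPDATE a changing dimension table
conn.execute("UPDATE dim_customer SET city = 'Paris' WHERE id = 1")

# DELETE an individual, incorrect record
conn.execute("DELETE FROM dim_customer WHERE id = 2")

print(conn.execute("SELECT id, name, city FROM dim_customer").fetchall())
# [(1, 'Ada', 'Paris')]
```

What the talk adds on top of these familiar statements is making them work on HDFS-resident data with snapshot isolation between concurrent queries.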


Owen O’Malley

Co-founder & Sr Architect, Hortonworks
Owen O’Malley is a co-founder and architect at Hortonworks, which develops the completely open source Hortonworks Data Platform (HDP). HDP includes Hadoop and the large ecosystem of big data tools that enterprises need for data analytics. Owen has been working on Hadoop since 2006...

Tuesday September 29, 2015 11:30 - 12:20


Drilling into Data with Apache Drill - Tugdual Grall, MapR Technologies
Apache Drill is a next-generation SQL engine for Hadoop and NoSQL. Its unique schema-free approach enables self-service data exploration with the agility that organizations need in this new era of rapidly growing and evolving data.

In this demonstration-driven talk, you will learn the key features and architecture of Apache Drill. You will also see how to get started with Drill and how to query, using SQL, data sources such as HBase, Hive, Parquet, and Avro, as well as more complex data structures stored in JSON documents.


Tugdual Grall

Technical Evangelist, MapR
Tugdual Grall is Chief Technical Evangelist EMEA at MapR. He works with European customers and developer communities to facilitate the adoption of MapR, Hadoop, and NoSQL. Before working at MapR, "Tug" was a Technical Evangelist...

Tuesday September 29, 2015 14:00 - 14:50


Hive on Spark: What It Means to You? - Xuefu Zhang, Cloudera
Apache Hive has wide use cases for batch-oriented SQL workloads for ETL and data analytics in the Hadoop ecosystem. Up to now, most of these workloads have still been executed by a 10-year-old technology, MapReduce. On the other hand, Apache Spark, a general-purpose, open-source data processing framework, is positioned to replace MapReduce with faster data processing and efficient memory utilization.

The Hive on Spark initiative introduced Spark as Hive's new execution engine, providing faster SQL on Hadoop while maintaining Hive's feature richness. With a joint effort from the Hive community and feedback from early adopters and beta users, Hive on Spark is ready for production deployment!

This presentation will share the motivation, architecture, deployment practices, and performance tuning of Hive on Spark. A live demo will conclude the presentation.


Xuefu Zhang

Software Engineer, Uber Technologies
Xuefu Zhang has over 10 years' experience in software development. Earlier this year he joined Uber as a software engineer from Cloudera, where he spent his main efforts on Apache Hive and Pig. He also worked on the Hadoop team at Yahoo when the majority of the development on...

Tuesday September 29, 2015 15:00 - 15:50


Help Build the Most Advanced SQL Database on Hadoop: HAWQ - Lei Chang, Pivotal
HAWQ is a massively parallel processing SQL engine sitting on top of HDFS. As a hybrid of an MPP database and Hadoop, it inherits the merits of both. It is standard-SQL compliant, extremely fast and scalable, and, unlike other SQL engines on Hadoop, fully transactional. HAWQ is currently being proposed as an Apache incubating project.

In this talk, Dr. Lei Chang will give an overview of the HAWQ architecture and the major areas soliciting contributions from the open source community. He will also introduce the easiest way for contributors to work with HAWQ developers to bring their innovative ideas into the HAWQ kernel.


Lei Chang

Engineering Director for Apache HAWQ, Pivotal
Dr. Lei Chang is the Engineering Director for Apache HAWQ at Pivotal Inc. He is the co-creator and architect of HAWQ. Before joining Pivotal, he was a senior research scientist at EMC. His main research areas include parallel databases, data analytics, and cloud computing. He has published...

Tuesday September 29, 2015 16:00 - 16:50
Wednesday, September 30


Apache Kylin - Extreme OLAP engine for Hadoop - Seshu Adunuthula, eBay Cloud Services
Apache Kylin is an open source distributed analytics engine, contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. Kylin’s pre-built MOLAP cubes, distributed architecture, and high concurrency help users analyze multidimensional queries through Kylin’s SQL interface as well as via other BI tools such as Tableau and MicroStrategy. Kylin is successfully deployed and used at eBay for a variety of production use cases, including web traffic analysis and geographical expansion analysis. It was open sourced on Oct 1, 2014, has 320 stars and 125 forks, and was accepted as an Apache Incubator project on Nov 25, 2014.
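The pre-built MOLAP cubes mentioned above can be sketched in a few lines: for each subset of the dimensions (each subset is a "cuboid"), the measure is pre-aggregated so that group-by queries become simple lookups. This is a toy illustration, not Kylin code; the dimensions and fact rows are invented:

```python
from itertools import combinations
from collections import defaultdict

# Invented fact rows: two dimensions (country, device) and one measure (clicks).
facts = [
    {"country": "US", "device": "mobile", "clicks": 10},
    {"country": "US", "device": "desktop", "clicks": 5},
    {"country": "DE", "device": "mobile", "clicks": 7},
]
dims = ("country", "device")

# Pre-compute every cuboid: one aggregate table per subset of dimensions.
cube = {}
for r in range(len(dims) + 1):
    for cuboid in combinations(dims, r):
        agg = defaultdict(int)
        for row in facts:
            key = tuple(row[d] for d in cuboid)
            agg[key] += row["clicks"]
        cube[cuboid] = dict(agg)

# "SELECT country, SUM(clicks) ... GROUP BY country" is answered from a cuboid:
print(cube[("country",)])  # {('US',): 15, ('DE',): 7}
print(cube[()])            # grand total: {(): 22}
```

The cost of this approach is the space and build time for all cuboids, which is why Kylin distributes the cube build over Hadoop.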


Seshu Adunuthula

Sr. Director of Analytics Infrastructure, eBay
Seshu Adunuthula is Sr Director of Analytics Infrastructure at eBay responsible for managing some of the world’s largest deployments of Hadoop, Teradata and ETL Ingest platforms. He is an industry veteran with over 20 years of Distributed Computing and Analytics Experience. Prior...

Qianhao Zhou

Qianhao Zhou is a Sr. Software Engineer at eBay CCOE and a core developer of Apache Kylin at eBay, working on different components of Kylin, including the Job Engine and Streaming Engine. He is now working on Kylin on Spark to enable fast cubing on Spark for the Kylin cube build process.

Wednesday September 30, 2015 10:00 - 10:50


Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA - Christian Tzolov, Pivotal
In the Big Data space, two powerful data processing tools complement each other: HAWQ and Geode. HAWQ is a scalable OLAP SQL-on-Hadoop system, while Geode is an OLTP-like, in-memory data grid and event processing system. This presentation will show different approaches that allow integration and data exchange between HAWQ and Geode. It will walk you through the implementation of the different integration strategies, demonstrating the power of combining various OSS technologies for processing big and fast data, and will touch upon OSS technologies such as HAWQ, Geode, Spring XD, Hadoop, and Spring Boot.


Christian Tzolov

Pivotal Inc
Christian Tzolov is a technical architect at Pivotal and a Big Data and Hadoop specialist contributing to various open source projects. In addition to being an Apache® Committer and Apache Crunch PMC Member, he has spent over a decade working with various Java and Spring projects and has led...

Wednesday September 30, 2015 11:00 - 11:50


Introduction to Pivotal HAWQ[1]: A Deep Dive Into the Architecture of an Advanced SQL Engine - Caleb Welton, Pivotal
The Pivotal HAWQ[1] project, planned for incubation as an Apache project, is designed to provide a highly performant, ANSI SQL-compliant query engine supporting a sophisticated resource management model, transactional DML and DDL operations, window functions, grouping sets, complex sub-queries, common table expressions, and strong extensibility capabilities for customized analytics and machine learning.

In this session we will present an overview of the product features, describe the key architectural components, and walk through the overall project structure.

[1] Project incubation and Apache project name pending approval by the Apache Foundation.
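Two of the features named in the abstract, common table expressions and window functions, are standard SQL and can be illustrated without HAWQ itself. The sketch below uses SQLite as a neutral stand-in (window functions need SQLite 3.25+, which recent Python builds bundle), with an invented sales table:

```python
import sqlite3

# SQLite stands in for HAWQ here; this is not HAWQ code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 10), ('east', 30), ('west', 20);
""")

rows = conn.execute("""
    WITH regional AS (                                 -- common table expression
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total,
           RANK() OVER (ORDER BY total DESC) AS rnk    -- window function
    FROM regional
    ORDER BY rnk
""").fetchall()
print(rows)  # [('east', 40, 1), ('west', 20, 2)]
```

HAWQ's contribution is executing such queries in parallel across an MPP cluster over HDFS-resident data, not the SQL syntax itself.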


Caleb Welton

Director, Pivotal
Caleb Welton is Director for SQL on Hadoop at Pivotal covering the Pivotal HAWQ database. He has spent the last 18 years developing database technology for Oracle, Greenplum, EMC and Pivotal. In addition to his contributions in database technology he is one of the founding members...

Wednesday September 30, 2015 12:00 - 12:50


Apache Trafodion (incubating) brings operational workloads to Hadoop - Rohit Jain, Esgyn

Trafodion is a world-class transactional SQL RDBMS running on HBase/Hadoop, currently in Apache incubation.

In this talk we will discuss:

  • How operational workloads are different from BI and analytical workloads
  • The operational (OLTP & Operational Data Store) use cases Trafodion addresses
  • Why Trafodion is the right solution for these use cases: what the recipe is for a world-class database engine, and how Trafodion implements the ingredients that make up that recipe:
  1. Time, money, and talent!
  2. World-class query optimizer
  3. World-class parallel data flow execution engine
  4. World-class distributed transaction management system
  • Other important aspects such as performance, scale, availability, and future directions


Rohit Jain

CTO, Esgyn
Rohit Jain is Co-Founder and CTO at Esgyn, an open source database company. Rohit provided the vision behind Apache Trafodion, an enterprise-class MPP SQL Database for Big Data, donated to the Apache Software Foundation by HP in 2015. A veteran database technologist over the past...

Wednesday September 30, 2015 14:30 - 15:20