DevOps - Distribution - Testing
Monday, September 28


Apache Bigtop: Where it Came From and Where It's Going - Nate DAmico
Following the mantra, “best tool for the job,” you seldom use a single open source tool for data processing. The more tools you use, however, the more you come to realize how difficult it is to manage dependencies and configure packages across components, projects, and versions. This is where the Apache Bigtop project and community come in. Come get an overview of the origins of Apache Bigtop, why organizations like Cloudera, WANdisco, and Amazon Web Services rely on Bigtop for their own big data component distribution efforts, and where the project is headed after its summer 1.0 release.


Nate DAmico

Nate has been working in the enterprise and mobile software industry for 14 years in various capacities. In recent years his tech efforts have focused on mobile computer vision as well as the rise of the consumerization of IT Operations. Three years ago he started Reactor8...

Monday September 28, 2015 10:30 - 11:20


One-Click Hadoop Clusters - Anywhere (Using Docker) - Janos Matyas, Hortonworks
This session presents the provisioning of Hadoop clusters running inside Docker containers in different environments, whether public or private cloud or bare metal. We share the same processes, automation and zero-configuration approach across all environments and allow users to spin up SLA-policy-based autoscaling clusters of arbitrary size in minutes, all built exclusively on open source components. We will discuss the architecture, the main building blocks (Docker, Consul, Apache Ambari, YARN) and the tools we made available (API, CLI and UI). The session will end with a quick demonstration. Be your own Hadoop as a Service provider.


Janos Matyas

Janos is a Sr. Director of Engineering at Hortonworks and former CTO at SequenceIQ (acquired by Hortonworks) - a young startup with the mission statement of simplifying the provisioning, development and SLA policy based autoscaling on Hadoop. Before co-founding SequenceIQ he was a...

Monday September 28, 2015 11:30 - 12:20


Dynamics of Benchmarking Distributed Key-Value (KV) Store (HBase, Cassandra, Accumulo, Hypertable, Aerospike) for Hosting TeraBytes of Data - Pracheer Agarwal, Inmobi & Kunal Gautam, Inmobi
Identifying a KV store to host terabytes of data, from a wide range of choices and for a given set of use cases, is a daunting proposition. Even after getting past that fearsome first step and narrowing down the candidates, it is non-trivial to reconcile the actual results of benchmarking experiments with the expected results. There are multiple variables at play, and it is often unclear how they interact at run time.
In this talk, we present our experience and methodology for analyzing and effectively benchmarking a distributed KV store. This involves monitoring and characterizing key server parameters such as RAM, CPU, network, storage, page cache, I/O scheduler, JVM size and GC tuning, and reasoning logically about their effects on the overall performance and capabilities of the underlying KV store. These parameters were monitored with utilities such as iostat, dstat, iftop, jstat and cachestat.

Kunal Gautam

Senior Software Engineer, Inmobi
Kunal Gautam is a strong thought leader in the field of big data and has hands-on experience with the Hadoop framework. He is very talented and has received several awards for contributing ideas and turning them into working products. Kunal has been working with distributed architectures over...

Monday September 28, 2015 14:00 - 14:50


Leveraging Ambari to Build Comprehensive Management UIs For Your Hadoop Applications - Christian Tzolov, Pivotal
This presentation will demonstrate how to leverage modern HTML5 technologies with the flexibility of Apache Ambari to build comprehensive, responsive and attractive management interfaces for your Hadoop applications. In the process we will walk you through the reference implementation of a management interface for a SQL-on-Hadoop application and integrate it with Apache Ambari. We will share our experience in using technologies like Google Polymer, Spring Boot and Apache Ambari.

Christian Tzolov

Pivotal Inc
Christian Tzolov is a Pivotal technical architect and a big data and Hadoop specialist who contributes to various open source projects. In addition to being an Apache® Committer and Apache Crunch PMC Member, he has spent over a decade working with various Java and Spring projects and has led...

Monday September 28, 2015 15:00 - 15:50


Testing Big Data Pipelines Made Super Easy - Pallavi Rao, Inmobi & Pavan Kumar Kolamuri, Inmobi
Your company has developed a system that crunches humongous amounts of data from multiple data sources. It involves multiple, varied processing modules, each of which has been well tested individually. But when you try to deploy these modules and connect them in an integration or staging environment, you face issues. Debugging these errors and re-testing the fixes requires you to set up a mirror environment, which is time-consuming. So you write complex integration tests and equally complex setup scripts. Apache Falcon and Falcon Unit to the rescue! While Apache Falcon alleviates some of the problems of pipeline orchestration, Falcon Unit, a feature of Falcon, helps users test their entire pipeline and data lifecycle without even setting up a test environment. This talk will outline the capabilities of Falcon Unit and how it helps users test data pipelines early in the development phase.


Pallavi Rao

Pallavi is an Architect at InMobi. She has been working on big data technologies for nearly 4 years now. She has deep knowledge of the Hadoop ecosystem, especially YARN, Pig, Oozie, HBase, Hive and Storm. For the past 6 months she has been actively contributing to Apache Falcon...

Monday September 28, 2015 16:00 - 16:50
Tuesday, September 29


Synthetic Data Generation for Realistic Analytics Examples and Testing - RJ Nowling, Red Hat
Big Data users are faced with an enormous gap between trivial tutorial applications and real-world analytics pipelines. Word count and TeraSort have limited value as blueprints and may not exercise enough of the data processing stack to be useful for testing deployments. Since real data are typically encumbered by privacy or intellectual property concerns, tutorials and test cases often use small or unrepresentative data sets. Generative models can enable a new class of realistic example and test applications by synthesizing rich and complex data sets. Furthermore, synthetic data can be scaled from a single laptop to data centers. We will present data generators, such as BigPetStore from Apache Bigtop, influenced by data we’ve analyzed in the Emerging Technologies team at Red Hat. We will also discuss realistic example applications and their use for smoke-testing deployments.


RJ Nowling

Software Engineer, Red Hat, Inc.
RJ Nowling is a Software Engineer in Emerging Technology at Red Hat, Inc., where he is part of a data science team that consults for internal customers. RJ is a committer on Apache Bigtop, a contributor to Apache Spark, and co-lead of the BigPetStore family of big data example applications...

Tuesday September 29, 2015 10:30 - 11:20


How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning - Evans Ye, Trend Micro
Apache Bigtop, as an open source Hadoop distribution, focuses on developing packaging, testing and deployment solutions that help infrastructure engineers build their own customized big data platform as easily as possible. However, packages deployed in production require a solid CI testing framework to ensure their quality, and the many Hadoop components must also be verified to work well together. In this presentation, we'll talk about how Bigtop delivers its containerized CI framework, which can be directly replicated by Bigtop users. The core innovation here is the newly developed Hadoop provisioner, which leverages Docker for infrastructure automation. The talk covers the technical details of the Bigtop Hadoop provisioner, the hierarchy of Docker images we designed, and several components we developed, such as Bigtop Toolchain, to achieve build automation.

Evans Ye

ASF member, Apache Bigtop Committer/PMC member/Former VP, Director of Taiwan Data Engineering Association, Apache Software Foundation
Yu-Hsin Yeh (Evans Ye) is a former VP, and currently a committer and PMC member, of Apache Bigtop. He loves to code, automate things, and tackle big data challenges. Aside from engineering, he is also enthusiastic about giving talks to share software innovations and cutting-edge technologies...

Tuesday September 29, 2015 11:30 - 12:20