
Use Cases
Tuesday, September 29


Data Ethics - Louis Suárez-Potts, Age of Peers, Inc.
I examine the ethics of Big Data in several ongoing projects and the possibilities of engaging subject communities in the processes and projects. As background: The ethics of data, especially "Big Data," can be considered as the linked ethics of gathering the data and then interpreting it. Big Data--the data and interpretation dyad--complicates this otherwise dull as dishwater process in part by obscuring acquisition and reading it as discovery, and in part by abstracting the particular elements making up the data even as those may refer to persons and their doings. That is: Were an experiment conducted on any population, the persons objectified would likely have to sign their consent. This talk looks to ways to engage (and so form) communities as subjects and not just objects of Big Data projects. Apache's important projects are key here.


Louis Suárez-Potts

Community Strategist, Age of Peers, Inc.
Louis Suárez-Potts is the community strategist for Age of Peers, a consultancy he co-founded in 2011. He also participates on the Project Membership Committee for Apache OpenOffice. From 2000 to 2011, Suárez-Potts was the Community Manager for OpenOffice.org, a role that entailed...

Tuesday September 29, 2015 10:30 - 11:20


Deriving Business Value From Large Image Collections on Hadoop - Michael Natusch, Pivotal
Image collections are rapidly growing in size. Efficient image management is necessary for large image collections to ensure easy searching and browsing. In this talk, we will describe how large image collections can be efficiently managed by presenting a content-based image retrieval (CBIR) system built on Hadoop. A CBIR system takes as an input a query image and returns images depicting content most similar to the input query image. Putting together a CBIR system involves building many components: the image collection, a feature extractor, and machine learning models for mining similar images. In this talk, we will present how a CBIR system can be easily and efficiently realized using Hadoop and SQL on Hadoop technologies. The system we present here discovers latent visual topics associated with each image and retrieves images based on similarity between corresponding visual topics.
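The retrieval step the abstract describes — ranking images by the similarity of their latent visual topic vectors — can be sketched in a few lines. This is an illustrative plain-Python sketch with hypothetical topic data, not the speakers' actual Hadoop pipeline:

```python
import math

def cosine(a, b):
    """Cosine similarity between two topic-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_topics, collection, top_k=3):
    """Rank the collection by topic similarity to the query image."""
    scored = [(name, cosine(query_topics, topics))
              for name, topics in collection.items()]
    return sorted(scored, key=lambda s: -s[1])[:top_k]

# Hypothetical per-image topic distributions (e.g. from a topic model
# over visual words extracted by the feature extractor).
images = {
    "beach.jpg":  [0.7, 0.2, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "coast.jpg":  [0.6, 0.3, 0.1],
}
print(retrieve([0.65, 0.25, 0.10], images, top_k=2))
```

At scale, the scoring step is exactly the kind of embarrassingly parallel scan that maps well onto SQL-on-Hadoop engines.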


Tuesday September 29, 2015 11:30 - 12:20


More Data, More Problems - A Practical Guide to Testing on Hadoop - Michael Miklavcic, Hortonworks
Just because the data is big doesn't mean you can't test. Automating tests for your Hive, Pig, and MapReduce code is among the most valuable time investments you can make in your SDLC on Hadoop. We provide a soup-to-nuts, practical exposition on testing a variety of Hadoop application types so you can get better results faster.
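One pattern the talk's premise suggests: keep map and reduce logic in pure functions so it can be unit-tested without a cluster. A minimal sketch with a hypothetical wordcount job (illustrative; not the speaker's code):

```python
def wordcount_map(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    return [(w.lower(), 1) for w in line.split()]

def wordcount_reduce(word, counts):
    """Reduce phase: sum the counts emitted for one key."""
    return (word, sum(counts))

# A plain unit test -- no cluster, no HDFS, just the logic under test.
def test_wordcount():
    assert wordcount_map("Big big data") == [("big", 1), ("big", 1), ("data", 1)]
    assert wordcount_reduce("big", [1, 1]) == ("big", 2)

test_wordcount()
print("ok")
```

The same separation is what lets frameworks like MRUnit (for Java MapReduce) exercise mapper and reducer logic in milliseconds instead of minutes.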


Michael Miklavcic

Systems Architect, Hortonworks
Michael is a software engineer with over ten years of industry experience and has been a Systems Architect with Hortonworks for the past two years. He is a code contributor to the Apache Falcon project and works directly with clients to implement solutions using Hadoop. For over 2...

Tuesday September 29, 2015 14:00 - 14:50


Unified Analytics @InMobi Through Apache Lens - Amareshwari Sriramadasu, Inmobi
Apache Lens enables multi-dimensional queries in a unified way over datasets stored in multiple warehouses. It provides a logical data cube abstraction and executes queries where the data resides. In a typical enterprise, multiple data warehouses co-exist, since a single one does not address all workload requirements in a cost-effective way. Apache Lens unifies the underlying stores and allows multiple execution engines to access the data, picking the right engine for each query at execution time. In this talk, the speakers will share their experience of running Apache Lens in production and discuss upcoming features in Apache Lens.


Amareshwari Sriramadasu

Architect, Inmobi
Amareshwari is currently working as an Architect in the data team at Inmobi, where she works on Hadoop and related projects for data collection and analytics. She is a member of the ASF, the Apache Incubator PMC, the Apache Hadoop PMC, the Apache Lens PMC and the Apache Falcon PMC, and is an Apache Hive committer...

Tuesday September 29, 2015 15:00 - 15:50


Collecting User's Data in a Socially-Responsible Manner - Konark Modi, Cliqz
Data from users is needed to build great products. Google, Facebook, and DoubleClick would not be able to offer their services unless they had tons of data.

Cliqz is no exception: it needs massive amounts of query logs, browsing patterns, etc. to build its search engine and phishing protection. Such data is also collected by other search engines like Google, Yandex and Bing. The industry standard is to send raw data and sanitize and filter it at the backend. However, this approach implies absolute trust in the company's good intentions, and there is always the risk of a data breach or a government subpoena.

Cliqz developed a framework called 'Human Web' that combines algorithms and an open infrastructure to collect data anonymously by removing any trace of user identifiability. The 'Human Web' will be open-sourced to encourage others to collect users' data in a safer manner.
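To make the contrast with backend-side sanitization concrete, here is a toy client-side sketch: strip all identifying fields before anything leaves the device, and drop queries so rare they could identify a user (a k-anonymity-style filter). This is purely illustrative — it is not the actual Human Web algorithm, and the field names are hypothetical:

```python
from collections import Counter

def sanitize(records, k=2):
    """Client-side sketch: remove user identifiers and drop queries that
    appear fewer than k times (too rare -- potentially identifying)."""
    queries = [r["query"] for r in records]
    freq = Counter(queries)
    # Only the query text itself survives -- no user id, no IP address.
    return [{"query": q} for q in queries if freq[q] >= k]

logs = [
    {"user": "u1", "ip": "1.2.3.4", "query": "weather berlin"},
    {"user": "u2", "ip": "5.6.7.8", "query": "weather berlin"},
    {"user": "u3", "ip": "9.9.9.9", "query": "john doe medical record"},
]
print(sanitize(logs))  # the unique, potentially identifying query is dropped
```

The point of the design: the raw identifiers never reach the backend at all, so a breach or subpoena of the server yields nothing identifying.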


Konark Modi

Software Engineer, Cliqz
Konark Modi works as a Software Engineer with Cliqz on projects related to data collection and safe web principles. Cliqz is a novel search engine embedded in the browser with a very strong focus on privacy. Cliqz for Firefox (Germany only) has more than 3M users with more than 500K...

Tuesday September 29, 2015 16:00 - 16:50


BoFs: Next Generation Data Processing

This session is an informal meeting about post-MapReduce frameworks such as Spark or Flink. We will also talk about the ecosystem, architectural patterns (e.g. Lambda and Kappa), programming (Scala et al.), and abstraction/SQL frameworks on general-purpose data engines.

After hours of listening, it is about time that you had a chance to talk. Share your thoughts, ideas and questions. Remember, there is no such thing as a stupid question. This is also the perfect place to ask questions about session topics that came up after a session ended.


1. Recap and introduction to the topic

2. General discussion in the big room

3. Fork into sub-BoFs in smaller rooms on demand (if you want to discuss a certain topic in detail and take a deep dive into the technical details, we invite you to gather some people and create a “sub-BoF”)



Stefan Papp

I am an Apache Hadoop Evangelist and my focus is data processing on distributed platforms. My core interests right now are Apache Flink and Apache Spark. In addition, I love to program; Scala is my favourite language, as it is perfectly designed for distributed data processing...

Tuesday September 29, 2015 17:00 - 19:00
Wednesday, September 30


How We Use Kappa Architecture in all of Our Projects - Juantomas Garcia, Aspgems
As the CDO of ASPGems, I have used Kappa Architecture in every project over the last 15 months. We want to explain what Kappa Architecture is, how we use it, and what kinds of problems we are solving in real projects, from small ones to very big ones (millions of records per second).
We will also explain why Scala + Kafka + Spark are the key technologies that help deliver successful projects.
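The core idea of the Kappa Architecture is a single immutable event log as the source of truth: any view of the data can be rebuilt by replaying the log from the beginning. A minimal sketch in plain Python (in production the log would live in Kafka and the fold would be a Spark job; the event shapes here are hypothetical):

```python
def replay(log, fold, initial):
    """Rebuild a materialized view by replaying the immutable event log."""
    state = initial
    for event in log:
        state = fold(state, event)
    return state

# Immutable, append-only event log (the Kafka topic, in spirit).
events = [("deposit", 100), ("deposit", 50), ("withdraw", 30)]

def balance(state, event):
    """Fold one event into the running view."""
    kind, amount = event
    return state + amount if kind == "deposit" else state - amount

# A new version of the processing logic just replays the same log
# from offset 0 -- no separate batch layer to keep in sync.
print(replay(events, balance, 0))  # 120
```

This is what distinguishes Kappa from Lambda: there is one code path, and reprocessing is just replaying.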


Juantomas Garcia

Data Solutions Manager, Open Sistemas
President of Hispalinux (Spanish Linux User Group) (1999-2007). Author of "La Pastilla Roja" (2004), the first book in Spanish about free software. More than 200 lectures around the world. Now CDO of Open Sistemas and advocate of Apache Spark and Kappa Architecture. Organizer of...

Wednesday September 30, 2015 10:00 - 10:50


Profiting From Apache Projects Without Losing Your Soul - Shane Curcuru, Apache Software Foundation
Does your company want to capitalize on the Apache brand? Are you interested in seeing how closely you can tie your marketing into the latest Apache projects? Do you recognize the importance of supporting the Apache ecosystem, not just with code contributions but other actions? As VP of Brand Management for all Apache projects, Shane can help show business and technical leaders some of the ways they can respectfully and successfully market and position their own services and products in relation to Apache project brands. The key message is: Apache project governance is independent; but we are happy to have businesses build their software and services on any Apache software products. You may incorporate Apache brands within your brands, but in specific ways that still give our communities credit. We're here to help!


Shane Curcuru

Founder, Punderthings Consulting
Shane serves as V.P. of Brand Management for the ASF, setting trademark and brand policy for all 250+ Apache projects, and has served five times as a Director, as well as a member and mentor for Conferences and the Incubator. Shane's Punderthings consultancy is here to help both companies and...

Wednesday September 30, 2015 11:00 - 11:50


Data Quality on Mars - ISO 80000 and Other Standards - Werner Keil
Big Data without Data Quality becomes messy and meaningless in most cases. Therefore, data and measurements have to be stored and transferred in a standard way.
We all know that when representing a temperature, for example, we normally have it as decimal/float. But, is this float in Celsius? Fahrenheit? Kelvin?

One of the most vivid examples was Mars Climate Orbiter being lost as the spacecraft went into orbital insertion, due to ground-based computer software which produced output in non-SI units of pound-seconds (lbf×s) instead of the metric units of newton-seconds (N×s) specified in the contract between NASA and Lockheed.

In this session we're going to explore data quality and measurement standards like ISO 80000 or UCUM (Unified Code for Units of Measure), Unit support for programming languages and APIs plus projects using them like Apache SIS, Performance Co-Pilot or uDig.
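The Mars Climate Orbiter failure is exactly the class of bug that unit-carrying quantities prevent: a number never travels without its unit, and mixing units forces an explicit conversion. A minimal sketch of the idea in plain Python — illustrative only, not the ISO 80000 / UCUM model or the Apache SIS API:

```python
# Exact conversion factor: one pound-force second in newton-seconds.
NEWTON_SECONDS_PER_POUND_SECOND = 4.4482216152605

class Impulse:
    """A quantity that carries its unit, so N*s and lbf*s can't mix silently."""
    def __init__(self, value, unit):
        if unit not in ("N*s", "lbf*s"):
            raise ValueError("unknown unit: " + unit)
        self.value, self.unit = value, unit

    def to_newton_seconds(self):
        if self.unit == "N*s":
            return self.value
        return self.value * NEWTON_SECONDS_PER_POUND_SECOND

    def __add__(self, other):
        # Addition converts both operands to SI first: the unit mismatch
        # that doomed the orbiter becomes an explicit, tested conversion.
        return Impulse(self.to_newton_seconds() + other.to_newton_seconds(),
                       "N*s")

burn = Impulse(10.0, "lbf*s") + Impulse(5.0, "N*s")
print(round(burn.to_newton_seconds(), 3))  # 49.482
```

Libraries like Apache SIS and the JSR units-of-measure work carry this idea much further (dimensional analysis, derived units, UCUM codes), but the principle is the same.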


Werner Keil

Director, Creative Arts & Technologies
Werner Keil is an Agile Coach and Java and IoT/Embedded expert, helping Global 500 enterprises across industries and leading IT vendors. He has worked for over 25 years as Program Manager, Coach, SW architect and consultant in the Finance, Mobile, Media, Transport and Public sectors. Werner is Eclipse...

Wednesday September 30, 2015 12:00 - 12:50


Hot 100 on Spark - Analyzing Trends in the Billboard Charts - Michael Miklavcic, Hortonworks
Are you a fan of data and music? It may be common knowledge that Taylor Swift and Katy Perry land a lot of number one singles, but are there other more subtle truths that we can find if we dig a little deeper? In this talk we dive into the Billboard charts using Spark and Spark SQL to look for trends and chart outliers using popular statistical analysis techniques like median absolute deviation (MAD).
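Median absolute deviation, the technique the abstract names, is a robust alternative to standard deviation for flagging chart outliers. A minimal sketch with hypothetical weeks-on-chart numbers (plain Python here; the talk does this with Spark and Spark SQL):

```python
import statistics

def mad(values):
    """Median absolute deviation: a robust estimate of spread."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def outliers(values, threshold=3.0):
    """Flag points whose distance from the median exceeds threshold * MAD."""
    med = statistics.median(values)
    m = mad(values)
    return [v for v in values if m and abs(v - med) > threshold * m]

# Hypothetical weeks-on-chart for eight singles; one runaway hit.
weeks = [12, 15, 11, 14, 13, 87, 12, 16]
print(outliers(weeks))  # [87]
```

Unlike mean/standard-deviation z-scores, MAD is barely affected by the outlier itself, which is why it works well on skewed chart data.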


Michael Miklavcic

Systems Architect, Hortonworks
Michael is a software engineer with over ten years of industry experience and has been a Systems Architect with Hortonworks for the past two years. He is a code contributor to the Apache Falcon project and works directly with clients to implement solutions using Hadoop. For over 2...

Wednesday September 30, 2015 14:30 - 15:20


Hadoop Backup and Scaling in Hybrid Environment - Pawel Leszczynski, Robert Mroczkowski, Mariusz Strzelecki, Allegro Group
In an event-sourcing architecture there is a single source of truth, and Hadoop is the tool that fulfils that role. We use Apache Kafka and the Hermes message bus as a single entry point for events. There is no efficient solution for live backup of data with CRUD operations enabled. However, when handling immutable events, we can back up data live to multiple locations, such as a Hadoop cluster in another data center or any storage provider that supports the S3 API.

Storing exact copies of data in different locations allows us to extend the compute power of a private data center with a public platform provider. Such a hybrid solution benefits from cloud elasticity, so we can easily scale on demand.

In this talk we will present architectural design patterns for backup and compute-power scaling, and focus on the technical aspects of our architecture, built on top of open source software.
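The key property the abstract relies on is that immutable events make live backup trivial: with no updates or deletes, backup is just an append fan-out to every location. A toy sketch with in-memory sinks standing in for HDFS clusters and S3 buckets (illustrative; names and event shapes are hypothetical, not the Allegro implementation):

```python
class Sink:
    """Stand-in for an append-only store (HDFS in another DC, an S3 bucket)."""
    def __init__(self, name):
        self.name, self.events = name, []

    def append(self, event):
        self.events.append(event)

def fan_out(event, sinks):
    """Append each immutable event to every backup location. Because no
    CRUD updates ever happen, replicas cannot diverge."""
    for sink in sinks:
        sink.append(event)

primary, dr, cloud = Sink("hdfs-dc1"), Sink("hdfs-dc2"), Sink("s3")
for e in ["click:1", "click:2", "purchase:1"]:
    fan_out(e, [primary, dr, cloud])

print(cloud.events == primary.events == dr.events)  # True
```

This is also why the hybrid scale-out works: any location holding the full event stream can serve as a compute substrate.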


Pawel Leszczynski

Hadoop Product Owner, Allegro Group
Paweł holds a PhD in distributed databases and his interests focus on making Big Data easy. He has 7 years of technical experience at Allegro and currently works as Hadoop Product Owner in the Big Data Solutions Team. The team develops and maintains a petabyte-scale Hadoop cluster with endpoints...

Robert Mroczkowski

Senior Data Engineer, Allegro Group
In 2006 he completed a master's degree in Computer Science at Nicolaus Copernicus University, and from 2006 to 2011 he was a PhD student in Computer Science; his research field was Computer Science applied to Bioinformatics. He gained experience in the Hadoop world building and maintaining a...

Mariusz Strzelecki

Senior Data Engineer, Allegro Group
A software developer with 5+ years of professional experience, now working as a Senior Data Engineer at Allegro Group, developing tools that support the internal Big Data ecosystem and contributing to Open Source.

Wednesday September 30, 2015 15:30 - 16:20


Leveraging Arm64 for Big Data Scale Out - Martin Stadtler, Linaro
ARM 64-bit servers are a true implementation of the scale-out architecture and a very good fit for distributed processing frameworks like Hadoop and Spark, and for big data analytics in general.

The session will provide a summary of the workloads running on ARM servers and the status of AArch64 support in JDK 9, and will describe the setup, build and testing of Hadoop on ARM, the optimizations achieved so far, plans to be a reference citizen in the big data analytics community, collaborations with the ecosystem, and next steps.


Martin Stadtler

Director, Enterprise Group, Linaro
Martin Stadtler leads the Enterprise Group at Linaro.org. He has over 20 years of experience with Open Source in the Enterprise and Telecom fields and is now focused on ARM server adoption.

Wednesday September 30, 2015 15:30 - 16:20


How to Transform Data into Money Using Big Data Technologies - Jorge Lopez-Malla, Stratio
We are used to hearing that we live in the Age of Data, but we have to face the truth: we live in the Age of “Big Data”. Companies are starting to realize that traditional technologies are not enough to accomplish their usual tasks with the massive amount of information we generate every day. Big Data processes are not as brand new as people think. Nonetheless, what we developers, as well as the companies themselves, are not used to seeing is real value being extracted from that data.

To illustrate this, we will show a successful use case in which, using Apache Spark, HDFS and Apache Parquet, a Middle East telco was able not only to start a new business line by deriving prized information from its own data for third parties, but also to improve its network coverage through the analysis of this data.


Jorge Lopez-Malla

Big Data Architect, Stratio
Jorge has been involved in the inception and implementation of projects in several fields such as digital media, telcos, banks and insurance companies. He is in charge of Stratio’s Big Data training, having been one of the first engineers to become Spark certified. Previous...

Wednesday September 30, 2015 16:30 - 17:20