Tuesday, September 29 • 10:30 - 11:20
Synthetic Data Generation for Realistic Analytics Examples and Testing - RJ Nowling, Red Hat

Big data users face an enormous gap between trivial tutorial applications and real-world analytics pipelines. Word count and TeraSort have limited value as blueprints and may not exercise enough of the data processing stack to be useful for testing deployments. Because real data are typically encumbered by privacy or intellectual property concerns, tutorials and test cases often use small or unrepresentative data sets. Generative models can enable a new class of realistic example and test applications by synthesizing rich and complex data sets. Furthermore, synthetic data generation can be scaled from a single laptop to data centers. We will present data generators, such as BigPetStore from Apache BigTop, influenced by data we’ve analyzed in the Emerging Technologies team at Red Hat. We will also discuss realistic example applications and their use for smoke-testing deployments.

Speakers
RJ Nowling

Software Engineer, Red Hat, Inc.
RJ Nowling is a Software Engineer in Emerging Technology at Red Hat, Inc., where he is part of a data science team that consults for internal customers. RJ is a committer on Apache BigTop, a contributor to Apache Spark, and co-lead of the BigPetStore family of big data example applications. Before joining Red Hat, RJ focused on academic research in the fields of computational physics, bioinformatics, and distributed systems. He is currently a PhD...


Tuesday September 29, 2015 10:30 - 11:20
Krudy/Jokai
