Tuesday, September 29 • 10:30 - 11:20
Synthetic Data Generation for Realistic Analytics Examples and Testing - RJ Nowling, Red Hat

Big Data users face an enormous gap between trivial tutorial applications and real-world analytics pipelines. Word count and TeraSort have limited value as blueprints and may not exercise enough of the data processing stack to be useful for testing deployments. Because real data are typically encumbered by privacy or intellectual property concerns, tutorials and test cases often use small or unrepresentative data sets. Generative models can enable a new class of realistic example and test applications by synthesizing rich and complex data sets. Furthermore, synthetic data can be scaled from a single laptop to data centers. We will present data generators, such as BigPetStore from Apache Bigtop, influenced by data we’ve analyzed in the Emerging Technologies team at Red Hat. We will also discuss realistic example applications and their use for smoke-testing deployments.


RJ Nowling

Software Engineer, Red Hat, Inc.
RJ Nowling is a Software Engineer in Emerging Technology at Red Hat, Inc., where he is part of a data science team that consults for internal customers. RJ is a committer on Apache Bigtop, a contributor to Apache Spark, and co-lead of the BigPetStore family of big data example applications...

Tuesday September 29, 2015 10:30 - 11:20 CEST
