Wednesday, September 30 • 14:30 - 15:20
Using Natural Language Processing on Non-Textual Data with MLLib - Casey Stella, Hortonworks

Natural language processing techniques are well established due to their obvious utility. Further, the rise in unstructured textual data has led to mature, distributed and scalable implementations. While textual data is extremely common, some apparently unstructured data has an underlying structure, much as the words that compose sentences follow an underlying grammatical structure. This talk explores borrowing some natural language processing techniques to analyze the structure in non-textual data.

In particular, we consider the Word2Vec implementation in MLlib to help us organize and analyze non-textual clinical event data (e.g., diagnoses and drugs prescribed). We will explore connections between diseases and drugs in an unsupervised way with Python, Spark and MLlib.
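
As a rough illustration of the idea, and not the speaker's actual code, the sketch below treats each patient's time-ordered clinical events as a "sentence" of tokens, which is the input shape Word2Vec expects. The record layout, the `dx:`/`drug:` token prefixes, the function names and the toy data are all assumptions made up for this example. The optional `fit_word2vec` helper uses the `Word2Vec` class from `pyspark.mllib.feature` and is only usable where PySpark and a SparkContext are available.

```python
from collections import defaultdict


def build_event_sentences(events):
    """Group (patient_id, timestamp, code) rows into one time-ordered token list per patient."""
    by_patient = defaultdict(list)
    for patient_id, timestamp, code in events:
        by_patient[patient_id].append((timestamp, code))
    # Sort patients by id for deterministic output, and events by timestamp within a patient.
    return [[code for _, code in sorted(rows)]
            for _, rows in sorted(by_patient.items())]


def fit_word2vec(sc, sentences, vector_size=50, min_count=1, seed=42):
    """Fit MLlib's Word2Vec on the token lists (needs PySpark and a SparkContext `sc`)."""
    # Imported lazily so the sentence-building step works without Spark installed.
    from pyspark.mllib.feature import Word2Vec

    rdd = sc.parallelize(sentences)
    return (Word2Vec()
            .setVectorSize(vector_size)
            .setMinCount(min_count)
            .setSeed(seed)
            .fit(rdd))


# Hypothetical toy records: (patient_id, timestamp, event code). Not real clinical data.
EVENTS = [
    ("p1", 2, "drug:metformin"),
    ("p1", 1, "dx:type2_diabetes"),
    ("p2", 1, "dx:hypertension"),
    ("p2", 2, "drug:lisinopril"),
    ("p3", 1, "dx:type2_diabetes"),
    ("p3", 3, "drug:metformin"),
    ("p3", 2, "dx:hypertension"),
]

SENTENCES = build_event_sentences(EVENTS)
```

With a fitted model, `model.findSynonyms("dx:type2_diabetes", 5)` would return the tokens whose vectors lie closest to that diagnosis, which is one way to surface disease and drug associations without labels. Results on a toy sample like this would not be meaningful; a real run needs many patient histories.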


Casey Stella

Principal Architect, Hortonworks
I am a principal architect focusing on Data Science in the consulting organization at Hortonworks. In the past, I've worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle and as a Research Geophysicist...

Wednesday September 30, 2015 14:30 - 15:20 CEST
