Meet the Community
H2O World New York 2019 is an interactive community event featuring advancements in AI, machine learning and explainable AI. Thousands of attendees from around the world watch sessions from the makers behind H2O.ai, leading academics, and our customer community. Attendees discover the strategies and insights they need to accelerate their AI journey.
Join us to connect with the community and learn how to harness the full value of AI, machine learning, explainable AI, deep learning and data science from industry-recognized speakers and a hands-on training session with H2O Driverless AI.
Bio: Sri Ambati is the CEO and Founder of H2O.ai – the maker behind H2O, the leading open source machine learning platform used by 18,000 companies and hundreds of thousands of data scientists. Prior to H2O.ai, he co-founded the big data analytics company Platfora. His professional career also spans technical and executive roles at Datastax, Azul Systems, and RightOrder. His academic career involved sabbaticals in theoretical neuroscience at Stanford and Berkeley and an M.S. in math and computer science from the University of Memphis. He was recently recognized by Datanami as one of the 12 People to Watch 2019.
Sri is known for his knack for envisioning the killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. A regular speaker on the AI, ML and Big Data circuit, Sri leaves a trail @srisatish.
Bio: Arno Candel is the Chief Technology Officer at H2O.ai. He is the main committer of H2O-3 and Driverless AI and has been designing and implementing high-performance machine-learning algorithms since 2012. Previously, he spent a decade in supercomputing at ETH and SLAC and collaborated with CERN on next-generation particle accelerators.
Arno holds a PhD and Masters summa cum laude in Physics from ETH Zurich, Switzerland. He was named “2014 Big Data All-Star” by Fortune Magazine and featured by ETH GLOBE in 2015. Follow him on Twitter: @ArnoCandel.
Bio: Dr. LeDell is the Chief Machine Learning Scientist at H2O.ai, the company that produces the open source, distributed machine learning platform, H2O. Before joining H2O.ai, she was the Principal Data Scientist at two AI startups (both acquired), the founder of DataScientific, Inc. and a software engineer at a large consulting firm. She received her Ph.D. from UC Berkeley where her research focused on machine learning and computational statistics. She also holds a B.S. and M.A. in Mathematics.
Leland Wilkinson is Chief Scientist at H2O and Adjunct Professor of Computer Science at the University of Illinois Chicago. He received an A.B. degree from Harvard in 1966, an S.T.B. degree from Harvard Divinity School in 1969, and a Ph.D. from Yale in 1975. Wilkinson wrote the SYSTAT statistical package and founded SYSTAT Inc. in 1984. After the company grew to 50 employees, he sold SYSTAT to SPSS in 1994 and worked there for ten years on research and development of visualization systems. Wilkinson subsequently worked at Skytree and Tableau before joining H2O.
Wilkinson is a Fellow of the American Statistical Association, an elected member of the International Statistical Institute, and a Fellow of the American Association for the Advancement of Science. He has won the best speaker award at the National Computer Graphics Association and the Youden prize for best expository paper in the statistics journal Technometrics. He has served on the Committee on Applied and Theoretical Statistics of the National Research Council and is a member of the Boards of the National Institute of Statistical Sciences (NISS) and the Institute for Pure and Applied Mathematics (IPAM). In addition to authoring journal articles, the original SYSTAT computer program and manuals, and patents in visualization and distributed analytic computing, Wilkinson is the author (with Grant Blank and Chris Gruber) of Desktop Data Analysis with SYSTAT. He is also the author of The Grammar of Graphics, the foundation for several commercial and open source visualization systems (IBM RAVE, Tableau, R ggplot2, and Python Bokeh).
The Case for Model Debugging
Abstract: Prediction by machine learning models is fundamentally the execution of computer code. Like all good code, machine learning models should be debugged for logical or runtime errors or for security vulnerabilities. Recent, high-profile failures have made it clear that machine learning models must also be debugged for disparate impact across demographic segments and other types of sociological bias. Model debugging enhances trust in machine learning directly by increasing accuracy in new or holdout data, by decreasing or identifying hackable attack surfaces, or by decreasing sociological bias. As a side-effect, model debugging should also increase understanding and explainability of model mechanisms and predictions. This presentation outlines several standard and newer model debugging techniques and proposes several potential remediation methods for any discovered bugs. Discussed debugging techniques include adversarial examples, benchmark models, partial dependence and individual conditional expectation, random attacks, Shapley explanations of predictions and residuals, and models of residuals. Proposed remediation approaches include alternate models, editing of deployable model artifacts, missing value injection, prediction assertions, and regularization methods.
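Two of the ideas named in the abstract, analysis of residuals and prediction assertions, can be illustrated with a minimal sketch. This is not the presenters' implementation and does not use any H2O API. The function names, the segment labels, and the fallback value are all hypothetical, chosen only for illustration, and the model is assumed to output a probability between 0 and 1.

```python
from statistics import mean


def mean_abs_residual_by_segment(y_true, y_pred, segments):
    """Group absolute residuals by a segment label (hypothetical helper).

    A segment with a much larger average residual than the others is a
    candidate location for a model bug or for disparate performance.
    """
    buckets = {}
    for truth, pred, seg in zip(y_true, y_pred, segments):
        buckets.setdefault(seg, []).append(abs(truth - pred))
    return {seg: mean(values) for seg, values in buckets.items()}


def checked_prediction(pred, low=0.0, high=1.0, fallback=0.5):
    """Prediction assertion (hypothetical helper).

    Returns the prediction if it lies in [low, high]; otherwise returns an
    assumed safe fallback. NaN also fails the range check and is replaced.
    """
    if not (low <= pred <= high):
        return fallback
    return pred


# Toy data: segment "b" is predicted noticeably worse than segment "a".
y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.2, 0.4, 0.1]
segments = ["a", "a", "b", "b"]

report = mean_abs_residual_by_segment(y_true, y_pred, segments)
for seg, err in sorted(report.items()):
    print(f"segment {seg}: mean absolute residual {err:.2f}")
```

In practice the segment column might be a demographic attribute or a business-defined slice of the data, and the fallback might instead route the row to a simpler benchmark model or to human review.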
Bio: Patrick Hall is the Senior Director of Product at H2O.ai where he focuses mainly on model interpretability. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning. Prior to joining H2O.ai, Patrick held global customer-facing roles and research and development roles at SAS Institute.
Bio: Ashrith is the security scientist designing anomaly detection algorithms at H2O.ai. He recently graduated from the Center for Education and Research in Information Assurance and Security (CERIAS) at Purdue University with a PhD in Information Security. He specialized in anomaly detection on networks under the guidance of Dr. William S. Cleveland. He tries to break into anything that has an operating system, and sometimes into things that don’t. He has been christened “The Only Human Network Packet Sniffer” by his advisors. When he is not working, he swims and bikes long distances.
Bio: Kim Montgomery has a Ph.D. in applied mathematics, with a background in both predictive modeling and differential equations. She has significant experience applying mathematical modeling to problems in the energy industry and in the biosciences.
She is a Kaggle Grandmaster and has been ranked as high as 15th in the overall Kaggle rankings. She’s excited to be applying her skills at H2O.ai.
Bio: Nicholas Schmidt is a partner at BLDS, LLC, and heads the Artificial Intelligence Practice. In these roles, Nick specializes in the application of statistics and economics to questions of law, regulatory compliance, and best practices in model governance.
As head of the A.I. practice, Nick develops and assists in the deployment of methods that allow his clients to make their A.I. models fairer and more inclusive. In this work, he has created A.I.-based techniques that enable clients to minimize disparate impact in credit, insurance, and marketing models. He has additionally helped his clients understand and implement methods that open “black-box” A.I. models, enabling a clearer understanding of A.I.’s decision-making process. His clients use this work to inform their customers on potential denials of credit (“adverse action notices”). These methods are used in a number of the top-10 U.S. retail banks and FinTechs.
In his litigation practice, Nick testifies and consults on matters relating to employment discrimination litigation, wage and hour law, and other matters requiring the utilization of statistics to address questions of liability or damages.
Nick holds an MBA in economics and econometrics from the University of Chicago Booth School of Business.
Bio: Mark Landry is a Competition Data Scientist and Product Manager at H2O.ai. He enjoys testing ideas in Kaggle competitions, where he is ranked in the top 100 in the world (top 0.03%) and is well practiced at getting quick solutions to iterate on. Most at home in SQL, he found H2O through hacking in R. His interests are multi-model architectures and helping the world make fewer models that perform worse than the mean.
Sudalai Rajkumar (SRK)
Natural Language Processing (NLP) with Driverless AI
H2O Driverless AI is H2O.ai’s flagship platform for automatic machine learning. It fully automates the data science workflow, including some of the most challenging tasks in applied data science such as feature engineering, model tuning, model optimization, and model deployment. Driverless AI turns Kaggle Grandmaster recipes into a fully functioning platform that delivers “an expert data scientist in a box” from training to deployment. In the latest version of our Driverless AI platform, we have included Natural Language Processing (NLP) recipes for text classification and regression problems. With this new capability, Driverless AI can now address a whole new set of problems in the text space, such as automatic document classification, sentiment analysis, and emotion detection, using textual data. Stay tuned to the webinar to learn more.
Bio: Sudalai Rajkumar (aka SRK) is a Senior Data Scientist at H2O.ai, building Driverless AI, an automated machine learning platform. Prior to this, he was with Freshworks, Tiger Analytics and Global Analytics. He has more than 8 years of experience in the DS / ML field and has solved many interesting data science problems for various customers across the globe. Apart from his day job, he takes part in various data science competitions to enhance his knowledge and has won several of them. He is a Kaggle Grandmaster in the Competitions and Kernels sections. He is also ranked #1 on the Analytics Vidhya platform.
Bio: Matt Dowle is the main author of the data.table package in R. He has worked for some of the world’s largest financial organizations: Lehman Brothers, Salomon Brothers, Citigroup, Concordia Advisors and Winton Capital. He is particularly pleased that data.table is also used outside Finance, for example Genomics where large and ordered datasets are also researched. Matt has been programming in S/R for 15 years, knows C pretty well and holds a first class BSc in Applied Maths and Computing from Warwick University, U.K.
Productionizing H2O Models with Apache Spark
Spark pipelines represent a powerful concept to support productionizing machine learning workflows. Their API allows combining data processing with machine learning algorithms and opens opportunities for integration with various machine learning libraries. However, to benefit from the power of pipelines, their users need the freedom to choose and experiment with any machine learning algorithm or library. Therefore, we developed Sparkling Water, which embeds the H2O machine learning library of advanced algorithms into the Spark ecosystem and exposes them via the pipeline API. Furthermore, the algorithms benefit from H2O MOJOs (Model Object Optimized), a powerful concept shared across the entire H2O platform to store and exchange models. The MOJOs are designed for effective model deployment with a focus on scoring speed, traceability, exchangeability, and backward compatibility. In this talk we will explain the architecture of Sparkling Water with a focus on integration into Spark pipelines and MOJOs. We’ll demonstrate the creation of pipelines integrating H2O machine learning models and their deployment using Scala or Python. Furthermore, we will show how to utilize pre-trained model MOJOs with Spark pipelines.
Bio: Jakub (or “Kuba” as we call him) completed his Bachelor’s Degree in Computer Science and Master’s Degree in Software Systems at Charles University in Prague. As a bachelor’s thesis, Kuba wrote a small platform for distributed computing of any type of task. During his master’s degree studies, he developed a cluster monitoring tool for JVM-based languages which makes debugging and reasoning about the performance of distributed systems easier using a concept called distributed stack traces. Kuba enjoys dealing with problems and learning new programming languages. At H2O.ai, Kuba works on Sparkling Water.
Aside from programming, Kuba enjoys exploring new cultures and bouldering. He’s also a big fan of tea preparation and the associated ceremony.
Bio: Shivam Bansal is a Data Scientist at H2O.ai and a Kaggle Grandmaster in the Kernels section. He is a three-time winner of Kaggle’s Data Science for Good Competition and the winner of multiple other offline AI and Data Science competitions.
Shivam has extensive cross-industry and hands-on experience in building data science products. He has helped clients in the Insurance, Healthcare, Banking, and Retail domains to solve unstructured data science problems by building end to end pipelines and solutions. Shivam really likes to work on all aspects of a data science project which includes both technical aspects as well as business aspects.
Shivam obtained his master’s degree in Business Analytics from the National University of Singapore in 2019; his bachelor’s degree is in Computer Science.
Bio: A Kaggle Grandmaster and a Data Scientist at H2O.ai, Mathias Müller holds an AI and ML focused diploma (eq. M.Sc.) in computer science from Humboldt University in Berlin. During his studies, he keenly worked on computer vision in the context of bio-inspired visual navigation of autonomous flying quadrocopters. Prior to H2O.ai, he worked as a machine learning engineer for FSD Fahrzeugsystemdaten GmbH in the automotive sector. His stint with Kaggle was a chance encounter as he stumbled upon the data competition platform while looking for a more ML-focused platform as compared to TopCoder. This is where he entered his first predictive modeling competition and climbed up the ladder to be a Grandmaster. He is an active contributor to XGBoost and is working on Driverless AI with H2O.ai.
Bio: Kaggle Grandmaster Branden is a customer data scientist at H2O.ai and holds a B.S. in Finance from San Diego State University. Among his favorite hobbies is participating in predictive analytics competitions, primarily on Kaggle.com. Currently, he is ranked 58th among Grandmasters globally and has finished in the top 10% eight times in the Kaggle competitions he has entered.
Branden is on the team of data scientists from H2O.ai behind PwC’s Audit Innovation of the Year title. They have collectively developed PwC’s Audit.ai – a revolutionary bot that does what humans can’t. Its AI analyses billions of different data points in seconds and applies judgement to detect anomalies in general ledger transactions.
Bio: Rohan Rao is a Machine Learning Engineer and Kaggle Grandmaster with over 5 years of experience building data science products in various industries and projects like digital payments, e-commerce retail, credit risk, fraud prevention, growth, logistics and more. He enjoys working on competitions, hackathons and collaborating with folks around the globe on building solutions.
He completed his post-graduation in Applied Statistics from IIT-Bombay in 2013.
Solving sudokus and puzzles has been his big hobby for over a decade. Having won the national championship multiple times, he has represented India and been in the top-10 in the World, as well as finished twice on the podium at the Asian Championships.
Bio: Yauhen holds a Master’s Degree in Applied Data Analysis and has over 4 years of working experience in Data Science. He has worked in the Banking, Gaming and eCommerce domains. He is also the first Kaggle competitions Grandmaster in Belarus, with gold medals in both classic Machine Learning and Deep Learning competitions.
Bio: Olivier graduated from Supelec, France and holds a PhD in Signal Processing. He worked in the Airline IT business at Amadeus as a C/C++ developer, then joined the London branch as a team leader. At Capgemini, he worked with clients in the public sector as a senior project manager.
Olivier then moved to trading the commodity markets, building and backtesting trading systems, and progressively started using more machine learning tools. His data science journey really began on Kaggle where he could practice and improve his skills on various competitions and datasets.
In the last year, he worked for caring.com using machine learning and data mining to help families find communities for their parents.
Olivier loves spending his spare time in the yard where he grows organic apples, pears, grapes and strawberries. He also produces his own special jam.
October 22, 2019
Registration & Breakfast: 8:00am - 9:00am
Keynotes: 9:00am - 10:45am
Break: 10:45am - 11:00am
Keynotes: 11:00am - 12:00pm
Lunch & Leland Wilkinson "Grammar of Graphics" Book Signing: 12:00pm - 1:00pm
Explainable AI Track
Sessions: 1:00pm - 3:00pm
Break: 3:00pm - 3:15pm
Sessions: 3:15pm - 4:15pm
AI in Financial Services Panel: 4:15pm - 4:45pm
Meet the Kaggle Grandmasters: 4:45pm - 5:15pm
Explainable AI Panel: 5:15pm - 5:45pm
Closing Remarks: 5:45pm - 6:00pm
Reception & Networking: 6:00pm - 7:00pm
Many of New York’s most popular attractions are within walking distance of the Hilton Midtown hotel. Experience the buzz of Times Square, catch a Broadway show, or shop the day away on 5th Avenue. Radio City Music Hall, Rockefeller Center, MoMA and Central Park are all just minutes away.