September 15th, 2020

My Experience at the World’s Best AI Company

Category: Makers

Blog post by Spencer Loggia

When H2O announced that remote work would continue through the summer due to Covid-19, I was a little disappointed. I expected that it would be difficult to connect with others as a new employee, especially as an intern.

Now that my internship is coming to an end, I realize how completely wrong I was. I’ve met and worked with people across the company, each with a unique set of skills that I have learned from. Everyone was willing to help with whatever problem I had, and I felt that any question was valid. I was given real responsibility and felt like a full employee instead of an intern.

There were definitely challenges unique to the remote experience. I had to be comfortable reaching out to people I’d never met before. I had to stay focused and efficient without any immediate guidance. I learned to navigate and work with the infrastructure of a whole company without anyone physically there to walk me through it. Ultimately, this all may have been a gift, because I had to understand the software I was working with and, to some extent, myself on a deeper level in order to succeed. 

My time at H2O wasn’t spent working on an isolated “Intern Project” as is common at other companies. That said, my contributions fall broadly into the two projects described below, plus a variety of smaller tasks.

Reducing the size of the MOJO Pipeline

The MOJO scoring pipeline is a lightweight, platform-independent way for users to productionize completed experiments. MOJO works by serializing models with Google’s protobuf and then loading them to make predictions using either a Java or C++ runtime.

I determined that the MOJO pipeline was especially large for certain time series experiments. For anyone who doesn’t know, a time series is a dataset in which the order of the rows matters. For many datasets the order is arbitrary, as in a task like image recognition. For others, the order itself may be the most important feature, such as when predicting the value of a product some distance into the future. Some methods for dealing with sequential data use “lag intervals”, which allow information from prior rows to be used for the current prediction.
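To make the idea of lag intervals concrete, here is a minimal sketch in plain Python. The function name `add_lag_features` and the sample data are hypothetical and chosen only for illustration; this is not how Driverless AI builds its lag features.

```python
def add_lag_features(values, lags):
    """Build one row per time step: the current value plus one column per lag.

    Rows without enough history get None for that lag.
    """
    rows = []
    for t, current in enumerate(values):
        row = {"value": current}
        for lag in lags:
            # Look back `lag` rows; there is nothing to look at near the start.
            row[f"lag_{lag}"] = values[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows


# Hypothetical weekly sales figures, oldest first.
weekly_sales = [10, 12, 15, 11, 18]
lagged = add_lag_features(weekly_sales, lags=[1, 2])

for row in lagged:
    print(row)
```

Each row now carries the values from one and two steps earlier, so a model can use recent history when predicting the current row.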

For each lag interval, a transformation may be applied to the lagged data. This transformer is serialized for use in the MOJO scoring pipeline. In an ordinary experiment, there should be no circumstance in which many identical transformers act on the same data. With lag intervals, however, this is exactly what occurs, and it can result in whole columns being redundantly serialized and stored for each lag interval.

The solution in the end was rather simple: hash the relevant key and value data, then reference a previously written protobuf file if one already exists for that hash. However, it required me to learn how DAI transforms data, how protobuf works, and how serialized models are read by the MOJO runtime. In the end, I achieved an order-of-magnitude decrease in the size of the pipeline for certain time series experiments, which can help clients successfully deploy MOJOs for very large datasets.
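The deduplication idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Driverless AI code: the `DedupStore` class is hypothetical, JSON stands in for protobuf serialization, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import json


class DedupStore:
    """Hypothetical store that keeps each distinct payload once, keyed by hash."""

    def __init__(self):
        self.blobs = {}  # digest -> serialized bytes, written only once
        self.refs = []   # one digest per transformer, in insertion order

    def add(self, keys, values):
        # Serialize deterministically so equal content yields equal bytes.
        payload = json.dumps({"keys": keys, "values": values}, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = payload  # first time seen: store the data
        self.refs.append(digest)          # always record a reference
        return digest

    def load(self, index):
        # Resolve a transformer's reference back to its data.
        return json.loads(self.blobs[self.refs[index]].decode("utf-8"))


store = DedupStore()

# The same lookup data serialized for three lag intervals (illustrative values).
for _lag in (1, 2, 3):
    store.add(keys=["store_a", "store_b"], values=[0.4, 0.7])

# A transformer with different data gets its own blob.
store.add(keys=["store_a", "store_b"], values=[0.1, 0.9])

print(len(store.refs), "transformers,", len(store.blobs), "stored blobs")
```

With this layout, the storage cost grows with the number of distinct payloads rather than the number of lag intervals, which is where the size reduction comes from.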

Enterprise Steam Client Testing

I was lucky enough to work with H2O Enterprise Steam as well, a product that allows admins to securely manage H2O-3 and Sparkling Water clusters on Hadoop and Driverless AI instances on Kubernetes. Early on I was assigned a small task related to Steam, a bug fix or something of that nature. It turned out that the team could use help developing a testing suite for the Steam Python client, which gave me a chance to see a completely different side of the company and to learn more about Hadoop, Spark, and Docker.

It was challenging at times to be split between unrelated tasks, but I think it painted a more accurate picture of what life at a rapidly growing company looks like than I would have gotten otherwise.

To Conclude

Besides that, I was able to work with some of the brightest minds in the field and to use Driverless AI every day, which made even complicated, large-scale ML problems breathtakingly simple and efficient. I also had the invaluable experience of the hectic push before new version releases. It was a pleasure to participate in all the testing and bug catching necessary for new features.

Everything I worked on might seem pretty mundane. After all, optimization, testing, and bug-chasing aren’t exactly glamorous, except that they really are. Certain new technologies have changed the very fabric of society, and it seems clear that AI is next. With its mission of democratizing AI, H2O will ensure that change happens in the best way possible. I am happy to have been able to contribute to that in some small way.

About the Author

Spencer Loggia is a Junior at Johns Hopkins University majoring in Computer Science and Neuroscience. He has worked in three research labs focusing on resolving the structure of the neural networks involved in attention selectivity, developing software for modeling protein complex assembly, and viral engineering. He is especially interested in brain machine interfaces, general AI, and the use of machine learning to better understand biological systems.

https://www.linkedin.com/in/spencerloggia/

About the Author

Jo-Fai Chow

Jo-fai (or Joe) has multiple roles (data scientist / evangelist / community manager) at H2O.ai. Since joining the company in 2016, Joe has delivered H2O talks/workshops in 40+ cities around Europe, US, and Asia. Nowadays, he is best known as the H2O #360Selfie guy. He is also the co-organiser of H2O's EMEA meetup groups including London Artificial Intelligence & Deep Learning - one of the biggest data science communities in the world with more than 11,000 members.

Before joining H2O, he was in the business intelligence team at Virgin Media where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab as a data science evangelist promoting products via blogging and giving talks at external events.

Joe has a background in water engineering. Before his data science journey, he was an EngD researcher at STREAM Industrial Doctorate Centre working on machine learning techniques for drainage design optimisation. Prior to that, he was an asset management consultant specialised in data mining and constrained optimisation for the utilities sector in UK and abroad. He also holds a MSc in Environmental Management and a BEng in Civil Engineering.

Long before Joe immersed himself in the world of open-source R and Python, he learned his trade as an avid MATLAB user. When he was a kid, his parents taught him one of the famous old Chinese sayings - when one drinks water, one must not forget where it comes from. So when Twitter asked Joe to be creative, he simply put down @matlabulous as his handle.
