November 24th, 2017

Laying a Strong Foundation for Data Science Work

Category: Data Science, IT

By William Merchan, CSO, DataScience.com

In the past few years, data science has become the cornerstone of enterprise companies’ efforts to understand how to deliver better customer experiences. Even so, when DataScience.com commissioned Forrester to survey over 200 data-driven businesses last year, only 22% reported they were leveraging big data well enough to get ahead of their competition.
That’s because there’s a big difference between building predictive models and putting them into production effectively. Data science teams need the support of IT from the very beginning to ensure that issues with large-scale data management, governance, and access don’t stand in the way of operationalizing key insights about your customers. However, many enterprise companies are still treating IT involvement as an afterthought, which ultimately delays the timeline for seeing value from their data science efforts.
There are many ways that better IT management can help scale the impact of data science at your organization. Three best practices are using containers for data science environments, managing compute resources effectively, and putting work into production faster by deploying models as APIs. Here’s how it’s done.
1. Using software containers is one of the most impactful steps you can take to implement IT management best practices. These standardized development environments ensure that the hard work your data scientists put into building predictive models won’t go to waste when it’s time to deploy their code. Without a container-based workflow, a data scientist starting a new analysis must either wait for IT to build an environment from scratch or build one themselves from their preferred combination of packages and resources, then wait for those to install or compile.
There are two major issues with both of these approaches: they don’t scale, and they’re slow. When data scientists are individually responsible for configuring environments as needed, their work isn’t reproducible; if it’s used in a different environment, it might not even run. Containers put the power in the hands of IT to standardize environment configuration in advance using images, the read-only templates from which containers are launched. Data scientists can launch environments from those images, which have already been vetted by IT, saving a lot of time in the long run.
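To make the reproducibility problem concrete, here is a minimal Python sketch of the kind of check a vetted image makes unnecessary: comparing the packages installed in an environment against a pinned list approved by IT. The function names and the version numbers in the usage note are made up for illustration and do not come from any particular container tool.

```python
from importlib import metadata


def installed_versions(names):
    """Look up the installed version of each package (None if it is missing)."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_drift(pinned, installed):
    """Return {package: (pinned_version, installed_version)} for every mismatch."""
    return {
        name: (want, installed.get(name))
        for name, want in pinned.items()
        if installed.get(name) != want
    }
```

For example, `find_drift({"pandas": "0.23.4"}, installed_versions(["pandas"]))` returns an empty dictionary only when the environment matches the pin exactly. When every environment is launched from the same image, that is the case by construction.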
2. Provide ample computing power to support your data scientists’ analysis from start to finish. Empowering them to spin up compute resources in the cloud as needed ensures they never get held up by limited computing power. It also eliminates the potential additional cost of maintaining unnecessary nodes. The same idea applies to on-prem data centers. IT must carefully monitor the expansion of data science work and scale resources accordingly. It may seem obvious, but IHS Markit reports that companies not anticipating this need lose approximately $700 billion a year to IT downtime.
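As a rough illustration of scaling resources to match demand, the sketch below sizes a cluster from the number of queued jobs. The function name, the per-node capacity, and the minimum and maximum bounds are all assumptions chosen for the example, not figures from any cloud provider.

```python
import math


def nodes_needed(queued_jobs, jobs_per_node, min_nodes=1, max_nodes=20):
    """Return how many nodes to run, clamped between min_nodes and max_nodes."""
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    # Round up so a partially filled node still gets provisioned.
    wanted = math.ceil(queued_jobs / jobs_per_node)
    # The floor keeps a baseline available; the ceiling caps spend.
    return max(min_nodes, min(max_nodes, wanted))
```

The lower bound keeps a node ready for the next analysis, while the upper bound stops a burst of work from running up the cost of nodes nobody planned for.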
3. Put data science work into production right away to start seeing its value earlier on. Imagine your data science team has built a recommender system to predict what products a customer is likely to enjoy based on the products he or she has already purchased. Even if you’re satisfied with the model’s accuracy and have identified some unexpected relationships that should inform your targeting strategies, this information still needs to be integrated into your application or website for it to be valuable.
Traditionally, the pipeline that delivers those recommendations to your customers would be built by engineers and require extensive support from IT. The rise of microservices, however, gives data scientists the opportunity to deploy models as APIs that can be integrated directly into an application.
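To show what deploying a model as an API can look like, here is a minimal sketch using only the Python standard library. The co-purchase table stands in for a trained recommender, and every name in it (`CO_PURCHASES`, `recommend`, `handle_payload`, `make_server`) is hypothetical; a production service would add authentication, logging, and a real model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical lookup table standing in for a trained recommender model.
CO_PURCHASES = {
    "tent": ["sleeping bag", "camp stove"],
    "camp stove": ["fuel canister", "tent"],
}


def recommend(purchased, k=3):
    """Return up to k products bought alongside the given items, skipping owned ones."""
    seen = set(purchased)
    results = []
    for item in purchased:
        for candidate in CO_PURCHASES.get(item, []):
            if candidate not in seen:
                seen.add(candidate)
                results.append(candidate)
    return results[:k]


def handle_payload(body):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        purchased = json.loads(body)["purchased"]
    except (ValueError, KeyError, TypeError):
        purchased = None
    if not isinstance(purchased, list) or not all(isinstance(p, str) for p in purchased):
        error = {"error": "expected a JSON object with a 'purchased' list of product names"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"recommendations": recommend(purchased)}).encode()


class RecommendHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client says it sent.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8000):
    """Build (but do not start) a local HTTP server exposing the recommender."""
    return HTTPServer(("127.0.0.1", port), RecommendHandler)
```

Calling `make_server().serve_forever()` starts the service, after which an application can POST `{"purchased": ["tent"]}` and read the recommendations from the JSON response. Keeping the scoring logic in plain functions, separate from the HTTP handler, lets the data science team test the model without running a server.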
If you’re among the 78% of companies not fully realizing the return on your data science investment, chances are there’s room to improve the IT foundation you’ve laid. To learn more about the next steps, find out how to take an agile approach to data science.
About the Author
William Merchan leads business and corporate development, partner initiatives, and strategy at DataScience.com as chief strategy officer. He most recently served as SVP of Strategic Alliances and GM of Dynamic Pricing at MarketShare, where he oversaw global business development and partner relationships, and successfully led the company to a $450 million acquisition by Neustar.
