February 2nd, 2021

Successful AI: Which Comes First, the Data or the Question?

Category: Business, Driverless AI

Successful AI is a business process.

Even the most sophisticated models, the latest algorithms, and highly experienced AI experts cannot make AI a practical success unless it is connected to a meaningful business goal. To make that happen, you need good interaction between those with knowledge of the business and the AI team. But where do you start in finding a valuable AI use case? Do you start with a business question or with your data?

In AI and machine learning, it’s essential to use the right data and the right machine learning technology. In fact, unlike in traditional programming, the data makes the model. This is one reason that AI offers the advantage of adaptability: the ability to change fairly quickly in response to changes in the world around you. So which data you use matters a lot for the outcome of an AI project, but how do you know which data is the right data until you’ve defined and refined the question you are asking?

The question you ask will point you to what data is needed if the question is framed in a way that it can be addressed by an AI model. But how do you find the right question?

The answer to which comes first, the data or the question, is really both. You can start with either one, but the process is a bit different depending on where you start. We will dive into what’s involved either way, but first, please keep in mind that successful AI is not a process with a rigid set of rules. Instead, it evolves out of trial and error, good communication and creative thinking: keep the goal in sight – a specific useful business goal – and then explore various ways to approach it. And as to starting with data or the question, if you are new to AI, beginning with the question may be a simpler way to get started.


Part 1: Start with the AI Question

You do not need to be an AI expert to help frame the questions that AI will address. That’s because you do not need to understand the exact details of how an algorithm works in order to know what you need it to do for your business. Even though those details will matter eventually, the best results tend to come from effective collaboration between AI experts and people with domain expertise, especially business knowledge. And starting with the question rather than the data is not just an approach for those new to AI: experts often begin with the question they want to answer. But regardless of the experience level of those involved, the AI questions they frame should generally meet the following criteria in order to be of value.

Is the AI question:

  • Attached to a practical and valuable business goal?
  • Sufficiently focused and appropriate to be answerable, and is there a way to measure success? (In other words, what metrics will be used?)
  • Addressable by AI algorithms? And do you already have, or can you get, the required data?
  • One for which you have a way to act on the results the model will deliver?  

Let’s dive a bit into the first two bullet points: a practical business goal, and an answerable question for which you can measure success.

Define and Refine the AI Question 

AI-based projects have great potential, even sometimes to the extent of creating a new line of business, but more often they are designed to address something you already want to know or a process you already want to optimize. That’s one reason that starting with the question can be an easier entry point for those new to AI and machine learning. The real issue is to figure out how AI modelling can provide value beyond what you would do manually or through a traditional program or rules-based approach. This may be a situation where there’s too much data to make it practical to analyze via a manual approach. Or you may want to work on a process that you already do manually or based on rules to see if an adaptive learning model would be faster or more accurate or more convenient. It could even be as simple as prioritizing manual tasks. In general, look for bottlenecks in business processes to see if AI can improve performance. 

Whether you are trying something new or looking to optimize an existing process, it’s important to start with a question that addresses a specific goal and then refine it until it is sufficiently focused to be answerable by an AI approach. For example, “How do we reduce customer churn?” is a good start, but stated this way it is far too broad a question. Instead, begin to ask: what behaviors or signals might let an AI system effectively predict which customers are at risk of churn? What events or behaviors might point to the issues that are causing dissatisfaction? You may want to further refine the prediction question by applying a time interval to the events and behaviors being modeled. Can your AI system predict the risk of churn 2 to 4 weeks in advance? This approach can also help you think about how to design a system, perhaps an automated one, that takes appropriate and targeted action to restore confidence and satisfaction among customers at risk.
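To make the time-interval refinement concrete, here is a minimal Python sketch of how the churn question might be turned into a training label. It is only an illustration under stated assumptions: the function name churn_label, the 14 and 28 day bounds, and the sample customers are all hypothetical, and a real project would define churn and the observation window together with the business experts.

```python
from datetime import date, timedelta
from typing import Optional


def churn_label(snapshot: date, churn_date: Optional[date],
                min_days: int = 14, max_days: int = 28) -> int:
    """Return 1 if the customer churned 2 to 4 weeks after the snapshot, else 0.

    The window bounds are an assumption made for this example.
    """
    if churn_date is None:
        # Customer has not churned at all.
        return 0
    window_start = snapshot + timedelta(days=min_days)
    window_end = snapshot + timedelta(days=max_days)
    return int(window_start <= churn_date <= window_end)


# Hypothetical customers: id -> date they churned (None = still active).
snapshot = date(2021, 1, 1)
customers = {
    "a": date(2021, 1, 20),  # 19 days after the snapshot: inside the window
    "b": date(2021, 3, 1),   # well after the window
    "c": None,               # never churned
    "d": date(2021, 1, 5),   # 4 days after: too soon to act on
}
labels = {cid: churn_label(snapshot, d) for cid, d in customers.items()}
print(labels)
```

Features for the model would then be built only from events observed on or before the snapshot date, so the model learns to give a warning early enough to act on.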

The process of defining and refining good questions often occurs in stages and may take place through back-and-forth between those with business expertise and those with data expertise on the data science team. This exchange requires translating ideas between these different contexts, a topic I’ve discussed in a webinar called “The Conversation that is Key to AI Success”. Consider this hypothetical conversation between the domain knowledge expert and a data scientist in framing a question regarding the supply chain for a manufacturer:

One way to become more practiced at framing good questions is to look at what has been successful for others. And don’t limit yourself to just the sector of your own business. There’s a lot to learn from the way people have used AI across a variety of sectors and businesses.

Cross-Industry Patterns

The details of how good questions are focused and refined will of course vary with the industry, but there frequently are similarities that cross industries. Predicting churn, for example, is valuable for businesses as different as telecommunications and the financial sector. The exact situations or behaviors used to generate features in training data will be different in each case, but the style of the approach is related, so it can be useful to consider how AI systems were built to accomplish this common goal. In the same vein, using AI for anomaly detection or as the basis to improve fraud detection and prevention is naturally of interest for banks and other financial businesses, but it’s also an important issue for insurance companies or health care providers or even manufacturers. While the parameters for refining the AI question are industry-specific, the approach may follow the same general pattern.

For that reason, it’s really helpful to look at examples of AI at work, both by use case and by industry. The H2O.ai solutions page describes many real-world examples.

AI in Manufacturing

AI in Retail

Notice the many ways that AI can help in each of these industries, including the cross-overs between these very different businesses. For example, understanding how AI is used to address supply chain issues in manufacturing is a big help in knowing how to deal with supply chain issues in retail. Predictive maintenance in manufacturing often involves anomaly detection, an approach that is also useful for fraud detection in the insurance industry.


Metrics: How Will You Recognize Success?

The ultimate test of whether or not you’ve crafted a good AI question is the outcome. This is much more than just the evaluation of the models’ performance. How will you measure the impact that AI has on the business process you’ve targeted?

The idea of evaluating impact is a whole topic in itself, but it is important to keep it in mind as you select a business goal and refine the question or process you are going to use AI to address. Think about what business metrics are available to measure change. And consider how much improvement is worth the effort. Especially if you are using AI to optimize an existing process, use business metrics of current performance to establish a baseline for comparison. That way, you’ll have an indication of how much (or if) AI is helping. For example, a 1% improvement in a key business indicator might be hugely valuable or it might just be bragging rights for the data scientist. Practical business metrics and cost/benefit considerations will tell you how to recognize success.
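As a rough illustration of comparing against a baseline, the Python sketch below computes the relative improvement in a business metric and a simple net benefit. The function names, the metric, and every number are hypothetical, chosen only to show how the same 1% improvement can be worthwhile in one setting and not in another.

```python
def relative_improvement(baseline: float, with_ai: float) -> float:
    """Fractional change in a business metric relative to its baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (with_ai - baseline) / baseline


def net_benefit(baseline: float, with_ai: float,
                value_per_unit: float, project_cost: float) -> float:
    """Value of the improvement minus what it cost to achieve."""
    return (with_ai - baseline) * value_per_unit - project_cost


# Hypothetical figures: customers retained per quarter before and after.
baseline, with_ai = 1000.0, 1010.0
improvement = relative_improvement(baseline, with_ai)

# The same improvement under two assumed values per retained customer.
high_value = net_benefit(baseline, with_ai, value_per_unit=500.0, project_cost=4000.0)
low_value = net_benefit(baseline, with_ai, value_per_unit=50.0, project_cost=4000.0)

print(f"improvement: {improvement:.1%}")
print(f"net benefit at 500 per customer: {high_value}")
print(f"net benefit at 50 per customer: {low_value}")
```

With these made-up numbers the improvement is the same in both cases, but the net benefit is positive only when each retained customer is worth enough to cover the project cost.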

Communication is Key

As mentioned in the introduction, an effective interaction between AI professionals and experts with business or other domain-specific knowledge is a key to AI success. AI technology can facilitate this conversation if it is good at providing explainable AI. H2O.ai is a leader in this field, with excellent AI technology such as the H2O Driverless AI.

The H2O Driverless AI platform provides a rich collection of capabilities to help with visualization of the process and with explainability of AI models. These capabilities not only speed up AI development but also make it easier for data scientists to communicate and collaborate with business experts in framing the right questions for AI to address.

Free 2-hour tutorials are available, or you can take H2O Driverless AI out for a free 21-day test run.

Additional Resources

To find out more about how to make AI a practical part of your business, read my blog “Making AI a Reality”. 

You can also download a free pdf of the ebook Practical Advice for Making AI a Part of Your Company’s Future.

For more on the collaboration between business leaders and data scientists, watch a replay of the webinar “The Conversation that is Key to AI Success”.


About the Author

Ellen Friedman, PhD

Ellen is Technical Evangelist at H2O.ai. She is an international speaker, author, and scientist with a PhD in biochemistry from Rice University. Ellen has been a committer for the Apache Drill and Apache Mahout projects and was previously a laboratory researcher in molecular biology. In addition to authoring publications in technical fields from genetics to oceanography, she is co-author of data-related books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, Streaming Architecture, Introduction to Apache Flink and the Practical Machine Learning series. Ellen has been an invited speaker for keynotes at JFokus in Stockholm, Big Data London, the University of Sheffield Methods Institute (UK) and NoSQL Matters in Barcelona, and has given invited talks at Nike Tech Talks (Portland OR), Berlin Buzzwords and Strata Data conferences in San Jose CA and London. She's also an artist with not-enough-time for the paint box.
