Read the Full Transcript
Patrick Moran: Hello everybody. Thank you for joining us today. My name is Patrick Moran. I’m from the marketing team at H2O.ai. And before we get started, we’d like to do a quick sound check. If you could please type into the questions tab if you are able to hear us clearly. Once we get a few responses, we’ll go ahead and get started.
Ingrid Burton: Great. Well, thank you everybody. Thank you for joining us here this morning, this afternoon or this evening, depending on where you’re at. I’m Ingrid Burton; I’m the CMO and an AI transformation leader at H2O.ai. And I have with me today Vinod Iyengar.
Vinod Iyengar: I’m the lead of data transformation here at H2O.ai. I’m really excited to be here.
Ingrid Burton: So what we wanted to do today is share with you some key takeaways for the C-suite. We’ll share what we’ve heard from a number of customers and people we talked to in the industry, as well as some key AI takeaways. So how do you embark upon an AI strategy if your company has not started? That seems to be a common theme today, as we move to seeing more AI and machine learning use across enterprises.
A lot of business leaders, from the CIO to the CTO to the chief analytics officer and even the CEO, have a lot of questions on “What do I do next?” So today, what we wanted to do is share with you some of the key challenges, the opportunities and just five things you should start looking at or thinking about. So with that, I’m going to take it away.
In terms of the challenges today in AI, we’re at an all-time peak in terms of interest that we see in the industry. There are a lot of reports and news out there around AI. But the bottom line for many businesses today is that there are three key challenges in embarking upon an AI strategy. First of all, who? Where are you going to get the people? That’s the talent question.
Where are the data scientists? How do I embark upon this journey if I don’t even have a data scientist on my team? So that’s a main question that recurs regularly; we’re going to talk about that today. The second thing is maybe you have a data scientist, or maybe you have a team. The element of time comes into play. So it’s really important to understand that it’s not a quick fix for your business. There isn’t a fast way to just get an AI strategy and get results instantly. We would say that there’s a time element, and we want to address that time element today and talk about how you get to faster results, because it’s all about getting business results. It’s not about AI per se, it’s about what you can do with it.
So how do we cut the time to get into results? Time is a challenge today. Then there are a lot of questions around trust. How do I trust the models? How do I trust an AI? How do I explain this to regulators? And we’ll have an opportunity to discuss that today as well, and we want to address that. So the talent, time, and trust challenges that exist today for businesses when they think about an AI strategy, are clearly top of mind. We’ve met with a lot of CEOs and a lot of chief analytics officers, chief data officers, and people who are in the Fortune 500 and Fortune 100. One of the things that we clearly have heard is that you have to start with a data culture.
When you start thinking about AI, you can’t get to an AI result unless you have data, and you can’t get to that unless you really embed data and a data culture within your company. We have regularly been looking at data for decades. A lot of you have gone through a data transformation, and data just exists everywhere. So how do you tap into that data? How do you get to a data transformation – a digital transformation – then harness it for AI? It really is a key challenge in order to not only think about it, but to understand who in your company is going to embark upon this AI transformation. We talk about data as a team sport, so why don’t we elaborate on that a little bit?
Vinod Iyengar: It’s a great point, right? We are living in a time where there’s a lot of data being collected. People are collecting data from their marketing sources, from sales mechanisms, from the data products, from product launches, etc. You get customer analytics coming in through all these different sources. So we have the ability to collect data. So that’s the focus, right? So building a data culture means you proactively start collecting data first and putting it in a format that is then easily accessible. To create a data culture, you need to be able to make it really easy for people to see the data and actually get insights from it. Now, how do you do that? To do that, you need to train the people on the team to be able to take advantage of data. Now you don’t need PhD-level data scientists or machine learning people from day one. When you think of a team sport, what we mean is that you need people of all different types of skills; you need to build a whole slew of people. You do of course want experts who can build really sophisticated models and regard them, but you also need people who can just look at the data and quickly get a sense of what’s useful or not.
So when talking about a team sport, we’re talking about building people with all kinds of skills and also up-leveling existing people. So your existing marketers, your existing finance people, your existing product people – can they become data aware or data science-like? So how do you do that? Just giving them the basics of data science, like telling them how to make assumptions and ask the right questions – we’ll come to that in a bit. But it’s about creating that sort of culture – up-leveling everyone to be data-aware, and also to start asking these questions of your data.
Ingrid Burton: We were talking recently to a CEO and a CIO of a company in L.A. They were starting from the standpoint of the fact that they had data, but their company, their culture was data averse. And so they had to collect it all into a single place. They had to create dashboards. So they just started on their journey and now they’re going into AI. But they really were at this point where the people that had been around for decades within the company said, “Oh, I know my gut feel, I know what’s going on in the market. I know my customers.” And when presented with data, it turned out that they were off by a few percentage points and more. So really embracing data as part of your culture is important. We look at the cultural transformation of getting to AI, which a number of reports now say is really a key driver.
You really have to understand that as a member of an executive team, you can’t just mandate AI; you can’t just mandate digital transformation – I think a lot of you know that. What you have to start embracing is, “What can you glean from the data? What can you find out from data that also augments that gut feel that we all have?” I mean, we’ve all been there, right? “I think I know best.” Data can tell me that I’m a little off or that my gut feel is correct. So it’s good to know that. So when you think about who’s on your team, we look at it as kind of a triumvirate, so to speak. There’s the business leader, right? The business leader is probably one of you in the C-suite; it’s the CMO, it’s the chief revenue officer, or it’s the officer of risk management at a bank. They are trying to solve a business problem. We’ll talk about that a little bit. We’re not doing this because we’re just having fun. We’re doing this because we’re trying to solve business problems, and to get answers to the business. They rely upon a data science team or an analytics team. So the chief data officer, or the chief analytics officer comes into play here. They want to know, “How do I arrive at the best answer for the business?” So they have to work in concert together. And then of course IT is important; it’s so imperative, because you can’t just drop new software or new processes into an existing system without IT really knowing about it. It becomes a real problem. So the CIO needs to be involved. This is the culture that you need to start creating. This is a team sport; people have to work together. They have to realize that they’re deciding on outcomes. We’re not trying to mandate AI, we’re not trying to mandate digital transformation, but we’re trying to look for answers so that a company can get to a competitive edge.
So the first key takeaway here is: build a data culture. Takeaway #1: Who’s on the team? We also found that you might have the right talent within your organization already; you probably already have really smart, excellent people that want to be part of that team. This is something that we find to be true in almost every company that we talk with, from large telcos to media and entertainment companies, to banks. They have really smart people that want to become well-versed in becoming part of this data culture. And of course, as part of this data culture: data, data, data – from every facet.
Vinod Iyengar: Yes, and if you think about it from a tactical perspective, what we’ve seen with some of the best mature organizations is that they didn’t start a lot earlier; they started maybe a few years ahead of you. But all they did is create central banks of data science people who are data aware, and then build around them. So you can do it two ways. One is if you look at the triumvirate that we talked about in the previous slide, you can have a cross-functional data science team that is basically permeating all over the organization. Our CEO always likes to say, put a data scientist in every office.
So what that means is that you have data-aware thinking permeating around the entire group because of this one person who might be an expert. And that way you can slowly build a mentor/mentee model and have more and more people surrounding them who start acquiring data science skills and become users of data and consumers of data, and then they start going one step further and are able to create insights from the data.
The second way to do this is to uplevel your existing teams. So you have people who are BI analysts, for example. These are people who have already been working with data. They know the structure of the data, and how different things can be combined together, so you can possibly send them to a bootcamp. You can send them to one of our conferences, or send them to an online training course like Coursera. Give them the opportunity to uplevel, and you’ll find that some of these people are already working with data, and they will be able to pick this up really quickly.
That way, you can shortcut the problem by up-leveling those in your existing workforce who are always trying to get to the next stage in their career. So there are different ways to do this; choose whatever works best for you.
Ingrid Burton: So the key takeaway is to build a data culture: figure out who’s on the team, find the right talent within, and be very data-focused. Key takeaway #2 is that you really have to ask the right questions. If you’re a major bank or a retailer, what are you trying to solve for right now? Recommendations, offers, personalization, knowing your customer; they all have these questions. Every major bank wants to learn how to detect fraud faster.
How do we do KYC, which is know your customer? How do we thwart money laundering? How do we get to those questions? Yes, there are early adopters in many different industries, but there is so much low-hanging fruit in terms of just finding the answers to questions you already have. You already have these questions. How do I get that next customer? Who is that next customer? How do I optimize my supply chain? How do I cut out transportation costs in the supply chain? You can get down to very granular, detailed questions, so asking the right questions is imperative. So part two, when thinking about creating an AI strategy is that you need the right people. You need analytical people – mathematicians, statisticians; these are not computer scientists per se.
So this is a different skill set. These are problem-solvers. These could be business analysts that are ready to be part of an AI strategy within your company. I would contend, and I always like to put this in, I think you’ve got to be really creative. You have to think outside the box. You’ve got to find people: you, yourself, and your company, have to think about creative solutions, meaning the same old way isn’t the way forward, or maybe it’s part of the way forward, but you’ve got to be very creative and open, really open-minded, and of course you really need to be data-driven.
Your data is everywhere. Sometimes companies don’t realize how much data they’re generating. Look to the marketing department for sentiment analysis of what’s going on on social. Look at the finance department, look at the transactional data that you have. Look at the engineering department or support functions. Look at the innovation functions within your team to hear feedback from your customers. Data is everywhere. And so you can harness it to answer the questions that you may have. Finding that next customer, making a recommendation – all of that is possible with machine learning. All of that is possible with AI in terms of getting closer to the pin, so to speak. You need to make sure that you’re spending your money wisely and getting those answers.
So we’re going to talk a little bit about use cases. If you’re embarking upon an AI strategy and you’re in one of these industries (by the way, AI and machine learning are now in almost every industry that we see), take a look at some examples here of what customers are doing today. Credit cards, new credit, retail banking, wholesale banking, hedge funds management, bond trading – the list goes on. We’ve got customers in almost every segment of financial services.
In healthcare and life sciences there are some really amazing results happening in terms of sepsis detection. Better patient outcomes are being had because of AI. I personally am very passionate about that – if we’re going to save somebody’s life with AI, it was all worth it. Telcos – predictive maintenance, avoiding truck rolls. Next best customer, customer churn prediction, marketing and retail, etc. You have to have AI and machine learning. We’re working with a number of retailers right now with digital marketing firms on next best customer, funnel predictions, personalization; it’s all very important.
What’s the point of all this? Save time, save money, and gain a competitive edge. What’s happening right now within our own customer base is if you aren’t doing many of these use cases right now in your particular vertical, you may be left out, and we don’t want that to happen. These are use cases, but they are problem solving, right? We’re using our algorithms to provide you with the answers. You have a problem. We’re giving you a prediction with our technology.
Vinod Iyengar: I know that it’s important to ask that question. Typically, when you’re framing an experiment, you’re basically asking a specific question. You always want to talk about what the hypothesis is that you’re testing, and you’re either trying to prove the hypothesis or disprove the hypothesis, right? That basically boils down to data science. You typically ask a question and try to prove whether the answer is true or not. And so that means that you have to ask the right question. For example, at one of the CEO summits, we heard someone jokingly say that management comes up and says, “Hey, can you just throw AI on the problem?” and that’s a good example of when you are not aware of what questions to ask, and you end up getting these naïve questions.
We use these use cases to present the art of the possible. You can take fraud detection, for example. It can be applied across different verticals, right? But the question you’re asking is, is this transaction a fraudulent one or not? Is this person a fraudulent person or not?
If you frame the question in that sense, you know what data to collect, because the data you collect for a particular problem is very dependent on the question you’re asking. What if you’re trying to predict sales forecasts for the next 12 months? Then the data that you want to collect to ask that question is very specific. You’re going to pull in all of your historical sales information, the customer information, the type of customers, and their interactions with your team. Once you have that information, then you can answer the question effectively, whether you can forecast the next 18 months or not, with some degree of confidence. But again, the key is to frame the question. So how do you do that?
It goes back to the previous question. You have to keep educating your people to think about their high-level problems. As Ingrid said, every business wants to save money, make more money, and save time. So these are not different for any business, but how do you then “peel the onion back” to translate that high level goal to a question that you can ask of your data? So it’s having that culture of asking these questions and learning from what others have done. We just published 100 use cases, and those are just a way to start the conversation.
You might think, “Oh, I have customer data, I have sales data.” So what are the 10 questions you can ask to predict whether someone’s going to buy a product, or whether someone’s going to click on that path? How much time has someone spent on our web site? These are the questions that you could ask. Once you ask the question, you may then realize that you actually don’t have all the data you need for that particular question. And that can trigger the question to the IT or data science person: “How can I collect this data?” You might find that you need to do more, right? So asking the right questions is absolutely number one, because if you ask the right questions, that then determines what outcomes can be generated out of it. And then you can actually measure whether the experiments are successful or not. Because if you don’t set the right outcomes, the right questions, your metrics may be all over the place. Worst case, they are probably wrong. So if you don’t set the right metrics, the right successes, and the right outcomes, you are not going in the right direction. You don’t even know which direction you’re going in.
Ingrid Burton: The other thing that we hear a lot is, “Try a number of things. Ask a number of questions.” Don’t focus on just one thing, like, “How do I add more customers?” It’s almost too broad, right? So you need to ask a number of questions to get to the answers. So the key takeaway, number two, is to ask the right questions. Realize what problems you are trying to solve. It’s not just about sprinkling a little AI on everything – that’s not going to work. But what problem are you solving? Also determine outcomes. Measure your success and keep refining, right? You are never really done. It’s just like everything else. It’s a constant iterative approach. Determine outcomes. So if you want to minimize fraud in this part of your business, a good outcome would be to increase your detection rate by what, 2%, or 5% – put a measurement on it. Because then, you’re going to be able to measure that success. And you’re going to find out a lot once you start experimenting. So once your data science teams have the right data, and the business is asking the right questions, you can start to feed what you’re looking for. It starts to emerge very quickly in front of you. So asking the right questions is key.
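To make the “put a measurement on it” idea concrete, here is a minimal Python sketch added to the transcript for illustration; it was not shown in the webinar. It assumes that “detection rate” means the share of truly fraudulent transactions that the model flags, and that the target is expressed in percentage points. The function names and the sample data are hypothetical.

```python
def detection_rate(labels, flags):
    """Share of truly fraudulent transactions that were flagged.

    labels: 1 if the transaction really was fraud, else 0.
    flags:  1 if the model flagged the transaction, else 0.
    """
    fraud_total = sum(1 for y in labels if y)
    if fraud_total == 0:
        raise ValueError("no fraudulent transactions in the sample")
    caught = sum(1 for y, f in zip(labels, flags) if y and f)
    return caught / fraud_total


def meets_target(baseline_rate, new_rate, target_gain):
    """True if the new model improved detection by at least target_gain.

    target_gain is in percentage points expressed as a fraction,
    e.g. 0.05 for five points (an assumption of this sketch).
    """
    return (new_rate - baseline_rate) >= target_gain


# Hypothetical sample: 10 fraudulent and 10 legitimate transactions.
labels = [1] * 10 + [0] * 10
old_flags = [1] * 6 + [0] * 4 + [0] * 10  # old model catches 6 of 10 frauds
new_flags = [1] * 7 + [0] * 3 + [0] * 10  # new model catches 7 of 10 frauds

baseline = detection_rate(labels, old_flags)
improved = detection_rate(labels, new_flags)

print(f"baseline detection rate: {baseline:.0%}")
print(f"new detection rate:      {improved:.0%}")
print("met the 5-point target:", meets_target(baseline, improved, 0.05))
```

Agreeing on the definition and the target before the experiment starts is what makes the result measurable afterwards; a real project would also track false alarms, which this sketch leaves out.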
We’re also finding some other things to be true, which is that there are a lot of people in the same kind of situation, where they are just starting, and they want to learn. Learning is all part of this. We make learning fun because machine learning and AI are just one facet. Asking questions is the other. But also learn from the community; send your people to be trained. There are so many online training options, as we said, but there are also ways to connect with the community online.
For instance, we have a huge meetup community. If you look up meetups and machine learning or you come to our website, H2O.ai, you can learn a lot from other people. We share as much information as we can. It’s such a new field for many people. If you just share your knowledge and join a meetup or send your people to meetups and present, you’ll find out there’s great talent out there. Number one, it’s a great place to find talent, but you’ll also learn so much from other people’s experiences. Whether it’s a mistake or success, we’re all going to learn. So what we encourage people to do is really join in the community, participate, and really become part of the movement as we see it. So key takeaway number three is connect with your local community. It doesn’t matter where you are. There are community members in machine learning all over the world. We are seeing community involvement in Chile, Paraguay, Uruguay, London, Paris, Prague, all over the US and Canada, Mexico, Japan, Singapore, and India. So we want you to think about the fact that once you establish your team’s data culture, you realize who’s on the team. You want to ask all the right questions, share with others, participate with them, and learn from everyone. I think this is really critical for people’s success; the folks and the companies that share will actually benefit.
Vinod Iyengar: In particular, I’m really excited about this because this one is built on top of open source, and it’s very unique to AI. In the last 10 years, most of the cutting-edge research has come from the academic community and from the open source community. So even with the largest companies like Google, Facebook, or Microsoft, the latest cutting-edge research that comes out doesn’t go into the product – it goes to open source; the community gets to try it out first. There’s a selfish reason for this because this space is so new; there are so many things that we are not aware of, especially when you get to some of these complex algorithms and technologies.
So you really want the community to do crowdsourcing and peer review and tell each other that this is really useful. What we then see are some amazing cross-disciplinary things coming out. For example, in the bioinformatics space, we are seeing some amazing research coming out there and in other sciences. So they are getting used in completely different verticals.
Ingrid Burton: So some of the algorithms from bioinformatics might get applied in marketing. Some of our customers are taking bioinformatics algorithms and applying them to marketing and sentiment analysis and in some of the banks. So it’s really interesting to see the cross-sectional kind of view of this.
Vinod Iyengar: It’s really a fundamental function of what the data looks like and what the natural distributions look like in some of these industries. And that’s really tough to know, even with experts, until you try different things and that’s what the community allows you to do.
There’s also a second reason why you want to connect with the community. Most data scientists want to share and collaborate. If you try to build a closed community, you’ll find it really tough to retain your data scientists; they really want to share, and that’s how they learn and grow as well. I read this Tweet that said, “Specific knowledge cannot be taught but it can be learned.” Data science is that; it’s the cutting edge of research and it’s so new that you cannot really teach it in a university, but you can learn it from the community.
So if you close off your group and you don’t connect with the community, your people’s data science skills will atrophy. One great example was that five years ago, the algorithms that were popular were things like SVM. Today, there is a whole new batch of algorithms. If I came from school, learned the techniques but did not connect to the community, I’d be stuck in a technique which was really old – that was barely five years ago. So you want to join the community. Keep in mind that the community can be in multiple places – you can have meetups in different cities, you can sponsor a meetup.
Ingrid Burton: That’s right – you can host meetups or even a hackathon in your office. A number of our customers are doing that and inviting others in. So they’ll do very public hackathons within their environment, and their data scientists and analytics people are all learning from people that are walking in. And you’re also getting students who are coming in. In terms of the talent issue, if you start holding these hackathons, you’ll be bringing in top data scientists that are just about to graduate or could do an internship.
Vinod Iyengar: Unlike other spaces, interns actually might have more of the latest knowledge. You might have business knowledge, but an intern might know the latest algorithms. They can come and teach your team something that you might not be aware of. So connect with the local university, and maybe start an internship program when you have these data scientists coming in and interning with you.
Ingrid Burton: Every university out there is focused on this. We actually have a free academic program. We’re giving out licenses to students, researchers, universities, and professors, and we’re working with them on creating curriculum. But also more importantly, what that means is we’re creating the talent pool. Helping people create the talent pool is really important to us. So if you connect to the community, you’re going to see more of these people out there, which is going to help you staff up regardless of who you are or where you are.
There’s also some great talent in the Midwest. I would say Latin America’s coming on strong as well, with a lot of data scientists. So the community is a wealth of information.
Let’s talk about Kaggle Grandmasters briefly. Kaggle is hosted by Google. Kaggle holds regular competitions, and people vie for prize money and status, and they learn from each other. We were just talking to a Kaggle Grandmaster this morning who said, “I have a friend who’s a Kaggle Grandmaster. I’ve actually never met him, but he’s in Melbourne, and he wants to work with us on XYZ.”
As we all know, you go where the talent is, and Kaggle Grandmasters are in every part of the planet. So we’re really proud that we are supporting that. We actually have about 14 Kaggle Grandmasters at H2O.ai that help customers get started. So back to connecting with the community: we’re here to help you, but the Kaggle people themselves, the data scientists there, and the universities also want to get involved. And don’t just look for data science curriculums. Look at people with math or physics degrees – our CTO here has a PhD in physics. We’ve got nuclear engineers here. These people are able to ask the right questions, be creative, think outside the box, and be fast and analytical. So connecting with the community is very important. You’re going to find out that it’s not that you need to go hire a PhD in data science, or someone who has a Master’s degree in data science.
By connecting with the community, you’re going to hear from such a broad section of people with diverse backgrounds, which is really important to get the answers. We’re excited about the community. As Vinod said, open source is where we got our start. This community is unique in that it’s built on open source and we connect with everyone. We encourage you to do the same. So that’s the key takeaway.
We’re going to just jump right into takeaway number four. So in terms of technology considerations, how do you minimize the talent problem, the time problem, and the trust problem? There are technologies out there called automatic machine learning, auto ML. That’s what you should look for – automatic machine learning platforms. If you’re a CIO listening to this call or you’re in IT and you’re listening to this call, or you’re a business leader – you may not know where to begin or maybe you do. So there are a lot of different options out there. You have to think through, “Am I going to go open source? Do I have to run this solution in the cloud or on prem?” You can do both. Build or buy – do I build my own solution or do I buy one off the shelf? And of course, then you’ve got to think about the data problem. So with that, I’m going to turn it over to Vinod, who is the expert in this technology consideration.
Vinod Iyengar: There are a few different things to think about. If you are starting off fresh on this journey, the question is, should you go and buy a stack from a vendor or should you start with some open source libraries to build the knowledge? There’s no right or wrong. I highly recommend you do open source. We’ve cut our teeth on open source. Start with open source just to get a feel for how these libraries work. The great news is that they are free. You can just download and try them and you can quickly get some value out of them and see that, okay, I can do some interesting things. Data scientists love them too because data scientists share their work through open source. They already probably know a Python library that they like, or they want to try out an R library, and it’s great.
But once you quickly get some wins, then the question is how do we now go to production? The dirty secret of data science is that 80% of projects will actually never see the light of day.
Ingrid Burton: 80% of the projects don’t see the light of day. That’s because it’s an iterative process, right? You’re never quite done.
Vinod Iyengar: Absolutely. But the other reason is technological, right? So oftentimes data scientists might pick a library which is deprecated or doesn’t have support. When we say “going into production,” what that means is your applications or your mission-critical systems are now relying on these models. When you reach that stage, your IT software engineers have a very different set of requirements that they are hoping for. So they are looking for reliability, scale, performance, latency, and they’re looking for support. So you start with open source, but you’ll quickly realize that you might need a vendor to support you, so that with your mission-critical applications, when there’s an issue, there’s someone you can call. And sometimes this sort of maturity can be built internally, but oftentimes, you’ll find that it’s better to just go with a vendor. For example, you can find a vendor who is building on an open source package.
For example, our H2O.ai open source library gets used by over 18,000 organizations globally. But a small percentage of those, when they are ready to go to production and they have mission-critical applications running on them, buy support from us. That gets them access to our data scientists, our support people, and basically 24/7 support if an issue happens on their production model.
The second thing to think about when it comes to cloud and on-prem, obviously, is how fast do you want to get started? So if you are coming into this journey really new, you might find that the cloud will get you up and running very quickly, because you basically bypass or shortcut the whole DevOps journey of procuring hardware, putting software on top of it, maintaining that security infrastructure, and all those considerations.
So if you’re starting up fresh, cloud might be a good consideration to really jump the gun. On the other hand, if you already have a data center, you already have a reasonably mature DevOps practice, then you might just run it on prem. Obviously, you can save a lot of money by not going to the cloud because you already have your hardware, and it’s already amortized. So that’s a consideration.
Often, we find that customers end up doing kind of a hybrid strategy. After some point of time you might realize that you can guarantee that you’ll have a certain amount of compute workloads for sure, but there are these spikes which you cannot model for. So you might use a hybrid strategy where you can jump to the cloud when you have spikes, but then use your on prem as much as possible. That way, you can optimize for your costs and also you don’t have the heartburn of having to procure “x.”
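The hybrid idea described here can be sketched in a few lines of Python. This is an illustration added to the transcript, not something presented in the webinar: it assumes demand is measured in generic “compute units” per hour, that on-prem capacity is fixed, and that anything above that capacity bursts to the cloud at a flat hourly price. All names and numbers are hypothetical.

```python
def split_workload(hourly_demand, on_prem_capacity):
    """Split each hour's demand into an on-prem part and a cloud-burst part."""
    on_prem = [min(d, on_prem_capacity) for d in hourly_demand]
    cloud = [max(d - on_prem_capacity, 0) for d in hourly_demand]
    return on_prem, cloud


def cloud_burst_cost(cloud_units, price_per_unit_hour):
    """Cost of the overflow, assuming a flat price per compute unit per hour."""
    return sum(cloud_units) * price_per_unit_hour


# Hypothetical demand: a steady baseline with one spike in the fourth hour.
demand = [40, 45, 50, 120, 48, 42]

on_prem, cloud = split_workload(demand, on_prem_capacity=50)
print("on-prem units per hour:", on_prem)
print("cloud units per hour:  ", cloud)
print("cloud burst cost:", cloud_burst_cost(cloud, price_per_unit_hour=2))
```

With these made-up numbers, only the spike hour spills over to the cloud, so the on-prem hardware carries the predictable load and the cloud bill covers just the overflow.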
When it comes to build or buy, this is similar to the open source question. You do want to build some in-house expertise. It often also helps you hire people, because people want to contribute to packages and build new stuff. But at the same time, you want to have a good view of the time value of money. Time to insight is critical for a lot of companies, especially in a competitive space. You often want to get to an answer or solution quickly, and oftentimes that means buying it. You can buy it, but also gain the expertise.
So you can basically work with vendors like H2O. With our customers, we don’t just sell them software; we often teach their data scientists. We run training programs, we give them tutorials, and we give them test drives. So they buy the product, but they also learn along with using the product.
And finally, data is of course critical, right? It’s about understanding how you save your data and how you make it accessible. There are obviously a lot of conversations around data governance, data privacy, access, and lineage. These are all critical pieces, especially with GDPR and the California privacy laws. You need to be very cautious about how you save your data and how you make your data accessible to your employees. Obviously, everyone should be able to look at the data and get insights out of it, but you should also have guardrails in place so that you don’t accidentally leak data and people don’t accidentally see something that they’re not supposed to.
Another thing is to understand how good your data is – the data quality piece comes into play. If your data has known issues, or if there are missing values and a lot of errors, that might give you bad results. Garbage in, garbage out, right? So it’s about creating processes so that you know how good a particular set of data is that you’re going to be using for a particular set of models, and having a process to keep checking that. That way, if you know that at some point the data is changing, you can remodel or recalibrate your model.
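A recurring data check of the kind described here could look something like the following minimal sketch. It flags a new batch of a numeric column when too many values are missing or when the average has moved too far from a reference batch. The function names and thresholds are assumptions made for illustration; a production process would typically use richer statistics and tooling.

```python
from statistics import mean


def missing_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def mean_shift(reference, current):
    """Relative change in the mean between two batches, ignoring missing
    values. Assumes the reference mean is not zero."""
    ref = mean(v for v in reference if v is not None)
    cur = mean(v for v in current if v is not None)
    return abs(cur - ref) / abs(ref)


def needs_review(reference, current, max_missing=0.1, max_shift=0.2):
    """True if the new batch has too many gaps or has drifted too far."""
    return (missing_rate(current) > max_missing
            or mean_shift(reference, current) > max_shift)


# Hypothetical batches of one numeric feature.
reference = [50, 52, 48, 50]
stable = [51, 49, 50, 52]
drifted = [70, None, 72, 68]

print(needs_review(reference, stable))
print(needs_review(reference, drifted))
```

A batch that trips the check is a prompt for a person to look at the data and decide whether the model needs to be rebuilt or recalibrated, not an automatic verdict.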
Ingrid Burton: So these are all of the technology considerations that your data science and IT teams are going to need to consider. As a business leader, you may or may not be thinking about it, but there are costs associated with it, right? Nothing comes for free. If you want to get results, you’re going to have to invest. You may already have all the infrastructure, you may already be on cloud, you may have all your data. But you still need to invest. And then, as Vinod said, one of the keys is that last mile, which is trust in the data and thinking about the data because of GDPR and the California privacy laws. We’re going to see more of that across the US and across the world.
It’s about the idea of trust in AI. So how do I trust the model? Where’s the bias? What’s going on? You know, we hear “it’s not fair.” We hear “we can’t trust it; we’re a regulated industry.” So explainability is a big field right now in AI, and I believe that H2O.ai is at the forefront, leading the charge to help people interpret and explain their models and give regulators the documentation that they’re looking for. We could spend hours talking about fairness and bias in the data. The key point is humans. You’re never going to take a human out of the loop. That’s our assertion, right? As it’s been said, garbage in, garbage out.
Humans put bias into data, right? This happens. What’s fair to you and me may be different for somebody else. So humans have to be in the mix to oversee not only the AI but each other. And so when we talk about AI and data science and machine learning, we want the human element to be in there, because humans still have the ability to look at an answer that a model is giving them and say, “Wait a minute. That’s way out of balance from what I’m thinking,” or “That doesn’t make any sense.”
Let’s talk about credit scoring. Let’s say you get a credit score and it’s phenomenal, and the AI says, “You’re eligible to receive a $1 million loan.” Now, if you just let the AI go, that person might get $1 million, right? But the bank can’t let that happen. They still have to check: “How did we arrive at that model? Why did it indicate that about that particular individual?” So a human has to be in the loop in terms of looking at the models and understanding what’s happening. You need to look at an AutoML platform (and we have one called Driverless AI) that helps you interpret the model, that documents the model, and that provides reason codes to regulators in regulated industries, which include healthcare, banking, and others. Think of fair lending and the Fair Housing Act. There are hundreds of regulations out there in the US, across Europe, and across the world.
You need to be looking at those regulations and be able to give a regulator or a lawmaker a reason why a particular model scored the way it did. A platform that does that automatically will be helpful, but also have a human in the loop making sure that you’re not missing something. You’ve got to put some guardrails on it. So trust in AI is a very complex subject, but it’s very simple when you think about it: put a human in the loop from the beginning to the end of the process. Put an automatic machine learning platform in there, and then really look at the documentation that’s being generated. Look at the regulations that it’s adhering to. And really, the technology can help humans make better decisions.
Vinod Iyengar: We’re basically talking about trust for humans, right? When we talk about trust, we are talking about human-centric trust. What that means is creating explanations that a human can understand and articulate. And that means that you have to document the decisions that you’re making, and then explain why those decisions have been made. This will often uncover really interesting things; for example, you might see that your data has bias in it. Historically, say, a certain group or class of people may not have gotten credit for different reasons. If you keep using that same data to build future models, you will keep reinforcing that bias in the data. This is where you might say, “I’m going to overrule the AI’s decisions, or other software and machine learning models’ decisions, because I know that the reason the model made the decision is past data, and I know that I want to correct for the bias or fix that part.” A rule engine that does this is overruling the decision of a machine learning model.
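One way to picture a rule layer sitting on top of a model is the small sketch below, which uses the loan scenario from earlier in the conversation. Everything in it is hypothetical: the `Decision` type, the reason codes, and the review limit are invented for illustration and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    outcome: str  # "approve", "decline", or "human_review"
    reason: str   # short reason code kept for documentation


def apply_rules(model_approves: bool, loan_amount: float,
                review_above: float = 100_000) -> Decision:
    """Apply a simple guardrail on top of a model's yes/no output.

    Large approvals are never granted automatically; they are routed to a
    person. The limit is an assumed policy value, not a recommendation.
    """
    if model_approves and loan_amount > review_above:
        return Decision("human_review", "AMOUNT_ABOVE_REVIEW_LIMIT")
    if model_approves:
        return Decision("approve", "MODEL_APPROVED")
    return Decision("decline", "MODEL_DECLINED")


print(apply_rules(True, 1_000_000))
print(apply_rules(True, 5_000))
```

Recording a reason code with every outcome is what makes the decision explainable later, whether it came from the model or from the rule that overruled it.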
The other part of this goes back to the classic Occam’s Razor, right? The simplest explanation is often the right explanation. What that means in machine learning terms is that the simplest model that gets you close to the best accuracy is often the best model. This is a great example that you see in Kaggle all the time. If you look at the Kaggle leaderboard for a competition, you’ll find that the top 10% or so of results are all within 0.1% of each other. They are so close that they’re winning on differences like 0.01%. At that point the question to ask is, “Hey, is it worth the additional complexity to get an additional 0.01% lift in accuracy?” Most times, it is not. You go for the simplest model out there that can get you 99% of the accuracy of the most complex model.
Another example is the Netflix competition about 10 years ago, when they put a massive data set out there and asked people to beat their recommendation engine. The winner built a really complex model which actually never got used. They actually picked the person who came in fifth or sixth, who had a model which was much simpler but good enough. So the cost of complexity can be significantly high, especially if you don’t know what kind of edge cases might be in your models and your data. Right? So it’s about picking simpler models which are more explainable, which have more constraints, which are human-understandable, and which can basically be overruled if needed. That’s what constitutes trust, and obviously this is a huge topic.
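The “simplest model that is good enough” idea can be written down as a small selection rule. The sketch below assumes each candidate model comes with an accuracy and some complexity score; the model names, scores, and tolerance are invented for illustration.

```python
def pick_simplest(candidates, tolerance=0.01):
    """Return the least complex candidate whose accuracy is within
    `tolerance` of the best accuracy seen.

    candidates -- list of (name, complexity, accuracy) tuples, where a
                  lower complexity score means a simpler model
    """
    best_accuracy = max(accuracy for _, _, accuracy in candidates)
    good_enough = [c for c in candidates if best_accuracy - c[2] <= tolerance]
    return min(good_enough, key=lambda c: c[1])


# Hypothetical leaderboard: the most complex model wins by a hair.
candidates = [
    ("stacked ensemble", 50, 0.912),
    ("gradient boosting", 10, 0.909),
    ("logistic regression", 1, 0.880),
]

choice = pick_simplest(candidates)
print(choice[0])
```

How to score complexity and how much accuracy to give up are judgment calls for the team; loosening the tolerance in this example would favor the simpler, more explainable model.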
I recommend that you go to our website, which has tons of resources. We’ve done a lot of webinars on this topic.
Ingrid Burton: And a number of our customers have spoken on the topic at many of our events, and we’ll be talking about it even further. But go to our website and you can learn about explainable AI.
This is really important – you’ve got to put it into your thinking when you embark upon an AI strategy. Because people are going to say, “I don’t trust it. I don’t know it. I don’t get it.” Going back to the first thing – creating a data science or a data culture – you’ve got to make sure that this is addressed. Make sure that you’re asking the right questions, that you’re framing the questions appropriately.
Make sure you’re connecting to the community and constantly learning. Think about the technology, and this is where the IT guys and gals, the data scientists, and the analytics people come in. They’ve got to think through the long-term effect. There are easy ways to get started in the cloud, but you have to decide if you’re going to be in the cloud long term, or if you’re going to do a hybrid approach, etc. And then finally, trust in AI. Those are the five key takeaways. Really, you’ve got to think about trust in AI. It goes back to actually trusting your team, being part of the culture, and saying, “Yes, we get it.”
So this is what we’ve learned over the last several months, talking to a number of CDOs, chief data officers, CIOs, chief analytics officers, business leaders, across many different industries, and across the world.
So where do you go from here? What’s your next move? You need to get started. What problem are you solving? What technology do you need? Who’s on the team? How do you get the support to get it done? If you’re the executive sponsor, you’ve got to bring a team together around you. It’s not just you in your office with your team; you have to bring a cross-functional team in. We’ve talked to CDOs that go and talk to their boards about this. This is a critical board issue right now in terms of making sure that they’re getting the latest and greatest and keeping on the cutting edge.
So we want to continue the conversation. We’d love to have you join us at an event or meetup – contact us! H2O.ai is probably the fastest way to get ahold of us. Have your people try our platform. We have easy ways to get started. We have tutorials and free trials. We do POC pilots all the time. We’ve got data scientists that would love to share what they know. We’ve got our own customers who would love to talk to you about this. The more we all evangelize what’s going on in the market today, the more of a movement we’re going to get here, and the more people will be successful.
So with that, there are some questions. Excellent.
Question: Do you find more customers coming to you to apply new tools to a well-defined business problem or more who are looking to see what things might be possible with new tools and techniques? It seems that the former is the much easier challenge to address.
Answer: Obviously the former would be low-hanging fruit, right? You already have a problem or question. If you have a question that you want to answer, then you have an equally good idea of what you want to do with it. So we do see that quite a bit. But on the other hand, we also see more of “If you build it, they will come”: customers buying licenses, asking us to come in and set up the infrastructure, and then giving it to their data scientists. People who are data-aware think, “Hey, we’ve got H2O – let’s connect it to our data sources.” We can also come in and do training in person, in webinars, or online, and get the teams at the ops level to say, “Hey – it’s all connected. It’s all set up. Now just go ahead and play with it.”
Vinod Iyengar: And I would say it’s a mix, right? It’s never 100% this way or that way. Some of the companies that we’re working with, they just come up with a new set of problems that they’re trying to address that they haven’t been able to solve in the past; that sometimes works out really well, too.
Ingrid Burton: We’re also willing to come in and talk to your executive team. Our CEO talks to a number of boards, and he gets invited all over the place to talk to boards and executive teams. We’d love to see more adoption of AI, machine learning, and automatic machine learning. We’re here to help. Thank you to everybody who joined us today. The presentation slides and recording will be made available through our BrightTALK channel.
Ingrid Burton: Ingrid Burton is CMO at H2O.ai, the open source leader in AI. She has several decades of experience leading global marketing teams to build brands, create demand, and engage and grow communities. She also serves as an independent director on the Aerohive board. Prior to H2O.ai she was CMO at Hortonworks, where she drove a brand and marketing transformation, and created ecosystem programs that positioned the company for growth. At SAP she co-created the Cloud strategy, led SAP HANA and Analytics marketing, and drove developer outreach.
She also served as CMO at Silver Spring Networks and Plantronics after spending almost 20 years at Sun Microsystems, where she was head of Sun marketing, led Java marketing to build out a thriving Java developer community, championed and led open source initiatives, and drove various product and strategic initiatives. A developer early in her career, Ingrid holds a BA in Math with a concentration in Computer Science from San Jose State University.
Vinod Iyengar: Vinod is VP of marketing and technical alliances at H2O.ai. He leads all product marketing efforts, new product development and integrations with partners. Vinod comes with over 10 years of Marketing & Data Science experience in multiple startups. He was the founding employee for his previous startup, Activehours (Earnin), where he helped build the product and bootstrap the user acquisition with growth hacking. He has worked to grow the user base for his companies from almost nothing to millions of customers. He’s built models to score leads, reduce churn, increase conversion, prevent fraud and many more use cases.
He brings a strong analytical side and a metrics-driven approach to marketing. When he is not busy hacking, Vinod loves painting and reading. He is a huge foodie and will eat anything that doesn’t crawl, swim or move.