This video was recorded on June 3rd, 2020. Slides from the presentation are also available.
Traditional supply chain models have been severely impacted by COVID-19, resulting in technical debt for a lot of businesses. As uncertainty builds, it becomes incredibly important to retrain models from scratch to adjust sales forecasts and run what-if scenarios.
Hello, I would like to do a quick sound check before we start. If you can hear me, please type yes in the questions tab. (silence) Okay, we’re getting some yeses coming through. Hello, and welcome everyone. Thank you for joining us today for our virtual meetup titled, Getting Your Supply Chain Back on Track with AI. My name is Qybare Pula and I’m on the marketing team here at H2O.ai.
I’d love to start off by introducing our speaker: Karthik Guruswamy is a senior principal solutions architect and business data scientist at H2O.ai. Before I hand it off to Karthik, I’d like to go over the following housekeeping items. Please feel free to send us your questions throughout the session via the questions tab in your console.
We’ll be happy to answer them during the Q and A at the end of the meetup. This meetup is being recorded. A copy of the meetup recording will be available after the presentation is over. Without further ado, I’d like to hand it over to Karthik.
Hi everyone. Welcome to the talk, or to the meetup. Let me see if I can share my screen here. There we go. We’re going to be talking about this important topic, Getting Your Supply Chain Back on Track with AI. We’re going to cover a couple of important things here. First, what is H2O doing in response to COVID-19? We’ll also look at the supply chain in the COVID-19 era, mostly its social impacts. We’ll look at how to manage your supply chain in difficult times, and then the rest of the topics I’ve shared. At some point we’ll go to a demo, and then we’ll do a Q and A at the very end.
How is H2O handling and solving COVID-19? You can see a quote from the CEO and founder of H2O, Sri: we are focused on using data science to help save lives and do good for humanity. Currently, we’re working with hospitals on staffing predictions and ICU planning, and more generally on predicting the spread of COVID-19. I’ll go through some of the initiatives on that. Most important are hospital supply chains: we started with hospital supply chains connected to COVID-19, and now we are expanding COVID-19 predictions across multiple verticals.
The supply chain is one of the most important problem spaces right now because of the uncertainty, which touches so many different topics beyond COVID-19 itself. At H2O, we have been evaluating global and open health datasets to determine patterns. And, as you know, H2O has data science experts who are contributing their knowledge to solve the pressing problems of the pandemic.
H2O also has several AI platforms, two of which are Driverless AI and the Q platform. Finally, we’ll look at AI solutions not just for supply chains, but going beyond the supply chain. As you all know, there has been unprecedented disruption to the supply chain over the last few months. It’s kind of a black-swan moment for supply chains right now. Businesses are trying to cope with it, but the recovery is going to be very difficult.
Just reading all the news coming out of so many newspapers, you see that unless a business acts proactively with the right tools and approach, the recovery is going to be very difficult. We see disruptions across models and processes around manufacturing, warehouse management, transportation, distribution, and logistics. It’s a long list of supply-chain building blocks that get stuff onto the shelves and to your home.
Even things like cash-flow forecasting are part of the supply chain as well. During and after COVID-19, based on what we’ve already experienced, there are going to be sudden decreases or sudden increases in demand for household and industrial goods. As you know, velocity is up for baking goods, alcohol, and frozen food; demand there is going up. Then you’re going to see demand going down for other things.
Basically, the whole story has changed. Even if things recover, there is also the possibility of a second wave. I don’t think anyone is really prepared for that from a supply-chain or product perspective. The underlying issue businesses are struggling with is that, even if they were using AI or ML, their models were manually crafted in Agile projects: maybe a three- or six-month project where a bunch of data scientists look at the data, do some forecasting, or build some models around the supply chain. However, those models only work if the underlying assumptions, the ecosystem, and the macroeconomic environment have not changed.
Let’s say your model did not incorporate any of these factors; when they change, the model breaks. This means you have to start all over again: you’ve got to build new features, retrain, and then backtest. Also, supply-chain models are not one size fits all. They change from SKUs to categories, and they also change from state to state and country to country.
When something like COVID-19 happens, unfortunately, we don’t have historical data to go back and build those models. For some crises you can probably go back many years, but not for a pandemic. The last time this happened was in the last century, so we don’t have the data. And you cannot afford to build new AI and ML models with the same methods and timelines, because the situation is changing every other week.
Let’s say you have COVID data for just the last few weeks, and then the situation changes. That’s not good enough, so you need to come up with something better. What it really boils down to is, first, time: training and deploying short-lived models with the highest accuracy week after week is extremely difficult. Second, talent: not necessarily the data scientists, but the domain expertise that was generally used for traditional models is not really usable anymore. Nobody knows what to do.
The third thing is that even if you had models predicting how your [inaudible] is going to change, you really want quick insight into what’s actually driving the model’s predictions. That trust is extremely important. It really comes down to time, talent, and trust, which most businesses are missing when creating supply-chain models around COVID. On the H2O side, I mentioned a little bit about the talent and the team. H2O has some of the world’s best Kaggle Grandmasters. They’ve been competing in Kaggle competitions, and in particular in the COVID-19 global forecasting contest on Kaggle.
They’ve been consistently on the leaderboard, predicting what’s going to happen next week in which region. We’re going to talk a little bit about what they did and where our models fall on the leaderboard. Then we’ll talk about how to incorporate these models into the supply chain and other parts of your business. So, how do you build models around COVID? The most important thing to remember is that we can predict COVID, and at some point the effects of social distancing as well, but we won’t talk about that in this meetup.
If we can predict COVID over the next few months for every country and every county in the best possible way, then that’s a really good building block to include in your existing models. The Kaggle Grandmasters have been using power-growth models for the competitions. Just to give you a small introduction, you can look at the right side of the slide. A power-growth model basically describes how COVID infections are growing: an initial exponential growth followed by polynomial growth toward the end.
Taking a curve like this, putting exponential and polynomial functions on top, and deriving the coefficients gives you what’s called a power-growth model. The Grandmasters have been extremely successful with this in the competition. It’s one way to forecast: you take the data up to this week and fit a formula made of exponential and polynomial terms. Then you’ve got yourself a nice equation that predicts what’s going to happen over the next few weeks or months.
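As a rough illustration of the idea, the sketch below fits one possible exponential-times-power form, y(t) = exp(c0 + c1·t)·t^c2, to cumulative case counts by ordinary least squares on log(y). The functional form, the helper names `fit_power_growth` and `forecast`, and the synthetic data are assumptions for illustration; the talk does not spell out the exact formula the Grandmasters used.

```python
import math


def _solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_power_growth(days, cumulative_cases):
    """Fit log(y) = c0 + c1*t + c2*log(t), i.e. y = exp(c0) * exp(c1*t) * t**c2.

    exp(c1*t) captures exponential growth and t**c2 a power-law
    (polynomial-like) term. Requires every t > 0 and every y > 0.
    """
    rows = [[1.0, float(t), math.log(t)] for t in days]
    target = [math.log(y) for y in cumulative_cases]
    # Normal equations of least squares: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(3)]
    return _solve(xtx, xty)


def forecast(coefs, days):
    """Evaluate the fitted curve on future days."""
    c0, c1, c2 = coefs
    return [math.exp(c0 + c1 * t + c2 * math.log(t)) for t in days]


# Five weeks of synthetic, noise-free daily cumulative counts (made-up parameters).
history_days = list(range(1, 36))
history = [3.0 * math.exp(0.05 * t) * t ** 2 for t in history_days]
coefs = fit_power_growth(history_days, history)
next_two_weeks = forecast(coefs, range(36, 50))
```

With real, noisy counts you would inspect the residuals and refit as each new week arrives; the log transform also means zero-count days must be dropped or offset before fitting.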
The same approach can also be used for a second wave in some way; we’ll talk about that later. The second way to address COVID predictions is to use epidemiological models such as SEIRD. SEIRD stands for susceptible, exposed, infected, recovered, dead. It’s a state-driven model defined by a set of differential equations, with free parameters that you fit to the data.
Some percentage of the susceptible population gets exposed, some percentage of the exposed gets infected with COVID, and of those, some recover and, unfortunately, some die. The movement across the different states is really governed by things like mandatory social distancing, self-quarantines, or lockdowns, which go into the model. What we have done is take the SEIRD model, which is incredibly popular right now in epidemiological studies, and incorporate it into our Driverless AI software. That means you provide upper and lower bounds for the values that govern these different states, and the model is then fitted within those bounds to find the best fit.
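To make the state transitions concrete, here is a minimal SEIRD simulation using forward Euler steps, plus a brute-force search for the transmission rate `beta` within user-supplied lower and upper bounds. It only mirrors the idea of fitting within bounds; it is not Driverless AI’s SEIRD transformer. The parameter names (`beta` for transmission, `sigma` for exposed-to-infected, `gamma` for recovery, `mu` for death), the fixed values of the other rates, and the helper names are assumptions.

```python
def simulate_seird(n, e0, i0, beta, sigma, gamma, mu, days, steps_per_day=10):
    """Integrate the SEIRD equations with forward Euler.

    dS/dt = -beta*S*I/N
    dE/dt =  beta*S*I/N - sigma*E
    dI/dt =  sigma*E - (gamma + mu)*I
    dR/dt =  gamma*I
    dD/dt =  mu*I
    Returns one (S, E, I, R, D) tuple per day, starting with day 0.
    """
    s, e, i, r, d = float(n - e0 - i0), float(e0), float(i0), 0.0, 0.0
    dt = 1.0 / steps_per_day
    trajectory = [(s, e, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            exposures = beta * s * i / n
            infections = sigma * e
            recoveries = gamma * i
            deaths = mu * i
            s -= dt * exposures
            e += dt * (exposures - infections)
            i += dt * (infections - recoveries - deaths)
            r += dt * recoveries
            d += dt * deaths
        trajectory.append((s, e, i, r, d))
    return trajectory


def fit_beta(observed_infected, n, e0, i0, sigma, gamma, mu, beta_low, beta_high, grid=41):
    """Grid-search beta in [beta_low, beta_high] to minimize squared error on I(t)."""
    best_beta, best_err = None, float("inf")
    for k in range(grid):
        beta = beta_low + (beta_high - beta_low) * k / (grid - 1)
        sim = simulate_seird(n, e0, i0, beta, sigma, gamma, mu, len(observed_infected) - 1)
        err = sum((row[2] - obs) ** 2 for row, obs in zip(sim, observed_infected))
        if err < best_err:
            best_beta, best_err = beta, err
    return best_beta


# Synthetic "observed" curve generated with beta = 0.4 (made-up parameters).
N = 1_000_000
trajectory = simulate_seird(N, 100, 10, beta=0.4, sigma=0.2, gamma=0.1, mu=0.01, days=60)
observed = [row[2] for row in trajectory]
beta_hat = fit_beta(observed, N, 100, 10, 0.2, 0.1, 0.01, beta_low=0.1, beta_high=0.5)
```

A real fit would search several parameters at once and use noisy case data, but the shape is the same: simulate within the bounds, compare with observed infections, keep the best parameters.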
Then we look at the actual infection rate and build a model, such as a [Julien] model or an Aron model, on the [inaudible]. Why are we doing all of this? Because once you have the COVID predictions nailed down, your existing models that haven’t used this external data can be enriched with those predictions. That’s the point I’m making.
To sum up the last part: you have short-term historical data, even if it’s only a few weeks, and you have predictions of infections for those weeks. With the COVID infections included, your time-series models can make really good predictions. There are other factors too. Let’s say the situation changes: there are more lockdowns, or even curfews. Very unfortunate, but those can be factored into these models to make the predictions much better.
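Mechanically, enriching an existing model comes down to joining the external predictions onto the training rows by region and period. A minimal sketch, with hypothetical column names (`region`, `week`, `predicted_cases`, `lockdown`), a hypothetical helper `add_exogenous`, and made-up values:

```python
def add_exogenous(rows, exogenous, keys, columns):
    """Attach external columns (e.g. predicted COVID cases, lockdown flags) to each
    row by matching on `keys`; rows with no match get None for those columns."""
    lookup = {tuple(e[k] for k in keys): e for e in exogenous}
    enriched = []
    for row in rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        # Copy the row so the original training data is left untouched.
        enriched.append({**row, **{c: match.get(c) for c in columns}})
    return enriched


sales = [
    {"region": "NY", "week": 23, "units": 120},
    {"region": "NY", "week": 24, "units": 135},
    {"region": "CA", "week": 23, "units": 90},
]
covid_predictions = [  # e.g. output of a separate COVID forecasting model
    {"region": "NY", "week": 23, "predicted_cases": 4200, "lockdown": 1},
    {"region": "NY", "week": 24, "predicted_cases": 3900, "lockdown": 0},
]
training_rows = add_exogenous(
    sales, covid_predictions, ["region", "week"], ["predicted_cases", "lockdown"]
)
```

For future weeks, the joined values must themselves be predictions or scenario inputs, since actual COVID counts for those weeks are not yet known.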
Now, going beyond COVID-19: other variables have also been impacted by COVID-19, and some are correlated with it. One strength of our products is that even if you have different variables that are correlated, we are able to use the ones that make sense in the best possible way to build the best models. For example, you can add unemployment claims, the consumer price index, available mobility data, and obviously shipping capacity, as well as movement to rural areas. You can also add any other data your organization has that applies to the supply chain.
Your time-series model can be built on many different external data sources, and you can make predictions on some of those as well, for example COVID-19 cases. That will help your models forecast better. This is something we talked about in the last few meetups. The short-term models can be power-growth models, SEIRD-type models, or a SEIRD transformer that works well with an AutoML product. Data from the last X weeks, with input from COVID-19 data and other external data, helps the forecast. These are really good for short-term models, running maybe a few weeks or months ahead.
What about long-term forecasts? This is where things become highly uncertain. Even now it’s very uncertain what will happen in the next few weeks; imagine projecting a year ahead. For long-term forecasts, the best way to predict what might happen is to run a lot of what-if simulations based on a wide range of inputs and hypotheses. You can say, “Hey, what if the unemployment rate is 20% or 30%?” Then you can ask: what if, in the second wave, the infection rate is 30% higher? What if the second wave stops?
Basically, you are now placing constraints on all of the external values, then running different simulations to see which combinations of those values lead to the different demand and supply outcomes we’re concerned about. I wouldn’t call it long-term forecasting; that’s the wrong way to say it. It’s more about what-if simulations to understand the different things that can go wrong, in the most constrained way.
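A what-if run can be as simple as evaluating a trained model over a grid of constrained scenario inputs and ranking the outcomes. In the sketch below, `demand_model` is a hypothetical stand-in with made-up coefficients; in practice it would be the deployed forecasting model scoring rows that carry the scenario values.

```python
from itertools import product


def demand_model(unemployment_rate, infection_multiplier, second_wave):
    """Hypothetical stand-in for a trained forecasting model (made-up coefficients)."""
    demand = 1000.0 * (1.0 - 0.8 * unemployment_rate)
    if second_wave:
        demand *= 1.0 + 0.2 * infection_multiplier  # e.g. pantry stocking during a wave
    return demand


# Constrained ranges for each external input.
scenarios = {
    "unemployment_rate": [0.2, 0.3],
    "infection_multiplier": [1.0, 1.3],  # 1.3 = infection rate 30% higher
    "second_wave": [True, False],
}


def run_what_if(model, grid):
    """Score every combination of scenario values, sorted from lowest to highest demand."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        inputs = dict(zip(names, values))
        results.append((inputs, model(**inputs)))
    return sorted(results, key=lambda item: item[1])


results = run_what_if(demand_model, scenarios)
worst_inputs, worst_demand = results[0]
best_inputs, best_demand = results[-1]
```

Looking at which inputs sit at each end of the sorted list is the “what can go wrong” view described above; the grid grows multiplicatively, so keep the ranges tight.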
Let’s talk about the really short term. We’re building weekly forecasts, and things are changing on the ground every day. How do you start making day-to-day decisions on the supply chain? The forecast model alone is not great, even with COVID-19 data and external data like unemployment. We can build short-term forecasts, but if something changes daily or hourly, we need to do something called demand sensing.
Demand sensing has been around for a while, but for folks who are new to the term: we are already doing the short-term forecasts, and we adjust their output based on day-to-day, hour-to-hour, or even near-real-time changes in the values. If you get a really big surprise in a situation your model was built on, you can adjust the predictions.
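One simple way to sketch demand sensing, assuming a multiplicative correction, is to compare the actuals observed so far with what the forecast expected for the same periods, then scale the rest of the horizon by a damped version of that ratio. The function name `sense_demand`, the damping factor `alpha`, and the clipping bounds are illustrative choices, not a description of H2O’s implementation.

```python
def sense_demand(forecast, actuals_so_far, alpha=0.5, min_ratio=0.2, max_ratio=5.0):
    """Adjust the remaining forecast using the recent actual-to-forecast ratio.

    forecast: planned demand per period for the whole horizon.
    actuals_so_far: observed demand for the first len(actuals_so_far) periods.
    alpha: how strongly to trust the new signal (0 = ignore it, 1 = fully trust it).
    """
    n = len(actuals_so_far)
    if n == 0:
        return list(forecast)
    planned = sum(forecast[:n])
    ratio = sum(actuals_so_far) / planned if planned > 0 else 1.0
    ratio = min(max(ratio, min_ratio), max_ratio)  # guard against extreme spikes
    correction = 1.0 + alpha * (ratio - 1.0)
    # Keep what actually happened, rescale what is still ahead.
    return list(actuals_so_far) + [f * correction for f in forecast[n:]]


daily_plan = [100.0] * 7  # e.g. this week's short-term forecast for one SKU
adjusted = sense_demand(daily_plan, [180.0, 220.0])  # demand came in at 2x plan
```

Here the first two days ran at twice the plan, so with `alpha=0.5` the remaining days are raised by 50% rather than doubled; the damping keeps a single surprise from fully overriding the model.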
Let’s say the predictions are not about COVID-19, but about something like demand for a particular product: say, hand sanitizer or toilet paper. Everything is fine now. All the stores are stocked with hand sanitizer and toilet paper, but what if something changes dramatically?
Your short-term forecast says all of this is going to be available, or all of this will be sold from the supply side, but if something changes, we can incorporate those changes into our short-term forecast. That’s demand sensing. I will give you a demo of both the short-term forecasting and the demand sensing, and if I’ve got time, a little bit of what-if scenario analysis as well. Let’s talk about Driverless AI time-series forecasting for the next 10 or 15 minutes or so. Then we’ll go on to the demand-sensing demo, and then, like I said, some what-if scenarios if we have time. If you’re not familiar with it, H2O has a product called Driverless AI. Some of you are using it.
We also have an open-source product called H2O AutoML; Driverless AI is a commercial product. There are a lot of vendors in this space, and I’ve actually listed a few of them. One of the things that’s very different about Driverless AI is that it does automatic feature generation. Given all the work done at H2O, we generate some pretty advanced features. Say you have a dataset with all kinds of columns: your SKU, your customer ID, country, region, anything. You also have COVID data in X, created by another Driverless AI model, then the unemployment rate, and so on, and you’re trying to predict the demand for some product. You can bring the data into Driverless through some of the most popular data sources I listed here; we have more than 50 connectors.
Once you bring in the dataset, you can do some automatic visualization. I don’t know if I’ll have time to show that, but if you do a trial, check it out, because it’s great: it pulls out the visuals we think are useful for data scientists. Then you give it the data, X and Y, and we predict Y from X; that’s what we’re trying to do in predictive modeling. Once you bring it in, we start an evolutionary process in which we generate features, like lag features for time series. Then we run multiple algorithms and do some hyper-parameter tuning. The goal is to find the best set of features and the best algorithms, and then maybe an ensemble on top of them.
It’s completely hands-free. That’s why we call it Driverless. It’s a metaphor: like a driverless car, you say where you need to go and it drives the rest of the way. It’s going to find the model with the best score possible. In this case, the lower the score the better, so it’s finding better and better models and features, tuning both as it goes. We also have an option to bring your own algorithms and transformations. We have open-sourced maybe 150 or so different scorers, algorithms, and transformations. There’s a link to them at the end of the presentation.
You can go there and just point to one of those and say, “Hey, there is a built-in set of algorithms that you ship with, and I want to add something on top of it.” These are published regularly. You can just plug one in, and the product picks the best. Take a transformer: that’s one example of what’s been built, but another one could be coming up next month, or in a few months.
You don’t have to wait for a transformer or an algorithm to make it into the product. You can go to our GitHub and just add it as part of the process. Then it finds the best features and the best algorithms, with the highest accuracy possible. We also generate an automatic scoring pipeline: whatever models and features we found during the evolutionary process, we write the code for you as compiled code that goes into the pipeline.
Sometimes these pipelines can be pretty big, but remember, we’re also doing a lot of cross-validation to check for overfitting, so the features and models in the pipeline have already been tested for overfitting. All you have to do is deploy the Python or Java artifact, then go to production and make it part of your supply-chain process. You can think of Driverless AI as the engine behind supply-chain automation.
Also, there’s the Q product that I talked about earlier. It uses Driverless AI as the backend, so you can create AI apps. You do the predictive modeling in Driverless, and then users get a more friendly, app-style interface, with drop-downs and things like that, that consumes the output of the automatic machine learning.
The most important thing in Driverless is machine learning interpretability, whether it’s time-series forecasting, regression, or classification. We are able to break down a built model to see how it’s making predictions. We provide things like Shapley values and K-LIME output for regression and classification, and Shapley values for forecasts. You’ll be able to say, “Hey, we’re predicting two weeks from now, and the demand is going to peak,” or “the demand is going to tank.” Then you can ask, “Why did this happen?” and it will tell you the reason is X, Y, Z. You want to know that; it’s essential for business communication.
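To show what a Shapley value actually measures, here is an exact, brute-force computation for a tiny model: each feature’s contribution is its weighted average marginal effect over all subsets of the other features, with absent features set to a baseline. Enumerating 2^n subsets only works for a handful of features, so this illustrates the definition; it is not how Driverless AI computes Shapley values, and `demand` is a made-up linear model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value; the rest use the baseline.
        return model([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def demand(features):
    """Hypothetical model: [covid_cases_thousands, unemployment_rate, is_holiday]."""
    covid, unemployment, holiday = features
    return 500 + 40 * covid - 300 * unemployment + 100 * holiday


x = [10, 0.2, 1]
baseline = [2, 0.05, 0]
contributions = shapley_values(demand, x, baseline)
```

The contributions add up to the gap between the prediction and the baseline prediction, which is what makes them usable for “why did demand peak?” explanations.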
That’s what Driverless AI does. Specifically, with respect to time-series forecasting, we’re talking about short-term and long-term forecasts. For short-term forecasting, when we build the models we account for a forecast gap, and then we have something called the forecast horizon. The gap is the delay in your historical data, which may be a few days or a week late, because sometimes it takes time to collect all the data, curate it, and put it in the data warehouse or the lakehouse.
By the time we bring it into Driverless, it may be a week late for whatever reason. I’m just making an assumption here; if you get it sooner, that’s even better. Assuming there’s a gap, we can build a model that says, “Hey, I’ve got a one-week gap in the data, but I want to predict the next two weeks.” Then we build a model for all the forecasts in those two weeks. And based on the forecast horizon, whether it’s four weeks or six weeks, we build a model that minimizes the error over that horizon. It’s extremely easy to build. We also use GPUs on the backend, not just for CatBoost models, but also for LightGBM, XGBoost, and even GLMs, so your models run really fast.
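The gap and horizon can be pictured as carving up the tail of the history. The sketch below is a generic illustration, with a hypothetical helper name, of a train / skipped-gap / validation split over ordered periods; it is not Driverless AI’s internal logic.

```python
def gap_horizon_split(periods, gap, horizon):
    """Split ordered periods into (train, gap, validation).

    The last `horizon` periods are validation; the `gap` periods before them are
    withheld to mimic data that arrives late; everything earlier is training.
    """
    if gap < 0 or horizon <= 0 or gap + horizon >= len(periods):
        raise ValueError("series too short for this gap and horizon")
    valid_start = len(periods) - horizon
    train_end = valid_start - gap
    return periods[:train_end], periods[train_end:valid_start], periods[valid_start:]


weeks = list(range(1, 53))  # one year of weekly periods
train, skipped, valid = gap_horizon_split(weeks, gap=1, horizon=2)
train6, skipped6, valid6 = gap_horizon_split(weeks, gap=1, horizon=6)
```

Validating this way measures error exactly as the model will be used: trained on data that stops one week early, asked to predict the weeks after the gap.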
Then we have our cross-validation scheme, with a lot of backtesting. This is not a case where you bring in one set of validation data and build your model on that alone; think of it as cross-validation over the whole history. We move the windows, we enlarge the windows, and we produce a model that is robust, so you can use it for the future. We also have time-group columns, which are already a feature in Driverless. What I mean by a time group is this: let’s say you have departments, stores, and SKUs. You can bring the historical data for all the departments, all the stores, and all the SKUs, for maybe up to the last six months, and throw it into Driverless.
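Moving and enlarging the validation windows is rolling-origin backtesting. A minimal sketch that generates index ranges for expanding or sliding training windows; the function and parameter names are hypothetical, and Driverless AI’s actual fold construction may differ.

```python
def backtest_windows(n_periods, horizon, n_folds, min_train, expanding=True):
    """Return (train_range, valid_range) pairs, oldest fold first.

    Each validation window covers `horizon` periods and the windows step forward
    by `horizon`. With expanding=True training always starts at period 0
    (the window enlarges); otherwise it slides and keeps `min_train` periods.
    """
    folds = []
    for k in range(n_folds, 0, -1):
        valid_end = n_periods - (k - 1) * horizon
        valid_start = valid_end - horizon
        train_start = 0 if expanding else max(0, valid_start - min_train)
        if valid_start - train_start < min_train:
            continue  # not enough history for this fold
        folds.append((range(train_start, valid_start), range(valid_start, valid_end)))
    return folds


expanding_folds = backtest_windows(52, horizon=4, n_folds=3, min_train=20)
sliding_folds = backtest_windows(52, horizon=4, n_folds=3, min_train=20, expanding=False)
```

A candidate model is scored on every fold, and settings that hold up across all of them are preferred over ones that only look good on the most recent window.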
Then we look at it and automatically detect that there are groups inside. These groups are like threads, and the models we build learn the different variations across the groups. We don’t build an individual model for each group. Instead, we look at the whole dataset, and we even use the group columns themselves: a time-group column can be a feature in its own right.
Forecasting each department, store, and SKU individually will probably be much less effective than putting all of that together. Basically, we learn across different time groups and columns. That’s the power of Driverless. We also do time-series-specific feature transformations: lag features, exponential moving averages, and the aggregations we were talking about just now.
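A minimal sketch of per-group lag and exponential-moving-average features, computed separately for each store/department thread so one group’s history never leaks into another’s. The column names, the smoothing factor `alpha`, and the helper name are assumptions for illustration.

```python
from collections import defaultdict


def add_lag_and_ewma(rows, group_keys, time_key, target, lags=(1, 2), alpha=0.5):
    """Add per-group lag features and an exponentially weighted moving average.

    Only past values are used, so a row's features never include that row's own
    target (which would leak the answer).
    """
    by_group = defaultdict(list)
    for row in rows:
        by_group[tuple(row[k] for k in group_keys)].append(row)
    out = []
    for group_rows in by_group.values():
        group_rows.sort(key=lambda r: r[time_key])
        history, ewma = [], None
        for row in group_rows:
            feats = dict(row)
            for lag in lags:
                feats[f"{target}_lag{lag}"] = history[-lag] if len(history) >= lag else None
            feats[f"{target}_ewma"] = ewma  # EWMA of values strictly before this row
            out.append(feats)
            value = row[target]
            ewma = value if ewma is None else alpha * value + (1 - alpha) * ewma
            history.append(value)
    return out


rows = [
    {"store": 1, "dept": 1, "week": 1, "weekly_sales": 10.0},
    {"store": 1, "dept": 1, "week": 2, "weekly_sales": 20.0},
    {"store": 1, "dept": 1, "week": 3, "weekly_sales": 30.0},
    {"store": 1, "dept": 2, "week": 1, "weekly_sales": 5.0},
    {"store": 1, "dept": 2, "week": 2, "weekly_sales": 7.0},
]
features = add_lag_and_ewma(rows, ["store", "dept"], "week", "weekly_sales")
```

All groups still end up in one training table, so a single model can learn across threads while each row’s lags come only from its own thread.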
You can also incorporate exogenous features, like the COVID numbers from another Driverless AI model, which means it’s not just a univariate time series. We’re looking not only at the time groups, but also at other numeric and categorical columns. It could even be text, even though we would not be able to predict what text will come in. I’m just saying that if you have text columns, you can include them with NLP. We’ll take care of all of that for you.
The most important thing about Driverless, as we mentioned in the last two slides, is that you can bring your own features and models. Let’s say you have a time-series feature that you’ve been using in a model. You can import it, and it will compete with the built-in features; sometimes they are combined as well. You can also bring your own models. Most importantly, you can bring your own cost function. If you have a cost function that penalizes different things, you can write your own in Python and have the model optimize for that cost function.
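As an example of a cost function that “penalizes different things”, here is a plain-Python asymmetric loss where under-forecasting (risking stockouts) costs more than over-forecasting (excess inventory). The penalty weights and the function name are made up, and this is only the scoring logic: plugging it into Driverless AI goes through its custom-recipe interface, which is not shown here.

```python
def asymmetric_cost(actual, predicted, under_penalty=3.0, over_penalty=1.0):
    """Mean cost where under-forecasting (actual > predicted) is penalized
    more heavily than over-forecasting. Penalty weights are illustrative."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        error = a - p
        # Positive error = we predicted too little (stockout risk).
        total += under_penalty * error if error > 0 else over_penalty * -error
    return total / len(actual)


cost = asymmetric_cost([100, 100], [90, 110])  # one 10-unit miss in each direction
```

Both misses are 10 units, but the under-forecast costs three times as much, so a model optimized for this cost will lean toward forecasting slightly high.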
Regardless of what you do, whether you bring your own or use one of ours, you can always explain your forecasts with Shapley values. I’ll show you more of them in a minute. Okay, we have about 30 minutes in the presentation, so I’m going to go a little quicker because we have a lot of ground to cover. With Driverless, we have three knobs: accuracy, time, and interpretability. I’m going to show them in the demo.
Maybe we should just go to a demo right now to save time. (silence) There we go. I have some datasets here: training and test datasets that I got from a Walmart Kaggle contest some time ago. Let me quickly show you what the dataset looks like. I hope you can see my screen here. This is just weekly sales over several months and quarters; the data goes from 2010 all the way to 2012. You can see this is all the sales across multiple departments and stores. Let’s go ahead and look at store number one here.
That is store number one across all the different departments. I also have it in Excel; let me show you quickly. Okay, there we go. That’s just store one, department one. The data looks something like this: you have a store column and a department column. Let me see if I can zoom in real quick. Then you have a date column, with one date per week. I think it’s a Monday.
Then you have weekly sales and some other variables, like a holiday column that is set when the week includes a national holiday. Now, we want to bring this into Driverless. Let’s go back to the spreadsheet; I’ll go to a chart here in a minute. We want to forecast what the next six months are going to look like, and not just across all departments and stores: we want to look at the first store and first department.
What does the forecast look like in one shot? You’re not going to build one model per department per store. We’re going to throw all of the data in. Driverless is going to create lag features on all of the threads, create other features, and build the model. Then we’re going to score it. Because all of the data is from the past, we can compare the actual with the predicted values and see how well it’s doing.
I have a training dataset here. Let me see if I can enlarge my screen. Then we have a test dataset we want to predict on. Both are in Driverless AI. I want to create an experiment, and it’s going to be a time-series model. I pick a target column, which is weekly sales. Then I also pick our time column, which is just the date. Then I select our test dataset right here.
I could build a model on the training data and predict on this test set right away, but I’m not going to do that; I’m going to show you how to score the test dataset separately. I want to predict 26 weeks, which is exactly six months. You can also do this for just a few weeks, or even just the next week. Then we have three knobs here: accuracy, time, and interpretability. You can read more about them on the H2O website. Very quickly, the accuracy knob allows more cross-validation, more algorithms, more hyper-parameter tuning, and larger samples of the data.
There are many things you can do with the accuracy knob. Obviously, the lower the value, the faster you can build a model. The time knob controls early stopping across iterations. When you build a model, we run multiple iterations. In each iteration we do some feature engineering, score the result, and then decide what to try in the next iteration.
Certain combinations of features and models are doing really well, and we try to improve on those. That’s our evolutionary process. It’s impossible to do that manually; you need to have a [inaudible] to do that. The time knob allows you to keep running more iterations even if the score doesn’t improve. If I set it to three, for example, it will stop after five iterations without an improvement in the score. It wraps up, builds a model, and packages it as an experiment.
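The time knob’s behavior resembles early stopping with a patience count. The sketch below shows that general idea; how Driverless AI maps knob values to a number of non-improving iterations is internal to the product, so `patience` here is just an illustrative parameter, `search_with_patience` is a hypothetical helper, and the score sequence is made up.

```python
def search_with_patience(candidates, score_fn, patience):
    """Score candidates in order (lower is better) and stop after `patience`
    consecutive iterations without an improvement.

    Returns (best_candidate, best_score, iterations_run).
    """
    best, best_score, stale, iterations = None, float("inf"), 0, 0
    for candidate in candidates:
        score = score_fn(candidate)
        iterations += 1
        if score < best_score:
            best, best_score, stale = candidate, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, best_score, iterations


# Made-up validation scores, one per iteration of the search.
scores = [9.0, 8.0, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 7.0]
short_run = search_with_patience(range(len(scores)), scores.__getitem__, patience=5)
long_run = search_with_patience(range(len(scores)), scores.__getitem__, patience=10)
```

With less patience the search gives up after five flat iterations and keeps 7.5; with more patience it runs long enough to find the later 7.0, at the cost of extra iterations.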
The last one is the interpretability knob. The higher the interpretability setting, the less feature engineering it’s going to do: just basic engineering, like one-hot encoding and things like that. Because it’s a time-series problem, it still produces some lags. When you lower the interpretability knob, it does a bit more advanced feature engineering. Again, no matter what setting you have, you’ll always be able to get Shapley values on top of the model.
Then you have the expert settings, where you can set more time-series parameters. For example, there’s an option to consider the time-group columns as standalone features. I’m going to check that. Now each time-group column, department and store, will also be a feature by itself, as opposed to being used only for grouping. Why is that important? Sometimes looking at [inaudible] is important, like looking at which category it is. Maybe there’s some signal in a particular one.
There are options to speed up your predictions. You can see that we also have a dropout mode for [inaudible]. Then we have various other settings. Again, most of the time you don’t have to change anything here, but as you explore more serious models, sometimes you may have to make those calls, and you can always do it in Driverless AI. In the expert settings, you can also incorporate recipes from outside, like from our GitHub. You can click here to go to our GitHub website.
You can see a whole bunch of advanced recipes there that you can add to the model. I can add [inaudible], Prophet, and recipes for interpretability, time, and coordinates. There’s a lot. I don’t want to do this right now, because it takes a lot of time to build a model and we’ll run out of time. If you really want to, you can go here and paste the URL, or click load custom recipe and type the link here.
Then it should show up here, as part of the features. If it does better than the built-in pieces, it’s going to show up on top. I’m going to start the experiment, but before I do that, I want to show you that we have various different scorers. You can also add your own scorer. My favorite is MAPE, mean absolute percentage error, which we don’t have built in right now, but we can get it from GitHub. I’m not going to do it, but you can add it.
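For reference, MAPE is the average of |actual - predicted| / |actual|, expressed as a percentage. A minimal version follows; skipping rows where the actual is zero (where the ratio is undefined) is one common convention among several, and this function is an illustration, not the GitHub recipe itself.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Rows with actual == 0 are skipped because the percentage error is
    undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


error = mape([100, 200, 0], [110, 180, 5])  # the zero-actual row is ignored
```

Because it is scale-free, MAPE lets you compare accuracy across stores and departments with very different sales volumes, though it punishes misses on small actuals heavily.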
Once you do that, the model is optimized for that scorer. We have a time column, a target column, a forecast horizon, and the settings. Let’s go in and launch this experiment, okay? Right now, it’s doing validation checks. You can see that it automatically detected the department and store columns as time groups, which we never specified explicitly. That’s the fun part. When you are doing a time-series forecast across different regions, departments, or SKUs, you can just bring the historical data. As long as it’s [inaudible] or daily data at a granularity you can agree on, you can throw it into the mix.
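One way to picture automatic time-group detection: find the smallest set of candidate columns that, together with the date, uniquely identifies every row. That heuristic and the helper name `detect_time_groups` are assumptions for illustration; the talk does not describe how Driverless AI actually detects time groups. The rows are made up but shaped like the Walmart data.

```python
from itertools import combinations


def detect_time_groups(rows, time_col, candidates):
    """Return the smallest list of candidate columns that, together with the
    time column, uniquely identifies every row (a heuristic for time groups)."""
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            keys = [tuple(r[c] for c in combo) + (r[time_col],) for r in rows]
            if len(set(keys)) == len(keys):
                return list(combo)
    return None  # duplicates remain even with every candidate column


rows = [
    {"store": s, "dept": d, "date": w, "is_holiday": w == "2010-02-12"}
    for s in (1, 2)
    for d in (1, 2)
    for w in ("2010-02-05", "2010-02-12")
]
groups = detect_time_groups(rows, "date", ["store", "dept", "is_holiday"])
```

Here neither store nor department alone is enough, and the holiday flag adds nothing, so the search settles on store plus department, the same pair the experiment picked up.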
A very popular question is: what about gaps in the time series? We handle a lot of gaps in the time series, but if there are too many gaps, you can always roll up from days to weeks, or weeks to months, depending on what you’re trying to build, and you’ll still get a good model. While this is running, it’s already built a few models, including LightGBM. Oh, it’s finding a lower error, a better score, and it’s still progressing.
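Rolling daily data up to weeks is a simple aggregation. The sketch below sums values into ISO weeks using the standard library’s `date.isocalendar()`; it assumes a missing day means zero sales, and if a missing day instead means missing data, averaging the available days (or imputing) may be more appropriate. The helper name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def roll_up_weekly(daily):
    """Sum daily values into (ISO year, ISO week) buckets, sorted by week."""
    weekly = defaultdict(float)
    for day, value in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += value
    return dict(sorted(weekly.items()))


daily_sales = {
    date(2020, 6, 1): 5.0,  # Monday
    date(2020, 6, 3): 7.0,  # Wednesday, same ISO week
    date(2020, 6, 9): 4.0,  # Tuesday of the following week
}
weekly_sales = roll_up_weekly(daily_sales)
```

Using ISO year and week as the key avoids mixing up weeks that straddle the new year.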
On the very first model it built, you can see how it tried different algorithms and came up with the number one feature here. In later models, store and department soon become important features. We automatically extract the day, the year, the day of the week, and so on. The features change from model to model, but at the very end you’re going to get a final model.
Let me pick one with a low score here. This is a completed model. You can see the validation score was 7,629. The last time I ran this, I scored the test dataset, and it did really well. Maybe a little bit of underfitting, but you can change all of that. Now that we’ve built the model, we want to score a new dataset. How do you do that? You click score on another dataset, then pick our test dataset. The weekly sales in the test dataset are the actual values. We’re going to predict on this dataset, then compare the predictions with the actuals.
I’m now scoring on the test set. What’s happening right now is that the pipeline that was built is being used to score. You can download the predictions and then compare the actuals with the predicted values. You can also look at the scoring pipeline. Some of our competitors call this a blueprint, but we don’t define it ahead of time. We discover the pipeline, then we do feature and model tuning, and we also check for overfitting across validation.
This is a powerful one. If you set the settings really high, you get a very complex pipeline, all of it still tested for overfitting on validation, and a much better score. In the expert settings, you control which features can be in the model, which features go into the final model, and which algorithms to choose from. That’s it for that.
Okay, now that we’re at the end of that, what else can we do? Let’s go to another product, Q. Let me check the time. We’re at about 10:40, so I think we have enough time. I want to go to Q first, then we’ll do the demo. What is Q? I think of Q as a BI for AI, whatever you want to call it; there are different ways to look at it. Q has a very business-friendly interface where any business analyst can go in, bring in a table, and upload the data. Then they can search on different things and build charts, graphs, and so on with far fewer keystrokes.
A lot of our company’s technology is built into it. It also talks to Driverless AI in the backend, so you don’t have to go into the interface I just showed you. If that feels too complex, you can stay at the table level: you build a model by specifying just the settings, such as the time column, and it builds the forecast model for you. Either way, the model comes back and Q can score with it.
It’s a really nice, very light interface. You can do data prep, dashboards, and reporting. Most importantly, you can build applications on top of it. You can build notebooks, and all the applications run on Python. We have an SDK for that. You can package your forecasting, or whatever else, and make it really nice with drop-downs and everything. Businesses like to play with it.
They get to experiment with the output, looking at the graphs and at the business models that you, as a data scientist, build into your application.
The way it works is you have the interface. I won’t show you the login; it’s just a username and password. You have these different tables. Let’s go ahead and click on this table, okay? This is time-series data. You can see we have date, week, SKU ID, customer ID, and region. And we have a value, like sales per customer per SKU for that week.
It’s historical data. Notice that we augmented it with COVID data. We also added sentiment, either about that product or general market sentiment around the topic. We also have a search interface. You can search for things like, I don’t know, COVID number of cases per day. I’m hardly typing anything and I’m getting all of this. I can draw a chart right here. I made two line charts by typing just a few letters.
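Augmenting sales history with COVID data is essentially a left join on shared keys. The sketch below assumes the two tables share date and region columns; the field names, function name, and toy values are hypothetical, not the demo’s actual schema.

```python
def augment_with_covid(sales_rows, covid_rows, keys=("date", "region")):
    """Left-join COVID case counts onto sales rows by (date, region).

    Rows with no matching COVID record get None, so gaps stay visible
    instead of being silently filled.
    """
    lookup = {tuple(r[k] for k in keys): r["cases"] for r in covid_rows}
    augmented = []
    for row in sales_rows:
        merged = dict(row)  # copy so the input rows are not mutated
        merged["covid_cases"] = lookup.get(tuple(row[k] for k in keys))
        augmented.append(merged)
    return augmented


sales = [
    {"date": "2020-04-06", "region": "East", "sku_id": "A1", "sales": 120.0},
    {"date": "2020-04-06", "region": "West", "sku_id": "A1", "sales": 80.0},
]
covid = [{"date": "2020-04-06", "region": "East", "cases": 350}]
for row in augment_with_covid(sales, covid):
    print(row["region"], row["covid_cases"])
# East 350
# West None
```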
The one here is, let me see, the average COVID cases per day. Then that’s the average per day in this case. If you want to add sales to this, let me see if there are sales here. There we go, sales. On the right side, you can see the recorded cases, the historical data. This includes both current COVID cases and future cases as predicted by SEIRD models.
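The SEIRD models referenced here are compartmental epidemic models: people move from Susceptible to Exposed to Infectious, and then to Recovered or Dead. A minimal forward-Euler simulation is sketched below; the function name and parameter values are illustrative assumptions, not fitted values, and this is not the model used in the demo.

```python
def simulate_seird(n, i0, beta, sigma, gamma, mu, days, steps_per_day=10):
    """Forward-Euler integration of a basic SEIRD compartmental model.

    beta:  transmission rate,  sigma: 1 / incubation period,
    gamma: recovery rate,      mu:    death rate of infectious people.
    Returns one (S, E, I, R, D) tuple per day, starting at day 0.
    """
    s, e, i, r, d = float(n - i0), 0.0, float(i0), 0.0, 0.0
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt  # S -> E
            new_infectious = sigma * e * dt      # E -> I
            new_recovered = gamma * i * dt       # I -> R
            new_deaths = mu * i * dt             # I -> D
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered - new_deaths
            r += new_recovered
            d += new_deaths
        history.append((s, e, i, r, d))
    return history


# Illustrative, unfitted parameters for a population of one million.
trajectory = simulate_seird(n=1_000_000, i0=10, beta=0.5, sigma=0.2, gamma=0.1, mu=0.01, days=120)
peak_day = max(range(len(trajectory)), key=lambda t: trajectory[t][2])
print("peak infectious day:", peak_day)
```

Every flow leaves one compartment and enters another, so S + E + I + R + D stays equal to the population; projected infectious counts from such a run are the kind of future COVID values that can then feed a forecast.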
Some of this is predicted by a model, and some of it is a forecast that came out of those predictions. We have all of that. Here’s what we really want to do. This is a sales forecast, and we build it independently of the COVID values. The COVID values, including the future ones you see here, are then used to adjust the forecast. This could go beyond COVID: it could be any assumption, like social distancing, lockdowns, whatever. You can bring that in and modify this forecast, and I just took the [inaudible] for this.
That’s what we’re going to do, and that’s the idea behind demand sensing. How do we do that? We built an app for it. Again, it’s built in Python with a nice UI. First we pick the table and the variables we want to see, and create a dashboard. From that data, we also specify the different sensing variables, [inaudible]. We have the region and customer columns, and then we pick the target column, which here is sales. What we’re doing right now is building a time-series model independently of COVID first. The COVID cases did come from an SEIRD model, but when we build the time-series forecast, we build it as if there were no COVID data.
Then we adjust that forecast based on COVID. That’s what we’re going to do. We’re going to use MAPE as the validation metric; you can use another metric if you prefer. Let’s quickly pick some regions and some customers, then some SKUs of interest for these customers… Let’s open the dashboard and look at the sensing model. There we go. Okay, so this is the COVID data. This is the actual sales, which we use as a test set just to see how well we’re doing. Then you have the sales forecast from the sensing model. Basically, the forecast came from the [inaudible], and then we adjust the forecasted sales based on the COVID data.
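MAPE, the validation metric chosen for the demand sensing model, averages the absolute percentage error across periods. A small sketch follows; the function name is hypothetical, and skipping periods whose actual value is zero is an assumption, since MAPE is undefined there.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Periods where the actual value is zero are skipped, since the percentage
    error is undefined there (one common convention among several).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


print(mape([100.0, 200.0, 0.0], [110.0, 180.0, 5.0]))  # 10.0
```

Because it is scale-free, MAPE lets you compare accuracy across SKUs with very different sales volumes, but it blows up for items whose actual sales are close to zero.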
We learn the relationship on this part of the data and apply it here, adjusting the forecast. That adjustment is based on a small regression inside the app. The point is that you can have a short-term forecast like this, and then a forecast adjusted with additional data. Now the forecast is much higher for this product because COVID cases are going up. This is the kind of thing I was talking about.
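The adjustment is described only as a small regression inside the app. One plausible reading, sketched below as an assumption rather than the app’s actual method, is to regress the baseline forecast’s historical errors on COVID cases and add the fitted correction to the future baseline using projected cases. The function names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ intercept + slope * x for one feature."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sense_demand(hist_actual, hist_baseline, hist_covid, future_baseline, future_covid):
    """Adjust a baseline forecast using a regression of its past errors on COVID cases."""
    residuals = [a - b for a, b in zip(hist_actual, hist_baseline)]  # what the baseline missed
    intercept, slope = fit_line(hist_covid, residuals)
    return [b + intercept + slope * c for b, c in zip(future_baseline, future_covid)]


adjusted = sense_demand(
    hist_actual=[100.0, 120.0, 140.0],
    hist_baseline=[100.0, 100.0, 100.0],
    hist_covid=[0.0, 10.0, 20.0],
    future_baseline=[100.0, 100.0],
    future_covid=[30.0, 40.0],  # e.g. SEIRD-projected cases
)
print(adjusted)  # [160.0, 180.0]
```

Regressing the residuals, rather than the raw sales, leaves the baseline time-series model untouched, which matches the idea of forecasting first as if there were no COVID data and adjusting afterwards.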
You can do a time-series forecast. You can also do a COVID forecast using SEIRD models. Then you can use demand sensing to combine both into one time-series forecast. Now a business user can use it. This is not the Driverless AI interface; it’s a very business-specific interface that our data scientists created for the business. You get the power of machine learning flowing into these apps. These are BI apps that use AI under the covers. That’s the purpose of Q. I think I’m about to wrap up now, so I’ll pass it off to Qybare to see if there are any questions. Then we can go from there.
Thank you, Karthik. That was a great presentation. We have some questions in the chat window, and we’ll try to get to all of them, time permitting. One question is: is the demand sensing app using Driverless AI to make predictions?
Yes. It uses Driverless AI to make predictions for the time-series model. Then, to combine in the COVID data for the final demand sensing, we do that in the Q app itself, probably with a small regression inside the app. All the heavy lifting is done by the automatic machine learning, and when the predictions come back, we do the mash-up in Q.
All right. What is the end-to-end implementation time for Q and Driverless AI?
I think that’s a great question. For Driverless AI models, probably minutes, or even seconds. I wouldn’t worry too much about it; once you have that set up, it does all the work behind the scenes. Maybe a few hours if you’re building on a large dataset or something. With Q apps, it depends on how proficient your data scientists are in Python, plus a little bit of training on how to use the SDK. Some of our experienced data scientists can build one in a day or so; sometimes it takes weeks. A very complex Q app could take a month or so. If a data scientist is working on it, we should be able to get something going very quickly.
Okay. I saw you were using Driverless AI on AWS. Is it only available in the cloud, or can it run on-prem?
That’s a really good question. It’s available on all the major cloud platforms: Amazon, Microsoft Azure, and Google Cloud. It’s also on [inaudible]. [inaudible] can run on boxes like Unix boxes with GPUs that are [inaudible]. You can have GPUs with CUDA drivers, and sure, it can run. If it’s big enough, you can also run it on EMS.
Great. What kind of Q apps do you have available? Is demand sensing the only one for COVID?
We’re building a lot of apps right now; we have tons and tons of apps coming out. These are things like NLP insights. We have apps for OCR. There’s one we’re building for pure COVID prediction that different hospitals are actually using. There are connectors to many different data sources, like Salesforce. We do MDM solutions. We do money lending; you can see we also have a lending app to predict defaults and forbearance. Also, if you’re connecting to Salesforce, for example, you can do lead intelligence. This is growing by the day. Oh, I forgot to mention two more.
Data augmentation: when you have datasets from different data sources, we’re building an augmentation app that automatically augments your data, with COVID data for example. You have a dataset and you have the COVID data. You just say, go find all the datasets I can augment with, and it goes and finds them for you. Then we have the forecasting app, which is getting more popular by the day: our financial forecasting. We’re building apps around the next big questions, I don’t know, the next few months of cash flow, cash on hand, payables and receivables. This list is growing, including [inaudible]. Short answer: maybe 100-plus apps.
Perfect. Besides demand sensing, what other supply chain use cases could you solve with H2O.ai?
That’s a great question. I showed you a slide before. Let’s go back to the slide again.
There we go. See this. Manufacturing… Oh, sorry. Okay. Manufacturing, warehouse management, transportation, distribution, logistics: the whole supply chain, all the domains around it. That includes part failures, which are a supply chain problem too. Let’s say you have an electronic device, something fails, and the parts didn’t come. What if there’s a manufacturing defect? We have quite a few models that work by mashing up data. For all of those supply chain use cases, you can plug in automatic machine learning and Q.
I think we have time for one more question. In the current situation, where local and global supply chains are badly broken, how can AI help predict this ahead of time? Which AI models can help with this?
Right now, and over the last few weeks, COVID has been the big uncertainty. COVID affects everything in the supply chain, directly or indirectly: the number of employees coming into work, transportation, everything. There are no planes available for flights, whatever you can come up with. There are different models outside machine learning. I showed you the group models and the SEIRD models; they’re just equations that model the spread. You can use them to come up with projections.
Then bring those predictions into your automatic machine learning. Automatic machine learning isn’t going away; you’re going to run it day in and day out. The more data you have, the better, so the question becomes what kind of data you can augment your data with. Some third-party data, like the mobility data you see in marketplaces, isn’t widely available yet, but where you can get it, you can augment your AI/ML models with it.
Great. Wonderful. I think that’s all the time we have for questions. In the chat window, I put a link to try Driverless AI, with a 21-day free trial. Thank you, Karthik, for taking the time today and doing a great presentation. I’d like to thank everyone who joined today. The recording will be made available shortly. Everyone, have a good rest of the day. Thank you.