David Ferber has over 20 years of experience in the credit industry, focusing on the technical development and growth of Equifax’s data platforms. This includes Equifax’s state-of-the-art big data analytics platform, which provides valuable, cost-effective insights and analytics on its wide range of differentiated data sources. At Equifax, David has held multiple positions: Vice President of Technology, Decision 360 Technology Leader, Enterprise Data Leader, and most recently Solutions Delivery Leader within Equifax’s Data & Analytics organization. David holds a degree in Computer Science from North Georgia College.
Pinaki Ghosh has over 20 years of industry experience, 17 of them at Equifax building data management and real-time decisioning platforms, with a strong background in software development, IT, and data management. This includes building cutting-edge big data platforms that provide high-speed access to differentiated data for building rich, actionable insights for Equifax and its clients. Pinaki is passionate about and expert in the disciplines of Data Quality, Data Management, and Entity.
This session was recorded in NYC on October 22nd, 2019.
Read the Full Transcript
Thank you so much. My name is David Ferber. I have Pinaki here with me.
Thank you for inviting us, Sri, Kerry, and the H2O team. It’s our pleasure to come here and talk to you today about what’s going on at Equifax. I apologize ahead of time: I had laryngitis on Thursday and could not talk at all, so I was in a little panic mode. Pinaki may do most of the heavy lifting today, depending on how the voice holds up. So thank you so much.
So today, we’re going to talk about transformations: transformations driven by innovation. At Equifax, we are in a major transformation mode right now, and there are a lot of new things happening. I want to talk a little bit about that. We’re also going to talk about how AI and our analytical platforms are enabling our customers to do more, supporting consumers in their financial lives.
As you can see here, some transformations are personal. Twenty-five years ago, I was a crazy computer scientist coming out of college and working for Equifax; today I’m the person up here talking about AI at a conference. It’s kind of funny if you know me personally. It’s been a big transformation for me. But we’ve also done a lot with our technology platforms, migrating from mainframes to proprietary supercomputers, to commercially available distributed platforms. And now, we’re moving to the cloud.
So to be relevant, innovation has to address a challenge or help a community, right? What we were looking at is how Equifax can better support consumers around the world, not just in the U.S. Our new tagline is helping people live their financial best. And it’s a very important thing if you think about the people who are impacted globally by the data that we have and the analytics that we run against that data. Today, we’re going to talk a little bit about that, but I really want to dive into the impact we have worldwide on consumers, and how we’re achieving it. Think about what we do at Equifax: we have over a billion consumers represented in our data assets across the globe. I want to touch on a few highlights of the positive impacts we’ve had on those people around the world.
9 million people were able to further their education by getting student loans, using our data. 12 million people are able to get access to healthcare through our verification of income and employment data. 400 million consumers are protected from fraud in Europe. 8 million people in Latin America are getting access to credit. And 32 million people were able to buy a vehicle with our data, getting that loan to buy that new car. These are major impacts on people’s lives and major decisions being made, leveraging the assets that we’re collecting, analyzing, and producing. So how do we do that? How do we make it better? How do we enable more consumers to get access to credit and have the lives they really want? That brings us to our next vision here. It’s about data, obviously, but data is just one piece of the puzzle. The other is advanced analytics.
And we’re going to talk a little bit about H2O and our machine learning areas, and how our customers are using them to support better, deeper insights into the data we have. But it’s also the technology platforms, and how we deliver those insights and make actual decisions on those analytics in real time. Pinaki and I are going to cover some of those pieces and what we’re doing there. But let’s talk about the value chain, and the areas where we had to solve problems. It starts with getting access to the data. If you think about all the data assets that are available in the world, there are masses of information out there. How do you get connected to it? How do you bring it in? We’ve built what we call a data access engine that allows us to connect in, not only to our own data, but to our customers’ data and other data assets that are available in the industry.
And then, once you have that data, you have to link it together, right? You have to be able to connect it. So we’ve built a matching algorithm, which we call Connexus and Ignite Link, that allows us to link all those data assets in real time. And then, once you have all that data together, you have to do something with it, right? That’s where generating insights comes in: attributing the data, cleansing the data, building models with the data, and then deploying those models into our production systems. And that generate-insights piece is where H2O is a great fit for how it plugs into our analytical ecosystem.
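Connexus and Ignite Link are proprietary, but the core idea of record linkage, scoring candidate pairs on fuzzy field similarity and accepting matches above a threshold, can be sketched in a few lines. The field names, weights, and threshold below are purely illustrative assumptions, not Equifax’s actual matching logic; this is a minimal sketch using only the Python standard library.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized edit similarity between two field values (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted average similarity across the shared fields."""
    total = sum(weights.values())
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items()) / total

def link_records(left, right, weights, threshold=0.85):
    """Pair each left record with its best right-side match above the threshold."""
    links = []
    for a in left:
        best = max(right, key=lambda b: match_score(a, b, weights))
        score = match_score(a, best, weights)
        if score >= threshold:
            links.append((a["id"], best["id"], round(score, 2)))
    return links

# Hypothetical records: typos and abbreviations still link to the right entity.
left = [{"id": "L1", "name": "Jon Smith", "addr": "12 Main St"}]
right = [{"id": "R1", "name": "John Smith", "addr": "12 Main Street"},
         {"id": "R2", "name": "Jane Doe", "addr": "99 Oak Ave"}]
weights = {"name": 0.6, "addr": 0.4}
links = link_records(left, right, weights)
```

A production linker adds blocking (only comparing candidates that share a key such as a name initial or ZIP code) so it never scans every pair, which is what makes real-time linkage at this scale feasible.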
But we also put a lot of effort into how we get those insights into our production systems. How do we deploy to our decision platforms? That was a big challenge for us, and we’re still continuing to grow there, but we’ve made great strides. The last piece of this is optimizing those decisions. Once you’ve made a decision on a consumer, whether you grant credit, make a marketing offer, or whatever decision your business is trying to perform, you want to monitor those decisions over time to see how they perform. Feeding that data back into our analytics platforms is a really important piece of that. We call that the feedback loop, and it’s a native part of our technology platforms.
So what we did is bring all of this together. Our analytics platform is called Ignite, and our decision platform is called InterConnect. We’re going to focus mostly on Ignite today, because that’s where all the development happens, where the machine learning happens, and where all the data assets we’re bringing together come into one place, where teams like Pinaki’s and our customers can come in and actually generate those insights. So I’m going to hand off to Pinaki now, and he’s going to take a deep dive into what we’re doing with Ignite and how that platform works.
Thank you, Dave. Hello everybody. My name is Pinaki Ghosh. I’ll quickly touch upon what Ignite is, and how it’s contributing to the social cause and the benefit of our consumers, right? If you think about Ignite, it really builds upon what David was talking about: data, technology, and analytics around that. It all starts with differentiated data. The more differentiated data we have, the better decisions we can make, whether it’s differentiated data within Equifax, our clients’ data, third-party data, or different exchange data as well. Now, there are several types of data sets available inside Equifax. As you can see, these are all linked and connected pretty well. We have 550 million people, places, and things, all connected, all sitting in one platform. That makes it really, really valuable and powerful for our clients and for us to build the super cool insights that we want to build, things we have not seen before. And that has been powered a lot by the modern technology stack, and tools like H2O.
As you can see, we have different data assets. There’s consumer data, consumer credit, which is the 800-pound gorilla. We have the commercial credit data. We have property: how a person buys property, moves property. Auto, which is the cars. Then we have their phone bills and utility bills. When you start looking at all of this, a 360-degree view of a consumer, you start seeing a much bigger picture than just a credit history report on a consumer.
And I’d like to touch on a key aspect here. Traditional credit has been around for 30 or 40 years. We’re going beyond that now. We’re looking at alternative data sources, like your utility bills and how you pay other bills that are not on your credit report, as key variables that come into our decision process. Some of these assets are only available at Equifax. But the point here is that we and our lenders are expanding what data is used to make credit decisions, which is benefiting consumers, because now you’re getting credit for how you pay your cell phone bill and how you pay your cable bill. I don’t know about you, but in my household my cell phone bill is more than my car payment, because I have five people on it. These are substantial payments that consumers were being ignored on, or not getting credit for. So we’re making sure these are available to our customers as key inputs for decisioning.
That’s a great point you mentioned, David. And as part of Ignite, this is how we’re exposing some of these data sets that were previously not available to our clients. Like Dave said, my phone bill is equal to a car payment as well, because I have four people in the house. So we’re bringing all that data together, sitting in one place, for our clients to come and do analytics. Now they can really expand their analytics, all in one place, in a secured fashion, with a range of tools and techniques deployed on top of it. As you can see, we have this big data store that holds all the data sets we talked about, already pre-keyed, linked, and cleansed, all sitting in one place. We then apply our security, data governance, and metadata model on top of that data, and make it available to our internal data scientists, which is about 400 to 500 people across the globe, and also to our clients through their own instances.
Now we’re talking about exposing all of this through SQL query engines, data engineering tools, attribution tools, modeling tools like H2O, R, and Python, and visual analytics tools, all sitting in one place. The client now has the ability to build a lot of good insights that can be delivered through a couple of channels, whether that’s the real-time decisioning channel or batch. Again, H2O makes it really smooth and easy for us to take those artifacts and deploy them into production.
Dave, do you want to touch a little bit on explainable AI?
So, one of the things we’re doing at Equifax to deliver on our mission to help people live their financial best is explainable AI. The regulatory environment that exists today with credit requires our models to deliver reason codes for why the score was what it was. That’s traditionally why regression-type modeling techniques have been used. They’re good in a lot of use cases; those models are very predictive and do really well. But because of the inability to explain what machine learning is doing, in some cases we can’t take advantage of more advanced modeling techniques to get better predictive models. So we’ve developed and patented a technology we call NeuroDecision Technology, and we make it available through Ignite.
So our customers can now build and deliver models using neural nets, random forests, and some other algorithms; I think gradient boosting machines are in there too. That lets them develop more predictive models for certain use cases in the risk modeling space. We’ve been through the patent process, and we deliver those math algorithms to our customers so they can build models. We build our own models with it as well. But it gives you the explainable reason codes for a neural net or gradient boosting model that you couldn’t have before.
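NeuroDecision itself is patented and its math isn’t public, but the general shape of a reason code, “which features cost this applicant the most points relative to their best possible values”, can be illustrated on a simple additive model. Everything below (the contribution functions, point values, field names, and the base score of 700) is a made-up assumption for illustration, not Equifax’s scoring logic.

```python
import math

# Hypothetical additive risk model: each feature contributes points to a score.
def contrib_utilization(u):      # lower revolving utilization scores better
    return -120 * u

def contrib_late_payments(n):    # each late payment costs points
    return -35 * n

def contrib_file_age(years):     # an older file helps, with diminishing returns
    return 15 * math.log1p(years)

# reason code -> (contribution fn, applicant field, best-case input)
CONTRIBS = {
    "high revolving utilization": (contrib_utilization, "utilization", 0.0),
    "recent late payments":       (contrib_late_payments, "late_payments", 0),
    "short credit history":       (contrib_file_age, "file_age_years", 30),
}

def score_with_reasons(applicant, base=700, top_n=2):
    """Score the applicant, and return reason codes for the features whose
    contributions fall furthest below their best-case values."""
    score, shortfalls = base, []
    for reason, (fn, field, best_input) in CONTRIBS.items():
        c = fn(applicant[field])
        score += c
        shortfalls.append((reason, fn(best_input) - c))  # points lost vs. best case
    shortfalls.sort(key=lambda t: -t[1])
    return round(score), [r for r, loss in shortfalls[:top_n] if loss > 0]

applicant = {"utilization": 0.8, "late_payments": 2, "file_age_years": 3}
score, reasons = score_with_reasons(applicant)
```

The hard problem the speakers allude to is doing this same attribution for non-additive models like neural nets, where a feature’s effect depends on the others; that is the gap techniques like NeuroDecision are designed to close.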
Explainable AI is a big buzzword in the market today; we’re hearing it more and more. But we consider ourselves a leader in this, and we’ve been doing it for several years. It really is having a positive impact on consumers, in how decisions are made about granting credit.
Great, thanks. In the interest of time, I’ll just quickly touch upon the overall Ignite ecosystem, which we’ve talked about a little already. This shows the overall orchestration of what our Ignite platform looks like. It’s a continuous development, deployment, monitoring, and improvement platform. It’s really about bringing all the data into one central location, providing the tools and technology to build and deploy insights, and continuously monitoring and improving upon them. Right? If you think about a classic use case, it all starts with a market-type analysis. If I were running a business, I’d want to look at how the market is performing compared to my business, see how my peers and the rest of the market are doing, and how I stack up against them.
That would be a typical use case, right? Then I’d say, “Okay, how do I get ahead of my competition? How do I make better decisions with better data sets?” That’s how clients come to explore our differentiated data assets. Once that is all selected and agreed upon, we put together our Ignite solution for the client, where they come in and start accessing our data sets, all in one location. Now, as you can imagine, it all starts with a data wrangling, data engineering exercise, right? You’re sifting through tons of data, billions and billions of records, all in one place, with the processing power of Hadoop returning results in a snap, sometimes more than a second depending on the type of query. And we bring in all the industry-standard tools for clients and our own data analysts. By the way, this all looks the same for our own internal folks and for our clients.
Now, as part of that exercise, once a client has built their modeling data set, they can pass it through technologies like H2O, R, and so on. And this is where the rubber meets the road. This is where you can build the super awesome models that have never been possible before, right? They build those models, then take them and deploy them into our production platform. We have other tools as well, as you can see, as part of our overall technology stack. One is a homegrown tool we call the Attribute Engine, built on top of Python. With our rich history in the industry, we understand how our clients build attributes, so we built a framework for that, all in one place. We also have the Advanced Modeling Engine, which is where we plug in the NeuroDecision Technology that Dave was talking about. So that gives you the complete data pipeline plus an algorithm execution engine, all stitched together. And it also produces artifacts that can be deployed. So we are not-
Yeah, I’ll touch on that one piece: the deployment part of this. Our homegrown tools were designed to integrate with our production systems to make frictionless deployment possible, so we avoid the whole recoding process. If you look at the tools in the industry today, you use analytical tools to develop your analytics, and then you recode them in a deployment tool so you can run them in production. We’re trying to avoid that now and move toward a seamless deployment process. We’ve done it in a few pieces; we’re not where we want to be yet. But if you use our homegrown tools, the Attribute Engine and the Advanced Modeling Engine, you can take advantage of some of those algorithms and deploy your code directly into your runtime environments for execution. And that saves massive amounts of time. In some cases it takes our analysts three to six months to build and deploy models, because that process is so cumbersome and difficult.
So we’ve found a lot of great advances there. We’re looking at tools in that stack like H2O and their ability to deploy, and making sure we have integrations between those and our production systems as well. It’s a really critical piece of this. It saves a lot of time and a lot of money, and you get higher quality. I call it a single source of truth: what the analytical exercise developed and delivered is what you’re actually running in production. That’s a big change and mind shift from what we’ve had in the past.
I think that’s a wrap, and we can take some questions.
David Ferber: Real quick, I do want to touch on that. Once a model is deployed, I talked about monitoring and managing the performance of those models. We make sure we have the tools in place to monitor performance, to confirm those models are still acting like they should. So we feed that data back in, and we enable constant validation and model performance management on those models, so that if you start to see degradation, you can bring them back into Ignite, make enhancements to the model, and redeploy. I want to be clear on that piece as well: it really is focused on the front end of development, the middle piece of execution and monetization, and then that back end of monitoring, making sure your decision strategies are working at a level that’s acceptable to your business.
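The transcript doesn’t say which statistics Equifax uses for this monitoring, but a standard industry metric for detecting the degradation David describes is the Population Stability Index (PSI), which compares the model’s score distribution at development time against what it sees in production. The bucket proportions and thresholds below are hypothetical; this is a minimal sketch of the idea.

```python
import math

def psi(expected_pct, actual_pct, eps=1e-4):
    """Population Stability Index over score buckets:
    sum of (actual - expected) * ln(actual / expected)."""
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e, a = max(e, eps), max(a, eps)   # guard against empty buckets
        total += (a - e) * math.log(a / e)
    return total

# Score-bucket mix at development time vs. in production (made-up numbers).
dev  = [0.10, 0.20, 0.40, 0.20, 0.10]
prod = [0.06, 0.16, 0.38, 0.24, 0.16]

value = psi(dev, prod)
# A common rule of thumb: < 0.10 stable, 0.10-0.25 watch closely, > 0.25 retrain.
status = "stable" if value < 0.10 else "watch" if value < 0.25 else "retrain"
```

Running this on every scoring batch and alerting when the status leaves “stable” is one concrete way to trigger the bring-it-back-into-Ignite-and-redeploy loop described above.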
So yeah. With that, I’ll turn it over to the room for any questions, ideas, or input.
Hello. You probably have tens of thousands of attributes for a customer. Could you talk a little bit about how you do feature selection or feature engineering before you throw all the features into your model?
I’ll touch upon it, and you can add to it. Yeah, we already have a standard set of features that we’ve built over the years, based on our understanding of clients and consumers. Different feature sets are available for different data assets, right? For our utility and exchange data sets, there are standard features available. For consumer credit, we have standard feature sets available, which we call Attributes internally. And so on for the other data assets. We continually keep adding more. Traditionally, they were based on a static data set; now we’re also adding more based on a longitudinal view.
And I think our tools enable some of those feature engineering capabilities as well. Our analysts have their own processes, and our customers have their own, but the tools enable some of it. To be clear, Pinaki and I are more the technology and data guys; we’re not the analytics guys. As I mentioned earlier, it’s kind of funny that we’re up here talking about machine learning and AI. But we have enough knowledge to be dangerous.
Dangerous, yeah. We also use other tools; I know our data scientists use some of them to create additional features. That’s where our Attribute Engine comes into play. If you look at five B or five A, it is the Attribute Engine. We have certain standard templates that explode one feature into many features, and then there’s a feature reduction step at the end of the process cycle.
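The Attribute Engine’s templates aren’t public, but the explode-then-reduce pattern Pinaki describes can be sketched simply: one raw series fans out into windowed aggregates, then near-constant features are dropped. The window sizes, feature names, and payment figures below are illustrative assumptions, not actual Equifax attributes.

```python
from statistics import pvariance

def explode(payments, windows=(3, 6, 12)):
    """Expand one raw series (monthly payments, newest first) into many
    derived features, the way an attribute template might."""
    feats = {}
    for w in windows:
        win = payments[:w]
        feats[f"avg_{w}m"] = sum(win) / len(win)
        feats[f"max_{w}m"] = max(win)
        feats[f"missed_{w}m"] = sum(1 for p in win if p == 0)
    return feats

def reduce_features(rows, min_variance=1e-9):
    """Keep only the features that actually vary across the sample."""
    return [name for name in rows[0]
            if pvariance([r[name] for r in rows]) > min_variance]

# Two consumers' 12-month payment histories (made-up numbers).
a = explode([100, 100, 100, 0, 100, 100, 100, 100, 100, 100, 100, 100])
b = explode([80, 90, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100])
kept = reduce_features([a, b])
```

With thousands of raw fields and several windows and aggregates each, one template like this is how a data set balloons into tens of thousands of attributes, and why a reduction pass at the end of the cycle matters.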
Any other questions?
Actually, we are out of time, but I want to thank David and Pinaki. They’re going to be backstage, so if you have questions, feel free to grab them. I’m sure they’ll be happy to answer more questions.
Yeah. Thank you guys. Thank you so much.
Thank you everybody.