This session was recorded in NYC on October 22nd, 2019.
Slides from the session can be viewed here: https://www.slideshare.net/0xdata/amitpal-tagore-integral-ad-science-leveraging-data-for-successful-ad-campaigns-h2o-world-2019-nyc/1
Leveraging Data for Successful Ad Campaigns
Marketing dollars should be spent to reach real people and make digital campaigns successful. IAS leverages large amounts of data and machine learning software to measure, analyze, and predict on billions of digital advertisements every day. I’ll be discussing how we do this in the context of fraud detection and brand safety, helping to ensure marketing dollars are used to reach the right people.
With a desire for problem-solving and handling messy data, Amitpal Tagore completed a PhD and postdoc in astrophysics. Using the skills gained in academia, he became a data scientist at Vydia, working with rising artists on social media. Currently, Amit is a data scientist in the fraud detection lab at Integral Ad Science.
Thanks for the intro. Hi, I’m Amit, once again. I’m a data scientist in the Fraud Detection department at Integral Ad Science, IAS for short. Today I’ll talk to you a little bit about digital advertising, which basically means any kind of advertising you see on your laptop, on your mobile phone, or on a connected device like a smart TV or one of the more modern gaming systems.
A brief intro to IAS. It was founded a little over a decade ago, and it has grown into a global company of nearly 1,000 employees. We sit between companies that want to advertise digitally and companies that have space, say on a website, to show advertisements. We don’t connect buyers to sellers, but we do sit between them, monitoring and analyzing what’s happening. And sometimes, when it’s appropriate, we actually block advertisements from being shown.
To do that, we collect hundreds of billions of measurements every single day. We can do that because we have partnerships with major players in the industry: some of the major exchanges where ad space is bought and sold, as well as major platforms like Google, Facebook, and Instagram. Broadly speaking, we work in four different ways, and I’ll talk about the first three in more detail later on. First, brand safety. That basically means that companies don’t want their advertisement placed in a context that might damage their reputation. That makes sense.
Fraud detection is another big one. If you’re paying for your advertisement to be shown, you want to make sure it’s shown to a typical, normal user. You don’t want it shown to a robot, or to somebody who’s been paid to look at advertisements over and over again. You want it shown to someone on whom it could have an impact.
Viewability. That means that if your advertisement is shown to a normal user, you want them to have had a good opportunity to see it and be impacted by it. If it just flashes at them for a fraction of a second, they didn’t really have a chance to be impacted.
And then optimization. That’s another product, where we work with individual campaigns, using aggregates of all the data we have and the insights we’ve gained, to maximize the impact of an ad campaign.
So, brand safety, the first one. A typical example might look like this: a tragic story, like a plane crash that just happened, that all the news sites are now reporting on. If you’re an airline, you may not want your advertisement shown on that page. People may not think you’re responsible for the crash, but they can form negative associations subconsciously, and you might want to avoid that. Another example: if you’re a children’s toy company, you may not want your product advertised on a website about violent video games.
And we want this to happen dynamically. In the digital ecosystem, when a company that has space on its website to show an ad wants to sell that space, it goes to a central platform and says, “Hey, I have some space to show an advertisement.” Then other companies can bid on how much they’re willing to pay to have their ad shown there. IAS sits there and gives predictions about how safe a given page will be for a particular client. Using that information, the client can decide whether they even want to make a bid. It may not be worth their while, and different companies have different thresholds for the risks they’re willing to take.
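To make the bid-time decision concrete, here is a minimal sketch, assuming the prediction arrives as a risk score between 0 and 1 for each content category and that each client sets its own maximum acceptable score per category. The `should_bid` helper, the category names, and the numbers are all hypothetical; this is not IAS’s actual interface.

```python
from typing import Dict


def should_bid(predicted_risk: Dict[str, float],
               client_thresholds: Dict[str, float]) -> bool:
    """Bid only if every category the client cares about is within its tolerance.

    predicted_risk: hypothetical per-category risk scores in [0, 1] for one page.
    client_thresholds: hypothetical maximum acceptable risk per category.
    A category with no prediction is treated as maximally risky (1.0).
    """
    for category, max_risk in client_thresholds.items():
        if predicted_risk.get(category, 1.0) > max_risk:
            return False
    return True


# A news page about a plane crash (made-up scores).
page = {"tragedy": 0.85, "adult": 0.02}
airline = {"tragedy": 0.30}      # strict about tragic news
sportswear = {"adult": 0.20}     # does not mind tragic news, avoids adult content
print(should_bid(page, airline))     # False
print(should_bid(page, sportswear))  # True
```

Treating a missing prediction as risky is a conservative design choice: a client that sets a threshold never bids blind on that category.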
Here’s a real-life example of what one of those websites could look like: localchron.com. One of the first things to notice is that this isn’t really just one website. It is, but it has many, many sections; you can see a lot of them listed at the bottom there. And within those there are many more pages, so you can have localchron.com/local/US, /international, and more and more slashes ending in .html. A lot of these pages have very similar layouts, and they share a lot of repeated text: external links, internal links, and the site map at the bottom itself. You don’t want a machine learning model to pick up on these repeated things. If they’re bad themselves, that’s one thing. But if they’re fairly innocuous, you don’t want the model to use them as features. Say localchron.com is fairly safe on the whole, and most of its pages are safe: a machine learning algorithm might look at all this repeated text and start using it as an indicator of a safe website.
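One common way to keep a model from latching onto site-wide boilerplate is to drop text blocks that recur on most pages of the same site before extracting features. The sketch below assumes each page has already been split into text blocks (paragraphs, link texts, and so on); the function name, the 50% cutoff, and the toy pages are illustrative assumptions, not IAS’s pipeline.

```python
from collections import Counter
from typing import List


def strip_site_boilerplate(pages: List[List[str]], max_share: float = 0.5) -> List[List[str]]:
    """Remove text blocks that appear on more than max_share of a site's pages."""
    # Count each block once per page (document frequency), not once per occurrence.
    block_df = Counter()
    for blocks in pages:
        block_df.update(set(blocks))
    limit = max_share * len(pages)
    return [[b for b in blocks if block_df[b] <= limit] for blocks in pages]


# Three made-up pages from one site, sharing a nav bar and a site map.
site = [
    ["Home | Local | World", "Plane crash leaves 12 injured", "Sitemap"],
    ["Home | Local | World", "School board approves budget", "Sitemap"],
    ["Home | Local | World", "Weekend weather: sunny", "Sitemap"],
]
print(strip_site_boilerplate(site))
# [['Plane crash leaves 12 injured'], ['School board approves budget'], ['Weekend weather: sunny']]
```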
Another thing to watch out for is where you get your text from. There are a lot of different places: like I said, the site map, external links, the headline (which can be repeated across pages), the URL, and even metadata associated with the website. It’s up to you to decide what [inaudible 00:05:39]. Once you’ve gathered data like this, you can start to build your models. The internet is a global place, so there are many, many different languages; at IAS we typically work with more than 40. For each of those, we score seven or more categories that clients might be interested in. They might want to avoid sites that talk about violence, adult content, alcohol, gambling, things like that. And each category can have different risk levels as well, three to five or more. The risk levels are actually set by a central agency, to standardize them. And of course, you want to apply all of this across many, many different websites.
At IAS, at least, we offer another solution as well, where you decide whether to show your ad on a site based on keyword blocking. Let’s say you just hate baseball and don’t want your ad shown anywhere near it. You use that keyword to block websites that talk about baseball. For every keyword list a client wants to block on, we have to create another model. So the number of models we require is definitely in the hundreds, and it is growing all the time.
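The model count grows multiplicatively. As a back-of-the-envelope sketch, assume one model per (language, category) pair plus one model per client keyword list, with risk levels coming out of the same category model. That counting scheme and the 50 keyword lists are assumptions for illustration; only the 40 languages and 7 categories come from the talk.

```python
def models_needed(languages: int, categories: int, keyword_lists: int) -> int:
    """Assumed scheme: one model per (language, category) pair plus one per keyword list."""
    return languages * categories + keyword_lists


# 40 languages x 7 categories from the talk; 50 keyword lists is a made-up figure.
print(models_needed(40, 7, 0))   # 280
print(models_needed(40, 7, 50))  # 330
```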
Once you have these websites and want to start building models, you have to start thinking about languages, since a lot of our data is text. First, alphabets: there are a lot of different alphabets on the internet. If you choose a language-agnostic model, which is not unheard of, you’ll have to be able to account for all of them. If you build language-specific models instead, that isn’t a big problem, but you get different problems depending on the language. In German, for example, compound words can be created almost on the fly. This word right here (I won’t even try to say it) means “five-year contract,” and it’s made up of five, year, and contract. So your algorithm has to be able to recognize it, decompose it into its constituents, and then deal with those somehow.
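Dictionary-based decompounding is one simple way to handle this. The sketch below tries the longest known prefix first and backtracks if the rest cannot be split, allowing the linking elements “s” and “es” that German often inserts between parts. We use Fünfjahresvertrag as the example because the slide’s exact word isn’t in the transcript; the three-word lexicon is hypothetical, and production systems rely on large lexicons or statistical splitters.

```python
from typing import List, Set

LEXICON: Set[str] = {"fünf", "jahr", "vertrag"}  # hypothetical toy lexicon
LINKERS = ("", "es", "s")                          # German linking elements ("Fugen")


def decompound(word: str, lexicon: Set[str] = LEXICON) -> List[str]:
    """Split a compound into lexicon words; return [] if no full split exists."""
    word = word.lower()
    for end in range(len(word), 0, -1):        # prefer the longest known prefix
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        for link in LINKERS:                    # optionally skip a linking element
            if not rest.startswith(link):
                continue
            tail = rest[len(link):]
            if not tail:
                return [head]
            sub = decompound(tail, lexicon)
            if sub:
                return [head] + sub
    return []


print(decompound("Fünfjahresvertrag"))  # ['fünf', 'jahr', 'vertrag']
```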
Other languages, like some Asian languages, don’t use alphabets whose letters represent sounds. They use characters that themselves carry meanings and express ideas, so you have to deal with those. And when characters represent ideas, the writing may not need white space, so there may be no spaces marking clear boundaries between words. And even within the same language, you can still run into problems caused by cultural differences. In some parts of Northern England, people call the evening meal “tea.” Here, most of us would call it “dinner.”
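For text written without spaces, a classic baseline is forward maximum matching: at each position, take the longest word found in a dictionary, falling back to a single character. This is a sketch with a hypothetical three-word Chinese lexicon; modern segmenters are usually statistical or neural, and this greedy rule can mis-split ambiguous text.

```python
from typing import List, Set


def max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum-matching word segmentation."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


lexicon = {"我", "喜欢", "北京"}  # "I", "like", "Beijing"
print(max_match("我喜欢北京", lexicon))  # ['我', '喜欢', '北京']
```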
And lastly, online it’s sometimes just easier to write in the Latin alphabet than in your native script. So “namaste” could be written either of those two ways on the internet. And in a comment section, people might switch back and forth between different scripts. So there are a lot of different nuances to handle.
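A first step for mixed-script comments is simply knowing which script each token is in, so it can be routed to the right handling. This sketch labels a token by the first word of its characters’ Unicode names (for example LATIN or DEVANAGARI), a rough heuristic we chose for illustration; it detects scripts, not languages, so transliterated “namaste” still comes out as LATIN.

```python
import unicodedata
from collections import Counter


def dominant_script(token: str) -> str:
    """Guess a token's script from the Unicode names of its characters."""
    counts = Counter()
    for ch in token:
        # Count letters and combining marks (Devanagari vowel signs are marks).
        if ch.isalpha() or unicodedata.category(ch).startswith("M"):
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


comment = "namaste नमस्ते friends"
for tok in comment.split():
    print(tok, dominant_script(tok))
```

Run on the example comment, this tags “namaste” and “friends” as LATIN and “नमस्ते” as DEVANAGARI.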
H2O is very helpful to us here. It handles text very naturally, doing various things we’ve probably heard about throughout the day: TF-IDF and frequency counts. It embeds words in a vector space, so that words with similar meanings end up close together. On top of those, it can apply transformations, such as eigendecompositions, linear transformations, and deep learning networks. So it can really dig into the data and do lots of things, which has been very helpful to us. Before this, we were doing the feature engineering ourselves, using some TensorFlow components and some simpler models like logistic regression and naive Bayes.
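For reference, this is the textbook TF-IDF weighting in plain Python: term frequency within a document times the log of (number of documents / documents containing the term). It is a minimal illustration, not how Driverless AI implements it internally. Note that a term appearing in every document, like site-wide boilerplate, gets a weight of zero.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "casino bonus casino news".split(),
    "election news today".split(),
    "weather news today".split(),
]
w = tf_idf(docs)
print(w[0]["news"])  # 0.0, since "news" appears in every document
```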
But more recently we started using Driverless AI. This is what one of our dashboards might look like, which many of you may have seen. Basically, we need to be able to create lots and lots of models rapidly, iterate on them, test them, and deploy them, and H2O has been very helpful with that. The precision and accuracy of our models haven’t actually increased a lot, but our ability to try out new things very rapidly has. That’s definitely been a time saver for us.
So, that’s enough about brand safety. We also work on ad fraud, so here’s a little introduction. Why do people commit ad fraud specifically? Well, I don’t want to incentivize anybody to do it, but it’s a good gamble. Ad fraud scales very easily. You only need to write some malware once, and then you can infect a lot of machines and get a huge botnet going. If it gets shut down, you just start up another one: tweak the software a little, call it version 2.0, and go again.
The risk is also pretty low. Authorities are more interested in solving violent crimes, and legally it’s kind of a gray area what is and isn’t ad fraud. So it has drawn a lot of interest from fraudsters, though it’s also starting to draw more attention from the authorities. So maybe it’s a bad time to get into it, but it nevertheless costs companies a lot of money. Companies spend a little over $300 billion a year on digital advertising, and ad fraud takes about $5 billion of that.
At IAS, we fight it with three pillars. The first is behavioral analysis. Robots act very differently from a normal human user, so we can look at what they do. The most unsophisticated bots do very repetitive things: every five seconds, say, the mouse moves in a very jagged way. That’s very easy to detect, though there are smarter ones out there. The second is browser and device analysis. Fraudsters typically need to modify the browser in some way, so we try to detect that and check whether you’re really using Chrome 77 if you say you are, or Windows 7 if you say you’re on Windows 7. And lastly, we do reconnaissance: we reverse engineer malware to try to track it down.
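As a toy version of the “every five seconds” pattern, you can measure how regular the gaps between mouse events are using the coefficient of variation (standard deviation divided by mean); perfectly periodic, scripted input scores near zero. The function names and the 0.05 threshold are assumptions for illustration. Smarter bots add jitter, so a real system would combine many behavioral signals rather than rely on this one.

```python
from statistics import mean, pstdev
from typing import List, Optional


def interval_regularity(timestamps: List[float]) -> Optional[float]:
    """Coefficient of variation of the gaps between consecutive events."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return None  # not enough events to judge
    m = mean(gaps)
    return pstdev(gaps) / m if m > 0 else None


def looks_scripted(timestamps: List[float], cv_threshold: float = 0.05) -> bool:
    """Flag event streams whose timing is suspiciously regular."""
    cv = interval_regularity(timestamps)
    return cv is not None and cv < cv_threshold


bot = [0, 5, 10, 15, 20]            # a mouse event exactly every 5 seconds
human = [0, 1.3, 4.8, 5.1, 9.7]     # irregular gaps
print(looks_scripted(bot), looks_scripted(human))  # True False
```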
One way to rack up money quickly with ad fraud is by hiding ads, which ties into viewability. Say the orange boxes here are your ads, and you render them off the screen. You’ve essentially shown 20 ads, and the user didn’t have a chance to see any of them, but you made money off all 20. That’s one case of fraud. You can also show some on the screen and some off screen. You can stack them all in one little section, so all your ads are stuffed into a very small area. Or, in the worst case, you load entire websites and stuff them into itty-bitty pixels. If you’re looking at the page, you’ll just see a speck on your screen, but you’ve loaded 20×4 ads in this case, or however many. So there are lots of different ways.
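The three hiding tricks above (off-screen, pixel-sized, and stacked ads) can be expressed as simple geometry checks. This sketch assumes the ad rectangles and the viewport have already been measured, for example by a script running in the page; the `Rect` type, the 100-square-pixel minimum, and the “another ad covers at least half of it” stacking rule are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rect:
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # width
    h: float  # height

    def area(self) -> float:
        return max(self.w, 0.0) * max(self.h, 0.0)

    def overlap(self, other: "Rect") -> float:
        """Area of the intersection of two rectangles (0 if they don't touch)."""
        dx = min(self.x + self.w, other.x + other.w) - max(self.x, other.x)
        dy = min(self.y + self.h, other.y + other.h) - max(self.y, other.y)
        return max(dx, 0.0) * max(dy, 0.0)


def placement_flags(ads: List[Rect], viewport: Rect, min_area: float = 100.0) -> List[List[str]]:
    """Flag off-screen, pixel-sized, and stacked ads on one page."""
    flags = []
    for i, ad in enumerate(ads):
        f = []
        if ad.overlap(viewport) == 0.0:
            f.append("off_screen")
        if ad.area() < min_area:
            f.append("tiny")
        # Stacked: some other ad covers at least half of this one.
        if ad.area() > 0 and any(j != i and ad.overlap(o) >= 0.5 * ad.area()
                                 for j, o in enumerate(ads)):
            f.append("stacked")
        flags.append(f)
    return flags


viewport = Rect(0, 0, 1280, 720)
ads = [Rect(100, 100, 300, 250),    # normal
       Rect(2000, 100, 300, 250),   # rendered off screen
       Rect(10, 10, 1, 1),          # a single pixel
       Rect(500, 300, 300, 250),    # two ads stacked
       Rect(500, 300, 300, 250)]    # on the same spot
print(placement_flags(ads, viewport))
# [[], ['off_screen'], ['tiny'], ['stacked'], ['stacked']]
```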
One of the things we offer is predicted viewability. Like the brand safety solution, it’s provided at bid time, when an advertiser decides whether to bid on showing their ad: what is the probability that your ad will be seen by a real user? Based on that, the company can decide whether or not to bid.
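A simple baseline for predicted viewability is each placement’s historical viewable rate, shrunk toward a prior so that placements with little history aren’t over-trusted (a Beta-prior smoothing). This sketch assumes per-placement counts of measured and viewable impressions over a trailing window (the Q&A mentions roughly 30 days of history); the prior rate and weight are arbitrary assumptions, and this is not the model IAS built.

```python
from typing import Dict, Tuple


def predicted_viewability(history: Dict[str, Tuple[int, int]],
                          prior_rate: float = 0.5,
                          prior_weight: float = 20.0) -> Dict[str, float]:
    """Beta-smoothed viewable rate per placement.

    history maps a placement ID to (viewable_impressions, measured_impressions).
    The prior acts like prior_weight pseudo-impressions viewable at prior_rate.
    """
    return {
        placement: (viewable + prior_rate * prior_weight) / (measured + prior_weight)
        for placement, (viewable, measured) in history.items()
    }


history = {"a": (900, 1000), "b": (1, 1), "c": (0, 0)}
p = predicted_viewability(history)
# "b" had one viewable impression out of one, but is pulled toward 0.5;
# "c" has no history and gets the prior exactly.
```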
Recently, one team of data scientists at IAS (I wasn’t personally involved) took part in a hackathon. We wanted to challenge ourselves to improve our product, specifically this viewability product. We used our proprietary data: we measure around 10 billion digital ads every single day. We get various types of data: things about the URL, the website, the device, the browser, the operating system, and the advertisement itself. This looks very clean, but the data is very messy. I come from an astronomy background, where data is often really messy: signals from things out in the universe are often very weak and very noisy, and planes, radio, and cell phones contaminate everything. This data has been even messier than that.
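To illustrate how a few messy raw fields can be cleaned and expanded into several features each, here is a hand-written sketch. The input formats and feature names are hypothetical, and automated tools generate far more (and different) features than this; it is not what Driverless AI produced for IAS.

```python
import re
from urllib.parse import urlparse


def derive_features(url: str, user_agent: str, ad_size: str) -> dict:
    """Expand three raw fields into several model-ready features (illustrative only)."""
    raw = url.strip()
    if "://" not in raw:              # messy logs often drop the scheme
        raw = "http://" + raw
    parsed = urlparse(raw)
    host = parsed.hostname or ""      # urlparse lowercases the hostname
    if host.startswith("www."):
        host = host[4:]
    path_parts = [p for p in parsed.path.split("/") if p]

    # Accept sizes like "300x250", " 300X250 ", "300 x 250".
    size = re.match(r"\s*(\d+)\s*[xX]\s*(\d+)\s*$", ad_size)
    width, height = (int(size.group(1)), int(size.group(2))) if size else (0, 0)

    chrome = re.search(r"Chrome/(\d+)", user_agent)
    return {
        "domain": host,
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        "path_depth": len(path_parts),
        "is_html_page": parsed.path.endswith(".html"),
        "ad_area": width * height,
        "ad_aspect": round(width / height, 3) if height else 0.0,
        "is_mobile_ua": "Mobile" in user_agent,
        "chrome_major": int(chrome.group(1)) if chrome else -1,
    }


ua = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36")
feats = derive_features("www.localchron.com/local/US/story.html", ua, " 300X250 ")
```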
We also get viewability measurements, which basically record where the ad was placed, how big it was, and how long it was in view. We can use those to make predictions on future advertisements. In this hackathon, we used a number of tools, as you can see here. One of them is Hive: we store our data in Hive, a data warehouse built on Hadoop that is queried with a SQL-like language. We did all of our model building with H2O products in this case, on GPU-powered AWS machines. We took a hybrid approach: we used Driverless AI for feature engineering, then looked at those features and used AutoML to create the actual models. That was just a way to iterate very quickly. Then we did some clean-up and post-processing with Jupyter and Python.
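Viewability measurements are commonly judged against the MRC/IAB display standard: at least 50% of the ad’s pixels in view for at least one continuous second (two seconds for video). This sketch assumes the measurement arrives as time-ordered samples of (seconds, visible fraction); the sample format is an assumption, and real measurement tags sample continuously and handle gaps more carefully.

```python
from typing import List, Tuple


def is_viewable(samples: List[Tuple[float, float]],
                min_fraction: float = 0.5,
                min_seconds: float = 1.0) -> bool:
    """True if the visible fraction stays >= min_fraction for min_seconds in a row.

    samples: (time_in_seconds, visible_fraction) pairs, sorted by time.
    """
    run_start = None
    for t, frac in samples:
        if frac >= min_fraction:
            if run_start is None:
                run_start = t          # a continuous in-view run begins
            if t - run_start >= min_seconds:
                return True
        else:
            run_start = None           # run broken; start over
    return False


steady = [(0.0, 0.6), (0.5, 0.7), (1.0, 0.8)]
flicker = [(0.0, 0.6), (0.5, 0.2), (1.0, 0.8), (1.5, 0.9)]
print(is_viewable(steady), is_viewable(flicker))  # True False
```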
As a result, we got 46 significant features out of Driverless AI. As with the brand safety solution (I didn’t mention this earlier), the models it favored all seemed to be tree-based: LightGBM and gradient-boosted trees. And in this case, we actually did see a very noticeable improvement in the accuracy of the product, and it sped things up as well. So this has been pretty useful to us. In summary, we can use the vast amount of data we get, along with the automated features H2O provides, to iterate quickly and to give ourselves, and our clients, flexibility based on what they want, as well as to keep rapidly testing, iterating, and making things better. Thank you.
Thank you. Now, let’s see, do we have questions? Over here.
Hey, great job on the presentation. You mentioned viewability prediction for a particular ad page. Are you just forecasting impressions for that webpage at a particular moment in time? Or what exactly are you doing there? Are you giving some probability of whether somebody is likely to click on the ad?
Gotcha. The first one: we’re definitely making a forecast. In this case, we’re using about 30 days of historical data to make predictions about what’s going to happen right then. So it’s not monitoring what the user is actually doing at that moment. Other products, like optimization, might look at that.
Thank you for sharing this. This was fascinating. Quick question: you mentioned that in the hackathon you were provided with four basic inputs, and then that using the H2O product you had 46 engineered features. What took that from 4 to 46? What did H2O do there to give you all these different features? It would be interesting to…
Gotcha. I wasn’t on this particular project personally, so I can’t speak to exactly which features it used. I know the deep learning models were turned off in this case, so it was mainly tree-based methods, clustering analysis, that kind of thing. But I don’t know the details; I could follow up with the people who were actually involved.
Actually, I’m going to… If you go outside and ask one of our data scientists, they can tell you exactly how we do feature engineering. That’s exactly what it does. So if you’re curious how we go from 4 features to 46, step outside and get a demo. Any other questions for the data scientists outside? Yeah, that’s exactly what the product does.