This meetup video was recorded in London on February 28th, 2020.

Bio:

Andreas Vrålstad is a Developer Advocate at Peltarion, where he focuses on making AI more accessible for the developer community. Prior to Peltarion, he did a range of things including running his own company within environmental sciences, developing autonomous underwater vehicles and working as a project manager within the IT sector. He has a Master of Science in AI and Computer Science and enjoys breaking down complex concepts to make them easily digestible to a broad audience. https://www.linkedin.com/in/andvra/

Read the Full Transcript

Andreas Vrålstad:

Yeah, I flew in from Stockholm earlier today to come to this meetup, talk about AI and AI adoption, and listen to the other talks later, and the first thing that struck me at the airport was that you have some technology and some AI adoption that we don't have back in Stockholm, Sweden, and that's the e-gate. You've all seen it at Heathrow, right?

You go there with your passport in your hand, press the passport down, look into the camera, and the gate opens. That's really, really cool adoption, especially in theory, since when I tried it, it didn't open and I had to go to the line for manual inspection anyway and stand there for roughly 40 minutes. But that's just an aside.

I'm Andreas and I work with a team at a company called Peltarion, where we've developed something we call an operational AI platform, which I think sounds cool but doesn't say much. In essence, it's a platform for you to rapidly develop deep learning solutions even if you're not interested in coding everything from scratch yourself. I want to talk more about that in detail, but before that I'll talk a little bit about the AI adoption we have in companies today, because we want to make that faster and improve the rate of adoption.

And then I'll end by running a case on screen, a real case. I don't want to spoil it too much now, but it's about music and it's going to be an interactive one where all of you can use your phones for some voting, and everyone likes voting, right? So we're going to do that.

So let's see. I also have a habit of explaining the agenda before I show the slide, so I think we can just skip this one now. So yeah, deep learning. There's really a revolution going on there now with all of these applications. Take, for instance, the e-gate, where it films your face and after a few seconds it can identify you, match it with what you have in your passport, and open the gate, right? That's really, really cool, and it probably does it with a really high confidence level, since I didn't get in.

I think it's because I took my passport photo without glasses on and I had my glasses on at the gate, so it didn't work. They want a certain confidence level, of course, to do that. And this is really cool technology. I mean, go back 10 years and you didn't have it. It was very difficult, maybe even impossible, but now we can do it. So why do we have a lot of deep learning applications today?

So one reason, and the foundation, as many of you know, is that you need to have a lot of data. Lots and lots of data, and a very good source for that is the internet, obviously, which generates so much data. I read somewhere that we double the amount of data on the internet every two years. And take YouTube alone: every minute there are roughly 300 hours of video uploaded to YouTube, and about 10,000 tweets are written. I read this on the internet, by the way, so it must be true.

I'm not certain about those numbers, but they're probably in the ballpark, and even though I'm a frequent user of both these platforms, you could probably delete 99.999% of it and still be happy. Anyway, it's a lot of data, right? And as we develop and get more data, you also have increased computing power in the infrastructure. There's a lot of development going on with GPUs and hardware to do all this calculation, and of course a lot of algorithms and tooling to actually be able to do something with all this data that we have.

And so obviously, if we can use this technology [inaudible] that we have today, we can all benefit a lot from it, and we know about a lot of cool applications already. Cool and useful, by the way. Within healthcare, for instance, you can get help predicting cancer at a very early stage, which is awesome, of course, and you can get help translating documents and things you write very quickly. So you can have an easier time communicating with people all around the world, and in that way make a more open world where everyone is included.

So it's cool, and yeah, you can use this to fight climate change, of course, and reach other United Nations sustainability goals, and maybe also use it to find a cure for the coronavirus so Peter doesn't need to be scared about that any more, which would be great. All right. And some big companies are really fully committed to doing this, right?

It was a few years ago, a number of years ago now, that Google said, “Okay, we've been mobile first. Everyone is using mobiles today and you have it on your phone, but now we're going to be AI first, and that's what we're going to do: build a lot of applications on this technology and go heavy on research,” and that's what they've been doing now for a long time. Other companies are doing the same. You have Netflix, for instance, creating great algorithms to suggest what you can binge watch next, or speech recognition, and so many different applications.

One of the reasons they are successful, sorry, I feel a bit dry and I'm going to have some water. One reason they're very successful is that they have a lot of money, obviously, and can put it into research, and they can also afford to try something out for a while, see if it works, and just drop it if it doesn't. And so many cool things come out of this, of course.

You have Siri you can talk to, Alexa, and just so many things that you use in your daily life. They can be really expensive to build, and we have DeepMind, for instance, doing so very many cool things. There's AlphaFold, for instance, which I read up on a while ago, which helps you predict the folded 3D structure of a protein just based on a gene sequence. A very difficult problem and very, very cool.

And they also made, you probably know all of these but I'm saying it anyway because I think it's fun and cool, AlphaStar, playing StarCraft. Are you up to date with what's going on there? Yeah? Some of you in the audience. Anyway, I played StarCraft back in the day, and I think it's really cool now to see algorithms developed that can actually beat grandmasters in this game.

But it costs a lot of money to try things out, and not everyone can afford that. So how can we help the adoption, and the adoption rate, of these technologies? Of course, if you're a startup with a lot of good ideas on how to improve technology for humans, or if you're a business that wants to make more money by coming up with new products or whatever, there's a high chance that you can profit a lot from using these latest technologies.

And let's see. One obstacle for many companies is the technologies they need to learn. Say you have an infrastructure and processes for developing IT projects. You've done that for a long, long time, but now you need to develop AI projects instead, which is very different in many ways, and here on the screen is just a number of different technologies. Of course they're not all mandatory for your project, but this is a picture of the landscape that you have to deal with.

[inaudible] need to know some of the technologies in every step here for a successful AI project. So let's see. Say I'm building an AI proof of concept. You can do that, depending on what you're doing, usually with quite a small team. Maybe it's you and two more people, or maybe even just you; if it's small enough and you're skilled enough, you could do it by yourself and sit there and work with PyTorch or TensorFlow and build something. But when you want to put it in production, there's a lot more to it.

You need other skills, and you need to cooperate with different people within the company to make this happen, to have a real application with a live version, and to build the whole application at the end. I realize now, looking at this slide, that the model seems to be a very, very small part of the AI project, which is obviously not the case, but it just shows that there are many other things you need to think of as well to make it all happen.

In a typical company where you have the data science department up and running already, it can look something like this: you have isolated business areas and an isolated data science department. Say you have something you want to work on in one department. You're not a data scientist in this case, but you have an idea of what you want to try out. You want to build a project. So what do you do? You call your data science department, you set up a project, and you run everything from scratch. And the data science department, sometimes they might of course feel, wow, a cool project is coming up, this is going to be fun, but sometimes it's really minor things, just simple routine work, that you ask them to do.

So one idea to increase the rate of AI adoption is to change that a little bit, so every department can work on its own. It has some AI tools and skills to build simple projects itself, to try things out, see how that goes, and integrate it. Of course, it can't do everything, and a data science team is still very important here, helping the departments with the data science so it runs smoothly, but also hopefully having more time to focus on the more challenging things where you really need to know the tech behind it and know the algorithms to make it work.

I see on this slide as well that the data science department looks like everyone has been fired and it's on its way out. That's absolutely not the case. It's just more on the side and can run its own projects as well. So what do we do then as a company? We're building a platform that we call operational AI, and it's basically a platform where, using a graphical user interface, you can have a look at your data and you can build your models.

You can have it [inaudible] to do that, selecting the right model for you. We can also run your training. You can evaluate it, and then you can also deploy your model. So the idea is you come up with an idea, you don't have all the skills, you're not a data scientist yourself, but you can use this to build something, build a proof of concept, and find a use for it in your business that way.

So with that said, I want to go through an example of something we built on the platform, and it's two steps. First, a little bit of theory, since we're going to work with music now. I just want to walk through how that works, still at a quite high level, and then we're going to look at the case as well: how it's built on the platform, and also try it out live. So let's do this.

So working with audio, one way to do that builds on the fact that there's been a lot of focus on developing algorithms for image analysis for a long time now. It's been a really cool area with a lot of research, and this shows a little bit of the progress that's been going on for the last, what is it, soon 10 years, and how good the algorithms have become at some image classification tasks.

One idea here, one way to approach music, is to use those same algorithms. There are many ways to do it, but one is to use image classification algorithms. To do that, since audio isn't images itself, or we don't think of it as images, we need to find a way to actually turn music into images, and we're going to do that using something called spectrograms, which I know some of you know of, but not all of you.

So I'm going to talk about how we build that from audio. How many of you play an instrument? All right. Quite a few. Half, roughly. Yeah. Good, good. So maybe you've seen notes like this before. I don't know exactly what song that is, I'm trying to sing it in my head now, but I don't recognize it. Anyway, this is one way to represent music, right? You can just read it out: you start on a high note here, you go down for a while, and then you go up again, and you can read that.

So you can actually maybe see that in your head if you're a good singer. I don't read notes myself. I play music, though, but you could do that. The next one is a waveform representation of music. That's what happens when I speak into this microphone: the microphone picks up the changes in pressure in the air and samples it, one sample for every time step, kind of. So depending on your sample rate, maybe you have 44k samples per second. That means 44,000 values every second. So that's a way to do it as well.
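For readers who want to try this at home, here is a minimal sketch of loading an audio clip and checking its sample rate. The librosa library and the file name are just illustrative choices, not necessarily what was used in the project:

    import librosa

    # sr=None keeps the file's native sample rate (e.g. 44,100 samples per second)
    # instead of librosa's default resampling to 22,050 Hz.
    samples, sample_rate = librosa.load("some_song.wav", sr=None)

    print(f"{sample_rate} samples per second")
    print(f"{len(samples) / sample_rate:.1f} seconds of audio")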

So what you can see here is the time axis, and here you have pressure, air pressure, basically. Now this new one, this new form that I want to introduce for those of you who haven't seen it before, is the spectrogram, which we're zooming in on here now. We still have the time axis going this way, and here we have frequency instead.

So for every time step, you can see what frequencies are involved, and you can also see the intensity of each frequency. Down here we have the bass notes, like [inaudible], and up here we have the high-pitched ones, yeah. So to solve this problem now, if we find a way to turn sound into spectrograms, then we can feed the pictures that come out of that into our algorithms and use the same technology for finding patterns in music.

Which, by the way, is very cool: you can use the same technology for a different problem, and it's not just music or audio. You can also use it for radio signals or whatever you want. So this is just to give an intuition of spectrograms. I have three spectrograms and three different audio samples here now. The idea here is that we can kind of see the sound playing. So let's try the first one; we'll listen to it and see.

So we had a beat here. It's like a bass drum, right? [inaudible] and a snare. It's hard, the first time you see this, to say, okay, this is a bass drum and a snare. It's quite difficult, but if you've seen this a couple of times, then you can definitely recognize it when you come back to it. You'll see that, okay, that must be a bass drum. Okay. Let's try the next one.

Can you see that as well, how it works? So [inaudible] again you've got time here and frequency there. Same thing here. If you've seen this a lot of times, then you can probably recognize this sound just by looking at this picture. Okay. Any guesses on the last one? What is it? It's difficult. Yeah, that's a difficult one. That's a bus, and you can hear it's all over the spectrum, right?

It's bus noise, so you have some sound from the brakes and people and chattering, just all over the place. But anyway, when you listen to them, they are very, very different from each other, and you can also see that the spectrograms are very different. Oh. All right. Let's see. Didn't mean to do that. So let's have a look at how we can go from audio, this waveform representation, because this one is really easy to get. It's just a recording of what I'm saying here.

So let's have a look at how we can turn this into a spectrogram, because that's what we want to put into the algorithms, remember? The thing is, it turns out it's quite easy to do. There was a guy in the, what was it, early 19th century I think, called Joseph Fourier, who came up with a way to do that: going from this waveform and finding out the frequencies in it. Yeah. If you want to take a picture, I can take a break here. Yeah. I'll send out the presentation as well so you can have a look.

So yeah, you don't need to remember this formula. Now, when you apply it, you could apply this formula once to the whole signal. What would we get? That's just a one-dimensional [inaudible]. It will get all the frequencies, but all the frequencies it finds in the whole signal at once. So if you have a long signal, it doesn't make too much sense for this kind of application.
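The slide itself isn't reproduced in this transcript, but the formula he's referring to is the standard discrete Fourier transform, which for a signal of N samples x_0, …, x_{N-1} gives frequency component k as:

    X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1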

So what you want to do is chop the signal into many small pieces, usually somewhere around 100 milliseconds, something like that, depending on your application, of course. In this case we did roughly 100 milliseconds. So take this little piece here and just zoom in on it. We go from here, we zoom in, we get this one, right? And for this small piece, we apply the formula that all of you probably remember, and then you get the first slice of the spectrogram.
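To make the slicing concrete for readers, here is a rough NumPy sketch of the process he describes: chop the signal into roughly 100 ms windows and take the Fourier transform of each one. The window length and the non-overlapping windows are simplifications for illustration, not the project's exact settings:

    import numpy as np

    def simple_spectrogram(samples, sample_rate, window_ms=100):
        """Chop the signal into short windows and FFT each one."""
        window_size = int(sample_rate * window_ms / 1000)  # ~4,410 samples at 44.1 kHz
        columns = []
        for start in range(0, len(samples) - window_size + 1, window_size):
            piece = samples[start:start + window_size]
            # The magnitude of the FFT of this little piece is one slice of the spectrogram.
            columns.append(np.abs(np.fft.rfft(piece)))
        # Rows are frequency bins, columns are time steps.
        return np.stack(columns, axis=1)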

So 100 milliseconds of the original [inaudible] becomes 100 milliseconds of the spectrogram, right? Then you just do this for the whole signal, one slice at a time, and then you have your spectrogram. And if you're a coder, all you need is basically one line of code to do this. It's quite cool, and the library does it for you. It's amazing how these libraries work. You want to create this, you know about the spectrogram, and you just Google “how to make spectrogram Python” and you get this. It's cool.
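The "one line of code" he mentions is most likely a call to a library's built-in short-time Fourier transform. This is only a guess at what such a call could look like (librosa again, with typical parameters), not the exact code from the project:

    import numpy as np
    import librosa

    samples, sample_rate = librosa.load("some_song.wav", sr=None)

    # One call does the chopping and the per-window Fourier transforms.
    spectrogram = np.abs(librosa.stft(samples, n_fft=2048, hop_length=512))

    # Usually converted to decibels before being treated as an image.
    spectrogram_db = librosa.amplitude_to_db(spectrogram, ref=np.max)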

All right, and then we put this through a convolutional neural network, which gives us a probability for each of the different classes, and I see now that I haven't presented the case yet, so it doesn't make too much sense now, but let's do that instead. We've been through the theory, right? So now I'm just going to show how we applied this for a customer, and then we will also listen to it and see how it works.
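The actual network isn't shown in the talk, so the following is only a minimal sketch of what a multi-label CNN on spectrogram "images" could look like in Keras; the input shape, layer sizes, and label count are all made up for illustration:

    import tensorflow as tf

    NUM_LABELS = 40  # placeholder count: angry, dreamy, peaceful, ...

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 512, 1)),       # frequency bins x time steps x 1 channel
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        # Sigmoid (not softmax): each label gets its own independent probability.
        tf.keras.layers.Dense(NUM_LABELS, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")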

So we did this for a company, also in Stockholm, Sweden, called Epidemic Sound. Has anyone heard of them? Yeah? The camera man has heard of it. Yeah. So basically, what they work with is music licensing. Say you have a vlog, you want to put it up on YouTube, and you want some cool background music while you do whatever you do on your vlog, but it can be really difficult to get a license for that music.

Maybe you go to Spotify, listen to a song, and then call up Justin Bieber and ask if you can play it in your vlog. Maybe it works, but I don't think so. So these people we worked with have a huge library of music, tens of thousands of songs, and they label each song with categories they came up with, such as dreamy, ambient, angry. A number of categories, labels, right?

So if you want to use it, what do you do? You pay a subscription to these people, of course, and then you just go in, browse the categories, find a song, and then you're free to use it in your videos. So it's a good service if you work with… How many vloggers do we have in here? Anyone putting content onto YouTube frequently? No?

 

Audience Member:

I do.

 

Andreas Vrålstad:

Ah. Do you put content on YouTube? Okay. So yeah, that's what they do. It's quite simple, and what they want to do, the problem or the issue that we were working on, let's go back to this one. I was a bit quick there. So what they want to solve is that today they buy in songs from a number of artists, and then at Epidemic Sound they have people sitting down and manually tagging the songs.

They listen to it and put tags on it. There are two problems with this. One is that it's very time consuming, and not always very fun for everyone involved, and also you get a lot of inconsistency in your database, right? If you listen to a song and I listen to the same song, it's quite likely that it will evoke different emotions in us, maybe depending on our mood that day, or just because we have different tastes in music, or whatever.

So that's what they want to solve, and that's why they want an AI to do this instead, because it would be much more consistent. It may be wrong in some subjective sense, but still consistent, right? They actually built the front end for this. What they do is input the new song, right? They get the new song from a new artist, and they send it into our back end, which puts labels on it, a number of labels. I think it's up to five labels or so.

And then they just take that and store it in a database. So something like this: they have three songs, they generate the spectrograms from them, send those into the model that we trained, and back come the labels, like peaceful, ambient, dreamy, smooth, dark, happy, dreamy, and all these. So it'll save a lot of time and it will be more consistent. Now I have some screenshots of the user interface. We can have a look at that later if we have some more time, but let's have a look at how we do this on the platform, in our tool.

So here we have uploaded the spectrograms from the songs, and those are very easy to generate, right? We had a look at that one line of code earlier, and then you just [inaudible] that together into a file and upload it to our platform. And what happens here, this is the training data, right? So they are labeled already. We've uploaded the spectrograms, and this is us inspecting what we just uploaded.

You can see the spectrograms, and then you see all the, whatever it is, 30 or 40 labels, one-hot encoded, which means it's a 0 if it's not angry and a 1 if it is angry in the first column here. So that's the inspection, and then you select, here on the left side, to keep some of it for training and some of it for validation, right? We don't see it in the screenshot, but then you just press a button, we call it the wizard, which recommends a model for this kind of data and puts it on the canvas, where you can also build your own models if you have an idea of what you want to do.

Otherwise, you just use the ones we have in the system, and there are a lot of pre-built models for [inaudible] and, yeah, whatever you want. There are quite a lot of models in there. From this page you can also select the number of epochs and the learning rate, or you can just stick with the defaults, and then we press train and it starts training, right? As you know, sometimes it's really quick and sometimes it takes hours and days, and these now run on servers with GPUs, so it's quite quick to do.
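As an aside on the 0/1 label encoding mentioned a moment ago, here is a tiny illustration of what such a training table can look like; the file names and label columns are invented for the example:

    import pandas as pd

    # One row per song: the spectrogram image plus a 0/1 column per label.
    training_data = pd.DataFrame([
        {"spectrogram": "song_001.png", "angry": 0, "dreamy": 1, "peaceful": 1, "sentimental": 0},
        {"spectrogram": "song_002.png", "angry": 1, "dreamy": 0, "peaceful": 0, "sentimental": 0},
    ])
    print(training_data)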

And while it trains, you see the progress. You get a lot of different data here to inspect the progress as you go, and then when you're done with the training, if you're happy with the results, you can press a button to deploy it. When you press that button, it's just one button, one click, the model becomes available through a REST API. Then you just send in your next song as a spectrogram and you get back the results.
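He doesn't show the API call itself, so the following is just a generic sketch of calling a deployed REST endpoint from Python; the URL, token, and payload shape are placeholders, not Peltarion's actual API:

    import requests

    URL = "https://example.com/deployments/music-mood/predict"   # placeholder endpoint
    TOKEN = "your-deployment-token"                              # placeholder credential

    with open("new_song_spectrogram.png", "rb") as f:
        response = requests.post(
            URL,
            headers={"Authorization": f"Bearer {TOKEN}"},
            files={"spectrogram": f},
        )

    print(response.json())  # e.g. {"peaceful": 0.91, "sentimental": 0.84, ...}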

All right. I hope that makes sense. So now we come to the interactive moment. Please put up your phones or computers. What I will do now is play two songs, and for each of them you will be able to… You can go to this URL. I hope it works, and hopefully that code is activated as well. You won't see anything on this page yet, it will just be blank, but when I start playing a song, a lot of options should pop up: categories like angry, dreamy, happy, whatever, and then you're allowed to select up to three of these and then submit.

Okay. Great. So now I’ll play the song and then you have up to three choices. One, two, or three, and then you press submit. Okay? Can I have maybe five, five to ten minutes?

 

Camera Man:

Five minutes?

 

Andreas Vrålstad:

Okay, yeah. This is good.

 

Camera Man:

Good, good.

 

Andreas Vrålstad:

I’ll skip the end.

 

Camera Man:

Yeah.

 

Andreas Vrålstad:

Okay.

 

Audience Member:

How do you process the lyrics? What about the lyrics?

 

Andreas Vrålstad:

So we don’t explicitly handle the lyrics.

 

Audience Member:

So it doesn’t have any effect.

 

Andreas Vrålstad:

No. I mean, it has inherently because the lyrics will affect the spectrogram, right?

 

Audience Member:

It’s just the sound of the lyrics.

 

Andreas Vrålstad:

It’s just the sound of it. Exactly. Exactly. But you don’t know exactly what the network learns. It might learn the lyrics, but you don’t know that.

 

Audience Member:

Yes. Some songs, the whole [inaudible] and everything will sound upbeat, but the actual meaning of the lyrics is very dark.

 

Andreas Vrålstad:

Exactly.

 

Audience Member:

But yours…

 

Andreas Vrålstad:

It would be cool to try with a song like that where the lyrics contradict the feeling of the song. Yeah. But no explicit training on the lyrics. Just the spectrograms. All right. I really like that song, by the way. It’s nice.

 

Audience Member:

It’s quite slow, really. Like a romantic movie, basically.

 

Andreas Vrålstad:

Romantic movie?

 

Audience Member:

Yeah. It has a wee romantic shadow.

 

Andreas Vrålstad:

Yeah, yeah, yeah. I agree. I agree. I really like it, personally. All right. So let's see. What we can see here is also what I wanted to show, because now we are, let's see, how many are we? 64 participants in this, and let's say they had 64 people at this company labeling the songs as well. It would probably look exactly like this. I mean, we touched so many categories, but there's a maximum of three, or maybe five, categories. I don't remember.

But anyway, there's a maximum number of categories. Right. And this shows that we don't 100% agree on which labels it should get. We can see that most of us think it's dreamy, sentimental, also relaxing and hopeful, those kinds of feelings, but someone thinks it's angry and someone thinks it's… Yeah. I mean, yeah. Maybe someone just doesn't like guitars, right? Makes you angry. So you don't know, and similarly, two people think it's dark. Yeah.

So let's see what the computer says. I've named the button “show correct answer,” which might not be right, but let's see what happens. So the computer selects peaceful and sentimental. Those are the two categories.

 

Audience Member:

Not dreamy?

 

Andreas Vrålstad:

No, it didn't pick dreamy. I agree, I would probably have picked dreamy as well, but what we could do is go through the catalog and listen to other dreamy songs, because the neural network might have a different definition of dreamy in some sense. Yeah. But anyway, sentimental and peaceful, which is something most of us…

 

Audience Member:

[inaudible] Your network. Is it trained on the same amount of data? The same number of samples for all types of…

 

Andreas Vrålstad:

No, it's the same number of samples for each. Yep. Yep. All right. We have time to try one more. All right, are you ready for the next one? Yeah.

All right. So what have we got here? Angry, 18 people. Yeah. [inaudible] frantic, chasing, restless, running, suspense, epic. Yeah.

 

Audience Member:

It’s more like the sort of beginnings of a rock band in the 1980s. The early 80s.

 

Andreas Vrålstad:

Yeah, yeah.

 

Audience Member:

That sort of background [inaudible]

 

Andreas Vrålstad:

Yeah. Yeah. That's not a romantic song any more, right? But we still have one person saying dreamy. All right. Let's have a look at what the computer says here. The computer says chasing and angry, and it feels like most of us kind of agree with that as well, even if it didn't catch all the labels we thought it would.

 

Audience Member:

[inaudible]

 

Andreas Vrålstad:

Pardon?

 

Audience Member:

Is your algorithm only choosing categories rather than…

 

Andreas Vrålstad:

I think it was limited to three or five, even, but it’s only picking two now.

 

Audience Member:

Yeah, why is it only picking two?

 

Andreas Vrålstad:

Probably because it needs to reach a certain confidence level to decide on a label. So maybe the threshold is 80% certainty, or something like that. All right. But yeah.
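For readers, here is a minimal sketch of the kind of thresholding being described; the 0.8 cut-off and the numbers are illustrative, not confirmed values:

    import numpy as np

    label_names = ["angry", "chasing", "dreamy", "peaceful", "sentimental"]
    probabilities = np.array([0.93, 0.88, 0.41, 0.12, 0.07])  # made-up sigmoid outputs

    THRESHOLD = 0.8  # only keep labels the model is confident enough about
    predicted = [name for name, p in zip(label_names, probabilities) if p >= THRESHOLD]
    print(predicted)  # ['angry', 'chasing']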

 

Audience Member:

[inaudible] looked like you only sampled 30 seconds of the song.

 

Andreas Vrålstad:

Yeah. Yeah.

 

Audience Member:

[inaudible] two columns there and I think this song…

 

Andreas Vrålstad:

Correct.

 

Audience Member:

Is a good example where the beginning, because I saw people tagging in the first few seconds.

 

Andreas Vrålstad:

Yeah.

 

Audience Member:

And then once the drums sort of set in, then the answers changed quite significantly. So how do you make sure that whichever part of the song is sampled is actually representative of the…

 

Andreas Vrålstad:

That's actually something they're working on today, so it's a really good question: trying to find out which part of the song is most representative.

 

Audience Member:

And did they put this in manually? So there was a person who said this is the 30 seconds we want to sample or did you have an algorithm to determine that?

 

Andreas Vrålstad:

Here we just picked a part of each song, kind of at random.

 

Audience Member:

Random.

 

Andreas Vrålstad:

Well, a part of the song. Yeah. Yeah. Which, of course, as I said, won't give you the best results all the time. I mean, this song is one example, and Bohemian Rhapsody would also be hard to put a single label on, right?

 

Audience Member:

And is it computational resources? I mean, obviously if you sample the whole song you have to deal with more data. So why is it only 30 seconds?

 

Andreas Vrålstad:

It’s basically for computational reasons. Yeah. Yeah.

 

Audience Member:

Thank you.

 

Andreas Vrålstad:

Yeah. Thanks.

 

Speaker 6:

Is it because your network needs to take in 30 seconds of data and you can't [inaudible] a lot of different sizes?

 

Andreas Vrålstad:

Pardon?

 

Speaker 6:

Is it because your neural network can only take 30 seconds of data and not deal with different [inaudible]

 

Andreas Vrålstad:

Yeah, we designed it to just be able to take 30 seconds and only work with 30 seconds for computational reasons. One minute?
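For readers, a rough illustration of forcing every clip to a fixed 30-second length before the spectrogram is computed, as described above; whether the real pipeline crops or pads this way is an assumption for the sketch:

    import numpy as np

    def to_fixed_length(samples, sample_rate, seconds=30):
        """Crop or zero-pad so the network always sees the same input size."""
        target = sample_rate * seconds
        if len(samples) >= target:
            return samples[:target]                          # crop long clips
        return np.pad(samples, (0, target - len(samples)))   # pad short clips with silence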

 

Camera Man:

Yeah.

 

Andreas Vrålstad:

A last question? A last question?

 

Speaker 7:

Is your platform [inaudible] of it being for the [inaudible] and adjusting the architecture to that fact, or is it using the same ResNet that it would use for a picture? Because a spectrogram is structured, right?

 

Andreas Vrålstad:

Yeah.

 

Speaker 7:

The dimensions mean different things.

 

Andreas Vrålstad:

Yeah. Yeah, yeah. It does. So that's a part we are going to add to the [inaudible] as well, so you can actually look at the images and see what it finds, but right now it will just give you a vanilla CNN to run. Yeah. But it's in the pipeline.

 

Audience Member:

Thank you.

 
