ml5.js: KNN Classification Part 1


https://www.youtube.com/watch?v=KTNqXwkLuM4


Transcript:

(00:00) Hello, and welcome back to the beginner's guide to machine learning with ml5. Today I'm continuing this series with a few videos about something called KNN classification, and what I'm going to apply it to is building your own teachable machine. This is the Teachable Machine project from Google Creative Lab, which demonstrates the idea of transfer learning.

(00:28) Transfer learning means taking a pre-trained model and using it as a foundation on which to train a new model. Now you might be thinking: didn't you already make these videos? There was that whole thing with the feature extractor, classification, regression, and saving the model. Yes, you're right, I did already make those videos.

(00:45) However, I'm going to do basically the same thing again with a slightly different technique, called KNN classification, which has some advantages. There are pros and cons to both techniques, and at the end we may wrap up by looking at why you might pick one over the other.

(01:02) One of the main advantages of KNN classification is that training happens as you add new images, rather than as a separate step. I'll get into why that is in a bit. So this is the Teachable Machine project, which you can find at teachablemachine.withgoogle.com. What I'm going to do is... well, I've already trained this.
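Why does training happen as you add images? With KNN, "training" is just remembering each labeled example as it arrives; there are no weights to fit afterward. A minimal plain-JavaScript sketch of that idea, assuming each example is already a numeric feature vector (`ExampleStore` and its methods are hypothetical names, not the ml5 API):

```javascript
// A minimal sketch: KNN "training" is nothing more than storing labeled examples.
// ExampleStore is a hypothetical helper, not part of ml5.
class ExampleStore {
  constructor() {
    this.examples = []; // each entry: { features: number[], label: string }
  }

  // Adding an example IS the training step: nothing else is computed.
  addExample(features, label) {
    this.examples.push({ features: features.slice(), label });
  }

  // Count how many examples have been collected for each label.
  countByLabel() {
    const counts = {};
    for (const { label } of this.examples) {
      counts[label] = (counts[label] || 0) + 1;
    }
    return counts;
  }
}

// Usage: each click in a Teachable-Machine-style interface would add one example.
const store = new ExampleStore();
store.addExample([0.1, 0.9, 0.3], "left");
store.addExample([0.2, 0.8, 0.4], "left");
store.addExample([0.9, 0.1, 0.5], "right");
console.log(store.countByLabel()); // { left: 2, right: 1 }
```

The work doesn't disappear, it moves: classifying a new image means comparing it against every stored example, so classification gets slower as the collection grows.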

(01:25) I trained it with me standing here in the middle, with kind of an angry look on my face, as orange; my hand up to the left as green; and my hand up to the right as purple. What I want to emphasize is that you can use this technique to build a gesture-based controller for some kind of game.

(01:48) You could do this with the other technique as well, but there I really treated it as just a new way to classify images. Here, think about training a model in real time to recognize whether I'm holding up a hand on my left, holding up a hand on my right, or have no hands up at all. You could imagine using that to control a character, or to move a Pong paddle left or right, and maybe up or down.
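As a hypothetical sketch of that controller idea, here is how a predicted label could drive a Pong paddle. The labels "left", "right", and "none", the speed, and the 600-pixel limit are all made-up assumptions; in a real sketch the label would come from the classifier's result.

```javascript
// Hypothetical glue code: turn a classifier's label into Pong paddle movement.
// The labels "left", "right", and "none" are made up for this sketch.
function updatePaddleX(x, label, { speed = 5, minX = 0, maxX = 600 } = {}) {
  let next = x;
  if (label === "left") next -= speed;
  else if (label === "right") next += speed;
  // Any other label (e.g. "none": no hands up) leaves the paddle where it is.
  // Clamp so the paddle's x position stays between minX and maxX.
  return Math.min(maxX, Math.max(minX, next));
}

console.log(updatePaddleX(100, "left"));  // 95
console.log(updatePaddleX(100, "right")); // 105
console.log(updatePaddleX(2, "left"));    // 0 (clamped)
console.log(updatePaddleX(100, "none"));  // 100
```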

(02:09) So that's what I'm going to do. It's going to take a few videos, since there's a bunch of things to get through, but that's the idea. All right. You can play around with this website to get a sense of how it works; it has a really nice interface. It's built with TensorFlow.js.

(02:24) In ml5, a library built on top of TensorFlow.js, we have a similar version. All of this work was done by Yining Shi for ml5. Yining has been teaching a class called "Machine Learning for the Web" at ITP, NYU. Thank you so much, Yining, for letting me be inspired by and use your materials in these tutorials.

(02:44) Hopefully Yining will come back as a guest on "The Coding Train" and show some more examples of what you can do with KNN classification. I highly recommend looking through the syllabus and examples, and there's also a whole presentation of Yining's slides about KNN classification that you can look at.

(03:01) But what I'm actually going to follow is this article by Nikhil Thorat, one of the creators of TensorFlow.js and a member of the research team at Google Brain developing it. It's a notebook about how all of this works. I'm going to go through it, but not use its code.

(03:27) This code is the raw TensorFlow.js code, and I encourage you to look at it and think about how it works; this might be a place where you want to go lower level. But I'm going to use the new KNN classifier feature in the ml5 library. For this feature, make sure you're using at least version 0.2.1 of ml5.

(03:57) As of version 0.2.1, the KNN classifier is part of ml5. I will mention, though, that at the time of this recording the documentation is not yet up on the website. Hopefully, by the time this video is released as part of the playlist, you'll find it at ml5js.org. All right. Let's start going through this article, right here at "Background: MobileNet."

(04:23) The MobileNet model is something I've used in just about every video up to this point, so if you've been watching the playlist, you're somewhat familiar with it. Loading MobileNet is our first step. MobileNet is a machine learning image classification model that's been trained to recognize 1,000 classes of images.

(04:46) It was trained on a huge database of images known as ImageNet. I've said this before: ImageNet exists for researchers to use in developing and testing machine learning algorithms, but a model trained on it doesn't necessarily work particularly well for general image classification out in the world.

(05:07) That's because it knows a lot about birds and dogs, but not about a lot of other things in the world. However, loading this model, step one, gives us a good basis for the feature extraction process, the transfer learning process, which we did before and are now going to do in a different way.

(05:26) The next step is to create what's called a feature extractor. I'll show you in the code; this is basically what I've done before. Here is what MobileNet does on its own: it looks at an image and tells you the probability of each class, like Egyptian cat, tabby cat, tiger cat, and, apparently, remote control. There's almost a 2% chance that it's a remote control.
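MobileNet's own output is one probability per class, and the labels shown here are just the highest-scoring ones. A plain-JavaScript sketch of picking the top entries; the four labels and probabilities are illustrative stand-ins for MobileNet's 1,000 classes, not real model output:

```javascript
// Return the k most probable classes, highest first.
function topK(probabilities, labels, k) {
  return probabilities
    .map((p, i) => ({ label: labels[i], probability: p }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, k);
}

// Illustrative values only; a real MobileNet output has 1,000 entries.
const labels = ["tabby cat", "Egyptian cat", "remote control", "tiger cat"];
const probs = [0.3, 0.5, 0.02, 0.18];

const best = topK(probs, labels, 2);
console.log(best[0].label); // Egyptian cat
console.log(best[1].label); // tabby cat
```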

(05:58) Cat, you are not a remote control to me. All right. What I already have, basically carried over from my previous examples, is a sketch that captures video from the webcam and displays it in the window. Like I said, I also have a reference to the ml5 library, and I have this feature extractor loaded.

(06:18) I'm using p5 to connect to the webcam and draw its image, and ml5 to create a feature extractor with the MobileNet model. Now we can go back to Nikhil's article and scroll down. OK, this is actually the end result: the cat softmax.

(06:42) So what does this mean? The MobileNet model is a neural network that expects an image as input. The image is processed through a variety of layers, transformed and twisted and turned; all of that is the neural network process. To understand more about how it works under the hood, you'd have to refer to some of my other videos and other reference materials.

(07:19) By the end, what it produces is a big vector, a big list of numbers that are all probabilities, and together they add up to 1, or 100%. That's what we're seeing in this chart: all the way over here, these classes have a very high probability. So somewhere in here we're getting a number like 0.9, while other classes get much smaller numbers, like 0.05, and all sorts of other values.

(07:47) They all add up to 1. That 0.9 is the probability that the image is an Egyptian cat. This is what MobileNet would do on its own, and this stage is called the softmax stage. Softmax is a fancy term for normalizing the output of this whole process into a set of probabilities that add up to 100%.
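The softmax formula is p_i = exp(z_i) / sum over j of exp(z_j), where the z values are the raw, unnormalized scores. A plain-JavaScript sketch with small made-up scores (this is the standard formula, not ml5 or TensorFlow.js code):

```javascript
// Softmax: turn a vector of raw scores into probabilities that sum to 1.
// p_i = exp(z_i) / sum_j exp(z_j)
function softmax(scores) {
  // Subtracting the max first avoids overflow in Math.exp; the result is unchanged.
  const max = Math.max(...scores);
  const exps = scores.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const probabilities = softmax([2.0, 1.0, 0.1]);
console.log(probabilities); // roughly [0.659, 0.242, 0.099]
```

The largest score still gets the largest probability; softmax only rescales the scores so they can be read as probabilities.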

(08:17) But there are stages before that. The stage right before it is called the logits. By the way, before I recorded this video I spent about half an hour trying to figure out how to pronounce it, whether it's "law-gits" or "low-gits." A lot of places said "low-gits," and some others said "law-gits."

(08:36) I'm going to go with "law-gits," like logic or logical. You can see here what the logits actually look like. They are the unnormalized output of the neural network: the layer (I shouldn't be using my hand, I have an eraser) right before the softmax. This is called the logits.

(09:05) It's just a lot of numbers. Softmax is applied to them to get the final output, which is good for classification. But this layer, all of these numbers, is what's referred to as the features. I know I explained this in a previous video, but it's useful to give it another try. Again, we haven't gotten to the next step yet, where something pretty different is going to happen.

(09:29) This is still the same as before. Here is what those cat logits look like. You can notice there are still peaks around here, but there's lots of other stuff going on. As Nikhil writes, this is a "'semantic fingerprint' of the image." Basically, it's the essence of the image.

(09:53) It's the boiled-down, numerical essence of the image as perceived by the MobileNet model. An image itself is a lot of numbers: MobileNet takes an input of 224 by 224 pixels, and the literal numeric representation of that image is all of the RGB values of every single pixel.
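To put a number on "a lot of numbers," here is the arithmetic, assuming the standard 224 by 224 RGB input and the 1,000 logits that come out of MobileNet:

```latex
\underbrace{224 \times 224}_{\text{pixels}} \times \underbrace{3}_{\text{R, G, B}}
  = 150{,}528 \text{ input values}
\qquad\longrightarrow\qquad
1{,}000 \text{ logits}
\qquad \left(\tfrac{150{,}528}{1{,}000} \approx 150 \times \text{ fewer numbers}\right)
```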

(10:19) But that's too many numbers to work with, compare, and make sense of. This pre-trained model has already learned how to boil it down to just 1,000 numbers, and with 1,000 numbers we can start to compare this image to, let's say, another image. So this is feature extraction: getting these features, that essence, the semantic... what did Nikhil write? The "'semantic fingerprint' of the image."

(10:51) If I have two "semantic fingerprints" of two images, as arrays of numbers, there's lots of math I could start to apply to them. I can ask: how similar are they? How similar are they compared to these other images? Or what if I were to interpolate from one image to another? You may have seen artistic visualizations of the output of neural networks that move through what's called the latent space; you might hear this described as walking through the latent space.
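Here is a sketch of the kind of math you can apply to two fingerprints: Euclidean distance and cosine similarity as measures of how alike they are, and linear interpolation as the simplest way to move from one to the other. These are generic vector operations in plain JavaScript with tiny made-up vectors; they are not necessarily what ml5 uses internally.

```javascript
// Euclidean distance: 0 means identical; larger means less similar.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine similarity: 1 means same direction, 0 means orthogonal, -1 means opposite.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear interpolation: t = 0 gives a, t = 1 gives b, values in between blend them.
function lerpVectors(a, b, t) {
  return a.map((x, i) => x + (b[i] - x) * t);
}

console.log(euclideanDistance([3, 4, 0], [3, 4, 1])); // 1
console.log(cosineSimilarity([1, 0], [0, 1]));        // 0
console.log(lerpVectors([0, 10], [10, 20], 0.5));     // [ 5, 15 ]
```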

(11:19) This idea of interpolating between the features of images to generate something: all of this is interrelated. But now, finally, is the moment where I can say how this relates to KNN classification, because KNN stands for K-nearest neighbors. If we get the logits for an image, I can ask: which images that I've seen previously have semantic fingerprints very similar to this one? I look at the K most similar ones, and if most of them share a category, then this image belongs to that category too.
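The core of K-nearest neighbors fits in a few lines. A plain-JavaScript sketch, assuming every stored example is a feature vector paired with a label; the toy 2-D vectors and labels below are made up, and ml5's KNN classifier wraps this idea in its own API, which this sketch does not reproduce:

```javascript
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// examples: array of { features: number[], label: string }
function knnClassify(examples, query, k) {
  // 1. Measure the distance from the query to every stored example,
  // 2. then keep the k closest ones.
  const neighbors = examples
    .map((ex) => ({ label: ex.label, distance: euclideanDistance(ex.features, query) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);

  // 3. Majority vote among those k neighbors.
  const votes = {};
  for (const n of neighbors) {
    votes[n.label] = (votes[n.label] || 0) + 1;
  }
  // On a tie, the label whose nearest neighbor is closer wins,
  // because votes keeps the order in which labels were first seen.
  let best = null;
  for (const label of Object.keys(votes)) {
    if (best === null || votes[label] > votes[best]) best = label;
  }
  return { label: best, votes };
}

// Toy 2-D "fingerprints"; real MobileNet logits would have 1,000 dimensions.
const examples = [
  { features: [0, 0], label: "no hands" },
  { features: [0.5, 0.2], label: "no hands" },
  { features: [5, 5], label: "left hand" },
  { features: [5.5, 4.8], label: "left hand" },
  { features: [-5, 5], label: "right hand" },
];

console.log(knnClassify(examples, [4.8, 5.1], 3).label); // left hand
```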

(11:54) That, in a nutshell, is K-nearest neighbors. I'm going to wrap this up and start implementing the code in the next video, but let's take one more step and look at how easy it is to get those logits with ml5. So I'm going to come back over here to my code. In my code, I have the video and a feature extractor.

(12:21) The feature extractor, which I'm just calling "features," is preloaded with the MobileNet model. Now I'm going to write a "mousePressed" function, which is not what I'll ultimately want to do, and in it I'm going to say "features.infer(video)."

(12:37) I'm a little unsure about the name "infer"; maybe it'll change in the API at some point. But the idea is to infer the logits from this particular image. This could be a static image, but since I have the video, it's a snapshot of whatever the video shows right now.

(12:56) I'll store the result in a variable called "logits"; this is basically giving me the logits for that image. Now I'm going to do something that's going to look weird. I'm going to say "console.log(logits)." Shouldn't that just show me an array of numbers? Not exactly, but let me show you.

(13:13) Let's see what happens. Let me refresh this. OK, the model is loaded. Now I'm going to click. And you say: what? What? What? Well, this says "dtype float32." OK, floating point numbers, and there are 1,000 of them. But what's all this other nonsense? It looks crazy. Well, guess what? We just waded, by accident (well, on purpose, since I did it deliberately), into TF.js territory.

(13:39) To continue writing this code, we never have to look at the TensorFlow.js documentation. But this is a moment to realize that ml5 isn't some magical thing that exists on its own. It operates on top of TensorFlow.js, which means it's managing things called tensors for you. What's a tensor? A tensor is basically a fancy word for an array of numbers, or a matrix of numbers, or more generally a grid of numbers with any number of dimensions.

(14:05) So the logits are a tensor: essentially a one-dimensional array of all those numbers. I can look at those values in the console by saying "logits.print()." Let me do that now. If I click here, you're going to see it: there it is. It's not printing everything, but you can see that it looks like an array of numbers.
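One way to picture "an array of numbers, or a matrix of numbers" is by shape: a list of numbers has one dimension, a grid has two. A plain-JavaScript sketch with a hypothetical `shapeOf` helper; TensorFlow.js tensors expose their own `shape` property, and this only mimics the idea on nested arrays, assuming they are rectangular:

```javascript
// Hypothetical helper: report the shape of a (rectangular) nested array,
// the way a tensor's shape is described: one length per dimension.
function shapeOf(value) {
  const shape = [];
  let current = value;
  while (Array.isArray(current)) {
    shape.push(current.length);
    current = current[0]; // assume every row has the same length
  }
  return shape;
}

const vector = new Array(1000).fill(0); // 1-D: like a list of 1,000 logits
const matrix = [[1, 2, 3], [4, 5, 6]];  // 2-D: 2 rows, 3 columns

console.log(shapeOf(vector));   // [ 1000 ]
console.log(shapeOf([vector])); // [ 1, 1000 ] (the same list wrapped in an outer array)
console.log(shapeOf(matrix));   // [ 2, 3 ]
console.log(shapeOf(7));        // [] (a single number has no dimensions)
```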

(14:31) Another way I could do it is to say "console.log(logits.dataSync())." These are functions that are part of TensorFlow.js; you can go back and look at my TensorFlow.js videos for background, where I go through them in more detail. "dataSync" gives me the tensor's values as a typed array (a Float32Array here), which works much like a regular array.
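Strictly speaking, `dataSync()` in TensorFlow.js returns the values as a typed array (a `Float32Array` for a float32 tensor) rather than a plain JavaScript array. A small plain-JavaScript sketch of what that means, using a hand-made `Float32Array` as a stand-in for real logits:

```javascript
// Stand-in for the values dataSync() would hand back: a typed array of 32-bit floats.
const values = new Float32Array([0.1, 2.5, -1.25]);

// float32 has less precision than a regular JS number (float64),
// so 0.1 is stored as the nearest 32-bit value.
console.log(values[0]); // 0.10000000149011612

// Typed arrays support indexing and length, but they are not Array instances
// (and methods like map return another typed array).
// Array.from converts to a regular array when you need one.
const plain = Array.from(values);
console.log(Array.isArray(values), Array.isArray(plain)); // false true
console.log(plain.length); // 3
```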

(14:51) So the last step here is to do this. Click, and there you go: these are the logits, the digital fingerprint of that image. So we're now ready. Hopefully, from watching this video, you understand a little bit about what KNN classification might be, and that's what I'm going to get into next.

(15:14) To recap: you load MobileNet, the pre-trained model. You create a feature extractor, which knows how to pull out the MobileNet model's logits for a given input image. Then you pass in an image from the video and, instead of getting a classification, get just that digital fingerprint.

(15:36) In the next video, we're going to use that semantic fingerprint to train a KNN classification model. All right. If you want to keep watching, keep going in this playlist. If the next video doesn't exist yet, it's because I'm still working on it, but it'll be there soon.

(15:51) Bye! [MUSIC PLAYING]
