ml5.js: KNN Classification Part 2
https://www.youtube.com/watch?v=Mwo5_bUVhlA
Transcript:
(00:00) [DING] All right, we are ready to continue our ml5 KNN Classification, what I'm calling game controller. I want to be able to create an interface by which I can train a little controller for a game to move something left and right, maybe up and down. That's what I'm doing. Now in the last video I left off at the step where I load the MobileNet model, create a feature extractor, pass in an image from the webcam, and get the logits.
(00:29) So one thing I do want to mention, people in the chat were helpful, I was being a little bit loosey goosey with the terminology of like scalar, vector, matrix and tensor. And so I should mention that a scalar being like a single number, is really zero dimensions. A vector is typically like a list of numbers, which is really one dimensional.
(00:51) A matrix is thought of as a grid of numbers. This is 2D. And a tensor is a term to describe the sort of n dimensional possibilities of any collection of numbers. So a tensor can be zero dimensions, one dimension, two dimensions, three, et cetera. So that's what those terms mean, and in this case, the logits come out as a one dimensional tensor.
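As a rough illustration of those terms, here is a minimal sketch in plain JavaScript that treats nested arrays as tensors and counts the nesting depth as the number of dimensions. The `rank` helper is hypothetical and written just for this example; it is not part of ml5 or tensorflow.js.

```javascript
// Hypothetical helper: counts how deeply arrays are nested.
// That depth is the number of dimensions (the rank) of the value.
function rank(value) {
  let depth = 0;
  let current = value;
  while (Array.isArray(current)) {
    depth += 1;
    current = current[0];
  }
  return depth;
}

const scalar = 5;                 // zero dimensions: a single number
const vector = [0.1, 0.7, 0.2];   // one dimension: a list of numbers
const matrix = [[1, 2], [3, 4]];  // two dimensions: a grid of numbers

console.log(rank(scalar), rank(vector), rank(matrix)); // 0 1 2
```

In this picture, the 1,000 logits from the feature extractor would be a rank 1 value: one flat list of numbers.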
(01:17) And the shape of it, the dimensions might actually be something different underneath the hood in tensorflow.js, but that's one way of thinking about it. So hopefully that helps clarify a little bit. But let's now move on. The next thing that we're going to do is work from the assumption that we already have the logits.
(01:35) And the algorithm, we don't have to write ourselves. I think I have a video where I do some kind of KNN writing the algorithm out with some movie recommendations or something. I'll try to reference that in the video's description. But the next thing we need to do, and I don't know what step I'm on, like four, is create an ml5 KNN model.
(02:00) It's going to be empty. It's going to be creating an empty one. So I'm going to add a new variable, KNN. And then in Setup also, I may have to move this somewhere else eventually. But for right now, I'm going to say KNN is a new ml5 KNN Classifier. Simple as that. So I have the feature extractor and the classifier.
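The setup described here might look roughly like the following sketch. It assumes p5.js and the ml5.js 0.x API are loaded on the page, and that in that API the classifier is created by calling `ml5.KNNClassifier()` as a plain function. The variable names (`video`, `features`, `knn`), the canvas size, and the `modelReady` callback are assumptions for illustration.

```javascript
// Assumes p5.js and ml5.js (0.x API) are loaded by the page.
let video;     // webcam capture
let features;  // MobileNet feature extractor
let knn;       // KNN classifier, empty until examples are added

function modelReady() {
  console.log('MobileNet loaded');
}

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(320, 240);
  video.hide();
  // The feature extractor turns images into logits.
  features = ml5.featureExtractor('MobileNet', modelReady);
  // The classifier starts out with no examples at all.
  knn = ml5.KNNClassifier();
}
```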
(02:25) I think it's time to talk about what KNN classification is. Luckily for us, we can refer back to Nikhil's article. Thank you again to Nikhil for publishing this. And I'm going to skip past some of these other examples of different logits. And I'm going to look down here and look at this. So this is a nice example demonstrating the idea of k-nearest neighbors.
(02:56) So let's sort of recreate that visualization over here on the board so I can talk through it a little bit. Let's say this is a two-dimensional space. It is. And I'm going to put in this two-dimensional space, a bunch of A's and a bunch of B's. Now I've written them in a kind of certain way, to make it very obvious.
(03:21) In fact, if I wanted to, I could draw it. Just visually, I understand where the quote unquote decision boundary is between these two different categories. And I could kind of draw it like this. That's just me eyeballing it. I'm not sure if that's exactly the mathematical decision boundary.
(03:39) And then, I could say like, oh, if I threw a dart at the wall, the dart's either going to be over here or it's going to be over here. It's going to be either of class A or of class B. I want to classify it. Now the way k-nearest neighbor works, the first N stands for nearest, the second N stands for neighbor.
(04:01) The k stands for how many nearest neighbors am I looking for? So the way that it actually works, the way that the decision boundary is computed, let's take it out for a second, is that if I were to look at any given point, like this one. Let's say I have a k of three. And typically you would want a k that's an odd number.
(04:22) Because what you want to do, is it's basically like a voting system. I want to look for the three nearest neighbors. And in this thing, I mean literally nearest. The Euclidean distance. By Euclidean distance. By proximity in two-dimensional space, which are the nearest? Well, I can eyeball it. I could say this is the nearest, this is the nearest, and this is the nearest.
(04:43) So these are my three nearest neighbors. They all vote, and three to zero, this spot gets categorized as an A. We could get a little more complex. Like what if I went over here? What are the three nearest neighbors? Well, this is the nearest, this is the nearest, and this is the nearest. Ah, by a vote of 2 to 1.
(05:04) Actually, this k being odd is actually irrelevant, because you can have more than two classes. So breaking the tie is a thing. But here, I could say by a vote of 2 to 1, this point is classified as a B. And this is exactly what Nikhil's visualization is doing. And in fact, this would be nice to actually do.
(05:28) Do a coding challenge of just like doing exactly this. Remaking this visual explanation. But what I can do here, is I can vary this. I can say let's look at a three class example. Let's make k much larger, like eight. And we can see here this is showing what category it is. And I can also show that decision boundary.
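The voting idea from the whiteboard can be sketched in plain JavaScript. This is a toy version for two-dimensional points with made-up coordinates, not the ml5 implementation; the `points` data and the `classifyPoint` function are hypothetical names for this example.

```javascript
// Toy k-nearest-neighbors vote in 2D (not the ml5 implementation).
const points = [
  { x: 1, y: 1, label: 'A' },
  { x: 2, y: 2, label: 'A' },
  { x: 1, y: 3, label: 'A' },
  { x: 4, y: 2, label: 'B' },
  { x: 5, y: 1, label: 'B' },
  { x: 5, y: 3, label: 'B' },
];

function classifyPoint(query, examples, k) {
  // Measure the straight-line distance to every example, then keep the k closest.
  const nearest = examples
    .map((p) => ({ label: p.label, d: Math.hypot(p.x - query.x, p.y - query.y) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);

  // Each of the k nearest neighbors casts one vote for its label.
  const votes = {};
  for (const n of nearest) {
    votes[n.label] = (votes[n.label] || 0) + 1;
  }

  // The label with the most votes wins. On a tie, the label that was
  // counted first (the one owning the closest neighbor) is kept.
  let best = null;
  for (const label of Object.keys(votes)) {
    if (best === null || votes[label] > votes[best]) best = label;
  }
  return { label: best, votes: votes };
}

console.log(classifyPoint({ x: 1.5, y: 2 }, points, 3)); // label 'A', a 3 to 0 vote
console.log(classifyPoint({ x: 3.2, y: 2 }, points, 3)); // label 'B', a 2 to 1 vote
```

Because the tie rule is explicit, this sketch also works with more than two classes or an even k, which is the situation where an odd k alone no longer prevents ties.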
(05:50) And you can see here this is the decision boundary. So this is the idea of k-nearest neighbor. Wait, so that may make sense to you. You might be thinking to yourself, what? Suddenly there's this two-dimensional graph and a bunch of points, and which ones are near each other? What does that have to do with images of cats and logits of 1,000? Well, it does.
(06:09) It has something very specific to do with it. In fact, the reason why Nikhil picked that example in his write-up, and the reason why I'm drawing it this way to explain it to you, is because our brains are very well suited to think in small numbers of dimensions. Two dimensions, three dimensions. So you could think of each one of these points in two-dimensional space as logits that have just two values.
(06:37) Its x and y-coordinate. But, what the feature extractor does, so I'm going to say, I'm going to say infer from an image. Remember this step from the previous video, this gives us 1,000 logits, a list of 1,000 numbers, and I said what if we got two of them, two lists of 1,000 numbers, how could we compute which? It's the distance between these two things.
(07:13) It gets weird to think about, how do I think about the distance between two things in 1,000 dimensional space? Well actually, the formula that you would use for Euclidean distance in two-dimensional space is the same exact formula you would use in 1,000 dimensional space. Beyond the scope of this video, I could make another video.
(07:30) It basically looks like the square root of the difference between all the values squared added together. It's like the Pythagorean theorem. But actually, this Euclidean distance formula is not such a good one for higher dimensional spaces. It kind of breaks down. Because you could have a lot of the numbers similar, but just one number really far away from another one.
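The distance formula described above can be sketched in plain JavaScript. The same function works for 2 values or 1,000 values, since it just loops over however many numbers there are. The name `euclideanDistance` is chosen for this example and is not an ml5 function.

```javascript
// Square root of the sum of squared differences, for lists of any length.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('both lists must have the same length');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Two dimensions: the familiar 3-4-5 right triangle.
console.log(euclideanDistance([0, 0], [3, 4])); // 5

// 1,000 dimensions: exactly the same formula.
const zeros = new Array(1000).fill(0);
const ones = new Array(1000).fill(1);
console.log(euclideanDistance(zeros, ones)); // square root of 1000, about 31.62
```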
(07:49) So there is actually other ways of calculating distances. One of which is called cosine similarity or cosine distance. And actually, I've referenced that in my Word2Vec tutorial, as well, this comes up exactly in that Word2Vec stuff. And this is a way of looking at, you could think almost as if I wanted to compare a and b, I might look at the angle between them.
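For comparison, cosine similarity can be sketched the same way in plain JavaScript. It compares the direction of two lists of numbers rather than how far apart they are. The function name is chosen for this example, and whether the ml5 classifier uses this measure internally is not something this sketch assumes.

```javascript
// Cosine of the angle between two lists of numbers of the same length.
// 1 means same direction, 0 means perpendicular, -1 means opposite.
// Note: an all-zero list has no direction, so the result would be NaN.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('both lists must have the same length');
  }
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

console.log(cosineSimilarity([3, 4], [6, 8])); // 1: same direction, different length
console.log(cosineSimilarity([1, 0], [0, 1])); // 0: perpendicular
```

A cosine distance is commonly taken as one minus this similarity, so points that lie in the same direction from the origin count as close even when their magnitudes differ.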
(08:12) If this is the origin and I'm looking at these two points in 2D space, are they, is there a small angle between them? That's kind of like what cosine similarity is doing. But what's that sort of like angle between in 1,000 dimensional space? But the point is if I can infer the logits from an image, I can build up a database.
(08:36) A database of, let's say, cat. What I'm going to do is use the add example function. So ml5 has an add example function that I give the logits and the label. So I can basically build up a database of a lot of logits labeled with cat, and store them so that later when I get a new one, I could see where this new one is in 1,000 dimensional space.
(09:06) Near all those cat ones? This is exactly what we're going to do. So in other words, let's come back here. And I'm going to say under mouse press now, instead of console logging it out, I'm going to say KNN add example, cat. And actually, what I'm going to do is, I'm going to add left and right.
(09:28) I'm going to try to do left and right, because the idea is I want to control something. So I made an example, these logits are categorized as left. And I'm going to change this to key pressed. This is sort of silly what I'm doing. But I'm going to say if key equals, uh, I forget how I do the arrow.
(09:47) If key equals L, again, I should be building a nice interface for this, but. This is going to just do the trick. Right. So what I'm doing now, is I can now build up that database. This is the model. The KNN model is actually just the big database of all of the logits that are labeled. All right. So the idea here is that I extract the features from the video.
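The key press training step might look roughly like this sketch. It assumes p5.js and the ml5.js 0.x API, with `video`, `features`, and `knn` created in `setup()` as earlier; those variable names and the lowercase `'l'` and `'r'` keys are assumptions for illustration.

```javascript
// Assumes p5.js and ml5.js (0.x API); these three are assigned in setup().
let video, features, knn;

function keyPressed() {
  // Turn the current webcam frame into logits.
  const logits = features.infer(video);
  // Store the logits in the classifier's "database" under a label.
  if (key === 'l') {
    knn.addExample(logits, 'left');
  } else if (key === 'r') {
    knn.addExample(logits, 'right');
  }
}
```

Each key press adds one labeled example, so holding a pose and pressing the key repeatedly builds up the examples for that label.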
(10:18) When I press L, I'm saying this is one to the left. And when I press R, this is one to the right. Let's take a look at that. Go back here. We'll click in here to give it focus. So I want, this is confusing, because it's not mirrored. This is my left. You see it as my right. This is my left hand.
(10:34) So I'm going to use my actual left. And I'm going to press L. So I'm training it. And then I'm going to do this and press R a bunch of times. Now of course, I shouldn't be console logging to see what's happening. But I at least can look now at that KNN object. And it can look in it and see.
(10:54) What are some functions I can call? I can do get number of labels. Two, right? Left and right. I could also probably like how many, how many examples are there? But this is me training it. Oh, and look. Left and right. I didn't even notice that. Map string to index, that's there. OK, so now, let's think.
(11:22) Now what do I do? Let's use mouse pressed. Let's go back and use mouse pressed. What if I get something new and I want to classify it? Aha. So let's say this is me training it. Like here's a bunch of lefts, here's a bunch of rights. And when I click the mouse, I want to see what it is currently.
(11:42) So what I'm going to do now, is I'm also going to infer the logits from the video. Then I'm going to say KNN classify those logits. Ah, here's the thing. This is where I now need a callback. So everything so far I've been able to do synchronously, like I can pull out the logits, I can train it.
(12:02) But this is where the computer, the ml5 library and tensorflow.js, needs to do something for a little bit and let me know when it's done. So I am going to add a function called got results. And I'll just write a separate function. Again, I'm doing this in a very sort of like ES5 callback kind of way.
(12:21) You can use Promises and all of that as well, if you prefer that. But I'm going to say, got results, result. And I'm just going to say console log result to just see what that looks like. So now, I press keys to train it. And you know what would be nice for me to console log something, so console log left.
(12:43) Just so I see that it's working. Console log right. So I just console log something there. Again, if you were doing this, you'd probably want to build a whole interface for this. I'm doing everything with just key presses and console logs, which is super awkward in terms of like user interaction.
(13:00) But it allows me to test and iterate and explain this idea more quickly. All right, so now, refresh. So what I want to do is, and by the way, I'm going to get an error. So KNN classify is not a function. Well, oh, I spelled it wrong. Classify. So here's the thing. I want to say like if, I only want to do this if it's ready to classify.
(13:23) I want to say, if KNN, what was it? What did I say KNN, um, get count or get number of labels or something? I'm going to say if KNN, get number of labels is greater than zero, then actually do the classification. Now let me refresh. And I'm going to click over here and train it left. Here's like 20 images, train it right.
(13:54) Here's 20 images. It's not 20, it's 17, it's 21. Now I'm going to click the mouse. Ah, I know what I did wrong. I always do this wrong. [DING] ml5 uses error first callbacks, p5 doesn't. And so I'm so used to just the result being the first thing, but with ml5, any error always comes in as the first argument to the callback.
(14:18) So I'd want to say if error console. I probably want to hand, there's many ways I could actually handle the error. Otherwise, console log result. Got to do this again. It's fine, here we go. OK. So I'm going to train it. Wait, no, whoops. Da, da, da, da, da, da. 21. I'm going to try to do about the same amount.
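Putting the guard and the error-first callback together, the mouse press step might look roughly like this sketch. It assumes p5.js and the ml5.js 0.x API, with `video`, `features`, and `knn` created in `setup()`; the name `gotResults` and the choice to log errors with `console.error` are assumptions, and there are other ways to handle the error.

```javascript
// Assumes p5.js and ml5.js (0.x API); these three are assigned in setup().
let video, features, knn;

function mousePressed() {
  // Only classify once at least one label has been added.
  if (knn.getNumLabels() > 0) {
    const logits = features.infer(video);
    knn.classify(logits, gotResults);
  }
}

// Error-first callback: the error is the first argument, the result the second.
function gotResults(error, result) {
  if (error) {
    console.error(error);
    return;
  }
  console.log(result);
}
```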
(14:43) 22 now. Please say left. Look at that. Left. Please say right. Oh, no, whoops. Clicked in the wrong place. Ah, no, right, it said right anyway. Right, left. It's working. Oh, that's exciting. So we've done it now. We, this is basically it, right? And I could, just say you know by the way, I could say let's add one more.
(15:10) Let's call it up. One of the advantages of doing KNN, like I was saying, this KNN classification, unlike the other feature extractor with transfer learning example that I went through previously, this requires no separate training step. The model is just the database of training images. And it's able to do that distance calculation in real time.
(15:38) So also, I can add labels later, I can add new images later. So you can kind of train and classify all at the same time in a very flexible way. That's one of the advantages of doing it this KNN classification way. All right, let's do one more thing, which is. Let's actually just have it constantly try to guess.
(16:00) So I'm going to make a label paragraph. I'm going to say label paragraph equals create p. Need training data. Then what I'm going to do is, let's just make sure that's working. So it says need training data there. Then what I want to do is, this is train, is um, I'm going to have a Boolean variable called ready.
(16:35) Have it be false. And then in. In draw. Basically since draw is looping anyway, there's a more elegant way I could do this. As long as it's not ready, and if it's not ready and there's at least one label, I can start. So I can start this classification and I can now say ready is true. And then in the classification, I can, when I get the result, what do the results look like? They have a class index confidence, confidences by label, and the actual label.
(17:18) So I can get the label. I can say labelp.html, result.label. And then guess what I can do right then. I can classify again. I probably want to break this out into a function, but I'm just going to copy paste the code right here. Whoops, ah. This video was going very well a moment ago. Well, as well as I could expect it.
(17:44) OK, I don't need this anymore. No, that's not right. Didn't do what I want to do. These two lines of code, what do I want to do? Let's put this in another function. Go classify. I mean, I could think of a better name for this. And actually, let's take this. I'm going to do this. I'm going to make this a little nicer.
(18:06) I'm going to go classify there. So if this is true, then go classify. A little refactoring. Because why not, this callback is a little silly to have it as a separate function. Let's just make this an anonymous function and put it in right here. So this is maybe a little, so this is what I'm doing.
(18:29) I'm basically saying, classify. What does classify do? It gets the logits. It calls classify with the logits. Then it has a callback and it gives an error, it labels it, and then guess what it does. Go classify again. So this is kind of this recursive just like classify it when you're done, classify again when you're done, classify again.
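The loop just described might be sketched as follows. It assumes p5.js and the ml5.js 0.x API, with `video`, `features`, `knn`, and a paragraph element `labelP` (for example from `createP('need training data')`) all created in `setup()`. The names `goClassify`, `ready`, and `labelP` follow the narration but their exact spelling is an assumption.

```javascript
// Assumes p5.js and ml5.js (0.x API); these four are assigned in setup().
let video, features, knn, labelP;
let ready = false; // becomes true once the classification loop has started

function draw() {
  // Start the loop exactly once, as soon as there is at least one label.
  if (!ready && knn.getNumLabels() > 0) {
    goClassify();
    ready = true;
  }
}

function goClassify() {
  const logits = features.infer(video);
  knn.classify(logits, function (error, result) {
    if (error) {
      console.error(error);
      return;
    }
    // Show the winning label, then classify the next frame.
    labelP.html(result.label);
    goClassify();
  });
}
```

Because each new classification is only requested from inside the callback of the previous one, there is never more than one classification in flight at a time.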
(18:50) OK. Let's see here. Needs training data, needs training data. So let me do this. I'm going to give it some left. Let's gotta click over here. So it's only going to ever be able to classify anything that's left right now. And at some point, it knows that it's right. Left, right, left. Let's train it better.
(19:16) First of all, one thing that's bothering me, is it's very hard to see that. So style, font size. This is how, this is how CSS works with p5, Ooh, too big OK, let's try this one more time. Here we go everybody. [DRUM ROLL] Ooh, that's so loud. [DRUM ROLL] Give it lots and lots of examples of me with my hand to my left, about 50 of them.
(19:53) Now let's give it lots and lots of examples of me with my hand to the right. Maybe, OK, now. Left, right, left, right, left, right, left, look at me, I did left right there. OK, ahh! This is good. So guess what. We're kind of there. I mean, there's so much, there's a lot of stuff. What's missing here? Well, first of all, I want this to actually be able to control something, like moving a paddle left and right.
(20:30) And then also, I might not want to have to train it every time. I think there are probably a lot of really unique applications where the user is actually expected to train their own controller. The tensorflow.js examples have one where you train your own controller to control Pac-Man.
(20:46) I'll link to that in this video's description. But in this case, what I might like to show you, is that I could sit here and train it for a while, save that, and load that model into like a different sketch, which is just doing the classification to move something. So that's what I'll do in the very last video and I'll add up as well.
(21:04) So the next and last step that I'm going to do is add saving and loading, as well as a simple game mechanic to control. And the save and load functions are just called that, save and load. And so you might even try this yourself before you watch the next video where I'm going to do it.
(21:19) OK? Goodbye, and I'll see you soon. [DING]