ml5.js: Feature Extractor Classification

https://www.youtube.com/watch?v=eeO-rWYFuG0


Transcript:

(00:00) OK, it is time now. I am going to build an image classifier using my own images. I am going to teach this example to not say acoustic guitar or electric guitar, but to say ukulele. I am going to teach this example not to say syringe, but just say train whistle. OK, this is what we're going to do-- and the process that I'm going to use is transfer learning.

(00:24) I described this process in the previous video. You can go back and watch that if you want, or you can just keep following with me. I'm going to write the code for this. One thing that I want to mention is-- you might be wondering, oh, why is it called a feature extractor? How does this stuff happen? So I want to first mention it, and thank Gene Kogan for making the image classification and regression transfer learning with MobileNet examples.

(00:48) You can find these sort of native TensorFlow.js versions of what I'm doing right now here at the ml4a-- machine learning for artists-- website. It's a wonderful website with tons of resources about machine learning. And also it's an interesting whole discussion here, in terms of how should the ml5 API work, or what should things be called.

(01:06) So if you're curious about how open-source projects choose their names of things, I might reference this particular thread. But the important piece for us is to be at the ml5 website on the featureExtractor documentation page. I'm going to need to make heavy use of this page to look up what the names of all the functions I need to type are.

(01:29) And then, of course, I should mention, if I just go down here under Examples, Classifier with Feature Extractor, this is basically what I'm going to build. The point of this video is I'm going to build it up, but you could just look at this example instead, if you want. But anyway, all right. So let's go back to the featureExtractor documentation page.

(01:50) What I've got here is the code that I wrote from a previous video, a couple videos ago, using the MobileNet model to classify images from the webcam. OK, so if I go to the code, the main thing that I need to change is I no longer want to make an ml5 imageClassifier. I want to make an ml5.featureExtractor.

(02:09) And the difference here, also, is I'm not going to reference the video yet. So I'm just going to make a featureExtractor built on top of MobileNet. And this callback modelReady means the model has been loaded, the MobileNet model is downloaded from wherever it needed to download it from, and it's there and ready to go.

(02:33) So this should say model is ready. I'm going to get rid of this predict function. So now, if we just refresh this-- a lot of stuff's going to break here-- but I see model is ready, and the video appears. Now, there's no labels anymore, because I got rid of the imageClassifier-- I have a featureExtractor now.

(02:47) But I want an imageClassifier, so what I do is I'm going to add a variable. I'm going to write a variable called classifier. So the variable mobilenet is now referring to the featureExtractor, and the classifier equals mobilenet, the featureExtractor, dot classification. So I want to make-- classification.

(03:11) I want to make a classification object from the featureExtractor, and I need to give that-- I want to say, and I want to use images from the video. And again, now, if I were doing this with a database of JPEGs that I'm loading, or something else, I would do it in a different way. But I'm going to use the video in this example.

(03:27) So I'm going to say video, and then I'm going to say videoReady, because I want to have a callback also to know that the video is ready. I don't really need that callback, but it's sort of useful. And I write that up here-- videoReady. And I'm going to say a video is ready. So let's refresh this again, and we should see my image pop up, model's ready, video's ready.
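The setup described so far might look roughly like the sketch below. It assumes p5.js and ml5.js are already loaded on the page through script tags, and that the ml5 API matches the version used in this video (`ml5.featureExtractor` and `.classification`). The canvas size and the `video.hide()` call are assumptions carried over from a typical webcam sketch, not something stated here.

```javascript
// Sketch of the setup described above. Assumes p5.js and ml5.js are already
// loaded on the page, and the ml5 API as used in this video.
let mobilenet;   // the feature extractor built on top of MobileNet
let classifier;  // the classification object made from the feature extractor
let video;

function modelReady() {
  console.log('Model is ready');
}

function videoReady() {
  console.log('Video is ready');
}

function setup() {
  createCanvas(640, 480);        // assumed size, any size works
  video = createCapture(VIDEO);
  video.hide();                  // assumed: the frame is drawn on the canvas instead
  // Feature extractor first, with no reference to the video yet.
  mobilenet = ml5.featureExtractor('MobileNet', modelReady);
  // Then a classifier that takes its images from the video.
  classifier = mobilenet.classification(video, videoReady);
}
```

The `videoReady` callback is optional, as mentioned above, but it makes it easy to confirm in the console that both pieces have loaded.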

(03:49) OK, now we are ready to train our own labels. We have the featureExtractor, we have the classifier. We can give it images of a ukulele-- and I don't know if I wrote this on the board just now, or earlier, or when, but I used to have ukulele spelled wrong. It's spelled U-K-U-L-E-L-E.

(04:10) I don't know why, but I feel like it's very important to say right now. OK, so how do I do that? Well, I could look up in the featureExtractor page, there's a function called addImage. And it's right here. I know some of this API from having practiced this a little bit, but this is what I'm looking for-- addImage.

(04:32) addImage and a label. So I can say-- oh, but when do I want to add an image? I want to make a button that, every time I press the button, I'm saying that's a ukulele image. So let's first create ukeButton, and then I'm going to say, in setup, ukeButton equals createButton ukulele-- but if it's uke, it's just U-K-E. Boy, it's confusing.

(05:01) Now, here's the thing-- I am creating this example in a very basic, what I hope is beginner-friendly way. There are so many ways you can build an interface, and style your page, and handle events. I'm just going to use simple p5 functions that place a button on the page, and a simple callback function for when I press the button.

(05:24) So then I'm going to say ukeButton.mousePressed. And I could put the function name here and write the function somewhere else, which I do in some videos, to be kind of-- but I'm just going to put an anonymous function in here. The idea of this anonymous function is, when I press the button, this function should be executed.

(05:41) So the code I'm writing in here will happen when I press the button. And what do I want to do there? I want to say classifier.addImage, and I want to give it a label-- ukulele. I hope that's spelled right. So now, whenever I press the button, it's a ukulele. Let's add a train whistle one while we're at it.

(06:03) Let's say whistleButton, and let's do exactly the same thing, but with whistleButton.  whistleButton, createButton-- I'm going to have the button say whistle. I cannot spell anything. whistleButton, and then addImage("whistle"). So you can see here I have two buttons. Ukulele adds an image assigned the label ukulele, whistle adds an image assigned the label whistle.
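The two buttons just described could be sketched like this. It assumes p5.js is loaded and that `classifier` was already created with `mobilenet.classification(video, ...)`. In the video this code sits directly inside `setup()`; the wrapper name `setupButtons` is only a hypothetical way to show it on its own.

```javascript
// Two training buttons, as described above. Assumes p5.js is loaded and that
// `classifier` has been created from the feature extractor elsewhere.
let classifier;
let ukeButton;
let whistleButton;

// Hypothetical wrapper: in the video these lines live inside setup().
function setupButtons() {
  ukeButton = createButton('ukulele');
  ukeButton.mousePressed(function () {
    // Adds the current video frame as a training example labeled "ukulele".
    classifier.addImage('ukulele');
  });

  whistleButton = createButton('whistle');
  whistleButton.mousePressed(function () {
    // Same thing, with the label "whistle".
    classifier.addImage('whistle');
  });
}
```

Each press adds one labeled example; nothing visible happens yet because no training or classification has been requested.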

(06:37) Now, there's not really much point to me running this code. I should just make sure I have no errors. We can see the buttons are there now. The image comes up, but nothing's going to happen. I can keep pressing this ukulele button, I can keep pressing the whistle button-- nothing's happening, because I'm not giving myself any feedback.

(06:53) So right now, I just have to hope it's working. But the thing that I need to do next is actually apply a training step. Now, one thing that was interesting-- I showed in the previous video the Teachable Machine project. I'm basically making exactly the same thing. Train green, train purple, train orange.

(07:14) My buttons are train ukulele, train whistle. But it just started to work immediately. That's because there's a slightly different algorithm at play here. The algorithm that I'm using in ml5 requires an additional step, a training step. I'm going to write this here-- training. So basically, the process is add a bunch of images-- say, hey, this is a ukulele, this is ukulele, this is a ukulele, this is a whistle, this is a whistle, this is a ukulele, this is a whistle, this is a ukulele.

(07:44) Once I've added all those images, then I explicitly say, I'm done with all of these images. I'm going to let the model train itself using the features that it extracted from those images, and map those features basically to these labels. And by map, I really mean-- there's another machine learning model right here.

(08:04) It's actually like a neural network model. And so that mapping between the features and the new labels happens with another neural network here. But ml5 and TensorFlow.js are handling all of that for us. OK, now, let me come back over here and add the training steps. So I certainly need one more button-- train choo-choo button.

(08:27) So I'm going to add that button. createButton trainButton-- I'm going to say train-- trainButton-- oh, OK, now here-- now, what am I doing in here? OK, what I'm doing in here is I'm going to say classifier.train. So I'm going to say, hey, classifier, train yourself. Now, I could just leave it at this and kind of say, hey, great, I'm done.

(08:59) But something that I should do here is I can actually put a callback inside of here. Now, this is getting pretty awkward. I'm going to allow it-- and you know what, actually, though? I'm going to make a separate function just for-- because I might want to do a bunch of things. I'm going to call it whileTraining.

(09:17) So I just want to put this in a separate function so I can look at it on its own. What did I call it? whileTraining-- so this function whileTraining is a function that's kind of run over and over again during the training process, and it's going to report back to me information about the training process.

(09:36) And the information it's going to report to me is something called loss. Loss. Let's just actually run this, and then I'm going to explain what loss is. Let's just see if I can get this to work so far. So I'm going to refresh. My model is ready, my video is ready. I have a ukulele. I can press the ukulele, ukulele, ukulele.

(09:56) I'm going to show it a lot of ukuleles. Here's a lot of examples of ukuleles. And now, I'm going to hold up my train whistle. Here's a lot of examples of train whistles. Now, I'm going to hit the button train. And yes, we see this number here in the console, it's training and training and training. whileTraining console.logs this value, loss.

(10:20) What is the loss? So loss is a really important term in machine learning, data science. It's often also called cost. And by this, I mean loss function or cost function. And what the loss function or cost function is calculating is the error. So what do I mean by error? So basically, if I say to the machine learning model that's training, hey, here's a ukulele-- this is a ukulele, but just pretend you don't know what it is for a second.

(10:52) What do you think it is? And if it comes back and says, I think it's a ukulele, guess what? The error is 0. If it comes back and says, I think it's a train whistle, then the error is not 0, it's something else. But something that I haven't really mentioned, because we're kind of living above the fray here-- we're talking in terms of labels.

(11:11) The machine learning model underneath the hood is just working with numbers. The labels are only a thing for us, the human being, to use at the end of the process. So when it's actually calculating an error, even if it guesses it's a ukulele, it's calculating error based on the numeric probability that it guessed.

(11:29) So the machine learning model might think, OK, that's 80% likely to be a ukulele. So that might be the right answer, but we know the right answer is it's 100% a ukulele. So the error is actually the difference between 100% and 80%, or 0.2. So this idea of the aggregate error of all of the training images over time-- ml5 and TensorFlow.js

(11:52) are calculating that for you during that training process. And we're seeing that here-- [CLOCK TICKING] There's a clock tick tocking, because I have a clock sound effect, for some unknown reason. We can see that that loss is getting very, very low. So you want the error to be low. The error started kind of high-- it was like 6.92,

(12:10) and then it got lower and lower and lower, as it was training. And one thing you'll notice is, eventually, it stopped. Now, it's an arbitrary decision when do you stop training. ml5 has default settings. It's going to train for a while until the loss reaches a certain amount, or something like that.
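The 80% example above can be worked through in a few lines. The first function is the plain difference described in the video. The second is cross-entropy, a loss commonly used for classification; it is included only as an illustration, and whether ml5 uses exactly this loss internally is an assumption, not something the video states. Both function names are made up for this sketch.

```javascript
// Worked version of the example above: the right answer is 100% ukulele,
// and the model guessed 80%.

// The simple error from the video: the gap between target and guess.
function simpleError(target, guess) {
  return Math.abs(target - guess);
}

// Cross-entropy for the correct class. A common classification loss,
// shown for illustration only (not confirmed as what ml5 uses).
function crossEntropy(probabilityOfCorrectClass) {
  return -Math.log(probabilityOfCorrectClass);
}

console.log(simpleError(1.0, 0.8).toFixed(2)); // 0.20
console.log(crossEntropy(0.8).toFixed(3));     // 0.223
```

Either way the idea is the same: the closer the guessed probability is to 100% for the correct label, the smaller the number, and training tries to push the aggregate of these numbers down.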

(12:24) But you can see when it's finished. It gives you the loss as null. So I can actually say here, if loss equals null, console.log-- I'm going to say training complete, else I can console.log the loss. And then guess what? When the training is complete, what do I want to do? I want to classify.

(12:55) So I'm going to say classifier.classify(gotResults).  And I already have the gotResults function.  I'm getting results, just like I got results from the raw MobileNet model, but now I'm getting the results from my new transfer learn trained, custom trained model. So I say classifier.classify(gotResults).
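Put together, the train button and the whileTraining callback might look like the sketch below. It assumes p5.js is loaded, that `classifier` already exists, and that the ml5 version behaves as in this video, where the callback receives `null` as the loss once training has finished. The `gotResults` here is only a placeholder, and `setupTrainButton` is a hypothetical wrapper for code that sits inside `setup()` in the video.

```javascript
// Train button plus training callback. Assumes p5.js is loaded and
// `classifier` was created from the feature extractor elsewhere.
let classifier;
let trainButton;

// Placeholder: the real version is worked out later in the video.
function gotResults(error, result) {
  console.log(error || result);
}

// Called repeatedly during training with the current loss.
// In the ml5 version used here, loss is null when training is done.
function whileTraining(loss) {
  if (loss === null) {
    console.log('Training complete');
    classifier.classify(gotResults); // start classifying the video
  } else {
    console.log(loss);
  }
}

// Hypothetical wrapper: in the video these lines live inside setup().
function setupTrainButton() {
  trainButton = createButton('train');
  trainButton.mousePressed(function () {
    classifier.train(whileTraining);
  });
}
```

Keeping whileTraining as a named function, rather than an anonymous one inside the button handler, makes it easier to add more behavior to it later.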

(13:21) I get the results, and I check if there's an error. Otherwise, I can give the class name to the label to get drawn on the screen. And then what I want to do-- again, I want to say classifier.classify. So before, it was called predict. Now, it's called classify. I'm not sure why. Maybe, at some point, check that code that goes along with this video to see if that's changed.

(13:41) But that's what it is right now. OK, so I think we're actually done with this example, sort of. Let's try. OK, model is ready, video is ready. Oh, boy. What's the chance that this is going to work? Stop and guess. I give it like 10 to 1 this works. OK, I'm going to step out of the frame, and I'm going to train it with a bunch of images of a ukulele.

(14:06) I'm going to move the ukulele around to give it a lot of different examples. Further back-- and again, I should probably add something that shows me how many training examples, because I probably want to give around the same number of training examples for the whistle as with the ukulele. I'm going to train this.
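One way to get the example counter mentioned above is a small helper that keeps a running tally per label. This is not in the video; `exampleCounts`, `recordExample`, and `countsSummary` are hypothetical names for a sketch of the idea.

```javascript
// Hypothetical helper (not from the video): count training examples per label
// so the classes can be kept roughly balanced.
const exampleCounts = {};

// Call this next to classifier.addImage(label) in each button handler.
function recordExample(label) {
  exampleCounts[label] = (exampleCounts[label] || 0) + 1;
  return exampleCounts[label];
}

// Builds a short text summary, for example to log or draw on the canvas.
function countsSummary() {
  return Object.keys(exampleCounts)
    .map((label) => `${label}: ${exampleCounts[label]}`)
    .join(', ');
}
```

In a button handler this would sit right after the `addImage` call, for example `classifier.addImage('ukulele'); recordExample('ukulele');`, with `countsSummary()` logged or drawn as text.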

(14:23) Whistle, whistle, whistle, whistle, whistle. Now, I'm going to hit train. When it's done, we should start seeing labels. Training complete. I don't see labels. No labels, that's so sad. What did I do wrong? OK, so something's wrong. I got to figure this out. I got to debug this. All right, I really should put console.log(results)

(14:46) in, because I don't even know if this function is being called. Let's make sure dot classify is the right name of the function. Classify, callback-- yeah. Oh, no, no-- yeah, callback. And then the callback is a function, otherwise blah, blah, blah. So I think this is right. Everything looks right to me.

(15:11) Am I just not drawing the label properly? No, there it is, label-- This is me not paying close attention. This is interesting, and this is now a question for the ml5 library itself. But the thing that comes in the gotResults function is actually just the label that it guesses. So there's not like the class name and the probability, all that stuff that came with MobileNet.

(15:38) It's not giving us all that information, which probably makes sense. So actually, all I want to do is say label equals results. So that was a little bit of a digression. I had to figure out what I did wrong there. Now, this should work. OK, ready? Let's try it with the ukulele.  Giving it a lot of examples of a ukulele.
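The corrected callback could be sketched as follows. It assumes the ml5 version used in this video, where the value passed to the callback is just the guessed label as a string; other versions of the library may pass a richer result, so it is worth logging the value first, as the debugging above suggests.

```javascript
// Result callback after the fix described above. Assumes `classifier` was
// created elsewhere and that draw() displays the `label` text.
let classifier;
let label = '';

function gotResults(error, result) {
  if (error) {
    console.error(error);
  } else {
    // In the version used in the video, result is just the label string.
    label = result;
    // Ask for the next classification so the label keeps updating.
    classifier.classify(gotResults);
  }
}
```

Calling `classifier.classify(gotResults)` again from inside the callback is what turns a single classification into a continuous loop over the video.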

(16:04) Giving it a lot of examples of a whistle. Now, I'm going to train it. I'm going to do my training dance. And training is complete. Here's a ukulele. It is a ukulele. It is a whistle. It is a ukulele. It is a whistle. Yay, OK. So this actually works. Here's the thing-- I want to show you something more interesting here.

(16:47) It's not more interesting-- I'm going to show you something different. I'm going to leave all the variable names the same, but I might change this to smile, I'm going to change it to happy, I'm going to change this to sad. OK, so I'm going to run this again, and I'm going to do this.

(17:07) And we get lots of pictures of me smiling. Then I'm going to hit train, let it train for a little bit. It's finished. How long should I do that for? This is pretty cool. Here's the thing-- it doesn't just have to be image recognition. I can train it with different facial expressions. What MobileNet is really good at, if you recall, is taking an image and boiling it down to a smaller list of numbers that kind of define the essence of that image.

(17:57) So it doesn't matter what the content is, it can figure out the essence of me smiling, the essence of me frowning. It is important, though, to remember it's not really learning smiling or frowning, it's kind of-- and if I were to turn this this way, you can see some of the room here-- and I go here, it's not working so well, because it learned all that stuff with the background.

(18:26) So there's a lot of nuance to this, and it's really important to remember. It's not some magic. The computer doesn't have suddenly some understanding of emotions. It's really just looking at, I've got a new image, and I'm comparing it to a bunch of images I got before. What's it most similar to? Let's see if it-- now that I moved the camera around-- is it working.

(18:45) So here's the thing--  go. Go forth. Add a third category. Make a project. Make a game where I'm playing Pong-- actually, this is a project I should reference by Alejandro. Let me go to ml5js.org. Alejandro, under experiments, made a project called Pong ML, which basically is the same exact technique to train various hand gestures, and then you could play Pong with hand gestures.

(19:20) So there's so many possibilities of what you could do just from being able to train starting with the MobileNet model, and applying transfer learning. OK, are you with me? Did this make sense? Make something. Go and try this. Make your own version of it. Really think about the interface. And one more thing-- something you're really going to want to do-- and you've probably noticed this-- every time I restarted the sketch, I had to retrain with images again, which is fun for the sort of real-time interactive nature of this.

(19:50) But if you're actually going to do this for a while, you're building an installation or something, you've trained it with a bunch of images and you want to, I don't know, save the model that you've done, save all that work, you're going to want to save and be able to reload the result of that transfer learning.

(20:07) At present, at the time of this recording, there isn't a way to do that easily with the ml5 library. It's certainly technically possible. There is a GitHub issue. Currently, it is ml5-library/issues/174 with a discussion about this feature. Certainly, once this feature exists, I will come back and make a video to show you how to do that.

(20:26) OK, thanks for watching this video. Oh, I'm going to do one more video. I'm going to do something interesting-- hopefully. I'm going to do one more video with something very similar to what I did just here, but I'm going to do something called a regression. So I'm going to define what a regression is, and show you how I can do transfer learning with the MobileNet model to perform a regression.

(20:44) Why would I want to do that? I don't know. I guess we'll have to watch. Basically a slider. If I want to be able to control something with my hand moving back and forth as a slider, that's something that I can use a regression for. All right, see you there. [MUSIC PLAYING]
