ml5.js: Feature Extractor Regression
https://www.youtube.com/watch?v=aKgq0m1YjvQ
Transcript:
(00:00) [WHISTLE] Hello, everyone. In the previous video, I made this example. What this example is doing, is it's using something called transfer learning. It's already been trained to recognize my happy face. And my sad face. My happy. My sad face. The process that I use is something called transfer learning, where I'm using a pre-trained model, MobileNet.
(00:29) I am eliminating the last part of what it does, which is like take an image and give it a label with some probabilities. I eliminate that. But I'm using all the steps it does up until that last step where it basically boils the image down to a nice smaller array of numbers called the features. And then I assign my own labels to particular features from images that I'm giving it and train it again.
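A rough sketch of that idea in plain JavaScript. This is not how ml5 or MobileNet is implemented: `pretrainedFeatures` is a made-up stand-in for the frozen pre-trained steps, the "images" are tiny invented arrays, and every name here is an assumption for illustration. The point is only that the feature step stays fixed and just the small last step gets trained on my own examples.

```javascript
// Hypothetical stand-in for the frozen, pre-trained part of the network.
// The real MobileNet turns an image into a much longer array of features.
function pretrainedFeatures(image) {
  const sum = image.reduce((a, b) => a + b, 0);
  const mean = sum / image.length;
  const max = Math.max(...image);
  return [mean, max];
}

// The only trainable part: one weight per feature plus a bias.
function createHead(featureCount) {
  return { weights: new Array(featureCount).fill(0), bias: 0 };
}

function runHead(head, features) {
  return features.reduce((acc, f, i) => acc + f * head.weights[i], head.bias);
}

// Train the head with plain gradient descent on squared error.
// The feature step is never modified; that is the "transfer" part.
function trainHead(head, examples, epochs = 2000, learningRate = 0.1) {
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (const { image, target } of examples) {
      const features = pretrainedFeatures(image);
      const error = runHead(head, features) - target;
      features.forEach((f, i) => {
        head.weights[i] -= learningRate * error * f;
      });
      head.bias -= learningRate * error;
    }
  }
  return head;
}

// Made-up "images" (arrays of pixel brightness) with my own targets.
const examples = [
  { image: [0.1, 0.1, 0.2], target: 0 },
  { image: [0.8, 0.9, 0.9], target: 1 },
];
const head = trainHead(createHead(2), examples);
const prediction = runHead(head, pretrainedFeatures([0.8, 0.9, 0.9]));
console.log(prediction.toFixed(2));
```

Swapping in a different `examples` list retrains only the head, which is why this kind of training is quick compared with training the whole model from scratch.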
(00:53) So now it knows how to recognize new images. And instead of giving me the MobileNet labels, giving me my own labels. So that's what I did. That's called classification. In this video, what I want to do, is use transfer learning not to do classification, but to do something called a regression. Now regression really sounds like some scary, complicated mathematical concept term.
(01:17) It's really quite simple in this case. Classification says for this image, I would like to have a label A or B or C. One of two options or five options or 1,000 options. I am taking the image and putting it in a bucket. It is a ukulele, it is a train whistle. There's no in between. A regression is, I want to make a prediction from that image, but I'm not making a prediction as to which bucket it falls in, but I'm making, I'm going to get a number.
(01:47) It's really more like a slider. So instead of am I happy or sad, it's how happy versus how sad. I want a single number. Of course, a regression could produce more than a single number, but in this case, I want a single number. I'm going to train it. I'm going to say this image is a zero, this image is a one, this image is a 0.5.
(02:09) And then after I train it, it's going to give me numbers in between. So I'm basically training the model to look at images and produce results that are like a slider. Hopefully that made some sense. Don't worry if it didn't. I mean, I'm going to produce the code. I think it'll make more sense once I do.
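To make the bucket-versus-slider difference concrete, here is a small plain JavaScript sketch. The feature array and the weights are invented for illustration, and the function names `classify` and `regress` are my own, not part of ml5.

```javascript
// Helper: weighted sum of a feature array.
function dot(features, weights) {
  return features.reduce((acc, f, i) => acc + f * weights[i], 0);
}

// Classification: score every bucket and return the winning label.
function classify(features, weightsByLabel) {
  let bestLabel = null;
  let bestScore = -Infinity;
  for (const [label, weights] of Object.entries(weightsByLabel)) {
    const score = dot(features, weights);
    if (score > bestScore) {
      bestScore = score;
      bestLabel = label;
    }
  }
  return bestLabel;
}

// Regression: return one number, here squashed into the 0..1 slider range.
function regress(features, weights) {
  return 1 / (1 + Math.exp(-dot(features, weights)));
}

// Made-up features and weights, purely for illustration.
const features = [0.2, 0.9, 0.4];
const label = classify(features, { happy: [0, 1, 0], sad: [1, 0, 0] });
const amount = regress(features, [0, 1, -1]);

console.log(label);  // one of the buckets
console.log(amount); // a number somewhere between 0 and 1
```

Both functions start from the same features; only the last step differs, which is why switching from classification to regression changes so little of the surrounding code.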
(02:28) So once again, I want to thank Gene Kogan. Check out the ml4a website, which I'll link to. Gene Kogan made a set of examples that do exactly this with TensorFlow.js, and this is really taking inspiration from that and doing the same thing with the ml5 library. So let me go to my code. This is the code that I left off with in the previous video, the image classification example.
(02:52) And the main thing I'm going to change here is, well, there's a lot of things I'm going to change. But I want to change this word here. I want a feature extractor from the MobileNet model. And now I want to do a regression, not a classification. Then I also want to have a slider. So these buttons, ukulele vs.
(03:12) whistle or happy or sad, I don't want these anymore. I'm going to keep the training button, but I want to have a slider. So I'm going to say let slider. And then in setup, I'm going to say slider equals createSlider. The slider should have a range between 0 and 1, should start at 0.5 I guess, and have increments of 0.1.
(03:38) Let's just make sure I see a slider. So now I see a slider and I'm able to move it around. Now I will need to have an event that happens whenever I move the slider. So whenever, oh, yeah, whenever I move the slider, the event for the p5 DOM library,
(04:01) and again, there's a number of different ways you could do this, and you can find a link in the video description to, hopefully, a plain JavaScript example that does the same thing, I think that exists, is input. So let me just say console.log(slider.value()). So what this should do, is now anytime I move the slider, I should see the value of the slider.
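The slider step might look roughly like this as a p5.js sketch fragment. It assumes p5.js with its DOM functions is loaded in the page; the function name `logSliderValue` is my own choice for illustration.

```javascript
// p5.js sketch fragment (global mode).
let slider;

function setup() {
  // createSlider(min, max, startingValue, step)
  slider = createSlider(0, 1, 0.5, 0.1);
  // input() runs the callback every time the slider is moved.
  slider.input(logSliderValue);
}

function logSliderValue() {
  console.log(slider.value());
}
```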
(04:23) So if I move the slider, yeah, you can see 0. You could see it in the console all the way up to 1, to all the way down to 0. Perfect. Because what I want to do whenever I'm moving the slider, is I want to say, and I don't want to call this classifier. I'm going to call it, I guess the correct term would be regressor.
(04:42) That sounds really weird. I'm going to call it predictor. Predictor. So I'm going to say predictor equals mobilenet.regression. And when I move the slider, I'm going to say predictor.addImage with that slider value. So basically I'm saying, give me an image. No, sorry, not give me. Assign this image,
(05:05) this image's features, to this number. So basically I'm saying, I'm going to move the slider, I'm going to move it over here, move over here, and I'm going to say, oh, wait, sorry. Oh, it's always doing it, that's weird. I probably should, I should create an add image button. Let me create an add, I was kind of doing it whenever I moved the slider.
(05:25) But I think interaction-wise, it's going to make more sense if I do, I'm going to make a button called add example, addButton. And I'm going to say addButton equals createButton, add example image. And then addButton.mousePressed. And then I'm going to say predictor.addImage. I'm going to add it only when I press the button.
(05:55) So forget about this slider.input thing. I'm going to add the slider value only when I press the button. So in other words, oops, error on line 47. This has to say function, and then here. I'm putting these anonymous functions inside here. They're the callbacks for mousePressed. OK, now let's try this again.
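Put together, the setup at this point might look something like the fragment below. It is a sketch that assumes p5.js and the ml5 0.x feature extractor API are loaded in the page; the canvas size and the console messages are assumptions, not taken from the video.

```javascript
// p5.js + ml5.js sketch fragment, assuming the ml5 0.x featureExtractor API.
let video;
let mobilenet;
let predictor;
let slider;
let addButton;

function setup() {
  createCanvas(320, 270); // assumed size
  video = createCapture(VIDEO);
  video.hide();

  // Ask for a regression instead of a classification.
  mobilenet = ml5.featureExtractor('MobileNet', modelReady);
  predictor = mobilenet.regression(video, videoReady);

  slider = createSlider(0, 1, 0.5, 0.1);

  addButton = createButton('add example image');
  // The callback must be a function; it runs only when the button is pressed.
  addButton.mousePressed(function () {
    // Pair the current video frame's features with the slider value.
    predictor.addImage(slider.value());
  });
}

function modelReady() {
  console.log('Model is ready');
}

function videoReady() {
  console.log('Video is ready');
}
```

Adding examples on a button press rather than on every slider movement means an image is only recorded once I am actually in position for that slider value.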
(06:16) So basically what I'm saying is, if I move this to one, I'm going to say add example image, add example image. Then I move my hand all the way over here and move this over. I'm going to say, add example image, add example image, add example image, et cetera. So then I might put it in the middle and add, add example, add example.
(06:33) All right, so I'm adding a bunch of images. So I'm trying to do something that makes sense to me, which is map the position of something. But you don't have to be so literal. You can probably come up with your own creative way of using this. But let's see if we can actually get it to work. So I'm going to hit train.
(06:48) I don't know, I probably need to change some code there. Classifier is not defined, yeah. So let's try to get the rest of the code in here. So this has to be predictor.train. So this should be the same thing. And then once it's finished, call predictor. And it's probably now not classify, it's probably predict.
(07:09) Got results. And then the result, it's not a label anymore. I'm going to call it a value. And I'm going to say the value is zero. And then I'm going to say value equals result, and predictor.predict. So basically, everything is the same as before. I'm just changing the names of some things, because I'm no longer classifying and getting a label.
(07:36) I'm predicting and getting a number. I might have missed something. Let's try running this and see what happens. OK, label is not defined. sketch.js. Lines. So this is now value. OK, so we can see that these sample values, the starting value of zero is here. So I'm going to put this here. I'm going to add a bunch of examples.
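The renamed training and prediction code might look roughly like this. It is a fragment that assumes the ml5 0.x feature extractor API, with `predictor` created in setup via mobilenet.regression. The function name `startTraining` is my own; in the sketch it would be the body of the train button's mousePressed callback. How the result is delivered is also an assumption: depending on the ml5 version it may be a plain number or an object with a `value` property, so the sketch handles both.

```javascript
// Fragment: `predictor` is assumed to be created in setup().
let predictor;
let value = 0; // the starting value shown before any prediction

function startTraining() {
  predictor.train(whileTraining);
}

// Called with the current loss while training, and with null once it is done.
function whileTraining(loss) {
  if (loss == null) {
    console.log('Training complete');
    predictor.predict(gotResults);
  } else {
    console.log(loss);
  }
}

function gotResults(error, result) {
  if (error) {
    console.error(error);
    return;
  }
  // Assumption: accept either a plain number or an object with `value`.
  value = typeof result === 'number' ? result : result.value;
  // Ask for the next prediction so the value keeps updating.
  predictor.predict(gotResults);
}
```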
(07:56) I'm going to move this over, move the slider over, add a bunch of examples. I'm going to move this over, move this over. Add a bunch of examples. I'm going to move this over, move this over. Add a bunch of examples, up, down, let's move it back over here. Let's add some more examples. Then let's call train.
(08:14) It's training, it's training, it's training, it's training, it's finished. And now one, down to zero, up to one. Pretty good, pretty, pretty, pretty good. Not perfect, but pretty good. Boy, wouldn't it be nice if I drew something on the screen? The original example that Gene Kogan had made, and the ml5 example, doesn't draw the text. I mean, it's nice to sort of see the text, but it actually draws a rectangle.
(08:45) The rectangle at, I'm going to say, value times width, height divided by two, 50 comma 50. I'm going to just say rectMode(CENTER). And I'm going to say fill 255 comma zero comma 200. So now we have this nice rectangle here. Do bear with me one more time to do this. I've got to retrain it. I'm going to say hey, add example image, add example image, and move it to the middle.
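As a p5.js draw fragment, the rectangle might be drawn like this. It assumes `value` holds the latest prediction between 0 and 1 and `video` is the webcam capture, both set elsewhere in the sketch; the black background is my own assumption.

```javascript
// p5.js draw() fragment.
let video;
let value = 0;

function draw() {
  background(0);
  image(video, 0, 0);

  // Map the 0..1 prediction onto the canvas width.
  rectMode(CENTER);
  fill(255, 0, 200);
  rect(value * width, height / 2, 50, 50);
}
```

With rectMode(CENTER), the first two arguments to rect are the center of the square, so a value of 0.5 puts it in the middle of the canvas.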
(09:15) I'm going to add some example images. That's not really the middle. I'm going to move it over here. I'm not, I could be more careful about this. I should move it up and down. Agh, up, up, up, up, up, down, down, down, down, down. Move it back to the middle. Let's add some more examples in the middle.
(09:29) Example image, example image, example image, let's move here. Let's add some more example images. All right, let's train. Train, train. [WHISTLE] Train, train. [WHISTLE] OK, training is complete. And now, oh, where was I standing? That's actually kind of important. So look, it's kind of, it looks like I almost have like a computer vision project.
(09:48) Right? That's following this. But you have to remember, that's not what's happening. I could have done this with my, interestingly enough, it's also working just with my face. Like it sort of got the sense of like something in front of the background probably. Even if I hold this up, it's sort of working.
(10:03) But remember now, I don't have to be so literal. I mean this could map to sound. I could use different images. Images of cats, for one thing. I don't even know. But the idea here is now, instead of the image producing a label, the image produces a number. And that's a regression. And remember, this is not happening from scratch, this is happening because I'm basically saying this whole process that's already been learned, just take out that last part where it turns it into the fixed set of 1,000 MobileNet labels.
(10:36) And turn that into, turn that into something else. My own labels or my own number. You know, there's something else that I want to mention that's really important, because it came up in the chat. Now, where is all this happening? You might ask, am I like broadcasting images of myself somewhere out into the cloud for all this? What's kind of interesting and amazing about this, is everything is happening here locally inside the browser of this computer itself.
(11:06) It's all gone. It lives nowhere else. The ml5 library and the p5 library are being accessed from the cloud. They're being downloaded when the page loads. I could have local copies of them. The MobileNet model, the thing that I'm starting with, the MobileNet model is also being downloaded from the cloud.
(11:27) And the ml5 library is taking care of that for you. But once all of that is done, it's all happening here in the browser. And this, by the way, in theory would work on your phone as well. It's called MobileNet. That model is small. It's not super accurate, not super advanced or robust, but it's a small little model meant to work on a mobile device.
(11:48) So it's important to realize that everything is happening here locally on the device. OK? Thanks for watching, and I'll see you, and again, the same issue here, I probably want to save my retrained model. That's not possible right now easily with the ml5 library, but that will come in the future, and hopefully I'll do a video on that.
(12:08) OK? Goodbye, and I hope you make something with this. I enjoyed it.