ml5.js: Object Detection with COCO-SSD


https://www.youtube.com/watch?v=QEzRxnuaZCk


Transcript:

(00:01) Hi, everyone. Welcome to another ml5.js video. In this video, I am going to talk about the object detector in ml5, which is a new feature as of 0.5.0, so you want to make sure you are on at least that version before you try the same code that I'm about to demonstrate. What do I mean by object detection? So far, I have covered image classification.

(00:27) Meaning we have an image, maybe it has a cat in it. And when that image is sent into the machine learning model, in the case of the previous examples a model called MobileNet, I get back a list of labels and confidence scores, and most likely in this case I would get the label cat with hopefully a confidence score of something like 95%.

(00:51) There might be some other guesses with lower confidence scores, but ultimately the goal is to have a single classification, a single label, come out and be assigned as the result of the prediction for this image. Now, what happens in the case of object detection? Let's say I have this same image. An object detection model will not only label something in the image, but also give a bounding box for where the object it detects is.

(01:26) So instead of just saying this image is classified as cat, an object detection model will say in this image there is an object of type cat that is located at a particular xy location with a particular width and a particular height. The model will also return a confidence score for how certain it is that there is a cat at this exact location.

(01:52) So maybe that would also be something like 95%. And what's special about object detection is that, instead of just classifying the image with one label, it can detect more than one thing in an image. So maybe there's also, I don't know, I'm drawing the rest of the cat.

(02:11) Maybe there's also a dog. There's my dog. If the image is of a cat and a dog, we could get two bounding boxes. The second one with the label dog and another xy width and height for its bounding box. And right here on the ml5 reference page we can see an example of this. Here's an image of a cat, the bounding box is marked, the label cat is indicated along with a confidence score here of 65.41%.
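To make the contrast concrete, here is a minimal sketch of the two kinds of output as plain JavaScript data. The field names follow what the video later logs from ml5 (label, confidence, x, y, width, height); all of the numbers, and the describe helper, are made up for illustration.

```javascript
// Illustrative shapes only; every number here is invented for this sketch.

// Image classification: one ranked list of labels for the whole image.
const classification = [
  { label: 'cat', confidence: 0.95 },
  { label: 'dog', confidence: 0.03 },
];

// Object detection: one entry per detected object, each with its own box.
const detections = [
  { label: 'cat', confidence: 0.95, x: 40, y: 60, width: 200, height: 180 },
  { label: 'dog', confidence: 0.9, x: 300, y: 80, width: 220, height: 240 },
];

// Hypothetical helper: summarize a detection as a short string.
function describe(d) {
  return `${d.label} at (${d.x}, ${d.y}), ${d.width}x${d.height}`;
}

console.log(classification[0].label);
console.log(detections.map(describe));
```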

(02:42) Now, this doesn't happen by magic. This happens because there is a pre-trained model that presumably has been trained already on many images of cats and dogs with those bounding boxes marked and labeled. How does a data set like that even exist? Something new that is now in the ml5 documentation is a section called model and data provenance.

(03:06) You'll find this for every single pre-trained model that's in the ml5 library. This is a project that's been started by Ellen Nickles, and I encourage you to click the link to find out more about Ellen and her work on model and data provenance. What she has done here is create model biographies and data biographies.

(03:25) Anytime you're using a machine learning model, you want to ask yourself: what data was used to train this model, who trained it, in what context, and for what reasons? Anytime you're going to use a pre-trained model in a project, you want to think about the ethical implications of where that model came from and how you're using it. Researching the biography, so to speak, of the data behind the model and of the model itself is incredibly important when considering those kinds of questions.

(03:50) In this case, the data set behind the object detection model that I'm going to use is a data set called Coco. Coco, or Common Objects in Context, is a large-scale object detection, segmentation, and captioning data set. So before you watch the rest of this video, I would pause, go to the Coco data set website, click explore, and poke around a little bit.

(04:09) Also in this video series, you'll find videos about the PoseNet pre-trained model. In Coco, in addition to the object detection data, there's also a set of 200,000 images with 250,000 instances of people labeled with particular key points on their body. Coco also includes image segmentation, which is a very similar concept to object detection, but instead of a particular bounding box, every single pixel is labeled as part of a particular category.

(04:40) So there are all the pixels for the giraffe versus the pixels for the clouds in the sky, and so on and so forth. I also want to suggest to you two readings if you're interested in learning more about data sets for machine learning. "The Humans of AI Project" by Philipp Schmitt is a project that explores specifically the Coco image data set.

(05:00) And you can learn a lot more about where those images came from, who took those photos, and, as Philipp Schmitt puts it, about exposing the myth of magically intelligent machines. I also would highly suggest reading "Excavating AI: The Politics of Images in Machine Learning Training Sets" by Kate Crawford and Trevor Paglen.

(05:18) This essay explores the ImageNet database, another very well-known image database that is the data set behind the MobileNet model, which serves as the foundation for many of the ml5 image classification and transfer learning examples throughout this playlist. Circling back to ml5, if you use the object detector there are two pre-trained models you can select from at present.

(05:41) Hopefully in the future maybe you'll even train your own model or we'll be able to incorporate other open source object detection models. But right now there's Yolo, which stands for You Only Look Once, and Coco SSD. In this video, I'm going to demonstrate using Coco SSD, but I encourage you to explore and experiment and do your research about the Yolo model, as well.

(06:00) The Coco SSD model comes from TensorFlow. So there's the TensorFlow JS port of the TensorFlow Coco SSD model, that's what ml5 is using. Certainly on the GitHub page for that model, you can find code for using it in JavaScript without the ml5 library that you could explore as well as more background about how it was trained and what it does.

(06:20) Now, one thing I'll note is it only detects 80 classes of object. Not a huge number if you think about it. You can find that list of labels as part of the ml5 materials themselves, as well. All right, it's time to write some code. The first thing that I want to do is have an image to try to detect objects in.

(06:46) There we go. So I've made a simple P5 sketch that uses the preload function to load a particular image that I've uploaded to the P5 web editor, and in the setup function I'm making a canvas and drawing that image. You might recognize Gloria Pickel from my coding in the cabana series. Along with her good friend, Greta Goose.

(07:05) Unfortunately Evie Mango is not pictured here, but you can learn more about them on their Instagram, which I'll link to in the video's description. Now that I've got my image, the next step is for me to load the Coco SSD model itself. For this basic example, I'm making heavy use of the preload function, which allows me to load images and pre-trained models without any callbacks, and everything is ready to go once I get to the setup function.

(07:31) But certainly in other contexts, you might want to use a callback or write your code in a different way. And you'll find all of that in the actual official ml5 examples themselves. Let's double check that things are working correctly by just console logging the detector object. Oh, and I should put that in setup to see that it's loaded properly.

(07:49) Whoops, by accident I put a capital O there, it should be a lowercase o: objectDetector. The console isn't necessarily going to show us anything useful here, just a lot of the stuff that's part of that detector object in ml5. It's clearer for us to look at the documentation than to read what's logged here in the console.
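A minimal sketch of this loading step, assuming p5.js and ml5.js (0.5.0 or later) are already included on the page. The file name 'dog_cat.jpg' and the 640 by 480 canvas size are placeholders chosen for illustration, not details confirmed by the transcript.

```javascript
// p5.js calls preload() first and waits for loading to finish before setup().
let img;
let detector;

function preload() {
  // Placeholder file name; use whatever image you uploaded to your sketch.
  img = loadImage('dog_cat.jpg');
  // Lowercase "o": ml5.objectDetector, not ml5.ObjectDetector.
  detector = ml5.objectDetector('cocossd');
}

function setup() {
  // Assumed canvas size for this sketch.
  createCanvas(640, 480);
  image(img, 0, 0);
  // Quick check that the model object exists by the time setup() runs.
  console.log(detector);
}
```

Loading in preload keeps the example free of callbacks; outside of p5, or when loading on demand, passing a callback to ml5.objectDetector is the usual alternative.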

(08:06) I happen to know that what I need to do is call detector dot predict, pass it the image, and then a callback for when I've got the detections. So I'll say gotDetections as the name of my callback function. Let's write that function, and let's log the results. So this is the same pattern as in many other ml5 pre-trained models.

(08:30) Load the model, call predict, get a result, error first in the callback in case there's an error, and maybe I should check for that. And then do something with the results. I just want to log them right now. Detector dot predict is not a function. Whoops. Looking at the documentation, the function is not predict, it's detect.

(08:53) So predict is a general word for when you want to ask a machine learning model to give you the output associated with a given input. But in this specific case, the ml5 function is named detect because it's a more descriptive word of what we're actually doing. We're going to change this to detect, let's also comment out this unnecessary console log and run it again.
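The corrected call can be sketched as follows. gotDetections is the callback name used in the video; detectOnce is a hypothetical wrapper added here only so the snippet stands on its own. In the sketch itself, the detect call would sit directly in setup.

```javascript
// Error-first callback: ml5 passes an error (if any) first, then the results.
function gotDetections(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  console.log(results);
}

// Hypothetical wrapper for this example; the video calls detect() in setup().
function detectOnce(detector, img) {
  // The method is detect(), not predict().
  detector.detect(img, gotDetections);
}
```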

(09:18) Aha, look at this. Three objects. OK, there is a cat, there is a dog, what's the third one? It's something in this list of 80 things. Did you see it there? Object 0 is the dog. Here's the confidence score and the xy width and height. It also looks like it gives you something called normalized, which are probably all of these values but mapped to a range between 0 and 1.
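If the normalized values are indeed the same box expressed as fractions between 0 and 1, as guessed above, they can be scaled to any drawing size. This is a sketch under that assumption; toPixels is a hypothetical helper, and the example detection is invented.

```javascript
// Scale a normalized box (fractions from 0 to 1) to a target size in pixels.
function toPixels(normalized, targetWidth, targetHeight) {
  return {
    x: normalized.x * targetWidth,
    y: normalized.y * targetHeight,
    width: normalized.width * targetWidth,
    height: normalized.height * targetHeight,
  };
}

// Invented detection, shaped like the objects logged in the console.
const example = {
  label: 'dog',
  confidence: 0.9,
  normalized: { x: 0.25, y: 0.5, width: 0.5, height: 0.25 },
};

console.log(toPixels(example.normalized, 640, 480));
// { x: 160, y: 240, width: 320, height: 120 }
```

This can be handy when the canvas is a different size from the image or video that was sent to the model.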

(09:44) Object one is the couch, it detected the couch. And then object two I'm going to assume is the cat. Let's draw those bounding boxes. So I can write a loop to look at all of the elements of the array. And, of course, there are countless other ways you could do this with different types of array functionality, but this is the simplest way. I'll just say let object equals results index i.

(10:09) Let's first just draw the bounding box, at object x, object y, with the width and height. Let's give it a sort of distinctive color with a given thickness, just so it really is emphasized. And make sure there's no fill blocking it out. There we go. I've got three rectangles, one drawn around the dog, one around the cat, and one around the couch.
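The loop described above might look like this. In the video it lives inside the gotDetections callback; here it is wrapped in a hypothetical drawBoxes function so it can be read in isolation. The green color and stroke weight of 4 are arbitrary choices for this sketch.

```javascript
// Draw one outlined rectangle per detection, using p5.js drawing functions.
function drawBoxes(results) {
  for (let i = 0; i < results.length; i++) {
    let object = results[i];
    stroke(0, 255, 0); // a distinctive color (assumed)
    strokeWeight(4);   // a given thickness (assumed)
    noFill();          // keep the image visible inside the box
    rect(object.x, object.y, object.width, object.height);
  }
}
```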

(10:33) Let's add the labels just so we can see them. And where do I want to draw it? The X location, but shift it a little over, and the Y location, but shift it a little bit down. Maybe I want to put it in the center. There's no rules here. I'm just going to do it however I'm going to do it. Run it again.

(10:55) There we go, couch, dog, cat. Now of course I want to think about visual design and contrast. This isn't the best visualization of it, but you can see it's working. Maybe if you're following along, pause this video, try to add the confidence score. That's a nice little exercise for you to do.
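One possible answer to the exercise, as a sketch: build the label text from the label and a rounded confidence percentage, then draw it just inside the top-left corner of the box. labelText and drawLabel are hypothetical names, and the offsets, text size, and white fill are arbitrary.

```javascript
// Combine the label with the confidence as a whole-number percentage.
function labelText(object) {
  return object.label + ' ' + Math.round(object.confidence * 100) + '%';
}

// Draw the text shifted a little over and a little down from the box corner.
function drawLabel(object) {
  noStroke();
  fill(255);
  textSize(24);
  text(labelText(object), object.x + 10, object.y + 24);
}
```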

(11:13) Hopefully you have some creative ideas of how you might want to use this or experiment with this. One application you might want to try is running this model on a real-time video feed. So I have a webcam here on this laptop. I can rewrite this code to use the p5 createCapture object, and then pass the video as the thing we're looking at into the machine learning model, same as we did with the MobileNet image classification examples.

(11:38) So I'm going to save this code as is, and you'll find it linked in the video's description. And I'm going to duplicate it, and rewrite it with the capture object. Call it webcam, comment out the image, and add a video instead. I need a draw function, because now I'm going to be looping and drawing every frame of the video in real time.

(12:03) Let's make sure the video is the same size as the Canvas. And let's run this and see what happens. So I see my video there. I console.log the detections, but I only detected things once and I found nothing. Why is that? That's because I called detect in setup with the video once, got the results, and never called detect again.

(12:25) So now I need to create this kind of loop system, where I first call detect, I get the detections and once I've gotten the detections, let's call it again. Why did I say object? I should be saying detector. And look, it's recognizing me. Now I don't love the way that I've written this, because drawing the results here outside of draw, and it happening in this sort of like separate sequence, is a little bit prone to error.
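A minimal sketch of that loop, assuming p5.js and ml5.js are loaded and a webcam is available. The 640 by 480 size is a placeholder. The key line is the detect call inside the callback, which requests the next round as soon as the previous one returns.

```javascript
let video;
let detector;

function preload() {
  detector = ml5.objectDetector('cocossd');
}

function gotDetections(error, results) {
  if (error) {
    // Note: returning here also ends the loop, since detect() is not called again.
    console.error(error);
    return;
  }
  console.log(results);
  // Call detect on the detector (not on "object") to keep the loop going.
  detector.detect(video, gotDetections);
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  // Keep the video the same size as the canvas.
  video.size(640, 480);
  // Kick off the first detection; the callback schedules the rest.
  detector.detect(video, gotDetections);
}

function draw() {
  image(video, 0, 0);
}
```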

(12:57) So I want to just adjust the way I'm doing this. I'm gonna take this loop, and I'm going to put it into draw. This way I know my drawing sequence is always happening in the right order. Draw the video, draw the results on top of it. But this isn't where I got the results. Where I got the results is in the got detections function.

(13:16) So I'll just use a global variable here to sort of link those two things. So let's create a variable called detections. I'm going to make it an empty array to start. Then in the got detections function, I will just set detections equal to the results. So now it's a global variable that gets set whenever there are new detections, and whatever the latest detections are, they'll always be drawn in draw by me adding detections here.

(13:43) Let's run this and let's see if I can get some object detecting going. Oh boy, things froze. Error. What went wrong? Results is not defined. Sketch line 33. Detections is the global variable, but I'm still using results here. I need to change that also to detections. Notice how when the error happened, the sort of video element is still going, but the Canvas where I'm separately drawing the video got frozen.

(14:11) I probably only want to see one of those. So I don't need to see the original video element. I can call video.hide to remove that. Run it again. All right, person, cell phone. Oh, it still sees me. How about book? This is a book that I'm currently reading. It's called Weapons of Math Destruction, also highly recommended when thinking about algorithms and machine learning.
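Putting the last few steps together, the restructured sketch might look like the following. It assumes the same setup as before (p5.js, ml5.js, a webcam, a 640 by 480 canvas); colors, sizes, and text offsets are arbitrary. The callback only stores the latest results, and draw does all of the drawing in a fixed order.

```javascript
let video;
let detector;
// Global variable linking the callback to draw(); starts out empty.
let detections = [];

function preload() {
  detector = ml5.objectDetector('cocossd');
}

function gotDetections(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  // Store the newest results, then immediately ask for more.
  detections = results;
  detector.detect(video, gotDetections);
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(640, 480);
  // Hide the separate video element so only the canvas copy is visible.
  video.hide();
  detector.detect(video, gotDetections);
}

function draw() {
  // Always the same order: video first, then the latest detections on top.
  image(video, 0, 0);
  for (let i = 0; i < detections.length; i++) {
    let object = detections[i];
    stroke(0, 255, 0);
    strokeWeight(4);
    noFill();
    rect(object.x, object.y, object.width, object.height);
    noStroke();
    fill(255);
    textSize(24);
    text(object.label, object.x + 10, object.y + 24);
  }
}
```

Because draw reads whatever is in detections at the moment, the boxes may lag the video by a frame or more, which is usually acceptable for an interactive sketch.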

(14:35) I happen to have a paintbrush. Scissors? Baseball bat? OK, batter up. [MUSIC AND CHEERING] All right, so you get the idea. Something that I might want to add to this is some kind of debouncing or interpolation. You can see that it's very, very, very noisy. So that's something that I will also include some references for in the video's description, and maybe even an extra example that adds that.
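The video does not show how the smoothing would be done, so the following is only one possible approach, not the example promised in the description: move each drawn box part of the way toward the newest detection on every update. mix uses the same formula as p5's lerp but has a different name so it does not clash with the p5 global; smoothBox and the amount parameter are hypothetical.

```javascript
// Linear interpolation: amount 0 keeps a, amount 1 jumps straight to b.
function mix(a, b, amount) {
  return a + (b - a) * amount;
}

// Ease a previously drawn box toward the latest detection of the same object.
function smoothBox(previous, latest, amount) {
  if (!previous) {
    // Nothing to smooth from yet, so start at the latest box.
    return { ...latest };
  }
  return {
    label: latest.label,
    confidence: latest.confidence,
    x: mix(previous.x, latest.x, amount),
    y: mix(previous.y, latest.y, amount),
    width: mix(previous.width, latest.width, amount),
    height: mix(previous.height, latest.height, amount),
  };
}
```

Smaller amounts give steadier but slower-moving boxes. Deciding which previous box corresponds to which new detection (for example by label or by overlap) is a separate problem that this sketch leaves out.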

(15:01) But this wraps up this video. So thank you for watching this video tutorial on the ML5 object detector. If you have some creative ideas or things you want to try, let me know. Write something in the comments. You can also go to The Coding Train web page associated with this video and submit your creative examples and experiments there.

(15:18) Thanks for watching, and I'll see you in another ML5 video. Goodbye. [MUSIC]
