
Object Detection Using YOLO: An Inference Guide

Updated: Oct 18, 2023



Introduction

YOLO (You Only Look Once) is a fast object detection algorithm. Unlike detectors that examine an image region by region or in multiple stages, YOLO processes the entire image in a single forward pass and predicts all the objects at once, which makes it considerably faster.


The original YOLO algorithm was created by Joseph Redmon and implemented in C as part of the Darknet framework; the code is openly available on GitHub. YOLO has become very popular because of its combination of speed and accuracy for object detection.


Other strong object detection algorithms exist as well, such as Faster R-CNN and SSD.

This article explains how to use a pre-trained YOLO model with Keras to quickly start detecting objects in images. By leveraging a pre-trained model together with a handful of utility functions, you can build high-quality object detection into applications very rapidly.


Here are some of the common and impactful applications of YOLO:

  1. Surveillance and Security: YOLO can be used in security cameras and surveillance systems to detect unauthorized personnel, identify specific objects, or even count the number of individuals in a given area.

  2. Traffic Management: It can be employed to detect and count vehicles on roads, identify traffic violations, or recognize license plates.

  3. Retail: In stores, YOLO can monitor customer traffic, detect shoplifting incidents, or even track inventory by recognizing items on shelves.

  4. Healthcare: YOLO can assist in medical imaging by detecting and classifying anomalies or specific structures in images such as X-rays or MRIs.

  5. Industrial Automation: In factories, YOLO can identify defective products on an assembly line or monitor worker safety by recognizing when they're in dangerous zones.

  6. Agriculture: Farmers can use drones equipped with YOLO to monitor crops, detect pest infestations, or assess the health of livestock.

  7. Wildlife Monitoring: YOLO can be employed in cameras set up in natural habitats to detect and classify different animal species, helping in research and conservation efforts.

  8. Augmented Reality (AR): YOLO can be used to detect objects in real-time and overlay virtual information or graphics on them.

  9. Robotics and Drones: For robots or drones that need to navigate or interact with their environment, YOLO can be a critical tool for object recognition and collision avoidance.

  10. Smart Cars: In the automotive industry, YOLO can play a role in autonomous driving systems to detect vehicles, pedestrians, traffic signs, and other important objects on the road.

  11. Assistive Technology: For people with visual impairments, devices equipped with YOLO can provide real-time descriptions of their surroundings.

  12. Sports Analysis: YOLO can track players, balls, or other elements in a game to provide analytics or automate camera controls.


Input Image

For the input image, we will use one that features zebras. We will then detect and recognize these zebras.


Implementation


Import the libraries
Import NumPy, Keras, and Matplotlib.
Define the BoundBox Class

class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness=None, classes=None):
        """
        Initializes a BoundBox object.
        """

    def get_label(self):
        """
        Get the label of the bounding box based on the class probabilities.

        Returns:
            int: The label index of the bounding box.
        """

    def get_score(self):
        """
        Get the confidence score of the bounding box.

        Returns:
            float: The confidence score of the bounding box.
        """

This class represents a bounding box. A bounding box is typically a rectangular box used to enclose detected objects in images.

  • xmin, ymin, xmax, ymax: These represent the coordinates of the bounding box.

  • objness: Represents the objectness score (how sure the model is that there's an object in this box).

  • classes: Holds the probability distribution over all classes for the object enclosed by the bounding box.

  • label: Represents the index of the class with the highest probability.

  • score: Represents the highest class probability.

The methods within this class include:

  • get_label(): Returns the label of the detected object. If the label is not yet determined, it finds the class with the highest probability.

  • get_score(): Returns the confidence score of the detected object. If the score is not yet determined, it sets and returns the score for the most probable class.
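As a sketch only (an assumed implementation, not the article's full code), the lazy label/score lookup described above might look like this:

```python
import numpy as np

class BoundBox:
    """Rectangular detection box with lazy label/score lookup (illustrative sketch)."""

    def __init__(self, xmin, ymin, xmax, ymax, objness=None, classes=None):
        self.xmin, self.ymin = xmin, ymin
        self.xmax, self.ymax = xmax, ymax
        self.objness = objness
        self.classes = classes
        self.label = -1   # cached index of the most probable class
        self.score = -1   # cached probability of that class

    def get_label(self):
        # Compute the label once, then reuse the cached value.
        if self.label == -1:
            self.label = int(np.argmax(self.classes))
        return self.label

    def get_score(self):
        # Score is the probability of the most likely class.
        if self.score == -1:
            self.score = float(self.classes[self.get_label()])
        return self.score
```

For example, a box with class probabilities `[0.1, 0.7, 0.2]` reports label `1` and score `0.7`.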

Define the YOLO Classes and Anchors
class YoloCA:
    def __init__(self):
        """
        Initializes a YoloCA (YOLO Class and Anchors) object with predefined class labels and anchor boxes.
        """
        self.labels = ["person", "bicycle", ...]  # List of class labels (add the remaining ones)
        self.anchors = [...]  # List of anchor boxes in the format [width1, height1, width2, height2, ...]

This class encapsulates the class labels and anchor boxes for a YOLO (You Only Look Once) object detection model.

  • labels: A list containing the names of classes that the YOLO model can detect.

  • anchors: A list containing predefined anchor box sizes. In YOLO, anchor boxes are used as references to predict the dimensions of detected object bounding boxes.
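As a toy illustration only (a hypothetical two-class subset and a single scale's anchors; the real model uses the full COCO label list and its trained anchor sizes):

```python
class YoloCA:
    # Toy stand-in for the article's class; labels and anchors here are
    # a hypothetical subset, not the full configuration.
    def __init__(self):
        self.labels = ["person", "bicycle"]  # truncated label list
        # One scale's anchors as flat [w1, h1, w2, h2, w3, h3] pairs (pixels).
        self.anchors = [116, 90, 156, 198, 373, 326]
```

Each consecutive (width, height) pair in `anchors` is one reference box shape that the network refines into a final prediction.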

Define the Object Detection
class ObjectDetection:
    def __init__(self):
        """
        Initializes an ObjectDetection object with YoloCA as a component.
        """

    def sigmoid(self, x):
        """
        Apply sigmoid activation to the input.

        Args:
            x (numpy.ndarray): Input data.

        Returns:
            numpy.ndarray: Output after applying sigmoid activation.
        """

    def decode_netout(self, netout, anchors, obj_thresh, net_h, net_w):
        """
        Decode the network output to obtain bounding boxes.

        Args:
            netout (numpy.ndarray): Network output.
            anchors (list): List of anchor boxes.
            obj_thresh (float): Objectness threshold.
            net_h (int): Network input height.
            net_w (int): Network input width.

        Returns:
            list: List of BoundBox objects representing bounding boxes.
        """

    def correct_yolo_boxes(self, boxes, image_h, image_w, net_h, net_w):
        """
        Correct the coordinates of bounding boxes based on network input and image dimensions.

        Args:
            boxes (list): List of BoundBox objects.
            image_h (int): Original image height.
            image_w (int): Original image width.
            net_h (int): Network input height.
            net_w (int): Network input width.
        """

    def interval_overlap(self, interval_a, interval_b):
        """
        Calculate the overlap between two intervals.

        Args:
            interval_a (list): First interval [x1, x2].
            interval_b (list): Second interval [x3, x4].

        Returns:
            float: Overlap between the intervals.
        """

    def bbox_iou(self, box1, box2):
        """
        Calculate the Intersection over Union (IoU) between two bounding boxes.

        Args:
            box1 (BoundBox): First bounding box.
            box2 (BoundBox): Second bounding box.

        Returns:
            float: IoU between the bounding boxes.
        """


    def non_max_suppression(self, boxes, nms_thresh):
        """
        Apply non-maximum suppression to eliminate redundant bounding boxes.

        Args:
            boxes (list): List of BoundBox objects.
            nms_thresh (float): NMS threshold.
        """


    def load_image_pixels(self, filename, shape):
        """
        Load and preprocess an image.

        Args:
            filename (str): Filepath of the image.
            shape (tuple): Target shape (height, width) for the image.

        Returns:
            tuple: Tuple containing the preprocessed image, original image width, and original image height.
        """

    def get_filtered_boxes(self, boxes, labels, thresh):
        """
        Filter and extract boxes with confidence scores above a threshold.

        Args:
            boxes (list): List of BoundBox objects.
            labels (list): List of class labels.
            thresh (float): Confidence threshold.

        Returns:
            tuple: Tuple containing the filtered boxes, filtered labels, and filtered scores.
        """

    def visualize_boxes(self, filename, filtered_boxes, filtered_labels, filtered_scores):
        """
        Visualize bounding boxes on an image.

        Args:
            filename (str): Filepath of the image.
            filtered_boxes (list): List of filtered BoundBox objects.
            filtered_labels (list): List of filtered class labels.
            filtered_scores (list): List of filtered confidence scores.
        """

    def classify(self, model_filename, photo_filename):
        """
        Perform object detection and visualization.

        Args:
            model_filename (str): Filepath of the trained model.
            photo_filename (str): Filepath of the input image.
        """

This class houses various utility methods associated with the YOLO object detection model:

  • sigmoid(x): This is an activation function used to transform any input into a value between 0 and 1.

  • decode_netout(...): Transforms the raw output of the YOLO neural network (which is usually a tensor) into a set of bounding boxes. It uses the anchor boxes and applies certain transformations like sigmoid to convert network outputs into meaningful bounding box parameters.

  • correct_yolo_boxes(...): After obtaining the bounding boxes from the network output, their coordinates are adjusted based on the original image's dimensions.

  • interval_overlap(...): Calculates the overlap between two intervals. This is useful for determining the intersection between two bounding boxes.

  • bbox_iou(...): Calculates the Intersection over Union (IoU) for two bounding boxes. IoU is a measure of the overlap between two bounding boxes and is used extensively during the Non-Max Suppression (NMS) process.

  • non_max_suppression(boxes, nms_thresh): Non-Max Suppression. After detection, there may be several overlapping boxes for a single object. NMS keeps only the highest-scoring box among heavily overlapping ones and suppresses the rest.

  • load_image_pixels(...): Loads an image from the file, resizes it to fit the input shape required by the model, and converts it to a format suitable for prediction.

  • get_filtered_boxes(...): Filters the list of bounding boxes based on a threshold, so that only boxes with a high enough confidence score are kept.

  • visualize_boxes(...): Visualizes the results by drawing the detected bounding boxes, labels, and scores on the image and displaying it.
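The geometric steps above can be sketched in code. The following is an illustrative, self-contained approximation (plain tuples instead of BoundBox objects, and an assumed per-cell decode helper; not the article's full implementation):

```python
import numpy as np

def sigmoid(x):
    # Squash any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(t, anchor, cell, grid, net):
    # Decode one anchor's raw outputs (tx, ty, tw, th) for one grid cell
    # into a normalised (x, y, w, h) box, as decode_netout does cell by cell.
    tx, ty, tw, th = t
    cx, cy = cell            # grid-cell column and row
    grid_w, grid_h = grid    # e.g. (13, 13)
    net_w, net_h = net       # e.g. (416, 416)
    aw, ah = anchor          # anchor width/height in input pixels
    x = (cx + sigmoid(tx)) / grid_w   # sigmoid offset inside the cell
    y = (cy + sigmoid(ty)) / grid_h
    w = aw * np.exp(tw) / net_w       # anchor scaled by exp(t)
    h = ah * np.exp(th) / net_h
    return x, y, w, h

def interval_overlap(interval_a, interval_b):
    # Length of the overlap between 1-D intervals [x1, x2] and [x3, x4].
    x1, x2 = interval_a
    x3, x4 = interval_b
    return max(0.0, min(x2, x4) - max(x1, x3))

def bbox_iou(a, b):
    # a, b = (xmin, ymin, xmax, ymax); IoU = intersection area / union area.
    iw = interval_overlap((a[0], a[2]), (b[0], b[2]))
    ih = interval_overlap((a[1], a[3]), (b[1], b[3]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, nms_thresh=0.5):
    # Greedy NMS: keep the best-scoring box, drop heavy overlaps, repeat.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if bbox_iou(boxes[best], boxes[i]) < nms_thresh]
    return keep
```

With a 0.5 IoU threshold, two boxes covering mostly the same zebra collapse into the single higher-scoring detection.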

Perform Detection and Recognition
detection = ObjectDetection()
detection.classify('model.h5', 'zebra.webp')

Here, 'detection' is an instance of the class 'ObjectDetection'. We have provided a pretrained YOLO model named 'model.h5' to perform inference, and an image named 'zebra.webp' as the input image.
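Internally, classify must first resize the image to the network's input size and scale its pixels before prediction. A minimal NumPy-only stand-in for that preprocessing step (the real code would typically use Keras' load_img/img_to_array or cv2.resize; the nearest-neighbour resize here is just for illustration) might be:

```python
import numpy as np

def preprocess(image, shape):
    # image: H x W x 3 uint8 array; shape: (net_h, net_w) expected by the model.
    net_h, net_w = shape
    h, w = image.shape[:2]
    # Nearest-neighbour resize with pure NumPy index arithmetic.
    rows = np.arange(net_h) * h // net_h
    cols = np.arange(net_w) * w // net_w
    resized = image[rows][:, cols]
    # Scale pixels to [0, 1] and add a batch dimension, as Keras models expect.
    x = resized.astype("float32") / 255.0
    return np.expand_dims(x, axis=0), w, h
```

The original width and height are returned alongside the tensor so that correct_yolo_boxes can later map predictions back onto the full-size image.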


As we can observe, the zebras in the image have been detected with a confidence score of over 99 percent.


We have provided only the code template. For a complete implementation, contact us.


If you require assistance with the implementation of the topic mentioned above, or if you need help with related projects, please don't hesitate to reach out to us.

