YOLOv4: A Comprehensive Guide to Object Detection using Darknet and OpenCV

kamal_DS
Jul 13, 2023

Object detection is a fundamental task in computer vision that involves identifying and localizing objects within an image or video. Over the years, several object detection algorithms have been developed, each with its own strengths and limitations. One such algorithm that has gained significant popularity is YOLOv4 (the fourth version of You Only Look Once), known for its high accuracy and real-time performance.

In this blog post, we will explore the YOLOv4 algorithm and guide you through its implementation using OpenCV. We will cover the architecture, explain the code, and demonstrate how to perform object detection on both images and videos.

Introduction to YOLOv4

YOLOv4 is the fourth iteration of the YOLO algorithm, which revolutionized object detection by introducing a single-stage, end-to-end approach. Unlike traditional two-stage detectors, YOLOv4 processes the entire image in a single pass, making it highly efficient. It achieves state-of-the-art accuracy by combining several advanced techniques: a powerful backbone network (CSPDarknet53), a neck that aggregates features across scales (SPP and PAN), and detection heads that predict boxes at three different resolutions.

Understanding the Code

The code below implements YOLOv4 inference using OpenCV's dnn module. Let's break it down step by step:

Importing the necessary packages: We start by importing the required packages: OpenCV, NumPy, time, and argparse. These provide the tools for image processing, numerical operations, timing the forward pass, and command-line argument parsing.

YOLOv4 Class:

The Yolov4 class encapsulates the functionality of YOLOv4. It initializes the weights and configuration file paths, defines the list of classes, and loads the pre-trained model using cv2.dnn.readNetFromDarknet. It also sets up the necessary parameters for inference.

import cv2
import numpy as np
import time
import argparse

class Yolov4:
    def __init__(self):
        self.weights = opt.weights  # path to the trained weights file
        self.cfg = opt.cfg  # path to the Darknet configuration file
        # The 80 COCO class names, in the order the model predicts them.
        self.classes = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat',
                        'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench',
                        'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
                        'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
                        'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
                        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
                        'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog',
                        'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table',
                        'toilet', 'TV', 'laptop', 'mouse', 'remote', 'keyboard',
                        'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
                        'scissors', 'teddy bear', 'hair drier', 'toothbrush']
        self.Neural_Network = cv2.dnn.readNetFromDarknet(self.cfg, self.weights)
        # Names of the unconnected output layers (the YOLO detection heads).
        self.outputs = self.Neural_Network.getUnconnectedOutLayersNames()
        # One random color per class for drawing bounding boxes.
        self.COLORS = np.random.randint(0, 255, size=(len(self.classes), 3), dtype="uint8")
        self.image_size = opt.img_size  # network input size (e.g. 320)
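
As a quick sanity check (a minimal sketch, assuming yolov4.cfg and yolov4.weights are in the working directory), you can load the network on its own and list the unconnected output layers. For YOLOv4 these are the three YOLO detection heads:

import cv2

# Assumes yolov4.cfg and yolov4.weights sit in the current directory.
net = cv2.dnn.readNetFromDarknet('yolov4.cfg', 'yolov4.weights')
print(net.getUnconnectedOutLayersNames())  # e.g. ('yolo_139', 'yolo_150', 'yolo_161')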

Bounding Box Function:

The bounding_box method takes the raw output of the YOLOv4 model and extracts the bounding box coordinates, confidence scores, and class labels. It applies a confidence threshold (0.5 here) and performs non-maximum suppression to filter out weak detections and overlapping boxes.

    def bounding_box(self, detections):
        try:
            confidence_score = []
            ids = []
            coordinates = []
            threshold = 0.5
            for output in detections:  # one array per YOLO detection head
                for det in output:     # one row per candidate box
                    probs_values = det[5:]  # class scores follow the box and objectness values
                    class_ = np.argmax(probs_values)
                    confidence_ = probs_values[class_]

                    if confidence_ > threshold:
                        # Boxes are predicted as normalized center/size values;
                        # convert them to top-left pixel coordinates.
                        w, h = int(det[2] * self.image_size), int(det[3] * self.image_size)
                        x, y = int(det[0] * self.image_size - w / 2), int(det[1] * self.image_size - h / 2)
                        coordinates.append([x, y, w, h])
                        ids.append(class_)
                        confidence_score.append(float(confidence_))
            # Non-maximum suppression drops overlapping boxes (IoU above 0.6).
            final_box = cv2.dnn.NMSBoxes(coordinates, confidence_score, threshold, .6)
            return final_box, coordinates, confidence_score, ids

        except Exception as e:
            print(f'Error in bounding_box: {e}')
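
To make the decoding step concrete, here is a tiny self-contained sketch with a made-up detection row, assuming the standard YOLO output layout (4 normalized box values, an objectness score, then one score per class, i.e. 85 values for the 80 COCO classes):

import numpy as np

# One hypothetical detection row, as produced by a YOLO output layer:
# [center_x, center_y, width, height, objectness, 80 class scores].
row = np.zeros(85, dtype=np.float32)
row[:4] = [0.5, 0.5, 0.2, 0.4]  # box centered in the image, 20% x 40% in size
row[4] = 0.9                    # objectness score (skipped by det[5:] above)
row[5] = 0.8                    # score for class 0 ('person')

image_size = 320
probs = row[5:]
class_id = int(np.argmax(probs))
confidence = float(probs[class_id])

# Convert normalized center/size values to top-left pixel coordinates.
w, h = int(row[2] * image_size), int(row[3] * image_size)
x, y = int(row[0] * image_size - w / 2), int(row[1] * image_size - h / 2)
print(class_id, round(confidence, 2), [x, y, w, h])  # 0 0.8 [128, 96, 64, 128]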

Prediction Function:

The predictions method takes the filtered bounding box information and overlays the boxes, class labels, and confidence scores on the original image. It also overlays the inference time, which is measured in the Inference method.

    def predictions(self, prediction_box, bounding_box, confidence, class_labels,
                    width_ratio, height_ratio, end_time, image):
        try:
            for j in prediction_box.flatten():
                x, y, w, h = bounding_box[j]
                # Scale coordinates from network-input space back to the original image.
                x = int(x * width_ratio)
                y = int(y * height_ratio)
                w = int(w * width_ratio)
                h = int(h * height_ratio)
                label = str(self.classes[class_labels[j]])
                conf_ = str(round(confidence[j], 2))
                color = [int(c) for c in self.COLORS[class_labels[j]]]
                cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
                cv2.putText(image, label + ' ' + conf_, (x, y - 2), cv2.FONT_HERSHEY_COMPLEX, .5, color, 2)
            # Use a name other than 'time' so we do not shadow the time module.
            time_label = f"Inference time: {end_time:.3f} s"
            cv2.putText(image, time_label, (10, 13), cv2.FONT_HERSHEY_COMPLEX, .5, (156, 0, 166), 1)
            return image

        except Exception as e:
            print(f'Error in predictions: {e}')

Inference Function:

The Inference method performs the actual inference on the input image. It pre-processes the image with cv2.dnn.blobFromImage (scaling pixel values to [0, 1], resizing to the network input size, and swapping BGR to RGB), sets the blob as the input to the YOLOv4 model, and retrieves the output predictions. It then calls the bounding_box and predictions functions to process and visualize the results.


    def Inference(self, image, original_width, original_height):
        try:
            # Note: swapRB must be passed by keyword; passed positionally it
            # would land on the 'mean' parameter of blobFromImage.
            blob = cv2.dnn.blobFromImage(image, 1 / 255, (self.image_size, self.image_size),
                                         swapRB=True, crop=False)
            self.Neural_Network.setInput(blob)
            start_time = time.time()
            output_data = self.Neural_Network.forward(self.outputs)
            end_time = time.time() - start_time
            final_box, coordinates, confidence_score, ids = self.bounding_box(output_data)
            outcome = self.predictions(final_box, coordinates, confidence_score, ids,
                                       original_width / self.image_size,
                                       original_height / self.image_size, end_time, image)
            return outcome
        except Exception as e:
            print(f'Error in Inference: {e}')
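
To see what the preprocessing step produces, here is a small sketch using a dummy frame (no weights are needed, since we stop before the forward pass):

import cv2
import numpy as np

# A dummy 640x480 BGR frame standing in for a real image.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Same preprocessing as Inference(): scale pixel values by 1/255, resize
# to 320x320, and swap BGR -> RGB. The result is a 4D NCHW tensor.
blob = cv2.dnn.blobFromImage(frame, 1 / 255, (320, 320), swapRB=True, crop=False)
print(blob.shape)  # (1, 3, 320, 320)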

Main Execution:

In the main section of the code, we parse the command-line arguments using argparse. If an image path or video path is provided, the corresponding inference is performed using the Yolov4 class. The results are displayed, and for video input they are also saved to a file (demo.avi).


if __name__ == "__main__":
    parse = argparse.ArgumentParser()
    parse.add_argument('--weights', type=str, default='yolov4.weights', help='weights path')
    parse.add_argument('--cfg', type=str, default='yolov4.cfg', help='cfg path')
    parse.add_argument('--image', type=str, default='', help='image path')
    parse.add_argument('--video', type=str, default='', help='video path')
    parse.add_argument('--img_size', type=int, default=320, help='network input size (width = height)')
    opt = parse.parse_args()
    obj = Yolov4()  # constructor called: loads the network once


    if opt.image:
        try:
            image = cv2.imread(opt.image, 1)
            original_width, original_height = image.shape[1], image.shape[0]
            obj.Inference(image=image, original_width=original_width, original_height=original_height)
            cv2.imshow('Inference', image)
            cv2.waitKey()
            cv2.destroyAllWindows()
        except Exception as e:
            print(f'Error in image inference: {e}')

    if opt.video:
        try:
            cap = cv2.VideoCapture(opt.video)
            # Read the source properties so the output video matches the input.
            fps = cap.get(cv2.CAP_PROP_FPS)
            width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
            height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
            fourcc = cv2.VideoWriter_fourcc(*'XVID')
            output = cv2.VideoWriter("demo.avi", fourcc, fps, (int(width), int(height)))
            while cap.isOpened():
                res, frame = cap.read()
                if res:
                    outcome = obj.Inference(image=frame, original_width=width, original_height=height)
                    cv2.imshow("demo", outcome)
                    output.write(outcome)
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break
                else:
                    break
            cap.release()
            output.release()
            cv2.destroyAllWindows()
        except Exception as e:
            print(f'Error in video inference: {e}')

Running the Code

To try out the YOLOv4 implementation, follow these steps:

  1. Make sure you have the required dependencies installed, including OpenCV (cv2) and NumPy.
  2. Download the YOLOv4 weights (yolov4.weights) and configuration file (yolov4.cfg) from the official AlexeyAB/darknet repository and place them in the same directory as the code.
  3. Open a terminal or command prompt and navigate to the directory containing the code.
  4. To perform object detection on an image, run the command python yolov4.py --image path/to/image.jpg. Replace path/to/image.jpg with the actual path to your image file.
  5. To perform object detection on a video, run the command python yolov4.py --video path/to/video.mp4. Replace path/to/video.mp4 with the actual path to your video file.
  6. For images, with all arguments spelled out: python Inference_args.py --weights yolov4.weights --cfg yolov4.cfg --image bus.jpg --img_size 320
  7. For videos: python Inference_args.py --weights yolov4.weights --cfg yolov4.cfg --video traffic_signs.mp4 --img_size 320

Resource Utilization

If you run the YOLOv4 inference using OpenCV on a CPU, you may experience high CPU usage, with utilization reaching above 90%. To improve performance and achieve a higher frame rate (FPS), it is recommended to use GPU acceleration.
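
If your OpenCV build was compiled with CUDA support, you can ask the dnn module to run the network on the GPU. Here is a minimal sketch; note that on a build without CUDA these calls silently fall back to CPU execution:

import cv2

# Assumes an OpenCV build compiled with CUDA (e.g. built from source with
# -D WITH_CUDA=ON); otherwise the dnn module falls back to the CPU.
net = cv2.dnn.readNetFromDarknet('yolov4.cfg', 'yolov4.weights')
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

In the code above, these two calls would go right after cv2.dnn.readNetFromDarknet in the Yolov4 constructor.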

Conclusion

In this blog post, we explored the YOLOv4 algorithm and learned how to implement it using OpenCV. We discussed the architecture, explained the code, and demonstrated how to perform object detection on images and videos. YOLOv4 is a powerful algorithm that achieves impressive accuracy while maintaining real-time performance, making it a valuable tool in various computer vision applications.

Feel free to experiment with the code and apply YOLOv4 to your own projects. Stay curious and keep exploring the exciting world of object detection.

Code files available here

Follow my Medium blog posts here to learn more about computer vision.
