In this blog post, I explain, line by line, the code for object detection with a pre-trained YOLOv3 model. The model was trained on the COCO dataset, which has 80 class labels. The weights and cfg files can be downloaded from the official YOLO website: https://pjreddie.com/darknet/yolo/
image = cv2.imread('./testing images/crosswalk-featured.jpg')
#cv2.imshow('image',image)
#cv2.waitKey()
#cv2.destroyAllWindows()
original_with , original_height = image.shape[1] , image.shape[0]
Neural_Network = cv2.dnn.readNetFromDarknet('./Files/yolov3.cfg','./Files/yolov3.weights')
classes_names = []
# Read one class name per line; the with-block closes the file afterwards
with open('./Files/class_names','r') as k:
    for i in k.readlines():
        classes_names.append(i.strip())
#print(classes_names)
# The 4th positional parameter of blobFromImage is mean, so swapRB is passed by keyword
blob = cv2.dnn.blobFromImage(image , 1/255 , (320,320) , swapRB = True , crop = False)
#print(blob.shape)
Neural_Network.setInput(blob)
cfg_data = Neural_Network.getLayerNames()
#print(cfg_data)
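layer_names = Neural_Network.getUnconnectedOutLayers()
# Indices are 1-based; flatten() also handles OpenCV versions that return a nested (N, 1) array
outputs = [cfg_data[i-1] for i in np.array(layer_names).flatten()]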
#print(outputs)
output_data = Neural_Network.forward(outputs)
prediction_box , bounding_box , confidence , class_labels = bounding_box_prediction(output_data)
final_prediction(prediction_box , bounding_box , confidence , class_labels , original_with / 320 , original_height / 320 )
- The first line reads the image file crosswalk-featured.jpg from the directory testing images and stores it as a NumPy array in the variable image.
- The next three commented-out lines would display the image in an OpenCV window.
- The next line reads the width and height of the image from image.shape (which is ordered height, width, channels) and stores them in the variables original_with and original_height, respectively.
- The line cv2.dnn.readNetFromDarknet('./Files/yolov3.cfg','./Files/yolov3.weights') loads the pre-trained YOLOv3 model in Darknet format. The two arguments are the paths to the configuration file and the weights file, respectively.
- The next lines read the class names of the COCO dataset from the file class_names, one per line, and store them in the list classes_names.
- The cv2.dnn.blobFromImage() function creates a 4-dimensional blob (batch, channels, height, width) from the input image, which is the input format the network expects. The arguments are the input image, the scale factor (1/255 maps pixel values into the range 0 to 1), the output size (320 x 320), and a flag to swap the red and blue channels, since OpenCV loads images in BGR order; crop = False resizes the image without cropping it. Note that the fourth positional parameter of blobFromImage() is mean (the mean subtraction values), so the swap flag should be passed by keyword as swapRB = True.
- The setInput() function sets the blob as the input to the network.
- The getLayerNames() function returns the names of all layers in the network.
- The getUnconnectedOutLayers() function returns the indices of the output layers, that is, the layers whose output is not consumed by any other layer. These indices are 1-based, which is why the code subtracts 1 when looking up the names. In YOLOv3 they correspond to the three detection layers at positions 82, 94, and 106 of the Darknet configuration.
- The forward() function performs a forward pass and returns the predictions for the given input blob. The outputs variable holds the names of the unconnected output layers, and output_data holds one array of predictions per output layer.
- The bounding_box_prediction() function extracts bounding box coordinates, class labels, and confidence scores from the predictions, then removes overlapping duplicates with non-maximum suppression, which is based on IoU (Intersection over Union).
- The final_prediction() function draws the surviving bounding boxes on the input image, along with the predicted class label and confidence score. original_with / 320 and original_height / 320 are the scaling factors that convert the box coordinates from the 320 x 320 network input back to the original size of the image.
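To make the blob step more concrete, here is a minimal pure-Python sketch of the transformation described above: scaling pixel values, swapping BGR to RGB, and rearranging the data into batch, channels, height, width order. The helper to_blob and the tiny 1 x 2 test image are made up for illustration, and resizing is left out; this is not how OpenCV implements blobFromImage(), only a picture of the resulting layout.

```python
def to_blob(image_bgr, divisor=255.0, swap_rb=True):
    """Turn an H x W x 3 nested list into a 1 x 3 x H x W nested list.

    Hypothetical helper for illustration only; it does not resize the image.
    Dividing by 255 is the same idea as the scale factor 1/255.
    """
    height, width = len(image_bgr), len(image_bgr[0])
    channel_order = [2, 1, 0] if swap_rb else [0, 1, 2]  # BGR -> RGB when swapping
    planes = []
    for c in channel_order:
        # One full height x width plane per colour channel
        plane = [[image_bgr[y][x][c] / divisor for x in range(width)]
                 for y in range(height)]
        planes.append(plane)
    return [planes]  # leading batch dimension of size 1


# A made-up 1 x 2 image in BGR order: one pure-blue pixel, one pure-red pixel
tiny = [[[255, 0, 0], [0, 0, 255]]]
blob = to_blob(tiny)

# Shape is (batch, channels, height, width)
print(len(blob), len(blob[0]), len(blob[0][0]), len(blob[0][0][0]))  # 1 3 1 2
print(blob[0][0])  # red plane comes first after the swap: [[0.0, 1.0]]
```

The real blob for this post has shape (1, 3, 320, 320), which is what the commented-out print(blob.shape) line would show.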
def bounding_box_prediction(output_data):
    bounding_box = []
    class_labels = []
    confidence_score = []
    for i in output_data:                # one array per output layer
        for j in i:                      # one row per candidate box
            high_label = j[5:]           # class scores follow the 4 box values and the objectness score
            classes_ids = np.argmax(high_label)
            confidence = high_label[classes_ids]
            if confidence > Threshold:
                # Box values are normalized, so scale them to the network input size
                w , h = int(j[2] * image_size) , int(j[3] * image_size)
                # Convert the box center to its top-left corner
                x , y = int(j[0] * image_size - w/2) , int(j[1] * image_size - h/2)
                bounding_box.append([x,y,w,h])
                class_labels.append(int(classes_ids))
                # Plain Python floats are the safest input for NMSBoxes
                confidence_score.append(float(confidence))
    prediction_boxes = cv2.dnn.NMSBoxes(bounding_box , confidence_score , Threshold , .6)
    return prediction_boxes , bounding_box ,confidence_score,class_labels
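As a worked example of what the inner loop does with a single row, the sketch below decodes one made-up output row in plain Python. The helper name decode_row and the row values are assumptions for illustration, and the row has only 3 class scores instead of the 80 that the COCO model produces.

```python
def decode_row(row, input_size=320, threshold=0.5):
    """Return (box, class_id, score) for one row, or None if below threshold.

    Assumed row layout: center x, center y, width, height (all normalized),
    objectness, then one score per class.
    """
    class_scores = row[5:]
    # Pure-Python equivalent of np.argmax
    class_id = max(range(len(class_scores)), key=lambda k: class_scores[k])
    score = class_scores[class_id]
    if score <= threshold:
        return None
    w = int(row[2] * input_size)
    h = int(row[3] * input_size)
    # Shift from the box center to the top-left corner
    x = int(row[0] * input_size - w / 2)
    y = int(row[1] * input_size - h / 2)
    return [x, y, w, h], class_id, score


# Made-up row: centered box, a quarter of the width and half of the height
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
result = decode_row(row)
print(result)  # ([120, 80, 80, 160], 1, 0.8)
```

The box comes out in the coordinates of the 320 x 320 network input, not of the original image.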
The function bounding_box_prediction() returns the bounding box, class label, and confidence score of each detected object. It takes output_data (the output of the YOLOv3 neural network) as its argument. Inside the function, two nested loops visit every row of every output array. Each row holds the box center, width, and height (normalized to the range 0 to 1), an objectness score, and then one score per class, which is why the class scores start at index 5. np.argmax() gives the index of the highest class score, and that score is used as the confidence. If the confidence is higher than Threshold, the bounding box coordinates, class label, and confidence score are appended to their respective lists. Finally, the non-maximum suppression algorithm is applied to the bounding boxes using cv2.dnn.NMSBoxes(), with Threshold as the score threshold and 0.6 as the overlap threshold; it returns the indices of the boxes to keep.
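To show what non-maximum suppression does with overlapping boxes, here is a small greedy version in plain Python. It follows the general idea (keep the highest-scoring box, drop boxes that overlap it too much, repeat) and is only a sketch; it is not claimed to match the internals of cv2.dnn.NMSBoxes(). The function names and the sample boxes are made up.

```python
def iou(a, b):
    """Intersection over Union of two [x, y, w, h] boxes (x, y = top-left corner)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold=0.5, nms_threshold=0.6):
    """Return the indices of the boxes that survive suppression."""
    # Drop low scores, then visit the rest from best to worst
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap an already kept box too much
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept


# Two near-duplicate boxes on one object, plus one box elsewhere
boxes = [[100, 100, 50, 50], [102, 102, 50, 50], [300, 300, 40, 40]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The first two boxes have an IoU of roughly 0.85, which is above 0.6, so the lower-scoring one is dropped.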
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
font = cv2.FONT_HERSHEY_COMPLEX
Threshold = 0.5
image_size = 320
def final_prediction(prediction_box , bounding_box , confidence , class_labels,width_ratio,height_ratio):
    # NMSBoxes can return an empty tuple, so convert to an array before flattening
    for j in np.array(prediction_box).flatten():
        x, y , w , h = bounding_box[j]
        # Scale the box from the network input size back to the original image
        x = int(x * width_ratio)
        y = int(y * height_ratio)
        w = int(w * width_ratio)
        h = int(h * height_ratio)
        label = str(classes_names[class_labels[j]])
        conf_ = str(round(confidence[j],2))
        cv2.rectangle(image , (x,y) , (x+w , y+h) , (0,0,255) , 2)
        cv2.putText(image , label+' '+conf_ , (x , y-2) , font , .2 , (0,255,0),1)
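This section imports the libraries and defines a few global settings. Of the imports, the code in this post actually uses only NumPy, together with OpenCV (cv2), which must be imported as well; Pandas and Matplotlib are imported but not used. font is the typeface used when drawing the label and confidence score of each detected object. The Threshold variable sets the minimum confidence score a detection needs to be kept, while the image_size variable is the side length, in pixels, of the square input that the image is resized to before it is fed to the network.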
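The choice of image_size also determines how many candidate boxes the network produces. YOLOv3 predicts on three grids with strides 32, 16, and 8, with 3 anchor boxes per grid cell, and each candidate row has 4 box values, 1 objectness score, and 80 class scores. The short calculation below works this out; the function name is made up for illustration.

```python
def yolov3_output_shapes(input_size=320, num_classes=80, anchors_per_cell=3):
    """Expected (rows, columns) of each of the three YOLOv3 output arrays."""
    shapes = []
    for stride in (32, 16, 8):
        cells = input_size // stride                 # grid cells per side
        rows = cells * cells * anchors_per_cell      # candidate boxes on this grid
        shapes.append((rows, 5 + num_classes))       # 4 box values + objectness + classes
    return shapes


shapes = yolov3_output_shapes(320)
print(shapes)                             # [(300, 85), (1200, 85), (4800, 85)]
print(sum(rows for rows, _ in shapes))    # 6300
```

So with image_size = 320, the two loops in bounding_box_prediction() visit 6300 rows in total. A larger input size produces more candidates and is slower to process.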
The function final_prediction() draws a bounding box around each detected object and writes its label and confidence score next to it. Its arguments are prediction_box (the indices of the boxes kept by the NMS algorithm), bounding_box (the coordinates of all candidate boxes), confidence (the confidence score of each candidate), class_labels (the class index of each candidate), width_ratio (the ratio of the original image width to the processed image width), and height_ratio (the ratio of the original image height to the processed image height).
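The effect of width_ratio and height_ratio is easiest to see with numbers. The sketch below rescales one box from the 320 x 320 network input to a made-up 1280 x 720 original image; scale_box is a hypothetical helper that repeats the four multiplications from final_prediction().

```python
def scale_box(box, width_ratio, height_ratio):
    """Map an [x, y, w, h] box from network-input pixels to original-image pixels."""
    x, y, w, h = box
    return [int(x * width_ratio), int(y * height_ratio),
            int(w * width_ratio), int(h * height_ratio)]


# Made-up sizes for illustration
original_width, original_height = 1280, 720
network_size = 320

box_320 = [120, 80, 80, 160]
scaled = scale_box(box_320,
                   original_width / network_size,    # 4.0
                   original_height / network_size)   # 2.25
print(scaled)  # [480, 180, 320, 360]
```

Because the image was resized to a square without preserving its aspect ratio, the horizontal and vertical ratios differ, which is why the function takes two separate values.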
Test Image:
Complete code:
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
font = cv2.FONT_HERSHEY_COMPLEX
Threshold = 0.5
image_size = 320
def final_prediction(prediction_box , bounding_box , confidence , class_labels,width_ratio,height_ratio):
    # NMSBoxes can return an empty tuple, so convert to an array before flattening
    for j in np.array(prediction_box).flatten():
        x, y , w , h = bounding_box[j]
        # Scale the box from the network input size back to the original image
        x = int(x * width_ratio)
        y = int(y * height_ratio)
        w = int(w * width_ratio)
        h = int(h * height_ratio)
        label = str(classes_names[class_labels[j]])
        conf_ = str(round(confidence[j],2))
        cv2.rectangle(image , (x,y) , (x+w , y+h) , (0,0,255) , 2)
        cv2.putText(image , label+' '+conf_ , (x , y-2) , font , .2 , (0,255,0),1)
def bounding_box_prediction(output_data):
    bounding_box = []
    class_labels = []
    confidence_score = []
    for i in output_data:                # one array per output layer
        for j in i:                      # one row per candidate box
            high_label = j[5:]           # class scores follow the 4 box values and the objectness score
            classes_ids = np.argmax(high_label)
            confidence = high_label[classes_ids]
            if confidence > Threshold:
                # Box values are normalized, so scale them to the network input size
                w , h = int(j[2] * image_size) , int(j[3] * image_size)
                # Convert the box center to its top-left corner
                x , y = int(j[0] * image_size - w/2) , int(j[1] * image_size - h/2)
                bounding_box.append([x,y,w,h])
                class_labels.append(int(classes_ids))
                # Plain Python floats are the safest input for NMSBoxes
                confidence_score.append(float(confidence))
    prediction_boxes = cv2.dnn.NMSBoxes(bounding_box , confidence_score , Threshold , .6)
    return prediction_boxes , bounding_box ,confidence_score,class_labels
image = cv2.imread('./testing images/crosswalk-featured.jpg')
#cv2.imshow('image',image)
#cv2.waitKey()
#cv2.destroyAllWindows()
original_with , original_height = image.shape[1] , image.shape[0]
Neural_Network = cv2.dnn.readNetFromDarknet('./Files/yolov3.cfg','./Files/yolov3.weights')
classes_names = []
# Read one class name per line; the with-block closes the file afterwards
with open('./Files/class_names','r') as k:
    for i in k.readlines():
        classes_names.append(i.strip())
#print(classes_names)
# The 4th positional parameter of blobFromImage is mean, so swapRB is passed by keyword
blob = cv2.dnn.blobFromImage(image , 1/255 , (320,320) , swapRB = True , crop = False)
#print(blob.shape)
Neural_Network.setInput(blob)
cfg_data = Neural_Network.getLayerNames()
#print(cfg_data)
layer_names = Neural_Network.getUnconnectedOutLayers()
# Indices are 1-based; flatten() also handles OpenCV versions that return a nested (N, 1) array
outputs = [cfg_data[i-1] for i in np.array(layer_names).flatten()]
#print(outputs)
output_data = Neural_Network.forward(outputs)
prediction_box , bounding_box , confidence , class_labels = bounding_box_prediction(output_data)
final_prediction(prediction_box , bounding_box , confidence , class_labels , original_with / 320 , original_height / 320 )
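One portability note on the lookup of output layer names with cfg_data[i-1]: it relies on the indices being 1-based, and, depending on the OpenCV version, getUnconnectedOutLayers() may return either a flat sequence of indices or a nested one with a single index per row. The sketch below uses plain Python lists and made-up layer names to show a lookup that tolerates both shapes; the helper name is hypothetical, and with real NumPy results the same effect can be had by flattening the array first. OpenCV also provides getUnconnectedOutLayersNames(), which returns the names directly.

```python
def output_layer_names(all_names, unconnected_indices):
    """Map 1-based layer indices (flat or nested) to layer names."""
    flat = []
    for entry in unconnected_indices:
        # Some versions wrap each index in its own one-element sequence
        if isinstance(entry, (list, tuple)):
            flat.extend(entry)
        else:
            flat.append(entry)
    # Subtract 1 because the indices start at 1, while Python lists start at 0
    return [all_names[i - 1] for i in flat]


# Made-up layer names for illustration; the real network has many more layers
names = ['conv_0', 'bn_0', 'relu_0', 'yolo_a', 'conv_1', 'yolo_b']

print(output_layer_names(names, [4, 6]))       # ['yolo_a', 'yolo_b']
print(output_layer_names(names, [[4], [6]]))   # ['yolo_a', 'yolo_b']
```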
Yolov3 Detection:
A complete explanation of the YOLOv3 architecture is available here: https://medium.com/p/74cf9ade2044/edit
LinkedIn: https://www.linkedin.com/feed/
Computer vision Blogs: https://medium.com/me/stories/public