Introduction to basic computer vision 1 - Basic operations with OpenCV

Viraj Kadam
4 min read · Aug 6, 2024

This article covers basic operations on images with OpenCV.

First, we import OpenCV and define a small helper function that displays images using matplotlib.

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt

from IPython import display


def show_image(image,
               title=None,
               cmap=None,
               fig_s=(10, 6)):
    '''show loaded image'''
    plt.figure(figsize=fig_s)
    plt.imshow(image, cmap=cmap)
    plt.axis('off')
    if title:
        plt.title(f'{title}')

    plt.show()

Next, we will see how to load an image using OpenCV. The loaded image is in BGR (blue, green, red) format, which is the order of the image channels in the array.

# load an image

# params
# 1) filename = path to the image
# 2) flags = 1 (cv2.IMREAD_COLOR, the default), 0 (cv2.IMREAD_GRAYSCALE), -1 (cv2.IMREAD_UNCHANGED)

# cat_path is assumed to hold the path to the image file (it is not defined in this article)
cat_1 = cv2.imread(filename=cat_path,
                   flags=1)

Most other image libraries, including matplotlib, expect the image to be in RGB format, so a BGR image would be displayed with its red and blue channels swapped. The next line of code converts the image from BGR to RGB.

#convert the image from bgr to rgb
cat = cv2.cvtColor(cat_1,cv2.COLOR_BGR2RGB)

Now we can display the image using the show_image function defined above.

Checking the properties of the image object

Now that we have loaded the image, we will explore the basic properties of the loaded image object, and how to manipulate it.

The image object is actually a NumPy array, so it exposes the usual array attributes such as its shape and data type.
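As a minimal sketch of what these properties look like, the snippet below inspects a synthetic array that stands in for a loaded color image (the 300 x 400 size is an assumption chosen for the example, not the size of the cat image).

```python
import numpy as np

# Stand-in for a loaded color image: 300 rows, 400 columns, 3 channels.
image = np.zeros((300, 400, 3), dtype=np.uint8)

print(type(image))   # the array class
print(image.shape)   # (height, width, channels)
print(image.dtype)   # uint8: each value is an integer in 0..255
print(image.size)    # total number of values = height * width * channels
print(image[0, 0])   # the three channel values of the top-left pixel
```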

Image indexing and geometric operations

Since the image object is a NumPy array, we can manipulate the image with the same operations we use on any NumPy array.

Selecting individual image bands

Since the image is a stack of individual color channels, we can separate them. Note that cat was converted to RGB above, so index 0 is red, index 1 is green and index 2 is blue (for the original BGR image cat_1 the order would be reversed).

#selecting individual bands (cat is in RGB order)
red = cat[:,:,0]
green = cat[:,:,1]
blue = cat[:,:,2]


#the same result can be obtained with cv2.split:
red,green,blue = cv2.split(cat)

Merge individual color channels
We can merge the split image channels back into a single image.

#merge image bands 

img_m = cv2.merge([red,green,blue])

show_image(img_m,'merged channels')
Figure: merged individual channels of the image

Select specific region
We can crop the image with NumPy slicing. The first range selects rows (the y direction) and the second selects columns (the x direction).

sub = cat[100:250,100:250,:].copy()

show_image(sub,'Sampled Image')
Figure: cropped image
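The .copy() in the crop above matters because a plain NumPy slice is a view that shares memory with the original array. Here is a minimal sketch of the difference, using a synthetic image filled with the value 200 (an assumption for illustration).

```python
import numpy as np

# Synthetic image where every value is 200.
image = np.full((300, 300, 3), 200, dtype=np.uint8)

view = image[100:250, 100:250, :]         # shares memory with image
crop = image[100:250, 100:250, :].copy()  # independent copy

crop[:] = 0
after_crop_edit = int(image[100, 100, 0])  # original is untouched

view[:] = 0
after_view_edit = int(image[100, 100, 0])  # original has changed

print(crop.shape, after_crop_edit, after_view_edit)
```

Without the copy, editing the cropped region would silently modify the source image as well.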

We can also change the pixel values of a specific region after selecting it.

#change the pixel values 

sub[50:100,50:100,:] = 0

show_image(sub,'added patch')
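The same kind of assignment works with a color instead of a single number, because NumPy broadcasts a three-value tuple across every pixel of the selected region. A small sketch on a white synthetic image (a stand-in for the cropped cat image):

```python
import numpy as np

# White 150x150 stand-in for the cropped image, in RGB order.
sub = np.full((150, 150, 3), 255, dtype=np.uint8)

# Paint a red patch: the tuple is broadcast to every pixel in the region.
sub[50:100, 50:100] = (255, 0, 0)

print(sub[75, 75])  # a pixel inside the patch
print(sub[0, 0])    # a pixel outside the patch
```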

Geometric operations on Image

Resize: Resizing converts an image from its original dimensions to new ones. The following code resizes the image. Several interpolation algorithms can be used for resizing; here we use bicubic interpolation (cv2.INTER_CUBIC).

# check the original dimensions
height, width = cat.shape[:2]  # shape is (height, width, channels)

# resize using cv2.resize; the target size is given as (width, height)
res = cv2.resize(cat,  # image
                 (200, 200),  # size
                 interpolation=cv2.INTER_CUBIC
                 )

show_image(res, 'Resized Image to be 200*200')

Translation: Translation shifts the location of an object in the image. We perform it by defining a 2x3 translation matrix whose last column holds the shift along x and y.

x_, y_ = 120, 100  # shift in pixels along x (to the right) and y (downwards)


# matrix defining the translation
translation_matrix = np.float32([[1, 0, x_],
                                 [0, 1, y_]])
rows, cols = cat.shape[:2]
transl = cv2.warpAffine(cat, translation_matrix, (cols, rows))
show_image(transl, 'Translation')

Rotation: OpenCV can rotate an image by any angle. We pass the center of rotation, the angle in degrees and a scale factor to cv2.getRotationMatrix2D, which returns a rotation matrix that we then apply to the image with cv2.warpAffine.

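#get a rotation matrix for rotation by 90 deg (positive angles rotate counter-clockwise)
Rot_Mat = cv2.getRotationMatrix2D(center=((cols-1)/2.0, (rows-1)/2.0),  # center of image
                                  angle=90,
                                  scale=1)

#perform the rotation; the output keeps the original (cols, rows) size,
#so parts of a non-square image are cut off
rot = cv2.warpAffine(cat, Rot_Mat, (cols, rows))

show_image(rot, 'Rotation by 90')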

In the next article, we will explore how to use these basic functions for tasks such as identifying text blocks and extracting text from documents using OpenCV and Tesseract.
