Start your machine vision coding with Python
Motivation
Human beings perceive their environment through the vision system. The eye, brain, and limbs work together to sense the surroundings and act accordingly. An intelligent system is one that can perform tasks which would require a certain level of intelligence if performed by a human being. To perform such tasks, a computer needs a machine vision system as one of its key components. Typically, a camera captures images that carry the information needed to get the job done. Image processing and computer vision techniques then help us perform tasks similar to those performed by humans, such as image recognition and object tracking.
In computer vision, the camera works like a human eye to capture the image, and the processor works like a brain to process the captured image and generate meaningful results. But there is a basic difference between humans and computers. The human brain works automatically, and its intelligence is innate. The computer, on the contrary, has no intelligence without human instruction (a program). Computer vision is the way to provide the proper instructions so that a computer can function in a way comparable to the human vision system, although its capacity remains limited.
In the upcoming sections, we will discuss the basic idea of how an image is formed and how it can be manipulated using Python.
How the image is formed and displayed
An image is nothing more than a combination of pixels with different color intensities. The terms ‘pixel’ and ‘color intensity’ may be unfamiliar to you. Don’t worry. They will become clear if you read the article to the end.
A pixel is the smallest unit/element of a digital image. Details are in the image below.
The screen is made up of pixels. In the above figure, there are 25 columns and 25 rows, and each small square is considered a pixel. The grid therefore represents a screen with 25 x 25 = 625 pixels. If we make the pixels shine with different color intensities (brightness), they form a digital image.
How does the computer store the image in memory?
If we look closely at the image, we can compare it to a 2D matrix. A matrix has rows and columns, and its elements can be addressed by their indices. The structure of an image is similar to that of a matrix, so the computer stores the image as an array in memory.
Each element of the array contains the intensity value of a color. In general, the intensity value ranges from 0 to 255. For demonstration purposes, I’ve included a matrix representation of an image.
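The matrix view of an image can also be sketched in code. The 5 x 5 values below are made up for illustration and are not taken from the figure; the only assumption is that each pixel holds one 8-bit intensity.

```python
import numpy as np

# A hypothetical 5 x 5 grayscale image: one intensity value (0-255) per pixel.
image = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 255, 255, 255, 0],
        [0, 255, 128, 255, 0],
        [0, 255, 255, 255, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.uint8,  # 8-bit unsigned integers cover exactly 0..255
)

print(image.shape)  # (rows, columns)
print(image[2, 2])  # intensity of the pixel in row 2, column 2
```

Each entry is addressed by its row and column index, exactly like a matrix element.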
Color and grayscale image
A grayscale image is a black-and-white image. It is formed with a single channel of intensity values. A pixel value close to 0 represents darkness, and the pixel gets brighter with higher intensity values. The highest value is 255, which represents white. A 2D matrix is enough to hold a grayscale image, as the last figure shows.
A color image cannot be formed with a single color; it can contain millions of color combinations. Mainly, there are three primary color channels: Red (R), Green (G), and Blue (B). Each color channel is stored in a 2D matrix that holds its intensity values, and the final image is the combination of these three color channels.
This color model has (256 x 256 x 256) = 16,777,216 possible color combinations. You may visualize the combination here.
But in computer memory, the image is stored differently.
The computer does not know about the RGB channels; it only knows intensity values. The red channel is stored with high intensity values, and the green and blue channels are stored with medium and low intensity values, respectively.
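To make the idea of three stacked channels concrete, here is a small sketch. It assumes the common layout of rows x columns x channels with R, G, B order; the 2 x 2 pixel values are hypothetical.

```python
import numpy as np

# A hypothetical 2 x 2 color image in R, G, B channel order.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]      # pure red
image[0, 1] = [0, 255, 0]      # pure green
image[1, 0] = [0, 0, 255]      # pure blue
image[1, 1] = [255, 255, 255]  # white: all three channels at full intensity

red_channel = image[:, :, 0]   # each channel on its own is a 2D matrix

print(image.shape)   # (rows, columns, channels)
print(red_channel)
print(256 ** 3)      # number of possible (R, G, B) combinations
```

To the computer, the whole image is just this block of intensity values; the meaning "red", "green", or "blue" comes only from the position of a value along the last axis.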
NumPy basics for working with Python
NumPy is a fundamental Python package for scientific computing. It is built around an array object, but its functionality is not limited to arrays: the library also handles various numeric and logical operations on numbers [1]. You will find the official NumPy documentation here.
Let’s start our journey. First things first.
- Import of the NumPy library.
It’s time to work with NumPy. As we know, NumPy works with arrays. So, let’s try to create our first 2D array of zeros.
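A minimal sketch of the import and the array of zeros might look like this. The 3 x 4 shape and the name `a` are chosen only for illustration.

```python
import numpy as np

# 3 rows x 4 columns of zeros; the shape is passed as a tuple
a = np.zeros((3, 4))
print(a)
print(a.shape)
```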
It’s as simple as that. We can also create a NumPy array with all ones as follows.
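For example, assuming the same illustrative 3 x 4 shape, an array of ones can be created like this:

```python
import numpy as np

# 3 rows x 4 columns of ones
b = np.ones((3, 4))
print(b)
```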
Interestingly, NumPy also provides a method to fill the array with any value. The simple syntax array.fill(value) can do the job. The matrix ‘b’, which held all ones, is now filled with 3.
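A short sketch of the fill step, assuming `b` is a 3 x 4 array of ones (the shape is an illustrative choice):

```python
import numpy as np

b = np.ones((3, 4))
b.fill(3)   # modifies b in place and returns None
print(b)
```

Because `fill` works in place, there is no need to assign its result back to `b`.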
- The role of the seed in random number generation
Just take a look at the following coding examples.
In the first cell of code, we have used np.random.seed(seed_value), but we haven’t used any seeding for the other two code cells. There is a big difference between generating random numbers with and without seeding. With seeding, the generated random numbers remain the same for a specific seed value on every execution. Without a seed, on the other hand, the random numbers change with each execution.
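The seeding behavior described above can be sketched as follows. The seed value 42 and the use of `np.random.randint` are assumptions made for illustration; the original cells may have used a different generator function.

```python
import numpy as np

np.random.seed(42)                      # 42 is an arbitrary seed value
first = np.random.randint(0, 100, 5)

np.random.seed(42)                      # same seed again
second = np.random.randint(0, 100, 5)   # reproduces the same five numbers

third = np.random.randint(0, 100, 5)    # no re-seeding: the sequence simply continues

print(first)
print(second)
print(third)
```

The first two arrays are identical because the generator was reset to the same state before each call, while the third comes from wherever the generator happens to be.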
- Basic operations (max, min, mean, reshape, etc.) with NumPy
NumPy has made our lives easier by providing numerous functions to perform mathematical operations. The methods array_name.min(), array_name.max(), and array_name.mean() help us find the minimum, maximum, and mean values of an array. Coding example:
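A minimal sketch with a small, made-up 2 x 3 array:

```python
import numpy as np

arr = np.array([[4, 9, 2],
                [7, 1, 7]])

print(arr.min())    # smallest element
print(arr.max())    # largest element
print(arr.mean())   # arithmetic mean of all six elements
```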
The indices of the minimum and maximum values can be extracted with array_name.argmin() and array_name.argmax(). Example:
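Here is a sketch on a made-up 2 x 3 array. Note that for a 2D array these methods return an index into the flattened array; converting it back to a (row, column) pair with `np.unravel_index` is an extra step not mentioned in the text above.

```python
import numpy as np

arr = np.array([[4, 9, 2],
                [7, 1, 7]])

flat_max = arr.argmax()   # position of the maximum in the flattened array
flat_min = arr.argmin()   # position of the minimum in the flattened array

# Convert the flat index back to (row, column)
min_position = np.unravel_index(flat_min, arr.shape)

print(flat_max, flat_min, min_position)
```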
Reshaping is one of the important operations of NumPy. array_name.reshape(row_no, column_no) is the syntax for reshaping an array. When reshaping, we need to be careful about the number of elements in the array before and after the reshape: the total number of elements must be the same in both cases.
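A small sketch of reshaping, using 12 elements as an arbitrary example. It also shows what happens when the element counts do not match.

```python
import numpy as np

flat = np.arange(12)         # 12 elements: 0, 1, ..., 11
grid = flat.reshape(3, 4)    # 3 x 4 = 12, so this works
print(grid)

try:
    flat.reshape(5, 3)       # 5 x 3 = 15 elements, which does not match 12
except ValueError as error:
    print("reshape failed:", error)
```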
- Array indexing and slicing
Each element of the array can be addressed with its row and column number. Let’s generate another matrix with 10 rows and 10 columns.
Suppose we want to find the first value in the array. It can be extracted by passing the row and column index (0, 0).
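As a sketch, assume the matrix simply holds the numbers 0 to 99 (the original may have used random values instead):

```python
import numpy as np

# 10 x 10 matrix filled with 0..99, row by row
matrix = np.arange(100).reshape(10, 10)

first = matrix[0, 0]   # row 0, column 0
print(matrix.shape)
print(first)
```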
Row- and column-specific values can be sliced with the syntax array_name[row_no, :] and array_name[:, column_no].
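A short sketch of both forms, again assuming a 10 x 10 matrix holding 0 to 99; the row and column numbers are arbitrary.

```python
import numpy as np

matrix = np.arange(100).reshape(10, 10)

row = matrix[2, :]      # every column of row 2
column = matrix[:, 3]   # every row of column 3

print(row)
print(column)
```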
Let’s try to slice the central elements of the array.
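One way to do this, assuming the "central elements" means the middle 4 x 4 block of a 10 x 10 matrix (the exact range is an illustrative choice):

```python
import numpy as np

matrix = np.arange(100).reshape(10, 10)

# Rows 3..6 and columns 3..6; the end index of a slice is exclusive
center = matrix[3:7, 3:7]
print(center)
```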
OpenCV Basics
OpenCV is an open-source computer vision library, originally developed by Intel, with Python bindings [2]. I will discuss some uses of OpenCV, although its scope is wide. You will find the official documentation here.
I have used the following image for demonstration purposes.
- OpenCV and Matplotlib library import
Matplotlib is a visualization library. It helps to visualize the image.
- Loading the image with OpenCV and visualizing with matplotlib
We have read the image with OpenCV and visualized it with the matplotlib library. The colors have changed because OpenCV reads the image in BGR format instead of RGB, but matplotlib expects the picture in RGB format. So, we need to convert the image from BGR to RGB.
- Convert image from BGR to RGB format
Now, the image seems to be fine.
- Convert image to grayscale
We can easily convert the image from BGR to grayscale with the cv2.COLOR_BGR2GRAY conversion code. It is done as follows.
The image above does not look gray, even though it has been converted to grayscale. It has been visualized with matplotlib, which by default uses a color map other than grayscale. To display it correctly, we must specify the grayscale color map in matplotlib. Let’s do that.
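The difference can be sketched by plotting the same array twice. The gradient image below is a synthetic stand-in for the converted photo.

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in grayscale image: a horizontal gradient from black to white.
gray = np.tile(np.linspace(0, 255, 64).astype(np.uint8), (32, 1))

fig, (left, right) = plt.subplots(1, 2)
default_plot = left.imshow(gray)              # matplotlib's default color map
gray_plot = right.imshow(gray, cmap="gray")   # explicit grayscale color map
left.set_title("default")
right.set_title('cmap="gray"')
# In a script, call plt.show() to open the window; notebooks render the figure automatically.
```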
Rotating is also an easy task with OpenCV. The cv2.rotate() function helps us do that. 90-degree clockwise, 90-degree anticlockwise, and 180-degree rotations are shown below.
We can resize the image by passing the width and height pixel values to the cv2.resize() function.
Sometimes we need to draw over an existing image. For example, we may need to draw a bounding box around an object in an image to identify it. Let’s draw a rectangle on the flower. The cv2.rectangle() function helps to draw it. It takes parameters such as the image on which we draw the rectangle, the coordinates of the upper-left corner (pt1) and the lower-right corner (pt2), the color, and the thickness of the boundary line. Below is a coding example.
There are other drawing functions, such as cv2.line(), cv2.circle(), cv2.ellipse(), and cv2.putText(). The full official documentation is available here [3].
Play with NumPy
We will change the intensity values of an image. I’ll try to keep it simple, so consider the grayscale image shown above. First, find the shape of the image.
It shows that the image is a 2D array with a size of 1200 x 1920. In the NumPy basics section, we learned how to slice an array.
Using that concept, we have taken the segment [400:800, 750:1350] of the grayscale image matrix and replaced its intensity values with 255. Finally, we visualize it and get the image above.
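The same steps can be sketched on a synthetic array of the same size. The uniform value 100 is a stand-in for the real grayscale photo.

```python
import numpy as np

# Stand-in for the grayscale image: 1200 rows x 1920 columns of mid-gray.
gray = np.full((1200, 1920), 100, dtype=np.uint8)
print(gray.shape)

# Rows 400..799 and columns 750..1349 become white.
gray[400:800, 750:1350] = 255

print(gray[400, 750], gray[800, 1350])   # inside the block, just outside it
```

Displaying the result with a grayscale color map would show a white rectangle in the middle of the image.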
Conclusion
Computer vision is one of the promising fields in modern computer technology. I always emphasize basic knowledge of any domain. I have discussed only the primary knowledge of computer vision and shown some practical coding. The concepts are very simple but can play an important role for the computer vision beginner.
This is the first article of the computer vision series. Stay connected to read the upcoming articles.
[N.B. Instructor Jose Portilla’s course helps me to gather knowledge.]