Design and Development of AI based Camera for Wildlife Study

Shubham Rakshe
20 min read · Jul 10, 2021


Submitted in partial fulfillment of the B.E. degree in Electronics Engineering, as laid down by the University of Mumbai, during the academic year 2020–2021.

Authors:

Shubham Rakshe (linkedin.com/in/shubham-rakshe-861842193)

& Ashish Pradhan (https://www.linkedin.com/in/ashish-pradhan1414)

ABSTRACT:-

Visual data analytics is increasingly becoming an important part of wildlife monitoring and conservation strategies. In this work, we discuss our pattern recognition solution for identifying individual leopards from images, which can easily be extended to other wildlife species. Various factors, such as poor image quality, lighting and pose variations, and limited images per identity, make leopard identification a difficult task for the Wildlife Institute of India (WII). Consequently, we propose to utilize a deep learning model. The proposed algorithm is based on a convolutional neural network (CNN) built with a Sequential model. We also utilize several data augmentation techniques to improve the model's accuracy, robustness, and generalization across views and image quality variations. The algorithm achieved 77% accuracy on our training and testing sets. To capture the images and provide a cost-effective identification solution, a sensor-based camera equipped with the Nvidia Jetson Nano module has been developed, providing faster and better analysis of the captured images. Thus, this project will be helpful in wildlife research by providing improved image quality and advanced techniques for analysis.

INTRODUCTION:-

Mumbai city is gifted with a wide range of biodiversity. Amid the city lies the forest area of Sanjay Gandhi National Park. This forest has a good number of leopards, and protecting them and understanding their lives is very important for preserving biodiversity. Wildlife researchers are actively involved in the census as well as in tracking activities and handling crises that arise in the forest area from time to time. In the leopard census, usually carried out in the month of May every year, the team sets up camera traps at strategic locations in the forest, and the pictures taken are analyzed by experts. The leopards are identified based on the pattern of spots on their bodies, so expert knowledge is required for this highly skillful job. The cameras used are equipped with motion and temperature sensors and are activated when animals cross them while travelling through the jungle. Many times, the picture quality is hampered, which makes identification difficult. Moreover, human expert knowledge is required to identify them. The cameras currently used by the forest department are imported and costly. They also lag behind the technological developments happening in western countries.

In this project, we present the results of classifying the images that were provided to us. The dataset is quite small and thus poses problems for AI-based algorithms, which require a large dataset. However, the current dataset has been improved using techniques like data augmentation, to increase the number of images, and filtering, to improve their quality. A convolutional neural network is trained on the dataset of six individual leopards, and the model's accuracy is tested on a separate database.

We also propose to develop a camera equipped with thermal and motion sensors. The camera module pairs an ESP32-Cam module with an Nvidia Jetson to improve the quality of the pictures taken and to preprocess the images, providing the clean and precise database required for accurately identifying individual leopards.

PROPOSED SYSTEM

Problem Statement:-

● To develop a pattern recognition algorithm for identifying individual leopards using the database of images provided by wildlife researchers.

● To develop a sensor-based camera which is cost effective and provides images comparable with the Cuddeback camera currently in use.

● To introduce advanced features for image processing and classification using a special processing unit in the camera module.

Project Usefulness & Social Impact:

Project Usefulness:

The trail cameras used for wildlife census are currently imported and very costly. This project will result in a device useful for wildlife research and the ecological study of wild animals. Hence, it can be a part of the Make-in-India mission.

Social Impact:

Development of proprietary hardware technology for the product fits under "Make in India 2.0" or "Atmanirbhar Bharat", specifically under the PLI scheme of the Government of India.

It will help in the analysis of wildlife, so that more accurate animal surveys, like Poonam Avlokan, can be conducted.

It will help in identifying & demarcating green corridors.

Architecture:

The block diagram of the proposed system is shown in Figure 1. It provides a system for Pattern recognition & Classification by using a Deep Learning algorithm (CNN) & techniques.

Figure 1: Block Diagram of System

The camera, controlled via the ESP32, captures images of animals; the images are then transferred to the NVIDIA Jetson Nano for classification (currently of leopards).

HARDWARE AND SOFTWARE DESIGN AND DEVELOPMENT

1. Hardware Description:

This section describes the hardware development of the proposed AI-based wildlife camera and the components used therein. The components have been selected to develop an efficient and cost-effective product that captures the best possible quality of leopard images at exactly the right moment.

An ESP32-Cam based, motion-triggered image capturing device with a PIR motion sensor is used for image capture and primary image processing. We have accommodated all the components on a veroboard and connected them with soldered lines. The captured image is then sent to a pre-programmed email address. A copy of the captured image is also saved on the onboard microSD card of the ESP32-Cam.

⮚ Introduction to ESP32-Cam:-

The ESP32-Cam is a very small camera module with the ESP32-S chip. Besides the OV2640 camera and several GPIOs to connect peripherals, it also features a microSD card slot that can be useful to store images taken with the camera or to store files to serve to clients.

Figure 2: ESP32-Cam Module

Unlike the NodeMCU-ESP8266, which comes with an onboard micro-USB connector, the ESP32-Cam has no onboard USB connector, so we need an FTDI programmer to upload code through the U0R and U0T pins (serial pins).

Figure 3: FTDI Programmer

There are three GND pins and two pins for power: either 3.3V or 5V.

GPIO 1 and GPIO 3 are the serial pins; we need these pins to upload code to the board. Additionally, GPIO 0 plays an important role, since it determines whether the ESP32 is in flashing mode: when GPIO 0 is connected to GND, the ESP32 is in flashing mode.

Circuit Diagram:

The figure below gives the circuit schematic for the ESP32-Cam based motion-triggered image capturing device. The intermediate trigger circuit between the motion sensor and the ESP32 generates an interrupt to wake up the ESP32-Cam module when motion is detected by the PIR sensor. As noted above, the ESP32-Cam has no onboard USB connector, so we use the FTDI programmer to upload code through the U0R and U0T serial pins, with GPIO 0 (IO0) connected to GND to put the ESP32-Cam in flash mode for uploading.

Figure 4: ESP32-Cam based Motion Triggered Image Capturing Device

Trail Cameras and Proposed Camera Module:

Figure 5: Proposed camera (left) and trail camera (right)

In national parks, trail cameras are mounted on tree trunks at strategic locations to understand animal behaviour. The trail cameras currently in use are of the Cuddeback brand, which are very costly and imported. Moreover, we get them only after the technology has aged in other countries like the USA. The figure above shows the Cuddeback camera and the proposed ESP32-Cam.

Cuddeback cameras are known for their image quality. They are equipped with motion and temperature sensors and an LED flash. Whenever motion is detected in front of the camera, it wakes up, takes a picture, and saves it to the memory card. During the leopard census, these cameras are installed at strategic locations and positions so that pictures are taken from both sides of the leopard. Initially, a camera is put in test mode to check whether it clicks an image or not; then it is put in ARM mode. All of this is done manually by the census teams.

To compare the images taken by the two cameras, we use the SIFT algorithm in Python to find the keypoints and descriptors of the two images, and then FlannBasedMatcher to find matches between the descriptors of the two images.
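Below is a minimal sketch of this comparison, assuming OpenCV with SIFT available (as in opencv-contrib); the image file names are placeholders:

```python
import cv2

# Load the two images to compare (placeholder file names).
img1 = cv2.imread("cuddeback_sample.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("esp32cam_sample.jpg", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and compute descriptors for both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN matcher with a KD-tree index, the usual choice for SIFT.
index_params = dict(algorithm=1, trees=5)  # 1 = FLANN_INDEX_KDTREE
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test keeps only distinctive matches.
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])

print(f"Good matches between the two images: {len(good)}")
```

The more good matches found, the more similar the two captures are.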

2. Software:

The software part covers the classification of leopard images using a deep learning algorithm in Python. The libraries used for the CNN are NumPy, OpenCV (cv2), OS, TensorFlow, ImageDataGenerator, etc. The following sections give the important steps used in this project.

Raw Data Set:

The dataset of leopard images was provided by Mr. Nikit Surve. The following pictures are part of the database provided by leopard researchers. These images were taken using the Cuddeback night vision camera, which has thermal and motion sensors.

Figure 6: Different views of the same leopard, L24 (left and right flanks)

Leopard Classification (Manual Mode):

The images are analyzed by human experts, who check the pattern on both sides of the leopard's body (the middle belly region), as shown below. These patterns are unique to each leopard, and great expertise is required to study and match them. The figure below shows how the experts check the images. Once both patterns are checked, the leopard seen by the camera is identified and labelled.

Figure 7: Exact part used for feature detection (L21 left, L24 right)

Size of Data Set:

We used 260 total images: 156 for training and 104 for testing. Our classification was done in two ways: first using a simple Sequential 2D convolutional neural network, and then, on the Nvidia Jetson Nano, using a transfer learning technique with ResNet-18, for which all 260 images were used for training, testing, and validation.

WORKING OF ALGORITHM :

● The classification is done using a CNN (convolutional neural network). The code is written in a Python environment; the further coding for the hardware is done in C/C++ in a Linux environment.

Libraries used: NumPy, OpenCV (cv2), OS, TensorFlow, ImageDataGenerator, etc.

Loading the Dataset:

■ For the dataset, we used a total of 260 real images taken by wildlife photographers at Sanjay Gandhi National Park, of which 156 images form the training dataset and 104 the test dataset.

■ All the images are classified according to the labels assigned to each leopard: L24, L25, L54, L61, L66 & L82.

■ To adjust the size of the images, we rescale them using the ImageDataGenerator function, and we convert all the images into NumPy array format (a sketch of this step follows Figure 8).

■ The following are different shots of the leopard L24, as one example of how we prepare the input dataset.

Figure 8: Data Augmentation and Preparation for Classification
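The following is a minimal sketch of the loading and rescaling step, assuming the images are organised in train/ and test/ folders with one sub-folder per leopard label (the paths, image size, and batch size are assumptions, not the project's exact values):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from [0, 255] to [0, 1].
train_datagen = ImageDataGenerator(rescale=1.0 / 255)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)

# flow_from_directory reads the images as NumPy arrays, resizes them to
# one common size, and one-hot encodes the six labels (L24 ... L82).
train_data = train_datagen.flow_from_directory(
    "dataset/train",          # hypothetical path
    target_size=(128, 128),   # assumed common image size
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=16,
)
test_data = test_datagen.flow_from_directory(
    "dataset/test",
    target_size=(128, 128),
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=16,
)
```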

Image Enhancement:

As shown in the images above, most of the images were edited to enlarge the dataset; the better the dataset, the better the classification accuracy. The images are converted into binary images (1s & 0s / black & white), as sketched below.
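A minimal sketch of this binarisation step, assuming OpenCV and a placeholder file name:

```python
import cv2

# Load a grayscale image and threshold it into a black & white image:
# pixels above 127 become 255 (white), the rest become 0 (black).
gray = cv2.imread("leopard_sample.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("leopard_binary.jpg", binary)
```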

Image Restoration:

In image restoration, part of the image is left blurred so that the main part of the leopard remains visible to the training model for classification. The most straightforward and conventional technique for image restoration is deconvolution, which is performed in the frequency domain by applying the Fourier transform to the images.
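An illustrative sketch of frequency-domain deconvolution follows. This is a naive inverse filter (it ignores kernel centering and noise, for which a real pipeline would use something like Wiener filtering), and all names are placeholders:

```python
import numpy as np

def deconvolve(blurred, kernel, eps=1e-3):
    """Naive inverse filtering: divide the image spectrum by the
    blur-kernel spectrum in the frequency domain."""
    # Pad the kernel to the image size before taking its FFT.
    kernel_padded = np.zeros_like(blurred, dtype=float)
    kh, kw = kernel.shape
    kernel_padded[:kh, :kw] = kernel

    img_fft = np.fft.fft2(blurred)
    ker_fft = np.fft.fft2(kernel_padded)

    # eps avoids division by (near-)zero frequency components.
    restored = np.fft.ifft2(img_fft / (ker_fft + eps))
    return np.abs(restored)
```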

Colour Image Processing:

Image classification is the process of segmenting images into different categories based on their features. A feature could be the edges in an image, the pixel intensity, the change in pixel values, and many more.

An image consists of the smallest indivisible segments called pixels, and every pixel has a strength often known as the pixel intensity. Whenever we study a digital image, it usually comes with three color channels, i.e. the Red-Green-Blue channels, popularly known as the "RGB" values. Why RGB? Because it has been seen that a combination of these three can produce all possible color palettes. Whenever we work with a color image, the image is made up of multiple pixels, with every pixel consisting of three different values for the RGB channels.

Figure 9: Experiment in JupyterLab (used specifically for data science)

The output of image.shape is (897, 1265, 3). The shape of the image is 897 x 1265 x 3, where 897 is the height, 1265 the width, and 3 the number of color channels. 897 x 1265 means we have 1,134,705 pixels in the data, and every pixel has an R-G-B value, hence the 3 color channels. We then convert the RGB image into a grayscale image. Note that a grayscale value lies between 0 and 255, where 0 signifies black and 255 signifies white.
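A minimal sketch of this inspection and conversion in OpenCV (the file name is a placeholder; note that OpenCV loads the channels in BGR order):

```python
import cv2

image = cv2.imread("leopard_sample.jpg")
print(image.shape)   # e.g. (897, 1265, 3): height, width, colour channels

# Collapse the three colour channels into one intensity channel (0-255).
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
print(gray.shape)    # (897, 1265): one grayscale value per pixel
```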

Flowchart of CNN layers:

A convolutional neural network (CNN) is a class of deep learning neural network. In short, think of a CNN as a machine learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The following figure explains the operation of the CNN used in this project.

Figure 10: Convolutional Neural Network Architecture

CNN Algorithm:

In the convolutional neural network, we use a Sequential model. In the Sequential model, we use different layers such as Conv2D, MaxPooling, and Dense. Figure 11 explains the process of 2D convolution of an input image.

Figure 11: Convolution of the given matrix to feature map extraction

We understand that the training data consists of grayscale images which will be an input to the convolution layer to extract features. The convolution layer consists of one or more Kernels with different weights that are used to extract features from the input image.

○ When we slide the Kernel over the input image (say the values in the input image are grayscale intensities) based on the weights of the Kernel we end up calculating features for different pixels based on their surrounding/neighboring pixel values.

○ We use a (3,3) kernel, with ReLU as the activation function in the hidden layers and Softmax in the output layer, since there are 6 different labels in the output.

○ There are 16 nodes in the first layer and 32 in the second layer. This number can be adjusted higher or lower, depending on the size of the dataset (see the sketch after this list).
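A minimal sketch of such a Sequential model, with the 16- and 32-filter Conv2D layers, (3,3) kernels, ReLU activations, and six-way Softmax output described above (the input size matches the assumed 128x128 grayscale images from the loading sketch):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # First convolution layer: 16 filters with (3,3) kernels.
    Conv2D(16, (3, 3), activation="relu", input_shape=(128, 128, 1)),
    MaxPooling2D((2, 2)),
    # Second convolution layer: 32 filters with (3,3) kernels.
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    # Flatten the feature maps and map them to the six leopard labels.
    Flatten(),
    Dense(6, activation="softmax"),
])
```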

Compilation of the model:

○ Next, we need to compile our model. Compiling the model takes three parameters: optimizer, loss and metrics.

○ The optimizer controls the learning rate. We used ‘RMSprop’ as our optimizer. RMSprop is generally a good optimizer to use for many cases. The RMSprop optimizer adjusts the learning rate throughout training.

○ The learning rate determines how fast the optimal weights for the model are calculated. A smaller learning rate may lead to more accurate weights (up to a certain point), but the time it takes to compute the weights will be longer.

○ We used ‘categorical_crossentropy’ as a loss function. This is the most common choice for classification.

○ To make things even easier to interpret, we use the 'accuracy' metric to see the accuracy score on the validation set when we train the model (the compile call is sketched below).
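Putting these three parameters together, compilation is a single call (a sketch, matching the settings above):

```python
model.compile(
    optimizer="rmsprop",              # adjusts the learning rate during training
    loss="categorical_crossentropy",  # standard loss for multi-class labels
    metrics=["accuracy"],             # report accuracy while training
)
```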

Training the model:

○ The next step is to train our model. To train, we call the 'fit()' function on our model with the following parameters: the training data, the validation data, and the number of epochs.

○ The number of epochs is the number of times the model cycles through the data. The more epochs we run, the more the model improves, up to a certain point; after that point, the model stops improving with each epoch. For our model, we set the number of epochs to 50 and obtained 77% accuracy (see the sketch below).
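A sketch of the training call, reusing the generators from the loading step (the test generator serves as validation data here):

```python
history = model.fit(
    train_data,
    validation_data=test_data,
    epochs=50,   # the model cycles through the data 50 times
)
```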

Prediction/Testing:

○ Here, we give the paths of new input images to be predicted by the trained model.

○ Again, we create an array from these images so that the model can predict their labels using the model.predict function, as sketched below.
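A minimal prediction sketch (the file name is a placeholder; the label order assumes flow_from_directory's alphabetical class ordering from the loading sketch):

```python
import numpy as np
import cv2

labels = ["L24", "L25", "L54", "L61", "L66", "L82"]

# Preprocess the new image exactly as the training data was preprocessed.
img = cv2.imread("new_leopard.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (128, 128)) / 255.0
img = img.reshape(1, 128, 128, 1)   # a batch containing one image

pred = model.predict(img)
print("Predicted label:", labels[int(np.argmax(pred))])
```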

We achieved 77% accuracy using the CNN algorithm for image classification, but 2–3 images were misclassified. So, to improve accuracy and speed up processing, we use the Nvidia Jetson Nano, which is capable of parallel computing thanks to its GPU (a 128-core Maxwell GPU with CUDA support) and is well suited to data science workloads.

3. NVIDIA JETSON NANO:

This small embedded machine is built purely for AI applications at remote locations, where power availability is a problem and conventional computing machines require more power to process the information captured via various sensors.

Figure 12 shows a benchmark of different embedded modules rendering optimizations, and it is visible how the Nvidia Jetson Nano is far better than other embedded boards commercially available to developers.

Figure 12: Data Rendering Capabilities of Embedded Systems

Our project's goal is to classify the images into different categories and to learn to recognise new individuals independently, so the Nvidia Jetson Nano stands out as a perfect match for our project. Its main interfaces are:

1. microSD card slot for main storage

2. 40-pin expansion header

3. Micro-USB port for 5V power input, or for Device Mode

4. Gigabit Ethernet port

5. USB 3.0 ports (x4)

6. HDMI output port

7. DisplayPort connector

8. DC Barrel jack for 5V power input

9. MIPI CSI-2 camera connectors

Since the Nvidia Jetson Nano was a totally new concept to us, our group member Ashish completed a course on the Nvidia Jetson Nano from the Nvidia Deep Learning Institute (DLI).

Refer to the Appendix section for more information regarding the Nvidia Jetson Nano.

Transfer Learning using ResNet-18 on the Nvidia Jetson Nano

ResNet-18 is one of the residual learning networks and is used in our project for identifying individual leopards. This network was easy to train on the Nvidia Jetson Nano via transfer learning, starting from the network's pre-trained weights and biases rather than training from scratch.

It is a convolutional neural network that is 18 layers deep. ResNets are built on residual blocks, which let very deep networks be trained by passing what is learned from one layer on to another. The pre-trained network can classify images into 1000 different object categories.
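A minimal sketch of this transfer-learning setup using torchvision (an illustrative equivalent of the Jetson workflow; the frozen backbone and replaced six-way head are assumptions):

```python
import torch
import torchvision

# Start from ResNet-18 pre-trained on ImageNet (1000 categories).
# Newer torchvision versions use weights=... instead of pretrained=True.
model = torchvision.models.resnet18(pretrained=True)

# Freeze the pre-trained backbone so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way classifier with a six-way one for our leopards.
model.fc = torch.nn.Linear(model.fc.in_features, 6)
```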

The vanishing gradient problem is encountered while training artificial neural networks with gradient-based learning and backpropagation. In backpropagation, we use gradients to update the weights in a network, but sometimes the gradient becomes vanishingly small, effectively preventing the weights from changing value. This causes the network to stop training, as the same values are propagated repeatedly and no useful work is done.

To solve such problems, Residual neural networks are introduced.

Residual neural networks, commonly known as ResNets, are neural networks that apply identity mapping: the input to some layer is passed directly, as a shortcut, to some later layer. Consider the figure below and Eqns. (1) & (2), which show the basic generalised residual block:

Figure 13: Generalised block for ResNet-18 and ResNet-34
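For reference, the standard residual-block formulations (from the original ResNet paper by He et al.) are:

y = F(x, {Wi}) + x … (1)

y = F(x, {Wi}) + Ws·x … (2)

where x and y are the input and output of the block, F is the residual mapping learned by the stacked layers, and Ws is a linear projection applied to the shortcut only when the dimensions of x and F(x) differ.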

Training curves show how a plain network and a residual network perform as data passes through the network during training. ResNet-34 shows a smaller error margin than ResNet-18, and the error margin plays a significant role, as it saves considerable computation resources.

As shown in Figure 14, each ResNet block is either 2 layers deep (used in small networks like ResNet-18/34) or 3 layers deep (ResNet-50/101/152). The figure shows how the layers are constructed: 2-layer blocks in ResNet-18/34 (on the left) and 3-layer blocks in ResNet-50/101/152 (on the right).

Figure 14: Construction of Layers

Result & Discussion:

Hardware Testing Results:

In the experiments conducted with the ESP32-Cam module, the PIR sensor was able to sense motion at distances of up to 7 to 10 m; whenever motion is detected, the device captures an image and sends it via email. This camera should be tested in the actual environment, but due to the current Covid situation we have not been able to do so. Some experiments were conducted at home to test the working of the camera in various situations, as follows:

A number of experiments were conducted in different situations, in daylight and in the dark. The quality of the captured images was tested using OpenCV in Python (a blur-measure sketch follows). We found that the designed system is capable of capturing images: it can detect warm bodies and capture images of good enough quality to be used for classification purposes.
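A sketch of one such quality check, the variance-of-Laplacian blur measure described in the pyimagesearch posts cited in the references (the file name and threshold are assumptions):

```python
import cv2

image = cv2.imread("capture.jpg", cv2.IMREAD_GRAYSCALE)

# A sharp image has strong edges, so the Laplacian response varies a lot;
# a low variance therefore indicates a blurry capture.
focus_measure = cv2.Laplacian(image, cv2.CV_64F).var()
print("Blurry" if focus_measure < 100.0 else "Sharp", focus_measure)
```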

The following are some sample images of a dog captured by the ESP32-Cam in an experiment we conducted to test the hardware.

Figure 15: Experimental Setup

Due to the current pandemic situation, we could not see these cameras operating in the field; to study how they work, we obtained a camera from the wildlife researcher Mr. Nikit Surve, who is currently researching leopards at SGNP.

The camera was made to work properly by inserting a memory card and mending the battery connections. It was then used to take pictures in the home environment only. It definitely provides better image quality than the developed camera, which could certainly be improved with a better camera module. Sample pictures captured under identical situations and conditions are shown in the following figure.

Figure 16: Image captured by Cuddeback (left) and image captured by ESP32-Cam (right)

The quality of the image captured by the ESP32-Cam is not as good as that of the Cuddeback camera; in future work, a better-quality camera module can be used. The similarity between the two images can be measured using OpenCV and Python.

Software Testing Results:

Classification of leopard images using the CNN Sequential model gives 77% accuracy.

Output:

Figure 17: CNN based Classification

We also tried the ResNet-18 model on the Nvidia Jetson for classifying the images. The accuracy is calculated from how the model performs within a single class (the number of correct classifications divided by the total number of images in that class), and the per-class results are then averaged, as sketched below.
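A minimal sketch of this per-class ("mean per-class") accuracy computation, assuming integer class labels:

```python
import numpy as np

def mean_per_class_accuracy(y_true, y_pred, num_classes=6):
    """Average, over classes, of (correct predictions / images in class)."""
    per_class = []
    for c in range(num_classes):
        mask = (y_true == c)
        if mask.sum() > 0:
            per_class.append((y_pred[mask] == c).mean())
    return float(np.mean(per_class))

# Example: y_true and y_pred would come from running the model on the
# validation images.
print(mean_per_class_accuracy(np.array([0, 0, 1, 2]), np.array([0, 1, 1, 2])))
```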

The accuracy on the Nvidia Jetson using ResNet or other available networks will increase as the dataset grows.

PROBLEMS INCURRED, POSSIBLE SOLUTIONS & LIMITATIONS

Problem incurred & its solution:

1. Preparing the database (images): We could gather only a limited number of real images from the wildlife department, but image classification requires many training images, so we edited existing images to create new ones for classification. The images were labelled sequentially for ease of understanding.

2. Controlling the parameters in the classification algorithm: For better adjustment of the function parameters involved in the sequential CNN model, we had to understand, learn, and exercise deep learning techniques.

Limitations:

1. Area covered by the camera: This depends on the camera angle. If the angle is narrow, less area is covered, reducing the chances of capturing high-quality animal images.

2. Speed of the moving object: Fast-moving objects increase the chances of capturing blurred images.

3. Indoor environment: This project has been developed for indoor use. For an outdoor environment, we would need a stereo-vision setup with two cameras and the concept of background subtraction, as there will be a lot of interference and ambient movement.

APPLICATION, FUTURE SCOPE, CONCLUSION

Applications:

This is a useful tool for conducting surveys and censuses of the animal species inhabiting a biosphere. It will also help in the analysis of wildlife, so that more accurate animal surveys, e.g. Poonam Avlokan, can be conducted, helping to identify & demarcate green corridors.

A related application, named ATMAN AI, was developed by DRDO for Covid-19 detection & analysis.

Future Scope:

● Better Filtering Techniques for Dataset.

● Better AI algorithms for classification.

● Development of Tools.

● Development of work specific curated hardware for more cost-effective products.

Conclusion:

In this project, we have tried to automate the classification of leopard images taken by Cuddeback cameras in the forest region of SGNP. These images were preprocessed, and CNN-based pattern recognition algorithms were implemented in Python, giving a reasonable accuracy of 77% considering the image quality and database size.

We have also developed a camera module able to capture images on detecting the motion of a living organism, using motion and temperature sensors. It saves each image and emails it to a set user. This camera is intended as a substitute for high-cost cameras; however, we were not able to match the quality of the available commercial camera. The desired quality can be achieved by using a high-quality camera module.

A detailed study of the Jetson Nano for processing these images has been done. We are currently working on algorithms to achieve better classification accuracy.

Thus, a camera powered by AI algorithms, which not only captures wildlife images but also classifies animal species, has been proposed and partially completed.

REFERENCES & APPENDIX

References:

[1] Adrian Rosebrock, "Blur Detection with OpenCV", pyimagesearch (blog), September 7, 2015.

[2] Adrian Rosebrock, "OpenCV Fast Fourier Transform (FFT) for blur detection in images and video streams", pyimagesearch (blog), June 15, 2020.

[3] Ankita Shukla, Connor Anderson, Gullal Singh Cheema, Pei Guo, Suguru Onda, Divyam Anshumaan, Saket Anand and Ryan Farrell, "A Hybrid Approach for Tiger Re-Identification", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2019. IIIT-Delhi, India, and Brigham Young University, Provo, UT.

[4] "Leopard identification techniques in wildlife monitoring", Wildlife ACT, 13 March 2018.

[5] "Image Classification Engine for Wildlife Species Identification: ML Based Image Classification for Wildlife Species Identification", valiancesolutions.com.

[6] ESP32-Cam module blog, by VeeruSubbuAmi.

[7] Image filtering techniques using C++.

[8] Nvidia Jetson Nano usage documentation and user guide.

[9] Software Development Kit (SDK) of the Nvidia Jetson Nano: GitHub repository for developing various data science projects, developed by Dustin Franklin.

[10] "Rectified Linear Units Improve Restricted Boltzmann Machines" (toronto.edu), describing the rectified linear (ReLU) activation implemented in the layers.

And with that, we have successfully completed our B.E. project!

Thank you!
