2022-10-20

# Counting Chips

In this article I'm going to talk about a project I worked on involving computer vision and object counting.

Specifically, we automated a task that operators used to do by hand: counting how many chips had been scrapped during the manufacturing process, for accounting purposes.

To verify the number of chips scrapped, the components were laid down on a paper sheet. Afterwards, pictures were taken of the sheets with the chips on them. With these pictures in hand, someone had to manually count how many chips there were on each sheet.

This was a tedious job, prone to human error. It was also repetitive and monotonous. Luckily, this is precisely the kind of task a machine is good at!

Without further ado, we jumped into the task. There are two main approaches to tackle it: classical Computer Vision and Machine Learning.

By Computer Vision I mean algorithms that do not use learning techniques and only apply fixed transformations to the image. By Machine Learning, on the other hand, I'm referring to Object Detection techniques. In Object Detection, a neural network is trained to spot a given object of interest based on labelled training data.

Classical Computer Vision algorithms require less compute, but they are also less powerful and generalize worse: for example, they may stop working when conditions such as lighting change.

## Finding Contours

For the image processing approach we used functions available in the OpenCV library.

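Breaking down our steps:

1. Mask out the background based on color.
2. Detect the contours of the chips, filter them by area, and count them.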

The first step is the most crucial in the pipeline, and the one that depends most heavily on the quality of the dataset. As I mentioned above, the chips are laid on a paper sheet, so the background is uniform and white. The chips, on the other hand, are a shade of yellow/brown. From a computer's point of view, these colors are just numeric values. For example, if white is 0 and yellow is 4, we can tell the program to keep only the pixels with a value of 4. This way we mask out the background.

In practice, it's slightly more complex. Pictures in RGB have three channels, each one representing the intensity of the respective color (Red, Green or Blue). Another color representation is HSV (Hue, Saturation and Value). Here a single parameter describes the color (Hue), while the remaining two describe its intensity (Saturation) and brightness (Value). In this format it's easier to mask the picture, because a color can be selected with a range on Hue, plus loose bounds on the other two.

In step 2 we run contour detection on the masked picture. The function traces the outline of each chip. Once this is done, we can perform operations on these contours: for each one we can calculate the perimeter and the area. If we notice false positives caused by yellow spots on the sheet, we can simply filter out objects with an area below a threshold. Finally, we count the remaining objects and display the contours on top of the picture.

## Object Detection

If pictures are not taken with proper lighting or background, or if the colors differ from the expected yellow due to metal reflection, the contour-based algorithm will fail. To handle these cases, we tried Object Detection. Object detection algorithms build on Image Classification, used for example in OCR, where a Neural Network is trained to categorize images into multiple classes. In addition to categorizing an object, however, object detection also provides its x,y location in the picture and a bounding box describing where it is located. A classic way to do this is to slide a window across the image and run the classification task inside each of these fragments. A well-performing object detection algorithm is YOLO, which instead predicts all bounding boxes in a single pass over the image.
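To make the sliding-window idea concrete, here is a small pure-Python sketch. The `toy_classifier` function is a hypothetical stand-in for a trained network, and the image is a tiny grid of 0/1 values rather than a real picture.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Image = List[List[int]]


def sliding_windows(width: int, height: int, window: int, stride: int) -> Iterator[Box]:
    """Yield every window position that fits fully inside the image."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)


def detect(image: Image, window: int, stride: int,
           classify: Callable[[Image], bool]) -> List[Box]:
    """Run the classifier on each crop and keep the boxes it accepts."""
    height, width = len(image), len(image[0])
    hits = []
    for (x, y, w, h) in sliding_windows(width, height, window, stride):
        crop = [row[x:x + w] for row in image[y:y + h]]
        if classify(crop):
            hits.append((x, y, w, h))
    return hits


def toy_classifier(crop: Image) -> bool:
    """Hypothetical stand-in for a trained network: accept fully set crops."""
    return all(all(row) for row in crop)


# 6x6 toy image with one 2x2 "object" covering x=2..3, y=2..3.
image = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        image[y][x] = 1

boxes = detect(image, window=2, stride=1, classify=toy_classifier)
```

Even on this 6x6 grid the classifier runs 25 times, which hints at why the approach gets expensive on full-size pictures with several window sizes.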

Although we used a pretrained model, we still needed to fine-tune it for our specific use case. This was done by labelling some pictures: we took 100 pictures, drew a bounding box around each chip in each picture, and then fed this data to the model.
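As an illustration of what a label can look like, the sketch below converts a pixel bounding box into the plain-text format commonly used with YOLO tooling: `class x_center y_center width height`, with coordinates normalized to the image size. Which labelling tool and format the project actually used is not stated, so treat this format, the function name and the numbers as assumptions.

```python
def to_yolo_label(class_id: int, x_min: float, y_min: float,
                  x_max: float, y_max: float,
                  img_w: int, img_h: int) -> str:
    """Convert a pixel box (corner coordinates) to a normalized YOLO-style label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# One chip (class 0) boxed at (100, 200)-(300, 400) in a 1000x800 picture.
line = to_yolo_label(0, 100, 200, 300, 400, img_w=1000, img_h=800)
print(line)  # 0 0.200000 0.375000 0.200000 0.250000
```

With a single class such as "chip", every line starts with `0`, and each picture gets one line per labelled chip.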

## Conclusions and Deployment

After running the counting task with YOLO, we noticed that it worked better than the simpler Computer Vision approach on non-optimal pictures. Convolutional Networks are able to generalize better. For example, if two chips were overlapping, so that their contours were not completely separate, the YOLO approach was still able to predict them as two separate elements.

To make the tool usable, we deployed it on a server and wrapped it in a Web UI. In a few clicks, users could upload a picture of the chips and download an Excel file with the counting results.
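As a rough sketch of the export step, the per-picture counts could be written out as shown below. It uses CSV from the Python standard library as a stand-in that Excel can open, not a real Excel writer, and the file names and column names are invented for illustration.

```python
import csv
import io
from typing import Dict


def results_to_csv(counts: Dict[str, int]) -> str:
    """Serialize {picture name: chip count} as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["picture", "chip_count"])
    for name, count in counts.items():
        writer.writerow([name, count])
    return buffer.getvalue()


# Hypothetical counts for two uploaded pictures.
report = results_to_csv({"sheet_01.jpg": 12, "sheet_02.jpg": 7})
```

Returning the report as a string keeps the function independent of the web framework, which can then serve it as a file download.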