Real-Time Anomaly Detection For Quality Control

The scenario: a high-speed production line is producing thousands of products. Two cameras are installed to continuously inspect the quality of each product.
The goal: develop an algorithm that can check each product as fast as possible.
The constraint: you have an edge device with limited resources.
In this blog post, we will divide and conquer the problem: first by extracting meaningful features from the images, and then by using anomaly detection models to detect outliers in those features.
The key idea is to learn a lower dimensional representation of the visual input and to use this representation to train a classifier that can distinguish between normal and anomalous inputs.
We will explore some interesting methods for feature extraction, including histograms of oriented gradients (HOG), wavelet edge detection, and convolutional neural networks (CNNs).
Finally, we will cover two libraries that I found particularly useful for benchmarking and implementing algorithms on streaming data: PyOD and PySAD.
Extract features
There are many ways to extract features from images. We won't cover them all in this post, but we will focus on three methods that I found particularly interesting:
- histogram of oriented gradients (HOG),
- wavelet edge detection, and
- convolutional neural networks.
Histogram of Oriented Gradients
The histogram of oriented gradients is a popular technique in image processing and computer vision. The HOG descriptor can capture the shape and appearance of an object in a picture.

In a few words, the HOG descriptor is a vector of histograms built as follows (a short code sketch follows the list):
- The image is divided into cells, e.g. 8×8 pixel cells.
- The gradient magnitude and orientation are calculated for each pixel.
- For each cell, we construct a histogram that has "n" bins. Each bin in this histogram represents a specific range of orientations.
- The height of each bin is determined by adding up all the magnitudes–from all gradients in a cell–that fall within its corresponding orientation range.
- We combine all the histograms to create the final HOG descriptor. This final descriptor has a total length of "N" which is equal to "n" (number of bins in each histogram) times the number of cells in the picture.
- We normalize the HOG to minimize the impact of light variation in the picture.
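To make the recipe above concrete, here is a minimal sketch using scikit-image's hog function. The file name and the parameter choices (9 orientation bins, 8×8 cells, block normalization) are illustrative assumptions, not settings prescribed by the method.

from skimage import io, color
from skimage.feature import hog

image = color.rgb2gray(io.imread("product.png"))  # hypothetical image file

# 'features' is the flattened vector of per-cell histograms ("N" values),
# 'hog_image' is a visualization where the brightness of each bar reflects
# the gradient magnitudes accumulated in its cell.
features, hog_image = hog(
    image,
    orientations=9,          # "n" bins per cell histogram
    pixels_per_cell=(8, 8),  # cell size in pixels
    cells_per_block=(2, 2),  # blocks used for normalization
    block_norm="L2-Hys",     # reduces the impact of lighting variation
    visualize=True,
)
print(features.shape)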
Keep in mind that the HOG is a vector composed of a series of small histograms. Therefore, to visualize it, we need a trick. In the example of the HOG representation of a cup shown in the picture above:
- each bar is oriented perpendicularly to the average orientation of its corresponding cell, and
- the brightness of each bar is scaled according to the magnitude of the gradients in that cell.
If you are interested in knowing more about this method, the curse of dimensionality, and how to reduce its dimensionality, you can find more details in this blog post:
Wavelet edge detection
A wavelet transform is a mathematical tool that decomposes a signal or a picture into its frequency components.
The wavelet image decomposition technique can be used in many different applications, for example compressing, denoising, or detecting edges in pictures.

Once the image has been decomposed with the wavelet transform, the decomposed images can be analyzed to extract information. For example, the horizontal and vertical details can be used to detect the boundaries of objects in the image.
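As an illustration, here is a minimal sketch with PyWavelets: a single-level 2D decomposition followed by a crude edge map built from the detail subbands. The wavelet choice ("haar"), the file name, and the threshold are illustrative assumptions.

import numpy as np
import pywt
from skimage import io, color

image = color.rgb2gray(io.imread("product.png"))  # hypothetical image file

# cA: approximation (low frequencies); cH, cV, cD: horizontal,
# vertical and diagonal detail subbands (high frequencies).
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

# Combine the detail magnitudes and keep the strongest responses as edges.
details = np.sqrt(cH**2 + cV**2 + cD**2)
edge_map = details > 0.1 * details.max()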
You can find more information about how to extract edges from wavelet pictures in the following post:
Auto-encoder
A common approach to unsupervised learning is auto-encoding. Auto-encoders are neural networks trained to learn a representation of the data in a lower dimensional space, sometimes called the "embedding" or "latent space".
To illustrate how an auto-encoder works, let's have a look at the picture below. First, the input picture is encoded by a neural network into a lower dimensional space, and then decoded by another network to reconstruct the picture.
The error between the input image and its reconstructed version is used as a cost function to train the encoder and decoder networks.

The idea is that the lower-dimensional representation captures the structure of the data.
In other words, the latent space extracts features from the input picture.
A traditional approach would use the reconstruction error to detect anomalies in the image. In our case, we have limited resources, so we only want to use the encoder network to extract features from the image.
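To make this concrete, here is a minimal sketch of a convolutional auto-encoder in PyTorch. The 64×64 input size, the layer sizes, and the latent dimension of 32 are illustrative assumptions, not the exact network used here; at inference time, only the encoder would run on the edge device.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: compress a 1x64x64 image into a latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # -> 16x32x32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # -> 32x16x16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        # Decoder: reconstruct the image from the latent vector.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # -> 16x32x32
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # -> 1x64x64
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder()
x = torch.rand(8, 1, 64, 64)                      # batch of dummy images
reconstruction, embedding = model(x)
loss = nn.functional.mse_loss(reconstruction, x)  # reconstruction error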
The only issue is that two pictures that are close to one another in the input space are not necessarily close in the latent space. Nevertheless, this issue can be solved with the Laplacian version of the auto-encoder.
Laplacian auto-encoder
The Laplacian auto-encoder is a variant of the standard auto-encoder that learns a lower dimensional representation while preserving the neighborhood relationships between the original space and the lower dimensional space.
For this purpose, we need to construct the K-Nearest-Neighbor Graph (K-NNG). The K-NNG is a data structure that can be used to store the relationship between data points in a dataset.
Each data point is visualized as a node in the graph, and the edges between nodes represent the similarity between data points. More specifically, each point is connected to its K nearest neighbors.
The similarity between data points can be measured using a variety of distance metrics, such as Euclidean distance, Manhattan distance, or cosine similarity, but also Wasserstein if we are comparing histograms–such as the HOG.
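As a minimal sketch, the K-NN graph can be built with scikit-learn. The random descriptors, K=5, and the Euclidean metric are illustrative assumptions (a Wasserstein distance on histograms would require a custom metric).

import numpy as np
from sklearn.neighbors import kneighbors_graph

hog_features = np.random.rand(100, 128)   # stand-in for real HOG descriptors

# Sparse adjacency matrix: entry (i, j) is non-zero if j is one of the
# K nearest neighbors of i under the chosen distance metric.
knn_graph = kneighbors_graph(hog_features, n_neighbors=5,
                             metric="euclidean", mode="connectivity")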

More details about distance metrics and the K-NN graph can be found in this post:
So why is the K-NNG useful for the Laplacian auto-encoder?
The Laplacian auto-encoder has the same primary goal as any auto-encoder (reducing the reconstruction error), but a term is added to the loss function to preserve the same neighbors between the original images and their embeddings, a.k.a. the lower dimensional representations.

The benefit of such an auto-encoder is that the K-NN Graph in the input is similar to the K-NNG in the lower dimensional representation.
We can therefore expect that similar pictures will end up in a similar area in the latent space, and we could even use the lower dimensional representation of the pictures to detect anomalies.
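Here is a minimal sketch of such a loss, assuming the added term is the sum of squared distances between the embeddings of points that are neighbors in the K-NN graph of the inputs; the weighting and the exact formulation used in practice may differ.

import torch
import torch.nn.functional as F

def laplacian_ae_loss(x, x_hat, z, edges, lam=0.1):
    """Reconstruction error plus a neighborhood-preserving penalty.
    edges: tensor of (i, j) index pairs from the K-NN graph of the inputs."""
    recon = F.mse_loss(x_hat, x)                   # usual auto-encoder term
    zi, zj = z[edges[:, 0]], z[edges[:, 1]]
    neighbor = ((zi - zj) ** 2).sum(dim=1).mean()  # keep K-NN neighbors close
    return recon + lam * neighbor

# Tiny usage with random tensors, just to show the shapes involved.
x = torch.rand(16, 1, 64, 64)           # inputs
x_hat = torch.rand(16, 1, 64, 64)       # reconstructions
z = torch.rand(16, 32)                  # embeddings
edges = torch.randint(0, 16, (80, 2))   # 16 points x K=5 neighbors
loss = laplacian_ae_loss(x, x_hat, z, edges)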
More details regarding this method can be found in the following post:
Real-time anomaly detection with Python
Anomaly detection is the process of identifying unusual patterns that do not fit the main trends in the data.
There are multiple approaches, algorithms, and libraries to detect anomalies in real-time, each with its advantages and disadvantages.
To benchmark multiple algorithms and deploy models with streaming data, we can use the following libraries:
- PyOD is a toolkit for benchmarking and using anomaly detection algorithms.
- PySAD is a toolkit for detecting anomalies in streaming data.

Both PyOD and PySAD are open-source projects (BSD-licensed) and are available on PyPI:
pip install pyod pysad
PyOD offers more than 30 detection algorithms as of January 2024, which can seem overwhelming.
To get started, we can train a simple distance-based algorithm such as KNN or a popular algorithm such as iForest (Isolation Forest) to create a baseline that can further be used as a comparison with other algorithms.
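Here is a minimal sketch of such a baseline with PyOD, using random vectors as a stand-in for the feature vectors extracted earlier.

import numpy as np
from pyod.models.knn import KNN
from pyod.models.iforest import IForest

features = np.random.rand(1000, 64)   # stand-in for real feature vectors

for model in (KNN(), IForest()):
    model.fit(features)                # unsupervised fit on (mostly) normal data
    scores = model.decision_scores_    # anomaly score of each training sample
    labels = model.labels_             # 0 = inlier, 1 = outlier
    print(type(model).__name__, scores.mean())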
More details about those libraries can be found in the following post:
Isolation Forest
iForest is a particularly versatile algorithm for detecting anomalies in a dataset.
It is a fast and effective way to find outliers in high-dimensional data. It works by constructing several decision trees, each of which is trained on a random subset of the data.
The length of the path needed to isolate a data point indicates how common that point is.

In other words, a point is an outlier if it can be isolated with a small number of splits in the decision tree, e.g. the red point in the picture above.
To grasp the concept, imagine a 1D example with a temperature sensor. You have 10,000 points between 20 and 21 degrees and one outlier at 100 degrees. In that case, a decision tree will tend to classify the outlier early on. For instance, the first condition could be: "Is the temperature below 50 degrees?" and the 100-degree point would already be isolated from the 10,000 other points.
iForest calculates the anomaly score based on the average path length required by several isolation trees. Since these trees tend to isolate outliers early in their paths, the average path length is a good indicator for detecting outliers, i.e. outliers have a small path on average.
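To illustrate, here is a minimal sketch of the 1D temperature example with PyOD's Isolation Forest wrapper.

import numpy as np
from pyod.models.iforest import IForest

rng = np.random.default_rng(0)
temps = rng.uniform(20, 21, size=(10_000, 1))  # normal readings
temps = np.vstack([temps, [[100.0]]])          # one outlier at 100 degrees

model = IForest(random_state=0)
model.fit(temps)
scores = model.decision_scores_                # higher score = more anomalous
print(np.argmax(scores))                       # expected: 10000, the 100-degree reading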
Results from Practical Applications
Texture Analysis in the Food Industry
In this project, we had the challenge of detecting different textures on a production line packing food for supermarkets. The shape of the product in its container was a good indicator of quality. In this case, quality meant that the product looked appealing and appetizing.
To build a quality score that reflects the shape, we used a dataset with enough images to represent the full spectrum of possible shapes. Then, we used HOG features as descriptors for each image and clustered them into groups based on shape.
Each cluster was assigned a quality score by the manufacturer, which allowed us to quickly classify new images. This score was then used both to fine-tune the machines and to assess the quality of the end product.
Anomaly Detection in Physical Products
In this example, we had to assess the quality of a plastic product. We used the Laplacian auto-encoder to find dents, scratches, holes, or any other defects. The method was trained on anomaly-free pictures, so we didn't need a large number of examples of anomalies, only a few to assess the effectiveness of the algorithm.
To train the encoder and decoder networks, we used around 1,000 anomaly-free images that we easily obtained from the production line. The method was effective in detecting deviations in the product's quality, but it required a lot of computational resources. This was particularly challenging because we had images of the product taken from different angles, and each image needed to be processed through the encoder network.
However, this was not an issue here, as the production process allowed more than a full second to assess each product. That was not the case for the next example.
Efficiency in High Frame-Rate Environments
In this example, we had multiple cameras that were recording between 10 and 50 frames per second (FPS). In that case, we used wavelet compression to reduce bandwidth and improve the processing speed by analysing the compressed version of each image.
We compressed images by around 80% by eliminating black pixels in the subbands, and we didn't reconstruct the images to spot anomalies; instead, we detected edges directly from the wavelet subbands.
The Histogram-based Outlier Score (HBOS) algorithm was then used on features derived from the edge detector. The speed and simplicity of the algorithm made it well suited for real-time analysis on a sliding window with PyOD and PySAD. This allowed us to compute anomaly scores for each frame in milliseconds.
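As a simplified illustration of this setup (not the production code), the sketch below refits PyOD's HBOS on a manually managed sliding window of per-frame feature vectors; PySAD provides ready-made wrappers for running detectors on streams, which are omitted here to keep the example self-contained.

from collections import deque
import numpy as np
from pyod.models.hbos import HBOS

window = deque(maxlen=500)   # illustrative sliding-window size
warmup = 50                  # wait for enough frames before scoring

def score_frame(features):
    """Return an anomaly score for one frame's feature vector (1D numpy array)."""
    window.append(features)
    if len(window) < warmup:
        return 0.0                          # not enough history yet
    model = HBOS()
    model.fit(np.vstack(window))            # refit on the current window
    return model.decision_function(features.reshape(1, -1))[0]

# Stand-in loop: one random feature vector per frame.
for frame_features in np.random.rand(200, 32):
    score = score_frame(frame_features)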
However, we faced a challenge with the algorithm generating too many false positives. To address this, we developed a second algorithm specifically designed to evaluate images that might contain anomalies. This second algorithm had more processing time available, thanks to the initial classification from the first algorithm. This allowed us to use deep-learning methods and reduce the error rates significantly.
Conclusion
There are many different ways to approach anomaly detection, and we only covered a subset of a large field of possibilities. The best approach highly depends on the problem we are trying to solve.
In general, however, the following tips can help build an effective system:
- Approach the problem using a divide-and-conquer strategy. Each model is likely to perform better when dealing with a specific task. For instance, one model could detect an area of interest, while a second model could determine if there's an anomaly present in that area.
- Use multiple Machine Learning models. Each model makes different errors, so using the results from several models can help reduce the overall error rates. In the "Efficiency in High Frame-Rate Environments" example, we linked two algorithms: the first at high FPS to identify potential anomalies, and the second to confirm them.
- Use a combination of supervised and unsupervised methods. For instance, supervised methods can be used to train models on known anomalies, while unsupervised methods can be used to identify new anomalies. In the "Texture Analysis in the Food Industry" use case, we labeled and clustered groups based on shape; this setup could have allowed us to train an unsupervised model for each group to better assess the quality and spot any deviations.
Anomaly detection is a diverse field, and the right approach depends on your specific problem. This post shows a small selection of feature extraction methods and algorithms that I consider effective for edge computing, especially when working with limited resources.
Thanks for reading.
Curious to learn more about Anthony's work and projects? Follow him on Medium and LinkedIn.