A Time Series Anomaly Detection Model for All Types of Time Series

Nov 8, 2020

My Journey to improve Lazy Lantern’s automated time series anomaly detection model

As an Insight Data Science Fellow, I had the opportunity to work with Lazy Lantern, a software company that uses machine learning to provide autonomous analytics, helping businesses make data-driven decisions by better understanding user behavior. As a consultant, I was tasked with improving the time series anomaly detection models used by Lazy Lantern. In this article, I will walk you through my journey from identifying problems and challenges to reaching an unusual yet actionable solution.

Background

Imagine you own a website and sell your products online. Unfortunately, for your recent product release, you made a mistake and set a ridiculously low price for your products. You are a busy person, of course, and you did not realize that there was a pricing error. Yet, once people discover this “crazy deal”, there will likely be an enormous increase in traffic on your website. If you don’t correct the error fast, you could end up with huge losses, just like Amazon’s pricing error in the 2019 Prime Day event. However, if you have a tool to monitor the number of clicks on the checkout or add-to-cart buttons on your website, you could detect the unusually high demand for the product in time to take corrective action and save your job!

Pricing errors can be incredibly costly (Source: The Washington Post).

Lazy Lantern provides automated data analysis services for monitoring websites and mobile applications. Clients only need to specify the metric they want to monitor for anomalies, such as checkouts, and where notifications about anomalies should be sent. Once that is done, Lazy Lantern’s time series anomaly detection model monitors the chosen metric by counting the number of requests users make for it (for example, how many times the checkout button was clicked in each hour) and notifies the client of confirmed anomalies through the chosen communication method.

This sounds pretty straightforward so far, but let me break down what is actually happening behind the scenes. The time series model is used to predict the range of ‘normal’ click rates based on historical patterns of activity. If the observed value stays outside of the ‘predicted normal’ range for long enough (3 hours), then an anomaly score is calculated and compared to a threshold. Finally, if the anomaly score is above the threshold, the event is reported to the client as unusual activity. The currently deployed model was built using Prophet, an open-source library created by Facebook, to generate the ‘predicted normal’ range. Figure 1 below summarizes the process of anomaly detection.

Figure 1. How the anomalies are detected and reported by the current model in Lazy Lantern.
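
To make the three-hour rule concrete, here is a minimal sketch of that first step only. It assumes hourly data and that the ‘predicted normal’ range is already available as lower and upper bounds; the function name and the `min_hours` parameter are my own illustrative choices, not Lazy Lantern’s code, and the anomaly score itself is left out because its formula is not covered in this article.

```python
import pandas as pd

def sustained_out_of_range(observed, lower, upper, min_hours=3):
    """Flag hourly points that complete a run of at least `min_hours`
    consecutive observations outside the [lower, upper] range.

    Hypothetical helper for illustration; `lower`/`upper` can be Series
    aligned with `observed` or plain numbers.
    """
    outside = (observed < lower) | (observed > upper)
    # Every in-range point starts a new group, so each group holds at most
    # one unbroken run of out-of-range points.
    run_id = (~outside).cumsum()
    # Running count of consecutive out-of-range points within each run.
    run_length = outside.astype(int).groupby(run_id).cumsum()
    return run_length >= min_hours

# Toy hourly click counts: three hours in a row above the range, then one.
clicks = pd.Series([10, 50, 55, 60, 12, 70])
flags = sustained_out_of_range(clicks, lower=5, upper=20)
```

In this toy series only the fourth point is flagged, because it is the first one to complete three consecutive out-of-range hours; the isolated spike at the end is ignored, which is exactly the kind of short event discussed next.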

The current model with the anomaly score calculation works pretty well for catching anomalies lasting a long time (Figure 2). But, by design, potentially anomalous events that last less than three hours are not communicated to clients.

Figure 2. Example plot showing the anomalies reported by the current model (red shaded areas) and a missed anomaly that lasted less than 3 hours (circled in red, FN = false negative).

In this context, I will call these short-term anomalies, about which the clients are not notified, false negatives. These short-term anomalies could still cause significant financial losses for the business, and could also affect user experience and brand credibility. Therefore, Lazy Lantern was interested in finding ways to reduce these false negatives and in gaining new insights into how the current model could be improved.

Challenges

Before I go further into my thought process of modeling, let me clarify the challenges associated with this project. These challenges arise because the anomaly detection model has to work well with any arbitrary time series generated by the metrics that clients choose.

Challenge 1: Lazy Lantern’s data product should work well for all clients, with different data sources and varied metrics to monitor.

One type of model should fit various types of time series with different characteristics such as trends and seasonalities. Here, one type of model means a time series model with a fixed set of parameters. Although a separate model is trained on the past activity of each chosen metric of a website or mobile application, the model parameters cannot be tuned for the trend or seasonalities of that particular time series. Even for one website, there are several metrics to be monitored, each with unique characteristics, so it is not efficient if the model has to be customized every time it is trained.

Challenge 2: It is not feasible to infer the characteristics of each time series, as this requires a certain amount of historical data and/or domain knowledge.

Following from the first challenge, the model parameters are assumed not to be adjusted for each time series. If the training period is long enough, it may be possible to capture characteristics like trends or seasonalities and have the parameters adjusted automatically. However, not every time series is guaranteed to have enough historical data to do so, especially data collected from new clients. Therefore, I assumed that the past activity of each metric is sufficient to train the model, but not sufficient to infer more information about the time series.

Given these two points, even if I picked a time series with rich historical data from the company’s database and built a model optimized for it, the model would be useless because it would be tailored to that specific case. This makes the project unconventional, as a time series model is usually built to forecast a specific metric of interest and is then used for anomaly detection.

Understanding the current model and exploring options

The first logical thing to do when there is an existing model is to understand how the model performs and to tune its parameters to improve its performance. According to the article about Prophet on the Facebook Research blog, its modeling approach is described as:

a very flexible regression model (somewhat like curve-fitting) instead of a traditional time series model for this task because it gives us more modeling flexibility, makes it easier to fit the model, and handles missing data or outliers more gracefully.

Moreover, Prophet uses Stan for model fitting and for generating uncertainty intervals (the range that the normal activity level is supposed to fall in). Stan is a statistical programming language for Bayesian statistical inference. To optimize the model, the parameters for seasonality and trends need to be updated so that they can be used as prior information. Given the challenges described above, updating the prior information for each time series to optimize the model is not possible.

On the other hand, Prophet is one of the best options for building a model with one set of parameters that fits various types of time series using only historical data; furthermore, it does not require the input data to be preprocessed. For example, Auto ARIMA finds the optimal combination of parameters by itself, but the time series needs to be stationary, which may require preprocessing for certain datasets. Moreover, unlike autoencoder techniques or clustering algorithms, a Prophet model does not require multiple time series or feature data sets to produce good forecasts.

The second thing that I attempted after exploring parameter tuning was to consider changing the process described in Figure 1. But I couldn’t see an intuitive way to evaluate or change the monitoring period or the threshold. For example, a performance improvement from changing the monitoring period used to calculate the anomaly score from three hours to two hours would be hard to confirm. Even if a change in the criteria reduces the false negatives for the specific time series that are tested, it may also increase false positives, and the net effect cannot be estimated, so we cannot say it makes a definite improvement. It may, in fact, create a new problem with respect to the first challenge. Making an arbitrary change in the process and testing it would be a guess-and-check approach, which is a bad idea, especially since we don’t even have a labeled dataset to test against.

Getting creative: Double-checking system

Given the difficulties of working within the existing modeling framework used by Lazy Lantern, I started from the top again (don’t feel bad for me, I gained so many insights from my previous attempts). While I was talking to one of my fellow Insight Fellows about my struggle, she gave me the idea of a double-checking system. So I started searching for a simple and intuitive anomaly detection method that could work together with the current model as an attachment. Finally, I ended up implementing a low pass filter (LPF) using a moving average. A moving average (rolling mean) takes the average of a subset of the full dataset, while a low pass filter passes the components of a signal whose frequency is below a fixed cutoff and attenuates the rest. The idea here is that each newly observed point is compared to the average of the past observations in a fixed window of time (μ), and if the new point is far from μ, with the distance measured in units of the moving standard deviation (σ), then it is considered an outlier (anomaly).

Figure 3. How the low pass filter anomaly detection using a moving average works. The distance between the red point and the moving average (μ) of the points inside the rectangle (window) is evaluated with the standard deviation (σ) of the points in the window as the unit. In this model, we consider the point an anomaly if the distance is more than 3σ.

This approach is very simple to implement. In pandas, the DataFrame methods rolling(window=w).mean() and rolling(window=w).std() give μ and σ for a window size w.
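
As a minimal sketch of this single-window check (the function name, the toy data, and the choice to shift the series by one step so that a new point is excluded from its own baseline are my illustrative assumptions, not the exact production code):

```python
import pandas as pd

def lpf_flags(series, window, n_sigma=3.0):
    """Flag points that lie more than n_sigma moving standard deviations
    away from the moving average of the preceding `window` points."""
    # shift(1) makes each row's window contain only *past* observations.
    past = series.shift(1)
    mu = past.rolling(window=window).mean()
    sigma = past.rolling(window=window).std()
    # Rows without a full window have NaN statistics and compare as False.
    return (series - mu).abs() > n_sigma * sigma

# Toy hourly series: 24 quiet hours alternating 10 and 12, then a spike.
hourly = pd.Series([10, 12] * 12 + [100], dtype=float)
flags = lpf_flags(hourly, window=24)
```

Here only the final spike is flagged: its 24-hour baseline has a mean of 11 and a standard deviation of about 1, so a value of 100 is far beyond 3σ.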

Almost there, but not quite.

You may wonder if the window size, w, changes the results and why 3σ is chosen as the standard distance to decide the outlier.

As shown in Figure 4, w changes the anomaly detection results. However, as mentioned in Challenge 1, we want to eliminate the need to employ custom model parameters in order to make the anomaly detection approach robust across all customer use cases.

Figure 4. The moving average and standard deviation change depending on the window size, so the result of the anomaly detection is affected.

To make this new approach robust across cases, I took advantage of the double-checking system. Specifically, I constructed a connection between the two models so that when a point was considered a potential anomaly by the LPF detector, it was also evaluated using the uncertainty interval generated by the Prophet model. This approach prevented additional false positives caused by the LPF detector. With that in place, I could focus on detecting as many false negative candidates as possible by using multiple window sizes with the LPF model. μ and σ were calculated with window sizes varied in 24-hour increments, from 24 hours up to the total length of the training data that the Prophet model uses (22 days): w = 24, 48, …, 22 × 24 hours. If the distance between a newly observed point and μ is greater than 3σ for any of these window sizes, the point is considered an anomaly by the LPF model. The 3σ cutoff comes from the empirical rule, under the assumption that the data points follow a Gaussian distribution.
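
The combination can be sketched roughly as follows. This is an illustration under stated assumptions rather than the deployed code: the function names are hypothetical, the data is hourly, and `lower` and `upper` stand for the bounds of Prophet’s uncertainty interval (for example the `yhat_lower` and `yhat_upper` columns of a Prophet forecast), which are passed in here instead of being computed.

```python
import pandas as pd

HOURS_PER_DAY = 24

def lpf_candidates(series, max_days=22, n_sigma=3.0):
    """Flag a point if it is more than n_sigma moving standard deviations
    from the moving average for ANY window of 24, 48, ..., max_days * 24 hours."""
    past = series.shift(1)  # baseline uses past observations only
    candidate = pd.Series(False, index=series.index)
    for days in range(1, max_days + 1):
        w = days * HOURS_PER_DAY
        mu = past.rolling(window=w).mean()
        sigma = past.rolling(window=w).std()
        # Windows longer than the available history yield NaN, i.e. no flag.
        candidate |= (series - mu).abs() > n_sigma * sigma
    return candidate

def double_check(series, lower, upper, max_days=22, n_sigma=3.0):
    """Keep an LPF candidate only if it also falls outside [lower, upper]."""
    outside_interval = (series < lower) | (series > upper)
    return lpf_candidates(series, max_days, n_sigma) & outside_interval

# Toy data: two quiet days alternating 10 and 12, then a one-hour spike.
hourly = pd.Series([10, 12] * 24 + [100], dtype=float)
confirmed = double_check(hourly, lower=5, upper=20)
```

Taking the union over window sizes makes the LPF step deliberately generous, and the interval check is what keeps that generosity from turning into extra false positives: with the same data but an upper bound of 200, the spike would remain an LPF candidate but would not be confirmed.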

The LPF model and the Prophet model worked really well and efficiently together. Although there were no labeled datasets available to calculate evaluation metrics, I hand-labeled false negatives, like the one in Figure 2, for 10 sets of time series and saw a reduction of around 77% in false negatives with the LPF model. Going forward, Lazy Lantern will be able to use feedback from clients to validate the combined modeling approach and optimize the user experience after implementation.

By asking the right questions and considering how the data products are used, I was able to come up with a creative analytical approach that draws on the strengths of multiple methodologies while minimizing the limitations those models have when used in isolation. Moreover, I am really glad that I could share the insights I gained during the project with Lazy Lantern, along with an actionable solution to improve the model. Finally, I want to thank Bastien and Guillaume from Lazy Lantern for giving me this amazing opportunity to work on the model and for being so supportive throughout the process.
