Anomaly detection systems work on three basic principles today:
- Model based - the user specifies a definition of normal, and the system alerts whenever the data deviates from this definition. There are a few different types of model-based systems:
    - The simplest of these systems use user-specified rules - e.g. Anti-Money Laundering systems are often based on business users defining exceptional transactions as a set of scenarios. These scenarios are programmed into a transaction monitoring system, which alerts an investigative unit whenever a transaction triggers one or more of them.
    - Momentum traders use simple models for time series data - e.g. produce a buy signal whenever a specific financial instrument dips below its moving average by some percentage. In this case there’s no explicit definition of ‘normal’, but one is implicit in the algorithm.
- Supervised learning based - users manually label anomalous events, and a supervised machine learning system trains a binary classifier to label unseen data as anomalous or not.
- Reconstruction based - more advanced systems that learn a generative machine learning model of the data and flag events whose reconstruction error is high. A generative model tries to mimic its input, i.e. it learns a model whose output closely matches its input.
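The three principles can be sketched with toy, pure-Python examples. Everything here is hypothetical illustration: the AML rules, the thresholds, and the mean-based ‘generative model’ (a stand-in for what would really be an autoencoder or similar) are made up for the sketch.

```python
def aml_rule_flags(txn, threshold=10_000, watchlist=("XX", "YY")):
    """Rule based: return the hand-written scenarios a transaction triggers.

    The threshold and country watchlist are hypothetical example rules."""
    flags = []
    if txn["amount"] > threshold:
        flags.append("large-amount")
    if txn["country"] in watchlist:
        flags.append("watchlist-country")
    return flags


def ma_buy_signal(prices, window=3, pct_dip=0.05):
    """Implicit model: buy when the latest price dips a given percentage
    below its moving average. 'Normal' is never defined explicitly."""
    if len(prices) < window:
        return False
    ma = sum(prices[-window:]) / window
    return prices[-1] < ma * (1 - pct_dip)


def nn_classify(labelled, point):
    """Supervised (toy): 1-nearest-neighbour binary classifier over
    hand-labelled points given as [(features, is_anomaly), ...]."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labelled, key=lambda item: sq_dist(item[0], point))[1]


def reconstruction_scores(points):
    """Reconstruction based (toy): 'reconstruct' each point from the
    per-feature means and score by Euclidean error. A real system would
    learn a proper generative model; the idea is the same - high
    reconstruction error means anomalous."""
    dims = len(points[0])
    means = [sum(p[i] for p in points) / len(points) for i in range(dims)]
    return [sum((p[i] - means[i]) ** 2 for i in range(dims)) ** 0.5
            for p in points]
```

Each sketch flags the same kind of thing - a point that does not fit - but the definition of ‘fit’ comes from a different place: hand-written rules, an implicit time-series model, hand-labelled data, or a learned model of the data itself.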
Each of these approaches has drawbacks:
- Model based systems - it is really difficult to come up with a definition of normal, and any such definition drifts over time. Rules are brittle and difficult to change or adapt.
- Supervised learning based systems - three issues:
    - Labelled data - to work well, these systems need large amounts of labelled data, which is both tedious to collect and itself error prone.
    - Global optimization - supervised learning systems solve a global optimization problem, i.e. they fit a single model to minimize error across all data points, which makes them sensitive to erroneous points.
    - Generalization - a form of model drift: models that have low error on training data may still exhibit high error on unseen test data.
- Reconstruction based systems - this is still an active area of research and early results are encouraging, but the reconstruction-error threshold is difficult to fine-tune.
TDA (topological data analysis) is a general-purpose manifold learning method that can combine the results of many machine learning methods. A few notes:
- It does not require labelled data, but can use it when present.
- It does not solve a global optimization problem - instead it solves a large number of local optimization problems, which makes it robust to erroneous data points.
- It combines supervised and unsupervised learning in a single framework.
- The output is a network (in the computer science sense), which makes working with the output uniform, i.e. the underlying machine learning methods can change, but the downstream usage need not.
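The last point can be illustrated with a toy Mapper-style construction, one common way such a network is built. Every choice below - the lens function, the cover parameters, the single-linkage clustering - is a hypothetical plug-in; swap any of them and downstream code still just sees nodes and edges.

```python
from itertools import combinations

def _single_linkage(indices, points, eps):
    """Cluster the given point indices: connected components of the graph
    linking points within distance eps (a swappable clustering choice)."""
    parent = {i: i for i in indices}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(indices, 2):
        d = sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
        if d <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in indices:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.5):
    """Build a Mapper-style network: cover the lens range with overlapping
    intervals, cluster within each interval, and connect clusters that
    share points. Output is a plain graph: (list of node sets, edge set)."""
    vals = [lens(p) for p in points]
    lo, hi = min(vals), max(vals)
    width = (hi - lo) / n_intervals * (1 + overlap)  # overlapping cover
    nodes = []
    for k in range(n_intervals):
        start = lo + k * (hi - lo) / n_intervals
        member = [i for i, v in enumerate(vals) if start <= v <= start + width]
        nodes.extend(_single_linkage(member, points, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges
```

On a small cloud of nearby points plus one far-away outlier, this yields one large node and one isolated singleton node - exactly the disconnected-component signal the pipeline below exploits.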
Why is TDA good for anomaly detection?
The pipeline that we’re going to use is:
- Use unsupervised learning to construct a representative manifold. Note that this learns a model from the data itself, which means we avoid hand-defining ‘normal’. Anomalies show up as disconnected nodes and small networks [look at this picture]. We are going to use these small networks and disconnected components as training data.
- Use supervised machine learning to create models that distinguish the anomalies detected above from the rest of the data.
Note that we will use bagging and boosting throughout the procedure to reduce generalization error.
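Step 1 of the pipeline can be sketched in a few lines, using a plain adjacency-dict graph in place of the TDA network. The `min_size` cutoff for what counts as a ‘small network’ is a hypothetical parameter, and bagging/boosting are omitted from the sketch; the labels it produces would then feed any off-the-shelf classifier in step 2.

```python
from collections import deque

def connected_components(adjacency):
    """adjacency: {node: set of neighbours} -> list of components (sets)."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            queue.extend(adjacency[node] - comp)
        seen |= comp
        components.append(comp)
    return components

def auto_label(adjacency, min_size=3):
    """Step 1: label nodes in small or disconnected components as anomalies
    (True), everything else as normal (False) - no hand labelling needed."""
    labels = {}
    for comp in connected_components(adjacency):
        for node in comp:
            labels[node] = len(comp) < min_size
    return labels
```

An isolated node in the network gets labelled `True` (anomalous) automatically, while nodes in a large connected component get `False`; that labelled set is the training data for the supervised step.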
The benefits of this pipeline:
- No need to label data by hand - labels are created automatically in step 1
- No prior assumption about ‘normal’
- No assumption about the underlying distribution of the data
- As data evolves, different underlying ML methods may work better, and it is easy to swap them out while keeping the rest of the pipeline the same.