Anomaly Detection

We use anomaly detection instead of supervised learning since:

there are very few positive (anomalous) samples
anomalies can look very different from one another, so past examples say little about future ones
So we model the negative (normal) samples only.

How it works

Model the probability of occurrence $p(x)$ from data
Identify an anomaly by $p(x) < \varepsilon$, where $\varepsilon$ is a chosen threshold

Probability

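We assume the data are normally distributed and the features are independent. With $m$ training examples and $n$ features:

$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$

($\prod$ is the product symbol; it returns the product of all elements)

Where $\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$, $\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2$

By definition of the normal distribution:

$$p(x_j; \mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$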
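As a minimal sketch of the model above, the snippet below estimates a mean and variance per feature from normal samples, multiplies the univariate normal densities to get the probability of a point, and flags the point when that probability falls below a threshold. The function names, the toy data, and the threshold value `epsilon = 1e-3` are assumptions made for illustration only.

```python
import math

def fit_gaussian(X):
    """Estimate the per-feature mean and variance from rows of X (normal data only)."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2

def density(x, mu, sigma2):
    """Probability density of x as a product of independent univariate normals."""
    p = 1.0
    for xj, mj, s2 in zip(x, mu, sigma2):
        p *= math.exp(-((xj - mj) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p

def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x when its density falls below the threshold epsilon."""
    return density(x, mu, sigma2) < epsilon

# Hypothetical toy data: five normal samples with two features each.
X_train = [[1.0, 10.0], [1.2, 9.5], [0.8, 10.5], [1.1, 10.2], [0.9, 9.8]]
mu, sigma2 = fit_gaussian(X_train)
epsilon = 1e-3  # assumed threshold; in practice it is tuned on the CV set

print(is_anomaly([1.0, 10.0], mu, sigma2, epsilon))  # close to the mean
print(is_anomaly([3.0, 10.0], mu, sigma2, epsilon))  # far from the mean in feature 1
```

For many features the product can underflow, so summing log-densities is a common alternative to multiplying raw densities.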

When not normally distributed

We could try transforming the data, for example by taking $\log(x)$ or $\sqrt{x}$, so that it looks closer to a normal distribution.
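The effect of such a transform can be checked numerically. The sketch below compares the skewness of a hypothetical right-skewed feature before and after a log and a square-root transform; the data and the `skewness` helper are assumptions for illustration, and both transforms require non-negative values (strictly positive for the log).

```python
import math

def skewness(values):
    """Sample skewness: third central moment divided by variance^(3/2)."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    third = sum((v - mean) ** 3 for v in values) / m
    return third / var ** 1.5

# Hypothetical right-skewed feature (a few very large values).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
logged = [math.log(v) for v in raw]
rooted = [math.sqrt(v) for v in raw]

# A skewness closer to 0 suggests a more symmetric, more Gaussian-like shape.
print(skewness(raw), skewness(rooted), skewness(logged))
```

In practice a histogram of each transformed feature is the usual way to judge which transform works best.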

Training/CV/Test

Training set: 60% of the normal data (no anomalies)
CV set: 20% of the normal data, plus a share of the anomalous examples
Test set: 20% of the normal data, plus the remaining anomalous examples
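A split along these lines could be sketched as follows. The function name is hypothetical, and dividing the anomalous examples evenly between the CV and test sets is an assumption of this sketch, made so that the two sets do not share examples. Labels use 0 for normal and 1 for anomalous.

```python
import random

def split_for_anomaly_detection(normal, anomalous, seed=0):
    """Return (train, cv, test) using a 60/20/20 split of the normal data.

    train holds unlabeled normal samples only; cv and test hold (sample, label)
    pairs. Anomalies are divided evenly between cv and test (an assumption).
    """
    rng = random.Random(seed)
    normal, anomalous = list(normal), list(anomalous)
    rng.shuffle(normal)
    rng.shuffle(anomalous)

    n_train = len(normal) * 60 // 100
    n_cv = len(normal) * 20 // 100
    half = len(anomalous) // 2

    train = normal[:n_train]
    cv = [(x, 0) for x in normal[n_train:n_train + n_cv]]
    cv += [(x, 1) for x in anomalous[:half]]
    test = [(x, 0) for x in normal[n_train + n_cv:]]
    test += [(x, 1) for x in anomalous[half:]]
    return train, cv, test

# Hypothetical data: 100 normal samples and 10 anomalies, identified by integer ids.
train, cv, test = split_for_anomaly_detection(range(100), range(1000, 1010))
print(len(train), len(cv), len(test))
```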

Note: As $y$ is highly skewed (anomalies are rare), accuracy is misleading, so we should evaluate with precision, recall, or the $F_1$ score instead.
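One way to act on this is to score predictions with the F1 score and pick the threshold that maximizes it on the CV set. The sketch below is one possible approach, not a prescribed procedure: the function names, the number of candidate thresholds, and the toy probabilities are all assumptions.

```python
def f1_score(y_true, y_pred):
    """F1 score with 1 = anomaly as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, steps=1000):
    """Scan thresholds between min and max density; keep the first with the best F1."""
    lo, hi = min(p_cv), max(p_cv)
    best_eps, best_f1 = None, -1.0
    for k in range(1, steps):
        eps = lo + (hi - lo) * k / steps
        y_pred = [1 if p < eps else 0 for p in p_cv]
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Hypothetical CV densities and labels: two rare anomalies with very low density.
p_cv = [0.30, 0.25, 0.28, 0.31, 0.27, 0.001, 0.26, 0.002]
y_cv = [0, 0, 0, 0, 0, 1, 0, 1]

best_eps, best_f1 = select_epsilon(p_cv, y_cv)
print(best_eps, best_f1)

# Predicting "normal" for everything is 75% accurate here, yet its F1 is 0.
print(f1_score(y_cv, [0] * len(y_cv)))
```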

Multivariate normal distribution

We could use a multivariate normal distribution to capture correlations between features. Otherwise, we need to create features manually to do so. However, the multivariate normal can be computationally expensive, since it requires inverting the $n \times n$ covariance matrix.

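Also, this doesn't work when the covariance matrix $\Sigma$ is non-invertible (the number of examples $m$ is not larger than the number of features $n$, or the data contains redundant features)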

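$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

Where $\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$, $\Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^T$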
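The sketch below illustrates the multivariate version in plain Python: it estimates the mean vector and covariance matrix, then evaluates the density by solving a linear system with Gaussian elimination rather than forming an explicit inverse. All function names, the singularity tolerance `1e-12`, and the toy data are assumptions for illustration; a numerical library would normally handle the linear algebra.

```python
import math

def fit_multivariate(X):
    """Estimate the mean vector and covariance matrix from rows of X."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    cov = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in X) / m
            for b in range(n)] for a in range(n)]
    return mu, cov

def solve_and_det(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting; also return det(A)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:  # assumed tolerance for "singular"
            raise ValueError("covariance matrix is singular (non-invertible)")
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det  # a row swap flips the sign of the determinant
        det *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z, det

def multivariate_density(x, mu, cov):
    """Multivariate normal density at x."""
    n = len(x)
    d = [xi - mi for xi, mi in zip(x, mu)]
    z, det = solve_and_det(cov, d)           # z = cov^{-1} (x - mu)
    quad = sum(di * zi for di, zi in zip(d, z))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** n * det)

# Hypothetical data with two strongly correlated features.
X = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.8]]
mu, cov = fit_multivariate(X)

p_on = multivariate_density([5.0, 5.0], mu, cov)   # follows the correlation
p_off = multivariate_density([5.0, 1.0], mu, cov)  # each value is in range, the pair is not
print(p_on, p_off)
```

The point `[5.0, 1.0]` has ordinary values for each feature taken separately, which is why a model that treats features independently would struggle to flag it. A redundant feature (for example, a second column that is exactly twice the first) makes `solve_and_det` raise `ValueError`, matching the non-invertibility caveat.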