We use anomaly detection instead of supervised learning when:
- there are very few positive (anomalous) samples
- anomalies look different from one another, so future anomalies may not resemble the ones already seen
So we model the negative (normal) samples only.
How it works
- Model the probability of occurrence $p(x)$ from the data
- Identify an anomaly by $p(x) < \epsilon$
Probability
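We assume the features are normally distributed and independent of each other:
$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$
($\prod$ is the product symbol; it returns the product of all elements)
Where $\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$, $\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2$
By definition of the normal distribution:
$$p(x_j; \mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$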
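The estimation and thresholding steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names `fit_gaussian` and `density`, the synthetic training data, and the threshold value `epsilon = 1e-4` are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the per-feature mean and variance from normal (non-anomalous) rows."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # np.var divides by m by default, matching the 1/m estimate
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x): product over features of the univariate normal densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return np.prod(coef * expo, axis=1)

# Synthetic "normal" training data (assumed for illustration only)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, sigma2 = fit_gaussian(X_train)

epsilon = 1e-4  # hypothetical threshold; in practice it is tuned on the CV set
X_new = np.array([[5.0, 10.0],    # close to the training mean
                  [12.0, 30.0]])  # far from the training data
is_anomaly = density(X_new, mu, sigma2) < epsilon
```

With many features the product can underflow to zero, so summing log-densities and comparing against $\log \epsilon$ is a common alternative.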
When not normally distributed
We could try to transform the data by taking, for example, $\log(x)$ or $\sqrt{x}$
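As a rough sketch of how such a transformation might be checked, the snippet below compares the skewness of a feature before and after transforming it. The log and square-root transforms, the synthetic log-normal feature, and the helper `skewness` are assumptions chosen for illustration; which transform suits a real feature has to be judged from its histogram.

```python
import numpy as np

def skewness(v):
    """Sample skewness: mean of the cubed z-scores (about 0 for symmetric data)."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

# Synthetic right-skewed, strictly positive feature (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

x_log = np.log(x)    # requires x > 0; np.log1p(x) or log(x + c) handles zeros
x_sqrt = np.sqrt(x)  # a milder transformation

print("raw: ", skewness(x))
print("log: ", skewness(x_log))
print("sqrt:", skewness(x_sqrt))
```

A skewness close to zero after the transformation suggests the feature is closer to symmetric, which makes the normal model a better fit.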
Training/CV/Test
- Training set: 60% normal data
- CV set: 20% normal data, all anomalous
- Test set: 20% normal data, all anomalous
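Note: Because $y$ is highly skewed (very few anomalies), accuracy is misleading, so we should evaluate the model with metrics such as precision, recall, or the F1 score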
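One way to act on this is to choose the threshold $\epsilon$ that maximizes the F1 score on the CV set. The sketch below assumes that densities `p_cv` have already been computed for the CV samples and that `y_cv` uses 1 for anomalies; the helper names, the grid of 1000 candidate thresholds, and the toy values are all hypothetical.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 for the positive (anomaly = 1) class; 0.0 when there are no true positives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

def select_epsilon(p_cv, y_cv, n_steps=1000):
    """Scan candidate thresholds and keep the first one with the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), n_steps):
        y_pred = (p_cv < eps).astype(int)  # flag low-probability samples
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = float(eps), score
    return best_eps, best_f1

# Toy CV densities and labels (made up for illustration)
p_cv = np.array([0.30, 0.25, 0.28, 0.31, 0.27, 0.001, 0.002, 0.26])
y_cv = np.array([0, 0, 0, 0, 0, 1, 1, 0])
eps, f1 = select_epsilon(p_cv, y_cv)

# Predicting "normal" for everything is 75% accurate here, yet its F1 is 0
baseline_f1 = f1_score(y_cv, np.zeros_like(y_cv))
```

The baseline at the end shows why accuracy alone is a poor guide on skewed data.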
Multivariate normal distribution
We could use multivariate normal distribution to capture correlations between features. Otherwise, we need to manually create features to do so. However, multivariate normal could be computationally expensive.
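Also, this doesn't work when $\Sigma$ is non-invertible ($m \le n$, or the data contain redundant features)
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
Where $\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$, $\Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)\left(x^{(i)} - \mu\right)^T$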
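A minimal NumPy sketch of the multivariate model is given below. The names `fit_multivariate` and `multivariate_density` and the correlated synthetic data are assumptions for illustration. The density is computed in log space with `np.linalg.solve` and `np.linalg.slogdet`, a numerical choice made here to avoid forming $\Sigma^{-1}$ explicitly.

```python
import numpy as np

def fit_multivariate(X):
    """Estimate the mean vector and the covariance matrix (1/m normalization)."""
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Multivariate normal density evaluated at each row of X."""
    n = mu.shape[0]
    diff = X - mu
    # Solve Sigma @ Z = diff^T instead of inverting Sigma explicitly
    solved = np.linalg.solve(Sigma, diff.T).T
    mahal = np.sum(diff * solved, axis=1)  # (x - mu)^T Sigma^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(Sigma)
    log_p = -0.5 * (n * np.log(2.0 * np.pi) + logdet + mahal)
    return np.exp(log_p)

# Strongly correlated synthetic features (assumed for illustration)
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
X_train = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=2000)
mu, Sigma = fit_multivariate(X_train)

X_new = np.array([[1.5, 1.5],    # follows the correlation
                  [1.5, -1.5]])  # violates the correlation
p = multivariate_density(X_new, mu, Sigma)
```

Both points sit about the same distance from the mean along each axis, so a per-feature model would score them similarly. The multivariate model assigns a much lower density to the second point because it contradicts the learned correlation. If $\Sigma$ is singular, `np.linalg.solve` raises `LinAlgError`, which corresponds to the non-invertible case.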