Evaluating a Model

Summary

High variance (over-fitting)
- More training samples
- Fewer features
- Increase $\lambda$ / [SVM] Decrease $C$
- [Neural Network] Fewer layers, fewer units
- [SVM] Increase $\sigma^2$
High bias (under-fitting)
- Get additional features
- Add polynomial features
- Decrease $\lambda$ / [SVM] Increase $C$
- [Neural Network] More layers, more units
- [SVM] Decrease $\sigma^2$

Testing

Testing evaluates how well the model performs on data it was not trained on.

Split data into Training : Test = 7:3.
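A minimal sketch of the 7:3 split, using only the Python standard library. The function name `train_test_split`, the `seed` parameter, and the toy data are assumptions made for illustration; they are not part of the notes.

```python
import random

def train_test_split(samples, test_ratio=0.3, seed=0):
    """Shuffle a copy of samples and split it into (train, test)."""
    shuffled = list(samples)
    # Shuffle first so the split does not depend on the original ordering.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))            # hypothetical data set of 10 samples
train, test = train_test_split(data)
print(len(train), len(test))      # 7 3
```

The model is fitted on `train` only, and its error is then measured on `test`.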

Cross Validation

Cross validation uses separate data sets to choose the best model and to evaluate it, so that the test error is not biased by the model selection itself.

1. Split data into Training : Cross Validation : Test = 6:2:2.
2. Train each candidate model on the Training set and compare the candidates on the Cross Validation set.
3. Use the Test set to evaluate the best model selected in step 2.
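The three steps above can be sketched as follows. The notes do not say which models are compared, so a tiny k-nearest-neighbor regressor with different values of `k` stands in for "different models" here; the helper names and the synthetic data are likewise assumptions.

```python
import random

def three_way_split(samples, seed=0):
    """Step 1: shuffle and split into train / cv / test at 6:2:2."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 0.6)
    n_cv = round(len(shuffled) * 0.2)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])

def knn_predict(train, x, k):
    """Average the y values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error on `data` of the k-NN model built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

# Hypothetical data: noisy samples of y = x^2.
rng = random.Random(1)
samples = [(x, x * x + rng.gauss(0, 0.5))
           for x in (i / 10 for i in range(-50, 50))]
train, cv, test = three_way_split(samples)

candidates = [1, 3, 5, 9, 15]                              # candidate models
best_k = min(candidates, key=lambda k: mse(train, cv, k))  # step 2
test_error = mse(train, test, best_k)                      # step 3
print(best_k, test_error)
```

The test set is touched only once, after the choice has been made, so `test_error` remains an honest estimate for the selected model.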

Learning Curve

A learning curve plots (training set size, $J_{train}$) and (training set size, $J_{cv}$) so that we can tell whether the model suffers from high bias or high variance:

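High bias: High error on both curves. The curves flatten out quickly and converge to a similar error, so more data alone will not help much.
High variance: High $J_{cv}$ and low $J_{train}$, with a gap that narrows as the training set grows, indicating that more data could help.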
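A rough sketch of how the points of a learning curve could be computed: train on the first m samples, then record the training error and the cross-validation error for each m. The model (a straight line fitted by least squares), the squared-error cost, and the synthetic quadratic data are assumptions chosen to produce a high-bias example.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b (closed form for one feature)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:                    # degenerate case: fall back to a flat line
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return a, mean_y - a * mean_x

def cost(model, points):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / (2 * len(points))

def learning_curve(train, cv, sizes):
    """Return [(m, J_train, J_cv)] for each training set size m."""
    curve = []
    for m in sizes:
        subset = train[:m]
        model = fit_line(subset)
        # J_train is measured on the subset used for fitting, J_cv on the full cv set.
        curve.append((m, cost(model, subset), cost(model, cv)))
    return curve

rng = random.Random(0)

def sample(n):
    """Hypothetical data: noisy samples of y = x^2 on [-3, 3]."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 0.3)) for x in xs]

train, cv = sample(200), sample(100)
curve = learning_curve(train, cv, [2, 5, 10, 50, 200])
for m, j_train, j_cv in curve:
    print(f"m={m:3d}  J_train={j_train:.3f}  J_cv={j_cv:.3f}")
```

Because a straight line cannot represent a quadratic, both errors should stay high however many samples are used, which is the high-bias pattern described above.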

Skewed Classes

When the target is highly skewed (e.g. 99% of the $y$ values are the same), a model can achieve a low error while ignoring its inputs entirely. Error alone therefore cannot tell us whether the model has improved.
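A small illustration of the problem, with made-up labels: a "model" that always predicts the majority class reaches 99% accuracy without ever finding a positive example.

```python
labels = [1] * 1 + [0] * 99          # hypothetical data: 1% positive
predictions = [0] * len(labels)      # "model" that ignores its input

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)

print(accuracy)         # 0.99
print(true_positives)   # 0
```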

Precision/Recall

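We can trade off precision ($P$) and recall ($R$) by adjusting the decision threshold: a higher threshold tends to raise precision and lower recall, and a lower threshold tends to do the opposite.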

$F$ Score ($F_1$ score) = $2\frac{PR}{P + R}$
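A sketch of how the threshold moves precision and recall, using the usual definitions P = TP / (TP + FP) and R = TP / (TP + FN), with the F1 score taken as 2PR / (P + R). The function name, the labels, and the scores are invented for illustration.

```python
def precision_recall_f1(labels, scores, threshold=0.5):
    """Predict 1 when score >= threshold, then compute (P, R, F1)."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    # Guard against division by zero when nothing is predicted or labeled positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical labels and model scores (estimated probability of y = 1).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

for threshold in (0.35, 0.5, 0.8):
    p, r, f1 = precision_recall_f1(labels, scores, threshold)
    print(f"threshold={threshold}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

On this toy data, raising the threshold from 0.35 to 0.8 lifts precision from 0.75 to 1.00 while recall drops from 1.00 to 0.33. The F1 score gives a single number for comparing such settings.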