Introduction to Machine Learning

Online Courses for Machine Learning

Stanford's machine learning course on Coursera is a well-known resource for learning machine learning. If you want to start learning machine learning, even with no prior background, you should take a look at it.

What is Machine Learning?

Machine learning is about using data to build a model that can describe existing data and predict new data.
Machine learning includes supervised learning and unsupervised learning.

  • Supervised learning is machine learning in which the training data are labeled.
  • Unsupervised learning is machine learning in which the training data are not labeled.

Regression, Supervised

Outputs are real numbers.

Minimization Algorithms:

Linear Regression

Check out Linear Regression

Classification, Supervised

Outputs are discrete (0, 1, 2, …).

  • Two-class classification
  • Multi-class classification
    • One-vs-all (one-vs-rest): train one binary classifier per class, then predict the class whose classifier gives the highest score
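
One-vs-all can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a small logistic regression trained by plain gradient descent as the per-class classifier, and the toy data, the learning rate, the step count, and the helper names (train_binary, train_one_vs_all, predict) are all made up for this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(theta, x):
    # theta[0] is the bias term; the remaining entries pair up with the features.
    return theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))

def train_binary(X, y, lr=0.1, steps=10000):
    """Fit a logistic regression on 0/1 labels with batch gradient descent."""
    theta = [0.0] * (len(X[0]) + 1)
    m = len(X)
    for _ in range(steps):
        errors = [sigmoid(score(theta, x)) - yi for x, yi in zip(X, y)]
        grad = [sum(errors) / m]
        for j in range(len(X[0])):
            grad.append(sum(e * x[j] for e, x in zip(errors, X)) / m)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def train_one_vs_all(X, labels):
    """Train one classifier per class: that class is 1, every other class is 0."""
    classes = sorted(set(labels))
    return {c: train_binary(X, [1 if l == c else 0 for l in labels]) for c in classes}

def predict(models, x):
    """Pick the class whose classifier gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))

# Hypothetical toy data: three well-separated groups in two dimensions.
X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (0, 6), (1, 5)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
models = train_one_vs_all(X, labels)
print(predict(models, (5.5, 0.5)))
```

With K classes this trains K separate classifiers, so training cost grows linearly with the number of classes.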

Clustering, Unsupervised

The output is a set of cluster centroids; each point is assigned to the cluster whose centroid is nearest.
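
As a rough illustration of centroid-based clustering, here is a minimal k-means sketch. The notes do not name a specific algorithm, so k-means is an assumption here, as are the toy points, the starting centroids, and the helper names (nearest, k_means).

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to group means."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

# Hypothetical unlabeled data and hand-picked starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
clusters = [nearest(p, centroids) for p in points]
print(centroids, clusters)
```

The starting centroids are fixed here to keep the example deterministic; in practice they are usually chosen at random, and the result can depend on that choice.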

Logistic Regression

Check out Logistic Regression

Over-fitting

The model performs accurately on the training data but does not generalize to new data.

Solutions:

  1. Reduce number of features
    • Manually select which features to keep
    • Model selection algorithm
  2. Regularization
    • Penalize large parameters by adding a term such as \frac{\lambda}{2m}\sum_{j \ge 1}\theta_j^2 to the cost function, where \lambda is the regularization parameter. (Do not penalize \theta_0.)
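
The effect of regularization on the cost can be sketched as follows. This assumes a linear hypothesis with squared-error cost and an L2 penalty of the form \frac{\lambda}{2m}\sum_{j \ge 1}\theta_j^2; the function name regularized_cost and the sample values are hypothetical.

```python
def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus an L2 penalty that skips theta[0]."""
    m = len(X)

    def h(x):
        # Linear hypothesis: theta[0] is the bias term.
        return theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))

    fit = sum((h(x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)
    penalty = lam * sum(t ** 2 for t in theta[1:]) / (2 * m)  # theta[0] excluded
    return fit + penalty

# Hypothetical data that theta = [0, 1] fits exactly, so only the penalty remains.
X = [[1.0], [2.0]]
y = [1.0, 2.0]
print(regularized_cost([0.0, 1.0], X, y, lam=0.0))
print(regularized_cost([0.0, 1.0], X, y, lam=4.0))
```

A larger \lambda pushes the minimizer toward smaller parameters, trading some fit on the training data for a simpler model.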

Notations

  • m : Number of training samples
  • x : “input” variables
  • y : “output” variables
  • (x^{(i)}, y^{(i)}) : i-th sample
  • x^{(i)}_j : j-th feature of the i-th sample
  • \theta : parameters of the model
  • h_\theta(x) : hypothesis function that maps an input to an estimated output.
  • J(\theta_0, \theta_1, …) : cost function that takes the parameters and measures how far the hypothesis function's predictions are from the actual outputs (lower is better).
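
To tie the notation together, here is a minimal sketch for a single input variable. It assumes a linear hypothesis h_\theta(x) = \theta_0 + \theta_1 x, a squared-error cost, and gradient descent as the minimization algorithm; none of these choices, nor the sample data and the learning rate, come from the notes above.

```python
def h(theta, x):
    """Hypothesis h_theta(x) = theta_0 + theta_1 * x."""
    return theta[0] + theta[1] * x

def J(theta, xs, ys):
    """Cost: half the mean squared error over the m training samples."""
    m = len(xs)
    return sum((h(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, lr=0.1, steps=2000):
    """Repeatedly step theta against the gradient of J."""
    theta = [0.0, 0.0]
    m = len(xs)
    for _ in range(steps):
        errors = [h(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta = [theta[0] - lr * grad0, theta[1] - lr * grad1]
    return theta

# Hypothetical samples (x^(i), y^(i)) generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent(xs, ys)
print(theta, J(theta, xs, ys))
```

Here m = 4, and the learned \theta_0 and \theta_1 should end up close to 1 and 2, with J close to zero.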