Linear regression is used when predicting numeric output from numeric input that has linear relationship.
Hypothesis Function
For convenience, define (to use matrix for calculation).
Cost Function
Squared Error
$
J(\theta) = \frac{1}{m}\displaystyle\sum_{i=1}^m{Cost(h_\theta(x^{(i)}), y^{(i)})}
$
$
Cost(h_\theta(x), y)=\frac{1}{2}(h_\theta(x)-y)^2
$
Optimization
Derivative for Gradient Descent
$
\frac{d}{d\theta_j}J(\theta)=\frac{1}{m}\displaystyle\sum_{i=1}^m{(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j}
$
Update :
$
\theta_j = \theta_j -\frac{\alpha}{m}\displaystyle\sum_{i=1}^m{(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j}
$
Vectorized implementation: