Logistic Regression

Linear regression does not work well for classification problems, because outliers can shift the fitted line and change the predicted classes.

Despite its name, logistic regression is a classification algorithm.

Hypothesis Function

Logistic regression uses the “Sigmoid Function” (or “Logistic Function”) as its hypothesis function:

Logistic Function

h_\theta(x)=g(\theta^T x)
g(z) = \frac{1}{1 + e^{-z}}

h_\theta(x) gives the probability that the output is 1:

h_\theta(x) = P(y=1 | x ; \theta) = 1 - P(y=0 | x ; \theta)
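
A minimal NumPy sketch of the hypothesis (the function names and the example values are illustrative, not from the course material):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), the estimated P(y = 1 | x; theta)."""
    return sigmoid(np.dot(theta, x))

# Example: theta = [0, 1], x = [1, 2] (the first component is the bias term x_0 = 1).
theta = np.array([0.0, 1.0])
x = np.array([1.0, 2.0])
print(hypothesis(theta, x))  # ~0.88, i.e. P(y = 1) is estimated at about 88%
```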

Decision Boundary

The decision boundary is where h_\theta(x) = 0.5 , i.e. \theta^T x = 0 .
It is a property of the hypothesis (the trained parameters \theta), not of the data set.

Non-linear decision boundaries:
Just create new (e.g. polynomial) features and fit \theta to them.
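
As a sketch, predicting a class just means checking the sign of \theta^T x ; the feature mapping and the \theta values below are made-up examples that give a circular (non-linear) boundary:

```python
import numpy as np

def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)

# Non-linear boundary: map (x1, x2) -> (1, x1, x2, x1^2, x2^2).
# With theta = [-1, 0, 0, 1, 1] the boundary x1^2 + x2^2 = 1 is a circle.
def map_features(x1, x2):
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2])

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
X = map_features(np.array([0.1, 2.0]), np.array([0.2, 0.0]))
print(predict(theta, X))  # [0 1]: inside the circle -> 0, outside -> 1
```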

Cost Function

J(\theta) = \frac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)})

\mathrm{Cost}(h_\theta(x),y) =
\begin{cases}
-\log(h_\theta(x)) & \text{if } y=1 \\
-\log(1-h_\theta(x)) & \text{if } y=0
\end{cases}
= -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))
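
A quick numeric check (with made-up values of h_\theta(x) and y) that the two-case definition and the single-expression form agree:

```python
import numpy as np

def cost_cases(h, y):
    """Piecewise definition of Cost(h_theta(x), y)."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

def cost_combined(h, y):
    """Single-expression form: -y*log(h) - (1 - y)*log(1 - h)."""
    return -y * np.log(h) - (1 - y) * np.log(1.0 - h)

for h, y in [(0.9, 1), (0.9, 0), (0.2, 1), (0.2, 0)]:
    print(h, y, cost_cases(h, y), cost_combined(h, y))  # the last two columns match
```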

Vectorized version:

J(\theta) = -\frac{1}{m}(\log(g(X\theta))^{T}y+\log(1-g(X\theta))^{T}(1-y))
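
A direct NumPy transcription of the vectorized cost; here X is assumed to be the m × (n+1) design matrix with a leading column of ones, and the toy data is purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * (log(g(X theta))^T y + log(1 - g(X theta))^T (1 - y))."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(np.log(h) @ y + np.log(1.0 - h) @ (1 - y)) / m

# Toy example with one feature plus the bias column.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
print(cost(np.zeros(2), X, y))  # log(2) ~= 0.693 when theta = 0
```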

Why Not Use Squared Error?

If we plug the sigmoid hypothesis into the squared-error cost, the resulting cost function is non-convex (it has many local minima), so gradient descent is not guaranteed to find the global minimum. The log cost above is convex.

Optimization

Derivative for Gradient Descent

The partial derivatives look identical to those of linear regression (only the form of h_\theta differs):
\frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\displaystyle\sum_{i=1}^m{(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j}
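
A sketch of these derivatives in vectorized form, under the same shape assumptions as the cost sketch above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h_theta(x_i) - y_i) * x_ij, for all j at once."""
    m = len(y)
    return X.T @ (sigmoid(X @ theta) - y) / m
```

The cost and this gradient can also be handed to a library optimizer such as scipy.optimize.minimize instead of running gradient descent by hand.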

With regularization, the cost function becomes:

J(\theta_0,\theta_1,\dots,\theta_n) = \frac{1}{m}\displaystyle\sum_{i=1}^m{\mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})}+\frac{\lambda}{2m}\displaystyle\sum_{j=1}^n{\theta^2_j}

( \lambda is the regularization parameter.)
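
A sketch of the regularized cost; note the penalty sum starts at j = 1, so the bias term \theta_0 is not penalized (lambda_ is just a name chosen to avoid Python's lambda keyword):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_regularized(theta, X, y, lambda_):
    """Cross-entropy cost plus (lambda / 2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    unreg = -(np.log(h) @ y + np.log(1.0 - h) @ (1 - y)) / m
    penalty = lambda_ / (2.0 * m) * np.sum(theta[1:] ** 2)  # skip theta_0
    return unreg + penalty
```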

Gradient descent update for \theta_j :
\theta_j = \theta_j -\frac{\alpha}{m}\displaystyle\sum_{i=1}^m{(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j}
Vectorized implementation:

\theta = \theta - \frac{\alpha}{m} X^{T} (g(X \theta ) - y)
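
Putting it together, a minimal batch gradient descent loop over this vectorized update; the learning rate, iteration count, and toy data are arbitrary illustrations, and regularization is omitted to match the update above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        theta -= alpha / m * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy data: y = 1 roughly when the feature is positive.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print(sigmoid(X @ theta))  # probabilities close to [0, 0, 1, 1]
```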