[Machine Learning]: #8 Logistic Model
***************************************
[Write In Front]:
Let's review the model we have already defined in [Machine Learning]: #7 Classification Introduction.
Training set: $$\{ ({x^{(1)}},{y^{(1)}}),({x^{(2)}},{y^{(2)}}), \ldots ,({x^{(m)}},{y^{(m)}})\} $$
where $$x = \left[ {\matrix{
1 \cr
{{x_1}} \cr
\vdots \cr
{{x_n}} \cr
} } \right]$$
and \(x_0 = 1\ , y \in \{ 0,1\}\)
$${H_\theta }(x) = {1 \over {1 + {e^{ - {\theta ^T}x}}}}$$ Our problem now becomes: how do we define a Cost Function for \({H_\theta }(x)\) so that we can determine the best \(\theta \)?
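As a quick illustration (a minimal sketch of my own, not code from the course; the names `sigmoid` and `hypothesis` are arbitrary), the hypothesis can be computed directly:

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # H_theta(x) = sigmoid(theta^T x); x already contains x_0 = 1
    return sigmoid(np.dot(theta, x))

theta = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.3, 1.2])      # [x_0, x_1, x_2] with x_0 = 1
print(hypothesis(theta, x))        # a value in (0, 1)
```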
*********************************************************************************
*********************************************************************************
[Cost Function for Logistic Function]:
Some of you may ask: why don't we just use the Cost Function from Linear Regression?
It is because we want the Cost Function to be convex, which makes sure that if we find a minimum, it is the global minimum. If we plugged the sigmoid hypothesis into the squared-error cost from Linear Regression, \(J(\theta )\) would be non-convex and Gradient Descent could get stuck in a local minimum.
Now, let's define the Cost Function for the Logistic Function
$$\mathrm{Cost}({H_\theta }(x),y) = \left\{ {\matrix{
{ - \log ({H_\theta }(x))} & {y = 1} \cr
{ - \log (1 - {H_\theta }(x))} & {y = 0} \cr
} } \right.$$
$$J(\theta ) = {1 \over m}\sum\limits_{i = 1}^m {\mathrm{Cost}({H_\theta }({x^{(i)}}),{y^{(i)}})} $$
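Here is a minimal sketch (my own, with made-up names `cost_piecewise` and `J`; `X` is assumed to be the matrix of training examples with the \(x_0 = 1\) column already added) that follows the piecewise definition literally:

```python
import numpy as np

def cost_piecewise(h, y):
    # Cost(H_theta(x), y) for a single example, exactly the two cases above
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

def J(theta, X, y):
    # J(theta) = (1/m) * sum of Cost over the m training examples
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))   # H_theta for every row of X
    return np.mean([cost_piecewise(h_i, y_i) for h_i, y_i in zip(h, y)])
```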
Then we have to look at some properties of this function:
When \(y=1\): as \({H_\theta }(x)\) approaches 1, which is the 'right answer' for the classification, the cost approaches 0. On the contrary, as \({H_\theta }(x)\) approaches 0, which is the 'wrong answer', the cost goes to infinity.
When \(y=0\): as \({H_\theta }(x)\) approaches 0, which is the 'right answer' for the classification, the cost approaches 0. On the contrary, as \({H_\theta }(x)\) approaches 1, which is the 'wrong answer', the cost goes to infinity.
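For example, with \(y=1\), a confident correct prediction \({H_\theta }(x)=0.99\) costs \(-\log(0.99) \approx 0.01\), while a confident wrong prediction \({H_\theta }(x)=0.01\) costs \(-\log(0.01) \approx 4.6\), and the cost keeps growing without bound as \({H_\theta }(x) \to 0\).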
To summarize: we defined the Cost Function as shown above, which makes sure that it is a convex function.
*********************************************************************************
*********************************************************************************
*********************************************************************************
[Simplified Cost Function and Gradient Descent]:
First, we simplify $$\mathrm{Cost}({H_\theta }(x),y) = \left\{ {\matrix{
{ - \log ({H_\theta }(x))} & {y = 1} \cr
{ - \log (1 - {H_\theta }(x))} & {y = 0} \cr
} } \right.$$
to
$$\mathrm{Cost}({H_\theta }(x),y) = - y\log {H_\theta }(x) - (1 - y)\log (1 - {H_\theta }(x))$$
They are the same. Don't believe it? Plug in \(y=1\) and \(y=0\) and see what you get.
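Indeed, substituting \(y=1\) leaves only \(-\log {H_\theta }(x)\) (the second term vanishes), and substituting \(y=0\) leaves only \(-\log (1 - {H_\theta }(x))\) (the first term vanishes), which are exactly the two cases of the piecewise definition.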
Then, we plug the new \(\mathrm{Cost}({H_\theta }(x),y)\) into the overall Cost Function and get the following simplified form:
$$J(\theta ) = - {1 \over m}\sum\limits_{i = 1}^m {\left[ {y^{(i)}}\log {H_\theta }({x^{(i)}}) + (1 - {y^{(i)}})\log (1 - {H_\theta }({x^{(i)}})) \right]} $$
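This simplified form translates into one vectorized line of NumPy. A sketch of my own, assuming `X` is the \(m \times (n+1)\) matrix of training examples with the column of ones for \(x_0\) already added:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    m = len(y)
    h = sigmoid(X.dot(theta))
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```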
Then, as we did before, we have to find the \(\theta \) that minimizes \(J(\theta )\).
*********************************************************************************
*********************************************************************************
[Gradient Descent]:
To minimize \(J(\theta )\), we have to repeat the following update for every \(j\) SIMULTANEOUSLY until it converges:
$${\theta _j} = {\theta _j} - \alpha {\partial \over {\partial {\theta _j}}}J(\theta )$$
After working out the derivative, we get:
$${\theta _j} = {\theta _j} - \alpha {1 \over m}\sum\limits_{i = 1}^m {({H_\theta }({x^{(i)}}) - {y^{(i)}})x_j^{(i)}} $$
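As a sketch (my own code, not the course's; `gradient_descent` is a made-up name), the simultaneous update becomes a single vector operation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    # Repeat theta_j := theta_j - alpha * (1/m) * sum((h - y) * x_j)
    # for every j SIMULTANEOUSLY: the whole theta vector is updated at once.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X.dot(theta))            # H_theta for every example
        grad = (1.0 / m) * X.T.dot(h - y)    # all partial derivatives of J(theta)
        theta = theta - alpha * grad
    return theta
```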
*********************************************************************************
*********************************************************************************
[Advanced Optimization]:
- Gradient Descent (Covered)
- Conjugate Gradient
- BFGS
- L-BFGS
Advanced optimization methods do not need you to manually select \(\alpha \), and they are usually faster, but they are more difficult to understand and implement.
I will try to explain them in an easier way, and I will post the important algorithms this week. For the implementation part, I use open-source libraries.
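As one open-source option (a sketch of my own, not the exact code I will post), SciPy's `scipy.optimize.minimize` can run BFGS or L-BFGS on the same cost and gradient, so no manual choice of \(\alpha \) is needed:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    # Returns J(theta) and its gradient, which the optimizer uses internally.
    m = len(y)
    h = sigmoid(X.dot(theta))
    J = -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = (1.0 / m) * X.T.dot(h - y)
    return J, grad

# Toy, slightly overlapping data so the optimum stays finite: x_0 = 1 plus one feature
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

result = minimize(cost_and_grad, x0=np.zeros(X.shape[1]),
                  args=(X, y), jac=True, method='L-BFGS-B')
print(result.x)  # the theta found by L-BFGS
```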