[Machine Learning]: #8 Logistic Model
***************************************
[Write in Front]:
Let's review the model we already defined in [Machine Learning]: #7 Classification Introduction.
Training set: $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\ldots,(x^{(m)},y^{(m)})\}$
where $x = [x_0, x_1, \ldots, x_n]^T$ with $x_0 = 1$, and $y \in \{0,1\}$.
$$H_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
Our problem becomes: how do we define a Cost Function for $H_\theta(x)$ that lets us determine the best $\theta$?
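As a quick illustration, here is a minimal sketch of this hypothesis in Python (NumPy assumed; the names `sigmoid` and `hypothesis` are mine, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # H_theta(x) = sigmoid(theta^T x); x must include the intercept term x0 = 1
    return sigmoid(np.dot(theta, x))

theta = np.array([-1.0, 0.5, 0.5])
x = np.array([1.0, 2.0, 1.0])   # [x0, x1, x2] with x0 = 1
print(hypothesis(theta, x))      # a probability-like value in (0, 1)
```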
*********************************************************************************
*********************************************************************************
[Cost Function for Logistic Function]:
Some of you may ask: why don't we use the Cost Function from Linear Regression?
It is because we want the Cost Function to be Convex!!! If we simply plugged the sigmoid into the squared-error cost from Linear Regression, $J(\theta)$ would be non-convex with many local minima; a convex cost makes sure that if we find a minimum, it is the global minimum.
Now, let's define the Cost Function for the Logistic Function
$$\mathrm{Cost}(H_\theta(x), y) = \begin{cases} -\log(H_\theta(x)) & \text{if } y = 1 \\ -\log(1 - H_\theta(x)) & \text{if } y = 0 \end{cases}$$
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(H_\theta(x^{(i)}), y^{(i)}\right)$$
Then we have to see some properties of this function:
When $y = 1$: as $H_\theta(x)$ approaches 1, which is the 'right answer' for the classification, the Cost Function approaches 0. On the contrary, when $H_\theta(x)$ approaches 0, which is the 'wrong answer' for the classification, the cost is infinite.
When $y = 0$: as $H_\theta(x)$ approaches 0, which is the 'right answer' for the classification, the Cost Function approaches 0. On the contrary, when $H_\theta(x)$ approaches 1, which is the 'wrong answer' for the classification, the cost is infinite.
To summarize: we define the Cost Function as shown above, which makes sure that it is a Convex Function.
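For concreteness, here is a small Python sketch of the piecewise cost and the behavior described above (the function name `cost` is my own label):

```python
import numpy as np

def cost(h, y):
    # Cost(H_theta(x), y): -log(h) when y = 1, -log(1 - h) when y = 0
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# With y = 1, the cost shrinks toward 0 as h -> 1 and blows up as h -> 0;
# with y = 0, the behavior is mirrored.
for h in (0.99, 0.5, 0.01):
    print(f"h={h}: cost(y=1)={cost(h, 1):.3f}, cost(y=0)={cost(h, 0):.3f}")
```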
*********************************************************************************
*********************************************************************************
*********************************************************************************
[Simplified Cost Function and Gradient Descent]:
First, we simplify
$$\mathrm{Cost}(H_\theta(x), y) = \begin{cases} -\log(H_\theta(x)) & \text{if } y = 1 \\ -\log(1 - H_\theta(x)) & \text{if } y = 0 \end{cases}$$
to
$$\mathrm{Cost}(H_\theta(x), y) = -y \log H_\theta(x) - (1 - y) \log(1 - H_\theta(x))$$
They are the same. Don't believe it? Try $y = 1$ and $y = 0$ and see what you get.
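As a quick check, substituting the two values of $y$ into the simplified form gives back exactly the two branches:
$$y = 1:\quad -(1)\log H_\theta(x) - (1-1)\log(1 - H_\theta(x)) = -\log H_\theta(x)$$
$$y = 0:\quad -(0)\log H_\theta(x) - (1-0)\log(1 - H_\theta(x)) = -\log(1 - H_\theta(x))$$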
Then we plug the new $\mathrm{Cost}(H_\theta(x), y)$ into the Cost Function and get the following simplified Cost Function:
$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log H_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - H_\theta(x^{(i)})\right)\right]$$
Then, as we did before, we have to find the $\theta$ that minimizes $J(\theta)$.
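A minimal vectorized sketch of this $J(\theta)$ in Python (assuming NumPy, a design matrix `X` whose first column is all ones, and helper names of my own choosing):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1 - y)*log(1 - h) ]
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Tiny example: 4 samples, 1 feature plus the intercept column of ones.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
print(compute_cost(np.zeros(2), X, y))   # log(2) ~ 0.693 at theta = 0
```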
*********************************************************************************
*********************************************************************************
[Gradient Descent]:
To minimize $J(\theta)$, we have to repeat the following update SIMULTANEOUSLY for every $j$ until it converges:
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
After doing the math, we get:
$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \left(H_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$
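Here is a minimal sketch of that update loop (the learning rate, iteration count, and example data are illustrative assumptions; the $1/m$ factor from $J(\theta)$ is absorbed into $\alpha$ here, matching the update above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    # Repeat theta_j := theta_j - alpha * sum_i (H_theta(x_i) - y_i) * x_ij,
    # updating all theta_j simultaneously (done here in vectorized form).
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y)           # sum over the m training examples
        theta = theta - alpha * grad   # simultaneous update of every theta_j
    return theta

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
print(gradient_descent(X, y))
```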
*********************************************************************************
[Advanced Optimization]:
- Gradient Descent (Covered)
- Conjugate Gradient
- BFGS
- L-BFGS
Advanced optimization methods need no manual selection of $\alpha$ and are faster than Gradient Descent, but they are more difficult to implement.
I will try to make them easier to explain, and I will post the important algorithms this week. For the implementation part, I use open source libraries; a minimal sketch of that route is shown below.
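As a hedged sketch of that open-source route, here is one way to hand $J(\theta)$ and its gradient to an off-the-shelf optimizer (this uses SciPy's `scipy.optimize.minimize` with the BFGS method; the data and helper names are illustrative, not from the post):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) for logistic regression
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # Partial derivatives of J(theta) with respect to each theta_j
    m = len(y)
    return (X.T @ (sigmoid(X @ theta) - y)) / m

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

# BFGS chooses its own step sizes, so no learning rate alpha is picked by hand.
result = minimize(cost, x0=np.zeros(2), args=(X, y), jac=gradient, method='BFGS')
print(result.x)   # the fitted theta
```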