[Machine Learning]: Parameter Learning
The problem is how to solve the math model I posted in the blog [Machine Learning] Model and Cost Function.
-------------------------------
Question and Math Model:
Hypothesis: \(H_\theta(x) = \theta_0 + \theta_1 x\)
Parameters: \(\theta_0, \theta_1\)
Cost Function: \(J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2 = \frac{1}{2m}\sum_{i=1}^{m}(H_\theta(x_i) - y_i)^2\)
Goal: minimize \(J(\theta_0, \theta_1)\)
-------------------------------
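As a concrete reference, the hypothesis and cost function above can be written as the following minimal Python sketch (the function and variable names are my own, not from the original post):

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """H_theta(x) = theta0 + theta1 * x, applied to a vector of inputs x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum((H_theta(x_i) - y_i)^2)."""
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Tiny example: points exactly on y = 1 + 2x give zero cost at (theta0, theta1) = (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
print(cost(1.0, 2.0, x, y))  # 0.0
print(cost(0.0, 0.0, x, y))  # > 0
```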
In this blog, we are going to give a general method to solve this problem.
1. Gradient Descent
Picture the cost function as a surface: the z axis is the value of the cost function, while \(\theta_0\) and \(\theta_1\) are the x and y axes. Our goal is to find the minimum of the cost function.
The way we do this is by taking the derivative (the slope of the tangent line) of our cost function. The slope of the tangent is the derivative at that point, and it gives us a direction to move in. We take steps down the cost function in the direction of steepest descent. The size of each step is determined by the parameter \(\alpha\), which is called the learning rate.
For example, the distance between the marked points in the graph above represents one step, determined by our parameter \(\alpha\): a smaller \(\alpha\) results in a smaller step and a larger \(\alpha\) results in a larger step. The direction in which the step is taken is determined by the partial derivatives of \(J(\theta_0, \theta_1)\).
Please note: depending on where one starts on the graph, one could end up at different points. One starting point may lead us to the global minimum, while another may only reach a local minimum.
2. How to reach the min?
Here is the gradient descent algorithm. Repeat the following update until it converges:
\[\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1)\]
Important!!!!
For each iteration, we have to update \(\theta_j\) simultaneously, which means we have to do the following:
\[\text{temp}_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)\]
\[\text{temp}_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)\]
\[\theta_0 := \text{temp}_0\]
\[\theta_1 := \text{temp}_1\]
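As an illustration, here is a minimal sketch of one simultaneous gradient descent step in Python (a rough example with my own names, assuming the two partial derivatives are supplied as functions):

```python
def gradient_descent_step(theta0, theta1, alpha, grad_theta0, grad_theta1):
    """One simultaneous gradient descent update of (theta0, theta1).

    Both partial derivatives are evaluated at the *current* point
    before either parameter is overwritten.
    """
    temp0 = theta0 - alpha * grad_theta0(theta0, theta1)
    temp1 = theta1 - alpha * grad_theta1(theta0, theta1)
    return temp0, temp1  # assign back only after both temps are computed

# Tiny usage example on J(t0, t1) = t0^2 + t1^2, whose gradients are 2*t0 and 2*t1:
t0, t1 = 3.0, -4.0
for _ in range(50):
    t0, t1 = gradient_descent_step(t0, t1, 0.1,
                                   lambda a, b: 2 * a,
                                   lambda a, b: 2 * b)
print(t0, t1)  # both close to 0
```

If we updated \(\theta_0\) first and then used the new \(\theta_0\) while computing \(\theta_1\), we would no longer be taking the gradient at a single point, which is why the temporary variables are needed.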
3. More about the Learning Rate α
- If \(\alpha\) is too small, gradient descent needs many more iterations to reach the optimum, because every update changes \(\theta_j\) only a little.
- If \(\alpha\) is too big, each step is too large: we can overshoot the optimum and may never reach it (the updates can even diverge).
- If we are already at a local optimum, \(\theta_j\) does not change, because the derivative there is zero.
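To make the first two points concrete, here is a small sketch (my own example, not from the original post) that runs gradient descent on the one-dimensional function \(J(\theta) = \theta^2\) with different learning rates:

```python
def minimize_quadratic(alpha, steps=20, theta=5.0):
    """Gradient descent on J(theta) = theta^2, whose derivative is 2 * theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(minimize_quadratic(alpha=0.01))  # too small: still far from 0 after 20 steps
print(minimize_quadratic(alpha=0.1))   # reasonable: close to 0
print(minimize_quadratic(alpha=1.5))   # too large: |theta| grows every step (diverges)
```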
4. Gradient Descent For Linear Regression
From the equations above, we can specialize our update rules for the linear hypothesis to:
\[\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}(\theta_0 + \theta_1 x_i - y_i)\]
\[\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\big((\theta_0 + \theta_1 x_i - y_i)\, x_i\big)\]
Our job is to repeat the updates above and stop when they converge. Please note that we have to use the entire training set on every step; this is called batch gradient descent.
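Here is a minimal sketch of batch gradient descent for this linear regression model (an illustration under my own naming and a fixed iteration count, not the original post's code):

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.05, iterations=5000):
    """Fit H(x) = theta0 + theta1 * x by repeating the batch updates above."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = theta0 + theta1 * x - y          # uses the whole training set
        temp0 = theta0 - alpha * np.sum(errors) / m
        temp1 = theta1 - alpha * np.sum(errors * x) / m
        theta0, theta1 = temp0, temp1             # simultaneous update
    return theta0, theta1

# Usage: data roughly on y = 1 + 2x should recover theta0 ~ 1 and theta1 ~ 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
print(batch_gradient_descent(x, y))
```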
5. Summary of our First Machine Learning Algorithm
Our goal is to minimize the Cost Function, and the related functions are given as follows:
Hypothesis: \(H_\theta(x) = \theta_0 + \theta_1 x\)
Parameters: \(\theta_0, \theta_1\)
Cost Function: \(J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2 = \frac{1}{2m}\sum_{i=1}^{m}(H_\theta(x_i) - y_i)^2\)
Goal: minimize \(J(\theta_0, \theta_1)\)
--> In order to minimize the Cost Function, the first method that comes to mind is Gradient Descent. We take partial derivatives of
\[J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}(H_\theta(x_i) - y_i)^2\]
and repeat the following updates until convergence:
\[\theta_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)\]
\[\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)\]
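The partial derivatives themselves are not written out in the post; for completeness, here is the short derivation (standard calculus on the definitions above):
\[\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_0}\, \frac{1}{2m}\sum_{i=1}^{m}(\theta_0 + \theta_1 x_i - y_i)^2 = \frac{1}{m}\sum_{i=1}^{m}(\theta_0 + \theta_1 x_i - y_i)\]
\[\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_1}\, \frac{1}{2m}\sum_{i=1}^{m}(\theta_0 + \theta_1 x_i - y_i)^2 = \frac{1}{m}\sum_{i=1}^{m}(\theta_0 + \theta_1 x_i - y_i)\, x_i\]
The chain rule brings down a factor of 2, which cancels the \(\frac{1}{2}\); the inner derivative is \(1\) with respect to \(\theta_0\) and \(x_i\) with respect to \(\theta_1\).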
--> Then we substitute our Hypothesis Function into the updates above and get the following:
\[\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}(\theta_0 + \theta_1 x_i - y_i)\]
\[\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\big((\theta_0 + \theta_1 x_i - y_i)\, x_i\big)\]
--> We use our training data set to run the updates above, repeating until they converge.