[Machine Learning] #10 Solving Problem of Overfitting
[Write in Front]:
In the previous Machine Learning topics, we talked about Linear Regression and Logistic Regression, but we did not cover a very important topic: Overfitting.
************************************************************************
************************************************************************
[The Problem of Overfitting]:
What is overfitting? In very simple words, overfitting means the function we learn is extremely accurate on the training set but does not generalize to new data. Typically, overfitting shows up as a very low error on the training data but a high error on new data. (The small sketch after the list below illustrates this.)
Don't fit that accurately, because real data is never that exact, right?
So, the question is: how can we avoid this problem?
1) Reduce the number of features:
- Manually select which features to keep.
- Use a model selection algorithm (we will discuss this later).
2) Regularization:
- Keep all the features, but reduce the magnitude of the parameters θj.
- Regularization works well when we have a lot of slightly useful features.
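As an illustration, here is a minimal Python sketch (the sine-shaped toy dataset and the polynomial degrees are my own assumptions, not from the original post). It fits a low-degree and a high-degree polynomial to a handful of noisy points and compares the error on the training points with the error on held-out points:

import numpy as np

# Toy dataset: 10 noisy training points and 10 held-out test points.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, size=10)

for degree in (2, 8):
    coeffs = np.polyfit(x_train, y_train, degree)              # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")

The high-degree fit tends to drive the training error close to zero while the held-out error stays large, which is exactly the symptom described above.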
************************************************************************
************************************************************************
[Cost Function]:
Now we are going to look for ways to avoid overfitting. As the picture above shows, we should not fit the training set too closely, because real data is never that exact. A quadratic shape looks like a good choice here, so what should we do?
For example, suppose we want to make the following hypothesis behave more like a quadratic:
$$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$
What should we do? Penalize the $x^3$ and $x^4$ terms: we don't want them to be big, which means we have to make θ3 and θ4 small.
So, we modify our Cost Function as follows:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(H_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\cdot\theta_3^2 + 1000\cdot\theta_4^2$$
What does this modified cost function buy us?
Ans: it keeps the hypothesis from being so "aggressive", i.e., from wiggling up and down to pass through every training point.
But how do we know which parameters should be "punished"? In general we don't, so regularization shrinks all of them (except θ0) by adding the term $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ to the cost function, where λ is the regularization parameter.
After this, we can apply regularization to both linear regression and logistic regression.
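As a minimal sketch of the example above (the function name and the polynomial feature layout are assumptions for illustration), the cost with the two extra penalty terms could look like this in Python:

import numpy as np

def penalized_cost(theta, X, y, penalty=1000.0):
    # X is assumed to hold the polynomial features [1, x, x^2, x^3, x^4],
    # so theta[3] and theta[4] are the coefficients we want to punish.
    m = len(y)
    residual = X @ theta - y
    squared_error = residual @ residual / (2 * m)
    return squared_error + penalty * (theta[3] ** 2 + theta[4] ** 2)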
************************************************************************
************************************************************************
[Regularized Linear Regression]:
Now let's regularize linear regression first. There are two methods: Gradient Descent and the Normal Equation.
Gradient Descent:
We will modify our gradient descent update to separate θ0 from the other parameters, because we don't want to penalize θ0. We repeat the following updates until convergence:
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(H_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(H_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad j \in \{1,2,\dots,n\}$$
The term (λ/m)θj performs the regularization. With a little algebraic manipulation we can rewrite the update as:
$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(H_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
Please note that 1 − αλ/m is always less than 1 (and, with a reasonably small learning rate, still greater than 0), so every iteration first shrinks θj a little and then performs the usual gradient descent step.
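A minimal sketch of these updates (the function name and hyperparameter defaults are my own), assuming a design matrix X whose first column is all ones so that column 0 corresponds to x0 and θ0 is left unregularized:

import numpy as np

def gradient_descent_regularized(X, y, alpha=0.01, lam=1.0, iterations=1000):
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iterations):
        error = X @ theta - y                 # H_theta(x^(i)) - y^(i) for every example
        grad = X.T @ error / m                # plain gradient for all parameters
        grad[1:] += (lam / m) * theta[1:]     # add (lambda/m) * theta_j for j >= 1 only
        theta -= alpha * grad
    return theta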
Normal Equation:
Now for the Normal Equation. To add regularization, the equation is the same as the original, except that we add another term inside the inverse:
$$\theta = \left(X^T X + \lambda \cdot L\right)^{-1} X^T y$$
where L is an (n+1)×(n+1) matrix that looks like the identity matrix except that its top-left entry is 0, so that θ0 is not regularized.
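A minimal sketch of this closed-form solution (the function name and the use of np.linalg.solve instead of an explicit inverse are my choices):

import numpy as np

def normal_equation_regularized(X, y, lam=1.0):
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)      # identity matrix ...
    L[0, 0] = 0.0             # ... with a zero in the top-left, so theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

A nice side effect: for λ > 0 the matrix X^T X + λ·L is invertible even when X^T X itself is singular (for example when m ≤ n).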
************************************************************************
************************************************************************
[Regularized Logistic Regression]:
Recall that the cost function for logistic regression was:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(H_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - H_\theta(x^{(i)})\right)\right]$$
Now we regularize it by adding a term at the end:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(H_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - H_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
Note that the second sum starts at j = 1, so the bias term θ0 is explicitly excluded from the penalty.
Gradient Descent:
The update rule is identical in form to the one for regularized linear regression, except that the hypothesis is now the sigmoid, $H_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$:
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(H_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(H_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad j \in \{1,2,\dots,n\}$$
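A minimal sketch of the regularized logistic regression cost and gradient (function names are mine; the convention that X has a leading column of ones is the same as above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_regularized(theta, X, y, lam=1.0):
    # Cross-entropy cost plus (lambda / 2m) * sum of theta_j^2 for j >= 1.
    m = len(y)
    h = sigmoid(X @ theta)
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    return cost + (lam / (2 * m)) * np.sum(theta[1:] ** 2)

def logistic_gradient_regularized(theta, X, y, lam=1.0):
    # Same form as the regularized linear regression gradient,
    # except that the hypothesis is the sigmoid of theta^T x.
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return grad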
************************************************************************