[Machine Learning]: Multi-Variables Linear Regression
****************************************
[Foreword]:
Let us consider the house price prediction problem, where the house price depends on many parameters, e.g. year built, house size, number of bedrooms, house location....... The problem we solved before had just one parameter. How to do multi-variable linear regression is today's topic.
*********************************************************************************
[Multiple Features]: We now introduce the notation we will use in this chapter.
x_j^(i) is the value of feature j in the i-th training example.
x^(i) is the input (feature vector) of the i-th training example.
m is the number of training examples.
n is the number of features.
Features are the input variables, such as house size, number of bedrooms ......
*********************************************************************************
[Hypothesis Function of Multi-Variables]:
The hypothesis function is as follows:
h_θ(x) = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3 + .... + θ_n x_n, where x_0 = 1 by convention.
To simplify the function, we use a vector representation:
h_θ(x) = [θ_0, θ_1, θ_2 ...... θ_n] [x_0; x_1; ...; x_n] = θ^T x
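The θ^T x form above is just a dot product. A minimal sketch in plain Python (the function name `hypothesis` is illustrative, not from the post):

```python
def hypothesis(theta, x):
    """Compute h_theta(x) = theta^T x for one example.
    x[0] is assumed to be 1 (the bias term x_0)."""
    return sum(t * xj for t, xj in zip(theta, x))

# Example: theta = [theta_0, theta_1, theta_2], x = [1, size, bedrooms]
theta = [50.0, 0.1, 20.0]
x = [1.0, 2000.0, 3.0]
print(hypothesis(theta, x))  # 50 + 0.1*2000 + 20*3 = 310.0
```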
*********************************************************************************
[How to come up with a good hypothesis function?]:
Note:
if you forget the Cost Function, please review it first.
Back to the question of how to fit the hypothesis function: Gradient Descent, obviously!
Repeat the following updates until convergence:
θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_0^(i)
θ_1 := θ_1 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_1^(i)
θ_2 := θ_2 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_2^(i) ......
θ_n := θ_n − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_n^(i)
In other words:
θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)
where j = 0, 1, ..., n indexes the parameters.
Please note that we have to update all θ_j simultaneously.
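The update rule above can be sketched in plain Python. This is a minimal illustration (the toy data and the name `gradient_descent` are mine, not from the post); note how all gradients are computed before any θ_j is touched, which is the simultaneous update:

```python
def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for linear regression.
    Each row of X is one example with X[i][0] == 1 (the bias term)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        # errors h_theta(x^(i)) - y^(i) for every example
        errors = [sum(t * xj for t, xj in zip(theta, x)) - yi
                  for x, yi in zip(X, y)]
        # simultaneous update: compute every gradient first, then update theta
        grads = [sum(errors[i] * X[i][j] for i in range(m)) / m
                 for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta

# Toy data generated from y = 1 + 2*x1
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(gradient_descent(X, y, alpha=0.1, iters=2000))  # approx [1.0, 2.0]
```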
*********************************************************************************
[Some tips to make the Gradient Descent Faster]
#1. [Feature Scaling]:
The problem we may have when we use Gradient Descent to find the optimum, as the picture shows: when features have very different ranges, the contours of J(θ) become elongated, so gradient descent oscillates and converges slowly.
Idea: make the features be on the same scale, so we want to modify the range of our input variables so that they are roughly the same. Ideally:
−1 ≤ x_i ≤ 1
These are not exact requirements; a range from -3 to 3 is also OK.
Methods: Feature Scaling and Mean Normalization. Feature Scaling involves dividing the input values by the range (i.e. max − min). Mean Normalization uses: x_i := (x_i − μ_i) / s_i, where μ_i is the average and s_i is the range (max − min).
Example: x_i is the house price with a range of 100 to 2000 and a mean of 1000, then x_i := (price − 1000) / 1900.
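Mean normalization is a one-liner per feature. A small sketch (the function name and sample values are illustrative):

```python
def mean_normalize(values):
    """Apply x := (x - mu) / s to one feature, with s = max - min (the range)."""
    mu = sum(values) / len(values)
    s = max(values) - min(values)
    return [(v - mu) / s for v in values]

# e.g. a feature spanning 100 to 2000, so s = 1900
sizes = [100.0, 1000.0, 2000.0]
print(mean_normalize(sizes))  # values now centered near 0, within [-1, 1]
```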
#2. [Learning Rate α]:
[Does Gradient Descent Work?]:
As the plot of J(θ) versus the number of iterations shows, if the Gradient Descent algorithm works, the value of the Cost Function J(θ) has to decrease on every iteration. If the plot starts to increase, change the value of α: it is too big. Similarly, if the Cost Function plot goes up and down, make your α smaller.
[How does α influence our Cost Function?]:
If α is too small -> slow convergence;
If α is too big -> it may not converge (J(θ) can even diverge);
[How to choose α?]
Try it! A practical approach is to try a range of values, e.g. 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, plot J(θ) against the number of iterations for each, and pick the largest α that still gives a steady decrease.
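The "try it" advice can be automated: run a few iterations for each candidate α and check whether J(θ) is decreasing. A sketch under the same setup as before (toy data and names are mine):

```python
def cost(theta, X, y):
    """J(theta) = (1/2m) * sum of squared errors."""
    m = len(X)
    return sum((sum(t * xj for t, xj in zip(theta, x)) - yi) ** 2
               for x, yi in zip(X, y)) / (2 * m)

def run(alpha, X, y, iters=50):
    """Run gradient descent and record J(theta) after every iteration."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    history = [cost(theta, X, y)]
    for _ in range(iters):
        errors = [sum(t * xj for t, xj in zip(theta, x)) - yi
                  for x, yi in zip(X, y)]
        grads = [sum(errors[i] * X[i][j] for i in range(m)) / m
                 for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
        history.append(cost(theta, X, y))
    return history

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
for alpha in [0.001, 0.01, 0.1, 1.0]:
    h = run(alpha, X, y)
    trend = "decreasing" if h[-1] < h[0] else "NOT decreasing"
    print(f"alpha={alpha}: J(theta) {h[0]:.3f} -> {h[-1]:.3f} ({trend})")
```

On this toy data the largest candidate (α = 1.0) makes J(θ) blow up, while the smaller ones converge at different speeds, which is exactly the pattern to look for in the plots.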