[Machine Learning]: Multi-Variable Linear Regression
****************************************
[Foreword]:
Let us consider the house price prediction problem, where the house price depends on many parameters: year built, house size, number of bedrooms, house location, and so on. The problem we solved before used just one parameter. How to do multi-variable linear regression is today's topic.
*********************************************************************************
[Multiple Features]: We now introduce the notation we will use in this chapter.
\(x_j^i\) is the value of feature \(j\) in the \({i^{th}}\) training example.
\(x^i\) is the input (features) of the \({i^{th}}\) training example.
\(m\) is the number of training examples.
\(n\) is the number of features.
Features are the input variables, such as house size, number of bedrooms, and so on.
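To make the notation concrete, here is a minimal NumPy sketch; the dataset (sizes and bedroom counts) is made up for illustration:

```python
import numpy as np

# Tiny made-up dataset: m = 3 training examples, n = 2 features
# (house size in square feet, number of bedrooms).
X = np.array([[2104, 3],
              [1416, 2],
              [ 852, 1]], dtype=float)

m, n = X.shape     # m = 3, n = 2
x2 = X[1]          # x^2: the input features of the 2nd training example
x12 = X[1, 0]      # x_1^2: the value of feature 1 in the 2nd training example
print(m, n, x2, x12)
```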
*********************************************************************************
[Hypothesis Function of Multi-Variables]:
The hypothesis function is as follows, where by convention we define \({x_0} = 1\):
$${H_\theta }(x) = {\theta _0}{x_0} + {\theta _1}{x_1} + {\theta _2}{x_2} + {\theta _3}{x_3} + \cdots + {\theta _n}{x_n}$$
To simplify the function, we can write it in matrix form:
$${H_\theta }(x) = [{\theta _0},{\theta _1},{\theta _2}, \ldots ,{\theta _n}]\begin{bmatrix} {x_0} \\ {x_1} \\ \vdots \\ {x_n} \end{bmatrix} = {\theta ^T}x$$
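As a quick check of the matrix form, here is a minimal NumPy sketch; the \(\theta\) values and the example features are made-up numbers for illustration only:

```python
import numpy as np

# Made-up parameters and one training example (n = 2 features);
# x_0 = 1 is prepended by convention.
theta = np.array([80.0, 0.1, 25.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 2104.0, 3.0])      # x_0, x_1 (size), x_2 (bedrooms)

h = theta @ x                         # H_theta(x) = theta^T x
print(h)                              # 80 + 0.1*2104 + 25*3 = 365.4
```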
*********************************************************************************
[How to come up with a good hypothesis function?]:
Note:
if you have forgotten the Cost Function, please review the earlier chapter. For multi-variable linear regression it is:
\[J(\theta ) = \frac{1}{{2m}}\sum\limits_{i = 1}^m {{{({H_\theta }({x^i}) - {y^i})}^2}} \]
Back to the question of how to fit the hypothesis function: Gradient Descent, obviously!
Repeat the following updates until convergence:
\[{\theta _0} = {\theta _0} - \alpha \frac{1}{m}\sum\limits_{i = 1}^m {({H_\theta }({x^i}) - {y^i})x_0^i} \]
\[{\theta _1} = {\theta _1} - \alpha \frac{1}{m}\sum\limits_{i = 1}^m {({H_\theta }({x^i}) - {y^i})x_1^i} \]
\[{\theta _2} = {\theta _2} - \alpha \frac{1}{m}\sum\limits_{i = 1}^m {({H_\theta }({x^i}) - {y^i})x_2^i} \]
\[ \vdots \]
\[{\theta _n} = {\theta _n} - \alpha \frac{1}{m}\sum\limits_{i = 1}^m {({H_\theta }({x^i}) - {y^i})x_n^i} \]
In other words:
\[{\theta _j} = {\theta _j} - \alpha \frac{1}{m}\sum\limits_{i = 1}^m {({H_\theta }({x^i}) - {y^i})x_j^i} \]
where \(j = 0,1, \ldots ,n\) indexes the parameters.
Please note that all the \({\theta _j}\) must be updated simultaneously.
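Here is a minimal, vectorized NumPy sketch of these updates. The function name and the default values of alpha and num_iters are illustrative choices, not part of the algorithm. Note that computing the gradient as one matrix product updates every \({\theta _j}\) simultaneously, as required:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multi-variable linear regression.
    X: m x (n+1) design matrix whose first column is all ones (x_0 = 1).
    y: m-vector of target values."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y             # H_theta(x^i) - y^i for every i
        gradient = (X.T @ error) / m      # (1/m) * sum_i error_i * x_j^i, for all j
        theta = theta - alpha * gradient  # simultaneous update of all theta_j
    return theta
```

For example, `theta = gradient_descent(X, y)` returns the fitted parameter vector.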
*********************************************************************************
[Some tips to make the Gradient Descent Faster]
#1. [Feature Scaling]:
The problem we may run into when we use Gradient Descent to find the optimum: if the features have very different ranges, the contours of the cost function become long and thin, and the descent path zig-zags back and forth, converging slowly.
idea: put the features on the same scale, so we want to modify the ranges of our input variables so that they are roughly the same. Ideally:
\[ - 1 \le {x_i} \le 1\]
This is not an exact requirement; a range from -3 to 3 is OK.
method: Feature Scaling and Mean Normalization. Feature scaling involves dividing the input values by the range (i.e. max - min). Mean normalization uses:
\[{x_i} = \frac{{{x_i} - {\mu _i}}}{{{s_i}}}\]
where \({\mu _i}\) is the average of feature \(i\) and \({s_i}\) is the range (max - min).
example: \({x_i}\) is the house price, with a range of 100 to 2000 and a mean of 1000; then
$${x_i} = \frac{{price - 1000}}{{1900}}$$
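A minimal NumPy sketch of mean normalization; the price values below are made up so that the mean is 1000 and the range is 1900, matching the example:

```python
import numpy as np

def mean_normalize(X):
    """Per-feature (per-column) mean normalization: (x_i - mu_i) / s_i,
    with s_i taken as the range max - min, as in the formula above."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s, mu, s

# Made-up prices with mean 1000 and range 1900.
prices = np.array([[100.0], [900.0], [1000.0], [2000.0]])
scaled, mu, s = mean_normalize(prices)
print(mu, s)            # [1000.] [1900.]
print(scaled.ravel())   # [-0.4737 -0.0526  0.      0.5263], all within [-1, 1]
```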
#2. [Learning Rate \(\alpha\)]:
[Is Gradient Descent Working?]:
If the Gradient Descent algorithm is working, the value of the cost function \(J(\theta )\) should decrease on every iteration; plot \(J(\theta )\) against the number of iterations to check. If the plot starts to increase, \(\alpha\) is too big, so decrease it. Similarly, if the cost function plot goes up and down, make your \(\alpha\) smaller.
[How does \(\alpha\) influence our Cost Function?]:
if \(\alpha\) is too small -> slow convergence;
if \(\alpha\) is too big -> it may fail to converge, or even diverge;
[How to choose \(\alpha\)?]
Try it! Run Gradient Descent with a few candidate values (e.g. spaced roughly 3x apart) and keep the one for which \(J(\theta )\) decreases fastest, as sketched below.
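A minimal sketch of this trial approach; the tiny dataset and the candidate values of \(\alpha\) are made up for illustration:

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = (1/2m) * sum((H_theta(x^i) - y^i)^2)."""
    m = len(y)
    error = X @ theta - y
    return (error @ error) / (2 * m)

# Tiny made-up dataset (already scaled), with the x_0 = 1 column prepended.
X = np.array([[1.0, -0.5], [1.0, 0.0], [1.0, 0.5]])
y = np.array([1.0, 2.0, 3.0])

# Try candidate alphas roughly 3x apart and watch whether J decreases.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    theta = np.zeros(X.shape[1])
    for _ in range(50):
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
    print(f"alpha={alpha}: J = {cost(X, y, theta):.6f}")
```

With too small an \(\alpha\), the printed \(J\) is still far from its minimum after 50 iterations; with a well-chosen \(\alpha\) it is close to zero.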