Skip to main content

[Machine Learning] #10 Solving Problem of Overfitting


[Machine Learning] #10 Solving Problem of Overfitting 



[Write Infront]:
In the previous Machine Learning Topic, we talk about the Linear Regression and Logistic Regression, but we don't focus on the very important topic, Overfitting. 
************************************************************************

************************************************************************
[The Problem of Overfitting]:
What is Overfitting, in a very easy word, overfitting is the function we got is soo.. accurate for the training set but not generalize for the new. Typical overfitting means that error on the training data is very low but error on new data is high.
Do not do that soo accurate, because our life is not that accurate. right?

So, the problem is how we can avoid such problem?

1) Reduce the number of features:
  • Manually select which feature to keep.
  • Use model selection Algo (We will discuss later)
2) Regularization:
  • Keep all the features, but reduce the magnitude of parameters \({\theta _j}\)
  • Regularization works well when we have a lot of slightly useful features.
************************************************************************

************************************************************************
[Cost Function]:

Now, we are going to find some ideas to avoid overfitting. as the picture above shows, we cannot do soo accurate fit for the training sets, because our life is not that accurate. But I find that maybe a quadratic will be a good choice. what should we do?

For example, we want to make the following function more quadratic:
$${\theta _0} + {\theta _1}x + {\theta _2}{x^2} + {\theta _3}{x^3} + {\theta _4}{x^4}$$
what should we do? Panish \({x^3}\) and \({x^4}\), we don't want them to be that big. Which means we have to min \({\theta _3}\) and \({\theta _4}\). 

So, we modify our Cost Function as follows:
$$J(\theta ) = mi{n_\theta }{1 \over {2m}}\sum\limits_{i = 1}^m {{{({H_\theta }({x^{(i)}}) - {y^{(i)}})}^2} + 1000 \cdot \theta _3^2 + 1000 \cdot \theta _4^2} $$

What this function can help us?
Ans: not to make the function so "Aggressive", to become slightly up and down.

But how do you know which parameter should be "Punished"? 

After this, we can apply regularization on both linear regression and logistic regression 
************************************************************************

************************************************************************
[Regular Linear Regression]:
Now let's do linear Regression 1st. 2 method, Gradient Decent and Normal Equation

Gradient Descent:
we will modify our gradient decent function to separate \({\theta _0}\) out of the other parameters because we don't want to penalize  \({\theta _0}\). We repeat the following function until it converage:
$${\theta _0} = {\theta _0} - \alpha {1 \over m}\sum\limits_{i = 1}^m {({H_\theta }({x^i}) - {y^i})x_0^i} $$
$${\theta _j} = {\theta _j} - \alpha \left[ {{1 \over m}\left( {\sum\limits_{i = 1}^m {({H_\theta }({x^i}) - {y^i})x_0^i} } \right) + {\lambda  \over m}{\theta _j}} \right]$$
The term \({{\lambda  \over m}{\theta _j}}\) perform our regularization part. We can do some math manipulation and get:
$${\theta _j} = {\theta _j}(1 - \alpha {\lambda  \over m}) - \alpha {1 \over m}\sum\limits_{i = 1}^m {({H_\theta }({x^i}) - {y^i})x_j^i} $$
Please note that \(1 - \alpha {\lambda  \over m}\) will always bigger than 0.

Normal Equation:
Now let's do Normal Equation, to add in regularization part, the equation is same as our original, except that we add another term inside the equation:
$$\theta  = {({X^T}X + \lambda  \cdot L)^{ - 1}}{X^T}y$$
where


************************************************************************

************************************************************************
[Regularized Logistic Regression]:

Cost Function:
Recall the logistic function was
$$J(\theta ) =  - {1 \over m}\sum\limits_{i = 1}^m {[{y^i}\log ({H_\theta }({x^i})) + (1 - {y^i})\log (1 - {H_\theta }({x^i}))]} $$
now, we add regularization equation by adding a term to the end:
$$J(\theta ) =  - {1 \over m}\sum\limits_{i = 1}^m {[{y^i}\log ({H_\theta }({x^i})) + (1 - {y^i})\log (1 - {H_\theta }({x^i}))]}  + {\lambda  \over {2m}}\sum\limits_{j = 1}^n {\theta _j^2} $$

Gradient Descent:



************************************************************************


Comments

Popular posts from this blog

[LeetCode Solution 230]: Kth Smallest Element in a BST

Question: Given a binary search tree, write a function  kthSmallest  to find the  k th smallest element in it. ************************************************************************************************************************************ Write Infront To read to a tutorial, please to read the tutorial of in-order traversal of BST, please check: LeetCode Solution 94: Binary Tree Inorder Traversal We are going to solve this question using the following 4 methods: ->Binary Search ->Recursive ->Iterative ->Morris  Approach #1 Binary Search [Accepted] Detail Explanation The first method to solve this problem is using Binary Search. The idea is very easy and extremely to think. We use BST's property that the left child of the root is smaller than the root while the right child of the root is always bigger. We consider that the root is the pivot, and find the number of the nodes in the left subtree and the number of ...

[LeetCode Solution 145] Binary Tree Postorder Traversal

[LeetCode Solution 145]: Binary Tree Postorder Traversal Question: Given a binary tree, return the  postorder  traversal of its nodes' values. For example: Given binary tree  {1,#,2,3} , 1 \ 2 / 3 return  [3,2,1] . Approach #1 Recursive [Accepted] Detail Explanation The first method to solve this problem is using recursive. This is the classical method and straightforward. we can define a helper function to implement recursion. The java code is as follows: Java public class Solution { public List<Integer> postorderTraversal (TreeNode root) { List<Integer> res = new ArrayList<>(); helper(root, res); return res; } public void helper (TreeNode root, List<Integer> res) { if (root != null ) { if (root.left != null ) { helper(root.left, res); } if (root.right != null ) { helper(root.right, res); } res.add(root.val); } } } Complexity Analysis Ti...

[Interview]: URLify

[Interview]  URLify: -------------------------------------------------------------------------------------------------------------------------- Question: URLify: Write a method to replace all spaces in a string with ‘%20’, you may assume that the string has sufficient space at the end to hold the additional characters. Example  input: ' mr john smith '  output: ' mr %20john%20smith' --------------------------------------------------------------------------------------------------------------------------   Idea 1:  Start from the back and start replacing until the character is not ' ', and replace the characters in reverse order. Solution 1: public class Solution{ public String replace(char[] str) { boolean flag = false; StringBuffer sb = new StringBuffer(); for (int i = str.length - 1; i >= 0; i--) { if (str[i] != ' ') flag = true; if (flag == true) { if (str[i] == ' ') { s...