
Posts

Showing posts from September, 2017

[Machine Learning] #10 Solving Problem of Overfitting

[Machine Learning] #10 Solving Problem of Overfitting

[Write in Front]: In the previous Machine Learning topics we talked about Linear Regression and Logistic Regression, but we did not focus on a very important topic: Overfitting.

*********************************************************************************

[The Problem of Overfitting]: What is overfitting? In simple words, an overfit function is extremely accurate on the training set but does not generalize to new data: typical overfitting means the error on the training data is very low but the error on new data is high. Do not make it that accurate, because our life is not that accurate, right? So, the problem is: how can we avoid it?
1) Reduce the number of features: manually select which features to keep, or use a model selection algorithm (we will discuss this later).
2) Regularization: keep all the features, but reduce the magnitude of the parameters \(\theta_j\)…
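
As a hedged sketch of where the post is heading, here is the regularized squared-error cost in numpy. This is my illustration, not code from the post: the design matrix X (with a leading column of ones), the penalty weight lam, and the convention of not penalizing \(\theta_0\) are assumptions based on the standard formulation.

    import numpy as np

    def regularized_cost(theta, X, y, lam):
        # J(theta) = (1/2m) * [sum of squared errors + lam * sum(theta_j^2 for j >= 1)]
        m = len(y)
        residuals = X @ theta - y
        penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is conventionally not penalized
        return (residuals @ residuals + penalty) / (2 * m)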

[Machine Learning] #9 Multi-class Classification

[Machine Learning] #9 Multi-class Classification

[Write in Front]: Binary classification was posted in the previous blog, [Machine Learning]: #8 Logistic Model. The picture (Alan Turing) is different from the previous one, with no tense expression, so today's topic seems much easier to understand...😃

*********************************************************************************

[Multi-class Classification]: The difference is that we want to separate the output into different classes, that's it! Instead of \(y \in \{0, 1\}\), we expand our definition so that \(y \in \{0, 1, 2, \ldots, n\}\). So we divide our problem into \(n + 1\) binary classification problems; in each one, we predict the probability that \(y\) is a member of one of our classes.

$$\eqalign{
  & y \in \{0, 1, \ldots, n\} \cr
  & H_\theta^{(0)}(x) = P(y = 0 \,|\, x; \theta) \cr
  & H_\theta^{(1)}(x) = P(y = 1 \,|\, x; \theta) \cr
  & \quad\vdots \cr
  & H_\theta^{(n)}(x) = P(y = n \,|\, x; \theta) \cr}$$
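
To make the one-vs-all idea concrete, here is a minimal Python sketch. The helper names (train_binary, one_vs_all) and the plain gradient-descent trainer are illustrative choices of mine, not the post's code.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_binary(X, y, lr=0.1, steps=2000):
        # plain gradient descent on the logistic cost for one "class c vs. rest" problem
        theta = np.zeros(X.shape[1])
        for _ in range(steps):
            theta -= lr * X.T @ (sigmoid(X @ theta) - y) / len(y)
        return theta

    def one_vs_all(X, y, num_classes):
        # one binary classifier per class, matching the H^(i) formulas above
        return np.array([train_binary(X, (y == c).astype(float)) for c in range(num_classes)])

    def predict(Theta, X):
        # pick the class whose classifier reports the highest probability
        return np.argmax(sigmoid(X @ Theta.T), axis=1)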

[Machine Learning]: #8 Logistic Model

[Machine Learning]: #8 Logistic Model

***************************************

[Write in Front]: Let's review the model we have already defined in [Machine Learning]: #7 Classification Introduction.

Training set: $$\{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)}) \}$$

where $$x = \left[ {\matrix{ 1 \cr {x_1} \cr \vdots \cr {x_n} \cr }} \right]$$ with \(x_0 = 1\) and \(y \in \{0, 1\}\), and $$H_\theta(x) = {1 \over {1 + e^{-\theta^T x}}}$$

Our problem now becomes: how do we define a cost function for \(H_\theta(x)\) that determines the best \(\theta\)?

*********************************************************************************

[Cost Function for the Logistic Function]: Some of you may ask why we don't just use the cost function from Linear Regression. It is because we want the cost function to be convex!!! Whi…
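
The convex cost used for logistic regression is typically the cross-entropy cost; here is a quick numpy sketch of it. This is my illustration (the eps guard against log(0) and the variable names are my additions), not code from the post.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_cost(theta, X, y):
        # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) ), which is convex in theta
        h = sigmoid(X @ theta)
        eps = 1e-12  # guard against log(0)
        return -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / len(y)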

[Machine Learning]: #7 Classification Introduction

[Machine Learning] #7 Classification Introduction

****************************************

[What is a Classification Problem]: A yes-or-no problem. Is it a spam email? Are you a good boy? Do you like this girl?......

[Logistic Regression Model]: Linear Regression performs badly on classification problems, because a classification problem is not 'continuous'. Intuitively, it also doesn't make sense for \(H_\theta(x)\) to be larger than 1 or smaller than 0 when we know that \(y \in \{0, 1\}\). So we try to keep the value of \(H_\theta(x)\) between 0 and 1. How can we do that? Yes, the Logistic Function:

$$H_\theta(x) = g(\theta^T x)$$ $$z = \theta^T x$$ $$g(z) = {1 \over {1 + e^{-z}}}$$

(Image: the S-shaped curve of the logistic function \(g(z)\).)

[What does the function mean]: \(H_\theta(x)\) gives us the probability that the output is 1. For example, \(H_\theta(x) = 0.8\) means the output is 1 with probability 80%…
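
A quick numeric check of these properties (plain Python; the values in the comments are rounded):

    import math

    def g(z):
        # logistic function: maps any real number into the open interval (0, 1)
        return 1.0 / (1.0 + math.exp(-z))

    print(g(0))    # 0.5      -> exactly on the decision boundary theta^T x = 0
    print(g(10))   # ~0.99995 -> approaches 1 as z grows
    print(g(-10))  # ~0.00005 -> approaches 0 as z shrinks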

[Machine Learning]: #6 Model Selection and Cross Validation

[Machine Learning]: #6 Model Selection and Cross-Validation

****************************************

[Write in Front]: The topics we discussed before give us linear and polynomial functions to fit the data. But which model is best, and how can we decide which one to choose? Today we are going to show how to choose a better model for the data. Let's start step by step. Here is a good video: Model Selection and Cross-Validation

*********************************************************************************

[Model Selection]: Using the data we have and testing on the same data is a horrible idea!!!!

[Cross-Validation]:
[Step 1]: Choose a model (maybe a linear regression with \(d = 1\)).
[Step 2]: Separate the data into folds.
[Step 3]: Let every fold serve as the test set exactly once, and record the error value each time (this is not official terminology, but you know what I mean). After all these steps, calculate the average error of your model; see the sketch after this list.
[Step 4]: Choose di…
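
Here is a minimal sketch of the procedure in the steps above. The callables fit and error are hypothetical stand-ins for "train a model" and "get the error value"; only the fold logic is the part the post describes.

    import numpy as np

    def k_fold_cv_error(X, y, fit, error, k=5):
        # split the indices into k folds; every fold serves as the test set exactly once
        folds = np.array_split(np.random.permutation(len(y)), k)
        errors = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            model = fit(X[train], y[train])
            errors.append(error(model, X[test], y[test]))
        return np.mean(errors)  # the average error is the model's cross-validation score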

[Machine Learning]: #5 Computing Parameters Analytically

[Machine Learning]: #5 Computing Parameters Analytically

[Write in Front]: We already know the Gradient Descent algorithm. Today we are going to talk about another way to minimize the Cost Function; we call this the Normal Equation!

[Normal Equation]: The Normal Equation formula is given below: $$\theta = (X^T X)^{-1} X^T Y$$ A worked example is sketched below. If we use this method, there is no need to do Feature Scaling.

[Comparison]:
Gradient Descent: need to choose \(\alpha\); needs many iterations; costs \(O(kn^2)\); works well even when \(n\) is large.
Normal Equation: no need to choose \(\alpha\); no need to iterate; costs \(O(n^3)\); slow if \(n\) is very large.
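
Since the example image did not survive, here is a small numpy stand-in; the house-size and price numbers are toy data of mine, and np.linalg.pinv is used instead of a raw inverse for numerical safety.

    import numpy as np

    # toy data: m = 4 training examples, first column fixed to 1 for the intercept
    X = np.array([[1.0, 2104.0],
                  [1.0, 1416.0],
                  [1.0, 1534.0],
                  [1.0,  852.0]])
    Y = np.array([460.0, 232.0, 315.0, 178.0])

    # theta = (X^T X)^{-1} X^T Y; pinv applies the formula in a numerically safer way
    theta = np.linalg.pinv(X.T @ X) @ (X.T @ Y)
    print(theta)  # the intercept and slope that minimize the squared error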

[Machine Learning]: #4 Multi-Variables Linear Regression

[Machine Learning]: #4 Multi-Variables Linear Regression

****************************************

[Write in Front]: Let us consider the house price prediction problem, where the house price is related to many parameters, i.e. house year, house size, number of bedrooms, house location....... The problem we solved before had just one parameter. How to do multi-variable linear regression is today's topic.

*********************************************************************************

[Multiple Features]: We now introduce the notation we are going to use in this chapter. \(x_j^{(i)}\) is the value of feature \(j\) in the \(i^{th}\) training example. \(x^{(i)}\) is the input (features) of the \(i^{th}\) training example. \(m\) is the number of training examples. \(n\) is the number of features. Features are the parameters: house size, number of bedrooms......

*********************************************************************************

[Hypothesis Function]: $$H_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^T x$$
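
The vectorized form is one line in numpy. The coefficients and feature values below are made up purely for illustration:

    import numpy as np

    def hypothesis(theta, x):
        # H_theta(x) = theta_0*x_0 + theta_1*x_1 + ... + theta_n*x_n = theta^T x
        return theta @ x

    theta = np.array([80.0, 0.1, 50.0])  # made-up coefficients
    x = np.array([1.0, 2000.0, 3.0])     # x_0 = 1, then house size and bedroom count
    print(hypothesis(theta, x))          # 80 + 0.1*2000 + 50*3 = 430.0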

[LeetCode Solution 39]: Combination Sum

[LeetCode Solution 39]: Combination Sum

*********************************************************************************

Question: Given a set of candidate numbers (C) (without duplicates) and a target number (T), find all unique combinations in C where the candidate numbers sum to T. The same repeated number may be chosen from C an unlimited number of times.

Note: All numbers (including the target) will be positive integers. The solution set must not contain duplicate combinations.

For example, given candidate set [2, 3, 6, 7] and target 7, a solution set is: [ [7], [2, 2, 3] ]

--------------------------------------------------------------------------------------------------

Approach: Recursive Method. Intuition: Most problems like this, which require returning all valid solutions, can be solved recursively, and the thinking part is similar: if you study these problems carefully, you will find they follow a routine…
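
The excerpt cuts off before the code, so here is a runnable Python version of the recursive (backtracking) approach it describes; the function and variable names are my own.

    def combination_sum(candidates, target):
        # backtracking: at each step either reuse candidates[i] or move past it
        result = []

        def backtrack(start, remaining, path):
            if remaining == 0:
                result.append(path[:])  # found one valid combination
                return
            for i in range(start, len(candidates)):
                if candidates[i] <= remaining:
                    path.append(candidates[i])
                    # recurse with i (not i + 1): the same number may be reused
                    backtrack(i, remaining - candidates[i], path)
                    path.pop()

        backtrack(0, target, [])
        return result

    print(combination_sum([2, 3, 6, 7], 7))  # [[2, 2, 3], [7]]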

[Machine Learning]: #3 Parameter Learning

[Machine Learning]: #3 Parameter Learning

The problem is how to solve the math model I posted in the blog [Machine Learning] Model and Cost Function.

-------------------------------

Question and Math Model:
Hypothesis: \({H_\theta}(x) = {\theta_0} + {\theta_1}x\)
Parameters: \({\theta_0}, {\theta_1}\)
Cost Function: \(J({\theta_0},{\theta_1}) = {1 \over {2m}}\sum\nolimits_{i = 1}^m {({\hat y_i} - {y_i})^2} = {1 \over {2m}}\sum\nolimits_{i = 1}^m {({H_\theta}({x_i}) - {y_i})^2}\)
Goal: minimize \(J({\theta_0},{\theta_1})\)

-------------------------------

In this blog, we are going to give a general method to solve this question.

1. Gradient Descent
Picture the cost surface: the \(z\) axis is the value of the cost function, and \({\theta_0}\) and \({\theta_1}\) are the \(x\) and \(y\) axes. Our goal is to find the minimum of that surface…
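
A minimal gradient-descent sketch for this cost function, vectorized over a design matrix X whose first column is all ones; alpha and the iteration count are arbitrary illustrative choices of mine.

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, iters=1500):
        # simultaneously update theta_0 and theta_1 by stepping against the gradient of J
        m = len(y)
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            gradient = X.T @ (X @ theta - y) / m  # dJ/dtheta for the squared-error cost
            theta -= alpha * gradient
        return theta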