Introduction to Machine Learning | Linear Regression

Machine Learning (ML) is a subset of Artificial Intelligence (AI): the field of study that gives computers the ability to learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

Machine Learning Algorithms
• Supervised learning: learn to predict target values from labelled data. Examples: predicting house prices, classifying email as spam.
• Unsupervised learning: find useful structure or knowledge in data when no labels are available. Examples: finding groups of similar customers, detecting abnormal server access patterns.
Key Machine Learning Problems

Supervised Learning

• Classification: the problem of identifying which of a set of categories (sub-populations) a new observation belongs to, on the basis of a training set of observations whose category membership is known. Target values are discrete classes. An algorithm that implements classification is known as a classifier.
• Regression: regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome variable’) and one or more independent variables (often called ‘predictors’ or ‘features’). The most common form of regression analysis is linear regression. Regression analysis is primarily used for two conceptually distinct purposes: first, it is widely used for prediction and forecasting; second, in some situations it can be used to infer causal relationships between the independent and dependent variables. Likewise, an algorithm that implements regression is called a regressor.
Classification
• Logistic regression
• Multinomial logistic regression
• Probit regression
• Support vector machines
• Linear discriminant analysis
Regression
• Linear & multiple linear regression
• Polynomial regression
• Stepwise regression

There are many more algorithms, and you can even develop your own. It all depends on the type of problem you are trying to solve with machine learning.
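To make the classification/regression distinction concrete, here is a toy sketch in plain Python (these two functions are illustrative inventions, not any library's API): a 1-nearest-neighbour classifier returns a discrete class label, while a mean-of-targets regressor returns a continuous value.

```python
# Toy illustration: classifier -> discrete label, regressor -> continuous value.
# Both functions are hypothetical minimal implementations for this example only.

def nearest_neighbor_classify(X_train, y_train, x):
    """Return the discrete class label of the closest training point."""
    distances = [abs(xi - x) for xi in X_train]
    return y_train[distances.index(min(distances))]

def mean_regress(y_train):
    """Return a continuous prediction: the mean of the training targets."""
    return sum(y_train) / len(y_train)

# Classification: spam (1) vs not-spam (0), feature = number of links in the email
X_cls, y_cls = [1, 2, 9, 10], [0, 0, 1, 1]
print(nearest_neighbor_classify(X_cls, y_cls, 8))  # prints 1 (a discrete class)

# Regression: predicting a house price (a continuous quantity)
prices = [200.0, 250.0, 300.0]
print(mean_regress(prices))  # prints 250.0 (a continuous value)
```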

Linear Regression

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression.

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

• If the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set; the fitted model can then make predictions for new values of the explanatory variables.
• If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables.[1]

Hypothesis function

h_\theta(x) = \theta_0 + \theta_1 x

where,

h_\theta(x) = prediction/hypothesis (dependent variable)

\theta_i = parameters

x = input/feature (independent variable)
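The hypothesis above can be written directly as a small Python function, with theta0 and theta1 as the model parameters and x as the single input feature:

```python
# The hypothesis h_theta(x) = theta0 + theta1 * x as a plain Python function.

def hypothesis(theta0, theta1, x):
    """Prediction of simple linear regression for input feature x."""
    return theta0 + theta1 * x

print(hypothesis(1.0, 2.0, 3.0))  # 1 + 2*3 = 7.0
```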

Goal: choose \theta_i such that the hypothesis is close to the expected output.

How do we select the parameters \theta_i so that they best fit the training data?

Cost Function (J)

The cost function J of linear regression is the Mean Squared Error (MSE) between the predicted y value ( h_\theta(x^{(i)}) ) and the true y value ( y^{(i)} ), conventionally scaled by one half:

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
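A minimal sketch of such a cost function in Python, written here in the half-MSE form commonly used with gradient descent (minimising MSE and RMSE yields the same parameters, so the choice of form does not change the best-fit line):

```python
# Half-MSE cost for simple linear regression over a training set (xs, ys).

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1 / 2m) * sum of squared prediction errors."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# A line that fits the data perfectly has zero cost:
print(cost(0.0, 2.0, [1, 2, 3], [2, 4, 6]))  # 0.0
```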

To update the θ values so as to reduce the cost function and reach the best-fit line, the model uses Gradient Descent: start with arbitrary θ values and then iteratively update them until the cost reaches a minimum.

\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)

Here,
j = 0, 1
\alpha = learning rate (alpha)
J(\theta_0, \theta_1) = cost function
We repeatedly update \theta_j until we converge to the minimum.
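The update rule above can be sketched as a short batch gradient-descent loop. This is a minimal illustrative implementation (variable names and defaults are my own choices), using the partial derivatives of the half-MSE cost: dJ/d\theta_0 = (1/m) \sum (h_\theta(x^{(i)}) - y^{(i)}) and dJ/d\theta_1 = (1/m) \sum (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}.

```python
# Minimal batch gradient descent for simple linear regression.
# alpha and iterations are example values, not universally good defaults.

def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    theta0, theta1 = 0.0, 0.0                      # start from arbitrary values
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                    # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # simultaneous update of both parameters
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Data generated from y = 1 + 2x, so we expect theta0 ≈ 1 and theta1 ≈ 2:
t0, t1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(t0, 3), round(t1, 3))
```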

Intuition behind equation

As we can see from the above graph, θ converges towards a local minimum: for values of θ below the minimum (where the slope is negative), the update increases θ and moves it towards the minimum, and vice versa where the slope is positive.

Now, the value of alpha (the learning rate) affects how long gradient descent takes to reach the minimum and how well it performs: if alpha is too small, convergence is slow; if it is too large, the updates can overshoot the minimum and even diverge.
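This effect can be seen on a toy one-dimensional cost J(\theta) = \theta^2, whose gradient is 2\theta (an illustrative example, not part of the regression model above): a small alpha shrinks θ towards the minimum at 0, while an alpha that is too large makes each step overshoot further than the last.

```python
# Effect of the learning rate on gradient descent for J(theta) = theta**2.

def descend(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta -= alpha * 2 * theta   # gradient of theta**2 is 2*theta
    return theta

print(abs(descend(0.1)))   # small alpha: |theta| shrinks towards the minimum at 0
print(abs(descend(1.1)))   # alpha too large: |theta| grows each step and diverges
```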

This is called linear regression in one variable, i.e. when you have only one feature or input value. Next, we'll learn about linear regression with multiple variables and regularisation.

Footnotes
[1] Linear regression — Wikipedia