Saturday, November 6, 2021

Regression Analysis in Machine Learning

In supervised learning, regression analysis describes the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, it shows how the value of the dependent variable changes with one independent variable while the other independent variables are held constant. It predicts real, continuous values such as temperature, age, salary, and cost.
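The idea of holding the other variables constant can be shown with a small sketch. The model below is hypothetical: the predictor names and the coefficients are invented for illustration and are not estimated from any real data.

```python
# Hypothetical linear model: the coefficients are invented for illustration,
# not estimated from real data.
def predict_salary(years_experience, education_years):
    return 20000 + 1500 * years_experience + 800 * education_years

# Hold education_years constant at 16 and raise years_experience by one.
before = predict_salary(years_experience=5, education_years=16)
after = predict_salary(years_experience=6, education_years=16)

# The change in the prediction equals the coefficient on years_experience.
change = after - before
print(change)
```

In a linear model like this one, each coefficient is the change in the predicted value for a one-unit change in its variable when every other variable stays fixed.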

The following example illustrates the concept of regression analysis:

Example: Suppose a marketing company, A, runs advertisements every year to increase its sales. The list below shows the advertising done by the company in each of the last 10 years and the corresponding sales.


The company plans an Rs. 150000 advertising campaign for 2021 and wants a sales forecast for the year. Regression analysis is used to solve this kind of prediction problem in machine learning.
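A forecast like this can be sketched with simple linear regression. The company's actual figures are not reproduced here, so the five (spend, sales) pairs below are made up purely for illustration, with both quantities expressed in thousands of Rs. The helper name fit_line is also just a label chosen for this sketch.

```python
# Made-up figures for illustration only (thousands of Rs.).
ad_spend = [90, 100, 110, 120, 130]
sales = [1000, 1150, 1200, 1300, 1400]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(ad_spend, sales)

# Forecast sales for a planned spend of Rs. 150 thousand (Rs. 150000).
planned_spend = 150
forecast = intercept + slope * planned_spend

print(f"sales = {intercept:.1f} + {slope:.2f} * spend")
print(f"forecast for spend {planned_spend}: {forecast:.1f}")
```

With these made-up numbers the fitted line has slope 9.5 and intercept 165, giving a forecast of 1590 (thousand Rs.). Note that 150 lies outside the range of the sample spend values, so the forecast is an extrapolation and should be read with caution.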

Definition: Regression is a supervised learning method that predicts a continuous output variable from one or more predictor variables and helps quantify the relationship between variables. It is mostly used for prediction, forecasting, time series modeling, and investigating possible cause-and-effect relationships between variables.



Terminology related to regression analysis:

o Dependent Variable: In a regression study, the primary variable that we wish to predict or comprehend is referred to as the dependent variable. It also goes by the name target variable.

o Independent Variable: Independent variables, also known as predictors, are the factors that affect the dependent variable or that are used to predict its value.

o Outliers: An outlier is an observation whose value is very low or very high compared with the other observations. Outliers can distort the result, so they should be identified and handled with care.

o Multicollinearity: Multicollinearity occurs when the independent variables are highly correlated with one another. It should be avoided in the dataset because it makes it difficult to determine which variable has the greatest effect on the dependent variable.

o Overfitting and Underfitting: Overfitting occurs when a model performs well on the training dataset but poorly on the test dataset. Underfitting occurs when a model performs poorly even on the training data.
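Some of the terms above can be checked with a few lines of code. The sketch below is one possible way to do it, not a standard recipe: all data are made up, the function names are chosen for this example, and the two toy models (a nearest-neighbour "memorizer" and a constant predictor) are assumptions picked only to make overfitting and underfitting visible.

```python
import statistics

# All numbers below are made up for illustration.

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return statistics.fmean((predict(x) - y) ** 2 for x, y in zip(xs, ys))

# Outliers: one sales value is far above the rest.
sales = [1000, 1050, 1100, 1150, 1200, 1250, 1300, 9000]
outliers = iqr_outliers(sales)
print("outliers:", outliers)

# Multicollinearity: two predictors that move almost in lockstep.
tv_spend = [10, 20, 30, 40, 50, 60, 70, 80]
total_spend = [21, 39, 62, 80, 99, 121, 140, 161]
r = pearson(tv_spend, total_spend)
print("correlation between predictors:", round(r, 3))

# Overfitting and underfitting: compare training error with test error.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2.5, 3.6, 6.4, 7.7, 10.5, 11.8]
test_x = [1.4, 2.4, 3.4, 4.4, 5.4]
test_y = [2.8, 4.8, 6.8, 8.8, 10.8]

def memorizer(x):
    """Overfits: returns the target of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

mean_y = statistics.fmean(train_y)

def constant(x):
    """Underfits: ignores x and always predicts the training mean."""
    return mean_y

for name, model in [("memorizer", memorizer), ("constant", constant)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

The memorizer reproduces the training data exactly but does worse on the test points, which is the overfitting pattern. The constant predictor has a large error on both sets, which is the underfitting pattern. A correlation between two predictors that is close to 1 is a warning sign of multicollinearity.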

