Wednesday, November 24, 2021

Classification in Supervised Machine Learning

Classification is the process of finding a function that divides a dataset into classes based on several parameters. A computer program is first trained on the training dataset and then uses what it has learned to assign data to the various classes.

The goal of a classification algorithm is to find a mapping function that converts the input (x) into a discrete output (y).

Example: Email spam detection is the clearest illustration of a classification problem. When a new email arrives, the model decides whether it is spam or not, using what it learned from millions of training emails described by various parameters. If the email is classified as spam, it is placed in the Spam folder.

What is Classification?

A classification algorithm is a supervised learning technique that uses training data to categorize new observations. The program learns from the dataset or observations it is given how to classify fresh observations into various classes or groups, for instance Animal or Bird, Male or Female, Yes or No, 0 or 1, Spam or Not Spam. Classes are also called targets, labels, or categories.
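To make the idea of mapping an input x to a discrete output y concrete, here is a minimal sketch of one simple classifier, k-nearest neighbours (k-NN), in plain Python. k-NN is just one illustrative choice of algorithm; the two features (number of links and number of times "free" appears) and all the numbers are hypothetical, and a real spam filter would use many more features and a library implementation.

```python
import math
from collections import Counter

# Hypothetical training data: (number of links, occurrences of "free") per email.
# These numbers are made up for illustration only.
train_X = [(8, 5), (7, 6), (9, 4), (1, 0), (0, 1), (2, 0)]
train_y = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]


def knn_predict(x, X, y, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to x with its label
    distances = [(math.dist(x, xi), yi) for xi, yi in zip(X, y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbours is the predicted class
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict((6, 5), train_X, train_y))  # spam
print(knn_predict((1, 1), train_X, train_y))  # not spam
```

Whatever the input, the output is always one of the discrete labels seen in training, which is what distinguishes classification from regression.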




Tuesday, November 23, 2021

Types of Regression

Regression comes in a variety of forms, all of which are used in data science and machine learning. The importance of each type depends on the situation, but fundamentally, every regression technique examines the effect of the independent variables on the dependent variable. A few significant types of regression are listed below:


  • Linear Regression
  • Logistic Regression
  • Support Vector Regression
  • Generalized Linear Models
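The difference between the first two types in the list can be sketched in a few lines: linear regression returns an unbounded continuous value, while logistic regression passes the same kind of linear score through the sigmoid function to get a probability between 0 and 1, which is then thresholded into a class. The coefficients below are hypothetical and assumed to be already fitted; they are not learned from any real dataset.

```python
import math

# Hypothetical, already-fitted coefficients (not learned from real data)
intercept, slope = -4.0, 0.8


def linear_regression_predict(x):
    """Linear regression: the output is an unbounded continuous value."""
    return intercept + slope * x


def logistic_regression_predict_proba(x):
    """Logistic regression: the linear score squashed into (0, 1) by the sigmoid."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression_predict_class(x, threshold=0.5):
    """Turn the probability into a class label (1 or 0) using a threshold."""
    return 1 if logistic_regression_predict_proba(x) >= threshold else 0


for x in [0, 5, 10]:
    print(x,
          linear_regression_predict(x),
          round(logistic_regression_predict_proba(x), 3),
          logistic_regression_predict_class(x))
```

This is why logistic regression, despite its name, is normally used for classification tasks.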

Wednesday, November 17, 2021

Case Study: Working with the Titanic Dataset

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

df=pd.read_csv('titanic_dataset.csv')  # dataset can be downloaded from www.kaggle.com

df


df.head() # to show top 5 rows

df.tail(2) # to show bottom 2 rows

df.nunique()

df['Survived'].unique()

s1=df['Survived'].value_counts().sort_index() # number of people who did not survive (0) vs survived (1)

#using matplotlib

plt.bar(s1.index,s1.values)

plt.show()



plt.bar(['Not Survived','Survived'],s1.values)

plt.show()




#seaborn method

sns.countplot(x='Survived', data=df)

plt.show()





s2=df['Sex'].value_counts() # to find number of male vs female passengers

s2

#seaborn method

sns.countplot(x='Sex', data=df)

plt.show()



df['Survived']==1

df[df['Survived']==1]

df[df['Survived']==1]['Sex'].value_counts()


df.groupby(['Survived']).sum(numeric_only=True) # sum only the numeric columns for each group


df.groupby(['Survived','Sex']).size()

sns.catplot(x='Survived',hue='Sex',kind='count',data=df)

plt.show()
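Because `Survived` is coded as 0/1, its mean within a group is that group's survival rate, which is often easier to compare than raw counts. The sketch below runs on a tiny, made-up `sample` DataFrame so it works without the CSV; the rows are hypothetical, not real passenger records, and on the real data you would use `df` instead.

```python
import pandas as pd

# A tiny, made-up stand-in for the Titanic data (not real passenger records)
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Sex": ["male", "female", "female", "male", "male", "male", "female", "female"],
})

# Counts per (Sex, Survived) pair, the same information as groupby([...]).size()
counts = pd.crosstab(sample["Sex"], sample["Survived"])
print(counts)

# Mean of a 0/1 column within each group = fraction of that group who survived
survival_rate = sample.groupby("Sex")["Survived"].mean()
print(survival_rate)
```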



#dealing with missing values

df.isnull().sum() # number of missing values in each column

mean_age=df['Age'].mean()

mean_age # 29.69911764705882

df['Age']=df['Age'].fillna(mean_age)

sns.kdeplot(df['Age'])

plt.show()
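The mean-imputation steps above can be checked on a small example. This sketch uses a hypothetical `sample` DataFrame with a few missing ages; it shows that `mean()` skips NaN values and that `isnull().sum()` drops to zero after `fillna`. The median is a common alternative fill value when the distribution is skewed.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with gaps (NaN), standing in for the Titanic 'Age' column
sample = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0, np.nan, 35.0]})

print(sample.isnull().sum())         # 2 missing values in 'Age'

mean_age = sample["Age"].mean()      # NaN is skipped: (22 + 38 + 26 + 35) / 4 = 30.25
sample["Age"] = sample["Age"].fillna(mean_age)

print(sample["Age"].isnull().sum())  # 0 after imputation
```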


Saturday, November 13, 2021

Implementation of Machine Learning using Python

 # General Changes

# 1. Labeling the x-axis and y-axis

# 2. Title of the Graph

# 3. Figure Size

# 4. Annotate on the graph

# 5. Scale of the graph

# 6. Grid on the graph


import matplotlib.pyplot as plt

import numpy as np

plt.figure(figsize=(10,3))

x=np.array([10,30,45,67,90])

y=np.array([12,56,27,36,67])

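# Label each point with its coordinates; the leading spaces shift the text away from the marker
# (plt.text(x[i],y[i],(x[i],y[i])) also works, but running both loops draws two overlapping labels)

for i in range(len(x)):

    plt.text(x[i],y[i],f'   ({x[i]},{y[i]})')
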


plt.plot(x,y,color='r',ls='--',lw=3,marker='*',ms=20,markeredgecolor='g')

plt.title('Height-Weight Graph',fontsize=20,fontweight='bold')

plt.xlim(-10,100)

plt.ylim(0,70)

plt.xlabel('X-axis values')

plt.ylabel('Y-axis values')

plt.grid()

plt.xticks(np.arange(-10,101,10))

plt.savefig("mylineplot.png")

plt.show()
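The same six customizations can also be written with matplotlib's object-oriented interface (`fig, ax = plt.subplots(...)`), and `ax.annotate` can draw an arrow from a label to a point, which `plt.text` cannot. A minimal sketch, assuming we want to highlight the highest point; the output file name is hypothetical, and the Agg backend is selected only so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.array([10, 30, 45, 67, 90])
y = np.array([12, 56, 27, 36, 67])

fig, ax = plt.subplots(figsize=(10, 3))   # 3. figure size
ax.plot(x, y, color="r", ls="--", marker="*")
ax.set_title("Height-Weight Graph")       # 2. title
ax.set_xlabel("X-axis values")            # 1. axis labels
ax.set_ylabel("Y-axis values")
ax.set_xlim(-10, 100)                     # 5. scale
ax.set_ylim(0, 70)
ax.grid(True)                             # 6. grid

# 4. annotate: place the text away from the point and draw an arrow back to it
peak = int(np.argmax(y))
ax.annotate(f"max ({x[peak]}, {y[peak]})",
            xy=(x[peak], y[peak]),                # the point being annotated
            xytext=(x[peak] - 30, y[peak] - 20),  # where the text is placed
            arrowprops=dict(arrowstyle="->"))

fig.savefig("mylineplot_annotated.png")  # hypothetical file name
```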


# Take a numpy array named person and give 3 names for the same

# Take another numpy array named height and give their respective heights

# Plot a bar graph indicating the same

person = np.array(['Mr A', 'Mr B', 'Mr C'])
height = np.array([145,146,1351])
weight = np.array([45,56,47])
# a negative width with align='edge' draws each bar to the left of its tick, a positive width to the right,
# so the height and weight bars sit side by side
plt.bar(person,height,width=-0.4,align='edge',color='r',label='height')
plt.bar(person,weight,width=0.4,align='edge',color='g',label='weight')
plt.legend()
plt.xlabel('Person Name')
plt.ylabel('Height (cm) / Weight')
plt.show()



cities = np.array(['Mumbai','Bangalore','Ahmedabad','Delhi'])
population = np.array([12442373,8443675,5577940,11034555])
plt.pie(population,explode=[0,0.1,0,0],labels=cities,autopct='%.2f%%')
plt.show()
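The `autopct='%.2f%%'` argument labels each wedge with its share of the total, formatted to two decimals followed by a literal `%`. The arithmetic behind those labels can be reproduced in plain Python with the same population figures:

```python
cities = ["Mumbai", "Bangalore", "Ahmedabad", "Delhi"]
population = [12442373, 8443675, 5577940, 11034555]

total = sum(population)
for city, pop in zip(cities, population):
    share = pop / total * 100
    # '%.2f%%' is old-style formatting: two decimals, then '%%' prints a literal '%'
    print(f"{city}: {'%.2f%%' % share}")
```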



Saturday, November 6, 2021

Regression Analysis in Machine Learning

In supervised learning, regression analysis describes the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, it shows how the value of the dependent variable changes in response to one independent variable while the other independent variables are held constant. It predicts real, continuous values such as temperature, age, salary, and cost.

We can understand the concept of regression analysis using the below example:

Example: Suppose marketing company A runs various advertising campaigns every year to increase its sales. The list below shows the company's advertising spend over the last 10 years and the corresponding sales.


The company wants a sales forecast for this year because it plans an Rs. 150000 advertising campaign in 2021. Prediction problems of this kind are handled in machine learning with regression analysis.

Definition: Regression is a supervised learning method that predicts a continuous output variable from one or more predictor variables and helps determine the correlation between variables. It is mostly used for forecasting, time series modeling, prediction, and exploring possible cause-and-effect relationships between variables.
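For a single predictor, the most basic regression model is a straight line fitted by ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below applies this to the advertising example. The ten data points are invented for illustration because the company's real figures are not reproduced here, and both columns are assumed to be in Rs. thousands.

```python
# Hypothetical data: advertising spend and sales, both in Rs. thousands.
# Invented for illustration; these are not the company's real figures.
advertising = [90, 120, 150, 100, 130, 200, 140, 160, 180, 210]
sales = [1000, 1300, 1800, 1200, 1380, 2300, 1700, 1900, 2100, 2400]


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_simple_linear_regression(advertising, sales)
forecast = b0 + b1 * 150  # Rs. 150000 campaign = 150 thousand
print(f"sales = {b0:.1f} + {b1:.2f} * advertising; forecast for 150: {forecast:.1f}")
```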



Terminology related to regression analysis:

o Dependent Variable: In a regression study, the primary variable that we wish to predict or comprehend is referred to as the dependent variable. It also goes by the name target variable.

o Independent Variable: Also known as a predictor, independent variables are the elements that have an impact on the dependent variables or that are used to forecast their values.

o Outliers: An outlier is an observation with a very low or very high value compared with the rest of the data. Outliers should be handled carefully, because they can distort the results.

o Multicollinearity: Multicollinearity occurs when the independent variables are highly correlated with one another. It should be avoided in the dataset, because it makes it difficult to determine which variable has the greatest impact on the dependent variable.

o Overfitting and Underfitting: An overfitting problem occurs when our system performs well on the training dataset but poorly on the test dataset. Underfitting is the term used when an algorithm does not perform well even with training data.
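One widely used rule of thumb for flagging outliers is Tukey's fences (the 1.5 × IQR rule): a value is suspicious if it lies more than 1.5 interquartile ranges below the first quartile or above the third. This is one possible technique, not the only one; the readings below are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one suspiciously large value
readings = [12, 14, 13, 15, 14, 13, 95]
print(iqr_outliers(readings))  # [95]
```

Flagged values still deserve a closer look before being dropped, since some outliers are genuine observations.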

INTRODUCTION TO MACHINE LEARNING

Arthur Samuel, who coined the term in 1959, described Machine Learning as "a field of study that gives computers the ability to learn without being explicitly programmed", that is, letting machines learn from data instead of having their behavior hard-coded.

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." - Tom Mitchell

Machine learning focuses on data-driven learning from real interactions: it is the process of teaching computers and digital devices to learn and carry out tasks the way humans do.

Machine learning (ML), a subset of the widely adopted field of Artificial Intelligence (AI), turns data into knowledge that programs and applications use to give computers the ability to perform human-like tasks. As more data becomes available, machines perform better over time and their accuracy improves.

Types of Machine Learning:

Machine Learning can be broadly classified as:


  • Supervised Machine Learning: Supervised learning is the most commonly used category of machine learning. In this approach, labeled data is used to train the machine learning algorithm, which applies what it learned from past labeled samples to predict outcomes for new data. As input data is fed into the model, its weights are adjusted until the model fits well. Supervised learning uses regression algorithms to make predictions and classification algorithms to divide data into distinct classes.
  • Unsupervised Machine Learning: Unsupervised machine learning builds models from data without labels or clearly stated outcomes. These algorithms look for hidden patterns or data clusters without human intervention. Because this method can identify similarities and differences in data, it is useful for exploratory data analysis, customer segmentation, cross-selling strategies, and image and pattern recognition. Clustering and association techniques are used to implement the models in unsupervised learning.
  • Semi-Supervised Machine Learning: This approach falls between supervised and unsupervised learning. During training it uses a small labeled dataset to guide feature selection or extraction and classification on a larger set of unlabeled data. Semi-supervised learning is a practical solution when there is not enough labeled data, or when labeling more data would be beneficial but too expensive.
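To contrast the unsupervised case with the labeled examples used in supervised learning, here is a minimal sketch of k-means clustering on one-dimensional data. No labels are given; the algorithm only alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The spend values and the initial centroids are hypothetical, and real projects would normally use a library implementation.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Very small k-means for 1-D data: only distances are used, no labels."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical, unlabeled customer spend values
spend = [5, 7, 6, 8, 52, 55, 50, 58]
centroids, clusters = kmeans_1d(spend, centroids=[5.0, 52.0])  # initial guesses
print(centroids)  # [6.5, 53.75]
print(clusters)   # [[5, 7, 6, 8], [52, 55, 50, 58]]
```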


Clustering in Machine Learning

Clustering is a type of unsupervised learning in machine learning where the goal is to group a set of objects in such a way that objects in...