Kaggle linear algorithm practice.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_theme(color_codes = True)
data = {'House Size': [1380, 3120, 3520, 1130, 1030, 1720, 3920, 1490, 1860, 3430, 2000, 3660, 2500, 1220, 1390],
        'House Price':[76, 216, 238, 69, 50, 119, 282, 81, 132, 228, 145, 251, 170, 71, 29]
       }

df= pd.DataFrame(data, columns = ['House Size', 'House Price'])
df
    House Size  House Price
0         1380           76
1         3120          216
2         3520          238
3         1130           69
4         1030           50
5         1720          119
6         3920          282
7         1490           81
8         1860          132
9         3430          228
10        2000          145
11        3660          251
12        2500          170
13        1220           71
14        1390           29
# House Size goes on the x-axis and House Price on the y-axis.
plt.scatter(df['House Size'], df['House Price'])
plt.xlabel('House Size')
plt.ylabel('House Price')
plt.title('House Price by Size')
plt.show()

[Figure: scatter plot of House Price against House Size]

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   House Size   15 non-null     int64
 1   House Price  15 non-null     int64
dtypes: int64(2)
memory usage: 368.0 bytes
df.describe()
        House Size  House Price
count    15.000000    15.000000
mean   2224.666667   143.800000
std    1033.902915    82.211574
min    1030.000000    29.000000
25%    1385.000000    73.500000
50%    1860.000000   132.000000
75%    3275.000000   222.000000
max    3920.000000   282.000000

How to plot a linear regression line on a scatter plot in Python

Use numpy.polyfit() to plot a linear regression line on a scatter plot

Call

numpy.polyfit(x, y, deg)

with x and y as arrays of data for the scatter plot and deg as 1. For a degree-1 fit it returns two coefficients, the slope and the y-intercept of the line of best fit, in that order.

Plot the linear regression line by calling

matplotlib.pyplot.plot(x, eq)

with x as the array of x-values and eq as slope * x + intercept, that is, the fitted y-value at each x.
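As a minimal sketch of the two steps above, the snippet below fits the line with NumPy only and builds the eq array that would be passed to matplotlib.pyplot.plot(x, eq). The arrays repeat the house data from earlier in this post; the names slope, intercept, slope_check and intercept_check are illustrative, and the closed-form cross-check is an addition for verification, not part of the original recipe.

```python
import numpy as np

# House sizes and prices, copied from the DataFrame defined earlier in this post.
x = np.array([1380, 3120, 3520, 1130, 1030, 1720, 3920, 1490,
              1860, 3430, 2000, 3660, 2500, 1220, 1390], dtype=float)
y = np.array([76, 216, 238, 69, 50, 119, 282, 81,
              132, 228, 145, 251, 170, 71, 29], dtype=float)

# deg=1 fits a straight line; coefficients are returned highest degree first,
# so the slope comes before the y-intercept.
slope, intercept = np.polyfit(x, y, 1)

# eq holds the fitted y-value for every x; this is what plt.plot(x, eq) would draw.
eq = slope * x + intercept

# Cross-check against the closed-form least-squares formulas (illustrative only).
slope_check = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_check = y.mean() - slope_check * x.mean()

print(f"slope={slope:.4f}, intercept={intercept:.2f}")
```

Because eq is computed from the same x used for the scatter plot, the line is drawn over exactly the range of the data.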

# Scatter the raw data, then overlay the fitted line.
plt.scatter(df['House Size'], df['House Price'])
plt.xlabel('House Size')
plt.ylabel('House Price')
plt.title('House Price by Size')

# A degree-1 polyfit returns the slope (m) and the y-intercept (b).
m, b = np.polyfit(df['House Size'], df['House Price'], 1)
plt.plot(df['House Size'], m * df['House Size'] + b)

plt.show()

[Figure: scatter plot of House Price against House Size with the fitted regression line]

sns.regplot(x = 'House Size', y = 'House Price', data = df)
<AxesSubplot:xlabel='House Size', ylabel='House Price'>

[Figure: seaborn regression plot of House Price against House Size]

