Kaggle linear algorithm practice.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_theme(color_codes = True)
data = {'House Size': [1380, 3120, 3520, 1130, 1030, 1720, 3920, 1490, 1860, 3430, 2000, 3660, 2500, 1220, 1390],
'House Price':[76, 216, 238, 69, 50, 119, 282, 81, 132, 228, 145, 251, 170, 71, 29]
}
df= pd.DataFrame(data, columns = ['House Size', 'House Price'])
df
| | House Size | House Price |
|---|---|---|
| 0 | 1380 | 76 |
| 1 | 3120 | 216 |
| 2 | 3520 | 238 |
| 3 | 1130 | 69 |
| 4 | 1030 | 50 |
| 5 | 1720 | 119 |
| 6 | 3920 | 282 |
| 7 | 1490 | 81 |
| 8 | 1860 | 132 |
| 9 | 3430 | 228 |
| 10 | 2000 | 145 |
| 11 | 3660 | 251 |
| 12 | 2500 | 170 |
| 13 | 1220 | 71 |
| 14 | 1390 | 29 |
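# Size goes on the x-axis and price on the y-axis; the labels are set by calling the pyplot functions
plt.scatter(df['House Size'], df['House Price'])
plt.xlabel('House Size')
plt.ylabel('House Price')
plt.title('House Price by Size')
plt.show()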

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 House Size 15 non-null int64
1 House Price 15 non-null int64
dtypes: int64(2)
memory usage: 368.0 bytes
df.describe()
| | House Size | House Price |
|---|---|---|
| count | 15.000000 | 15.000000 |
| mean | 2224.666667 | 143.800000 |
| std | 1033.902915 | 82.211574 |
| min | 1030.000000 | 29.000000 |
| 25% | 1385.000000 | 73.500000 |
| 50% | 1860.000000 | 132.000000 |
| 75% | 3275.000000 | 222.000000 |
| max | 3920.000000 | 282.000000 |
How to plot a linear regression line on a scatter plot in Python
USE numpy.polyfit() TO PLOT A LINEAR REGRESSION LINE ON A SCATTER PLOT
Call
numpy.polyfit(x, y, deg)
with x and y as arrays of data for the scatter plot and deg as 1 to calculate the slope and y-intercept of the line of best fit.
Plot the linear regression line by calling
matplotlib.pyplot.plot(x, eq)
with x as the array of x-values and eq as the fitted y-values, that is, slope * x + y-intercept.
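For intuition, the slope and y-intercept that a degree-1 least-squares fit produces can also be computed by hand with the closed-form formulas. The sketch below is illustrative only: fit_line is a hypothetical helper (not part of numpy), and the sample points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, b = fit_line(xs, ys)

# The "eq" values described above: slope * x + y-intercept for each x
eq = [m * x + b for x in xs]
```

On real, noisy data the points will not sit on the line, and the same formulas give the line that minimizes the sum of squared vertical distances.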
# Scatter the raw data, then overlay the least-squares line
plt.scatter(df['House Size'], df['House Price'])
plt.xlabel('House Size')
plt.ylabel('House Price')
plt.title('House Price by Size')
# A degree-1 fit returns the slope m and the y-intercept b
m, b = np.polyfit(df['House Size'], df['House Price'], 1)
plt.plot(df['House Size'], m * df['House Size'] + b)
plt.show()

sns.regplot(x = 'House Size', y = 'House Price', data = df)
<AxesSubplot:xlabel='House Size', ylabel='House Price'>
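The output above shows that the plotting call returns a matplotlib Axes object. Labels and titles can also be set through that object, using set_xlabel, set_ylabel and set_title, which are the Axes counterparts of plt.xlabel, plt.ylabel and plt.title. A minimal sketch, assuming plain matplotlib without seaborn and using only the first five rows of the data:

```python
import matplotlib.pyplot as plt

# First five rows of the house data from above
sizes = [1380, 3120, 3520, 1130, 1030]
prices = [76, 216, 238, 69, 50]

# plt.subplots() returns the figure and a single Axes to draw on
fig, ax = plt.subplots()
ax.scatter(sizes, prices)

# Axes methods carry the set_ prefix; the pyplot functions do not
ax.set_xlabel('House Size')
ax.set_ylabel('House Price')
ax.set_title('House Price by Size')
```

The same methods should work on the Axes returned by sns.regplot, for example by assigning its return value to a variable and calling set_title on it.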
