
DA_seaborn_python

seaborn

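Scatter plot, regression line: check the relationship (correlation) between two variables
Line plot: check a trend over time or along an ordered variable
Bar plot: compare an aggregated value (the mean by default) across categories
Box plot: check the median, quartiles, and outliers of a distribution
Violin plot: check the shape (density) of a distribution together with its range
Histogram: check the distribution of a single variable
Heatmap: compare several variables at once in a grid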

Dataset

In [54]:

import seaborn as sns
import pandas as pd

In [55]:

# print(dir(sns))

In [56]:

print(sns.get_dataset_names(), end='')

['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'dowjones', 'exercise', 'flights', 'fmri', 'geyser', 'glue', 'healthexp', 'iris', 'mpg', 'penguins', 'planets', 'seaice', 'taxis', 'tips', 'titanic']

In [57]:

df = sns.load_dataset('tips')
df.head()

Out[57]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

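Scatter plot

sns.scatterplot(x=..., y=...)
hue: color (grouping), size: marker size (grouping), style: marker shape (grouping)

When there are only a few points, a bar plot can be more useful
sns.barplot(x=..., y=...)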

In [58]:

sns.scatterplot(x=df['total_bill'], y = df['tip'], hue= df['sex'], size= df['size'])

Out[58]:

<AxesSubplot: xlabel='total_bill', ylabel='tip'>
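The same kind of plot can also be written with the data= argument, which lets you refer to columns by name. The following is a minimal sketch: `sample` is a stand-in frame built from the five rows shown in the df.head() output, so it runs without downloading the dataset.

```python
import pandas as pd
import seaborn as sns

# First five rows of the tips dataset, copied from the df.head() output,
# so the sketch runs without a network connection.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# data= lets x, y, hue, style and size be given as column names.
ax = sns.scatterplot(
    data=sample,
    x="total_bill",
    y="tip",
    hue="sex",     # color by group
    style="sex",   # marker shape by group
    size="size",   # marker size by party size
)
ax.set_title("tip vs. total_bill")
```

With the full dataset you would pass data=df instead of data=sample.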

Regression line

sns.regplot(x=..., y=...)
scatter: whether to draw the scatter points, ci: size of the confidence interval (default 95)

In [59]:

sns.regplot(x = df['total_bill'], y = df['tip'], scatter = False, ci = None)

Out[59]:

<AxesSubplot: xlabel='total_bill', ylabel='tip'>
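By default regplot draws an ordinary least-squares line, so the slope and intercept can be checked with NumPy. This is a rough sketch on a stand-in frame (`sample`, the five rows from the df.head() output); the numbers it prints describe those five rows only, not the full dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Stand-in data: the five rows shown in the df.head() output.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Scatter points plus the fitted line with a 95% confidence band.
ax = sns.regplot(data=sample, x="total_bill", y="tip", ci=95)

# Degree-1 least-squares fit: the same kind of line regplot draws by default.
slope, intercept = np.polyfit(sample["total_bill"], sample["tip"], deg=1)

# Pearson correlation between the two columns.
r = sample["total_bill"].corr(sample["tip"])

print(f"tip = {slope:.3f} * total_bill + {intercept:.3f}, r = {r:.3f}")
```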

Line plot

sns.lineplot(x=..., y=...)
hue: line color (grouping), size: line width (grouping), style: dash and marker style (grouping)

In [60]:

sns.lineplot(x = df['day'], y = df['total_bill'], hue=df['tip'], errorbar=None)

Out[60]:

<AxesSubplot: xlabel='day', ylabel='total_bill'>

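Bar plot

sns.barplot(x=..., y=...)
hue: color (grouping), orient: 'v' for vertical bars or 'h' for horizontal bars, errorbar: error bar type (a 95% confidence interval by default; it replaces the deprecated ci parameter)

sns.countplot(x=...)
The y-axis shows the number of observations in each category of x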

In [61]:

sns.barplot(x=df['size'], y = df['tip'],errorbar=None)

Out[61]:

<AxesSubplot: xlabel='size', ylabel='tip'>
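The difference between the two functions is easiest to see next to the equivalent pandas calls: countplot counts rows per category, while barplot averages y per category. A small sketch with made-up values in a hypothetical `sample` frame:

```python
import pandas as pd
import seaborn as sns

# Made-up data for illustration.
sample = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun", "Thur"],
    "tip": [1.0, 2.0, 3.0, 2.0, 4.0, 5.0],
})

# countplot: bar height = number of rows in each category.
ax = sns.countplot(data=sample, x="day")
counts = sample["day"].value_counts()

# barplot would draw these per-category means instead.
mean_tip = sample.groupby("day")["tip"].mean()

print(counts)
print(mean_tip)
```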

sns.pointplot(x=..., y=...)
Connects the mean of each group with a line and shows the confidence interval as error bars

In [62]:

sns.pointplot(x = df['day'], y = df['tip'])

Out[62]:

<AxesSubplot: xlabel='day', ylabel='tip'>

Box plot, violin plot

sns.boxplot(x=..., y=...)
hue: color (grouping), orient: 'v' for vertical or 'h' for horizontal

In [63]:

sns.boxplot(x = df['size'], y = df['tip'])

Out[63]:

<AxesSubplot: xlabel='size', ylabel='tip'>

sns.violinplot(x=..., y=...)
hue: color (grouping), orient: 'v' for vertical or 'h' for horizontal

In [64]:

sns.violinplot(x = df['size'], y = df['tip'])

Out[64]:

<AxesSubplot: xlabel='size', ylabel='tip'>
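The numbers a box plot encodes can be computed directly with pandas: the box spans the first to third quartile, the line inside is the median, and with seaborn's default setting the whiskers reach the furthest data points within 1.5 times the interquartile range. A minimal sketch using the five tip values from the df.head() output:

```python
import pandas as pd
import seaborn as sns

# The five tip values shown in the df.head() output.
tips = pd.Series([1.01, 1.66, 3.50, 3.31, 3.61], name="tip")

# Quartiles that define the box and the median line.
q1, median, q3 = tips.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points outside these fences are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

ax = sns.boxplot(y=tips)
print(q1, median, q3, lower_fence, upper_fence)
```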

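Histogram

sns.histplot(data) : distribution of a continuous variable
data: DataFrame or array, hue: color (grouping), bins: number of bins or a list of bin edges, kde: whether to overlay a kernel density estimate
sns.countplot(x=...) : distribution of categorical data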

In [65]:

sns.histplot(data = df['total_bill'], bins = list(range(0, 60, 5)), kde = True)  # bin edges 0, 5, ..., 55 cover every total_bill value (the maximum is about 50.8)

Out[65]:

<AxesSubplot: xlabel='total_bill', ylabel='Count'>

sns.kdeplot(data) : shows the distribution as a density curve
sns.rugplot(data) : shows the distribution as small tick marks along the axis, like the fringe of a rug

Heatmap

sns.heatmap(data)
data: 2D dataset (for example a pivot_table result), annot: whether to print the value in each cell
df.pivot_table(values, index, columns, aggfunc = )

In [66]:

pivot_df = df.pivot_table('tip','day','size', aggfunc = 'count')
pivot_df.fillna(0, inplace=True)

In [67]:

sns.heatmap(pivot_df, annot = True)

Out[67]:

<AxesSubplot: xlabel='size', ylabel='day'>
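The pivot step can also be written with keyword arguments, and fill_value=0 handles the empty combinations in the same call. A sketch with made-up rows in a hypothetical `sample` frame:

```python
import pandas as pd
import seaborn as sns

# Made-up data for illustration.
sample = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun", "Thur"],
    "size": [2, 2, 3, 2, 4, 3],
    "tip": [1.0, 2.0, 3.0, 2.0, 4.0, 5.0],
})

# Rows = day, columns = party size, cell = number of tips recorded.
pivot = sample.pivot_table(
    values="tip",
    index="day",
    columns="size",
    aggfunc="count",
    fill_value=0,  # combinations with no rows become 0 instead of NaN
)

# annot=True prints each cell's count on top of its color.
ax = sns.heatmap(pivot, annot=True)
print(pivot)
```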

matplotlib

In [69]:

import matplotlib.pyplot as plt

In [74]:

plt.bar(range(len(df['tip'])),df['tip'])
plt.show()

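Setting up the canvas (figure)

Figure components:
title: title
xlabel: x-axis label
ylabel: y-axis label
legend: legend
grid: grid lines
subplots: several axes in one figure
figure(figsize=(width, height)): figure size in inches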

In [79]:

fig = plt.figure(figsize=(10, 5))
plt.show()

<Figure size 720x360 with 0 Axes>

plt.subplots(nrows = 1, ncols = 1, sharex = False, sharey = False)

fig, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(nrows=2, ncols=3)  # six axes
fig.set_size_inches(16, 8)  # set the figure size
ax1.set(title='', xlabel='')  # set the components of one axes
sns.barplot(x=x, y=y, ax=ax1)  # barplot is a seaborn function; ax picks the axes to draw on

Fixing common problems

Korean text support (the font must be installed): plt.rcParams['font.family'] = 'NanumGothic'

Minus sign rendering
import matplotlib as mpl
mpl.rcParams['axes.unicode_minus'] = False

Prevent overlapping x-axis tick labels
plt.xticks(rotation = 45)
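Setting font.family to a font that is not installed only produces warnings and a fallback font, so it can help to check first. The sketch below combines the three fixes; the category labels and bar heights are made up for illustration.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Names of all fonts matplotlib has found on this machine.
installed = {font.name for font in font_manager.fontManager.ttflist}

# Switch to NanumGothic only if it is actually available.
if "NanumGothic" in installed:
    plt.rcParams["font.family"] = "NanumGothic"
else:
    print("NanumGothic not found; keeping", plt.rcParams["font.family"])

# Draw minus signs with the ASCII hyphen, which every font contains.
mpl.rcParams["axes.unicode_minus"] = False

# Made-up bars with long labels and a negative value.
fig, ax = plt.subplots()
ax.bar(["category_one", "category_two", "category_three"], [3, -1, 2])

# Rotate the x tick labels of the current axes so they do not overlap.
plt.xticks(rotation=45)
```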

Kang's Note

[matplotlib & seaborn] Basic commands
