
Mean Shift:

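Mean shift repeatedly moves each cluster center toward the region where the data points are most densely concentrated.

The algorithm is widely used in computer vision.

It is applied to image and video data to separate specific objects or to track motion.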
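To make the "move toward the densest region" idea concrete, here is a minimal NumPy sketch of the shift step for a single point, assuming a flat kernel (every neighbour within the bandwidth counts equally). The helper name `mean_shift_point`, the toy data, and the `tol` and `max_iter` values are illustrative assumptions, not part of scikit-learn or the original notebook.

```python
import numpy as np

def mean_shift_point(x, X, bandwidth, max_iter=300, tol=1e-3):
    """Shift one point toward a local density peak (hypothetical helper, flat kernel)."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        # Collect the samples that lie within `bandwidth` of the current position
        dist = np.linalg.norm(X - x, axis=1)
        neighbours = X[dist <= bandwidth]
        if len(neighbours) == 0:
            break
        # The new position is the mean of those neighbours
        new_x = neighbours.mean(axis=0)
        converged = np.linalg.norm(new_x - x) < tol
        x = new_x
        if converged:
            break
    return x

rng = np.random.default_rng(0)
# Toy data (assumption): two tight blobs around (0, 0) and (5, 5)
X_toy = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(5.0, 0.3, size=(50, 2)),
])

mode_a = mean_shift_point(X_toy[0], X_toy, bandwidth=1.5)
mode_b = mean_shift_point(X_toy[-1], X_toy, bandwidth=1.5)
print(mode_a, mode_b)
```

Points that start in the same blob should end up at roughly the same position, and that shared end point acts as the cluster center.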

In [12]:

# Generator for synthetic clustering data
from sklearn.datasets import make_blobs

# MeanShift: mean shift clustering (centers move toward the densest region)
# estimate_bandwidth: estimates a suitable bandwidth for the density estimate
from sklearn.cluster import MeanShift, estimate_bandwidth

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Data generation

In [3]:

X, y = make_blobs(n_samples=200, n_features=2, centers=3, cluster_std=0.7, random_state=0)

MeanShift

In [5]:

# A larger bandwidth smooths the density estimate more, which tends to merge clusters
meanshift = MeanShift(bandwidth=1)
cluster_labels = meanshift.fit_predict(X)
print('cluster label types: ', np.unique(cluster_labels))

cluster label types:  [0 1 2]
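The effect of the bandwidth is easiest to see by trying several values. The sketch below regenerates the same blobs and counts the clusters for a handful of bandwidths; the specific values (0.5, 1.0, 2.0, 4.0) are arbitrary choices for illustration, and the counts you get may depend on your scikit-learn version.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Same synthetic data as in the notebook
X, y = make_blobs(n_samples=200, n_features=2, centers=3, cluster_std=0.7, random_state=0)

n_clusters_by_bandwidth = {}
for bw in (0.5, 1.0, 2.0, 4.0):  # illustrative values, not recommendations
    labels = MeanShift(bandwidth=bw).fit_predict(X)
    n_clusters_by_bandwidth[bw] = len(np.unique(labels))
    print(f'bandwidth={bw}: {n_clusters_by_bandwidth[bw]} clusters')
```

A small bandwidth usually splits the data into many small clusters, while a large one tends to merge everything into a few.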

Optimal bandwidth

In [10]:

best_bandwidth = estimate_bandwidth(X)
print('bandwidth value:', round(best_bandwidth, 3))

bandwidth value: 1.816
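`estimate_bandwidth` also takes a `quantile` argument (default 0.3) that controls how many nearest neighbours are used for the estimate. The sketch below compares a few quantiles on the same data; the chosen quantiles are illustrative assumptions only.

```python
from sklearn.cluster import estimate_bandwidth
from sklearn.datasets import make_blobs

# Same synthetic data as in the notebook
X, y = make_blobs(n_samples=200, n_features=2, centers=3, cluster_std=0.7, random_state=0)

# A larger quantile looks at more neighbours per point, so the estimate grows
bandwidths = {q: estimate_bandwidth(X, quantile=q) for q in (0.1, 0.3, 0.5)}
for q, bw in bandwidths.items():
    print(f'quantile={q}: bandwidth={bw:.3f}')
```

If the estimated bandwidth produces too few or too many clusters, adjusting `quantile` is a reasonable first thing to try.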

In [11]:

cluster_df = pd.DataFrame(data=X, columns=['ftr1','ftr2'])
cluster_df['target'] = y

meanshift = MeanShift(bandwidth=best_bandwidth)
cluster_labels = meanshift.fit_predict(X)
print('cluster label types: ', np.unique(cluster_labels))

cluster label types:  [0 1 2]

Cluster visualization

In [13]:

cluster_df['meanshift_label'] = cluster_labels
centers = meanshift.cluster_centers_
unique_labels = np.unique(cluster_labels)
markers=['o', 's', '^', 'x', '*']

for label in unique_labels:
    label_cluster = cluster_df[cluster_df['meanshift_label']==label]
    center_x_y = centers[label]
    # Scatter plot with a different marker for each cluster
    plt.scatter(x=label_cluster['ftr1'], y=label_cluster['ftr2'], edgecolor='k', marker=markers[label] )

    # Mark the center of each cluster
    plt.scatter(x=center_x_y[0], y=center_x_y[1], s=200, color='gray', alpha=0.9, marker=markers[label])
    plt.scatter(x=center_x_y[0], y=center_x_y[1], s=70, color='k', edgecolor='k', marker='$%d$' % label)
    
plt.show()

In [14]:

print(cluster_df.groupby('target')['meanshift_label'].value_counts())

target  meanshift_label
0       0                  67
1       1                  67
2       2                  66
Name: meanshift_label, dtype: int64
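The table above compares the true blob labels with the mean shift labels by eye. As a complementary check, the agreement can be summarised in one number with the adjusted Rand index, which ignores how the cluster IDs happen to be numbered. This is an addition to the original notebook, sketched on the same data and estimated bandwidth.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Same synthetic data as in the notebook
X, y = make_blobs(n_samples=200, n_features=2, centers=3, cluster_std=0.7, random_state=0)

labels = MeanShift(bandwidth=estimate_bandwidth(X)).fit_predict(X)

# 1.0 means the clustering matches the true grouping exactly; values near 0 mean chance-level agreement
ari = adjusted_rand_score(y, labels)
print(f'Adjusted Rand index: {ari:.3f}')
```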


Kang's Note

[Clustering] MeanShift
