
Unsupervised Learning (AgglomerativeClustering)_Learning Machine Learning with Python

by 조크리 2021. 5. 26.

 

1. What is AgglomerativeClustering?

 

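It means agglomerative (bottom-up hierarchical) clustering.

That is, every data point starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number of clusters remains.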
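To make the "merge the closest clusters" idea concrete, here is a deliberately naive sketch in plain Python. It is not how scikit-learn implements the algorithm; the function names `agglomerative` and `average_linkage` are hypothetical, and "closest" is assumed to mean average linkage (the mean distance over all pairs of points taken from the two clusters).

```python
import math


def average_linkage(a, b):
    # Mean Euclidean distance over every pair (one point from a, one from b)
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerative(points, n_clusters):
    # Start with every point in its own cluster
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Rescan every pair of clusters and keep the closest one
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so popping j keeps index i valid)
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for c in agglomerative(points, 3):
    print(sorted(c))
```

With these five points and n_clusters=3, the two nearby pairs are merged and (10.0, 0.0) stays in a cluster of its own.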

 

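The linkage parameter decides how the distance between two clusters is measured:

ward: merges the two clusters whose union causes the smallest increase in within-cluster variance (the default linkage in scikit-learn)

average: merges the two clusters with the smallest average distance between their points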

 

Source: https://woolulu.tistory.com/48
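The two criteria can be written out directly. The sketch below assumes Euclidean distance and uses hypothetical helper names; it computes the Ward merge cost as |A||B| / (|A| + |B|) times the squared distance between the two centroids, which equals the increase in total within-cluster sum of squares. The exact scale of the distance values scikit-learn reports for ward may differ; this only illustrates the criterion.

```python
import math


def centroid(cluster):
    # Coordinate-wise mean of the points in a cluster
    dim = len(cluster[0])
    return tuple(sum(p[k] for p in cluster) / len(cluster) for k in range(dim))


def sse(cluster):
    # Within-cluster sum of squared distances to the centroid
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)


def ward_cost(a, b):
    # Increase in total within-cluster SSE if a and b are merged:
    # |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * math.dist(centroid(a), centroid(b)) ** 2


def average_distance(a, b):
    # Mean pairwise distance between the points of a and the points of b
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


A = [(0.0, 0.0), (2.0, 0.0)]
B = [(0.0, 3.0)]
print(ward_cost(A, B))                # closed-form Ward cost
print(sse(A + B) - sse(A) - sse(B))   # same quantity, computed by brute force
print(average_distance(A, B))
```

Ward favours merges that keep clusters compact, while average linkage only looks at typical point-to-point distances, which is why the two can produce different trees on the same data.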

 

2. Trying it on the Iris data

 

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn import datasets
from sklearn.metrics.cluster import homogeneity_score, completeness_score, v_measure_score

# Load Iris: X holds the 4 features, y the true species labels (0, 1, 2)
iris = datasets.load_iris()
X = iris.data
y = iris.target
print(y)

# Ask for exactly 3 clusters (linkage defaults to 'ward')
model = AgglomerativeClustering(n_clusters=3)
model.fit(X)
labels = model.labels_
print(labels)

# AgglomerativeClustering assigns every point to a cluster, so unlike DBSCAN
# there is no noise label (-1); the cluster count is the number of distinct labels.
n_clusters_ = len(np.unique(labels))
print('Estimated number of clusters: %d' % n_clusters_)

# Compare the clusters with the true labels (1.0 means a perfect match)
print("Homogeneity: %0.3f" % homogeneity_score(y, labels))
print("Completeness: %0.3f" % completeness_score(y, labels))
print("V-measure: %0.3f" % v_measure_score(y, labels))
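Since section 1 describes more than one linkage, a natural follow-up is to fit the same 3-cluster model with each linkage option scikit-learn accepts and compare V-measure scores. This is a sketch for exploration; no particular ranking is claimed here, and the scores dictionary is just an illustrative container.

```python
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.cluster import v_measure_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Fit the same 3-cluster model with each linkage criterion
# and score the result against the true species labels
scores = {}
for linkage in ("ward", "average", "complete", "single"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    scores[linkage] = v_measure_score(y, labels)
    print("%-8s V-measure: %0.3f" % (linkage, scores[linkage]))
```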

 

 

3. Checking the results while changing the parameters

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn import datasets
from sklearn.metrics.cluster import homogeneity_score, completeness_score, v_measure_score

iris = datasets.load_iris()
X = iris.data
y = iris.target
print(y)

# Instead of fixing n_clusters, stop merging once the linkage distance
# reaches distance_threshold (n_clusters must then be None).
for i in range(1, 10):
  model = AgglomerativeClustering(distance_threshold=i, n_clusters=None)
  model.fit(X)
  labels = model.labels_
  # No noise label exists here, so count the distinct labels directly
  n_clusters_ = len(np.unique(labels))
  print('distance_threshold: %d' % i)
  print('Estimated number of clusters: %d' % n_clusters_)
  print("Homogeneity: %0.3f" % homogeneity_score(y, labels))
  print("Completeness: %0.3f" % completeness_score(y, labels))
  print("V-measure: %0.3f" % v_measure_score(y, labels))
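To see why the cluster count drops as distance_threshold grows, one can look at the linkage distance of every merge in the full tree. This sketch assumes scikit-learn 0.24 or later (for the compute_distances parameter and the distances_ attribute); it predicts the cluster count for each threshold as one plus the number of merges at or above that threshold and compares it with what the fitted model actually produced.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering

X = datasets.load_iris().data

# Build the full tree once; compute_distances=True (scikit-learn >= 0.24)
# stores the linkage distance of each of the n-1 merges in distances_
full = AgglomerativeClustering(n_clusters=1, compute_distances=True).fit(X)
merge_distances = full.distances_

rows = []
for t in range(1, 10):
    # Merges with distance >= t are not performed; each one leaves an extra cluster
    predicted = int(np.count_nonzero(merge_distances >= t)) + 1
    model = AgglomerativeClustering(distance_threshold=t, n_clusters=None).fit(X)
    found = len(np.unique(model.labels_))
    rows.append((t, found, predicted))
    print("threshold %d: %d clusters (predicted %d)" % (t, found, predicted))
```

Because ward merge distances only grow as the tree is built, raising the threshold can only keep the number of clusters the same or lower it.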
