
Unsupervised Learning (AgglomerativeClustering): Learning Machine Learning with Python

by 조크리, May 26, 2021

 

1. What is AgglomerativeClustering?

 

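It refers to agglomerative (merging) clustering.

That is, every sample starts as its own cluster, and the two closest clusters are merged repeatedly until the desired number of clusters remains.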
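The merging idea can be sketched from scratch as follows. This is only an illustrative toy, not how scikit-learn implements it: the function name `naive_agglomerative`, the sample points, and the choice of average distance between clusters are all assumptions made for the example.

```python
import numpy as np

def naive_agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (average distance)."""
    points = np.asarray(points, dtype=float)
    # Start with every sample in its own cluster (each cluster is a list of indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best_pair, best_dist = None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average Euclidean distance over all cross-cluster pairs of samples.
                diffs = points[clusters[a]][:, None, :] - points[clusters[b]][None, :, :]
                dist = np.linalg.norm(diffs, axis=2).mean()
                if dist < best_dist:
                    best_pair, best_dist = (a, b), dist
        a, b = best_pair
        # Merge cluster b into cluster a, then drop b.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical toy data: two tight pairs and one isolated point.
toy_points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
toy_clusters = naive_agglomerative(toy_points, n_clusters=3)
print(toy_clusters)
```

The nested loops make this far too slow for real data; it is only meant to show the "merge the closest pair" step.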

 

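ward: merges the pair of clusters that gives the smallest increase in within-cluster variance

average: merges the pair of clusters with the smallest average distance between their points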
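In scikit-learn these two options are chosen through the `linkage` parameter of `AgglomerativeClustering` (the default is `'ward'`). A minimal sketch comparing them on the Iris data, assuming 3 clusters and using the V-measure only as a rough comparison; the dictionary name `linkage_scores` is just for this example:

```python
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.cluster import v_measure_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

linkage_scores = {}
for linkage in ("ward", "average"):
    # Same data and cluster count; only the merge criterion changes.
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)
    linkage_scores[linkage] = v_measure_score(y, labels)
    print("%s: V-measure %0.3f" % (linkage, linkage_scores[linkage]))
```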

 

Source: https://woolulu.tistory.com/48

 

2. Trying it on the Iris data

 

python
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.cluster import homogeneity_score, completeness_score, v_measure_score

# Load the Iris data: 150 samples, 4 features, 3 species.
iris = datasets.load_iris()
X = iris.data
y = iris.target
print(y)

# Merge clusters until 3 remain (the default linkage is 'ward').
model = AgglomerativeClustering(n_clusters=3)
model.fit(X)
labels = model.labels_
print(labels)

# Number of clusters in labels, ignoring noise if present.
# AgglomerativeClustering never assigns the noise label -1, so n_noise_ is always 0 here.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print('Estimated number of clusters: %d' % n_clusters_)
print('Estimated number of noise points: %d' % n_noise_)

# Compare the cluster labels with the true species labels.
print("Homogeneity: %0.3f" % homogeneity_score(y, labels))
print("Completeness: %0.3f" % completeness_score(y, labels))
print("V-measure: %0.3f" % v_measure_score(y, labels))
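The cluster numbers in `labels_` are arbitrary, so they cannot be compared with `y` position by position. One way to see how the clusters line up with the species is a contingency table. The sketch below uses `contingency_matrix` from `sklearn.metrics.cluster`; the variable name `table` is only for this example.

```python
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.cluster import contingency_matrix

iris = datasets.load_iris()
X, y = iris.data, iris.target

labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Rows are the true species, columns are the cluster ids found by the model.
table = contingency_matrix(y, labels)
print(table)
```

A row with most of its count in a single column means that species landed almost entirely in one cluster.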

 

 

3. Checking the results while varying the parameters

python
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.cluster import homogeneity_score, completeness_score, v_measure_score

iris = datasets.load_iris()
X = iris.data
y = iris.target
print(y)

# Try distance thresholds 1 through 9.
# When distance_threshold is set, n_clusters must be None: merging stops once
# the closest pair of clusters is at least the threshold apart.
for i in range(1, 10):
    model = AgglomerativeClustering(distance_threshold=i, n_clusters=None)
    model.fit(X)
    labels = model.labels_
    # AgglomerativeClustering never assigns the noise label -1, so n_noise_ is always 0 here.
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise_ = list(labels).count(-1)
    print('distance_threshold: %d' % i)
    print('Estimated number of clusters: %d' % n_clusters_)
    print('Estimated number of noise points: %d' % n_noise_)
    print("Homogeneity: %0.3f" % homogeneity_score(y, labels))
    print("Completeness: %0.3f" % completeness_score(y, labels))
    print("V-measure: %0.3f" % v_measure_score(y, labels))
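To compare the thresholds more easily, the results can be collected into a list instead of only being printed. A minimal sketch, assuming the same thresholds 1 through 9 and the default `'ward'` linkage; the names `results` and `best_threshold` are just for this example. Picking the best V-measure uses the true labels `y`, so this works only as an evaluation aid on labeled data, not as a way to choose a threshold on unlabeled data.

```python
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.cluster import v_measure_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

results = []
for threshold in range(1, 10):
    model = AgglomerativeClustering(distance_threshold=threshold, n_clusters=None)
    labels = model.fit_predict(X)
    # n_clusters_ is the number of clusters the fitted model ended up with.
    results.append((threshold, model.n_clusters_, v_measure_score(y, labels)))

for threshold, n_found, v in results:
    print("threshold=%d clusters=%d V-measure=%0.3f" % (threshold, n_found, v))

# Threshold whose clustering agrees best with the true species labels.
best_threshold, best_n_clusters, best_v = max(results, key=lambda row: row[2])
print("best threshold: %d" % best_threshold)
```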
