johnson_s_hierarchical_clustering
      BOS    NY    DC   MIA   CHI   SEA    SF    LA   DEN
BOS     0   206   429  1504   963  2976  3095  2979  1949
NY    206     0   233  1308   802  2815  2934  2786  1771
DC    429   233     0  1075   671  2684  2799  2631  1616
MIA  1504  1308  1075     0  1329  3273  3053  2687  2037
CHI   963   802   671  1329     0  2013  2142  2054   996
SEA  2976  2815  2684  3273  2013     0   808  1131  1307
SF   3095  2934  2799  3053  2142   808     0   379  1235
LA   2979  2786  2631  2687  2054  1131   379     0  1059
DEN  1949  1771  1616  2037   996  1307  1235  1059     0

  1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.
  2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.
  3. Compute distances (similarities) between the new cluster and each of the old clusters.
  4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
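
The four steps above can be sketched in R as follows. This is a minimal, unoptimized illustration, not UCINET's implementation; the function name `johnson_cluster` and its `link` argument are assumptions made for this sketch. Rather than maintaining an updated matrix, it recomputes every cluster-to-cluster distance from the original item distances, which gives the same result for `min`, `max`, and `mean`.

```r
# Distances between the nine cities (same values as the table on this page).
cities <- c("BOS", "NY", "DC", "MIA", "CHI", "SEA", "SF", "LA", "DEN")
d <- matrix(c(
     0,  206,  429, 1504,  963, 2976, 3095, 2979, 1949,
   206,    0,  233, 1308,  802, 2815, 2934, 2786, 1771,
   429,  233,    0, 1075,  671, 2684, 2799, 2631, 1616,
  1504, 1308, 1075,    0, 1329, 3273, 3053, 2687, 2037,
   963,  802,  671, 1329,    0, 2013, 2142, 2054,  996,
  2976, 2815, 2684, 3273, 2013,    0,  808, 1131, 1307,
  3095, 2934, 2799, 3053, 2142,  808,    0,  379, 1235,
  2979, 2786, 2631, 2687, 2054, 1131,  379,    0, 1059,
  1949, 1771, 1616, 2037,  996, 1307, 1235, 1059,    0
), nrow = 9, byrow = TRUE, dimnames = list(cities, cities))

# Hypothetical helper (not a UCINET or base R function).
# `link` turns the item-to-item distances between two clusters into one number:
# min = single link, max = complete link, mean = average link.
johnson_cluster <- function(d, link = min) {
  clusters <- as.list(rownames(d))   # step 1: every item is its own cluster
  heights <- numeric(0)              # distance at which each merge happens
  while (length(clusters) > 1) {     # step 4: repeat until one cluster is left
    best <- c(1, 2)
    best_d <- Inf
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        # step 3: distance between clusters i and j under the chosen linkage
        dij <- link(d[clusters[[i]], clusters[[j]]])
        if (dij < best_d) {
          best_d <- dij
          best <- c(i, j)
        }
      }
    }
    # step 2: merge the closest pair (best[1] < best[2], so indices stay valid)
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
    heights <- c(heights, best_d)
    cat(best_d, ":", paste(sapply(clusters, paste, collapse = "/"), collapse = "  "), "\n")
  }
  invisible(heights)
}

merge_levels <- johnson_cluster(d, link = min)
```

Each printed line shows the merge distance followed by the clusters that remain after that merge. Passing `link = max` or `link = mean` switches the same loop to complete-link or average-link clustering.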

Step 3 can be done in different ways, which is what distinguishes single-link from complete-link and average-link clustering.

  • Single-link clustering (also called the connectedness or minimum method): the distance between two clusters is the shortest distance from any member of one cluster to any member of the other cluster.
  • Complete-link clustering (also called the diameter or maximum method): the distance between two clusters is the longest distance from any member of one cluster to any member of the other cluster.
  • Average-link clustering: the distance between two clusters is the average distance over all pairs made of one member from each cluster.
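
As a small illustration of the three rules, the sketch below compares the clusters {BOS, NY} and {SF, LA} using the four city-to-city distances from the table on this page. The choice of these two clusters and the variable names are assumptions made for the example.

```r
# Item-to-item distances between clusters {BOS, NY} (rows) and {SF, LA} (columns),
# copied from the city table on this page.
between <- matrix(c(3095, 2979,
                    2934, 2786),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("BOS", "NY"), c("SF", "LA")))

single_link   <- min(between)    # shortest pair: NY-LA
complete_link <- max(between)    # longest pair: BOS-SF
average_link  <- mean(between)   # mean over all four pairs

print(c(single = single_link, complete = complete_link, average = average_link))
```

The three methods give 2786, 3095, and 2948.5 respectively, so the same pair of clusters can look closer or farther apart depending on the linkage rule.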

      BOS    NY    DC   MIA   CHI   SEA    SF    LA   DEN
BOS     0   206   429  1504   963  2976  3095  2979  1949
NY    206     0   233  1308   802  2815  2934  2786  1771
DC    429   233     0  1075   671  2684  2799  2631  1616
MIA  1504  1308  1075     0  1329  3273  3053  2687  2037
CHI   963   802   671  1329     0  2013  2142  2054   996
SEA  2976  2815  2684  3273  2013     0   808  1131  1307
SF   3095  2934  2799  3053  2142   808     0   379  1235
LA   2979  2786  2631  2687  2054  1131   379     0  1059
DEN  1949  1771  1616  2037   996  1307  1235  1059     0

  1. The closest pair of cities is BOS and NY, at 206.
  2. Merge the two cities into BOS/NY and recompute the distances between this cluster and the remaining cities.
  3. With the single-link method, the distance between BOS/NY and DC is 233, the smaller of BOS-DC (429) and NY-DC (233); single link takes the shortest member-to-member distance as the cluster distance. Likewise, the distance to DEN is 1771.

        BOS/NY    DC   MIA   CHI   SEA    SF    LA   DEN
BOS/NY       0   233  1308   802  2815  2934  2786  1771
DC         233     0  1075   671  2684  2799  2631  1616
MIA       1308  1075     0  1329  3273  3053  2687  2037
CHI        802   671  1329     0  2013  2142  2054   996
SEA       2815  2684  3273  2013     0   808  1131  1307
SF        2934  2799  3053  2142   808     0   379  1235
LA        2786  2631  2687  2054  1131   379     0  1059
DEN       1771  1616  2037   996  1307  1235  1059     0

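
The single-link update for the merged BOS/NY cluster can be checked with a few lines of R. This is only a sketch of the arithmetic; the variable names are assumptions, and the two vectors are the BOS and NY rows of the original distance table, in which the NY-DC distance is 233.

```r
cities <- c("BOS", "NY", "DC", "MIA", "CHI", "SEA", "SF", "LA", "DEN")

# Rows of the original distance table for the two cities being merged.
bos <- setNames(c(0, 206, 429, 1504, 963, 2976, 3095, 2979, 1949), cities)
ny  <- setNames(c(206, 0, 233, 1308, 802, 2815, 2934, 2786, 1771), cities)

# Every city that is not part of the new cluster.
others <- setdiff(cities, c("BOS", "NY"))

# Single link: the merged cluster is as close to each city as its nearest member.
bos_ny <- pmin(bos[others], ny[others])
print(bos_ny)
```

Replacing `pmin` with `pmax` would give the complete-link version of the same row.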
  1. The city closest to BOS/NY is DC, at a distance of 233.
  2. Merge them into BOS/NY/DC, then recompute the distances between this cluster and the other cities.

           BOS/NY/DC   MIA   CHI   SEA    SF    LA   DEN
BOS/NY/DC          0  1075   671  2684  2799  2631  1616
MIA             1075     0  1329  3273  3053  2687  2037
CHI              671  1329     0  2013  2142  2054   996
SEA             2684  3273  2013     0   808  1131  1307
SF              2799  3053  2142   808     0   379  1235
LA              2631  2687  2054  1131   379     0  1059
DEN             1616  2037   996  1307  1235  1059     0

  1. The smallest distance in the matrix above is 379, between SF and LA.
  2. Merge them into SF/LA and recompute the matrix.

           BOS/NY/DC   MIA   CHI   SEA  SF/LA   DEN
BOS/NY/DC          0  1075   671  2684   2631  1616
MIA             1075     0  1329  3273   2687  2037
CHI              671  1329     0  2013   2054   996
SEA             2684  3273  2013     0    808  1307
SF/LA           2631  2687  2054   808      0  1059
DEN             1616  2037   996  1307   1059     0

  1. CHI is now closest to BOS/NY/DC (671).
  2. Merge them into BOS/NY/DC/CHI.

               BOS/NY/DC/CHI   MIA   SEA  SF/LA   DEN
BOS/NY/DC/CHI              0  1075  2013   2054   996
MIA                     1075     0  3273   2687  2037
SEA                     2013  3273     0    808  1307
SF/LA                   2054  2687   808      0  1059
DEN                      996  2037  1307   1059     0

  1. In the same way, merge SEA into SF/LA (808), giving SF/LA/SEA.

               BOS/NY/DC/CHI   MIA  SF/LA/SEA   DEN
BOS/NY/DC/CHI              0  1075       2013   996
MIA                     1075     0       2687  2037
SF/LA/SEA               2013  2687          0  1059
DEN                      996  2037       1059     0

  1. DEN is now closest to BOS/NY/DC/CHI (996); merge them into BOS/NY/DC/CHI/DEN.

                   BOS/NY/DC/CHI/DEN   MIA  SF/LA/SEA
BOS/NY/DC/CHI/DEN                  0  1075       1059
MIA                             1075     0       2687
SF/LA/SEA                       1059  2687          0

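  1. BOS/NY/DC/CHI/DEN and SF/LA/SEA are now the closest pair (1059); merge them.

                             BOS/NY/DC/CHI/DEN/SF/LA/SEA   MIA
BOS/NY/DC/CHI/DEN/SF/LA/SEA                            0  1075
MIA                                                 1075     0

  1. Finally, MIA joins at 1075, leaving a single cluster that contains all nine cities.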

JOHNSON'S HIERARCHICAL CLUSTERING
--------------------------------------------------------------------------------

Method:                                 SINGLE_LINK (minimum distance)
Type of Data:                           Dissimilarities
Input dataset:                          cities (D:\Users\Hyo\Documents\UCINET data\Cities\cities)

HIERARCHICAL CLUSTERING

        M S     B     C D
        I E S L O N D H E
        A A F A S Y C I N

Level   4 6 7 8 1 2 3 5 9
-----   - - - - - - - - -
  206   . . . . XXX . . .
  233   . . . . XXXXX . .
  379   . . XXX XXXXX . .
  671   . . XXX XXXXXXX .
  808   . XXXXX XXXXXXX .
  996   . XXXXX XXXXXXXXX
 1059   . XXXXXXXXXXXXXXX
 1075   XXXXXXXXXXXXXXXXX



Measures of cluster adequacy

                  1      2      3      4      5      6      7
             ------ ------ ------ ------ ------ ------ ------
  1     Eta  -0.284 -0.480 -0.554 -0.657 -0.711 -0.687 -0.151
  2       Q  -0.133 -0.163 -0.188 -0.203 -0.240 -0.214 -0.033
  3 Q-prime  -0.152 -0.190 -0.226 -0.254 -0.320 -0.322 -0.065
  4     E-I   0.994  0.973  0.961  0.884  0.824  0.625 -0.490


Size of each cluster, expressed as a proportion of the total population clustered

             1     2     3     4     5     6     7     8
         ----- ----- ----- ----- ----- ----- ----- -----
  1 CL1  0.222 0.333 0.333 0.111 0.111 0.111 0.111 1.000
  2 CL2  0.111 0.111 0.111 0.444 0.444 0.333 0.889      
  3 CL3  0.111 0.111 0.111 0.111 0.333 0.556            
  4 CL4  0.111 0.111 0.111 0.222 0.111                  
  5 CL5  0.111 0.111 0.222 0.111                        
  6 CL6  0.111 0.111 0.111                              
  7 CL7  0.111 0.111                                    
  8 CL8  0.111                                          

Actor-by-Partition indicator matrix saved as dataset Part

----------------------------------------
Running time:  00:00:01
Output generated:  21 11 16 09:10:06
UCINET 6.614 Copyright (c) 1992-2016 Analytic Technologies


E.g. 1

cities2.csv
0	206	429	1504	963	2976	3095	2979	1949
206	0	233	1308	802	2815	2934	2786	1771
429	233	0	1075	671	2684	2799	2631	1616
1504	1308	1075	0	1329	3273	3053	2687	2037
963	802	671	1329	0	2013	2142	2054	996
2976	2815	2684	3273	2013	0	808	1131	1307
3095	2934	2799	3053	2142	808	0	379	1235
2979	2786	2631	2687	2054	1131	379	0	1059
1949	1771	1616	2037	996	1307	1235	1059	0

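# Prepare data
setwd("d:/rdata")
# The listing above has no header row, so header = FALSE; add sep = "\t" if the file is tab-separated
mydata <- read.csv("cities2.csv", header = FALSE)
mydata <- na.omit(mydata) # listwise deletion of missing values
# mydata <- scale(mydata) # standardizing is meant for raw variables; skip it for a distance matrix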
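
One way to carry the R example through to a clustering result is base R's `hclust()`. The sketch below types the distance matrix in directly so that it runs without the CSV file; that choice, the variable names, and the plot title are assumptions for illustration, not part of the original notes.

```r
cities <- c("BOS", "NY", "DC", "MIA", "CHI", "SEA", "SF", "LA", "DEN")
d <- matrix(c(
     0,  206,  429, 1504,  963, 2976, 3095, 2979, 1949,
   206,    0,  233, 1308,  802, 2815, 2934, 2786, 1771,
   429,  233,    0, 1075,  671, 2684, 2799, 2631, 1616,
  1504, 1308, 1075,    0, 1329, 3273, 3053, 2687, 2037,
   963,  802,  671, 1329,    0, 2013, 2142, 2054,  996,
  2976, 2815, 2684, 3273, 2013,    0,  808, 1131, 1307,
  3095, 2934, 2799, 3053, 2142,  808,    0,  379, 1235,
  2979, 2786, 2631, 2687, 2054, 1131,  379,    0, 1059,
  1949, 1771, 1616, 2037,  996, 1307, 1235, 1059,    0
), nrow = 9, byrow = TRUE, dimnames = list(cities, cities))

# as.dist() tells hclust() that d already holds distances, not raw variables.
hc <- hclust(as.dist(d), method = "single")

print(hc$height)              # merge levels, comparable to the UCINET "Level" column
groups <- cutree(hc, k = 2)   # membership when the tree is cut into two clusters
print(groups)

plot(hc, main = "Single-link clustering of the cities")   # dendrogram
```

With `method = "single"` the merge heights should match the levels in the UCINET output above (206 through 1075). Using `method = "complete"` or `method = "average"` gives the other two linkage rules.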

johnson_s_hierarchical_clustering.txt · Last modified: 2016/11/21 12:15 by hkimscil