johnson_s_hierarchical_clustering
Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| johnson_s_hierarchical_clustering [2016/11/21 08:51] – created hkimscil | johnson_s_hierarchical_clustering [2016/11/21 12:45] (current) – [E.g. 1] hkimscil | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ^ Cities | + | | | BOS | NY | DC | MIA | CHI | SEA | SF | LA | DEN | |
| - | | Boston, Mass. | + | | BOS |
| - | | Chicago, Ill. | + | | NY |
| - | | Denver, Colo. | + | | DC |
| - | | Los Angeles, Calif. | + | | MIA |
| - | | New York, N.Y. | + | | CHI |
| - | | San Francisco, Calif. | + | | SEA |
| - | | Seattle, Wash. | + | | SF | 3095 |
| - | | Washington, D.C. | + | | LA |
| - | + | | DEN | 1949 | 1771 | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 | | |
| - | {{ : | + | |
| - Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain. | - Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain. | ||
| Line 21: | Line 20: | ||
| * **complete-link clustering** (also called the diameter or maximum method) = the longest distance from any member of one cluster to any member of the other cluster. | * **complete-link clustering** (also called the diameter or maximum method) = the longest distance from any member of one cluster to any member of the other cluster. | ||
| * **average-link clustering** = the average distance from any member of one cluster to any member of the other cluster. | * **average-link clustering** = the average distance from any member of one cluster to any member of the other cluster. | ||
| - | |||
| | | BOS | NY | DC | MIA | CHI | SEA | SF | LA | DEN | | | | BOS | NY | DC | MIA | CHI | SEA | SF | LA | DEN | | ||
| - | | BOS | 0 | 206 | 429 | 1504 | 963 | 2976 | 3095 | 2979 | 1949 | | + | | BOS | 0 | **206** | 429 | 1504 | 963 | 2976 | 3095 | 2979 | 1949 | |
| | NY | 206 | 0 | 233 | 1308 | 802 | 2815 | 2934 | 2786 | 1771 | | | NY | 206 | 0 | 233 | 1308 | 802 | 2815 | 2934 | 2786 | 1771 | | ||
| | DC | 429 | 233 | 0 | 1075 | 671 | 2684 | 2799 | 2631 | 1616 | | | DC | 429 | 233 | 0 | 1075 | 671 | 2684 | 2799 | 2631 | 1616 | | ||
| Line 34: | Line 32: | ||
| | DEN | 1949 | 1771 | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 | | | DEN | 1949 | 1771 | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 | | ||
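| + | - The closest pair of cities: BOS and NY, at 206 | ||
| + | - Merge the two cities into BOS/NY, then recompute the distances between the cities, including this new cluster | ||
| + | - With the single-link method, the distance between BOS/NY and DC is 233 (the single-link method takes the shortest member-to-member distance as the distance to the cluster). Likewise, the distance to DEN is 1771 | ||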
| - | ^ | + | | |
| | BOS/ | | BOS/ | ||
| DC | 233 | 0 | 1075 | 671 | 2684 | 2799 | 2631 | 1616 | | | DC | 233 | 0 | 1075 | 671 | 2684 | 2799 | 2631 | 1616 | | ||
| Line 45: | Line 46: | ||
| | DEN | 1771 | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 | | | DEN | 1771 | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 | | ||
| + | - The city closest to BOS/NY is DC, at a distance of 233 | ||
| + | - Cluster them as BOS/NY/DC, then recompute the distances between this cluster and the other cities, and among the remaining cities | ||
| | | BOS/ | | | BOS/ | ||
| Line 55: | Line 58: | ||
| | DEN | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 | | | DEN | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 | | ||
| + | - In the matrix above, the shortest distance is 379, between SF and LA | ||
| + | - Merge them into SF/LA and recompute the matrix | ||
| | | BOS/ | | | BOS/ | ||
| Line 64: | Line 69: | ||
| | DEN | 1616 | 2037 | 996 | 1307 | 1059 | 0 | | | DEN | 1616 | 2037 | 996 | 1307 | 1059 | 0 | | ||
| + | - Next, CHI and BOS/ | ||
| + | - BOS/ | ||
| - | | | BOS/ | + | | | BOS/ |
| - | | BOS/ | + | | BOS/ |
| | MIA | 1075 | 0 | 3273 | 2687 | 2037 | | | MIA | 1075 | 0 | 3273 | 2687 | 2037 | | ||
| | SEA | 2013 | 3273 | 0 | 808 | 1307 | | | SEA | 2013 | 3273 | 0 | 808 | 1307 | | ||
| Line 72: | Line 79: | ||
| | DEN | 996 | 2037 | 1307 | 1059 | 0 | | | DEN | 996 | 2037 | 1307 | 1059 | 0 | | ||
| - | + | - In the same way, merge SEA into SF/LA (SF/ |
| - | | | BOS/ | + | |
| - | | BOS/ | + | | | BOS/ |
| + | | BOS/ | ||
| | MIA | 1075 | 0 | 2687 | 2037 | | | MIA | 1075 | 0 | 2687 | 2037 | | ||
| - | | SF/ | + | | SF/ |
| | DEN | 996 | 2037 | 1059 | 0 | | | DEN | 996 | 2037 | 1059 | 0 | | ||
| - | | | BOS/ | + | | | BOS/ |
| - | | BOS/ | + | | BOS/ |
| | MIA | 1075 | 0 | 2687 | | | MIA | 1075 | 0 | 2687 | | ||
| | SF/ | | SF/ | ||
| - | | | BOS/ | + | | | BOS/ |
| - | | BOS/ | + | | BOS/ |
| | MIA | 1075 | 0 | | | MIA | 1075 | 0 | | ||
| + | {{: | ||
| + | < | ||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Method: | ||
| + | Type of Data: | ||
| + | Input dataset: | ||
| + | |||
| + | HIERARCHICAL CLUSTERING | ||
| + | |||
| + | M S | ||
| + | I E S L O N D H E | ||
| + | A A F A S Y C I N | ||
| + | |||
| + | Level 4 6 7 8 1 2 3 5 9 | ||
| + | ----- - - - - - - - - - | ||
| + | 206 . . . . XXX . . . | ||
| + | 233 . . . . XXXXX . . | ||
| + | 379 . . XXX XXXXX . . | ||
| + | 671 . . XXX XXXXXXX . | ||
| + | 808 . XXXXX XXXXXXX . | ||
| + | 996 . XXXXX XXXXXXXXX | ||
| + | | ||
| + | | ||
| + | |||
| + | |||
| + | |||
| + | Measures of cluster adequacy | ||
| + | |||
| + | 1 2 3 4 5 6 7 | ||
| + | | ||
| + | 1 | ||
| + | 2 | ||
| + | 3 Q-prime | ||
| + | 4 | ||
| + | |||
| + | |||
| + | Size of each cluster, expressed as a proportion of the total population clustered | ||
| + | |||
| + | | ||
| + | ----- ----- ----- ----- ----- ----- ----- ----- | ||
| + | 1 CL1 0.222 0.333 0.333 0.111 0.111 0.111 0.111 1.000 | ||
| + | 2 CL2 0.111 0.111 0.111 0.444 0.444 0.333 0.889 | ||
| + | 3 CL3 0.111 0.111 0.111 0.111 0.333 0.556 | ||
| + | 4 CL4 0.111 0.111 0.111 0.222 0.111 | ||
| + | 5 CL5 0.111 0.111 0.222 0.111 | ||
| + | 6 CL6 0.111 0.111 0.111 | ||
| + | 7 CL7 0.111 0.111 | ||
| + | 8 CL8 0.111 | ||
| + | |||
| + | Actor-by-Partition indicator matrix saved as dataset Part | ||
| + | |||
| + | ---------------------------------------- | ||
| + | Running time: 00:00:01 | ||
| + | Output generated: | ||
| + | UCINET 6.614 Copyright (c) 1992-2016 Analytic Technologies | ||
| + | |||
| + | </ | ||
| + | |||
| + | {{hiclus2.gif}} | ||
| + | {{hiclus4.gif}} | ||
| + | |||
| + | ====== E.g. 1 ====== | ||
| + | <code csv cities2.csv> | ||
| + | 206 0 233 1308 802 2815 2934 2786 1771 | ||
| + | 429 233 0 1075 671 2684 2799 2631 1616 | ||
| + | 1504 1308 1075 0 1329 3273 3053 2687 2037 | ||
| + | 963 802 671 1329 0 2013 2142 2054 996 | ||
| + | 2976 2815 2684 3273 2013 0 808 1131 1307 | ||
| + | 3095 2934 2799 3053 2142 808 0 379 1235 | ||
| + | 2979 2786 2631 2687 2054 1131 379 0 1059 | ||
| + | 1949 1771 1616 2037 996 1307 1235 1059 0 | ||
| + | |||
| + | </ | ||
| + | |||
| + | # Prepare Data | ||
| + | setwd(" | ||
| + | mydata <- read.csv(" | ||
| + | mydata <- na.omit(mydata) # listwise deletion of missing | ||
| + | mydata <- scale(mydata) # standardize variables | ||
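
The merge-and-recompute steps walked through above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the procedure UCINET or R uses internally: the names `johnson_clustering`, `CITIES`, `ROWS`, and `DIST` are hypothetical, and the BOS row is taken from the full distance table because the `cities2.csv` listing as shown begins at the NY row. Passing `min`, `max`, or `mean` as `linkage` selects single-, complete-, or average-link clustering as defined earlier.

```python
from itertools import combinations

# City-to-city distances from the table above (row and column order = CITIES).
CITIES = ["BOS", "NY", "DC", "MIA", "CHI", "SEA", "SF", "LA", "DEN"]
ROWS = [
    [0, 206, 429, 1504, 963, 2976, 3095, 2979, 1949],
    [206, 0, 233, 1308, 802, 2815, 2934, 2786, 1771],
    [429, 233, 0, 1075, 671, 2684, 2799, 2631, 1616],
    [1504, 1308, 1075, 0, 1329, 3273, 3053, 2687, 2037],
    [963, 802, 671, 1329, 0, 2013, 2142, 2054, 996],
    [2976, 2815, 2684, 3273, 2013, 0, 808, 1131, 1307],
    [3095, 2934, 2799, 3053, 2142, 808, 0, 379, 1235],
    [2979, 2786, 2631, 2687, 2054, 1131, 379, 0, 1059],
    [1949, 1771, 1616, 2037, 996, 1307, 1235, 1059, 0],
]
# Lookup table: DIST[("BOS", "NY")] == 206, and so on.
DIST = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(CITIES)
    for j, b in enumerate(CITIES)
}


def mean(values):
    """Arithmetic mean, used for average-link clustering."""
    values = list(values)
    return sum(values) / len(values)


def johnson_clustering(items, dist, linkage=min):
    """Agglomerative clustering; returns a list of (level, left, right) merges.

    linkage=min is single-link, max is complete-link, mean is average-link.
    """
    # Step 1: every item starts in its own cluster.
    clusters = [(item,) for item in items]
    merges = []

    def cluster_distance(pair):
        left, right = pair
        return linkage(dist[(a, b)] for a in left for b in right)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        left, right = min(combinations(clusters, 2), key=cluster_distance)
        level = cluster_distance((left, right))
        # Merge the pair and keep every other cluster unchanged.
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
        merges.append((level, left, right))
    return merges


if __name__ == "__main__":
    for level, left, right in johnson_clustering(CITIES, DIST):
        print(level, "/".join(left), "+", "/".join(right))
```

With the default single-link setting, the merge levels come out as 206, 233, 379, 671, 808, 996, 1059, and 1075, and the first six match the levels listed in the UCINET output above. The order of city names inside a merged cluster depends on list order and carries no meaning.
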
johnson_s_hierarchical_clustering.1479685887.txt.gz · Last modified: by hkimscil
