r:data_transformations
Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| r:data_transformations [2016/10/11 23:03] – created hkimscil | r:data_transformations [2019/09/19 09:23] (current) – [Splitting a Vector into Groups] hkimscil | ||
|---|---|---|---|
| Line 6: | Line 6: | ||
| Warning message: | Warning message: | ||
| package ‘MASS’ was built under R version 3.2.5 | package ‘MASS’ was built under R version 3.2.5 | ||
| - | > split(Cars93$MPG.city, | + | > split(Cars93$MPG.city, |
| $USA | $USA | ||
| [1] 22 19 16 19 16 16 25 25 19 21 18 15 | [1] 22 19 16 19 16 16 25 25 19 21 18 15 | ||
| Line 39: | Line 39: | ||
| [1] 23.86667 | [1] 23.86667 | ||
| > | > | ||
| + | # or | ||
| + | > sapply(g, mean) | ||
| + | | ||
| + | 20.95833 23.86667 | ||
| + | # or retain list format | ||
| + | > lapply(g, mean) | ||
| + | $USA | ||
| + | [1] 20.95833 | ||
| + | |||
| + | $`non-USA` | ||
| + | [1] 23.86667 | ||
| + | |||
| + | |||
| </ | </ | ||
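The split-then-summarize pattern above can be sketched on a small made-up vector. The values and the `origin` factor below are illustrative assumptions, not the Cars93 data; only base R is used.

```r
# Hypothetical data: six mileage values and a parallel grouping factor.
mpg    <- c(22, 19, 16, 25, 29, 31)
origin <- factor(c("USA", "USA", "USA", "non-USA", "non-USA", "non-USA"),
                 levels = c("USA", "non-USA"))

# split() returns a list with one vector per factor level.
g <- split(mpg, origin)

# sapply() simplifies the per-group results to a named vector;
# lapply() keeps them in a list.
group_means_vec  <- sapply(g, mean)
group_means_list <- lapply(g, mean)

print(group_means_vec)
print(group_means_list)
```

Use `sapply` when a plain named vector is convenient for further arithmetic, and `lapply` when the list shape should be preserved.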
| + | ====== Applying a Function to Each List Element ====== | ||
| + | < | ||
| + | 98, 87, 94, 77, 87, 98, 85, 89, 95, 85, 93, 93, | ||
| + | 97, 71, 97, 93, 75, 68, 98, 95, 79, 94, 98, 95) | ||
| + | S2 <- c(60, 98, 94, 95, 99, 97, 100, 73, 93, 91, 98, | ||
| + | 86, 66, 83, 77, 97, 91, 93, 71, 91, 95, 100, | ||
| + | 72, 96, 91, 76, 100, 97, 99, 95, 97, 77, 94, | ||
| + | 99, 88, 100, 94, 93, 86) | ||
| + | S3 <- c(95, 86, 90, 90, 75, 83, 96, 85, 83, 84, 81, 98, | ||
| + | 77, 94, 84, 89, 93, 99, 91, 77, 95, 90, 91, 87, | ||
| + | 85, 76, 99, 99, 97, 97, 97, 77, 93, 96, 90, 87, | ||
| + | 97, 88) | ||
| + | S4 <- c(67, 93, 63, 83, 87, 97, 96, 92, 93, 96, 87, 90, | ||
| + | 94, 90, 82, 91, 85, 93, 83, 90, 87, 99, 94, 88, | ||
| + | 90, 72, 81, 93, 93, 94, 97, 89, 96, 95, 82, 97) | ||
| + | |||
| + | scores <- list(S1=S1, | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | $S1 | ||
| + | [1] 89 85 85 86 88 89 86 82 96 85 93 91 98 87 94 77 87 98 85 89 | ||
| + | [21] 95 85 93 93 97 71 97 93 75 68 98 95 79 94 98 95 | ||
| + | |||
| + | $S2 | ||
| + | | ||
| + | [16] 97 91 93 71 91 95 100 72 96 91 76 100 97 99 95 | ||
| + | [31] 97 77 94 99 88 100 94 93 86 | ||
| + | |||
| + | $S3 | ||
| + | [1] 95 86 90 90 75 83 96 85 83 84 81 98 77 94 84 89 93 99 91 77 | ||
| + | [21] 95 90 91 87 85 76 99 99 97 97 97 77 93 96 90 87 97 88 | ||
| + | |||
| + | $S4 | ||
| + | [1] 67 93 63 83 87 97 96 92 93 96 87 90 94 90 82 91 85 93 83 90 | ||
| + | [21] 87 99 94 88 90 72 81 93 93 94 97 89 96 95 82 97 | ||
| + | </ | ||
| + | |||
| + | **lapply(list_name, | ||
| + | < | ||
| + | $S1 | ||
| + | [1] 36 | ||
| + | |||
| + | $S2 | ||
| + | [1] 39 | ||
| + | |||
| + | $S3 | ||
| + | [1] 38 | ||
| + | |||
| + | $S4 | ||
| + | [1] 36 | ||
| + | </ | ||
| + | |||
| + | **sapply(list_name, | ||
| + | < | ||
| + | S1 S2 S3 S4 | ||
| + | 36 39 38 36 | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | S1 | ||
| + | 88.77778 89.79487 89.23684 88.86111 | ||
| + | > sapply(scores, | ||
| + | | ||
| + | | ||
| + | </ | ||
| + | |||
| + | If the called function returns a vector, sapply will form the results into **a matrix**. The range function, for example, returns a two-element vector: | ||
| + | < | ||
| + | | ||
| + | [1,] 68 60 75 63 | ||
| + | [2,] 98 100 99 99 | ||
| + | </ | ||
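As a minimal sketch of how the shape of the `sapply` result depends on what the function returns, the example below uses a small hypothetical list (`scores_demo` is an assumption, not the `scores` data above).

```r
# Hypothetical scores: two vectors of unequal length in a named list.
scores_demo <- list(A = c(70, 85, 90), B = c(60, 95, 88, 77))

# length() returns one number per element, so sapply() yields a named vector.
n <- sapply(scores_demo, length)

# range() returns c(min, max) per element, so sapply() yields a matrix
# with two rows (min, max) and one column per list element.
r <- sapply(scores_demo, range)

print(n)
print(r)
```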
| + | |||
| + | If the called function returns a structured object, such as a list, then you will need to use lapply rather than sapply. Structured objects cannot be put into a vector. Suppose we want to perform a t test on every semester. The t.test function returns a list, so we must use lapply: | ||
| + | < | ||
| + | </ | ||
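The following is a sketch of the `lapply` plus `t.test` idea on hypothetical data (`scores_demo` is an assumption). Called with a single vector, `t.test` runs a one-sample test against a mean of 0 by default; the confidence intervals are then pulled out of each result.

```r
# Hypothetical scores for two groups.
scores_demo <- list(A = c(70, 85, 90, 82), B = c(60, 95, 88, 77))

# t.test() returns a structured "htest" object, so keep the results in a list.
tests <- lapply(scores_demo, t.test)

# Each list element can be inspected on its own ...
print(tests$A$conf.int)

# ... or one component can be extracted from all of them at once.
# conf.int is a two-element vector, so sapply() builds a 2 x 2 matrix here.
ci <- sapply(tests, function(t) t$conf.int)
print(ci)
```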
| + | |||
| + | |||
| + | ====== Applying a Function to Every Row ====== | ||
| + | |||
| + | < | ||
| + | | ||
| + | | ||
| + | | ||
| + | |||
| + | > long <- matrix(longdata, | ||
| + | > colnames(long) <- c(" | ||
| + | > rownames(long) <- c(" | ||
| + | |||
| + | > long | ||
| + | | ||
| + | Moe | ||
| + | Larry -1.406571 -0.2804896 -0.1580926 -0.5407272 -0.2757667 | ||
| + | Curly -1.010482 | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | | ||
| + | -1.6529530 | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | Moe Larry Curly | ||
| + | [1,] -3.7170704 -0.1580926 -1.7086779 | ||
| + | [2,] -0.2804896 | ||
| + | </ | ||
| + | |||
| + | ====== Applying a Function to Every Column ====== | ||
| + | < | ||
| + | </ | ||
| + | The second argument of apply is the margin: 1 applies the function row by row, 2 applies it column by column. | ||
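A minimal sketch of the two margins, using a small hypothetical matrix `m` (an assumption, not the `long` matrix above):

```r
# Hypothetical 2 x 3 matrix, filled column by column with 1..6.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# Margin 1: the function receives each row in turn.
row_means <- apply(m, 1, mean)

# Margin 2: the function receives each column in turn.
col_means <- apply(m, 2, mean)

print(row_means)
print(col_means)
```

The result is named after the rows for margin 1 and after the columns for margin 2.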
| + | |||
| + | ====== Applying a Function to Groups of Data ====== | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | <code csv suburbs.csv> | ||
| + | city county state pop | ||
| + | Chicago Cook IL 2853114 | ||
| + | Kenosha Kenosha WI 90352 | ||
| + | Aurora Kane IL 171782 | ||
| + | Elgin Kane IL 94487 | ||
| + | Gary Lake(IN) IN 102746 | ||
| + | Joliet Kendall IL 106221 | ||
| + | Naperville DuPage IL 147779 | ||
| + | Arlington Heights Cook IL 76031 | ||
| + | Bolingbrook Will IL 70834 | ||
| + | Cicero Cook IL 72616 | ||
| + | Evanston Cook IL 74239 | ||
| + | Hammond Lake(IN) IN 83048 | ||
| + | Palatine Cook IL 67232 | ||
| + | Schaumburg Cook IL 75386 | ||
| + | Skokie Cook IL 63348 | ||
| + | Waukegan Lake(IL) IL 91452 | ||
| + | |||
| + | </ | ||
| + | |||
| + | < | ||
| + | suburbs <- read.csv(" | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | > attach(suburbs) | ||
| + | > pop | ||
| + | [1] 2853114 | ||
| + | [10] | ||
| + | # We can easily compute sums and averages for all the cities: | ||
| + | > sum(pop) | ||
| + | [1] 4240667 | ||
| + | > mean(pop) | ||
| + | [1] 265041.7 | ||
| + | </ | ||
| + | |||
| + | The county column is a factor with 8 levels: | ||
| + | < | ||
| + | [1] Cook | ||
| + | [7] DuPage | ||
| + | [13] Cook | ||
| + | 8 Levels: Cook DuPage Kane Kendall Kenosha Lake(IL) ... Will | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | Cook | ||
| + | | ||
| + | |||
| + | > tapply(pop, | ||
| + | Cook | ||
| + | 468852.3 147779.0 133134.5 106221.0 | ||
| + | </ | ||
| + | |||
| + | The function given to tapply should expect a single argument: a vector containing all the members of one group. A good example is the length function, which takes a vector and returns its length. Use it to count the number of data items in each group; in this case, the number of cities in each county: | ||
| + | < | ||
| + | Cook | ||
| + | | ||
| + | </ | ||
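The grouping behaviour of `tapply` can be sketched with a tiny hypothetical data set (`pop_demo` and `county_demo` are assumptions, not the suburbs data):

```r
# Hypothetical populations and a parallel grouping factor.
pop_demo    <- c(100, 200, 300, 50)
county_demo <- factor(c("Cook", "Cook", "Kane", "Kane"))

# Each function call receives the vector of values belonging to one group.
group_sum   <- tapply(pop_demo, county_demo, sum)
group_mean  <- tapply(pop_demo, county_demo, mean)
group_count <- tapply(pop_demo, county_demo, length)

print(group_sum)
print(group_mean)
print(group_count)
```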
| + | |||
| + | ====== Applying a Function to Groups of Rows ====== | ||
| + | < | ||
| + | Here dfrm is the data frame, fact is the grouping factor, and fun is a function that expects one argument, a data frame. | ||
| + | < | ||
| + | sel <- Cars93[c(" | ||
| + | |||
| + | > by(sel, sel$Orig, summary) | ||
| + | sel$Orig: USA | ||
| + | | ||
| + | | ||
| + | | ||
| + | Dodge : 6 Median : | ||
| + | Pontiac | ||
| + | Buick : 4 3rd Qu.: | ||
| + | Oldsmobile: 4 Max. : | ||
| + | (Other) | ||
| + | ------------------------------------------------------------------ | ||
| + | sel$Orig: non-USA | ||
| + | | ||
| + | | ||
| + | | ||
| + | Nissan | ||
| + | Toyota | ||
| + | Volkswagen: 4 3rd Qu.: | ||
| + | Honda : 3 Max. : | ||
| + | (Other) | ||
| + | </ | ||
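As a sketch of `by` with a custom function, the example below uses a small hypothetical data frame (`df`, `group`, `x`, and `y` are assumptions). The function receives the rows of one group as a data frame, which is what distinguishes `by` from `tapply`.

```r
# Hypothetical data frame with a grouping factor and two numeric columns.
df <- data.frame(
  group = factor(c("a", "a", "b", "b")),
  x     = c(1, 2, 3, 5),
  y     = c(2, 4, 6, 11)
)

# The function gets a data frame (the rows of one group) and
# returns the column means of x and y for that group.
res <- by(df, df$group, function(d) colMeans(d[c("x", "y")]))

print(res)

# Results for a single group can be pulled out by level name.
print(res[["a"]])
```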
| + | |||
| + | < | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | </ | ||
r/data_transformations.1476227005.txt.gz · Last modified: by hkimscil
