r:data_transformations
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
r:data_transformations [2016/10/12 07:33] – created hkimscil | r:data_transformations [2019/09/19 18:23] (current) – [Splitting a Vector into Groups] hkimscil | ||
---|---|---|---|
Line 6: | Line 6: | ||
Warning message: | Warning message: | ||
package ‘MASS’ was built under R version 3.2.5 | package ‘MASS’ was built under R version 3.2.5 | ||
- | > split(Cars93$MPG.city, | + | > split(Cars93$MPG.city, |
$USA | $USA | ||
[1] 22 19 16 19 16 16 25 25 19 21 18 15 | [1] 22 19 16 19 16 16 25 25 19 21 18 15 | ||
Line 39: | Line 39: | ||
[1] 23.86667 | [1] 23.86667 | ||
> | > | ||
+ | # or | ||
+ | > sapply(g, mean) | ||
+ |      USA  non-USA | ||
+ | 20.95833 23.86667 | ||
+ | # or retain list format | ||
+ | > lapply(g, mean) | ||
+ | $USA | ||
+ | [1] 20.95833 | ||
+ | |||
+ | $`non-USA` | ||
+ | [1] 23.86667 | ||
+ | |||
+ | |||
</ | </ | ||
+ | ====== Applying a Function to Each List Element ====== | ||
+ | < | ||
+ | S1 <- c(89, 85, 85, 86, 88, 89, 86, 82, 96, 85, 93, 91, | ||
+ | 98, 87, 94, 77, 87, 98, 85, 89, 95, 85, 93, 93, | ||
+ | 97, 71, 97, 93, 75, 68, 98, 95, 79, 94, 98, 95) | ||
+ | S2 <- c(60, 98, 94, 95, 99, 97, 100, 73, 93, 91, 98, | ||
+ | 86, 66, 83, 77, 97, 91, 93, 71, 91, 95, 100, | ||
+ | 72, 96, 91, 76, 100, 97, 99, 95, 97, 77, 94, | ||
+ | 99, 88, 100, 94, 93, 86) | ||
+ | S3 <- c(95, 86, 90, 90, 75, 83, 96, 85, 83, 84, 81, 98, | ||
+ | 77, 94, 84, 89, 93, 99, 91, 77, 95, 90, 91, 87, | ||
+ | 85, 76, 99, 99, 97, 97, 97, 77, 93, 96, 90, 87, | ||
+ | 97, 88) | ||
+ | S4 <- c(67, 93, 63, 83, 87, 97, 96, 92, 93, 96, 87, 90, | ||
+ | 94, 90, 82, 91, 85, 93, 83, 90, 87, 99, 94, 88, | ||
+ | 90, 72, 81, 93, 93, 94, 97, 89, 96, 95, 82, 97) | ||
+ | |||
+ | scores <- list(S1=S1, S2=S2, S3=S3, S4=S4) | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | $S1 | ||
+ | [1] 89 85 85 86 88 89 86 82 96 85 93 91 98 87 94 77 87 98 85 89 | ||
+ | [21] 95 85 93 93 97 71 97 93 75 68 98 95 79 94 98 95 | ||
+ | |||
+ | $S2 | ||
+ |  [1]  60  98  94  95  99  97 100  73  93  91  98  86  66  83  77 | ||
+ | [16] 97 91 93 71 91 95 100 72 96 91 76 100 97 99 95 | ||
+ | [31] 97 77 94 99 88 100 94 93 86 | ||
+ | |||
+ | $S3 | ||
+ | [1] 95 86 90 90 75 83 96 85 83 84 81 98 77 94 84 89 93 99 91 77 | ||
+ | [21] 95 90 91 87 85 76 99 99 97 97 97 77 93 96 90 87 97 88 | ||
+ | |||
+ | $S4 | ||
+ | [1] 67 93 63 83 87 97 96 92 93 96 87 90 94 90 82 91 85 93 83 90 | ||
+ | [21] 87 99 94 88 90 72 81 93 93 94 97 89 96 95 82 97 | ||
+ | </ | ||
+ | |||
+ | **lapply(list_name, | ||
+ | < | ||
+ | > lapply(scores, length) | ||
+ | $S1 | ||
+ | [1] 36 | ||
+ | |||
+ | $S2 | ||
+ | [1] 39 | ||
+ | |||
+ | $S3 | ||
+ | [1] 38 | ||
+ | |||
+ | $S4 | ||
+ | [1] 36 | ||
+ | </ | ||
+ | |||
+ | **sapply(list_name, | ||
+ | < | ||
+ | > sapply(scores, length) | ||
+ | S1 S2 S3 S4 | ||
+ | 36 39 38 36 | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | > sapply(scores, mean) | ||
+ |       S1       S2       S3       S4 | ||
+ | 88.77778 89.79487 89.23684 88.86111 | ||
+ | > sapply(scores, | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | If the called function returns a vector, sapply will form the results into **a matrix**. The range function, for example, returns a two-element vector: | ||
+ | < | ||
+ | > sapply(scores, range) | ||
+ |      S1  S2 S3 S4 | ||
+ | [1,] 68 60 75 63 | ||
+ | [2,] 98 100 99 99 | ||
+ | </ | ||
+ | |||
+ | If the called function returns a structured object, such as a list, then you will need to use lapply rather than sapply. Structured objects cannot be put into a vector. Suppose we want to perform a t test on every semester. The t.test function returns a list, so we must use lapply: | ||
+ | < | ||
+ | </ | ||
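A minimal sketch of the t test step described above. The score vectors here are small made-up samples, and the names `demo_scores`, `tests`, and `ci` are illustrative assumptions, not the data or code from this page. `t.test` returns an object of class `htest`, which is a list, so `lapply` keeps each result intact; `sapply` can then pull a single numeric component out of every test.

```r
# Hypothetical scores for three semesters (illustrative values only)
demo_scores <- list(
  S1 = c(89, 85, 86, 88, 82, 96),
  S2 = c(60, 98, 94, 95, 99, 73),
  S3 = c(95, 86, 90, 75, 83, 96)
)

# One-sample t test per list element; each result is an "htest" list
tests <- lapply(demo_scores, t.test)

# conf.int is a two-element vector in every result,
# so sapply simplifies the extracted intervals into a 2 x 3 matrix
ci <- sapply(tests, function(t) t$conf.int)
print(ci)
```

Keeping the full test objects in `tests` means other components, such as `p.value` or `estimate`, can be extracted later in the same way.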
+ | |||
+ | |||
+ | ====== Applying a Function to Every Row ====== | ||
+ | |||
+ | < | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | > long <- matrix(longdata, | ||
+ | > colnames(long) <- c(" | ||
+ | > rownames(long) <- c(" | ||
+ | |||
+ | > long | ||
+ | | ||
+ | Moe | ||
+ | Larry -1.406571 -0.2804896 -0.1580926 -0.5407272 -0.2757667 | ||
+ | Curly -1.010482 | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | | ||
+ | -1.6529530 | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | Moe Larry Curly | ||
+ | [1,] -3.7170704 -0.1580926 -1.7086779 | ||
+ | [2,] -0.2804896 | ||
+ | </ | ||
+ | |||
+ | ====== Applying a Function to Every Column ====== | ||
+ | < | ||
+ | </ | ||
+ | The second argument of apply selects the direction: | ||
+ | 1 -> row by row | ||
+ | 2 -> column by column | ||
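The row and column cases can be compared side by side. This is a sketch on a made-up matrix: the values and the column names `t1` to `t4` are assumptions for illustration, and only the row names Moe, Larry, and Curly come from the example above.

```r
# Hypothetical 3 x 4 matrix (illustrative values only), filled column by column
m <- matrix(1:12, nrow = 3, ncol = 4,
            dimnames = list(c("Moe", "Larry", "Curly"),
                            c("t1", "t2", "t3", "t4")))

row_means <- apply(m, 1, mean)  # margin 1: one result per row
col_means <- apply(m, 2, mean)  # margin 2: one result per column

print(row_means)
print(col_means)
```

The result has one element per row when the margin is 1 and one element per column when the margin is 2, and it carries over the corresponding row or column names.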
+ | |||
+ | ====== Applying a Function to Groups of Data ====== | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | <code csv suburbs.csv> | ||
+ | city county state pop | ||
+ | Chicago Cook IL 2853114 | ||
+ | Kenosha Kenosha WI 90352 | ||
+ | Aurora Kane IL 171782 | ||
+ | Elgin Kane IL 94487 | ||
+ | Gary Lake(IN) IN 102746 | ||
+ | Joliet Kendall IL 106221 | ||
+ | Naperville DuPage IL 147779 | ||
+ | Arlington Heights Cook IL 76031 | ||
+ | Bolingbrook Will IL 70834 | ||
+ | Cicero Cook IL 72616 | ||
+ | Evanston Cook IL 74239 | ||
+ | Hammond Lake(IN) IN 83048 | ||
+ | Palatine Cook IL 67232 | ||
+ | Schaumburg Cook IL 75386 | ||
+ | Skokie Cook IL 63348 | ||
+ | Waukegan Lake(IL) IL 91452 | ||
+ | |||
+ | </ | ||
+ | |||
+ | < | ||
+ | suburbs <- read.csv(" | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | > attach(suburbs) | ||
+ | > pop | ||
+ |  [1] 2853114   90352  171782   94487  102746  106221  147779   76031   70834 | ||
+ | [10]   72616   74239   83048   67232   75386   63348   91452 | ||
+ | # We can easily compute sums and averages for all the cities: | ||
+ | > sum(pop) | ||
+ | [1] 4240667 | ||
+ | > mean(pop) | ||
+ | [1] 265041.7 | ||
+ | </ | ||
+ | |||
+ | The county column is a factor with 8 levels: | ||
+ | < | ||
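+ | > county | ||
+ |  [1] Cook     Kenosha  Kane     Kane     Lake(IN) Kendall | ||
+ |  [7] DuPage   Cook     Will     Cook     Cook     Lake(IN) | ||
+ | [13] Cook     Cook     Cook     Lake(IL) | ||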
+ | 8 Levels: Cook DuPage Kane Kendall Kenosha Lake(IL) ... Will | ||
+ | </ | ||
+ | |||
+ | < | ||
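+ | > tapply(pop, county, sum) | ||
+ |     Cook   DuPage     Kane  Kendall  Kenosha Lake(IL) Lake(IN)     Will | ||
+ |  3281966   147779   266269   106221    90352    91452   185794    70834 | ||
+ | |||
+ | > tapply(pop, county, mean) | ||
+ |     Cook   DuPage     Kane  Kendall  Kenosha Lake(IL) Lake(IN)     Will | ||
+ | 468852.3 147779.0 133134.5 106221.0  90352.0  91452.0  92897.0  70834.0 | ||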
+ | </ | ||
+ | |||
+ | The function given to tapply should expect a single argument: a vector containing all the members of one group. A good example is the length function, which takes a vector and returns its length. Use it to count the number of observations in each group; in this case, the number of cities in each county: | ||
+ | < | ||
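+ | > tapply(pop, county, length) | ||
+ |     Cook   DuPage     Kane  Kendall  Kenosha Lake(IL) Lake(IN)     Will | ||
+ |        7        1        2        1        1        1        2        1 | ||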
+ | </ | ||
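Because tapply passes each group to the function as a plain vector, an anonymous function works as well as a named one. The sketch below uses five rows copied from suburbs.csv above; the variable names `totals` and `top_share` and the "largest city share" calculation are illustrative assumptions.

```r
# Five rows taken from suburbs.csv: Chicago, Aurora, Elgin,
# Arlington Heights, Bolingbrook
pop    <- c(2853114, 171782, 94487, 76031, 70834)
county <- factor(c("Cook", "Kane", "Kane", "Cook", "Will"))

# Named function: total population per county
totals <- tapply(pop, county, sum)

# Anonymous function: share of each county's total held by its largest city
top_share <- tapply(pop, county, function(x) max(x) / sum(x))

print(totals)
print(top_share)
```

A county with a single city, such as Will in this subset, gets a share of exactly 1.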
+ | |||
+ | ====== Applying a Function to Groups of Rows ====== | ||
+ | < | ||
+ | dfrm = the data frame, | ||
+ | fact = grouping factor, | ||
+ | fun = function. The function should expect one argument, a data frame. | ||
+ | < | ||
+ | sel <- Cars93[c(" | ||
+ | |||
+ | > by(sel, sel$Orig, summary) | ||
+ | sel$Orig: USA | ||
+ | | ||
+ | | ||
+ | | ||
+ | Dodge : 6 Median : | ||
+ | Pontiac | ||
+ | Buick : 4 3rd Qu.: | ||
+ | Oldsmobile: 4 Max. : | ||
+ | (Other) | ||
+ | ------------------------------------------------------------------ | ||
+ | sel$Orig: non-USA | ||
+ | | ||
+ | | ||
+ | | ||
+ | Nissan | ||
+ | Toyota | ||
+ | Volkswagen: 4 3rd Qu.: | ||
+ | Honda : 3 Max. : | ||
+ | (Other) | ||
+ | </ | ||
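Unlike tapply, by hands the function a whole data frame for each group, so the function can combine several columns. A minimal sketch with a made-up data frame: the name `cars`, its columns, and all values are assumptions for illustration, not taken from Cars93.

```r
# Hypothetical data frame (illustrative values only)
cars <- data.frame(
  origin = factor(c("USA", "USA", "non-USA", "non-USA", "USA")),
  mpg    = c(22, 19, 25, 29, 16),
  price  = c(15.9, 20.8, 11.3, 9.2, 34.7)
)

# The function receives the rows of one group as a data frame
by_origin <- by(cars, cars$origin, function(d) {
  c(n = nrow(d), mean_mpg = mean(d$mpg), mean_price = mean(d$price))
})

print(by_origin)
```

Individual groups can be pulled out by name, for example `by_origin[["USA"]]`.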
+ | |||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | </ | ||
r/data_transformations.1476227005.txt.gz · Last modified: 2016/10/12 07:33 by hkimscil