--- r:data_transformations [2016/10/12 00:10] – hkimscil
+++ r:data_transformations [2019/09/19 09:23] (current) – [Splitting a Vector into Groups] hkimscil
@@ Line 6: / Line 6: @@
 Warning message:
 package ‘MASS’ was built under R version 3.2.5
-> split(Cars93$MPG.city, Cars93$Origin)
+> split(Cars93$MPG.city, Cars93$Origin) # split MPG.city by Origin
 $USA
  [1] 22 19 16 19 16 16 25 25 19 21 18 15
@@ Line 39: / Line 39: @@
 [1] 23.86667
 >
+# or
+> sapply(g, mean)
+     USA  non-USA
+20.95833 23.86667
+# or retain list format
+> lapply(g, mean)
+$USA
+[1] 20.95833
+$`non-USA`
+[1] 23.86667
 </code>
 ====== Applying a Function to Each List Element ======
@@ Line 180: / Line 193: @@
 <code>suburbs <- read.csv("suburbs.csv", header=TRUE, sep="\t")
+suburbs <- read.csv("http://commres.net/wiki/_export/code/r/data_transformations?codeblock=15", header=TRUE, sep="\t")
 </code>
@@ Line 203: / Line 217: @@
 <code>> tapply(pop, county, sum)
-    Cook   DuPage     Kane  Kendall  Kenosha Lake(IL) Lake(IN)
+    Cook   DuPage     Kane  Kendall  Kenosha Lake(IL) Lake(IN)     Will
- 3281966   147779   266269   106221    90352    91452   185794
+ 3281966   147779   266269   106221    90352    91452   185794    70834
-    Will
+> tapply(pop,county,mean)
+    Cook   DuPage     Kane  Kendall  Kenosha Lake(IL) Lake(IN)     Will
+468852.3 147779.0 133134.5 106221.0  90352.0  91452.0  92897.0  70834.0
 </code>
+The function given to tapply should expect a single argument: a vector containing all the members of one group. A good example is the length function, which takes a vector and returns its length. Use it to count the number of items in each group; in this case, the number of cities in each county:
+<code>> tapply(pop,county,length)
+    Cook   DuPage     Kane  Kendall  Kenosha Lake(IL) Lake(IN)     Will
+       7        1        2        1        1        1        2        1
+</code>
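The point that tapply hands each group to the function as one vector can be sketched with a user-defined function. This is only an illustration: the vectors `pop_demo` and `county_demo` and the helper `spread` are hypothetical and are not part of the suburbs data above.

```r
# Hypothetical data: populations of five made-up cities in two counties
pop_demo    <- c(100, 250, 400, 80, 120)
county_demo <- factor(c("A", "A", "A", "B", "B"))

# A user-defined function that takes one vector (the members of one group)
spread <- function(x) max(x) - min(x)

counts  <- tapply(pop_demo, county_demo, length)  # number of cities per county
spreads <- tapply(pop_demo, county_demo, spread)  # largest minus smallest per county

print(counts)
print(spreads)
```

Any function that accepts a single vector can be used in the same way, for example median or sd.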
+====== Applying a Function to Groups of Rows ======
+<code>> by(dfrm, fact, fun)</code>
+dfrm = the data frame,
+fact = grouping factor,
+fun = function. The function should expect one argument, a data frame.
+<code>library("MASS")
+sel <- Cars93[c("Origin", "Manufacturer", "MPG.city", "MPG.highway", "EngineSize")]
+> by(sel, sel$Orig, summary)
+sel$Orig: USA
+     Origin       Manufacturer    MPG.city      MPG.highway      EngineSize
+ USA    :48   Chevrolet : 8    Min.   :15.00   Min.   :20.00   Min.   :1.300
+ non-USA: 0   Ford      : 8    1st Qu.:18.00   1st Qu.:26.00   1st Qu.:2.200
+              Dodge     : 6    Median :20.00   Median :28.00   Median :3.000
+              Pontiac   : 5    Mean   :20.96   Mean   :28.15   Mean   :3.067
+              Buick     : 4    3rd Qu.:23.00   3rd Qu.:30.00   3rd Qu.:3.800
+              Oldsmobile: 4    Max.   :31.00   Max.   :41.00   Max.   :5.700
+              (Other)   :13
+------------------------------------------------------------------
+sel$Orig: non-USA
+     Origin       Manufacturer    MPG.city      MPG.highway      EngineSize
+ USA    : 0   Mazda     : 5    Min.   :17.00   Min.   :21.00   Min.   :1.000
+ non-USA:45   Hyundai   : 4    1st Qu.:19.00   1st Qu.:25.00   1st Qu.:1.600
+              Nissan    : 4    Median :22.00   Median :30.00   Median :2.200
+              Toyota    : 4    Mean   :23.87   Mean   :30.09   Mean   :2.242
+              Volkswagen: 4    3rd Qu.:26.00   3rd Qu.:33.00   3rd Qu.:2.800
+              Honda     : 3    Max.   :46.00   Max.   :50.00   Max.   :4.500
+              (Other)   :21
+</code>
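Because by passes the rows of each group as a data frame, the function can use several columns at once. The sketch below assumes a small made-up data frame (`cars_demo`, with hypothetical columns `origin`, `mpg`, and `engine`) rather than Cars93, so that it runs without any extra package.

```r
# Hypothetical data frame standing in for a table such as Cars93
cars_demo <- data.frame(
  origin = factor(c("USA", "USA", "USA", "non-USA", "non-USA", "non-USA")),
  mpg    = c(20, 22, 18, 30, 26, 28),
  engine = c(3.0, 2.5, 3.5, 1.5, 2.0, 1.8)
)

# Each call receives the rows of one group as a data frame
mean_mpg <- function(df) mean(df$mpg)
res <- by(cars_demo, cars_demo$origin, mean_mpg)
print(res)

# A function that combines two columns of the group's rows
mpg_engine_cor <- function(df) cor(df$mpg, df$engine)
cors <- by(cars_demo, cars_demo$origin, mpg_engine_cor)
print(cors)
```

With tapply the function would see only a single vector per group, so a calculation that needs two columns, such as the correlation here, is a natural fit for by.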
+<code>tapply(suburbs$pop, suburbs$state, summary)
+</code>
+<code>by(suburbs$pop, suburbs$state, summary)
+</code>