> v <- c(10, 20, 30)
> names(v) <- c("Moe", "Larry", "Curly")
> print(v)
  Moe Larry Curly
   10    20    30
> v["Larry"]
Larry
   20
n = c(2, 3, 5)
s = c("aa", "bb", "cc", "dd", "ee")
b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
x = list(n, s, b, 3)   # x contains copies of n, s, b
> x[2]
[[1]]
[1] "aa" "bb" "cc" "dd" "ee"
> x[c(2, 4)]
[[1]]
[1] "aa" "bb" "cc" "dd" "ee"
[[2]]
[1] 3
> x[[2]]
[1] "aa" "bb" "cc" "dd" "ee"
> x[[2]][2]
[1] "bb"
x[[2]][1] <- "xx"   # instead of "aa"
x[[2]]
Object | Example | Mode |
---|---|---|
Number | 3.1415 | numeric |
Vector of numbers | c(2, 7.182, 3.1415) | numeric |
Character string | "Moe" | character |
Vector of character strings | c("Moe", "Larry", "Curly") | character |
Factor | factor(c("NY", "CA", "IL")) | numeric |
List | list("Moe", "Larry", "Curly") | list |
Data frame | data.frame(x=1:3, y=c("NY", "CA", "IL")) | list |
Function | print | function |
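To check the Mode column yourself, you can call mode() on each example from the table. This is a minimal sketch in base R; the list names (number, num_vector, and so on) are just illustrative labels.

```r
# Rebuild the table's examples and ask R for the mode of each one
examples <- list(
  number     = 3.1415,
  num_vector = c(2, 7.182, 3.1415),
  string     = "Moe",
  str_vector = c("Moe", "Larry", "Curly"),
  factor     = factor(c("NY", "CA", "IL")),
  list       = list("Moe", "Larry", "Curly"),
  data_frame = data.frame(x = 1:3, y = c("NY", "CA", "IL")),
  fun        = print
)
modes <- sapply(examples, mode)
print(modes)
# A factor reports mode "numeric" because it is stored as integer codes,
# but its class tells R to treat it as a factor
print(class(examples$factor))
```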
In R, every object also has a class, which defines its abstract type. The terminology is borrowed from object-oriented programming. A single number could represent many different things: a distance, a point in time, a weight. All those objects have a mode of “numeric” because they are stored as a number; but they could have different classes to indicate their interpretation.
For example, a Date object consists of a single number:
> d <- as.Date("2010-03-15")
> mode(d)
[1] "numeric"
> length(d)
[1] 1
But it has a class of Date, telling us how to interpret that number; namely, as the number of days since January 1, 1970:
> class(d)
[1] "Date"
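The number hidden inside a Date can be exposed with unclass(), which removes the class attribute. The sketch below (base R only) also shows that arithmetic on a Date is arithmetic on days, which is what the class tells R to assume.

```r
d <- as.Date("2010-03-15")
# Strip the class to see the number R actually stores
n <- unclass(d)
print(n)            # days since January 1, 1970
# Adding a number to a Date moves it by that many days,
# and the result keeps the Date class
later <- d + 30
print(later)
print(class(later))
```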
> A <- 1:6
> dim(A)
NULL
> print(A)
[1] 1 2 3 4 5 6
We give dimensions to the vector when we set its dim attribute. Watch what happens when we set our vector dimensions to 2 × 3 and print it:
> dim(A) <- c(2,3)
> print(A)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
A matrix can be created from a list, too. Like a vector, a list has a dim attribute, which is initially NULL:
> B <- list(1,2,3,4,5,6)
> dim(B)
NULL
If we set the dim attribute, it gives the list a shape:
> dim(B) <- c(2,3)
> print(B)
     [,1] [,2] [,3]
[1,] 1    3    5
[2,] 2    4    6
The discussion of matrices can be generalized to 3-dimensional or even n-dimensional structures: just assign more dimensions to the underlying vector (or list). The following example creates a 3-dimensional array with dimensions 2 × 3 × 2:
> D <- 1:12
> dim(D) <- c(2,3,2)
> print(D)
, , 1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
, , 2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
A factor looks like a vector, but it has special properties. R keeps track of the unique values in a vector, and each unique value is called a level of the associated factor. R uses a compact representation for factors, which makes them efficient for storage in data frames. In other programming languages, a factor would be represented by a vector of enumerated values.
There are two key uses for factors:
Categorical variables: A factor can represent a categorical variable. Categorical variables are used in contingency tables, linear regression, analysis of variance (ANOVA), logistic regression, and many other areas.
Grouping: This is a technique for labeling or tagging your data items according to their group. See the Introduction to Chapter 6.
> A <- c(1,2,2,3,3,4,4,4,4,2,1,2,3,3)
> A
 [1] 1 2 2 3 3 4 4 4 4 2 1 2 3 3
> str(A)
 num [1:14] 1 2 2 3 3 4 4 4 4 2 ...
> fA <- factor(A)
> fA
 [1] 1 2 2 3 3 4 4 4 4 2 1 2 3 3
Levels: 1 2 3 4
> str(fA)
 Factor w/ 4 levels "1","2","3","4": 1 2 2 3 3 4 4 4 4 2 ...
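The str() output above hints at how a factor is stored: a set of levels plus an integer code per element. The sketch below (base R only) looks at those pieces directly. The second factor g is a made-up example chosen so that the codes differ from the original values, which is a common source of surprises.

```r
A <- c(1,2,2,3,3,4,4,4,4,2,1,2,3,3)
fA <- factor(A)
print(levels(fA))     # the distinct values, stored as character strings
print(nlevels(fA))    # how many levels there are
print(table(fA))      # how often each level occurs

# Hypothetical factor whose codes are NOT the same as its values
g <- factor(c(10, 20, 20, 30))
print(as.integer(g))                 # the internal codes: 1 2 2 3
print(as.numeric(as.character(g)))   # recovers the original values 10 20 20 30
```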
A data frame is a powerful and flexible structure. Most serious R applications involve data frames. A data frame is intended to mimic a dataset, such as one you might encounter in SAS or SPSS.
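A data frame is a tabular (rectangular) data structure, which means that it has rows and columns. It is not implemented as a matrix, however. Rather, a data frame is a list whose elements are vectors and/or factors: each element is one column, and all columns have the same length.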
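Because a data frame is both a list and a rectangular structure, R provides two different paradigms for accessing its contents: list-style operators such as dfrm[["name"]] and dfrm$name, and matrix-style subscripts such as dfrm[i, j].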
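A short sketch of both paradigms on one small data frame. The data frame is made up for illustration (its populations are borrowed from the suburbs example used later on); only base R is needed.

```r
# Hypothetical three-row data frame
dfrm <- data.frame(city = c("Chicago", "Aurora", "Elgin"),
                   pop  = c(2853114, 171782, 94487))
print(is.list(dfrm))    # TRUE: a data frame is a list of columns
print(length(dfrm))     # number of list elements = number of columns

# List-style access: pull out a whole column
print(dfrm[["pop"]])
print(dfrm$city)

# Matrix-style access: [row, column]
print(dfrm[2, "pop"])
print(dfrm[dfrm$pop > 100000, ])   # all rows with more than 100,000 people
```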
> v <- c(1,2,3)
> v <- c(v,4)   # Append a single value to v
> v
[1] 1 2 3 4
> w <- c(5,6,7,8)
> v <- c(v,w)   # Append an entire vector to v
> v
[1] 1 2 3 4 5 6 7 8
> v <- c(1,2,3)   # Create a vector of three elements
> v[10] <- 10     # Assign to the 10th element
> v               # R extends the vector automatically
 [1]  1  2  3 NA NA NA NA NA NA 10
> append(1:10, 99)
 [1]  1  2  3  4  5  6  7  8  9 10 99
> append(1:10, 99, after=5)
 [1]  1  2  3  4  5 99  6  7  8  9 10
> append(1:10, 99, after=0)
 [1] 99  1  2  3  4  5  6  7  8  9 10
> (1:6) + (1:3)
[1] 2 4 6 5 7 9
1:6 | 1:3 (recycled) | (1:6) + (1:3) |
---|---|---|
1 | 1 | 2 |
2 | 2 | 4 |
3 | 3 | 6 |
4 | 1 | 5 |
5 | 2 | 7 |
6 | 3 | 9 |
> cbind(1:6)
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5
[6,]    6
> cbind(1:3)
     [,1]
[1,]    1
[2,]    2
[3,]    3
> cbind(1:6, 1:3)
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    1
[5,]    5    2
[6,]    6    3
> (1:6) + (1:5)   # Oops! 1:5 is one element too short
[1]  2  4  6  8 10  7
Warning message:
In (1:6) + (1:5) :
  longer object length is not a multiple of shorter object length
> (1:6) + 10
[1] 11 12 13 14 15 16
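The recycling rule can be made explicit with rep_len(), which repeats a vector until it reaches a given length. This sketch (base R only) shows that adding the recycled vector gives exactly what R computed implicitly in (1:6) + (1:3).

```r
x <- 1:6
y <- 1:3
# Stretch y to the length of x, as R does behind the scenes
y_long <- rep_len(y, length(x))
print(y_long)          # 1 2 3 1 2 3
print(x + y_long)
# Same result as the implicit recycling
print(identical(x + y, x + y_long))
```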
> f <- factor(v) # v is a vector of strings or integers
> f <- factor(v, levels)
> f <- factor(c("Win","Win","Lose","Tie","Win","Lose"))
> f
[1] Win  Win  Lose Tie  Win  Lose
Levels: Lose Tie Win
Run the following line first, to create the wday vector that the textbook code below expects:
> wday <- c("Wed", "Thu", "Mon", "Wed", "Thu", "Thu", "Thu", "Tue", "Thu", "Tue")
> f <- factor(wday)
> f
 [1] Wed Thu Mon Wed Thu Thu Thu Tue Thu Tue
Levels: Mon Thu Tue Wed
> f <- factor(wday, c("Mon","Tue","Wed","Thu","Fri"))   # the c(...) argument gives the levels, not the data
> f   # Fri is listed as a level even though no data value is Fri
 [1] Wed Thu Mon Wed Thu Thu Thu Tue Thu Tue
Levels: Mon Tue Wed Thu Fri
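Explicit levels matter once you tabulate: table() follows the level order and reports unused levels with a count of zero. This base-R sketch uses the same wday vector; droplevels() is shown as one way to discard levels that never occur.

```r
wday <- c("Wed", "Thu", "Mon", "Wed", "Thu", "Thu", "Thu", "Tue", "Thu", "Tue")
f <- factor(wday, levels = c("Mon", "Tue", "Wed", "Thu", "Fri"))
counts <- table(f)
print(counts)           # days appear in level order; Fri has a count of 0
f2 <- droplevels(f)     # remove levels that have no data
print(levels(f2))
```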
> comb <- stack(list(v1=v1, v2=v2, v3=v3)) # Combine 3 vectors
Why in the world would you want to mash all your data into one big vector and a parallel factor? The reason is that many important statistical functions require the data in that format. Suppose you survey freshmen, sophomores, and juniors regarding their confidence level (“What percentage of the time do you feel confident in school?”). Now you have three vectors, called freshmen, sophomores, and juniors. You want to perform an ANOVA analysis of the differences between the groups. The ANOVA function, aov, requires one vector with the survey results as well as a parallel factor that identifies the group. You can combine the groups using the stack function:
Row | freshmen | sophomores | juniors |
---|---|---|---|
1 | .60 | .70 | .76 |
2 | .35 | .61 | .72 |
3 | .44 | .63 | .92 |
4 | .62 | .87 | .87 |
5 | .60 | .85 | |
6 | | .70 | |
7 | | .64 | |
freshmen <- c(0.6, 0.35, 0.44, 0.62, 0.6)
sophomores <- c(0.7, 0.61, 0.63, 0.87, 0.85, 0.7, 0.64)
juniors <- c(.76, .72, .92, .87)
> comb <- stack(list(fresh=freshmen, soph=sophomores, jrs=juniors))
> print(comb)
   values   ind
1    0.60 fresh
2    0.35 fresh
3    0.44 fresh
4    0.62 fresh
5    0.60 fresh
6    0.70  soph
7    0.61  soph
8    0.63  soph
9    0.87  soph
10   0.85  soph
11   0.70  soph
12   0.64  soph
13   0.76   jrs
14   0.72   jrs
15   0.92   jrs
16   0.87   jrs
Now you can perform the ANOVA analysis on the two columns:
> aov(values ~ ind, data=comb)
When building the list we must provide tags for the list elements (the tags are fresh, soph, and jrs in this example). Those tags are required because stack uses them as the levels of the parallel factor.
Don't like the default column names (values and ind)? Rename them:
colnames(comb) <- c("score", "year")
aov(score ~ year, data=comb)
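Printing an aov object shows only the sums of squares; summary() gives the usual ANOVA table. A minimal end-to-end sketch, using the survey values from the table above and base R only. With 3 groups and 16 observations, the table should show 2 between-group and 13 residual degrees of freedom.

```r
freshmen   <- c(0.60, 0.35, 0.44, 0.62, 0.60)
sophomores <- c(0.70, 0.61, 0.63, 0.87, 0.85, 0.70, 0.64)
juniors    <- c(0.76, 0.72, 0.92, 0.87)

# One value column plus a parallel factor naming the group
comb <- stack(list(fresh = freshmen, soph = sophomores, jrs = juniors))
colnames(comb) <- c("score", "year")
str(comb)

fit <- aov(score ~ year, data = comb)
tab <- summary(fit)[[1]]   # the ANOVA table as a data frame
print(tab)
print(tab$Df)              # degrees of freedom: 2 for year, 13 for residuals
```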
> lst <- list(0.5, 0.841, 0.977)
> lst
[[1]]
[1] 0.5
[[2]]
[1] 0.841
[[3]]
[1] 0.977
When R prints the list, it identifies each list element by its position (1, 2, 3) and prints the element’s value (e.g., [1] 0.5) under its position. More usefully, lists can — unlike vectors — contain elements of different modes (types). Here is an extreme example of a mongrel created from a scalar, a character string, a vector, and a function:
> lst <- list(3.14, "Moe", c(1,1,2,3), mean)
> lst
[[1]]
[1] 3.14
[[2]]
[1] "Moe"
[[3]]
[1] 1 1 2 3
[[4]]
function (x, ...)
UseMethod("mean")
<environment: namespace:base>
You can also build a list by creating an empty list and populating it. Here is our “mongrel” example built in that way:
> lst <- list()
> lst[[1]] <- 3.14
> lst[[2]] <- "Moe"
> lst[[3]] <- c(1,1,2,3)
> lst[[4]] <- mean
> lst <- list(mid=0.5, right=0.841, far.right=0.977)
> lst
$mid
[1] 0.5
$right
[1] 0.841
$far.right
[1] 0.977
> years <- list(1960, 1964, 1976, 1994)
> years
[[1]]
[1] 1960
[[2]]
[1] 1964
[[3]]
[1] 1976
[[4]]
[1] 1994
> years[[1]]
[1] 1960
lst[[n]]
This is an element, not a list. It is the nth element of lst.
lst[n]
This is a list, not an element. The list contains one element, taken from the nth element of lst. This is a special case of lst[c(n1, n2, …, nk)] in which we eliminated the c(…) construct because there is only one n.
> class(years[[1]])
[1] "numeric"
> class(years[1])
[1] "list"
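The difference between [[ ]] and [ ] can be checked directly: taking the only element out of years[1] gives back exactly what years[[1]] returns. A base-R sketch:

```r
years <- list(1960, 1964, 1976, 1994)
one_elem <- years[[1]]   # the element itself: a numeric vector
one_list <- years[1]     # a list of length 1 that holds that element
print(length(one_list))
print(identical(one_list[[1]], one_elem))
# [[ ]] extracts exactly one element, while [ ] can select several
print(years[c(1, 3)])
```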
To select list elements by name, use one of these forms. Here, lst is a list variable:
lst[["name"]]
Selects the element called name. Returns NULL if no element has that name.
lst$name
Same as previous, just different syntax.
lst[c(name1, name2, ..., namek)]
Returns a list built from the indicated elements of lst.
Note that the first two forms return an element whereas the third form returns a list.
> years <- list(Kennedy=1960, Johnson=1964, Carter=1976, Clinton=1994)
The following code has the same effect as the line above:
years <- list(1960, 1964, 1976, 1994)
names(years) <- c("Kennedy", "Johnson", "Carter", "Clinton")
These next two expressions return the same value—namely, the element that is named “Kennedy”:
> years[["Kennedy"]]
[1] 1960
> years$Kennedy
[1] 1960
The following two expressions return sublists extracted from years:
> years[c("Kennedy","Johnson")]
$Kennedy
[1] 1960
$Johnson
[1] 1964
> years["Carter"]
$Carter
[1] 1976
> years <- list(Kennedy=1960, Johnson=1964, Carter=1976, Clinton=1994)
> years
$Kennedy
[1] 1960
$Johnson
[1] 1964
$Carter
[1] 1976
$Clinton
[1] 1994
> years[["Johnson"]] <- NULL   # Remove the element labeled "Johnson"
> years
$Kennedy
[1] 1960
$Carter
[1] 1976
$Clinton
[1] 1994
You can remove multiple elements this way, too:
> years[c("Carter","Clinton")] <- NULL   # Remove two elements
> years
$Kennedy
[1] 1960
> lst[sapply(lst, is.null)] <- NULL
> lst <- list("Moe", NULL, "Curly")   # Create list with NULL element
> lst
[[1]]
[1] "Moe"
[[2]]
NULL
[[3]]
[1] "Curly"
> lst[sapply(lst, is.null)] <- NULL   # Remove NULL element from list
> lst
[[1]]
[1] "Moe"
[[2]]
[1] "Curly"
> lst[lst < 0] <- NULL       # Remove negative elements
> lst[lst == 0] <- NULL      # Remove zero elements
> lst[is.na(lst)] <- NULL    # Remove NA elements
> theData <- c(1.1, 1.2, 2.1, 2.2, 3.1, 3.2)
> mat <- matrix(theData, 2, 3)
> mat
     [,1] [,2] [,3]
[1,]  1.1  2.1  3.1
[2,]  1.2  2.2  3.2
matrix(data, nrow, ncol)
If data is a single value, R recycles it to fill every cell of the matrix:
> matrix(0, 2, 3)    # Create an all-zeros matrix
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
> matrix(NA, 2, 3)   # Create a matrix populated with NA
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
The same technique, with the data written inline:
> mat <- matrix(c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3), 2, 3)
With byrow=TRUE, the data can be written row by row, which is easier to read:
> theData <- c(1.1, 1.2, 1.3,
+              2.1, 2.2, 2.3)
> mat <- matrix(theData, 2, 3, byrow=TRUE)
A condensed version:
> mat <- matrix(c(1.1, 1.2, 1.3,
+                 2.1, 2.2, 2.3),
+               2, 3, byrow=TRUE)
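Setting the dim attribute directly gives the same result as matrix() without byrow=TRUE, because R fills a matrix column by column: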
> v <- c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3)
> dim(v) <- c(2,3)
> v
     [,1] [,2] [,3]
[1,]  1.1  1.3  2.2
[2,]  1.2  2.1  2.3
Operation | Meaning |
---|---|
t(A) | Matrix transposition of A |
solve(A) | Matrix inverse of A |
A %*% B | Matrix multiplication of A and B |
diag(n) | An n-by-n diagonal (identity) matrix |
> mat <- matrix(c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3), 2, 3, byrow=TRUE)
> mat
     [,1] [,2] [,3]
[1,]  1.1  1.2  1.3
[2,]  2.1  2.2  2.3
> mat %*% t(mat)
     [,1]  [,2]
[1,] 4.34  7.94
[2,] 7.94 14.54
> t(mat) %*% mat
     [,1] [,2] [,3]
[1,] 5.62 5.94 6.26
[2,] 5.94 6.28 6.62
[3,] 6.26 6.62 6.98
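The example above uses t() and %*%; solve() and diag() from the same table can be checked together, since a matrix multiplied by its inverse should give the identity. A base-R sketch with a small, made-up invertible matrix A:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # hypothetical invertible 2 x 2 matrix
A_inv <- solve(A)                      # matrix inverse of A
print(A_inv)
I2 <- diag(2)                          # 2-by-2 identity matrix
# A times its inverse equals the identity, up to floating-point rounding
print(all.equal(A %*% A_inv, I2))
# solve(A, b) solves the linear system A x = b without forming the inverse
b <- c(1, 2)
x <- solve(A, b)
print(x)
```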
> rownames(mat) <- c("rowname1", "rowname2", ..., "rownamem")
> colnames(mat) <- c("colname1", "colname2", ..., "colnamen")
> vec <- mat[1,]   # First row
> vec <- mat[,3]   # Third column
Normally, when you select one row or column from a matrix, R strips off the dimensions. The result is a dimensionless vector:
> mat[1,]
[1]  1  4  7 10
> mat[,3]
[1] 7 8 9
When you include the drop=FALSE argument, however, R retains the dimensions. In that case, selecting a row returns a row vector (a 1 × n matrix):
> mat[1,,drop=FALSE]
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
Likewise, selecting a column with drop=FALSE returns a column vector (an n × 1 matrix):
> mat[,3,drop=FALSE]
     [,1]
[1,]    7
[2,]    8
[3,]    9
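The effect of drop=FALSE is easiest to see by asking for dim() of each result. The matrix below is an assumption: matrix(1:12, 3, 4) is one matrix consistent with the outputs shown above, since the original does not show how mat was built.

```r
# Assumed matrix, consistent with mat[1,] = 1 4 7 10 and mat[,3] = 7 8 9
mat <- matrix(1:12, nrow = 3, ncol = 4)
row_vec <- mat[1, ]                 # dimensions dropped: a plain vector
row_mat <- mat[1, , drop = FALSE]   # kept as a 1 x 4 matrix
col_mat <- mat[, 3, drop = FALSE]   # kept as a 3 x 1 matrix
print(dim(row_vec))                 # NULL
print(dim(row_mat))
print(dim(col_mat))
```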
Combining vectors
> dfrm <- data.frame(v1, v2, v3, f1, f2)
Combining lists
> dfrm <- as.data.frame(list.of.vectors)
pred1 <- c(-2.7528917, -0.3626909, -1.0416039, 1.266682, 0.7806372, -1.0832624, -2.0883305, -0.7063653, -0.8394022, -0.4966884)
pred2 <- c(-1.4078413, 0.31286963, -0.69685664, -1.27511434, -0.27292745, 0.73383339, 0.96816822, -0.84476203, 0.31530793, -0.08030948)
pred3 <- c("AM", "AM", "PM", "PM", "AM", "AM", "PM", "PM", "PM", "AM")
resp <- c(12.57715, 21.02418, 18.94694, 18.98153, 19.59455, 20.71605, 22.70062, 18.40691, 21.0093, 19.31253)
> dfrm <- data.frame(pred1, pred2, pred3, resp)
> dfrm
        pred1       pred2 pred3     resp
1  -2.7528917 -1.40784130    AM 12.57715
2  -0.3626909  0.31286963    AM 21.02418
3  -1.0416039 -0.69685664    PM 18.94694
4   1.2666820 -1.27511434    PM 18.98153
5   0.7806372 -0.27292745    AM 19.59455
6  -1.0832624  0.73383339    AM 20.71605
7  -2.0883305  0.96816822    PM 22.70062
8  -0.7063653 -0.84476203    PM 18.40691
9  -0.8394022  0.31530793    PM 21.00930
10 -0.4966884 -0.08030948    AM 19.31253
> dfrm <- data.frame(p1=pred1, p2=pred2, p3=pred3, r=resp)
> dfrm
          p1          p2 p3        r
1 -2.7528917 -1.40784130 AM 12.57715
2 -0.3626909  0.31286963 AM 21.02418
3 -1.0416039 -0.69685664 PM 18.94694
.
. (etc.)
.
Suppose that pred3 and resp contain extra values, as below:
pred3 <- c(pred3, "PM")
resp <- c(resp, 20,30,40)
Now try to combine these vectors into a data frame. This fails, because the vectors no longer have the same length:
dfrm <- data.frame(pred1, pred2, pred3, resp)
To fix this, remove the extra values so that each vector has 10 elements. How would you do that? One way:
pred3 <- pred3[-11]          # drop the 11th element
resp <- resp[-(11:13)]       # drop elements 11 through 13
dfrm <- data.frame(pred1, pred2, pred3, resp)
Alternatively, you could pad the short vectors with NA so that every vector has the same length. How would you do that?
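One possible answer, sketched in base R: assigning a larger value to length() pads a vector with NA. The vectors a and b are hypothetical stand-ins; with the data above you would pad pred1, pred2, and pred3 up to the length of resp in the same way.

```r
# Hypothetical vectors of unequal length
a <- c(1, 2, 3)
b <- c("x", "y")
n <- max(length(a), length(b))
# Growing a vector through length<- fills the new slots with NA
length(a) <- n
length(b) <- n
dfrm2 <- data.frame(a, b)
print(dfrm2)
```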
Alternatively, put the vectors into a named list and convert the list with as.data.frame:
> lst <- list(p1=pred1, p2=pred2, p3=pred3, r=resp)
> as.data.frame(lst)
          p1          p2 p3        r
1 -2.7528917 -1.40784130 AM 12.57715
2 -0.3626909  0.31286963 AM 21.02418
3 -1.0416039 -0.69685664 PM 18.94694
.
. (etc.)
.
The rbind() function combines vectors, matrices, or data frames by rows.
Subtype Gender Expression
A m -0.54
A f -0.8
B f -1.03
C m -0.41
Subtype Gender Expression
D m 3.22
D f 1.02
D f 0.21
D m -0.04
D m 2.11
B m -1.21
A f -0.2
> x1 <- read.csv("http://commres.net/wiki/_export/code/r/data_structures?codeblock=83", head=T, sep=" ")
> x2 <- read.csv("http://commres.net/wiki/_export/code/r/data_structures?codeblock=84", head=T, sep=" ")
> x <- rbind(x1,x2)
> x
   Subtype Gender Expression
1        A      m      -0.54
2        A      f      -0.80
3        B      f      -1.03
4        C      m      -0.41
5        D      m       3.22
6        D      f       1.02
7        D      f       0.21
8        D      m      -0.04
9        D      m       2.11
10       B      m      -1.21
11       A      f      -0.20
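When rbind() combines data frames, it matches columns by name rather than by position, so the two inputs must have the same column names (a mismatch is an error), but their column order may differ. A small base-R sketch with made-up data frames whose values are taken from the example above:

```r
# Hypothetical data frames with the same columns in different orders
d1 <- data.frame(Subtype = c("A", "B"), Expression = c(-0.54, -1.03))
d2 <- data.frame(Expression = c(3.22, -0.20), Subtype = c("D", "A"))
both <- rbind(d1, d2)
print(both)   # columns follow d1's order; d2's values land in the right columns
```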
                city    county state     pop
1            Chicago      Cook    IL 2853114
2            Kenosha   Kenosha    WI   90352
3             Aurora      Kane    IL  171782
4              Elgin      Kane    IL   94487
5               Gary  Lake(IN)    IN  102746
6             Joliet   Kendall    IL  106221
7         Naperville    DuPage    IL  147779
8  Arlington Heights      Cook    IL   76031
9        Bolingbrook      Will    IL   70834
10            Cicero      Cook    IL   72616
11          Evanston      Cook    IL   74239
12           Hammond  Lake(IN)    IN   83048
13          Palatine      Cook    IL   67232
14        Schaumburg      Cook    IL   75386
15            Skokie      Cook    IL   63348
16          Waukegan  Lake(IL)    IL   91452
suburbs <- read.csv("http://commres.net/wiki/_export/code/r/data_structures?codeblock=89", head=T, sep=" ")
suburbs
.
.
.
suburbs$X <- NULL   # delete the extra X column
newRow <- data.frame(city="West Dundee", county="Kane", state="IL", pop=5428)
suburbs <- rbind(suburbs, newRow)
suburbs
                city    county state     pop
1            Chicago      Cook    IL 2853114
2            Kenosha   Kenosha    WI   90352
3             Aurora      Kane    IL  171782
4              Elgin      Kane    IL   94487
5               Gary  Lake(IN)    IN  102746
6             Joliet   Kendall    IL  106221
7         Naperville    DuPage    IL  147779
8  Arlington Heights      Cook    IL   76031
9        Bolingbrook      Will    IL   70834
10            Cicero      Cook    IL   72616
11          Evanston      Cook    IL   74239
12           Hammond  Lake(IN)    IN   83048
13          Palatine      Cook    IL   67232
14        Schaumburg      Cook    IL   75386
15            Skokie      Cook    IL   63348
16          Waukegan  Lake(IL)    IL   91452
17       West Dundee      Kane    IL    5428
                city    county state     pop
1            Chicago      Cook    IL 2853114
2            Kenosha   Kenosha    WI   90352
3             Aurora      Kane    IL  171782
4              Elgin      Kane    IL   94487
5               Gary  Lake(IN)    IN  102746
6             Joliet   Kendall    IL  106221
7         Naperville    DuPage    IL  147779
8  Arlington Heights      Cook    IL   76031
9        Bolingbrook      Will    IL   70834
10            Cicero      Cook    IL   72616
11          Evanston      Cook    IL   74239
12           Hammond  Lake(IN)    IN   83048
13          Palatine      Cook    IL   67232
14        Schaumburg      Cook    IL   75386
15            Skokie      Cook    IL   63348
16          Waukegan  Lake(IL)    IL   91452
suburbs <- read.csv("http://commres.net/wiki/_export/code/r/data_structures?codeblock=92", head=T, sep=" ")
> suburbs[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
> suburbs[[3]]
 [1] Cook     Kenosha  Kane     Kane     Lake(IN) Kendall  DuPage   Cook     Will     Cook     Cook     Lake(IN) Cook     Cook     Cook
[16] Lake(IL)
Levels: Cook DuPage Kane Kendall Kenosha Lake(IL) Lake(IN) Will
> suburbs[[4]]
 [1] IL WI IL IL IN IL IL IL IL IL IL IN IL IL IL IL
Levels: IL IN WI
suburbs[[1]]
This returns one column.
suburbs[1]
This returns a data frame, and the data frame contains exactly one column. This is a special case of dfrm[c(n1,n2, …, nk)]. We don’t need the c(…) construct because there is only one n.
city county state pop
Chicago Cook IL 2853114
Kenosha Kenosha WI 90352
Aurora Kane IL 171782
Elgin Kane IL 94487
Gary Lake(IN) IN 102746
Joliet Kendall IL 106221
Naperville DuPage IL 147779
Arlington Heights Cook IL 76031
Bolingbrook Will IL 70834
Cicero Cook IL 72616
Evanston Cook IL 74239
Hammond Lake(IN) IN 83048
Palatine Cook IL 67232
Schaumburg Cook IL 75386
Skokie Cook IL 63348
Waukegan Lake(IL) IL 91452
suburbs <- read.csv("http://commres.net/wiki/_export/code/r/data_structures?codeblock=97", head=T, sep=" ")
> suburbs[[1]]
 [1] "Chicago"           "Kenosha"           "Aurora"            "Elgin"
 [5] "Gary"              "Joliet"            "Naperville"        "Arlington Heights"
 [9] "Bolingbrook"       "Cicero"            "Evanston"          "Hammond"
[13] "Palatine"          "Schaumburg"        "Skokie"            "Waukegan"
> suburbs[1]
                city
1            Chicago
2            Kenosha
3             Aurora
4              Elgin
5               Gary
6             Joliet
7         Naperville
8  Arlington Heights
9        Bolingbrook
10            Cicero
11          Evanston
12           Hammond
13          Palatine
14        Schaumburg
15            Skokie
16          Waukegan
> suburbs[c(1,4)]
                city     pop
1            Chicago 2853114
2            Kenosha   90352
3             Aurora  171782
4              Elgin   94487
5               Gary  102746
6             Joliet  106221
7         Naperville  147779
8  Arlington Heights   76031
9        Bolingbrook   70834
10            Cicero   72616
11          Evanston   74239
12           Hammond   83048
13          Palatine   67232
14        Schaumburg   75386
15            Skokie   63348
16          Waukegan   91452
dfrm[["name"]]
Returns one column, the column called name.
dfrm$name
Same as previous, just different syntax.
To select one or more columns and package them in a data frame, use these list expressions:
dfrm["name"]
Selects one column and packages it inside a data frame object.
dfrm[c("name1", "name2", ..., "namek")]
Selects several columns and packages them in a data frame.
You can use matrix-style subscripting to select one or more columns:
dfrm[, "name"]
Returns the named column.
dfrm[, c("name1", "name2", ..., "namek")]
Selects several columns and packages them in a data frame.
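A quick way to remember which forms return a vector and which return a data frame is to check class() on each. The small data frame below is made up from two rows of the suburbs data; drop=FALSE is included to show how to keep matrix-style selection as a data frame.

```r
# Hypothetical two-row data frame
dfrm <- data.frame(city  = c("Chicago", "Kenosha"),
                   state = c("IL", "WI"),
                   pop   = c(2853114, 90352))
v1 <- dfrm[["pop"]]                 # a vector
v2 <- dfrm$pop                      # the same vector
d1 <- dfrm["pop"]                   # a one-column data frame
d2 <- dfrm[c("city", "pop")]        # a two-column data frame
v3 <- dfrm[, "pop"]                 # matrix style, one column: a vector
d3 <- dfrm[, "pop", drop = FALSE]   # matrix style, kept as a data frame
print(class(v1))
print(class(d1))
print(class(v3))
print(class(d3))
```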
Data set used in this section: Cars93 from the MASS package.
install.packages("MASS")
library(MASS)
Cars93
  Manufacturer   Model    Type Min.Price Price Max.Price MPG.city MPG.highway            AirBags DriveTrain Cylinders EngineSize
1        Acura Integra   Small      12.9  15.9      18.8       25          31               None      Front         4        1.8
2        Acura  Legend Midsize      29.2  33.9      38.7       18          25 Driver & Passenger      Front         6        3.2
3         Audi      90 Compact      25.9  29.1      32.3       20          26        Driver only      Front         6        2.8
4         Audi     100 Midsize      30.8  37.7      44.6       19          26 Driver & Passenger      Front         6        2.8
.
.
.
subset(Cars93, select=Model, subset=(MPG.city > 30))
     Model
31 Festiva
39   Metro
42   Civic
.
. (etc.)
.
subset(Cars93, select=c(Model,Min.Price,Max.Price),
+      subset=(Cylinders == 4 & Origin == "USA"))
      Model Min.Price Max.Price
6   Century      14.2      17.3
12 Cavalier       8.5      18.3
13  Corsica      11.4      11.4
.
. (etc.)
.
subset(Cars93, select=c(Manufacturer,Model),
+      subset=c(MPG.highway > median(MPG.highway)))
  Manufacturer   Model
1        Acura Integra
5          BMW    535i
6        Buick Century
.
. (etc.)
.
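subset() is a convenience wrapper: the same rows and columns can be selected with ordinary [row, column] indexing. To keep the sketch runnable without installing anything, it swaps in the built-in mtcars data set for Cars93. One difference worth knowing: subset() silently drops rows where the condition is NA, while plain indexing would return them as rows of NA (mtcars has no missing values, so the results match here).

```r
# mtcars ships with R, so no package is needed
a <- subset(mtcars, subset = mpg > 30, select = c(mpg, hp))
b <- mtcars[mtcars$mpg > 30, c("mpg", "hp"), drop = FALSE]
print(a)
print(all.equal(a, b))
```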
mat <- c(-0.818, -0.667, -0.494, -0.819, -0.946, -0.205, 0.385, 1.531, -0.611, -2.155, -0.535, -0.316)
dim(mat) <- c(4,3)
mat
       [,1]   [,2]   [,3]
[1,] -0.818 -0.946 -0.611
[2,] -0.667 -0.205 -2.155
[3,] -0.494  0.385 -0.535
[4,] -0.819  1.531 -0.316
Without column names, as.data.frame() falls back on generic names (V1, V2, V3):
as.data.frame(mat)
      V1     V2     V3
1 -0.818 -0.946 -0.611
2 -0.667 -0.205 -2.155
3 -0.494  0.385 -0.535
4 -0.819  1.531 -0.316
> colnames(mat) <- c("before","treatment","after")
> mat
     before treatment  after
[1,] -0.818    -0.946 -0.611
[2,] -0.667    -0.205 -2.155
[3,] -0.494     0.385 -0.535
[4,] -0.819     1.531 -0.316
> as.data.frame(mat)
  before treatment  after
1 -0.818    -0.946 -0.611
2 -0.667    -0.205 -2.155
3 -0.494     0.385 -0.535
4 -0.819     1.531 -0.316
> temp <- edit(mat)   # opens an editor window; close it when you are done editing
> mat <- temp         # Overwrite only if you're happy with the changes!
> mat2 <- temp        # ...or keep the edited copy under a new name
Can you save it as “mat.csv” and then read it back into R?
When you read the CSV file back, how would you avoid an extra X column like the one in the output below?
  X before treatment  after
1 1 -0.818    -0.946 -0.611
2 2 -0.667    -0.205 -2.155
3 3 -0.494     0.385 -0.535
4 4 -0.819     1.531 -0.316
Or, how would you save the CSV file without that column in the first place?
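One possible answer, sketched in base R: the X column is the row names that write.csv() writes by default. Passing row.names = FALSE keeps them out of the file; if a file already contains that column, read.csv(..., row.names = 1) turns it back into row names instead of a data column. A temporary file from tempfile() stands in for "mat.csv" so the sketch does not touch your working directory.

```r
mat <- matrix(c(-0.818, -0.667, -0.494, -0.819, -0.946, -0.205,
                0.385, 1.531, -0.611, -2.155, -0.535, -0.316), nrow = 4)
colnames(mat) <- c("before", "treatment", "after")

# Option 1: do not write row names at all
path <- tempfile(fileext = ".csv")
write.csv(mat, path, row.names = FALSE)
back <- read.csv(path)
print(back)

# Option 2: the file has a row-name column; use it as row names when reading
path2 <- tempfile(fileext = ".csv")
write.csv(mat, path2)          # default: row names become the first column
back2 <- read.csv(path2, row.names = 1)
print(back2)
```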
Use na.omit to remove rows that contain any NA values.
> clean <- na.omit(dfrm)
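A small illustration of what na.omit() removes, using a made-up data frame. complete.cases() returns the logical vector of rows that na.omit() keeps, which is handy when you want to count or inspect the incomplete rows first.

```r
# Hypothetical data frame with missing values in different columns
dfrm <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))
clean <- na.omit(dfrm)
print(clean)                  # rows 2 and 3 are gone
print(complete.cases(dfrm))   # TRUE for the rows na.omit keeps
```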
> subset(dfrm, select = -badboy) # All columns except badboy
> cor(patient.data)
            patient.id         pre     dosage        post
patient.id  1.00000000  0.02286906  0.3643084 -0.13798149
pre         0.02286906  1.00000000  0.2270821 -0.03269263
dosage      0.36430837  0.22708208  1.0000000 -0.42006280
post       -0.13798149 -0.03269263 -0.4200628  1.00000000
This correlation matrix includes the meaningless “correlation” between patient ID and other variables, which is annoying. We can exclude the patient ID column to clean up the output:
> cor(subset(patient.data, select = -patient.id))
               pre     dosage        post
pre     1.00000000  0.2270821 -0.03269264
dosage  0.22708207  1.0000000 -0.42006280
post   -0.03269264 -0.4200628  1.00000000
We can exclude multiple columns by giving a vector of negated names:
> cor(subset(patient.data, select = c(-patient.id,-dosage)))
             pre        post
pre   1.00000000 -0.03269264
post -0.03269264  1.00000000
> stooges
   name n.marry n.child
1   Moe       1       2
2 Larry       1       2
3 Curly       4       2
> birth
  birth.year  birth.place
1       1887  Bensonhurst
2       1902 Philadelphia
3       1903     Brooklyn
> cbind(stooges,birth)
   name n.marry n.child birth.year  birth.place
1   Moe       1       2       1887  Bensonhurst
2 Larry       1       2       1902 Philadelphia
3 Curly       4       2       1903     Brooklyn
rbind
> stooges
   name n.marry n.child
1   Moe       1       2
2 Larry       1       2
3 Curly       4       2
> guys
   name n.marry n.child
1   Tom       4       2
2  Dick       1       4
3 Harry       1       1
> rbind(stooges,guys)
   name n.marry n.child
1   Moe       1       2
2 Larry       1       2
3 Curly       4       2
4   Tom       4       2
5  Dick       1       4
6 Harry       1       1