1.1 A short introduction to R objects
1.1.1 Objects
In my mind, R objects fall broadly into two categories: functions (verbs) and data objects (nouns). Data objects in turn come in different types:
- data.frame or tibble: These are rectangular data sets, much like you would see in a spreadsheet
- vector: This is a 1-dimensional sequence of numbers or strings (words in the language sense), but all must be of the same kind (number or string)
- matrix: This is a 2-dimensional arrangement of numbers or strings, once again all of the same type
- A single number or word (in R, this is simply a vector of length 1)
- list: This is a catch-all bucket. Each element of a list can be literally any valid R object. So they could be tibbles, or functions, or matrices, and different elements can be of different types.
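As a rough sketch of these types, here is one way you could create an example of each. The object names (weights, m, bucket, cars_df) and their values are made up for illustration.

```r
# A vector: one dimension, all elements of the same type
weights <- c(2.62, 2.875, 2.32)

# A matrix: two dimensions, all elements of the same type
m <- matrix(1:6, nrow = 2, ncol = 3)

# A list: each element can be any R object, and types can differ
bucket <- list(numbers = weights, label = "cars", f = mean)

# A data.frame: rectangular, one column per variable
cars_df <- data.frame(model = c("A", "B", "C"), wt = weights)

is.numeric(weights)    # TRUE
is.matrix(m)           # TRUE
is.list(bucket)        # TRUE
is.data.frame(cars_df) # TRUE
```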
Most objects we’ll use in this workshop are going to be data.frame or tibble objects. (In case you’re wondering, they’re basically the same thing, but tibbles have some modest additional functionality.) R comes with a bunch of built-in datasets stored as data.frames.
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
A data.frame can be acted upon by different functions to help describe it and extract elements from it. For example, to see the size of the data (the number of rows, then the number of columns), we use
dim(mtcars)
## [1] 32 11
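Since the result of dim is itself a vector, you can pick out either number. The sketch below stores it under a made-up name, dims, and compares it with the base R helpers nrow and ncol.

```r
dims <- dim(mtcars)  # integer vector: rows first, then columns

dims[1] == nrow(mtcars)  # TRUE: first entry is the number of rows
dims[2] == ncol(mtcars)  # TRUE: second entry is the number of columns
```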
Data sets often have row names and column names. These can be extracted with the functions rownames and colnames; for a data.frame, names also returns the column names:
rownames(mtcars)
## [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
## [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
## [7] "Duster 360" "Merc 240D" "Merc 230"
## [10] "Merc 280" "Merc 280C" "Merc 450SE"
## [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
## [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
## [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
## [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
## [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
## [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
## [31] "Maserati Bora" "Volvo 142E"
names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
Both of these are valid R objects that are vectors of strings. You could save them for future use by assigning them a name using the assignment operator <-. So if you wanted to store the row names, which are the makes and models of the cars in this data set (this structure is not desirable, as we’ll discuss later), you could run
car_names <- rownames(mtcars)
You can see that this is stored in R for current use, either by typing ls() in the console (for “list”) or by looking in the Environment pane in RStudio.
The output of any function is a valid R object, and so you can always store the results of the function by assigning it a name, as above.
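A small sketch of this idea: the stored vector can itself be passed to other functions, and their output stored in turn. The name n_cars is made up for illustration.

```r
car_names <- rownames(mtcars)   # store the output of rownames()
n_cars <- length(car_names)     # store the output of length() too

exists("car_names")  # TRUE: the object is now available by name
head(car_names, 3)   # the first three stored names
n_cars               # 32, one name per row
```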
1.1.2 Extracting elements from objects
We can see the structure of any object by using the function str.
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
This tells us that mtcars is a data.frame with 32 observations (rows) and 11 variables (columns).
Each variable has a name, and all the variables are numeric.
data.frame objects are like lists, in that each column can be of a different type. This is a very powerful structure, since we can keep all sorts of data together, and can load spreadsheets with diverse kinds of data easily into R.
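To see columns of different types side by side, here is a minimal sketch with a made-up data set (pets and its values are purely illustrative). stringsAsFactors = FALSE is set explicitly so that the text column stays character in older versions of R as well.

```r
pets <- data.frame(
  name   = c("Rex", "Tom", "Polly"),   # character column
  weight = c(30.5, 4.2, 0.4),          # numeric column
  indoor = c(FALSE, TRUE, TRUE),       # logical column
  stringsAsFactors = FALSE
)

str(pets)            # shows the type of each column
sapply(pets, class)  # "character", "numeric", "logical"
```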
To extract the mpg variable from this data set, there are a few equivalent methods. My preferred method is mtcars[,'mpg'], i.e., extract the column named “mpg” from this data set. Notice that we’re using [] while functions use (). This format of extraction will work when you’re extracting more than one variable, as we’ll see below. Other ways include mtcars[['mpg']] and mtcars$mpg, which are the list way and a convenient shortcut, respectively (the $ operator also works on any named list).
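You can check for yourself that the three forms return the same thing for a data.frame. The names by_matrix, by_list and by_dollar are made up for this sketch.

```r
by_matrix <- mtcars[, "mpg"]   # matrix-style: all rows, column "mpg"
by_list   <- mtcars[["mpg"]]   # list-style: the element named "mpg"
by_dollar <- mtcars$mpg        # shortcut for the list-style form

identical(by_matrix, by_list)  # TRUE
identical(by_list, by_dollar)  # TRUE
```

One caveat: for a tibble, the first form returns a one-column tibble rather than a plain vector, so the three are not interchangeable there.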
You can also extract elements by position, using either the [,] or [[]] forms. So, to extract the element in the 2nd row and 4th column, you’d have to use the matrix notation, as mtcars[2,4]. To extract the 4th column, you could use either mtcars[,4] or mtcars[[4]]. To extract the 2nd row, you’d again use the matrix notation, as mtcars[2,].
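Here is what extraction by position looks like in practice, as a short sketch on mtcars (the name second_row is made up).

```r
mtcars[2, 4]   # 2nd row, 4th column: hp for the Mazda RX4 Wag, i.e. 110

# The 4th column, two ways; both give the same numeric vector
identical(mtcars[, 4], mtcars[[4]])  # TRUE

# The 2nd row; the result is still a data.frame with one row
second_row <- mtcars[2, ]
dim(second_row)  # 1 11
```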
If we want to extract the mpg, cyl and disp variables at once to create a new data.frame, we can use either the matrix notation mtcars[,c('mpg','cyl','disp')] or the list notation mtcars[c('mpg','cyl','disp')]. Note the single brackets in the list notation: [[]] extracts a single element, so it cannot be used to pull out several columns at once. The c() function stands for concatenate and is the function used to create vectors. We’ll actually see a much more user-friendly way of doing this in the data munging section (Chapter 4).
Advanced note: A data.frame object is really a list object where all the elements are vectors of the same length, and which happen to have names assigned to them. The object also looks like a matrix or 2-dimensional array visually. So both notations are valid for data.frame objects.
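If you’d like to confirm this list-like nature yourself, a few base R checks on mtcars will do, as in this short sketch.

```r
is.list(mtcars)   # TRUE: a data.frame is a list underneath
typeof(mtcars)    # "list"
length(mtcars)    # 11: one list element per column

# Every element (column) has the same length, the number of rows
all(lengths(mtcars) == nrow(mtcars))  # TRUE
```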