1.1 A short introduction to R objects

1.1.1 Objects

In my mind, R objects fall into two broad categories: functions (verbs) and data objects (nouns). Data objects in turn come in several types:

  • data.frame or tibble: These are rectangular data sets much like you would see in a spreadsheet
  • vector: This is a 1-dimensional collection of numbers or strings (words in the language sense), but all elements must be of the same kind (all numbers or all strings)
  • matrix: This is a 2-dimensional collection of numbers or strings, once again all of the same type
  • A single number or word (R actually stores this as a vector of length 1)
  • list: This is a catch-all bucket. Each element of a list can be literally any valid R object, so the elements could be tibbles, functions, or matrices, and different elements can be of different types.
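Here is a small sketch that creates one object of each kind and asks R what it is with class(). The object names (num_vec, chr_vec and so on) are just illustrative, and the comments show what I'd expect from a recent version of R (4.0 or later, where a matrix reports both "matrix" and "array").

```r
# Illustrative objects, one of each kind described above
num_vec <- c(21.0, 22.8, 18.7)            # numeric vector
chr_vec <- c("Mazda RX4", "Datsun 710")   # character (string) vector
mat     <- matrix(1:6, nrow = 2)          # 2 x 3 matrix of integers
one_num <- 42                             # a "single number" is a vector of length 1
cars_df <- data.frame(car = chr_vec, mpg = c(21.0, 22.8))
mixed   <- list(numbers = num_vec, table = cars_df, f = mean)  # anything goes in a list

class(num_vec)       # "numeric"
class(chr_vec)       # "character"
class(mat)           # "matrix" "array"
length(one_num)      # 1
class(cars_df)       # "data.frame"
sapply(mixed, class) # numbers: "numeric", table: "data.frame", f: "function"

# A vector holds only one type: mixing a number and a string
# silently turns everything into strings
c(1, "a")            # "1" "a"
```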

Most objects we’ll use in this workshop are going to be data.frame or tibble objects. (In case you’re wondering, they’re basically the same thing, but tibbles have some modest additional functionality.) R comes with a bunch of built-in datasets stored as data.frames.

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

A data.frame can be acted upon by different functions that help describe it and extract elements from it. For example, to see the size of the data (number of rows, then number of columns), we use dim():

dim(mtcars)
## [1] 32 11
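If you only need one of the two numbers that dim() returns, base R also has nrow() and ncol(). A quick sketch, with the results I'd expect for mtcars in the comments:

```r
dim(mtcars)    # 32 11: rows first, then columns
nrow(mtcars)   # 32: number of rows (observations)
ncol(mtcars)   # 11: number of columns (variables)
```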

Data sets often have row names and column names. These can be extracted with the functions rownames() and colnames() (or names()):

rownames(mtcars)
##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [31] "Maserati Bora"       "Volvo 142E"
names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
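For a data.frame, colnames() and names() give the same result, and there is exactly one row name per row. A quick check:

```r
identical(colnames(mtcars), names(mtcars))   # TRUE
length(rownames(mtcars))                      # 32, one name per row
head(rownames(mtcars), 3)                     # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
```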

Both of these are valid R objects: vectors of strings. You can save them for future use by assigning them a name with the assignment operator <-. For example, to store the row names, which hold the make and model of each car in this data set (keeping this information as row names is not ideal, as we’ll discuss later), you could run

car_names <- rownames(mtcars)

You can confirm that car_names is now stored in your current R session, either by typing ls() (for “list”) in the console or by looking in the Environment pane in RStudio.

The output of any function is a valid R object, so you can always store the result of a function by assigning it a name, as above.
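A short sketch of storing function output under a name. n_cars and mtcars_size are names made up for illustration; any valid name would do.

```r
car_names   <- rownames(mtcars)   # as above
n_cars      <- length(car_names)  # length() returns an R object too...
mtcars_size <- dim(mtcars)        # ...and so does dim()

n_cars                 # 32
mtcars_size            # 32 11
exists("car_names")    # TRUE: the name is now defined in this session
```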

1.1.2 Extracting elements from objects

We can see the structure of any object by using the function str.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

This tells us that mtcars is a data.frame with 32 observations (rows) and 11 variables (columns). Each variable has a name, and all the variables are numeric.
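If you only want the type of each column rather than the full str() summary, one option (my choice here, not the only one) is to apply class() to every column with sapply():

```r
col_types <- sapply(mtcars, class)   # one entry per column, named by column
col_types                            # "numeric" for all 11 columns
all(sapply(mtcars, is.numeric))      # TRUE
```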

data.frame objects are like lists in that each column can be of a different type. This is a very powerful structure, since it lets us keep all sorts of data together and makes it easy to load spreadsheets with diverse kinds of data into R.
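To see a data.frame holding different types side by side, here is a small hand-built example. The object name cars_small and its column names are made up for illustration; the values come from three rows of mtcars (where am = 0 means an automatic transmission). I pass stringsAsFactors = FALSE so that the model column stays character on older versions of R as well.

```r
# A hypothetical 3-row data.frame with three different column types
cars_small <- data.frame(
  model     = c("Mazda RX4", "Datsun 710", "Hornet 4 Drive"),  # character
  mpg       = c(21.0, 22.8, 21.4),                              # numeric
  automatic = c(FALSE, FALSE, TRUE),                            # logical
  stringsAsFactors = FALSE
)
str(cars_small)   # model is chr, mpg is num, automatic is logi
```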

To extract the mpg variable from this data set, there are a few equivalent methods. My preferred method is mtcars[,'mpg'], i.e., extract the column named “mpg” from this data set. Notice that we use [] for extraction, while functions use (). This form of extraction also works when you’re extracting more than one variable, as we’ll see below. Other ways include mtcars[['mpg']] (the list way) and mtcars$mpg (a shortcut that works for both lists and data.frames).
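A quick way to convince yourself that the three forms are equivalent is to compare their results with identical(). The names mpg1, mpg2 and mpg3 are just for illustration.

```r
# Three ways to pull out the mpg column; each returns a plain numeric vector
mpg1 <- mtcars[, 'mpg']   # matrix-style: all rows, the column named "mpg"
mpg2 <- mtcars[['mpg']]   # list-style
mpg3 <- mtcars$mpg        # $ shortcut
identical(mpg1, mpg2) && identical(mpg2, mpg3)   # TRUE
head(mpg1, 3)                                    # 21.0 21.0 22.8
```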

You can also extract elements by position, using either the [,] or the [[]] form. So, to extract the element in the 2nd row and 4th column, you’d use the matrix notation mtcars[2,4]. To extract the 4th column, you could use either mtcars[,4] or mtcars[[4]]. To extract the 2nd row, you’d again use the matrix notation, mtcars[2,].
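Here are the positional forms with what I'd expect each to return (the 4th column of mtcars is hp):

```r
mtcars[2, 4]                          # 110: hp of the Mazda RX4 Wag
identical(mtcars[, 4], mtcars[[4]])   # TRUE: both return the hp column as a vector
mtcars[2, ]                           # the whole 2nd row
class(mtcars[2, ])                    # "data.frame": a row stays a 1-row data.frame
```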

If we want to extract the mpg, cyl and disp variables at once to create a new data.frame, we can use either the matrix notation mtcars[,c('mpg','cyl','disp')] or the list notation mtcars[c('mpg','cyl','disp')]. Note the single brackets in the list notation: [[]] extracts only one element at a time, so it can’t be used to pull out several columns. The c() function stands for concatenate and is the function used to create vectors. We’ll see a much more user-friendly way of doing this in the data munging section (Chapter 4).
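A sketch comparing the two ways of pulling out several columns. cols, sub1 and sub2 are illustrative names.

```r
cols <- c('mpg', 'cyl', 'disp')
sub1 <- mtcars[, cols]   # matrix notation
sub2 <- mtcars[cols]     # list notation, with single brackets
identical(sub1, sub2)    # TRUE: the same 3-column data.frame
dim(sub1)                # 32 3
# mtcars[[cols]] is not a way to do this: [[ ]] extracts a single element
```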

Advanced note: A data.frame object is really a list object whose elements are vectors of the same length, with names assigned to them. Visually, it also looks like a matrix (a 2-dimensional array), which is why both the list and the matrix notations are valid for data.frame objects.
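You can check this advanced note directly with a few base R functions:

```r
is.list(mtcars)     # TRUE: underneath, a data.frame is a list
typeof(mtcars)      # "list"
length(mtcars)      # 11: one list element per column
lengths(mtcars)     # every element (column) has length 32
is.matrix(mtcars)   # FALSE: it only looks like a matrix
```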