class: center, middle, inverse, title-slide # Practical R: Functions and Loops ### Abhijit Dasgupta ### BIOF 339 --- class: middle,center,inverse # Functions --- ## Why do we need functions? When you are typing instructions to the computer, you might find yourself repeating the same instructions over and over. So you end up copying and pasting code for each repitition. + Can make a mistake copying and pasting + If you need to change the instructions, you need to find every instance of it **manually** and change it, and you're likely to miss one The rule of thumb is, if you're copying the same code more than twice, write a function. + Write the instructions once + Change it in only one place, if needed --- ## Defining functions The basic syntax of a function is ``` <function name> <- function(<input argument(s)>){ <code for instructions> ... <more code> return(<output object>) } ``` --- ## Defining functions Let's create our own function to convert feet to inches. ```r ft2in <- function(ft){ inch <- ft * 12 return(inch) } ``` + `ft2in` is the name of the function + The input argument is named `ft` (make an expressive name) + Inches are computed by multiplying `ft` by 12 and storing it in `inch` + The output of the function is the value of the `inch` variable To run this: .pull-left[ ```r ft2in(12) # 12 feet to inches ``` ] .pull-right[ ``` [1] 144 ``` ] --- ## Defining functions What if we want more than one input? ```r ft2in <- function(ft, convert_to){ # ft = input (feet) # convert_to = unit to convert to ('in','m','cm') if(convert_to == 'in'){ output <- ft * 12 } if(convert_to == 'm'){ output <- ft * 0.3048 } if(convert_to == 'cm'){ output <- ft * 30.48 } return(paste(output, convert_to)) } ``` .pull-left[ ```r ft2in(12, convert_to='cm') ``` ] .pull-right[ ``` [1] "365.76 cm" ``` ] --- ## Quick reminder about conditions Some comparison operators for filtering | Operator | Meaning | |----------|----------------------------------| | == | Equals | | != | Not equals | | > / < | Greater / less than | | >= / <= | Greater or equal / Less or equal | | ! | Not | | %in% | In a set | Combining comparisons | Operator | Meaning | |------------|---------| | & | And | | | | Or | --- background-image: url(../img/dplyr_case_when.png) background-size: contain --- ## Defining functions ```r ft2in <- function(ft, convert_to){ # ft = input (feet) # convert_to = unit to convert to ('in','m','cm') * conversion <- case_when( * convert_to == 'in' ~ 12, * convert_to =='m' ~ 0.3048, * convert_to == 'cm' ~ 30.48, * TRUE ~ 1 # otherwise * ) output = ft * conversion return(paste(output, convert_to)) } ``` .pull-left[ ```r ft2in(12, convert_to='cm') ``` ] .pull-right[ ``` [1] "365.76 cm" ``` ] --- ## The concept of local vs global variables ```r x <- 10 print(x) ``` ``` [1] 10 ``` ```r f <- function(x){ x <- 5 print(x) } f(x) ``` ``` [1] 5 ``` ```r print(x) ``` ``` [1] 10 ``` The `x` inside the function is local to the function and is independent of the `x` in the global space that has the value 10.. --- class: middle,center,inverse # Loops --- ## for-loops ![](https://media.giphy.com/media/3o6nURRboKQJrBGVC8/giphy.gif) --- ## for-loops The for-loop is a construct to repeat the same operation over a list of values. Basic syntax: ``` for(<variable> in <list>){ <code> ... <more code> } ``` Example: .pull-left[ ```r for(i in 1:10){ print(i) } ``` Here `i` is a dummy variable. It's actual name doesn't matter, just its action ] .pull-right[ ``` [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 [1] 7 [1] 8 [1] 9 [1] 10 ``` ] --- ## for-loops Example: ```r for(n in names(iris)){ if(is.numeric(iris[,n])){ * print(glue::glue('The mean of {n} is {mean(iris[,n])}')) } } ``` ``` The mean of Sepal.Length is 5.84333333333333 The mean of Sepal.Width is 3.05733333333333 The mean of Petal.Length is 3.758 The mean of Petal.Width is 1.19933333333333 ``` You don't need the `<list>` in the for-loop definition to be integers. In this case it is a list of strings. Note that vectors are also considered lists for this purpose. .footnote[ ----- The **glue** package allows you to run templated text strings interspersed with the results of R objects] --- class: middle,center,inverse # purrr: functional programming and mapping --- ## purrr The **purrr** package provides ways to efficiently run functions over lists. These functions are typically more efficient than for-loops. The function `purrr::map` has syntax ``` map(<list/vector>, <function/formula>, ...) ``` Example: .pull-left[ ```r iris1 <- select(iris, where(is.numeric)) *map(iris1, mean) ``` ``` $Sepal.Length [1] 5.843333 $Sepal.Width [1] 3.057333 $Petal.Length [1] 3.758 $Petal.Width [1] 1.199333 ``` ] .pull-right[ `map` takes a list and outputs a list. Recall, a data.frame is a list of columns, so `map` takes each column and applies the function `mean` to it, and prints the output If you're familiar with `lapply`, `map` works almost exactly the same way ] --- ## purrr Example (cont.): You can clean the output up a bit. .pull-left[ ```r map_dbl(iris1, mean) ``` ``` Sepal.Length Sepal.Width Petal.Length Petal.Width 5.843333 3.057333 3.758000 1.199333 ``` ] .pull-right[ There are several helper functions like `map_dbl`, `map_int`, `map_chr`, and others that will reduce the output into a vector of particular type (more [here](https://purrr.tidyverse.org/reference/map.html)) ] `map` can also be used as part of pipes, leveraging the fact that data.frames are lists of columns. ```r iris %>% select(where(is.numeric)) %>% map_dbl(mean) ``` ``` Sepal.Length Sepal.Width Petal.Length Petal.Width 5.843333 3.057333 3.758000 1.199333 ``` > **Question:** Why does `map_dbl` only have the argument `mean`? --- ## purrr There are several extensions of `map` + `map2` and derivatives `map2_dbl`, etc, iterate over two lists to compute the outcome of a function of **two** variables + `pmap` and derivatives iterate over _p_ lists to compute the outcome of a p-dimensional function + `imap` and derivatives iterates over a list and its index/names to compute the outcome of a function that takes the values and index/names as inputs --- ## purrr The function part of these functions can be entered in a couple of ways: 1. If you have a formal `function` f with the appropriate number of arguments, you can just add `f`. + `map_dbl(iris1, mean)` 1. You can also define a function "on-the-fly" using a _formula interface_. + `map_dbl(iris1, ~mean(.x))` + if you have multiple arguments, they are denoted as `.x`, `.y`, `.z`, `.w`, etc. .footnote[The second method is often referred to as *anonymous functions* or *lambda functions* in computer science since they aren't given a name]