class: center, middle, inverse, title-slide

# Practical R: Packages

### Abhijit Dasgupta

---
class: middle, center, inverse

# What are packages in R?

---

## Packages

Packages are collections of functions, and sometimes data, that are usually unified for a common purpose.

.saltinline[If _functions_ are recipes, then _packages_ are recipe books]

--

If you want to cook from a recipe, you first have to grab the recipe book from your shelf.

--

.heatinline[Similarly, if you want to use a function from a package, you first have to grab or activate the package in _your current R session_]

This is done using the `library` function. For example,

```r
library(tidyverse)
library(janitor)
```

---

## Packages

There is another way to access a function from a package, which is handy if you're really only going to use that one function.

The general form for this is .heatinline[`<package>::<function>`] (note the __two__ colons).

For example, if you just want to use the `clean_names` function from the **janitor** package, you can do so by

```r
janitor::clean_names(dataset)
```

where `dataset` is the name of the data.frame whose column names you want to clean.

---

## Important operational notes

.pull-left[
.acid[Install packages **once per computer**]

Never install packages inside an R Markdown file
]

.pull-right[
.heat[Activate a package **once per R session**]
]

--

.footnote[The **pacman** package and its `pacman::p_load` function save you a bunch of trouble by installing a package only if it doesn't exist on your computer and then activating the package. This one function removes a lot of the operational issues in installing and loading packages in R.]

---
class: middle,center,inverse

# Where are the packages?

---

## CRAN

CRAN is the Comprehensive R Archive Network, a network of mirrored repositories containing R packages. Today, it really doesn't matter which of the repositories you use.
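If you want to see or pin the repository your R session installs from, something like the following should work. This is a minimal sketch: `https://cloud.r-project.org` is CRAN's cloud mirror and is used here only as an illustrative choice, not a setting taken from these slides.

```r
# Set the default CRAN repository for this session,
# keeping the previous setting in `old` so it can be restored
old <- options(repos = c(CRAN = "https://cloud.r-project.org"))

# install.packages() uses this option when `repos` is not given
getOption("repos")
```

Running `options(old)` afterwards restores whatever repository was set before.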
In RStudio, the default repository is **Global (CDN) - RStudio**, a cloud-hosted version that typically works the fastest.

![:scale 50%](../img/pkg1.png)

---

## CRAN

You can install packages from CRAN in the following ways:

.pull-left[
`install.packages("<package name>")`

Or, if you want to be explicit, or are not using RStudio,

`install.packages("<package name>", repos = "<repository URL>")`
]

.pull-right[
Using the RStudio _Packages_ panel (see next slide)

You can find packages using CRAN [Task Views](https://cran.r-project.org/web/views/)
]

---
background-image: url(../img/pkg2.png)
background-size: contain

---

## GitHub

GitHub is where many R packages reside during development. To install a package directly from GitHub, you need the **remotes** package, and then you can use

```r
remotes::install_github("<owner>/<repo>")
```

For example, if you want to install the development version of **dplyr**:

```r
remotes::install_github("tidyverse/dplyr")
```

---

## Bioconductor

[Bioconductor](https://www.bioconductor.org) is an R organization dedicated to bioinformatics. It has its own repository of over 1900 packages.

To install Bioconductor packages, you first need to install the **BiocManager** package from CRAN (note the upper and lower case letters).
Then you can install packages by

```r
BiocManager::install('<package name>')
```

For example, if you want to install the **DESeq2** package, which performs differential gene expression analysis:

```r
BiocManager::install('DESeq2')
```

---

## Installing packages, a summary

.pull-left[
### From CRAN

```r
install.packages("tidyverse")
```

### From Bioconductor

```r
install.packages("BiocManager") # do once
BiocManager::install('limma')
```

### From GitHub

```r
install.packages('remotes') # do once
remotes::install_github("rstudio/rmarkdown")
# usual format is username/packagename
```
]

.pull-right[
> GitHub often hosts development versions of packages published on CRAN or Bioconductor

> Both CRAN and Bioconductor have stringent checks to make sure packages can run properly, with no obvious program flaws. There are typically no guarantees about analytic or theoretical correctness, but most packages have been crowd-validated and there are several reliable developer groups including RStudio
]

---
class: middle,center,inverse

# Packages commonly used

## An incomplete listing

---

## Data ingestion

<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Package </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> readr </td> <td style="text-align:left;"> Read Rectangular Text Data </td> </tr> <tr> <td style="text-align:left;"> readxl </td> <td style="text-align:left;"> Read Excel Files </td> </tr> <tr> <td style="text-align:left;"> haven </td> <td style="text-align:left;"> Import and Export 'SPSS', 'Stata' and 'SAS' Files </td> </tr> <tr> <td style="text-align:left;"> DBI </td> <td style="text-align:left;"> R Database Interface </td> </tr> <tr> <td style="text-align:left;"> rvest </td> <td style="text-align:left;"> Easily Harvest (Scrape) Web Pages </td> </tr> <tr> <td style="text-align:left;"> jsonlite </td> <td style="text-align:left;"> A Simple and Robust JSON Parser 
and Generator for R </td> </tr> </tbody> </table> --- ## Data munging <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Package </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> tidyr </td> <td style="text-align:left;"> Tidy Messy Data </td> </tr> <tr> <td style="text-align:left;"> dplyr </td> <td style="text-align:left;"> A Grammar of Data Manipulation </td> </tr> <tr> <td style="text-align:left;"> stringr </td> <td style="text-align:left;"> Simple, Consistent Wrappers for Common String Operations </td> </tr> <tr> <td style="text-align:left;"> lubridate </td> <td style="text-align:left;"> Make Dealing with Dates a Little Easier </td> </tr> <tr> <td style="text-align:left;"> forcats </td> <td style="text-align:left;"> Tools for Working with Categorical Variables (Factors) </td> </tr> <tr> <td style="text-align:left;"> purrr </td> <td style="text-align:left;"> Functional Programming Tools </td> </tr> <tr> <td style="text-align:left;"> janitor </td> <td style="text-align:left;"> Simple Tools for Examining and Cleaning Dirty Data </td> </tr> </tbody> </table> --- ## Data visualization <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Package </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ggplot2 </td> <td style="text-align:left;"> Create Elegant Data Visualisations Using the Grammar of Graphics </td> </tr> <tr> <td style="text-align:left;"> lattice </td> <td style="text-align:left;"> Trellis Graphics for R </td> </tr> <tr> <td style="text-align:left;"> visdat </td> <td style="text-align:left;"> Preliminary Visualisation of Data </td> </tr> <tr> <td style="text-align:left;"> naniar </td> <td style="text-align:left;"> Data Structures, Summaries, and Visualisations for Missing Data </td> </tr> <tr> <td 
style="text-align:left;"> htmlwidgets </td> <td style="text-align:left;"> HTML Widgets for R </td> </tr> <tr> <td style="text-align:left;"> leaflet </td> <td style="text-align:left;"> Create Interactive Web Maps with the JavaScript 'Leaflet' Library </td> </tr> <tr> <td style="text-align:left;"> highcharter </td> <td style="text-align:left;"> A Wrapper for the 'Highcharts' Library </td> </tr> <tr> <td style="text-align:left;"> plotly </td> <td style="text-align:left;"> Create Interactive Web Graphics via 'plotly.js' </td> </tr> </tbody> </table> There is an entire package ecosystem around `ggplot2` that can be seen [here](https://exts.ggplot2.tidyverse.org/). These include specialized plots, different themes and colors, animations, etc. --- ## Statistics **Data description** <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Package </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> tableone </td> <td style="text-align:left;"> Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights </td> </tr> <tr> <td style="text-align:left;"> table1 </td> <td style="text-align:left;"> Tables of Descriptive Statistics in HTML </td> </tr> <tr> <td style="text-align:left;"> stargazer </td> <td style="text-align:left;"> Well-Formatted Regression and Summary Statistics Tables </td> </tr> <tr> <td style="text-align:left;"> arsenal </td> <td style="text-align:left;"> An Arsenal of 'R' Functions for Large-Scale Statistical Summaries </td> </tr> <tr> <td style="text-align:left;"> gtsummary </td> <td style="text-align:left;"> Presentation-Ready Data Summary and Analytic Result Tables </td> </tr> <tr> <td style="text-align:left;"> flextable </td> <td style="text-align:left;"> Functions for Tabular Reporting </td> </tr> <tr> <td style="text-align:left;"> Hmisc </td> <td style="text-align:left;"> Harrell Miscellaneous </td> </tr> 
</tbody> </table> --- ## Statistics **Analysis** <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Package </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> stats </td> <td style="text-align:left;"> The R Stats Package </td> </tr> <tr> <td style="text-align:left;"> survival </td> <td style="text-align:left;"> Survival Analysis </td> </tr> <tr> <td style="text-align:left;"> infer </td> <td style="text-align:left;"> Tidy Statistical Inference </td> </tr> <tr> <td style="text-align:left;"> rsample </td> <td style="text-align:left;"> General Resampling Infrastructure </td> </tr> <tr> <td style="text-align:left;"> broom </td> <td style="text-align:left;"> Convert Statistical Objects into Tidy Tibbles </td> </tr> <tr> <td style="text-align:left;"> finalfit </td> <td style="text-align:left;"> Quickly Create Elegant Regression Results Tables and Plots when Modelling </td> </tr> </tbody> </table> --- ## Statistical modeling <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Package </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> stats </td> <td style="text-align:left;"> The R Stats Package </td> </tr> <tr> <td style="text-align:left;"> survival </td> <td style="text-align:left;"> Survival Analysis </td> </tr> <tr> <td style="text-align:left;"> recipes </td> <td style="text-align:left;"> Preprocessing Tools to Create Design Matrices </td> </tr> <tr> <td style="text-align:left;"> rms </td> <td style="text-align:left;"> Regression Modeling Strategies </td> </tr> <tr> <td style="text-align:left;"> broom </td> <td style="text-align:left;"> Convert Statistical Objects into Tidy Tibbles </td> </tr> <tr> <td style="text-align:left;"> rsample </td> <td style="text-align:left;"> General Resampling Infrastructure </td> </tr> </tbody> 
</table> --- ## Machine Learning <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Package </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> caret </td> <td style="text-align:left;"> Classification and Regression Training </td> </tr> <tr> <td style="text-align:left;"> parsnip </td> <td style="text-align:left;"> A Common API to Modeling and Analysis Functions </td> </tr> <tr> <td style="text-align:left;"> yardstick </td> <td style="text-align:left;"> Tidy Characterizations of Model Performance </td> </tr> <tr> <td style="text-align:left;"> rpart </td> <td style="text-align:left;"> Recursive Partitioning and Regression Trees </td> </tr> <tr> <td style="text-align:left;"> party </td> <td style="text-align:left;"> A Laboratory for Recursive Partytioning </td> </tr> <tr> <td style="text-align:left;"> randomForest </td> <td style="text-align:left;"> Breiman and Cutler's Random Forests for Classification and Regression </td> </tr> <tr> <td style="text-align:left;"> baguette </td> <td style="text-align:left;"> Efficient Model Functions for Bagging </td> </tr> <tr> <td style="text-align:left;"> kernlab </td> <td style="text-align:left;"> Kernel-Based Machine Learning Lab </td> </tr> <tr> <td style="text-align:left;"> earth </td> <td style="text-align:left;"> Multivariate Adaptive Regression Splines </td> </tr> </tbody> </table> --- ## Reporting <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Package </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> rmarkdown </td> <td style="text-align:left;"> Dynamic Documents for R </td> </tr> <tr> <td style="text-align:left;"> knitr </td> <td style="text-align:left;"> A General-Purpose Package for Dynamic Report Generation in R </td> </tr> <tr> <td style="text-align:left;"> bookdown </td> <td 
style="text-align:left;"> Authoring Books and Technical Documents with R Markdown </td> </tr> <tr> <td style="text-align:left;"> distill </td> <td style="text-align:left;"> 'R Markdown' Format for Scientific and Technical Writing </td> </tr> <tr> <td style="text-align:left;"> rticles </td> <td style="text-align:left;"> Article Formats for R Markdown </td> </tr> <tr> <td style="text-align:left;"> blogdown </td> <td style="text-align:left;"> Create Blogs and Websites with R Markdown </td> </tr> <tr> <td style="text-align:left;"> flexdashboard </td> <td style="text-align:left;"> R Markdown Format for Flexible Dashboards </td> </tr> <tr> <td style="text-align:left;"> shiny </td> <td style="text-align:left;"> Web Application Framework for R </td> </tr> <tr> <td style="text-align:left;"> officer </td> <td style="text-align:left;"> Manipulation of Microsoft Word and PowerPoint Documents </td> </tr> </tbody> </table>
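
---

## Loading several packages at once

The rule "install once per computer, activate once per R session" can be sketched as a small helper for loading several of the packages listed above in one call. This is only a sketch in the spirit of `pacman::p_load`, not how **pacman** itself is implemented; `load_or_install` is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper (not from any package): install what is missing,
# then activate every requested package in the current R session
load_or_install <- function(pkgs) {
  # TRUE for each package that is already installed on this computer
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

  # Install only the packages that are missing
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }

  # Activate each package; character.only = TRUE lets library()
  # accept the package name as a character string
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

For example, `load_or_install(c("readr", "janitor"))` would install either package only if it is not already present, and then attach both.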