Classical statistical models

class: center, middle, inverse, title-slide

# Classical statistical models
### Abhijit Dasgupta
### BIOF 339

---

---
class: middle, center, inverse

# Statistical models

---
class: inverse, middle, center

.bigblockquote[All models are wrong, but some are useful]

.right[G.E.P. Box]

---

# Models

Models are our way of understanding nature, usually using some sort of mathematical expression

Famous mathematical models include Newton's second law of motion, the laws of thermodynamics, the ideal gas law

All probability distributions, like Gaussian, Binomial, Poisson, Gamma, are models

Mendel's laws are models **that result in** particular mathematical models for inheritance and population prevalence

---

# Models

We use models all the time to describe our understanding of different processes

+ Cause-and-effect relationships
--

+ Supply-demand curves
--

+ Financial planning
--

+ Optimizing travel plans (perhaps including traffic like _Google Maps_)
--

+ Understanding the effects of change
    - Climate change
    - Rule changes via Congress or companies
    - Effect of a drug on disease outcomes
    - Effect of education and behavioral patterns on future earnings
    
    
---

# Data-driven models

> Can we use data collected on various aspects of a particular context to understand the relationships between the different aspects?

+ How does increased smoking affect your risk of getting lung cancer? (causality/association)
    - Does genetics matter?
    - Does the kind of smoking matter?
    - Does gender matter?

---

# Data-driven models

> Can we use data collected on various aspects of a particular context to understand the relationships between the different aspects?

+ What is your lifetime risk of breast cancer? (prediction)
    - What if you have a sister with breast cancer?
    - What if you had early menarche?
    - What if you are of Ashkenazi Jewish heritage?
    
.right[[The Gail Model from NCI](https://bcrisktool.cancer.gov/)]

---

# Association models

These are more traditional, highly interpretative models that look at **how** different predictors
affect outcome.

+ Linear regression
+ Logistic regression
+ Cox proportional hazards regression
+ Decision trees

Since these models have a particular known structure determined by the modeler, they can 
be used on relatively small datasets

You can easily understand which predictors have more "weight" in influencing the outcome

You can literally write down how a prediction would be made

---

# Predictive models

These are more recent models that primarily look to provide good predictions of an outcome, and 
the way the predictions are made is left opaque (often called a _black box_)

+ Deep Learning (or Neural Networks)
+ Random Forests
+ Support Vector Machines
+ Gradient Boosting Machines

These models require data to both determine the structure of the model as well as make the predictions, so they require lots of data to _train_ on

The relative "weight" of predictors in influencing the **predictions** can be obtained

The effect of individual predictors is not easily interpretable, though this is changing

They require a different **philosophic perspective** than traditional association models

---

## R for statistical models

We've seen that R is great for data munging and data visualizations

R also can fit a wide variety of statistical models to data.

In fact, most new models first are implemented in R (see CRAN and GitHub)

Today we'll describe some standard popular models. Fitting most models follow the same pattern of code.

---

# Datasets

We will use the `pbc` data from the `survival` package, and the in-built `mtcars` dataset.

```r
library(survival)
str(pbc)
```

```
'data.frame':	418 obs. of  20 variables:
 $ id      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ time    : int  400 4500 1012 1925 1504 2503 1832 2466 2400 51 ...
 $ status  : int  2 0 2 2 1 2 0 2 2 2 ...
 $ trt     : int  1 1 1 1 2 2 2 2 1 2 ...
 $ age     : num  58.8 56.4 70.1 54.7 38.1 ...
 $ sex     : Factor w/ 2 levels "m","f": 2 2 1 2 2 2 2 2 2 2 ...
 $ ascites : int  1 0 0 0 0 0 0 0 0 1 ...
 $ hepato  : int  1 1 0 1 1 1 1 0 0 0 ...
 $ spiders : int  1 1 0 1 1 0 0 0 1 1 ...
 $ edema   : num  1 0 0.5 0.5 0 0 0 0 0 1 ...
 $ bili    : num  14.5 1.1 1.4 1.8 3.4 0.8 1 0.3 3.2 12.6 ...
 $ chol    : int  261 302 176 244 279 248 322 280 562 200 ...
 $ albumin : num  2.6 4.14 3.48 2.54 3.53 3.98 4.09 4 3.08 2.74 ...
 $ copper  : int  156 54 210 64 143 50 52 52 79 140 ...
 $ alk.phos: num  1718 7395 516 6122 671 ...
 $ ast     : num  137.9 113.5 96.1 60.6 113.2 ...
 $ trig    : int  172 88 55 92 72 63 213 189 88 143 ...
 $ platelet: int  190 221 151 183 136 NA 204 373 251 302 ...
 $ protime : num  12.2 10.6 12 10.3 10.9 11 9.7 11 11 11.5 ...
 $ stage   : int  4 3 4 4 3 3 3 3 2 4 ...
```

---
class: inverse, middle, center

# The formula interface

---

# Representing model relationships

In R, there is a particularly convenient way to express models, where you have

- one dependent variable
- one or more independent variables, with possible transformations and interactions

```
y ~ x1 + x2 + x1:x2 + I(x3^2) + x4*x5
```

`y` depends on ...
--

- `x1` and `x2` linearly
--

- the interaction of `x1` and `x2` (represented as `x1:x2`)
--

- the square of `x3` (the `I()` notation ensures that the `^` symbol is interpreted correctly)
--

- `x4`, `x5` and their interaction (same as `x4 + x5 + x4:x5`)

---

# Representing model relationships

```
y ~ x1 + x2 + x1:x2 + I(x3^2) + x4*x5
```

This interpretation holds for the vast majority of statistical models in R

- For decision trees and random forests and neural networks, don't add interactions or transformations, since the model will try to figure those out on their own

---

# Our first model

```r
myLinearModel <- lm(chol ~ bili + albumin + copper + sex, data = pbc)
```

Note that everything in R is an **object**, so you can store a model in a variable name.

This statement runs the model and stored the fitted model in `myLinearModel`

.bg-light-blue.b--blue.ba.bw2.br3.ph4.mt2.center[
R does not interpret the model, evaluate the adequacy or appropriateness of the model, or comment on whether looking at the relationship between cholesterol and bilirubin makes any kind of sense.

.f2[It just fits the model it is given]
]

---

# Our first model

```r
myLinearModel
```

```

Call:
lm(formula = chol ~ bili + albumin + copper + sex, data = pbc)

Coefficients:
(Intercept)         bili      albumin       copper         sexf  
   221.0571      22.7113      28.9076      -0.1888      -9.7605  
```

> Not very informative, is it?

---

# Our first model

```r
summary(myLinearModel)
```

```

Call:
lm(formula = chol ~ bili + albumin + copper + sex, data = pbc)

Residuals:
    Min      1Q  Median      3Q     Max 
-580.83  -90.62  -34.79   37.96 1297.16

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 221.0571   135.6962   1.629    0.104    
bili         22.7113     3.2821   6.920 3.14e-11 ***
albumin      28.9076    33.8309   0.854    0.394    
copper       -0.1888     0.1743  -1.083    0.280    
sexf         -9.7605    40.8253  -0.239    0.811    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 214.3 on 277 degrees of freedom
  (136 observations deleted due to missingness)
Multiple R-squared:  0.1638,	Adjusted R-squared:  0.1517 
F-statistic: 13.56 on 4 and 277 DF,  p-value: 4.147e-10
```

> A little better

???

Talk about the different metrics in this slide

---

# Our first model

```r
broom::tidy(myLinearModel)
```

```
# A tibble: 5 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  221.      136.        1.63  1.04e- 1
2 bili          22.7       3.28      6.92  3.14e-11
3 albumin       28.9      33.8       0.854 3.94e- 1
4 copper        -0.189     0.174    -1.08  2.80e- 1
5 sexf          -9.76     40.8      -0.239 8.11e- 1
```

```r
broom::glance(myLinearModel)
```

```
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.164         0.152  214.      13.6 4.15e-10     4 -1911. 3834. 3856.
# … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

---

# Our first model

.pull-left[

```r
library(gtsummary)
tbl_regression(myLinearModel)
```

#ajvcikbkxb .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#ajvcikbkxb .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#ajvcikbkxb .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#ajvcikbkxb .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 4px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#ajvcikbkxb .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ajvcikbkxb .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#ajvcikbkxb .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#ajvcikbkxb .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#ajvcikbkxb .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#ajvcikbkxb .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#ajvcikbkxb .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#ajvcikbkxb .gt_group_heading {
  padding: 8px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

#ajvcikbkxb .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#ajvcikbkxb .gt_from_md > :first-child {
  margin-top: 0;
}

#ajvcikbkxb .gt_from_md > :last-child {
  margin-bottom: 0;
}

#ajvcikbkxb .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#ajvcikbkxb .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 12px;
}

#ajvcikbkxb .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#ajvcikbkxb .gt_first_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
}

#ajvcikbkxb .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#ajvcikbkxb .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#ajvcikbkxb .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#ajvcikbkxb .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#ajvcikbkxb .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#ajvcikbkxb .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding: 4px;
}

#ajvcikbkxb .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#ajvcikbkxb .gt_sourcenote {
  font-size: 90%;
  padding: 4px;
}

#ajvcikbkxb .gt_left {
  text-align: left;
}

#ajvcikbkxb .gt_center {
  text-align: center;
}

#ajvcikbkxb .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#ajvcikbkxb .gt_font_normal {
  font-weight: normal;
}

#ajvcikbkxb .gt_font_bold {
  font-weight: bold;
}

#ajvcikbkxb .gt_font_italic {
  font-style: italic;
}

#ajvcikbkxb .gt_super {
  font-size: 65%;
}

#ajvcikbkxb .gt_footnote_marks {
  font-style: italic;
  font-size: 65%;
}
</style>
<div id="ajvcikbkxb" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;"><table class="gt_table">
  
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1"><strong>Characteristic</strong></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>Beta</strong></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>95% CI</strong><sup class="gt_footnote_marks">1</sup></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>p-value</strong></th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr>
      <td class="gt_row gt_left">bili</td>
      <td class="gt_row gt_center">23</td>
      <td class="gt_row gt_center">16, 29</td>
      <td class="gt_row gt_center"><0.001</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">albumin</td>
      <td class="gt_row gt_center">29</td>
      <td class="gt_row gt_center">-38, 96</td>
      <td class="gt_row gt_center">0.4</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">copper</td>
      <td class="gt_row gt_center">-0.19</td>
      <td class="gt_row gt_center">-0.53, 0.15</td>
      <td class="gt_row gt_center">0.3</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">sex</td>
      <td class="gt_row gt_center"></td>
      <td class="gt_row gt_center"></td>
      <td class="gt_row gt_center"></td>
    </tr>
    <tr>
      <td class="gt_row gt_left" style="text-align: left; text-indent: 10px;">m</td>
      <td class="gt_row gt_center">—</td>
      <td class="gt_row gt_center">—</td>
      <td class="gt_row gt_center"></td>
    </tr>
    <tr>
      <td class="gt_row gt_left" style="text-align: left; text-indent: 10px;">f</td>
      <td class="gt_row gt_center">-9.8</td>
      <td class="gt_row gt_center">-90, 71</td>
      <td class="gt_row gt_center">0.8</td>
    </tr>
  </tbody>
  
  <tfoot>
    <tr class="gt_footnotes">
      <td colspan="4">
        <p class="gt_footnote">
          <sup class="gt_footnote_marks">
            <em>1</em>
          </sup>
           
          CI = Confidence Interval
          <br />
        </p>
      </td>
    </tr>
  </tfoot>
</table></div>

]
.pull-right[

```r
library(stargazer)
stargazer(myLinearModel, type='html')
```

<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>chol</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">bili</td><td>22.711<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(3.282)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">albumin</td><td>28.908</td></tr>
<tr><td style="text-align:left"></td><td>(33.831)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">copper</td><td>-0.189</td></tr>
<tr><td style="text-align:left"></td><td>(0.174)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">sexf</td><td>-9.760</td></tr>
<tr><td style="text-align:left"></td><td>(40.825)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>221.057</td></tr>
<tr><td style="text-align:left"></td><td>(135.696)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>282</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.164</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.152</td></tr>
<tr><td style="text-align:left">Residual Std. Error</td><td>214.262 (df = 277)</td></tr>
<tr><td style="text-align:left">F Statistic</td><td>13.560<sup>***</sup> (df = 4; 277)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
  
]

---

# Our first model

We do need some sense as to how well this model fit the data

.pull-left[

```r
# install.packages('ggfortify')
library(ggfortify)
autoplot(myLinearModel)
```
]
.pull-right[
![](04-stat_models_files/figure-html/09-Modeling-21-1.png)
]

---

# Our first model

Let's see if we have some strangeness going on

--
.pull-left[

```r
ggplot(pbc, aes(x = bili))+geom_density()
```

We'd like this to be a bit more "Gaussian" for better behavior
]
.pull-right[
![](04-stat_models_files/figure-html/09-Modeling-23-1.png)

]

---

# Our first model

Let's see if we have some strangeness going on

.pull-left[

```r
ggplot(pbc, aes(x = log(bili)))+geom_density()
```

]
.pull-right[
![](04-stat_models_files/figure-html/09-Modeling-25-1.png)

]

---

# Our first model

```r
myLinearModel2 <- lm(chol~log(bili) + albumin + copper + sex, data = pbc)
summary(myLinearModel2)
```

```

Call:
lm(formula = chol ~ log(bili) + albumin + copper + sex, data = pbc)

Residuals:
    Min      1Q  Median      3Q     Max 
-448.77  -96.23  -26.77   40.76 1221.21

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 128.3685   132.9579   0.965   0.3351    
log(bili)   124.2339    14.8852   8.346 3.39e-15 ***
albumin      53.6093    33.2245   1.614   0.1078    
copper       -0.3775     0.1743  -2.166   0.0312 *  
sexf         19.6595    39.1715   0.502   0.6161    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 207.4 on 277 degrees of freedom
  (136 observations deleted due to missingness)
Multiple R-squared:  0.2163,	Adjusted R-squared:  0.205 
F-statistic: 19.11 on 4 and 277 DF,  p-value: 6.792e-14
```

---

# Our first model

```r
tbl_regression(myLinearModel2)
```

#dwmhhuuasw .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#dwmhhuuasw .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#dwmhhuuasw .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#dwmhhuuasw .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 4px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#dwmhhuuasw .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#dwmhhuuasw .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#dwmhhuuasw .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#dwmhhuuasw .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#dwmhhuuasw .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#dwmhhuuasw .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#dwmhhuuasw .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#dwmhhuuasw .gt_group_heading {
  padding: 8px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

#dwmhhuuasw .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#dwmhhuuasw .gt_from_md > :first-child {
  margin-top: 0;
}

#dwmhhuuasw .gt_from_md > :last-child {
  margin-bottom: 0;
}

#dwmhhuuasw .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#dwmhhuuasw .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 12px;
}

#dwmhhuuasw .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#dwmhhuuasw .gt_first_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
}

#dwmhhuuasw .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#dwmhhuuasw .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#dwmhhuuasw .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#dwmhhuuasw .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#dwmhhuuasw .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#dwmhhuuasw .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding: 4px;
}

#dwmhhuuasw .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#dwmhhuuasw .gt_sourcenote {
  font-size: 90%;
  padding: 4px;
}

#dwmhhuuasw .gt_left {
  text-align: left;
}

#dwmhhuuasw .gt_center {
  text-align: center;
}

#dwmhhuuasw .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#dwmhhuuasw .gt_font_normal {
  font-weight: normal;
}

#dwmhhuuasw .gt_font_bold {
  font-weight: bold;
}

#dwmhhuuasw .gt_font_italic {
  font-style: italic;
}

#dwmhhuuasw .gt_super {
  font-size: 65%;
}

#dwmhhuuasw .gt_footnote_marks {
  font-style: italic;
  font-size: 65%;
}
</style>
<div id="dwmhhuuasw" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;"><table class="gt_table">
  
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1"><strong>Characteristic</strong></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>Beta</strong></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>95% CI</strong><sup class="gt_footnote_marks">1</sup></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>p-value</strong></th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr>
      <td class="gt_row gt_left">log(bili)</td>
      <td class="gt_row gt_center">124</td>
      <td class="gt_row gt_center">95, 154</td>
      <td class="gt_row gt_center"><0.001</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">albumin</td>
      <td class="gt_row gt_center">54</td>
      <td class="gt_row gt_center">-12, 119</td>
      <td class="gt_row gt_center">0.11</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">copper</td>
      <td class="gt_row gt_center">-0.38</td>
      <td class="gt_row gt_center">-0.72, -0.03</td>
      <td class="gt_row gt_center">0.031</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">sex</td>
      <td class="gt_row gt_center"></td>
      <td class="gt_row gt_center"></td>
      <td class="gt_row gt_center"></td>
    </tr>
    <tr>
      <td class="gt_row gt_left" style="text-align: left; text-indent: 10px;">m</td>
      <td class="gt_row gt_center">—</td>
      <td class="gt_row gt_center">—</td>
      <td class="gt_row gt_center"></td>
    </tr>
    <tr>
      <td class="gt_row gt_left" style="text-align: left; text-indent: 10px;">f</td>
      <td class="gt_row gt_center">20</td>
      <td class="gt_row gt_center">-57, 97</td>
      <td class="gt_row gt_center">0.6</td>
    </tr>
  </tbody>
  
  <tfoot>
    <tr class="gt_footnotes">
      <td colspan="4">
        <p class="gt_footnote">
          <sup class="gt_footnote_marks">
            <em>1</em>
          </sup>
           
          CI = Confidence Interval
          <br />
        </p>
      </td>
    </tr>
  </tfoot>
</table></div>

---

# Our first model

```r
autoplot(myLinearModel2)
```

![](04-stat_models_files/figure-html/09-Modeling-27-1.png)

---

# Just the residual plot, please

```r
autoplot(myLinearModel2, which=1)
```

![](04-stat_models_files/figure-html/09-Modeling-28-1.png)

> OR

---

# Just the residual plot, please

```r
d <- broom::augment(myLinearModel2, newdata=pbc)
d
```

```
# A tibble: 418 × 22
      id  time status   trt   age sex   ascites hepato spiders edema  bili  chol
   <int> <int>  <int> <int> <dbl> <fct>   <int>  <int>   <int> <dbl> <dbl> <int>
 1     1   400      2     1  58.8 f           1      1       1   1    14.5   261
 2     2  4500      0     1  56.4 f           0      1       1   0     1.1   302
 3     3  1012      2     1  70.1 m           0      0       0   0.5   1.4   176
 4     4  1925      2     1  54.7 f           0      1       1   0.5   1.8   244
 5     5  1504      1     2  38.1 f           0      1       1   0     3.4   279
 6     6  2503      2     2  66.3 f           0      1       0   0     0.8   248
 7     7  1832      0     2  55.5 f           0      1       0   0     1     322
 8     8  2466      2     2  53.1 f           0      0       0   0     0.3   280
 9     9  2400      2     1  42.5 f           0      0       1   0     3.2   562
10    10    51      2     2  70.6 f           1      0       1   1    12.6   200
# … with 408 more rows, and 10 more variables: albumin <dbl>, copper <int>,
#   alk.phos <dbl>, ast <dbl>, trig <int>, platelet <int>, protime <dbl>,
#   stage <int>, .fitted <dbl>, .resid <dbl>
```

---

# Just the residual plot, please

```r
ggplot(d, aes(x = .fitted, y = .resid))+geom_point()+ geom_smooth(se=F)+
  labs(x = 'Fitted values', y = 'Residual values')+
  geom_hline(yintercept=0, linetype=2) + 
  theme_classic()
```

![](04-stat_models_files/figure-html/09-Modeling-30-1.png)

---

# Predictions

```r
head(predict(myLinearModel2, newdata = pbc))
```

```
       1        2        3        4        5        6 
560.7384 361.4248 277.4503 333.0571 435.3173 314.7947 
```

The `newdata` has to have the same format and components as the original data the model was trained on

---

# Categorical predictors

```r
myLM3 <- lm(chol ~ log(bili) + sex, data = pbc)
broom::tidy(myLM3)
```

```
# A tibble: 3 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    283.       36.6     7.71  2.14e-13
2 log(bili)       99.6      12.1     8.22  7.37e-15
3 sexf            32.5      37.8     0.858 3.92e- 1
```

R has a somewhat unfortunate notation for categorical varialbes here, as `{variable name}{level}`

---
class: inverse, middle, center

# Logistic regression

---

# The logistic transformation

For an outcome which is binary (0/1), what is really modeled is the **probability** that the outcome is 1, usually denoted by _p_.

However, we know `$0 \leq p \leq 1$`, so what if the model gives a prediction outside this range!!

The logistic transform takes _p_ to

$$
\text{logit}(p) = \log\left(\frac{p}{1-p}\right)
$$

and we model _logit(p)_, which has a range from `$-\infty$` to `$\infty$`

---

# Logistic regression

Logistic regression is a special case of a **generalized linear model**, so the function we use to run a logistic regression is `glm`

```r
myLR <- glm(spiders ~ albumin + bili + chol, data = pbc, family = binomial)
myLR
```

```

Call:  glm(formula = spiders ~ albumin + bili + chol, family = binomial, 
    data = pbc)

Coefficients:
(Intercept)      albumin         bili         chol  
  2.3326484   -0.9954927    0.0995915   -0.0003176

Degrees of Freedom: 283 Total (i.e. Null);  280 Residual
  (134 observations deleted due to missingness)
Null Deviance:	    341.4 
Residual Deviance: 315.2 	AIC: 323.2
```

- We have to add the `family = binomial` as an argument, since this is a special kind of GLM
- All these models only use complete data; they kick out rows with missing data

---

# Logistic regression

```r
broom::tidy(myLR)
```

```
# A tibble: 4 × 5
  term         estimate std.error statistic p.value
  <chr>           <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)  2.33      1.30         1.80  0.0717 
2 albumin     -0.995     0.362       -2.75  0.00595
3 bili         0.0996    0.0344       2.89  0.00381
4 chol        -0.000318  0.000615    -0.517 0.605  
```
--

```r
broom::glance(myLR)
```

```
# A tibble: 1 × 8
  null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
          <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1          341.     283  -158.  323.  338.     315.         280   284
```

---

# Logistic regression

```r
tbl_regression(myLR)
```

#skwxklzoim .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#skwxklzoim .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#skwxklzoim .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#skwxklzoim .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 4px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#skwxklzoim .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#skwxklzoim .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#skwxklzoim .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#skwxklzoim .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#skwxklzoim .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#skwxklzoim .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#skwxklzoim .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#skwxklzoim .gt_group_heading {
  padding: 8px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

#skwxklzoim .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#skwxklzoim .gt_from_md > :first-child {
  margin-top: 0;
}

#skwxklzoim .gt_from_md > :last-child {
  margin-bottom: 0;
}

#skwxklzoim .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#skwxklzoim .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 12px;
}

#skwxklzoim .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#skwxklzoim .gt_first_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
}

#skwxklzoim .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#skwxklzoim .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#skwxklzoim .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#skwxklzoim .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#skwxklzoim .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#skwxklzoim .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding: 4px;
}

#skwxklzoim .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#skwxklzoim .gt_sourcenote {
  font-size: 90%;
  padding: 4px;
}

#skwxklzoim .gt_left {
  text-align: left;
}

#skwxklzoim .gt_center {
  text-align: center;
}

#skwxklzoim .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#skwxklzoim .gt_font_normal {
  font-weight: normal;
}

#skwxklzoim .gt_font_bold {
  font-weight: bold;
}

#skwxklzoim .gt_font_italic {
  font-style: italic;
}

#skwxklzoim .gt_super {
  font-size: 65%;
}

#skwxklzoim .gt_footnote_marks {
  font-style: italic;
  font-size: 65%;
}
</style>
<div id="skwxklzoim" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;"><table class="gt_table">
  
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1"><strong>Characteristic</strong></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>log(OR)</strong><sup class="gt_footnote_marks">1</sup></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>95% CI</strong><sup class="gt_footnote_marks">1</sup></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>p-value</strong></th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr>
      <td class="gt_row gt_left">albumin</td>
      <td class="gt_row gt_center">-1.0</td>
      <td class="gt_row gt_center">-1.7, -0.30</td>
      <td class="gt_row gt_center">0.006</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">bili</td>
      <td class="gt_row gt_center">0.10</td>
      <td class="gt_row gt_center">0.04, 0.17</td>
      <td class="gt_row gt_center">0.004</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">chol</td>
      <td class="gt_row gt_center">0.00</td>
      <td class="gt_row gt_center">0.00, 0.00</td>
      <td class="gt_row gt_center">0.6</td>
    </tr>
  </tbody>
  
  <tfoot>
    <tr class="gt_footnotes">
      <td colspan="4">
        <p class="gt_footnote">
          <sup class="gt_footnote_marks">
            <em>1</em>
          </sup>
           
          OR = Odds Ratio, CI = Confidence Interval
          <br />
        </p>
      </td>
    </tr>
  </tfoot>
</table></div>

---

# Logistic regression

```r
tbl_regression(myLR, exponentiate = TRUE)
```

#mhgwadbbxg .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#mhgwadbbxg .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#mhgwadbbxg .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#mhgwadbbxg .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 4px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#mhgwadbbxg .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#mhgwadbbxg .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#mhgwadbbxg .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#mhgwadbbxg .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#mhgwadbbxg .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#mhgwadbbxg .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#mhgwadbbxg .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#mhgwadbbxg .gt_group_heading {
  padding: 8px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

#mhgwadbbxg .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#mhgwadbbxg .gt_from_md > :first-child {
  margin-top: 0;
}

#mhgwadbbxg .gt_from_md > :last-child {
  margin-bottom: 0;
}

#mhgwadbbxg .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#mhgwadbbxg .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 12px;
}

#mhgwadbbxg .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#mhgwadbbxg .gt_first_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
}

#mhgwadbbxg .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#mhgwadbbxg .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#mhgwadbbxg .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#mhgwadbbxg .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#mhgwadbbxg .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#mhgwadbbxg .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding: 4px;
}

#mhgwadbbxg .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#mhgwadbbxg .gt_sourcenote {
  font-size: 90%;
  padding: 4px;
}

#mhgwadbbxg .gt_left {
  text-align: left;
}

#mhgwadbbxg .gt_center {
  text-align: center;
}

#mhgwadbbxg .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#mhgwadbbxg .gt_font_normal {
  font-weight: normal;
}

#mhgwadbbxg .gt_font_bold {
  font-weight: bold;
}

#mhgwadbbxg .gt_font_italic {
  font-style: italic;
}

#mhgwadbbxg .gt_super {
  font-size: 65%;
}

#mhgwadbbxg .gt_footnote_marks {
  font-style: italic;
  font-size: 65%;
}
</style>
<div id="mhgwadbbxg" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;"><table class="gt_table">
  
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1"><strong>Characteristic</strong></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>OR</strong><sup class="gt_footnote_marks">1</sup></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>95% CI</strong><sup class="gt_footnote_marks">1</sup></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1"><strong>p-value</strong></th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr>
      <td class="gt_row gt_left">albumin</td>
      <td class="gt_row gt_center">0.37</td>
      <td class="gt_row gt_center">0.18, 0.74</td>
      <td class="gt_row gt_center">0.006</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">bili</td>
      <td class="gt_row gt_center">1.10</td>
      <td class="gt_row gt_center">1.04, 1.19</td>
      <td class="gt_row gt_center">0.004</td>
    </tr>
    <tr>
      <td class="gt_row gt_left">chol</td>
      <td class="gt_row gt_center">1.00</td>
      <td class="gt_row gt_center">1.00, 1.00</td>
      <td class="gt_row gt_center">0.6</td>
    </tr>
  </tbody>
  
  <tfoot>
    <tr class="gt_footnotes">
      <td colspan="4">
        <p class="gt_footnote">
          <sup class="gt_footnote_marks">
            <em>1</em>
          </sup>
           
          OR = Odds Ratio, CI = Confidence Interval
          <br />
        </p>
      </td>
    </tr>
  </tfoot>
</table></div>

---
# Predictions from logistic regression

```r
head(predict(myLR))
```

```
          1           2           3           4           5           6 
 1.10554163 -1.77506554 -1.04814132 -0.09414055 -0.93144911 -1.62851203 
```
--

These are on the "wrong" scale. We would expect probabilities
--

```r
head(predict(myLR, type='response'))
```

```
        1         2         3         4         5         6 
0.7512970 0.1449135 0.2595822 0.4764822 0.2826308 0.1640343 
```

or you can use `plogis(predict(myLR))` for the inverse logistic transform

---
class: inverse, middle, center

# Model selection

---

# How to get the "best" model

Generally getting to the best model involves

- looking at a lot of graphs
- Fitting lots of models
- Comparing the model fits to see what seems good

Sometimes if you have two models that fit about the same, you take the smaller, less complex model (Occam's Razor)

Generally it is not recommended that you use automated model selection methods. It screws up your error rates and may not be the right end result for your objectives

> Model building and selection is an art

---

# Clues to follow

You can look at the relative weights (size of coefficient and its p-value) of different predictors

- These weights will change once you change the model, so be aware of that

You can trim the number of variables based on collinearities

- If several variables are essentially measuring the same thing, use one of them

You can look at residuals for clues about transformations

You can look at graphs, as well as science, for clues about interactions (synergies and antagonisms)

---

# Automated model selection

```r
# install.packages('leaps')
library(leaps)
mtcars1 <- mtcars %>% mutate(across(c(cyl, vs:carb), as.factor))
all_subsets <- regsubsets(mpg~., data = mtcars1)
all_subsets
```

```
Subset selection object
Call: regsubsets.formula(mpg ~ ., data = mtcars1)
16 Variables  (and intercept)
      Forced in Forced out
cyl6      FALSE      FALSE
cyl8      FALSE      FALSE
disp      FALSE      FALSE
hp        FALSE      FALSE
drat      FALSE      FALSE
wt        FALSE      FALSE
qsec      FALSE      FALSE
vs1       FALSE      FALSE
am1       FALSE      FALSE
gear4     FALSE      FALSE
gear5     FALSE      FALSE
carb2     FALSE      FALSE
carb3     FALSE      FALSE
carb4     FALSE      FALSE
carb6     FALSE      FALSE
carb8     FALSE      FALSE
1 subsets of each size up to 8
Selection Algorithm: exhaustive
```

---

# Automated model selection

Which has the best R<sup>2</sup>?

```r
ind <- which.max(summary(all_subsets)$adjr2)
summary(all_subsets)$which[ind,]
```

```
(Intercept)        cyl6        cyl8        disp          hp        drat 
       TRUE        TRUE       FALSE       FALSE        TRUE       FALSE 
         wt        qsec         vs1         am1       gear4       gear5 
       TRUE       FALSE        TRUE        TRUE       FALSE       FALSE 
      carb2       carb3       carb4       carb6       carb8 
      FALSE       FALSE       FALSE       FALSE       FALSE 
```