Suppose you have an expression dataset, and want to run t-tests for several genes/probes comparing two conditions. We can use tidyverse concepts to do this. First, let’s get some data.
# BiocManager::install('affydata')
data(Dilution, package = 'affydata')
eset <- mas5(Dilution) # pre-proccessing
dat <- exprs(eset) # extract expressions
Dat <- as.data.frame(t(dat)) %>% rownames_to_column('sample') %>% as_tibble() %>%
mutate(scanner = as_factor(pData(Dilution)$scanner)) %>% # Add "scanner" from phenoData
select(sample, scanner, everything()) # Re-order columns
Now, this dataset has 12625 probesets, and we want to run a t-test on each one,
comparing the scanner types (which are of two types). There are a couple of ways to do this,
which take about the same amount of time, and create a tibble of results, with one row for each
probeset. The first is a direct application of purrr::map
, acknowledging that a tibble or data.frame is basically a list of columns, and map
can apply a function to each column and get the results.
results1 <- Dat %>% select(-sample, -scanner) %>% # Keep only probeset data
map(~ tidy(t.test(. ~ Dat$scanner))) %>% # T-test of each column against scanner, and tidy
bind_rows(.id = 'probe') # Puts it all together into a data frame
head(results1)
## # A tibble: 6 x 11
## probe estimate estimate1 estimate2 statistic p.value parameter conf.low
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 100_… 73.4 487. 414. 1.13 0.436 1.20 -490.
## 2 1000… 108. 809. 700. 103. 0.00231 1.22 99.6
## 3 1001… -23.9 75.9 99.8 -2.11 0.195 1.66 -83.6
## 4 1002… -18.9 150. 169. -1.57 0.341 1.12 -137.
## 5 1003… -23.8 42.7 66.5 -0.657 0.618 1.15 -362.
## 6 1004… 45.4 186. 141. 1.75 0.299 1.19 -183.
## # ... with 3 more variables: conf.high <dbl>, method <chr>,
## # alternative <chr>
The second approach is what is recommended in
the broom
package; you create a list-column of tibbles using nest
, one per probeset, run t-tests on each tibble using map
, and then unnest
to put it all together. This is done after
using gather
to convert the dataset from wide to tall form.
results2 <- Dat %>%
gather(probe, value, -sample, -scanner) %>% # wide-to-tall
nest(-probe) %>% # Create nested data frames using list-columns
mutate(tests = map(data, ~tidy(t.test(value ~ scanner, data = .)))) %>% # run t-tests
select(-data) %>%
unnest() # Create final data.frame
head(results2)
## # A tibble: 6 x 11
## probe estimate estimate1 estimate2 statistic p.value parameter conf.low
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 100_… 73.4 487. 414. 1.13 0.436 1.20 -490.
## 2 1000… 108. 809. 700. 103. 0.00231 1.22 99.6
## 3 1001… -23.9 75.9 99.8 -2.11 0.195 1.66 -83.6
## 4 1002… -18.9 150. 169. -1.57 0.341 1.12 -137.
## 5 1003… -23.8 42.7 66.5 -0.657 0.618 1.15 -362.
## 6 1004… 45.4 186. 141. 1.75 0.299 1.19 -183.
## # ... with 3 more variables: conf.high <dbl>, method <chr>,
## # alternative <chr>