Fixing Humpty Dumpty: Breaking and repairing a data set

Context

This is an exercise in breaking up a data set and putting it back together. The second part is what I want to show, but the first part is an interesting exercise by itself.

Breaking up a data set

We want to take a data set and split it into separate data sets, each of which contains the index column and one other column, and is named after that second column.
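As a rough illustration of both halves of the exercise, here is a minimal base-R sketch. The toy data frame, the index column name `id`, and the use of `merge()` for the reassembly are assumptions made for this example, not necessarily the approach taken in the full post.

```r
# Hypothetical example data; `id` plays the role of the index column
dat <- data.frame(
  id = 1:3,
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

# Every column except the index column gets its own data set
value_cols <- setdiff(names(dat), "id")

# Breaking up: one data frame per column, each keeping the index column,
# stored in a list whose names match the second column of each piece
pieces <- lapply(value_cols, function(col) dat[, c("id", col)])
names(pieces) <- value_cols

# Putting it back together: merge the pieces one after another on the index
restored <- Reduce(function(x, y) merge(x, y, by = "id"), pieces)
```

Keeping the pieces in a named list, rather than as loose objects in the workspace, makes the reassembly a single `Reduce()` call.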


Running many t-tests

Suppose you have an expression dataset and want to run t-tests for several genes/probes, comparing two conditions. We can use tidyverse concepts to do this. First, let’s get some data.

```r
# BiocManager::install('affydata')
library(affy)      # mas5()
library(Biobase)   # exprs(), pData()
library(tidyverse)

data(Dilution, package = 'affydata')
eset <- mas5(Dilution) # pre-processing
dat <- exprs(eset) # extract expressions
Dat <- as.data.frame(t(dat)) %>%
  rownames_to_column('sample') %>%
  as_tibble() %>%
  mutate(scanner = as_factor(pData(Dilution)$scanner)) %>% # Add "scanner" from phenoData
  select(sample, scanner, everything()) # Re-order columns
```

Now, this dataset has 12625 probesets, and we want to run a t-test on each one, comparing the two scanner types.
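One way the per-probe tests could be set up is sketched below on a small simulated tibble rather than the real expression data. The column names `probe_a` and `probe_b`, the simulated values, and the reshape-then-summarise strategy are assumptions for illustration; the full post may proceed differently.

```r
library(dplyr)
library(tidyr)

set.seed(1)

# Simulated stand-in for the expression data: 8 samples, 2 scanner types,
# and two hypothetical probes
toy <- tibble(
  sample = paste0("s", 1:8),
  scanner = factor(rep(c("1", "2"), each = 4)),
  probe_a = rnorm(8),
  probe_b = rnorm(8)
)

results <- toy %>%
  # Long format: one row per sample-probe combination
  pivot_longer(
    cols = c(-sample, -scanner),
    names_to = "probe",
    values_to = "expression"
  ) %>%
  # One Welch t-test per probe, comparing the two scanner types
  group_by(probe) %>%
  summarise(
    p_value = t.test(
      expression[scanner == "1"],
      expression[scanner == "2"]
    )$p.value
  )
```

The result has one row per probe with its p-value. With thousands of probes, the p-values would normally need a multiple-testing adjustment, for example with `p.adjust()`.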


Function documentation

Writing functions

We learned how to write functions in class this week. As a reminder, a function takes a particular structure, starting with the keyword function. In RStudio, if you type fun [TAB] (i.e., type fun and then hit the tab key), you get the following function scaffolding:

```r
name <- function(variables) {

}
```

name is the name of the function, the inputs/arguments go inside the (), and the lines of code that comprise the function go between the {} on separate lines.
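To make the scaffolding concrete, here is a small filled-in example. The function name `fahrenheit_to_celsius` and its argument `temp_f` are made up for illustration and do not come from the class material.

```r
# Hypothetical example: convert a temperature from Fahrenheit to Celsius
fahrenheit_to_celsius <- function(temp_f) {
  # The body goes between the braces, one statement per line
  temp_c <- (temp_f - 32) * 5 / 9
  # The value of the last expression is what the function returns
  temp_c
}

boiling <- fahrenheit_to_celsius(212)
```

Here `fahrenheit_to_celsius` takes the place of name, `temp_f` takes the place of variables, and the two lines between the braces are the function body.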


Installing rJava for R on the Mac

Introduction

Several R packages link to or use Java, and the rJava package provides that link between R and Java. The rJava package is notoriously finicky to install in R, given the quirks of the Mac OS X system, Java, and R. In fact, the general advice is that if you can avoid using rJava, you should. It is a conundrum that even the most experienced R hands would like to avoid.


Summarize outcomes into new variables

An example

Suppose you have a fairly detailed set of outcomes for cause of death. You want to summarize these outcomes into categories and create new variables indicating whether a person’s death could be attributed to a particular category or not. For example, suppose deaths from fentanyl are recorded as one of:

- Fentanyl Intoxication
- Fentanyl and Morphine Intoxication
- Combined Drug Intoxication (Heroin, Cocaine and Fentanyl)
- Fentanyl-involved

These are all possibilities in your dataset under a variable Cause_of_death.
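A minimal sketch of how such an indicator could be built in base R is shown below. The toy data frame, the extra "Cardiac Arrest" row, and the new column name `fentanyl` are assumptions for illustration, and pattern matching with `grepl()` is only one possible approach.

```r
# Toy data using the causes listed above, plus one hypothetical
# non-fentanyl cause added for contrast
deaths <- data.frame(
  Cause_of_death = c(
    "Fentanyl Intoxication",
    "Fentanyl and Morphine Intoxication",
    "Combined Drug Intoxication (Heroin, Cocaine and Fentanyl)",
    "Fentanyl-involved",
    "Cardiac Arrest"
  )
)

# TRUE when the recorded cause mentions fentanyl anywhere, in any letter case
deaths$fentanyl <- grepl("fentanyl", deaths$Cause_of_death, ignore.case = TRUE)
```

The same pattern can be repeated with other search terms to create one indicator column per category.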
