We will use data from a gene expression experiment on chronic lymphocytic leukemia (CLL) patients, available from Bioconductor. You can download the data from the link in the code below and read it in with the read_csv function from the readr package, part of the tidyverse metapackage.
suppressPackageStartupMessages(library(tidyverse))
link <- 'https://dl.dropboxusercontent.com/s/op4ehzkc7ery96l/geneexpressions.csv'
download.file(link,'geneexpressions.csv')
dat <- read_csv('geneexpressions.csv')
# Drop the Affymetrix control probesets, whose names start with 'AFFX'
dat2 <- dat %>% select(-starts_with('AFFX'))
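To see what the negated starts_with selection does, here is a minimal sketch on a toy table. The column names and values are made up for illustration and are not taken from the CLL data.

```r
suppressPackageStartupMessages(library(tidyverse))

# Toy data: hypothetical sample and probeset names, not the real CLL columns
toy <- tibble(
  SampleID = c('S1', 'S2'),
  Disease = c('progres.', 'stable'),
  `AFFX-ctrl_at` = c(5.1, 5.3),
  `1000_at` = c(7.2, 6.9)
)

# The minus sign drops every column whose name begins with 'AFFX'
toy2 <- toy %>% select(-starts_with('AFFX'))
names(toy2)
```

All other columns, including SampleID and Disease, are kept in their original order.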
We then reshape the data from wide to long format with gather, making sure that we keep the SampleID and Disease columns intact and repeated over the rows of each probeset’s data.
dat3 <- dat2 %>% gather(probe, value, -SampleID, -Disease)
dat3 <- dat3 %>% filter(!is.na(Disease))
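The reshape and filter steps can be sketched on a small made-up table, which makes it easier to see how the identifier columns are repeated for each probeset. The names wide, long, long2, probe_a and probe_b are hypothetical. The second pipeline uses pivot_longer, which the tidyr documentation recommends over gather in newer releases; it is shown only as a possible equivalent, assuming tidyr 1.0 or later.

```r
suppressPackageStartupMessages(library(tidyverse))

# Toy wide table: one row per sample, one column per (hypothetical) probeset
wide <- tibble(
  SampleID = c('S1', 'S2', 'S3'),
  Disease = c('progres.', 'stable', NA),
  probe_a = c(7.2, 6.9, 7.0),
  probe_b = c(3.1, 3.4, 3.3)
)

# Stack the probeset columns into probe/value pairs, keeping SampleID and
# Disease as identifier columns, then drop samples with no disease label
long <- wide %>%
  gather(probe, value, -SampleID, -Disease) %>%
  filter(!is.na(Disease))

# Same reshape with pivot_longer; the rows come out in a different order
long2 <- wide %>%
  pivot_longer(-c(SampleID, Disease), names_to = 'probe', values_to = 'value') %>%
  filter(!is.na(Disease))

long
```

Each remaining sample now appears once per probeset, so the long table has one row for every sample and probeset combination.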
I created this dataset with the following code:
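# biocLite was the installer when this code was written; current Bioconductor
# releases use BiocManager::install('CLL') instead
source('http://bioconductor.org/biocLite.R')
biocLite('CLL')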
library(CLL)
library(affy)
data(CLLbatch)
data(disease)
d <- rma(CLLbatch)
dat <- t(as.data.frame(exprs(d)))
dat <- cbind(disease, dat)
write.csv(dat, file='geneexpressions.csv', row.names = FALSE)