Bioconductor

December 11th, 2018

Bioconductor

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic and biological data, using R.

1476 packages
Covers the bioinformatic pipeline
Software
Annotation
Experiments

Explore Bioconductor website

Installing Bioconductor packages

This is different from the usual install.packages. If you are using a version of R less than 3.5, the method is:

source('http://bioconductor.org/biocLite.R')
biocLite(c('Biobase','limma','hgu95av2.db','Biostrings'))

Otherwise (for R version 3.5 and later):

install.packages("BiocManager")
BiocManager::install(c('Biobase','limma','hgu95av2.db','Biostrings'))

DNA sequences

library(Biostrings)
dna <- DNAStringSet(c("AACAT", "GGCGCCT"))
reverseComplement(dna)

#    A DNAStringSet instance of length 2
#      width seq
#  [1]     5 ATGTT
#  [2]     7 AGGCGCC

DNA sequences

library(Biostrings)
data("phiX174Phage")
phiX174Phage

#    A DNAStringSet instance of length 6
#      width seq                                          names               
#  [1]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA Genbank
#  [2]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA RF70s
#  [3]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA SS78
#  [4]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA Bull
#  [5]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA G97
#  [6]  5386 GAGTTTTATCGCTTCCATGAC...ATTGGCGTATCCAACCTGCA NEB03

DNA sequences

letterFrequency(phiX174Phage, 'GC', as.prob=TRUE)

#             G|C
#  [1,] 0.4476420
#  [2,] 0.4472707
#  [3,] 0.4472707
#  [4,] 0.4470850
#  [5,] 0.4472707
#  [6,] 0.4470850

Data in Bioconductor

The basic structure for expression data in a Bioconductor pipeline is the ExpressionSet

library(Biobase)
data("sample.ExpressionSet")
str(sample.ExpressionSet)

#  Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
#    ..@ experimentData   :Formal class 'MIAME' [package "Biobase"] with 13 slots
#    .. .. ..@ name             : chr "Pierre Fermat"
#    .. .. ..@ lab              : chr "Francis Galton Lab"
#    .. .. ..@ contact          : chr "pfermat@lab.not.exist"
#    .. .. ..@ title            : chr "Smoking-Cancer Experiment"
#    .. .. ..@ abstract         : chr "An example object of expression set (ExpressionSet) class"
#    .. .. ..@ url              : chr "www.lab.not.exist"
#    .. .. ..@ pubMedIds        : chr ""
#    .. .. ..@ samples          : list()
#    .. .. ..@ hybridizations   : list()
#    .. .. ..@ normControls     : list()
#    .. .. ..@ preprocessing    : list()
#    .. .. ..@ other            :List of 1
#    .. .. .. ..$ notes: chr "An example object of expression set (exprSet) class"
#    .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
#    .. .. .. .. ..@ .Data:List of 2
#    .. .. .. .. .. ..$ : int [1:3] 1 0 0
#    .. .. .. .. .. ..$ : int [1:3] 1 1 0
#    ..@ assayData        :<environment: 0x7fe8eb96e078> 
#    ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#    .. .. ..@ varMetadata      :'data.frame':  3 obs. of  1 variable:
#    .. .. .. ..$ labelDescription: chr [1:3] "Female/Male" "Case/Control" "Testing Score"
#    .. .. ..@ data             :'data.frame':  26 obs. of  3 variables:
#    .. .. .. ..$ sex  : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 2 2 2 1 2 ...
#    .. .. .. ..$ type : Factor w/ 2 levels "Case","Control": 2 1 2 1 1 2 1 1 1 2 ...
#    .. .. .. ..$ score: num [1:26] 0.75 0.4 0.73 0.42 0.93 0.22 0.96 0.79 0.37 0.63 ...
#    .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
#    .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
#    .. .. .. .. ..@ .Data:List of 1
#    .. .. .. .. .. ..$ : int [1:3] 1 1 0
#    ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#    .. .. ..@ varMetadata      :'data.frame':  0 obs. of  1 variable:
#    .. .. .. ..$ labelDescription: chr(0) 
#    .. .. ..@ data             :'data.frame':  500 obs. of  0 variables
#    .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
#    .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
#    .. .. .. .. ..@ .Data:List of 1
#    .. .. .. .. .. ..$ : int [1:3] 1 1 0
#    ..@ annotation       : chr "hgu95av2"
#    ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
#    .. .. ..@ varMetadata      :'data.frame':  0 obs. of  1 variable:
#    .. .. .. ..$ labelDescription: chr(0) 
#    .. .. ..@ data             :'data.frame':  26 obs. of  0 variables
#    .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
#    .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
#    .. .. .. .. ..@ .Data:List of 1
#    .. .. .. .. .. ..$ : int [1:3] 1 1 0
#    ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
#    .. .. ..@ .Data:List of 4
#    .. .. .. ..$ : int [1:3] 2 13 0
#    .. .. .. ..$ : int [1:3] 2 11 5
#    .. .. .. ..$ : int [1:3] 1 3 0
#    .. .. .. ..$ : int [1:3] 1 0 0

Differences with usual R

Instead of storing data in named lists, ExpressionSet objects store data in slots, and we can see what the slots are with slotNames:

slotNames(sample.ExpressionSet)

#  [1] "experimentData"    "assayData"         "phenoData"        
#  [4] "featureData"       "annotation"        "protocolData"     
#  [7] ".__classVersion__"

Differences with usual R

You can access these slots using @, instead of the usual $:

sample.ExpressionSet@phenoData

#  An object of class 'AnnotatedDataFrame'
#    sampleNames: A B ... Z (26 total)
#    varLabels: sex type score
#    varMetadata: labelDescription

Differences with usual R

However, it’s much easier to go with the built-in functions

pData(sample.ExpressionSet)

#       sex    type score
#  A Female Control  0.75
#  B   Male    Case  0.40
#  C   Male Control  0.73
#  D   Male    Case  0.42
#  E Female    Case  0.93
#  F   Male Control  0.22
#  G   Male    Case  0.96
#  H   Male    Case  0.79
#  I Female    Case  0.37
#  J   Male Control  0.63
#  K   Male    Case  0.26
#  L Female Control  0.36
#  M   Male    Case  0.41
#  N   Male    Case  0.80
#  O Female    Case  0.10
#  P Female Control  0.41
#  Q Female    Case  0.16
#  R   Male Control  0.72
#  S   Male    Case  0.17
#  T Female    Case  0.74
#  U   Male Control  0.35
#  V Female Control  0.77
#  W   Male Control  0.27
#  X   Male Control  0.98
#  Y Female    Case  0.94
#  Z Female    Case  0.32

Differences with usual R

head(exprs(sample.ExpressionSet))

#                         A         B        C        D        E       F
#  AFFX-MurIL2_at  192.7420  85.75330 176.7570 135.5750 64.49390 76.3569
#  AFFX-MurIL10_at  97.1370 126.19600  77.9216  93.3713 24.39860 85.5088
#  AFFX-MurIL4_at   45.8192   8.83135  33.0632  28.7072  5.94492 28.2925
#  AFFX-MurFAS_at   22.5445   3.60093  14.6883  12.3397 36.86630 11.2568
#  AFFX-BioB-5_at   96.7875  30.43800  46.1271  70.9319 56.17440 42.6756
#  AFFX-BioB-M_at   89.0730  25.84610  57.2033  69.9766 49.58220 26.1262
#                         G       H       I         J        K       L
#  AFFX-MurIL2_at  160.5050 65.9631 56.9039 135.60800 63.44320 78.2126
#  AFFX-MurIL10_at  98.9086 81.6932 97.8015  90.48380 70.57330 94.5418
#  AFFX-MurIL4_at   30.9694 14.7923 14.2399  34.48740 20.35210 14.1554
#  AFFX-MurFAS_at   23.0034 16.2134 12.0375   4.54978  8.51782 27.2852
#  AFFX-BioB-5_at   86.5156 30.7927 19.7183  46.35200 39.13260 41.7698
#  AFFX-BioB-M_at   75.0083 42.3352 41.1207  91.53070 39.91360 49.8397
#                        M       N       O       P        Q        R
#  AFFX-MurIL2_at  83.0943 89.3372 91.0615 95.9377 179.8450 152.4670
#  AFFX-MurIL10_at 75.3455 68.5827 87.4050 84.4581  87.6806 108.0320
#  AFFX-MurIL4_at  20.6251 15.9231 20.1579 27.8139  32.7911  33.5292
#  AFFX-MurFAS_at  10.1616 20.2488 15.7849 14.3276  15.9488  14.6753
#  AFFX-BioB-5_at  80.2197 36.4903 36.4021 35.3054  58.6239 114.0620
#  AFFX-BioB-M_at  63.4794 24.7007 47.4641 47.3578  58.1331 104.1220
#                          S       T         U        V       W         X
#  AFFX-MurIL2_at  180.83400 85.4146 157.98900 146.8000 93.8829 103.85500
#  AFFX-MurIL10_at 134.26300 91.4031  -8.68811  85.0212 79.2998  71.65520
#  AFFX-MurIL4_at   19.81720 20.4190  26.87200  31.1488 22.3420  19.01350
#  AFFX-MurFAS_at   -7.91911 12.8875  11.91860  12.8324 11.1390   7.55564
#  AFFX-BioB-5_at   93.44020 22.5168  48.64620  90.2215 42.0053  57.57380
#  AFFX-BioB-M_at  115.83100 58.1224  73.42210  64.6066 40.3068  41.82090
#                        Y         Z
#  AFFX-MurIL2_at  64.4340 175.61500
#  AFFX-MurIL10_at 64.2369  78.70680
#  AFFX-MurIL4_at  12.1686  17.37800
#  AFFX-MurFAS_at  19.9849   8.96849
#  AFFX-BioB-5_at  44.8216  61.70440
#  AFFX-BioB-M_at  46.1087  49.41220

Accessing Features (probes)

affyIDs <- rownames(sample.ExpressionSet@featureData)
affyIDs[200:203]

#  [1] "31439_f_at" "31440_at"   "31441_at"   "31442_at"

The IDs for affy probes are singularly unhelpful if we wish to analyze our expression data with respect to genes or transcripts. To address this problem here (and in other data sets) we can turn to the “biomaRt” library.

BiomaRt

“The biomaRt package, provides an interface to a growing collection of databases implementing the BioMart software suite.”

To use the biomaRt package, we will have to first select a BioMart database and a dataset from that database to query.

Selecting a Database

# BiocManager::install('biomaRt')
library("biomaRt")
ensemblDatabase <- useMart("ensembl")

Selecting a Dataset

searchDatasets(mart = ensemblDatabase, pattern = "Human")

#                   dataset              description    version
#  54 hsapiens_gene_ensembl Human genes (GRCh38.p12) GRCh38.p12

ensemblHumanData <- useMart("ensembl",dataset="hsapiens_gene_ensembl")

Identifying Attributes

searchAttributes(mart = ensemblHumanData, pattern = "affy")

#                       name                 description         page
#  95           affy_hc_g110          AFFY HC G110 probe feature_page
#  96          affy_hg_focus         AFFY HG Focus probe feature_page
#  97          affy_hg_u133a         AFFY HG U133A probe feature_page
#  98        affy_hg_u133a_2       AFFY HG U133A 2 probe feature_page
#  99          affy_hg_u133b         AFFY HG U133B probe feature_page
#  100   affy_hg_u133_plus_2   AFFY HG U133 Plus 2 probe feature_page
#  101          affy_hg_u95a          AFFY HG U95A probe feature_page
#  102        affy_hg_u95av2        AFFY HG U95Av2 probe feature_page
#  103          affy_hg_u95b          AFFY HG U95B probe feature_page
#  104          affy_hg_u95c          AFFY HG U95C probe feature_page
#  105          affy_hg_u95d          AFFY HG U95D probe feature_page
#  106          affy_hg_u95e          AFFY HG U95E probe feature_page
#  107          affy_hta_2_0          AFFY HTA 2 0 probe feature_page
#  108   affy_huex_1_0_st_v2   AFFY HuEx 1 0 st v2 probe feature_page
#  109         affy_hugenefl         AFFY HuGeneFL probe feature_page
#  110 affy_hugene_1_0_st_v1 AFFY HuGene 1 0 st v1 probe feature_page
#  111 affy_hugene_2_0_st_v1 AFFY HuGene 2 0 st v1 probe feature_page
#  112        affy_primeview        AFFY PrimeView probe feature_page
#  113         affy_u133_x3p         AFFY U133 X3P probe feature_page

Identifying Attributes (Part 2)

searchAttributes(mart = ensemblHumanData, pattern = "hgnc")

#                name             description         page
#  58         hgnc_id                 HGNC ID feature_page
#  59     hgnc_symbol             HGNC symbol feature_page
#  60 hgnc_trans_name HGNC transcript name ID feature_page

Querying the Dataset

getBM(attributes = c('affy_hg_u95av2', 'hgnc_symbol'),
      filters = "affy_hg_u95av2",
      values = affyIDs[200:203],
      mart = ensemblHumanData)

#    affy_hg_u95av2 hgnc_symbol
#  1       31440_at        TCF7
#  2     31439_f_at        RHCE
#  3     31439_f_at         RHD

Bioconductor ecosystem

Taken from <a href = “http://bioconductor.org/help/course-materials/2017/OSU/B1_Bioconductor_intro.html”, target="_blank">Morgan’s Bioconductor Tutorial

A RNA-Seq pipeline (link)

Goals

Exploratory data analysis
Differential expression analysis with DESeq2
Visualization
We will start after reads have been aligned to a reference genome and reads overlapping known genes have been counted

The experiment

In the experiment, four primary human airway smooth muscle cell lines were treated with 1 micromolar dexamethasone for 18 hours.
For each of the four cell lines, we have a treated and an untreated sample.

Start with prepared SummarizedExperiment

# BiocManager::install('airway')
library(airway)
data(airway)
se <- airway
head(assay(se))

#                  SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#  ENSG00000000003        679        448        873        408       1138
#  ENSG00000000005          0          0          0          0          0
#  ENSG00000000419        467        515        621        365        587
#  ENSG00000000457        260        211        263        164        245
#  ENSG00000000460         60         55         40         35         78
#  ENSG00000000938          0          0          2          0          1
#                  SRR1039517 SRR1039520 SRR1039521
#  ENSG00000000003       1047        770        572
#  ENSG00000000005          0          0          0
#  ENSG00000000419        799        417        508
#  ENSG00000000457        331        233        229
#  ENSG00000000460         63         76         60
#  ENSG00000000938          0          0          0

Metadata for experiment

colData(se)

#  DataFrame with 8 rows and 9 columns
#             SampleName     cell      dex    albut        Run avgLength
#               <factor> <factor> <factor> <factor>   <factor> <integer>
#  SRR1039508 GSM1275862   N61311    untrt    untrt SRR1039508       126
#  SRR1039509 GSM1275863   N61311      trt    untrt SRR1039509       126
#  SRR1039512 GSM1275866  N052611    untrt    untrt SRR1039512       126
#  SRR1039513 GSM1275867  N052611      trt    untrt SRR1039513        87
#  SRR1039516 GSM1275870  N080611    untrt    untrt SRR1039516       120
#  SRR1039517 GSM1275871  N080611      trt    untrt SRR1039517       126
#  SRR1039520 GSM1275874  N061011    untrt    untrt SRR1039520       101
#  SRR1039521 GSM1275875  N061011      trt    untrt SRR1039521        98
#             Experiment    Sample    BioSample
#               <factor>  <factor>     <factor>
#  SRR1039508  SRX384345 SRS508568 SAMN02422669
#  SRR1039509  SRX384346 SRS508567 SAMN02422675
#  SRR1039512  SRX384349 SRS508571 SAMN02422678
#  SRR1039513  SRX384350 SRS508572 SAMN02422670
#  SRR1039516  SRX384353 SRS508575 SAMN02422682
#  SRR1039517  SRX384354 SRS508576 SAMN02422673
#  SRR1039520  SRX384357 SRS508579 SAMN02422683
#  SRR1039521  SRX384358 SRS508580 SAMN02422677

Genomic ranges over which counting occurred

rowRanges(se)

#  GRangesList object of length 64102:
#  $ENSG00000000003 
#  GRanges object with 17 ranges and 2 metadata columns:
#         seqnames            ranges strand |   exon_id       exon_name
#            <Rle>         <IRanges>  <Rle> | <integer>     <character>
#     [1]        X 99883667-99884983      - |    667145 ENSE00001459322
#     [2]        X 99885756-99885863      - |    667146 ENSE00000868868
#     [3]        X 99887482-99887565      - |    667147 ENSE00000401072
#     [4]        X 99887538-99887565      - |    667148 ENSE00001849132
#     [5]        X 99888402-99888536      - |    667149 ENSE00003554016
#     ...      ...               ...    ... .       ...             ...
#    [13]        X 99890555-99890743      - |    667156 ENSE00003512331
#    [14]        X 99891188-99891686      - |    667158 ENSE00001886883
#    [15]        X 99891605-99891803      - |    667159 ENSE00001855382
#    [16]        X 99891790-99892101      - |    667160 ENSE00001863395
#    [17]        X 99894942-99894988      - |    667161 ENSE00001828996
#  
#  ...
#  <64101 more elements>
#  -------
#  seqinfo: 722 sequences (1 circular) from an unspecified genome

Create a DESeqDataSet

# BiocManager::install('DESeq2')
library("DESeq2")
dds <- DESeqDataSet(se, design = ~ cell + dex)
dds

#  class: DESeqDataSet 
#  dim: 64102 8 
#  metadata(2): '' version
#  assays(1): counts
#  rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
#  rowData names(0):
#  colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
#  colData names(9): SampleName cell ... Sample BioSample

Fix factor label

dds$dex <- relevel(dds$dex, "untrt")

Run differential expression pipeline

dds <- DESeq(dds)

#  estimating size factors

#  estimating dispersions

#  gene-wise dispersion estimates

#  mean-dispersion relationship

#  final dispersion estimates

#  fitting model and testing

Extracting results

(res <- results(dds))

#  log2 fold change (MLE): dex trt vs untrt 
#  Wald test p-value: dex trt vs untrt 
#  DataFrame with 64102 rows and 6 columns
#                          baseMean      log2FoldChange             lfcSE
#                         <numeric>           <numeric>         <numeric>
#  ENSG00000000003 708.602169691234  -0.381253887429316 0.100654430187038
#  ENSG00000000005                0                  NA                NA
#  ENSG00000000419 520.297900552084   0.206812715390385 0.112218674572541
#  ENSG00000000457 237.163036796015  0.0379205923945968 0.143444716340173
#  ENSG00000000460 57.9326331250967 -0.0881676962637897 0.287141995230742
#  ...                          ...                 ...               ...
#  LRG_94                         0                  NA                NA
#  LRG_96                         0                  NA                NA
#  LRG_97                         0                  NA                NA
#  LRG_98                         0                  NA                NA
#  LRG_99                         0                  NA                NA
#                                stat               pvalue               padj
#                           <numeric>            <numeric>          <numeric>
#  ENSG00000000003  -3.78775069036569 0.000152017272634607 0.0012836381222748
#  ENSG00000000005                 NA                   NA                 NA
#  ENSG00000000419   1.84294384315417   0.0653372100766968  0.196545840691255
#  ENSG00000000457  0.264356843264061    0.791504963002084  0.911458000845924
#  ENSG00000000460 -0.307052600205483    0.758803335537906   0.89503444995272
#  ...                            ...                  ...                ...
#  LRG_94                          NA                   NA                 NA
#  LRG_96                          NA                   NA                 NA
#  LRG_97                          NA                   NA                 NA
#  LRG_98                          NA                   NA                 NA
#  LRG_99                          NA                   NA                 NA

Summarizing results

summary(res)

#  
#  out of 33469 with nonzero total read count
#  adjusted p-value < 0.1
#  LFC > 0 (up)       : 2604, 7.8%
#  LFC < 0 (down)     : 2215, 6.6%
#  outliers [1]       : 0, 0%
#  low counts [2]     : 15441, 46%
#  (mean count < 5)
#  [1] see 'cooksCutoff' argument of ?results
#  [2] see 'independentFiltering' argument of ?results

Summarizing results

library(tidyverse)
as.data.frame(res) %>% 
  rownames_to_column(var = 'ID') %>% 
  filter(padj < 0.1) %>% 
  arrange(desc(abs(log2FoldChange))) %>% head()

#                 ID   baseMean log2FoldChange     lfcSE      stat
#  1 ENSG00000179593  67.243048       9.505972 1.0545022  9.014654
#  2 ENSG00000109906 385.071029       7.352628 0.5363902 13.707610
#  3 ENSG00000250978  56.318194       6.327384 0.6777974  9.335214
#  4 ENSG00000132518   5.654654       5.885112 1.3240432  4.444803
#  5 ENSG00000128285   6.624741      -5.325905 1.2578165 -4.234247
#  6 ENSG00000127954 286.384119       5.207160 0.4930828 10.560419
#          pvalue         padj
#  1 1.974931e-19 1.253664e-17
#  2 9.141988e-43 2.257695e-40
#  3 1.007873e-20 7.210289e-19
#  4 8.797236e-06 1.000609e-04
#  5 2.293191e-05 2.380061e-04
#  6 4.546302e-26 5.059304e-24

A visualization

topGene <- rownames(res)[which.min(res$padj)]
dat <- plotCounts(dds, gene=topGene, intgroup=c("dex"), returnData=TRUE)
ggplot(dat, aes(x = dex, y = count, fill=dex))+
  geom_dotplot(binaxis='y', stackdir='center')+
  scale_y_log10()

Another visualization

plotMA(res, ylim=c(-5,5))

Making a heatmap

Heatmaps

There are several ways of doing heatmaps in R:

Some example data

library(Biobase)
data(sample.ExpressionSet)
exdat <-  sample.ExpressionSet
library(limma)
design1 <- model.matrix(~type, data=pData(exdat))
lm1 <- lmFit(exprs(exdat), design1)
lm1 <- eBayes(lm1) # compute linear model for each probeset
geneID <- rownames(topTable(lm1, coef=2, num=100, adjust='none',p.value=0.05))
exdat2 <- exdat[geneID,] # Keep features with p-values < 0.05
exdat2

#  ExpressionSet (storageMode: lockedEnvironment)
#  assayData: 46 features, 26 samples 
#    element names: exprs, se.exprs 
#  protocolData: none
#  phenoData
#    sampleNames: A B ... Z (26 total)
#    varLabels: sex type score
#    varMetadata: labelDescription
#  featureData: none
#  experimentData: use 'experimentData(object)'
#  Annotation: hgu95av2

Heatmaps using `Heatplus`

source('http://bioconductor.org/biocLite.R')
biocLite('Heatplus')

# BiocManager::install('Heatplus')
library(Heatplus)
reg1 <- regHeatmap(exprs(exdat2), legend=2, col=heat.colors, 
                   breaks=-3:3)
plot(reg1)

corrdist <- function(x) as.dist(1-cor(t(x)))
hclust.avl <-  function(x) hclust(x, method='average')
reg2 <- regHeatmap(exprs(exdat2), legend=2, col=heat.colors,
                   breaks=-3:3, 
                   dendrogram = list(clustfun=hclust.avl, distfun=corrdist))
plot(reg2)

ann1 <- annHeatmap(exprs(exdat2), ann=pData(exdat2), col = heat.colors)
plot(ann1)

ann2 <- annHeatmap(exprs(exdat2), ann=pData(exdat2), col = heat.colors,
                   cluster = list(cuth=7500,
                                  label=c('Control-like','Case-like')))
plot(ann2)

Put your mouse over each point :)

# install.packages('d3heatmap')
library(d3heatmap)
d3heatmap(exprs(exdat2), distfun = corrdist, 
          hclustfun = hclust.avl, scale='row')

Put your mouse over each point :)

Bioconductor

Bioconductor

Explore Bioconductor website

Installing Bioconductor packages

DNA sequences

DNA sequences

DNA sequences

Data in Bioconductor

Differences with usual R

Differences with usual R

Differences with usual R

Differences with usual R

Accessing Features (probes)

BiomaRt

BiomaRt

Selecting a Database

Selecting a Dataset

Identifying Attributes

Identifying Attributes (Part 2)

Querying the Dataset

Bioconductor ecosystem

A RNA-Seq pipeline (link)

Goals

The experiment

Start with prepared SummarizedExperiment

Metadata for experiment

Genomic ranges over which counting occurred

Create a DESeqDataSet

Fix factor label

Run differential expression pipeline

Extracting results

Summarizing results

Summarizing results

A visualization

Another visualization

Making a heatmap

Heatmaps

Some example data

Heatmaps using Heatplus

Heatmaps using `Heatplus`