DEGs

By default, iDEP runs differential gene expression analysis on all pairs of sample groups, which are defined by parsing sample names.  Users can choose their interested comparisons by clicking on the “Select factors & comparisons” button. If a user have 3 sample groups Ctrl, TrtA, and TrtB, all possible pair-wise comparisons will be listed:

  • Ctrl vs. TrtA
  • Ctrl vs. TrtB
  • TrtA vs. Ctrl
  • TrtA vs. TrtB
  • TrtB vs. Ctrl
  • TrtB vs. TrtA.

The “Ctrl vs. TrtA” comparison will be denoted as “TrtA-Ctrl” in DEG1 and DEG2 tabs.  If sample groups are labeled as “WT_Ctrl” and “Mu_Ctrl”, then the “WT_Ctrl-Mu_Ctrl” comparison will represent the contrast between wild-type and mutant under control condition.

DEG1 tab summarizes the numbers of differentially expressed genes (DEGs) on the main panel, both as a bar graph and a table below. This tab also enables the users to adjust for fold-change and FDR cutoffs. Venn diagrams showing the overlaps of DEGs of different comparisons can be obtained by the “Venn Diagram” button. Only 5 gene lists can be selected. Users also have the option to split DEGs for each comparison into up- and down-regulated.

“Ctrl vs. TrtA” and “TrtA vs. Ctrl” represent the same comparison but in opposite directions. The results are the same except the up- or down-regulated genes are switched.  In this case, obviously, users should choose TrtA vs. Ctrl, as it makes more sense.

If users upload a sample information file, they can use information in the file to build statistical models. See data format page for details.  Users can select up to 6 factors when using DESeq2; only 2 are allowed for limma-voom or limma-trend.  One factor can be chosen for batch effect.

When using DESeq2, users can select the reference levels for each factor. Reference levels are important to interpret the results of DESeq2. See my note here. Usually, we select baseline levels such as control over treatments, wild-type over mutants, etc.  Reference levels are default levels when constructing comparisons for other factors. For the demo data, there are two factors p53 (wt or null) , and Treatment (mock or ionizing radiation, IR). The reference levels are “wt” for p53 and “mock” for Treatment. In the possible comparisons, “p53:null vs. wt” would refer to the difference between p53 null vs. wild-type, as observed in “mock” samples, which is the reference level for Treatment.  Similarly, “Treatment: IR vs. mock” represents the difference between IR and mock in wild-type, the reference level for p53.  Comparisons for non-reference levels are

Interaction terms between two factors indicate a differential response among the levels of the two factors. For example, interaction terms might capture the differential response to treatment among two genotypes. See more explanation here.  In the demo data, the interaction term may represent the extra effect of IR on p53 null samples. Interaction terms starts with “I:”.

Linear Models for Microarray Data (limma) (Ritchie et al., 2015) is a Bioconductor package for identifying differentially expressed genes (DEGs). If the input is normalized expression data, limma is the method used to analyze all possible comparisons between sample groups. FDR and fold-change cutoff can be adjusted. A Venn diagram comparing all gene lists is also produced. When there are more than 5 comparisons, only the first 5 gene lists are shown.  Users can download the gene lists and produce Venn diagrams themselves. The downloaded gene lists contain gene ids, one column for each comparison with gene list information. If a gene is upregulated in “Treatment-Control” comparison, it is denoted as 1. “Treatment-Control” means Treatment compared with Control. Downregulated genes are marked as -1. Genes not significantly different are indicated as 0. These genes are included because they are differentially expressed in other comparisons, specified in another column. Raw expression data is also attached, which the users can use to analyze further.

 

For read counts data, three methods are available in iDEP for identifying DEGs, namely DESeq2 (Love et al., 2014), limma-voom (Law et al., 2014; Liu et al., 2015), and limma-trend. DESeq2 is recommended, but it is not set as default as it is slower than other methods. Limma-trend and limma-voom first transform read counts data as continuous data. According to the limma user guide(Smyth, 2016): “If the sequencing depth is reasonably consistent across the RNA samples, then the simplest and most robust approach to differential expression is to use limma-trend. …When the library sizes are quite variable between samples, then the voom approach is theoretically more powerful than limma-trend.”

In the DEG2 tab,  user can examine each of the comparisons in detail by  using scatter plots, volcano plots, MA plots, as well as enrichment analyses.   These gene lists are used to conduct enrichment analysis using GO and other available gene sets. Similarly, enrichment of TF binding motif in promoters are conducted and the result is shown in a pop-up window.  On the main panel, a heatmap is also produced showing the expression pattern and number of genes in the up- and down-regulated genes. The top genes among DEGs sorted by the absolute value of fold-change are shown at the bottom of the main panel.

R code for limma on normalized gene expression data:

groups = detectGroups( colnames(x) )
eset = new("ExpressionSet", exprs=as.matrix(x))
design <- model.matrix(~0+groups)
cont.matrix <- makeContrasts(contrasts=comparisons, levels=design)
fit <- lmFit(eset, design)
if(batch_effect) { 
  corfit <- duplicateCorrelation(eset,design,block= block)
  fit <- lmFit(eset, design,block=block,correlation=corfit$consensus)
}
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, trend=FALSE)
results <- decideTests(fit2, p.value=maxP_limma, lfc=log2(minFC_limma) )

 

R code for limma-trend:

dge <- DGEList(counts=rawCounts);
dge <- calcNormFactors(dge)
eset <- cpm(dge, log=TRUE, prior.count=priorCounts)
limmaTrend = TRUE
design <- model.matrix(~0+groups)
fit <- lmFit(eset, design)
if(batch_effect) { 
 corfit <- duplicateCorrelation(eset,design,block= block)
 fit <- lmFit(eset, design,block=block,correlation=corfit$consensus)
}
cont.matrix <- makeContrasts(contrasts=comparisons, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, trend=FALSE)
results <- decideTests(fit2, p.value= maxP_limma, lfc= log2(minFC_limma) )
topGenes1 =topTable(fit2, number = 1e12,sort.by="M" )

R code for limma-voom:

design <- model.matrix(~0+groups)
v <- voom(rawCounts, design);
fit <- lmFit(v, design)
if(batch_effect) { 
 corfit <- duplicateCorrelation(eset,design,block= block)
 fit <- lmFit(eset, design,block=block,correlation=corfit$consensus)
}
cont.matrix <- makeContrasts(contrasts=comparisons, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, trend=limmaTrend)
results <- decideTests(fit2, p.value=maxP_limma, lfc=log2(minFC_limma) )
topGenes1 =topTable(fit2, number = 1e12,sort.by="M" )

R code for DESeq2:

colData = cbind(colnames(rawCounts), groups )
dds = DESeqDataSetFromMatrix(countData=rawCounts,
colData=colData,
design=~groups)
dds = DESeq(dds)

 

References

Law, C.W., Chen, Y., Shi, W., and Smyth, G.K. (2014). voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15, R29.

Liu, R., Holik, A.Z., Su, S., Jansz, N., Chen, K., Leong, H.S., Blewitt, M.E., Asselin-Labat, M.L., Smyth, G.K., and Ritchie, M.E. (2015). Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Res 43, e97.

Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550.

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47.

Smyth, G.K. (2016). limma: Linear Models for Microarray and RNA-Seq Data: User’s Guide.

 

Advertisements
%d bloggers like this:
search previous next tag category expand menu location phone mail time cart zoom edit close