Pathway analysis uses the fold-change values returned by limma or DESeq2. Because fold-change values are available for each comparison, pathway analysis can be conducted on each comparison separately. Note that pathway analysis uses the fold-change values of all genes and hence is independent of the selected DEGs.
However, users can filter out genes with noisy fold-changes by lowering the “Remove genes with big FDR before pathway analysis” setting from its default value of 1 to a lenient FDR cutoff such as 0.6. Genes with FDR > 0.6 are then removed from pathway analysis.
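The filtering step can be sketched in base R as follows. This is a minimal illustration, not the iDEP source code: the data frame `de`, its column names, and the handling of missing FDR values are assumptions made for the example.

```r
# Toy differential expression results; the column names are illustrative assumptions.
de <- data.frame(
  gene   = c("g1", "g2", "g3", "g4", "g5"),
  log2FC = c(2.1, -1.4, 0.3, -0.2, 1.0),
  FDR    = c(0.001, 0.04, 0.75, 0.95, 0.30),
  stringsAsFactors = FALSE
)

fdr_cutoff <- 0.6  # a cutoff of 1 keeps every gene

# Keep genes at or below the cutoff; genes with a missing FDR are dropped too
# (an assumption of this sketch).
keep <- !is.na(de$FDR) & de$FDR <= fdr_cutoff

# Named numeric vector of fold-changes for the remaining genes.
fold <- setNames(de$log2FC[keep], de$gene[keep])
fold
```

With a cutoff of 1 the vector would contain all five genes, so the pathway methods would see the complete fold-change distribution.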
Pathway analysis can be performed using several methods. GSEA (Gene Set Enrichment Analysis) (Subramanian et al., 2005) is conducted in the pre-ranked mode using a faster algorithm implemented in the fgsea package (Sergushichev, 2016). PAGE (Parametric Analysis of Gene Set Enrichment) (Kim and Volsky, 2005) is used as implemented in the PGSEA package (Furge and Dykema, 2012). There are two PGSEA options: one analyzes only the selected comparisons, and the other (“PGSEA w/ all samples”) analyzes all sample groups.
Unlike the methods above, which rely on the built-in geneset databases, ReactomePA (Reactome Pathway Analysis) (Yu and He, 2016) retrieves genesets from Reactome (Fabregat et al., 2016; Yu and He, 2016).
Gene expression data can be visualized on KEGG pathway diagrams (Kanehisa et al., 2017) using Pathview (Luo and Brouwer, 2013). Note that Pathview downloads pathway diagrams directly from the KEGG website and is therefore slow.
On the lower left side of the screen, there is a checkbox named “Use absolute values of fold changes for GSEA and GAGE”. This is useful because some molecular pathways can be modulated by up-regulating some genes while down-regulating others, which is especially relevant for KEGG pathways. For other genesets, such as TF target genes and microRNA target genes, where we know the regulation is one-directional, this box should not be checked.
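A small base R example shows why absolute values help for pathways regulated in both directions. The gene names, fold-changes, and the geneset `set_a` are hypothetical, and the mean is used only as a simple stand-in for an enrichment statistic.

```r
# Hypothetical fold-changes for 8 genes; set_a is modulated in both directions.
fold <- c(g1 = 2.0, g2 = -2.2, g3 = 1.8, g4 = -1.9,
          g5 = 0.1, g6 = -0.2, g7 = 0.3, g8 = -0.1)
set_a <- c("g1", "g2", "g3", "g4")

# With signed values, up- and down-regulated members largely cancel out.
signed_mean <- mean(fold[set_a])

# With absolute values, the magnitude of change in the set is preserved.
absolute_mean  <- mean(abs(fold)[set_a])
background_abs <- mean(abs(fold))

c(signed = signed_mean, absolute = absolute_mean, background = background_abs)
```

The signed mean of the set is close to zero even though every member changes strongly, whereas the absolute mean stands out from the background. For one-directional genesets, taking absolute values would discard the direction, which is the informative part.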
Very small genesets can cause problems. By default, genesets with fewer than 15 genes are disregarded. This threshold sometimes needs to be reduced to 10, or even 5, when a pathway of interest has only a few genes. Be aware that this may introduce false positives.
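The size filter can be illustrated with a short base R sketch. The helper `filter_sets`, the toy genesets, and the upper bound of 2000 are assumptions for the example. Sizes are counted after intersecting each geneset with the measured genes, which is also a choice of this sketch rather than a documented iDEP behavior.

```r
# Hypothetical genesets and measured genes.
gmt <- list(
  big_set   = paste0("g", 1:20),
  mid_set   = paste0("g", 1:12),
  small_set = paste0("g", 1:4)
)
measured <- paste0("g", 1:30)

# Keep genesets whose number of measured genes lies within [min_size, max_size].
filter_sets <- function(sets, genes, min_size = 15, max_size = 2000) {
  sets  <- lapply(sets, intersect, genes)  # restrict each set to measured genes
  sizes <- lengths(sets)
  sets[sizes >= min_size & sizes <= max_size]
}

names(filter_sets(gmt, measured))                 # minimum size 15
names(filter_sets(gmt, measured, min_size = 10))  # relaxed minimum size
```

Lowering the minimum lets smaller pathways into the analysis, at the cost of testing sets whose statistics rest on only a handful of genes.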
Besides Gene Ontology, iDEP includes additional data from KEGG, Reactome, MSigDB (human), GSKB (mouse) and araPath (Arabidopsis).
The table below lists the human pathway databases and their sources:
| Type | Subtype/Database name | #GeneSets | Source |
|---|---|---|---|
| Gene Ontology | Biological Process (BP) | 15796 | Ensembl 92 |
| | Cellular Component (CC) | 1916 | Ensembl 92 |
| | Molecular Function (MF) | 4605 | Ensembl 92 |
| KEGG | KEGG | 327 | Release 86.1 |
| Curated | Biocarta | 249 | Whichgenes 1.5 |
| | GeneSetDB.EHMN | 55 | GeneSetDB |
| | Panther | 168 | 1.0.4 |
| | HumanCyc | 240 | Pathway Commons V9 |
| | INOH | 576 | Pathway Commons V9 |
| | NetPath | 27 | Pathway Commons V9 |
| | PID | 223 | Pathway Commons V9 |
| | PSP | 327 | Pathway Commons V9 |
| | Recon X | 2339 | Pathway Commons V9 |
| | Reactome | 2010 | V64 |
| | Wiki | 457 | 20180610 |
| TF.Target | CircuitsDB.TF | 829 | V2012 |
| | ENCODE | 181 | V70.0 |
| | Marbach2016 | 628 | regulatorycircuits Release 1.0 |
| | RegNetwork.TF | 1400 | 7/1/2017 |
| | TFacts | 428 | Feb. 2012 |
| | tftargets.ITFP | 1926 | tftargets May,2017 |
| | tftargets.Neph2012 | 16476 | tftargets May,2017 |
| | tftargets.TRED | 131 | tftargets May,2017 |
| | TRRUST | 793 | V2 |
| miRNA.Targets | CircuitsDB.miRNA | 140 | V. 2012 |
| | GeneSetDB.MicroCosm | 44 | GeneSetDB |
| | miRDB | 2588 | V 5.0 |
| | miRTarBase | 2599 | V 7.0 |
| | RegNetwork.miRNA | 618 | V. 2015 |
| | TargetScan | 219 | V7.2 |
| MSigDB.Computational | Computational gene sets | 858 | MSigDB 6.1 |
| MSigDB.Curated | Literature | 3465 | MSigDB 6.1 |
| MSigDB.Hallmark | hallmark | 50 | MSigDB 6.1 |
| MSigDB.Immune | Immune system | 4872 | MSigDB 6.1 |
| MSigDB.Location | Cytogenetic band | 326 | MSigDB 6.1 |
| MSigDB.Motif | TF and miRNA Motifs | 836 | MSigDB 6.1 |
| MSigDB.Oncogenic | Oncogenic signatures | 189 | MSigDB 6.1 |
| PPI | BioGRID | 15542 | 3.4.160 |
| | CORUM | 2178 | 02.07.2017 |
| | BIND | 3807 | Pathway Commons V9 |
| | DIP | 2630 | Pathway Commons V9 |
| | HPRD | 7141 | Pathway Commons V9 |
| | IntAct | 11991 | Pathway Commons V9 |
| Drug | GeneSetDB.MATADOR | 266 | GeneSetDB |
| | GeneSetDB.SIDER | 473 | GeneSetDB |
| | GeneSetDB.STITCH | 4616 | GeneSetDB |
| | GeneSetDB.T3DB | 846 | GeneSetDB |
| | SMPDB | 699 | Pathway Commons V9 |
| | CTD | 8758 | Pathway Commons V9 |
| | Drugbank | 2563 | Pathway Commons V9 |
| Other | GeneSetDB.CancerGenes | 23 | GeneSetDB |
| | GeneSetDB.MethCancerDB | 21 | GeneSetDB |
| | GeneSetDB.MethyCancer | 54 | GeneSetDB |
| | GeneSetDB.MPO | 3134 | GeneSetDB |
| | HPO | 6785 | May,2018 |
| Total: | | 140,438 | |
The table below lists the sources for mouse pathways.
| Type | Source | #Sets | Note |
|---|---|---|---|
| Co-expression | Literature | 8,742 | Differentially expressed genes from 2526 |
| | MSigDB | 3,964 | Molecular Signature Database, v.6.0 |
| | L2L | 248 | List of lists, v.2006.2 |
| | CancerGenes* | 23 | Cancer gene lists |
| | GeneSigDB | 494 | Gene Signature Database, R.4 |
| Gene Ontology | GO_BP | 11,943 | V2017.5 |
| | GO_MF | 2,932 | |
| | GO_CC | 1,475 | |
| Curated pathways | Biocarta* | 176 | Metabolic and signaling pathways |
| | PANTHER | 151 | Ontology-based pathway database, v3.4.1 |
| | WikiPathways* | 146 | Open platform for pathway curation |
| | INOH* | 73 | Integrating network objects with hierarchies |
| | NetPath* | 25 | Signal transduction pathways |
| Metabolic pathways | KEGG | 314 | Metabolic pathways, R.82.0 |
| | EHMN* | 53 | Edinburgh human metabolic network |
| | MouseCyc | 321 | Mouse Biochemical Pathways |
| Drug related | CTD* | 910 | The Comparative Toxicogenomics |
| | SIDER* | 460 | Side Effect Resource |
| | MATADOR* | 248 | Manually Annotated Targets and Drugs Online |
| | DrugBank* | 136 | Open data drug and target database |
| | SMPDB* | 74 | Small Molecule Pathway Database |
| miRNA Target Genes | miRDB | 1,912 | miRNA target prediction and annotations, v |
| | microRNA.org | 314 | Predicted miRNA targets, v.R2010 |
| | Grimson et al. | 179 | Predicted miRNA targets. v.6.2 |
| | TarBase | 84 | Experimentally validated miRNA targets, v.6.0 |
| | miRTarBase | 775 | Experimentally validated miRNA targets, V6.1 |
| | MicroCosm | 464 | Predicted targets |
| | PicTar | 35 | Predicted miRNA sites, v. 2007.3 |
| TF Target Genes | TFactS* | 101 | Predicted TF targets |
| | TRED | 99 | Confirmed TF target genes, v.2013.7 |
| | CircuitsDB | 94 | Mixed miRNA/TF regulation, v. 2012 |
| | TRANSFAC | 78 | Confirmed TF binding sites, v7.0 |
| Others | Location | 341 | Genomic location on chromosomes, v.2017 |
| | HPO* | 1,518 | The human phenotype ontology |
| | STITCH* | 3,929 | Interaction networks of chemicals and |
| | MPO* | 2,943 | Mammalian Phenotype Ontology |
| | T3DB* | 722 | Database of common toxins and their targets |
| | PID* | 193 | Pathway Interaction Database |
| | MethyCancer* | 50 | Human DNA methylation and cancer |
| | MethCancerDB* | 19 | Aberrant DNA methylation in human cancer |
| | Total | 46,758 | |

*Secondary data from GeneSetDB
R code used for GAGE using the gage package:
paths <- gage(fold, gsets = gmt, ref = NULL, samp = NULL)
R code for GSEA via the fgsea package:
paths <- fgsea(pathways = gmt,
               stats = fold,
               minSize = input$minSetSize,
               maxSize = input$maxSetSize,
               nperm = 5000)
R code for PAGE using the PGSEA package:
pg <- PGSEA(convertedData - rowMeans(convertedData), cl = gmt, range = myrange, p.value = TRUE, weighted = FALSE)
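For intuition about what PAGE computes, the Z-score described by Kim and Volsky (2005) can be sketched in base R. This is an illustration under stated assumptions and not the PGSEA implementation: the helper `page_z`, the simulated fold-changes, the use of the sample standard deviation, and the two-sided p-value are all choices made for this example.

```r
# PAGE Z-score: Z = (Sm - mu) * sqrt(m) / sigma, where mu and sigma are the mean
# and standard deviation of all fold-changes, Sm is the mean fold-change of the
# geneset, and m is the number of genes in the set.
page_z <- function(fold, gene_set) {
  in_set <- intersect(gene_set, names(fold))  # use only genes that were measured
  m      <- length(in_set)
  mu     <- mean(fold)
  sigma  <- sd(fold)
  z      <- (mean(fold[in_set]) - mu) * sqrt(m) / sigma
  c(z = z, p = 2 * pnorm(-abs(z)))            # two-sided normal p-value
}

# Simulated data: 200 genes, of which the first 20 are shifted upward.
set.seed(1)
fold    <- setNames(rnorm(200), paste0("g", 1:200))
shifted <- paste0("g", 1:20)
fold[shifted] <- fold[shifted] + 1.5

page_z(fold, shifted)
```

Because the statistic relies on a normal approximation for the mean of the set, it is cheap to compute but less trustworthy for very small genesets, which is consistent with the minimum geneset size discussed earlier.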
R code for ReactomePA:
paths <- gsePathway(fold, nPerm = 5000, organism = ReactomePASpecies[ix],
                    minGSSize = input$minSetSize,
                    maxGSSize = input$maxSetSize,
                    pvalueCutoff = 0.5,
                    pAdjustMethod = "BH", verbose = FALSE)
References:
Fabregat, A., Sidiropoulos, K., Garapati, P., Gillespie, M., Hausmann, K., Haw, R., Jassal, B., Jupe, S., Korninger, F., McKay, S., et al. (2016). The Reactome pathway Knowledgebase. Nucleic Acids Res 44, D481-487.
Furge, K., and Dykema, K. (2012). PGSEA: Parametric Gene Set Enrichment Analysis. R package version 1480.
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., and Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45, D353-D361.
Kim, S.Y., and Volsky, D.J. (2005). PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics 6, 144.
Luo, W., and Brouwer, C. (2013). Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics 29, 1830-1831.
Sergushichev, A. (2016). An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv http://biorxiv.org/content/early/2016/06/20/060012.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550.
Yu, G., and He, Q.Y. (2016). ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol Biosyst 12, 477-479.