Pathways

Pathway analysis uses the fold-change values returned by limma or DESeq2. Because fold-change values are available for each comparison, pathway analysis can be conducted on each comparison separately. Note that pathway analysis uses the fold-change values of all genes and is therefore independent of the selected DEGs.

However, users can filter out genes with noisy fold-changes by lowering the “Remove genes with big FDR before pathway analysis” setting from its default value of 1 to a lenient FDR cutoff such as 0.6. Genes with FDR > 0.6 are then removed from pathway analysis.
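The FDR filter described above can be sketched in base R. This is a minimal illustration rather than iDEP's actual code: the data frame `res`, its columns `log2FC` and `FDR`, and the variable `fdr_cutoff` are hypothetical names chosen for the example.

```r
# Hypothetical differential expression results for one comparison
res <- data.frame(
  gene   = c("g1", "g2", "g3", "g4", "g5"),
  log2FC = c(2.1, -0.3, 0.8, -1.7, 0.1),
  FDR    = c(0.001, 0.75, 0.40, 0.02, 0.95),
  stringsAsFactors = FALSE
)

# Assumed cutoff; a value of 1 (the default) keeps every gene
fdr_cutoff <- 0.6

# Drop genes whose FDR exceeds the cutoff
kept <- res[res$FDR <= fdr_cutoff, ]

# Named fold-change vector, sorted from most up- to most down-regulated,
# which is the usual input shape for pre-ranked methods
fold <- sort(setNames(kept$log2FC, kept$gene), decreasing = TRUE)
```

With the cutoff left at 1, `fold` would contain all five genes, so the downstream analysis would be unaffected by the filter.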

Pathway analysis can be performed using several methods. GSEA (Gene Set Enrichment Analysis) (Subramanian et al., 2005) is conducted in the pre-ranked mode using a faster algorithm implemented in the fgsea package (Sergushichev, 2016). PAGE (Parametric Analysis of Gene Set Enrichment) (Kim and Volsky, 2005) is used as implemented in the PGSEA package (Furge and Dykema, 2012). PGSEA is offered in two versions: one analyzes only the selected comparisons, and the other (“PGSEA w/ all samples”) analyzes all sample groups.
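To make the parametric idea behind PAGE concrete, the sketch below computes a gene-set z-score of the form described by Kim and Volsky (2005): the difference between the mean fold change of a gene set and the mean of all genes, scaled by the square root of the set size over the standard deviation of all genes. It is a base-R illustration, not the PGSEA implementation; `fold`, `gene_sets` and `page_z` are hypothetical names, and `sd()` returns the sample standard deviation, which is an assumption of this sketch.

```r
# Hypothetical log fold-changes for ten genes
fold <- c(g1 = 2.0, g2 = 1.5, g3 = 1.8, g4 = -0.2, g5 = 0.1,
          g6 = -1.0, g7 = 0.3, g8 = -1.6, g9 = -0.5, g10 = 0.0)

# Two hypothetical gene sets
gene_sets <- list(
  setA = c("g1", "g2", "g3"),
  setB = c("g6", "g8", "g9")
)

# Parametric z-score for one gene set
page_z <- function(genes, fold) {
  genes <- intersect(genes, names(fold))  # use measured genes only
  m     <- length(genes)                  # gene set size
  mu    <- mean(fold)                     # mean over all genes
  sigma <- sd(fold)                       # spread over all genes
  (mean(fold[genes]) - mu) * sqrt(m) / sigma
}

z <- sapply(gene_sets, page_z, fold = fold)

# Two-sided p-values from the standard normal distribution
p <- 2 * pnorm(-abs(z))
```

Because the score relies on a normal approximation, it needs no permutations, which is why parametric methods of this kind are fast compared with permutation-based GSEA.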

Unlike these methods, which rely on the built-in gene set databases, ReactomePA (Reactome Pathway Analysis) (Yu and He, 2016) retrieves gene sets from Reactome (Fabregat et al., 2016).

Gene expression data can be visualized on KEGG pathway diagrams (Kanehisa et al., 2017) using Pathview (Luo and Brouwer, 2013). Note that Pathview downloads pathway diagrams directly from the KEGG website and is therefore slow.

On the lower left side of the screen, there is a checkbox named “Use absolute values of fold changes for GSEA and GAGE”. This is useful because some molecular pathways are modulated by up-regulating some genes while down-regulating others, which is especially relevant for KEGG pathways. For other gene sets, such as TF target genes and microRNA target genes, where regulation is known to be one-directional, this box should be left unchecked.
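A small base-R example shows why absolute values can help for pathways that are regulated in both directions. The fold-change vector and the `pathway` membership below are made up for illustration.

```r
# Hypothetical log fold-changes; genes a-d belong to one pathway
fold <- c(a = 2.0, b = -2.1, c = 1.9, d = -1.8,
          e = 0.1, f = -0.2, g = 0.05, h = 0.15)
pathway <- c("a", "b", "c", "d")

# Signed values: up- and down-regulated members cancel out
signed_mean <- mean(fold[pathway])

# Absolute values: the pathway stands out against the background
abs_mean       <- mean(abs(fold[pathway]))
background_abs <- mean(abs(fold[setdiff(names(fold), pathway)]))
```

Here `signed_mean` is close to zero even though every pathway member changes strongly, while `abs_mean` is much larger than `background_abs`. For one-directional gene sets the signed values carry the relevant information, so taking absolute values would discard the direction of the effect.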

Very small gene sets can cause problems. By default, gene sets with fewer than 15 genes are disregarded. This threshold sometimes needs to be reduced to 10, or even 5, when a pathway of interest has only a few genes. Be aware, however, that this may introduce false positives.
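The size filter can be sketched in base R as follows. The gene sets, the `measured` gene list and the helper `filter_by_size` are hypothetical, and the sketch assumes that set size is counted over genes that were actually measured in the experiment.

```r
# Hypothetical gene sets of different sizes
gene_sets <- list(
  small  = paste0("g", 1:6),
  medium = paste0("g", 1:20),
  large  = paste0("g", 1:600)
)

# Genes present in the expression data (assumed)
measured <- paste0("g", 1:100)

# Keep gene sets with at least min_size measured genes
filter_by_size <- function(sets, measured, min_size) {
  sizes <- vapply(sets, function(s) length(intersect(s, measured)), integer(1))
  sets[sizes >= min_size]
}

kept_default <- filter_by_size(gene_sets, measured, min_size = 15)
kept_relaxed <- filter_by_size(gene_sets, measured, min_size = 5)
```

With the default threshold of 15, the six-gene set is dropped; lowering the threshold to 5 brings it back, at the cost of testing a set whose statistics rest on very few genes.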

Besides Gene Ontology, iDEP includes additional data from KEGG, Reactome, MSigDB (human), GSKB (mouse) and araPath (Arabidopsis).

The table below lists the human pathway databases and their sources:

| Type | Subtype/Database name | #GeneSets | Source |
|---|---|---|---|
| Gene Ontology | Biological Process (BP) | 15796 | Ensembl 92 |
| | Cellular Component (CC) | 1916 | Ensembl 92 |
| | Molecular Function (MF) | 4605 | Ensembl 92 |
| KEGG | KEGG | 327 | Release 86.1 |
| Curated | Biocarta | 249 | Whichgenes 1.5 |
| | GeneSetDB.EHMN | 55 | GeneSetDB |
| | Panther | 168 | 1.0.4 |
| | HumanCyc | 240 | Pathway Commons V9 |
| | INOH | 576 | Pathway Commons V9 |
| | NetPath | 27 | Pathway Commons V9 |
| | PID | 223 | Pathway Commons V9 |
| | PSP | 327 | Pathway Commons V9 |
| | Recon X | 2339 | Pathway Commons V9 |
| | Reactome | 2010 | V64 |
| | Wiki | 457 | 20180610 |
| TF.Target | CircuitsDB.TF | 829 | V2012 |
| | ENCODE | 181 | V70.0 |
| | Marbach2016 | 628 | regulatorycircuits Release 1.0 |
| | RegNetwork.TF | 1400 | 7/1/2017 |
| | TFacts | 428 | Feb. 2012 |
| | tftargets.ITFP | 1926 | tftargets May, 2017 |
| | tftargets.Neph2012 | 16476 | tftargets May, 2017 |
| | tftargets.TRED | 131 | tftargets May, 2017 |
| | TRRUST | 793 | V2 |
| miRNA.Targets | CircuitsDB.miRNA | 140 | V. 2012 |
| | GeneSetDB.MicroCosm | 44 | GeneSetDB |
| | miRDB | 2588 | V 5.0 |
| | miRTarBase | 2599 | V 7.0 |
| | RegNetwork.miRNA | 618 | V. 2015 |
| | TargetScan | 219 | V7.2 |
| MSigDB.Computational | Computational gene sets | 858 | MSigDB 6.1 |
| MSigDB.Curated | Literature | 3465 | MSigDB 6.1 |
| MSigDB.Hallmark | hallmark | 50 | MSigDB 6.1 |
| MSigDB.Immune | Immune system | 4872 | MSigDB 6.1 |
| MSigDB.Location | Cytogenetic band | 326 | MSigDB 6.1 |
| MSigDB.Motif | TF and miRNA Motifs | 836 | MSigDB 6.1 |
| MSigDB.Oncogenic | Oncogenic signatures | 189 | MSigDB 6.1 |
| PPI | BioGRID | 15542 | 3.4.160 |
| | CORUM | 2178 | 02.07.2017 |
| | BIND | 3807 | Pathway Commons V9 |
| | DIP | 2630 | Pathway Commons V9 |
| | HPRD | 7141 | Pathway Commons V9 |
| | IntAct | 11991 | Pathway Commons V9 |
| Drug | GeneSetDB.MATADOR | 266 | GeneSetDB |
| | GeneSetDB.SIDER | 473 | GeneSetDB |
| | GeneSetDB.STITCH | 4616 | GeneSetDB |
| | GeneSetDB.T3DB | 846 | GeneSetDB |
| | SMPDB | 699 | Pathway Commons V9 |
| | CTD | 8758 | Pathway Commons V9 |
| | Drugbank | 2563 | Pathway Commons V9 |
| Other | GeneSetDB.CancerGenes | 23 | GeneSetDB |
| | GeneSetDB.MethCancerDB | 21 | GeneSetDB |
| | GeneSetDB.MethyCancer | 54 | GeneSetDB |
| | GeneSetDB.MPO | 3134 | GeneSetDB |
| | HPO | 6785 | May, 2018 |
| Total: | | 140,438 | |

The table below lists the sources for mouse pathways.

| Type | Source | #Sets | Note |
|---|---|---|---|
| Co-expression | Literature | 8,742 | Differentially expressed genes from 2526 studies |
| | MSigDB | 3,964 | Molecular Signature Database, v.6.0 |
| | L2L | 248 | List of lists, v.2006.2 |
| | CancerGenes* | 23 | Cancer gene lists |
| | GeneSigDB | 494 | Gene Signature Database, R.4 |
| Gene Ontology | GO_BP | 11,943 | V2017.5 |
| | GO_MF | 2,932 | |
| | GO_CC | 1,475 | |
| Curated pathways | Biocarta* | 176 | Metabolic and signaling pathways |
| | PANTHER | 151 | Ontology-based pathway database, v3.4.1 |
| | WikiPathways* | 146 | Open platform for pathway curation |
| | INOH* | 73 | Integrating network objects with hierarchies |
| | NetPath* | 25 | Signal transduction pathways |
| Metabolic pathways | KEGG | 314 | Metabolic pathways, R.82.0 |
| | EHMN* | 53 | Edinburgh human metabolic network |
| | MouseCyc | 321 | Mouse Biochemical Pathways, v2013.7 |
| Drug related | CTD* | 910 | The Comparative Toxicogenomics Database |
| | SIDER* | 460 | Side Effect Resource |
| | MATADOR* | 248 | Manually Annotated Targets and Drugs Online Resource |
| | DrugBank* | 136 | Open data drug and target database |
| | SMPDB* | 74 | Small Molecule Pathway Database |
| miRNA Target Genes | miRDB | 1,912 | miRNA target prediction and annotations, v5.0 |
| | microRNA.org | 314 | Predicted miRNA targets, v.R2010 |
| | Grimson et al. | 179 | Predicted miRNA targets, v.6.2 |
| | TarBase | 84 | Experimentally validated miRNA targets, v.6.0 |
| | miRTarBase | 775 | Experimentally validated miRNA targets, V6.1 |
| | MicroCosm | 464 | Predicted targets |
| | PicTar | 35 | Predicted miRNA sites, v. 2007.3 |
| TF Target Genes | TFactS* | 101 | Predicted TF targets |
| | TRED | 99 | Confirmed TF target genes, v.2013.7 |
| | CircuitsDB | 94 | Mixed miRNA/TF regulation, v. 2012 |
| | TRANSFAC | 78 | Confirmed TF binding sites, v7.0 |
| Others | Location | 341 | Genomic location on chromosomes, v.2017 |
| | HPO* | 1,518 | The human phenotype ontology |
| | STITCH* | 3,929 | Interaction networks of chemicals and proteins |
| | MPO* | 2,943 | Mammalian Phenotype Ontology |
| | T3DB* | 722 | Database of common toxins and their targets |
| | PID* | 193 | Pathway Interaction Database |
| | MethyCancer* | 50 | Human DNA methylation and cancer |
| | MethCancerDB* | 19 | Aberrant DNA methylation in human cancer |
| Total | | 46,758 | |

*Secondary data from GeneSetDB

R code used for GAGE using the gage package:

paths <- gage(fold, gsets = gmt, ref = NULL, samp = NULL)

R code for GSEA via the fgsea package:

paths <- fgsea(pathways = gmt,
               stats = fold,
               minSize = input$minSetSize,
               maxSize = input$maxSetSize,
               nperm = 5000)

R code for PAGE using the PGSEA package:

pg <- PGSEA(convertedData - rowMeans(convertedData), cl = gmt, range = myrange, p.value = TRUE, weighted = FALSE)

R code for ReactomePA:

paths <- gsePathway(fold, nPerm = 5000, organism = ReactomePASpecies[ix],
                    minGSSize = input$minSetSize,
                    maxGSSize = input$maxSetSize,
                    pvalueCutoff = 0.5,
                    pAdjustMethod = "BH", verbose = FALSE)

References:

Fabregat, A., Sidiropoulos, K., Garapati, P., Gillespie, M., Hausmann, K., Haw, R., Jassal, B., Jupe, S., Korninger, F., McKay, S., et al. (2016). The Reactome pathway Knowledgebase. Nucleic Acids Res 44, D481-487.

Furge, K., and Dykema, K. (2012). PGSEA: Parametric Gene Set Enrichment Analysis. R package version 1480.

Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., and Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45, D353-D361.

Kim, S.Y., and Volsky, D.J. (2005). PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics 6, 144.

Luo, W., and Brouwer, C. (2013). Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics 29, 1830-1831.

Sergushichev, A. (2016). An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv http://biorxiv.org/content/early/2016/06/20/060012.

Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550.

Yu, G., and He, Q.Y. (2016). ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol Biosyst 12, 477-479.