Hierarchical clustering with a heatmap gives a holistic view of the data. Using the transformed data, iDEP first ranks all genes by their standard deviation across all samples. By default, the top 1000 genes are used for hierarchical clustering with the heatmap.2 function from the gplots package. The data is centered by subtracting each gene's average expression level. The distance measure is 1 - r, where r is Pearson’s correlation coefficient, and average linkage is used. Note that sample groups are not used in hierarchical clustering; they are only shown as color bars.
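The steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not iDEP's actual implementation: the expression matrix is simulated, and the names x, n_top, top, centered, gene_tree and sample_tree are made up for this example.

```r
# Simulated log-scale expression matrix: 200 genes x 6 samples (made-up data).
set.seed(1)
x <- matrix(rnorm(200 * 6), nrow = 200,
            dimnames = list(paste0("gene", 1:200),
                            c("ctrl_1", "ctrl_2", "ctrl_3",
                              "trt_1", "trt_2", "trt_3")))

n_top <- 50  # iDEP's default is 1000; reduced here for the toy data

# 1. Rank genes by standard deviation across samples and keep the top n_top.
gene_sd <- apply(x, 1, sd)
top <- x[order(gene_sd, decreasing = TRUE)[seq_len(min(n_top, nrow(x)))], ]

# 2. Center each gene by subtracting its mean expression level.
centered <- top - rowMeans(top)

# 3. Distance = 1 - Pearson's r between genes, clustered with average linkage.
gene_dist <- as.dist(1 - cor(t(centered)))
gene_tree <- hclust(gene_dist, method = "average")

# Samples are clustered the same way, using correlations between columns.
sample_tree <- hclust(as.dist(1 - cor(centered)), method = "average")
```

Calling plot(gene_tree) or plot(sample_tree) draws the resulting dendrograms. The group labels embedded in the sample names play no role in either tree.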
The correlation matrix is computed using the cor function in R, excluding the 25% of genes with the lowest expression levels. The graph is generated using ggplot2.
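A sample correlation matrix of this kind could be computed roughly as follows. This is a sketch on simulated data, not iDEP's code. It assumes that a gene's expression level is summarized by its mean across samples, which the text above does not specify. The variable names are illustrative, and the plot is only drawn if ggplot2 is installed.

```r
# Simulated expression matrix: 400 genes x 4 samples (made-up data).
set.seed(2)
x <- matrix(rnorm(400 * 4, mean = 8), nrow = 400,
            dimnames = list(paste0("gene", 1:400),
                            c("A_1", "A_2", "B_1", "B_2")))

# Drop the 25% of genes with the lowest expression level.
# Assumption: expression level is summarized by the mean across samples.
gene_level <- rowMeans(x)
keep <- gene_level > quantile(gene_level, 0.25)

# Pearson correlation between samples (columns) on the remaining genes.
r <- cor(x[keep, ])

# Reshape to long format (one row per sample pair) for plotting.
long <- as.data.frame(as.table(r))
names(long) <- c("sample1", "sample2", "r")

# Draw a tile plot with the coefficients printed in each cell.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(sample1, sample2, fill = r)) +
    geom_tile() +
    geom_text(aes(label = round(r, 2)))
  print(p)
}
```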
The following is the R code used for the heatmap:
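library(gplots)  # provides heatmap.2()
# Hierarchical clustering with average linkage
hclust2 <- function(x, method = "average", ...)
  hclust(x, method = method, ...)
# Distance between rows: 1 - Pearson's correlation coefficient
dist2 <- function(x, ...)
  as.dist(1 - cor(t(x), method = "pearson"))
# detectGroups() is an iDEP helper that derives group labels from sample names
groups <- detectGroups(colnames(x))
groups.colors <- rainbow(length(unique(groups)))
# Layout cells: 5 = color key, 4 = column dendrogram, 1 = group color bar,
# 3 = row dendrogram, 2 = heatmap
lmat <- rbind(c(5, 4), c(0, 1), c(3, 2))
lwid <- c(1.5, 6)
lhei <- c(1, .2, 8)
heatmap.2(x, distfun = dist2, hclustfun = hclust2,
          ColSideColors = groups.colors[as.factor(groups)],  # group color bar
          lmat = lmat, lwid = lwid, lhei = lhei)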