Density Analysis for Gene Set Enrichment Testing

Abstract:

Use of biological knowledge has been shown to be of value in analyzing high-throughput gene expression and other types of genomics data. For instance, Gene Set Enrichment Analysis (GSEA) is widely used to analyze data from gene expression microarrays. However, biological knowledge is typically introduced in a very simple way: for example, all genes in a biological pathway are placed in one gene set and treated as equivalent for statistical purposes.

We propose a more sophisticated analysis, called Density Scoring (DS), that takes into account the topology of the pathway graph by considering the relative positions of differentially expressed genes over the pathway network. The resulting score is then used to adjust the scores produced by any existing gene set enrichment testing method. Our experiments on lung and breast cancer microarray data show that the DS-adjusted methods assigned a higher mean rank to cancer-related signaling pathways than the original gene set enrichment testing methods did. We also show that DS-adjusted methods replicate analysis results across studies more robustly.
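To make the idea of "relative positions over the pathway network" concrete, here is a minimal Java sketch of one possible density measure. It is an illustration only and is not the DS formula from the paper: it assumes an undirected, unweighted pathway graph and scores a set of differentially expressed genes by the mean inverse shortest-path distance over all gene pairs. The class name DensitySketch and its methods are hypothetical.

```java
import java.util.*;

// Hypothetical illustration; NOT the authors' Density Scoring formula.
class DensitySketch {

    // Add an undirected edge between two genes of the pathway graph.
    static void addEdge(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Breadth-first search; returns the number of edges on a shortest
    // path from src to dst, or -1 if dst cannot be reached.
    static int shortestPath(Map<String, Set<String>> graph, String src, String dst) {
        if (src.equals(dst)) return 0;
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(src, 0);
        queue.add(src);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            for (String next : graph.getOrDefault(cur, Collections.emptySet())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(cur) + 1);
                    if (next.equals(dst)) return dist.get(next);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    // Assumed density measure: mean of 1/d over all pairs of distinct
    // differentially expressed genes. Unreachable pairs contribute 0.
    static double density(Map<String, Set<String>> graph, Collection<String> deGenes) {
        List<String> genes = new ArrayList<>(new LinkedHashSet<>(deGenes));
        int n = genes.size();
        if (n < 2) return 0.0;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                int d = shortestPath(graph, genes.get(i), genes.get(j));
                if (d > 0) sum += 1.0 / d;
            }
        }
        return sum / (n * (n - 1) / 2.0);
    }

    public static void main(String[] args) {
        // Toy linear pathway: A - B - C - D - E
        Map<String, Set<String>> g = new HashMap<>();
        addEdge(g, "A", "B");
        addEdge(g, "B", "C");
        addEdge(g, "C", "D");
        addEdge(g, "D", "E");
        System.out.println(density(g, Arrays.asList("A", "B")));
        System.out.println(density(g, Arrays.asList("A", "E")));
    }
}
```

On the toy pathway in main, the adjacent pair {A, B} scores 1.0 and the distant pair {A, E} scores 0.25. A gene set approach that ignores topology would treat these two cases identically, since each has two differentially expressed genes out of five.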

Supplemental Material:

Tables S1 - S4 can be found in the following Excel files:

Java Implementation:

A Java implementation of the described techniques is available for download here:

  • density_analysis_1.0.zip: contains the source code of our Density Analysis technique, along with all the necessary libraries.