Appearance Frequency Modulated Gene Set Enrichment Testing

Abstract:

Background

Gene set enrichment analysis has helped bridge the gap between individual-gene and systems-biology interpretation of microarray data. Although gene sets are defined a priori from biological knowledge, current methods treat all genes within a set as equal. However, it is well known that some genes, such as those responsible for housekeeping functions, appear in many pathways, whereas other genes are more specialized and play a unique role in a single pathway.

Drawing inspiration from the field of information retrieval, we develop an approach that incorporates gene appearance frequency (in KEGG) into the Gene Set Enrichment Analysis (GSEA) and logistic regression-based LRpath frameworks to generate more reproducible and biologically meaningful results.
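As a rough illustration of the appearance-frequency idea, the sketch below counts how many gene sets each gene occurs in and derives a down-weighting factor for genes that occur in many sets. The weight formula (1 + log(N / af), borrowed from the inverse document frequency used in information retrieval), the class name, and the method names are assumptions made for this example; they are not taken from the paper or from the Java implementation offered on this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class AppearanceFrequencySketch {

    // Count, for each gene, the number of gene sets (e.g. KEGG pathways) that contain it.
    public static Map<String, Integer> appearanceFrequency(Map<String, Set<String>> geneSets) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> genes : geneSets.values()) {
            for (String gene : genes) {
                counts.merge(gene, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hypothetical IDF-style weight: 1 + log(N / af), where N is the number of gene sets.
    // A gene present in every set gets weight 1; rarer genes get larger weights.
    // This formula is an assumption for illustration, not the published weighting.
    public static Map<String, Double> idfStyleWeights(Map<String, Integer> counts, int totalSets) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            weights.put(e.getKey(), 1.0 + Math.log((double) totalSets / e.getValue()));
        }
        return weights;
    }

    public static void main(String[] args) {
        // Toy gene sets with made-up identifiers.
        Map<String, Set<String>> sets = Map.of(
                "pathwayA", Set.of("GENE1", "GENE2"),
                "pathwayB", Set.of("GENE1", "GENE3"),
                "pathwayC", Set.of("GENE1"));
        Map<String, Integer> af = appearanceFrequency(sets);
        Map<String, Double> weights = idfStyleWeights(af, sets.size());
        System.out.println("Appearance frequency: " + af);
        System.out.println("Weights: " + weights);
    }
}
```

In this toy input, GENE1 appears in all three sets and so receives the smallest weight, which mirrors the intuition that widely shared genes should contribute less to any single gene set's score.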

Results

Two breast cancer microarray datasets were analyzed to identify gene sets associated with the differentiation of histological grade 1 and grade 3 breast cancer. The correlation of gene-set Normalized Enrichment Scores (NES) between experiments was compared for the original GSEA and for GSEA with gene appearance frequency incorporated (GSEA-AF). GSEA-AF resulted in a higher correlation between experiments and more overlapping top gene sets. Several cancer-related gene sets also achieved higher NES in GSEA-AF. The same datasets were also analyzed by LRpath and by LRpath with gene appearance frequency incorporated (LRpath-AF). Two well-studied lung cancer datasets were analyzed in the same manner to demonstrate the validity of the method, and similar results were obtained.
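The two reproducibility measures mentioned above (correlation of NES between experiments, and overlap of top gene sets) could be computed along the following lines. This is a minimal sketch under stated assumptions: it uses the Pearson correlation and ranks gene sets by descending NES, neither of which is specified in this abstract, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ReproducibilitySketch {

    // Pearson correlation between two equally long score vectors
    // (e.g. NES of the same gene sets in two experiments, in the same order).
    public static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two vectors of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Names of the k gene sets with the highest NES (ranking rule is an assumption).
    public static List<String> topK(Map<String, Double> nes, int k) {
        return nes.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Number of gene sets shared by the top-k lists of two experiments.
    public static int topOverlap(Map<String, Double> nesA, Map<String, Double> nesB, int k) {
        Set<String> shared = new HashSet<>(topK(nesA, k));
        shared.retainAll(topK(nesB, k));
        return shared.size();
    }
}
```

Running both measures once on the scores from the unweighted method and once on the scores from the appearance-frequency variant gives the kind of side-by-side comparison described in the Results.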

Conclusions

We introduced an alternative way to integrate KEGG PATHWAY information into gene set enrichment analysis. The performance of GSEA and LRpath can be enhanced by integrating the appearance frequency of genes. We conclude that, in general, gene set analysis methods that integrate information from KEGG PATHWAY perform better both statistically and biologically.

Publications

Java Implementation:

A Java implementation of the described techniques is available for download here: