Gene-Specific Methylation Profile (GSMP)

The GSMP tool was developed by a team led by Yusha Liu and Jeff Morris with Moon Shot support. In brief, the tool is designed to identify functionally important CpGs for each gene, along with the direction of their association with expression. The GSMPs can be used to construct gene-level methylation scores that are maximally correlated with gene expression for use with integrative models, and to produce a tissue-specific measure of the percent variability in expression explained by methylation for each gene.

Click the icon below to access the tool. It is hosted on the internal development server and should therefore not be considered stable. Please note that it is accessible only from within MD Anderson. Contact Dr. Morris (jefmorris@mdanderson.org) if the server is down.

Detailed Background:

DNA methylation plays an important role in the regulation of gene transcription. In the past, most studies focused on methylation in promoter regions and CpG islands, and considered only negative correlation when integrating DNA methylation and gene expression. However, recent studies have revealed that functionally important methylation also occurs in intragenic and distal regions, and that it varies across genes and tissue types. Moreover, the increasing availability of multi-platform genomics and epigenomics datasets in cancer and other diseases has enabled researchers to perform integrative analyses across platforms, which often require the calculation of gene-level summaries for each platform. Therefore, there is an urgent need for an approach to constructing gene-level methylation summaries that account for the complicated relationships between methylation and expression.

In this tool, we propose a sequential penalized regression approach to construct gene-specific methylation profiles (GSMPs), considering all CpG sites within the gene body or within +/- 500 kb of it. This yields tissue-specific sparse lists of functionally important CpGs for each gene, with corresponding weights indicating the strength and direction of association. While producing gene-specific CpG methylation-expression relationships, our sequential approach combines information across the genome via global patterns of CpG methylation-expression relationships, so that it focuses on the CpGs more likely to be functionally important when several CpGs are correlated with each other. The GSMPs can be used to construct gene-level methylation scores that are maximally correlated with gene expression for use with integrative models, and to produce a tissue-specific measure of the percent variability in expression explained by methylation for each gene.
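
To make the idea of sparse, signed CpG weights and a gene-level score concrete, here is a minimal sketch in Python. It is not the sequential procedure described above: it fits a single ordinary lasso (L1-penalized) regression for one gene by coordinate descent, and it does not borrow information across the genome. The function names, the penalty value, and the simulated data are all assumptions made for illustration only.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso update step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_weights(cpg_columns, expression, penalty, n_sweeps=200):
    """Minimize (1/2n)*||y - Xw||^2 + penalty*||w||_1 by coordinate descent.

    cpg_columns: one list of per-sample methylation values per CpG.
    expression:  per-sample expression values for a single gene.
    """
    n = len(expression)
    columns = [center(col) for col in cpg_columns]
    residual = center(expression)
    weights = [0.0] * len(columns)
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(x * x for x in col) / n
            if norm_sq == 0.0:
                continue  # a constant CpG carries no information
            # Covariance of CpG j with the residual that excludes CpG j.
            rho = sum(x * r for x, r in zip(col, residual)) / n + norm_sq * weights[j]
            new_weight = soft_threshold(rho, penalty) / norm_sq
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, col)]
                weights[j] = new_weight
    return weights


def methylation_score(cpg_columns, weights):
    """Gene-level score: weighted sum of CpG methylation for each sample."""
    n = len(cpg_columns[0])
    return [sum(w * col[i] for w, col in zip(weights, cpg_columns)) for i in range(n)]


def percent_variance_explained(score, expression):
    """100 * squared Pearson correlation between score and expression."""
    s, y = center(score), center(expression)
    ss, yy = sum(v * v for v in s), sum(v * v for v in y)
    if ss == 0.0 or yy == 0.0:
        return 0.0
    r = sum(a * b for a, b in zip(s, y)) / (ss * yy) ** 0.5
    return 100.0 * r * r


# Simulated data (hypothetical): CpG 0 represses expression, CpG 1 is
# positively associated with it, and CpG 2 is unrelated noise.
rng = random.Random(0)
n_samples = 200
cpgs = [[rng.random() for _ in range(n_samples)] for _ in range(3)]
expr = [-2.0 * cpgs[0][i] + 1.5 * cpgs[1][i] + rng.gauss(0.0, 0.1)
        for i in range(n_samples)]

weights = lasso_weights(cpgs, expr, penalty=0.02)
score = methylation_score(cpgs, weights)
pve = percent_variance_explained(score, expr)
print("weights:", [round(w, 3) for w in weights])
print("percent variance explained:", round(pve, 1))
```

In this sketch the sign of each weight gives the direction of association, CpGs shrunk to zero drop out of the profile, and the squared correlation between the score and expression stands in for the percent variability explained by methylation.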

We develop the GSMPs in the setting of colorectal cancer, using TCGA tumor samples to build the model and assessing its performance both by cross-validation and on a separate, independent validation data set obtained from colorectal tumor samples from MD Anderson patients. The comparison with existing approaches demonstrates the advantage of our proposed method in terms of sparsity and ability to explain expression variability. We illustrate how to interpret GSMPs on a chosen subset of genes, and provide a freely available database containing the GSMPs and accompanying plots for colorectal cancer for all genes in the genome. We are also applying this approach to each of the TCGA tumor types to produce GSMPs for each, which will be made freely available.