SNPs2ChIP: Latent Factors of ChIP-seq to infer functions of non-coding SNPs
Nov 1, 2018
Abstract
Genetic variations of the human genome are linked to many disease phenotypes. While
whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a
number of genotype-phenotype associations, their functional interpretation remains
challenging given that most single nucleotide polymorphisms (SNPs) fall in non-coding
regions of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq)
have made large-scale repositories of epigenetic data available, allowing investigation
of the coordinated mechanisms of epigenetic markers and transcriptional regulation and
their influence on biological function. To address this challenge, we propose SNPs2ChIP,
a method to infer the biological functions of non-coding variants through unsupervised
statistical learning methods applied to publicly available epigenetic datasets. We
systematically characterized latent factors by applying singular value decomposition
(SVD) to ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological
function of each latent factor using the genomic region enrichment analysis tool. Using
these annotated latent factors as a reference, we developed SNPs2ChIP, a pipeline that
takes one or more genomic regions as input, identifies the relevant latent factors with
quantitative scores, and returns them along with their inferred functions. As a case
study, we focused on systemic lupus erythematosus and demonstrated our method’s ability
to infer relevant biological functions. We also systematically applied SNPs2ChIP to
publicly available datasets, including known GWAS associations from the GWAS Catalog and
ChIP-seq peaks from a previously published study. Our approach, which leverages latent
patterns across genome-wide epigenetic datasets to infer biological function, will
advance understanding of the genetics of human diseases by accelerating the
interpretation of the non-coding genome.
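The abstract does not give the exact matrix construction or scoring formula, so the following is only a minimal sketch of the general idea (SVD of a ChIP-seq matrix, then scoring latent factors for query regions), not the authors' implementation. It assumes: one chromosome split into fixed-width bins; a binary matrix `X` where `X[i, j] = 1` if ChIP-seq track `j` has a peak in bin `i`; column centering before SVD; and a hypothetical "contribution" score in which each query bin's squared factor scores are normalized across factors and then averaged. The function names `fit_latent_factors`, `regions_to_rows` and `score_factors` are illustrative, and the toy matrix is random data with a planted block, not real ChIP-seq.

```python
import numpy as np


def fit_latent_factors(X, k):
    """Rank-k truncated SVD of a (genomic bins x ChIP-seq tracks) matrix.

    Returns U (bins x k), s (k,) and Vt (k x tracks) so that the centered
    matrix is approximated by U @ np.diag(s) @ Vt.
    """
    # Assumption of this sketch: center each track so the factors describe
    # co-variation across tracks rather than overall peak density.
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def regions_to_rows(regions, bin_size, n_bins):
    """Map half-open (start, end) intervals on one chromosome to overlapping bin indices."""
    rows = set()
    for start, end in regions:
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        rows.update(range(first, last + 1))
    return sorted(rows)


def score_factors(U, s, rows):
    """Hypothetical relevance score of each latent factor for the query bins.

    Each bin's squared factor scores (U[i, k] * s[k]) ** 2 are normalized to sum
    to 1 across factors, then averaged over bins, so the result sums to 1
    whenever every query bin has a nonzero score.
    """
    F = (U[rows, :] * s) ** 2
    totals = F.sum(axis=1, keepdims=True)
    F = np.divide(F, totals, out=np.zeros_like(F), where=totals > 0)
    return F.mean(axis=0)


# Toy data for illustration only: 200 bins x 12 tracks of sparse random peaks,
# plus a planted block in which tracks 0-3 share peaks in bins 50-79.
rng = np.random.default_rng(0)
X = (rng.random((200, 12)) < 0.05).astype(float)
X[50:80, 0:4] = 1.0

U, s, Vt = fit_latent_factors(X, k=3)
rows = regions_to_rows([(5_100, 5_900), (7_250, 7_300)], bin_size=100, n_bins=X.shape[0])
scores = score_factors(U, s, rows)
print(rows[:3], scores.round(3))
```

Because the query intervals fall inside the planted block, the first (largest) factor should receive most of the score. Squaring the factor scores makes the result independent of the arbitrary sign that SVD assigns to each singular vector.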
Type: Publication
Published in Pacific Symposium on Biocomputing, 2018