The development of chromosome conformation capture techniques, particularly Hi-C, has made the analysis of the spatial conformation of a genome an important topic in bioinformatics and computational biology. Aided by high-throughput next-generation sequencing, Hi-C generates read pairs that indicate which chromosomal locations are in spatial proximity, capturing the large-scale intra- and inter-chromosomal interactions occurring within a genome (Lieberman-Aiden et al., 2009). These read pairs are called Hi-C data. They can be used to reconstruct 3D structures of chromosomes, which in turn can be used to study DNA replication, gene regulation, genome interaction, genome folding, and genome function. Generally, before Hi-C data are used for model construction, they are converted to a matrix form known as a contact matrix or contact map. A contact matrix is an N × N matrix, extracted from Hi-C data, that shows the number of interactions between chromosomal regions. The size of the matrix (N) is the number of equal-size regions into which a chromosome is divided. The length of each equal-size region (e.g. 1 Mb) is called the resolution. Each entry in the matrix contains a count of the read pairs that connect the two corresponding chromosome regions in a Hi-C experiment. Therefore, the chromosome contact matrix represents all the observed interactions between the regions (or bins) of a chromosome.
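As a rough illustration of the binning step described above, the following minimal sketch counts read pairs into a contact matrix. It is not the project's implementation: the function name `build_contact_matrix`, its parameters, and the toy read pairs are hypothetical, and it assumes read pairs are given as 0-based positions on a single chromosome, each smaller than the chromosome length.

```python
import math


def build_contact_matrix(read_pairs, chrom_length, resolution):
    """Bin intra-chromosomal read pairs into a symmetric N x N contact matrix.

    read_pairs   -- iterable of (pos_a, pos_b), 0-based base-pair positions
                    (hypothetical input format, assumed for this sketch)
    chrom_length -- chromosome length in base pairs
    resolution   -- bin length in base pairs (e.g. 1_000_000 for 1 Mb)
    """
    # N is the number of equal-size bins needed to cover the chromosome.
    n_bins = math.ceil(chrom_length / resolution)
    matrix = [[0] * n_bins for _ in range(n_bins)]

    for pos_a, pos_b in read_pairs:
        i = pos_a // resolution  # bin index of the first read
        j = pos_b // resolution  # bin index of the second read
        matrix[i][j] += 1
        if i != j:
            # Mirror the count so the matrix stays symmetric.
            matrix[j][i] += 1
    return matrix


# Toy data: four made-up read pairs on a 5 Mb chromosome at 1 Mb resolution.
pairs = [
    (100_000, 2_500_000),
    (1_200_000, 1_800_000),
    (2_900_000, 300_000),
    (4_100_000, 4_900_000),
]
contacts = build_contact_matrix(pairs, chrom_length=5_000_000, resolution=1_000_000)
for row in contacts:
    print(row)
```

In this sketch a read pair that falls entirely within one bin is counted once on the diagonal; that is a convention chosen for the example, and other tools may handle the diagonal differently.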
This project focuses on the development of algorithms for detecting two structural features from Hi-C data: Topologically Associating Domains (TADs) and chromatin loops. TADs are considered to be the structural and functional units (or modules) of a chromosome. According to Dixon et al. (2012), TADs remain unchanged irrespective of cell differentiation, and they contain clusters of co-regulated genes. In recent years, the detection of topological domains has become an important problem in bioinformatics and computational biology, and as a result several methods for TAD identification have been developed. A chromatin loop occurs when stretches of genomic sequence that lie on the same chromosome (configured in cis) are in closer physical proximity to each other than to the intervening sequences. These loops are mostly found at the boundary regions of TADs, so the two structures are closely related.
This work was supported by the UCCS Committee on Research and Creative Works (CRCW) Award, 2020-2022.
All our algorithms are public, open-source, and freely accessible through our GitHub repository.