Understanding genome function is one of the grand scientific challenges of the 21st century. The challenge lies not only in the structural and spatial complexity of the genome's organization, which results from the coordinated action of many different components, but also in its dynamical complexity: processes occurring on time scales ranging from nanoseconds to hours exquisitely regulate genome function. By developing novel theoretical and computational approaches, the Zhang Lab aims to build a global framework for the human genome that connects its sequence with its structure and activity, and to enable quantitative, predictive modeling of genome structure and function.
Information theoretic study of genome structure and dynamics
Just as proteins lose their catalytic activity when they denature, the function of the genome appears to be tightly coupled to its three-dimensional organization, which has remained largely a mystery. We are developing computational approaches with rigorous foundations in statistical mechanics to enable predictive modeling of genome organization from sequence information alone. These approaches will be data-driven, mining the massive amounts of information on the human genome produced by large-scale projects such as the Human Genome Project and the Encyclopedia of DNA Elements (ENCODE) Project. Because these experiments probe the genome in its native environment, structural models that can make effective use of this "big data" will provide more detailed and realistic 3D representations of the genome. Predictive modeling of the genome's 3D structure will reveal its underlying organizing principles and suggest new ways to perturb the genome to engineer novel functions.
Multiscale modeling of chromatin fiber
Can we predict genome structure de novo, i.e., starting from physicochemical principles and without using experimental data? Two challenges arise for first-principles modeling of genome organization: i) the large system size and the complexity of the molecular players place high demands on both the computational efficiency and the chemical accuracy of the model; ii) genome organization is not under strict thermodynamic control, and many active, non-equilibrium processes drive chromosome motion. We are building advanced computational models to address these challenges: to accurately describe protein-protein and protein-DNA interactions occurring over micrometer length scales, and to couple dynamical processes spanning nanosecond molecular motions to enzymatic reactions on the time scale of seconds.
Coupling genome structural dynamics with gene regulation
To fully appreciate the driving forces of genome folding, the structure and dynamics of the genome need to be investigated in the context of its activity. As with proteins, the function of the genome is tightly coupled to its structure, i.e., its 3D organization. A prominent example is chromosome looping: as a chromosome folds, the loops that form bring enhancers and promoters that lie far apart in linear sequence (~1 Mb) into spatial proximity, enabling gene regulation. At shorter length scales (~10 kb), the conformational dynamics of the two forms of chromatin, euchromatin and heterochromatin, can also play a role in gene regulation. We are developing novel theoretical approaches that incorporate genome structure and dynamics into gene network models, and that elucidate the interplay between epigenetic modification and transcription factor regulation in cellular differentiation and fate determination.