Understanding gene regulation through graph-based posterior regularization in structured probabilistic models
Although the human genome was sequenced over fifteen years ago, much is still unknown about how it functions. With the advent of high-throughput genomics technologies, it is now possible to measure properties across the entire genome in a single experiment, such as where a given protein binds to the DNA or which genes are expressed. However, the complexity and massive scale of these data sets (billions of base pairs, each with thousands of measurements) pose challenges to their analysis. My research focuses on developing new machine learning methods that address the challenges posed by genomics data sets.
I will focus on a method for combining probabilistic models with graph-based methods for semi-supervised learning. Graph-based methods have been successful on many types of semi-supervised learning problems by optimizing a graph smoothness criterion, which states that data instances nearby in a given graph are likely to have similar properties. A graph smoothness criterion cannot be directly incorporated into a generative unsupervised model: it is usually not clear what probabilistic process generated the data instances with respect to the graph, and incorporating the graph directly into a factorizable (e.g., time-series) model would break the model's factorizable structure, making exact inference methods like belief propagation intractable. The method I will present, entropic graph-based posterior regularization (EGPR), provides a way to express a graph smoothness criterion in a probabilistic model by defining a regularization term on an auxiliary posterior distribution. We applied this approach to regulatory genomics data sets from the human genome, leading to the discovery of a new type of regulatory domain.
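To give a concrete sense of the kind of smoothness term involved, the sketch below computes one simple entropic choice: a penalty that sums, over the edges of a graph, a weighted symmetric KL divergence between the posterior distributions of adjacent instances. This is only an illustrative sketch of a graph smoothness penalty on posteriors, not the actual EGPR objective from the talk; the function name and the symmetric-KL choice are my own assumptions here.

```python
import numpy as np

def graph_smoothness_penalty(q, edges, weights):
    """Illustrative graph smoothness penalty on posterior distributions.

    q: (N, K) array; row i is a posterior distribution over K labels
       for data instance i.
    edges: list of (i, j) index pairs, the edges of the graph.
    weights: non-negative edge weights w_ij.

    Returns the sum over edges of w_ij times the symmetric KL
    divergence between q[i] and q[j], so that posteriors of
    graph-adjacent instances are encouraged to agree.
    (This is one illustrative entropic penalty, not the EGPR term.)
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for (i, j), w in zip(edges, weights):
        kl_ij = np.sum(q[i] * np.log((q[i] + eps) / (q[j] + eps)))
        kl_ji = np.sum(q[j] * np.log((q[j] + eps) / (q[i] + eps)))
        total += w * (kl_ij + kl_ji)
    return total

# Identical posteriors on an edge incur zero penalty;
# disagreeing posteriors are penalized.
q = np.array([[0.9, 0.1],
              [0.9, 0.1],
              [0.1, 0.9]])
print(graph_smoothness_penalty(q, [(0, 1)], [1.0]))      # 0.0
print(graph_smoothness_penalty(q, [(0, 2)], [1.0]) > 0)  # True
```

In a full method, a term like this would be added to the model's objective and minimized jointly with the likelihood, so the posteriors both fit the data and vary smoothly over the graph.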
Note: the material in this talk will be distinct from my 2017-09-21 talk at VanBUG.
Bio: Maxwell Libbrecht is an Assistant Professor in Computing Science at Simon Fraser University. He received his PhD in 2016 from the Computer Science and Engineering department at the University of Washington, advised by Bill Noble and Jeff Bilmes. He received his undergraduate degree in Computer Science from Stanford University, where he did research with Serafim Batzoglou. His research focuses on developing machine learning methods for high-throughput genomics data sets. He was the first author of a paper named one of ISCB's Top 10 Regulatory and Systems Genomics papers of 2015.