Two Big Data Research Projects: High-Dimensional Differential Equation and Network Modeling for GEO Genomics and EHR Phenomics Data
In this talk, I will first present a procedure and analysis pipeline for high-dimensional time-course gene expression data from the GEO data repository. I will then discuss a series of advanced statistical methods and modeling techniques for high-dimensional gene regulatory networks. In particular, we propose a novel matrix-based estimation approach for high-dimensional linear ordinary differential equation (ODE) models with more than one million unknown parameters. The key idea is to combine a similarity transformation of the coefficient matrix with a separable least squares approach, which reduces the dimension of the nonlinear optimization space. Simulation studies show promising results for the new method on high-dimensional systems, and two real data applications illustrate its usefulness. If time permits, I will also briefly introduce EHR-BigData, the second Big Data research project in which I am involved. Both projects will be used to demonstrate a novel concept of, and way of thinking about, data-driven research in the Big Data era.
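The abstract does not spell out the estimator. The rough sketch below illustrates why a similarity transformation combined with separable least squares can shrink the nonlinear search. It is a minimal Python/NumPy example, not the speaker's algorithm, and it assumes a diagonalizable coefficient matrix with real, distinct eigenvalues and noise-free data.

Write A = P Λ P^{-1}. The solution of x'(t) = A x(t) is then x(t) = Σ_k exp(λ_k t) b_k, and for fixed eigenvalues this expression is linear in the vectors b_k. The b_k can therefore be solved by ordinary least squares, so only the d eigenvalues need a nonlinear search instead of all p^2 entries of A. The sketch does that search with a crude grid. The function names, the grid search, and the toy 2×2 system are hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def design_matrix(lams, t):
    """Nonlinear part of the model: column k is exp(lambda_k * t)."""
    return np.exp(np.outer(t, lams))


def projected_rss(lams, t, Y):
    """Residual sum of squares after solving the linear part exactly.

    For fixed eigenvalues, x(t) = sum_k exp(lambda_k t) b_k is linear in the
    vectors b_k, so they are obtained by an ordinary least-squares solve.
    """
    Phi = design_matrix(lams, t)
    B, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # row k of B is b_k
    resid = Y - Phi @ B
    return float(np.sum(resid ** 2)), B


def fit_linear_ode(t, Y, grid):
    """Hypothetical estimator: grid search over distinct eigenvalue tuples
    (only d nonlinear parameters), with the b_k profiled out by least squares."""
    d = Y.shape[1]  # assumes A is d x d and fully observed
    best = None
    for lams in itertools.combinations(sorted(grid, reverse=True), d):
        lams = np.array(lams)
        rss, B = projected_rss(lams, t, Y)
        if best is None or rss < best[0]:
            best = (rss, lams, B)
    rss, lams, B = best
    # The b_k are scaled eigenvectors of A, so they can serve as the columns
    # of P in the similarity transformation A = P diag(lambda) P^{-1}.
    P = B.T
    A_hat = P @ np.diag(lams) @ np.linalg.inv(P)
    return A_hat, lams, rss


# Toy example: simulate noise-free trajectories of x' = A x from x(0) = x0.
A_true = np.array([[-1.0, 0.5], [0.0, -2.0]])
x0 = np.array([1.0, 1.0])
t = np.linspace(0.0, 3.0, 31)
evals, V = np.linalg.eig(A_true)
c = np.linalg.solve(V, x0)                      # coordinates of x0 in the eigenbasis
Y = (np.exp(np.outer(t, evals)) * c) @ V.T      # row i is x(t_i)

grid = np.round(np.arange(-3.0, 0.0, 0.05), 2)  # candidate eigenvalues
A_hat, lams, rss = fit_linear_ode(t, Y, grid)
print(np.round(A_hat, 6))
```

With this toy system, `A_hat` should be close to `A_true`. In a realistic high-dimensional setting, the grid search would be replaced by a proper optimizer over the eigenvalues, and complex-conjugate eigenvalue pairs would need handling. Both are omitted here for brevity.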