Technological advances and the information era allow the collection of massive amounts of data at unprecedented resolution. Making use of these data to gain insights into complex phenomena requires characterizing the relationships among a large number of variables. Probabilistic graphical models explicitly capture the statistical relationships between the variables of interest in the form of a network. Such a representation, in addition to enhancing the interpretability of the model, often enables computationally efficient inference. My group studies graphical models and develops theory, methodology and algorithms that allow these models to be applied to novel, scientifically important problems. In particular, our work to date has broken new ground in providing a systematic approach to studying Gaussian graphical models, a framework that is rich enough to capture broad phenomena but also allows systematic statistical and computational investigations. More generally, we study models with linear constraints on the covariance matrix or its inverse, as they arise in various applications and allow efficient computation. We use a holistic approach that combines ideas from machine learning, mathematical statistics, convex optimization, combinatorics, and applied algebraic geometry. For example, by leveraging the inherent algebraic and combinatorial structure of graphical models, we have uncovered statistical and computational limitations and developed new algorithms for learning directed graphical models for causal inference.
Building on our theoretical work, my group also develops scalable algorithms with provable guarantees for applications in genomics, in particular for learning gene regulatory networks. Recent technological developments have led to an explosion of single-cell imaging and sequencing data. Since most experiments require fixing a cell, one can only obtain one data modality per cell, take one snapshot of a particular cell in time, and observe a cell either before or after a perturbation (but not both). Hence a major computational challenge going forward is how to integrate the emerging single-cell data to identify regulatory modules in health and disease. Towards solving this challenge, my group has developed the first provably consistent causal structure discovery algorithms that can integrate observational and interventional data from gene knockout or knockdown experiments. In addition, we recently developed methods based on autoencoders and optimal transport to integrate and translate between different single-cell data modalities and data measured at different time points of a biological process. Together with our geometric models that link the packing of the DNA in the cell nucleus to gene expression, our methods have led to new biomarkers for early cancer prognosis based on single-cell images of DNA-stained cell nuclei.
In what follows, we provide more details about our work in the above-mentioned areas. A complete list of publications can be found here or in my CV.
Thanks to NSF, ONR, DARPA, IBM, the Sloan Foundation and the Simons Foundation for supporting my research!
Gaussian Graphical Models
We uncovered the deep interplay between mathematical statistics, applied algebraic geometry, and convex optimization with regard to Gaussian graphical models. By developing a geometric understanding of maximum likelihood estimation, we obtained new results on the minimum number of observations needed for the existence of the maximum likelihood estimator in Gaussian graphical models. In the Bayesian treatment of Gaussian graphical models, the G-Wishart distribution plays an important role, since it serves as the conjugate prior. It had been unknown whether the normalizing constant of the G-Wishart distribution for a general graph could be represented explicitly, and a considerable body of computational literature emerged that attempted to avoid this apparent intractability. We solved this 20-year-old problem by providing an explicit representation of the G-Wishart normalizing constant for general graphs.
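For a fixed graph, maximum likelihood estimation in a Gaussian graphical model is a convex problem: maximize log det(K) - trace(SK) over precision matrices K with zeros at the non-edges. A minimal sketch using classical iterative proportional scaling over the cliques (the function name and toy chain graph are illustrative, not from the papers below):

```python
import numpy as np

def ggm_mle_ips(S, cliques, sweeps=200):
    """Iterative proportional scaling (IPS) for the MLE in a Gaussian
    graphical model: maximize log det(K) - trace(S K) subject to
    K[i, j] = 0 for every non-edge (a convex optimization problem)."""
    p = S.shape[0]
    K = np.eye(p)  # starts with zeros at all non-edges
    for _ in range(sweeps):
        for C in cliques:
            ix = np.ix_(C, C)
            Sigma = np.linalg.inv(K)
            # adjust the clique block so the fitted marginal covariance
            # on C matches the sample covariance on C
            K[ix] += np.linalg.inv(S[ix]) - np.linalg.inv(Sigma[ix])
    return K

# Toy example: 3 variables with chain graph 0 - 1 - 2 (no edge 0-2).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
S = X.T @ X / X.shape[0]
K = ggm_mle_ips(S, cliques=[[0, 1], [1, 2]])
Sigma = np.linalg.inv(K)
# The fitted Sigma agrees with S on the cliques, and K[0, 2] stays 0.
print(np.round(K, 3))
```

Since every update only touches entries inside a clique, the zero pattern at non-edges is preserved automatically; at convergence the fitted covariance matches the sample covariance on every clique.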

B. Sturmfels and C. Uhler: Multivariate Gaussians, semidefinite matrix completion and convex algebraic geometry. Annals of the Institute of Statistical Mathematics 62 (2010), pp. 603–638.

C. Uhler: Geometry of maximum likelihood estimation in Gaussian graphical models. Annals of Statistics 40 (2012), pp. 238–261.

L. Solus, C. Uhler and R. Yoshida: Extremal positive semidefinite matrices for graphs without K5 minors. Linear Algebra and its Applications 509 (2016), pp. 247–275.

M. Michalek, B. Sturmfels, C. Uhler and P. Zwiernik: Exponential varieties. Proceedings of the London Mathematical Society 112 (2016), pp. 27–56.

C. Uhler, A. Lenkoski and D. Richards: Exact formulas for the normalizing constants of Wishart distributions for graphical models. Annals of Statistics 46 (2018), pp. 90–118.

C. Uhler: Gaussian graphical models: An algebraic and geometric perspective. Book chapter in Handbook of Graphical Models (edited by M. Drton, S. Lauritzen, M. Maathuis and M. Wainwright), CRC Press (2018).
Other Linearly Constrained Covariance Models
In many applications it is of interest to impose linear equality or inequality constraints on the covariance matrix or its inverse. While maximum likelihood estimation for Gaussian models with linear constraints on the inverse covariance matrix (e.g. Gaussian graphical models) leads to a convex optimization problem that can be solved efficiently, maximum likelihood estimation for Gaussian models with linear constraints on the covariance matrix is a non-convex problem that typically has many local optima. Nevertheless, in recent work we provided sufficient conditions for any hill-climbing method to converge to the global optimum provided the number of samples is large enough (n > 14p). Hence, surprisingly, maximum likelihood estimation for linear Gaussian covariance models behaves as if it were a convex optimization problem. We also studied Gaussian models with linear inequality constraints, in particular distributions that are multivariate totally positive of order two (MTP2). This property, introduced in the 1970s, is a strong form of positive dependence, an important notion in probability theory and statistical physics. We showed that MTP2 distributions have remarkable properties with respect to Markov structures, making such distributions interesting for modeling in the high-dimensional setting. To relax the assumption of Gaussianity, we are currently developing methods for nonparametric density estimation under MTP2.
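In the Gaussian case, MTP2 has a concrete algebraic characterization: the distribution is MTP2 if and only if its precision matrix K = Σ⁻¹ is an M-matrix, i.e. all off-diagonal entries of K are nonpositive (equivalently, all partial correlations are nonnegative). A minimal check, with an illustrative helper name of our own:

```python
import numpy as np

def is_gaussian_mtp2(Sigma, tol=1e-10):
    """A Gaussian with covariance Sigma is MTP2 if and only if its
    precision matrix K = inv(Sigma) is an M-matrix, i.e. every
    off-diagonal entry of K is <= 0."""
    K = np.linalg.inv(Sigma)
    off_diag = K[~np.eye(K.shape[0], dtype=bool)]
    return bool(np.all(off_diag <= tol))

# An equicorrelated Gaussian with positive correlation is MTP2 ...
Sigma_pos = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])
# ... while any negative correlation already violates MTP2.
Sigma_neg = np.array([[1.0, -0.5],
                      [-0.5, 1.0]])
print(is_gaussian_mtp2(Sigma_pos), is_gaussian_mtp2(Sigma_neg))  # True False
```

The sign constraint on the off-diagonal entries of K is a linear inequality constraint, which is what places Gaussian MTP2 models inside the linearly constrained framework discussed above.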

P. Zwiernik, C. Uhler and D. Richards: Maximum likelihood estimation for linear Gaussian covariance models. Journal of the Royal Statistical Society, Series B 79 (2017), pp. 1269–1292.

S. Fallat, S. Lauritzen, K. Sadeghi, C. Uhler, N. Wermuth and P. Zwiernik: Total positivity in Markov structures. Annals of Statistics 45 (2017), pp. 1152–1184.

C. Uhler and D. Richards: Generalized Fréchet bounds for cell entries in multidimensional contingency tables. Journal of Algebraic Statistics 10 (special issue for Stephen E. Fienberg) (2019), pp. 1–12.

E. Robeva, B. Sturmfels and C. Uhler: Geometry of log-concave density estimation. Discrete & Computational Geometry 61 (2019), pp. 136–160.

S. Lauritzen, C. Uhler and P. Zwiernik: Maximum likelihood estimation in Gaussian models under total positivity. Annals of Statistics 47 (2019), pp. 1835–1863.

E. Robeva, B. Sturmfels, N. Tran and C. Uhler: Maximum likelihood estimation for totally positive log-concave densities. Submitted.

S. Lauritzen, C. Uhler and P. Zwiernik: Total positivity in structured binary distributions. Submitted.

Y. Wang, U. Roy and C. Uhler: Learning high-dimensional Gaussian graphical models under total positivity without tuning parameters.
Causal Inference
Causal inference is a cornerstone of scientific discovery. It is of particular interest to determine causal structure among variables based on observational data, since conducting randomized controlled trials is often impractical or prohibitively expensive. Unfortunately, observational data alone in general cannot uniquely identify a directed graphical model, since different directed graphical models can satisfy the same conditional independence relations (such graphs are Markov equivalent). It is therefore important to understand the set of Markov equivalence classes and their sizes. In recent work we shed new light on this statistical problem by recasting it in the language of combinatorial optimization. On the learning side, using methods from algebraic geometry and combinatorics, we proved that current methodologies for learning causal graphs have severe limitations, namely that they require the so-called faithfulness assumption, which is extremely restrictive. It is therefore of interest to study the minimal assumptions required for learning directed graphical models. In recent work, we introduced the sparsest permutation algorithm and proved that it is consistent under strictly weaker assumptions than faithfulness. It is conjectured that this algorithm in fact meets the information-theoretically minimal assumptions needed for causal inference. Most recently, we showed that a greedy version of the sparsest permutation algorithm is also consistent and can in fact be adapted to obtain the first consistent algorithm for learning interventional Markov equivalence classes from a mix of observational and interventional data, as is becoming available in genomics.
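The idea behind the sparsest permutation algorithm can be sketched as follows: for each ordering of the variables, construct the minimal I-MAP DAG consistent with that ordering (an edge j -> i is included unless i and j test as conditionally independent given i's other predecessors), and return an ordering whose minimal I-MAP has the fewest edges. A brute-force toy version for very small graphs, with an illustrative partial-correlation threshold standing in for a proper conditional independence test (the real algorithm and its greedy variant avoid enumerating all orderings):

```python
import numpy as np
from itertools import permutations

def partial_corr(S, i, j, cond):
    """Sample partial correlation of variables i and j given the
    list `cond`, read off from the inverse covariance of the block."""
    idx = [i, j] + list(cond)
    Kb = np.linalg.inv(S[np.ix_(idx, idx)])
    return -Kb[0, 1] / np.sqrt(Kb[0, 0] * Kb[1, 1])

def min_imap_edges(S, order, thresh=0.2):
    """Edge count of the minimal I-MAP DAG for a variable ordering:
    j -> i is included unless i and j test as independent given
    i's other predecessors (partial correlation below thresh)."""
    edges = 0
    for t, i in enumerate(order):
        preds = order[:t]
        for j in preds:
            rest = [k for k in preds if k != j]
            if abs(partial_corr(S, i, j, rest)) > thresh:
                edges += 1
    return edges

def sparsest_permutation(S):
    """Brute force over all orderings; feasible only for tiny p."""
    p = S.shape[0]
    return min(permutations(range(p)), key=lambda o: min_imap_edges(S, list(o)))

# Toy chain X0 -> X1 -> X2: a sparsest ordering yields 2 edges, while
# orderings that place the middle variable last need 3.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(5000)
x1 = 0.8 * x0 + rng.standard_normal(5000)
x2 = 0.8 * x1 + rng.standard_normal(5000)
S = np.cov(np.column_stack([x0, x1, x2]), rowvar=False)
best = sparsest_permutation(S)
print(best, min_imap_edges(S, list(best)))
```

Note that several orderings of the chain attain the minimum of 2 edges; they correspond to DAGs in the same Markov equivalence class, which is exactly what can be recovered from observational data alone.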

A. Radhakrishnan, L. Solus and C. Uhler: Counting Markov equivalence classes by number of immoralities. Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence (UAI 2017).

A. Radhakrishnan, L. Solus and C. Uhler: Counting Markov equivalence classes for DAG models on trees. Discrete Applied Mathematics 244 (2018), pp. 170–185.

C. Uhler, G. Raskutti, P. Bühlmann and B. Yu: Geometry of the faithfulness assumption in causal inference. Annals of Statistics 41 (2013), pp. 436–463.

S. Lin, C. Uhler, B. Sturmfels and P. Bühlmann: Hypersurfaces and their singularities in partial correlation testing. Foundations of Computational Mathematics 14 (2014), pp. 1079–1116.

A. Klimova, C. Uhler and T. Rudas: Faithfulness and learning of hypergraphs from discrete distributions. Journal of Computational Statistics and Data Analysis 87 (2015), pp. 57–72.

G. Raskutti and C. Uhler: Learning directed acyclic graphs based on sparsest permutations. Stat 7 (2018), e183.

F. Mohammadi, C. Uhler, C. Wang and J. Yu: Generalized permutohedra from probabilistic graphical models. SIAM Journal on Discrete Mathematics 32 (2018), pp. 64–93.

L. Solus, Y. Wang, L. Matejovicova and C. Uhler: Consistency guarantees for permutation-based causal inference algorithms. Under review.

Y. Wang, L. Solus, K.D. Yang and C. Uhler: Permutation-based causal inference algorithms with interventions. Advances in Neural Information Processing Systems (NIPS 2017).

K.D. Yang, A. Katcoff and C. Uhler: Characterizing and learning equivalence classes of causal DAGs under interventions. Proceedings of Machine Learning Research 80 (ICML 2018), pp. 5537–5546.

R. Agrawal, T. Broderick and C. Uhler: Minimal IMAP MCMC for scalable structure discovery in causal DAG models. Proceedings of Machine Learning Research 80 (ICML 2018), pp. 89–98.

Y. Wang, C. Squires, A. Belyaeva and C. Uhler: Direct estimation of differences in causal graphs. Advances in Neural Information Processing Systems 31 (2018).

D. Katz-Rogozhnikov, K. Shanmugam, C. Squires and C. Uhler: Size of interventional Markov equivalence classes in random DAG models. Proceedings of Machine Learning Research 89 (AISTATS 2019), pp. 3234–3243.

R. Agrawal, C. Squires, K.D. Yang, K. Shanmugam and C. Uhler: ABCD-Strategy: Budgeted experimental design for targeted causal structure discovery. Proceedings of Machine Learning Research 89 (AISTATS 2019), pp. 3400–3409.

B. Saeed, A. Belyaeva, Y. Wang and C. Uhler: Anchored causal inference in the presence of measurement noise. Submitted.
Gene Regulation and Chromosome Packing
The same string of genetic information encodes about 200 different cell types in our body. The emerging hypothesis is that the spatial organization of the genome is crucial for differentially turning on expression programs. In collaboration with the Shivashankar lab, a cell biology lab at the National University of Singapore, we probe this hypothesis using experiments and geometric models by integrating different single-cell modalities.

C. Uhler and S.J. Wright: Packing ellipsoids with overlap. SIAM Review 55 (2013), pp. 671–706 (selected as Research Spotlight).

M. Iglesias-Ham, M. Kerber and C. Uhler: Sphere packing with limited overlap. Proceedings of the 26th Canadian Conference on Computational Geometry, Halifax, Nova Scotia (2014), pp. 155–161.

C. Uhler and G.V. Shivashankar: Geometric control and modeling of genome reprogramming. BioArchitecture 6 (2016), pp. 76–84.

Y. Wang, M. Nagarajan, C. Uhler and G.V. Shivashankar: Orientation and repositioning of chromosomes correlate with cell geometry-dependent gene expression. Molecular Biology of the Cell 28 (2017), pp. 1997–2009.

C. Uhler and G.V. Shivashankar: Chromosome intermingling: Mechanical hotspots for genome regulation. Trends in Cell Biology 27 (2017), pp. 810–819 (invited review).

C. Uhler and G.V. Shivashankar: Regulation of genome organization and gene expression by nuclear mechanotransduction. Nature Reviews Molecular Cell Biology 18 (2017), pp. 717–727 (invited review).

A. Belyaeva, S. Venkatachalapathy, M. Nagarajan, G.V. Shivashankar and C. Uhler: Network analysis identifies chromosome intermingling regions as regulatory hotspots for transcription. Proceedings of the National Academy of Sciences, U.S.A. 114 (2017), pp. 13714–13719.

A. Radhakrishnan, D. Damodaran, A.C. Soylemezoglu, C. Uhler and G.V. Shivashankar: Machine learning for nuclear mechano-morphometric biomarkers in cancer diagnosis. Scientific Reports 7 (2017), article no. 17946.

C. Uhler and G.V. Shivashankar: Nuclear mechanopathology and cancer diagnosis. Trends in Cancer 4 (2018), pp. 320–331 (invited review).

A. Radhakrishnan, C. Durham, A. Soylemezoglu and C. Uhler: PatchNet: Interpretable neural networks for image classification. Machine Learning for Health (ML4H) Workshop, Neural Information Processing Systems (2018).

K.D. Yang and C. Uhler: Scalable unbalanced optimal transport using generative adversarial networks. International Conference on Learning Representations (ICLR 2019).

K.D. Yang and C. Uhler: Multi-domain translation by learning uncoupled autoencoders. Computational Biology Workshop, International Conference on Machine Learning (2019).

K.D. Yang, K. Damodaran, S. Venkatachalapathy, A.C. Soylemezoglu, G.V. Shivashankar and C. Uhler: Autoencoder and optimal transport to infer single-cell trajectories of biological processes. Submitted.