HENRY L. AND GRACE DOHERTY ASSOCIATE PROFESSOR
Department of Electrical Engineering and Computer Science and Institute for Data, Systems and Society
Massachusetts Institute of Technology
77 Massachusetts Ave., 32-D634 and E17-467
Cambridge, MA 02139, USA
I joined the MIT faculty in 2015 and am currently the Henry L. and Grace Doherty associate professor in EECS (Electrical Engineering & Computer Science) and IDSS (Institute for Data, Systems and Society). I am a member of LIDS (Laboratory for Information and Decision Systems), the Center for Statistics, Machine Learning at MIT, and the ORC (Operations Research Center).
I hold an MSc in mathematics, a BSc in biology, and an MEd in mathematics education from the University of Zurich, and a PhD in statistics from UC Berkeley. Before joining MIT, I spent a semester in the "Big Data" program at the Simons Institute at UC Berkeley, held postdoctoral positions at the IMA and at ETH Zurich, and spent 3 years as an assistant professor at IST Austria.
I am an elected member of the International Statistical Institute and a recipient of a Simons Investigator Award, a Sloan Research Fellowship, an NSF Career Award, a Sofja Kovalevskaja Award from the Humboldt Foundation, and a START Award from the Austrian Science Foundation.
My research focuses on statistics, machine learning and computational biology, in particular on graphical models, causal inference, algebraic statistics and applications to genomics, for example on linking the spatial organization of the DNA with gene regulation.
My complete CV can be found here.
10/2019: International Conference on Genomes and AI - From Packing to Regulation
Together with GV Shivashankar I am organizing a conference at the intersection of machine learning and genome biology. Recent technological developments in genomics as well as rapid advancements in AI and machine learning provide exciting opportunities for integrating and exploiting the emerging vast amounts of imaging and sequencing data. This conference will bring together leading experts, early-career scientists and students to unravel the mechanogenomic codes that link genome architecture and function, thereby paving the way for novel disease biomarkers and therapeutic interventions.
07/2019: SIAM Conference on Applied Algebraic Geometry
Together with Sandra Di Rocco, I was the chair of the SIAM Conference on Applied Algebraic Geometry that took place in Bern, Switzerland. We had over 160 (!) minisymposia and were able to attract 850 (!) participants, nearly 2x (!) as many as in the last conference. If you missed some of the outstanding plenary talks, see the conference website. A huge thank you to my amazing co-chair Sandra Di Rocco and the local organizers, in particular Jan Draisma, for a great collaboration! [website]
06/2019: Simons Investigator in the Mathematical Modelling of Living Systems
I received a Simons Investigator Award! A huge thank you to Steve Wright and Bernd Sturmfels for their continued support and their supporting letters! Also a big thank you to my group for their excellent work! [announcement]
06/2019: Graduation of my first PhD student
Congratulations to my PhD student, Yuhao Wang, for earning his PhD! Yuhao came from Tsinghua University to Cambridge (MIT) for his PhD and is now going to the Statistics Laboratory at Cambridge (UK) for a short postdoc and then back to Tsinghua University to start his own group. What a great journey! [commencement] [goodbye lab lunch]
06/2019: Anchored causal inference in the presence of measurement error
We just uploaded a paper to the arXiv that was led by my undergraduate (!) student Basil Saeed in collaboration with my PhD students Anastasiya Belyaeva and Yuhao Wang. We consider the problem of learning a causal graph under measurement error, a highly relevant setting in genomics, where single-cell RNA-seq data are known to be highly zero-inflated. Using method-of-moments estimators, we develop a provably consistent procedure for estimating the causal structure from corrupted observations on its nodes, under a variety of measurement error models. We demonstrate our method's performance through simulations and on real data, where we recover the underlying gene regulatory network from single-cell RNA-seq data. Great work, Basil, Anastasiya and Yuhao! [arXiv]
06/2019: Learning Gaussian graphical models under total positivity without tuning parameters
In a paper that we just uploaded to the arXiv, we consider the problem of estimating an undirected Gaussian graphical model when the underlying distribution is multivariate totally positive of order 2 (MTP2), a strong form of positive dependence. Such distributions are relevant, for example, for portfolio selection, since assets are usually positively dependent. A large body of methods has been proposed for learning undirected graphical models without the MTP2 constraint. A major limitation of these methods is that their consistency guarantees in the high-dimensional setting usually require a particular choice of a tuning parameter, which is unknown a priori in real-world applications. Interestingly, we show that an undirected graphical model under MTP2 can be learned consistently without any tuning parameters. This is achieved by a constraint-based estimator that infers the structure of the underlying graphical model by testing the signs of the empirical partial correlation coefficients. We evaluate the performance of our estimator in simulations and on financial data. Great job, Yuhao and Uma! [arXiv]
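To make the tested quantity concrete: for a Gaussian distribution, MTP2 is equivalent to every partial correlation being nonnegative (equivalently, the precision matrix has nonpositive off-diagonal entries). The NumPy sketch below computes partial correlations given all remaining variables and checks their signs. It is only an illustration of that fact, not the estimator from the paper, and the function names are hypothetical.

```python
import numpy as np


def partial_correlations_from_cov(cov: np.ndarray) -> np.ndarray:
    """Partial correlation of each pair (i, j) given all remaining variables.

    Uses the identity rho_{ij | rest} = -K_ij / sqrt(K_ii * K_jj),
    where K is the inverse of the covariance matrix (the precision matrix).
    """
    prec = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


def empirical_partial_correlations(samples: np.ndarray) -> np.ndarray:
    """Same quantity estimated from an (n x d) data matrix; requires n > d."""
    cov = np.cov(samples, rowvar=False)
    return partial_correlations_from_cov(cov)


def looks_mtp2(pcorr: np.ndarray, tol: float = 1e-10) -> bool:
    """Hypothetical check: a Gaussian is MTP2 iff all off-diagonal
    partial correlations are >= 0 (tol absorbs floating-point noise)."""
    off_diag = pcorr[~np.eye(pcorr.shape[0], dtype=bool)]
    return bool(np.all(off_diag >= -tol))


# Chain graph 1 - 2 - 3 with a precision matrix whose off-diagonals are <= 0.
K = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
population = partial_correlations_from_cov(np.linalg.inv(K))
print(np.round(population, 3), looks_mtp2(population))

# Empirical version from simulated data; the signs should roughly match.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(3), np.linalg.inv(K), size=5000)
print(np.round(empirical_partial_correlations(samples), 2))
```

With finite samples, a zero partial correlation (here the pair 1, 3) comes out slightly positive or negative, which is why a real procedure has to test signs statistically rather than read them off directly.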
06/2019: Presentations of my lab's work at ICML
My lab members will be giving various presentations of our work at ICML this year. My PhD student Karren and I will be giving talks on how to integrate and translate between different data modalities using autoencoders in the Workshop on Computational Biology, my PhD student Adit will be presenting our work on memorization in overparameterized autoencoders in the Workshop on Identifying and Understanding Deep Learning Phenomena, and I will be presenting our work on causality in the Workshop on Learning and Reasoning with Graph-Structured Representations. Come find us at ICML!
05/2019: Total positivity in structured binary distributions
We just uploaded a paper to the arXiv, where we study total positivity in exponential families, in particular in Ising models. We show that totally positive exponential families are convex and hence maximum likelihood estimation can be solved via a convex optimization problem. In addition, we show that for quadratic exponential families such as Ising models or Gaussian models, the distributions form a polyhedral cone, where the faces correspond to conditional independence relations. Hence total positivity acts as an implicit regularizer. We also extend Iterative Proportional Scaling to an algorithm for computing the MLE in Ising models under total positivity and apply it to a dataset in psychology, where standard methods could not be applied due to its high dimensionality. Thanks for another great collaboration, Piotr and Steffen! [arXiv]
04/2019: Abdul Latif Jameel Clinic for Machine Learning in Health Grant
I received a J-Clinic grant for a joint project with Ernest Fraenkel!
03/2019: My postdocs go places
A busy interview season is coming to an end. Starting in the fall, my postdoc Elina Robeva will take up a tenure-track position at the University of British Columbia, Kaie Kubjas will return to Aalto University, and Elisa Perrone will start at UMass Lowell. Congratulations!!
01/2019: Tenure-track position at Tsinghua University for my PhD student
Congratulations to my PhD student Yuhao Wang for obtaining an early offer for a tenure-track assistant professor position at Tsinghua University!
12/2018: Paper at ICLR 2019
Congratulations to my PhD student Karren! Our paper on scalable unbalanced optimal transport using generative adversarial networks just got accepted to ICLR. In particular, have a look at the application to population modeling in single-cell RNA-seq data during zebrafish embryogenesis in the blastula and gastrula stages. [arXiv]
12/2018: Two papers at AISTATS 2019
Congratulations to my students Raj, Chandler and Karren, and our collaborators at IBM, Karthik and Dmitriy! Our papers "Budgeted experimental design for targeted causal structure discovery" and "Size of interventional Markov equivalence classes in random DAG models" were accepted at AISTATS 2019.
12/2018: Presentations at NeurIPS
Congratulations to my students Yuhao, Chandler and Anastasiya! They will be presenting their paper on the direct estimation of differences in causal graphs at NeurIPS later this month. [arXiv]
In addition, my student Adit will be presenting our work on interpretable neural networks for image classification at the Machine Learning for Health (ML4H) Workshop at NeurIPS. [arXiv]
10/2018: Predicting cell lineages by generative modeling and optimal transport
Lineage tracing involves the identification of all ancestors and descendants of a given cell, and is an important tool for studying biological processes such as development and disease progression. However, in many settings controlled time-course experiments are not feasible, for example when working with tissue samples from patients. We just uploaded a paper to the bioRxiv presenting ImageAEOT, a computational pipeline based on generative modeling and optimal transport for predicting the lineages of cells using independent datasets from different stages of a cellular process. As a proof-of-concept, we applied ImageAEOT to nuclear and chromatin images during the activation of fibroblasts by tumor cells in engineered 3D tissues and in various breast cancer cell lines and human tissue samples, thereby linking alterations in chromatin condensation patterns to different stages of tumor progression. Importantly, ImageAEOT can infer the trajectory of a particular cell from one snapshot in time and identify the changing features to provide early biomarkers for developmental and disease progression. [bioRxiv]
07/2018: Presentations at ICML
Congratulations to my PhD students Karren and Raj! Their papers were accepted at ICML and selected for oral presentations. They will be giving their talks at ICML later this month. [arXiv:Karren] [arXiv:Raj]
06/2018: Maximum likelihood estimation for totally positive log-concave densities
We just uploaded a paper to the arXiv where we study nonparametric density estimation under a strong form of positive dependence known as MTP2. Given n samples from a d-dimensional MTP2 distribution we show that the MLE exists and is unique with probability one already when n≥3, independent of d. In addition, we show that the MLE is a tent function in many settings and can be computed efficiently using the conditional gradient method. Thanks for a great collaboration, Elina, Ngoc and Bernd! [arXiv]
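For reference, the standard definition of MTP2 used in this line of work is stated below, where $\wedge$ and $\vee$ denote the coordinatewise minimum and maximum. For a strictly positive density, the condition says exactly that $\log f$ is supermodular. The Gaussian characterization in the last line is a classical fact, included here as context rather than as a result of this paper.

```latex
% MTP2 density f on R^d
f(x)\, f(y) \;\le\; f(x \wedge y)\, f(x \vee y)
\qquad \text{for all } x, y \in \mathbb{R}^d,
\quad (x \wedge y)_i = \min(x_i, y_i),\;
(x \vee y)_i = \max(x_i, y_i).

% Equivalently, for f > 0: \log f is supermodular
\log f(x) + \log f(y) \;\le\; \log f(x \wedge y) + \log f(x \vee y).

% Gaussian case: N(\mu, \Sigma) with K = \Sigma^{-1} is MTP2 iff
K_{ij} \le 0 \quad \text{for all } i \neq j .
```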
05/2018: Joseph A. Martore Award for Exceptional Contributions to Education
I received a teaching award for designing the most popular class in IDSS.
05/2018: Abdul Latif Jameel World Water and Food Security Grant
04/2018: High-dimensional joint estimation of multiple directed Gaussian graphical models
We just uploaded a paper to the arXiv where we provide an algorithm together with high-dimensional consistency guarantees for learning a collection of directed Gaussian graphical models. This problem is of particular interest for learning gene regulatory networks based on gene expression data from different tissues, developmental stages or disease states. As a corollary we also obtain high-dimensional consistency results for causal inference from a mix of observational and interventional data. Great job, Yuhao and Santiago! [arXiv]
04/2018: Lay Summaries of our Recent Collaborations with the Shivashankar Lab at NUS
The Mechanobiology Institute at NUS published some great news articles describing the results from our collaboration with the Shivashankar lab on nuclear architecture, mechanogenomic codes and machine learning for early cancer detection. [news article 1] [news article 2] [news article 3]
04/2018: Summer School on Graphical Models: From Mathematical Foundations to Biological Applications
Together with Niko Beerenwinkel and Lisa Lambarti I am co-organizing a summer school on graphical models at ETH Zurich. The summer school covers models for high-dimensional data, causal inference from observational and interventional data, algebraic approaches to graphical models, and applications to genomics. The format is general lectures in the morning (given by Bernd Sturmfels, Jack Kuipers, Niko Beerenwinkel and myself) and specialized short talks by participants in the afternoon. Registration and abstract submission is now open. [website]
03/2018: Nuclear Mechanopathology and Cancer Diagnosis
My joint review with Shiva just appeared in Trends in Cancer. We discuss how defects in nuclear mechanotransduction can lead to cancer onset and how machine learning methods can identify abnormal nuclear morphology and architecture as early cancer biomarkers. [journal]
03/2018: Promotion to Associate Professor
Starting in June, I will be an Associate Professor! [news article]
03/2018: Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models
We just submitted a paper where we propose a new method with provable guarantees for sampling from the posterior over causal networks. It overcomes some of the computational limitations of current methods, whose complexity depends on the maximum degree of the underlying causal DAG. This is a real limitation, in particular for biological applications, since biological networks are known to contain high-degree hub nodes. Great job, Raj! [arXiv]
02/2018: Geometry of Discrete Copulas
We just uploaded a paper where we unify the theory of discrete copulas with generalizations of the Birkhoff polytope studied in discrete geometry. Great job, Elisa and Liam! [arXiv]
02/2018: Characterizing and Learning Equivalence Classes of Causal DAGs under Interventions
We just submitted a paper where we show that the equivalence classes of causal DAG models under soft and hard interventions are the same, and we provide an algorithm for learning the interventional equivalence class from a mix of observational data and data from soft interventions. This has important implications for genomic perturbation experiments, since our results imply that the more invasive gene knockout experiments do not provide more causal information than the less invasive gene knockdown experiments. Congrats, Karren and Abigail! [arXiv]
02/2018: Direct Estimation of Differences in Causal Graphs
We have uploaded a paper where we develop a method for directly estimating the difference between two causal networks. This is particularly beneficial when the individual networks are large and complex while their difference is small and sparse, as is the case, for example, for gene regulatory networks in different disease states or in related cell types. We applied our algorithm to single-cell RNA-seq data collected during the activation of T-cells. Great job, Yuhao, Chandler and Anastasiya! [arXiv]
12/2017: Machine Learning for Nuclear Mechano-Morphometric Biomarkers in Cancer Diagnosis
Our paper, where we develop a convolutional neural net pipeline for detecting subtle changes in nuclear morphometrics of single cells and apply it to differentiate between normal and cancer cells, just appeared in Scientific Reports. Congrats, Adit, Karthik and Ali! [journal]
12/2017: Program Director for the SIAM Activity Group on Algebraic Geometry
I was elected program director for SI(AG)^2 and will be in charge of putting together an exciting program for the next SIAM Conference on Applied Algebraic Geometry, taking place at the University of Bern in Switzerland, July 9-13, 2019. [conference website]
12/2017: Network Analysis Identifies Chromosome Intermingling Regions as Regulatory Hotspots for
Our paper on developing network analysis methods for identifying clusters of interactions between chromosomes just appeared in PNAS. Congratulations, Anastasiya and Saradha! [journal]
11/2017: Regulation of Genome Organization and Gene Expression by Nuclear Mechanotransduction
My joint review with Shiva on how mechanical cues are transduced to the nucleus to regulate gene expression programs and the role played by the spatial chromosome organization just appeared in Nature Reviews Molecular Cell Biology. [journal]
11/2017: Maximum Likelihood Estimation in Gaussian Models under Total Positivity
Our paper is to appear in the Annals of Statistics. Thanks for a fun collaboration, Piotr and Steffen! [arXiv]
09/2017: Generalized Permutohedra from Probabilistic Graphical Models
Our paper, where we extend graph associahedra to the directed setting and show how these polytopes can be used for causal inference, got accepted to SIAM Journal on Discrete Mathematics. Thanks for a fun project, Fatemeh, Charles and Josephine! [journal]
09/2017: Permutation-Based Causal Inference Algorithms with Interventions
Our paper, where we extend permutation-based causal inference algorithms to the interventional setting and show how such methods can be applied for analyzing perturb-seq single-cell gene expression data, got accepted to NIPS. Congratulations, Yuhao, Liam and Karren! [conference proceedings]
08/2017: Generalized Fréchet Bounds for Cell Entries in Multidimensional Contingency Tables
In this short paper coauthored with Don Richards and written in honor of Steve Fienberg we show how to use supermodularity of the marginalization function in order to obtain new generalized Fréchet inequalities for multidimensional contingency tables. [arXiv]
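As background, the classical Fréchet bounds for a single cell of a two-way table with fixed margins are max(0, r + c − n) ≤ n_ij ≤ min(r, c), where r is the row total, c the column total and n the grand total. The short Python sketch below computes only these classical two-way bounds (the function name is hypothetical); it does not implement the generalized multidimensional inequalities from the paper.

```python
def frechet_bounds(row_total: int, col_total: int, grand_total: int) -> tuple[int, int]:
    """Classical two-way Fréchet bounds for one cell count given its margins."""
    # The cell cannot exceed either of its margins.
    upper = min(row_total, col_total)
    # The remaining row and column mass must fit into the other cells.
    lower = max(0, row_total + col_total - grand_total)
    return lower, upper


# A cell whose row sums to 30 and whose column sums to 50, in a table of 60.
print(frechet_bounds(30, 50, 60))  # (20, 30)
```

For a 2×2 table these bounds are sharp: every integer value between them is attained by some table with the given margins.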
07/2017: Chromosome Intermingling: Mechanical Hotspots for Genome Regulation
My joint review with Shiva on the packing of chromosomes and its importance for genome regulation just appeared in Trends in Cell Biology. [journal]
07/2017: Gaussian Graphical Models: An Algebraic and Geometric Perspective
In this pedagogical introduction to Gaussian graphical models I highlight the important role of algebraic and geometric properties of Gaussian graphical models for maximum likelihood estimation in these models. This overview was written as a chapter for the Handbook on Graphical Models edited by M. Drton, S. Lauritzen, M. Maathuis and M. Wainwright. Stay tuned! [arXiv]
06/2017: Counting Markov Equivalence Classes by Number of Immoralities
Our paper has been accepted at UAI 2017. Great job, Adit and Liam! [arXiv]
06/2017: Orientation and Repositioning of Chromosomes Correlate with Cell Geometry-Dependent Gene
Our paper, where we combine experiments with modeling of the 3D organization of chromosomes, just appeared in Molecular Biology of the Cell. Congrats, Yejun and Mallika! [journal]
06/2017: Total Positivity in Markov Structures
05/2017: Permutation-Based Causal Inference Algorithms with Interventions
We have uploaded a paper where we extend permutation-based causal inference algorithms to the interventional setting and show how such methods can be applied for analyzing perturb-seq single-cell gene expression data. Great job, Yuhao, Liam and Karren! [arXiv]
05/2017: PatchNet: Interpretable Neural Networks for Image Classification
We have just submitted a paper, where we present a new neural network that restricts global context for image classification tasks to provide interpretable features, which is crucial for the acceptance of neural models in health care. Congrats, Adit and Charly! [arXiv]
04/2017: NSF Career Award
I received an NSF Career Award! [news]
04/2017: Geometry of Log-Concave Density Estimation
We have uploaded a paper, where we analyze the geometry of the maximum likelihood density estimator under log-concavity. Great job, Elina! [arXiv]
02/2017: Sloan Research Fellowship
I received a Sloan Research Fellowship! [news]
02/2017: Maximum Likelihood Estimation in Gaussian Models under Total Positivity
After presenting this work in different talks, we have now submitted our manuscript on MTP2 Gaussian graphical models. In particular, we propose an alternative to the graphical lasso that can be applied in high dimensions and does not require any tuning parameters. Thanks for a great collaboration, Piotr and Steffen! [arXiv]
02/2017: Consistency Guarantees for Permutation-Based Causal Inference Algorithms
We have uploaded a new paper containing the first consistency proof for permutation-based causal inference algorithms. Great job, Liam and Yuhao! [arXiv]
01/2017: Exact Goodness-of-Fit Testing for the Ising Model