Richard Guo | publications

preprints

Model-oriented graph distances via partially ordered sets Armeen Taeb, Guo, F. Richard, and Leonard Henckel 2025 [Abs] [arXiv] [Slides] [Video]
A well-defined distance on the parameter space is key to evaluating estimators, ensuring consistency, and building confidence sets. While there are typically standard distances to adopt in a continuous space, this is not the case for combinatorial parameters such as graphs that represent statistical models. Defined on the graphs alone, existing proposals like the structural Hamming distance ignore the structure of the model space and can thus exhibit undesirable behaviors. We propose a model-oriented framework for defining the distance between graphs that is applicable across different graph classes. Our approach treats each graph as a statistical model and organizes the graphs in a partially ordered set based on model inclusion. This induces a neighborhood structure, from which we define the model-oriented distance as the length of a shortest path through neighbors, yielding a metric on the space of graphs. We apply this framework to probabilistic undirected graphs, causal directed acyclic graphs, causal acyclic directed mixed graphs, probabilistic completed partially directed acyclic graphs, and causal maximally oriented partially directed acyclic graphs. We analyze the theoretical and empirical behavior of the model-oriented distance. By exploiting the underlying poset structures, we develop algorithms for computing and bounding the proposed distance that scale to moderate-sized graphs. Finally, we showcase its utility for quantifying the robustness of adjustment sets to errors in specifying the causal graph.
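The following minimal Python sketch makes the contrast concrete by computing the structural Hamming distance (SHD) between two DAGs. It uses one common convention, in which a reversed edge counts once. The edge-set encoding and the function name are illustrative assumptions. The sketch does not implement the model-oriented distance proposed in the paper. Note that X → Y and Y → X are Markov equivalent, so they define the same probabilistic model, yet their SHD is 1.

```python
def structural_hamming_distance(edges_a, edges_b):
    """Count edge insertions, deletions and reversals turning DAG A into DAG B.

    Each graph is a set of directed edges (u, v), meaning u -> v.
    A reversed edge counts once, not as a deletion plus an insertion.
    """
    skeleton_a = {frozenset(e) for e in edges_a}
    skeleton_b = {frozenset(e) for e in edges_b}
    # Adjacencies present in exactly one graph: insertions or deletions.
    distance = len(skeleton_a ^ skeleton_b)
    # Adjacencies present in both graphs but oriented differently: reversals.
    for pair in skeleton_a & skeleton_b:
        u, v = tuple(pair)
        if ((u, v) in edges_a) != ((u, v) in edges_b):
            distance += 1
    return distance


# Two Markov-equivalent DAGs are still at SHD 1 from each other.
print(structural_hamming_distance({("X", "Y")}, {("Y", "X")}))
```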
On the universal calibration of heavy-tailed combination tests Parijat Chakraborty, Guo, F. Richard, Kerby Shedden, and Stilian Stoev 2025 [Abs] [arXiv] [Code]
It is often of interest to test a global null hypothesis using multiple, possibly dependent p-values by combining their strengths while controlling the type-I error. Recently, several heavy-tailed combination tests, such as the harmonic mean test and the Cauchy combination test, have been proposed: they transform p-values into heavy-tailed random variables before combining them into a single test statistic. The resulting tests, which are calibrated under some form of independence assumption among the p-values, have been shown to be rather robust to dependence asymptotically as the α level gets small. Yet, it has remained an open problem to understand this general phenomenon and characterize how such tests behave under dependence. Using the framework of multivariate regular variation from extreme value theory, we show that for a class of combination tests that are homogeneous, the asymptotic level of the test can be expressed using the angular measure under multivariate regular variation. This measure characterizes the dependence of the transformed heavy-tailed variables in their upper tails, or equivalently, the dependence of the p-values near zero. We use this result to study several tests. The harmonic mean test, which coincides with the Pareto linear combination test, is shown to be universally calibrated regardless of the tail dependence; further, this test is shown to be the only one that achieves universal calibration among all homogeneous heavy-tailed combination tests. In contrast, the Cauchy combination test is shown to be universally honest but often conservative; the Dunn–Šidák correction, also known as Tippett’s method, while being honest, is calibrated if and only if the underlying p-values are independent near zero. These theoretical findings are corroborated with simulations and an application to independence testing with survey data.
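For reference, the three combination rules named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions:

- The weights are nonnegative and sum to one, with equal weights by default.
- The function names are hypothetical.
- The harmonic mean is returned as a raw value to be compared with α. This comparison is only asymptotically calibrated as α → 0. Exact calibration of the harmonic mean under independence involves a stable law and is omitted.

The Cauchy and Šidák values are exact p-values when the input p-values are independent.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Cauchy combination test; weights are assumed to sum to one."""
    m = len(pvalues)
    weights = weights or [1.0 / m] * m
    # Under the null, tan((0.5 - p) * pi) is standard Cauchy for uniform p.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # A weighted average of independent standard Cauchy variables is standard Cauchy.
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_pvalue(pvalues, weights=None):
    """Weighted harmonic mean 1 / sum_i (w_i / p_i), to be compared with alpha."""
    m = len(pvalues)
    weights = weights or [1.0 / m] * m
    # Under the null, 1 / p_i is Pareto(1), so this is a Pareto linear combination.
    return 1.0 / sum(w / p for w, p in zip(weights, pvalues))


def sidak_min(pvalues):
    """Dunn-Šidák / Tippett combined p-value based on the smallest p-value."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


print(cauchy_combination([0.01, 0.2, 0.6]))
print(harmonic_mean_pvalue([0.01, 0.2, 0.6]))
print(sidak_min([0.01, 0.2, 0.6]))
```

With perfectly dependent inputs, for example m identical p-values all equal to p, both the harmonic mean and the Cauchy combination return p. The Šidák value, by contrast, is 1 - (1 - p)^m, which is roughly m·p for small p.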
The categorical instrumental variable model: characterization, partial identification, and statistical inference Yilin Song, Guo, F. Richard, K. C. Gary Chan, and Thomas S. Richardson 2025 [Abs] [arXiv] [Slides] [Video]
We study categorical instrumental variable (IV) models with instrument, treatment, and outcome taking finitely many values. We derive a simple closed-form characterization of the set of joint distributions of potential outcomes that are compatible with a given observed data distribution in terms of a set of inequalities. These inequalities unify several different IV models defined by versions of the independence and exclusion restriction assumptions and are shown to be non-redundant. Finally, given a set of linear functionals of the joint counterfactual distribution, such as pairwise average treatment effects, we construct confidence intervals with simultaneous finite-sample coverage, using a tail bound on the Kullback–Leibler divergence. We illustrate our method using data from the Minneapolis Domestic Violence Experiment.
Confounder selection: objectives and approaches Guo, F. Richard, Anton Rask Lundborg, and Qingyuan Zhao 2022 [Abs] [arXiv]
Confounder selection is perhaps the most important step in the design of observational studies. A number of criteria, often with different objectives and approaches, have been proposed, and their validity and practical value have been debated in the literature. Here, we provide a unified review of these criteria and the assumptions behind them. We list several objectives that confounder selection methods aim to achieve and discuss the amount of structural knowledge required by different approaches. Finally, we discuss limitations of the existing approaches and implications for practitioners.

journals

Confounder selection via iterative graph expansion Guo, F. Richard, and Qingyuan Zhao Annals of Statistics 2026 [Abs] [Publisher] [arXiv] [Slides] [Video] [Code] [Shiny WebApp]
Confounder selection, namely choosing a set of covariates to control for confounding between a treatment and an outcome, is arguably the most important step in the design of an observational study. Previous methods, such as Pearl’s back-door criterion, typically require pre-specifying a causal graph, which can often be difficult in practice. We propose an interactive procedure for confounder selection that does not require pre-specifying the graph or the set of observed variables. This procedure iteratively expands the causal graph by finding what we call primary adjustment sets for a pair of possibly confounded variables. This can be viewed as inverting a sequence of marginalizations of the underlying causal graph. Structural information in the form of primary adjustment sets is elicited from the user, bit by bit, until either a set of covariates is found to control for confounding or it can be determined that no such set exists. Other information, such as the causal relations between confounders, is not required by the procedure. We show that if the user correctly specifies the primary adjustment sets in every step, our procedure is both sound and complete.
Rank-transformed subsampling: inference for multiple data splitting and exchangeable p-values Guo, F. Richard, and Rajen D. Shah Journal of the Royal Statistical Society, Series B 2025 [Abs] [Publisher] [arXiv] [Slides] [Video] [Code]
Many testing problems are readily amenable to randomised tests, such as those employing data splitting, which divide the data into disjoint parts for separate purposes. However, despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may lead to different results. Secondly, the test typically loses power because it does not fully utilise the entire sample. As a remedy to these drawbacks, we study how to combine the test statistics or p-values resulting from multiple random realisations, such as those from random data splits. We introduce rank-transformed subsampling as a general method for delivering large sample inference about the combined statistic or p-value under mild assumptions. We apply our methodology to a range of problems, including testing unimodality in high-dimensional data, testing goodness-of-fit of parametric quantile regression models, testing no direct effect in a sequentially randomised trial and calibrating cross-fit double machine learning confidence intervals. For the latter, our method improves coverage in finite samples; for the testing problems, it is able to derandomise the tests and improve power. Moreover, in contrast to existing p-value aggregation schemes that can be highly conservative, our method enjoys type-I error control that asymptotically approaches the nominal level.
Richard Guo’s contribution to the Discussion of ‘Parameterizing and Simulating from Causal Models’ by Evans and Didelez Guo, F. Richard Journal of the Royal Statistical Society, Series B 2024 [Publisher] [PDF] [Slides]
Variable elimination, graph reduction and efficient g-formula Guo, F. Richard, Emilija Perković, and Andrea Rotnitzky Biometrika 2023 [Abs] [Publisher] [arXiv] [Poster] [Slides] [Code]
We study efficient estimation of an interventional mean associated with a point exposure treatment under a causal graphical model represented by a directed acyclic graph without hidden variables. Under such a model, it may happen that some of the variables are uninformative, in that failure to measure them neither precludes identification of the interventional mean nor changes the semiparametric variance bound for regular estimators of it. We develop a set of graphical criteria that are sound and complete for eliminating all the uninformative variables so that the cost of measuring them can be saved without sacrificing estimation efficiency, which could be useful when designing a planned observational or randomized study. Further, we construct a reduced directed acyclic graph on the set of informative variables only. We show that the interventional mean is identified from the marginal law by the g-formula (Robins, 1986) associated with the reduced graph, and the semiparametric variance bounds for estimating the interventional mean under the original and the reduced graphical model agree. This g-formula is an irreducible, efficient identifying formula in the sense that the nonparametric estimator of the formula, under regularity conditions, is asymptotically efficient under the original causal graphical model, and no formula with this property depends on only a strict subset of the variables.
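As a reminder of what a g-formula computes, the following Python sketch evaluates E[Y(a)] = Σ_z P(Z = z) E[Y | A = a, Z = z] from a discrete joint law. It uses the simple DAG Z → A → Y with Z → Y. This is the ordinary g-formula over all variables, not the reduced-graph formula of the paper. The dictionary encoding, the function name, and the example law are assumptions for illustration. Positivity, that is P(Z = z, A = a) > 0 for every z, is also assumed.

```python
from collections import defaultdict


def g_formula_mean(joint, a):
    """E[Y(a)] = sum_z P(Z=z) * E[Y | A=a, Z=z] for the DAG Z -> A -> Y, Z -> Y.

    `joint` maps (z, a, y) tuples to probabilities and Y is numeric.
    Assumes positivity: P(Z=z, A=a) > 0 for every z.
    """
    p_z = defaultdict(float)
    p_za = defaultdict(float)
    ey_za = defaultdict(float)  # accumulates E[Y * 1{Z=z, A=a}]
    for (z, a_val, y), p in joint.items():
        p_z[z] += p
        p_za[(z, a_val)] += p
        ey_za[(z, a_val)] += y * p
    # Standardize the stratum-specific means E[Y | A=a, Z=z] to the marginal law of Z.
    return sum(p_z[z] * ey_za[(z, a)] / p_za[(z, a)] for z in p_z)


# Hypothetical law: Z ~ Bernoulli(0.5), P(A=1 | Z=0) = 0.2, P(A=1 | Z=1) = 0.8, Y = A + Z.
joint = {(0, 1, 1): 0.1, (0, 0, 0): 0.4, (1, 1, 2): 0.4, (1, 0, 1): 0.1}
print(g_formula_mean(joint, 1))  # 1.5, whereas the unadjusted E[Y | A=1] = 0.9 / 0.5 = 1.8
```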
Efficient least squares for estimating total effects under linearity and causal sufficiency Guo, F. Richard, and Emilija Perković Journal of Machine Learning Research 2022 [Abs] [Publisher] [arXiv] [Slides] [Code]
Linear structural equation models are widely used to postulate causal mechanisms underlying observational data. In these models, each variable equals a linear combination of a subset of the remaining variables plus an error term. When there is no unobserved confounding or selection bias, the error terms are assumed to be independent. We consider estimating a total causal effect in this setting. The causal structure is assumed to be known only up to a maximally oriented partially directed acyclic graph (MPDAG), a general class of graphs that can represent a Markov equivalence class of directed acyclic graphs (DAGs) with added background knowledge. We propose a simple estimator based on iterated least squares, which can consistently estimate any identified total causal effect, under point or joint intervention. We show that this estimator is the most efficient among all regular estimators that are based on sample covariance, including covariate adjustment and the estimators employed by the joint-IDA algorithm. Notably, our result holds without assuming Gaussian errors.
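The following Python sketch illustrates the estimation target. It simulates a hypothetical linear SEM with independent non-Gaussian errors. It then estimates the total effect of X on Y by ordinary covariate adjustment, regressing Y on X and the parent Z of X. This is one of the baseline estimators mentioned above, not the iterated least squares estimator proposed in the paper. It also assumes the DAG is fully known, rather than known only up to an MPDAG. All coefficients are made up.

```python
import numpy as np


def adjusted_total_effect(x, y, covariates):
    """OLS coefficient of x when regressing y on an intercept, x and the covariates."""
    design = np.column_stack([np.ones_like(x), x] + list(covariates))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 100_000
# Hypothetical linear SEM: Z -> X -> M -> Y and Z -> Y, with independent non-Gaussian errors.
z = rng.standard_normal(n)
x = 0.8 * z + rng.uniform(-1.0, 1.0, n)
m = 1.5 * x + rng.laplace(size=n)
y = 2.0 * m + 1.0 * z + rng.standard_normal(n)

est_adjusted = adjusted_total_effect(x, y, [z])  # adjusts for the parent Z of X
est_naive = adjusted_total_effect(x, y, [])      # leaves X <- Z -> Y open
print(est_adjusted, est_naive)
```

The total effect here is the sum, over directed paths from X to Y, of the products of edge coefficients, which gives 1.5 × 2.0 = 3.0. Omitting Z leaves the back-door path X ← Z → Y open, which biases the estimate upward in this example.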
BSDE: Barycenter single-cell differential expression for case-control studies Mengqi Zhang, and Guo, F. Richard Bioinformatics 2022 [Abs] [Publisher] [Code]
Single-cell sequencing offers unprecedented resolution for finding differentially expressed genes by disentangling highly heterogeneous cell tissues. Yet, such analysis has so far mostly focused on comparing different cell types from the same individual. As single-cell sequencing becomes cheaper and easier to use, an increasing number of datasets from case-control studies are becoming available, which call for new methods for identifying differential expression between case and control individuals. To bridge this gap, we propose Barycenter Single-cell Differential Expression (BSDE), a nonparametric method for finding differentially expressed genes in case-control studies. Through the use of optimal transport for aggregating distributions and computing their distances, our method overcomes the restrictive parametric assumptions imposed by standard mixed-effects modeling approaches. Through simulations, we show that BSDE can accurately detect a variety of differential expression patterns while maintaining the type-I error at a prescribed level. Further, BSDE identifies 1345 and 1568 cell-type-specific differentially expressed genes from datasets on pulmonary fibrosis and multiple sclerosis, respectively, and the top findings are supported by previous results in the literature.
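The optimal-transport ingredients take a simple form for one-dimensional distributions, such as a gene's expression level in one cell type. The Wasserstein-2 barycenter averages quantile functions, and the Wasserstein-2 distance is the L2 distance between quantile functions. The Python sketch below applies these two facts on a fixed grid of quantile levels. The grid, the function names, and the simulated gamma-distributed data are assumptions. This is not the BSDE implementation, and BSDE's actual test statistic and calibration are not reproduced here.

```python
import numpy as np

# Hypothetical grid of quantile levels used to discretize quantile functions.
QUANTILE_LEVELS = np.linspace(0.005, 0.995, 199)


def quantile_function(sample):
    """Empirical quantile function evaluated on the fixed grid of levels."""
    return np.quantile(sample, QUANTILE_LEVELS)


def barycenter_quantiles(samples):
    """Equal-weight 1-D Wasserstein-2 barycenter: average the quantile functions."""
    return np.mean([quantile_function(s) for s in samples], axis=0)


def wasserstein2(q1, q2):
    """Approximate 1-D Wasserstein-2 distance from two discretized quantile functions."""
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))


rng = np.random.default_rng(1)
# Hypothetical expression of one gene in one cell type: one sample per individual.
cases = [rng.gamma(2.0, 1.5, size=300) for _ in range(4)]
controls = [rng.gamma(2.0, 1.0, size=300) for _ in range(4)]
print(wasserstein2(barycenter_quantiles(cases), barycenter_quantiles(controls)))
```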
Discussion of ‘Estimating time-varying causal excursion effect in mobile health with binary outcomes’ Guo, F. Richard, Thomas S. Richardson, and James M. Robins Biometrika 2021 [Abs] [Publisher] [arXiv]
We discuss the recent paper on "excursion effect" by T. Qian et al. (2020). We show that the methods presented have close relationships to others in the literature, in particular to a series of papers by Robins, Hernán and collaborators on analyzing observational studies as a series of randomized trials. There is also a close relationship to the history-restricted and the history-adjusted marginal structural models (MSM). Important differences and their methodological implications are clarified. We also demonstrate that the excursion effect can depend on the design and discuss its suitability for modifying the treatment protocol.
Chernoff-type concentration of empirical probabilities in relative entropy Guo, F. Richard, and Thomas S. Richardson IEEE Transactions on Information Theory 2021 [Abs] [Publisher] [arXiv] [Code]
We study the relative entropy of the empirical probability vector with respect to the true probability vector in multinomial sampling of k categories, which, when multiplied by sample size n, is also the log-likelihood ratio statistic. We generalize a recent result and show that the moment generating function of the statistic is bounded by a polynomial of degree n on the unit interval, uniformly over all true probability vectors. We characterize the family of polynomials indexed by (k,n) and obtain explicit formulae. Consequently, we develop Chernoff-type tail bounds, including a closed-form version from a large sample expansion of the bound minimizer. Our bound dominates the classic method-of-types bound and is competitive with the state of the art. We demonstrate with an application to estimating the proportion of unseen butterflies.
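For comparison, the classic method-of-types bound mentioned above can be sketched in Python. There are C(n + k - 1, k - 1) empirical types, and each type with D(p̂ ‖ p) ≥ ε has probability at most e^{-nε}. Therefore P(D(p̂ ‖ p) ≥ ε) ≤ C(n + k - 1, k - 1) e^{-nε}. Inverting this bound gives a Kullback–Leibler ball around p̂ that contains p with probability at least 1 - δ. This is the baseline, not the sharper bound from the paper. Relative entropy is measured in nats, and the function names and counts are illustrative.

```python
import math


def kl_empirical(counts, probs):
    """Relative entropy D(p_hat || p) in nats, with the convention 0 * log 0 = 0."""
    n = sum(counts)
    return sum((c / n) * math.log((c / n) / p) for c, p in zip(counts, probs) if c > 0)


def method_of_types_bound(n, k, eps):
    """Classic bound on P(D(p_hat || p) >= eps), computed in log space and capped at 1."""
    log_bound = math.log(math.comb(n + k - 1, k - 1)) - n * eps
    return math.exp(min(0.0, log_bound))


def kl_radius(n, k, delta):
    """Radius eps at which the method-of-types bound equals delta."""
    return (math.log(math.comb(n + k - 1, k - 1)) - math.log(delta)) / n


# Hypothetical counts over k = 3 categories; n * D is the log-likelihood ratio.
counts, probs = [40, 35, 25], [0.3, 0.4, 0.3]
n = sum(counts)
eps = kl_radius(n, len(counts), delta=0.05)
print(n * kl_empirical(counts, probs), eps, kl_empirical(counts, probs) <= eps)
```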
On testing marginal versus conditional independence Guo, F. Richard, and Thomas S. Richardson Biometrika 2020 [Abs] [Publisher] [arXiv] [Slides]
We consider testing marginal independence versus conditional independence in a trivariate Gaussian setting. The two models are non-nested and their intersection is a union of two marginal independences. We consider two sequences of such models, one from each type of independence, that are closest to each other in the Kullback-Leibler sense as they approach the intersection. They become indistinguishable if the signal strength, as measured by the product of two correlation parameters, decreases faster than the standard parametric rate. Under local alternatives at such rate, we show that the asymptotic distribution of the likelihood ratio depends on where and how the local alternatives approach the intersection. To deal with this non-uniformity, we study a class of "envelope" distributions by taking pointwise suprema over asymptotic cumulative distribution functions. We show that these envelope distributions are well-behaved and lead to model selection procedures with rate-free uniform error guarantees and near-optimal power. To control the error even when the two models are indistinguishable, rather than insist on a dichotomous choice, the proposed procedure will choose either or both models.
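In the trivariate Gaussian setting, both likelihood ratio statistics have closed forms in terms of sample correlations. The statistic is -n log(1 - r_xy²) for X ⊥ Y and -n log(1 - r_xy·z²) for X ⊥ Y | Z, where r_xy·z is the partial correlation. Conditional independence means ρ_xy = ρ_xz ρ_yz, while marginal independence means ρ_xy = 0. The two hypotheses therefore differ through the product ρ_xz ρ_yz, which is consistent with measuring signal strength by a product of two correlations. The Python sketch below computes the two statistics only. It does not implement the envelope distributions or the model selection procedure of the paper, and the function names are illustrative.

```python
import math


def lrt_marginal(r_xy, n):
    """Gaussian likelihood ratio statistic for X independent of Y: -n log(1 - r_xy^2)."""
    return -n * math.log(1.0 - r_xy ** 2)


def partial_correlation(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y given Z, from the pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


def lrt_conditional(r_xy, r_xz, r_yz, n):
    """Gaussian likelihood ratio statistic for X independent of Y given Z."""
    return -n * math.log(1.0 - partial_correlation(r_xy, r_xz, r_yz) ** 2)


# Hypothetical sample correlations with n = 200 observations.
print(lrt_marginal(0.12, 200), lrt_conditional(0.12, 0.4, 0.3, 200))
```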
How cognitive and reactive fear circuits optimize escape decisions in humans Song Qi, Demis Hassabis, Jiayin Sun, Guo, Fangjian, Nathaniel Daw, and Dean Mobbs Proceedings of the National Academy of Sciences (PNAS) 2018 [Abs] [Publisher]
Flight initiation distance (FID), the distance at which an organism flees from an approaching threat, is an ecological metric of cost–benefit functions of escape decisions. We adapted the FID paradigm to investigate how fast- or slow-attacking “virtual predators” constrain escape decisions. We show that rapid escape decisions rely on “reactive fear” circuits in the periaqueductal gray and midcingulate cortex (MCC), while protracted escape decisions, defined by larger buffer zones, were associated with “cognitive fear” circuits, which include posterior cingulate cortex, hippocampus, and the ventromedial prefrontal cortex, circuits implicated in more complex information processing, cognitive avoidance strategies, and behavioral flexibility. Using a Bayesian decision-making model, we further show that optimization of escape decisions under rapid flight was localized to the MCC, a region involved in adaptive motor control, while the hippocampus is implicated in optimizing decisions that update and control slower escape initiation. These results demonstrate an unexplored link between defensive survival circuits and their role in adaptive escape decisions.
Bounds of memory strength for power-law series Guo, Fangjian, Dan Yang, Zimo Yang, Zhi-Dan Zhao, and Tao Zhou Physical Review E 2017 [Publisher] [arXiv]

conferences

Minimal enumeration of all possible total effects in a Markov equivalence class Guo, F. Richard, and Emilija Perković In AISTATS 2021 [Abs] [Publisher] [arXiv] [Poster]
In observational studies, when a total causal effect of interest is not identified, the set of all possible effects can be reported instead. This typically occurs when the underlying causal DAG is only known up to a Markov equivalence class, or a refinement thereof due to background knowledge. As such, the class of possible causal DAGs is represented by a maximally oriented partially directed acyclic graph (MPDAG), which contains both directed and undirected edges. We characterize the minimal additional edge orientations required to identify a given total effect. A recursive algorithm is then developed to enumerate subclasses of DAGs, such that the total effect in each subclass is identified as a distinct functional of the observed distribution. This resolves an issue with existing methods, which often report possible total effects with duplicates, namely those that are numerically distinct due to sampling variability but are in fact causally identical.
Boosting variational inference Guo, Fangjian, X. Wang, K. Fan, T. Broderick, and D. Dunson In NIPS Workshop on Advances in Approximate Bayesian Inference 2016 [Abs] [Publisher] [arXiv] [Code]
Variational inference (VI) provides fast approximations of a Bayesian posterior in part because it formulates posterior approximation as an optimization problem: to find the closest distribution to the exact posterior over some family of distributions. For practical reasons, the family of distributions in VI is usually constrained so that it does not include the exact posterior, even as a limit point. Thus, no matter how long VI is run, the resulting approximation will not approach the exact posterior. We propose to instead consider a more flexible approximating family consisting of all possible finite mixtures of a parametric base distribution (e.g., Gaussian). For efficient inference, we borrow ideas from gradient boosting to develop an algorithm we call boosting variational inference (BVI). BVI iteratively improves the current approximation by mixing it with a new component from the base distribution family and thereby yields progressively more accurate posterior approximations as more computing time is spent. Unlike a number of common VI variants including mean-field VI, BVI is able to capture multimodality, general posterior covariance, and nonstandard posterior shapes.
The Bayesian Echo Chamber: modeling social influence via linguistic accommodation Guo, Fangjian, Charles Blundell, Hanna Wallach, and Katherine Heller In AISTATS 2015 [Publisher] [arXiv] [Code]
Uncovering systematic bias in ratings across categories: a Bayesian approach Guo, Fangjian, and David Dunson In RecSys 2015 [Publisher]
Parallelizing MCMC with random partition trees Xiangyu Wang, Guo, Fangjian, Katherine Heller, and David Dunson In NIPS 2015 [Abs] [Publisher] [arXiv] [Code]
The modern scale of data has brought new challenges to Bayesian inference. In particular, conventional MCMC algorithms are computationally very expensive for large data sets. A promising approach to solve this problem is embarrassingly parallel MCMC (EP-MCMC), which first partitions the data into multiple subsets and runs independent sampling algorithms on each subset. The subset posterior draws are then aggregated via some combining rules to obtain the final approximation. Existing EP-MCMC algorithms are limited by approximation accuracy and difficulty in resampling. In this article, we propose a new EP-MCMC algorithm PART that solves these problems. The new algorithm applies random partition trees to combine the subset posterior draws, which is distribution-free, easy to resample from and can adapt to multiple scales. We provide theoretical justification and extensive experiments illustrating empirical performance.

PhD thesis

Likelihood analysis of causal models Guo, F. Richard 2021 [Abs] [Publisher]
We analyze several problems in causal inference from the perspective of maximum likelihood. We are primarily concerned with two archetypal likelihoods: the Gaussian likelihood for continuous data and the multinomial likelihood for discrete data. In the first half of this dissertation, the Gaussian likelihood is considered for testing and estimation. Motivated by the selection of causal graphs, in Chapter 2, we study testing between marginal and conditional independence in a Gaussian setting with the likelihood ratio test (LRT). We introduce a class of “envelope” distributions by taking pointwise suprema over asymptotic distribution functions of the LRT. We show that these envelope distributions are well-behaved and lead to uniformly consistent model selection procedures. In Chapter 3, we consider the estimation of total causal effects under causal sufficiency and linearity. We derive a simple recursive least squares estimator as the MLE under Gaussian errors, which can consistently estimate any identified total effect, under either point or joint intervention. Further, this estimator is shown to be asymptotically efficient even beyond the Gaussian assumption, when compared to a reasonably large class of estimators. In the latter half, we study inference for instrumental variable (IV) models with discrete data. In Chapter 4, we develop non-asymptotic tail bounds for the likelihood ratio statistic under multinomial sampling. Such bounds are established by bounding the moment generating function of the statistic uniformly over all multinomial parameters, which can be viewed as a finite-sample version of Wilks’ theorem. Then, in Chapter 5, such bounds are combined with a convex parametrization of the IV model to streamline statistical inference as convex programming. This approach delivers strong guarantees and circumvents the difficulty in identification and post-selection inference.
The approach is illustrated with a case study on the distributional effect of military service on annual earnings, using the Vietnam draft lottery as a monotone instrument. Finally, we study partial identification of the average treatment effect in a latent variable formulation and make connections to the Bell-CHSH inequalities in quantum mechanics.

notes, technical reports and expository writing

Empirical Bayes for large-scale randomized experiments: a spectral approach Guo, F. Richard, James McQueen, and Thomas S. Richardson 2020 [Abs] [arXiv]
Large-scale randomized experiments, sometimes called A/B tests, are increasingly prevalent in many industries. Though such experiments are often analyzed via frequentist t-tests, arguably such analyses are deficient: p-values are hard to interpret and not easily incorporated into decision-making. As an alternative, we propose an empirical Bayes approach, which assumes that experiments come from a population, and therefore the treatment effects are realized from a "true prior". A key step in implementing this framework is to estimate the underlying true prior from a set of previous experiments. First, we show that the empirical effect estimates from individual experiments can be asymptotically modeled as independent draws from the true prior perturbed by additive Gaussian noise with heterogeneous scales. Second, following the work of Robbins, we generalize from estimating the prior to estimating a family of marginal densities of the empirical effect estimates, indexed by the noise scale. We show that this density family is characterized by the heat equation. Third, given the general form of solution to the heat equation, we develop a spectral maximum likelihood estimate based on a Fourier series representation, which can be efficiently computed via convex optimization. In order to select hyperparameters and compare models we describe two model selection criteria. Finally, we demonstrate our method on simulated and real data, and compare posterior inference to that under a Gaussian mixture model for the prior.
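The heat-equation characterization can be made explicit under the following assumed notation, which is not taken from the source. Experiment i has a true effect θ_i drawn from the prior g, and its estimate is observed with Gaussian noise of known scale s_i. Convolving g with a N(0, s²) kernel gives the marginal density f_s, which solves the heat equation in the variance s². In the Fourier domain, the noise simply damps the characteristic function of g, which is what makes a Fourier series representation natural.

```latex
\begin{aligned}
\hat\theta_i &= \theta_i + s_i\,\varepsilon_i, \qquad \theta_i \sim g, \quad \varepsilon_i \sim N(0, 1), \\
f_s(x) &= \int \frac{1}{\sqrt{2\pi s^2}} \exp\!\Big(-\frac{(x - \theta)^2}{2 s^2}\Big)\, g(\theta)\, d\theta, \\
\frac{\partial f_s(x)}{\partial (s^2)} &= \frac{1}{2}\, \frac{\partial^2 f_s(x)}{\partial x^2}, \qquad \lim_{s \to 0} f_s = g, \\
\int e^{i\omega x} f_s(x)\, dx &= e^{-s^2 \omega^2 / 2} \int e^{i\omega\theta} g(\theta)\, d\theta.
\end{aligned}
```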
Causal Inference by using invariant prediction by Peters, Bühlmann and Meinshausen (2016) Guo, F. Richard 2018 [Publisher] [Code]