Empirical Bayes for large-scale randomized experiments: a spectral approach
Guo, F. Richard,
and Thomas S. Richardson
Large-scale randomized experiments, sometimes called A/B tests, are increasingly prevalent in many industries. Though such experiments are often analyzed via frequentist t-tests, arguably such analyses are deficient: p-values are hard to interpret and not easily incorporated into decision-making. As an alternative, we propose an empirical Bayes approach, which assumes that experiments come from a population, and therefore the treatment effects are realized from a "true prior". A key step in implementing this framework is to estimate the underlying true prior from a set of previous experiments. First, we show that the empirical effect estimates from individual experiments can be asymptotically modeled as independent draws from the true prior perturbed by additive Gaussian noise with heterogeneous scales. Second, following the work of Robbins, we generalize from estimating the prior to estimating a family of marginal densities of the empirical effect estimates, indexed by the noise scale. We show that this density family is characterized by the heat equation. Third, given the general form of solution to the heat equation, we develop a spectral maximum likelihood estimate based on a Fourier series representation, which can be efficiently computed via convex optimization. In order to select hyperparameters and compare models we describe two model selection criteria. Finally, we demonstrate our method on simulated and real data, and compare posterior inference to that under a Gaussian mixture model for the prior.
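The heat-equation characterization mentioned in this abstract can be illustrated with a short derivation. This is a sketch under assumed notation that does not appear in the abstract: G for the true prior, s for the noise scale, and f(x; s) for the marginal density of an effect estimate observed with Gaussian noise of scale s.

```latex
% Assumed notation (not from the abstract): G is the true prior, s > 0 the noise
% scale, and \varphi_s the N(0, s^2) density.
f(x; s) = \int \varphi_s(x - \theta)\, dG(\theta),
\qquad
\varphi_s(z) = \frac{1}{\sqrt{2\pi}\, s} \exp\!\left(-\frac{z^2}{2 s^2}\right).

% Reparametrize by the noise variance t = s^2 and set u(x, t) = f(x; \sqrt{t}).
% The Gaussian kernel satisfies the heat equation in (x, t), and so does u:
\frac{\partial u}{\partial t} = \frac{1}{2}\, \frac{\partial^2 u}{\partial x^2},
\qquad t > 0 .
```

Under this reading, the prior plays the role of the initial condition at t = 0, and each noise scale picks out one time slice of the solution.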
Linear structural equation models are widely used to postulate causal mechanisms underlying observational data. In these models, each variable equals a linear combination of a subset of the remaining variables plus an error term. When there is no unobserved confounding or selection bias, the error terms are assumed to be independent. We consider estimating a total causal effect in this setting. The causal structure is assumed to be known only up to a maximally oriented partially directed acyclic graph (MPDAG), a general class of graphs that can represent a Markov equivalence class of directed acyclic graphs (DAGs) with added background knowledge. We propose a simple estimator based on iterated least squares, which can consistently estimate any identified total causal effect, under point or joint intervention. We show that this estimator is the most efficient among all regular estimators that are based on sample covariance, including covariate adjustment and the estimators employed by the joint-IDA algorithm. Notably, our result holds without assuming Gaussian errors.
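As a rough illustration of estimating a total effect by least squares in a linear structural equation model, the sketch below simulates a small model and combines regression coefficients. It is not the estimator proposed in the paper: it assumes a fully known DAG (X -> M -> Y and X -> Y) rather than an MPDAG, and the coefficients 0.8, 0.5 and 1.5, the uniform errors, and all function names are hypothetical choices made for the example.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ols_one(y, x):
    """Least-squares slope of y on x (no intercept; all variables have mean zero)."""
    return dot(x, y) / dot(x, x)

def ols_two(y, x1, x2):
    """Least-squares coefficients of y on (x1, x2) via the 2x2 normal equations."""
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    b1, b2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def simulate(n, rng):
    """Hypothetical linear SEM with independent, non-Gaussian (uniform) errors."""
    x = [rng.uniform(-1, 1) for _ in range(n)]
    m = [0.8 * xi + rng.uniform(-1, 1) for xi in x]
    y = [0.5 * xi + 1.5 * mi + rng.uniform(-1, 1) for xi, mi in zip(x, m)]
    return x, m, y

def total_effect_x_on_y(x, m, y):
    """Regress each node on its parents, then sum the products along directed paths."""
    b_xm = ols_one(m, x)            # edge X -> M
    b_xy, b_my = ols_two(y, x, m)   # edges X -> Y and M -> Y
    return b_xy + b_xm * b_my

rng = random.Random(0)
x, m, y = simulate(20000, rng)
estimate = total_effect_x_on_y(x, m, y)
true_effect = 0.5 + 0.8 * 1.5       # direct path plus the path through M
```

In this toy model the estimate should land close to the true total effect of 1.7, even though the errors are not Gaussian.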
Chernoff-type concentration of empirical probabilities in relative entropy
Guo, F. Richard,
and Thomas S. Richardson
IEEE Transactions on Information Theory
We study the relative entropy of the empirical probability vector with respect to the true probability vector in multinomial sampling of k categories, which, when multiplied by sample size n, is also the log-likelihood ratio statistic. We generalize a recent result and show that the moment generating function of the statistic is bounded by a polynomial of degree n on the unit interval, uniformly over all true probability vectors. We characterize the family of polynomials indexed by (k,n) and obtain explicit formulae. Consequently, we develop Chernoff-type tail bounds, including a closed-form version from a large sample expansion of the bound minimizer. Our bound dominates the classic method-of-types bound and is competitive with the state of the art. We demonstrate with an application to estimating the proportion of unseen butterflies.
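To make the statistic concrete, here is a minimal sketch that computes n times the relative entropy of the empirical probability vector and evaluates the classic method-of-types bound that the abstract uses as a baseline. It does not implement the paper's polynomial bound, and the function names are hypothetical.

```python
import math

def relative_entropy(p_hat, p):
    """D(p_hat || p) in nats, with the convention 0 * log(0) = 0."""
    return sum(q * math.log(q / r) for q, r in zip(p_hat, p) if q > 0)

def n_times_kl(counts, p):
    """Sample size n times the relative entropy of the empirical vector to p."""
    n = sum(counts)
    p_hat = [c / n for c in counts]
    return n * relative_entropy(p_hat, p)

def method_of_types_bound(n, k, eps):
    """Classic bound: P(D(p_hat || p) >= eps) <= C(n+k-1, k-1) * exp(-n * eps).

    C(n+k-1, k-1) counts the possible empirical vectors (types) with n samples
    and k categories; the result is capped at 1 since it bounds a probability.
    """
    return min(1.0, math.comb(n + k - 1, k - 1) * math.exp(-n * eps))
```

For instance, `n_times_kl([10, 0], [0.5, 0.5])` equals `10 * math.log(2)`, the largest value the statistic can take for a fair coin with n = 10.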
We consider testing marginal independence versus conditional independence in a trivariate Gaussian setting. The two models are non-nested and their intersection is a union of two marginal independences. We consider two sequences of such models, one from each type of independence, that are closest to each other in the Kullback-Leibler sense as they approach the intersection. They become indistinguishable if the signal strength, as measured by the product of two correlation parameters, decreases faster than the standard parametric rate. Under local alternatives at such rate, we show that the asymptotic distribution of the likelihood ratio depends on where and how the local alternatives approach the intersection. To deal with this non-uniformity, we study a class of "envelope" distributions by taking pointwise suprema over asymptotic cumulative distribution functions. We show that these envelope distributions are well-behaved and lead to model selection procedures with rate-free uniform error guarantees and near-optimal power. To control the error even when the two models are indistinguishable, rather than insist on a dichotomous choice, the proposed procedure will choose either or both models.
How cognitive and reactive fear circuits optimize escape decisions in humans
Song Qi,
and Dean Mobbs
Proceedings of the National Academy of Sciences (PNAS)
Flight initiation distance (FID), the distance at which an organism flees from an approaching threat, is an ecological metric of cost–benefit functions of escape decisions. We adapted the FID paradigm to investigate how fast- or slow-attacking “virtual predators” constrain escape decisions. We show that rapid escape decisions rely on “reactive fear” circuits in the periaqueductal gray and midcingulate cortex (MCC), while protracted escape decisions, defined by larger buffer zones, were associated with “cognitive fear” circuits, which include posterior cingulate cortex, hippocampus, and the ventromedial prefrontal cortex, circuits implicated in more complex information processing, cognitive avoidance strategies, and behavioral flexibility. Using a Bayesian decision-making model, we further show that optimization of escape decisions under rapid flight was localized to the MCC, a region involved in adaptive motor control, while the hippocampus is implicated in optimizing decisions that update and control slower escape initiation. These results demonstrate an unexplored link between defensive survival circuits and their role in adaptive escape decisions.
Bounds of memory strength for power-law series
Guo, Fangjian,
and Tao Zhou
Physical Review E
Minimal enumeration of all possible total effects in a Markov equivalence class
Guo, F. Richard,
and Emilija Perković
In AISTATS
In observational studies, when a total causal effect of interest is not identified, the set of all possible effects can be reported instead. This typically occurs when the underlying causal DAG is only known up to a Markov equivalence class, or a refinement thereof due to background knowledge. As such, the class of possible causal DAGs is represented by a maximally oriented partially directed acyclic graph (MPDAG), which contains both directed and undirected edges. We characterize the minimal additional edge orientations required to identify a given total effect. A recursive algorithm is then developed to enumerate subclasses of DAGs, such that the total effect in each subclass is identified as a distinct functional of the observed distribution. This resolves an issue with existing methods, which often report possible total effects with duplicates, namely those that are numerically distinct due to sampling variability but are in fact causally identical.
Boosting variational inference
Guo, Fangjian,
and D Dunson
In NIPS Workshop on Advances in Approximate Bayesian Inference
Variational inference (VI) provides fast approximations of a Bayesian posterior in part because it formulates posterior approximation as an optimization problem: to find the closest distribution to the exact posterior over some family of distributions. For practical reasons, the family of distributions in VI is usually constrained so that it does not include the exact posterior, even as a limit point. Thus, no matter how long VI is run, the resulting approximation will not approach the exact posterior. We propose to instead consider a more flexible approximating family consisting of all possible finite mixtures of a parametric base distribution (e.g., Gaussian). For efficient inference, we borrow ideas from gradient boosting to develop an algorithm we call boosting variational inference (BVI). BVI iteratively improves the current approximation by mixing it with a new component from the base distribution family and thereby yields progressively more accurate posterior approximations as more computing time is spent. Unlike a number of common VI variants including mean-field VI, BVI is able to capture multimodality, general posterior covariance, and nonstandard posterior shapes.
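The iterative mixing step described in this abstract can be written schematically as follows. The symbols q_t, h_t and \alpha_t are assumed notation, and the abstract does not say how the new component or its weight is chosen, so this only records the form of the update.

```latex
% Assumed notation: q_{t-1} is the current approximation, h_t a new component drawn
% from the parametric base family (e.g., Gaussian), and \alpha_t the mixing weight.
q_t(\theta) = (1 - \alpha_t)\, q_{t-1}(\theta) + \alpha_t\, h_t(\theta),
\qquad \alpha_t \in [0, 1].

% Unrolling the recursion shows that q_t is a finite mixture of base components:
q_t(\theta) = \sum_{j=1}^{t} w_j\, h_j(\theta),
\qquad w_j \ge 0, \quad \sum_{j=1}^{t} w_j = 1 .
```

Here q_1 = h_1 is taken as the starting point, so after t steps the approximation is a mixture with at most t components.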
The Bayesian Echo Chamber: modeling social influence via linguistic accommodation
Guo, Fangjian,
and Katherine Heller
Uncovering systematic bias in ratings across categories: a Bayesian approach
Guo, Fangjian,
and David Dunson
In RecSys
Parallelizing MCMC with random partition trees
and David Dunson
In NIPS
The modern scale of data has brought new challenges to Bayesian inference. In particular, conventional MCMC algorithms are computationally very expensive for large data sets. A promising approach to solve this problem is embarrassingly parallel MCMC (EP-MCMC), which first partitions the data into multiple subsets and runs independent sampling algorithms on each subset. The subset posterior draws are then aggregated via some combining rules to obtain the final approximation. Existing EP-MCMC algorithms are limited by approximation accuracy and difficulty in resampling. In this article, we propose a new EP-MCMC algorithm PART that solves these problems. The new algorithm applies random partition trees to combine the subset posterior draws, which is distribution-free, easy to resample from and can adapt to multiple scales. We provide theoretical justification and extensive experiments illustrating empirical performance.
notes and expository writings
Causal Inference by using invariant prediction by Peters, Buhlmann and Meinshausen (2016)
Guo, F. Richard