Debiased score test: goodness-of-fit test and model comparison

Test whether a semiparametric (e.g., GAM) or parametric (e.g., glm) regression model is well-specified. The test is a debiased (Neyman-orthogonalized) score test computed via sample splitting: on a held-out hunt sample, the null model is fit and a flexible ML algorithm is used to hunt for a direction in which the null model's score seems positive; on an independent test sample, that direction's score is evaluated to assess the significance. The test employs orthogonalization to eliminate plug-in bias from estimating the null model, so the resulting test statistic is asymptotically standard normal under the null without requiring a parametric form for the alternative.

Usage

dScoreTest(
  y,
  X,
  score_fun,
  weight_fun,
  fit_method,
  wls_method,
  hunt.style = "optimal",
  hunt.method = "grf",
  hunt_fun = NULL,
  debias.method = "standard",
  debias_fun = NULL,
  trim.outlier.hunt = TRUE,
  X.cols.hunt = 1:ncol(X),
  splits = c(0.5, 0.5),
  arg.fit_method = NULL,
  arg.wls_method = NULL,
  arg.hunt_fun = NULL,
  predict_fun = stats::predict,
  predict_fun_hunt = NULL,
  verbose = FALSE
)

Arguments

y

Numeric response vector of length n.

X

Numeric covariate matrix of dimension n x p.

score_fun

Function with signature score_fun(fit, y, X) returning a vector of scores \(l'(\hat{f}(x_i), y_i)\), which can be viewed as negative residuals.

weight_fun

Function with signature weight_fun(fit, X) that computes the weight \(\mathbb{E}[l''(\hat{f}(x_i), y_i) | x_i]\) for each row \(x_i\) of X.

fit_method

Function with signature fit_method(y, X, ...) that returns a fitted null model \(\hat{f} \in \mathcal{F}\) by minimizing the loss \(\sum_i l(f(x_i), y_i)\). For a fitted f, it must support predict_fun(f, X) for evaluation.

wls_method

Function with signature wls_method(y, X, w, ...) that fits the null model \(\hat{f} \in \mathcal{F}\) with weighted least squares, i.e., minimizing \(\sum_i w_i (f(x_i) - y_i)^2\). For a fitted f, it must support predict_fun(f, X) for evaluation.

hunt.style

Hunting algorithm with the following options.

'optimal': optimal hunting (default). See hunt_optimal.
'wls': a simpler hunting using weighted least squares, which can be less powerful. See hunt_wls.
'vanilla': a basic hunting; not recommended unless unable to fit an alternative model with weighted least squares. See hunt_vanilla.

hunt.method

Built-in method for hunting. Currently available:

'grf': regression forest from package grf.

When this is set to any other value, arguments hunt_fun and predict_fun_hunt must be set properly to supply a customized hunting method.

hunt_fun

Default NULL. When hunt.method is not set to a built-in method, this is a customized function for hunting. When hunt.style is 'optimal' or 'wls', this function must have signature hunt_fun(y, X, w, ...) that returns a fitted alternative model \(\hat{g} \in \mathcal{G}\) via weighted least squares, i.e., by minimizing \(\sum_i w_i (y_i - g(x_i))^2\); otherwise, for 'vanilla' hunting, this function must have signature hunt_fun(y, X, ...) that returns an alternative model fitted in any fashion. The returned object g must support predict_fun_hunt(g, X) for evaluation.

debias.method

Debiasing method. Currently available:

'standard': standard debiasing (default). See debias_standard.

When set to any other value, debias_fun must be supplied.

debias_fun

Default NULL. When debias.method is not 'standard', this is a customized debiasing function with the same signature as debias_standard, returning a list with an element h, the debiased hunted function.

trim.outlier.hunt

If TRUE (default), extreme values produced by the hunted function will be trimmed using Tukey's IQR rule.

X.cols.hunt

Integer vector selecting which columns of X drive the hunt. Default 1:ncol(X). This is modified only in special settings, e.g., when there is an offset in the null model.

splits

Numeric vector of length 2 or 3 giving the relative sizes of the sample splits; rescaled internally to sum to one. Default is c(0.5, 0.5), which splits data into two halves for hunt and test respectively. Though typically unnecessary in practice, one can also specify a 3-way split for hunt, debiasing and test respectively.

arg.fit_method

Named list of additional arguments passed to fit_method (default to NULL).

arg.wls_method

Named list of additional arguments passed to wls_method (default to NULL).

arg.hunt_fun

Extra arguments (default NULL) passed to the customized hunt.fun.

predict_fun

Function with signature predict_fun(fit, X) returning a numeric vector of predictions from a fitted null model, which is produced by fit_method() and wls_method(). Note that if fit is \(\hat{f}\), this function should return \(\hat{f}(X)\). Default stats::predict. When y is binary, it must also support signature predict_fun(fit, X, type='response') for returning probabilities.

predict_fun_hunt

Default NULL. When hunt.method is not set to a built-in method, this is a function with signature predict_fun_hunt(fit, X) returning a numeric vector of predictions from a fitted alternative model produced by hunt_fun().

verbose

Default FALSE; information is printed if set to TRUE.

Value

An object of class "dScoreTest": a list whose key elements are the debiased test statistic t.stat and the one-sided p-value p.val (right tail of the standard normal), along with the test-set score residuals, the hunted direction, and the call. It has print, summary and plot methods.

Details

For most scenarios, use one of these methods instead:

Use gof_test to test whether a fitted model is well-specified against a nonparametric alternative. S3 methods are provided for glm (gof_test.glm), lm (gof_test.lm) and mgcv::gam (gof_test.gam).
Use compare_models to test a null model fit.0 against an alternative supermodel fit.1 in the same model class. Similar to anova, method can be used to conduct a significance test of one or more predictors. In contrast with gof_test, this method targets the alternative fit.1. S3 methods are provided for glm (compare_models.glm), lm (compare_models.lm) and mgcv::gam (compare_models.gam).

Use dScoreTest directly for full control over the score, weight, refit and hunt routines: this is the underlying engine that the S3 methods wrap.

Author

Maintainer: F. Richard Guo ricguo@umich.edu (ORCID) [copyright holder]

Authors:

Aditya Dhawan ad950@cam.ac.uk