
Test whether a parametric (e.g., glm) or a semiparametric (e.g., GAM) model is well specified. The test is a debiased (Neyman-orthogonalized) score test computed via sample splitting. On a hunt sample, a flexible auxiliary model is fitted to find ("hunt") a direction in which the null model's score is non-zero; on a separate, held-out test sample, the score along that direction is evaluated and standardized under the null. The orthogonalization absorbs the plug-in bias from estimating the direction, so the test statistic is asymptotically standard normal under the null, without requiring a parametric form for the alternative.
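
To make the hunt-then-test idea concrete, here is a deliberately simplified sketch for a Gaussian linear null with squared loss, where all weights equal 1. It is not the package's algorithm: the polynomial hunter, the variable names, the simple projection used for orthogonalization, and the two-sided p-value are all assumptions chosen for illustration.

```r
set.seed(1)
n <- 2000
x <- runif(n, -2, 2)
y <- 1 + x + x^2 + rnorm(n)          # truth is quadratic; the null model is linear in x
dat <- data.frame(x = x, y = y)

# Two-way split into a hunt sample and a test sample.
hunt_id <- sample.int(n, n / 2)
d_hunt <- dat[hunt_id, ]
d_test <- dat[-hunt_id, ]

# Hunt: fit the null, then fit a flexible model to its residuals.
# A degree-4 polynomial stands in for a flexible learner here.
null_hunt <- lm(y ~ x, data = d_hunt)
d_hunt$r <- resid(null_hunt)
dir_fit <- lm(r ~ poly(x, 4), data = d_hunt)

# Test: evaluate the hunted direction on held-out data.
null_test <- lm(y ~ x, data = d_test)
r_test <- resid(null_test)           # negative score for squared loss
d_test$h <- predict(dir_fit, newdata = d_test)

# Orthogonalize the direction against the null model class (intercept and x).
h_perp <- resid(lm(h ~ x, data = d_test))

# Standardized score along the orthogonalized direction.
stat <- sum(h_perp * r_test) / sqrt(sum(h_perp^2 * r_test^2))
p_value <- 2 * pnorm(-abs(stat))
```

Because the hunted direction is estimated on one half of the data and evaluated on the other, the standardization on the test half can treat the direction as fixed.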

Usage

dScoreTest(
  y,
  X,
  score_fun,
  weight_fun,
  fit_method,
  wls_method,
  hunt.style = "optimal",
  hunt.method = "grf",
  hunt_fun = NULL,
  trim.outlier.hunt = TRUE,
  X.cols.hunt = 1:ncol(X),
  splits = c(0.5, 0.5),
  arg.fit_method = NULL,
  arg.wls_method = NULL,
  arg.hunt_fun = NULL,
  predict_fun = stats::predict,
  predict_fun_alt = NULL,
  verbose = FALSE
)

Arguments

y

Numeric response vector of length n.

X

Numeric covariate matrix of dimension n x p.

score_fun

Function with signature score_fun(fit, y, X) returning a vector of scores \(l'(\hat{f}(x_i), y_i)\), which can be viewed as negative residuals.

weight_fun

Function with signature weight_fun(fit, X) that computes the weight \(\mathbb{E}[l''(\hat{f}(x_i), y_i) | x_i]\) for each row \(x_i\) of X.
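
As an illustration of the score_fun and weight_fun signatures, the sketch below writes both for a Poisson log-linear null model with loss \(l(f, y) = e^{f} - y f\), so that \(l' = e^{f} - y\) and \(l'' = e^{f}\). The helper names (fit_pois, linpred, score_pois, weight_pois) are hypothetical and not part of the package, and the sketch assumes \(\hat{f}\) is evaluated on the linear-predictor scale.

```r
# Hypothetical helpers for a Poisson log-linear null model (not package code).
fit_pois <- function(y, X, ...) {
  out <- stats::glm.fit(x = cbind(1, as.matrix(X)), y = y,
                        family = stats::poisson(), ...)
  list(coef = out$coefficients)
}

# Linear predictor f(x_i) for each row of X.
linpred <- function(fit, X) {
  as.numeric(cbind(1, as.matrix(X)) %*% fit$coef)
}

# Score l'(f, y) = exp(f) - y: fitted mean minus response (a negative residual).
score_pois <- function(fit, y, X) exp(linpred(fit, X)) - y

# Weight E[l''(f, y) | x] = exp(f); here l'' does not depend on y.
weight_pois <- function(fit, X) exp(linpred(fit, X))

set.seed(1)
n <- 500
X <- matrix(rnorm(n * 2), n, 2)
y <- rpois(n, exp(0.3 + 0.5 * X[, 1]))
fit <- fit_pois(y, X)
s <- score_pois(fit, y, X)
w <- weight_pois(fit, X)
```

At the maximum likelihood fit with an intercept, the scores average to approximately zero, which is a quick sanity check on the sign convention.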

fit_method

Function with signature fit_method(y, X, ...) that returns a fitted null model \(\hat{f} \in \mathcal{F}\) by minimizing the loss \(\sum_i l(f(x_i), y_i)\). For a fitted f, it must support predict_fun(f, X) for evaluation.

wls_method

Function with signature wls_method(y, X, w, ...) that fits the null model \(\hat{f} \in \mathcal{F}\) with weighted least squares, i.e., minimizing \(\sum_i w_i (f(x_i) - y_i)^2\). For a fitted f, it must support predict_fun(f, X) for evaluation.
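
A minimal sketch of a matching fit_method and wls_method pair for a linear null class is shown below. The names fit_lin, wls_lin and the class "lin_null" are hypothetical; the S3 predict method is defined so that the fitted object can be evaluated with predict(fit, X) on a plain matrix, on the assumption that this is how the default predict_fun is called.

```r
# Hypothetical weighted least squares fit for a linear null class (not package code).
wls_lin <- function(y, X, w, ...) {
  out <- stats::lm.wfit(x = cbind(1, as.matrix(X)), y = y, w = w, ...)
  structure(list(coef = out$coefficients), class = "lin_null")
}

# For squared loss, the unweighted fit is the same routine with unit weights.
fit_lin <- function(y, X, ...) {
  wls_lin(y, X, w = rep(1, length(y)), ...)
}

# S3 method so that stats::predict(fit, X) works with a matrix X.
predict.lin_null <- function(object, newdata, ...) {
  as.numeric(cbind(1, as.matrix(newdata)) %*% object$coef)
}

set.seed(2)
n <- 200
X <- matrix(rnorm(n * 2), n, 2)
y <- 1 + X[, 1] - 0.5 * X[, 2] + rnorm(n)
w <- runif(n, 0.5, 2)

fit_w <- wls_lin(y, X, w)
fit_u <- fit_lin(y, X)
pred <- predict(fit_w, X)
```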

hunt.style

Hunting algorithm with the following options.

  • 'optimal': optimal hunting (default). See hunt_optimal.

  • 'wls': a simpler hunt using weighted least squares, which can be less powerful. See hunt_wls.

  • 'vanilla': a basic hunt; not recommended unless the alternative model cannot be fitted with weighted least squares. See hunt_vanilla.

hunt.method

Built-in method for hunting. Currently available:

  • 'grf': regression forest from package grf.

When this is set to any other value, both hunt_fun and predict_fun_alt must be supplied to define a custom hunting method.

hunt_fun

Default NULL. When hunt.method is not set to a built-in method, this is a customized function for hunting. When hunt.style is 'optimal' or 'wls', this function must have signature hunt_fun(y, X, w, ...) that returns a fitted alternative model \(\hat{g} \in \mathcal{G}\) via weighted least squares, i.e., by minimizing \(\sum_i w_i (y_i - g(x_i))^2\); otherwise, for 'vanilla' hunting, this function must have signature hunt_fun(y, X, ...) that returns an alternative model fitted in any fashion. The returned object g must support predict_fun_alt(g, X) for evaluation.
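
For the 'optimal' and 'wls' styles, a custom hunter only needs to accept weights and minimize the weighted squared error. The sketch below uses weighted least squares on a quadratic expansion of X as a stand-in for a flexible learner; hunt_quad, predict_quad and expand_quad are hypothetical names, not package functions.

```r
# Hypothetical basis expansion: original columns plus their squares.
# Assumes continuous columns; a binary column would duplicate its own square.
expand_quad <- function(X) {
  X <- as.matrix(X)
  cbind(1, X, X^2)
}

# Custom hunter with signature hunt_fun(y, X, w, ...):
# minimizes sum_i w_i * (y_i - g(x_i))^2 over the quadratic class.
hunt_quad <- function(y, X, w, ...) {
  out <- stats::lm.wfit(x = expand_quad(X), y = y, w = w, ...)
  list(coef = out$coefficients)
}

# Matching predictor with signature predict_fun_alt(fit, X).
predict_quad <- function(fit, X) {
  as.numeric(expand_quad(X) %*% fit$coef)
}

set.seed(3)
n <- 300
X <- matrix(rnorm(n * 2), n, 2)
y <- X[, 1]^2 + rnorm(n, sd = 0.3)
w <- runif(n, 0.5, 2)

g <- hunt_quad(y, X, w)
g_hat <- predict_quad(g, X)
```

A pair like this would be passed as hunt_fun and predict_fun_alt together with a non-built-in value of hunt.method.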

trim.outlier.hunt

If TRUE (default), extreme values produced by the hunted function will be trimmed using Tukey's IQR rule.
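
For reference, Tukey's IQR rule flags values beyond fences placed a multiple of the interquartile range outside the quartiles. The sketch below clips values to the fences with the conventional multiplier 1.5; whether the package clips or drops values, and which multiplier it uses, is not stated here, so both are assumptions. The function name trim_tukey is hypothetical.

```r
# Hypothetical helper: clip values to Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
trim_tukey <- function(v, k = 1.5) {
  q <- stats::quantile(v, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  pmin(pmax(v, lower), upper)
}

v <- c(1:10, 1000)
trimmed <- trim_tukey(v)
```

In this example the quartiles are 3.5 and 8.5, so the upper fence is 16 and the value 1000 is clipped to 16 while the other values are unchanged.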

X.cols.hunt

Integer vector selecting which columns of X drive the hunt. Default 1:ncol(X). This is modified only in special settings, e.g., when there is an offset in the null model.

splits

Numeric vector of length 2 or 3 giving the relative sizes of the sample splits; rescaled internally to sum to one. Default is c(0.5, 0.5), which splits data into two halves for hunt and test respectively. Though typically unnecessary in practice, one can also specify a 3-way split for hunt, debiasing and test respectively.
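
The rescaling of splits can be pictured with the following sketch, which normalizes the relative sizes and assigns row indices at random. The function name split_indices, the rounding rule, and the random assignment are assumptions for illustration, not a description of the package internals.

```r
# Hypothetical helper: turn relative split sizes into disjoint index sets.
split_indices <- function(n, splits = c(0.5, 0.5)) {
  stopifnot(length(splits) %in% c(2, 3), all(splits > 0))
  p <- splits / sum(splits)                      # rescale to sum to one
  labels <- if (length(p) == 2) c("hunt", "test") else c("hunt", "debias", "test")
  sizes <- floor(n * p)
  sizes[length(sizes)] <- n - sum(sizes[-length(sizes)])  # remainder goes to the last split
  perm <- sample.int(n)                          # random assignment of rows
  groups <- factor(rep(labels, times = sizes), levels = labels)
  split(perm, groups)
}

set.seed(4)
idx2 <- split_indices(100)                # two-way: hunt and test
idx3 <- split_indices(100, c(2, 1, 1))    # three-way: hunt, debias, test
```

Here c(2, 1, 1) is rescaled to c(0.5, 0.25, 0.25), giving 50, 25 and 25 rows.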

arg.fit_method

Named list of additional arguments passed to fit_method (default NULL).

arg.wls_method

Named list of additional arguments passed to wls_method (default NULL).

arg.hunt_fun

Extra arguments (default NULL) passed to the customized hunt_fun.

predict_fun

Function with signature predict_fun(fit, X) returning a numeric vector of predictions from a fitted null model, as produced by fit_method() or wls_method(). If fit is \(\hat{f}\), this function should return \(\hat{f}(X)\). Default stats::predict. When y is binary, it must also support the signature predict_fun(fit, X, type='response'), which returns probabilities.
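
For a binary response, a wrapper along the following lines provides both call forms. It is a sketch under assumptions: the names as_df, fit_logit and predict_logit are hypothetical, the null model is a logistic glm, and the default scale is taken to be the linear predictor.

```r
# Hypothetical helper: matrix to data frame with stable column names.
as_df <- function(X) {
  X <- as.matrix(X)
  colnames(X) <- paste0("x", seq_len(ncol(X)))
  as.data.frame(X)
}

# Hypothetical logistic null fit with signature fit_method(y, X, ...).
fit_logit <- function(y, X, ...) {
  dat <- as_df(X)
  dat$y <- y
  stats::glm(y ~ ., data = dat, family = stats::binomial(), ...)
}

# predict_fun(fit, X) returns the linear predictor;
# predict_fun(fit, X, type = 'response') returns probabilities.
predict_logit <- function(fit, X, type = "link") {
  as.numeric(stats::predict(fit, newdata = as_df(X), type = type))
}

set.seed(5)
n <- 400
X <- matrix(rnorm(n * 2), n, 2)
y <- rbinom(n, 1, plogis(-0.2 + X[, 1]))
fit <- fit_logit(y, X)
eta <- predict_logit(fit, X)
prob <- predict_logit(fit, X, type = "response")
```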

predict_fun_alt

Default NULL. When hunt.method is not set to a built-in method, this is a function with signature predict_fun_alt(fit, X) returning a numeric vector of predictions from a fitted alternative model produced by hunt_fun().

verbose

Default FALSE; information is printed if set to TRUE.

Value

An object of class "dScoreTest".

Details

For most scenarios, use one of these methods instead:

Use dScoreTest directly for full control over the score, weight, refit and hunt routines: this is the underlying engine that the S3 methods wrap.

Author

Maintainer: F. Richard Guo ricguo@umich.edu

Authors: