
Debiased score test: goodness-of-fit test and model comparison
Source: R/dScoreTest-package.R, R/dscoretest.R
Test whether a parametric (e.g., glm) or a semiparametric (e.g., GAM) model is well-specified. The test is a debiased (Neyman-orthogonalized) score test computed via sample splitting. On the hunt sample, a flexible auxiliary model is fitted to find ("hunt") a direction in which the null model's score is non-zero. On the held-out test sample, the score along that direction is evaluated and standardized under the null. The orthogonalization absorbs the plug-in bias from estimating the direction, so the resulting test statistic is asymptotically standard normal under the null without requiring a parametric form for the alternative.
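The procedure above can be illustrated with a simplified, self-contained sketch in base R. It is not the package's implementation. The following are all assumptions made for illustration:

- the Gaussian linear null model and the squared loss;
- the quadratic basis used for hunting in place of grf;
- the projection-based orthogonalization and the studentized variance;
- the one-sided p-value.

debiased_score_sketch is a hypothetical name.

```r
# Simplified split-sample debiased score test (illustrative sketch only).
# Assumptions: linear null model with intercept, squared loss
# l(f, y) = (f - y)^2 / 2 (score l' = f - y, weight l'' = 1), and a
# quadratic basis fitted by weighted least squares as the hunting learner.
debiased_score_sketch <- function(y, X, split = 0.5) {
  X <- as.matrix(X)
  n <- length(y)
  hunt <- sort(sample.int(n, floor(split * n)))  # random hunt/test split
  test <- setdiff(seq_len(n), hunt)
  basis <- function(Z) cbind(1, Z, Z^2)

  # Hunt sample: fit the null model and compute scores and weights.
  Xh <- X[hunt, , drop = FALSE]
  s_h <- -stats::lm.fit(cbind(1, Xh), y[hunt])$residuals  # f_hat - y
  w_h <- rep(1, length(hunt))
  # Fit the working response -s / w by WLS; the fit defines the direction h.
  b <- stats::lm.wfit(basis(Xh), -s_h / w_h, w_h)$coefficients
  b[is.na(b)] <- 0
  h <- function(Z) drop(basis(Z) %*% b)

  # Test sample: refit the null model and evaluate the score along h.
  Xt <- X[test, , drop = FALSE]
  s_t <- -stats::lm.fit(cbind(1, Xt), y[test])$residuals
  w_t <- rep(1, length(test))
  # Orthogonalize h against the null class: residual of a WLS projection
  # of h onto span{1, X}, so estimating f_hat has no first-order effect.
  h_tilde <- stats::lm.wfit(cbind(1, Xt), h(Xt), w_t)$residuals
  # Studentized statistic; approximately N(0, 1) under the null.
  stat <- -sum(s_t * h_tilde) / sqrt(sum(s_t^2 * h_tilde^2))
  list(statistic = stat,
       p.value = stats::pnorm(stat, lower.tail = FALSE))  # one-sided
}
```

In this linear special case, projecting h onto span{1, X} makes the estimation error of \(\hat{f}\) drop out of the score sum exactly. For a loss such as the logistic one, a sketch of this kind would weight the projection by w = l'', which gives the same cancellation only to first order.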
Usage
dScoreTest(
y,
X,
score_fun,
weight_fun,
fit_method,
wls_method,
hunt.style = "optimal",
hunt.method = "grf",
hunt_fun = NULL,
trim.outlier.hunt = TRUE,
X.cols.hunt = 1:ncol(X),
splits = c(0.5, 0.5),
arg.fit_method = NULL,
arg.wls_method = NULL,
arg.hunt_fun = NULL,
predict_fun = stats::predict,
predict_fun_alt = NULL,
verbose = FALSE
)
Arguments
- y
Numeric response vector of length n.
- X
Numeric covariate matrix of dimension n x p.
- score_fun
Function with signature score_fun(fit, y, X) returning the vector of scores \(l'(\hat{f}(x_i), y_i)\), which can be viewed as negative residuals.
- weight_fun
Function with signature weight_fun(fit, X) that computes the weight \(\mathbb{E}[l''(\hat{f}(x_i), y_i) \mid x_i]\) for each row \(x_i\) of X.
- fit_method
Function with signature fit_method(y, X, ...) that returns a fitted null model \(\hat{f} \in \mathcal{F}\), obtained by minimizing the loss \(\sum_i l(f(x_i), y_i)\) over \(f \in \mathcal{F}\). A fitted f must support predict_fun(f, X) for evaluation.
- wls_method
Function with signature wls_method(y, X, w, ...) that fits the null model \(\hat{f} \in \mathcal{F}\) by weighted least squares, i.e., by minimizing \(\sum_i w_i (f(x_i) - y_i)^2\). A fitted f must support predict_fun(f, X) for evaluation.
- hunt.style
Hunting algorithm, one of:
  - 'optimal': optimal hunting (default). See hunt_optimal.
  - 'wls': simpler hunting based on weighted least squares, which can be less powerful. See hunt_wls.
  - 'vanilla': basic hunting; not recommended unless an alternative model cannot be fitted by weighted least squares. See hunt_vanilla.
- hunt.method
Built-in method for hunting. Currently available:
  - 'grf': regression forest from package grf.
When this is set to any other value, the arguments hunt_fun and predict_fun_alt must be supplied to define a customized hunting method.
- hunt_fun
Default NULL. When hunt.method is not set to a built-in method, this is a customized function for hunting.
  - When hunt.style is 'optimal' or 'wls', this function must have signature hunt_fun(y, X, w, ...). It must return a fitted alternative model \(\hat{g} \in \mathcal{G}\) obtained by weighted least squares, i.e., by minimizing \(\sum_i w_i (y_i - g(x_i))^2\).
  - For 'vanilla' hunting, it must have signature hunt_fun(y, X, ...) and may return an alternative model fitted in any fashion.
The returned object g must support predict_fun_alt(g, X) for evaluation.
- trim.outlier.hunt
If TRUE (default), extreme values produced by the hunted function are trimmed using Tukey's IQR rule.
- X.cols.hunt
Integer vector selecting which columns of X drive the hunt. Default 1:ncol(X). Change this only in special settings, e.g., when the null model includes an offset.
- splits
Numeric vector of length 2 or 3 giving the relative sizes of the sample splits; it is rescaled internally to sum to one. The default c(0.5, 0.5) splits the data into two halves, used for hunting and for testing. Though typically unnecessary in practice, a 3-way split can also be specified, used for hunting, debiasing and testing.
- arg.fit_method
Named list of additional arguments passed to fit_method (default NULL).
- arg.wls_method
Named list of additional arguments passed to wls_method (default NULL).
- arg.hunt_fun
Extra arguments (default NULL) passed to the customized hunt_fun.
- predict_fun
Function with signature predict_fun(fit, X) returning a numeric vector of predictions from a fitted null model, as produced by fit_method() and wls_method(). If fit is \(\hat{f}\), this function should return \(\hat{f}(X)\). Default stats::predict. When y is binary, it must also support the signature predict_fun(fit, X, type = 'response') for returning probabilities.
- predict_fun_alt
Default NULL. When hunt.method is not set to a built-in method, this is a function with signature predict_fun_alt(fit, X) returning a numeric vector of predictions from a fitted alternative model produced by hunt_fun().
- verbose
Default FALSE; information is printed if set to TRUE.
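As an illustration of the signatures above, the following sketch defines the user-supplied components for a logistic-regression null model. It assumes that \(\hat{f}\) lives on the logit scale with loss \(l(f, y) = \log(1 + e^{f}) - y f\). Then \(l' = p - y\) and \(l'' = p(1 - p)\), with \(p = \operatorname{plogis}(f)\). All function names (as_covariate_df, fit_logit, wls_linear, predict_logit, score_logit, weight_logit) are hypothetical, not part of the package.

```r
# Hypothetical components for a logistic-regression null model.
# Assumption: f is on the logit scale, l(f, y) = log(1 + exp(f)) - y * f,
# so l' = p - y and l'' = p * (1 - p) with p = plogis(f).

# Give the covariates stable column names so fitting and prediction agree.
as_covariate_df <- function(X) {
  X <- as.matrix(X)
  colnames(X) <- paste0("x", seq_len(ncol(X)))
  as.data.frame(X)
}

# fit_method(y, X, ...): maximum-likelihood logistic fit of y on the columns of X.
fit_logit <- function(y, X, ...) {
  stats::glm(y ~ ., data = data.frame(y = y, as_covariate_df(X)),
             family = stats::binomial(), ...)
}

# wls_method(y, X, w, ...): weighted least squares in the same linear class.
wls_linear <- function(y, X, w, ...) {
  stats::lm(y ~ ., data = data.frame(y = y, as_covariate_df(X)),
            weights = w, ...)
}

# predict_fun(fit, X, type): logit-scale predictions by default, probabilities
# for type = "response". Both fit types are treated as logit-scale models.
predict_logit <- function(fit, X, type = c("link", "response")) {
  type <- match.arg(type)
  newdata <- as_covariate_df(X)
  eta <- if (inherits(fit, "glm")) {
    stats::predict(fit, newdata = newdata, type = "link")
  } else {
    stats::predict(fit, newdata = newdata)
  }
  eta <- as.numeric(eta)
  if (type == "response") stats::plogis(eta) else eta
}

# score_fun(fit, y, X): l'(f_hat(x_i), y_i) = p_i - y_i (negative residuals).
score_logit <- function(fit, y, X) {
  predict_logit(fit, X, type = "response") - y
}

# weight_fun(fit, X): E[l''(f_hat(x_i), y_i) | x_i] = p_i * (1 - p_i).
weight_logit <- function(fit, X) {
  p <- predict_logit(fit, X, type = "response")
  p * (1 - p)
}

# With the package loaded, these would be passed as, e.g.:
# dScoreTest(y, X, score_fun = score_logit, weight_fun = weight_logit,
#            fit_method = fit_logit, wls_method = wls_linear,
#            predict_fun = predict_logit)
```

A custom predict_fun is used here because, for lm and glm fits, stats::predict looks for variables in a data frame passed as newdata. The wrapper therefore converts X with consistent column names. Treating both fits on the logit scale keeps the type = 'response' case consistent across fit_method and wls_method.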
Details
For most scenarios, use one of these methods instead:
- Use gof_test to test whether a fitted model is well-specified against a nonparametric alternative. S3 methods are provided for glm (gof_test.glm), lm (gof_test.lm) and mgcv::gam (gof_test.gam).
- Use compare_models to test a null model fit.0 against an alternative supermodel fit.1 in the same model class. Similar to anova, this method can be used to conduct a significance test of one or more predictors. In contrast with gof_test, it targets the specific alternative fit.1 rather than a nonparametric one. S3 methods are provided for glm (compare_models.glm), lm (compare_models.lm) and mgcv::gam (compare_models.gam).
Use dScoreTest directly for full control over the score,
weight, refit and hunt routines: this is the underlying engine that the
S3 methods wrap.
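The customized hunting route described in the arguments applies when hunt.method is set to a value other than 'grf'. A hunt_fun and predict_fun_alt pair suitable for hunt.style = 'optimal' or 'wls' might look as follows. This sketch assumes that a quadratic basis fitted by weighted least squares is flexible enough for the alternatives of interest. quadratic_basis, hunt_quadratic, predict_quadratic and the quad_hunt class are hypothetical names.

```r
# Hypothetical basis for the alternative: intercept, main effects and squares.
quadratic_basis <- function(X) {
  X <- as.matrix(X)
  cbind(1, X, X^2)
}

# hunt_fun(y, X, w, ...): weighted least squares fit of g on the basis, i.e.
# minimizing sum_i w_i (y_i - g(x_i))^2. Extra arguments are accepted but unused.
hunt_quadratic <- function(y, X, w, ...) {
  beta <- stats::lm.wfit(quadratic_basis(X), y, w)$coefficients
  beta[is.na(beta)] <- 0  # aliased basis columns contribute nothing
  structure(list(beta = beta), class = "quad_hunt")
}

# predict_fun_alt(fit, X): evaluate the hunted function g_hat at the rows of X.
predict_quadratic <- function(fit, X) {
  drop(quadratic_basis(X) %*% fit$beta)
}

# With the package loaded, the pair would be supplied as below, where my_score,
# my_weight, my_fit and my_wls are placeholders for the null-model components:
# dScoreTest(y, X, score_fun = my_score, weight_fun = my_weight,
#            fit_method = my_fit, wls_method = my_wls,
#            hunt.method = "custom", hunt_fun = hunt_quadratic,
#            predict_fun_alt = predict_quadratic)
```

Any learner that can be fitted with observation weights could replace the quadratic basis, as long as predict_fun_alt evaluates it on a new covariate matrix.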