Richard Guo | MultiSplit

Statistical tests are sometimes constructed with data splitting. When such a test is applied to a dataset, the result can depend on how the data are split, which is typically random. On a fixed dataset, the result of the test is therefore itself random and not replicable. Further, such tests typically have low power because the full sample is not utilized.
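To make the replicability issue concrete, here is a minimal Python sketch of a split-based test. It is an illustration only and is not taken from the MultiSplit package: the function `split_p_value`, the toy dataset, and the unit-variance z-test are all assumptions chosen for simplicity. One half of the data picks the coordinate with the largest absolute mean, and the other half tests whether that coordinate has mean zero.

```python
import math
import random
from statistics import NormalDist, fmean


def split_p_value(data, rng):
    """One hypothetical split-based test of H0: every coordinate has mean 0.

    Assumes unit-variance observations so that a plain z-test applies.
    """
    rows = data[:]              # shallow copy, so the caller's list keeps its order
    rng.shuffle(rows)           # the random split is the "extra randomness"
    half = len(rows) // 2
    select, test = rows[:half], rows[half:]
    dim = len(rows[0])

    # Selection half: pick the coordinate with the largest absolute mean.
    j = max(range(dim), key=lambda k: abs(fmean(r[k] for r in select)))

    # Testing half: two-sided z-test on the selected coordinate.
    z = fmean(r[j] for r in test) * math.sqrt(len(test))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Toy dataset (an assumption): 100 rows, 5 coordinates, only the first has a nonzero mean.
data_rng = random.Random(0)
data = [[data_rng.gauss(0.3 if k == 0 else 0.0, 1.0) for k in range(5)]
        for _ in range(100)]

# Same dataset, ten different random splits: the reported p-value changes each time.
p_values = [split_p_value(data, random.Random(seed)) for seed in range(10)]
for seed, p in enumerate(p_values):
    print(f"split seed {seed}: p = {p:.4f}")
```

Because the dataset is held fixed, all of the variation in the printed p-values comes from the choice of split.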

The R package MultiSplit properly aggregates the results from multiple data splits and reports the p-value of the aggregated statistic. The resulting test has a level that asymptotically approaches the nominal level. Typically, aggregating the results from a sufficiently large number of data splits makes the test replicable and much more powerful. The package implements a generic method that handles any test constructed with “extra randomness”, including random data splitting, resampling, and imputation.
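The general idea of aggregating over many splits can be sketched in Python as follows. This is not the procedure implemented in MultiSplit, and it does not use the package's interface. It swaps in a simple, conservative stand-in, twice the median of the per-split p-values, which remains a valid p-value even though the splits are dependent but is generally less powerful than a purpose-built aggregation. The names `split_test` and `aggregated_p_value`, the toy data, and the unit-variance assumption are all hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, median


def split_test(xs, rng):
    """Hypothetical randomized test of H0: mean 0 (unit variance assumed).

    One half of the sample chooses the direction of the alternative;
    the other half runs a one-sided z-test in that direction.
    """
    xs = xs[:]                  # copy, so the caller's list keeps its order
    rng.shuffle(xs)
    half = len(xs) // 2
    sign = 1.0 if fmean(xs[:half]) >= 0 else -1.0
    z = sign * fmean(xs[half:]) * math.sqrt(len(xs) - half)
    return 1 - NormalDist().cdf(z)


def aggregated_p_value(test, data, n_splits=200, seed=0):
    """Run `test` under n_splits independent random seeds and combine the results.

    Stand-in rule (NOT the MultiSplit method): min(1, 2 * median of p-values).
    """
    p_values = [test(data, random.Random(seed * 1_000_003 + b))
                for b in range(n_splits)]
    return min(1.0, 2 * median(p_values))


# Toy sample (an assumption): 200 draws with true mean 0.2.
data_rng = random.Random(1)
data = [data_rng.gauss(0.2, 1.0) for _ in range(200)]

p_agg = aggregated_p_value(split_test, data)
print(f"aggregated p-value over 200 splits: {p_agg:.4f}")
```

The function takes the randomized test as an argument, so the same wrapper applies to any procedure whose result depends on a random seed, whether the randomness comes from splitting, resampling, or imputation.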