Title: | Handling Heteroskedasticity in the Linear Regression Model |
---|---|
Description: | Implements numerous methods for testing for, modelling, and correcting for heteroskedasticity in the classical linear regression model. The most novel contribution of the package is found in the functions that implement the as-yet-unpublished auxiliary linear variance models and auxiliary nonlinear variance models that are designed to estimate error variances in a heteroskedastic linear regression model. These models follow principles of statistical learning described in Hastie (2009) <doi:10.1007/978-0-387-21606-5>. The nonlinear version of the model is estimated using quasi-likelihood methods as described in Seber and Wild (2003, ISBN: 0-471-47135-6). Bootstrap methods for approximate confidence intervals for error variances are implemented as described in Efron and Tibshirani (1993, ISBN: 978-1-4899-4541-9), including also the expansion technique described in Hesterberg (2014) <doi:10.1080/00031305.2015.1089789>. The wild bootstrap employed here follows the description in Davidson and Flachaire (2008) <doi:10.1016/j.jeconom.2008.08.003>. Tuning of hyper-parameters makes use of a golden section search function that is modelled after the MATLAB function of Zarnowiec (2022) <https://www.mathworks.com/matlabcentral/fileexchange/25919-golden-section-method-algorithm>. A methodological description of the algorithm can be found in Fox (2021, ISBN: 978-1-003-00957-3). There are 25 different functions that implement hypothesis tests for heteroskedasticity. These include a test based on Anscombe (1961) <https://projecteuclid.org/euclid.bsmsp/1200512155>, Ramsey's (1969) BAMSET Test <doi:10.1111/j.2517-6161.1969.tb00796.x>, the tests of Bickel (1978) <doi:10.1214/aos/1176344124>, Breusch and Pagan (1979) <doi:10.2307/1911963> with and without the modification proposed by Koenker (1981) <doi:10.1016/0304-4076(81)90062-2>, Carapeto and Holt (2003) <doi:10.1080/0266476022000018475>, Cook and Weisberg (1983) <doi:10.1093/biomet/70.1.1> (including their graphical methods), Diblasi and Bowman (1997) <doi:10.1016/S0167-7152(96)00115-0>, Dufour, Khalaf, Bernard, and Genest (2004) <doi:10.1016/j.jeconom.2003.10.024>, Evans and King (1985) <doi:10.1016/0304-4076(85)90085-5> and Evans and King (1988) <doi:10.1016/0304-4076(88)90006-1>, Glejser (1969) <doi:10.1080/01621459.1969.10500976> as formulated by Mittelhammer, Judge and Miller (2000, ISBN: 0-521-62394-4), Godfrey and Orme (1999) <doi:10.1080/07474939908800438>, Goldfeld and Quandt (1965) <doi:10.1080/01621459.1965.10480811>, Harrison and McCabe (1979) <doi:10.1080/01621459.1979.10482544>, Harvey (1976) <doi:10.2307/1913974>, Honda (1989) <doi:10.1111/j.2517-6161.1989.tb01749.x>, Horn (1981) <doi:10.1080/03610928108828074>, Li and Yao (2019) <doi:10.1016/j.ecosta.2018.01.001> with and without the modification of Bai, Pan, and Yin (2016) <doi:10.1007/s11749-017-0575-x>, Rackauskas and Zuokas (2007) <doi:10.1007/s10986-007-0018-6>, Simonoff and Tsai (1994) <doi:10.2307/2986026> with and without the modification of Ferrari, Cysneiros, and Cribari-Neto (2004) <doi:10.1016/S0378-3758(03)00210-6>, Szroeter (1978) <doi:10.2307/1913831>, Verbyla (1993) <doi:10.1111/j.2517-6161.1993.tb01918.x>, White (1980) <doi:10.2307/1912934>, Wilcox and Keselman (2006) <doi:10.1080/10629360500107923>, Yuce (2008) <https://dergipark.org.tr/en/pub/iuekois/issue/8989/112070>, and Zhou, Song, and Thompson (2015) <doi:10.1002/cjs.11252>. Besides these heteroskedasticity tests, there are supporting functions that compute the BLUS residuals of Theil (1965) <doi:10.1080/01621459.1965.10480851>, the conditional two-sided p-values of Kulinskaya (2008) <arXiv:0810.2124v1>, and probabilities for the nonparametric trend statistic of Lehmann (1975, ISBN: 0-816-24996-1). For handling heteroskedasticity, in addition to the new auxiliary variance model methods, there is a function to implement various existing Heteroskedasticity-Consistent Covariance Matrix Estimators from the literature, such as those of White (1980) <doi:10.2307/1912934>, MacKinnon and White (1985) <doi:10.1016/0304-4076(85)90158-7>, Cribari-Neto (2004) <doi:10.1016/S0167-9473(02)00366-3>, Cribari-Neto et al. (2007) <doi:10.1080/03610920601126589>, Cribari-Neto and da Silva (2011) <doi:10.1007/s10182-010-0141-2>, Aftab and Chang (2016) <doi:10.18187/pjsor.v12i2.983>, and Li et al. (2017) <doi:10.1080/00949655.2016.1198906>. |
Authors: | Thomas Farrar [aut, cre] |
Maintainer: | Thomas Farrar <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.0.2 |
Built: | 2025-03-03 04:20:52 UTC |
Source: | https://github.com/tjfarrar/skedastic |
Fits an Auxiliary Linear Variance Model (ALVM) to estimate the error variances of a heteroskedastic linear regression model.
alvm.fit( mainlm, M = NULL, model = c("cluster", "spline", "linear", "polynomial", "basic", "homoskedastic"), varselect = c("none", "hettest", "cv.linear", "cv.cluster", "qgcv.linear", "qgcv.cluster"), lambda = c("foldcv", "qgcv"), nclust = c("elbow.swd", "elbow.mwd", "elbow.both", "foldcv"), clustering = NULL, polypen = c("L2", "L1"), d = 2L, solver = c("auto", "quadprog", "quadprogXT", "roi", "osqp"), tsk = NULL, tsm = NULL, constol = 1e-10, cvoption = c("testsetols", "partitionres"), nfolds = 5L, reduce2homosked = TRUE, ... )
alvm.fit( mainlm, M = NULL, model = c("cluster", "spline", "linear", "polynomial", "basic", "homoskedastic"), varselect = c("none", "hettest", "cv.linear", "cv.cluster", "qgcv.linear", "qgcv.cluster"), lambda = c("foldcv", "qgcv"), nclust = c("elbow.swd", "elbow.mwd", "elbow.both", "foldcv"), clustering = NULL, polypen = c("L2", "L1"), d = 2L, solver = c("auto", "quadprog", "quadprogXT", "roi", "osqp"), tsk = NULL, tsm = NULL, constol = 1e-10, cvoption = c("testsetols", "partitionres"), nfolds = 5L, reduce2homosked = TRUE, ... )
mainlm |
Either an object of |
M |
An |
model |
A character corresponding to the type of ALVM to be fitted:
|
varselect |
Either a character indicating how variable selection should
be conducted, or an integer vector giving indices of columns of the
predictor matrix (
|
lambda |
Either a double of length 1 indicating the value of the
penalty hyperparameter |
nclust |
Either an integer of length 1 indicating the value of the
number of clusters |
clustering |
A list object of class |
polypen |
A character, either |
d |
An integer specifying the degree of polynomial to use in the
penalised polynomial ALVM; defaults to |
solver |
A character, indicating which Quadratic Programming solver
function to use to estimate |
tsk |
An integer corresponding to the basis dimension |
tsm |
An integer corresponding to the order |
constol |
A double corresponding to the boundary value for the
constraint on error variances. Of course, the error variances must be
non-negative, but setting the constraint boundary to 0 can result in
zero estimates that then result in infinite weights for Feasible
Weighted Least Squares. The boundary value should thus be positive, but
small enough not to bias estimation of very small variances. Defaults to
|
cvoption |
A character, either |
nfolds |
An integer specifying the number of folds |
reduce2homosked |
A logical indicating whether the homoskedastic
error variance estimator |
... |
Other arguments that can be passed to (non-exported) helper functions, namely:
|
The ALVM model equation is
,
where is the Ordinary Least Squares residual vector,
is
the annihilator matrix
,
is a linear
predictor matrix,
is a random error vector,
is a
-vector of unknown parameters, and
denotes the
Hadamard (elementwise) product. The construction of
depends on
the method used to model or estimate the assumed heteroskedastic
function
, a continuous, differentiable function that is
linear in
and by which the error variances
of the main linear model are related to the predictors
.
This method has been developed as part of the author's doctoral research
project.
Depending on the model used, the estimation method could be Inequality-Constrained Least Squares or Inequality-Constrained Ridge Regression. However, these are both special cases of Quadratic Programming. Therefore, all of the models are fitted using Quadratic Programming.
Several techniques are available for feature selection within the model.
The LASSO-type model handles feature selection via a shrinkage penalty.
For this reason, if the user calls the polynomial model with
-norm penalty, it is not necessary to specify a variable
selection method, since this is handled automatically. Another feature
selection technique is to use a heteroskedasticity test that tests for
heteroskedasticity linked to a particular predictor variable (the
‘deflator’). This test can be conducted with each features in turn
serving as the deflator. Those features for which the null hypothesis of
homoskedasticity is rejected at a specified significance level
alpha
are selected. A third feature selection technique is best
subset selection, where the model is fitted with all possible subsets of
features. The models are scored in terms of some metric, and the
best-performing subset of features is selected. The metric could be
squared-error loss computed under -fold cross-validation or using
quasi-generalised cross-validation. (The quasi- prefix refers to
the fact that generalised cross-validation is, properly speaking, only
applicable to a linear fitting method, as defined by
Hastie et al. (2009). ALVMs are not linear fitting
methods due to the inequality constraint). Since best subset selection
requires fitting
models (where
is the number of
candidate features), it is infeasible for large
. A greedy search
technique can therefore be used as an alternative, where one begins with
a null model and adds the feature that leads to the best improvement in
the metric, stopping when no new feature leads to an improvement.
The polynomial and thin-plate spline ALVMs have a penalty hyperparameter
that must either be specified or tuned.
-fold
cross-validation or quasi-generalised cross-validation can be used for
tuning. The clustering ALVM has a hyperparameter
, the number of
clusters into which to group the observations (where error variances
are assumed to be equal within each cluster).
can be specified
or tuned. The available tuning methods are an elbow method (using a
sum of within-cluster distances criterion, a maximum
within-cluster distance criterion, or a combination of the two) and
-fold cross-validation.
An object of class "alvm.fit"
, containing the following:
coef.est
, a vector of parameter estimates,
var.est
, a vector of estimates of the error
variances for all observations
method
, a character corresponding to the model
argument
ols
, the lm
object corresponding to the original linear
regression model
fitinfo
, a list containing four named objects: Msq
(the
elementwise-square of the annihilator matrix ),
L
(the
linear predictor matrix ),
clustering
(a list object
with results of the clustering procedure), and gam.object
, an
object of class "gam"
(see gamObject
). The
last two are set to NA
unless the clustering ALVM or thin-plate
spline ALVM is used, respectively
hyperpar
, a named list of hyperparameter values,
lambda
, nclust
, tsk
, and d
, and tuning
methods, lambdamethod
and nclustmethod
. Values
corresponding to unused hyperparameters are set to NA
.
selectinfo
, a list containing two named objects,
varselect
(the value of the eponymous argument), and
selectedcols
(a numeric vector with column indices of
that were selected, with
1
denoting the intercept column)
pentype
, a character corresponding to the polypen
argument
solver
, a character corresponding to the solver
argument (or specifying the QP solver actually used, if solver
was set to "auto"
)
constol
, a double corresponding to the constol
argument
Hastie T, Tibshirani R, Friedman JH (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition. Springer, New York.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myalvm <- alvm.fit(mtcars_lm, model = "cluster")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myalvm <- alvm.fit(mtcars_lm, model = "cluster")
Fits an Auxiliary Nonlinear Variance Model (ANLVM) to estimate the error variances of a heteroskedastic linear regression model.
anlvm.fit( mainlm, g, M = NULL, cluster = FALSE, varselect = c("none", "hettest", "cv.linear", "cv.cluster", "qgcv.linear", "qgcv.cluster"), nclust = c("elbow.swd", "elbow.mwd", "elbow.both"), clustering = NULL, param.init = function(q) stats::runif(n = q, min = -5, max = 5), maxgridrows = 20L, nconvstop = 3L, zerosallowed = FALSE, maxitql = 100L, tolql = 1e-08, nestedql = FALSE, reduce2homosked = TRUE, cvoption = c("testsetols", "partitionres"), nfolds = 5L, ... )
anlvm.fit( mainlm, g, M = NULL, cluster = FALSE, varselect = c("none", "hettest", "cv.linear", "cv.cluster", "qgcv.linear", "qgcv.cluster"), nclust = c("elbow.swd", "elbow.mwd", "elbow.both"), clustering = NULL, param.init = function(q) stats::runif(n = q, min = -5, max = 5), maxgridrows = 20L, nconvstop = 3L, zerosallowed = FALSE, maxitql = 100L, tolql = 1e-08, nestedql = FALSE, reduce2homosked = TRUE, cvoption = c("testsetols", "partitionres"), nfolds = 5L, ... )
mainlm |
Either an object of |
g |
A numeric-valued function of one variable, or a character denoting
the name of such a function. |
M |
An |
cluster |
A logical; should the design matrix X be replaced with an
|
varselect |
Either a character indicating how variable selection should
be conducted, or an integer vector giving indices of columns of the
predictor matrix (
|
nclust |
A character indicating which elbow method to use to select
the number of clusters (ignored if |
clustering |
A list object of class |
param.init |
Specifies the initial values of the parameter vector to
use in the Gauss-Newton fitting algorithm. This can either be a function
for generating the initial values from a probability distribution, a
list containing named objects corresponding to the arguments of
|
maxgridrows |
An integer indicating the maximum number of initial
values of the parameter vector to try, in case of |
nconvstop |
An integer indicating how many times the quasi-likelihood
estimation algorithm should converge before the grid search across
different initial parameter values is truncated. Defaults to |
zerosallowed |
A logical indicating whether 0 values are acceptable
in the initial values of the parameter vector. Defaults to |
maxitql |
An integer specifying the maximum number of iterations to
run in the Gauss-Newton algorithm for quasi-likelihood estimation.
Defaults to |
tolql |
A double specifying the convergence criterion for the
Gauss-Newton algorithm; defaults to |
nestedql |
A logical indicating whether to use the nested updating step
suggested in Seber and Wild (2003). Defaults to
|
reduce2homosked |
A logical indicating whether the homoskedastic
error variance estimator |
cvoption |
A character, either |
nfolds |
An integer specifying the number of folds |
... |
Other arguments that can be passed to (non-exported) helper functions, namely:
|
The ANLVM model equation is
,
where is the
th Ordinary Least Squares residual,
is a vector corresponding to the
th row of the
design matrix
,
is the
th element of the annihilator matrix
,
is a random error term,
is a
-vector of
unknown parameters, and
is a continuous, differentiable
function that need not be linear in
, but must be expressible
as a function of the linear predictor
.
This method has been developed as part of the author's doctoral research
project.
The parameter vector is estimated using the maximum
quasi-likelihood method as described in section 2.3 of
Seber and Wild (2003). The optimisation problem is
solved numerically using a Gauss-Newton algorithm.
For further discussion of feature selection and the methods for choosing the
number of clusters to use with the clustering version of the model, see
alvm.fit
.
An object of class "anlvm.fit"
, containing the following:
coef.est
, a vector of parameter estimates,
var.est
, a vector of estimates of the error
variances for all observations
method
, either "cluster"
or "functionalform"
,
depending on whether cluster
was set to TRUE
ols
, the lm
object corresponding to the original linear
regression model
fitinfo
, a list containing three named objects, g
(the
heteroskedastic function), Msq
(the elementwise-square of the
annihilator matrix ),
Z
(the design matrix used in the
ANLVM, after feature selection if applicable), and clustering
(a list object with results of the clustering procedure, if applicable).
selectinfo
, a list containing two named objects,
varselect
(the value of the eponymous argument), and
selectedcols
(a numeric vector with column indices of
that were selected, with
1
denoting the intercept column)
qlinfo
, a list containing nine named objects: converged
(a logical, indicating whether the Gauss-Newton algorithm converged
for at least one initial value of the parameter vector),
iterations
(the number of Gauss-Newton iterations used to
obtain the parameter estimates returned), Smin
(the minimum
achieved value of the objective function used in the Gauss-Newton
routine), and six arguments passed to the function (nested
,
param.init
, maxgridrows
, nconvstop
,
maxitql
, and tolql
)
Seber GAF, Wild CJ (2003). Nonlinear Regression. Wiley, Hoboken, NJ.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myanlvm <- anlvm.fit(mtcars_lm, g = function(x) x ^ 2, varselect = "qgcv.linear")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myanlvm <- anlvm.fit(mtcars_lm, g = function(x) x ^ 2, varselect = "qgcv.linear")
This function implements the method of Anscombe (1961) for testing for heteroskedasticity in a linear regression model, with or without the studentising modification of Bickel (1978).
anscombe(mainlm, studentise = TRUE, statonly = FALSE)
anscombe(mainlm, studentise = TRUE, statonly = FALSE)
mainlm |
Either an object of |
studentise |
A logical. Should studentising modification of
Bickel (1978) be implemented? Defaults to
|
statonly |
A logical. If |
Anscombe's Test is among the earliest suggestions for heteroskedasticity
diagnostics in the linear regression model. The test is not based on
formally derived theory but on a test statistic that Anscombe intuited
to be approximately standard normal under the null hypothesis of
homoskedasticity. Bickel (1978) discusses
the test and suggests a studentising modification (included in this
function) as well as a robustifying modification
(included in bickel
). The test is two-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Anscombe FJ (1961).
“Examination of Residuals.”
In Neyman J (ed.), Fourth Berkeley Symposium on Mathematical Statistics and Probability June 20-July 30, 1960, 1–36.
Berkeley: University of California Press.
Bickel PJ (1978).
“Using Residuals Robustly I: Tests for Heteroscedasticity, Nonlinearity.”
The Annals of Statistics, 6(2), 266–291.
bickel
, which is a robust extension of this test.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) anscombe(mtcars_lm)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) anscombe(mtcars_lm)
Uses bootstrap methods to compute approximate confidence intervals for error variances in a heteroskedastic linear regression model based on an auxiliary linear variance model (ALVM) or auxiliary nonlinear variance model (ANLVM).
avm.ci( object, bootobject = NULL, bootavmobject = NULL, jackobject = NULL, bootCImethod = c("pct", "bca", "stdnorm"), bootsampmethod = c("pairs", "wild"), Bextra = 500L, Brequired = 1000L, conf.level = 0.95, expand = TRUE, retune = FALSE, resfunc = c("identity", "hccme"), qtype = 6, rm_on_constraint = TRUE, rm_nonconverged = TRUE, jackknife_point = FALSE, ... )
avm.ci( object, bootobject = NULL, bootavmobject = NULL, jackobject = NULL, bootCImethod = c("pct", "bca", "stdnorm"), bootsampmethod = c("pairs", "wild"), Bextra = 500L, Brequired = 1000L, conf.level = 0.95, expand = TRUE, retune = FALSE, resfunc = c("identity", "hccme"), qtype = 6, rm_on_constraint = TRUE, rm_nonconverged = TRUE, jackknife_point = FALSE, ... )
object |
An object of class |
bootobject |
An object of class |
bootavmobject |
An object of class |
jackobject |
An object of class |
bootCImethod |
A character specifying the method to use when computing
the approximate bootstrap confidence interval. The default, |
bootsampmethod |
A character specifying the method to use for
generating nonparametric bootstrap linear regression models. Corresponds
to the |
Bextra |
An integer indicating the maximum number of additional
bootstrap models that should be fitted in an attempt to obtain
|
Brequired |
An integer indicating the number of bootstrap regression
models that should be used to compute the bootstrap confidence intervals.
The default behaviour is to base the interval estimates only on bootstrap
ALVM variance estimates that are not on the constraint boundary or on
bootstrap ANLVMs where the estimation algorithm converged. Consequently,
if this is not the case for all of the first |
conf.level |
A double representing the confidence level |
expand |
A logical specifying whether to implement the expansion
technique described in Hesterberg (2015).
Defaults to |
retune |
A logical specifying whether to re-tune hyperparameters and
re-select features each time an ALVM or (in the case of feature
selection) ANLVM is fit to a bootstrapped regression model. If
|
resfunc |
Either a character naming a function to call to apply a
transformation to the Ordinary Least Squares residuals, or a function
to apply for the same purpose. This argument is ignored if
|
qtype |
A numeric corresponding to the |
rm_on_constraint |
A logical specifying whether to exclude
bootstrapped ALVMs from the interval estimation method where the ALVM
parameter estimate falls on the constraint boundary. Defaults to
|
rm_nonconverged |
A logical specifying whether to exclude bootstrapped
ANLVMs from the interval estimation method where the optimisation
algorithm used in quasi-likelihood estimation of the ANLVM parameter did
not converge. Defaults to |
jackknife_point |
A logical specifying whether to replace the point
estimates of the error variances |
... |
Other arguments to pass to non-exported helper functions |
resampled versions of the original linear regression model
(which can be accessed using
object$ols
) are generated using a
nonparametric bootstrap method that is suitable for heteroskedastic
linear regression models, namely either the pairs bootstrap or the wild
bootstrap (bootstrapping residuals is not suitable). Depending on
the class of object
, either an ALVM or an ANLVM is fit to each of
the bootstrapped regression models. The distribution of the
bootstrap estimates of each error variance
,
, is used to construct an approximate confidence
interval for
. This is done using one of three methods.
The first is the percentile interval, which simply takes the empirical
and
quantiles of the
th bootstrap
variance estimates. The second is the Bias-Corrected and accelerated
(BCa) method as described in Efron and Tibshirani (1993),
which is intended to improve on the percentile interval (although
simulations have not found it to yield better coverage probabilities).
The third is the naive standard normal interval, which takes
, where
is the standard deviation of the
bootstrap estimates of
. By default, the expansion
technique described in Hesterberg (2015) is
also applied; evidence from simulations suggests that this does
improve coverage probabilities.
Technically, the hyperparameters of the ALVM, such as (for a
penalised polynomial or thin-plate spline model) or
(for a
clustering model) should be re-tuned every time the ALVM is fitted to
another bootstrapped regression model. However, due to the computational
cost, this is not done by
avm.ci
unless retune
is set to
TRUE
.
When obtained from ALVMs, bootstrap estimates of that fall on
the constraint boundary (i.e., are estimated to be near 0) are ignored
by default; there is an attempt to obtain
Brequired
bootstrap
estimates of each that do not fall on the constraint
boundary. This fine-tuning can be turned off by setting the
rm_onconstraint
argument to FALSE
; the amount of effort
put into obtaining non-boundary estimates is controlled using the
Bextra
argument. When ANLVMs are used, the default behaviour is to
try to obtain Brequired
bootstrap estimates of where
the Gauss-Newton algorithm applied for quasi-likelihood estimation
has converged, and ignore estimates obtained from non-convergent cases.
This behaviour can be toggled using the
rm_nonconverged
argument.
An object of class "avm.ci"
, containing the following:
climits
, an matrix with lower confidence
limits in the first column and upper confidence limits in the second
var.est
, a vector of length of point estimates
of the error variances. This is the same vector
passed within
object
, unless jackknife_point
is
TRUE
.
conf.level
, corresponding to the eponymous argument
bootCImethod
, corresponding to the eponymous argument
bootsampmethod
, corresponding to the eponymous argument or
otherwise extracted from bootobject
Efron B, Tibshirani RJ (1993).
An Introduction to the Bootstrap.
Springer Science+Business Media, Dordrecht.
Hesterberg T (2015).
“What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum.”
alvm.fit
, anlvm.fit
,
Efron and Tibshirani (1993)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myalvm <- alvm.fit(mtcars_lm, model = "cluster") # Brequired would of course not be so small in practice ci.alvm <- avm.ci(myalvm, Brequired = 5)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myalvm <- alvm.fit(mtcars_lm, model = "cluster") # Brequired would of course not be so small in practice ci.alvm <- avm.ci(myalvm, Brequired = 5)
This function applies feasible weighted least squares (FWLS) to a
linear regression model using error variance estimates obtained
from an auxiliary linear variance model fit using alvm.fit
or from an auxiliary nonlinear variance model fit using
anlvm.fit
.
avm.fwls(object, fastfit = FALSE)
avm.fwls(object, fastfit = FALSE)
object |
Either an object of class |
fastfit |
A logical. If |
The function simply calculates
,
where is the design matrix,
is the response vector, and
is the diagonal variance-covariance matrix of the
random errors, whose diagonal elements have been estimated by an
auxiliary variance model.
Either an object of class
"lm"
(if fastfit
is FALSE
) or otherwise a generic
list object
There are no references for Rd macro \insertAllCites
on this help page.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myalvm <- alvm.fit(mainlm = mtcars_lm, model = "linear", varselect = "qgcv.linear") myfwls <- avm.fwls(myalvm) cbind(coef(mtcars_lm), coef(myfwls))
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myalvm <- alvm.fit(mainlm = mtcars_lm, model = "linear", varselect = "qgcv.linear") myfwls <- avm.fwls(myalvm) cbind(coef(mtcars_lm), coef(myfwls))
The function simply calculates
,
where is the design matrix of a linear regression model and
is an estimate of the diagonal variance-covariance
matrix of the random errors, whose diagonal elements have been
obtained from an auxiliary variance model fit with
alvm.fit
or anlvm.fit
.
avm.vcov(object, as_matrix = TRUE)
avm.vcov(object, as_matrix = TRUE)
object |
Either an object of class |
as_matrix |
A logical. If |
Either a numeric matrix or a numeric vector, whose (diagonal)
elements are ,
.
There are no references for Rd macro \insertAllCites
on this help page.
alvm.fit
, anlvm.fit
,
avm.fwls
. If a matrix is returned, it can be
passed to coeftest
for implementation
of a quasi--test of significance of the
coefficients.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myalvm <- alvm.fit(mainlm = mtcars_lm, model = "linear", varselect = "qgcv.linear") myvcov <- avm.vcov(myalvm) lmtest::coeftest(mtcars_lm, vcov. = myvcov)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) myalvm <- alvm.fit(mainlm = mtcars_lm, model = "linear", varselect = "qgcv.linear") myvcov <- avm.vcov(myalvm) lmtest::coeftest(mtcars_lm, vcov. = myvcov)
This function implements the Bartlett's Specification Error Test
(BAMSET) method of Ramsey (1969) for testing
for heteroskedasticity in a linear regression model.
bamset( mainlm, k = 3, deflator = NA, correct = TRUE, omitatmargins = TRUE, omit = NA, categorical = FALSE, statonly = FALSE )
bamset( mainlm, k = 3, deflator = NA, correct = TRUE, omitatmargins = TRUE, omit = NA, categorical = FALSE, statonly = FALSE )
mainlm |
Either an object of |
k |
An integer. The number of subsets (>= 2) into which the BLUS residuals are to be partitioned. Defaults to 3, the value suggested in Ramsey (1969). |
deflator |
Either a character specifying a column name from the
design matrix of |
correct |
A logical. Should the test statistic be divided by a scaling
constant to improve the chi-squared approximation? Defaults to
|
omitatmargins |
A logical. Should the indices of observations at the
margins of the |
omit |
A numeric vector of length |
categorical |
A logical. Is the deflator a categorical variable? If
so, the number of levels will be used as |
statonly |
A logical. If |
BAMSET is an analogue of Bartlett's Test for heterogeneity
of variances across independent samples from
populations. In this
case the populations are
subsets of the residuals from a linear
regression model. In order to meet the independence assumption,
BLUS residuals are computed, meaning that only
observations are used (where
is the number of rows and
the number of columns in the design matrix). Under the null hypothesis
of homoskedasticity, the test statistic is asymptotically chi-squared
distributed with
degrees of freedom. The test is right-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Ramsey JB (1969). “Tests for Specification Errors in Classical Linear Least-Squares Regression Analysis.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 31(2), 350–371.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) bamset(mtcars_lm, deflator = "qsec", k = 3) # BLUS residuals cannot be computed with given `omit` argument and so # omitted indices are randomised: bamset(mtcars_lm, deflator = "qsec", k = 4, omitatmargins = FALSE, omit = "last")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) bamset(mtcars_lm, deflator = "qsec", k = 3) # BLUS residuals cannot be computed with given `omit` argument and so # omitted indices are randomised: bamset(mtcars_lm, deflator = "qsec", k = 4, omitatmargins = FALSE, omit = "last")
This function implements the method of Bickel (1978) for testing for heteroskedasticity in a linear regression model, with or without the scale-invariance modification of Carroll and Ruppert (1981).
bickel( mainlm, fitmethod = c("lm", "rlm"), a = "identity", b = c("hubersq", "tanhsq"), scale_invariant = TRUE, k = 1.345, statonly = FALSE, ... )
bickel( mainlm, fitmethod = c("lm", "rlm"), a = "identity", b = c("hubersq", "tanhsq"), scale_invariant = TRUE, k = 1.345, statonly = FALSE, ... )
mainlm |
Either an object of |
fitmethod |
A character indicating the method to be used to fit the
regression model. This can be "OLS" for ordinary least squares (the
default) or "robust" in which case a robust fitting method is called
from |
a |
A character argument specifying the name of a function to be
applied to the fitted values, or an integer |
b |
A character argument specifying a function to be applied to the
residuals. Defaults to Huber's function squared, as recommended by
Carroll and Ruppert (1981). Currently the only supported
functions are |
scale_invariant |
A logical indicating whether the scale-invariance
modification proposed by Carroll and Ruppert (1981)
should be implemented. Defaults to |
k |
A double argument specifying a parameter for Huber's function
squared; used only if |
statonly |
A logical. If |
... |
Optional arguments to be passed to |
Bickel's Test is a robust extension of Anscombe's Test
(Anscombe 1961) in which the OLS residuals and
estimated standard error are replaced with an estimator. Under
the null hypothesis of homoskedasticity, the distribution of the test
statistic is asymptotically standard normally distributed. The test is
two-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Anscombe FJ (1961).
“Examination of Residuals.”
In Neyman J (ed.), Fourth Berkeley Symposium on Mathematical Statistics and Probability June 20-July 30, 1960, 1–36.
Berkeley: University of California Press.
Bickel PJ (1978).
“Using Residuals Robustly I: Tests for Heteroscedasticity, Nonlinearity.”
The Annals of Statistics, 6(2), 266–291.
Carroll RJ, Ruppert D (1981).
“On Robust Tests for Heteroscedasticity.”
The Annals of Statistics, 9(1), 206–210.
discussions of this test in Carroll and Ruppert (1981) and Ali and Giaccotto (1984).
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) bickel(mtcars_lm) bickel(mtcars_lm, fitmethod = "rlm") bickel(mtcars_lm, scale_invariant = FALSE)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) bickel(mtcars_lm) bickel(mtcars_lm, fitmethod = "rlm") bickel(mtcars_lm, scale_invariant = FALSE)
This function computes the Best Linear Unbiased Scalar-Covariance (BLUS) residuals from a linear model, as defined in Theil (1965) and explained further in Theil (1968).
blus( mainlm, omit = c("first", "last", "random"), keepNA = TRUE, exhaust = NA, seed = 1234 )
blus( mainlm, omit = c("first", "last", "random"), keepNA = TRUE, exhaust = NA, seed = 1234 )
mainlm |
Either an object of |
omit |
A numeric vector of length |
keepNA |
A logical. Should BLUS residuals for omitted observations be
returned as |
exhaust |
An integer. If singular matrices are encountered
using the passed value of |
seed |
An integer specifying a seed to pass to
|
Under the ideal linear model conditions, the BLUS residuals have a
scalar covariance matrix (meaning they have a constant
variance and are mutually uncorrelated), unlike the OLS residuals, which
have covariance matrix
where
is a function of
the design matrix. Use of BLUS residuals could improve the performance of
tests for heteroskedasticity and/or autocorrelation in the linear model.
A linear model with
observations and an
design
matrix yields only
BLUS residuals. The choice of which
observations will not be represented in the BLUS residuals is specified
within the algorithm.
A double vector of length containing the BLUS residuals
(with
NA_real_
) for omitted observations), or a double vector
of length containing the BLUS residuals only (if
keepNA
is set to FALSE
)
Theil H (1965).
“The Analysis of Disturbances in Regression Analysis.”
Journal of the American Statistical Association, 60(312), 1067–1079.
Theil H (1968).
“A Simplification of the BLUS Procedure for Analyzing Regression Disturbances.”
Journal of the American Statistical Association, 63(321), 242–251.
H. D. Vinod's online article, Theil's BLUS Residuals and R Tools for Testing and Removing Autocorrelation and Heteroscedasticity, for an alternative function for computing BLUS residuals.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) blus(mtcars_lm) plot(mtcars_lm$residuals, blus(mtcars_lm)) # Same as first example mtcars_list <- list("y" = mtcars$mpg, "X" = cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)) blus(mtcars_list) # Again same as first example mtcars_list2 <- list("e" = mtcars_lm$residuals, "X" = cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)) blus(mtcars_list2) # BLUS residuals cannot be computed with `omit = "last"` in this example, so # omitted indices are randomised: blus(mtcars_lm, omit = "last")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) blus(mtcars_lm) plot(mtcars_lm$residuals, blus(mtcars_lm)) # Same as first example mtcars_list <- list("y" = mtcars$mpg, "X" = cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)) blus(mtcars_list) # Again same as first example mtcars_list2 <- list("e" = mtcars_lm$residuals, "X" = cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)) blus(mtcars_list2) # BLUS residuals cannot be computed with `omit = "last"` in this example, so # omitted indices are randomised: blus(mtcars_lm, omit = "last")
Generates nonparametric bootstrap replications of a linear
regression model that may have heteroskedasticity.
bootlm( object, sampmethod = c("pairs", "wild"), B = 1000L, resfunc = c("identity", "hccme"), fastfit = TRUE, ... )
bootlm( object, sampmethod = c("pairs", "wild"), B = 1000L, resfunc = c("identity", "hccme"), fastfit = TRUE, ... )
object |
Either an object of class |
sampmethod |
A character, either |
B |
An integer representing the number of bootstrapped linear
regression models to generate; defaults to |
resfunc |
Either a character naming a function to call to apply a
transformation to the Ordinary Least Squares residuals, or a function
to apply for the same purpose. This argument is ignored if
|
fastfit |
A logical indicating whether the |
... |
Other arguments to pass to |
Each replication of the pairs bootstrap entails drawing a sample
of size (the number of observations) with replacement from the
indices
. The pair or case
is
included as an observation in the bootstrapped data set for each sampled
index. An Ordinary Least Squares fit to the bootstrapped data set is then
computed.
Under the wild bootstrap, each replication of the linear regression model
is generated by first independently drawing random values
,
, from a distribution with zero mean and
unit variance. The
th bootstrap response is then computed as
, where is the
th
design observation,
is the Ordinary Least Squares
estimate of the coefficient vector
,
is the
th Ordinary Least Squares residual, and
is a
function performing some transformation on the residual. An Ordinary
Least Squares fit is then computed on the original design matrix and the
bootstrap response vector.
A list object of class "bootlm"
, containing B
objects,
each of which is a bootstrapped linear regression model fit by Ordinary
Least Squares. If fastfit
was set to TRUE
, each of these
objects will be a list containing named objects y
(the bootstrap
response vector), X
(the bootstrap design matrix, which is just
the original design matrix under the wild bootstrap), e
(the
residual vector from the Ordinary Least Squares fit to this bootstrap
data set), beta.hat
(the vector of coefficient estimates from the
Ordinary Least Squares fit to this bootstrap data set),
sampmethod
, and ind
(a vector of the indices from the
original data set used in this bootstrap sample; ignored under the
wild bootstrap)
of the kind returned by
lm.fit
; otherwise, each will be an object of class
"lm"
.
Davidson R, Flachaire E (2008).
“The Wild Bootstrap, Tamed at Last.”
Journal of Econometrics, 146, 162–169.
Efron B, Tibshirani RJ (1993).
An Introduction to the Bootstrap.
Springer Science+Business Media, Dordrecht.
paired.boot
and
wild.boot
for the pairs bootstrap and wild
bootstrap, respectively. The latter function does not appear to allow
transformations of the residuals in the wild bootstrap.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) mybootlm <- bootlm(mtcars_lm)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) mybootlm <- bootlm(mtcars_lm)
This function implements the popular method of Breusch and Pagan (1979) for testing for heteroskedasticity in a linear regression model, with or without the studentising modification of Koenker (1981).
breusch_pagan(mainlm, auxdesign = NA, koenker = TRUE, statonly = FALSE)
breusch_pagan(mainlm, auxdesign = NA, koenker = TRUE, statonly = FALSE)
mainlm |
Either an object of |
auxdesign |
A |
koenker |
A logical. Should studentising modification of
Koenker (1981) be implemented? Defaults to
|
statonly |
A logical. If |
The Breusch-Pagan Test entails fitting an auxiliary regression
model in which the response variable is the vector of squared residuals
from the original model and the design matrix consists of one or
more exogenous variables that are suspected of being related to the error
variance. In the absence of prior information on a possible choice of
, one would typically use the explanatory variables from the
original model. Under the null hypothesis of homoskedasticity, the
distribution of the test statistic is asymptotically chi-squared with
parameter
degrees of freedom. The test is right-tailed.
An object of class
"htest"
. If object
is not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Breusch TS, Pagan AR (1979).
“A Simple Test for Heteroscedasticity and Random Coefficient Variation.”
Econometrica, 47(5), 1287–1294.
Koenker R (1981).
“A Note on Studentizing a Test for Heteroscedasticity.”
Journal of Econometrics, 17, 107–112.
lmtest::bptest
, which performs exactly
the same test as this function; car::ncvTest
,
which is not the same test but is implemented in
cook_weisberg
; white
, which is a special
case of the Breusch-Pagan Test.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) breusch_pagan(mtcars_lm) breusch_pagan(mtcars_lm, koenker = FALSE) # Same as first example mtcars_list <- list("y" = mtcars$mpg, "X" = cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)) breusch_pagan(mtcars_list)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) breusch_pagan(mtcars_lm) breusch_pagan(mtcars_lm, koenker = FALSE) # Same as first example mtcars_list <- list("y" = mtcars$mpg, "X" = cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)) breusch_pagan(mtcars_list)
This function implements the two methods (parametric and nonparametric) of Carapeto and Holt (2003) for testing for heteroskedasticity in a linear regression model.
carapeto_holt( mainlm, deflator = NA, prop_central = 1/3, group1prop = 1/2, qfmethod = "imhof", alternative = c("greater", "less", "two.sided"), twosidedmethod = c("doubled", "kulinskaya"), statonly = FALSE )
carapeto_holt( mainlm, deflator = NA, prop_central = 1/3, group1prop = 1/2, qfmethod = "imhof", alternative = c("greater", "less", "two.sided"), twosidedmethod = c("doubled", "kulinskaya"), statonly = FALSE )
mainlm |
Either an object of |
deflator |
Either a character specifying a column name from the
design matrix of |
prop_central |
A double specifying the proportion of central
observations to exclude when comparing the two subsets of observations.
|
group1prop |
A double specifying the proportion of remaining
observations (after excluding central observations) to allocate
to the first group. The default value of |
qfmethod |
A character, either |
alternative |
A character specifying the form of alternative
hypothesis. If it is suspected that the error variance is positively
associated with the deflator variable, |
twosidedmethod |
A character indicating the method to be used to compute
two-sided |
statonly |
A logical. If |
The test is based on the methodology of
Goldfeld and Quandt (1965) but does not require any
auxiliary regression. It entails ordering the observations by some
suspected deflator (one of the explanatory variables) in such a way
that, under the alternative hypothesis, the observations would now
be arranged in decreasing order of error variance. A specified proportion
of the most central observations (under this ordering) is removed,
leaving a subset of lower observations and a subset of upper
observations. The test statistic is then computed as a ratio of quadratic
forms corresponding to the sums of squared residuals of the upper and
lower observations respectively. -values are computed by the
Imhof algorithm in
pRQF
.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Carapeto M, Holt W (2003).
“Testing for Heteroscedasticity in Regression Models.”
Journal of Applied Statistics, 30(1), 13–20.
Goldfeld SM, Quandt RE (1965).
“Some Tests for Homoscedasticity.”
Journal of the American Statistical Association, 60(310), 539–547.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) carapeto_holt(mtcars_lm, deflator = "qsec", prop_central = 0.25) # Same as previous example mtcars_list <- list("y" = mtcars$mpg, "X" = cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)) carapeto_holt(mtcars_list, deflator = 3, prop_central = 0.25)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) carapeto_holt(mtcars_lm, deflator = "qsec", prop_central = 0.25) # Same as previous example mtcars_list <- list("y" = mtcars$mpg, "X" = cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)) carapeto_holt(mtcars_list, deflator = 3, prop_central = 0.25)
This function implements the score test of Cook and Weisberg (1983) for testing for heteroskedasticity in a linear regression model.
cook_weisberg( mainlm, auxdesign = NA, hetfun = c("mult", "add", "logmult"), statonly = FALSE )
cook_weisberg( mainlm, auxdesign = NA, hetfun = c("mult", "add", "logmult"), statonly = FALSE )
mainlm |
Either an object of |
auxdesign |
A |
hetfun |
A character describing the form of |
statonly |
A logical. If |
The Cook-Weisberg Score Test entails fitting an auxiliary
regression model in which the response variable is the vector of
standardised squared residuals from the original
OLS model and the design matrix is some function of
, an
matrix consisting of
exogenous variables,
appended to a column of ones. The test statistic is half the residual sum
of squares from this auxiliary regression. Under the null hypothesis of
homoskedasticity, the distribution of the test statistic is
asymptotically chi-squared with
degrees of freedom. The test is
right-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Cook RD, Weisberg S (1983).
“Diagnostics for Heteroscedasticity in Regression.”
Biometrika, 70(1), 1–10.
Griffiths WE, Surekha K (1986).
“A Monte Carlo Evaluation of the Power of Some Tests for Heteroscedasticity.”
Journal of Econometrics, 31(1), 219–231.
car::ncvTest
, which implements the same
test. Calling car::ncvTest
with var.formula
argument omitted
is equivalent to calling skedastic::cook_weisberg
with
auxdesign = "fitted.values", hetfun = "additive"
. Calling
car::ncvTest
with var.formula = ~ X
(where X
is the
design matrix of the linear model with the intercept column omitted) is
equivalent to calling skedastic::cook_weisberg
with default
auxdesign
and hetfun
values. The
hetfun = "multiplicative"
option has no equivalent in
car::ncvTest
.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) cook_weisberg(mtcars_lm) cook_weisberg(mtcars_lm, auxdesign = "fitted.values", hetfun = "logmult")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) cook_weisberg(mtcars_lm) cook_weisberg(mtcars_lm, auxdesign = "fitted.values", hetfun = "logmult")
This function computes the number of peaks in a double vector, with
peak defined as per Goldfeld and Quandt (1965). The function
is used in the Goldfeld-Quandt nonparametric test for heteroskedasticity in
a linear model. NA
and NaN
values in the sequence are ignored.
countpeaks(x)
countpeaks(x)
x |
A double vector. |
An integer value between 0
and length(x) - 1
representing the number of peaks in the sequence.
Goldfeld SM, Quandt RE (1965). “Some Tests for Homoscedasticity.” Journal of the American Statistical Association, 60(310), 539–547.
set.seed(9586) countpeaks(stats::rnorm(20)) mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) countpeaks(mtcars_lm$residuals)
set.seed(9586) countpeaks(stats::rnorm(20)) mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) countpeaks(mtcars_lm$residuals)
This function computes , i.e. the probability mass
function for
, the nonparametric trend
statistic proposed by Lehmann (1975), under the
assumption that the ranks
are computed on a series of
independent and identically distributed random variables with no ties.
dDtrend(k = "all", n, override = FALSE)
dDtrend(k = "all", n, override = FALSE)
k |
An integer of |
n |
A positive integer representing the number of observations in the
series. Note that computation time increases rapidly with |
override |
A logical. By default, the function aborts if |
The function is used within horn
in computing
-values for Horn's nonparametric test for heteroskedasticity in a
linear regression model (Horn 1981). The support of
consists of consecutive even numbers from 0 to
, with the exception of the case
,
when the value 4 is excluded from the support. Note that computation speed
for
k = "all"
is about the same as when k
is set to an
individual integer value, because the entire distribution is still
computed in the latter case.
A double vector containing the probabilities corresponding to the
integers in its names
attribute.
Horn P (1981).
“Heteroscedasticity of Residuals: A Non-Parametric Alternative to the Goldfeld-Quandt Peak Test.”
Communications in Statistics - Theory and Methods, 10(8), 795–808.
Lehmann EL (1975).
Nonparametrics: Statistical Methods Based on Ranks.
Holden-Day, San Francisco.
prob <- dDtrend(k = "all", n = 6) values <- as.integer(names(prob)) plot(c(values[1], values[1]), c(0, prob[1]), type = "l", axes = FALSE, xlab = expression(k), ylab = expression(Pr(D == k)), xlim = c(0, 70), yaxs = "i", ylim = c(0, 1.05 * max(prob))) axis(side = 1, at = seq(0, 70, 10), las = 2) for (i in seq_along(values)) { lines(c(values[i], values[i]), c(0, prob[i])) }
prob <- dDtrend(k = "all", n = 6) values <- as.integer(names(prob)) plot(c(values[1], values[1]), c(0, prob[1]), type = "l", axes = FALSE, xlab = expression(k), ylab = expression(Pr(D == k)), xlim = c(0, 70), yaxs = "i", ylim = c(0, 1.05 * max(prob))) axis(side = 1, at = seq(0, 70, 10), las = 2) for (i in seq_along(values)) { lines(c(values[i], values[i]), c(0, prob[i])) }
This function implements the nonparametric test of Diblasi and Bowman (1997) for testing for heteroskedasticity in a linear regression model.
diblasi_bowman( mainlm, distmethod = c("moment.match", "bootstrap"), H = 0.08, ignorecov = TRUE, B = 500L, seed = 1234, statonly = FALSE )
diblasi_bowman( mainlm, distmethod = c("moment.match", "bootstrap"), H = 0.08, ignorecov = TRUE, B = 500L, seed = 1234, statonly = FALSE )
mainlm |
Either an object of |
distmethod |
A character specifying the method by which to estimate
the |
H |
A hyperparameter denoting the bandwidth matrix in the kernel
function used for weights in nonparametric smoothing. If a double of
length 1 (the default), |
ignorecov |
A logical. If |
B |
An integer specifying the number of nonparametric bootstrap
replications to be used, if |
seed |
An integer specifying a seed to pass to
|
statonly |
A logical. If |
The test entails undertaking a transformation of the OLS residuals
, where
denotes
expectation under the null hypothesis of homoskedasticity. The kernel
method of nonparametric regression is used to fit the relationship
between these
and the explanatory variables. This leads to a
test statistic
that is a ratio of quadratic forms involving the
vector of
and the matrix of normal kernel weights. Although
nonparametric in its method of fitting the possible heteroskedastic
relationship, the distributional approximation used to compute
-values assumes normality of the errors.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Diblasi A, Bowman A (1997). “Testing for Constant Variance in a Linear Model.” Statistics & Probability Letters, 33, 95–103.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) diblasi_bowman(mtcars_lm) diblasi_bowman(mtcars_lm, ignorecov = FALSE) diblasi_bowman(mtcars_lm, distmethod = "bootstrap")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) diblasi_bowman(mtcars_lm) diblasi_bowman(mtcars_lm, ignorecov = FALSE) diblasi_bowman(mtcars_lm, distmethod = "bootstrap")
This function computes as defined by
Goldfeld and Quandt (1965), i.e. the probability that a
sequence of
independent and identically distributed random variables
contains exactly
peaks, with peaks also as defined by
Goldfeld and Quandt (1965). The function is used in
ppeak
to compute -values for the Goldfeld-Quandt
nonparametric test for heteroskedasticity in a linear model.
dpeak(k, n, usedata = FALSE)
dpeak(k, n, usedata = FALSE)
k |
An integer or a sequence of integers strictly incrementing by 1,
with all values between 0 and |
n |
A positive integer representing the number of observations in the sequence. |
usedata |
A logical. Should probability mass function values be
read from |
A double between 0 and 1 representing the probability of exactly
k peaks occurring in a sequence of independent and identically
distributed continuous random variables. The double has a
names
attribute with the values corresponding to the probabilities.
Computation time is very slow for
(if
usedata
is FALSE
) and for
regardless of
usedata
value.
Goldfeld SM, Quandt RE (1965). “Some Tests for Homoscedasticity.” Journal of the American Statistical Association, 60(310), 539–547.
dpeak(0:9, 10) plot(0:9, dpeak(0:9, 10), type = "p", pch = 20, xlab = "Number of Peaks", ylab = "Probability") # This would be extremely slow if usedata were set to FALSE: prob <- dpeak(0:199, 200, usedata = TRUE) plot(0:199, prob, type = "l", xlab = "Number of Peaks", ylab = "Probability") # `dpeakdat` is a dataset containing probabilities generated from `dpeak` utils::data(dpeakdat) expval <- unlist(lapply(dpeakdat, function(p) sum(p * 0:(length(p) - 1)))) plot(1:1000, expval[1:1000], type = "l", xlab = parse(text = "n"), ylab = "Expected Number of Peaks")
dpeak(0:9, 10) plot(0:9, dpeak(0:9, 10), type = "p", pch = 20, xlab = "Number of Peaks", ylab = "Probability") # This would be extremely slow if usedata were set to FALSE: prob <- dpeak(0:199, 200, usedata = TRUE) plot(0:199, prob, type = "l", xlab = "Number of Peaks", ylab = "Probability") # `dpeakdat` is a dataset containing probabilities generated from `dpeak` utils::data(dpeakdat) expval <- unlist(lapply(dpeakdat, function(p) sum(p * 0:(length(p) - 1)))) plot(1:1000, expval[1:1000], type = "l", xlab = parse(text = "n"), ylab = "Expected Number of Peaks")
A dataset containing the probability mass function for the distribution of the number of peaks in a continuous, uncorrelated stochastic series.
dpeakdat
dpeakdat
A list of 1000 objects. The th object is a double vector
of length
, with elements representing the probability of
peaks for
.
These probabilities were generated from the dpeak
function. This function is computationally very slow for ;
thus the functions of
skedastic
package that require peak
probabilities (ppeak
and goldfeld_quandt
)
by default obtain the probabilities from this data set rather than
from dpeak
, provided that . For
, good luck!
This function implements the method of Dufour et al. (2004) for testing for heteroskedasticity in a linear regression model.
dufour_etal( mainlm, hettest, R = 1000L, alternative = c("greater", "less", "two.sided"), errorgen = stats::rnorm, errorparam = list(), seed = 1234, ... )
dufour_etal( mainlm, hettest, R = 1000L, alternative = c("greater", "less", "two.sided"), errorgen = stats::rnorm, errorparam = list(), seed = 1234, ... )
mainlm |
Either an object of |
hettest |
A character specifying the name of a function
that implements a heteroskedasticity test on a linear regression model.
The function is called with the |
R |
An integer specifying the number of Monte Carlo replicates to
generate. Defaults to |
alternative |
The tailedness of the test whose statistic is computed by
|
errorgen |
A function, or a character specifying the name of a
function, from which the random errors are to be generated. The function
should correspond to a continuous probability distribution that has (or
at least can have) a mean of 0. Defaults to |
errorparam |
An optional list of parameters to pass to |
seed |
An integer specifying a seed to pass to
|
... |
Additional arguments to pass to function with name |
The test implements a Monte Carlo procedure as follows. (1) The
observed value of the test statistic is computed using function
with name
hettest
. (2) R
replications of the random error
vector are generated from the distribution specified using
errorgen
. (3) R
replications of the test statistic,
, are computed from the generated error vectors.
(4) The empirical
p
-value is computed as
, where
,
being the indicator function. The test is right-tailed, regardless of the
tailedness of
hettest
. Note that the heteroskedasticity
test implemented by hettest
must have a test statistic that is
continuous and that is invariant with respect to nuisance parameters
( and
). Note further that if
hettest
is goldfeld_quandt
with method
argument
"parametric"
, the replicated Goldfeld-Quandt statistics
are computed directly within this function rather than by calling
goldfeld_quandt
, due to some idiosyncratic features of this test.
Note that, if alternative
is set to "two.sided"
, the
one-sided -value is doubled (
twosidedpval
cannot
be used in this case).
An object of class
"htest"
. If object
is not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Dufour J, Khalaf L, Bernard J, Genest I (2004). “Simulation-Based Finite-Sample Tests for Heteroskedasticity and ARCH Effects.” Journal of Econometrics, 122, 317–347.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) dufour_etal(mtcars_lm, hettest = "breusch_pagan")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) dufour_etal(mtcars_lm, hettest = "breusch_pagan")
This function implements the two methods of Evans and King (1988) for testing for heteroskedasticity in a linear regression model.
evans_king( mainlm, method = c("GLS", "LM"), deflator = NA, lambda_star = 5, qfmethod = "imhof", statonly = FALSE )
evans_king( mainlm, method = c("GLS", "LM"), deflator = NA, lambda_star = 5, qfmethod = "imhof", statonly = FALSE )
mainlm |
Either an object of |
method |
A character indicating which of the two tests derived in
Evans and King (1988) should be implemented.
Possible values are |
deflator |
Either a character specifying a column name from the
design matrix of |
lambda_star |
A double; coefficient representing the degree of
heteroskedasticity under the alternative hypothesis.
Evans and King (1985) suggests 2.5, 5, 7.5, and 10 as
values to consider, and Evans and King (1988) finds
that 2.5 and 5 perform best empirically. This parameter is used only for
the |
qfmethod |
A character, either |
statonly |
A logical. If |
The test entails putting the data rows in increasing order of
some specified deflator (e.g., one of the explanatory variables) that
is believed to be related to the error variance by some non-decreasing
function. There are two statistics that can be used, corresponding to
the two values of the method
argument. In both cases the test
statistic can be expressed as a ratio of quadratic forms in the errors,
and thus the Imhof algorithm is used to compute -values. Both
methods involve a left-tailed test.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Evans M, King ML (1985).
“A Point Optimal Test for Heteroscedastic Disturbances.”
Journal of Econometrics, 27(2), 163–178.
Evans MA, King ML (1988).
“A Further Class of Tests for Heteroscedasticity.”
Journal of Econometrics, 37, 265–276.
Evans and King (1985), which already anticipates one of the tests.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) evans_king(mtcars_lm, deflator = "qsec", method = "GLS") evans_king(mtcars_lm, deflator = "qsec", method = "LM")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) evans_king(mtcars_lm, deflator = "qsec", method = "GLS") evans_king(mtcars_lm, deflator = "qsec", method = "LM")
This function implements the method of Glejser (1969) for testing for "multiplicative" heteroskedasticity in a linear regression model. Mittelhammer et al. (2000) gives the formulation of the test used here.
glejser( mainlm, auxdesign = NA, sigmaest = c("main", "auxiliary"), statonly = FALSE )
glejser( mainlm, auxdesign = NA, sigmaest = c("main", "auxiliary"), statonly = FALSE )
mainlm |
Either an object of |
auxdesign |
A |
sigmaest |
A character indicating which model residuals to use in the
|
statonly |
A logical. If |
Glejser's Test entails fitting an auxiliary regression model in
which the response variable is the absolute residual from the original
model and the design matrix consists of one or more exogenous
variables that are suspected of being related to the error variance.
In the absence of prior information on a possible choice of
,
one would typically use the explanatory variables from the original model.
Under the null hypothesis of homoskedasticity, the distribution of the
test statistic is asymptotically chi-squared with
parameter
degrees
of freedom. The test is right-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Glejser H (1969).
“A New Test for Heteroskedasticity.”
Journal of the American Statistical Association, 64(325), 316–323.
Mittelhammer RC, Judge GG, Miller DJ (2000).
Econometric Foundations.
Cambridge University Press, Cambridge.
the description of the test in SHAZAM software (which produces identical results).
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) glejser(mtcars_lm)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) glejser(mtcars_lm)
This function implements the method of Godfrey and Orme (1999) for testing for heteroskedasticity in a linear regression model. The procedure is more clearly described in Godfrey et al. (2006).
godfrey_orme( mainlm, hettest, B = 1000L, alternative = c("greater", "less", "two.sided"), seed = 1234, ... )
godfrey_orme( mainlm, hettest, B = 1000L, alternative = c("greater", "less", "two.sided"), seed = 1234, ... )
mainlm |
Either an object of |
hettest |
A character specifying the name of a function
that implements a heteroskedasticity test on a linear regression model.
The function is called with the |
B |
An integer specifying the number of nonparametric bootstrap samples
to generate. Defaults to |
alternative |
The tailedness of the test whose statistic is computed by
|
seed |
An integer specifying a seed to pass to
|
... |
Additional arguments to pass to function with name |
The procedure runs as follows. (1) The observed
value of the test statistic is computed using function with
name
hettest
. (2) A sample
is drawn with replacement from
the OLS residuals. (3) Bootstrapped response values are computed as
.
(4) Bootstrapped test statistic value
is computed from the
regression of
on
using function
hettest
.
(5) Steps (2)-(4) are repeated until bootstrapped test statistic
values are computed. (6) Empirical
-value is computed by comparing
the bootstrapped test statistic values to the observed test statistic
value. Note that, if
alternative
is set to "two.sided"
, the
one-sided -value is doubled (
twosidedpval
cannot
be used in this case).
An object of class
"htest"
. If object
is not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Godfrey LG, Orme CD (1999).
“The Robustness, Reliability and Power of Heteroskedasticity Tests.”
Econometric Reviews, 18(2), 169–194.
Godfrey LG, Orme CD, Silva JS (2006).
“Simulation-Based Tests for Heteroskedasticity in Linear Regression Models: Some Further Results.”
Econometrics Journal, 9, 76–97.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) godfrey_orme(mtcars_lm, hettest = "breusch_pagan")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) godfrey_orme(mtcars_lm, hettest = "breusch_pagan")
This function implements the two methods (parametric and nonparametric) of Goldfeld and Quandt (1965) for testing for heteroskedasticity in a linear regression model.
goldfeld_quandt( mainlm, method = c("parametric", "nonparametric"), deflator = NA, prop_central = 1/3, group1prop = 1/2, alternative = c("greater", "less", "two.sided"), prob = NA, twosidedmethod = c("doubled", "kulinskaya"), restype = c("ols", "blus"), statonly = FALSE, ... )
goldfeld_quandt( mainlm, method = c("parametric", "nonparametric"), deflator = NA, prop_central = 1/3, group1prop = 1/2, alternative = c("greater", "less", "two.sided"), prob = NA, twosidedmethod = c("doubled", "kulinskaya"), restype = c("ols", "blus"), statonly = FALSE, ... )
mainlm |
Either an object of |
method |
A character indicating which of the two tests derived in Goldfeld and Quandt (1965) should be implemented. Possible values are "parametric" and "nonparametric". Default is "parametric". It is acceptable to specify only the first letter. |
deflator |
Either a character specifying a column name from the
design matrix of |
prop_central |
A double specifying the proportion of central
observations to exclude from the F test (when |
group1prop |
A double specifying the proportion of remaining
observations (after excluding central observations) to allocate
to the first group. The default value of |
alternative |
A character specifying the form of alternative
hypothesis. If it is suspected that the
error variance is positively associated with the deflator variable,
|
prob |
A vector of probabilities corresponding to values of the test
statistic (number of peaks) from 0 to |
twosidedmethod |
A character indicating the method to be used to compute
two-sided |
restype |
A character specifying which residuals to use: |
statonly |
A logical. If |
... |
Optional further arguments to pass to |
The parametric test entails putting the data rows in increasing
order of some specified deflator (one of the explanatory variables). A
specified proportion of the most central observations (under this
ordering) is removed, leaving a subset of lower observations and a subset
of upper observations. Separate OLS regressions are fit to these two
subsets of observations (using all variables from the original model).
The test statistic is the ratio of the sum of squared residuals from the
'upper' model to the sum of squared residuals from the 'lower' model.
Under the null hypothesis, the test statistic is exactly F-distributed
with numerator and denominator degrees of freedom equal to
where
is the number of observations in the
original regression model,
is the number of central observations
removed, and
is the number of columns in the design matrix (number of
parameters to be estimated, including intercept).
The nonparametric test entails putting the residuals of the linear model in
increasing order of some specified deflator (one of the explanatory
variables). The test statistic is the number of peaks, with the th
absolute residual
defined as a peak if
for all
. The first observation does not constitute a peak. If
the number of peaks is large relative to the distribution of peaks under
the null hypothesis, this constitutes evidence for heteroskedasticity.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Goldfeld SM, Quandt RE (1965). “Some Tests for Homoscedasticity.” Journal of the American Statistical Association, 60(310), 539–547.
lmtest::gqtest
, another implementation
of the Goldfeld-Quandt Test (parametric method only).
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) goldfeld_quandt(mtcars_lm, deflator = "qsec", prop_central = 0.25) # This is equivalent to lmtest::gqtest(mtcars_lm, fraction = 0.25, order.by = mtcars$qsec) goldfeld_quandt(mtcars_lm, deflator = "qsec", method = "nonparametric", restype = "blus") goldfeld_quandt(mtcars_lm, deflator = "qsec", prop_central = 0.25, alternative = "two.sided") goldfeld_quandt(mtcars_lm, deflator = "qsec", method = "nonparametric", restype = "blus", alternative = "two.sided")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) goldfeld_quandt(mtcars_lm, deflator = "qsec", prop_central = 0.25) # This is equivalent to lmtest::gqtest(mtcars_lm, fraction = 0.25, order.by = mtcars$qsec) goldfeld_quandt(mtcars_lm, deflator = "qsec", method = "nonparametric", restype = "blus") goldfeld_quandt(mtcars_lm, deflator = "qsec", prop_central = 0.25, alternative = "two.sided") goldfeld_quandt(mtcars_lm, deflator = "qsec", method = "nonparametric", restype = "blus", alternative = "two.sided")
Golden Section Search (GSS) is a useful algorithm for minimising a
continuous univariate function over an interval
in instances where the first derivative
is not easily obtainable, such as with loss functions
that need to be minimised under cross-validation to tune a
hyperparameter in a machine learning model. The method is described
by Fox (2021).
GSS(f, a, b, tol = 1e-08, maxitgss = 100L, ...)
GSS(f, a, b, tol = 1e-08, maxitgss = 100L, ...)
f |
A function of one variable that returns a numeric value |
a |
A numeric of length 1 representing the lower endpoint of the search interval |
b |
A numeric of length 1 representing the upper endpoint of the
search interval; must be greater than |
tol |
A numeric of length 1 representing the tolerance used to determine when the search interval has been narrowed sufficiently for convergence |
maxitgss |
An integer of length 1 representing the maximum number of iterations to use in the search |
... |
Additional arguments to pass to |
This function is modelled after a MATLAB function written by (). The method assumes that there is one local minimum in this interval. The solution produced is also an interval, the width of which can be made arbitrarily small by setting a tolerance value. Since the desired solution is a single point, the midpoint of the final interval can be taken as the best approximation to the solution.
The premise of the method is to shorten the interval
by comparing the values of the function at two
test points,
, where
and
, and
is the
reciprocal of the golden ratio. One compares
to
and updates the search interval
as follows:
If , the solution cannot lie in
;
thus, update the search interval to
If , the solution cannot lie in
;
thus, update the search interval to
One then chooses two new test points by replacing and
with
and
and recalculating
and
based on these new endpoints. One continues iterating in
this fashion until
, where
is the desired
tolerance.
A list object containing the following:
argmin
, the argument of f
that minimises f
funmin
, the minimum value of f
achieved at argmin
converged
, a logical indicating whether the convergence tolerance
was satisfied
iterations
, an integer indicating the number of search iterations
used
(????).
Golden Section Method Algorithm, author=Katarzyna Zarnowiec, organization=MATLAB Central File Exchange, url=https://www.mathworks.com/matlabcentral/fileexchange/25919-golden-section-method-algorithm, urldate=2022-10-19, year=2022,.
Fox WP (2021).
Nonlinear Optimization: Models and Applications, 1st edition.
Chapman and Hall/CRC, Boca Raton, FL.
goldsect
is similar to this function, but does
not allow the user to pass additional arguments to f
f <- function(x) (x - 1) ^ 2 GSS(f, a = 0, b = 2)
f <- function(x) (x - 1) ^ 2 GSS(f, a = 0, b = 2)
This function implements the method of Harrison and McCabe (1979) for testing for heteroskedasticity in a linear regression model.
harrison_mccabe( mainlm, deflator = NA, m = 0.5, alternative = c("less", "greater", "two.sided"), twosidedmethod = c("doubled", "kulinskaya"), qfmethod = "imhof", statonly = FALSE )
harrison_mccabe( mainlm, deflator = NA, m = 0.5, alternative = c("less", "greater", "two.sided"), twosidedmethod = c("doubled", "kulinskaya"), qfmethod = "imhof", statonly = FALSE )
mainlm |
Either an object of |
deflator |
Either a character specifying a column name from the
design matrix of |
m |
Either a |
alternative |
A character specifying the form of alternative
hypothesis. If it is suspected that the error variance is positively
associated with the deflator variable, |
twosidedmethod |
A character indicating the method to be used to compute
two-sided |
qfmethod |
A character, either |
statonly |
A logical. If |
The test assumes that heteroskedasticity, if present, is
monotonically related to one of the explanatory variables (known as the
deflator). The OLS residuals are placed in increasing order of
the deflator variable and we let
be an
selector
matrix whose first
diagonal elements are 1 and all other elements
are 0. The alternative hypothesis posits that the error variance changes
around index
. Under the null hypothesis of homoskedasticity, the
ratio of quadratic forms
should be close to
. Since the test statistic
is a ratio of
quadratic forms in the errors, the Imhof algorithm is used to compute
-values (with normality of errors assumed).
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Harrison MJ, McCabe BPM (1979). “A Test for Heteroscedasticity Based on Ordinary Least Squares Residuals.” Journal of the American Statistical Association, 74(366), 494–499.
lmtest::hmctest
, another
implementation of the Harrison-McCabe Test. Note that the -values
from that function are simulated rather than computed from the
distribution of a ratio of quadratic forms in normal random vectors.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) harrison_mccabe(mtcars_lm, deflator = "qsec")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) harrison_mccabe(mtcars_lm, deflator = "qsec")
This function implements the method of Harvey (1976) for testing for "multiplicative" heteroskedasticity in a linear regression model. Mittelhammer et al. (2000) gives the formulation of the test used here.
harvey(mainlm, auxdesign = NA, statonly = FALSE)
harvey(mainlm, auxdesign = NA, statonly = FALSE)
mainlm |
Either an object of |
auxdesign |
A |
statonly |
A logical. If |
Harvey's Test entails fitting an auxiliary regression model in
which the response variable is the log of the vector of squared
residuals from the original model and the design matrix
consists of one or more exogenous variables that are suspected of being
related to the error variance. In the absence of prior information on
a possible choice of
, one would typically use the explanatory
variables from the original model. Under the null hypothesis of
homoskedasticity, the distribution of the test statistic is
asymptotically chi-squared with
parameter
degrees of freedom.
The test is right-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Harvey AC (1976).
“Estimating Regression Models with Multiplicative Heteroscedasticity.”
Econometrica, 44(3), 461–465.
Mittelhammer RC, Judge GG, Miller DJ (2000).
Econometric Foundations.
Cambridge University Press, Cambridge.
the description of the test in SHAZAM software (which produces identical results).
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) harvey(mtcars_lm) harvey(mtcars_lm, auxdesign = "fitted.values")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) harvey(mtcars_lm) harvey(mtcars_lm, auxdesign = "fitted.values")
Computes an estimate of the covariance matrix
(assumed to be diagonal) of the random error vector of a linear
regression model, using a specified method
hccme( mainlm, hcnum = c("3", "0", "1", "2", "4", "5", "6", "7", "4m", "5m", "const"), sandwich = FALSE, as_matrix = TRUE )
hccme( mainlm, hcnum = c("3", "0", "1", "2", "4", "5", "6", "7", "4m", "5m", "const"), sandwich = FALSE, as_matrix = TRUE )
mainlm |
Either an object of |
hcnum |
A character corresponding to a subscript in the name of an
HCCME according to the usual nomenclature
|
sandwich |
A logical, defaulting to
should be returned instead of |
as_matrix |
A logical, defaulting to |
A numeric matrix (if as_matrix
is TRUE
) or else a
numeric vector
Aftab N, Chand S (2016).
“A New Heteroskedastic Consistent Covariance Matrix Estimator Using Deviance Measure.”
Pakistan Journal of Statistics and Operations Research, 12(2), 235–244.
Aftab N, Chand S (2018).
“A Simulation-Based Evidence on the Improved Performance of a New Modified Leverage Adjusted Heteroskedastic Consistent Covariance Matrix Estimator in the Linear Regression Model.”
Kuwait Journal of Science, 45(3), 29–38.
Cribari-Neto F (2004).
“Asymptotic Inference under Heteroskedasticity of Unknown Form.”
Computational Statistics & Data Analysis, 45, 215–233.
Cribari-Neto F, Souza TC, Vasconcellos KLP (2007).
“Inference under Heteroskedasticity and Leveraged Data.”
Communications in Statistics - Theory and Methods, 36(10), 1877–1888.
Cribari-Neto F, da Silva WB (2011).
“A New Heteroskedasticity-Consistent Covariance Matrix Estimator for the Linear Regression Model.”
Advances in Statistical Analysis, 95(2), 129–146.
Li S, Zhang N, Zhang X, Wang G (2017).
“A New Heteroskedasticity-Consistent Covariance Matrix Estimator and Inference under Heteroskedasticity.”
Journal of Statistical Computation and Simulation, 87(1), 198–210.
MacKinnon JG, White H (1985).
“Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties.”
Journal of Econometrics, 29(3), 305–325.
White H (1980).
“A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.”
Econometrica, 48(4), 817–838.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) Omega_hat <- hccme(mtcars_lm, hcnum = "4") Cov_beta_hat <- hccme(mtcars_lm, hcnum = "4", sandwich = TRUE)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) Omega_hat <- hccme(mtcars_lm, hcnum = "4") Cov_beta_hat <- hccme(mtcars_lm, hcnum = "4", sandwich = TRUE)
This function creates various two-dimensional scatter plots that can aid in detecting heteroskedasticity in a linear regression model.
hetplot( mainlm, horzvar = "index", vertvar = "res", vertfun = "identity", filetype = NA, values = FALSE, ... )
hetplot( mainlm, horzvar = "index", vertvar = "res", vertfun = "identity", filetype = NA, values = FALSE, ... )
mainlm |
Either an object of |
horzvar |
A character vector describing the variable(s) to plot on
horizontal axes ( |
vertvar |
A character vector describing the variable to plot on the
vertical axis ( |
vertfun |
A character vector giving the name of a function to apply to
the |
filetype |
A character giving the type of image file to which the
plot(s) should be written. Values can be |
values |
A logical. Should the sequences corresponding to the
horizontal and vertical variable(s) be returned in a |
... |
Arguments to be passed to methods, such as graphical parameters
(see |
The variable plotted on the horizontal axis could be the original
data indices, one of the explanatory variables, the OLS predicted
(fitted) values, or any other numeric vector specified by the user. The
variable plotted on the vertical axis is some function of the OLS
residuals or transformed version thereof such as the BLUS residuals
Theil (1968) or standardised or studentised
residuals as discussed in Cook and Weisberg (1983). A
separate plot is created for each (horzvar
, vertvar
,
vertfun
) combination.
A list containing two data frames
, one
for vectors plotted on horizontal axes and one for vectors plotted
on vertical axes.
Cook RD, Weisberg S (1983).
“Diagnostics for Heteroscedasticity in Regression.”
Biometrika, 70(1), 1–10.
Theil H (1968).
“A Simplification of the BLUS Procedure for Analyzing Regression Disturbances.”
Journal of the American Statistical Association, 63(321), 242–251.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) # Generates 2 x 2 matrix of plots in console hetplot(mtcars_lm, horzvar = c("index", "fitted.values"), vertvar = c("res_blus"), vertfun = c("2", "abs"), filetype = NA) # Generates 84 png files in tempdir() folder ## Not run: hetplot(mainlm = mtcars_lm, horzvar = c("explanatory", "log_explanatory", "fitted.values2"), vertvar = c("res", "res_stand", "res_stud", "res_constvar"), vertfun = c("identity", "abs", "2"), filetype = "png") ## End(Not run)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) # Generates 2 x 2 matrix of plots in console hetplot(mtcars_lm, horzvar = c("index", "fitted.values"), vertvar = c("res_blus"), vertfun = c("2", "abs"), filetype = NA) # Generates 84 png files in tempdir() folder ## Not run: hetplot(mainlm = mtcars_lm, horzvar = c("explanatory", "log_explanatory", "fitted.values2"), vertvar = c("res", "res_stand", "res_stud", "res_constvar"), vertfun = c("identity", "abs", "2"), filetype = "png") ## End(Not run)
This function implements the two-sided LM Test of Honda (1989) for testing for heteroskedasticity in a linear regression model.
honda( mainlm, deflator = NA, alternative = c("two.sided", "greater", "less"), twosidedmethod = c("doubled", "kulinskaya"), qfmethod = "imhof", statonly = FALSE )
honda( mainlm, deflator = NA, alternative = c("two.sided", "greater", "less"), twosidedmethod = c("doubled", "kulinskaya"), qfmethod = "imhof", statonly = FALSE )
mainlm |
Either an object of |
deflator |
Either a character specifying a column name from the
design matrix of |
alternative |
A character specifying the form of alternative
hypothesis. If it is suspected that the error variance is positively
associated with the deflator variable, |
twosidedmethod |
A character indicating the method to be used to compute
two-sided |
qfmethod |
A character, either |
statonly |
A logical. If |
The test assumes that heteroskedasticity, if present, would be
either of the form or of the
form
, where
where
is a deflator (a nonstochastic variable
suspected of being related to the error variance),
is
some unknown constant, and
is an unknown parameter
representing the degree of heteroskedasticity. Since the test
statistic
is a ratio of quadratic forms
in the errors, the Imhof algorithm is used to compute
-values.
Since the null distribution is in general asymmetrical, the two-sided
-value is computed using the conditional method of
Kulinskaya (2008), setting
.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Honda Y (1989).
“On the Optimality of Some Tests of the Error Covariance Matrix in the Linear Regression Model.”
Journal of the Royal Statistical Society. Series B (Statistical Methodology), 51(1), 71–79.
Kulinskaya E (2008).
“On Two-Sided p-Values for Non-Symmetric Distributions.”
0810.2124, 0810.2124.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) # In this example, only the test statistic is returned, to save time honda(mtcars_lm, deflator = "qsec", statonly = TRUE)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) # In this example, only the test statistic is returned, to save time honda(mtcars_lm, deflator = "qsec", statonly = TRUE)
This function implements the nonparametric test of Horn (1981) for testing for heteroskedasticity in a linear regression model.
horn( mainlm, deflator = NA, restype = c("ols", "blus"), alternative = c("two.sided", "greater", "less"), exact = (nres <= 10), statonly = FALSE, ... )
horn( mainlm, deflator = NA, restype = c("ols", "blus"), alternative = c("two.sided", "greater", "less"), exact = (nres <= 10), statonly = FALSE, ... )
mainlm |
Either an object of |
deflator |
Either a character specifying a column name from the
design matrix of |
restype |
A character specifying which residuals to use: |
alternative |
A character specifying the form of alternative
hypothesis; one of |
exact |
A logical. Should exact |
statonly |
A logical. If |
... |
Optional further arguments to pass to |
The test entails specifying a 'deflator', an explanatory variable
suspected of being related to the error variance. Residuals are ordered
by the deflator and the nonparametric trend statistic
proposed by
Lehmann (1975) is
then computed on the absolute residuals and used to test for an
increasing or decreasing trend, either of which would correspond to
heteroskedasticity. Exact probabilities for the null distribution of
can be obtained from functions
dDtrend
and
pDtrend
, but since computation time increases rapidly with
, use of a normal approximation is recommended for
.
Lehmann (1975) proves that
is
asymptotically normally distributed and the approximation of the
statistic
to the standard normal
distribution is already quite good for
.
The expectation and variance of (when ties are absent) are
respectively
and
; see
Lehmann (1975) for
and
when ties are present. When ties are absent, a continuity correction
is used to improve the normal approximation. When
exact distribution is used, two-sided
-value is computed by
doubling the one-sided
-value, since the distribution of
is symmetric. The function does not support the exact distribution of
in the presence of ties, so in this case the normal approximation
is used regardless of
.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Horn P (1981).
“Heteroscedasticity of Residuals: A Non-Parametric Alternative to the Goldfeld-Quandt Peak Test.”
Communications in Statistics - Theory and Methods, 10(8), 795–808.
Lehmann EL (1975).
Nonparametrics: Statistical Methods Based on Ranks.
Holden-Day, San Francisco.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) horn(mtcars_lm, deflator = "qsec") horn(mtcars_lm, deflator = "qsec", restype = "blus")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) horn(mtcars_lm, deflator = "qsec") horn(mtcars_lm, deflator = "qsec", restype = "blus")
This function implements the two methods of Li and Yao (2019) for testing for heteroskedasticity in a linear regression model.
li_yao(mainlm, method = c("cvt", "alrt"), baipanyin = TRUE, statonly = FALSE)
li_yao(mainlm, method = c("cvt", "alrt"), baipanyin = TRUE, statonly = FALSE)
mainlm |
Either an object of |
method |
A character indicating which of the two tests derived in
Li and Yao (2019) should be implemented. Possible
values are |
baipanyin |
A logical. Should the central limit theorem of
Bai et al. (2016) be used to determine the
|
statonly |
A logical. If |
These two tests are straightforward to implement; in both cases the test statistic is a function only of the residuals of the linear regression model. Furthermore, in both cases the test statistic is asymptotically normally distributed under the null hypothesis of homoskedasticity. Both tests are right-tailed. These tests are designed to be especially powerful in high-dimensional regressions, i.e. when the number of explanatory variables is large.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Bai Z, Pan G, Yin Y (2016).
“Homoscedasticity Tests for Both Low and High-Dimensional Fixed Design Regressions.”
1603.03830, 1603.03830.
Li Z, Yao J (2019).
“Testing for Heteroscedasticity in High-Dimensional Regressions.”
Econometrics and Statistics, 9, 122–139.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) li_yao(mtcars_lm, method = "alrt") li_yao(mtcars_lm, method = "cvt") li_yao(mtcars_lm, method = "cvt", baipanyin = FALSE) # Same as first example li_yao(list("e" = mtcars_lm$residuals), method = "alrt")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) li_yao(mtcars_lm, method = "alrt") li_yao(mtcars_lm, method = "cvt") li_yao(mtcars_lm, method = "cvt", baipanyin = FALSE) # Same as first example li_yao(list("e" = mtcars_lm$residuals), method = "alrt")
This function computes (
), i.e.
lower (upper) cumulative probabilities for
, the nonparametric trend statistic
proposed by Lehmann (1975), under the assumption
that the ranks
are computed on a series of
independent and
identically distributed random variables with no ties. The function may be
used to compute one-sided
-values for the nonparametric test for
heteroskedasticity of Horn (1981). Computation
time is extremely slow for
if
usedata
is set to
FALSE
; thus a normal approximation is implemented, including a
continuity correction.
pDtrend( k, n, lower.tail = TRUE, exact = (n <= 10), tiefreq = NA, override = FALSE )
pDtrend( k, n, lower.tail = TRUE, exact = (n <= 10), tiefreq = NA, override = FALSE )
k |
An integer of |
n |
A positive integer representing the number of observations in the series. |
lower.tail |
A logical. Should lower tailed cumulative probability be
calculated? Defaults to |
exact |
A logical. Should exact distribution of |
tiefreq |
A double vector corresponding to the value of |
override |
A logical. By default, the |
A double between 0 and 1 representing the probability/ies of
taking on at least (at most) the value(s) in the
names
attribute.
Horn P (1981).
“Heteroscedasticity of Residuals: A Non-Parametric Alternative to the Goldfeld-Quandt Peak Test.”
Communications in Statistics - Theory and Methods, 10(8), 795–808.
Lehmann EL (1975).
Nonparametrics: Statistical Methods Based on Ranks.
Holden-Day, San Francisco.
# For an independent sample of size 6, the probability that D is <= 50 is # 0.8222 pDtrend(k = 50, n = 6) # Normal approximation of the above with continuity correction is # 0.8145 pDtrend(k = 50, n = 6, exact = FALSE)
# For an independent sample of size 6, the probability that D is <= 50 is # 0.8222 pDtrend(k = 50, n = 6) # Normal approximation of the above with continuity correction is # 0.8145 pDtrend(k = 50, n = 6, exact = FALSE)
This function computes , i.e. the probability that a
sequence of
independent and identically distributed random variables
contains
peaks, with peaks as defined in
Goldfeld and Quandt (1965). The function may be used to
compute
-values for the Goldfeld-Quandt nonparametric test for
heteroskedasticity in a linear model. Computation time is very slow for
if
usedata
is set to FALSE
.
ppeak(k, n, lower.tail = FALSE, usedata = TRUE)
ppeak(k, n, lower.tail = FALSE, usedata = TRUE)
k |
An integer or a sequence of integers strictly incrementing by 1,
with all values between 0 and |
n |
A positive integer representing the number of observations in the sequence. |
lower.tail |
A logical. Should lower tailed cumulative probability be
calculated? Defaults to |
usedata |
A logical. Should probability mass function values be
read from |
A double between 0 and 1 representing the probability of at least
(at most) k peaks occurring in a sequence of independent and
identically distributed continuous random variables. The double has a
names
attribute with the values corresponding to the
probabilities.
Goldfeld SM, Quandt RE (1965). “Some Tests for Homoscedasticity.” Journal of the American Statistical Association, 60(310), 539–547.
# For an independent sample of size 250, the probability of at least 10 # peaks is 0.02650008 ppeak(k = 10, n = 250, lower.tail = FALSE, usedata = TRUE) # For an independent sample of size 10, the probability of at most 2 peaks # is 0.7060615 ppeak(k = 2, n = 10, lower.tail = TRUE, usedata = FALSE)
# For an independent sample of size 250, the probability of at least 10 # peaks is 0.02650008 ppeak(k = 10, n = 250, lower.tail = FALSE, usedata = TRUE) # For an independent sample of size 10, the probability of at most 2 peaks # is 0.7060615 ppeak(k = 2, n = 10, lower.tail = TRUE, usedata = FALSE)
This function computes cumulative probabilities (lower or upper tail) on a ratio of quadratic forms in a vector of normally distributed random variables.
pRQF( r, A, B, Sigma = diag(nrow(A)), algorithm = c("imhof", "davies", "integrate"), lower.tail = TRUE, usenames = FALSE )
pRQF( r, A, B, Sigma = diag(nrow(A)), algorithm = c("imhof", "davies", "integrate"), lower.tail = TRUE, usenames = FALSE )
r |
A double representing the value(s) for which |
A |
A numeric, symmetric matrix that is symmetric |
B |
A numeric, symmetric, non-negative definite matrix having the same
dimensions as |
Sigma |
A numeric, symmetric matrix with the same dimensions as
|
algorithm |
A character, either |
lower.tail |
A logical. If |
usenames |
A logical. If |
Most of the work is done by other functions, namely
imhof
, davies
,
or integrate
(depending on the algorithm
argument). It is assumed that the ratio of quadratic forms can be
expressed as
where is an
-dimensional normally distributed random variable with mean vector
and covariance matrix
, and
and
are real-valued, symmetric
matrices. Matrix
must be non-negative definite to ensure that the denominator of
the ratio of quadratic forms is nonzero.
The function makes use of the fact that a probability statement involving a
ratio of quadratic forms can be rewritten as a probability statement
involving a quadratic form. Hence, methods for computing probabilities
for a quadratic form in normal random variables, such as the Imhof
algorithm (Imhof 1961) or the Davies algorithm
(Davies 1980) can be applied to the rearranged
expression to obtain the probability for the ratio of quadratic forms.
Note that the Ruben-Farebrother algorithm (as implemented in
farebrother
) cannot be used here because the
matrix within the quadratic form (after rearrangement of the
probability statement involving a ratio of quadratic forms) is not in
general positive semi-definite.
A double denoting the probability/ies corresponding to the value(s)
r
.
Davies RB (1980).
“Algorithm AS 155: The Distribution of a Linear Combination of Random Variables.”
Journal of the Royal Statistical Society. Series C (Applied Statistics), 29, 323–333.
Imhof JP (1961).
“Computing the Distribution of Quadratic Forms in Normal Variables.”
Biometrika, 48(3/4), 419–426.
Duchesne and de Micheaux (2010), the article associated
with the imhof
and
davies
functions.
n <- 20 A <- matrix(data = 1, nrow = n, ncol = n) B <- diag(n) pRQF(r = 1, A = A, B = B) pRQF(r = 1, A = A, B = B, algorithm = "integrate") pRQF(r = 1:3, A = A, B = B, algorithm = "davies")
n <- 20 A <- matrix(data = 1, nrow = n, ncol = n) B <- diag(n) pRQF(r = 1, A = A, B = B) pRQF(r = 1, A = A, B = B, algorithm = "integrate") pRQF(r = 1:3, A = A, B = B, algorithm = "davies")
This function implements the two methods of Rackauskas and Zuokas (2007) for testing for heteroskedasticity in a linear regression model.
rackauskas_zuokas( mainlm, alpha = 0, pvalmethod = c("data", "sim"), R = 2^14, m = 2^17, sqZ = FALSE, seed = 1234, statonly = FALSE )
rackauskas_zuokas( mainlm, alpha = 0, pvalmethod = c("data", "sim"), R = 2^14, m = 2^17, sqZ = FALSE, seed = 1234, statonly = FALSE )
mainlm |
Either an object of |
alpha |
A double such that |
pvalmethod |
A character, either |
R |
An integer representing the number of Monte Carlo replicates to
generate, if |
m |
An integer representing the number of standard normal variates to
use when generating the Brownian Bridge for each replicate, if
|
sqZ |
A logical. If |
seed |
An integer representing the seed to be used for pseudorandom
number generation when simulating values from the asymptotic null
distribution. This is to provide reproducibility of test results.
Ignored if |
statonly |
A logical. If |
Rackauskas and Zuokas propose a class of tests that entails
determining the largest weighted difference in variance of estimated
error. The asymptotic behaviour of their test statistic
is studied using the empirical polygonal process
constructed from partial sums of the squared residuals. The test is
right-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Rackauskas A, Zuokas D (2007). “New Tests of Heteroskedasticity in Linear Regression Model.” Lithuanian Mathematical Journal, 47(3), 248–265.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) rackauskas_zuokas(mtcars_lm) rackauskas_zuokas(mtcars_lm, alpha = 7 / 16) n <- length(mtcars_lm$residuals) rackauskas_zuokas(mtcars_lm, pvalmethod = "sim", m = n, sqZ = TRUE)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) rackauskas_zuokas(mtcars_lm) rackauskas_zuokas(mtcars_lm, alpha = 7 / 16) n <- length(mtcars_lm$residuals) rackauskas_zuokas(mtcars_lm, pvalmethod = "sim", m = n, sqZ = TRUE)
This function implements the modified profile likelihood ratio test and score test of Simonoff and Tsai (1994) for testing for heteroskedasticity in a linear regression model.
simonoff_tsai( mainlm, auxdesign = NA, method = c("mlr", "score"), hetfun = c("mult", "add", "logmult"), basetest = c("koenker", "cook_weisberg"), bartlett = TRUE, optmethod = "Nelder-Mead", statonly = FALSE, ... )
simonoff_tsai( mainlm, auxdesign = NA, method = c("mlr", "score"), hetfun = c("mult", "add", "logmult"), basetest = c("koenker", "cook_weisberg"), bartlett = TRUE, optmethod = "Nelder-Mead", statonly = FALSE, ... )
mainlm |
Either an object of |
auxdesign |
A |
method |
A character specifying which of the tests proposed in
Simonoff and Tsai (1994) to implement. |
hetfun |
A character describing the form of |
basetest |
A character specifying the base test statistic which is
robustified using the added term described in Details. |
bartlett |
A logical specifying whether a Bartlett correction should be
made, as per Ferrari et al. (2004), to improve the
fit of the test statistic to the asymptotic null distribution. This
argument is only applicable where |
optmethod |
A character specifying the optimisation method to use with
|
statonly |
A logical. If |
... |
Optional arguments to pass to |
The Simonoff-Tsai Likelihood Ratio Test involves a modification of
the profile likelihood function so that the nuisance parameter will be
orthogonal to the parameter of interest. The maximum likelihood estimate
of (called
in
Simonoff and Tsai (1994)) is computed from the modified
profile log-likelihood function using the Nelder-Mead algorithm in
optim
. Under the null hypothesis of
homoskedasticity, the distribution of the test statistic is
asymptotically chi-squared with degrees of freedom. The test is
right-tailed.
The Simonoff-Tsai Score Test entails adding a term to either the score
statistic of Cook and Weisberg (1983) (a test implemented
in cook_weisberg
) or to that of
Koenker (1981) (a test implemented in
breusch_pagan
with argument koenker
set to
TRUE
), in order to improve the robustness of these respective
tests in the presence of non-normality. This test likewise has a test
statistic that is asymptotically -distributed and the test
is likewise right-tailed.
The assumption of underlying both tests is that
, where
is
an
diagonal matrix with
th diagonal element
. Here,
is the
th row of an
nonstochastic auxiliary design matrix
. Note:
as defined here does not have a column of ones, but is
concatenated to a column of ones when used in an auxiliary regression.
is a
-vector of unknown parameters, and
is a real-valued, twice-differentiable function having the
property that there exists some
for which
for all
. Thus, the null
hypothesis of homoskedasticity may be expressed as
.
In the score test, the added term in the test statistic is of the form
,
where is the
th element of the Jacobian matrix
evaluated at
:
,
and , where
is the
-vector whose
th element is
,
, and
.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Cook RD, Weisberg S (1983).
“Diagnostics for Heteroscedasticity in Regression.”
Biometrika, 70(1), 1–10.
Ferrari SL, Cysneiros AH, Cribari-Neto F (2004).
“An Improved Test for Heteroskedasticity Using Adjusted Modified Profile Likelihood Inference.”
Journal of Statistical Planning and Inference, 124, 423–437.
Griffiths WE, Surekha K (1986).
“A Monte Carlo Evaluation of the Power of Some Tests for Heteroscedasticity.”
Journal of Econometrics, 31(1), 219–231.
Koenker R (1981).
“A Note on Studentizing a Test for Heteroscedasticity.”
Journal of Econometrics, 17, 107–112.
Simonoff JS, Tsai C (1994).
“Use of Modified Profile Likelihood for Improved Tests of Constancy of Variance in Regression.”
Journal of the Royal Statistical Society. Series C (Applied Statistics), 43(2), 357–370.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) simonoff_tsai(mtcars_lm, method = "score") simonoff_tsai(mtcars_lm, method = "score", basetest = "cook_weisberg") simonoff_tsai(mtcars_lm, method = "mlr") simonoff_tsai(mtcars_lm, method = "mlr", bartlett = FALSE) ## Not run: simonoff_tsai(mtcars_lm, auxdesign = data.frame(mtcars$wt, mtcars$qsec), method = "mlr", hetfun = "logmult") ## End(Not run)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) simonoff_tsai(mtcars_lm, method = "score") simonoff_tsai(mtcars_lm, method = "score", basetest = "cook_weisberg") simonoff_tsai(mtcars_lm, method = "mlr") simonoff_tsai(mtcars_lm, method = "mlr", bartlett = FALSE) ## Not run: simonoff_tsai(mtcars_lm, auxdesign = data.frame(mtcars$wt, mtcars$qsec), method = "mlr", hetfun = "logmult") ## End(Not run)
This function implements the method of Szroeter (1978) for testing for heteroskedasticity in a linear regression model.
szroeter(mainlm, deflator = NA, h = SKH, qfmethod = "imhof", statonly = FALSE)
szroeter(mainlm, deflator = NA, h = SKH, qfmethod = "imhof", statonly = FALSE)
mainlm |
Either an object of |
deflator |
Either a character specifying a column name from the
design matrix of |
h |
A non-decreasing function taking as its argument the index
|
qfmethod |
A character, either |
statonly |
A logical. If |
The test entails putting the data rows in increasing order of some specified deflator (e.g., one of the explanatory variables) that is believed to be related to the error variance by some non-decreasing function. The test statistic is a ratio of quadratic forms in the OLS residuals. It is a right-tailed test.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Szroeter J (1978). “A Class of Parametric Tests for Heteroscedasticity in Linear Econometric Models.” Econometrica, 46(6), 1311–1327.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) szroeter(mtcars_lm, deflator = "qsec")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) szroeter(mtcars_lm, deflator = "qsec")
A matrix of rows and 16 columns. Each column contains
Monte Carlo replicates from the distribution of
for a particular value of
,
,
. The values were generated by first generating a
Brownian Bridge using
standard normal variates and then
applying Equation (11) from Rackauskas and Zuokas (2007).
It can be used to compute empirical approximate
-values for
implementation of the Rackauskas-Zuokas Test for heteroskedasticity. This
is a time-saving measure because, while
rackauskas_zuokas
contains an option for simulating the -value directly, this would be
computationally intensive for the authors' recommended
of
. Passing the argument
pvalmethod = "data"
to
rackauskas_zuokas
instructs the function to use the
pre-generated values in this data set to compute the empirical approximate
-value for the test.
T_alpha
T_alpha
An object of class matrix
(inherits from array
) with 16384 rows and 16 columns.
-ValuesComputes the conditional -value
for a continuous
or discrete test statistic, as defined in
Kulinskaya (2008). This provides a method
for computing a two-sided
-value from an asymmetric null
distribution.
twosidedpval( q, CDF, continuous, method = c("doubled", "kulinskaya", "minlikelihood"), locpar, supportlim = c(-Inf, Inf), ... )
twosidedpval( q, CDF, continuous, method = c("doubled", "kulinskaya", "minlikelihood"), locpar, supportlim = c(-Inf, Inf), ... )
q |
A double representing the quantile, i.e. the observed value of the
test statistic for which a two-sided |
CDF |
A function representing the cumulative distribution function of
the test statistic under the null hypothesis, i.e.
|
continuous |
A logical indicating whether the test statistic is a
continuous ( |
method |
A character specifying the method to use to calculate
two-sided |
locpar |
a double representing a generic location parameter chosen to
separate the tails of the distribution. Note that if |
supportlim |
A numeric vector of |
... |
Optional arguments to pass to |
Let be a statistic that, under the null hypothesis, has
cumulative distribution function
and probability density or mass
function
. Denote by
a generic location parameter chosen
to separate the two tails of the distribution. Particular examples
include the mean
, the mode
, or the median
. Let
be the observed value
of
.
In the continuous case, the conditional two-sided -value centered
at
is defined as
where is the indicator function. In the discrete case,
depends on whether
is an attainable value within the
support of
. If
is not attainable, the conditional two-sided
-value centred at
is defined as
If is attainable, the conditional two-sided
-value centred
at
is defined as
A double.
Kulinskaya E (2008). “On Two-Sided p-Values for Non-Symmetric Distributions.” 0810.2124, 0810.2124.
# Computation of two-sided p-value for F test for equality of variances n1 <- 10 n2 <- 20 set.seed(1234) x1 <- stats::rnorm(n1, mean = 0, sd = 1) x2 <- stats::rnorm(n2, mean = 0, sd = 3) # 'Conventional' two-sided p-value obtained by doubling one-sided p-value: stats::var.test(x1, x2, alternative = "two.sided")$p.value # This is replicated in `twosidedpval` by setting `method` argument to `"doubled"` twosidedpval(q = var(x1) / var(x2), CDF = stats::pf, continuous = TRUE, method = "doubled", locpar = 1, df1 = n1 - 1, df2 = n2 - 1) # Conditional two-sided p-value centered at df (mean of chi-squared r.v.): twosidedpval(q = var(x1) / var(x2), CDF = stats::pf, continuous = TRUE, method = "kulinskaya", locpar = 1, df1 = n1 - 1, df2 = n2 - 1)
# Computation of two-sided p-value for F test for equality of variances n1 <- 10 n2 <- 20 set.seed(1234) x1 <- stats::rnorm(n1, mean = 0, sd = 1) x2 <- stats::rnorm(n2, mean = 0, sd = 3) # 'Conventional' two-sided p-value obtained by doubling one-sided p-value: stats::var.test(x1, x2, alternative = "two.sided")$p.value # This is replicated in `twosidedpval` by setting `method` argument to `"doubled"` twosidedpval(q = var(x1) / var(x2), CDF = stats::pf, continuous = TRUE, method = "doubled", locpar = 1, df1 = n1 - 1, df2 = n2 - 1) # Conditional two-sided p-value centered at df (mean of chi-squared r.v.): twosidedpval(q = var(x1) / var(x2), CDF = stats::pf, continuous = TRUE, method = "kulinskaya", locpar = 1, df1 = n1 - 1, df2 = n2 - 1)
This function implements the residual maximum likelihood test of Verbyla (1993) for testing for heteroskedasticity in a linear regression model.
verbyla(mainlm, auxdesign = NA, statonly = FALSE)
verbyla(mainlm, auxdesign = NA, statonly = FALSE)
mainlm |
Either an object of |
auxdesign |
A |
statonly |
A logical. If |
Verbyla's Test entails fitting a generalised auxiliary regression
model in which the response variable is the vector of standardised
squared residuals from the original OLS model
and the design matrix is some function of
, an
matrix consisting of
exogenous variables, appended to a column of
ones. The test statistic is half the residual sum of squares from this
generalised auxiliary regression. Under the null hypothesis of
homoskedasticity, the distribution of the test statistic is
asymptotically chi-squared with
degrees of freedom. The test is
right-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Verbyla AP (1993). “Modelling Variance Heterogeneity: Residual Maximum Likelihood and Diagnostics.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 55(2), 493–508.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) verbyla(mtcars_lm) verbyla(mtcars_lm, auxdesign = "fitted.values")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) verbyla(mtcars_lm) verbyla(mtcars_lm, auxdesign = "fitted.values")
This function implements the popular method of White (1980) for testing for heteroskedasticity in a linear regression model.
white(mainlm, interactions = FALSE, statonly = FALSE)
white(mainlm, interactions = FALSE, statonly = FALSE)
mainlm |
Either an object of |
interactions |
A logical. Should two-way interactions between explanatory
variables be included in the auxiliary regression? Defaults to
|
statonly |
A logical. If |
White's Test entails fitting an auxiliary regression model in which the response variable is the vector of squared residuals from the original model and the design matrix includes the original explanatory variables, their squares, and (optionally) their two-way interactions. The test statistic is the number of observations multiplied by the coefficient of determination from the auxiliary regression model:
White's Test is thus a special case of the method of
Breusch and Pagan (1979). Under the null hypothesis of
homoskedasticity, the distribution of the test statistic is
asymptotically chi-squared with parameter
degrees of freedom.
The test is right-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Breusch TS, Pagan AR (1979).
“A Simple Test for Heteroscedasticity and Random Coefficient Variation.”
Econometrica, 47(5), 1287–1294.
White H (1980).
“A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.”
Econometrica, 48(4), 817–838.
This function should not be confused with
tseries::white.test
, which does
not implement the method of
White (1980) for testing for
heteroskedasticity in a linear model.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) white(mtcars_lm) white(mtcars_lm, interactions = TRUE)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) white(mtcars_lm) white(mtcars_lm, interactions = TRUE)
This function implements the nonparametric test of Wilcox and Keselman (2006) for testing for heteroskedasticity in a simple linear regression model, and extends it to the multiple linear regression model.
wilcox_keselman( mainlm, gammapar = 0.2, B = 500L, p.adjust.method = "none", seed = NA, rqwarn = FALSE, matchWRS = FALSE, statonly = FALSE )
wilcox_keselman( mainlm, gammapar = 0.2, B = 500L, p.adjust.method = "none", seed = NA, rqwarn = FALSE, matchWRS = FALSE, statonly = FALSE )
mainlm |
Either an object of |
gammapar |
A double value between 0 and 0.5 exclusive specifying the
quantile value |
B |
An integer specifying the number of nonparametric bootstrap samples
to use to estimate standard error(s) of the quantile difference(s).
Defaults to |
p.adjust.method |
A character specifying the family-wise error rate
method to use in adjusting |
seed |
An integer specifying a seed to pass to
|
rqwarn |
A logical specifying whether warnings generated by
|
matchWRS |
A logical specifying whether bootstrap samples should be
generated in the exact same manner as in the |
statonly |
A logical. If |
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Wilcox RR, Keselman HJ (2006). “Detecting Heteroscedasticity in a Simple Regression Model via Quantile Regression Slopes.” Journal of Statistical Computation and Simulation, 76(8), 705–712.
Rand R. Wilcox's package
WRS on Github; in particular
the functions qhomt
and qhomtv2
, which implement this
method for simple and multiple linear regression respectively.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) wilcox_keselman(mtcars_lm)
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) wilcox_keselman(mtcars_lm)
This function implements the two methods of Yüce (2008) for testing for heteroskedasticity in a linear regression model.
yuce(mainlm, method = c("A", "B"), statonly = FALSE)
yuce(mainlm, method = c("A", "B"), statonly = FALSE)
mainlm |
Either an object of |
method |
A character indicating which of the two tests presented in
Yüce (2008) should be implemented. Possible
values are |
statonly |
A logical. If |
These two tests are straightforward to implement; in both cases the
test statistic is a function only of the residuals of the linear
regression model. The first test statistic has an asymptotic chi-squared
distribution and the second has an asymptotic -distribution. In
both cases the degrees of freedom are
. The chi-squared test
is right-tailed whereas the
-test is two-tailed.
An object of class
"htest"
. If object is
not assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Yüce M (2008). “An Asymptotic Test for the Detection of Heteroskedasticity.” Istanbul University Econometrics and Statistics e-Journal, 8, 33–44.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) yuce(mtcars_lm, method = "A") yuce(mtcars_lm, method = "B")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) yuce(mtcars_lm, method = "A") yuce(mtcars_lm, method = "B")
This function implements the methods of Zhou et al. (2015) for testing for heteroskedasticity in a linear regression model.
zhou_etal( mainlm, auxdesign = NA, method = c("pooled", "covariate-specific", "hybrid"), Bperturbed = 500L, seed = 1234, statonly = FALSE )
zhou_etal( mainlm, auxdesign = NA, method = c("pooled", "covariate-specific", "hybrid"), Bperturbed = 500L, seed = 1234, statonly = FALSE )
mainlm |
Either an object of |
auxdesign |
A |
method |
A character specifying which of the three test methods to
implement; one of |
Bperturbed |
An integer specifying the number of perturbation samples
to generate when estimating the |
seed |
An integer specifying a seed to pass to
|
statonly |
A logical. If |
Zhou et al. (2015) The authors propose
three variations based on comparisons between sandwich and model-based
estimators for the variances of individual regression coefficient
esimators. The covariate-specific method computes a test statistic and
-value for each column of the auxiliary design matrix (which is,
by default, the original design matrix with intercept omitted). The
-values undergo a Bonferroni correction to control overall test
size. When the null hypothesis is rejected in this case, it also provides
information about which auxiliary design variable is associated with the
error variance. The pooled method computes a single test statistic and
-value and is thus an omnibus test. The hybrid method returns the
minimum
-value between the Bonferroni-corrected covariate-specific
-values and the pooled
-value, multiplying it by 2 for a
further Bonferroni correction. The test statistic returned is that
which corresponds to the minimum
-value. The covariate-specific
and pooled test statistics have null distributions that are
asymptotically normal with mean 0. However, the variance is intractable
and thus perturbation sampling is used to compute
-values
empirically.
An object of class
"htest". If object is not
assigned, its attributes are displayed in the console as a
tibble
using tidy
.
Zhou QM, Song PX, Thompson ME (2015). “Profiling Heteroscedasticity in Linear Regression Models.” The Canadian Journal of Statistics, 43(3), 358–377.
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) zhou_etal(mtcars_lm, method = "pooled") zhou_etal(mtcars_lm, method = "covariate-specific") zhou_etal(mtcars_lm, method = "hybrid")
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars) zhou_etal(mtcars_lm, method = "pooled") zhou_etal(mtcars_lm, method = "covariate-specific") zhou_etal(mtcars_lm, method = "hybrid")