The exactt package tests whether a slope coefficient is
equal to some null value using the novel method described in Pouliot
(2023). Importantly, inverting such a test produces a marginally valid
confidence interval.
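To make the idea of test inversion concrete, here is a minimal sketch in base R. It inverts the classical t-test for a slope, not the procedure of Pouliot (2023): the helper `invert_test()` and the grid of candidate null values are illustrative assumptions and are not part of exactt. The confidence interval is the set of null values that the test fails to reject at level alpha.

```r
# Illustrative only: inverts the classical t-test, NOT the exactt procedure.
# invert_test() is a hypothetical helper written for this sketch.
invert_test <- function(y, x, grid, alpha = 0.1) {
  p_values <- vapply(grid, function(b0) {
    # H0: slope = b0 is equivalent to a zero slope in the regression of y - b0 * x on x
    fit <- lm(I(y - b0 * x) ~ x)
    summary(fit)$coefficients["x", "Pr(>|t|)"]
  }, numeric(1))
  # The interval collects every null value that is not rejected at level alpha
  range(grid[p_values > alpha])
}

tg <- datasets::ToothGrowth
ci <- invert_test(tg$len, tg$dose, grid = seq(0, 20, by = 0.01))
ci

# For comparison: the textbook 90% interval for the same slope
ref <- confint(lm(len ~ dose, data = tg), "dose", level = 0.9)
ref
```

Up to the resolution of the grid, the two intervals agree, because the classical interval is exactly the inversion of the classical t-test.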
The exactt package is hosted on GitHub at https://github.com/ian-xu-economics/exactt/. It can be
installed using the remotes::install_github() function:
# install.packages("remotes")
remotes::install_github("ian-xu-economics/exactt")
To cite the exactt package in publications, use the
citation() function, which provides both the text version
and the BibTeX entry for referencing:
citation("exactt")
After installing exactt, we can attach the package to
our session using the base library() function:
library(exactt)
To compute the
confidence interval, use the exactt() function. Here’s an
example looking at the effect of vitamin C on tooth growth in guinea
pigs using data from datasets::ToothGrowth. We’ll
investigate the effect of supp (supplement type: orange juice
(OJ) or ascorbic acid (VC)) and dose (in
milligrams/day) on len (tooth length).
summary(datasets::ToothGrowth)
#>       len        supp         dose      
#>  Min.   : 4.20   OJ:30   Min.   :0.500  
#>  1st Qu.:13.07   VC:30   1st Qu.:0.500  
#>  Median :19.25           Median :1.000  
#>  Mean   :18.81           Mean   :1.167  
#>  3rd Qu.:25.27           3rd Qu.:2.000  
#>  Max.   :33.90           Max.   :2.000  
Suppose our model is
$\text{len} = \beta_0 + \beta_1 \, \text{dose} + \beta_2 \, \text{supp} + \varepsilon$.
We can create a 90% confidence interval by passing standard formula
notation to exactt(). The significance level (alpha) is set to 0.1
here. Any argument we leave unspecified takes its default value:
- nBlocks = 5
- variables = NULL (all variables are of interest)
- nPerms = factorial(nBlocks)
- alpha = 0.05
- studentize = TRUE
- permutation = NULL
- optimize = FALSE
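As a quick check on the default nPerms = factorial(nBlocks), the sketch below computes the number of permutations implied by nBlocks = 5. It also shows the smallest p-value step under the assumption, not confirmed by the text above, that a p-value is reported as a share of the nPerms permutations.

```r
# Default number of blocks, as listed above
n_blocks <- 5

# Default number of permutations: every ordering of the blocks
n_perms <- factorial(n_blocks)
n_perms

# Assumption for illustration: p-values are shares of n_perms permutations,
# so they can only move in steps of 1 / n_perms
p_step <- 1 / n_perms
round(p_step, 5)
```

If that assumption holds, raising nBlocks makes the attainable p-values finer, at the cost of a rapidly growing number of permutations.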
exactt.1 <- exactt(model = len ~ dose + supp,
                   data = datasets::ToothGrowth,
                   alpha = 0.1)
print(exactt.1, digits = 5)
#>
#> Exact t-Test (Marginally Valid Tests)
#>
#> Call:
#> exactt(model = len ~ dose + supp, data = datasets::ToothGrowth,
#> alpha = 0.1)
#>
#> Summary:
#>        Estimate P-value Lower Bound Upper Bound
#> dose    9.76357 0.07500     3.31768    16.01241
#> suppVC -3.70000 0.26667   -10.98448     7.60653
To focus on specific coefficients, set the variables
parameter. The number entered corresponds to the index of the regressors
in the model (note that the intercept is never counted). For example,
set variables = 1 for dose, and set
variables = 2 for supp.
exactt.2 <- exactt(model = len ~ dose + supp,
                   data = datasets::ToothGrowth,
                   alpha = 0.1,
                   variables = 1)
print(exactt.2, digits = 5)
#>
#> Exact t-Test (Marginally Valid Tests)
#>
#> Call:
#> exactt(model = len ~ dose + supp, data = datasets::ToothGrowth,
#> alpha = 0.1, variables = 1)
#>
#> Summary:
#>      Estimate P-value Lower Bound Upper Bound
#> dose  9.76357   0.075     3.31768    16.01241
This creates a 90% confidence interval for dose only. The interval
is identical to the one obtained with variables = NULL (all
variables are of interest) because each confidence interval is
marginally valid.
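To see which index belongs to which regressor, it can help to list the columns of the design matrix without the intercept. The sketch below uses base R's model.matrix(). That exactt numbers regressors in this same column order is an assumption, although it matches the mapping stated above (1 for dose, 2 for supp).

```r
tg <- datasets::ToothGrowth

# Build the design matrix implied by the formula
design <- model.matrix(len ~ dose + supp, data = tg)

# Drop the intercept, which is never counted
regressors <- colnames(design)[colnames(design) != "(Intercept)"]

# Assumed mapping between the `variables` index and the regressor
data.frame(index = seq_along(regressors), regressor = regressors)
```

The same check is useful once factors or transformations appear in the formula, since a single term can then expand into several columns.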
The exactt() function is designed to allow for easy
modification of your model. For instance, you can treat a variable as
categorical, include polynomial terms, or apply other transformations
directly within the model formula. This flexibility helps tailor the
analysis to specific research questions without needing pre-transformed
data. To illustrate, consider treating dose as a
categorical variable to explore its discrete impact on tooth length:
exactt.3 <- exactt(model = len ~ as.factor(dose) + supp,
                   data = datasets::ToothGrowth,
                   alpha = 0.1)
exactt.3
#>
#> Exact t-Test (Marginally Valid Tests)
#>
#> Call:
#> exactt(model = len ~ as.factor(dose) + supp, data = datasets::ToothGrowth,
#> alpha = 0.1)
#>
#> Summary:
#>                  Estimate P-value Lower Bound Upper Bound
#> as.factor(dose)1    9.130 0.07500        -Inf         Inf
#> as.factor(dose)2   15.495 0.21667         -64    29.71429
#> suppVC             -3.700 0.95000        -Inf         Inf
The 90% confidence intervals here are not informative: those for
as.factor(dose)1 and suppVC are unbounded, and the one for
as.factor(dose)2 is very wide. The cause is suboptimal data
ordering, which can diminish the statistical power of the test. This
issue can be addressed by optimizing the data ordering.
The confidence intervals produced by the exactt()
function can change with the ordering of the data. Certain data
orderings can enhance statistical power, particularly when the sample
size is small and the number of blocks is large. The impact of
optimization is even more pronounced when dealing with categorical
variables, where appropriate ordering can substantially increase the
test’s power.
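The sketch below gives a rough sense of why row order can matter for a categorical regressor. It assumes, purely for illustration, that blocks are formed from consecutive rows of equal size. This blocking rule is hypothetical and may differ from what exactt does internally. ToothGrowth is stored sorted by supp, so under that rule most blocks contain a single supp level.

```r
tg <- datasets::ToothGrowth
n_blocks <- 5

# Hypothetical blocking rule: consecutive rows, equal-sized blocks
block <- rep(seq_len(n_blocks), each = nrow(tg) / n_blocks)

# Original ordering: count supp levels within each block
tab_original <- table(block, tg$supp)
tab_original

# A random reordering of the rows spreads the levels across blocks
set.seed(1)
shuffled <- tg[sample(nrow(tg)), ]
tab_shuffled <- table(block, shuffled$supp)
tab_shuffled
```

Under this assumed rule, only one block of the original ordering mixes OJ and VC, which is the kind of arrangement a better ordering would avoid.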
The exactt() function utilizes a genetic algorithm
(provided by the GA::ga() function) to optimize data
ordering. This approach systematically explores various data
arrangements to find the one that maximizes statistical power on
average.
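For readers unfamiliar with GA::ga(), here is a minimal sketch of a permutation-encoded search over row orderings. The fitness function `mixed_blocks()` is a toy stand-in invented for this example (it counts blocks that contain both supp levels, with blocks assumed to be consecutive rows). It is not the objective that exactt optimizes.

```r
library(GA)

tg <- datasets::ToothGrowth
n_blocks <- 5
# Hypothetical blocking rule: consecutive rows, equal-sized blocks
block <- rep(seq_len(n_blocks), each = nrow(tg) / n_blocks)

# Toy fitness (assumption): number of blocks in which both supp levels appear
mixed_blocks <- function(ordering) {
  supp <- tg$supp[ordering]
  sum(tapply(supp, block, function(s) length(unique(s)) == 2))
}

# Search over permutations of the row indices 1..nrow(tg)
fit <- ga(type = "permutation", fitness = mixed_blocks,
          lower = 1, upper = nrow(tg),
          maxiter = 5, seed = 2024, monitor = FALSE)

fit@fitnessValue  # best fitness found
fit@summary       # per-iteration summary of the population's fitness
```

The arguments maxiter, seed, and monitor used here are standard GA::ga() arguments.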
To activate the optimization feature, set
optimize = TRUE. Additionally, exactt() allows
for the specification of various parameters of the GA::ga()
function to tailor the optimization process. For instance, you can limit
the number of iterations with maxiter or specify the seed
with seed for reproducibility:
exactt.4 <- exactt(model = len ~ as.factor(dose) + supp,
                   data = datasets::ToothGrowth,
                   alpha = 0.1,
                   optimize = TRUE,
                   parallel = FALSE,
                   maxiter = 5,
                   seed = 2024)
#> ✔ Optimizing ordering for `as.factor(dose)1`.
#> ✔ Optimizing ordering for `as.factor(dose)2`.
#> ✔ Optimizing ordering for `suppVC`.
print(exactt.4, digits = 5)
#>
#> Exact t-Test (Marginally Valid Tests)
#>
#> Call:
#> exactt(model = len ~ as.factor(dose) + supp, data = datasets::ToothGrowth,
#> alpha = 0.1, optimize = TRUE, seed = 2024, parallel = FALSE,
#> maxiter = 5)
#>
#> Summary:
#>                  Estimate P-value Lower Bound Upper Bound
#> as.factor(dose)1    9.130 0.00833     4.44096    12.06245
#> as.factor(dose)2   15.495 0.00833    14.30794    16.97312
#> suppVC             -3.700 0.01667    -6.61671    -3.59551
Note that by optimizing the data ordering, exactt() is
now able to construct informative 90% confidence intervals for every
coefficient, including as.factor(dose)2 and suppVC. Furthermore,
the detailed results of the optimization process, including the
genetic algorithm’s configurations and outcomes for each variable,
are stored in exactt.4$gaResults. For instance, to review a summary of
the genetic algorithm’s performance for the suppVC
variable, use:
exactt.4$gaResults$suppVC@summary
#>           max     mean       q3   median       q1      min
#> [1,] 6.068181 4.164565 5.062314 4.052930 3.469694 2.298845
#> [2,] 7.411765 4.346684 5.408912 4.393609 3.322604 2.151261
#> [3,] 7.411765 4.343094 5.290857 4.175537 3.463937 1.098294
#> [4,] 7.411765 4.590258 5.369748 4.549311 3.632077 1.740592
#> [5,] 7.411765 4.449291 5.265123 4.493028 3.824316 1.774685
The exactt() function can also handle models with
instrumental variables (IV). In Example 15.5 of Wooldridge (2020),
Wooldridge reanalyzes Mroz (1987). This example explores the impact of
education (educ) on log(wage), using two parental education levels,
mother’s education (motheduc) and father’s education (fatheduc),
as instruments. The model controls for
experience (exper) and its square (expersq).
Education is the primary variable of interest, so we set
variables = 1. Optionally, as before, we can optimize the data ordering
to enhance statistical power.
exactt.iv <- exactt(model = lwage ~ educ + exper + expersq |
                      exper + expersq + motheduc + fatheduc,
                    data = wooldridge::mroz,
                    variables = 1,
                    optimize = TRUE,
                    parallel = FALSE,
                    maxiter = 10,
                    monitor = TRUE,
                    seed = 31740)
#> ✔ Optimizing ordering for `educ`.
#> GA | iter = 1 | Mean = 2653027 | Best = 3095490
#> GA | iter = 2 | Mean = 2700054 | Best = 3095490
#> GA | iter = 3 | Mean = 2734650 | Best = 3095490
#> GA | iter = 4 | Mean = 2729035 | Best = 3095490
#> GA | iter = 5 | Mean = 2745544 | Best = 3167506
#> GA | iter = 6 | Mean = 2768499 | Best = 3167506
#> GA | iter = 7 | Mean = 2704935 | Best = 3167506
#> GA | iter = 8 | Mean = 2722396 | Best = 3167506
#> GA | iter = 9 | Mean = 2742183 | Best = 3167506
#> GA | iter = 10 | Mean = 2693768 | Best = 3167506
exactt.iv
#>
#> Exact t-Test (Marginally Valid Tests)
#>
#> Call:
#> exactt(model = lwage ~ educ + exper + expersq | exper + expersq +
#> motheduc + fatheduc, data = wooldridge::mroz, variables = 1,
#> optimize = TRUE, seed = 31740, parallel = FALSE, maxiter = 10,
#> monitor = TRUE)
#>
#> Summary:
#>      Estimate P-value Lower Bound Upper Bound
#> educ   0.0614 0.18333    -0.04061     0.13679
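For intuition about what an instrument contributes in a formula of the form y ~ x | z, the following sketch computes a two-stage least squares point estimate by hand on simulated data. Everything in it (the data-generating process, the variable names, the true slope of 0.5) is made up for illustration. It uses base R only and does not reproduce how exactt performs inference in the IV case.

```r
set.seed(31740)
n <- 2000

# Simulated data (illustrative assumption, unrelated to wooldridge::mroz)
z <- rnorm(n)                    # instrument: shifts x, unrelated to u
u <- rnorm(n)                    # unobserved confounder
x <- 0.8 * z + u + rnorm(n)      # endogenous regressor
y <- 1 + 0.5 * x + u + rnorm(n)  # outcome; the true slope on x is 0.5

# Ordinary least squares is biased because x is correlated with u
ols <- unname(coef(lm(y ~ x))["x"])

# Stage 1: project x on the instrument; Stage 2: regress y on the projection
x_hat <- fitted(lm(x ~ z))
tsls <- unname(coef(lm(y ~ x_hat))["x_hat"])

c(ols = ols, tsls = tsls)
```

Only the point estimate from the second stage is meaningful here; the standard errors reported by the second-stage lm() are not valid for 2SLS.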