The exactt package tests whether a slope coefficient is equal to some null value using the novel method described in Pouliot (2023). Importantly, inverting such a test produces a marginally valid confidence interval.
The exactt package is hosted on GitHub at https://github.com/ian-xu-economics/exactt/. It can be installed using the remotes::install_github() function:
# install.packages("remotes")
remotes::install_github("ian-xu-economics/exactt")
To cite the exactt package in publications, use the citation() function, which provides both the text version and the BibTeX entry for referencing:
citation("exactt")
After installing exactt, we can attach the package to our session using the base library() function:
library(exactt)
To compute the confidence interval, use the exactt() function. Here’s an example looking at the effect of vitamin C on tooth growth in guinea pigs using data from datasets::ToothGrowth. We’ll investigate how supp (supplement type: orange juice (OJ) or ascorbic acid (VC)) and dose (in milligrams/day) relate to len (tooth length).
summary(datasets::ToothGrowth)
#>       len        supp         dose
#>  Min.   : 4.20   OJ:30   Min.   :0.500
#>  1st Qu.:13.07   VC:30   1st Qu.:0.500
#>  Median :19.25           Median :1.000
#>  Mean   :18.81           Mean   :1.167
#>  3rd Qu.:25.27           3rd Qu.:2.000
#>  Max.   :33.90           Max.   :2.000
Suppose our model is $\text{len}_i = \beta_0 + \beta_1 \text{dose}_i + \beta_2 \text{supp}_i + \varepsilon_i$. We can create a 90% confidence interval by passing standard formula notation to exactt(). The level of significance (alpha) equals 0.1 here, but if we choose not to specify any additional parameters, then by default:
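The number of blocks is 5 (nBlocks = 5).
All variables are of interest (variables = NULL).
The number of permutations is the factorial of the number of blocks (nPerms = factorial(nBlocks)).
The level of significance is 0.05 (alpha = 0.05).
Studentization is turned on (studentize = TRUE).
No permutation is supplied (permutation = NULL).
The data ordering is not optimized (optimize = FALSE).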
exactt.1 <- exactt(model = len ~ dose + supp,
                   data = datasets::ToothGrowth,
                   alpha = 0.1)
print(exactt.1, digits = 5)
#>
#> Call:
#> exactt(model = len ~ dose + supp, data = datasets::ToothGrowth,
#> alpha = 0.1)
#>
#>
#> Summary:
#>        Estimate P-value Lower Bound Upper Bound
#> dose     9.7636 0.07500      3.3177     16.0120
#> suppVC  -3.7000 0.26667    -10.9840      7.6065
To focus on specific coefficients, set the variables parameter. The number entered corresponds to the index of the regressors in the model (note that the intercept is never counted). For example, set variables = 1 for dose, and set variables = 2 for supp.
exactt.2 <- exactt(model = len ~ dose + supp,
                   data = datasets::ToothGrowth,
                   alpha = 0.1,
                   variables = 1)
print(exactt.2, digits = 5)
#>
#> Call:
#> exactt(model = len ~ dose + supp, data = datasets::ToothGrowth,
#> alpha = 0.1, variables = 1)
#>
#>
#> Summary:
#>      Estimate P-value Lower Bound Upper Bound
#> dose   9.7636   0.075      3.3177      16.012
This creates a 90% confidence interval for dose only. The interval is identical to the one obtained with variables = NULL (all variables are of interest) because these confidence intervals are marginally valid: each one is valid for its own coefficient separately, not jointly with the others.
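Because exactt() builds its intervals by inverting a test, it may help to see test inversion in isolation. The sketch below is only an illustration under stated assumptions: it inverts the classical t-test for the dose coefficient from lm(), not the test of Pouliot (2023), and the grid of candidate null values and all variable names are made up for the example. The interval is the set of null values that the test fails to reject at level alpha.

```r
# Illustration only: invert the classical t-test from lm(), NOT the exactt test.
fit <- lm(len ~ dose + supp, data = datasets::ToothGrowth)
coefs <- coef(summary(fit))
est <- coefs["dose", "Estimate"]
se <- coefs["dose", "Std. Error"]
df <- df.residual(fit)
alpha <- 0.1

# Two-sided p-value for H0: beta_dose = b0
p_value <- function(b0) 2 * pt(-abs((est - b0) / se), df)

# Hypothetical grid of candidate null values centred on the estimate
grid <- seq(est - 5 * se, est + 5 * se, length.out = 2001)
accepted <- grid[vapply(grid, p_value, numeric(1)) > alpha]

# The inverted interval is the range of non-rejected null values
ci_inverted <- range(accepted)
ci_classical <- as.numeric(confint(fit, "dose", level = 1 - alpha))
rbind(ci_inverted, ci_classical)
```

Up to the resolution of the grid, the two rows agree, which is the sense in which a confidence interval and a family of tests are two views of the same object.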
The exactt() function is designed to allow for easy modification of your model. For instance, you can treat a variable as categorical, include polynomial terms, or apply other transformations directly within the model formula. This flexibility helps tailor the analysis to specific research questions without needing pre-transformed data. To illustrate, consider treating dose as a categorical variable to explore its discrete impact on tooth length:
exactt.3 <- exactt(model = len ~ as.factor(dose) + supp,
                   data = datasets::ToothGrowth,
                   alpha = 0.1)
exactt.3
#>
#> Call:
#> exactt(model = len ~ as.factor(dose) + supp, data = datasets::ToothGrowth,
#> alpha = 0.1)
#>
#>
#> Summary:
#>                  Estimate P-value Lower Bound Upper Bound
#> as.factor(dose)1     9.13  0.0500       5.644       19.12
#> as.factor(dose)2    15.49  0.6583        -Inf         Inf
#> suppVC              -3.70  0.4333        -Inf         Inf
The 90% confidence intervals for as.factor(dose)2 and suppVC are not informative: both span (-Inf, Inf). This is due to suboptimal data ordering, which can diminish the statistical power of the test. The issue can be addressed by optimizing the data ordering.
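A side observation about the p-values printed so far: each one is, up to rounding, a multiple of 1/120, which is consistent with the default nPerms = factorial(nBlocks) = factorial(5). The short check below only does the arithmetic. Reading this as "the p-value is a count over nPerms permutations divided by nPerms" is an assumption on our part, not something the package documentation quoted here states.

```r
n_blocks <- 5
n_perms <- factorial(n_blocks)  # 120 under the default nPerms = factorial(nBlocks)

# P-values copied from the printed summaries above
reported <- c(0.07500, 0.26667, 0.0500, 0.6583, 0.4333)

# Nearest multiple of 1 / n_perms for each reported p-value
counts <- round(reported * n_perms)
data.frame(reported, counts, reconstructed = counts / n_perms)
```

If that reading is right, the smallest p-value attainable with the defaults would be 1/120, so nBlocks and nPerms bound how finely the test can resolve evidence against the null.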
The confidence intervals produced by the exactt() function can change with the ordering of the data. Certain data orderings can enhance statistical power, particularly when the sample size is small and the number of blocks is large. The impact of optimization is even more pronounced when dealing with categorical variables, where appropriate ordering can substantially increase the test’s power.
The exactt() function utilizes a genetic algorithm (provided by the GA::ga() function) to optimize data ordering. This approach systematically explores various data arrangements to find the one that maximizes statistical power on average.
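To give a feel for how a genetic algorithm searches over orderings, here is a minimal sketch that calls GA::ga() directly with type = "permutation". The fitness function block_spread() is a hypothetical stand-in invented for this example: it rewards orderings whose consecutive blocks have very different mean doses. It is not the objective that exactt() optimizes, and the block construction is likewise an assumption.

```r
# Hypothetical fitness: variance of the block means of x after reordering.
# This is a toy objective, NOT the criterion used inside exactt().
block_spread <- function(ord, x, n_blocks = 5) {
  n <- length(ord)
  block <- ceiling(seq_len(n) * n_blocks / n)  # consecutive, equal-sized blocks
  var(as.numeric(tapply(x[ord], block, mean)))
}

x <- datasets::ToothGrowth$dose

# Run the search only if the GA package is available
if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(type = "permutation",
                fitness = block_spread, x = x,  # x is forwarded to the fitness function
                lower = 1, upper = length(x),
                maxiter = 5, seed = 2024, monitor = FALSE)
  best_order <- as.integer(res@solution[1, ])   # one of the best orderings found
  print(res@summary)                            # per-iteration fitness summary
}
```

The @summary slot printed here has the same layout (max, mean, q3, median, q1, min per iteration) as the matrices exactt() stores for each variable.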
To activate the optimization feature, set optimize = TRUE. Additionally, exactt() allows for the specification of various parameters of the GA::ga() function to tailor the optimization process. For instance, you can limit the number of iterations with maxiter or specify the seed with seed for reproducibility:
exactt.4 <- exactt(model = len ~ as.factor(dose) + supp,
                   data = datasets::ToothGrowth,
                   alpha = 0.1,
                   optimize = TRUE,
                   parallel = FALSE,
                   maxiter = 5,
                   seed = 2024)
#> ✔ Optimizing ordering for `as.factor(dose)1`.
#> ✔ Optimizing ordering for `as.factor(dose)2`.
#> ✔ Optimizing ordering for `suppVC`.
print(exactt.4, digits = 5)
#>
#> Call:
#> exactt(model = len ~ as.factor(dose) + supp, data = datasets::ToothGrowth,
#> alpha = 0.1, optimize = TRUE, seed = 2024, parallel = FALSE,
#> maxiter = 5)
#>
#>
#> Summary:
#>                  Estimate   P-value Lower Bound Upper Bound
#> as.factor(dose)1    9.130 0.0083333      4.4410     12.0620
#> as.factor(dose)2   15.495 0.0083333     14.3080     16.9730
#> suppVC             -3.700 0.0166670     -6.6167     -3.5955
Note that after optimizing the data ordering, exactt() constructs informative 90% confidence intervals for every coefficient, including as.factor(dose)2 and suppVC, whose intervals were unbounded before. Furthermore, the detailed results of the optimization process, including the genetic algorithm’s configuration and outcomes for each variable, are stored in exactt.4$gaResults. For instance, to review a summary of the genetic algorithm’s performance for the suppVC variable, use:
exactt.4$gaResults$suppVC@summary
#>           max     mean       q3   median       q1      min
#> [1,] 6.068181 4.164565 5.062314 4.052930 3.469694 2.298845
#> [2,] 7.411765 4.346684 5.408912 4.393609 3.322604 2.151261
#> [3,] 7.411765 4.343094 5.290857 4.175537 3.463937 1.098294
#> [4,] 7.411765 4.590258 5.369748 4.549311 3.632077 1.740592
#> [5,] 7.411765 4.449291 5.265123 4.493028 3.824316 1.774685
The exactt() function can also handle models with instrumental variables (IV). In Example 15.5 of Wooldridge (2020), Wooldridge reanalyzes Mroz (1987). This example estimates the effect of education (educ) on log(wage), using two parental education levels, mother’s education (motheduc) and father’s education (fatheduc), as instruments. The model controls for experience (exper) and its square (expersq). In the model formula, the regressors are listed before | and the exogenous regressors together with the instruments are listed after it. Education is the primary variable of interest, so we set variables = 1. Optionally, as before, we can optimize the data ordering to enhance statistical power.
exactt.iv <- exactt(model = lwage ~ educ + exper + expersq | exper + expersq + motheduc + fatheduc,
                    data = wooldridge::mroz,
                    variables = 1,
                    optimize = TRUE,
                    parallel = FALSE,
                    maxiter = 10,
                    monitor = TRUE,
                    seed = 31740)
#> ✔ Optimizing ordering for `educ`.
#> GA | iter = 1 | Mean = 2653027 | Best = 3095490
#> GA | iter = 2 | Mean = 2700054 | Best = 3095490
#> GA | iter = 3 | Mean = 2734650 | Best = 3095490
#> GA | iter = 4 | Mean = 2729035 | Best = 3095490
#> GA | iter = 5 | Mean = 2745544 | Best = 3167506
#> GA | iter = 6 | Mean = 2768499 | Best = 3167506
#> GA | iter = 7 | Mean = 2704935 | Best = 3167506
#> GA | iter = 8 | Mean = 2722396 | Best = 3167506
#> GA | iter = 9 | Mean = 2742183 | Best = 3167506
#> GA | iter = 10 | Mean = 2693768 | Best = 3167506
exactt.iv
#>
#> Call:
#> exactt(model = lwage ~ educ + exper + expersq | exper + expersq +
#> motheduc + fatheduc, data = wooldridge::mroz, variables = 1,
#> optimize = TRUE, seed = 31740, parallel = FALSE, maxiter = 10,
#> monitor = TRUE)
#>
#>
#> Summary:
#>      Estimate P-value Lower Bound Upper Bound
#> educ   0.0614  0.1833    -0.04061      0.1368
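For orientation, the point estimate can be cross-checked against a textbook two-stage least squares (2SLS) fit in base R. This is a sketch under stated assumptions, not a description of what exactt() does internally: it reproduces classical 2SLS by hand, assumes the wooldridge package is installed, and drops the rows with a missing lwage before fitting. The names stage1, stage2, educ_hat, and beta_2sls are ours.

```r
if (requireNamespace("wooldridge", quietly = TRUE)) {
  mroz <- wooldridge::mroz
  mroz <- mroz[!is.na(mroz$lwage), ]  # keep observations with an observed wage

  # Stage 1: project the endogenous regressor on the exogenous regressors and instruments
  stage1 <- lm(educ ~ exper + expersq + motheduc + fatheduc, data = mroz)
  mroz$educ_hat <- fitted(stage1)

  # Stage 2: replace educ with its first-stage fitted values
  # (the standard errors reported by this second lm() are not valid 2SLS standard errors)
  stage2 <- lm(lwage ~ educ_hat + exper + expersq, data = mroz)
  beta_2sls <- unname(coef(stage2)["educ_hat"])
  print(beta_2sls)  # compare with the Estimate column printed by exactt()
}
```

Only the coefficient is comparable here; the p-value and bounds reported by exactt() come from its own test, which this sketch does not attempt to reproduce.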