Exact inference for a real-valued function of multinomial parameters
Source:R/xactonomial.R
xactonomial.Rd
We consider the k sample multinomial problem where we observe k vectors (possibly of different lengths), each representing an independent sample from a multinomial. For a given function psi which takes in the concatenated vector of multinomial probabilities and outputs a real number, we are interested in constructing a confidence interval for psi.
Usage
xactonomial(
psi,
data,
psi0 = NULL,
alternative = c("two.sided", "less", "greater"),
alpha = 0.05,
psi_limits,
maxit = 50,
chunksize = 500,
conf.int = TRUE,
psi_is_vectorized = FALSE
)
Arguments
- psi
Function that takes in a vector of parameters and outputs a real valued number
- data
A list with k elements representing the vectors of counts of a k-sample multinomial
- psi0
The null hypothesis value for the parameter being tested. A p value for a test of psi <= psi0 is computed. If NULL only a confidence interval is computed.
- alternative
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"
- alpha
A 1 - alpha percent confidence interval will be computed
- psi_limits
A vector of length 2 giving the lower and upper limits of the range of \(\psi(\theta)\)
- maxit
Maximum number of iterations of the stochastic procedure
- chunksize
The number of samples taken from the parameter space at each iteration
- conf.int
Logical. If FALSE, no confidence interval is calculated, only the p-value.
- psi_is_vectorized
Logical. If TRUE, expect that psi can take a matrix as input, and return a vector of length the number of rows, computing the statistic for each row of the matrix. If possible, this will substantially speed up the computation. See examples.
Details
Let \(T_j\) be distributed
\(\mbox{Multinomial}_{d_j}(\boldsymbol{\theta}_j, n_j)\) for \(j = 1,
\ldots, k\) and denote \(\boldsymbol{T} = (T_1, \ldots, T_k)\) and
\(\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)\). The subscript
\(d_j\) denotes the dimension of the multinomial. Suppose one is interested
in the parameter \(\psi(\boldsymbol{\theta}) \in \Theta \subseteq
\mathbb{R}\). Given a sample of size \(n\) from \(\boldsymbol{T}\), one
can estimate \(\boldsymbol{\theta}\) with the sample proportions as
\(\hat{\boldsymbol{\theta}}\) and hence
\(\pi(\hat{\boldsymbol{\theta}})\). This function constructs a \(1 -
\alpha\) percent confidence interval for \(\psi(\boldsymbol{\theta})\) and
provides a function to calculate a p value for a test of the null hypothesis
\(H_0: \psi(\boldsymbol{\theta}) \neq \psi_0\) for the two sided case,
\(H_0: \psi(\boldsymbol{\theta}) \leq \psi_0\) for the case alternative = "greater"
, and
\(H_0: \psi(\boldsymbol{\theta}) \geq \psi_0\) for the case alternative = "less"
.
We make no assumptions and do not rely on large sample approximations.
The computation is somewhat involved so it is best for small sample sizes.
Examples
psi_ba <- function(theta) {
theta1 <- theta[1:4]
theta2 <- theta[5:8]
sum(sqrt(theta1 * theta2))
}
data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3))
xactonomial(psi_ba, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20)
#> $estimate
#> [1] 0.7995291
#>
#> $conf.int
#> [1] 0.5975000 0.9970571
#>
#> $p.value
#> [1] NA
#>
psi_ba_v <- function(theta) {
theta1 <- theta[,1:4, drop = FALSE]
theta2 <- theta[,5:8, drop = FALSE]
rowSums(sqrt(theta1 * theta2))
}
data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3))
xactonomial(psi_ba_v, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20, psi_is_vectorized = TRUE)
#> $estimate
#> [1] 0.7995291
#>
#> $conf.int
#> [1] 0.6225000 0.9922135
#>
#> $p.value
#> [1] NA
#>