Skip to contents

We consider the k sample multinomial problem where we observe k vectors (possibly of different lengths), each representing an independent sample from a multinomial. For a given function psi which takes in the concatenated vector of multinomial probabilities and outputs a real number, we are interested in constructing a confidence interval for psi.

Usage

xactonomial(
  psi,
  data,
  psi0 = NULL,
  alternative = c("two.sided", "less", "greater"),
  alpha = 0.05,
  psi_limits,
  maxit = 50,
  chunksize = 500,
  conf.int = TRUE,
  psi_is_vectorized = FALSE
)

Arguments

psi

Function that takes in a vector of parameters and outputs a real valued number

data

A list with k elements representing the vectors of counts of a k-sample multinomial

psi0

The null hypothesis value for the parameter being tested. A p value for a test of psi <= psi0 is computed. If NULL only a confidence interval is computed.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"

alpha

A 1 - alpha percent confidence interval will be computed

psi_limits

A vector of length 2 giving the lower and upper limits of the range of \(\psi(\theta)\)

maxit

Maximum number of iterations of the stochastic procedure

chunksize

The number of samples taken from the parameter space at each iteration

conf.int

Logical. If FALSE, no confidence interval is calculated, only the p-value.

psi_is_vectorized

Logical. If TRUE, expect that psi can take a matrix as input, and return a vector of length the number of rows, computing the statistic for each row of the matrix. If possible, this will substantially speed up the computation. See examples.

Value

A list with 3 elements: the estimate, the 1 - alpha percent confidence interval, and p-value

Details

Let \(T_j\) be distributed \(\mbox{Multinomial}_{d_j}(\boldsymbol{\theta}_j, n_j)\) for \(j = 1, \ldots, k\) and denote \(\boldsymbol{T} = (T_1, \ldots, T_k)\) and \(\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)\). The subscript \(d_j\) denotes the dimension of the multinomial. Suppose one is interested in the parameter \(\psi(\boldsymbol{\theta}) \in \Theta \subseteq \mathbb{R}\). Given a sample of size \(n\) from \(\boldsymbol{T}\), one can estimate \(\boldsymbol{\theta}\) with the sample proportions as \(\hat{\boldsymbol{\theta}}\) and hence \(\pi(\hat{\boldsymbol{\theta}})\). This function constructs a \(1 - \alpha\) percent confidence interval for \(\psi(\boldsymbol{\theta})\) and provides a function to calculate a p value for a test of the null hypothesis \(H_0: \psi(\boldsymbol{\theta}) \neq \psi_0\) for the two sided case, \(H_0: \psi(\boldsymbol{\theta}) \leq \psi_0\) for the case alternative = "greater", and \(H_0: \psi(\boldsymbol{\theta}) \geq \psi_0\) for the case alternative = "less". We make no assumptions and do not rely on large sample approximations. The computation is somewhat involved so it is best for small sample sizes.

Examples

psi_ba <- function(theta) {
  theta1 <- theta[1:4]
  theta2 <- theta[5:8]
  sum(sqrt(theta1 * theta2))
  }
data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3))
xactonomial(psi_ba, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20)
#> $estimate
#> [1] 0.7995291
#> 
#> $conf.int
#> [1] 0.5975000 0.9970571
#> 
#> $p.value
#> [1] NA
#> 

psi_ba_v <- function(theta) {
theta1 <- theta[,1:4, drop = FALSE]
theta2 <- theta[,5:8, drop = FALSE]
rowSums(sqrt(theta1 * theta2))
}
data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3))
xactonomial(psi_ba_v, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20, psi_is_vectorized = TRUE)
#> $estimate
#> [1] 0.7995291
#> 
#> $conf.int
#> [1] 0.6225000 0.9922135
#> 
#> $p.value
#> [1] NA
#>