Logistic regression with outcome based on imperfect test results, and measured imperfectly or with “error”
We illustrate the model and methods developed in "McInturff P, Johnson WO, Cowling D, Gardner IA. Modelling risk when binary outcomes are subject to error. Stat Med. 2004 Apr 15;23(7):1095-109" using the smoking cessation example.
The following WinBUGS code is presented for the logistic regression model when the outcome is measured imperfectly or with “error”. It illustrates logistic regression modeling and inference where the binary outcome indicating “success” may be incorrect, and uses an informative prior on the regression coefficients.
The binary outcome may be 1, indicating the presence of infection for example, but in fact the infection is not present, and vice versa. In this instance, we incorporate information about the proportion of the time that the outcome will indicate the infection is there, when in fact it is (eg. the sensitivity) and also the proportion of the time that the outcome indicated that the infection is absent, when it is indeed absent (eg. the specificity). Independent beta priors are placed on these proportions which reflect available scientific knowledge about them. Replacing the priors for Se and Sp with Se = Sp = 1.0 results in a standard binomial regression analysis.
The latent or unobserved true infection status is unknown but still modeled as a logistic regression (or possibly a probit or complementary log log regression as desired). Thus globally changing logit to cloglog in the WinBUGS code below will convert the model to a cloglog regression model. Changing logit to g(·), when g(·) is a function supported by WinBUGS, will give an analysis corresponding to that link.
We use “Conditional Means Priors” (see Bedrick EJ, Christensen R, Johnson WO. A new perspective on priors for generalized linear models. J Am Stat Assoc. 1996 Dec;91(436):1450-60. for details) for the regression coefficients. The principle behind these priors is to specify a prior distribution on parameters that are easy for scientists to think about, rather than on regression coefficients which are not easy to think about directly. For example, consider a situation with a single covariate, say age. The logistic regression model relates the logit of the probability of event, say infection, as a linear function of age. There is an intercept and a slope corresponding to age. We require a joint prior distribution on the intercept and slope. We ask the experimenter to think about the prevalence of infection for two types of individuals, one who is of average age say, and one for an individual who is say one standard deviation above the mean. We elicit independent beta distributions on these two probabilities. Because there is a one to one correspondence between these two probabilities and the regression intercept and slope, we are able to induce an informative prior distribution on the regression coefficients. In general, this is accomplished for say p-1 covariates by specifying independent beta priors on p probabilities of infection corresponding to distinct covariate combinations. These are specified in WinBugs. All that remains is to relate the regression coefficients to these probabilities in WinBugs. Again, see Bedrick et al for details. These priors result in data augmentation priors in the case of logistic regression.
There are N = 361 observations in the data below. The prior is a BCJ prior and is elicited based on p-1 = 5 covariates so there are 6 probabilities that are used to elicit a prior on the 6 regression coefficients. These probabilities are indexed as p[1] to p[6]. The data are given below. They are taken from McInturff et al. The results here differ from the paper slightly due to the fact that this was a separate monte carlo run.
# generall notations
# N is the number of individuals in the data set,
# y is a 0,1 binary (or Bernoulli) outcome,
# q is the probability that y = 1,
# pi is the true probability of D (“Disease or Infection”),
# beta[1] through beta[6] are the regression coefficients,
# x2 through x6 are covariates,
# xinv is the inverse of the covariate combination matrix, which
# must be calculated elsewhere and entered as data.
# variables key
# y = success in cessation program
# x1 = 1 (intercept)
# x2 = Age level 2: 20-29 years
# x3 = Age level 3: >= 30 years
# x4 = Education level 2: HS grad
# x5 = Education level 3: some college
# x6 = smoking history < 1 pack/day
model {
for (i in 1:N) {
y[i] ~ dbern(q[i]) # specification of the LR model with error
q[i] <- pi[i]*Se+(1-pi[i])*(1-Sp)
logit(pi[i]) <- beta[1] + x2[i]*beta[2] +
x3[i]*beta[3] + x4[i]*beta[4] + x5[i]*beta[5] + x6[i]*beta[6]
}
Se ~ dbeta(99, 1) # the priors are specified for Se and Sp
Sp ~ dbeta(14, 2)
p[1] ~ dbeta(8, 15) # specification of the prior on 6 probabilities of smoking
p[2] ~ dbeta(10, 15) # cessation for 6 covariate combinations
p[3] ~ dbeta(3, 13)
p[4] ~ dbeta(8, 10)
p[5] ~ dbeta(4, 15)
p[6] ~ dbeta(6, 15)
# relate the regression coefficients to the probabilities on which prior was specified
# this results in an induced informative prior on the regression coefficients
beta[1] <- xinv[1,1]*logit(p[1]) + xinv[1,2]*logit(p[2])
+ xinv[1,3]*logit(p[3]) + xinv[1,4]*logit(p[4])
+ xinv[1,5]*logit(p[5]) + xinv[1,6]*logit(p[6])
beta[2] <- xinv[2,1]*logit(p[1]) + xinv[2,2]*logit(p[2])
+ xinv[2,3]*logit(p[3]) + xinv[2,4]*logit(p[4])
+ xinv[2,5]*logit(p[5]) + xinv[2,6]*logit(p[6])
beta[3] <- xinv[3,1]*logit(p[1]) + xinv[3,2]*logit(p[2])
+ xinv[3,3]*logit(p[3]) + xinv[3,4]*logit(p[4])
+ xinv[3,5]*logit(p[5]) + xinv[3,6]*logit(p[6])
beta[4] <- xinv[4,1]*logit(p[1]) + xinv[4,2]*logit(p[2])
+ xinv[4,3]*logit(p[3]) + xinv[4,4]*logit(p[4])
+ xinv[4,5]*logit(p[5]) + xinv[4,6]*logit(p[6])
beta[5] <- xinv[5,1]*logit(p[1]) + xinv[5,2]*logit(p[2])
+ xinv[5,3]*logit(p[3]) + xinv[5,4]*logit(p[4])
+ xinv[5,5]*logit(p[5]) + xinv[5,6]*logit(p[6])
beta[6] <- xinv[6,1]*logit(p[1]) + xinv[6,2]*logit(p[2])
+ xinv[6,3]*logit(p[3]) + xinv[6,4]*logit(p[4])
+ xinv[6,5]*logit(p[5]) + xinv[6,6]*logit(p[6])
}
# inits
list(Se = 0.95, Sp = 0.93, p = c(0.33, 0.39, 0.14, 0.44, 0.18, 0.26))
list(y=c(0,1,0,1,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,
0,0,0,1,0,0,1,0,0,0,1,0,1,1,1,0,1,0,1,0,0,0,1,0,0,0,0,1,0,1,0,1,1,0,0,
0,0,0,0,1,0,1,0,1,0,1,1,1,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,1,0,1,1,0,1,
0,0,0,0,1,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,1,0,0,0,1,
0,0,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,0,
0,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,1,1,0,0,0,0,0,1,
1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,
1,0,0,0,1,0,1,0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,0,1,1,0,0,0,0,1,0,1,1,0,
0,0,1,1,0,1,0,1,1,0,0,1,0,0,1,1,0,0,1,1,0,1,0,0,0,0,0,0,1,1,1,0,1,0,0,
0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,1,1,0,1,0,0,0,0,1,1,0,0,0,0,0,1,0,0,1,0,
0,0,1,0,0,0,0,0,0,0,0,1,0,1,1),
x2 = c(0,1,1,1,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,0,0,
0,1,1,1,0,1,1,1,1,1,1,0,0,1,1,0,1,1,1,1,0,0,1,1,0,0,1,0,1,1,0,1,1,1,1,
1,1,1,1,0,0,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,
0,0,0,0,1,0,1,1,1,1,1,1,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,
0,0,1,1,1,0,1,1,1,1,0,1,1,1,1,0,0,1,1,0,1,1,1,1,0,0,1,1,0,0,1,1,0,0,1,
1,1,1,0,0,1,1,0,1,0,1,1,0,1,1,1,1,0,1,1,1,0,1,0,1,0,0,1,1,1,1,1,1,0,1,
1,1,1,0,1,0,0,0,1,1,0,1,1,0,1,1,1,0,1,0,1,1,0,1,0,1,0,0,0,1,1,1,1,1,1,
1,1,1,1,0,1,1,1,1,0,1,1,1,0,0,1,1,1,1,1,1,1,0,1,1,0,1,1,0,1,0,0,1,0,1,
0,0,0,0,0,1,0,1,0,1,1,0,1,1,0,1,1,1,0,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,
0,1,0,0,1,1,0,1,0,1,1,1,0,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,0,0,1,1,0,1,1,1,0,0,0,0,1),
x3 = c(1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,
1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,1,0,0,0,0,0,0,0,
0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,
1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,
1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,1,1,0,0,1,1,0,
0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,
1,1,0,1,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,
1,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,0,0,0,1,0,0),
x4 = c(0,1,1,1,0,1,1,1,0,1,1,0,1,0,0,1,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,0,
0,0,1,0,0,0,1,1,0,1,0,0,0,1,1,0,0,0,1,0,1,0,0,1,1,0,0,1,0,1,0,0,1,1,1,
0,1,1,0,0,0,1,1,1,1,1,0,1,0,0,0,1,1,0,0,0,0,0,0,1,1,0,0,1,1,0,0,1,1,1,
0,0,0,1,0,1,1,1,0,1,0,0,1,1,0,0,0,0,0,1,1,1,0,0,0,1,1,0,0,1,1,0,0,0,1,
1,1,1,1,0,0,0,1,1,0,1,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,1,1,0,1,0,
0,1,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,1,1,0,0,1,1,1,1,1,0,0,0,0,
1,0,1,0,0,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0,1,0,0,0,0,1,0,0,0,0,1,0,1,1,1,
1,0,0,1,0,1,1,0,1,0,1,1,1,1,0,1,0,0,1,0,0,1,1,1,1,1,0,1,1,0,1,1,0,0,1,
1,0,0,1,1,0,0,0,0,0,1,1,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,1,1,1,0,1,1,0,0,
1,1,0,0,1,1,0,1,0,1,1,1,1,0,1,1,0,1,1,0,0,1,1,0,1,1,1,0,1,1,1,0,1,1,0,
1,0,1,1,1,1,1,1,0,0,0,1,0,1),
x5 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,
1,0,0,1,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,
1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0,1,1,0,0,1,1,0,0,0,
1,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,1,1,0,
0,0,0,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,0,1,1,0,1,0,0,1,1,1,1,0,0,0,0,0,0,
0,0,0,1,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,
0,1,0,0,0,0,1,1,1,0,0,0,1,1,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,
0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,
0,1,0,0,0,0,0,0,0,0,0,0,0,0),
x6 = c(0,1,1,1,0,1,1,0,1,1,1,0,1,1,0,1,0,1,1,1,1,0,1,0,1,1,1,1,1,1,1,0,
0,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,0,0,1,0,0,0,1,1,0,1,0,
1,0,1,1,0,1,1,0,1,1,1,1,0,1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,
1,1,0,0,1,0,1,0,1,1,1,0,1,0,1,0,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,
0,1,1,1,0,0,1,1,0,1,1,1,0,0,1,1,1,1,1,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,
0,0,1,1,1,1,0,0,1,0,0,1,1,0,1,1,0,0,1,1,1,1,1,0,0,0,1,1,1,0,1,0,1,1,1,
1,0,1,1,1,1,0,1,1,0,1,1,1,0,0,0,1,0,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,
0,1,1,1,0,1,1,1,1,1,0,1,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,0,1,0,0,1,0,0,
1,1,1,0,1,1,1,1,1,0,1,0,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,
1,1,1,1,0,0,1,0,1,1,0,0,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,1,1,0,0,
0,1,1,1,1,0,0,1,1,1,1,1,1,1),
xinv=structure(.Data=c(-1.0, 1.94E-16, 3.33E-16, 2.57E-16, 1.0, 1.0,
-7.77E-16, 2.50E-16, 1.0, -4.86E-17, 3.33E-16, 1.0,-1.0, 1.0, 1.0,
-3.75E-16, -3.33E-16, 1.0,1.0, -5.83E-16, -5.55E-16, -9.02E-17, -1.0,
8.88E-16,4.16E-16, -6.11E-16, -3.89E-16, 1.0, -1.0, 5.41E-16,1.0,
1.67E-16, -1.0, 3.33E-16, 5.55E-17, 7.77E-16),.Dim=c(6,6)), N=361)
node |
mean |
sd |
MC error |
2.50% |
median |
97.50% |
start |
sample |
Se |
0.9889 |
0.0112 |
1.87E-04 |
0.9581 |
0.9923 |
0.9997 |
1001 |
10000 |
Sp |
0.8846 |
0.05832 |
0.001576 |
0.7586 |
0.8904 |
0.98 |
1001 |
10000 |
beta[1] |
-1.28 |
0.4824 |
0.009809 |
-2.378 |
-1.224 |
-0.4945 |
1001 |
10000 |
beta[2] |
-1.794 |
0.4204 |
0.008314 |
-2.759 |
-1.746 |
-1.109 |
1001 |
10000 |
beta[3] |
-1.814 |
0.5734 |
0.01031 |
-3.17 |
-1.732 |
-0.8912 |
1001 |
10000 |
beta[4] |
0.8832 |
0.3738 |
0.004783 |
0.2234 |
0.8552 |
1.688 |
1001 |
10000 |
beta[5] |
0.4718 |
0.4658 |
0.006418 |
-0.3917 |
0.4457 |
1.445 |
1001 |
10000 |
beta[6] |
1.225 |
0.352 |
0.004054 |
0.5862 |
1.209 |
1.989 |
1001 |
10000 |
p[1] |
0.4578 |
0.06514 |
0.001163 |
0.3356 |
0.4555 |
0.5909 |
1001 |
10000 |
p[2] |
0.4537 |
0.08422 |
0.001083 |
0.2842 |
0.4554 |
0.6153 |
1001 |
10000 |
p[3] |
0.2036 |
0.05354 |
8.07E-04 |
0.1106 |
0.1994 |
0.3207 |
1001 |
10000 |
p[4] |
0.3621 |
0.08063 |
0.001202 |
0.2122 |
0.3593 |
0.5298 |
1001 |
10000 |
p[5] |
0.2646 |
0.071 |
0.001258 |
0.1368 |
0.2614 |
0.4161 |
1001 |
10000 |
p[6] |
0.4039 |
0.05669 |
0.00158 |
0.2743 |
0.411 |
0.4932 |
1001 |
10000 |