The current tutorial demonstrates the Bayesian workflow to model shedding data using Rstan. We use the longitudinal SARS-CoV-2 fecal shedding data from Wölfel et al. (2020). The data includes observations of SARS-CoV-2 RNA concentration in 82 stool samples from 9 patients. The date of sample collection spans from Day 3 to Day 22 after symptom onset. We load the data in R with:
shedding.data <- data.frame(subjects = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9),
day = c(8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 19, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 7, 8, 9, 10, 11, 12, 14, 15, 16, 19, 20, 6, 8, 9, 12, 20, 21, 26, 6, 8, 9, 13, 16, 18, 21, 22, 6, 18, 19, 20, 23, 24, 26, 3, 4, 5, 8, 9, 12, 8, 9, 10),
value = c(6.64, 7.16, 6.68, 4.67, 4.7, 3.67, 3.7, 3.73, 3.01, 2.42, 3.46, 2.8, 5.74, NA, 4.88, 4.77, 3.63, 3.25, 2.83, 3.84, 3.87, 3.11, 2.28, 6.1, 6.03, 6.79, 6.62, 4.68, 5.1, 5.79, 4.3, 4.89, 3.64, 4.71, 4.75, 4.09, 2.33, NA, NA, NA, 6.05, 7.23, 7.51, 5.57, 4.43, 5.15, 4.88, 4.25, 3.98, 3.53, 3.56, 4.68, 2.78, 2.99, 4.27, 2.57, 2.95, NA, 3.88, 3.78, 4.09, 4.23, NA, 4.58, NA, 2.57, 3.66, 2.59, NA, NA, 2.04, NA, 1.97, 4.68, 4.33, 3.61, 3.99, 2.71, NA, NA, NA, NA))
censorlimit = 2 ##log10 scale limit of quantification
As most samples collected in shedding studies are several days after symptom onset when the number of pathogens shed is decreasing. In this tutorial, we focus on modeling the decay phase of the shedding dynamics. We consider two classical models, exponential decay model and gamma model.
When the concentration of pathogen shed c(t) is subject to exponential decay, it decreases at a rate \(a_{0}\) proportional to the current value:
\[c(t)=c_{0}e^{-a_{0}t},\] Where \(c_{0}\) is the concentration of pathogen shed at symptom onset and t is the day after symptom onset.
When there are samples from multiple subjects, we can develop a hierarchical model:
\[c_i(t)=c_{i,0}e^{-a_{i,0}t},\]
for subject i. \((log(a_{i,0}),log(c_{i,0})) \sim \mathcal{N}(\mathbf{\mu},\mathbf{\Omega^{-1}})\), where \(\mathbf{\mu}=(\mu_a,\mu_c)\). And we use the Cholesky parameterization for the correlation matrix and \(\mathbf{\Omega^{-1}}=diag(\tau_a,\tau_c)LL^{\top}diag(\tau_a,\tau_c)\). \(L\) is lower triangular such that \(LL^{\top}\) is positive definite.
#install rstan using "install.packages("rstan")";
library(rstan)
#load functions for visualization
source("./models/plotting_helper.R")
First, we consider the simplest case of one subject with only quantifiable positive samples. For this example, we select Subject 3, which has 14 positive samples and 3 negative samples, but only use measurements of positive samples. The data can be created using the following code:
#Create an indicator for noncensored observations of Subject 3;
ind <- which(shedding.data$subjects==3 & !is.na(shedding.data$value))
#Create input data for Stan model
data_one_noncensored <- list(
"nobs"=length(ind), #number of noncensored observations;
"t" = shedding.data$day[ind], #days after symptom onset;
"c" = shedding.data$value[ind] #log10 concentration of SARS-CoV-2 RNA (copies per mL);
)
rm(ind) #remove the indicator;
data_one_noncensored
## $nobs
## [1] 14
##
## $t
## [1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19
##
## $c
## [1] 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30 4.89 3.64 4.71 4.75 4.09 2.33
The exponential decay model for one subject with only quantifiable positive samples (expon1.stan) is coded in Stan as follows:
data {
int<lower=0> nobs;// number of noncensored observation
vector[nobs] t; // noncensored observations time vector
vector[nobs] c; // noncensored observations log10 concentration vector
}
parameters {
real<lower=0> sig_obs;//standard deviation of measurement error;
real a0; //decay rate;
real<lower=0> c0; //log scale concentration at day 0;
}
transformed parameters {
vector[nobs] gene_obs;
for (k_obs in 1:nobs)
gene_obs[k_obs] = c0/log(10)-a0*t[k_obs]/log(10);
}
model {
// priors
sig_obs ~ uniform(0,10);
a0 ~ normal(0,100);
c0 ~ normal(0,100);
// likelihood
c ~ normal(gene_obs, sig_obs);
}
Then, we fit the model using the following codes:
fit_one_noncensored <- stan(
file = "./models/expon1.stan", # Stan program
data = data_one_noncensored, # named list of data
chains = 4, # number of Markov chains
warmup = 1000, # number of warmup iterations per chain
iter = 2000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 500 # no progress shown
)
The inference results are shown below:
print(fit_one_noncensored, pars=c("a0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## a0 0.55 0.01 0.14 0.39 0.56 0.71 592 1
## c0 18.34 0.08 1.84 16.24 18.41 20.45 552 1
## sig_obs 0.82 0.01 0.21 0.60 0.78 1.06 641 1
##
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:41:58 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The estimated exponential decay curve is shown below:
#extract posterior samples
stan_post <- extract(fit_one_noncensored, permuted = FALSE)
a0 <- as.vector(stan_post[,,2])
c0 <- as.vector(stan_post[,,3])
#graph
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Subject 3",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
points(data_one_noncensored$t,data_one_noncensored$c,col="black",cex=1.5,pch=16)
graph.resp.1(color="black",model="exp")
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
text(5,2.5,"Limit of Quantification")
ticks.log(2,cex.axis=1)
Next, we consider the case of one subject with both quantifiable positive samples and negative samples. Negative samples are treated as censored data indicating the SARS-CoV-2 RNA concentrations in those samples are below the Limit of Quantification (LOQ). We still use Subject 3, which has 14 positive samples and 3 negative samples. The data can be created using the following code:
#Create indicators for noncensored and censored observations of Subject 3;
ind.obs <- which(shedding.data$subjects==3 & !is.na(shedding.data$value))
ind.cen <- which(shedding.data$subjects==3 & is.na(shedding.data$value))
#Create input data for Stan model
data_one_censored <- list(
"nobs"=length(ind.obs), #number of noncensored observations;
"ncen" = length(ind.cen), #number of censored observations;
"censlim"=censorlimit, #log10 scale limit of quantification;
"t_obs" = shedding.data$day[ind.obs], #days after symptom onset;
"c_obs" = shedding.data$value[ind.obs], #log10 concentration of SARS-CoV-2 RNA (copies per mL);
"t_cen" = shedding.data$day[ind.cen] #days after symptom onset;
)
rm(ind.obs) #remove indicators;
rm(ind.cen) #remove indicators;
data_one_censored
## $nobs
## [1] 14
##
## $ncen
## [1] 3
##
## $censlim
## [1] 2
##
## $t_obs
## [1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19
##
## $c_obs
## [1] 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30 4.89 3.64 4.71 4.75 4.09 2.33
##
## $t_cen
## [1] 20 22 23
The exponential decay model for one subject with both quantifiable positive samples and negative samples (expon1cen.stan) is coded in Stan as follows:
data {
int<lower=0> nobs; // number of noncensored observation
int<lower=0> ncen; // number of censored observation
vector[nobs] t_obs; // noncensored observations time vector
vector[nobs] c_obs; // noncensored observations log10 concentration vector
vector[ncen] t_cen; // censored observations time vector
real<upper=min(c_obs)> censlim; //log10 scale limit of quantification
}
parameters {
real<lower=0> sig_obs;//standard deviation of measurement error;
real a0; //decay rate;
real<lower=0> c0; //log scale concentration at day 0;
}
transformed parameters {
vector[nobs] gene_obs;
vector[ncen] gene_cen;
for (k_obs in 1:nobs)
gene_obs[k_obs] = c0/log(10)-a0*t_obs[k_obs]/log(10);
for (k_cen in 1:ncen)
gene_cen[k_cen] = c0/log(10)-a0*t_cen[k_cen]/log(10);
}
model {
// priors
sig_obs ~ uniform(0,10);
a0 ~ normal(0,100);
c0 ~ normal(0,100);
// likelihood
c_obs ~ normal(gene_obs, sig_obs);
target += normal_lcdf(censlim | gene_cen, sig_obs);
}
Then, we fit the model using the following codes:
fit_one_censored <- stan(
file = "./models/expon1cen.stan", # Stan program
data = data_one_censored, # named list of data
chains = 4, # number of Markov chains
warmup = 1000, # number of warmup iterations per chain
iter = 2000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 500 # no progress shown
)
The inference results are shown below:
print(fit_one_censored, pars=c("a0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## a0 0.74 0.00 0.11 0.60 0.74 0.88 1275 1
## c0 20.37 0.05 1.61 18.38 20.35 22.41 1283 1
## sig_obs 0.92 0.01 0.22 0.68 0.88 1.21 1331 1
##
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:42:33 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The estimated exponential decay curve is shown below:
#extract posterior samples
stan_post <- extract(fit_one_censored, permuted = FALSE)
a0 <- as.vector(stan_post[,,2])
c0 <- as.vector(stan_post[,,3])
#graph
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Subject 3",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
points(data_one_censored$t_obs,data_one_censored$c_obs,col="black",cex=1.5,pch=16)
points(data_one_censored$t_cen,rep(data_one_censored$censlim,length(data_one_censored$t_cen)),col="black",cex=1.5,pch=1)
graph.resp.1(color="black",model="exp")
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
text(5,2.5,"Limit of Quantification")
ticks.log(2,cex.axis=1)
Next, we consider the case of multiple subjects (all 9 subjects) with both quantifiable positive samples and negative samples. Negative samples are treated as censored data indicating the SARS-CoV-2 RNA concentrations in those samples are below the Limit of Quantification (LOQ). The data can be created using the following code:
#Create indicators for noncensored and censored observations;
ind.obs <- which(!is.na(shedding.data$value))
ind.cen <- which(is.na(shedding.data$value))
#Create input data for Stan model
data_multi_censored <- list(
"nsubj" = length(unique(shedding.data$subjects)), #number of subjects
"nobs"=length(ind.obs), #number of noncensored observations;
"ncen" = length(ind.cen), #number of censored observations;
"censlim"=1.96, #log10 scale limit of quantification; there is a observation less than censorlimit 2.
"t_obs" = shedding.data$day[ind.obs], #days after symptom onset;
"c_obs" = shedding.data$value[ind.obs], #log10 concentration of SARS-CoV-2 RNA (copies per mL);
"p_obs" = shedding.data$subjects[ind.obs], #subject ID vector for noncensored observations;
"t_cen" = shedding.data$day[ind.cen], #days after symptom onset;
"p_cen" = shedding.data$subjects[ind.cen] #subject ID vector for censored observations;
)
rm(ind.obs) #remove indicators;
rm(ind.cen) #remove indicators;
data_multi_censored
## $nsubj
## [1] 9
##
## $nobs
## [1] 68
##
## $ncen
## [1] 14
##
## $censlim
## [1] 1.96
##
## $t_obs
## [1] 8 9 10 11 12 13 14 15 16 17 20 21 6 8 9 10 11 12 13 14 15 19 6 7 8
## [26] 9 10 11 12 13 14 15 16 17 18 19 7 8 9 10 11 12 14 15 16 19 20 6 8 9
## [51] 12 20 21 6 8 9 13 18 22 6 18 23 26 3 4 5 8 9
##
## $c_obs
## [1] 6.64 7.16 6.68 4.67 4.70 3.67 3.70 3.73 3.01 2.42 3.46 2.80 5.74 4.88 4.77
## [16] 3.63 3.25 2.83 3.84 3.87 3.11 2.28 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30
## [31] 4.89 3.64 4.71 4.75 4.09 2.33 6.05 7.23 7.51 5.57 4.43 5.15 4.88 4.25 3.98
## [46] 3.53 3.56 4.68 2.78 2.99 4.27 2.57 2.95 3.88 3.78 4.09 4.23 4.58 2.57 3.66
## [61] 2.59 2.04 1.97 4.68 4.33 3.61 3.99 2.71
##
## $p_obs
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4
## [39] 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 8 8 8 8 8
##
## $t_cen
## [1] 7 20 22 23 26 16 21 19 20 24 12 8 9 10
##
## $p_cen
## [1] 2 3 3 3 5 6 6 7 7 7 8 9 9 9
The hierarchical exponential decay model for multiple subjects with both quantifiable positive samples and negative samples (expon_hier_cen.stan) is coded in Stan as follows:
data {
int<lower=0> nsubj; // number of subjects
int<lower=0> nobs; // number of noncensored observation
int<lower=0> ncen; // number of censored observation
vector[nobs] t_obs; // noncensored observations time vector
vector[nobs] c_obs; // noncensored observations log10 concentration vector
int p_obs[nobs]; // noncensored observations subject vector
vector[ncen] t_cen; // censored observations time vector
int p_cen[ncen]; // censored observations subject vector
real<upper=min(c_obs)> censlim; //limit of quantification
}
parameters {
real<lower=0> sig_obs;//standard deviation of measurement error; // prediction error scale;
cholesky_factor_corr[2] L_Omega; // Cholesky factorization for LKJ prior correlation;
vector<lower=0>[2] tau; //precision
vector[2] mu; //mean
matrix[nsubj,2] par;
}
transformed parameters {
vector[nsubj] a0; //decay rate;
vector<lower=0>[nsubj] c0; //log scale concentration at day 0;
for (k_subj in 1:nsubj){
a0[k_subj] = exp(par[k_subj,1]);
c0[k_subj] = exp(par[k_subj,2]);
}
}
model {
// priors
sig_obs ~ uniform(0,10);
tau ~ cauchy(0, 2.5);
L_Omega ~ lkj_corr_cholesky(2);
mu ~ normal(0,10);
for (k_subj in 1:nsubj)
//Multivariate normal using Cholesky parameterization;
par[k_subj,] ~ multi_normal_cholesky(mu,diag_pre_multiply(tau, L_Omega));
vector[nobs] gene_obs;
vector[ncen] gene_cen;
for (k_obs in 1:nobs)
gene_obs[k_obs] = c0[p_obs[k_obs]]/log(10)-a0[p_obs[k_obs]]*t_obs[k_obs]/log(10);
for (k_cen in 1:ncen)
gene_cen[k_cen] = c0[p_cen[k_cen]]/log(10)-a0[p_cen[k_cen]]*t_cen[k_cen]/log(10);
// likelihood
c_obs ~ normal(gene_obs, sig_obs);
target += normal_lcdf(censlim | gene_cen, sig_obs);
}
Then, we fit the hierarchical model using the following codes:
fit_multi_censored <- stan(
file = "./models/expon_hier_cen.stan", # Stan program
data = data_multi_censored, # named list of data
chains = 4, # number of Markov chains
warmup = 2000, # number of warmup iterations per chain
iter = 5000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 500, # progress shown
control = list(adapt_delta = 0.999, max_treedepth = 15), #model tuning
seed = 123 #set a seed
)
The inference results are shown below:
print(fit_multi_censored, pars=c("mu","tau","L_Omega","a0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=5000; warmup=2000; thin=1;
## post-warmup draws per chain=3000, total post-warmup draws=12000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## mu[1] -0.86 0.00 0.32 -1.24 -0.84 -0.50 4443 1
## mu[2] 2.56 0.00 0.17 2.36 2.58 2.76 5298 1
## tau[1] 0.65 0.01 0.36 0.29 0.58 1.10 2442 1
## tau[2] 0.41 0.00 0.16 0.24 0.37 0.61 4296 1
## L_Omega[1,1] 1.00 NaN 0.00 1.00 1.00 1.00 NaN NaN
## L_Omega[1,2] 0.00 NaN 0.00 0.00 0.00 0.00 NaN NaN
## L_Omega[2,1] 0.45 0.01 0.36 -0.07 0.51 0.88 5016 1
## L_Omega[2,2] 0.79 0.00 0.21 0.47 0.86 0.99 3733 1
## a0[1] 0.63 0.00 0.15 0.44 0.63 0.83 5770 1
## a0[2] 0.35 0.00 0.13 0.18 0.35 0.51 4906 1
## a0[3] 0.66 0.00 0.12 0.52 0.66 0.81 6057 1
## a0[4] 0.53 0.00 0.14 0.36 0.53 0.71 7479 1
## a0[5] 0.30 0.00 0.11 0.17 0.30 0.44 4046 1
## a0[6] 0.31 0.00 0.11 0.17 0.31 0.45 4257 1
## a0[7] 0.30 0.00 0.11 0.16 0.30 0.45 3722 1
## a0[8] 0.56 0.00 0.25 0.27 0.51 0.90 5946 1
## a0[9] 1.26 0.05 3.72 0.15 0.61 2.48 5294 1
## c0[1] 18.81 0.03 2.22 15.97 18.76 21.68 5779 1
## c0[2] 12.24 0.02 1.60 10.19 12.26 14.24 5489 1
## c0[3] 19.28 0.02 1.68 17.10 19.29 21.38 6186 1
## c0[4] 18.48 0.02 1.89 16.10 18.45 20.92 7563 1
## c0[5] 11.38 0.03 1.67 9.25 11.35 13.53 4437 1
## c0[6] 11.79 0.02 1.69 9.62 11.80 13.98 4606 1
## c0[7] 10.39 0.03 2.24 7.49 10.43 13.21 4149 1
## c0[8] 11.78 0.02 1.84 9.58 11.63 14.22 6336 1
## c0[9] 10.09 0.12 7.22 4.43 8.36 17.28 3649 1
## sig_obs 0.92 0.00 0.09 0.80 0.91 1.04 8073 1
##
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:43:32 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The estimated exponential decay curves for 9 subjects are shown below:
#extract posterior samples
stan_post <- extract(fit_multi_censored, permuted = FALSE)
#graph
par(mfrow=c(3,3))
for (k.subj in 1:data_multi_censored$nsubj){
a0 <- as.vector(stan_post[,,27+k.subj])
c0 <- as.vector(stan_post[,,27+data_multi_censored$nsubj+k.subj])
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main=paste("Subject", k.subj),yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2.5);
lines(x=c(0,30),y=c(data_multi_censored$censlim,data_multi_censored$censlim),lty=3,col="black");
graph.obs(data_multi_censored,k.subj,"black");
graph.resp(k.subj,"black",model="exp");
ticks.log(2,cex.axis=1);
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
}
Then, we can use the model to make prediction of the concentration of SARS-CoV-2 in stool (genome copies per mL) given the days after symptom onset. To show the prediction results, we simulate predicted SARS-CoV-2 concentration between day 1 and day 30 after symptom onset.
library(MASS)
#R simulations
t.new <- seq(0,30,0.1)
n.sim <- 100000
n.iter <- length(stan_post[,1,1])
n.chain <- length(stan_post[1,,1])
pred.dat.r <- matrix(NA,nrow=length(t.new),ncol=8)
sim.dat <- matrix(NA,nrow=n.sim,ncol=length(t.new))
for (k.sim in 1:n.sim){
rng1 <- sample(1:n.iter,1)
rng2 <- sample(1:n.chain,1)
sim.sigma <- diag(stan_post[rng1,rng2,6:7])%*%matrix(c(stan_post[rng1,rng2,2:5]),2,2)%*%t(matrix(c(stan_post[rng1,rng2,2:5]),2,2))%*%diag(stan_post[rng1,rng2,6:7])
sim.par <- exp(mvrnorm(n=1,stan_post[rng1,rng2,8:9],Sigma=sim.sigma))
sim.dat[k.sim,] <- rnorm(length(t.new),mean=sim.par[2]/log(10)-sim.par[1]*t.new/log(10),sd=sample(stan_post[,,1],1))
}
for (i in 1:length(t.new)){
pred.dat.r[i,1] <- t.new[i]
pred.dat.r[i,2:8] <- quantile(sim.dat[,i],c(0.025,0.05,0.1,0.5,0.90,0.95,0.975))
}
Then, we plot the credible interval of the prediction.
#graph
par(mfrow=c(1,1))
plot(-1,-1,xlim=c(0,30),ylim=c(-5,15),col="white",
xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Prediction with Credible Intervals",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
for (k.subj in 1:data_multi_censored$nsubj){
graph.obs(data_multi_censored,k.subj,"grey");
}
#95% credible interval;
lines(x=pred.dat.r[,1],y=pred.dat.r[,2],col="black",lty=3,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,8],col="black",lty=3,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,3],col="black",lty=4,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,7],col="black",lty=4,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,4],col="black",lty=2,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,6],col="black",lty=2,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,5],col="black",lty=1,lwd=2);
ticks.log(2,cex.axis=1);
legend("topright",legend=c("Quantifiable","Negative","Median","80% Credible Interval","90% Credible Interval","95% Credible Interval"), lty=c(NA,NA,1,2,4,3), lwd=2, pt.cex=2, pch=c(16,1,NA,NA,NA,NA))
When the concentration of pathogen shed c(t) is subject to gamma model, it follows:
\[c(t)=c_{0}t^{b_{0}}e^{-a_{0}t},\] Where \(c_{0}\) is the concentration of pathogen shed at symptom onset, a_{0} is the rate parameter, b_{0} is the shape parameter, and t is the day after symptom onset. In this example, we are assuming the fecal shedding starts on symptom onset.
When there are samples from multiple subjects, we can develop a hierarchical model:
\[c_i(t)=c_{i,0}t^{b_{i,0}}e^{-a_{i,0}t},\]
for subject i. \((log(a_{i,0}), log(b_{i,0}), log(c_{i,0})) \sim \mathcal{N}(\mathbf{\mu},\mathbf{\Omega^{-1}})\), where \(\mathbf{\mu}=(\mu_a,\mu_b,\mu_c)\). And we use the Cholesky parameterization for the correlation matrix and \(\mathbf{\Omega^{-1}}=diag(\tau_a,\tau_b,\tau_c)LL^{\top}diag(\tau_a,\tau_b,\tau_c)\). \(L\) is lower triangular such that \(LL^{\top}\) is positive definite.
Similar to the exponential decay case, we first consider the simplest case of one subject with only quantifiable positive samples. For this example, we select Subject 3, which has 14 positive samples and 3 negative samples, but only use measurements of positive samples. The data can be created using the following code:
#Create an indicator for noncensored observations of Subject 3;
ind <- which(shedding.data$subjects==3 & !is.na(shedding.data$value))
#Create input data for Stan model
data_one_noncensored <- list(
"nobs"=length(ind), #number of noncensored observations;
"t" = shedding.data$day[ind], #days after symptom onset;
"c" = shedding.data$value[ind] #log10 concentration of SARS-CoV-2 RNA (copies per mL);
)
rm(ind) #remove the indicator;
data_one_noncensored
## $nobs
## [1] 14
##
## $t
## [1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19
##
## $c
## [1] 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30 4.89 3.64 4.71 4.75 4.09 2.33
The gamma model for one subject with only quantifiable positive samples (gamma1.stan) is coded in Stan as follows:
data {
int<lower=0> nobs;// number of noncensored observation
vector[nobs] t; // noncensored observations time vector
vector[nobs] c; // noncensored observations log10 concentration vector
}
parameters {
real<lower=0> sig_obs;//standard deviation of measurement error;
real a0; //rate;
real b0; //shape;
real<lower=0> c0; //log scale concentration at day 0;
}
transformed parameters {
vector[nobs] gene_obs;
for (k_obs in 1:nobs)
gene_obs[k_obs] = c0/log(10)+log(t[k_obs]^b0)/log(10)-a0*t[k_obs]/log(10);
}
model {
// priors
sig_obs ~ uniform(0,10);
a0 ~ normal(0,100);
b0 ~ normal(0,100);
c0 ~ normal(0,100);
// likelihood
c ~ normal(gene_obs, sig_obs);
}
Then, we fit the model using the following codes:
fit_one_noncensored <- stan(
file = "./models/gamma1.stan", # Stan program
data = data_one_noncensored, # named list of data
chains = 4, # number of Markov chains
warmup = 1000, # number of warmup iterations per chain
iter = 2000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 500 # no progress shown
)
The inference results are shown below:
print(fit_one_noncensored, pars=c("a0","b0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## a0 0.76 0.02 0.63 -0.06 0.85 1.50 673 1
## b0 2.35 0.28 7.10 -6.93 3.47 10.56 662 1
## c0 15.20 0.38 9.87 3.67 13.66 28.44 684 1
## sig_obs 0.82 0.01 0.21 0.60 0.78 1.08 679 1
##
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:44:21 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The estimated gamma curve is shown below:
#extract posterior samples
stan_post <- extract(fit_one_noncensored, permuted = FALSE)
a0 <- as.vector(stan_post[,,2])
b0 <- as.vector(stan_post[,,3])
c0 <- as.vector(stan_post[,,4])
#graph
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Subject 3",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
points(data_one_noncensored$t,data_one_noncensored$c,col="black",cex=1.5,pch=16)
graph.resp.1(color="black",model="gamma")
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
text(5,2.5,"Limit of Quantification")
ticks.log(2,cex.axis=1)
Next, we consider the case of one subject with both quantifiable positive samples and negative samples. Negative samples are treated as censored data indicating the SARS-CoV-2 RNA concentrations in those samples are below the Limit of Quantification (LOQ). We still use Subject 3, which has 14 positive samples and 3 negative samples. The data can be created using the following code:
#Create indicators for noncensored and censored observations of Subject 3;
ind.obs <- which(shedding.data$subjects==3 & !is.na(shedding.data$value))
ind.cen <- which(shedding.data$subjects==3 & is.na(shedding.data$value))
#Create input data for Stan model
data_one_censored <- list(
"nobs"=length(ind.obs), #number of noncensored observations;
"ncen" = length(ind.cen), #number of censored observations;
"censlim"=censorlimit, #log10 scale limit of quantification;
"t_obs" = shedding.data$day[ind.obs], #days after symptom onset;
"c_obs" = shedding.data$value[ind.obs], #log10 concentration of SARS-CoV-2 RNA (copies per mL);
"t_cen" = shedding.data$day[ind.cen] #days after symptom onset;
)
rm(ind.obs) #remove indicators;
rm(ind.cen) #remove indicators;
data_one_censored
## $nobs
## [1] 14
##
## $ncen
## [1] 3
##
## $censlim
## [1] 2
##
## $t_obs
## [1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19
##
## $c_obs
## [1] 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30 4.89 3.64 4.71 4.75 4.09 2.33
##
## $t_cen
## [1] 20 22 23
The gamma model for one subject with both quantifiable positive samples and negative samples (gamma1cen.stan) is coded in Stan as follows:
data {
int<lower=0> nobs; // number of noncensored observation
int<lower=0> ncen; // number of censored observation
vector[nobs] t_obs; // noncensored observations time vector
vector[nobs] c_obs; // noncensored observations log10 concentration vector
vector[ncen] t_cen; // censored observations time vector
real<upper=min(c_obs)> censlim;
}
parameters {
real<lower=0> sig_obs;//standard deviation of measurement error;
real a0; //rate;
real b0; //shape;
real<lower=0> c0; //log scale concentration at day 0;
}
transformed parameters {
vector[nobs] gene_obs;
vector[ncen] gene_cen;
for (k_obs in 1:nobs)
gene_obs[k_obs] = c0/log(10)+log(t_obs[k_obs]^b0)/log(10)-a0*t_obs[k_obs]/log(10);
for (k_cen in 1:ncen)
gene_cen[k_cen] = c0/log(10)+log(t_cen[k_cen]^b0)/log(10)-a0*t_cen[k_cen]/log(10);
}
model {
// priors
sig_obs ~ uniform(0,10);
a0 ~ normal(0,100);
b0 ~ normal(0,100);
c0 ~ normal(0,100);
// likelihood
c_obs ~ normal(gene_obs, sig_obs);
target += normal_lcdf(censlim | gene_cen, sig_obs);
}
Then, we fit the model using the following codes:
fit_one_censored <- stan(
file = "./models/gamma1cen.stan", # Stan program
data = data_one_censored, # named list of data
chains = 4, # number of Markov chains
warmup = 1000, # number of warmup iterations per chain
iter = 2000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 500 # no progress shown
)
The inference results are shown below:
print(fit_one_censored, pars=c("a0","b0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## a0 1.41 0.02 0.38 0.97 1.44 1.84 293 1.01
## b0 8.34 0.29 4.44 3.11 9.09 13.05 238 1.02
## c0 8.32 0.44 6.58 1.47 7.07 15.89 226 1.02
## sig_obs 0.87 0.01 0.22 0.64 0.83 1.15 669 1.00
##
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:44:58 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The estimated exponential decay curve is shown below:
#extract posterior samples
stan_post <- extract(fit_one_censored, permuted = FALSE)
a0 <- as.vector(stan_post[,,2])
b0 <- as.vector(stan_post[,,3])
c0 <- as.vector(stan_post[,,4])
#graph
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Subject 3",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
points(data_one_censored$t_obs,data_one_censored$c_obs,col="black",cex=1.5,pch=16)
points(data_one_censored$t_cen,rep(data_one_censored$censlim,length(data_one_censored$t_cen)),col="black",cex=1.5,pch=1)
graph.resp.1(color="black",model="gamma")
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
text(5,2.5,"Limit of Quantification")
ticks.log(2,cex.axis=1)
At last, we consider the case of multiple subjects (all 9 subjects) with both quantifiable positive samples and negative samples. Negative samples are treated as censored data indicating the SARS-CoV-2 RNA concentrations in those samples are below the Limit of Quantification (LOQ). The data can be created using the following code:
#Create indicators for noncensored and censored observations;
ind.obs <- which(!is.na(shedding.data$value))
ind.cen <- which(is.na(shedding.data$value))
#Create input data for Stan model
data_multi_censored <- list(
"nsubj" = length(unique(shedding.data$subjects)), #number of subjects
"nobs"=length(ind.obs), #number of noncensored observations;
"ncen" = length(ind.cen), #number of censored observations;
"censlim"=1.96, #log10 scale limit of quantification; there is a observation less than censorlimit 2.
"t_obs" = shedding.data$day[ind.obs], #days after symptom onset;
"c_obs" = shedding.data$value[ind.obs], #log10 concentration of SARS-CoV-2 RNA (copies per mL);
"p_obs" = shedding.data$subjects[ind.obs], #subject ID vector for noncensored observations;
"t_cen" = shedding.data$day[ind.cen], #days after symptom onset;
"p_cen" = shedding.data$subjects[ind.cen] #subject ID vector for censored observations;
)
rm(ind.obs) #remove indicators;
rm(ind.cen) #remove indicators;
data_multi_censored
## $nsubj
## [1] 9
##
## $nobs
## [1] 68
##
## $ncen
## [1] 14
##
## $censlim
## [1] 1.96
##
## $t_obs
## [1] 8 9 10 11 12 13 14 15 16 17 20 21 6 8 9 10 11 12 13 14 15 19 6 7 8
## [26] 9 10 11 12 13 14 15 16 17 18 19 7 8 9 10 11 12 14 15 16 19 20 6 8 9
## [51] 12 20 21 6 8 9 13 18 22 6 18 23 26 3 4 5 8 9
##
## $c_obs
## [1] 6.64 7.16 6.68 4.67 4.70 3.67 3.70 3.73 3.01 2.42 3.46 2.80 5.74 4.88 4.77
## [16] 3.63 3.25 2.83 3.84 3.87 3.11 2.28 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30
## [31] 4.89 3.64 4.71 4.75 4.09 2.33 6.05 7.23 7.51 5.57 4.43 5.15 4.88 4.25 3.98
## [46] 3.53 3.56 4.68 2.78 2.99 4.27 2.57 2.95 3.88 3.78 4.09 4.23 4.58 2.57 3.66
## [61] 2.59 2.04 1.97 4.68 4.33 3.61 3.99 2.71
##
## $p_obs
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4
## [39] 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 8 8 8 8 8
##
## $t_cen
## [1] 7 20 22 23 26 16 21 19 20 24 12 8 9 10
##
## $p_cen
## [1] 2 3 3 3 5 6 6 7 7 7 8 9 9 9
The hierarchical gamma model for multiple subjects with both quantifiable positive samples and negative samples (gamma_hier_cen.stan) is coded in Stan as follows:
data {
int<lower=0> nsubj; // number of subjects
int<lower=0> nobs; // number of noncensored observation
int<lower=0> ncen; // number of censored observation
vector[nobs] t_obs; // noncensored observations time vector
vector[nobs] c_obs; // noncensored observations log10 concentration vector
int p_obs[nobs]; // noncensored observations subject vector
vector[ncen] t_cen; // censored observations time vector
int p_cen[ncen]; // censored observations subject vector
real<upper=min(c_obs)> censlim;
}
parameters {
real<lower=0> sig_obs;//standard deviation of measurement error;
cholesky_factor_corr[3] L_Omega; // Cholesky factorization for LKJ prior correlation;
vector<lower=0>[3] tau;
vector[3] mu;
matrix[nsubj,3] par_raw;
}
transformed parameters {
vector[nsubj] a0; //rate;
vector[nsubj] b0; //shape;
vector[nsubj] c0; //log scale concentration at day 0;
matrix[nsubj,3] par;
for (k_subj in 1:nsubj)
//reparameterization
par[k_subj,] = to_row_vector(mu + tau .* (L_Omega * to_vector(par_raw[k_subj,])));
for (k_subj in 1:nsubj){
a0[k_subj] = exp(par[k_subj,1]);
b0[k_subj] = exp(par[k_subj,2]);
c0[k_subj] = exp(par[k_subj,3]);
}
vector[nobs] gene_obs;
vector[ncen] gene_cen;
for (k_obs in 1:nobs)
gene_obs[k_obs] = c0[p_obs[k_obs]]/log(10)+log(t_obs[k_obs]^b0[p_obs[k_obs]])/log(10)-a0[p_obs[k_obs]]*t_obs[k_obs]/log(10);
for (k_cen in 1:ncen)
gene_cen[k_cen] = c0[p_cen[k_cen]]/log(10)+log(t_cen[k_cen]^b0[p_obs[k_cen]])/log(10)-a0[p_cen[k_cen]]*t_cen[k_cen]/log(10);
}
model {
// priors
sig_obs ~ uniform(0,5);
tau ~ cauchy(0, 2.5);
L_Omega ~ lkj_corr_cholesky(3);
mu ~ normal(0,5);
// likelihood
for (k_subj in 1:nsubj)
par_raw[k_subj,] ~ std_normal();
c_obs ~ normal(gene_obs, sig_obs);
target += normal_lcdf(censlim | gene_cen, sig_obs);
}
Then, we fit the hierarchical model using the following codes:
fit_multi_censored <- stan(
file = "./models/gamma_hier_cen.stan", # Stan program
data = data_multi_censored, # named list of data
chains = 4, # number of Markov chains
warmup = 5000, # number of warmup iterations per chain
iter = 10000, # total number of iterations per chain
cores = 4, # number of cores (could use one per chain)
refresh = 500, # progress shown
control = list(adapt_delta = 0.999, max_treedepth = 15), #model tuning
seed = 123 #set a seed
)
The inference results are shown below:
print(fit_multi_censored, pars=c("mu","tau","L_Omega","a0","b0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=10000; warmup=5000; thin=1;
## post-warmup draws per chain=5000, total post-warmup draws=20000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## mu[1] -0.38 0.00 0.24 -0.67 -0.38 -0.11 8448 1
## mu[2] 0.84 0.01 0.67 0.03 0.94 1.56 7534 1
## mu[3] 1.97 0.01 0.53 1.37 2.08 2.46 4887 1
## tau[1] 0.41 0.00 0.31 0.08 0.35 0.81 3982 1
## tau[2] 1.02 0.01 0.63 0.43 0.87 1.77 8550 1
## tau[3] 0.85 0.01 0.52 0.39 0.70 1.45 4399 1
## L_Omega[1,1] 1.00 NaN 0.00 1.00 1.00 1.00 NaN NaN
## L_Omega[1,2] 0.00 NaN 0.00 0.00 0.00 0.00 NaN NaN
## L_Omega[1,3] 0.00 NaN 0.00 0.00 0.00 0.00 NaN NaN
## L_Omega[2,1] 0.03 0.00 0.34 -0.43 0.03 0.48 8924 1
## L_Omega[2,2] 0.94 0.00 0.08 0.83 0.97 1.00 11238 1
## L_Omega[2,3] 0.00 NaN 0.00 0.00 0.00 0.00 NaN NaN
## L_Omega[3,1] 0.15 0.00 0.32 -0.28 0.17 0.57 10036 1
## L_Omega[3,2] -0.40 0.00 0.28 -0.74 -0.44 -0.01 9514 1
## L_Omega[3,3] 0.78 0.00 0.16 0.56 0.81 0.96 10167 1
## a0[1] 0.75 0.00 0.13 0.59 0.75 0.92 14925 1
## a0[2] 0.72 0.00 0.19 0.50 0.71 0.97 11471 1
## a0[3] 0.82 0.00 0.20 0.61 0.78 1.09 7969 1
## a0[4] 0.76 0.00 0.19 0.55 0.73 1.00 8564 1
## a0[5] 0.52 0.00 0.17 0.30 0.52 0.75 5938 1
## a0[6] 0.54 0.00 0.17 0.33 0.54 0.75 7263 1
## a0[7] 0.55 0.00 0.16 0.34 0.55 0.76 7229 1
## a0[8] 0.81 0.00 0.30 0.52 0.75 1.19 11164 1
## a0[9] 1.44 0.03 4.06 0.56 0.93 2.44 15303 1
## b0[1] 0.82 0.01 0.76 0.10 0.60 1.84 10857 1
## b0[2] 3.50 0.01 1.45 1.82 3.35 5.36 9355 1
## b0[3] 3.72 0.02 2.19 1.47 3.21 6.76 8065 1
## b0[4] 2.43 0.03 2.19 0.30 1.83 5.33 6672 1
## b0[5] 3.56 0.03 1.85 1.10 3.50 6.10 5024 1
## b0[6] 4.50 0.02 1.66 2.35 4.45 6.74 6110 1
## b0[7] 3.75 0.02 1.76 1.52 3.63 6.18 6280 1
## b0[8] 2.59 0.02 1.73 0.68 2.29 4.78 12016 1
## b0[9] 37.92 14.48 1518.97 0.85 4.25 16.38 10999 1
## c0[1] 18.38 0.02 2.09 15.74 18.45 20.94 13003 1
## c0[2] 8.54 0.02 2.25 5.63 8.69 11.30 13111 1
## c0[3] 12.48 0.04 3.58 7.64 12.94 16.60 10373 1
## c0[4] 15.34 0.04 3.81 10.13 16.07 19.47 7518 1
## c0[5] 5.73 0.04 3.05 1.60 5.67 9.83 6397 1
## c0[6] 4.79 0.03 2.66 1.23 4.70 8.36 7721 1
## c0[7] 5.50 0.03 2.90 1.57 5.46 9.34 9034 1
## c0[8] 9.11 0.02 2.41 5.96 9.26 12.04 15706 1
## c0[9] 4.95 0.06 4.59 0.93 3.91 9.87 6195 1
## sig_obs 0.76 0.00 0.08 0.67 0.75 0.87 12554 1
##
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:54:18 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The estimated exponential decay curve is shown below:
#extract posterior samples
stan_post <- extract(fit_multi_censored, permuted = FALSE)
#graph
par(mfrow=c(3,3))
for (k.subj in 1:data_multi_censored$nsubj){
a0 <- as.vector(stan_post[,,43+k.subj])
b0 <- as.vector(stan_post[,,52+k.subj])
c0 <- as.vector(stan_post[,,61+k.subj])
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main=paste("Subject", k.subj),yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2.5);
lines(x=c(0,30),y=c(data_multi_censored$censlim,data_multi_censored$censlim),lty=3,col="black");
graph.obs(data_multi_censored,k.subj,"black");
graph.resp(k.subj,"black",model="gamma");
ticks.log(2,cex.axis=1);
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
}
Then, we can use the model to make prediction of the concentration of SARS-CoV-2 in stool (genome copies per mL) given the days after symptom onset. To show the prediction results, we simulate predicted SARS-CoV-2 concentration between day 1 and day 30 after symptom onset.
library(MASS)
#R simulations
t.new <- seq(0,30,0.1)
n.sim <- 100000
n.iter <- length(stan_post[,1,1])
n.chain <- length(stan_post[1,,1])
pred.dat.r <- matrix(NA,nrow=length(t.new),ncol=8)
sim.dat <- matrix(NA,nrow=n.sim,ncol=length(t.new))
for (k.sim in 1:n.sim){
rng1 <- sample(1:n.iter,1)
rng2 <- sample(1:n.chain,1)
sim.sigma <- diag(stan_post[rng1,rng2,11:13])%*%matrix(c(stan_post[rng1,rng2,2:10]),3,3)%*%t(matrix(c(stan_post[rng1,rng2,2:10]),3,3))%*%diag(stan_post[rng1,rng2,11:13])
sim.par <- exp(mvrnorm(n=1,stan_post[rng1,rng2,14:16],Sigma=sim.sigma))
sim.dat[k.sim,] <- rnorm(length(t.new),mean=sim.par[3]/log(10)+log(t.new^sim.par[2])/log(10)-sim.par[1]*t.new/log(10),sd=sample(stan_post[,,1],1))
}
for (i in 1:length(t.new)){
pred.dat.r[i,1] <- t.new[i]
pred.dat.r[i,2:8] <- quantile(sim.dat[,i],c(0.025,0.05,0.1,0.5,0.90,0.95,0.975))
}
Then, we plot the credible interval of the prediction.
#graph
par(mfrow=c(1,1))
plot(-1,-1,xlim=c(0,30),ylim=c(-5,15),col="white",
xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Prediction with Credible Intervals",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
for (k.subj in 1:data_multi_censored$nsubj){
graph.obs(data_multi_censored,k.subj,"grey");
}
#95% credible interval;
lines(x=pred.dat.r[,1],y=pred.dat.r[,2],col="black",lty=3,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,8],col="black",lty=3,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,3],col="black",lty=4,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,7],col="black",lty=4,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,4],col="black",lty=2,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,6],col="black",lty=2,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,5],col="black",lty=1,lwd=2);
ticks.log(2,cex.axis=1);
legend("topright",legend=c("Quantifiable","Negative","Median","80% Credible Interval","90% Credible Interval","95% Credible Interval"), lty=c(NA,NA,1,2,4,3), lwd=2, pt.cex=2, pch=c(16,1,NA,NA,NA,NA))