Introduction

The current tutorial demonstrates the Bayesian workflow to model shedding data using Rstan. We use the longitudinal SARS-CoV-2 fecal shedding data from Wölfel et al. (2020). The data includes observations of SARS-CoV-2 RNA concentration in 82 stool samples from 9 patients. The date of sample collection spans from Day 3 to Day 22 after symptom onset. We load the data in R with:

shedding.data <- data.frame(subjects = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9), 
                            day = c(8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 19, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 7, 8, 9, 10, 11, 12, 14, 15, 16, 19, 20, 6, 8, 9, 12, 20, 21, 26, 6, 8, 9, 13, 16, 18, 21, 22, 6, 18, 19, 20, 23, 24, 26, 3, 4, 5, 8, 9, 12, 8, 9, 10),
                            value = c(6.64, 7.16, 6.68, 4.67, 4.7, 3.67, 3.7, 3.73, 3.01, 2.42, 3.46, 2.8, 5.74, NA, 4.88, 4.77, 3.63, 3.25, 2.83, 3.84, 3.87, 3.11, 2.28, 6.1, 6.03, 6.79, 6.62, 4.68, 5.1, 5.79, 4.3, 4.89, 3.64, 4.71, 4.75, 4.09, 2.33, NA, NA, NA, 6.05, 7.23, 7.51, 5.57, 4.43, 5.15, 4.88, 4.25, 3.98, 3.53, 3.56, 4.68, 2.78, 2.99, 4.27, 2.57, 2.95, NA, 3.88, 3.78, 4.09, 4.23, NA, 4.58, NA, 2.57, 3.66, 2.59, NA, NA, 2.04, NA, 1.97, 4.68, 4.33, 3.61, 3.99, 2.71, NA, NA, NA, NA))
censorlimit = 2 ##log10 scale limit of quantification

As most samples collected in shedding studies are several days after symptom onset when the number of pathogens shed is decreasing. In this tutorial, we focus on modeling the decay phase of the shedding dynamics. We consider two classical models, exponential decay model and gamma model.

Exponential Decay Model

When the concentration of pathogen shed c(t) is subject to exponential decay, it decreases at a rate \(a_{0}\) proportional to the current value:

\[c(t)=c_{0}e^{-a_{0}t},\] Where \(c_{0}\) is the concentration of pathogen shed at symptom onset and t is the day after symptom onset.

When there are samples from multiple subjects, we can develop a hierarchical model:

\[c_i(t)=c_{i,0}e^{-a_{i,0}t},\]

for subject i. \((log(a_{i,0}),log(c_{i,0})) \sim \mathcal{N}(\mathbf{\mu},\mathbf{\Omega^{-1}})\), where \(\mathbf{\mu}=(\mu_a,\mu_c)\). And we use the Cholesky parameterization for the correlation matrix and \(\mathbf{\Omega^{-1}}=diag(\tau_a,\tau_c)LL^{\top}diag(\tau_a,\tau_c)\). \(L\) is lower triangular such that \(LL^{\top}\) is positive definite.

#install rstan using "install.packages("rstan")";
library(rstan)
#load functions for visualization
source("./models/plotting_helper.R")

Single Subject without Censored Data

First, we consider the simplest case of one subject with only quantifiable positive samples. For this example, we select Subject 3, which has 14 positive samples and 3 negative samples, but only use measurements of positive samples. The data can be created using the following code:

#Create an indicator for noncensored observations of Subject 3;
ind <- which(shedding.data$subjects==3 & !is.na(shedding.data$value))
#Create input data for Stan model 
data_one_noncensored <- list(
  "nobs"=length(ind), #number of noncensored observations;
  "t" = shedding.data$day[ind], #days after symptom onset;
  "c" = shedding.data$value[ind] #log10 concentration of SARS-CoV-2 RNA (copies per mL);
)
rm(ind) #remove the indicator;
data_one_noncensored
## $nobs
## [1] 14
## 
## $t
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19
## 
## $c
##  [1] 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30 4.89 3.64 4.71 4.75 4.09 2.33

Building the Stan model

The exponential decay model for one subject with only quantifiable positive samples (expon1.stan) is coded in Stan as follows:

data {
  int<lower=0> nobs;// number of noncensored observation
  vector[nobs] t;   // noncensored observations time vector
  vector[nobs] c;   // noncensored observations log10 concentration vector
}

parameters {
  real<lower=0> sig_obs;//standard deviation of measurement error;
  real a0; //decay rate;
  real<lower=0> c0; //log scale concentration at day 0;
}

transformed parameters {
  vector[nobs] gene_obs;
  for (k_obs in 1:nobs)
    gene_obs[k_obs] = c0/log(10)-a0*t[k_obs]/log(10);
}

model {
  // priors
  sig_obs ~ uniform(0,10);
  a0 ~ normal(0,100);
  c0 ~ normal(0,100);
  // likelihood
  c ~ normal(gene_obs, sig_obs);
}

Fitting the model

Then, we fit the model using the following codes:

fit_one_noncensored <- stan(
  file = "./models/expon1.stan",  # Stan program
  data = data_one_noncensored,    # named list of data
  chains = 4,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 2000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 500             # no progress shown
)

The inference results are shown below:

print(fit_one_noncensored, pars=c("a0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##          mean se_mean   sd   10%   50%   90% n_eff Rhat
## a0       0.55    0.01 0.14  0.39  0.56  0.71   592    1
## c0      18.34    0.08 1.84 16.24 18.41 20.45   552    1
## sig_obs  0.82    0.01 0.21  0.60  0.78  1.06   641    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:41:58 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The estimated exponential decay curve is shown below:

#extract posterior samples
stan_post <- extract(fit_one_noncensored, permuted = FALSE)
a0 <- as.vector(stan_post[,,2])
c0 <- as.vector(stan_post[,,3])

#graph
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
     xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Subject 3",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
points(data_one_noncensored$t,data_one_noncensored$c,col="black",cex=1.5,pch=16)
graph.resp.1(color="black",model="exp")
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
text(5,2.5,"Limit of Quantification")
ticks.log(2,cex.axis=1)

Single Subject with Censored Data

Next, we consider the case of one subject with both quantifiable positive samples and negative samples. Negative samples are treated as censored data indicating the SARS-CoV-2 RNA concentrations in those samples are below the Limit of Quantification (LOQ). We still use Subject 3, which has 14 positive samples and 3 negative samples. The data can be created using the following code:

#Create indicators for noncensored and censored observations of Subject 3;
ind.obs <- which(shedding.data$subjects==3 & !is.na(shedding.data$value))
ind.cen <- which(shedding.data$subjects==3 & is.na(shedding.data$value))
#Create input data for Stan model 
data_one_censored <- list(
  "nobs"=length(ind.obs), #number of noncensored observations;
  "ncen" = length(ind.cen), #number of censored observations;
  "censlim"=censorlimit, #log10 scale limit of  quantification;
  "t_obs" = shedding.data$day[ind.obs], #days after symptom onset;
  "c_obs" = shedding.data$value[ind.obs], #log10 concentration of SARS-CoV-2 RNA (copies per mL);
  "t_cen" = shedding.data$day[ind.cen] #days after symptom onset;
)
rm(ind.obs) #remove indicators;
rm(ind.cen) #remove indicators;
data_one_censored
## $nobs
## [1] 14
## 
## $ncen
## [1] 3
## 
## $censlim
## [1] 2
## 
## $t_obs
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19
## 
## $c_obs
##  [1] 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30 4.89 3.64 4.71 4.75 4.09 2.33
## 
## $t_cen
## [1] 20 22 23

Building the model

The exponential decay model for one subject with both quantifiable positive samples and negative samples (expon1cen.stan) is coded in Stan as follows:

data {
  int<lower=0> nobs;    // number of noncensored observation
  int<lower=0> ncen;    // number of censored observation
  vector[nobs] t_obs;   // noncensored observations time vector
  vector[nobs] c_obs;   // noncensored observations log10 concentration vector
  vector[ncen] t_cen;   // censored observations time vector
  real<upper=min(c_obs)> censlim; //log10 scale limit of quantification
}
parameters {
  real<lower=0> sig_obs;//standard deviation of measurement error;
  real a0; //decay rate;
  real<lower=0> c0; //log scale concentration at day 0;
}

transformed parameters {
  vector[nobs] gene_obs;
  vector[ncen] gene_cen;
  for (k_obs in 1:nobs)
    gene_obs[k_obs] = c0/log(10)-a0*t_obs[k_obs]/log(10);
  for (k_cen in 1:ncen)
    gene_cen[k_cen] = c0/log(10)-a0*t_cen[k_cen]/log(10);
}

model {
  // priors
  sig_obs ~ uniform(0,10);
  a0 ~ normal(0,100);
  c0 ~ normal(0,100);
  // likelihood
  c_obs ~ normal(gene_obs, sig_obs);
  target += normal_lcdf(censlim | gene_cen, sig_obs);
}

Fitting the model

Then, we fit the model using the following codes:

fit_one_censored <- stan(
  file = "./models/expon1cen.stan",  # Stan program
  data = data_one_censored,    # named list of data
  chains = 4,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 2000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 500             # no progress shown
)

The inference results are shown below:

print(fit_one_censored, pars=c("a0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##          mean se_mean   sd   10%   50%   90% n_eff Rhat
## a0       0.74    0.00 0.11  0.60  0.74  0.88  1275    1
## c0      20.37    0.05 1.61 18.38 20.35 22.41  1283    1
## sig_obs  0.92    0.01 0.22  0.68  0.88  1.21  1331    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:42:33 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The estimated exponential decay curve is shown below:

#extract posterior samples
stan_post <- extract(fit_one_censored, permuted = FALSE)
a0 <- as.vector(stan_post[,,2])
c0 <- as.vector(stan_post[,,3])

#graph
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
     xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Subject 3",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
points(data_one_censored$t_obs,data_one_censored$c_obs,col="black",cex=1.5,pch=16)
points(data_one_censored$t_cen,rep(data_one_censored$censlim,length(data_one_censored$t_cen)),col="black",cex=1.5,pch=1)
graph.resp.1(color="black",model="exp")
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
text(5,2.5,"Limit of Quantification")
ticks.log(2,cex.axis=1)

Multiple Subjects with Censored Data

Next, we consider the case of multiple subjects (all 9 subjects) with both quantifiable positive samples and negative samples. Negative samples are treated as censored data indicating the SARS-CoV-2 RNA concentrations in those samples are below the Limit of Quantification (LOQ). The data can be created using the following code:

#Create indicators for noncensored and censored observations;
ind.obs <- which(!is.na(shedding.data$value))
ind.cen <- which(is.na(shedding.data$value))
#Create input data for Stan model 
data_multi_censored <- list(
  "nsubj" = length(unique(shedding.data$subjects)), #number of subjects
  "nobs"=length(ind.obs), #number of noncensored observations;
  "ncen" = length(ind.cen), #number of censored observations;
  "censlim"=1.96, #log10 scale limit of  quantification; there is a observation less than censorlimit 2. 
  "t_obs" = shedding.data$day[ind.obs], #days after symptom onset;
  "c_obs" = shedding.data$value[ind.obs], #log10 concentration of SARS-CoV-2 RNA (copies per mL);
  "p_obs" = shedding.data$subjects[ind.obs], #subject ID vector for noncensored observations;
  "t_cen" = shedding.data$day[ind.cen], #days after symptom onset;
  "p_cen" = shedding.data$subjects[ind.cen] #subject ID vector for censored observations;
)
rm(ind.obs) #remove indicators;
rm(ind.cen) #remove indicators;
data_multi_censored
## $nsubj
## [1] 9
## 
## $nobs
## [1] 68
## 
## $ncen
## [1] 14
## 
## $censlim
## [1] 1.96
## 
## $t_obs
##  [1]  8  9 10 11 12 13 14 15 16 17 20 21  6  8  9 10 11 12 13 14 15 19  6  7  8
## [26]  9 10 11 12 13 14 15 16 17 18 19  7  8  9 10 11 12 14 15 16 19 20  6  8  9
## [51] 12 20 21  6  8  9 13 18 22  6 18 23 26  3  4  5  8  9
## 
## $c_obs
##  [1] 6.64 7.16 6.68 4.67 4.70 3.67 3.70 3.73 3.01 2.42 3.46 2.80 5.74 4.88 4.77
## [16] 3.63 3.25 2.83 3.84 3.87 3.11 2.28 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30
## [31] 4.89 3.64 4.71 4.75 4.09 2.33 6.05 7.23 7.51 5.57 4.43 5.15 4.88 4.25 3.98
## [46] 3.53 3.56 4.68 2.78 2.99 4.27 2.57 2.95 3.88 3.78 4.09 4.23 4.58 2.57 3.66
## [61] 2.59 2.04 1.97 4.68 4.33 3.61 3.99 2.71
## 
## $p_obs
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4
## [39] 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 8 8 8 8 8
## 
## $t_cen
##  [1]  7 20 22 23 26 16 21 19 20 24 12  8  9 10
## 
## $p_cen
##  [1] 2 3 3 3 5 6 6 7 7 7 8 9 9 9

Building the model

The hierarchical exponential decay model for multiple subjects with both quantifiable positive samples and negative samples (expon_hier_cen.stan) is coded in Stan as follows:

data {
  int<lower=0> nsubj;   // number of subjects
  int<lower=0> nobs;    // number of noncensored observation
  int<lower=0> ncen;    // number of censored observation
  vector[nobs] t_obs;   // noncensored observations time vector
  vector[nobs] c_obs;   // noncensored observations log10 concentration vector
  int p_obs[nobs];      // noncensored observations subject vector
  vector[ncen] t_cen;   // censored observations time vector
  int p_cen[ncen];      // censored observations subject vector
  real<upper=min(c_obs)> censlim; //limit of quantification
}

parameters {
  real<lower=0> sig_obs;//standard deviation of measurement error; // prediction error scale;
  cholesky_factor_corr[2] L_Omega; // Cholesky factorization for LKJ prior correlation;
  vector<lower=0>[2] tau; //precision
  vector[2] mu; //mean
  matrix[nsubj,2] par;
}

transformed parameters {
  vector[nsubj] a0; //decay rate;
  vector<lower=0>[nsubj] c0; //log scale concentration at day 0;
  for (k_subj in 1:nsubj){
    a0[k_subj] = exp(par[k_subj,1]);
    c0[k_subj] = exp(par[k_subj,2]);
  }
}

model {
  // priors
  sig_obs ~ uniform(0,10);
  tau ~ cauchy(0, 2.5);
  L_Omega ~ lkj_corr_cholesky(2);
  mu ~ normal(0,10);
  for (k_subj in 1:nsubj)
    //Multivariate normal using Cholesky parameterization;
    par[k_subj,] ~ multi_normal_cholesky(mu,diag_pre_multiply(tau, L_Omega));
  vector[nobs] gene_obs;
  vector[ncen] gene_cen;
  for (k_obs in 1:nobs)
    gene_obs[k_obs] = c0[p_obs[k_obs]]/log(10)-a0[p_obs[k_obs]]*t_obs[k_obs]/log(10);
  for (k_cen in 1:ncen)
    gene_cen[k_cen] = c0[p_cen[k_cen]]/log(10)-a0[p_cen[k_cen]]*t_cen[k_cen]/log(10);
  // likelihood
  c_obs ~ normal(gene_obs, sig_obs);
  target += normal_lcdf(censlim | gene_cen, sig_obs);
}

Fitting the model

Then, we fit the hierarchical model using the following codes:

fit_multi_censored <- stan(
  file = "./models/expon_hier_cen.stan",  # Stan program
  data = data_multi_censored,    # named list of data
  chains = 4,             # number of Markov chains
  warmup = 2000,          # number of warmup iterations per chain
  iter = 5000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 500,             # progress shown
  control = list(adapt_delta = 0.999, max_treedepth = 15), #model tuning
  seed = 123 #set a seed
)

The inference results are shown below:

print(fit_multi_censored, pars=c("mu","tau","L_Omega","a0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=5000; warmup=2000; thin=1; 
## post-warmup draws per chain=3000, total post-warmup draws=12000.
## 
##               mean se_mean   sd   10%   50%   90% n_eff Rhat
## mu[1]        -0.86    0.00 0.32 -1.24 -0.84 -0.50  4443    1
## mu[2]         2.56    0.00 0.17  2.36  2.58  2.76  5298    1
## tau[1]        0.65    0.01 0.36  0.29  0.58  1.10  2442    1
## tau[2]        0.41    0.00 0.16  0.24  0.37  0.61  4296    1
## L_Omega[1,1]  1.00     NaN 0.00  1.00  1.00  1.00   NaN  NaN
## L_Omega[1,2]  0.00     NaN 0.00  0.00  0.00  0.00   NaN  NaN
## L_Omega[2,1]  0.45    0.01 0.36 -0.07  0.51  0.88  5016    1
## L_Omega[2,2]  0.79    0.00 0.21  0.47  0.86  0.99  3733    1
## a0[1]         0.63    0.00 0.15  0.44  0.63  0.83  5770    1
## a0[2]         0.35    0.00 0.13  0.18  0.35  0.51  4906    1
## a0[3]         0.66    0.00 0.12  0.52  0.66  0.81  6057    1
## a0[4]         0.53    0.00 0.14  0.36  0.53  0.71  7479    1
## a0[5]         0.30    0.00 0.11  0.17  0.30  0.44  4046    1
## a0[6]         0.31    0.00 0.11  0.17  0.31  0.45  4257    1
## a0[7]         0.30    0.00 0.11  0.16  0.30  0.45  3722    1
## a0[8]         0.56    0.00 0.25  0.27  0.51  0.90  5946    1
## a0[9]         1.26    0.05 3.72  0.15  0.61  2.48  5294    1
## c0[1]        18.81    0.03 2.22 15.97 18.76 21.68  5779    1
## c0[2]        12.24    0.02 1.60 10.19 12.26 14.24  5489    1
## c0[3]        19.28    0.02 1.68 17.10 19.29 21.38  6186    1
## c0[4]        18.48    0.02 1.89 16.10 18.45 20.92  7563    1
## c0[5]        11.38    0.03 1.67  9.25 11.35 13.53  4437    1
## c0[6]        11.79    0.02 1.69  9.62 11.80 13.98  4606    1
## c0[7]        10.39    0.03 2.24  7.49 10.43 13.21  4149    1
## c0[8]        11.78    0.02 1.84  9.58 11.63 14.22  6336    1
## c0[9]        10.09    0.12 7.22  4.43  8.36 17.28  3649    1
## sig_obs       0.92    0.00 0.09  0.80  0.91  1.04  8073    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:43:32 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The estimated exponential decay curves for 9 subjects are shown below:

#extract posterior samples
stan_post <- extract(fit_multi_censored, permuted = FALSE)

#graph
par(mfrow=c(3,3))
for (k.subj in 1:data_multi_censored$nsubj){
  a0 <- as.vector(stan_post[,,27+k.subj])
  c0 <- as.vector(stan_post[,,27+data_multi_censored$nsubj+k.subj])
  plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
       xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main=paste("Subject", k.subj),yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2.5);
  lines(x=c(0,30),y=c(data_multi_censored$censlim,data_multi_censored$censlim),lty=3,col="black");
  graph.obs(data_multi_censored,k.subj,"black");
  graph.resp(k.subj,"black",model="exp");
  ticks.log(2,cex.axis=1);
  legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
}

Then, we can use the model to make prediction of the concentration of SARS-CoV-2 in stool (genome copies per mL) given the days after symptom onset. To show the prediction results, we simulate predicted SARS-CoV-2 concentration between day 1 and day 30 after symptom onset.

library(MASS)
#R simulations
t.new <- seq(0,30,0.1)
n.sim <- 100000
n.iter <- length(stan_post[,1,1])
n.chain <- length(stan_post[1,,1])
pred.dat.r <- matrix(NA,nrow=length(t.new),ncol=8)
sim.dat <- matrix(NA,nrow=n.sim,ncol=length(t.new))
for (k.sim in 1:n.sim){
  rng1 <- sample(1:n.iter,1)
  rng2 <- sample(1:n.chain,1)
  sim.sigma <- diag(stan_post[rng1,rng2,6:7])%*%matrix(c(stan_post[rng1,rng2,2:5]),2,2)%*%t(matrix(c(stan_post[rng1,rng2,2:5]),2,2))%*%diag(stan_post[rng1,rng2,6:7])
  sim.par <- exp(mvrnorm(n=1,stan_post[rng1,rng2,8:9],Sigma=sim.sigma))
  sim.dat[k.sim,] <- rnorm(length(t.new),mean=sim.par[2]/log(10)-sim.par[1]*t.new/log(10),sd=sample(stan_post[,,1],1))
}
for (i in 1:length(t.new)){
  pred.dat.r[i,1] <- t.new[i]
  pred.dat.r[i,2:8] <- quantile(sim.dat[,i],c(0.025,0.05,0.1,0.5,0.90,0.95,0.975))
}

Then, we plot the credible interval of the prediction.

#graph
par(mfrow=c(1,1))
plot(-1,-1,xlim=c(0,30),ylim=c(-5,15),col="white",
     xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Prediction with Credible Intervals",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
for (k.subj in 1:data_multi_censored$nsubj){
  graph.obs(data_multi_censored,k.subj,"grey");
}
#95% credible interval;
lines(x=pred.dat.r[,1],y=pred.dat.r[,2],col="black",lty=3,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,8],col="black",lty=3,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,3],col="black",lty=4,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,7],col="black",lty=4,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,4],col="black",lty=2,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,6],col="black",lty=2,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,5],col="black",lty=1,lwd=2);
ticks.log(2,cex.axis=1);
legend("topright",legend=c("Quantifiable","Negative","Median","80% Credible Interval","90% Credible Interval","95% Credible Interval"), lty=c(NA,NA,1,2,4,3), lwd=2, pt.cex=2, pch=c(16,1,NA,NA,NA,NA))

Gamma Model

When the concentration of pathogen shed c(t) is subject to gamma model, it follows:

\[c(t)=c_{0}t^{b_{0}}e^{-a_{0}t},\] Where \(c_{0}\) is the concentration of pathogen shed at symptom onset, a_{0} is the rate parameter, b_{0} is the shape parameter, and t is the day after symptom onset. In this example, we are assuming the fecal shedding starts on symptom onset.

When there are samples from multiple subjects, we can develop a hierarchical model:

\[c_i(t)=c_{i,0}t^{b_{i,0}}e^{-a_{i,0}t},\]

for subject i. \((log(a_{i,0}), log(b_{i,0}), log(c_{i,0})) \sim \mathcal{N}(\mathbf{\mu},\mathbf{\Omega^{-1}})\), where \(\mathbf{\mu}=(\mu_a,\mu_b,\mu_c)\). And we use the Cholesky parameterization for the correlation matrix and \(\mathbf{\Omega^{-1}}=diag(\tau_a,\tau_b,\tau_c)LL^{\top}diag(\tau_a,\tau_b,\tau_c)\). \(L\) is lower triangular such that \(LL^{\top}\) is positive definite.

Single Subject without Censored Data

Similar to the exponential decay case, we first consider the simplest case of one subject with only quantifiable positive samples. For this example, we select Subject 3, which has 14 positive samples and 3 negative samples, but only use measurements of positive samples. The data can be created using the following code:

#Create an indicator for noncensored observations of Subject 3;
ind <- which(shedding.data$subjects==3 & !is.na(shedding.data$value))
#Create input data for Stan model 
data_one_noncensored <- list(
  "nobs"=length(ind), #number of noncensored observations;
  "t" = shedding.data$day[ind], #days after symptom onset;
  "c" = shedding.data$value[ind] #log10 concentration of SARS-CoV-2 RNA (copies per mL);
)
rm(ind) #remove the indicator;
data_one_noncensored
## $nobs
## [1] 14
## 
## $t
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19
## 
## $c
##  [1] 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30 4.89 3.64 4.71 4.75 4.09 2.33

Building the model

The gamma model for one subject with only quantifiable positive samples (gamma1.stan) is coded in Stan as follows:

data {
  int<lower=0> nobs;// number of noncensored observation
  vector[nobs] t;   // noncensored observations time vector
  vector[nobs] c;   // noncensored observations log10 concentration vector
}

parameters {
  real<lower=0> sig_obs;//standard deviation of measurement error;
  real a0; //rate;
  real b0; //shape;
  real<lower=0> c0; //log scale concentration at day 0;
}

transformed parameters {
  vector[nobs] gene_obs;
  for (k_obs in 1:nobs)
    gene_obs[k_obs] = c0/log(10)+log(t[k_obs]^b0)/log(10)-a0*t[k_obs]/log(10);
}

model {
  // priors
  sig_obs ~ uniform(0,10);
  a0 ~ normal(0,100);
  b0 ~ normal(0,100);
  c0 ~ normal(0,100);
  // likelihood
  c ~ normal(gene_obs, sig_obs);
}

Fitting the model

Then, we fit the model using the following codes:

fit_one_noncensored <- stan(
  file = "./models/gamma1.stan",  # Stan program
  data = data_one_noncensored,    # named list of data
  chains = 4,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 2000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 500             # no progress shown
)

The inference results are shown below:

print(fit_one_noncensored, pars=c("a0","b0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##          mean se_mean   sd   10%   50%   90% n_eff Rhat
## a0       0.76    0.02 0.63 -0.06  0.85  1.50   673    1
## b0       2.35    0.28 7.10 -6.93  3.47 10.56   662    1
## c0      15.20    0.38 9.87  3.67 13.66 28.44   684    1
## sig_obs  0.82    0.01 0.21  0.60  0.78  1.08   679    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:44:21 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The estimated gamma curve is shown below:

#extract posterior samples
stan_post <- extract(fit_one_noncensored, permuted = FALSE)
a0 <- as.vector(stan_post[,,2])
b0 <- as.vector(stan_post[,,3])
c0 <- as.vector(stan_post[,,4])

#graph
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
     xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Subject 3",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
points(data_one_noncensored$t,data_one_noncensored$c,col="black",cex=1.5,pch=16)
graph.resp.1(color="black",model="gamma")
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
text(5,2.5,"Limit of Quantification")
ticks.log(2,cex.axis=1)

Single Subject with Censored Data

Next, we consider the case of one subject with both quantifiable positive samples and negative samples. Negative samples are treated as censored data indicating the SARS-CoV-2 RNA concentrations in those samples are below the Limit of Quantification (LOQ). We still use Subject 3, which has 14 positive samples and 3 negative samples. The data can be created using the following code:

#Create indicators for noncensored and censored observations of Subject 3;
ind.obs <- which(shedding.data$subjects==3 & !is.na(shedding.data$value))
ind.cen <- which(shedding.data$subjects==3 & is.na(shedding.data$value))
#Create input data for Stan model 
data_one_censored <- list(
  "nobs"=length(ind.obs), #number of noncensored observations;
  "ncen" = length(ind.cen), #number of censored observations;
  "censlim"=censorlimit, #log10 scale limit of  quantification;
  "t_obs" = shedding.data$day[ind.obs], #days after symptom onset;
  "c_obs" = shedding.data$value[ind.obs], #log10 concentration of SARS-CoV-2 RNA (copies per mL);
  "t_cen" = shedding.data$day[ind.cen] #days after symptom onset;
)
rm(ind.obs) #remove indicators;
rm(ind.cen) #remove indicators;
data_one_censored
## $nobs
## [1] 14
## 
## $ncen
## [1] 3
## 
## $censlim
## [1] 2
## 
## $t_obs
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19
## 
## $c_obs
##  [1] 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30 4.89 3.64 4.71 4.75 4.09 2.33
## 
## $t_cen
## [1] 20 22 23

Building the model

The gamma model for one subject with both quantifiable positive samples and negative samples (gamma1cen.stan) is coded in Stan as follows:

data {
  int<lower=0> nobs;    // number of noncensored observation
  int<lower=0> ncen;    // number of censored observation
  vector[nobs] t_obs;   // noncensored observations time vector
  vector[nobs] c_obs;   // noncensored observations log10 concentration vector
  vector[ncen] t_cen;   // censored observations time vector
  real<upper=min(c_obs)> censlim;
}
parameters {
  real<lower=0> sig_obs;//standard deviation of measurement error;
  real a0; //rate;
  real b0; //shape;
  real<lower=0> c0; //log scale concentration at day 0;
}

transformed parameters {
  vector[nobs] gene_obs;
  vector[ncen] gene_cen;
  for (k_obs in 1:nobs)
    gene_obs[k_obs] = c0/log(10)+log(t_obs[k_obs]^b0)/log(10)-a0*t_obs[k_obs]/log(10);
  for (k_cen in 1:ncen)
    gene_cen[k_cen] = c0/log(10)+log(t_cen[k_cen]^b0)/log(10)-a0*t_cen[k_cen]/log(10);
}

model {
  // priors
  sig_obs ~ uniform(0,10);
  a0 ~ normal(0,100);
  b0 ~ normal(0,100);
  c0 ~ normal(0,100);
  // likelihood
  c_obs ~ normal(gene_obs, sig_obs);
  target += normal_lcdf(censlim | gene_cen, sig_obs);
}

Fitting the model

Then, we fit the model using the following codes:

fit_one_censored <- stan(
  file = "./models/gamma1cen.stan",  # Stan program
  data = data_one_censored,    # named list of data
  chains = 4,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 2000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 500             # no progress shown
)

The inference results are shown below:

print(fit_one_censored, pars=c("a0","b0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##         mean se_mean   sd  10%  50%   90% n_eff Rhat
## a0      1.41    0.02 0.38 0.97 1.44  1.84   293 1.01
## b0      8.34    0.29 4.44 3.11 9.09 13.05   238 1.02
## c0      8.32    0.44 6.58 1.47 7.07 15.89   226 1.02
## sig_obs 0.87    0.01 0.22 0.64 0.83  1.15   669 1.00
## 
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:44:58 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The estimated exponential decay curve is shown below:

#extract posterior samples
stan_post <- extract(fit_one_censored, permuted = FALSE)
a0 <- as.vector(stan_post[,,2])
b0 <- as.vector(stan_post[,,3])
c0 <- as.vector(stan_post[,,4])

#graph
plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
     xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Subject 3",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
points(data_one_censored$t_obs,data_one_censored$c_obs,col="black",cex=1.5,pch=16)
points(data_one_censored$t_cen,rep(data_one_censored$censlim,length(data_one_censored$t_cen)),col="black",cex=1.5,pch=1)
graph.resp.1(color="black",model="gamma")
legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
text(5,2.5,"Limit of Quantification")
ticks.log(2,cex.axis=1)

Multiple Subjects with Censored Data

At last, we consider the case of multiple subjects (all 9 subjects) with both quantifiable positive samples and negative samples. Negative samples are treated as censored data indicating the SARS-CoV-2 RNA concentrations in those samples are below the Limit of Quantification (LOQ). The data can be created using the following code:

#Create indicators for noncensored and censored observations;
ind.obs <- which(!is.na(shedding.data$value))
ind.cen <- which(is.na(shedding.data$value))
#Create input data for Stan model 
data_multi_censored <- list(
  "nsubj" = length(unique(shedding.data$subjects)), #number of subjects
  "nobs"=length(ind.obs), #number of noncensored observations;
  "ncen" = length(ind.cen), #number of censored observations;
  "censlim"=1.96, #log10 scale limit of  quantification; there is a observation less than censorlimit 2. 
  "t_obs" = shedding.data$day[ind.obs], #days after symptom onset;
  "c_obs" = shedding.data$value[ind.obs], #log10 concentration of SARS-CoV-2 RNA (copies per mL);
  "p_obs" = shedding.data$subjects[ind.obs], #subject ID vector for noncensored observations;
  "t_cen" = shedding.data$day[ind.cen], #days after symptom onset;
  "p_cen" = shedding.data$subjects[ind.cen] #subject ID vector for censored observations;
)
rm(ind.obs) #remove indicators;
rm(ind.cen) #remove indicators;
data_multi_censored
## $nsubj
## [1] 9
## 
## $nobs
## [1] 68
## 
## $ncen
## [1] 14
## 
## $censlim
## [1] 1.96
## 
## $t_obs
##  [1]  8  9 10 11 12 13 14 15 16 17 20 21  6  8  9 10 11 12 13 14 15 19  6  7  8
## [26]  9 10 11 12 13 14 15 16 17 18 19  7  8  9 10 11 12 14 15 16 19 20  6  8  9
## [51] 12 20 21  6  8  9 13 18 22  6 18 23 26  3  4  5  8  9
## 
## $c_obs
##  [1] 6.64 7.16 6.68 4.67 4.70 3.67 3.70 3.73 3.01 2.42 3.46 2.80 5.74 4.88 4.77
## [16] 3.63 3.25 2.83 3.84 3.87 3.11 2.28 6.10 6.03 6.79 6.62 4.68 5.10 5.79 4.30
## [31] 4.89 3.64 4.71 4.75 4.09 2.33 6.05 7.23 7.51 5.57 4.43 5.15 4.88 4.25 3.98
## [46] 3.53 3.56 4.68 2.78 2.99 4.27 2.57 2.95 3.88 3.78 4.09 4.23 4.58 2.57 3.66
## [61] 2.59 2.04 1.97 4.68 4.33 3.61 3.99 2.71
## 
## $p_obs
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4
## [39] 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 8 8 8 8 8
## 
## $t_cen
##  [1]  7 20 22 23 26 16 21 19 20 24 12  8  9 10
## 
## $p_cen
##  [1] 2 3 3 3 5 6 6 7 7 7 8 9 9 9

Building the model

The hierarchical gamma model for multiple subjects with both quantifiable positive samples and negative samples (gamma_hier_cen.stan) is coded in Stan as follows:

data {
  int<lower=0> nsubj;   // number of subjects
  int<lower=0> nobs;    // number of noncensored observation
  int<lower=0> ncen;    // number of censored observation
  vector[nobs] t_obs;   // noncensored observations time vector
  vector[nobs] c_obs;   // noncensored observations log10 concentration vector
  int p_obs[nobs];      // noncensored observations subject vector
  vector[ncen] t_cen;   // censored observations time vector
  int p_cen[ncen];      // censored observations subject vector
  real<upper=min(c_obs)> censlim;
}

parameters {
  real<lower=0> sig_obs;//standard deviation of measurement error;
  cholesky_factor_corr[3] L_Omega; // Cholesky factorization for LKJ prior correlation;
  vector<lower=0>[3] tau;
  vector[3] mu;
  matrix[nsubj,3] par_raw;
}

transformed parameters {
  vector[nsubj] a0; //rate;
  vector[nsubj] b0; //shape;
  vector[nsubj] c0; //log scale concentration at day 0;
  matrix[nsubj,3] par;
  for (k_subj in 1:nsubj)
    //reparameterization
    par[k_subj,] = to_row_vector(mu + tau .* (L_Omega * to_vector(par_raw[k_subj,])));
  for (k_subj in 1:nsubj){
    a0[k_subj] = exp(par[k_subj,1]);
    b0[k_subj] = exp(par[k_subj,2]);
    c0[k_subj] = exp(par[k_subj,3]);
  }
  vector[nobs] gene_obs;
  vector[ncen] gene_cen;
  for (k_obs in 1:nobs)
    gene_obs[k_obs] = c0[p_obs[k_obs]]/log(10)+log(t_obs[k_obs]^b0[p_obs[k_obs]])/log(10)-a0[p_obs[k_obs]]*t_obs[k_obs]/log(10);
  for (k_cen in 1:ncen)
    gene_cen[k_cen] = c0[p_cen[k_cen]]/log(10)+log(t_cen[k_cen]^b0[p_obs[k_cen]])/log(10)-a0[p_cen[k_cen]]*t_cen[k_cen]/log(10);
}

model {
  // priors
  sig_obs ~ uniform(0,5);
  tau ~ cauchy(0, 2.5);
  L_Omega ~ lkj_corr_cholesky(3);
  mu ~ normal(0,5);
  // likelihood
  for (k_subj in 1:nsubj)
    par_raw[k_subj,] ~ std_normal();
  c_obs ~ normal(gene_obs, sig_obs);
  target += normal_lcdf(censlim | gene_cen, sig_obs);
}

Fitting the model

Then, we fit the hierarchical model using the following codes:

fit_multi_censored <- stan(
  file = "./models/gamma_hier_cen.stan",  # Stan program
  data = data_multi_censored,    # named list of data
  chains = 4,             # number of Markov chains
  warmup = 5000,          # number of warmup iterations per chain
  iter = 10000,            # total number of iterations per chain
  cores = 4,              # number of cores (could use one per chain)
  refresh = 500,             # progress shown
  control = list(adapt_delta = 0.999, max_treedepth = 15), #model tuning
  seed = 123 #set a seed
)

The inference results are shown below:

print(fit_multi_censored, pars=c("mu","tau","L_Omega","a0","b0","c0","sig_obs"), probs=c(.1,.5,.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=10000; warmup=5000; thin=1; 
## post-warmup draws per chain=5000, total post-warmup draws=20000.
## 
##               mean se_mean      sd   10%   50%   90% n_eff Rhat
## mu[1]        -0.38    0.00    0.24 -0.67 -0.38 -0.11  8448    1
## mu[2]         0.84    0.01    0.67  0.03  0.94  1.56  7534    1
## mu[3]         1.97    0.01    0.53  1.37  2.08  2.46  4887    1
## tau[1]        0.41    0.00    0.31  0.08  0.35  0.81  3982    1
## tau[2]        1.02    0.01    0.63  0.43  0.87  1.77  8550    1
## tau[3]        0.85    0.01    0.52  0.39  0.70  1.45  4399    1
## L_Omega[1,1]  1.00     NaN    0.00  1.00  1.00  1.00   NaN  NaN
## L_Omega[1,2]  0.00     NaN    0.00  0.00  0.00  0.00   NaN  NaN
## L_Omega[1,3]  0.00     NaN    0.00  0.00  0.00  0.00   NaN  NaN
## L_Omega[2,1]  0.03    0.00    0.34 -0.43  0.03  0.48  8924    1
## L_Omega[2,2]  0.94    0.00    0.08  0.83  0.97  1.00 11238    1
## L_Omega[2,3]  0.00     NaN    0.00  0.00  0.00  0.00   NaN  NaN
## L_Omega[3,1]  0.15    0.00    0.32 -0.28  0.17  0.57 10036    1
## L_Omega[3,2] -0.40    0.00    0.28 -0.74 -0.44 -0.01  9514    1
## L_Omega[3,3]  0.78    0.00    0.16  0.56  0.81  0.96 10167    1
## a0[1]         0.75    0.00    0.13  0.59  0.75  0.92 14925    1
## a0[2]         0.72    0.00    0.19  0.50  0.71  0.97 11471    1
## a0[3]         0.82    0.00    0.20  0.61  0.78  1.09  7969    1
## a0[4]         0.76    0.00    0.19  0.55  0.73  1.00  8564    1
## a0[5]         0.52    0.00    0.17  0.30  0.52  0.75  5938    1
## a0[6]         0.54    0.00    0.17  0.33  0.54  0.75  7263    1
## a0[7]         0.55    0.00    0.16  0.34  0.55  0.76  7229    1
## a0[8]         0.81    0.00    0.30  0.52  0.75  1.19 11164    1
## a0[9]         1.44    0.03    4.06  0.56  0.93  2.44 15303    1
## b0[1]         0.82    0.01    0.76  0.10  0.60  1.84 10857    1
## b0[2]         3.50    0.01    1.45  1.82  3.35  5.36  9355    1
## b0[3]         3.72    0.02    2.19  1.47  3.21  6.76  8065    1
## b0[4]         2.43    0.03    2.19  0.30  1.83  5.33  6672    1
## b0[5]         3.56    0.03    1.85  1.10  3.50  6.10  5024    1
## b0[6]         4.50    0.02    1.66  2.35  4.45  6.74  6110    1
## b0[7]         3.75    0.02    1.76  1.52  3.63  6.18  6280    1
## b0[8]         2.59    0.02    1.73  0.68  2.29  4.78 12016    1
## b0[9]        37.92   14.48 1518.97  0.85  4.25 16.38 10999    1
## c0[1]        18.38    0.02    2.09 15.74 18.45 20.94 13003    1
## c0[2]         8.54    0.02    2.25  5.63  8.69 11.30 13111    1
## c0[3]        12.48    0.04    3.58  7.64 12.94 16.60 10373    1
## c0[4]        15.34    0.04    3.81 10.13 16.07 19.47  7518    1
## c0[5]         5.73    0.04    3.05  1.60  5.67  9.83  6397    1
## c0[6]         4.79    0.03    2.66  1.23  4.70  8.36  7721    1
## c0[7]         5.50    0.03    2.90  1.57  5.46  9.34  9034    1
## c0[8]         9.11    0.02    2.41  5.96  9.26 12.04 15706    1
## c0[9]         4.95    0.06    4.59  0.93  3.91  9.87  6195    1
## sig_obs       0.76    0.00    0.08  0.67  0.75  0.87 12554    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Sep 13 11:54:18 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The estimated exponential decay curve is shown below:

#extract posterior samples
stan_post <- extract(fit_multi_censored, permuted = FALSE)

#graph
par(mfrow=c(3,3))
for (k.subj in 1:data_multi_censored$nsubj){
  a0 <- as.vector(stan_post[,,43+k.subj])
  b0 <- as.vector(stan_post[,,52+k.subj])
  c0 <- as.vector(stan_post[,,61+k.subj])
  plot(-1,-1,xlim=c(0,30),ylim=c(-5,10),col="white",
       xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main=paste("Subject", k.subj),yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2.5);
  lines(x=c(0,30),y=c(data_multi_censored$censlim,data_multi_censored$censlim),lty=3,col="black");
  graph.obs(data_multi_censored,k.subj,"black");
  graph.resp(k.subj,"black",model="gamma");
  ticks.log(2,cex.axis=1);
  legend("topright", legend=c("Quantifiable","Negative","Median","95% Credible Interval"), lty=c(NA,NA,1,3), pch=c(16,1,NA,NA), lwd=2, pt.cex=2)
}

Then, we can use the model to make prediction of the concentration of SARS-CoV-2 in stool (genome copies per mL) given the days after symptom onset. To show the prediction results, we simulate predicted SARS-CoV-2 concentration between day 1 and day 30 after symptom onset.

library(MASS)
#R simulations
t.new <- seq(0,30,0.1)
n.sim <- 100000
n.iter <- length(stan_post[,1,1])
n.chain <- length(stan_post[1,,1])
pred.dat.r <- matrix(NA,nrow=length(t.new),ncol=8)
sim.dat <- matrix(NA,nrow=n.sim,ncol=length(t.new))
for (k.sim in 1:n.sim){
  rng1 <- sample(1:n.iter,1)
  rng2 <- sample(1:n.chain,1)
  sim.sigma <- diag(stan_post[rng1,rng2,11:13])%*%matrix(c(stan_post[rng1,rng2,2:10]),3,3)%*%t(matrix(c(stan_post[rng1,rng2,2:10]),3,3))%*%diag(stan_post[rng1,rng2,11:13])
  sim.par <- exp(mvrnorm(n=1,stan_post[rng1,rng2,14:16],Sigma=sim.sigma))
  sim.dat[k.sim,] <- rnorm(length(t.new),mean=sim.par[3]/log(10)+log(t.new^sim.par[2])/log(10)-sim.par[1]*t.new/log(10),sd=sample(stan_post[,,1],1))
}
for (i in 1:length(t.new)){
  pred.dat.r[i,1] <- t.new[i]
  pred.dat.r[i,2:8] <- quantile(sim.dat[,i],c(0.025,0.05,0.1,0.5,0.90,0.95,0.975))
}

Then, we plot the credible interval of the prediction.

#graph
par(mfrow=c(1,1))
plot(-1,-1,xlim=c(0,30),ylim=c(-5,15),col="white",
     xlab="Days after Symptom Onset",ylab="Genome Copies per mL",main="Prediction with Credible Intervals",yaxt="n",cex.axis=1.5,cex.lab=1.5,cex.main=2);
lines(x=c(0,30),y=c(censorlimit,censorlimit),lty=3,col="black");
for (k.subj in 1:data_multi_censored$nsubj){
  graph.obs(data_multi_censored,k.subj,"grey");
}
#95% credible interval;
lines(x=pred.dat.r[,1],y=pred.dat.r[,2],col="black",lty=3,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,8],col="black",lty=3,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,3],col="black",lty=4,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,7],col="black",lty=4,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,4],col="black",lty=2,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,6],col="black",lty=2,lwd=2);
lines(x=pred.dat.r[,1],y=pred.dat.r[,5],col="black",lty=1,lwd=2);
ticks.log(2,cex.axis=1);
legend("topright",legend=c("Quantifiable","Negative","Median","80% Credible Interval","90% Credible Interval","95% Credible Interval"), lty=c(NA,NA,1,2,4,3), lwd=2, pt.cex=2, pch=c(16,1,NA,NA,NA,NA))

References

Wölfel, Roman, Victor M Corman, Wolfgang Guggemos, Michael Seilmaier, Sabine Zange, Marcel A Müller, Daniela Niemeyer, et al. 2020. “Virological Assessment of Hospitalized Patients with COVID-2019.” Nature 581 (7809): 465–69. https://doi.org/10.1038/s41586-020-2196-x.