3.1 Model formulation

A joint model for longitudinal and survival data consists of two components: a model for the longitudinal part (for simplicity, I will assume a single longitudinal trajectory from now on) and a model for the survival part. The two components share a latent structure that describes the association between the two processes. In the literature, the dominant approach is to let the two components share random effects; I will follow this approach.

Building on the notation from Section 1, let \(y_i = \{ y_i(t_{ij}), \ j = 1, \dots, n_i \}\) be the observed longitudinal responses for the \(i\)th subject, with \(y_i(t_{ij})\) the response observed at time \(t_{ij}\) and \(n_i\) the number of longitudinal observations for that subject.

The longitudinal component of the joint model is modelled within the mixed-effects framework (Diggle et al. 2013), as longitudinal data are likely measured intermittently and with error. Therefore: \[ y_i(t) = m_i(t) + \epsilon_i(t), \ \epsilon_i(t) \sim N(0, \sigma^2) \] and \[ m_i(t) = X_i(t) \beta + Z_i(t) b_i, \ b_i \sim N(0, \Sigma) \] with \(X_i(t)\) and \(Z_i(t)\) the time-dependent design matrices for the fixed and random effects, respectively, \(\beta\) the fixed effects, \(b_i\) the random effects for the \(i\)th individual, \(\sigma^2\) the variance of the measurement error, and \(\Sigma\) the variance-covariance matrix of the random effects. \(y_i(t)\) represents the observed longitudinal trajectory at time \(t\), which is decomposed into the true longitudinal trajectory \(m_i(t)\) plus the measurement error \(\epsilon_i(t)\).
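
To make the decomposition concrete, the following is a minimal simulation sketch in Python. It rests on assumptions that are not part of the formulation above: a random intercept and random slope, so that \(X_i(t) = Z_i(t) = (1, t)\), and a user-supplied Cholesky factor of \(\Sigma\). The function name `simulate_trajectory` and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_trajectory(times, beta, chol_sigma, sigma_eps, rng):
    """Simulate one subject's true and observed longitudinal trajectory.

    Illustrative assumptions: X_i(t) = Z_i(t) = (1, t), i.e. a random
    intercept and a random slope. chol_sigma is the lower-triangular
    Cholesky factor L of Sigma (Sigma = L L^T), given as a 2x2 nested list.
    """
    # Draw b_i ~ N(0, Sigma) by transforming two independent N(0, 1) draws.
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    b0 = chol_sigma[0][0] * z0
    b1 = chol_sigma[1][0] * z0 + chol_sigma[1][1] * z1

    # True trajectory m_i(t) = X_i(t) beta + Z_i(t) b_i at each time point.
    m = [(beta[0] + b0) + (beta[1] + b1) * t for t in times]

    # Observed trajectory y_i(t) = m_i(t) + eps_i(t), eps_i(t) ~ N(0, sigma^2).
    y = [m_t + rng.gauss(0.0, sigma_eps) for m_t in m]
    return (b0, b1), m, y


# Hypothetical values: five visits, one year apart.
rng = random.Random(2024)
b_i, m_i, y_i = simulate_trajectory(
    times=[0.0, 1.0, 2.0, 3.0, 4.0],
    beta=(10.0, -0.5),
    chol_sigma=[[1.0, 0.0], [0.2, 0.3]],
    sigma_eps=0.5,
    rng=rng,
)
```

Here `m_i` plays the role of the unobserved true trajectory and `y_i` the error-contaminated measurements that would actually be recorded.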

The survival component of the joint model is modelled using a proportional hazards time-to-event model, conditional on the true unobserved longitudinal trajectory up to time \(t\), i.e. \(M_i(t) = \{m_i(s) \ \forall \ 0 \le s \le t\}\): \[ h(t | M_i(t)) = h_0(t) \exp(W \psi + \alpha m_i(t)), \] where \(h_0(t)\) is the baseline hazard function and \(W\) is a vector of time-fixed covariates with regression parameters \(\psi\). \(\alpha\) is the association parameter that links the longitudinal and survival components of the joint model; it can be interpreted as the log hazard ratio for a unit increase in the true longitudinal trajectory \(m_i(t)\) at time \(t\). This specific association structure is known as the current value parametrisation; alternative association structures are available, allowing for instance for interactions, or for association with the slope of the trajectory or with its cumulative effect. Further details are given in Rizopoulos (2012).
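
The interpretation of \(\alpha\) can be checked directly from the hazard above. As a short worked step, compare two values of the true trajectory at the same time \(t\) that differ by one unit, holding \(W\) fixed; the symbol \(m\) is introduced here only as a placeholder for a given value of \(m_i(t)\): \[ \frac{h_0(t) \exp(W \psi + \alpha (m + 1))}{h_0(t) \exp(W \psi + \alpha m)} = \exp(\alpha). \] The baseline hazard and the covariate term cancel, so \(\exp(\alpha)\) is the hazard ratio, and \(\alpha\) the log hazard ratio, for a unit increase in the current value of the true trajectory.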

The survival function follows as \[ S(t | M_i(t)) = \exp \left( -\int_0 ^ t h_0(u) \exp(W \psi + \alpha m_i(u)) \ du \right). \]
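
The integral in the survival function generally has no closed form, so in practice it is approximated numerically. The sketch below illustrates this with a simple midpoint rule in Python. It is not the method of any particular software package, and it relies on assumptions made only for illustration: a Weibull baseline hazard \(h_0(u) = \lambda \gamma u^{\gamma - 1}\) and a linear true trajectory \(m_i(u)\) whose intercept and slope already combine fixed and random effects. The function name `survival` and its arguments are hypothetical.

```python
import math


def survival(t, lam, gamma, w_psi, alpha, m_int, m_slope, n_steps=2000):
    """Approximate S(t | M_i(t)) under the current value parametrisation.

    Illustrative assumptions: Weibull baseline hazard
    h_0(u) = lam * gamma * u**(gamma - 1), and a linear true trajectory
    m_i(u) = m_int + m_slope * u. w_psi is the linear predictor W psi.
    """
    if t <= 0:
        return 1.0  # no hazard has accumulated at time zero

    du = t / n_steps
    cum_hazard = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * du  # midpoint rule: never evaluates h_0 at u = 0
        h0 = lam * gamma * u ** (gamma - 1)
        m_u = m_int + m_slope * u
        cum_hazard += h0 * math.exp(w_psi + alpha * m_u) * du
    return math.exp(-cum_hazard)


# Hypothetical parameter values for a single subject.
s_2 = survival(t=2.0, lam=0.1, gamma=1.5, w_psi=0.2, alpha=0.3,
               m_int=1.0, m_slope=0.5)
```

With \(\alpha = 0\) and \(W \psi = 0\) the result reduces to the Weibull survival function \(\exp(-\lambda t^\gamma)\), which provides a convenient sanity check. More accurate quadrature rules, such as Gauss-Kronrod, are a common alternative to the midpoint rule used here.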

Finally, the choice of the baseline hazard function \(h_0(t)\) follows the usual rationale. It can be left unspecified, resulting in a Cox model for the survival component of the joint model, or it can be specified using a parametric distribution (e.g. a Weibull distribution) or a flexible alternative (Crowther, Abrams, and Lambert 2012). Hsieh, Tseng, and Wang (2006) showed that standard errors are underestimated when a Cox model is chosen for the survival component; in that situation, bootstrapping is required to obtain correct standard errors.

References

Diggle, Peter J, Patrick Heagerty, Kung-Yee Liang, and Scott L Zeger. 2013. Analysis of Longitudinal Data. 2nd ed. Oxford Statistical Science Series. OUP Oxford.

Rizopoulos, Dimitris. 2012. Joint Models for Longitudinal and Time-to-Event Data: With Applications in R. Biostatistics. Chapman & Hall / CRC.

Crowther, Michael J, Keith R Abrams, and Paul C Lambert. 2012. “Flexible Parametric Joint Modelling of Longitudinal and Survival Data.” Statistics in Medicine 31 (30): 4456–71. doi:10.1002/sim.5644.

Hsieh, Fushing, Yi-Kuan Tseng, and Jane-Ling Wang. 2006. “Joint Modeling of Survival and Longitudinal Data: Likelihood Approach Revisited.” Biometrics 62: 1037–43. doi:10.1111/j.1541-0420.2006.00570.x.