2.1 Univariate frailty models

In settings where risk factors are not measured, their relevance is unknown, or it is not even known whether they exist, it is useful to consider two sources of variability in survival analysis: variability accounted for by observable risk factors included in the model, and heterogeneity caused by unknown covariates. The unobserved heterogeneity is described by the frailty term, which is assumed to follow some distribution. Formally: \[ h(t|\alpha) = \alpha h_0(t), \] where \(\alpha\) is an unobserved frailty effect and \(h_0(t)\) is the baseline hazard function. The random variable \(\alpha\), the frailty term, is chosen to have a distribution \(f(\alpha)\) with expectation \(E(\alpha) = 1\) and variance \(V(\alpha) = \sigma ^ 2\). \(V(\alpha)\) is interpretable as a measure of heterogeneity in baseline risk across the population: as \(\sigma ^ 2\) increases, the values of \(\alpha\) become more dispersed, and so does the individual hazard \(\alpha h_0(t)\). The underlying assumptions are that the frailty is time-independent and that it acts multiplicatively on the baseline hazard function.

Introducing observed covariates into the model under proportional hazards gives \[ h(t|X,\alpha) = \alpha h_0(t) \exp(X \beta) = \alpha h(t | X), \] with \(X\) and \(\beta\) denoting covariates and regression coefficients, respectively. Given the relationship between the hazard and the survival function, it can be shown that the individual survival function conditional on the frailty is \(S(t | X, \alpha) = \left[ S(t | X) \right] ^ \alpha\), where \(S(t | X) = \exp(-H_0(t) \exp(X \beta))\) is the survival function of an individual with frailty \(\alpha = 1\) and \(H_0(t)\) is the cumulative baseline hazard. The population (i.e. marginal, or unconditional) survival function is obtained by integrating the frailty out of the conditional survival function: \[ S(t) = \int_0^{+\infty} \left[ S(t | X) \right] ^ \alpha f(\alpha) \ d\alpha \]
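As a rough numerical illustration of integrating out the frailty, the sketch below approximates the population survival function by averaging the conditional survival function over simulated frailty values. The constant baseline hazard, the gamma frailty with mean 1, the value of `theta`, and all function names are assumptions made for this example only.

```python
import math
import random


def conditional_survival(t, alpha, baseline_rate=0.1, linear_predictor=0.0):
    """S(t | X, alpha) = exp(-alpha * H_0(t) * exp(X beta)).

    Assumes a constant baseline hazard, so H_0(t) = baseline_rate * t.
    """
    cumulative_hazard = baseline_rate * t * math.exp(linear_predictor)
    return math.exp(-alpha * cumulative_hazard)


def marginal_survival_mc(t, frailty_draws, **model):
    """Monte Carlo estimate of the integral of S(t | X, alpha) f(alpha) d(alpha)."""
    total = sum(conditional_survival(t, a, **model) for a in frailty_draws)
    return total / len(frailty_draws)


# Illustrative frailty distribution (an assumption): gamma with shape 1/theta
# and scale theta, which has mean 1 and variance theta.
rng = random.Random(2024)
theta = 0.5
frailty_draws = [rng.gammavariate(1.0 / theta, theta) for _ in range(100_000)]

for t in (1.0, 5.0, 10.0):
    individual = conditional_survival(t, alpha=1.0)
    population = marginal_survival_mc(t, frailty_draws)
    print(f"t={t:5.1f}  S(t | alpha=1)={individual:.4f}  population S(t)={population:.4f}")
```

Because the conditional survival function is convex in \(\alpha\) and \(E(\alpha) = 1\), the population curve should lie above the curve of an individual with \(\alpha = 1\) at every \(t\). This reflects the early failure of frail individuals.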

The individual contribution to the likelihood (assuming no delayed entry), conditional on the unobserved frailty \(\alpha\), is \[ L_i = \left( \alpha h_0(t_i)\exp(X_i \beta) \right) ^ {d_i} \exp(-\alpha H_0(t_i) \exp(X_i \beta)), \] with \(d_i\) the event indicator, \(H_0(t_i)\) the cumulative baseline hazard, and \(t_i\) the observed survival time, all relative to the \(i\)-th individual.
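The conditional likelihood contribution can be written as a short function on the log scale. This is a minimal sketch: the exponential baseline, the toy records, and the function and variable names are hypothetical and serve only to make the formula concrete.

```python
import math


def conditional_log_likelihood(time, event, linear_predictor, alpha,
                               baseline_hazard, cumulative_baseline_hazard):
    """log L_i = d_i * log(alpha * h_0(t_i) * exp(X_i beta))
                 - alpha * H_0(t_i) * exp(X_i beta)."""
    relative_risk = math.exp(linear_predictor)
    log_hazard = math.log(alpha * baseline_hazard(time) * relative_risk)
    cumulative_hazard = alpha * cumulative_baseline_hazard(time) * relative_risk
    return event * log_hazard - cumulative_hazard


# Hypothetical exponential baseline: h_0(t) = 0.1 and H_0(t) = 0.1 * t.
RATE = 0.1


def h0(t):
    return RATE


def H0(t):
    return RATE * t


# Made-up records: (observed time t_i, event indicator d_i, X_i beta).
records = [(2.0, 1, 0.0), (5.0, 0, 0.3), (1.5, 1, -0.2)]

# A single illustrative frailty value; in practice the frailty is unobserved
# and has to be integrated out before maximisation.
alpha = 1.0
total = sum(conditional_log_likelihood(t, d, xb, alpha, h0, H0)
            for t, d, xb in records)
print(f"conditional log-likelihood = {total:.4f}")
```

Censored observations (`event = 0`) contribute only the survival term, while observed events add the log-hazard term.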

Different choices for the frailty distribution are possible. Assigning a probability distribution to the frailty means that it can be integrated out of the likelihood function. After this integration, the likelihood can be maximized in the usual way if an explicit form exists. Otherwise, more sophisticated approaches such as numerical integration or Markov chain Monte Carlo methods are required. The most commonly used frailty distributions are the gamma and the log-normal; the positive stable and the inverse Gaussian distributions are also common.

Assuming that the frailty \(\alpha\) has a Gamma distribution is convenient: it has the appropriate range \((0, \infty)\) and it is mathematically tractable. A Gamma distribution with shape \(a\) and scale \(b\) has density \[ f(x) = \frac{x ^ {a - 1} \exp(- x / b)}{\Gamma(a)b ^ a}; \] choosing \(a = 1 / \theta\) and \(b = \theta\) yields a distribution with expectation \(1\) and finite variance \(\theta\). In this setting the population survival function has the closed form \[ S(t) = (1 - \theta \log S(t | X)) ^ {-1/\theta}, \] where \(S(t | X) = \exp(-H_0(t) \exp(X \beta))\) is the survival function of an individual with frailty \(\alpha = 1\); the likelihood follows by substitution. Estimating such a model is therefore straightforward, which likely contributed to the popularity of Gamma frailty models.
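The closed form can be checked numerically. The sketch below compares it with a midpoint-rule approximation of the frailty integral, using the density with \(a = 1/\theta\) and \(b = \theta\). The function names, the integration bounds, and the restriction to \(\theta \le 1\) (so that the density stays bounded near zero) are assumptions of this example.

```python
import math


def gamma_density(x, shape, scale):
    """Gamma density with the shape/scale parametrisation used in the text."""
    return (x ** (shape - 1.0) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def gamma_marginal_survival(conditional_survival, theta):
    """Closed form: (1 - theta * log S) ** (-1 / theta)."""
    return (1.0 - theta * math.log(conditional_survival)) ** (-1.0 / theta)


def numeric_marginal_survival(conditional_survival, theta,
                              upper=60.0, steps=60_000):
    """Midpoint-rule approximation of the integral of S ** alpha * f(alpha).

    Intended for theta <= 1 only; for larger theta the density is unbounded
    near zero and this simple rule converges slowly.
    """
    width = upper / steps
    total = 0.0
    for k in range(steps):
        a = (k + 0.5) * width
        total += conditional_survival ** a * gamma_density(a, 1.0 / theta, theta)
    return total * width


theta = 0.5
for s in (0.9, 0.5, 0.1):
    closed = gamma_marginal_survival(s, theta)
    numeric = numeric_marginal_survival(s, theta)
    print(f"S={s:.1f}  closed form={closed:.6f}  numeric={numeric:.6f}")
```

The two columns should agree to several decimal places. This is the sense in which the Gamma frailty model needs no numerical integration.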

Together with the Gamma distribution, the log-normal distribution is the most commonly used frailty distribution, given its strong ties to random effects models; more on that in Section 2.3. Assuming a log-normal distribution with a single parameter \(\theta > 0\) (for comparison with the mathematically tractable Gamma frailty model) and density \[ f(x) = (2 \pi \theta) ^ {-\frac{1}{2}} x ^ {-1} \exp \left( -\frac{(\log x) ^ 2}{2 \theta} \right), \] the resulting frailty has finite expectation \(\exp(\theta / 2)\) and finite variance \(\exp(\theta)(\exp(\theta) - 1)\). However, this frailty cannot be integrated out of the survival function analytically, so neither the population survival function nor the likelihood has a closed form.
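Since no closed form is available, the population survival function under log-normal frailty has to be approximated. The sketch below is one simple possibility, not a recommended estimation routine. It substitutes \(\alpha = \exp(\sqrt{\theta} z)\) with \(z\) standard normal and applies a midpoint rule. Function names, grid limits, and parameter values are assumptions of this example.

```python
import math


def standard_normal_density(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def lognormal_expectation(g, theta, half_width=10.0, steps=20_000):
    """Approximate E[g(alpha)] for log(alpha) ~ N(0, theta).

    Uses the substitution alpha = exp(sqrt(theta) * z), z standard normal,
    followed by a midpoint rule on [-half_width, half_width].
    """
    width = 2.0 * half_width / steps
    sd = math.sqrt(theta)
    total = 0.0
    for k in range(steps):
        z = -half_width + (k + 0.5) * width
        total += g(math.exp(sd * z)) * standard_normal_density(z)
    return total * width


def lognormal_marginal_survival(conditional_survival, theta):
    """Approximate the integral of S ** alpha * f(alpha) for log-normal frailty."""
    log_s = math.log(conditional_survival)
    # exp(alpha * log S) equals S ** alpha and underflows quietly to 0.0.
    return lognormal_expectation(lambda a: math.exp(a * log_s), theta)


theta = 0.5
# Sanity check of the quadrature against the known log-normal mean exp(theta / 2).
print(lognormal_expectation(lambda a: a, theta), math.exp(theta / 2.0))

for s in (0.9, 0.5, 0.1):
    print(f"S={s:.1f}  population survival={lognormal_marginal_survival(s, theta):.6f}")
```

In a likelihood-based fit this integral would have to be evaluated for every individual at every iteration, which is what makes the log-normal frailty model more expensive to fit than the Gamma one.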