1 Terminology and notation

I will introduce in this Section some notation that I will be using throughout this report.

I denote the random variable for an individual’s survival time with \(T^*\); it denotes time and can, therefore, assume any non-negative value. Lower-case \(t^*\) represents a realisation of \(T^*\) for a given individual. In the case of right censoring, I denote with \(C\) the random variable representing censoring time, and \(c\) its realisation. The observed time is denoted with \(T = \min(T^*, C)\), and its realisation is \(t\). Finally, I denote with \(D = I(T^* \le C)\) the random variable indicating either occurrence of the event of interest or censoring; analogously as before, its realisation is lower-case \(d\).

Next, I will define two of the main quantities of interest in survival analysis, the survival function and the hazard function. They are both functions of the observed time \(t\) and are denoted by \(S(t)\) and \(h(t)\), respectively.

The survival function is the complement of the cumulative distribution function of the observed time \(T\) and represents the probability that a given individual survives (loosely speaking) longer than \(t\): \[ S(t) = 1 - F_T(t) = 1 - P(T \le t) = P(T > t) \] \(t\) ranges (theoretically) between 0 and infinity, hence the survival function can be plotted as a smooth, continuous function that tends to 0 as \(t\) goes to infinity. In practice, though, the survival function appears as a step function as (1) individuals can be observed at discrete times only and (2) not all individuals may experience the event before the end of the study.

The hazard function \(h(t)\) is the limit of the probability of the survival time \(T\) laying within an interval \([t, t + \Delta_t)\) given that an individual survived up to time \(t\) divided by the length of the interval \(\Delta_t\), for \(\Delta_t\) approaching zero: \[ h(t) = \lim_{\Delta_t \to 0} \frac{P(t \le T < t + \Delta_t | T \ge t)}{\Delta_t} \] It represents the instantaneous potential (e.g. risk) for the event to occur within the interval \([t, t + \Delta_t)\) (with \(\Delta_t \to 0\)), given that the individual survived up to time \(t\). The hazard function is always non-negative, it can assume different shapes over time, and it has no upper bound.

The survival function and the hazard function are strictly related. In fact, there is a clearly defined mathematical relationship between them, and it is possible to derive the form of \(S(t)\) when knowing the form of \(h(t)\) (and vice versa). Formally: \[ S(t) = \exp \left[ -\int_0^t h(u) \ du \right] \] \[ h(t) = -\left[ \frac{d S(t) / dt}{S(t)} \right] \]

A third quantity of interest strictly related to the survival and hazard functions is the cumulative hazard function \(H(t)\). The cumulative hazard function represents the accumulation of hazard \(h(t)\) over time, and is defined as \[ H(t) = \int_0^t h(u) \ du; \] it can be expressed in terms of survival function, via the relationships \(H(t) = - \log S(t)\) or alternatively \(S(t) = \exp(-H(t))\).

The notation presented in this Chapter follows Collett (2014), where further details can be found. Other seminal books covering the topic of survival analysis are books by Cox and Oakes (1984), Kalbfleisch and Prentice (2011), and Kleinbaum and Klein (2012). The hazard function \(h(t)\) and the survival function \(S(t)\) form building blocks for the models that I will be presenting in the following Sections 2 and 3 and it is crucial to keep clear in mind their formulation and interpretation.

References

Collett, David. 2014. Modelling Survival Data in Medical Research. 3rd ed. Chapman & Hall / CRC.

Cox, David R, and David Oakes. 1984. Analysis of Survival Data. Monographs on Statistics and Applied Probability. Chapman; Hall / CRC.

Kalbfleisch, John D, and Ross L Prentice. 2011. The Statistical Analysis of Failure Time Data. 2nd ed. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc. doi:10.1002/9781118032985.

Kleinbaum, David G, and Mitchel Klein. 2012. Survival Analysis: A Self-Learning Text. 3rd ed. Springer-Verlag New York. doi:10.1007/978-1-4419-6646-9.