8 Informative visiting process

Health care consumption data is being used increasingly often in medical research, a successful example being the CArdiovascular disease research using LInked Bespoke studies and Electronic health Records (CALIBER) programme (Denaxas et al. 2012). CALIBER was constructed by extracting and linking electronic health records from primary care, hospital care, and nationwide registries (including information such as social deprivation and living status), and it consists of more than 10 million adults with 400 million person-years of follow-up; this vast amount of data allows researchers to answer more relevant and detailed clinical questions, but poses new methodological challenges.

First and foremost, in observational health care consumption data, observation times are likely to be correlated with the underlying disease severity. For instance, observation times tend to be irregular, as patients with more severe conditions (or showing early symptoms of a disease) tend to visit their GP or go to the hospital more often than those with milder conditions (and no symptoms). Their worse disease status is also likely to be reflected in worse biomarker values being recorded at such visits, causing abnormal values of such biomarkers to be overrepresented and normal values to be underrepresented. Additionally, for diseases with a high mortality rate, a terminal event that truncates observation of the longitudinal process is likely to be informative, in the sense that it likely correlates with disease severity. That is, dropout is likely to be informative, as the tendency to drop out following a terminal event is related to the current level of the longitudinally recorded biomarker.
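The overrepresentation of abnormal values described above can be illustrated with a small simulation. The following is a minimal sketch under assumptions of my own, not an analysis of CALIBER or of any real data: each patient has a latent severity drawn from a standard normal distribution, visits follow a Poisson process whose rate is `exp(gamma * severity)`, and the biomarker recorded at each visit is `5 + severity` plus noise. The function names and all parameter values are hypothetical.

```python
import math
import random


def simulate_cohort(n_patients=2000, follow_up=5.0, gamma=0.8, seed=1):
    """Simulate biomarker records under a severity-dependent visiting process.

    gamma = 0 gives a non-informative visiting process; gamma > 0 makes
    patients with higher severity visit more often. All values are
    illustrative assumptions, not estimates from real data.
    """
    rng = random.Random(seed)
    records = []  # one (patient id, visit time, biomarker value) per visit
    for patient in range(n_patients):
        severity = rng.gauss(0.0, 1.0)     # latent disease severity
        rate = math.exp(gamma * severity)  # expected visits per year
        # Exponential gaps between visits yield a Poisson visiting process.
        t = rng.expovariate(rate)
        while t < follow_up:
            biomarker = 5.0 + severity + rng.gauss(0.0, 0.5)
            records.append((patient, t, biomarker))
            t += rng.expovariate(rate)
    return records


def naive_mean(records):
    """Average all recorded biomarker values, ignoring how visits arose."""
    return sum(value for _, _, value in records) / len(records)


if __name__ == "__main__":
    # The population mean of the biomarker is 5 in both scenarios.
    print("non-informative visits:", naive_mean(simulate_cohort(gamma=0.0)))
    print("informative visits:    ", naive_mean(simulate_cohort(gamma=0.8)))
```

In this toy setting the population mean of the biomarker is 5 by construction, yet the naive average of the recorded values is pulled upwards when `gamma > 0`, because patients with higher severity contribute more visits.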

Traditional methods used to analyse longitudinal data rely on the assumption that the mechanism controlling the observation times is independent of disease severity; however, as I mentioned before, that is unlikely with health care consumption data. It can be shown that failing to account for informative dropout in a longitudinal study could yield biased estimates of the model parameters (Wu and Carroll 1988). Analogously, an explicit assumption when jointly modelling longitudinal and survival data (using the framework presented in Chapter 3) is that the timing and number of measurements for each subject are non-informative, i.e. they are not associated with the survival part of the model. However, it is currently unknown whether violations of this assumption lead to invalid inference in the context of joint models.

The topic of informative observation times and informative censoring has been the object of recent investigations. L. Liu, Huang, and O’Quigley (2008) developed a model that analyses repeated measures by taking into account informative observation times and an informative terminal event at the same time; they proposed a joint model formed by three submodels: a frailty model for the observation times, a mixed effects model for the longitudinal process, and a proportional hazards model for the terminal event. P. Ghosh, Ghosh, and Tiwari (2011) proposed a joint longitudinal and survival model that handles multiple change points in the longitudinal profile by including random spline coefficients; the survival part of the model handles an informative terminal event by using a semiparametric Cox model. Han, Song, and Sun (2014) proposed a model for the joint distribution of the longitudinal process, the observation process, and the dropout process that uses, respectively, a semiparametric linear regression model for the longitudinal data and two accelerated failure time models for the observation and dropout processes; their model is semiparametric in the sense that it leaves the distributional form and the dependence structure unspecified. Lesperance et al. (2015) developed a joint model within the multi-state framework that handles examination times correlated with disease progression; they link the transition intensities of a Markov model with a log-linear mixed model governing observation times through shared random effects. However, they do not integrate out the shared random effects from the model, as their target of inference is the transition intensities conditional on the random effects. Analogously, there have been several other developments in the multi-state framework to handle informative visiting times and/or dropout.
Sweeting, Farewell, and De Angelis (2010) developed a model (similar to a hidden Markov model) for the setting of a response variable that is irregularly and infrequently observed, by conditioning on regularly collected auxiliary data. Lange et al. (2015) generalised the work of Sweeting, Farewell, and De Angelis (2010) to better fit observational data settings, rather than clinical trial data with informative missingness or observation times, by modelling the disease process with a Markov model and the observation process with a Poisson process that depends on the underlying disease status. Lawless and Nazeri Rad (2015) considered the effect of intermittent, irregular observation times on the estimation of Markov models; they showed that it is not possible to estimate transition intensities in bi-directional Markov models with good precision when the gap between observation times is too large. They also showed how the correlation between visit times and observed disease status can bias model assessment procedures, and proposed an inverse intensity weighted estimation procedure for state prevalence. In brief, this approach consists in weighting each observation by the inverse of the individual's probability (or intensity) of being observed at that point in time; they discuss and develop this further in Nazeri Rad and Lawless (2017). Finally, Li and Su (2017) proposed a joint model for a longitudinal outcome and semicompeting risks data such as study dropout and death; they model the longitudinal process using a mixed model, and the semicompeting risks using two separate probit models.
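To make the idea behind inverse intensity weighting more concrete, the following is a minimal sketch under simplifying assumptions of my own. It is not the estimator of Lawless and Nazeri Rad (2015), which targets state prevalence in multi-state models; here I swap in the simpler problem of estimating the population mean of a continuous biomarker. I also assume that the visit intensity depends only on an observed binary covariate (`symptomatic`) and is known exactly, whereas in practice it would have to be estimated. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_visits(n_patients=2000, follow_up=5.0, seed=7):
    """Simulate visits whose intensity depends on an observed covariate.

    Symptomatic patients (30% of the cohort, an assumed value) have a higher
    biomarker and a higher visit intensity. The population mean of the
    biomarker is 5 + 2 * 0.3 = 5.6 by construction.
    """
    rng = random.Random(seed)
    records = []  # one (visit intensity, biomarker value) per visit
    for _ in range(n_patients):
        symptomatic = 1 if rng.random() < 0.3 else 0
        intensity = math.exp(1.0 * symptomatic)  # expected visits per year
        t = rng.expovariate(intensity)
        while t < follow_up:
            biomarker = 5.0 + 2.0 * symptomatic + rng.gauss(0.0, 0.5)
            records.append((intensity, biomarker))
            t += rng.expovariate(intensity)
    return records


def naive_mean(records):
    """Unweighted average of all recorded values."""
    return sum(value for _, value in records) / len(records)


def iiw_mean(records):
    """Average with each visit weighted by the inverse of its intensity."""
    weights = [1.0 / intensity for intensity, _ in records]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, records))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    records = simulate_visits()
    print("naive mean:", naive_mean(records))
    print("IIW mean:  ", iiw_mean(records))
```

In this toy setting, down-weighting the visits of frequently observed patients moves the estimate back towards the population mean of 5.6, while the unweighted average overstates it.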

In conclusion, joint models for longitudinal and survival data can effectively handle informative dropout in longitudinal studies by modelling the longitudinal trajectory and the dropout process jointly. However, it is not clear whether the presence of an informative visiting process affects inference from joint models. Further investigating this topic will form a good part of the work planned for the second year of my PhD, as I will discuss in Chapter 9. This project will have important practical implications, as it aims to provide guidance to practitioners and applied researchers using joint models for longitudinal and survival data with their observational data.

References

Denaxas, Spiros C, Julie George, Emily Herrett, Anoop D Shah, Dipak Kalra, Aroon D Hingorani, Mika Kivimaki, Adam D Timmis, Liam Smeeth, and Harry Hemingway. 2012. “Data Resource Profile: Cardiovascular Disease Research Using Linked Bespoke Studies and Electronic Health Records (CALIBER).” International Journal of Epidemiology 41: 1625–38. doi:10.1093/ije/dys188.

Wu, Margaret C, and Raymond J Carroll. 1988. “Estimation and Comparison of Changes in the Presence of Informative Right Censoring by Modeling the Censoring Process.” Biometrics 44: 175–88. doi:10.2307/2531905.

Liu, Lei, Xuelin Huang, and John O’Quigley. 2008. “Analysis of Longitudinal Data in Presence of Informative Observational Times and a Dependent Terminal Event, with Application to Medical Cost Data.” Biometrics 64: 950–58. doi:10.1111/j.1541-0420.2007.00954.x.

Ghosh, Pulak, Kaushik Ghosh, and Ram C Tiwari. 2011. “Joint Modeling of Longitudinal Data and Informative Dropout Time in the Presence of Multiple Changepoints.” Statistics in Medicine 30: 611–26. doi:10.1002/sim.4119.

Han, Miao, Xinyuan Song, and Liuquan Sun. 2014. “Joint Modeling of Longitudinal Data with Informative Observation Times and Dropouts.” Statistica Sinica 24: 1487–1504. doi:10.5705/ss.2013.063.

Lesperance, Mary L, Veronica Sabelnykova, Farouk S Nathoo, Francis Lau, and Michael G Downing. 2015. “A Joint Model for Interval-Censored Functional Decline Trajectories Under Informative Observation.” Statistics in Medicine 34: 3929–48. doi:10.1002/sim.6582.

Sweeting, Michael J, Vern T Farewell, and Daniela De Angelis. 2010. “Multi-State Markov Models for Disease Progression in the Presence of Informative Examination Times: An Application to Hepatitis C.” Statistics in Medicine 29: 1161–74. doi:10.1002/sim.3812.

Lange, Jane M, Rebecca A Hubbard, Lurdes YT Inoue, and Vladimir N Minin. 2015. “A Joint Model for Multistate Disease Processes and Random Informative Observation Times, with Applications to Electronic Medical Records Data.” Biometrics 71: 90–101. doi:10.1111/biom.12252.

Lawless, Jerald F, and N Nazeri Rad. 2015. “Estimation and Assessment of Markov Multistate Models with Intermittent Observations on Individuals.” Lifetime Data Analysis 21: 160–79. doi:10.1007/s10985-014-9310-z.

Nazeri Rad, N, and Jerald F Lawless. 2017. “Estimation of State Occupancy Probabilities in Multistate Models with Dependent Intermittent Observation, with Application to HIV Viral Rebounds.” Statistics in Medicine 36: 1256–71. doi:10.1002/sim.7189.

Li, Qiuju, and Li Su. 2017. “Accommodating Informative Dropout and Death: A Joint Modelling Approach for Longitudinal and Semicompeting Risks Data.” Journal of the Royal Statistical Society: Series C (Applied Statistics). doi:10.1111/rssc.12210.