6.6 Results
Of the 150 data-generating mechanisms from which I simulated data, I present (for conciseness) only the results for the settings with 15 clusters of 100 individuals each and a frailty variance of 0.25. For the same reason, I also exclude Royston-Parmar models with 3 or 7 degrees of freedom from this comparison. Additional results can be explored interactively at https://ag475.shinyapps.io/PRR-SiReX/.
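As an illustration of one such data-generating mechanism, the following Python sketch simulates 15 clusters of 100 individuals with a Gamma frailty of variance 0.25, drawing event times by the inversion method. The exponential baseline rate `lam`, the log-hazard ratio `beta`, the binary covariate, the 5-year administrative censoring, the seed, and the function name `simulate_clustered_data` are hypothetical choices for illustration, not the exact settings of this simulation study.

```python
import math
import random

def simulate_clustered_data(n_clusters=15, cluster_size=100, frailty_var=0.25,
                            lam=0.5, beta=-0.5, max_follow_up=5.0, seed=2024):
    # lam, beta, max_follow_up and seed are hypothetical illustrative values
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        # Gamma(shape = 1/theta, scale = theta) frailty: mean 1, variance theta
        frailty = rng.gammavariate(1.0 / frailty_var, frailty_var)
        for _ in range(cluster_size):
            x = rng.randint(0, 1)          # binary covariate
            u = 1.0 - rng.random()         # uniform on (0, 1], avoids log(0)
            rate = lam * frailty * math.exp(beta * x)
            # inversion of S(t | x, frailty) = exp(-rate * t)
            t = -math.log(u) / rate
            event = int(t <= max_follow_up)   # administrative censoring
            rows.append((cluster, x, min(t, max_follow_up), event))
    return rows

data = simulate_clustered_data()
print(len(data), "individuals;", sum(r[3] for r in data), "events")
```

Each row holds the cluster identifier, the covariate, the observed time, and the event indicator; a log-normal frailty setting would replace the `gammavariate` draw with a draw of `exp` of a normal random effect.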
Bias, coverage, and mean squared error of the estimated regression coefficient are presented in Tables A.7, A.8, A.9 and Figures B.7, B.8, B.9. With a true exponential baseline hazard, all models produced unbiased estimates under all scenarios. With a true Weibull baseline hazard, all models performed well except the parametric models with an exponential or Gompertz baseline hazard, which underestimated the regression coefficient (bias of approximately -0.09 on the log-hazard ratio scale). Analogously, with a true Gompertz baseline hazard, the parametric Gompertz model, the flexible parametric models, and the Cox models returned unbiased estimates, while the parametric exponential and Weibull models overestimated the coefficient (approximately 0.13). With the first Weibull-Weibull baseline hazard, the flexible parametric models and the Cox model performed well, whereas the parametric models either underestimated (exponential and Gompertz, approximately -0.07) or overestimated (Weibull, approximately 0.10) the coefficient. Similarly, with the second Weibull-Weibull baseline hazard, the flexible parametric models and the Cox model returned unbiased estimates, as did the parametric Weibull model; the exponential and Gompertz parametric models, on the other hand, underestimated the coefficient (approximately -0.11). Coverage followed a similar pattern: when a model yielded unbiased estimates, coverage was close to the nominal 95% level. Conversely, coverage dropped considerably when estimates were biased because the parametric distribution was misspecified or failed to capture a complex hazard shape; the lowest coverage, approximately 35%, was observed for models with an exponential baseline hazard. Mean squared error of the estimated coefficients was smallest when the model was well specified, or when using the Cox model or the Royston-Parmar models.
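For reference, the three performance measures can be computed from the simulation replicates as sketched below. This is a minimal Python version assuming Wald-type 95% confidence intervals (estimate ± 1.96 × SE); the function name `performance`, the Monte Carlo standard errors, and the synthetic estimates in the usage lines are illustrative choices, not the code or data behind the tables cited above.

```python
import math
import random
import statistics

def performance(estimates, std_errors, true_value, z=1.959964):
    """Bias, coverage of Wald 95% CIs, and MSE across simulation replicates."""
    n = len(estimates)
    bias = statistics.fmean(estimates) - true_value
    # a replicate covers the truth if |estimate - truth| <= z * SE
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, std_errors)]
    coverage = sum(covered) / n
    mse = statistics.fmean((est - true_value) ** 2 for est in estimates)
    # Monte Carlo standard errors quantify simulation uncertainty
    mcse_bias = statistics.stdev(estimates) / math.sqrt(n)
    mcse_coverage = math.sqrt(coverage * (1.0 - coverage) / n)
    return {"bias": bias, "coverage": coverage, "mse": mse,
            "mcse_bias": mcse_bias, "mcse_coverage": mcse_coverage}

# Synthetic example: an unbiased estimator with correctly estimated SE
rng = random.Random(1)
n_rep, true_beta = 20000, 0.5
ests = [rng.gauss(true_beta, 0.1) for _ in range(n_rep)]
print(performance(ests, [0.1] * n_rep, true_beta))
```

Shifting every synthetic estimate by a constant reproduces the pattern described above: bias grows, mean squared error grows with it, and coverage falls below 95% even though the standard errors are unchanged.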
Bias, coverage, and mean squared error of the estimated frailty variance are presented in Tables A.10, A.11, A.12 and Figures B.10, B.11, B.12. With a true exponential baseline hazard, all models yielded slightly biased results irrespective of the frailty distribution. Models with a well-specified frailty distribution were slightly negatively biased (-0.03 to -0.01), except the Cox model with a Gamma frailty, which performed worse (bias of -0.09). When a Gamma frailty was assumed in place of a log-normal frailty, the negative bias was somewhat greater (around -0.05, with the Cox model once again performing worse at -0.11); when a log-normal frailty was assumed in place of a Gamma frailty, results were slightly positively biased (approximately 0.01). With more complex true baseline hazard functions, the flexible parametric models performed best, with performance similar to the exponential setting; conversely, the fully parametric models performed worse when the baseline hazard was misspecified (with negative or positive bias depending on the setting, up to -0.15 and 0.10 respectively). With a complex baseline hazard, the bias of the fully parametric models increased further, reaching approximately -0.15 (negative) and 0.15 (positive). The Cox model with a Gamma frailty performed worst, severely underestimating the frailty variance (bias up to approximately -0.18). Coverage was generally suboptimal, in the range 70-90%, with a few exceptions: the fully parametric models at times showed good coverage, a symptom of overestimated standard errors given that their estimates were biased. Mean squared errors reflected the pattern observed for bias: the flexible parametric models performed better than the other models across the range of frailty distributions and baseline hazards examined, while the parametric models performed similarly when well specified and slightly worse otherwise.
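The observation that biased estimators can still show good coverage when their standard errors are overestimated can be checked analytically. Assuming the estimator is normally distributed with bias `b` and true sampling standard deviation `sd`, and that a Wald interval is built from a reported standard error `se`, coverage equals Φ((z·se - b)/sd) - Φ((-z·se - b)/sd). The scenario values below are hypothetical, chosen only to show the effect; the sketch also includes the decomposition MSE = bias² + variance, which is why mean squared error tends to track bias when variances are similar.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wald_coverage(bias, sd, se, z=1.959964):
    """Coverage of estimate +/- z * se when estimate - truth ~ N(bias, sd^2)."""
    upper = (z * se - bias) / sd
    lower = (-z * se - bias) / sd
    return normal_cdf(upper) - normal_cdf(lower)

def mse(bias, sd):
    """Mean squared error decomposition: bias^2 + variance."""
    return bias ** 2 + sd ** 2

# Hypothetical scenarios, true sampling SD fixed at 0.10
print(round(wald_coverage(0.00, 0.10, 0.10), 3))  # unbiased, correct SE
print(round(wald_coverage(0.10, 0.10, 0.10), 3))  # biased, correct SE
print(round(wald_coverage(0.10, 0.10, 0.15), 3))  # biased, overestimated SE
```

In the last scenario the inflated standard error widens the interval enough to offset the bias, so coverage looks adequate even though the point estimates are systematically off.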
The Cox model with a log-normal frailty performed similarly to the Royston-Parmar models, while the Cox model with a Gamma frailty performed worse, especially with a complex baseline hazard (where it was outperformed even by the fully parametric models).
Finally, bias and mean squared error of the estimated difference in 5-year life expectancy are presented in Tables A.13, A.14 and Figures B.13, B.14. With a true Gamma frailty, the flexible parametric models performed well, with practically unbiased estimates of the difference in 5-year life expectancy; the parametric models performed well when correctly specified and returned slightly biased results otherwise (negative or positive bias, up to -0.04 and 0.08 respectively; an absolute difference of 0.5 to 1.0 months). With a true log-normal frailty distribution, the Royston-Parmar models produced slightly overestimated results (0.01 to 0.05), while the remaining models performed similarly to the setting with a true Gamma frailty. Bias increased slightly with a complex baseline hazard when using parametric models, up to 0.12 (i.e. approximately 1.5 months). Mean squared errors showed a similar pattern: all models performed comparably with a true exponential or Gompertz baseline hazard, while the flexible parametric models performed best otherwise (compared with misspecified models). The Cox model generally performed similarly to or slightly worse than the flexible parametric models.
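The difference in 5-year life expectancy can be read as a difference in restricted mean survival time up to 5 years, i.e. the area under the survival curve on [0, 5]; this interpretation is an assumption here. The sketch below computes it from the marginal (population-averaged) survival function implied by a Gamma frailty with an exponential baseline, S(t | x) = (1 + θλt·exp(βx))^(-1/θ), and converts differences in years to months, the conversion behind the month figures quoted above. The parameter values and function names are hypothetical, and whether the estimand in this chapter is marginal or conditional on the frailty is not stated in this section.

```python
import math

def marginal_survival(t, lam, beta, x, theta):
    # Gamma frailty (mean 1, variance theta) integrated out analytically;
    # exponential baseline cumulative hazard Lambda0(t) = lam * t
    cum_haz = lam * t * math.exp(beta * x)
    return (1.0 + theta * cum_haz) ** (-1.0 / theta)

def rmst(surv, tau=5.0, n_steps=10_000):
    # restricted mean survival time: integral of surv(t) on [0, tau], trapezoid rule
    h = tau / n_steps
    total = 0.5 * (surv(0.0) + surv(tau))
    total += sum(surv(i * h) for i in range(1, n_steps))
    return total * h

def years_to_months(years):
    return 12.0 * years

# Hypothetical parameters: lam = 0.5, beta = -0.5, frailty variance 0.25
lam, beta, theta = 0.5, -0.5, 0.25
rmst_treated = rmst(lambda t: marginal_survival(t, lam, beta, 1, theta))
rmst_control = rmst(lambda t: marginal_survival(t, lam, beta, 0, theta))
diff = rmst_treated - rmst_control
print(f"Difference in 5-year life expectancy: {diff:.3f} years "
      f"({years_to_months(diff):.2f} months)")

# Bias values reported in the text, converted to months
for bias_years in (0.04, 0.08, 0.12):
    print(bias_years, "years =", round(years_to_months(bias_years), 2), "months")
```

A numerical integral is used so that the same `rmst` helper works for any fitted survival curve, including flexible parametric ones without a closed-form restricted mean.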