Generalized Estimating Equations in longitudinal data analysis in the presence of missing data
DOI:
https://doi.org/10.62071/tmjm.v7i1.721
Keywords:
Selection Criteria, Correlation Structures, Missing Data, Generalized Estimating Equations, Generalized Linear Model
Abstract
Generalized Estimating Equations (GEE) are a statistical approach for estimating the parameters of Generalized Linear Models (GLMs) when observations may be correlated, particularly repeated measurements taken at different time points. GEE adjusts for within-cluster correlation when fitting regression models, and correctly specifying the correlation structure improves the efficiency of the parameter estimates. However, missing data, which are common in many studies, can substantially affect the reliability of inferences drawn from GEE-based models. This paper explores recently developed selection criteria for identifying the underlying correlation structure, focusing on longitudinal studies with varying degrees of missingness ($\Delta m \in \{5\%, 10\%, 15\%\}$). The criteria under investigation are: (a) the Rotnitzky and Jewell Criterion (RJ), (b) the Gaussian Pseudolikelihood Criterion (GP), (c) the Quasi-likelihood under Independence Model Criterion (QIC), (d) the Correlation Information Criterion (CIC), (e) the Pardo and Alonso Criterion (PAC), and (f) the Gaussian Bayesian Information Criterion (GBIC). The study examines their performance across varying cluster sizes and highlights the importance of accounting for different degrees of correlation in both complete and incomplete datasets. Across all scenarios with positive results, the findings indicate that GBIC performs robustly and consistently, even in the presence of missing observations.
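For orientation, the estimating equation behind GEE and two of the criteria named above (QIC and CIC) can be written in their standard textbook form. This is only a sketch in notation that is assumed here rather than taken from the paper: $K$ clusters, response vector $Y_i$ with mean $\mu_i(\beta)$, $D_i = \partial \mu_i / \partial \beta^{\top}$, $A_i$ the diagonal matrix of variance functions, $\phi$ a dispersion parameter, $R_i(\alpha)$ the working correlation matrix, $Q(\cdot\,; I)$ the quasi-likelihood evaluated under independence, $\hat\Omega_I$ the inverse of the model-based covariance under independence, and $\hat V_R$ the robust (sandwich) covariance estimate under the candidate structure $R$. The paper itself may use different symbols or variants of these criteria.

```latex
% GEE score equation and working covariance (standard form; notation assumed)
\[
U(\beta) = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) = 0,
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2}
\]
% QIC and CIC for a candidate working correlation structure R
\[
\mathrm{QIC}(R) = -2\, Q\bigl(\hat\beta(R); I\bigr) + 2\, \mathrm{tr}\bigl(\hat\Omega_I \hat V_R\bigr),
\qquad
\mathrm{CIC}(R) = \mathrm{tr}\bigl(\hat\Omega_I \hat V_R\bigr)
\]
```

In both cases the candidate structure with the smallest value is preferred; CIC keeps only the penalty term of QIC, which is the part that depends most directly on the choice of $R$.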