A Latent Trait-based Measure as a Data Harmonization and Missing Data Solution Applied to the Environmental Influences on Child Health Outcomes Cohort
Recommended Citation
Knapp EA, Kress AM, Ghidey R, Gorham TJ, Galdo B, Petrill SA, Aris IM, Bastain TM, Camargo CA, Jr., Coccia MA, Cragoe N, Dabelea D, Dunlop AL, Gebretsadik T, Hartert T, Hipwell AE, Johnson CC, Karagas MR, LeWinn KZ, Maldonado LE, McEvoy CT, Mirzakhani H, O'Connor TG, O'Shea TM, Wang Z, Wright RJ, Ziegler K, Zhu Y, Bartlett CW, and Lau B. A Latent Trait-based Measure as a Data Harmonization and Missing Data Solution Applied to the Environmental Influences on Child Health Outcomes Cohort. Epidemiology 2025.
Document Type
Article
Publication Date
1-30-2025
Publication Title
Epidemiology (Cambridge, Mass.)
Abstract
BACKGROUND: Collaborative research consortia provide an efficient way to increase sample size, enabling evaluation of subgroup heterogeneity and rare outcomes. All cohort studies face missing data challenges such as nonresponse and attrition. Collaborative studies also have missing data that arise from differences in study design and measurement across the contributing studies.
METHODS: We extend ROSETTA, a latent variable method that creates common measures across datasets that capture the same latent constructs but share only partially overlapping measures. We use it to define a common measure of socioeconomic status (SES) across cohorts with varying indicators in the Environmental influences on Child Health Outcomes Cohort, a consortium of pregnancy and pediatric cohorts.
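The core idea is to place every participant on one latent scale even when cohorts measured different subsets of indicators. The minimal sketch below illustrates that idea. It is not ROSETTA's implementation. It assumes a single standardized factor under a linear normal factor model, x_j = lambda_j * eta + e_j, with known loadings lambda_j and uniquenesses psi_j. The indicator names and parameter values are hypothetical. Under these assumptions, the posterior-mean (regression) factor score uses only the observed set O: E[eta | x_O] = (sum over j in O of lambda_j * x_j / psi_j) / (1 + sum over j in O of lambda_j^2 / psi_j).

```python
import math

# Hypothetical parameters for one standardized latent factor, eta ~ N(0, 1).
# Assumed model: x_j = lambda_j * eta + e_j, e_j ~ N(0, psi_j), indicators centered.
LOADINGS = {"income": 0.8, "education": 0.7}
UNIQUENESSES = {"income": 0.36, "education": 0.51}


def factor_score(observed, loadings=LOADINGS, uniquenesses=UNIQUENESSES):
    """Posterior mean and SD of eta given only the indicators that were observed.

    `observed` maps indicator name -> centered value; missing indicators are
    simply absent. Returns None when no modeled indicator is observed.
    """
    used = [name for name in observed if name in loadings]
    if not used:
        return None
    # Posterior precision = prior precision (1) + information from each observed indicator.
    precision = 1.0 + sum(loadings[n] ** 2 / uniquenesses[n] for n in used)
    weighted_sum = sum(loadings[n] * observed[n] / uniquenesses[n] for n in used)
    return weighted_sum / precision, math.sqrt(1.0 / precision)


# Hypothetical cohorts: A measured income only, B education only, C both.
print(factor_score({"income": 1.0}))
print(factor_score({"education": 1.0}))
print(factor_score({"income": 1.0, "education": 1.0}))
print(factor_score({"insurance": 1.0}))  # indicator not in the model
```

With these made-up parameters, participants from cohorts A, B, and C all receive scores on the same latent scale. The posterior SD grows as fewer or weaker indicators are available. It is one simple way to convey how much information each score carries.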
RESULTS: Starting with 52 indicators of prenatal SES from 39,372 participants across 53 cohorts, ROSETTA created three factors representing key domains of SES: income and education, insurance and poverty, and unemployment. At least one factor score was available for 34,528 participants, and two of the factors were each available for more participants than any single indicator. The factors fit the data well, had content validity, and were correlated with alternative measures of SES (income and education factor: r = 0.40-0.89). Higher SES as measured by the factor scores was associated with lower odds of prenatal smoking (income and education factor: OR 0.42; 95% CI 0.38, 0.45). Compared with most alternative methods, ROSETTA reduced missing data; multiple imputation was the exception.
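An odds ratio and its 95% CI, such as the prenatal smoking estimate above, are typically obtained by exponentiating a logistic-regression log-odds coefficient and its Wald interval. The minimal sketch below shows that arithmetic. The abstract does not report the model's coefficient, standard error, covariates, or the scaling of the factor score, so `beta` and `se` here are hypothetical inputs. The helper `se_from_ci` is an illustrative back-calculation, not part of the published analysis.

```python
import math

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald 95% CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Approximate SE of log(OR) recovered from a reported 95% CI (hypothetical helper)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Hypothetical coefficient and SE, chosen only to illustrate the calculation.
print(odds_ratio_ci(math.log(0.42), 0.043))
# Back-calculating the SE from the rounded limits reported in the abstract.
print(se_from_ci(0.38, 0.45))
```

For example, `se_from_ci(0.38, 0.45)` gives roughly 0.043 on the log-odds scale. The published OR and limits are rounded to two decimals, so any value recovered this way is only approximate.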
CONCLUSIONS: ROSETTA aids in pooled analysis of individual participant data by creating measures on a common scale and maximizing data in the presence of missing and mismatched measures.
PubMed ID
39884749
ePublication
ePub ahead of print