Mei-Cheng Wang

Complexity in Simple Cross-Sectional Data with Binary Disease Outcome

Cross-sectionally sampled data with a binary disease outcome are commonly collected and analyzed in observational studies to understand how covariates correlate with disease occurrence. This talk will address two questions: (1) Which risk can be identified in a commonly adopted model (such as the logistic model)? (2) Are there problems when interpreting the identifiable risk? As the progression of a disease typically involves both disease status and duration, this talk considers how the binary disease outcome is connected to the progression of disease through the birth-illness-death process. In general, we conclude that the distribution of the cross-sectional binary outcome could be very different from the population risk distribution. The cross-sectional risk probability is determined jointly by the population risk probability and the ratio of the duration of the diseased state to the duration of the disease-free state. Using the logistic model as an illustrative example, we examine the bias from cross-sectional data and argue that the bias can almost never be avoided. We present an approach which treats the binary outcome as a specific type of current status data and offers a compromise model on the basis of an age-specific risk probability (ARP), though the interpretation of the ARP itself could also be questioned. An analysis based on Alzheimer's disease data is presented to illustrate the ARP approach and data complexity.
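
The dependence of the cross-sectional probability on state durations can be illustrated with a minimal sketch. This is not the model from the talk: it assumes a stationary population sampled at a random calendar time, so that the chance of finding a living person in the diseased state is proportional to the expected time spent in that state. The function name, parameter names, and all numeric values are hypothetical.

```python
def cross_sectional_prevalence(p, mean_free_cases, mean_diseased, mean_free_noncases):
    """Probability that a living person sampled at a random time is diseased.

    Assumption (not from the talk): a stationary population, so prevalence
    equals expected diseased time divided by expected total lifetime.

    p                  -- population (lifetime) risk of ever developing disease
    mean_free_cases    -- mean disease-free duration among those who develop disease
    mean_diseased      -- mean duration of the diseased state
    mean_free_noncases -- mean lifetime among those who never develop disease
    """
    diseased_time = p * mean_diseased
    free_time = p * mean_free_cases + (1 - p) * mean_free_noncases
    return diseased_time / (diseased_time + free_time)


# Hypothetical values: the same population risk, two different disease durations.
population_risk = 0.30
prev_short = cross_sectional_prevalence(population_risk, 70.0, 5.0, 80.0)
prev_long = cross_sectional_prevalence(population_risk, 70.0, 20.0, 80.0)

print(f"population risk:               {population_risk:.3f}")
print(f"prevalence, short disease stay: {prev_short:.3f}")
print(f"prevalence, long disease stay:  {prev_long:.3f}")
```

Under these assumptions the population risk is identical in both scenarios, yet the cross-sectional probability changes with the ratio of diseased to disease-free duration and stays well below the population risk.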