Two-phase outcome-dependent sampling and the assessment of expensive covariates
Two-phase or multi-phase sampling is common in settings where measurements on certain variables exist or are readily obtained for a large cohort of individuals (phase 1), but the measurement of other key variables entails substantial additional cost. If these “expensive” variables are measured on only a subset of individuals in phase 2, selecting those individuals according to the observed values of the phase 1 variables can produce more precise inferences than selecting a simple random sample for phase 2. In this talk I discuss sampling designs in which the phase 1 data include full or partial information on response variables, so that the phase 2 sample is outcome-dependent. I describe estimation and hypothesis-testing methodology for covariate effects, discuss a variety of examples, and present an application involving tests of association between rare genetic variants and a trait.