Marginal Loglinear Models for Three Multiple-Response Categorical Variables.
Many survey questions include a phrase such as "Choose all that apply," which lets respondents select any number of options from a predefined list of items. Responses to such questions produce categorical variables known as multiple-response categorical variables (MRCVs). This thesis focuses on the analysis and modeling of three MRCVs. Bilder and Loughin (2007) model the associations between two MRCVs using generalized loglinear models and briefly describe a few special models for three MRCVs. This thesis explores the complications that arise when modeling three MRCVs. Under the modeling approach of Bilder and Loughin (2007), there are 232 possible models, each representing a different combination of associations. Parameters are estimated using generalized estimating equations generated by a pseudo-likelihood, and the variances of the estimates are corrected using sandwich methods. Because of the large number of possible models, model comparison based on hypothesis tests of nested models would be computationally intensive and inefficient. As an alternative, model averaging is proposed as a model comparison tool that can also account for model selection uncertainty. Furthermore, the calculations required to compute the variances of the estimates can exceed 32-bit machine capacity even for a moderately large number of items. This issue is resolved by reducing the dimension of the calculation and using sparse matrices.
Keywords: multiple-response categorical variables; loglinear models; pseudo-likelihood; model averaging.