Bayesian Profile Regression with Evaluation on Simulated Data
Regression-based inference is difficult when a data set contains a large number of potentially correlated covariates. This situation has become more common in clinical observational studies because of dramatic improvements in information-capturing technology for clinical databases. For instance, in disease diagnosis and treatment, indicators of patients' organ function are much easier to obtain than before, and these indicators can be highly correlated. We discuss Bayesian profile regression, an approach that addresses these problems for the binary covariates commonly recorded in clinical databases. Clusters of patients with similar covariate profiles are formed through a Dirichlet process prior and then associated with outcomes via a regression model. We then introduce methods for evaluating the clustering and making inference. Using simulated data, we compare the performance of Bayesian profile regression with that of the LASSO, a popular alternative for data sets with a large number of predictors. We fit the Bayesian profile regression with the recently developed R package PReMiuM.
Keywords: Bayesian mixture model; Clustering; Dirichlet process; Profile regression