Statistical Learning Tools for Heteroskedastic Data
When modeling heteroskedasticity, it is desirable to fit the mean and variance simultaneously in a single model. Unless the relevant variables in the mean and variance models are pre-specified, model selection may be necessary. Information Criteria (IC) have been used since the early 1970s to compare the quality of fit of arbitrary models. Many forms of IC are justified only as the sample size tends to infinity, and they perform poorly on small samples. A well-known small-sample IC is AICc, which uses a model-specific formulation to achieve good performance in finite samples. AICc can be derived within a framework where the truth is a linear model with homoskedastic errors. However, the analogous formula for heteroskedastic errors appears to be intractable, and using either AIC or AICc as an approximation to the heteroskedastic criterion is shown to be woefully inadequate. Our solution is to simulate a large table of approximate values of the IC for various heteroskedastic models, replacing the closed form with a large pre-computed table. We call this approach of tabulating values of a simulated IC "the CHIC paradigm". The CHIC paradigm is demonstrated successfully in two very different heteroskedastic contexts.
Keywords: Heteroskedasticity, Information Criteria, Random Forest, Regression Tree