Interactive data analysis via selective inference
We consider the problem of a data scientist or applied statistician querying a dataset several times (i.e., observing several functions of the data). After these queries, the data scientist wants to report their findings in the form of confidence intervals or p-values.
Building on the conditional approach to selective inference, we describe a concrete approach to this problem, at least for several common queries. Using the device of randomization (similar to data splitting), we construct an appropriate distribution for inference in this setting. Key to the approach are an explicit inversion of the KKT conditions of a convex program and a selective version of the central limit theorem.
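To illustrate what inverting the KKT conditions can mean, here is a minimal sketch that assumes the convex program is a randomized lasso, minimize 0.5*||y - X b||^2 + lam*||b||_1 - omega^T b, with Gaussian randomization omega. The abstract does not name the program, so this choice, the values of lam and the randomization scale, and the function names (`randomized_lasso`, `reconstruct_omega`) are hypothetical. Stationarity gives omega = X^T (X b - y) + lam * z, where z is a subgradient of the l1 norm, so omega is an explicit affine function of the active coefficients, their signs, and the inactive subgradient. The sketch only checks this map numerically; it does not perform the sampling or the selective CLT step.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def randomized_lasso(X, y, lam, omega, n_iter=5000):
    """Hypothetical helper: solve min_b 0.5*||y - X b||^2 + lam*||b||_1 - omega @ b
    by proximal gradient descent (ISTA)."""
    step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()  # 1 / Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) - omega  # gradient of the smooth part, including -omega @ b
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


def reconstruct_omega(X, y, lam, beta, z_inactive):
    """Hypothetical helper: explicit KKT inversion,
    omega = X^T (X beta - y) + lam * z, with z_E = sign(beta_E) on the active set E."""
    active = beta != 0
    z = np.empty_like(beta)
    z[active] = np.sign(beta[active])
    z[~active] = z_inactive
    return X.T @ (X @ beta - y) + lam * z


rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + rng.standard_normal(n)
lam = 20.0                                  # illustrative penalty level (assumption)
omega = rng.normal(scale=3.0, size=p)       # randomization, drawn independently of (X, y)

beta_hat = randomized_lasso(X, y, lam, omega)
active = beta_hat != 0
# Inactive subgradient, read off the stationarity condition at the solution.
z_inactive = (omega - X.T @ (X @ beta_hat - y))[~active] / lam

omega_hat = reconstruct_omega(X, y, lam, beta_hat, z_inactive)
print("active set:", np.flatnonzero(active))
print("max |omega - omega_hat|:", np.abs(omega - omega_hat).max())
```

On the active coordinates the reconstruction uses only the fitted coefficients and their signs, so agreement with the original omega there reflects optimality of the solution. In a change-of-variables argument of this kind, the affine map is what lets one rewrite the density of omega in terms of the optimization variables.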
This is joint work with Xiaoying Tian Harris, Snigdha Panigrahi, Jelena Markovic and Nan Bi.