No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference

This paper examines the performance of Prediction-Powered Inference (PPI++), a statistical method that combines labeled and unlabeled data for estimation. Previous work suggested that, asymptotically, PPI++ always improves on using labeled data alone; this analysis instead provides a finite-sample "no free lunch" result. It shows that PPI++ outperforms classical methods only if the correlation between pseudo-labels and true labels exceeds a threshold that depends on the labeled sample size. The research characterizes the conditions for this improvement for both the single-sample and split-sample versions of PPI++, and shows empirically that the single-sample variant can produce overly optimistic confidence intervals even when its MSE is lower.
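
To make the idea concrete, here is a minimal sketch of a PPI++-style estimate of a mean, assuming the commonly used power-tuned form: the labeled-sample mean plus a weight `lam` times the gap between the average pseudo-label on unlabeled data and on labeled data. The function name `ppi_pp_mean`, its arguments, and the choice to estimate `lam` from the same labeled sample (the single-sample flavor) are illustrative assumptions, not code from the paper.

```python
import numpy as np

def ppi_pp_mean(y_lab, f_lab, f_unlab):
    """Sketch of a power-tuned, PPI++-style mean estimate (hypothetical helper).

    y_lab   : true labels on the n labeled points
    f_lab   : pseudo-labels (model predictions) on the same n points
    f_unlab : pseudo-labels on the N unlabeled points
    """
    y_lab = np.asarray(y_lab, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y_lab), len(f_unlab)

    # Assumption: variance and covariance are estimated from the labeled
    # sample itself, i.e. the single-sample variant discussed above.
    var_f = np.var(f_lab, ddof=1)
    cov_yf = np.cov(y_lab, f_lab, ddof=1)[0, 1]

    # Weight that minimizes the variance of the estimator for a fixed lam:
    # Cov(Y, f) / ((1 + n/N) * Var(f)). Fall back to 0 (the classical
    # labeled-only mean) if the pseudo-labels carry no variation.
    lam = 0.0 if var_f == 0 else cov_yf / ((1.0 + n / N) * var_f)

    estimate = y_lab.mean() + lam * (f_unlab.mean() - f_lab.mean())
    return estimate, lam
```

With `lam = 0` the estimate reduces to the classical labeled-only mean. Because `lam` is itself estimated from only `n` labeled points, weakly correlated pseudo-labels can add noise rather than remove it, which is the finite-sample effect the paper analyzes.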
