In this post we continue the discussion of saving OOB (out-of-bag) predictions when testing via cross-validation with MARS. The principles for MARS are the same as for CART, and the organization of the saved file follows the same high-level logic. However, as the details differ slightly, we thought it would be worthwhile to exhibit the OOB results, and how we obtain them, in the context of MARS as well. Recall that when using K-fold cross-validation we actually develop K different models, each tested on a different test sample (identified by the CVBIN variable), and that the final model and results are reported for an overall model built on all the data, with nothing held back for test. The topic of discussion is how to obtain the equivalent of test sample predictions so that we can manipulate and further analyze the test sample residuals (for regressions).
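The idea behind these out-of-fold predictions can be sketched in a few lines. The following is a minimal illustration, not the MARS software itself: it uses a synthetic dataset, a hypothetical `cvbin` fold assignment standing in for the CVBIN variable, and ordinary least squares standing in for the MARS model. The point is the bookkeeping: every row is predicted by the one fold model that did not see it during training, yielding honest test-sample residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (a stand-in for a real dataset).
n, k = 200, 5
X = rng.normal(size=(n, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Assign each row to one of k cross-validation bins (the CVBIN role).
cvbin = rng.integers(0, k, size=n)

# Out-of-fold ("OOB") predictions: each row is scored by the model
# trained on the other k-1 folds, so it is never seen in training.
oob_pred = np.empty(n)
for fold in range(k):
    test = cvbin == fold
    train = ~test
    # Ordinary least squares as a stand-in for the MARS fit.
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    oob_pred[test] = X[test] @ coef

# Honest test-sample residuals, available for further analysis.
oob_resid = y - oob_pred
```

Saving `cvbin`, `oob_pred`, and `oob_resid` alongside the original rows reproduces the structure of the saved file discussed in the post: one out-of-fold prediction per observation.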
When assessing predictive model performance using cross-validation, the model we obtain after all the computation is actually a model built on all of the data, that is, a model for which no data was reserved for testing. The standard test results reported for this all-data model are actually estimated and synthesized from the supplementary models built on parts of the data. Typically, the supplementary models are thrown away after they have served their purpose of helping us construct educated guesses about the future performance of the all-data model on new, previously unseen data.
Random Forests is unique among learning machines in having no need of an explicit test sample, because it uses bootstrap sampling for every tree. A bootstrap sample draws N rows with replacement from the N available rows, so any given row is omitted with probability (1 - 1/N)^N, which approaches 1/e as N grows. This ensures that every tree in the forest is built on about 63% of the available data, leaving the remaining approximately 37% for testing [the OOB (out-of-bag) data].
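The 63%/37% split is easy to verify by simulation. This sketch (my own illustration, not from the Random Forests implementation) draws one bootstrap sample and measures the fraction of rows left out of the bag:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 100_000
# A bootstrap sample: draw n row indices with replacement from n rows.
sample = rng.integers(0, n, size=n)

# Fraction of distinct rows that made it into the sample ("in-bag").
in_bag = np.unique(sample).size / n

# The rest are out-of-bag; theory says this approaches 1/e ≈ 0.368.
oob_fraction = 1.0 - in_bag
```

For large n, `oob_fraction` lands very close to 1/e ≈ 0.368, matching the approximately 37% figure quoted above.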