The "leave-one-out" (LOO) or jackknife testing method is well known for regression models, and users often ask if they could use it for CART models. For example, if you had a dataset with 200 rows, you could ask for 200-fold cross-validation, resulting in 200 runs; each of which would be built on 199 training records and tested on the single record which was left out. Those who have experimented with this for regression trees already know from experience that this does not work well, and you do not obtain reliable estimates of the generalization error (performance of your tree on previously unseen data). In this post I comment on why this is the case and what your options are.
Updated: July 16, 2013
In their 1984 monograph, Classification and Regression Trees, Breiman, Friedman, Olshen and Stone discussed at length the need to obtain “honest” estimates of the predictive accuracy of a tree–based model. At the time the monograph was written, many data sets were small, so the authors took great pains to work out an effective way to use cross–validation with CART trees.
The result was a major advance for data mining, introducing ideas that at the time were radically new. The main point of the discussion was that the only way to avoid overfitting was to rely on test data. With plentiful data we can always reserve a portion for testing, but with fewer data we might have to rely on cross validation. In either case, however, only the test or cross–validated results should be trusted. In contrast, earlier approaches tended to ignore the training data performance results and focus only on the test data.
In this post we continue the discussion of saving OOB (out-of-bag) predictions when testing via cross-validation with MARS. The principles for MARS are the same as they are for CART and the organization of the file saved follows the same high-level logic. However, as the details are a little different we thought it would be worthwhile exhibiting the OOB results and how we get them in the context of MARS as well. Recalling that when using K-fold cross-validation we actually develop K different models each tested on a different test sample (CVBIN) and that the final model and results are reported for an overall model built on all the data where nothing has been held back for test. The topic of discussion is how to obtain the equivalent of test sample predictions so that we can manipulate and further analyze the test sample residuals (for regressions).
When assessing predictive model performance using cross-validation, the model we obtain after all the computation is actually a model built on all of the data, that is, a model for which no data was reserved for testing. The standard test results reported for this all-data model are actually estimated and synthesized from the supplementary models built on parts of the data. Typically, the supplementary models are thrown away after they have served their purpose of helping us construct educated guesses about the future performance of the all-data model on new previously unseen data.
Salford Predictive Modeler™ and its component data mining engines CART®, MARS®, TreeNet®, and RandomForests® contain a variety of tools to help modelers work quickly and efficiently. One of the most effective tools for rapid model development is found in the BATTERY tab of the MODEL Set Up dialog. Because there are so many tools embedded in that dialog we are going to start a series of posts going through the principal BATTERY choices, one at a time.
Let’s start with the idea of the BATTERY. The BATTERY mechanism is an automated system for running experiments and trying out different modeling ideas. Instead of you having to think about how you would like to tweak your model to try to make it better the BATTERY does it for you. Each BATTERY is a planned experiment in which we take some useful modeling control and run a series of models in which we systematically change that control. The best part of this is the SUMMARY which provides you with an executive summary of the results and points you to the best performing model. We recommend that you use the BATTERY often; some modelers don’t do anything without setting up pre–packaged or user customized batteries.