The "leave-one-out" (LOO) or jackknife testing method is well known for regression models, and users often ask whether they can use it for CART models. For example, with a dataset of 200 rows you could request 200-fold cross-validation, resulting in 200 runs, each built on 199 training records and tested on the single record left out. Those who have experimented with this for regression trees already know from experience that it does not work well: you do not obtain reliable estimates of the generalization error (the performance of your tree on previously unseen data). In this post I comment on why this is the case and what your options are.

The first reason the jackknife will not work for CART regression trees is that LOO never works for estimating any statistic (or performance measure) that is not "smooth." Decision trees are anything but smooth: a small change in the value of a predictor X can lead to a large change in the prediction, as the record jumps to a different node. Conversely, many changes in predictor values lead to no change at all in the model's predictions, when the change is insufficient to alter the path of a record down the tree. For any such non-smooth outcome the LOO testing strategy will never work.
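This non-smoothness is easy to see directly. The sketch below (synthetic data; the variable names are purely illustrative) fits a small scikit-learn regression tree to a step-shaped target and probes two inputs that differ by a hair on either side of the learned root split:

```python
# Illustration: a fitted regression tree is a step function, so a tiny
# change in a predictor can cause a large jump in the prediction.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
# Target steps from ~10 to ~20 at X = 5, plus a little noise.
y = np.where(X[:, 0] < 5, 10.0, 20.0) + rng.normal(0, 0.5, 200)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Probe two nearly identical inputs straddling the root split threshold.
split = tree.tree_.threshold[0]
left, right = tree.predict([[split - 1e-6], [split + 1e-6]])
print(f"just below split: {left:.2f}; just above split: {right:.2f}")
```

The two predictions differ by roughly the full height of the step, even though the inputs differ by only 2e-6: exactly the kind of discontinuity that breaks LOO's theoretical guarantees.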

However, we do know that for moderate values of k, k-fold cross-validation can be very effective and reliable. So you might ask how far we can safely push k in the direction of LOO without actually going quite that far. In "An Introduction to the Bootstrap," Brad Efron and Rob Tibshirani discuss the "delete-d jackknife," in which we leave out a small number of records in each rebuild of the model. In that section of their book they suggest that the number of records left out must be larger than the square root of the training sample size. Thus, in our example of 200 training records we must leave out at least 15 records in each cycle; for simplicity, 10-fold CV (leaving out 20 records per cycle) seems about the best we can do.
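The rule of thumb above is simple arithmetic, sketched below (the function names are my own, not from Efron and Tibshirani): the smallest admissible d is the first integer strictly greater than the square root of the training sample size, and the fold count is capped by how many such test partitions fit into the data.

```python
import math

def min_leave_out(n_train):
    """Smallest d satisfying the delete-d rule of thumb: d > sqrt(n_train)."""
    return math.isqrt(n_train) + 1

def max_cv_folds(n_train):
    """Largest number of roughly equal CV folds whose test partitions
    each contain at least min_leave_out(n_train) records."""
    return n_train // min_leave_out(n_train)

print(min_leave_out(200), max_cv_folds(200))  # 15 23 -> d=15, at most 13 folds
```

For 200 records this yields d = 15 and at most 13 folds; rounding down to a convenient 10 folds, as in the text, leaves out 20 records per cycle and comfortably satisfies the rule.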

But with larger training samples we can push the number of CV folds higher. Some suggested values are shown below (recall we want the folds to be as close to equal in size as possible). In the table, LOD is the number of records left out per fold ("left-out d"), and the last column is a convenient number of CV folds compatible with that LOD value.

Ntrain | LOD | CV folds
------ | --- | --------
200    | 15  | 10
500    | 23  | 20
1000   | 32  | 30
1500   | 39  | 34
2000   | 45  | 40

For this type of analysis you can think either in terms of the smallest sample size you can safely use for a test partition, or the largest number of approximately equal-sized CV folds you should request.
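Putting this into practice is straightforward. Here is a minimal sketch (synthetic data, illustrative names) of requesting 10-fold CV for a regression tree on 200 records with scikit-learn, matching the first row of the table:

```python
# 10-fold CV for a regression tree on 200 records: each test fold holds
# 20 records, above the ~15-record minimum suggested by the sqrt(n) rule.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)

cv = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(DecisionTreeRegressor(random_state=1), X, y,
                         cv=cv, scoring="neg_mean_squared_error")
print(f"mean CV MSE: {-scores.mean():.3f}")
```

Requesting `n_splits=200` here would be the LOO setup the post warns against; `n_splits=10` keeps each test partition well above the delete-d floor.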

What about classification trees? Here we are constrained by the fact that each CV fold must contain at least one record from each target class. If we start with 500 training records but have only 15 records in the minority class, then we cannot generate more than 15 CV folds. We will have more to say about the largest number of CV folds you can reasonably use for a classification tree in another post; here we simply point out that you should not try to go above the limits discussed above for regression trees.
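The class-count cap is easy to compute. The helper below (my own illustrative function, not a library API) returns the fold ceiling implied by the rarest class, using the 500-record, 15-minority example from the text:

```python
# Each CV fold needs at least one record of every target class, so the
# fold count is capped by the size of the rarest class.
import numpy as np

def max_stratified_folds(y):
    """Largest CV fold count for which every fold can contain
    at least one record of every class in y."""
    _, counts = np.unique(y, return_counts=True)
    return int(counts.min())

# 500 records with 15 in the minority class -> at most 15 folds.
y = np.array([0] * 485 + [1] * 15)
print(max_stratified_folds(y))  # 15
```

In practice you would take the smaller of this ceiling and the delete-d limit for regression trees, and use stratified folds so each class is spread as evenly as possible.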