Probabilities in CART trees are quite straightforward and are displayed for every node in the CART navigator. Below we show a simple example from the KDD Cup ‘98 data predicting response to a direct mail marketing campaign.
Here we see that the probability of response in the ROOT node is 5.1% and that this increases to 7.6% for the subset of records going to the right-hand child node. The optimal tree navigator for this model has four terminal nodes:
If we right-click on the root node of the navigator, we obtain a new menu from which we can select “Rules”:
which, among other options, will allow us to display both rules and terminal probabilities in a single list.
The display lists the conditions that must be met to reach “Terminal Node 3” and also the learn sample probabilities for Non-Response and Response.
The Classic Output also produces a plain text report for Learn, Test, and Pooled samples:
The columns report the total number of records in each terminal node, and then the number with target class="0" followed by the number with target class="1".
The final place an advanced user might wish to look is the “Node Detail” section of the Classic Output. For the root node we see the following report:
Reading this last report properly requires an understanding of PRIORS. The easiest way in is to look at the bottom panel, “Within Node Probabilities”, and focus on the column labeled “Top”, where the two classes are each reported with probability=0.50. At the same time, the “Weighted Counts” in the “Top” column are anything but equal. Behind the scenes, CART by default adjusts the class probabilities so that they are “equalized” regardless of the actual class imbalance in the data. CART does this so that each class is treated equally in the attempt to obtain the best classification accuracy. Although priors are entirely under the modeler’s control, the default setting of PRIORS EQUAL typically gives the most satisfactory results, because CART then tries to achieve the same level of accuracy for each class no matter how different the class sample sizes are. This is usually desirable because the rarer class is usually the one we care about more.
In this report, then, the displayed probabilities are adjusted for priors (which CART applies automatically). In the root node, regardless of the actual class balance, the two classes are silently reweighted to equal total weight and therefore have probabilities of 0.50 each. The “Left” and “Right” columns then show the adjusted probabilities for the two child nodes.
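The root-node reweighting can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not CART's internal code. It uses the root-node class counts quoted in this article's worked calculation (45650 non-responders and 2476 responders), and the variable names are our own.

```python
# Root-node class counts taken from the worked calculation in this article.
# "0" = non-response, "1" = response.
root_counts = {"0": 45650, "1": 2476}

# Equal priors (assumption for this sketch: each of the K classes gets 1/K).
priors = {cls: 1.0 / len(root_counts) for cls in root_counts}

# Raw, unadjusted class shares in the root node.
total = sum(root_counts.values())
raw_share = {cls: n / total for cls, n in root_counts.items()}

# Weight carried by a single record of each class, chosen so that the
# records of a class together sum to that class's prior.
record_weight = {cls: priors[cls] / root_counts[cls] for cls in root_counts}

# Prior-adjusted root probabilities: per-record weight times record count.
adjusted_root = {cls: record_weight[cls] * root_counts[cls] for cls in root_counts}

print("raw share of response:     ", round(raw_share["1"], 4))
print("adjusted root probabilities:", adjusted_root)
```

The raw share of responders is about 5.1%, as shown in the navigator, while the adjusted probabilities are 0.50 for each class. Each responder record simply carries more weight than each non-responder record.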
To calculate the adjusted probabilities, we first express each class count in a node as a fraction of that class’s count in the root node. Thus the count of 13743 becomes 13743/45650 and the count of 1136 becomes 1136/2476. We then compute the probabilities from these normalized counts: (1136/2476) / (1136/2476 + 13743/45650) = 0.60380.
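The same calculation can be written as a small Python function. This is a minimal sketch: the function name and the optional priors argument are hypothetical and are not part of CART. Each class count is scaled by its prior and divided by the root-node count for that class, and the results are then normalized to sum to one.

```python
def adjusted_node_probabilities(node_counts, root_counts, priors=None):
    """Prior-adjusted within-node class probabilities (illustrative helper)."""
    classes = list(root_counts)
    if priors is None:
        # Default to equal priors, mirroring the PRIORS EQUAL setting.
        priors = {c: 1.0 / len(classes) for c in classes}
    # Normalize each class count by its root-node count, weighted by its prior.
    scaled = {c: priors[c] * node_counts[c] / root_counts[c] for c in classes}
    total = sum(scaled.values())
    return {c: scaled[c] / total for c in classes}


# Counts from the worked example: "0" = non-response, "1" = response.
root_counts = {"0": 45650, "1": 2476}
node_counts = {"0": 13743, "1": 1136}

# Equal priors reproduce the adjusted probability quoted in the text.
equal = adjusted_node_probabilities(node_counts, root_counts)
print("equal priors:", equal)

# For contrast: priors set to the observed root-node class shares
# (an assumption made here only to show the unadjusted figure).
n_root = sum(root_counts.values())
data_priors = {c: root_counts[c] / n_root for c in root_counts}
raw = adjusted_node_probabilities(node_counts, root_counts, data_priors)
print("data priors: ", raw)
```

With equal priors the response class comes out at roughly 0.6038, in line with the figure above. With priors equal to the observed class shares, the function returns the plain within-node proportion, 1136/14879, or about 7.6%, which is the same value the navigator reports for the right-hand child node.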
The typical analyst will rarely want to look at these adjusted probabilities, but it is useful to understand that this normalization is the technique CART uses to ensure that rare classes are never treated as less important than common classes unless the analyst specifically changes the CART defaults.