When MARS develops a model it actually develops many and presents you with the one that it judges best based on a self-testing procedure. But the so-called MARS optimal model may not be satisfactory from your perspective. It might be too small (include too few variables), too large (include too many variables), too complex (include too many splines, basis functions, or breaks in variables), or otherwise not to your liking based on your domain knowledge. So what can you do to override the MARS process?
To review your options, we start with the GOODBAD.CSV data set we have used in many other FAQs and training sessions; the Activity window displaying some file details appears below.
We use TARGET as our dependent variable and all other variables as predictors, choose MARS as our Analysis Method, and allow SPM to randomly allocate 20% of the data to a test partition. The Model Setup window is displayed next. You can run this model as either a regression or a binary logistic model, as the choice makes no difference for the purpose of this discussion. MARS always runs linear regressions but will also report ROC results if you choose the binary logistic model type.
Running this MARS model yields the following results display:
This display tells us that MARS actually developed 13 models and displays performance results for each. The graph starts at the left with zero basis functions (the regression thus contains only a constant) and runs to 12 basis functions at the right edge. MARS selects the 7-basis-function model as optimal, using performance on the test partition as its criterion. We also see an alternative selection method based on the GCV criterion, which we will ignore here.
We want to call attention to the button at the bottom left of the display, surrounded by a green box and indicated by the green arrow. Click it to reveal the MARS Model Selector. From this display we can select a MARS model of any size between 0 and 12 basis functions: just double-click the corresponding row in the table and the full model details appear in a new display.
The table lists 13 different models, all of which are available to you with full reporting and graphical displays, just as if that model were the optimal one. The models differ in size, i.e., in the number of basis functions retained during backwards elimination. Row 1 in the table above is the MARS maximal model and includes every basis function MARS discovered during its forward-stepping stage. If you like, you can double-click on this row to bring up the model details, as we do below. This summary display is specific to the model with 12 basis functions and gives access to all the other model dimensions.
The regression function created by MARS appears below on the “Final Model” tab.
Here we see 13 coefficients in total: the constant plus 12 other terms. Seven of the original raw variables appear in the model, some contributing one basis function and some contributing two. In addition, two missing-value indicators appear in the model. If we move to a smaller model, some of these predictors will be dropped, and of course the regression coefficients are expected to change.
Below we select the model with 4 basis functions; as expected, several of the predictors from the larger model no longer appear at all.
You might ask how to go about choosing a model when you want to override the MARS automated optimal-model selection. We can start to answer this by pointing out that all estimation processes are subject to error, and we cannot be certain that the MARS optimal model is actually “best”. Thus, there is room for judgment.
We have on occasion found it necessary to select a model large enough to include predictors we felt were mandatory, but you should exercise caution if the larger model is noticeably inferior in test-sample performance.
Another option is to run the bootstrap-resampling battery, BATTERY BOOTSTRAP (BATTERY is called AUTOMATE in the upcoming version 8.0), to see whether repeated reestimation of the model on different samples yields models more in keeping with prior domain knowledge.