When MARS develops a model it actually develops many, and presents you with the one it judges best based on a self-testing procedure. But this so-called MARS optimal model may not be satisfactory from your perspective: it might be too small (include too few variables), too large (include too many variables), too complex (include too many splines, basis functions, or breaks in variables), or otherwise not to your liking given your domain knowledge. So what can you do to override the MARS process?
Overfitting is an issue for most machine learning tools. The learners are very flexible and can thus adapt to the noise in the data as well as to the signal. A classic technique to avoid overfitting is to ensure that we have both learn and validate (or test) data, and then to monitor the learning process, comparing the goodness of fit or performance on the learn and validate data as a function of the amount of training.
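To make the idea concrete, here is a minimal, hypothetical sketch in Python (not Salford code) that increases model flexibility step by step, using polynomial degree as a stand-in for "amount of training," and tracks error on the learn and validate partitions separately:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a smooth signal plus noise.
x = rng.uniform(0.0, 3.0, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Split into learn and validate partitions.
learn_x, val_x = x[:150], x[150:]
learn_y, val_y = y[:150], y[150:]

def mse(true, pred):
    return float(np.mean((true - pred) ** 2))

# Increase model flexibility (polynomial degree) and monitor both errors.
learn_err, val_err = [], []
for degree in range(1, 9):
    coefs = np.polyfit(learn_x, learn_y, degree)
    learn_err.append(mse(learn_y, np.polyval(coefs, learn_x)))
    val_err.append(mse(val_y, np.polyval(coefs, val_x)))

# Learn error can only fall as flexibility grows; validate error
# eventually stops improving -- the signature of overfitting.
```

The point of the comparison is that learn-sample error is an unreliable guide on its own; the validate curve is what tells you when added flexibility is fitting noise.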
In this post we continue the discussion of saving OOB (out-of-bag) predictions when testing via cross-validation with MARS. The principles for MARS are the same as they are for CART, and the organization of the saved file follows the same high-level logic. However, as the details differ slightly, we thought it would be worthwhile to exhibit the OOB results, and how we obtain them, in the context of MARS as well. Recall that when using K-fold cross-validation we actually develop K different models, each tested on a different test sample (CVBIN), and that the final model and results are reported for an overall model built on all the data, with nothing held back for testing. The topic of discussion is how to obtain the equivalent of test sample predictions so that we can manipulate and further analyze the test sample residuals (for regressions).
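The mechanics can be sketched generically, outside of any particular software: under K-fold cross-validation each observation is held out exactly once, so the fold-by-fold test predictions can be assembled into a single full-length vector of OOB predictions and residuals. A minimal Python sketch, using ordinary least squares as a stand-in for the MARS model (the variable names here are illustrative, not SPM output fields):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy regression data; OLS stands in for the MARS model here.
n, k = 120, 5
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Assign each observation to one of K cross-validation bins (CVBINs).
cvbin = rng.integers(0, k, size=n)

oob_pred = np.full(n, np.nan)
for fold in range(k):
    test = cvbin == fold
    train = ~test
    # Fit on the K-1 in-bag folds (with an intercept column).
    A = np.column_stack([np.ones(train.sum()), X[train]])
    coefs, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    # Predict only the held-out fold: these are the OOB predictions.
    B = np.column_stack([np.ones(test.sum()), X[test]])
    oob_pred[test] = B @ coefs

# Every observation now carries exactly one test-sample prediction,
# so oob_resid can be analyzed like ordinary test residuals.
oob_resid = y - oob_pred
```

Because each prediction comes from a model that never saw that observation, the residuals behave like honest test-sample residuals rather than optimistic in-sample ones.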
MARS (Multivariate Adaptive Regression Splines), introduced by Stanford University data mining guru Professor Jerome H. Friedman in 1988, is one of the landmarks in the evolution of regression methods. For the first time analysts could leverage a search mechanism intended to automatically discover nonlinearity and interactions in the context of classical regression. The MARS procedure involves a forward stepwise model building stage followed by a backward elimination of unneeded predictors to arrive at surprisingly high performance models, all automatically. At the heart of the MARS algorithm is the search for "knots," or breaks in the range of a predictor, allowing a regression model containing that predictor to have different slopes in each region. Breaking predictors into regions permits nonlinearity, and when interactions are constructed from regions of predictors, remarkable discoveries are enabled.
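The knot idea is easy to illustrate. A MARS basis function is a hinge, max(0, x − knot), paired with its mirror image max(0, knot − x), and a regression on such a pair has a different slope on each side of the knot. The sketch below places the knot by hand for clarity; MARS would instead search over candidate knot locations:

```python
import numpy as np

def hinge(x, knot):
    # MARS-style basis function: zero below the knot, linear above it.
    return np.maximum(0.0, x - knot)

rng = np.random.default_rng(7)

# Data with a genuine break in slope at x = 5 (slope 1 below, 3 above).
x = rng.uniform(0.0, 10.0, size=300)
y = np.where(x < 5.0, x, 5.0 + 3.0 * (x - 5.0))
y = y + rng.normal(scale=0.2, size=x.size)

# Regress on an intercept plus the mirrored pair of hinges at the knot.
knot = 5.0
A = np.column_stack([
    np.ones_like(x),
    hinge(x, knot),              # max(0, x - knot)
    np.maximum(0.0, knot - x),   # mirror hinge: max(0, knot - x)
])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)

slope_above = coefs[1]    # fitted slope to the right of the knot
slope_below = -coefs[2]   # fitted slope to the left of the knot
```

The two recovered slopes differ, which a single straight line could never express; this is the building block MARS combines and interacts to model more complex surfaces.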
During the course of Salford Systems' 4-part webinar series "The Evolution of Regression," some very good questions from the audience have made their way to presenter Dr. Dan Steinberg, CEO and Founder. Here are a few responses that we thought would benefit everyone who is interested in regression, nonlinear regression, regularized regression, decision tree ensembles and post-processing techniques.
Part 2 - Hands-On Component is this Friday! [March 15, 2013, 10 am PST (San Diego, CA)]
Overcoming Linear Regression Limitations
Regression is one of the most popular modeling methods, but the classical approach has significant problems. This webinar series addresses these problems. Are you working with larger datasets? Is your data challenging? Does your data include missing values, nonlinear relationships, local patterns and interactions? This webinar series is for you! We will cover improvements to conventional and logistic regression, and will include a discussion of classical, regularized, and nonlinear regression, as well as modern ensemble and data mining approaches. This series will be of value to any classically trained statistician or modeler.
Whether you were able to attend Part 1 of the webinar series "The Evolution of Regression Modeling: From Classical Linear Regression to Modern Ensembles" or not, the on-demand recording is now available. Watch the video and download the slides.
Question: How does MARS (Multivariate Adaptive Regression Splines) deal with missing values?
At the 2012 Salford Analytics and Data Mining Conference, Maria Lupetini from Qualcomm gave an easy-to-understand overview of how they are able to make predictions using Salford Systems' MARS (Multivariate Adaptive Regression Splines). A portion of the recorded presentation is below. Enjoy!