We recently came across the article "Random Forest---the go-to machine learning algorithm" from TechWorld Australia.
The "leave-one-out" (LOO) or jackknife testing method is well known for regression models, and users often ask whether they can use it for CART models. For example, with a dataset of 200 rows you could request 200-fold cross-validation, resulting in 200 runs, each built on 199 training records and tested on the single record left out. Those who have experimented with this for regression trees already know from experience that it does not work well: you do not obtain reliable estimates of the generalization error (the performance of your tree on previously unseen data). In this post I comment on why this is the case and what your options are.
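To make the setup concrete, here is a minimal sketch of LOO testing of a regression tree on a 200-row dataset, using scikit-learn's CART-style DecisionTreeRegressor rather than Salford's CART (the synthetic data and tree settings are invented for illustration):

```python
# Illustrative sketch: leave-one-out (LOO) testing of a regression tree.
# Uses scikit-learn's DecisionTreeRegressor, not Salford's CART.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # 200 rows, as in the example
y = X[:, 0] + rng.normal(scale=0.5, size=200)

squared_errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X[train_idx], y[train_idx])       # build on 199 training records
    pred = tree.predict(X[test_idx])           # test on the single held-out row
    squared_errors.append((y[test_idx][0] - pred[0]) ** 2)

loo_mse = float(np.mean(squared_errors))       # 200 runs, one error per run
print(len(squared_errors), round(loo_mse, 3))
```

Each of the 200 test errors comes from a single record, so the per-run error is extremely noisy, which is part of why the resulting estimate is unreliable for trees.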
The recent launch of the Salford Predictive Modeler software suite v7.0, the latest version of Salford Systems’ data mining software package, spurred enormous interest in the data science community, especially after its showcase webinar series: “The Evolution of Regression Modeling.” The series’ instructors, Salford Systems’ CEO and Founder Dr. Dan Steinberg and Senior Scientist Mikhail Golovnya, will continue this showcase of the software’s forefront predictive methodology at the Joint Statistical Meetings in Montreal.
For the last decade Salford Systems has hosted computer training workshops at JSM to educate leading statisticians, analysts, data scientists, and researchers on its flagship products CART® decision trees, MARS® nonlinear regression, TreeNet® stochastic gradient boosting, and Random Forests®. In conjunction with the software suite’s new model compression techniques and hybrid modeling capabilities, Salford Systems will present yet another new algorithm that has been added to the Salford toolkit, Generalized PathSeeker (GPS). This technology encompasses regularized regression methods such as the LASSO and Ridge regression.
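GPS itself is Salford's proprietary algorithm, but the regularized-regression family it belongs to can be sketched with scikit-learn's Lasso and Ridge estimators (the data here is synthetic, invented purely for illustration):

```python
# Illustrative sketch of the regularized-regression family GPS belongs to,
# using scikit-learn's Lasso (L1) and Ridge (L2); this is NOT GPS itself.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))                 # 2 real predictors, 8 noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: drives coefficients to zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients smoothly

n_zero_lasso = int(np.sum(lasso.coef_ == 0))   # LASSO selects variables
n_zero_ridge = int(np.sum(ridge.coef_ == 0))   # Ridge keeps everything
print(n_zero_lasso, n_zero_ridge)
```

The contrast in the two zero counts is the practical difference between the penalties: the L1 penalty performs variable selection by zeroing out the noise predictors, while the L2 penalty only shrinks them.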
The San Francisco Data Mining meetup group invited Salford Systems' CEO Dan Steinberg for an in-depth presentation based on the webinar series “The Evolution of Regression Modeling” from earlier in the year. The presentation was warmly received by the 285-person audience, and Salford Systems was happy to bring something of popular interest to the table.
There are several tricks available for maneuvering CART into generating a single tree structure that will output predictions for several different target (dependent) variables in each terminal node. For CART the idea seems very natural, in that the structure of the model is just a segmentation of the data into mutually exclusive and collectively exhaustive segments. If the segments of a CART tree designed for one target variable have been well constructed, then those segments could easily be relevant for the prediction of many outcomes. A segmentation (CART tree) based on common demographics and Facebook likes, for example, could be used to predict consumption of tuna fish, frequency of cinema visits, and monthly hair stylist spend. Of course, the question is: could a common segmentation in fact be useful for three such diverse behaviors, and, if such a segmentation existed, would we be able to find it?
Experienced users of decision trees have long appreciated that decision trees are often not impressive performers when it comes to regression. This does not in the least suggest that regression trees are not valuable analytical tools. As always, they are fabulous for gaining insight into data, making rapid out-of-the-box progress even when working with highly flawed data, detecting hidden but important flaws in the data, and identifying valuable predictors. Regression trees are among the most useful of tools during exploratory data analysis, when the modeler is struggling to understand the data and elicit the dominant predictive patterns. This is especially true when the data is strewn with missing values, as the CART regression tree user will not need to do any special data preparation to deal with them: CART will handle missing values effectively. But regression trees (at least single regression trees) often yield lower predictive accuracy than other methods, in part because they generally produce a rather limited number of distinct predictions. All records falling into a specific terminal node of a regression tree share the same prediction, lumping all modestly similar records into the same predictive bucket. Regression trees suffer from one further problem that is rarely appreciated: because the criterion used to build the model is the same as the criterion used to assess its performance, regression trees have an enhanced tendency to overfit to the training data. (More on this latter point later.)
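The "limited number of distinct predictions" point is easy to see in code. This quick sketch, using scikit-learn's DecisionTreeRegressor on synthetic data, counts the distinct predictions a single depth-4 regression tree can emit versus an ordinary linear model:

```python
# Sketch of why single regression trees are coarse predictors: a depth-d
# tree has at most 2**d terminal nodes, hence at most 2**d distinct outputs.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=1000)

tree_preds = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y).predict(X)
ols_preds = LinearRegression().fit(X, y).predict(X)

n_tree = len(np.unique(tree_preds))   # at most 2**4 = 16 distinct values
n_ols = len(np.unique(ols_preds))     # essentially one value per row
print(n_tree, n_ols)
```

Every record in a terminal node receives that node's mean, so the tree's prediction surface is a step function with at most sixteen levels here, while the linear model assigns each row its own continuous prediction.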
During the course of Salford Systems' 4-part webinar series "The Evolution of Regression," some very good questions from the audience have made their way to presenter Dr. Dan Steinberg, CEO and Founder. Here are a few responses that we thought would benefit everyone who is interested in regression, nonlinear regression, regularized regression, decision tree ensembles and post-processing techniques.
Part 2 - Hands-On Component is this Friday! [March 15, 2013, 10 am PST (San Diego, CA)]
Overcoming Linear Regression Limitations
Regression is one of the most popular modeling methods, but the classical approach has significant problems, and this webinar series addresses them. Are you working with larger datasets? Is your data challenging? Does your data include missing values, nonlinear relationships, local patterns, and interactions? This webinar series is for you! We will cover improvements to conventional linear and logistic regression, including a discussion of classical, regularized, and nonlinear regression, as well as modern ensemble and data mining approaches. This series will be of value to any classically trained statistician or modeler.