Experienced users of decision trees have long recognized that single trees are often unimpressive performers when it comes to regression. That does not mean regression trees lack value as analytical tools. As always, they are excellent for gaining insight into data, making rapid out-of-the-box progress even with highly flawed data, detecting hidden but important flaws in the data, and identifying valuable predictors. Regression trees are among the most useful tools during exploratory data analysis, when the modeler is still struggling to understand the data and elicit its dominant predictive patterns. This is especially true when the data is strewn with missing values: the CART regression tree user needs no special data preparation to deal with them, because CART handles missing values effectively. But regression trees (at least single regression trees) often yield lower predictive accuracy than other methods, in part because they produce a rather limited number of distinct predictions: all records falling into a given terminal node receive the same predicted value, lumping even modestly similar records into the same predictive bucket. Regression trees also suffer from a further, rarely appreciated problem: because the criterion used to build the model is the same as the criterion used to assess its performance, they have a heightened tendency to overfit the training data. (More on this latter point later.)
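The piecewise-constant nature of regression tree predictions is easy to demonstrate. The sketch below uses scikit-learn's `DecisionTreeRegressor` as a stand-in (not CART or SPM itself): however many records you score, the tree can emit at most one distinct predicted value per terminal node.

```python
# Sketch (scikit-learn stand-in, not CART/SPM): a regression tree's
# predictions are piecewise constant -- one value per terminal node.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

# Cap the tree at 8 terminal nodes.
tree = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0).fit(X, y)
preds = tree.predict(X)

# 500 records, but at most 8 distinct predicted values (one per leaf).
print(len(np.unique(preds)))
```

With a smooth target like `sin(x)`, those few flat steps are exactly why a single tree trails methods that can produce a continuum of predicted values.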
During Salford Systems' four-part webinar series "The Evolution of Regression," some very good questions from the audience made their way to the presenter, Dr. Dan Steinberg, CEO and Founder. Here are a few responses that we thought would benefit everyone interested in regression, nonlinear regression, regularized regression, decision tree ensembles, and post-processing techniques.
SPM v7.0 is Salford Systems' latest release of its award-winning suite of sophisticated data mining software. So, what's new in SPM v7.0? And what's all this talk about "batteries," a.k.a. automation?
Topics: SPM 7
Part 2 - Hands-On Component is this Friday! [March 15, 2013, 10 am PST (San Diego, CA)]
Overcoming Linear Regression Limitations
Regression is one of the most popular modeling methods, but the classical approach has significant limitations, and this webinar series addresses them. Are you working with larger datasets? Is your data challenging? Does it include missing values, nonlinear relationships, local patterns, and interactions? Then this series is for you. We will cover improvements to conventional and logistic regression, including a discussion of classical, regularized, and nonlinear regression, as well as modern ensemble and data mining approaches. The series will be of value to any classically trained statistician or modeler.
Whether you were able to attend Part 1 of the webinar series "The Evolution of Regression Modeling: From Classical Linear Regression to Modern Ensembles" or not, the on-demand recording is now available. Watch the video and download the slides.
Many analysts highly value the ability to rank the predictors in a database: it comes down to knowing what matters and what does not. Especially when working with a large number of variables, being able to focus on a relatively small subset gives decision makers confidence when communicating with others. In the SPM software suite, every one of Salford Systems' data mining engines offers a plausible ranking of the available predictors, but Random Forests offers a unique twist on this concept.
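To make the idea of predictor ranking concrete, here is a minimal sketch using scikit-learn's `RandomForestRegressor` rather than Salford Systems' own Random Forests engine. On synthetic data where only a few of the predictors are informative, the forest's importance scores let us surface that small subset.

```python
# Illustrative sketch (scikit-learn, not SPM's Random Forests engine):
# rank predictors by a forest's impurity-based importance scores.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 10 predictors, only 3 of them informative.
X, y = make_regression(n_samples=400, n_features=10, n_informative=3,
                       random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Sort predictors by importance, most important first.
ranking = np.argsort(rf.feature_importances_)[::-1]
for i in ranking[:3]:
    print(f"x{i}: importance {rf.feature_importances_[i]:.3f}")
```

The scikit-learn scores shown here are impurity-based; Random Forests implementations also offer permutation-based importances, which is part of the "unique twist" mentioned above.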
In a recent webinar series, Salford Systems introduced the newest model compression and post-processing techniques available in SPM v7.0: GPS (Generalized Path Seeker), ISLE, and RuleLearner.
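The flavor of ISLE-style model compression can be sketched in a few lines. The example below is a rough stand-in built from scikit-learn components, not SPM's GPS/ISLE implementation: each tree in a boosted ensemble is treated as a derived predictor, and a Lasso then re-weights the trees, zeroing out redundant ones to produce a smaller compressed model.

```python
# ISLE-style post-processing sketch (scikit-learn stand-in, not SPM's
# GPS/ISLE): re-weight an ensemble's trees with a Lasso so that many
# coefficients shrink to zero, compressing the model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=400, n_features=8, noise=5.0,
                       random_state=0)
gbm = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

# Matrix of per-tree predictions: one column per tree in the ensemble.
tree_preds = np.column_stack([est[0].predict(X) for est in gbm.estimators_])

# Lasso selects and re-weights a subset of the trees.
lasso = LassoCV(cv=5, random_state=0).fit(tree_preds, y)
kept = int(np.sum(lasso.coef_ != 0))
print(f"trees kept: {kept} of {tree_preds.shape[1]}")
```

Scoring new data then requires only the trees with nonzero coefficients, which is the sense in which the ensemble has been compressed.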