Simply Salford Blog

Regression Model Building via Classification Trees [tutorial]

Posted by Dan Steinberg on Tue, Apr 9, 2013 @ 12:13 PM

Experienced users of decision trees have long recognized that trees are often not impressive performers when it comes to regression. This does not in the least suggest that regression trees are not valuable analytical tools. As always, they are fabulous for gaining insight into data, making rapid out-of-the-box progress even when working with highly flawed data, detecting hidden but important flaws in the data, and identifying valuable predictors. Regression trees are among the most useful tools for exploratory data analysis, when the modeler is struggling to understand the data and elicit the dominant predictive patterns. This is especially true when the data is strewn with missing values: the CART regression tree user does not need any special data preparation to deal with them, because CART handles missing values effectively. But regression trees (at least single regression trees) often yield lower predictive accuracy than other methods, in part because they generally produce a rather limited number of distinct predictions. All records falling into a specific terminal node of a regression tree share the same prediction, which lumps all modestly similar records into the same predictive bucket. Regression trees suffer from one further problem that is rarely appreciated: because the criterion used to build the model is the same as the criterion used to assess its performance, regression trees have an enhanced tendency to overfit the training data. (More on this latter point later.)
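To make the "limited number of distinct predictions" point concrete, here is a minimal sketch in plain Python. It is not CART or SPM code: the function names (`fit_stump`, `predict`) and the toy data are made up for illustration. It fits a one-split regression tree by least squares and shows that every record in a terminal node receives that node's mean as its prediction.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Find the single threshold on x that minimizes total squared error.

    Returns (threshold, left_mean, right_mean). This is a toy two-node
    regression tree, assumed here only for illustration.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict(stump, x):
    """Every record in a terminal node gets that node's mean."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Made-up toy data: eight records, eight different target values.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]

stump = fit_stump(xs, ys)
predictions = [predict(stump, x) for x in xs]
distinct_predictions = set(predictions)
print(len(set(ys)), "distinct targets ->", len(distinct_predictions), "distinct predictions")
```

Eight different target values collapse into two predicted values, one per terminal node. A deeper tree adds more nodes, but the number of distinct predictions never exceeds the number of terminal nodes.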

Topics: CART, Regression, classification trees, Tutorial, SPM 7

Discussion Questions from "The Evolution of Regression"

Posted by Heather Hinman on Thu, Apr 4, 2013 @ 11:30 AM

During the course of Salford Systems' 4-part webinar series "The Evolution of Regression," some very good questions from the audience have made their way to presenter Dr. Dan Steinberg, CEO and Founder. Here are a few responses that we thought would benefit everyone who is interested in regression, nonlinear regression, regularized regression, decision tree ensembles and post-processing techniques.

Topics: TreeNet, Random Forests, GPS, MARS, Regression, Webinar, SPM 7

Navigating the New Features in SPM v7.0

Posted by Heather Hinman on Wed, Mar 20, 2013 @ 05:58 AM

The SPM software suite v7.0 is Salford Systems' latest release of its award-winning suite of sophisticated data mining software. So, what's new in SPM v7.0? And what's all this talk about "batteries," a.k.a. automation?

Topics: SPM 7

Hands-On Webinar "The Evolution of Regression Modeling"

Posted by Heather Hinman on Tue, Mar 12, 2013 @ 04:08 AM

Part 2 - Hands-On Component is this Friday! [March 15, 2013, 10 am PST (San Diego, CA)]

Overcoming Linear Regression Limitations

Regression is one of the most popular modeling methods, but the classical approach has significant problems. This webinar series addresses these problems. Are you working with larger datasets? Is your data challenging? Does your data include missing values, nonlinear relationships, local patterns and interactions? This webinar series is for you! We will cover improvements to conventional and logistic regression, and will include a discussion of classical, regularized, and nonlinear regression, as well as modern ensemble and data mining approaches. This series will be of value to any classically trained statistician or modeler.

Topics: GPS, MARS, Regression, Webinar, SPM 7

Video: The Evolution of Regression [Part 1]

Posted by Heather Hinman on Tue, Mar 5, 2013 @ 04:00 AM

Whether you were able to attend Part 1 of the webinar series "The Evolution of Regression Modeling: From Classical Linear Regression to Modern Ensembles" or not, the on-demand recording is now available. Watch the video and download the slides.

Topics: GPS, MARS, Regression, Webinar, SPM 7

Utilizing Variable Importance in Random Forests [Mini Tutorial]

Posted by Dan Steinberg on Fri, Feb 22, 2013 @ 09:29 AM

(Applies to all versions of Salford Systems Random Forests and SPM. Some controls discussed below are new to SPM 7.0)

Many analysts highly value the ability to rank predictors in a database. It comes down to knowing what matters and what does not. Especially when working with a large number of variables, being able to focus on a relatively small number helps decision makers communicate with others confidently. In the SPM software suite, every one of Salford Systems' data mining engines offers a plausible ranking of the available predictors, but Random Forests offers a unique twist on this concept.

Topics: Random Forests, Variable Importance, Tutorial, SPM 7

Benefits of Data Mining with Model Compression Techniques [video]

Posted by Heather Hinman on Fri, Feb 15, 2013 @ 06:25 AM

[Video Post]

In a recent webinar series, Salford Systems introduced the newest model compression and post-processing techniques available in SPM v7.0: GPS (Generalized Path Seeker), ISLE, and RuleLearner.

Topics: video, TreeNet, GPS, ISLE, RuleLearner, SPM 7

Data Mining On A Budget: Choose Wisely

Posted by Heather Hinman on Wed, Feb 13, 2013 @ 08:41 AM

If you're an analyst or statistician working on a limited budget, this is a must-read!

Topics: TreeNet, SPM 7