We recently came across the article, "Random Forest---the go-to machine learning algorithm" from TechWorld Australia.Read More
The SPM Salford Predictive Modeler software suite offers several tools for clustering and segmentation including CART, Random Forests, and a classical statistical module CLUSTER. In this article we illustrate the use of these tools with the well known Boston Housing data set (pertaining to 1970s housing prices and neighborhood characteristics in the greater Boston area).
Random Forests is the unique learning machine that has no need of an explicit test sample because of its use of bootstrap sampling for every tree. This ensures that every tree in the forest is built on about 63% of the available data, leaving the remaining approximately 37% for testing [the OOB (out-of-bag) data].
Salford Systems recently attended the Predictive Analytics World (PAW) conference in San Francisco as a sponsor. Manning the exhibit booth was yours truly, and I was fortunate to meet many analysts, predictive modelers, and data scientists of all experience levels. Even though this is always an entertaining break from every-day office life, my favorite part of the conference was being able to participate in a workshop offered by Dean Abbott, President of Abbott Analytics, entitled "Supercharging Prediction: Hands-On With Ensembles Models."
During the course of Salford Systems' 4-part webinar series "The Evolution of Regression," some very good questions from the audience have made their way to presenter Dr. Dan Steinberg, CEO and Founder. Here are a few responses that we thought would benefit everyone who is interested in regression, nonlinear regression, regularized regression, decision tree ensembles and post-processing techniques.
Many analysts highly value the ability to rank predictors in a database. It comes down to knowing what matters and what does not. Especially when working with a large number of variables being able to focus on a relatively small number aids decision makers to have confidence in communication with others. In the SPM software suite every one of Salford Systems’ data mining engines offers a plausible ranking of the available predictors, but Random Forests offers a unique twist on this concept.
Salford Systems has maintained long-term relationships with data mining visionaries like Random Forests co-developer Dr. Adele Cutler. In a recent visit to San Diego, she spent time with Salford Systems' staff discussing plans for Random Forests future developments, offering an introductory session on Random Forests, and sharing some personal memories of her time working with Dr. Leo Breiman on Random Forests.