Eric Siegel’s Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die is a nontechnical overview of modern analytics, with detailed discussion of how machine learning is being deployed across a wide range of industries and major corporations. Eric is a hugely entertaining writer who brings the expertise you would expect of a Columbia University-trained Ph.D. Geoffrey Moore writes that the book is “deeply informative,” and Tom Peters calls it “The most readable ‘big data’ book I’ve come across. By far.”
We recently came across the article "Random Forest---the go-to machine learning algorithm" from TechWorld Australia.
The "leave-one-out" (LOO) or jackknife testing method is well known for regression models, and users often ask whether they can use it for CART models. For example, if you had a dataset with 200 rows, you could ask for 200-fold cross-validation, resulting in 200 runs, each built on 199 training records and tested on the single record that was left out. Those who have experimented with this for regression trees already know from experience that it does not work well: you do not obtain reliable estimates of the generalization error (the performance of your tree on previously unseen data). In this post I comment on why this is the case and what your options are.
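For readers who want to try this themselves, the procedure described above can be sketched in a few lines. This is an illustrative example only, using scikit-learn's `DecisionTreeRegressor` as a stand-in for CART and a synthetic dataset of 200 rows; the function and dataset names are my choices, not something from the original post.

```python
# Sketch of leave-one-out cross-validation for a regression tree.
# scikit-learn's DecisionTreeRegressor stands in for CART here.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic 200-row dataset, matching the example in the text.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

tree = DecisionTreeRegressor(max_depth=4, random_state=0)

# One run per row: each model trains on 199 records and is tested
# on the single record that was left out.
scores = cross_val_score(tree, X, y, cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
loo_mse = -scores.mean()
print(f"LOO estimate of MSE over {len(scores)} runs: {loo_mse:.2f}")
```

Running this is cheap on 200 rows, but as the post argues, the resulting error estimate for a tree is far noisier than the same procedure applied to a linear regression.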
This guide is for data mining practitioners and data scientists with experience using CART (Classification and Regression Trees). Walk yourself through the SlideShare for a more in-depth understanding of how CART decision trees can be implemented in today's data mining applications.
This blog post is extracted from one of Salford Systems' video tutorial lectures offered by Dan Steinberg. Take some time out of your day to improve your knowledge of CART.
In this blog I'll address the CART tree sequence. CART follows a forward growing and a backward pruning process to arrive at the optimal tree. In the process, CART generates not just one model but a collection of progressively simpler models. This collection of models is known as the "tree sequence." In this article I will explain the forward and backward tree generation process. I will also discuss how a modeler might use judgment to select a near-optimal tree that might be better for deployment than the so-called optimal tree. (This blog post is a transcript of the video below.)
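The grow-then-prune idea can be sketched with open-source tools. This is not Salford's CART implementation; it is an assumed analogue using scikit-learn's minimal cost-complexity pruning, where each value of the penalty `alpha` along the pruning path yields one member of a tree sequence, from the full tree down to a single-node stump.

```python
# Sketch of a CART-style tree sequence via cost-complexity pruning
# (scikit-learn analogue, not Salford's CART itself).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Forward step: grow a large tree, then compute its pruning path.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Backward step: one progressively simpler tree per alpha in the sequence.
sequence = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
    for a in path.ccp_alphas
]

for t, a in zip(sequence, path.ccp_alphas):
    print(f"alpha={a:.5f}  leaves={t.get_n_leaves()}  "
          f"test accuracy={t.score(X_test, y_test):.3f}")
```

Scanning the printed sequence by eye, rather than blindly taking the best test score, is exactly the kind of judgment call discussed above: a slightly simpler tree with near-optimal accuracy is often the better deployment choice.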
Experienced users of decision trees have long appreciated that decision trees in general are often not impressive performers when it comes to regression. This does not in the least suggest that regression trees are not valuable analytical tools. As always, they are fabulous for gaining insight into data, making rapid out-of-the-box progress even when working with highly flawed data, detecting hidden but important flaws in the data, and identifying valuable predictors. Regression trees are among the most useful of tools during exploratory data analysis, when the modeler is struggling to understand the data and elicit the dominant predictive patterns. This is especially true when the data is strewn with missing values, as the CART regression tree user will not need to do any special data preparation to deal with them: CART handles missing values effectively. But regression trees (at least single regression trees) often yield lower predictive accuracy than other methods, in part because they generally produce a rather limited number of distinct predictions. All records falling into a specific terminal node of a regression tree share the same prediction, lumping all modestly similar records into the same predictive bucket. Regression trees suffer from one further problem that is rarely appreciated: because the criterion used to build the model is the same as the criterion used to assess its performance, regression trees have an enhanced tendency to overfit to the training data. (More on this latter point later.)
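The "limited number of distinct predictions" point is easy to demonstrate. Below is a minimal sketch, assuming a synthetic dataset and using scikit-learn's `DecisionTreeRegressor` as a stand-in for CART: a fitted regression tree can emit at most one prediction value per terminal node, while a linear model produces an essentially continuous range of predictions.

```python
# Sketch: count the distinct predictions of a regression tree vs. a
# linear model on the same data (scikit-learn stands in for CART).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)

# Cap the tree at 8 terminal nodes, so at most 8 distinct predictions.
tree = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0).fit(X, y)
linear = LinearRegression().fit(X, y)

tree_preds = np.unique(tree.predict(X))
linear_preds = np.unique(linear.predict(X))

print(f"distinct tree predictions:   {len(tree_preds)}")    # at most 8 (one per leaf)
print(f"distinct linear predictions: {len(linear_preds)}")  # roughly one per record
```

Every record in a terminal node receives that node's mean response, which is exactly the "predictive bucket" behavior described above.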