When you hear "data mining automation," what do you usually think of? In general, the automation world has us thinking about the following process: you upload a dataset, select a multitude of modeling engines, then out comes the best model, and voilà, you're done. There are many positives in this scenario: you save time, and the result is generally a reliable pick for the "best model." However, what about automating model-building EXPERIMENTS? You ask: what do you mean by experiments?
Congratulations to DataLab USA! The database marketing solutions company took first place in the 2013 Direct Marketing Association (DMA) modeling challenge with their successful use of the SPM Ultra software.
Probabilities in CART trees are quite straightforward and are displayed for every node in the CART navigator. Below we show a simple example from the KDD Cup '98 data predicting response to a direct mail marketing campaign.
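As a rough illustration of the idea, a node probability can be read as the share of each class among the records that land in that node. The sketch below is plain Python, not SPM output: the function name `node_probabilities` and the node counts are made up for illustration, and it ignores any adjustment for priors or case weights.

```python
from collections import Counter

def node_probabilities(labels):
    """Return the within-node proportion of each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    # An empty node yields an empty dict, so no division by zero occurs.
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical terminal node: 5 responders (1) and 95 non-responders (0).
node_labels = [1] * 5 + [0] * 95
probs = node_probabilities(node_labels)
print(probs)
```

Under these assumptions, a new record falling into this node would be assigned the responder proportion of the node as its predicted probability of response.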
Salford Systems recently attended the Predictive Analytics World (PAW) conference in San Francisco as a sponsor. Manning the exhibit booth was yours truly, and I was fortunate to meet many analysts, predictive modelers, and data scientists of all experience levels. Although the conference is always an entertaining break from everyday office life, my favorite part was participating in a workshop offered by Dean Abbott, President of Abbott Analytics, entitled "Supercharging Prediction: Hands-On With Ensembles Models."
While TreeNet (Stochastic Gradient Boosting) can work phenomenally well out of the box, it almost always pays to tune your control parameters. Devoting time to optimizing a TreeNet model can noticeably improve its out-of-sample performance.
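To make the tuning idea concrete, here is a minimal sketch of a parameter sweep scored on held-out data. It is not TreeNet and does not use SPM commands: it is a toy stochastic gradient boosting routine (squared-error loss, one-split stumps, a single predictor) written in plain Python, and the names `learn_rate`, `n_trees`, and `subsample`, the grid values, and the synthetic data are all assumptions chosen for illustration.

```python
import math
import random

def fit_stump(xs, residuals):
    """Find the single split on x that minimizes squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # dropping the max keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    return best[1:]

def boost(xs, ys, n_trees, learn_rate, subsample, rng):
    """Toy stochastic gradient boosting: each stump fits residuals on a random subsample."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), max(2, int(subsample * len(xs))))
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        t, lo, hi = stump
        preds = [p + learn_rate * (lo if x <= t else hi) for p, x in zip(preds, xs)]
    return base, stumps

def predict(model, x, learn_rate):
    base, stumps = model
    return base + learn_rate * sum(lo if x <= t else hi for t, lo, hi in stumps)

def mse(model, xs, ys, learn_rate):
    return sum((predict(model, x, learn_rate) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): noisy sine curve.
rng = random.Random(0)
xs = [rng.uniform(0, 6) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]
train_x, train_y, test_x, test_y = xs[:150], ys[:150], xs[150:], ys[150:]

# Sweep a small, hypothetical grid and score each setting out of sample.
results = {}
for learn_rate in (0.01, 0.1, 0.5):
    for n_trees in (20, 100):
        model = boost(train_x, train_y, n_trees, learn_rate, 0.5, random.Random(1))
        results[(learn_rate, n_trees)] = mse(model, test_x, test_y, learn_rate)

best_setting = min(results, key=results.get)
best_score = results[best_setting]
print(best_setting, round(best_score, 4))
```

The point of the sketch is the outer loop rather than the booster itself: each candidate setting is judged on records the model never saw, which is the comparison that matters when tuning for out-of-sample performance.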
This note is actually relevant to all Salford data mining engines. CART, and SPM generally, can use quite a bit of memory for storing the trees grown and all the relevant model statistics and graphs. This is because CART gives you full access to every size of tree grown, with performance stats available for each. In a session today I ran about 50 individual CART trees (I wrote a small command script for this) and about 1,000 additional CART trees using the BATTERY mechanism. Every tree grown, and every subtree, was available for me to examine instantaneously. When I ran another model and wanted to drill down into the details, I received a message that the "system was running low on resources" and I was advised to close some open windows. Naturally, I was not eager to start closing windows one by one, so what to do?