Salford Systems recently attended the Predictive Analytics World (PAW) conference in San Francisco as a sponsor. Manning the exhibit booth was yours truly, and I was fortunate to meet many analysts, predictive modelers, and data scientists of all experience levels. Even though this is always an entertaining break from everyday office life, my favorite part of the conference was participating in a workshop offered by Dean Abbott, President of Abbott Analytics, entitled "Supercharging Prediction: Hands-On With Ensemble Models."
I will admit I was intimidated at first to be participating in a predictive modeling workshop: I do not have a background in statistics, and I have only basic training on decision tree tools from Salford Systems' team of in-house experts. Despite my basic knowledge of decision trees, I was thrilled to find I could follow along with ease when learning about tree ensembles and modern hybrid modeling approaches. Marketing folk building predictive models? Yes, we can!
I jotted down some of my takeaways to share with this audience. I hope you enjoy them!
#1 Why use tree ensemble methods?
One of the great things about tree ensembles is their accuracy. They are much better than individual decision trees and are on par with or better than neural networks and support vector machines. They also require little data preparation, and the SPM software suite incorporates many automation features that make tree ensembles easy, fast, and reliable to build. By combining the outputs of multiple models into a single decision, you boost your predictive power and create a model you can trust.
#2 Don't be afraid of the data
Prior to this workshop I was very intimidated by the overwhelming amount of data. What did the variables mean? What might be a good predictor vs. a bad predictor? I learned that in SPM, when you open your data file, you can click the "Stats" button and screen your data before building a model. This way, you can see useful information such as the percent of missing values, which variables may or may not be good predictors, how many variables there are, how many records there are, etc. For example, if 80% of the values for the variable "Gender" are missing in your data, you probably don't want to use this variable as a predictor.
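SPM's "Stats" button does this screening for you at the click of a button. Outside of SPM, a rough sketch of the same missing-value check can be done in Python with pandas (the tiny data set and column names here are made up purely for illustration):

```python
import pandas as pd

# Hypothetical data set with a mostly-missing "Gender" column
df = pd.DataFrame({
    "Gender": ["F", None, None, None, None],
    "Age": [34, 51, 28, 45, 39],
})

# Basic shape info plus percent missing per variable
print(f"{len(df)} records, {df.shape[1]} variables")
pct_missing = df.isna().mean() * 100
print(pct_missing)

# Flag variables missing 80%+ of their values as poor predictor candidates
poor_predictors = pct_missing[pct_missing >= 80].index.tolist()
print("Consider dropping:", poor_predictors)  # ['Gender']
```

The point is the same as in SPM: look at the data's vital signs before you ever build a model.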
#3 Model setup is KEY
In every tutorial video, training class, and lecture I've ever heard related to data mining and predictive modeling (which, by the way, is very limited) I've at least gathered that model setup is very important. Here are my key takeaways from the workshop at PAW:
- Your testing setup (learn vs. test) should really depend on how many records you have in your data set. The SPM software has a few different default testing options, such as v-fold cross-validation, 20% random testing, and no independent testing. If you have enough data to partition 50% learn and 50% test, this is a great place to start!
- The software (SPM v7.0) defaults to priors EQUAL - stick with it if you're a beginner and just starting out.
- Regression vs. Classification
- Okay, this is an easy one. Regression = continuous target variable. Classification = categorical target variable. If you're working with TreeNet stochastic gradient boosting (a boosted decision tree method), I found it useful to set the analysis type to Logistic Binary for a binary classification problem.
- Number of Trees to Ensemble
- This is one to play around with. My favorite ensemble method used during the workshop was TreeNet, which defaults to growing 200 trees. You can set 300, 500, or 1,000 trees to see if your performance improves. (I found that building 500 trees on the example data set worked well.)
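TreeNet itself lives inside SPM, but the same experiment from the bullets above, a 50/50 learn/test partition plus a sweep over the number of trees, can be sketched with scikit-learn's `GradientBoostingClassifier` standing in for a boosted tree ensemble (synthetic data; all settings here are illustrative, not SPM defaults):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the workshop file
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 50% learn / 50% test partition, as suggested for larger data sets
X_learn, X_test, y_learn, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Try a few ensemble sizes and watch test-set performance
for n_trees in (200, 300, 500):
    model = GradientBoostingClassifier(n_estimators=n_trees, random_state=0)
    model.fit(X_learn, y_learn)
    print(n_trees, "trees -> test accuracy:", round(model.score(X_test, y_test), 3))
```

Scoring on the held-out half, rather than the learn half, is what keeps the "more trees" experiment honest.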
#4 Choose the right ensemble method for your project
During the workshop, two ensemble methods were discussed: boosting and bagging.
- Boosting methods go something like this: build a tree on your training data set and score each data point, flagging incorrect decisions (errors). Then retrain, giving the rows with incorrect decisions more weight. Repeat. Your final prediction is a weighted average of all the models. TIP: create "weak" models (small trees) and let the boosting iterations find the complexity of the model. Example: TreeNet stochastic gradient boosting.
- Bagging methods create many data sets by bootstrapping and grow one decision tree on each data set, then combine the trees by averaging (voting) on the final decisions. The results perform better than individual trees and reduce model variance (error) rather than bias. TIP: this is a great tool when working with "wide" data sets that have a ton of variables (columns) and a limited number of records (rows). Example: Random Forests.
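Both families have direct analogues outside of SPM. A minimal side-by-side sketch in scikit-learn (synthetic data; settings chosen only to echo the tips above, not tuned) might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data set
X, y = make_classification(n_samples=1000, n_features=25, random_state=1)
X_learn, X_test, y_learn, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1
)

# Boosting: "weak" shallow trees, sequentially reweighted toward past errors
boosted = GradientBoostingClassifier(max_depth=2, n_estimators=200, random_state=1)

# Bagging: trees grown on bootstrap samples, predictions averaged (voted)
bagged = RandomForestClassifier(n_estimators=200, random_state=1)

for name, model in [("boosting", boosted), ("bagging", bagged)]:
    model.fit(X_learn, y_learn)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```

Note the `max_depth=2` on the boosted model: that is the "weak learner" tip in code, while the random forest is left to grow its trees deep and rely on averaging to cut variance.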
#5 Batteries are a predictive modeler's playground!
Batteries are what Salford Systems calls its pre-packaged experiments, distilled from years of consulting and hands-on data modeling projects. These batteries are essentially automation features built into the software, available at the click of a button, that run multiple experiments on your data to try to improve model performance.
After just a few basic, default models I was ready to start building like an expert. May I remind you that I had never built a tree ensemble before.
As a class, we were all trying to improve our model performance by building more trees, reducing the number of predictors (getting rid of the noise), trying different batteries (automated experiments), and tweaking the default settings of tools like TreeNet and Random Forests. I didn't have time to experiment with all of the batteries, but I did try a few of the more popular ones that have recently been documented on the Salford Systems website.
The cool thing about batteries is that I was able to build a model with "tricks" even though I don't have any expertise in the field. If you're like me and are new to this advanced technology, you can understand how awesome this was for me.
Putting it into Practice
Now back at the office in San Diego, along with my usual responsibilities, I feel confident in my ability to build predictive models and gain insights into the data at hand to achieve the email marketing and online campaign goals for our communication efforts! Heck - maybe you'll be a part of my next campaign.
If you feel like a novice when it comes to predictive modeling, we should chat, because you can't possibly be less prepared than I was going into this workshop. If you are also new to predictive modeling, or attended the workshop at PAW, leave your comments, thoughts, and takeaways.