Simply Salford Blog

Predicting Customer Churn with Gradient Boosting

Posted by Salford Systems on Fri, May 6, 2016 @ 07:00 AM

Customer churn presents a particularly vexing problem for businesses; every company loses clients or customers over time. It's no wonder that companies are pouring money and time into this issue: we've all heard that it's less costly to retain a customer than to attract a new one. Take the wireless telecommunications industry as an example. In 2003, 20-40% of wireless customers left their provider in a given year. As once-explosive subscriber growth rates slowed, retaining existing customers became increasingly important to a company's overall profitability. Currently, annual churn rates for telecommunications companies range from 10% to 67%. If the customers who are likely to churn can be identified, the company can target them with retention campaigns, offering them an incentive to stay and preventing lost revenue.
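The targeting idea described above can be sketched with an open-source stand-in: a minimal stochastic gradient boosting churn model using scikit-learn's GradientBoostingClassifier. The synthetic data and all parameter choices are purely illustrative; TreeNet is Salford's own gradient boosting implementation and is not shown here.

```python
# Illustrative sketch: score customers by churn risk with stochastic
# gradient boosting (subsample < 1.0 makes the boosting "stochastic").
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a telecom dataset: rows are customers,
# target is 1 if the customer churned (~20% of the sample).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                   subsample=0.5, random_state=42)
model.fit(X_train, y_train)

# Rank held-out customers by predicted churn probability; the highest-risk
# customers would be the targets of a retention campaign.
churn_prob = model.predict_proba(X_test)[:, 1]
print(f"Test AUC: {roc_auc_score(y_test, churn_prob):.3f}")
```

The ranking, not the raw accuracy, is what matters for a retention campaign: the marketing budget goes to the customers at the top of the churn-probability list.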


Topics: TreeNet, stochastic gradient boosting, gradient boosting machine, customer churn, gradient boosting, Customer attrition

The Shape of the Trees in Gradient Boosting Machines

Posted by Salford Systems on Fri, Mar 25, 2016 @ 01:09 PM

Our CEO and founder, Dr. Dan Steinberg, recently wrote about gradient boosting machines. Gradient boosting machines are a powerful machine learning technique and have been deployed with great success over the years in Kaggle competitions.


Topics: TreeNet, stochastic gradient boosting, machine learning, gradient boosting machine, Jerome Friedman, gradient boosting, gradient boosting machine learning

Data Science in Biology: A Few Problems & Solutions [guest post]

Posted by Kimberly Fahrnkopf on Thu, Sep 11, 2014 @ 10:10 AM

Guest post by Grant Humphries, Postdoctoral researcher, University of California, Davis


Topics: TreeNet, data mining, big data, data science, predictive modeling, data analysis

Predicting Shifts in El Niño Using Birds & Data Mining

Posted by Kimberly Fahrnkopf on Wed, Sep 3, 2014 @ 10:38 AM

Dr. Grant Humphries, from the Zoology department at the University of Otago, New Zealand, has spent the last three years studying how a bird species called Sooty Shearwaters can help predict upcoming El Niño occurrences. After much time and research, he has figured out a way to do so using data mining.


Topics: TreeNet, data mining, Variable Importance, big data, data science, prediction, predictive modeling, predictive model

How to Run a Model Using GPS

Posted by David Tolliver on Wed, Aug 27, 2014 @ 05:30 AM

We recently had a question about running a model using GPS, and wanted to share the answer in case anyone else has the same issue.


Topics: SPM, TreeNet, data mining, GPS, predictive model, data analysis, Tutorial, model scoring

Data Mining 101: A Beginners' Boot Camp

Posted by Heather Hinman on Tue, Jan 28, 2014 @ 04:12 AM

Let's get right to it! You're a beginner, and you want to know what is needed to start data mining and become an experienced data scientist overnight. We get it - this is the world we live in - quick and dirty. So here we go, take notes!


Topics: TreeNet, CART, data mining, predictive model, beginner, Data Prep

Sneak Peek: New Data Mining Video Documentation

Posted by Heather Hinman on Tue, Dec 3, 2013 @ 08:57 AM


Topics: video, TreeNet, cost functions, beginner, Tutorial

The History Behind Data Mining Train/Test Performance

Posted by Dan Steinberg on Tue, Jul 16, 2013 @ 12:56 PM

Updated: July 16, 2013

In their 1984 monograph, Classification and Regression Trees, Breiman, Friedman, Olshen and Stone discussed at length the need to obtain "honest" estimates of the predictive accuracy of a tree-based model. At the time the monograph was written, many data sets were small, so the authors took great pains to work out an effective way to use cross-validation with CART trees.

The result was a major advance for data mining, introducing ideas that at the time were radically new. The main point of the discussion was that the only way to avoid overfitting is to rely on test data. With plentiful data we can always reserve a portion for testing; with fewer data we might have to rely on cross-validation. In either case, however, only the test or cross-validated results should be trusted. In contrast, earlier approaches tended to report only performance on the training data, which is precisely what invites overfitting.
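The gap between "dishonest" training-data performance and an honest estimate is easy to demonstrate. A minimal sketch, using scikit-learn's DecisionTreeClassifier as a CART-style learner and a built-in dataset (both stand-ins chosen for illustration, not taken from the post):

```python
# An unpruned decision tree scores far better on its own training data
# than under cross-validation; only the cross-validated estimate is honest.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Resubstitution estimate: score the tree on the same data it was grown on.
train_acc = tree.fit(X, y).score(X, y)

# Honest estimate: 10-fold cross-validation, as recommended for small data.
cv_acc = cross_val_score(tree, X, y, cv=10).mean()

print(f"training accuracy: {train_acc:.3f}, cross-validated: {cv_acc:.3f}")
```

The fully grown tree essentially memorizes the training data, so the resubstitution accuracy is near-perfect while the cross-validated accuracy is noticeably lower; the latter is the number to trust.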

Watch This Tutorial on Train/Test Consistency in CART

Topics: TreeNet, CART, train and test data, Cross-Validation, tr

Requested vs. Actual Tree Sizes in TreeNet Models

Posted by Dan Steinberg on Tue, Jul 2, 2013 @ 11:46 AM

One of the most important controls in TreeNet is the maximum number of terminal nodes permitted in each tree (the NODES=number parameter setting on the TreeNet command). You would think that if you ask for, say, NODES=4, all of your trees would have no more than 4 terminal nodes. However, that is not exactly how things turn out unless your data contain no missing values. If there are missing values in your data and variables with missing values are used as splitters, then the trees may actually contain more nodes than expected.
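As a hedged analogy (not TreeNet itself), scikit-learn's gradient boosting exposes the same control as max_leaf_nodes; there the cap is enforced strictly, which makes TreeNet's missing-value behavior described above easier to appreciate by contrast:

```python
# Cap each boosted tree at 4 terminal nodes and verify the cap by
# counting leaves in every fitted tree.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingRegressor(n_estimators=50, max_leaf_nodes=4,
                                  random_state=0).fit(X, y)

# estimators_ is an (n_estimators, 1) array of regression trees.
leaf_counts = [est[0].tree_.n_leaves for est in model.estimators_]
print(max(leaf_counts))  # never exceeds 4
```

Counting terminal nodes in the fitted trees, as done here, is also a quick way to check what tree sizes any gradient boosting implementation actually produced.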


Topics: TreeNet

How Much Time Needs to be Spent Preparing Data for Analysis?

Posted by Dan Steinberg on Wed, Jun 19, 2013 @ 04:40 AM

When beginning a data analysis project, analysts often discover that the data as presented or made available is not ready for analysis. The reasons for this lack of readiness could be many, including:


Topics: TreeNet, stochastic gradient boosting, KDDCup, Data Prep