Data Mining Blog by Salford Systems

TreeNet Gradient Boosting and CART Decision Trees: A Winning Combination

Posted by Kimberly Fahrnkopf on Thu, Apr 9, 2015 @ 10:14 AM

6 Reasons to Combine CART and TreeNet:

#1 Build predictive models quickly: One advantage of CART is that it has the ability to build models relatively fast.

#2 Incorporate all types of variables: Your model can include numeric, binary, and categorical variables, including those with missing values.

#3 Interpretable model representation: CART’s easy-to-understand decision tree graphics will make your job easy when explaining the model to your boss! All you have to do is print it out!

#4 Maintain model stability: One of TreeNet’s top advantages is that it will retain a stable model due to averaging of the individual decision tree responses – something difficult to do with CART.

#5 Produce a high interaction order model: TreeNet allows precise control over interactions among multiple variables.

#6 Include ALL variables: In CART, relatively few predictors make it into the model, but when using TreeNet each tree works with the entire data – many opportunities for variables to enter.

When you combine TreeNet and CART, you maintain the simplicity of CART while overcoming its challenges with TreeNet gradient boosting.
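To make the contrast between a single tree and a boosted ensemble concrete, here is a minimal sketch of generic gradient boosting for squared error, using one-split trees ("stumps") on a single numeric predictor. It is not TreeNet's or CART's actual implementation; the function names, the toy data, and the learning rate of 0.5 are all assumptions chosen for illustration.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Find the one threshold split on xs that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def boost(xs, ys, n_trees, learning_rate=0.5):
    """Fit each new stump to the residuals left by the stumps before it."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(model, xs, ys):
    """Mean squared error of the model on the given data."""
    return mean([(y - predict(model, x)) ** 2 for x, y in zip(xs, ys)])

# Toy data (hypothetical): a curved relationship that one split cannot capture.
xs = list(range(10))
ys = [x * x for x in xs]

# Training error with 0 stumps (just the mean), 1 stump, and 20 stumps.
errors = [mse(boost(xs, ys, n), xs, ys) for n in (0, 1, 20)]
for n, err in zip((0, 1, 20), errors):
    print(n, "stumps: training MSE =", round(err, 2))
```

Each stump on its own is as easy to read as a one-split tree, while the sum of many small corrections tracks the curve far more closely. The comparison here uses training error only; a real evaluation would use held-out data.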

Two Frequent Challenges in Data Preparation

Posted by Kimberly Fahrnkopf on Wed, Mar 25, 2015 @ 03:10 PM

1.)  HIGH LEVEL CATEGORICAL VARIABLES

A categorical variable with a very high number of levels can lead to many problems. For example, a variable with 797 distinct levels raises a few important issues that would probably not come up with fewer levels.
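One common way to tame a high-level categorical variable is to merge its rare levels into a single catch-all level before modeling. The sketch below illustrates that idea only; it is not necessarily the remedy the full post recommends, and the function name, the `OTHER` label, the threshold, and the sample ZIP codes are all hypothetical.

```python
from collections import Counter

def lump_rare_levels(values, min_count, other_label="OTHER"):
    """Replace every level seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical data: two frequent levels and three levels seen only once.
zip_codes = ["92101"] * 5 + ["92102"] * 4 + ["10001", "60601", "73301"]
lumped = lump_rare_levels(zip_codes, min_count=3)

print(len(set(zip_codes)), "levels before,", len(set(lumped)), "levels after")
```

The threshold is a judgment call: set it too high and informative levels are lost, too low and the variable stays unwieldy.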

How Data Science can Expose Biological Invaders

Posted by Kimberly Fahrnkopf on Wed, Mar 11, 2015 @ 08:00 AM

A large number of species cause significant environmental harm. These are non-native plant or animal species that tend to spread and damage their new environment. These "biological invaders" have caused, and continue to cause, billions of dollars' worth of damage.

By identifying these potentially invasive species, and the traits responsible for their invasiveness, we can ultimately help our environment. This presentation will show you how data science has already helped to identify these traits and reduce the damage.

5 Data Mining Misconceptions

Posted by Kimberly Fahrnkopf on Wed, Feb 18, 2015 @ 01:06 PM

1. Quest for the Holy Grail

TreeNet Gradient Boosting: An Overview

Posted by Kimberly Fahrnkopf on Tue, Dec 16, 2014 @ 08:20 AM

TreeNet Stochastic Gradient Boosting is regarded as one of the most powerful methodologies in predictive modeling and machine learning. Its power in both classification and regression problems truly makes it stand out, and it has even been called "the most accurate predictive model in a premier data mining competition." Salford Systems' CEO, Dan Steinberg, summarizes TreeNet in three words: speed, power and accuracy.

How Data Science Can Help Us Discover Our Planet’s History

Posted by Kimberly Fahrnkopf on Wed, Oct 15, 2014 @ 06:55 AM

To see how data science can help in discovering our earth’s history, it is important to first know about the Gaia Hypothesis.

Topics: data mining, data science, predictive modeling, machine learning

Data Science in Biology: A Few Problems & Solutions [guest post]

Posted by Kimberly Fahrnkopf on Thu, Sep 11, 2014 @ 10:10 AM

Guest post by Grant Humphries, Postdoctoral Researcher, University of California, Davis

Topics: TreeNet, data mining, big data, data science, predictive modeling, data analysis

Predicting Shifts in El Niño Using Birds & Data Mining

Posted by Kimberly Fahrnkopf on Wed, Sep 3, 2014 @ 10:38 AM

Dr. Grant Humphries, from the Zoology department at the University of Otago, New Zealand, has spent the last three years studying how a bird species called Sooty Shearwaters can help predict upcoming El Niño occurrences. After much time and research, he has figured out a way to do so using data mining.

Topics: TreeNet, data mining, Variable Importance, big data, data science, prediction, predictive modeling, predictive model

How to Run a Model Using GPS

Posted by David Tolliver on Wed, Aug 27, 2014 @ 05:30 AM

We recently had a question about running a model using GPS, and wanted to share the answer in case anyone else has the same issue.

Topics: SPM, TreeNet, data mining, GPS, predictive model, data analysis, Tutorial, model scoring

Choosing Your Own Preferred MARS Model

Posted by Dan Steinberg on Wed, Aug 20, 2014 @ 09:46 AM

When MARS develops a model, it actually develops many and presents you with the one that it judges best based on a self-testing procedure. But the so-called MARS optimal model may not be satisfactory from your perspective. It might be too small (include too few variables), too large (include too many variables), too complex (include too many splines, basis functions, or breaks in variables), or otherwise not to your liking based on your domain knowledge. So what can you do to override the MARS process?

Topics: data mining, Variable Importance, MARS, data science, predictive modeling, predictive model, data analysis, Dan Steinberg, statistics, machine learning
