Simply Salford Blog

Kimberly Fahrnkopf

Recent Posts

TreeNet Gradient Boosting and CART Decision Trees: A Winning Combination

Posted by Kimberly Fahrnkopf on Thu, Apr 9, 2015 @ 10:14 AM

6 Reasons to Combine CART and TreeNet:

#1 Build predictive models quickly: One advantage of CART is that it can build models relatively fast.

#2 Incorporate all types of variables: Your model can include numeric, binary, and categorical variables, as well as variables with missing values.

#3 Interpretable model representation: CART’s easy-to-understand decision tree graphics will make your job easy when explaining the model to your boss! All you have to do is print it out!

#4 Maintain model stability: One of TreeNet’s top advantages is that it will retain a stable model due to averaging of the individual decision tree responses – something difficult to do with CART.

#5 Produce a high interaction order model: TreeNet allows precise control over interactions among multiple variables.

#6 Include ALL variables: In CART, relatively few predictors make it into the model, but in TreeNet each tree works with the entire data set, which gives variables many opportunities to enter.

When you combine TreeNet and CART, you keep the simplicity of CART while overcoming its challenges with TreeNet gradient boosting.
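To make the contrast between one tree and many concrete, here is a minimal sketch in plain Python. It is not Salford's CART or TreeNet code: it is a generic squared-error gradient boosting loop over depth-one regression trees (stumps), with no stochastic subsampling. Every name in it (`fit_stump`, `boost`, `learning_rate`, and so on) and the toy data are assumptions made up for illustration.

```python
def fit_stump(xs, targets):
    """Fit a depth-one regression tree: one threshold, one mean per side."""
    best = None
    cut_points = sorted(set(xs))
    for lo, hi in zip(cut_points, cut_points[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=200, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution before adding it to the ensemble.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data (hypothetical): a smooth curve that one split cannot capture.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

single = fit_stump(xs, ys)
single_mse = mean_squared_error(ys, [predict_stump(single, x) for x in xs])

model = boost(xs, ys, n_trees=200, learning_rate=0.1)
boosted_mse = mean_squared_error(ys, [predict_boosted(model, x) for x in xs])

print("single stump training MSE:", single_mse)
print("boosted stumps training MSE:", boosted_mse)
```

The single stump is easy to read and explain, while the boosted ensemble fits the training data more closely at the cost of that one-picture simplicity. Both errors here are measured on the training data only, so the sketch says nothing about performance on new data.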

Read More

Two Frequent Challenges in Data Preparation

Posted by Kimberly Fahrnkopf on Wed, Mar 25, 2015 @ 03:10 PM

1.)  HIGH LEVEL CATEGORICAL VARIABLES

A categorical variable with a very high number of levels can lead to many problems. For example, a variable with 797 distinct levels raises a few important issues that would probably not arise otherwise.
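One common way to tame such a variable is to collapse infrequent levels into a single catch-all level. The sketch below illustrates that generic technique; it is not necessarily the approach the full post describes. The function name `collapse_rare_levels`, the `min_count` threshold, the `"OTHER"` label, and the sample data are all hypothetical.

```python
from collections import Counter


def collapse_rare_levels(values, min_count=5, other_label="OTHER"):
    """Replace every level seen fewer than min_count times with other_label."""
    counts = Counter(values)
    keep = {level for level, n in counts.items() if n >= min_count}
    return [v if v in keep else other_label for v in values]


# Hypothetical column: three common levels and four levels seen only once.
raw = ["CA"] * 50 + ["NY"] * 30 + ["TX"] * 6 + ["WY", "VT", "AK", "ND"]
collapsed = collapse_rare_levels(raw, min_count=5)

print(len(set(raw)), "levels before,", len(set(collapsed)), "levels after")
```

The threshold is a judgment call: set it too high and informative levels disappear into the catch-all, set it too low and the variable stays unwieldy.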

Read More

How Data Science can Expose Biological Invaders

Posted by Kimberly Fahrnkopf on Wed, Mar 11, 2015 @ 08:00 AM

Many species cause serious environmental harm. These are non-native plant or animal species that tend to spread and damage their new environment. These "biological invaders" have caused, and continue to cause, billions of dollars' worth of damage.

By identifying these potentially invasive species, and the traits responsible for their invasiveness, we can ultimately help our environment. This presentation will show you how data science has already helped to identify these traits and reduce the damage.

Read More

5 Data Mining Misconceptions

Posted by Kimberly Fahrnkopf on Wed, Feb 18, 2015 @ 01:06 PM

1. Quest for the Holy Grail

Read More

TreeNet Gradient Boosting: An Overview

Posted by Kimberly Fahrnkopf on Tue, Dec 16, 2014 @ 08:20 AM

TreeNet Stochastic Gradient Boosting is regarded as one of the most powerful methodologies in predictive modeling and machine learning. Its power in both classification and regression problems truly makes it stand out, and it has even been called "the most accurate predictive model in a premier data mining competition." Salford Systems' CEO, Dan Steinberg, summarizes TreeNet in three words: speed, power, and accuracy.

 

Read More

How Data Science Can help us Discover our Planet’s History

Posted by Kimberly Fahrnkopf on Wed, Oct 15, 2014 @ 06:55 AM

To see how data science can help in discovering our Earth’s history, it is important first to know about the Gaia hypothesis.

Read More

Topics: data mining, data science, predictive modeling, machine learning

Data Science in Biology: A Few Problems & Solutions [guest post]

Posted by Kimberly Fahrnkopf on Thu, Sep 11, 2014 @ 10:10 AM

Guest post by Grant Humphries, Postdoctoral Researcher, University of California, Davis

Read More

Topics: TreeNet, data mining, big data, data science, predictive modeling, data analysis

Predicting Shifts in El Niño Using Birds & Data Mining

Posted by Kimberly Fahrnkopf on Wed, Sep 3, 2014 @ 10:38 AM

Dr. Grant Humphries, from the Zoology department at the University of Otago, New Zealand, has spent the last three years studying how a bird species called Sooty Shearwaters can help predict upcoming El Niño occurrences. After much time and research, he has figured out a way to do so using data mining.

Read More

Topics: TreeNet, data mining, Variable Importance, big data, data science, prediction, predictive modeling, predictive model