Simply Salford Blog

9 Data Mining Challenges From a Data Scientist Like You

Posted by Salford Systems on Tue, Jan 19, 2016 @ 07:00 AM

Data mining has a plethora of challenging aspects. Some of these challenges are common among nearly all data scientists, analysts, and predictive modelers, while others are more industry-specific. Nevertheless, we all run into a snag here and there (hopefully more like there, not here), and it can be a trying task to overcome our day-to-day or project-to-project challenges.

Topics: command line, sample size, big data, GUI, missing values, data analysis, data mining in education

Data Mining & Sampling Issues: Do we need a 3-way partition of data (learn, validate, test)?

Posted by Dan Steinberg on Wed, May 14, 2014 @ 10:51 AM

The short answer to this question is “no”: we do not think that a 3-way partition is mandatory for SPM core models such as CART and TreeNet. Here we discuss the issue.

Topics: train and test data, partition, sample size

9 Data Mining Challenges From Data Scientists Like You

Posted by Heather Hinman on Tue, Jul 23, 2013 @ 06:08 AM

Data mining has a plethora of challenging aspects. Some of these challenges are common among nearly all data scientists, analysts, and predictive modelers, while others are more industry-specific. Nevertheless, we all run into a snag here and there (hopefully more like there, not here), and it can be a trying task to overcome our day-to-day or project-to-project challenges.

Topics: command line, sample size, big data, GUI, missing values, data analysis, data mining in education

Accurate results with limited data in CART and TreeNet

Posted by Dan Steinberg on Fri, Dec 7, 2012 @ 07:11 AM

How large a sample do I need? Or, can I achieve first-class results with just a few hundred training samples?

Topics: TreeNet, CART, sample size