Simply Salford Blog

9 Data Mining Challenges From a Data Scientist Like You

Posted by Salford Systems on Tue, Jan 19, 2016 @ 07:00 AM

Data mining has a plethora of challenging aspects. Some of these challenges are common among nearly all data scientists, analysts, and predictive modelers while others are more industry-specific. Nevertheless, we all run into a snag here and there (hopefully more like there, not here) and it can be a trying task to overcome our day-to-day or project-to-project challenges.

Topics: command line, sample size, big data, GUI, missing values, data analysis, data mining in education

Confessions of a Data Scientist

Posted by Heather Hinman on Fri, Jul 26, 2013 @ 10:19 AM

All data scientists have their inner monologue of quirky comments and frustrating dilemmas. Sometimes these comments get spoken out loud -- usually in an informal meeting around the coffee pot, or as an under-the-breath mumble during a pacing rant of frustration. I have even witnessed the punching of a balloon and the kicking of a bean-bag chair (accompanied by some inaudible comments) over a data mining challenge that was driving the person bonkers.

Topics: big data, Data Prep, missing values, data analysis

9 Data Mining Challenges From Data Scientists Like You

Posted by Heather Hinman on Tue, Jul 23, 2013 @ 06:08 AM

Data mining has a plethora of challenging aspects. Some of these challenges are common among nearly all data scientists, analysts, and predictive modelers while others are more industry-specific. Nevertheless, we all run into a snag here and there (hopefully more like there, not here) and it can be a trying task to overcome our day-to-day or project-to-project challenges.

Topics: command line, sample size, big data, GUI, missing values, data analysis, data mining in education

Handling Missing Values in MARS

Posted by Dan Steinberg on Wed, Dec 19, 2012 @ 02:37 PM

Question: How does MARS (Multivariate Adaptive Regression Splines) deal with missing values?

Topics: MARS, missing values

A Reminder About Missing Values In Data Mining

Posted by Dan Steinberg on Wed, May 2, 2012 @ 10:03 AM

Our tech support department receives a steady stream of interesting questions about how to use our products, covering specific features or how to accomplish a given task. We also receive questions about data mining (and predictive analytics generally), modeling strategy, and a variety of other topics. One type of query that comes up periodically is what to do with missing values. We have spoken before about missing values in a variety of contexts, but usually at a fairly technical and advanced level. Today’s post is quite basic in nature and responds to a user’s question about what to do with special values that are intended to represent missing values. Data input practice dating back to at least the 1970s has used ‘missing value codes’ for unknown data fields; favorite values have included strings of 9s such as 9999 or -9999. There are a number of variations on this theme. For example, survey research firms have wanted to distinguish between different reasons for a missing value, using 9999 for values missing for no known reason, 9998 for ‘unknown,’ and 9997 for ‘refused.’ Data input clerks have been known to fill in missing birthdays with values such as January 1, 1960.
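
To make the issue concrete, here is a minimal Python sketch of recoding such special values before any modeling. It is only an illustration, not a description of how any Salford Systems product handles these codes: the code book `MISSING_CODES`, the helper `recode`, and the sample `incomes` column are all hypothetical names invented for this example, and the codes are simply the ones mentioned above.

```python
from datetime import date

# Hypothetical code book: sentinel value -> reason the field is missing.
# The codes mirror the examples in the text; a real data set needs its own.
MISSING_CODES = {
    9999: "no known reason",
    -9999: "no known reason",
    9998: "unknown",
    9997: "refused",
}

# Assumed placeholder birthday. Note that a genuine January 1, 1960
# birthday would be flagged as missing too.
PLACEHOLDER_BIRTHDAY = date(1960, 1, 1)


def recode(value):
    """Return (clean_value, reason); clean_value is None when missing."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    if value == PLACEHOLDER_BIRTHDAY:
        return None, "placeholder date"
    return value, None


# Hypothetical income column containing three missing value codes.
incomes = [52000, 9999, 9997, 61000, -9999]
cleaned = [recode(v) for v in incomes]

# Treating the codes as real numbers distorts even a simple average.
naive_mean = sum(incomes) / len(incomes)
observed = [v for v, _ in cleaned if v is not None]
clean_mean = sum(observed) / len(observed)

print(f"mean with codes left in: {naive_mean:.1f}")
print(f"mean of observed values: {clean_mean:.1f}")
```

Keeping the reason alongside each recoded value preserves the distinction between, say, ‘refused’ and ‘unknown’, which can be informative in its own right.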

Topics: missing values