Simply Salford Blog

9 Data Mining Challenges From a Data Scientist Like You

Posted by Salford Systems on Tue, Jan 19, 2016 @ 07:00 AM

Data mining has a plethora of challenging aspects. Some of these challenges are common to nearly all data scientists, analysts, and predictive modelers, while others are more industry-specific. Nevertheless, we all run into a snag here and there (hopefully more like there, not here), and it can be a trying task to overcome our day-to-day or project-to-project challenges.

Topics: command line, sample size, big data, GUI, missing values, data analysis, data mining in education

Podcast Recap: Dan Steinberg's Early Days

Posted by Nicole Finzi on Tue, Jan 12, 2016 @ 08:00 AM

Did you miss our podcast with Dan Steinberg yesterday? Subscribe to our podcast, Afternoon Analytics, to get instant notifications of new episodes! It's never too late to go back and listen!

Topics: data mining, data analysis

Data Science in Biology: A Few Problems & Solutions [guest post]

Posted by Kimberly Fahrnkopf on Thu, Sep 11, 2014 @ 10:10 AM

Guest post by Grant Humphries, Postdoctoral researcher, University of California, Davis

Topics: TreeNet, data mining, big data, data science, predictive modeling, data analysis

How to Run a Model Using GPS

Posted by David Tolliver on Wed, Aug 27, 2014 @ 05:30 AM

We recently received a question about running a model using GPS and wanted to share the answer in case anyone else runs into the same issue.

Topics: SPM, TreeNet, data mining, GPS, predictive model, data analysis, Tutorial, model scoring

Choosing Your Own Preferred MARS Model

Posted by Dan Steinberg on Wed, Aug 20, 2014 @ 09:46 AM

When MARS develops a model, it actually develops many and presents you with the one it judges best based on a self-testing procedure. But the so-called MARS optimal model may not be satisfactory from your perspective. It might be too small (include too few variables), too large (include too many variables), too complex (include too many splines, basis functions, or breaks in variables), or otherwise not to your liking based on your domain knowledge. So what can you do to override the MARS process?
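
As a rough illustration of the idea of overriding the automatic choice, here is a hypothetical Python sketch. It is not the MARS or SPM interface; it simply assumes you already have the sequence of candidate models MARS built, each summarized by its number of basis functions and a test error, and shows one way to pick a model by your own rule (a size cap, or "the simplest model within X% of the best error") instead of taking the lowest-error model. The names `CandidateModel` and `pick_model` and the error values are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CandidateModel:
    # Hypothetical summary of one model in the sequence MARS builds.
    n_basis_functions: int
    test_error: float  # assumed to come from the self-testing procedure


def pick_model(candidates: Sequence[CandidateModel],
               max_basis: Optional[int] = None,
               tolerance: float = 0.0) -> CandidateModel:
    """Pick the smallest model whose test error is within a relative
    `tolerance` of the best admissible error, optionally capping size."""
    admissible = [c for c in candidates
                  if max_basis is None or c.n_basis_functions <= max_basis]
    if not admissible:
        raise ValueError("no candidate satisfies the size cap")
    best = min(c.test_error for c in admissible)
    good_enough = [c for c in admissible
                   if c.test_error <= best * (1 + tolerance)]
    # Among near-best models, prefer the simplest one.
    return min(good_enough, key=lambda c: c.n_basis_functions)


# Made-up error values for illustration only.
sequence = [
    CandidateModel(2, 0.40),
    CandidateModel(5, 0.31),
    CandidateModel(9, 0.30),
    CandidateModel(15, 0.32),
]
print(pick_model(sequence).n_basis_functions)                  # 9: lowest error
print(pick_model(sequence, tolerance=0.05).n_basis_functions)  # 5: simpler, within 5%
print(pick_model(sequence, max_basis=4).n_basis_functions)     # 2: size cap applied
```

With `tolerance=0` and no cap, the sketch reproduces the "pick the lowest test error" behavior; the other two calls show how domain preferences can trade a little test error for a smaller model.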

Topics: data mining, Variable Importance, MARS, data science, predictive modeling, predictive model, data analysis, Dan Steinberg, statistics, machine learning

Confessions of a Data Scientist

Posted by Heather Hinman on Fri, Jul 26, 2013 @ 10:19 AM

All data scientists have an inner monologue of quirky comments and frustrating dilemmas. Sometimes these comments get spoken out loud, usually during an informal meeting around the coffee pot or as an under-the-breath mumble during a pacing rant of frustration. I have even witnessed the punching of a balloon and the kicking of a bean-bag chair (accompanied by some inaudible comments) over a data mining challenge that was driving the person bonkers.

Topics: big data, Data Prep, missing values, data analysis

9 Data Mining Challenges From Data Scientists Like You

Posted by Heather Hinman on Tue, Jul 23, 2013 @ 06:08 AM

Data mining has a plethora of challenging aspects. Some of these challenges are common to nearly all data scientists, analysts, and predictive modelers, while others are more industry-specific. Nevertheless, we all run into a snag here and there (hopefully more like there, not here), and it can be a trying task to overcome our day-to-day or project-to-project challenges.

Topics: command line, sample size, big data, GUI, missing values, data analysis, data mining in education

Leo Breiman's Philosophy of Data Analysis

Posted by Dan Steinberg on Thu, Jul 19, 2012 @ 04:17 AM

Leo Breiman laid out his philosophy of data analysis and method invention in his 2001 paper in Statistical Science, "Statistical Modeling: The Two Cultures," in which, to oversimplify, he says: if a methodology can generate high predictive accuracy on test data sets, we do not need to provide any further justification, defense, or explanation. Predictive accuracy is sufficient to justify whatever we do to reach a predictive model. In particular, we do not need a theory describing the process that generates the data we are studying, and we do not need a theory explaining the learning machine we are using. He certainly took the latter position regarding one of his greatest achievements: Random Forests (co-developed with Adele Cutler).
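
To make the "judge it by test-set accuracy" criterion concrete, here is a minimal Python sketch of holdout evaluation. It is only an illustration of the general idea, not Random Forests or any Salford tool: the toy data, the two candidate models, and the helper names (`accuracy`, `majority_model`, `fit_threshold`) are all hypothetical. Each model is fit on the training rows only and then compared purely by accuracy on rows it never saw, with no appeal to how the data were generated or how the learner works internally.

```python
import random


def accuracy(predict, rows):
    """Fraction of (x, y) rows for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Hypothetical toy data: y = 1 when x > 0.6, with 10% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(1000):
    x = rng.random()
    y = int(x > 0.6)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))
train, test = data[:700], data[700:]

# Candidate 1: always predict the most common training label.
majority_label = int(2 * sum(y for _, y in train) >= len(train))


def majority_model(x):
    return majority_label


# Candidate 2: a single cut point chosen to maximize training accuracy.
def fit_threshold(rows):
    cut = max((x for x, _ in rows),
              key=lambda t: accuracy(lambda x: int(x > t), rows))
    return lambda x: int(x > cut)


threshold_model = fit_threshold(train)

# Only held-out accuracy is used to compare the two candidates.
for name, model in [("majority", majority_model), ("threshold", threshold_model)]:
    print(name, round(accuracy(model, test), 3))
```

Swapping in any other learner (including an ensemble such as a random forest) leaves the comparison step unchanged, which is the point of the criterion: the score on the held-out rows is what decides.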

Topics: RandomForests, data analysis