Dr. Grant Humphries, from the Zoology department at the University of Otago, New Zealand, has spent the last three years studying how a bird species called Sooty Shearwaters can help predict upcoming El Niño occurrences. After much time and research, he has figured out a way to do so using data mining.
Predictive accuracy is repeatedly cited by data scientists as one of the most important demands on modern data mining algorithms and software. It stands right alongside model-building speed, missing value handling, and memory efficiency. So, if it is so important, how do the experts TEST the accuracy of their models?
January is commonly a time to reflect on the past and make predictions about the future. To each his own, I suppose, but I’m confident that many of us will agree on a few common themes for 2014.
At the 2012 Salford Analytics and Data Mining Conference, Maria Lupetini from Qualcomm gave an easy-to-understand overview of how the company makes predictions using Salford Systems' MARS (Multivariate Adaptive Regression Splines). A portion of the recorded presentation is below. Enjoy!
Once you have built an SPM model (CART, MARS, TreeNet, RandomForests) and
have saved the grove (.GRV) file, you are in a position to make predictions
for any other data set containing relevant predictors. Thus, if you trained
your model on file A using variables X1, X2, ..., X50, for example, you can now
make predictions for file B, provided that file B contains at least some of the same
variables (and preferably all of the variables actually used in the model).
This process of prediction generation is called SCORING in our software, and
most models are built specifically so that they can be put into production to
generate predictions. The process can also be used for SIMULATION. In this case,
you prepare a data set that also contains the columns X1, X2, ..., X50, but
the values appearing need not be real data. Instead, the file could contain
hypothesized or imagined values, or forecasted values, as when you
want to make predictions for certain possible future scenarios.
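
The difference between scoring and simulation can be sketched outside SPM in a few lines of Python. This is only an analogy, not SPM's command language or grove format: the JSON "model", the two-predictor tree, and the helper names (load_model, missing_predictors, score) are all hypothetical. The sketch also assumes, more strictly than the text above, that every predictor the model uses must be present in the new data.

```python
import json

# Hypothetical stand-in for a saved model: a tiny regression tree kept as JSON.
# (A real SPM grove file is a different, proprietary artifact.)
MODEL_JSON = json.dumps({
    "predictors": ["X1", "X2"],
    "tree": {
        "split": "X1", "threshold": 5.0,
        "left": {"value": 10.0},
        "right": {
            "split": "X2", "threshold": 2.0,
            "left": {"value": 20.0},
            "right": {"value": 30.0},
        },
    },
})


def load_model(text):
    """Restore the saved model from its serialized form."""
    return json.loads(text)


def missing_predictors(model, columns):
    """List the predictors the model uses that the new data set lacks."""
    return [p for p in model["predictors"] if p not in columns]


def predict_row(node, row):
    """Walk the tree until a terminal node is reached."""
    while "value" not in node:
        go_left = row[node["split"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


def score(model, rows):
    """Generate one prediction per row; extra columns are ignored."""
    for row in rows:
        missing = missing_predictors(model, row.keys())
        if missing:
            # Simplifying assumption: refuse to score incomplete rows.
            raise ValueError(f"missing predictors: {missing}")
    return [predict_row(model["tree"], row) for row in rows]


model = load_model(MODEL_JSON)

# SCORING: "file B" holds real records with the same predictor columns.
file_b = [
    {"ID": 1, "X1": 3.0, "X2": 1.0},
    {"ID": 2, "X1": 7.0, "X2": 4.0},
]
scored = score(model, file_b)

# SIMULATION: the rows are imagined scenarios, not observed data.
scenarios = [{"X1": 7.0, "X2": x2} for x2 in (1.0, 2.0, 3.0)]
simulated = score(model, scenarios)

print(scored)     # [10.0, 30.0]
print(simulated)  # [20.0, 20.0, 30.0]
```

The same score function serves both cases; only the origin of the input rows changes, which is the point of the distinction drawn above.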