It's not so much about teaching you to use a new data mining tool as about helping you become a better data modeler. Here at Salford Systems we aim to educate data scientists of all levels, help them overcome their many challenges, and show them how to perfect their modeling skills, whether or not they are familiar with the SPM software suite.
Let's walk through some basic steps in getting started!
Prepare your data
Remember that when building a data model, the first and very important step is preparing your data. Make sure that you have clearly labeled your variables and that you have saved all relevant information in a single location. Have you ever tweaked a dataset just right for your model, then forgotten where you saved it? Yeah, so have we.
Set up your model
Okay, let's be realistic! There are so many ways to manipulate the parameters when setting up your model that it is easy to over-complicate things when you are just starting out. Here are the few settings we recommend focusing on to get started; leave the rest at their defaults:
- Test partition
- Loss criterion
- Number of decision trees to build
- Number of nodes in a decision tree
Evaluate your results
There are several metrics with which you can evaluate your model. Don't get too carried away with trying to understand every little detail. You can always come back to the nitty-gritty later on. When you are just getting started, we recommend evaluating your model with:
- Mean Squared Error (MSE)
- Variable Importance
- Variable Dependence
- Learn vs. test performance (it's a good idea to compare the two)
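The checks above can be sketched in a few lines. Again, this is an illustration using scikit-learn rather than the SPM software, with synthetic data and default model settings. The helper partial_dependence_curve is a hypothetical name of our own: it approximates variable dependence by fixing one variable at each grid value and averaging the model's predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data and a default model, purely for illustration.
X, y = make_regression(
    n_samples=500, n_features=5, n_informative=2, noise=5.0, random_state=0
)
X_learn, X_test, y_learn, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = GradientBoostingRegressor(random_state=0).fit(X_learn, y_learn)

# Mean Squared Error on both partitions, so learn and test can be compared.
mse_learn = mean_squared_error(y_learn, model.predict(X_learn))
mse_test = mean_squared_error(y_test, model.predict(X_test))
print(f"MSE learn: {mse_learn:.2f}   MSE test: {mse_test:.2f}")

# Variable importance: one score per variable, listed from most to least important.
importance = model.feature_importances_
for idx in np.argsort(importance)[::-1]:
    print(f"variable {idx}: importance {importance[idx]:.3f}")


def partial_dependence_curve(model, X, feature, n_points=10):
    """Average prediction as one variable is swept across its observed range."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), n_points)
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # hold this variable fixed for every row
        curve.append(model.predict(X_mod).mean())
    return grid, np.array(curve)


# Variable dependence for the top-ranked variable.
top = int(np.argmax(importance))
grid, curve = partial_dependence_curve(model, X_learn, top)
```

A test MSE that is much larger than the learn MSE is a common sign that the model is fitting noise in the learn partition, which is why the two numbers are worth looking at side by side.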