We recently came across a neat interactive visual introduction to machine learning. It's an excellent explanation of how decision trees work, using data about houses to distinguish homes in New York from homes in San Francisco, for technical and non-technical audiences alike.
Overfitting is an issue for most machine learning tools. The learners are very flexible and can thus adapt to the noise in the data as well as to the signal. A classic technique to avoid overfitting is to split the data into training and validation (or test) sets, and then to monitor the learning process: comparing the goodness of fit, or performance, on the training and validation data as a function of the amount of training.
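As a minimal sketch of this monitoring idea (assuming scikit-learn and a synthetic dataset, not the housing data from the original visualization): as we give a decision tree more capacity via `max_depth`, training accuracy keeps climbing while validation accuracy eventually stalls or drops, and the widening gap between the two is the signature of overfitting.

```python
# Sketch: monitor training vs. validation performance as model capacity grows.
# Synthetic data stands in for a real dataset; the pattern is what matters.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

for depth in (1, 3, 5, 10, 20):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # A growing gap between these two scores signals overfitting.
    print(f"depth={depth:2d}  train={tree.score(X_train, y_train):.3f}  "
          f"validation={tree.score(X_val, y_val):.3f}")
```

In practice one would pick the capacity (or stop training) where validation performance peaks, rather than where training performance is best.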
Question: When can you usefully build models using all of your data?