We recently came across the article, "Random Forest---the go-to machine learning algorithm" from TechWorld Australia.
"When in doubt, use Random Forest — the go-to machine learning algorithm that is considered to be one of the most effective and versatile in solving almost any prediction task." Rebecca Merrett, TechWorld Australia
While we can't quite vouch for Random Forests as the default machine learning technique to reach for whenever one is unsure how to proceed, we largely agree with the rest of Rebecca's analysis of this very powerful and very popular machine learning method.
Random Forests are a collection of decision trees (also known as Classification And Regression Trees, or CART). Where Random Forests gains its effectiveness is in, well, randomness. Each tree is trained on a randomized sample of the dataset, and the final prediction is made by "voting": the forest goes with the class predicted by the majority of its individual trees.
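The voting step above can be sketched in a few lines of Python. The per-tree predictions here are made up for illustration; a real forest would produce them from trained trees.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical predictions from five trees for one loan applicant:
votes = ["high_risk", "low_risk", "high_risk", "high_risk", "low_risk"]
print(majority_vote(votes))  # high_risk (3 of 5 trees agree)
```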
Randomness also makes the model more resistant to noise. The sampling used here is bootstrap sampling, i.e. random sampling with replacement: a record can appear more than once in a given sample, and every record has the same chance of being selected on each draw. Random Forests then adds a layer of randomness beyond simple bagging by considering only a random subset of the features when choosing each split within each tree.
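Sampling with replacement is simple to demonstrate. This is a minimal sketch, not anyone's production code; the fixed seed is only there to make the example reproducible.

```python
import random

def bootstrap_sample(data, seed=0):
    """Draw len(data) records with replacement (the bootstrap)."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(len(data))]

records = list(range(10))
sample = bootstrap_sample(records)
# Every record had an equal chance on each draw, so duplicates are
# expected, and some records are left out entirely ("out-of-bag").
print(sorted(sample))
```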
Random Forests can be used for classification tasks, e.g. credit risk, patient disease risk, or mechanical failure. They can also be used for regression tasks, e.g. predicting a numeric value such as temperature, social media shares, or a performance score. In the case of regression, the predictions of the individual trees are averaged to make the final prediction. The algorithm is also being used in novel tasks such as text and image classification, and even speech pattern prediction. Read the full rundown on Random Forests here.
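To make the classification-versus-regression distinction concrete, here is a short sketch using scikit-learn's Random Forest estimators on synthetic data (neither the library nor the data comes from the article; it simply assumes scikit-learn is installed):

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: the forest takes a majority vote across its trees.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:3]))   # discrete class labels

# Regression: the forest averages the trees' numeric predictions.
Xr, yr = make_regression(n_samples=200, n_features=10, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr, yr)
print(reg.predict(Xr[:3]))  # continuous values
```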
We would like to add some details about Random Forests and its benefits. Its strengths also include detecting outliers and anomalies, displaying clusters, and identifying important predictors. Random Forests has been used successfully on both small and very large datasets, and it is effective at identifying important predictors even in the presence of hundreds to thousands of features.
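Identifying important predictors can be sketched as follows. This example uses scikit-learn's impurity-based `feature_importances_` on synthetic data as a stand-in; Breiman and Cutler's original method measures importance via permutation, so treat this only as an illustration of the idea.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Only 3 of 50 features are informative; the forest should rank them highly.
X, y = make_classification(n_samples=300, n_features=50, n_informative=3,
                           n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Indices of the five most important features, by impurity-based importance.
top5 = sorted(range(50), key=lambda i: forest.feature_importances_[i],
              reverse=True)[:5]
print(top5)
```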
There are many implementations of Random Forests out there. We encourage anyone looking into using this machine learning technique to try our implementation out for free. Salford Systems' implementation is the only software based on the work of the original creators, Leo Breiman and Adele Cutler. Random Forests® is a registered trademark of Leo Breiman, Adele Cutler, and exclusively licensed to Salford Systems. Download an evaluation of Random Forests here.
Want to learn more about the machine learning algorithm known as Random Forests? We have an ebook dedicated to the algorithm.