Guest post by Grant Humphries, postdoctoral researcher, University of California, Davis
Data science is the newest trend in biological studies, driven primarily by the large quantities of data now available and by the need for new analytical techniques to understand the complex relationships between parameters in those data. As a result, it has become common across the ever-expanding field of biology.
For example, the human genome contains approximately 3 billion DNA base pairs, which can now be mapped and stored as computer data. Traditional computational and statistical techniques were not designed with such large amounts of data in mind, so new methods are required.
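To give a rough sense of scale, the sketch below estimates how much storage a single 3-billion-base sequence needs. It assumes the simplest possible encodings (2 bits per base, since there are four letters A, C, G and T, versus one byte per base as plain text) and ignores ambiguity codes, quality scores and metadata, so treat it as back-of-the-envelope arithmetic rather than a description of any real file format. The function name is hypothetical.

```python
BASE_PAIRS = 3_000_000_000  # approximate figure quoted above
BITS_PER_BASE = 2           # assumption: 4 letters (A, C, G, T) fit in 2 bits


def packed_size_bytes(n_bases, bits_per_base=BITS_PER_BASE):
    """Bytes needed to store n_bases at a fixed number of bits per base."""
    # Round up to a whole number of bytes.
    return (n_bases * bits_per_base + 7) // 8


packed = packed_size_bytes(BASE_PAIRS)                    # 2-bit encoding
as_text = packed_size_bytes(BASE_PAIRS, bits_per_base=8)  # one character per base

print(f"2-bit packed: {packed / 1e6:.0f} MB")
print(f"plain text:   {as_text / 1e9:.0f} GB")
```

A single sequence is therefore manageable on its own; the volumes grow quickly once many individuals, repeated reads and associated measurements are stored together.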
Three of the major problems that exist with “big data” in the biological sciences are:
1) Lack of qualified individuals to process large quantities of data.
2) Lack of funding to purchase computing space to house data.
3) The amount of time involved in interpreting large quantities of data.
However, advances in data mining and machine learning technologies make it possible to get around some of these issues. Taking the three problems in turn:
1) Data mining and predictive analytics companies have teams of experts who deal with big data problems. Individuals from such companies can offer advice and input on how to analyze and process these data efficiently.
2) The funding shortfall is driven primarily by the economy and is mostly limited to the animal-based biological sciences. Medical research tends to be better funded because of the need for advances in human health.
3) Time constraints can now be eased by faster algorithms (e.g., TreeNet or Random Forests). These analyses produce outputs that are easy to interpret and have excellent predictive capabilities, and prediction is one of the major goals of scientific research.
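To make the idea of a tree ensemble a little more concrete, here is a minimal sketch of bagged decision stumps in plain Python. It is a deliberately simplified stand-in, not TreeNet and not the full Random Forests algorithm (which grows deeper trees and also samples a random subset of features at each split). The toy data, function names and parameters are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Return (feature, threshold, label_above) with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for above in (0, 1):
                # The stump predicts `above` when the feature exceeds t,
                # and the opposite label otherwise.
                errors = sum(
                    (above if r[f] > t else 1 - above) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    return best[1:]


def fit_bagged_stumps(rows, labels, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = sum(above if row[f] > t else 1 - above for f, t, above in forest)
    return 1 if 2 * votes > len(forest) else 0


# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
rows = [(i / 20, float(i % 2)) for i in range(20)]
labels = [1 if x0 >= 0.5 else 0 for x0, _ in rows]

forest = fit_bagged_stumps(rows, labels)
usage = Counter(f for f, _, _ in forest)  # how often each feature was chosen
print("splits per feature:", dict(usage))
print("prediction for (0.9, 0.0):", predict(forest, (0.9, 0.0)))
```

Counting how often each feature is chosen for a split is a crude version of the variable-importance summaries that make ensemble outputs easy to interpret: the informative feature should dominate, while the noise feature is rarely or never used.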