Official News from Microsoft’s Information Platform
Machine Learning Blog
A year ago, researchers at Virginia Tech needed two weeks to analyze a single genome. Today, they can analyze 100 genomes each day. Why is this important? Scientists can learn more about our DNA and uncover more effective strategies for detecting, diagnosing, and treating diseases such as cancer. What’s helping to make this possible? An innovative solution developed by Virginia Polytechnic Institute and State University (Virginia Tech) that’s based on Windows Azure and the Windows Azure HDInsight Service.
There are currently an estimated 2,000 DNA sequencers generating around 15 petabytes of data every year. Additionally, data volumes are doubling every 8 months, significantly outpacing Moore’s law, under which compute capability doubles only every 24 months. Most institutions can’t afford to scale data centers fast enough to store and analyze all of the new information. To overcome this challenge, Virginia Tech developed a high-performance computing (HPC) solution with Windows Azure. It gives researchers around the world a highly scalable, on-demand IT infrastructure in the cloud that they can use to store and analyze Big Data, accelerate genome research, and increase collaboration.
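To get a feel for how quickly that gap opens up, here is a minimal Python sketch that compares the two doubling rates quoted above. It assumes idealized exponential growth with constant doubling periods, which is a simplification, and the function names are purely illustrative rather than part of Virginia Tech's solution.

```python
# Doubling periods quoted in the post; everything else here is an
# illustrative assumption (idealized, constant-rate exponential growth).
DATA_DOUBLING_MONTHS = 8
COMPUTE_DOUBLING_MONTHS = 24


def growth_factor(months: float, doubling_period_months: float) -> float:
    """Return how many times a quantity has multiplied after `months`,
    given that it doubles once every `doubling_period_months`."""
    return 2.0 ** (months / doubling_period_months)


def data_to_compute_gap(months: float) -> float:
    """Return the ratio of data growth to compute growth after `months`."""
    data = growth_factor(months, DATA_DOUBLING_MONTHS)
    compute = growth_factor(months, COMPUTE_DOUBLING_MONTHS)
    return data / compute


def print_gap_table(years=(2, 4, 6)) -> None:
    """Print data growth, compute growth, and their ratio for each horizon."""
    for y in years:
        months = 12 * y
        print(
            f"{y} years: data x{growth_factor(months, DATA_DOUBLING_MONTHS):.0f}, "
            f"compute x{growth_factor(months, COMPUTE_DOUBLING_MONTHS):.0f}, "
            f"gap x{data_to_compute_gap(months):.0f}"
        )


if __name__ == "__main__":
    print_gap_table()
```

Under these assumptions, two years of growth means eight times the data but only twice the compute, so a fixed-size data center falls behind by a factor of four every 24 months. That widening ratio is the reason an on-demand, elastic infrastructure is attractive here.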
To make it easy for researchers to use the solution, Virginia Tech developed two cloud applications. One streamlines the creation of Genome Analysis Toolkit (GATK) pipelines (for DNA sequencing) using Windows Azure HDInsight. The other program simplifies the use of Hadoop MapReduce pipelines to automate data transfers and analyze information that resides on local and cloud-based systems in a hybrid scenario.
The new solution is saving Virginia Tech—and other organizations—millions of dollars because scientists pay only for the resources that they use. This includes Windows Azure Blob storage for temporary or long-term data storage and HDInsight clusters for on-demand HPC nodes. Provisioning a new resource takes just seconds.
Global scientists can also collaborate with less effort because they can now easily share insights and data sets virtually anytime, anywhere—and with any device. As a result, in the future scientists or doctors may be able to use the solution to develop custom treatments for individual patients faster, by engaging in genome analysis directly at hospitals.
You can learn more about Virginia Tech’s solution by watching the video below or reading the detailed case study here.