This blog post is authored by Markus Weimer, Principal Scientist at Microsoft
Eight years ago, I made the 24-hour journey from my college town of Darmstadt, Germany, to Canberra, Australia to attend a Machine Learning Summer School (MLSS) there. Why, you might ask? At the time, I didn’t have a good answer myself, to be quite honest. Well, at least nothing beyond my love for the land down under and a suggestion from my PhD advisor that it would be a good idea to attend. In hindsight, he was very right. During the two weeks I was in Canberra, I made many new friends and learned things that changed the course of my PhD research and ultimately set me onto a path that led me to the US and indeed to my current position at Microsoft.
I was reminded of this trip earlier this year, when I received an email from Carnegie Mellon University (CMU) professors Alex Smola and Zico Kolter inviting me to teach a hands-on class on REEF, my current project at Microsoft, at this year’s Machine Learning Summer School at CMU in Pittsburgh. This seemed like a unique way for me to give back to the student community, so I jumped at the opportunity. REEF is a framework for writing distributed applications on top of Apache Hadoop 2 clusters. To give the students an opportunity to experience a real Big Data environment, Microsoft sponsored a 1,000-core Azure HDInsight cluster for the duration of the class. HDInsight is Microsoft’s fully managed Hadoop-on-Azure offering.
In the course of five lectures during the week, I walked students through the basics of the Big Data cloud environment, introduced REEF and Azure HDInsight, and discussed what we call “Resource Aware Machine Learning”. The main idea behind the latter is that system events, such as adding machines to or removing them from a distributed application, have implications for machine learning (ML). For instance, losing a machine to hardware failure in the middle of a computation means losing a partition of the data. That in turn means that estimates computed from the remaining data have higher variance. And the variance of estimators is a first-class object in ML. Hence, we just might find more efficient ways to deal with machine failure than requiring the underlying system to handle it.
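To make the variance argument concrete, here is a minimal sketch in plain Python. It is not REEF code and does not use any REEF or HDInsight API. The function name `simulate`, its parameters, the choice of the sample mean as the estimator, and the i.i.d. standard normal data are all assumptions made for illustration. The simulation repeatedly builds a dataset split into equal partitions, estimates the mean once from all partitions and once after dropping a random partition, and compares the variance of the two estimators across trials.

```python
import random
import statistics


def simulate(num_partitions=10, rows_per_partition=50, trials=2000, seed=0):
    """Compare the variance of a mean estimate with and without a lost partition.

    All names and defaults here are hypothetical; data is i.i.d. N(0, 1).
    Returns (variance with all partitions, variance with one partition lost).
    """
    rng = random.Random(seed)
    full_estimates = []
    degraded_estimates = []
    for _ in range(trials):
        # One "machine" per partition, each holding the same number of rows.
        partitions = [
            [rng.gauss(0.0, 1.0) for _ in range(rows_per_partition)]
            for _ in range(num_partitions)
        ]
        # Estimate from the complete dataset.
        all_rows = [x for part in partitions for x in part]
        full_estimates.append(statistics.fmean(all_rows))
        # Simulate a machine failure: one partition disappears.
        lost = rng.randrange(num_partitions)
        surviving = [
            x for i, part in enumerate(partitions) if i != lost for x in part
        ]
        degraded_estimates.append(statistics.fmean(surviving))
    return (
        statistics.variance(full_estimates),
        statistics.variance(degraded_estimates),
    )


var_full, var_degraded = simulate()
print(f"variance with all partitions: {var_full:.6f}")
print(f"variance after losing one:    {var_degraded:.6f}")
print(f"ratio:                        {var_degraded / var_full:.3f}")
```

Under these assumptions, the variance of the sample mean is inversely proportional to the number of rows, so losing one of k equal partitions should inflate it by a factor of roughly k / (k - 1), about 1.11 for ten partitions. Whether that increase is acceptable, or whether the lost partition has to be recomputed, is the kind of decision a resource-aware learner could make for itself.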
Summer School attendees and lecturers after the farewell dinner. Photo by Alex Smola
This decidedly hands-on-keyboard lecture was embedded in a full schedule of lectures from industry and academia alike, on topics as diverse as new theoretical foundations for the learning of factor models and practical lessons from operating internet-scale recommender systems.
I sure hope that the Machine Learning Summer School at CMU this year will have the same profound impact on at least a few students as the one from many years ago had on my own research and career. For those of you who wish to learn more about the CMU MLSS, I highly recommend the lecture recordings posted online by the organizers.
Markus Weimer
Follow me on twitter. Follow my blog.