• Machine Learning at Microsoft

    This blog post is authored by Joseph Sirosh, Corporate Vice President of Machine Learning at Microsoft.

    Some months ago, my colleague and friend John Platt and I were bouncing around a few ideas to disseminate the deep advances and practical expertise that Microsoft has accumulated over the years in the field of machine learning (ML). We thought we would share our experiences so that our customers and the community may benefit from them as they embark on their own ML journeys. Hence this blog.

    Today we are focused on the introduction of Microsoft Azure Machine Learning – you can learn more about it via the announcement blog here. It is a fully-managed cloud service that will enable data scientists and developers to efficiently embed predictive analytics into their applications, helping organizations use massive amounts of data and bring all the benefits of the cloud to machine learning.

    John and I are privileged to be working with some of the foremost thought leaders and experts involved in ML today and we look forward to sharing more about the work we are doing. Stay tuned to this blog for more.

    Joseph
    Follow me on Twitter 

  • Machine Learning, meet Computer Vision

    This is part one of a two-part series, co-authored by Jamie Shotton, Antonio Criminisi and Sebastian Nowozin of Microsoft Research, Cambridge, UK. The second part was later posted here.

    Computer vision, the field of building computer algorithms to automatically understand the contents of images, grew out of AI and cognitive neuroscience around the 1960s. “Solving” vision was famously set as a summer project at MIT in 1966, but it quickly became apparent that it might take a little longer! The general image understanding task remains elusive 50 years later, but the field is thriving. Dramatic progress has been made, and vision algorithms have started to reach a broad audience, with particular commercial successes including interactive segmentation (available as the “Remove Background” feature in Microsoft Office), image search, face detection and alignment, and human motion capture for Kinect. Almost certainly the main reason for this recent surge of progress has been the rapid uptake of machine learning (ML) over the last 15 or 20 years.

    This first post in a two-part series will explore some of the challenges of computer vision and touch on the powerful ML technique of decision forests for pixel-wise classification.

    Image Classification

    Imagine trying to answer the following image classification question: “Is there a car present in this image?” To a computer, an image is just a grid of red, green and blue pixels where each color channel is typically represented with a number between 0 and 255. These numbers will change radically depending not only on whether the object is present or not, but also on nuisance factors such as camera viewpoint, lighting conditions, the background, and object pose. Furthermore, one has to deal with the changes in appearance within the category of cars. For example, the car could be a station wagon, a pickup, or a coupe, and each of these will result in a very different grid of pixels.
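    The "grid of numbers" view of an image can be made concrete with a tiny sketch. This toy example (the dimensions and colors are made up for illustration) uses NumPy to build a 4x4 RGB image and paint a small red square into it:

```python
import numpy as np

# A tiny illustrative "image": height x width x 3 channels (R, G, B),
# each channel stored as an integer in [0, 255].
image = np.zeros((4, 4, 3), dtype=np.uint8)

# Paint a 2x2 red square in the middle: high red channel, low green/blue.
image[1:3, 1:3] = [200, 30, 30]

print(image.shape)   # (4, 4, 3) -- rows, columns, color channels
print(image[1, 1])   # the RGB triple of one "red" pixel
print(image[0, 0])   # a background pixel, all zeros
```

A classifier never sees "a car" or "a red square" – it sees only these numbers, which is why viewpoint, lighting, and pose changes are so disruptive.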

    Supervised ML thankfully offers an alternative to naively attempting to hand-code for these myriad possibilities. By collecting a training dataset of images and hand-labelling each image appropriately, we can use our favorite ML algorithm to work out which patterns of pixels are relevant to our recognition task, and which are nuisance factors. We hope to learn to generalize to new, previously unseen test examples of the objects we care about, while learning invariance to the nuisance factors.  Considerable progress has been made, both in the development of new learning algorithms for vision, and in dataset collection and labeling.

    Decision forests for pixel-wise classification

    Images contain detail at many levels. As mentioned earlier, we can ask a question of the whole image such as whether a particular object category (e.g. a car) is present. But we could instead try to solve a somewhat harder problem that has become known as “semantic image segmentation”: delineating all the objects in the scene. Here’s an example segmentation on a street scene:

    In photographs you could imagine this being used to help selectively edit your photos, or even synthesize entirely new photographs; we’ll see a few more applications in just a minute.

    Solving semantic segmentation can be approached in many ways, but one powerful building block is pixel-wise classification: training a classifier to predict a distribution over object categories (e.g. car, road, tree, wall etc.) at every pixel. This task poses some computational problems for ML. In particular, images contain a large number of pixels (e.g. the Nokia Lumia 1020 smartphone captures 41-megapixel images). This means that we potentially have multiple-million-times more training and test examples than we had in the whole-image classification task.

    The scale of this problem led us to investigate one particularly efficient classification model, decision forests (also known as random forests or randomized decision forests). A decision forest is a collection of separately-trained decision trees:

    Each tree has a root node, multiple internal “split” nodes, and multiple terminal “leaf” nodes. Test-time classification starts at the root node, and computes some binary “split function” of the data, which could be as simple as “is this pixel redder than one of its neighbors?” Depending on that binary decision, it will branch either left or right, look up the next split function, and repeat. When a leaf node is finally reached, a stored prediction – typically a histogram over the category labels – is output. (Also see Chris Burges’ excellent recent post on boosted variants of decision trees for search ranking.)
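    The root-to-leaf walk described above can be sketched in a few lines. This is a minimal, illustrative tree (the split questions, categories, and histogram values are invented for the example, not taken from any real trained forest):

```python
# Minimal sketch of test-time traversal in one decision tree.
# Internal nodes hold a binary split function; leaves hold a histogram
# over category labels.

def make_split(test, left, right):
    return {"test": test, "left": left, "right": right}

def make_leaf(histogram):
    return {"histogram": histogram}

def classify(node, pixel):
    """Walk from the root to a leaf, branching on each split function."""
    while "test" in node:                 # still at an internal node?
        node = node["left"] if node["test"](pixel) else node["right"]
    return node["histogram"]              # stored prediction at the leaf

# Toy tree over (R, G, B) pixels. Root asks "is this pixel redder than
# it is green?"; the right subtree then asks "is it strongly blue?".
tree = make_split(
    lambda px: px[0] > px[1],                  # R > G?
    make_leaf({"car": 0.7, "road": 0.3}),      # redder pixels
    make_split(
        lambda px: px[2] > 128,                # strongly blue?
        make_leaf({"sky": 0.9, "tree": 0.1}),
        make_leaf({"road": 0.6, "tree": 0.4}),
    ),
)

print(classify(tree, (200, 30, 30)))   # reaches the "car/road" leaf
print(classify(tree, (10, 20, 200)))   # reaches the "sky/tree" leaf
```

A forest would run every tree on the pixel and average the returned histograms; note that each pixel answers only the questions along its own path.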

    The beauty of decision trees lies in their test-time efficiency: while there can be exponentially many possible paths from the root to the leaves, any individual test pixel passes down just one path. Furthermore, the split function's computation is conditional on what has come before: the classifier hopefully asks just the right question depending on what the answers to the previous questions have been. This is exactly the same trick as in the game of “twenty questions”: while you’re only allowed to ask a small number of questions, you can quickly home in on the right answer by adapting each question to the previous answers.

    Armed with this technique, we’ve had considerable success in tackling such diverse problems as semantic segmentation in photographs, segmentation of street scenes, segmentation of the human anatomy in 3D medical scans, camera relocalization, and segmenting the parts of the body in Kinect depth images. For Kinect, the test-time efficiency of decision forests was crucial: we had an incredibly tight computational budget, but the conditional computation, paired with the ability to parallelize across pixels on the Xbox GPU, meant we were able to fit within it [1].

    In the second part of this series, we’ll discuss the recent excitement around “deep learning” for image classification, and gaze into the crystal ball to see what might come next. In the meantime, if you wish to get started with ML in the cloud, do visit the Machine Learning Center. 

    Thanks for tuning in.

    Jamie, Antonio and Sebastian

    [1] Body part classification was only one stage in the full skeletal tracking pipeline put together by this fantastic team of engineers in Xbox.

  • Extensibility and R Support in the Azure ML Platform

    This blog post is authored by Debi Mishra, Partner Engineering Manager in the Information Management and Machine Learning team at Microsoft.

    The open source community practicing machine learning (ML) has grown significantly over the last several years, with R and Python tools and packages especially gaining adoption among ML practitioners. Many powerful ML libraries have been developed in these languages, creating a virtuous cycle that draws even more practitioners to them. The popularity of R has a lot to do with CRAN, while Python adoption has been significantly aided by the SciPy stack. In general, though, these languages and their associated tools and packages are a bit like islands – there is generally not much interoperability across them. The interoperability challenge is not just at the language or script level: each environment has its own specialized objects for a “dataset”, its own interpretation of a “columnar schema”, and other key data science constructs. To truly enable the notion of “ambient intelligence in the cloud”, ML platforms need to allow developers and data scientists to mix and match the languages and frameworks used to compose their solutions. Data science solutions frequently involve many stages of computation and data flow, including data ingestion, transformation, optimization and ML algorithms. Different languages, tools and packages may be optimal for different steps, as they may fit the needs of a particular stage better.

    The Azure ML service is an extensible, cloud-based, multi-tenant service for authoring and executing data science workflows and putting such workflows into production. A unique capability of the Azure ML Studio toolset is the ability to perform functional composition and execute arbitrary workflows with data and compute. Such workflows can be operationalized as REST end-points on Azure. This enables a developer or data scientist to author their data and compute workflows using a simple “drag, drop and connect” paradigm, test these workflows and then stand them up as production web services with a single click.

    A key part of the vision for the Azure ML service is the emphasis on the extensibility of the ML platform and its support for open source software such as R, Python and other similar environments. This way, the skills as well as the code and scripts that exist among current ML practitioners can be directly brought into and operationalized within the context of Azure ML in a friction-free manner. We built the foundations of the Azure ML platform with this tenet in mind.

    R is the first such environment that we support, specifically in the following manner:

    • Data scientists can bring their existing assets in R and integrate them seamlessly into their Azure ML workflows.

    • Using Azure ML Studio, R scripts can be operationalized as scalable, low latency web services on Azure in a matter of minutes!

    • Data scientists have access to over 400 of the most popular CRAN packages, pre-installed. Additionally, they have access to optimized linear algebra kernels that are part of the Intel Math Kernel Library.

    • Data scientists can visualize their data using R plotting libraries such as ggplot2.

    • The platform and runtime environment provide interoperability and extensibility through high-fidelity, bi-directional dataframe and schema bridges.

    • Developers can access common ML algorithms from R and compose them with other algorithms provided by the Azure ML platform.

    The pictures below show how one would use the “Execute R Module” in Azure ML to visualize a sample dataset, namely “Breast cancer data”.
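    As a rough stand-in for that R/ggplot2 walkthrough, the sketch below loads and summarizes the analogous public breast-cancer dataset that ships with scikit-learn. Note the assumption: this is not the Azure ML sample dataset itself, and the columns and sample counts may differ from what the “Execute R Module” screenshots show.

```python
# Summarize the scikit-learn breast-cancer dataset, a public analogue of
# the "Breast cancer data" sample used in the Azure ML Studio example.
from sklearn.datasets import load_breast_cancer
import numpy as np

data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)                                        # samples x features
print(dict(zip(data.target_names, np.bincount(y))))   # class balance
print(data.feature_names[:3])                         # a few column names
```

From here, a plotting library (ggplot2 in R, or matplotlib in Python) can turn any of these columns into the kind of exploratory visualization the Studio example produces.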

    It is gratifying to see how popular R has been with our first wave of users. Interestingly, the most common errors our users see happen to be syntax errors that they discover in their R scripts! Usage data shows R being used in about one quarter of all Azure ML modeling experiments. The R forecasting package is being used by some of Microsoft’s key customers as well as some of our own teams, internally.

    You too can get started with R on Azure ML today. Meanwhile, our engineering teams are working hard to extend Azure ML with similar support for Python – for more information on that, just stay tuned to this blog.

    Debi
    Follow me on Twitter
    Email me at debim@microsoft.com

  • Web Services and Marketplaces Create a New Data Science Economy

    This blog post is authored by Joseph Sirosh, Corporate Vice President of Machine Learning at Microsoft.

    Yesterday, at Strata + Hadoop World, we announced the expansion of our data services with support of real-time analytics for Apache Hadoop in Azure HDInsight and new machine learning (ML) capabilities in the Azure Marketplace. Today, I would like to expand on the new ML capabilities that we announced and share how this is an important step in our journey to jump-start the new data science economy. I’ll also be speaking more about this in my keynote presentation tomorrow at Strata.

    Data scientists and their management are often frustrated by just how little of their work makes it into production deployments. Consider this hypothetical, although not uncommon, scenario. A data scientist and his team are asked to create a new sales prediction model that can be run whenever needed. The data scientists perfect the sales model using the popular statistical modeling language R. The new model is presented to management, who want to get it up and running right away as a web app and as a mobile client. Unfortunately, engineering is unable to deploy the model as they don’t have R, and the only option is to convert it all to Java – something that will take months to get up and running. So the data scientists end up preparing a batch job to run R code and mail reports on a daily basis, leaving everyone unsatisfied.

    Well, now there’s a better way, thanks to Azure Machine Learning.

    We built Azure ML to empower data science with all the benefits of the cloud. Data scientists can bring R code and use Microsoft's world class ML algorithms in our web-based ML Studio. No software installs required for analysis or production – our browser UI works on any machine and operating system. Teams can collaborate in the cloud, share projects, experiment with world-class algorithms and include data from databases or blob storage. They can use enormous storage and compute resources in the cloud to develop the best models from their data, unrestrained by server or storage capacity.

    Perhaps best of all, with just one-click, users can publish a web service with their data science code embedded in it. Data transformations and models can now run in a web service in the cloud – fully managed, secure, reliable, available, and callable from anywhere in the world.

    These web service APIs can be invoked from Excel, as shown in this video, by using this simple plug-in. Now, instead of emailing reports, users can surprise management with cloud-hosted apps that are built in hours. Engineering can hook up APIs to any application easily and even create custom mobile apps. Users can publish as many web services as they like, test multiple models in production and update models with new data. The data science team just became several times more productive and engineering is happy because integration is so easy.
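    To make the "hook up APIs to any application" step concrete, here is a hedged sketch of assembling a JSON scoring request for a published web service. The endpoint URL, API key, column names, and exact payload schema below are all assumptions for illustration – a real service's API help page defines the actual contract.

```python
import json

# Hypothetical endpoint and key -- placeholders, not real values.
ENDPOINT = "https://example.azureml.net/workspaces/ws1/services/svc1/execute"
API_KEY = "YOUR-API-KEY"

def build_request(column_names, rows):
    """Assemble a JSON scoring request for a batch of input rows
    (schema assumed for illustration)."""
    return {
        "Inputs": {
            "input1": {"ColumnNames": column_names, "Values": rows}
        },
        "GlobalParameters": {},
    }

payload = build_request(["month", "region", "units_sold"],
                        [["2014-10", "west", 1250]])
body = json.dumps(payload)
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + API_KEY}

# An HTTP POST of `body` with `headers` to ENDPOINT would return the
# model's predictions; the network call is omitted to keep this sketch
# self-contained.
print(body)
```

The same request body works from Excel, a web app, or a mobile client – anything that can issue an authenticated HTTP POST.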

    But wait, there's still more.

    Imagine a data scientist hits upon that perfect idea for an intelligent web service that everyone else in the world should be building into their apps. Maybe it is a great forecasting method, or a new churn prediction technique, or a novel approach to pattern recognition. Data scientists can now build that web service in Azure ML, publish the ML web service on the Azure Marketplace and start charging for it in over one hundred currencies. Published APIs can be found via search engines. Anyone in the world can pay and subscribe to them and use them in their apps.

    For the first time, data scientists can monetize their know-how and creativity just as app developers do. When this happens, we start changing the dynamics of the industry – essentially, data scientists are able to “self-publish” their domain expertise as cloud services which can then be made accessible to billions of users via smartphone apps that tap into those services.

    The Azure Marketplace already has an emerging selection of such services. In just a couple of weeks, four of our data scientists published over 15 analytics APIs into the marketplace by wrapping functions from CRAN. Among others, these include APIs for forecasting, survival analysis and sentiment analysis.

    Our marketplace has much more than basic analytics APIs. For example, we went and built a set of finished end-to-end ML applications, all using Azure ML, to solve specific business needs. These ML apps do not require a data scientist or ML expertise to use – the science is already baked into our solution. Users can just bring their own data and start using them. These include APIs for recommendations, frequently-bought-together items, and anomaly detection to spot anomalous events in time-series data such as server telemetry.

    A similar anomaly detection API is used by Sumo Logic, a cloud-based machine data analytics company. They have collaborated with Microsoft to bring metric-based anomaly detection capability to their customers. Our metric-based anomaly detection perfectly complements Sumo Logic's structure-based anomaly detection capabilities. Any Sumo Logic query which results in a numerical time-series now has a special “metric anomaly detection” button which sends the pre-aggregated time series data to Azure ML for analysis. The data is then annotated with labels provided by the Azure ML service indicating unusual spikes or level shifts. Sumo Logic is now offering this optional integration in a limited beta release.
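    The "unusual spikes" labeling described above can be illustrated with a toy trailing-window detector. To be clear, this is not the Azure ML or Sumo Logic algorithm – just a minimal sketch of the general idea of flagging points that deviate sharply from recent history:

```python
import statistics

def label_spikes(series, window=10, threshold=3.0):
    """Flag points lying more than `threshold` standard deviations from
    the mean of the trailing `window` points. A toy stand-in for
    metric-based spike labeling, not a production algorithm."""
    labels = []
    for i, x in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 3:          # not enough history to judge
            labels.append(False)
            continue
        mu = statistics.mean(history)
        sigma = statistics.pstdev(history)
        labels.append(sigma > 0 and abs(x - mu) > threshold * sigma)
    return labels

# A flat telemetry series with one obvious spike at index 7:
series = [10, 11, 10, 9, 10, 11, 10, 50, 10, 11]
print(label_spikes(series))   # only the spike at index 7 is flagged
```

Real services add considerably more machinery (level-shift detection, seasonality, adaptive thresholds), but the core contract is the same: a numeric time series in, per-point anomaly labels out.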

    Third parties too are starting to publish APIs into our marketplace. For instance, Versium, a predictive analytics startup, has published these three sophisticated customer scores, all based on public marketing data – Giving Score (which predicts customer propensity to donate), Green Score (predicts customer propensity to make environmentally conscious purchase decisions) and Wealth Score (helps companies estimate the net worth of customers and prospects). Versium offers these scores by analyzing and associating billions of LifeData® attributes and building predictive models using Azure ML.

    Our marketplace also hosts a number of other exciting APIs that use ML, including the Bing Speech Recognition Control, Microsoft Translator, Bing Synonyms API and Bing Search API.

    By bringing ML capabilities to the Azure Marketplace and making it easy for anyone to access, we are liberating data science from its confines. This two-minute video recaps how:

    Get going today – sign up for Azure ML and try out some of our easy to use samples.

    A new future for machine learning is being born in the cloud. 

    Joseph
    Follow me on Twitter.

  • Microsoft Machine Learning Hackathon 2014

    This blog post is authored by Ran Gilad-Bachrach, a Researcher with the Machine Learning Group in Microsoft Research.

    Earlier this summer, we had our first broad internal machine learning (ML) hackathon at the Microsoft headquarters in Redmond. One aspect of this hackathon was a one-day competition, the goal of which was to work in teams to get the highest possible accuracy on a multi-class classification problem. The problem itself was based on a real world issue being faced by one of our business groups, namely that of automatically routing product feedback being received from customers to the most appropriate feature team (i.e. the team closest to the specific customer input). The data consisted of around 15K records, of which 10K were used for training and the rest were split for validation and test. Each record contained a variety of attributes including such properties as device resolution, a log file created on the device, etc. Overall, the size of each record was about 100KB. Each record could be assigned to one of 16 possible feature teams or “buckets”. Participating teams had the freedom to use any tool of their choice to extract features and train models to map these records automatically into the right bucket.

    The hackathon turned out to be a fun event with hundreds of engineers and data scientists participating. Aside from being a great learning experience it was also an opportunity to meet other people in the company with a shared passion for gleaning insights from data using the power of ML. We also used this event as an opportunity to gain some wisdom around the practice of ML, and I would like to share some of our interesting findings with you.

    We had more than 160 participants in the competition track. We asked them to form teams of 1-4 participants and ended up with 52 teams. Many participants were new to the field of ML and therefore, unsurprisingly, 11 of the 52 teams failed to submit a meaningful solution to the problem at hand. However, when we looked closer at the teams that dropped out, we found that every team with just a single member had dropped out! While it is quite possible that the participants who showed up without a team were only there for the free breakfast ☺, when we surveyed our participants and asked them whether working in teams was beneficial, a vast majority – well over 90% – agreed or strongly agreed with the statement.

    We also found that teams used two strategies for splitting the problem workload. Most teams assigned specific roles to their members, with one or two participants working on “feature engineering” while others tried different learning algorithms, ramped up on the tools, or created the requisite “plumbing”. The other strategy we saw was to have each participant build a complete end-to-end solution using a different approach, and later zoom in on the most promising of them.

    We did not notice a significant difference in the performance of teams based on the strategy they used to split the workload. However, there was evidence that it was important to work in teams and to be thoughtful about how to split a given ML challenge between team members. Assigning roles and having clarity on which tools to use were critical considerations.

    Over the course of the day, teams were allowed to make multiple submissions of their candidate solutions. We scored these solutions against the test set but these scores were revealed only at the end of the event, when winners were announced. There were more than 270 submissions made, overall. It is interesting to look at these submissions when they are grouped together – the graph below shows all the submissions as blue dots, with the X-axis representing accuracy on the training set and the Y-axis representing accuracy on the test set.

    Most submissions with a training accuracy below 0.55 had a good match between train and test accuracy (the gray line marks equal train and test accuracy). However, test accuracy keeps improving even as the gap between train and test accuracy becomes ridiculously large. For example, the winner (the red dot) had a training accuracy of 94% but a test accuracy of 62%.

    Next, let us look at the behavior of the algorithms used by the different teams in this particular competition. We are able to understand and analyze these algorithms only because we had asked participants to add a short description to every submission they made (akin to the commit messages software developers provide each time they check code into a source control system).

    It is interesting to see these algorithms plotted on a graph (below). The submissions with a large gap between training and test accuracy were mostly using boosted trees. This makes sense, since boosting works by aggressively reducing the training error. Additionally, note that 4 of the leading teams – each of which was among the top 15 submissions overall – used boosted trees to solve this particular problem.
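    The pattern – boosting driving training error toward zero while test accuracy plateaus on noisy data – is easy to reproduce. This sketch uses scikit-learn on a small synthetic problem with deliberately noisy labels (an illustration of the effect, not a reconstruction of any team's actual pipeline):

```python
# Reproduce the train/test accuracy gap with gradient-boosted trees on a
# synthetic problem whose labels are 20% noise.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)  # noisy labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)

model = GradientBoostingClassifier(n_estimators=300, random_state=0)
model.fit(X_tr, y_tr)

train_acc = model.score(X_tr, y_tr)
test_acc = model.score(X_te, y_te)
# Training accuracy ends up far above test accuracy: the booster has
# memorized the label noise, much like the competition's red dot.
print(train_acc, test_acc)
```

With 20% of the labels flipped, no model can score much above 80% on held-out data, yet the booster keeps pushing training accuracy toward 100% – exactly the widening gap visible in the submissions graph.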

    We have seen similar patterns in other cases too, and boosted trees are a strong candidate for many ML tasks. If your data has some special or temporal structure, it may be easier to encode using neural nets (although exploiting that structure may be non-trivial). However, if there is no structure to the features, or if you have only limited time to spend on a problem, you should definitely consider boosted trees.

    Beyond the numbers and the graphs, what was cool about this event was that hundreds of engineers got a chance to work together, learn, and have some plain fun – do check out our quick one-minute time-lapse video of this, our inaugural ML hackathon.

    With over half our hackathon participants indicating that they were new to ML, it was great that they showed up in such big numbers and did as well as they did. In fact, one of our top 5 teams was made up entirely of summer interns who were new to this space. If some of you out there are emboldened by that and wish to learn ML for yourself, you can start at the Machine Learning Center – there are videos and other resources available, as well as a free trial.

    Ran
    Follow my research