Connect
Microsoft Research Connections Blog
Next at Microsoft
Social Media Collective
Windows on Theory
Posted by Yuxiong He, Sameh Elnikety, and James Larus, Microsoft Research; and Chenyu Yan, Microsoft
Sharing a resource, such as a computer processor or disk drive, requires the system to make decisions about which user’s task gets to use the resource and for how long. When my task is running, yours may be idling in a queue, awaiting its turn on a processor. The process of making these resource-allocation decisions is known as scheduling, and it has long been one of the most important aspects of system design, as a poor scheduler not only wastes resources but also can ruin a user’s experience.
Scheduling is a well-studied area, but cloud computing raises new challenges. The focus of our paper Zeta: Scheduling Interactive Services with Partial Execution, being presented this week in San Jose, Calif., during the Association for Computing Machinery’s Symposium on Cloud Computing, is an important type of cloud service: time-bounded interactive services. These are websites—such as Microsoft’s Bing and Office 365 or Facebook—in which a user interacts directly with a service running in the cloud, as opposed to an application running on a local machine. To provide a pleasing user experience, the service’s response to a user’s click must occur near instantaneously, typically within a couple of hundred milliseconds, around the limits of human perception.
Building a cloud service that can provide this quality of service to millions of simultaneous users is a vast engineering challenge. Our work focuses on one small, but important aspect of this problem, determining which task to run on a computer and for how long. Our new scheduling algorithm, Zeta, is far more effective than the usual algorithm (first come, first serve). In experiments using Bing, Zeta enabled the search engine to handle 29 percent more queries without degrading quality, potentially reducing the number of servers by 22 percent.
Zeta schedules tasks that have three important characteristics:
Many different services share these characteristics. For example, a web search engine such as Bing accepts a query and returns a collection of webpages within a few hundred milliseconds. The process of matching and selecting the webpages is designed to return the best pages found so far, and the data structures are organized to place the highest-quality pages where they will be examined first. As a result, the quality profile of Bing looks like this:
Zeta’s key insights are that long-running tasks cannot be allowed to starve short ones and that terminating a long task early and giving its resources to a new task improves the overall service quality, because the concave quality profile returns a larger benefit at the start of a task than late in its execution. Zeta has two policies:
These two policies can be implemented simply and efficiently. The changes to the Bing search engine were less than 100 lines and added no perceivable overhead. This small change greatly improved the search engine’s performance:
This chart shows the quality of search results, relative to time-unbounded search results, of a standard workload running on a test cluster of 800 machines. At a given quality level, say 0.99, Zeta delivers a significantly higher throughput: 610 QPS versus 475 for the first-come, first-serve algorithm. Alternatively, at a given workload, Zeta delivers higher-quality search results. Moreover, Zeta reduces the variance in search quality, so that more customers get the best possible search results, even under a heavy load.
This is only a preview of the paper. Zeta is effective not only for search engines; it also works well for tasks whose quality profiles are not strictly concave. The paper contains much more detail.