Posted by George Thomas Jr.

The cloud is getting crowded.

As more devices connect to the Internet and more data flows to and from the cloud, the networking fabric once deemed sufficient to handle such traffic is quickly getting stretched.

In 2013 alone, the total Internet bandwidth crossing international borders was 100 terabytes per second, according to TeleGeography's recent Global Internet Geography report.

And even as fiber network capacity increases, so, too, will the volume of big data. The challenge of processing it quickly, securely and more cost-effectively remains.

To address such challenges, Microsoft researchers joined collaborators from multiple universities this week at the annual USENIX Symposium on Networked Systems Design and Implementation. Their goal: To recommend solutions that push the architectural boundaries of network services.

"The efficient management and operation of networks and data centers is Microsoft's core strength and priority," said Victor Bahl, a Microsoft distinguished scientist. "These papers represent the best in systems research, a product of close collaboration between Microsoft researchers, engineers and our colleagues in academia, anticipating and taking care of important issues well before they become problems."

Microsoft's many contributions to the conference include Geode and Retro.

Geode aims to reduce the cost of wide area bandwidth on a global scale. It's a collaboration between the University of Illinois, Microsoft researcher George Varghese, Carlo Curino, a senior scientist in Microsoft's Cloud and Enterprise product division, and Thomas Jungblut, a software engineer with Skype.

Geode specifically targets wide-area analytics: reducing the bandwidth consumed when querying SQL data that is distributed across data centers around the globe, a problem the researchers call Wide-Area Big Data (WABD).

The expense of wide-area network bandwidth can drive applications to discard valuable data. It also can contribute to privacy concerns regarding raw data storage, depending on the laws or constraints governments may impose.

However, with Geode, the researchers have solved the WABD issue by:

  • Optimizing query execution plans and data replication to minimize bandwidth costs
  • Modifying query executions to potentially increase computation within individual data centers without worsening cross-data center bandwidth
  • Aggressively caching all intermediate results, thereby eliminating data transfer redundancy
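
To make the third point concrete, here is a minimal sketch of how caching an intermediate result can remove a repeated cross-data-center transfer. It is a toy model, not Geode's implementation: the class names (WanLink, RemoteSite, CachingCoordinator), the sample rows and the byte counting are all assumptions made for illustration, and the sketch ignores what happens when the underlying data changes.

```python
# Toy model only: this is NOT Geode's code. Every name below is hypothetical.
from dataclasses import dataclass, field


@dataclass
class WanLink:
    """Counts the bytes shipped between two data centers."""
    bytes_sent: int = 0

    def ship(self, payload: bytes) -> bytes:
        self.bytes_sent += len(payload)
        return payload


@dataclass
class RemoteSite:
    """A data center that stores (region, amount) rows and answers sub-queries."""
    rows: list

    def run_subquery(self, region: str) -> bytes:
        # Aggregate locally so only a small summary crosses the wide-area link.
        total = sum(amount for r, amount in self.rows if r == region)
        return str(total).encode()


@dataclass
class CachingCoordinator:
    """Asks the remote site for a partial result once, then reuses it."""
    site: RemoteSite
    link: WanLink
    cache: dict = field(default_factory=dict)

    def total_for(self, region: str) -> int:
        if region not in self.cache:
            # Cache miss: run the sub-query remotely and pay for the transfer.
            self.cache[region] = self.link.ship(self.site.run_subquery(region))
        # Cache hit: no bytes cross the link.
        return int(self.cache[region])


if __name__ == "__main__":
    link = WanLink()
    site = RemoteSite(rows=[("eu", 10), ("us", 5), ("eu", 7)])
    coordinator = CachingCoordinator(site, link)
    print(coordinator.total_for("eu"), link.bytes_sent)
    print(coordinator.total_for("eu"), link.bytes_sent)
```

In this sketch the second call for the same region is answered from the coordinator's cache, so the byte counter on the link does not grow. A real system would also have to decide when a cached result is stale, which the sketch leaves out.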

The Geode prototype, built on the popular Hive analytics framework, has already demonstrated significant improvements. The researchers report a 250-fold reduction in data transfer compared to the centralized approach in a standard Microsoft production workload, and up to a 360-fold reduction in a range of scenarios across several standard benchmarks, including TPC-CH and Berkeley Big Data.

See also: Mobility and networking research at Microsoft

Another project, Retro, improves management of server resources for big data inside the data center. It's a collaboration between Microsoft researchers Peter Bodik and Madan Musuvathi and researchers from Brown University.

Retro is a new framework that identifies what is causing bottlenecks in cloud systems, then optimizes cloud resources to make cloud operations more cost-effective. That also reduces latency to the customer.

Other papers accepted to NSDI'15

Beyond Sensing: Multi-GHz Realtime Spectrum Analytics
Microsoft contributors: Paramvir Bahl
SpecInsight is a system that acquires a detailed view of 4 GHz of spectrum in real time, using a new scheduling algorithm that maximizes the probability of sensing active signals.

Explicit Path Control in Commodity Data Centers: Design and Applications
Microsoft contributors: Haitao Wu, Chuanxiong Guo
Introducing XPath, a method based on existing commodity switches to implement explicit path control that is readily deployable and scales to large data center networks.

Compiling Packet Programs to Reconfigurable Switches
Microsoft contributors: George Varghese
Exploring the design of a compiler for programmable switching chips and how to map logical lookup tables to physical tables while meeting data and control dependencies in the program.

FastRoute: A Scalable Load-Aware Anycast Routing Architecture for Modern CDNs
Microsoft contributors: Ashley Flavel, Pradeepkumar Mani, David A. Maltz, Nick Holt, Jie Liu, Yingying Chen, Oleg Surmachev
By collocating DNS and proxy services in each node location, FastRoute's high-performance, completely distributed system for routing users to a nearby proxy solves control issues of common content delivery networks.

A General Approach to Network Configuration Analysis
Microsoft contributors: Meg Walraed-Sullivan, Ratul Mahajan
This new approach to detect network configuration errors combines the benefits of prior techniques and can find errors proactively, before the configuration is applied.

Analyzing Protocol Implementations for Interoperability
Microsoft contributors: Nupur Kothari, Ratul Mahajan
Introducing PIC, a tool that helps developers search for non-interoperabilities in protocol implementations. It has already been shown to find multiple previously unknown non-interoperabilities in large and mature implementations of the SIP and SPDY (v2 through v3.1) protocols.

Checking Beliefs in Dynamic Networks
Microsoft contributors: Nuno P. Lopes, Nikolaj Bjørner, Patrice Godefroid, Karthick Jayaraman, George Varghese
Addressing the shortcomings of existing network verification tools, the Network Optimized Datalog tool (NoD) scales to large header spaces, making it possible to check beliefs about network reachability policies in dynamic networks.

CubicRing: Enabling One-Hop Failure Detection and Recovery for Distributed In-Memory Storage Systems
Microsoft contributors: Chuanxiong Guo, Haitao Wu, Yongqiang Xiong
CubicRing is a distributed structure for cube-based networks that exploits network proximity to restrict failure detection and recovery within the smallest possible one-hop range.

Tardigrade: Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services
Microsoft contributors: Jacob R. Lorch, Andrew Baumann
Tardigrade replicates the service on several machines so that it continues running even when some of them fail, yet it keeps the service states synchronized so clients see strongly consistent results.

Scalable Error Isolation for Distributed Systems
Microsoft contributors: Flavio P. Junqueira
Introducing SEI, an algorithm that tolerates Arbitrary State Corruption faults and prevents data corruption from propagating across a distributed system, significantly reducing undetected errors.