Official News from Microsoft’s Information Platform
Machine Learning Blog
This blog post was authored by: Murshed Zaman, AzureCAT PM and Sumin Mohanan, DS SDET
With the advent of SQL Server Parallel Data Warehouse (the MPP version of SQL Server) V2 AU1 (Appliance Update 1), PDW got a new name: the Analytics Platform System [Appliance] or APS. The name changed with the addition of Microsoft’s Windows distribution of Hadoop (HDInsight or HDI) and PDW sharing the same communication fabric in one appliance. Customers can buy an APS appliance with PDW or with PDW and HDI in configurable combinations.
Used in current versions of PDW, Polybase is a technology that allows PDW users to query HDFS data. SQL users can quickly get results from Hadoop data without learning Java or C#.
Features of Polybase include:
In V2AU1 Polybase improvements include:
Another new feature introduced in PDW V2AU1 is the capability to query data that resides in Microsoft Azure Storage Accounts. Just like HDFS data, PDW can place a schema on data in Microsoft Azure Storage Accounts and move data from PDW to Azure and back.
The APS with these new features and improvements has become a first-class citizen in analytics for any type of data. Any company that has Big Data requirements and wants a highly scale-out Data Warehouse appliance can use APS.
Here are four cases that illustrate how different industries are leveraging APS:
Retail companies that use PDW who also want to harvest and curate data from their social analytics sites. This data provides insights into their products and understand the behaviors of the customers. Using APS, the company can offer the right promotion at the right time and to the right demographics. Data also allows the companies to find brand recommendation coming from a friend, relative or a trusted support group that can be much more effective than marketing literature alone. By monitoring and profiling social media, these companies can also gain a competitive advantage.
Today’s empowered shoppers want personalized offers that appeal to their emotional needs. Using social media retailers offer promotions that are tailored to individuals using real-time analytics. This process starts by ranking blogs, forums, Twitter feed and Facebook posts for predetermined KPIs revealed in these posts and conversations. Retail organizations analyze and use the data to profile shoppers to personalize future marketing campaigns. Measureable or sale data reveals the effectiveness of the campaign and the whole process starts again with the insight gained.
In this example, PDW houses the relational sale data and Hadoop houses the social emotions. PDW with built in HDI region gives the company the arsenal to analyze both data sources in a timely manner to be able to react and make changes.
Retail store APS diagram:
Companies that generate massive amounts of electronic test data can get valuable insights from APS. Test data are usually a good candidate for Hadoop due to its key-value type (JSON or XML) structure.
One example in this space is a computer component manufacturer. Due to the volume, velocity and variety of these (ie: Sort/Class) data a conventional ETL process can be very resource expensive. Using APS, companies can gain insight from their data by putting the semi-structured (key-value pair) data into an HDI-Region and other complementary structured data sources (ie: Wafer Electrical Test) into PDW. With the Polybase query feature these two types of data can easily be combined and evaluated for success/failure rates.
Computer Component Manufacturing Diagram:
The PDW with HDI regions can offer a complete solution for online game companies, to derive insights from their data. MMORPG’s (Massively Multiplayer Online Role Playing Games) are good examples where APS can deliver value. Game engines produce many transactional data (events like which avatar got killed in the current active game) and a lot of semi-structured data such as activity logs containing chat data and historical logs. PDW is well-suited to loading the transactional data in to the PDW workload and semi-structured data to the HDI region of APS. The data can then be used to derive insights such as:
Currently these companies deal with multiple solutions and products to achieve the goal. APS provides a single solution to power both their transactional and non-transactional analytics.
In the past, a relational database system was sufficient to satisfy the data requirements of a medium-scale production website. Ever-increasing competition and advancements in technology have changed the way in which websites interact with customers. Apart from storing data that customers explicitly provide the company, sites now record how customers interact with their website. As an example, when a registered user browses a particular car model, additional targeted advertisements and offers can be sent to the user.
This scenario can be captured using collected clickstream data and the Hadoop eco-system. APS acts as the complete solution to these companies by offering the PDW workload to store and analyze transactional data, combined with HDI region to derive insights from the click-stream data.
This solution also applies to Third party companies that specialize in targeted advertising campaigns for their clients.
While “Big Data” is a hot topic, we very often receive questions from customers about the actual use cases that apply to them and how they can derive new business value from “Big Data.” Hopefully these use cases highlight how various industries can truly leverage their data to mine insights that deliver business value in addition to showcasing how traditional data warehouse capabilities work together with Hadoop
Visit the Microsoft Analytics Platform System page to learn more.
Comments in this blog are open and monitored for each post for a period of two weeks after the posting date. If you have a specific question about a blog post that is older than two weeks, please submit your question via our Twitter handle @SQLServer