Klout (www.klout.com) measures influence across the social web by analyzing social network user data. Klout uses the impact of opinions, links, and recommendations to identify influential individuals. Every day Klout scans 15 social networks, scores hundreds of millions of profiles, and processes over 12 billion data points.
The Klout data warehouse, which relies on Apache Hadoop-based technology, exceeds 800 terabytes of data. But Klout doesn’t just crunch large data volumes; Klout takes advantage of Microsoft SQL Server 2012 Analysis Services to deliver reliable scores and actionable insights at the speed of thought.
Microsoft and Klout collaborated to build this Big Data Analytics solution. The goal for this solution was to find a cost-effective way to combine the power of Hadoop with the power of Analysis Services. The result is a solution that connects Analysis Services to Hadoop/Hive via the relational SQL Server engine, enabling Klout to reduce data latencies, eliminate maintenance overhead and costs, move aggregation processing to Hadoop, and shorten development cycles dramatically. Organizations in any industry and business sector can adopt the solution presented in this technical case study to exploit the benefits of Hadoop while preserving existing investments in SQL Server technology. This case study discusses the necessary integration techniques and lessons learned.
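To make the "via the relational SQL Server engine" idea concrete, here is a minimal sketch of one way such a bridge can be expressed. It assumes Hive is exposed to SQL Server as a linked server and that a view wraps a pass-through HiveQL query with T-SQL's `OPENQUERY`, so Analysis Services sees an ordinary relational view. This summary does not confirm those details. The linked server name `HiveLinkedServer`, the view name `dbo.vw_FactScores`, the Hive table `scores`, and the helper function itself are hypothetical.

```python
def hive_passthrough_view(view_name: str, linked_server: str, hiveql: str) -> str:
    """Build T-SQL for a SQL Server view that forwards a HiveQL query via OPENQUERY.

    All object names passed in are illustrative; nothing here is taken from
    Klout's actual schema.
    """
    # OPENQUERY receives the remote query as a string literal, so any single
    # quote inside the HiveQL text has to be doubled.
    escaped = hiveql.replace("'", "''")
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT * FROM OPENQUERY({linked_server}, '{escaped}');"
    )


# Hypothetical example: expose a filtered Hive table as a relational view.
ddl = hive_passthrough_view(
    "dbo.vw_FactScores",
    "HiveLinkedServer",
    "SELECT user_id, score FROM scores WHERE network = 'twitter'",
)
print(ddl)
```

Under these assumptions, an Analysis Services data source view would point at the generated view rather than at Hive directly, which keeps the cube definition unaware of where the rows actually come from.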
To review the document, please download the SQL Server Analysis Services to Hive Word document.
Great post! Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and in compatible file systems such as Amazon S3. It provides a SQL-like language called HiveQL with schema on read and transparently converts queries to MapReduce, Apache Tez, and Spark jobs. All three execution engines can run on Hadoop YARN. To accelerate queries, it provides indexes, including bitmap indexes. More at www.youtube.com/watch?v=1jMR4cHBwZE