Official News from Microsoft’s Information Platform
Machine Learning Blog
With the In-Memory OLTP feature in SQL Server 2014 we introduce two new types of indexes for memory-optimized tables: the memory-optimized NONCLUSTERED indexes, and the memory-optimized HASH indexes.
Memory-optimized NONCLUSTERED indexes behave similar to traditional NONCLUSTERED indexes from an application point-of-view. The main difference being that memory-optimized indexes are always covering (i.e. all columns are virtually included), while with traditional disk-based NONCLUSTERED indexes you need to specify which column you want to include alongside the index key columns.
Memory-optimized HASH indexes behave different from NONCLUSTERED indexes. They are optimized for point-lookup operations, and do not support ordered scans or inequality seek operations. In addition, you need to specify a BUCKET_COUNT when creating the index, and you need to pick the right value in order for the index to perform optimally. In earlier posts we covered the following three index troubleshooting aspects:
All that said, it seems that HASH indexes have quite a few limitations, compared with NONCLUSTERED indexes. For that reason, it is usually a safer bet to start with NONCLUSTERED indexes, both for new applications and when migrating existing disk-based tables to memory-optimized. You can then use HASH indexes to further optimize the workload. Indeed, HASH indexes are the most efficient index for equality search (point lookup) and full table scan operations.
In the remainder of this post we:
The DMV sys.dm_db_xtp_index_stats shows statistics for index operations performed by the in-memory storage engine. The index contains stats about the usage of the index since its creation in memory – note that memory-optimized indexes are always recreated on database restart. You can use the following query to retrieve key statistics about the usage of indexes on your table:
SELECT ix.index_id, ix.name, scans_started, rows_returned
FROM sys.dm_db_xtp_index_stats ixs JOIN sys.indexes ix ON
ix.object_id=ixs.object_id AND ix.index_id=ixs.index_id
WHERE ix.object_id=object_id('<table name>')
For troubleshooting indexes, the columns ‘scans_started’ and ‘rows_returned’ contain key information:
If the number of rows_returned is significantly larger than the scans_started, this is an indication that, on average, index operations scan a large part of the index. If all index operations are expected to be point lookups, this could be an indication of one of the earlier-mentioned problems where the query calls for an operation to be supported by the index, thus causing a revert to full index scan, such as: search requires a subset of hash index key columns or search on inequality predicates with a hash index.
The scans_started being larger than rows_returned is an indication that the workload is insert-heavy, or that a lot of point lookups failed to locate a row.
Issue: Index keys with many duplicate values can cause performance problems. If each index key has 5 duplicates this is usually not a problem, but if the discrepancy between the number of unique index keys and the number of rows in the tables becomes very large – more than 10X – this can become problematic.
All rows with the same index key end up in the same duplicate chain. For hash indexes this can create a lot of overhead in case of hash collisions: if multiple index keys end up in the same bucket due to a hash collision, index scanners always need to scan the full duplicate chain for the first value before they can locate the first row corresponding to the second value. For nonclustered indexes this causes additional overhead for garbage collection.
Symptom: For hash indexes the performance of DML operations degrades and CPU utilization increases. In addition, there is an increase in CPU utilization during database startup, and a potential increase in recovery time. This becomes especially clear when inserting a large number of rows.
For nonclustered indexes garbage collection will start to consume more resources, resulting in an overall increased CPU utilization in the system. The problem does not affect DML operations directly, but it does put more overall pressure on system resources.
How to troubleshoot: The average number of duplicate rows for a given index key can be obtained using T-SQL queries. First determine the row count of the table, then determine the number of unique index key values. Divide the row count by the number of unique index keys to obtain the average number of duplicates.
To determine the row count for the table use the following query:
select count(*) as 'Row count' from <tableName>
To determine the number of unique index key values use the following query:
select count(*) as 'Distinct index key values' from (select distinct <indexKeyColumns> from <tableName>) a
For hash indexes specifically, you can also troubleshoot using the hash index stats DMV. Use the following query:
SELECT hs.object_id, object_name(hs.object_id) AS 'object name', i.name as 'index name', hs.*
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i ON hs.object_id=i.object_id AND hs.index_id=i.index_id;
If the average chain length is high and the number of empty buckets is high, it is likely that there are many rows with duplicate index key values or there is a skew in the key values.
Workaround: First, evaluate if the index is truly needed. The index may have been added in the past to support queries that are no longer part of the workload.
For HASH indexes there are two ways to work around the issue of many duplicates:
For NONCLUSTERED indexes with a lot of duplicates, consider adding additional columns to the index key. For example, you can add the primary key columns to the index key to make it unique, in other words to uniquify the index.
Adding columns to the index key does come with a performance penalty for DML operations. Therefore, it is important to test the new uniquified index under production load, and compare the performance with the old index. Consider the overall throughput of the system, query performance, and also the overall resource utilization, especially CPU.
 In SQL 2014 CTP2 user transactions could be affected, due to garbage collection performed at commit time. This issue will be resolved in SQL 2014 RTM: garbage collection no longer directly affects user transactions.
Comments in this blog are open and monitored for each post for a period of one week after the posting date. If you have a specific question about a blog post that is older than one week, please submit your question via our Twitter handle @SQLServer