Official News from Microsoft’s Information Platform
Machine Learning Blog
As described in implementing durability for memory optimized tables, the storage requirements and its management for memory optimized tables is very different compared to the disk-based tables. If you are migrating a subset of your disk-based tables to memory-optimized tables or are just developing a new application with memory-optimized tables, you will notice these differences. Unlike disk-based tables in SQL Server that use 8k pages to store data in the same format both on disk and in memory, the memory-optimized tables utilize a different size and format both in-memory and on disk. This has been a point of concern by many customers as they have observed, sometimes, disproportionate storage consumed by memory optimized tables. The goal of this blog is to help alleviate this confusion by describing how storage is managed over the lifetime of data rows.
The data for memory optimized tables is stored in one or more data/delta file pairs (also referred to as checkpoint file pairs or CFP) with data file(s) storing inserted rows and delta file(s) referencing deleted rows. During the execution of an OLTP workload, as the DML operations update, insert, and delete rows, new data/delta files are created to persist the data. Also, existing delta files are updated to process delete of existing rows. Over time, if the number of active rows, after accounting for deleted rows, in two or more consecutive CFPs falls below a threshold (usually < 50%) such that they can be merged into one CFP of 128 MB, they are merged automatically by a background merge process into a new CFP. Once the merge operation is complete, the older CFPs go through a transition phase and are eventually removed (i.e. garbage collected) from the storage. Note, SQL Server 2014 CTP2 supports up to 4096 CFPs within a database but this limit will be changed to 8192 CFPs in RTM
At any given time, the data/delta file pairs are in one of the following 4 categories
Transitioning CFPs out of category-3 and category-4 can take up to 5 checkpoints and transaction log backup steps, if not running in simple recovery mode. For in-memory databases, the automatic checkpoint is taken when the size of transaction log exceeds 512MB since the last checkpoint. You can, of course, manually force the checkpoint followed by log backup to expedite the garbage collection but then this will add 5 empty CFPs (i.e. 5 data/delta file pairs with data file of size 128MB each). Typically, in test environment, you may have to force transaction log backups and checkpoints to remove the CFPs that are not needed. In production scenarios, however, we don’t expect customers to do manual checkpoint but to rely on the automatic checkpoints and log backups that are taken as part of backup strategy. The impact of this garbage collection process is that in-memory databases may have a disproportionate storage footprint compared to its size in memory.
To clarify the point further, let us walk through some examples of in-memory databases under various scenarios and compare their size in-memory and on storage. For these examples, we will assume we are running a SQL instance with 8 logical processor with 1 memory optimized table with following schema. Note, this has a row size of approximately 8KB.
CREATE TABLE dbo.t_memopt (
c1 int NOT NULL,
c2 char(40) NOT NULL,
c3 char(8000) NOT NULL,
CONSTRAINT [pk_t_memopt_c1] PRIMARY KEY NONCLUSTERED HASH (c1) WITH
(BUCKET_COUNT = 100000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
As indicated earlier, the fixed storage overhead for this configuration will be 1 GB (i.e. 8 data files of 128MB each).
Name memory_used_by_indexes_kb memory_used_by_table_kb
---------- ------------------------- -----------------------
t1 1024 8
Though these examples were over simplified but they do illustrate the difference in storage provisioning for memory optimized tables over disk-based tables. Your storage requirement will depend upon following key factors such
Comments in this blog are open and monitored for each post for a period of two weeks after the posting date. If you have a specific question about a blog post that is older than two weeks, please submit your question via our Twitter handle @SQLServer