Official News from Microsoft’s Information Platform
In the previous blog, Merge Operations, we described how a CFP transitions across the various stages of its life cycle. In this blog, we will walk through an actual example to show the state transitions. Note that some of the states shown for CFPs are only visible in RTM bits (not in CTP2), but internally the state transitions are the same. Basing this example on RTM bits adds a lot more clarity to how storage is managed. Also note that RTM bits allocate CFPs of 16MB (data file) and 1MB (delta file) on machines with physical memory <= 16GB. This example was run on a machine with 8GB of RAM using pre-RTM bits.
CREATE DATABASE hkdb1 ON
PRIMARY (NAME = [hkdb1_hk_fs_data], FILENAME = 'C:\hekaton_test\data\hkdb1_data.mdf'),
FILEGROUP [hkdb1_hk_fs_fg] CONTAINS MEMORY_OPTIMIZED_DATA
(NAME = [hkdb1_hk_fs_dir], FILENAME = 'C:\hekaton_test\data\hkdb1_hk_fs_dir')
LOG ON (name = [hktest_log], Filename='C:\hekaton_test\data\hkdb1_log.ldf', size=100MB)
-- disable auto-merge so that we can show the merge
-- in a predictable way
dbcc traceon (9851, -1)
-- set the database to full recovery. This is a common practice for production scenarios
alter database hkdb1 set recovery full
-- create a memory-optimized table with each row of size > 8KB
CREATE TABLE dbo.t_memopt (
c1 int NOT NULL,
c2 char(40) NOT NULL,
c3 char(8000) NOT NULL,
CONSTRAINT [pk_t_memopt_c1] PRIMARY KEY NONCLUSTERED HASH (c1) WITH (BUCKET_COUNT = 100000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
-- do the database backup. We do this so that we can do transaction log backups
BACKUP DATABASE [hkdb1] TO DISK = N'C:\hekaton_test\data\hk-empty-data.bak'
WITH NOFORMAT, INIT, NAME = N'hkdb1-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
-- load 8000 rows. This should fill about five 16MB data files on a machine with <= 16GB of physical memory
declare @i int = 0
while (@i < 8000)
begin
	insert t_memopt values (@i, 'a', replicate ('b', 8000))
	set @i += 1
end
In this example, we will show the output of the DMV sys.dm_db_xtp_checkpoint_files using the following query:
select file_type_desc, state_desc, internal_storage_slot, file_size_in_bytes, file_size_used_in_bytes,
inserted_row_count, deleted_row_count, lower_bound_tsn, upper_bound_tsn, last_backup_page_count, drop_table_deleted_row_count
from sys.dm_db_xtp_checkpoint_files
order by container_id, file_type_desc, upper_bound_tsn
Here is the output after removing some columns and rows (showing only the DATA file rows).
Now, execute a manual checkpoint and observe the state transition in the CFPs. You will note that the UNDER CONSTRUCTION CFPs are now marked ACTIVE, as they are part of the durable checkpoint we have just completed. The manual checkpoint closed the data file in its internal_storage_slot even though it was only 25% full. This data file can potentially be merged in the future, as it qualifies under the merge policy.
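For completeness, the manual checkpoint can be issued with the standard T-SQL CHECKPOINT command (a sketch; the original post does not reproduce the statement):

```sql
-- run the checkpoint in the context of the memory-optimized database
use hkdb1
go
checkpoint
go
```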
Now, we will delete 50% of the rows:
-- now delete 50% of the rows (every other row)
declare @i int = 0
while (@i <= 8000)
begin
	delete t_memopt where c1 = @i
	set @i += 2
end
Here is the output of the DMV for CFPs in the ACTIVE state, using the same query as before with an added filter:
where state = 2
Note that the delta files hold references to the deleted rows. There is a CFP with 0 data rows due to the earlier manual checkpoint.
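To see the deleted rows accounted for per delta file, the same DMV can be filtered to delta files (a sketch based on the columns already used above):

```sql
-- deleted-row counts tracked by each delta file
select state_desc, internal_storage_slot, deleted_row_count,
       lower_bound_tsn, upper_bound_tsn
from sys.dm_db_xtp_checkpoint_files
where file_type_desc = 'DELTA'
order by container_id, upper_bound_tsn
```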
For this example, we had disabled auto-merge so that we can see the state transitions with no surprises.
We will now force a manual merge (merging all the ACTIVE CFPs) by executing the following command:
-- now do the manual merge
-- it generates merge-target and other files stay as regular files
exec sys.sp_xtp_merge_checkpoint_files 'hkdb1', 1877, 12004
The DMV shows that the merge was executed and that there is a new CFP marked as MERGE TARGET containing 4000 rows. Also note that its data file is around 33MB, much larger than 16MB. This happened because we forced the merge; auto-merge would not have merged all the ACTIVE CFPs because doing so would not meet the merge policy.
At this time, the CFPs that are sources of the merge are still marked ACTIVE. This is because the merge has not yet been installed. You can query the state of the merge request as follows:
select request_state_desc, lower_bound_tsn, upper_bound_tsn
from sys.dm_db_xtp_merge_requests
request_state_desc lower_bound_tsn upper_bound_tsn
-------------------- -------------------- --------------------
PENDING 0 12007
Now, we will force a checkpoint to install the merge.
Here is the output of the DMV. It shows that the MERGE TARGET has now changed to the ACTIVE state, and all the CFPs that served as the source of the merge are now marked MERGED SOURCE. Referring back to the Merge Operations blog, these CFPs are now in category-3.
The source CFPs will transition into category-4 automatically as automatic checkpoints and log backups are taken. For this example, we will force the checkpoint and log backup as follows:
-- do the log backup.
BACKUP LOG [hkdb1] TO DISK = N'C:\hekaton_test\data\hk-log-1.bak'
WITH NOFORMAT, INIT, NAME = N'hk-log Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
Here is the output of the DMV, showing that the earlier CFPs have now transitioned to the REQUIRED FOR BACKUP/HA state, meaning that these CFPs can't be garbage collected because they are still needed for the operational correctness of a database with memory-optimized tables.
After a couple of manual checkpoints, each followed by a log backup, the DMV output shows that the CFPs are finally being handed off to the File Stream garbage collector. At this stage, these CFPs are no longer needed for the operational correctness of our database. To state it differently, a full database backup will NOT include these CFPs, but they continue to take storage space in the memory-optimized filegroup until they are garbage collected.
After one more cycle of a manual checkpoint followed by a log backup, the CFPs transition to the TOMBSTONE state, which indicates that they will be removed as part of File Stream garbage collection. Please refer to the FS Garbage Collection blog for details.
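Each such cycle is simply a manual checkpoint followed by a log backup. A sketch of one cycle is below; the backup file name is illustrative, not from the original post:

```sql
-- force a checkpoint to advance the CFP states
checkpoint
go
-- then take a log backup (file name is illustrative)
BACKUP LOG [hkdb1] TO DISK = N'C:\hekaton_test\data\hk-log-2.bak'
WITH NOFORMAT, INIT, NAME = N'hk-log Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
go
```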
This concludes the state transitions of CFPs. The main point to remember is that it takes a few checkpoints and log backups after a set of CFPs has been merged before their storage can be garbage collected. We expect these state transitions to occur automatically in production environments, as checkpoints occur automatically for every additional 512MB of transaction log generated and regular log backups are taken as part of the backup strategy. However, you will need to account for this extra storage as CFPs transition from MERGED SOURCE to finally being garbage collected.
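One way to account for that extra storage is to aggregate CFP file sizes by state using the same DMV shown throughout this post (a sketch, not from the original post):

```sql
-- approximate storage held by CFPs in each lifecycle state
select state_desc,
       count(*) as file_count,
       sum(file_size_in_bytes) / (1024 * 1024) as total_size_mb
from sys.dm_db_xtp_checkpoint_files
group by state_desc
order by total_size_mb desc
```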