The Windows Servicing Guy

Tips and tricks from a Windows support engineer on issues related to servicing

More on hard links


After I posted my original entry on how hard links work, a number of readers left comments requesting clarification.  The original blog post is below:

http://blogs.technet.com/b/joscon/archive/2011/01/06/how-hard-links-work.aspx

To his credit Joseph has been asking me about revisiting this topic for months.

I think part of the confusion about how hard links work revolves around the difference between what the Windows shell shows us and what is really happening in NTFS.

Here we have a couple of directories roughly displayed as the Windows shell would show it to us. The diagram gives the impression that the files exist inside their respective directories. In the following example there are two instances of ‘File1.txt’.

clip_image002

If we look under the hood we can see that each directory and each file has its own entry in the Master File Table (MFT).

clip_image004

As you can see from the above diagram a file isn’t really ‘inside’ the directory. The directory just has a pointer to the location where the file exists in the MFT. Using the diagram from my old blog entry we can see the three part relationship between the file and the parent directory.

clip_image005

1. The directory has an index entry that tells us the MFT address for the child file.

2. The file has a file name attribute that tells us the file record number of the parent directory.

3. The file has a link count that tells us that it only has one parent directory.

If I were to dump out the metadata for a directory, it would only tell me the location in the MFT for the files that are related to the directory. No part of the actual file is actually IN the directory. If you were to look at the actual addresses in the MFT they might appear like this…

0025 – Dir1

005a – Dir2

100a – File1.txt

15ab – File1.txt

Dir1 would have an index entry that included a reference to 100a (File1.txt). And Dir2 would have an index entry that included a reference to 15ab (the second instance of File1.txt).

Now let’s look at a hard linked file. The shell part isn’t really going to appear differently.

clip_image006

But when we add in what NTFS is really doing you can start to see a difference.

clip_image008

Instead of having two copies of the same file, the index entries in both directories point to the same address in the MFT for the child file.

The three part relationship also changes. The file becomes aware that it is referenced by multiple directories.

clip_image009

1. Each directory has an index entry that tells us the MFT address for the child file.

2. The file has two file name attributes. One for each parent directory.

3. The link count is incremented to 2 (0x0002).

And finally if we looked at the addresses in the MFT, they might look like this….

0025 – Dir1

005a – Dir2

100a – File1.txt

100a – File1.txt
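The two layouts above can be mimicked with a toy model. This is only a sketch for building intuition: the class names (`DirRecord`, `FileRecord`, `FileNameAttr`) and the helper `add_link` are made up for illustration and are not NTFS structures or APIs. One simplification to note is that the link count here is computed from the number of file name attributes, whereas the real file record stores it as its own field.

```python
from dataclasses import dataclass, field


@dataclass
class FileNameAttr:
    parent_record: int  # MFT record number of the parent directory
    name: str


@dataclass
class FileRecord:
    # One file name attribute per directory that links to this file.
    names: list = field(default_factory=list)

    @property
    def link_count(self) -> int:
        # Simplification: derived here, stored as a field in the real record.
        return len(self.names)


@dataclass
class DirRecord:
    # Index entries: file name -> MFT record number of the child file.
    index: dict = field(default_factory=dict)


def add_link(mft: dict, dir_no: int, file_no: int, name: str) -> None:
    """Link a file record into a directory (both sides of the relationship)."""
    mft[dir_no].index[name] = file_no
    mft[file_no].names.append(FileNameAttr(dir_no, name))


def build_example() -> dict:
    """Recreate the hard-linked layout: Dir1 and Dir2 both point at 100a."""
    mft = {0x0025: DirRecord(), 0x005A: DirRecord(), 0x100A: FileRecord()}
    add_link(mft, 0x0025, 0x100A, "File1.txt")
    add_link(mft, 0x005A, 0x100A, "File1.txt")
    return mft


if __name__ == "__main__":
    mft = build_example()
    print(hex(mft[0x0025].index["File1.txt"]))  # record Dir1 points at
    print(hex(mft[0x005A].index["File1.txt"]))  # record Dir2 points at
    print(mft[0x100A].link_count)               # number of parent links
```

In the non-linked layout from the first example, the only change would be a second `FileRecord` at 15ab, with Dir2's index entry pointing there instead.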

Hopefully the new diagrams combined with the older ones will help you to properly visualize what NTFS is doing. To really get your head around it, it is essential to stop thinking about ‘the real copy of the file’ or ‘the file being IN the directory’.

Finally, when looking at the two link diagrams side-by-side…

clip_image010

…you might be asking yourself, “How is the hard link different than the normal link relationship?”

The answer is that it isn’t. Technically EVERY file is hard linked. We just reserve the term for talking about files that have more than one directory linked to them.
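The point that every file already has a link count of at least one can be checked from a script. The following is a minimal sketch using Python's standard `os.link` and `os.stat`; the directory and file names are just the ones from the diagrams, created in a temporary folder, and it assumes the temporary folder lives on a file system that supports hard links (NTFS does).

```python
import os
import tempfile


def link_counts():
    """Create Dir1\\File1.txt, hard link it into Dir2, and report link counts."""
    with tempfile.TemporaryDirectory() as root:
        dir1 = os.path.join(root, "Dir1")
        dir2 = os.path.join(root, "Dir2")
        os.mkdir(dir1)
        os.mkdir(dir2)

        first = os.path.join(dir1, "File1.txt")
        second = os.path.join(dir2, "File1.txt")
        with open(first, "w") as f:
            f.write("hello")

        before = os.stat(first).st_nlink   # an ordinary file: one link
        os.link(first, second)             # add a second name for the same file
        after = os.stat(first).st_nlink    # same file record, now two links
        same = os.path.samefile(first, second)
        return before, after, same


if __name__ == "__main__":
    print(link_counts())
```

Nothing about the first file changes when the second name is added, other than the count going up, which is why there is no ‘real copy’ to point at.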

Now moving forward, let’s look at some real world information. Simple names like Dir1 and File1.txt are fine to start off with but we need to relate it to what’s in the Windows directory. We can do this with some easy substitutions.

Dir1 = c:\windows\system32

Dir2 = C:\Windows\winsxs\amd64_microsoft-windows-securestartup-service_31bf3856ad364e35_6.1.7600.16385_none_c09aa5b3bec88beb

File1.txt = bdesvc.dll

And I kept them color coded to keep it easier to follow.

I dumped out the metadata for the file bdesvc.dll. I’ve simplified it for readability but you can see that it has two file name attributes, one that lists a parent directory of 280b and one that lists a parent directory of 124d.

_FILE_NAME {

_MFT_SEGMENT_REFERENCE ParentDirectory {

ULONGLONG SegmentNumber : 0x000000000000280b

USHORT SequenceNumber : 0x0001

..... FileName : "bdesvc.dll"

_FILE_NAME {

_MFT_SEGMENT_REFERENCE ParentDirectory {

ULONGLONG SegmentNumber : 0x000000000000124d

USHORT SequenceNumber : 0x0001

..... FileName : "bdesvc.dll"

And of course the metadata also showed the higher ‘link count’, meaning that there are two links pointing to the file record.

USHORT ReferenceCount : 0x0002

I dumped out the metadata for both 280b and 124d and found that they were the two directories that I’d expected (system32 and amd64_microsoft-windows-securestartup-service_31bf3856ad364e35_6.1.7600.16385_none_c09aa5b3bec88beb).

Joseph brought up an example of what would happen if a private hotfix were installed. Depending on how that was done it would sever the hardlink and put a new version of the file in the system32 directory. So we would end up with two copies of the file. The old one would still be under amd64_microsoft-windows-securestartup-service_31bf3856ad364e35_6.1.7600.16385_none_c09aa5b3bec88beb. And the new one would be in the system32 directory.

Later if you were to run ‘SFC /scannow’ Windows would remove the new copy and establish a new hard link using the file that was still stored under WinSxS.

When SFC runs it compares a checksum of the file against a copy of the checksum that Windows has squirreled away somewhere.

However if the one and only file were to become damaged, then SFC would fail with an error…

“Windows Resource Protection found corrupt files but was unable to fix some of them.

Details are included in the CBS.Log windir\Logs\CBS\CBS.log.”
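The sever-and-repair sequence described above can be imitated in a few lines. To be clear, this is not how SFC is implemented; it is a toy under stated assumptions: the file names `store.dll` and `projected.dll`, the helper names `digest` and `repair`, and the choice of SHA-256 as the checksum are all invented for the illustration.

```python
import hashlib
import os
import tempfile


def digest(path):
    """Checksum of a file's contents (SHA-256 is an assumption of this sketch)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def repair(store_path, projected_path, expected):
    """Re-establish the hard link if the projected file no longer matches."""
    if digest(projected_path) == expected:
        return "ok"
    if digest(store_path) != expected:
        # The one remaining good copy is damaged too: nothing to relink from.
        return "unrepairable"
    os.remove(projected_path)
    os.link(store_path, projected_path)
    return "relinked"


def demo():
    with tempfile.TemporaryDirectory() as root:
        store = os.path.join(root, "store.dll")          # stands in for WinSxS
        projected = os.path.join(root, "projected.dll")  # stands in for system32
        with open(store, "wb") as f:
            f.write(b"version 1")
        os.link(store, projected)
        expected = digest(store)

        # Toy "private hotfix": a brand new file is swapped into place,
        # which severs the link instead of modifying the shared file.
        tmp = projected + ".new"
        with open(tmp, "wb") as f:
            f.write(b"version 2")
        os.replace(tmp, projected)
        severed_links = os.stat(store).st_nlink

        result = repair(store, projected, expected)
        repaired_links = os.stat(store).st_nlink
        return severed_links, result, repaired_links


if __name__ == "__main__":
    print(demo())
```

The "unrepairable" branch corresponds to the case where the one and only copy is damaged, which is when the real tool reports that it could not fix some files.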

The other main concern was how to view disk space. That’s actually the easy one.

clip_image011

See the pie chart? It’s correct.

Okay, I’ll explain it a bit more in-depth than that.

There are two ways to view how much free space you have. The first way is to use the pie chart. The information in the pie chart actually comes from a special metafile named $BITMAP. This file maintains a list of all the clusters of the volume and whether they are in use or not. When a file needs space, $BITMAP is queried to see what is free. When space is allocated, $BITMAP is updated to show that the allocated clusters are now in use. Keep in mind that $BITMAP doesn’t track what files own what clusters. It only tracks what clusters are in use. So when we draw the pie chart, we just query $BITMAP to find out how many clusters we have and how many are free. This is also why the pie chart is populated so quickly. We just have to read a single file to build the chart.
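As a rough illustration of why a cluster bitmap answers the free-space question so quickly, here is a toy version. The function name `free_clusters`, the one-bit-per-cluster layout with the lowest bit first, and the 4096-byte cluster size in the example are assumptions of this sketch, not a parser for the real metafile.

```python
def free_clusters(bitmap: bytes, total_clusters: int) -> int:
    """Count clusters whose bit is 0 (free) in a one-bit-per-cluster map."""
    used = 0
    for cluster in range(total_clusters):
        byte_index, bit_index = divmod(cluster, 8)
        if (bitmap[byte_index] >> bit_index) & 1:
            used += 1
    return total_clusters - used


if __name__ == "__main__":
    # 16 clusters: the first four and the ninth are in use.
    toy_bitmap = bytes([0b00001111, 0b00000001])
    free = free_clusters(toy_bitmap, 16)
    cluster_size = 4096  # bytes per cluster, assumed for the example
    print(free, free * cluster_size)
```

Note that the map never says which file owns a cluster, so no directory walk is needed. From a script, Python's standard `shutil.disk_usage` likewise asks the operating system for volume-level totals rather than adding up files.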

The second way to get free space is what I refer to as “the wrong way”. That is to open a CMD prompt at the root directory and do a ‘dir /s’. This will list all the files on the volume that you have access to and add up the sizes at the end. This method is just plain wrong. A big part of why it is so wrong is that hardlinked files get counted more than once: once for each directory that is linked to them. The other big reason is that DIR will only list files that you have access to. Files in the System Volume Information directory will not be included. That’s a problem because that’s where the VSS snapshots are stored. And the special metafiles that are hidden from the user are also not listed in the total. So the space used by your MFT will not be listed, your security file ($SECURE) will not be listed, and so on. There’s just too much to take into account to get a truly accurate total by adding files together.
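The double counting is easy to reproduce. The sketch below adds up file sizes the way a recursive listing would, then again while skipping files it has already seen; `naive_total` and `deduped_total` are names made up for this example, and files are identified by the `(st_dev, st_ino)` pair that Python's `os.stat` reports.

```python
import os
import tempfile


def naive_total(root):
    """Add up sizes per directory entry, like a recursive listing does."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.stat(os.path.join(dirpath, name)).st_size
    return total


def deduped_total(root):
    """Add up sizes once per underlying file, however many names it has."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)  # identifies the file, not the name
            if key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total


def demo():
    with tempfile.TemporaryDirectory() as root:
        dir1 = os.path.join(root, "Dir1")
        dir2 = os.path.join(root, "Dir2")
        os.mkdir(dir1)
        os.mkdir(dir2)
        original = os.path.join(dir1, "File1.txt")
        with open(original, "wb") as f:
            f.write(b"x" * 1000)
        os.link(original, os.path.join(dir2, "File1.txt"))
        return naive_total(root), deduped_total(root)


if __name__ == "__main__":
    print(demo())
```

Even the de-duplicated total only fixes the first problem. It still cannot see files the caller has no access to or the hidden metafiles, so it is not a substitute for the volume-level numbers.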

I know it sounds like it should work but there are factors involved in storing your files that most people just don’t know about. As an example, Windows 2003 reserved about 12% of the volume for the MFT to have room to grow. So if you had a very large volume with just a few files, you might wonder where all your space was.

The takeaway from that is what I tell my customers and coworkers, “Trust the Pie Chart”.

I hope this has been helpful.

Robert Mitchell

High Availability

Enterprise Platform Support

Enjoy my writing? Here are other blog entries that I have authored…

http://blogs.technet.com/askcore/archive/2009/10/16/the-four-stages-of-ntfs-file-growth.aspx

http://blogs.technet.com/askcore/archive/2009/12/30/ntfs-metafiles.aspx

http://blogs.technet.com/b/askcore/archive/2010/08/25/ntfs-file-attributes.aspx

http://blogs.technet.com/b/askcore/archive/2010/10/08/gpt-in-windows.aspx

http://blogs.technet.com/b/askperf/archive/2010/12/03/performance-counter-for-iscsi.aspx

http://blogs.technet.com/b/joscon/archive/2011/01/06/how-hard-links-work.aspx

http://blogs.technet.com/b/askcore/archive/2011/04/07/gpt-and-failover-clustering.aspx

http://blogs.technet.com/askcore/archive/2010/02/18/understanding-the-2-tb-limit-in-windows-storage.aspx

Comments
  • Nice to know that information. I wish all methods in Explorer and cmd-line tools of reporting free disk space and calculating size accurately were updated in Windows 8 so there are no discrepancies and no inaccuracies.

  • Thanks Robert, very informative.

    "See the pie chart? Its correct."

    Yes, but should the pie chart be a bar chart instead? simplecomplexity.net/pie-chart-arguments

    "The second way to get free space is what I refer to as “the wrong way”. That is to open a CMD prompt at the root directory and do a ‘dir /s’."

    It's odd that the GUI interface is accurate, and the CLI isn't. How about a new command?

    > bitmap <drive>[\path] [/free | /used | /both] [/units:<Bytes|KB|MB|GB|%>] [/b]

  • "To his credit Joseph has been asking me about revisiting this topic for months."

    It was me who kept bugging Joseph who then kept bugging you. So you can smack me.

    THANK YOU THANK YOU THANK YOU THANK YOU for finally doing this. And a big thanks to Joseph for continuing to bug you.

    You know the one statement that really cleared things up for me ?

    "We just reserve the term for talking about files that have more than one directory linked to them."

    To you guys that statement is nothing. No big deal.

    But to the rest of us it is a BIG deal. Why ? Because it clarifies the technical terminology. Terminology clarification makes a HUGE difference in trying to understand something. This needs to be done much more often in Microsoft documentation. I may be wrong but I think you are the only person to have ever made that statement in all of the talk about hard links. And before you made that statement I was going to ask why the hardlink method wasn't used since the NT 4.0 days because it appears to save a lot of space in the MFT. Although I still want to know why your first example where there is more than one entry in the MFT for a file would ever happen.

    It also seems to me that having only one entry for a file in the MFT is not as safe as having multiple entries in the MFT because if the one entry gets corrupted the file is lost.

    Now, I want to make sure that I finally understand how this relates to the WinSxS directory.

    The WinSxS directory has entries ( index entries ) ( for each subdirectory in WinSxS ) that point to metadata that has the linked list of clusters of files that were laid down when windows was installed to the hard drive. The clusters should have no relationship to the way the files are laid out in the WIM file. There is one entry ( index entry ) for each file in WinSxS. When Windows is being installed to the hard drive AFTER the WinSxS directory structure is laid down to the drive the install routine then goes back and creates the Windows directory folder structure and populates the subdirectories with index entries that point to whatever files that subdirectory should be shown to have when looking at it in Explorer. This is different from the way XP was installed because when XP was installed the install routine just laid down the files to the hard drive and then populated the directories with entries ( index entries ) that pointed to the files metadata.

    So in reality the only difference between the way it was done in XP and the way it is done in Windows 7 ( and Vista but we won't count that :-) ) is that in Windows 7 the WinSxS directories act as the last reference to the files as a backup.

    Correct me if I'm wrong on anything.

  • Drewfus,

    There is a CMD way to get the free space that does use the $BITMAP method.  

    'fsutil volume diskfree c:'

    Just be aware that if you go to compare the GUI and CMD methods you have to do them quickly.  Free space, especially on your system volume is in constant flux.

    Having a command like what you are suggesting wouldn’t work currently because the $BITMAP file is per volume, not per directory.  For your suggestion to work we would have to add a $BITMAP attribute to every directory….and that would end up being an extreme performance hit.

  • Dean,

    Actually I’m glad that Joseph kept reminding me.  I just had a great deal of content creation in the last 6 months.  So my time was stretched pretty thin.

    > Although I still want to know why your first example where there is more than one entry in the MFT for a file would ever happen.

    The first example was of two files in two different directories that just happened to have the same name.  Since each instance of the file is unique, each gets an entry in the MFT.  Hardlinks are the exception.  They allow you to have one file that is seemingly in two or more places at once.  As such it only gets a single entry in the MFT.

    I can’t address your installation question.  That’s a bit outside my area.  Perhaps Joseph can handle that one.

  • Dean;

    The installation methods are very different actually.  XP was a flat file copy process, we had an ordered list of files that were expanded onto the disk one at a time.  Vista ++ explodes the install.wim to its given directory structure but it's not a flat file copy.  The foundation packages are laid down first and then the SKU differentiating packages that make up your Windows edition are laid down.  From there they are parented with the servicing stack and then projected to the appropriate directories using hard links.

  • When I said:

    "So in reality the only difference between the way it was done in XP and the way it is done in Windows 7 ( and Vista but we won't count that :-) ) is that in Windows 7 the WinSxS directories act as the last reference to the files as a backup."

    I meant AFTER the files were laid down and the installation was finished. So am I right ?

    Also there is a major problem with your new site design. Once you get past about 5 lines of comments everything slows WAY down and you can only type one character a second and after about 8 lines you can't see what you're typing anymore but it's there.

  • I'll check into performance, I haven't seen any issues with the new layout though, perhaps others can comment.

    As for the post installation piece, the component store doesn't really act as a last reference, that's what the \winsxs\backup folder is for.  The OS structure is completely different due to the way the componentization of the OS works.  The on disk structure is fundamentally the same though.  Maybe I am missing something in the question though.

    --Joseph

  • I thought the whole point of the WinSxS directory was to act as a backup of the installation files to be able to replace them from the WinSxS directory ( using hard links ) in case they needed to be replaced for some reason. In this regard Windows 7 would be different from Windows XP in that Windows XP had no directory installed that contained a backup of the installation files.

  • It's not used as a backup, think of it more as a flat directory of the files (similar to what you would have done in XP when you wanted the installation files locally).

  • So the \winsxs\backup directory is the backup and is where the SFC gets its files from to hardlink ? The WinSxS directory is just an installation source like copying the XP CD to the hard drive ? If so why can't the WinSxS directory also act as the backup ? Why make another redundant directory just for backup ?

  • @Dean

    Windows XP HAD a folder with backups. This was the DLLCache folder, which SFC used to restore files.

  • The component store backup directory is there in the event that one of the files in the component store is also corrupted, it holds the manifests for the different components.  It's there for redundancy.  All of the payload is still in the component store itself.  

    You can see that the backup folder isn't hardlinked to the component store using fsutil.  For example, if you check the kernel for hard links, you'll get two:

    C:\>fsutil hardlink list c:\windows\system32\ntoskrnl.exe

    \Windows\System32\ntoskrnl.exe

    \Windows\winsxs\amd64_microsoft-windows-os-kernel_31bf3856ad364e35_6.1.7601.17640_none_ca31f809cade8847\ntoskrnl.exe

    But if I needed to recover the manifest that tells me what's in that payload, then I could potentially use the backup folder to do so (if it were corrupted in the component store).

    Directory of C:\Windows\winsxs\Backup

    08/16/2011  03:00 AM         5,561,216 amd64_microsoft-windows-os-kernel_31bf3856ad364e35_6.1.7601.17640_none_ca31f809cade8847_ntoskrnl.exe_0fb0ab79

    08/16/2011  03:00 AM         3,912,576 x86_microsoft-windows-os-kernel_31bf3856ad364e35_6.1.7601.17640_none_6e135c8612811711_ntoskrnl.exe_0fb0ab79

                  2 File(s)      9,473,792 bytes

    You can see that I have backup manifests for the 32 and 64 bit kernel there in case I need them.

    --Joseph

  • The backup folder inside WinSxS is used to get Windows working again if some critical files are corrupted which Windows needs to boot into safe mode. This is called Windows Resource Protection:

    "WRP copies files that are needed to restart Windows in the cache directory located at %Windir%\winsxs\Backup. Critical files that are not needed to restart Windows are not copied to the cache directory. The size of the cache directory and the list of files copied to cache cannot be modified."

    msdn.microsoft.com/.../aa382530%28v=vs.85%29.aspx

  • Crying :-(

    1) Can you give a definition of "manifest" ? One that's easy to understand.

    2) The %Windir%\winsxs\Backup directory is ONLY for backing up the most critical windows files needed to get windows booted ?

    3) SFC DOES NOT use the %Windir%\winsxs\Backup directory ?

    4) I thought WRP was for ALL of the windows files not just the most critical ?

    5) Where does the SFC get its replacement files from ? Yes, I know the "replacements" are hard links.

    6) The WinSxS directory is ONLY there for installing OS components that were not chosen during the initial installation so that you do not have to put the DVD in ?