There are times when the Windows shell gives you incorrect information - not because it likes lying, but because it is being given this information by a lower-level component.
Such is the case with hard links, which NTFS supported long before Windows Vista but which Vista finally made easy to create with its new mklink command.
A hard link is an additional directory entry that points to an already-existing file. It displays all the properties of the file exactly as the original name does, but the data itself is not copied.
Because a hard link looks no different from a regular file, any enumeration of a folder that contains several links to the same file will overestimate the disk space used.
Here I created a "test" folder on my desktop and copied a single ~100 MB file, "test.dat", into it (a large file makes the effect easier to see):
C:\Users\padams\Desktop\test>dir
 Volume in drive C has no label.
 Volume Serial Number is A8D0-FCE3

 Directory of C:\Users\padams\Desktop\test

2008-10-10  16:23    <DIR>          .
2008-10-10  16:23    <DIR>          ..
2008-10-10  16:23       104 921 808 test.dat
               1 File(s)    104 921 808 bytes
               2 Dir(s)  54 231 470 080 bytes free
I then made 10 hard links to this file, copy01.dat through copy10.dat:
C:\Users\padams\Desktop\test>mklink /h copy01.dat test.dat
Hardlink created for copy01.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy02.dat test.dat
Hardlink created for copy02.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy03.dat test.dat
Hardlink created for copy03.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy04.dat test.dat
Hardlink created for copy04.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy05.dat test.dat
Hardlink created for copy05.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy06.dat test.dat
Hardlink created for copy06.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy07.dat test.dat
Hardlink created for copy07.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy08.dat test.dat
Hardlink created for copy08.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy09.dat test.dat
Hardlink created for copy09.dat <<===>> test.dat
C:\Users\padams\Desktop\test>mklink /h copy10.dat test.dat
Hardlink created for copy10.dat <<===>> test.dat
Now see how the folder contents are presented:
2008-10-10  16:26    <DIR>          .
2008-10-10  16:26    <DIR>          ..
2008-10-10  16:23       104 921 808 copy01.dat
2008-10-10  16:23       104 921 808 copy02.dat
2008-10-10  16:23       104 921 808 copy03.dat
2008-10-10  16:23       104 921 808 copy04.dat
2008-10-10  16:23       104 921 808 copy05.dat
2008-10-10  16:23       104 921 808 copy06.dat
2008-10-10  16:23       104 921 808 copy07.dat
2008-10-10  16:23       104 921 808 copy08.dat
2008-10-10  16:23       104 921 808 copy09.dat
2008-10-10  16:23       104 921 808 copy10.dat
2008-10-10  16:23       104 921 808 test.dat
              11 File(s)  1 154 139 888 bytes
               2 Dir(s)  54 231 461 888 bytes free
Look at the number of bytes allegedly used (11 x 104 921 808 = 1 154 139 888), then compare the "bytes free" values from the two listings: free space dropped by only 8 192 bytes (54 231 470 080 - 54 231 461 888), so making 10 hard links clearly did not consume another 1 GB.
The command prompt is not "faulty": Explorer is given exactly the same information about these files, so it arrives at the same inflated total.
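The same experiment can be reproduced from code. This is a minimal sketch, assuming Python's os.link, which creates an NTFS hard link on Windows (much as mklink /h does) and a POSIX hard link elsewhere; the 1 MB size and the helper names naive_total and build_demo are illustrative, not part of any tool:

```python
import os
import tempfile


def naive_total(folder):
    """Sum file sizes the way a folder listing does: every name counts in full."""
    total = 0
    for entry in os.scandir(folder):
        if entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
    return total


def build_demo(folder, size=1024 * 1024, links=10):
    """Create test.dat, then copy01.dat..copyNN.dat as hard links to it."""
    original = os.path.join(folder, "test.dat")
    with open(original, "wb") as f:
        f.write(os.urandom(size))
    for i in range(1, links + 1):
        # No data is copied: each call only adds another name for the same file.
        os.link(original, os.path.join(folder, f"copy{i:02d}.dat"))
    return original


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        original = build_demo(folder)
        print("names listed:", len(os.listdir(folder)))
        print("apparent bytes:", naive_total(folder))
        print("link count of test.dat:", os.stat(original).st_nlink)
```

With the defaults the folder lists 11 names and an apparent 11 x 1 048 576 bytes, while test.dat reports a link count of 11: one file, eleven names.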
Most of the contents of %systemroot%\System32 are actually hard links to files in folders under %systemroot%\winsxs, the "Side-by-Side" component store.
This means that if you view the properties of the %systemroot% folder, you can roughly subtract the size of the %systemroot%\System32 folder to get a more accurate total, since most of those bytes are already counted under winsxs.
On a side note, this post explains what the winsxs folder is and why it is so large: http://blogs.technet.com/askcore/archive/2008/09/17/what-is-the-winsxs-directory-in-windows-2008-and-windows-vista-and-why-is-it-so-large.aspx
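Instead of subtracting System32 by hand, a tool can count each underlying file only once by remembering its identity. A sketch under stated assumptions: it keys every file on (st_dev, st_ino), which recent Python versions fill from the volume serial number and NTFS file index on Windows when os.stat is called directly (the cached results from os.scandir leave them at zero, so they are not used here). The function name unique_size is hypothetical, and files it cannot open are simply skipped:

```python
import os
import stat


def unique_size(root):
    """Return (apparent_bytes, unique_bytes) for every regular file under root.

    apparent_bytes counts each directory entry, as folder properties do;
    unique_bytes counts each underlying file once, keyed on (st_dev, st_ino).
    """
    apparent = 0
    unique = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # e.g. access denied: skip rather than abort the walk
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symbolic links and other non-regular entries
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:  # first name seen for this file: count its data
                seen.add(key)
                unique += st.st_size
    return apparent, unique
```

The gap between the two numbers is the space that hard links make a plain enumeration count more than once.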
So why use hard links? To save disk space when multiple copies of the same file are useful, especially if the file contents change and all references need to stay in sync.
You could use shortcuts (.lnk files) or symbolic (soft) links instead, but these are separate objects on disk that point to the target by its path, and shortcuts in particular have to be understood by the application opening them: an application from the mid-90s, for example, would try to open a .lnk file as if it were the data itself, with no idea that it is only a pointer to where the file really is.
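The difference shows up as soon as a program inspects the entries. A minimal sketch, assuming a system where os.symlink is allowed (on Windows this typically needs administrator rights or Developer Mode); inspect_links is a hypothetical helper:

```python
import os
import tempfile


def inspect_links(folder):
    """Create test.dat plus a hard link and a symbolic link to it, and report
    how each one looks to a program that inspects it."""
    target = os.path.join(folder, "test.dat")
    hard = os.path.join(folder, "hard.dat")
    soft = os.path.join(folder, "soft.dat")
    with open(target, "wb") as f:
        f.write(b"payload")
    os.link(target, hard)     # a second name for the same file
    os.symlink(target, soft)  # a separate entry that records the target's path
    return {
        "hard_is_link": os.path.islink(hard),              # False: looks like any file
        "soft_is_link": os.path.islink(soft),              # True: recognizable as a link
        "soft_target": os.readlink(soft),                  # the stored path
        "hard_same_file": os.path.samefile(target, hard),  # same underlying file
        "nlink": os.stat(target).st_nlink,                 # 2: the symlink is not counted
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(inspect_links(d))
```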
The file itself is only deleted when ALL references to it have disappeared.
So in the above example I could delete test.dat and copy01.dat through copy09.dat, and the data would not be deleted, as it is still accessible through copy10.dat.
Once copy10.dat is deleted, the disk space is marked as free again.
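The reference counting can be watched from code. A sketch, assuming os.stat reports the link count in st_nlink (as it does for NTFS and POSIX file systems); countdown is a hypothetical helper that recreates the demo with a tiny file, then deletes test.dat, copy01.dat and so on, checking copy10.dat after each deletion:

```python
import os
import tempfile


def countdown(folder, links=10):
    """Delete every name except the last, recording the last name's link count."""
    original = os.path.join(folder, "test.dat")
    with open(original, "wb") as f:
        f.write(b"hello")
    names = [original] + [
        os.path.join(folder, f"copy{i:02d}.dat") for i in range(1, links + 1)
    ]
    for name in names[1:]:
        os.link(original, name)
    last = names[-1]
    counts = [os.stat(last).st_nlink]  # all names present
    for name in names[:-1]:
        os.remove(name)  # drop one reference; the data stays
        counts.append(os.stat(last).st_nlink)
    with open(last, "rb") as f:
        data = f.read()  # still readable through the last name
    os.remove(last)  # last reference gone: the space is freed
    return counts, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(countdown(d))
```

The counts run 11, 10, ... down to 1, and the data stays readable through copy10.dat until that final name is removed.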
So, hypothetically speaking: I have a 100 GB drive with 60 GB of data, plus 30 GB of alleged space because of hard links, so my computer would register 90 GB used while only 60 GB is actually used. What would happen if I tried to copy 30 GB of data to this drive? Would the current design of the operating system prevent the transfer? If so, this is a major issue with hard links in the OS. If not, what would the drive display after successfully transferring the data?
"I have a 100 GB drive, 60 GB of data, 30 GB of alleged space because of hard links, my computer would register 90 GB used while only 60 GB is actually used"
> The used/free space per volume, as viewed through the Computer view or Disk Management, is correct: it is based on allocated blocks, not on a (double) enumeration of logical files. So in your example the "disk space used" should be reported as ~60GB.
"So what would happen if I tried to copy 30 GB of data to this drive?"
> It would succeed, and the disk space used would change to ~90GB.
"Would the current design of the operating system prevent the transfer?"
> Nope, this issue is only with enumerating the "shortcuts to data on disk" and then summing their total size. The _free_ disk space per volume is not calculated this way (or it would take a VERY long time to work out on every reboot or every time you browse a volume).
"If not, what would the drive display after successfully transferring the data?"
> The drive would show 30GB less free disk space than whatever it said before you started - so ~90GB used as opposed to ~60GB used.
If you extrapolate the demo from the blog entry, the alleged bytes on disk used by hard links can even be greater than the total disk space.
The amount of allocated disk space as seen by Computer / Disk Management is correct - the issue comes with trying to enumerate where this space is actually used.
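This can be checked with a sketch: shutil.disk_usage asks the file system for the volume's totals (via GetDiskFreeSpaceExW on Windows and statvfs elsewhere), so adding hard links barely moves the free-space figure even though a per-file sum grows with every link. The 1 MB size, the 10 links and the helper name measure_link_cost are arbitrary, and other activity on the volume will add some noise to the measurement:

```python
import os
import shutil
import tempfile


def measure_link_cost(folder, size=1024 * 1024, links=10):
    """Return (apparent_growth, free_space_drop) caused by adding hard links."""
    original = os.path.join(folder, "test.dat")
    with open(original, "wb") as f:
        f.write(os.urandom(size))
        f.flush()
        os.fsync(f.fileno())  # make sure the data is allocated before measuring
    free_before = shutil.disk_usage(folder).free
    for i in range(1, links + 1):
        os.link(original, os.path.join(folder, f"copy{i:02d}.dat"))
    free_after = shutil.disk_usage(folder).free
    apparent_growth = links * size              # what a per-file sum would add
    free_space_drop = free_before - free_after  # what the volume actually lost
    return apparent_growth, free_space_drop


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(measure_link_cost(d))
```

Expect the apparent growth to be 10 MB while the free-space drop stays at a few kilobytes at most, in line with the 8 192 bytes seen in the dir listings above.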
When you delete the hard link, even if it's the only reference to the original file, the file itself is not deleted.
I have a question about folder replication.
Let's say I have a folder which contains the "original" file C:\source\file.xxx.
I make a hard link c:\toreplicate\file.xxx in a folder which also contains normal files.
Then I set up DFSR replication of the folder c:\toreplicate to another server.
What happens to the hard-linked file?
- Is it skipped?
- Is it copied as a hard link?
- Is it copied as the original file?
"When you delete the hard link, even if it's the only reference to the original file, the file itself is not deleted."
Not quite sure I follow this: if there is only 1 reference to a file, then deleting it marks the disk space as free.
For all intents and purposes, both are "real" files and report the same information & (committed) contents.
If it helps, consider any file as being a single hard link to the allocated disk space; subsequent hard links are just extra references to the same data and metadata.
(For the programmer-types out there, a file gets deleted when the reference count is zero, and creating a hard link to a file is increasing the reference count by one.)
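For the programmer-types, the model can be written down directly. This is a toy in-memory sketch, not how NTFS actually stores anything: a hypothetical Volume maps names to shared file records, each record keeps a link count, and its space goes back to the free pool only when that count reaches zero. The usage at the bottom replays the 100 GB / 60 GB / 30 GB scenario from the comments, in GB units:

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    size: int
    links: int = 0


class Volume:
    """Toy model: directory entries (names) point at shared FileRecords."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0      # allocated space, what the volume reports
        self.entries = {}  # name -> FileRecord

    def create(self, name, size):
        if self.used + size > self.capacity:
            raise OSError("disk full")
        self.entries[name] = FileRecord(size, links=1)
        self.used += size

    def link(self, existing, new_name):
        record = self.entries[existing]
        record.links += 1  # no data copied, no space allocated
        self.entries[new_name] = record

    def delete(self, name):
        record = self.entries.pop(name)
        record.links -= 1
        if record.links == 0:  # last reference gone: free the space
            self.used -= record.size

    def apparent_used(self):
        # What summing the size of every name would report.
        return sum(r.size for r in self.entries.values())


if __name__ == "__main__":
    vol = Volume(capacity=100)
    vol.create("a", 30)
    vol.create("b", 30)
    vol.link("a", "a_link")
    print(vol.apparent_used(), vol.used)  # apparent 90, allocated 60
    vol.create("c", 30)                   # succeeds: allocation is checked, not the sum
    print(vol.used)
```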
Mark Russinovich has produced a "Disk Usage" tool (du.exe) which, when run elevated, gives a more accurate view of the sum of all file sizes and of the actual disk space (partial and full clusters) they use.
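The "partial and full clusters" part is simple arithmetic: a file occupies whole clusters, so its size on disk is its length rounded up to the cluster size. A sketch assuming a 4096-byte cluster (the usual NTFS default for volumes of this size, but check your own volume) and ignoring tiny files that NTFS can store inside the MFT record, as well as compressed or sparse files; size_on_disk and total_on_disk are hypothetical helpers:

```python
def size_on_disk(length, cluster=4096):
    """Round a file length up to whole clusters: full clusters plus any partial one."""
    full, partial = divmod(length, cluster)
    return (full + (1 if partial else 0)) * cluster


def total_on_disk(files, cluster=4096):
    """files: iterable of (file_id, length); each file_id is counted only once,
    so several hard links to one file add its clusters a single time."""
    unique = {}
    for file_id, length in files:
        unique[file_id] = length
    return sum(size_on_disk(n, cluster) for n in unique.values())
```

For test.dat above, 104 921 808 bytes round up to 25 616 clusters, or 104 923 136 bytes on disk, and that figure should be counted once no matter how many links point at the file.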
Brilliant article, very clearly explained