Blog - Title

September, 2007

  • Where’s my file? Root cause analysis of FRS and DFSR data deletion

    Hi, Ned here. In the Directory Services support space here at Microsoft, we are often contacted by customers for disaster recovery scenarios. We’re also brought in for deeper forensic analysis of what lead to a problem. Today we’re going to talk about a situation that covers both:

    • A customer has seen some critical data go missing.
    • That data was replicated via the File Replication Service (FRS) or the Distributed File System Replication (DFSR) Service.
    • Before they restore the data with their backup copy, they want to have root cause on who deleted what and where it started. We can’t do this after restoring data because our whole audit trail will of course be destroyed within the respective JET databases.

    FRS Deletion Forensics – The Where and When

    You need to start by determining the name of some folder or file that has been deleted. It's important that this be exact as we will be using it to search. You will need the full original path since it is possible just the name could be duplicated throughout the content set.

    • For this example we have three servers called 2003SRV13, 2003SRV16, and 2003SRV17.
    • We have a folder called c:\frstestlink\importantfolder13 that has been deleted.
    • It contained a file called c:\frstestlink\importantfolder13\importantfile13.doc which was deleted (naturally). Our folder could contain thousands of files but we just need to know one. That’s easy, someone is screaming at you that it’s missing. :)

    Install FRSDIAG on any server that participated in the FRS content set where data was deleted.

    Open a CMD prompt and navigate to the FRSDIAG directory. This will default to:

    c:\program files\windows resource kit\tools\frsdiag

    You’ll see that we have a very useful utility called NTFRSUTL.EXE. Running it with /? will show you its options:

    ntfrsutl [idtable| configtable | inlog | outlog] [computer] = enumerate the service's idtable/configtable/inlog/outlog
    computer = talk to the NtFrs service on this machine.

    ntfrsutl [memory|threads|stage] [computer]
    = list the service's memory usage
    computer = talk to the NtFrs service on this machine.

    ntfrsutl ds [computer]
    = list the service's view of the DS
    computer = talk to the NtFrs service on this machine.

    ntfrsutl sets [computer]
    = list the active replica sets
    computer = talk to the NtFrs service on this machine.

    ntfrsutl version [computer]
    = list the api and service versions
    computer = talk to the NtFrs service on this machine.

    ntfrsutl forcerepl [computer] /r SetName /p DnsName
    = Force FRS to start a replication cycle ignoring the schedule.
    = Specify the SetName and DnsName.
    computer = talk to the NtFrs service on this machine.
    SetName = Name of the replica set.
    DnsName = DNS name of the inbound partner to force repl from.

    ntfrsutl poll [/quickly[=[N]]] [/slowly[=[N]]] [/now] [computer]
    = list the current polling intervals.
    now = Poll now.
    quickly = Poll quickly until stable configuration retrieved.
    quickly= = Poll quickly every default minutes.
    quickly=N = Poll quickly every N minutes.
    slowly = Poll slowly until stable configuration retrieved.
    slowly= = Poll slowly every default minutes.
    slowly=N = Poll slowly every N minutes.
    computer = talk to the NtFrs service on this machine.

    Cool stuff – you can force replication, list out various FRS configuration info, or get performance stats. It’s great for scripting.

    In order to figure what happened, we will dump out some tables from the FRS JET database by executing:

    NTFRSUTL OUTLOG > outlog.txt
    NTFRSUTL IDTABLE > idtable.txt

    (Note: the output from NTFRSUTL IDTABLE is not the same as collecting IDTABLE information with FRSDIAG’s GUI console).

    We then start FRSDIAG and click the 'Browse' button. Drop down the Replica Set and select the one that contained deleted data. Click 'Add All' to add the members. Then click ok.

    Click 'Tools' then select 'Build GUID2Name for Target Server(s)'.

    This will create us a text file that lists out GUID and its associated SERVER NAME, like so:

    ======================================================
    Replica Set GUID : 6f83352f-f404-4eda-a714ae1691e3e9d8
    Replica Set Name : FRSTEST|FRSTESTLINK
    ======================================================

    GUID                                MEMBER NAME                            SERVER NAME
    ----                                -----------                            -----------
    30409f5d-8493-41ad-a98ab03fc1b795e5 {6AEC89F5-24B5-4C1B-B15E-6EFE5A60B75C} 2003srv16
    e8219dee-532a-4dff-83f09f036e331daa {6F617C11-2997-4134-952B-5B3572D4AF70} 2003srv17
    e8feaedc-6bce-41f4-94c39cade3932da8 {88600E1E-AAF8-4C70-A1C3-A36EB471E2B3} 2003srv13

    We open the IDTABLE.TXT file and search for our file we know was deleted:

    Table Type: ID Table for FRSTEST|FRSTESTLINK (1)
    FileGuid                     : 3647d318-502f-11dc-a1070003ff6813c5
    FileID                       : 00130000 00003e45
    ParentGuid                   : 2d7a8327-7308-4464-a63e367e39c27690 << Folder it was in
    ParentFileID                 : 000d0000 00003e37
    VersionNumber                : 00000001
    EventTime                    : Wed Aug 22, 2007 11:57:26 << when deleted
    OriginatorGuid               : 30409f5d-8493-41ad-a98ab03fc1b795e5
    << where deleted

    OriginatorVSN                : 01c7e4c3 ea6dd496
    CurrentFileUsn               : 00000000 001be408
    FileCreateTime               :
    FileWriteTime                :
    FileSize                     : 00000000 000000a0
    FileObjID                    : 3647d318-502f-11dc-a1070003ff6813c5
    FileName                     : importantfile13.doc << here's our file
    FileIsDir                    : 00000000
    FileAttributes               : 00000020 Flags [ARCHIVE ]
    Flags                        : 00000001 Flags [DELETED ] << Proof of deletion
    ReplEnabled                  : 00000001
    TombStoneGC                  : Sun Oct 21, 2007 11:57:26
    OutLogSeqNum                 : 00000000 00000000
    Spare1Ull                    : 00000000 00000001
    MD5CheckSum                  : MD5: a41eea20 979f04e9 dff7592a e8dc3e8b
    RetryCount                   : 0
    FirstTryTime                 :

    We then look in the OUTLOG.TXT to confirm the folder matches up using ParentGuid above:

    Table Type: Outbound Log Table for FRSTEST|FRSTESTLINK (1)
    SequenceNumber               : 0000008d
    Flags                        : 01000024 Flags [Content LclCo CmpresStage ]
    IFlags                       : 00000001 Flags [IFlagVVRetireExec ]
    State                        : 00000014  CO STATE:  IBCO_OUTBOUND_REQUEST
    ContentCmd                   : 00002000 Flags [RenNew ]
    Lcmd                         : 0000000f  D/F 1   NoCmd
    FileAttributes               : 00000010 Flags [DIRECTORY ]
    FileVersionNumber            : 00000001
    PartnerAckSeqNumber          : 00000000
    FileSize                     : 00000000 00000000
    FileOffset                   : 00000000 00000000
    FrsVsn                       : 01c7e44b df699803
    FileUsn                      : 00000000 001bd690
    JrnlUsn                      : 00000000 001bd690
    JrnlFirstUsn                 : 00000000 001bd690
    OriginalReplica              : 1  [???]
    NewReplica                   : 1  [???]
    ChangeOrderGuid              : cf3bc76b-b3e1-4e72-ae8962371bb48501
    OriginatorGuid               : e8feaedc-6bce-41f4-94c39cade3932da8
    FileGuid                     : 2d7a8327-7308-4464-a63e367e39c27690 << there’s our GUID
    OldParentGuid                : 53605485-4dd4-4b9a-bc5a022760515559
    NewParentGuid                : 53605485-4dd4-4b9a-bc5a022760515559
    CxtionGuid                   : 92f5d906-cd45-4639-973021461e454c8b
    Spare1Ull                    :
    MD5CheckSum                  : MD5: b68d5ccf 21f8b5dd e7eb48f1 f45b01d9
    RetryCount                   : 0
    FirstTryTime                 : Wed Aug 22, 2007 11:55:21
    EventTime                    : Wed Aug 22, 2007 11:55:18
    FileNameLength               :       34
    FileName                     : ImportantFolder13 << definitely our folder that was deleted
    Cxtion Name                  : <Jrnl Cxtion> <- <Jrnl Cxtion>\<Jrnl Cxtion>
    Cxtion State                 : Joined
     

    Then we can verify back in the IDTABLE.TXT what the time and source were:

    Table Type: ID Table for FRSTEST|FRSTESTLINK (1)
    FileGuid                     : 2d7a8327-7308-4464-a63e367e39c27690
    FileID                       : 000d0000 00003e37
    ParentGuid                   : 53605485-4dd4-4b9a-bc5a022760515559
    ParentFileID                 : 00030000 000039bc
    VersionNumber                : 00000002
    EventTime                    : Wed Aug 22, 2007 11:57:23 << there's the delete time
    OriginatorGuid               : 30409f5d-8493-41ad-a98ab03fc1b795e5 << here's the source of the delete
    OriginatorVSN                : 01c7e4c3 ea6dd495
    CurrentFileUsn               : 00000000 001beba8
    FileCreateTime               :
    FileWriteTime                :
    FileSize                     : 00000000 00000000
    FileObjID                    : 2d7a8327-7308-4464-a63e367e39c27690
    FileName                     : Dc9
    FileIsDir                    : 00000001
    FileAttributes               : 00000010 Flags [DIRECTORY ]
    Flags                        : 00000001 Flags [DELETED ] << confirmed that it's been deleted
    ReplEnabled                  : 00000001
    TombStoneGC                  : Sun Oct 21, 2007 11:57:23
    OutLogSeqNum                 : 00000000 00000000
    Spare1Ull                    : 00000000 00000000
    MD5CheckSum                  : MD5: b68d5ccf 21f8b5dd e7eb48f1 f45b01d9
    RetryCount                   : 0
    FirstTryTime                 :

    ===

    Now we look back at the GUID2Name table we generated earlier, we can see that:

    30409f5d-8493-41ad-a98ab03fc1b795e5 = 2003SRV16.

    So we know that at Wed Aug 22, 2007 11:57:23 AM on server 2003SRV16, something or someone deleted all this data. Wasn’t that easy? :) If we had object access auditing enabled on that server at the time and the folder configured for auditing, we can even see who did it. More on this later…

    DFSR Deletion Forensics – The Where and When

    Since DFSR exposes nearly all its interfaces through WMI, we can use a powerful command-line utility called WMIC that can be used to return useful info from the databases. This way we don’t need to rely on add-on tools and debug logs and such. For my example below I am intentionally not using VBScript as I want everyone to understand exactly what it is we’re doing – but feel free to script it up, all the WMI classes are well documented on MSDN.

    So here we go again: 

    • We have our three servers 2003SRV13, 2003SRV16, and 2003SRV17.
    • All are in a Replication Group called ImportantData
    • They have a Replicated Folder called… wait for it… ReplicatedFolder.
    • That folder contains various files and folders, including a folder called ImportantSubFolder. It contains some files, including one called critical.doc. Naturally, someone has deleted critical.doc… let’s figure out where and when.

    First we open a CMD prompt as an admin and dump the Replicated Folder info like so:

    C:\>wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderInfo get ReplicatedFolderGuid,ReplicatedFolderName,ReplicationGroupName > rfinfo.txt

    This returns the following into our rfinfo.txt file:

    ReplicatedFolderGuid                  ReplicatedFolderName  ReplicationGroupName
    8722EF11-6466-4472-888F-11B8A57B68A4  replicatedfolder      importantdata

    Now we have enough info to confirm we're looking at the right data. Let's get the status of the deleted file by running this command and providing it the file name and the ReplicatedFolderGuid from above:

    C:\>wmic /namespace:\\root\microsoftdfs path DfsrIdRecordInfo WHERE (filename='critical.doc' and replicatedfolderguid='8722EF11-6466-4472-888F-11B8A57B68A4') get filename,flags,updatetime,GVsn > file.txt

    Our output file.txt contains:

    FileName Flags GVsn UpdateTime

    critical.doc 4 {BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC}-v113867 20070823233010.774625-000

    We can see that it was deleted on Aug 23, 2007 at 23:30:10 (11:30 PM) GMT time. For us this means 7:30PM EDT.

    The Flags value of 4 tells us it's been deleted; examine the chart below. An ordinary replicated file will have a Flags value of 5 (meaning 0x1 && 0x4 for Present and Replicated). If 4 the Present flag has been removed meaning the file is tombstoned, i.e. removed from the replica.

    Value Meaning

    PRESENT_FLAG
    0x1

    The resource is not a tombstone; it is available on the computer.

    NAME_CONFLICT_FLAG
    0x2

    The tombstone was generated because of a name conflict. This flag is meaningful only for tombstones.

    UID_VISIBLE_FLAG
    0x4

    The ID record has already been sent out to other partners; therefore, other partners are aware of this resource.

    JOURNAL_WRAP_FLAG
    0x10

    The volume has had a journal wrap and the resource has not been checked to determine if there is any change by the journal wrap recovery process.

    PENDING_TOMBSTONE_FLAG
    0x20

    The ID record is in the process of being tombstoned (or deleted.)

    The GVsn value is important in that the GUID inside those curly brackets will always contain the unique database GUID of the server where the file was last changed. Since deleting the file counts as a change, now we just need to figure out who owns that GUID. So let's use the DFSRDIAG command to find the culprit:

    C:\>dfsrdiag guid2name /guid:BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC /rgname:importantdata

    Which returns:

    Object Type : DfsrVolumeInfo
    Computer    : 2003SRV16.fabrikam.com  << The Server where the delete occurred
    Volume Guid : 346CA491-54BA-11DB-91ED-806E6F6E6963
    Volume Path : C:
    Volume SN   : 1826913329
    DB Guid     : BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC

    Badda-bing! That BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC GUID matches our GVsn above. There's our guy. What is it with people deleting files and folders off this 2003SRV16 server? We should have a chat with their site admin...

    The same exact steps will work for folders. Let's do it real fast this time and figure out where someone deleted the whole ‘importantsubfolder’:

    C:\>wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderInfo get ReplicatedFolderGuid,ReplicatedFolderName,ReplicationGroupName > rfinfo.txt

    ReplicatedFolderGuid                  ReplicatedFolderName  ReplicationGroupName 
    8722EF11-6466-4472-888F-11B8A57B68A4  replicatedfolder      importantdata

    C:\>wmic /namespace:\\root\microsoftdfs path DfsrIdRecordInfo WHERE (filename='importantsubfolder' and replicatedfolderguid='8722EF11-6466-4472-888F-11B8A57B68A4') get filename,flags,updatetime,GVsn,clock > file.txt

    FileName            Flags  GVsn                                        UpdateTime                
    importantsubfolder  4      {97DA0CC3-DBB4-437F-BB6F-BE8A970FE318}-v31  20070823235332.792930-000 

    C:\>dfsrdiag guid2name /guid:97DA0CC3-DBB4-437F-BB6F-BE8A970FE318 /rgname:importantdata

       Object Type : DfsrVolumeInfo
       Computer    : 2003SRV13.fabrikam.com
       Volume Guid : 929F0871-54B9-11DB-B293-806E6F6E6963
       Volume Path : C:
       Volume SN   : 1826913329
       DB Guid     : 97DA0CC3-DBB4-437F-BB6F-BE8A970FE318

    Done! Looks a lot easier the second time around, doesn’t it?

    Note that when a deletion is replicated in to a DFSR server, the file by default is moved to \DfsrPrivate\ConflictAndDeleted under the root of the replicated folder. If the delete was not replicated in, but instead was the result of a local deletion, the file is moved to the Windows Recycle Bin (unless you held down the SHIFT key while deleting, in which case the file is deleted for good). By default the quota on ConflictAndDeleted is 660 MB but that is configurable on the Advanced Tab in the replicated folder properties. In the same location un-checking the “Move deleted files to Conflict And Deleted folder” box will make it so deletions that are replicated in are actually deleted for good without being moved to ConflictAndDeleted.

    The information about all the data residing in ConflictAndDeleted is contained in the \DfsrPrivate\ConflictAndDeletedManifest.xml file. When the quota is reached, files are purged from ConflictAndDeleted folder and the ConflictAndDeletedManifest.xml in the order that they were put there. This means you have a limited amount of time to catch a deletion and be able to restore it from ConflictAndDeleted.

    There is a sample script for restoring data from ConflictAndDeleted. This is needed because the folder structure of deleted data is flattened and all data resides directory off the root of ConflictAndDeleted, and the filename is appended with the GVSN. The script reads the ConflictAndDeletedManifest.xml so it knows the original file names and folder structure. But you can also determine that using the DfsrConflictInfo WMI class. For example you can check for the presence of a file in a given server’s ConflictAndDeleted folder by running:

    C:\>wmic /namespace:\\root\microsoftdfs /node:2003srv13 path DfsrConflictInfo where "filename like 'critical.doc%'" get * /format:textvaluelist

    ConflictFileCount=1
    ConflictPath=\\.\C:\replicatedfolder\importantsubfolder\critical.doc
    ConflictSizeInBytes=881211
    ConflictTime=20070823233010.000000-000
    ConflictType=5
    FileAttributes=32
    FileName=critical.doc-{97DA0CC3-DBB4-437F-BB6F-BE8A970FE318}-v27
    GVsn={97DA0CC3-DBB4-437F-BB6F-BE8A970FE318}-v27
    MemberGuid=2C50672F-32A2-4D7D-AF44-88E1812F6E08
    ReplicatedFolderGuid=8722EF11-6466-4472-888F-11B8A57B68A4
    ReplicationGroupGuid=6143BD54-C9CC-42E1-A1FA-03BB34BF87F2
    Uid={97DA0CC3-DBB4-437F-BB6F-BE8A970FE318}-v27

    The trailing % wildcard is needed because the FileName property has the GVSN of the file appended to it. The ConflictPath property contains the original path and file name for the file before it was deleted.

    Auditing – The Who

    Now that we’ve covered when and where, the Windows Auditing subsystem can be used to tell you who via the Object Access Auditing setting. The important take away is that you need to have it in place and the audit descriptors set on your files and folders before you need them. Setting it up after the files have gone missing isn’t going to buy you anything. I can tell your eyes have started to glaze over on this post so I’m going to wrap it up here. J

    To set up Object Access Auditing you can follow this checklist and set your critical replicated folders to audit EVERYONE for DELETE and process that with inheritance on down. We really don’t usually care about files being changed and certainly not added, but deletions drive end users nuts.

    An important thing to understand about Auditing in Windows 2000 and 2003 is that it’s bound by some legacy limitations in the Event Log system (no longer true in Windows Server 2008 or Vista). Basically, you want to keep the total size of all your event logs at around 300MB total or they will become unstable. You’ll find that enabling Object Access Auditing is going to make your Security event logs wrap pretty often, so if it was only 256MB you wouldn’t have much time for forensics. You can get around this by using KB312571 to configure AutoBackupLogFiles and save off the logs when they wrap automagically. Then they can be backed up and deleted periodically with a scheduled task or whatever you like.

    Auditing is not free – it costs CPU time, disk I/O, and can increase memory usage within LSASS. Please be sure to test this for your environment. Really.

    For excellent tips and deep explanation of Windows Auditing, check out the Windows Security Logging and Other Esoterica MSDN blog by Eric Fitzgerald (who ran Auditing for many years as a Program Manager).

    For everything Auditing, check out the Windows Server 2003 Auditing web portal

    - Ned Pyle

  • Not enough storage is available to complete this operation

    Wait, don't leave!!! 

    We get a lot of hits thanks to Live and Google searches sending people our way when they search on this error. We highly recommend you follow this link to find all the various causes of this error (it uses the site:microsoft.com option to narrow your results and the exact error verbiage) :

    Return all exact Microsoft site articles on this error via Live Search

    Ok, back to our blog post:

    =================================================== 

    Last week I had a customer that was witnessing the following error each time he attempted to manage his Active Directory environment - including attempting to simply add a domain account to his local client machines Administrators group.

    "Not enough storage is available to complete this operation."

     He had confirmed that he was receiving this same pop-up error message on each and every one of his 2003 SP1 Domain Controllers.

    I began troubleshooting this issue as an unresponsive/out-of-resources type of issue in reference to the 'not enough storage' part of the error message.  In doing such, I requested and retrieved both 'NetStat -anb' (http://support.microsoft.com/default.aspx?scid=kb;EN-US;137984) and Server Performance Advisor (SPA) output.   

    Note:  If you're not familiar with the SPA tool, the following blog site for 'A Day at the SPA' is helpful - http://blogs.technet.com/ad/  

    Upon reviewing the information gathered by these tools, it was apparent to me that the error being returned made this not necessarily an unresponsive/out-of-resources type of issue as first suspected.  Specifically, when I reviewed the SPA data and Netstat output it was evident that there was not a resource or bottleneck problem at hand when the issue occurs.

    However, the only reference to the error returned with a search of the KnowledgeBase, included an issue w/ MaxTokenSize - 935744 (http://support.microsoft.com/default.aspx?scid=kb;EN-US;935744).    So that wasn’t quite it either.

    I then located article 913003 - "The Offer Remote Assistance HelpersCNF group is created on domain controllers that have the SMS 2003 Advanced Client installed and that have the Remote Control Agent service enabled" (http://support.microsoft.com/default.aspx?scid=kb;EN-US;913003). 

    After reviewing this I then confirmed with my customer that they had just recently upgraded their version of Microsoft Systems Management Server (SMS) 2003 Advanced Client and had the Remote Control Agent service enabled (as the article details).  Accessing the Users container within the Active Directory Users and Computers snap-in, we had several (322 actually) 'Offer Remote Assistance HelpersCNF' domain groups.

    Upon following through with the article in disabling both the 'Solicited Remote Assistance' and 'Offer Remote Assistance' settings within the Default Domain Controllers Group Policy, we deleted all of the CNF objects.  Once deleted and the deletion replicated around to the other Domain Controllers, we were no longer witnessing the - "Not enough storage is available to complete this operation" error.

     Problem solved: our customer could now add a domain account to his local client machines Administrators group.