
June, 2012

  • New Office 365 and AD FS/DirSync Information Available

    Hi folks, Jonathan here again. Adam Conkle has posted some new goodies related to Office 365 and AD FS/DirSync that may interest you. Take a look, and be sure to rate the content if you find it useful.

    ----

    Client Access Policy Builder - http://gallery.technet.microsoft.com/scriptcenter/Client-Access-Policy-30be8ae2

    Short description:

    Hotfix Rollup Update 2 for AD FS 2.0 introduced a new feature for Office 365 customers named Client Access Policy. Some organizations may want to create policies that limit access to Office 365 services, depending on where the client resides. For example, you might want to:

    • Block all extranet client access to Office 365
    • Block all extranet client access to Office 365, except for devices accessing Exchange Online for Exchange Active Sync

    AD FS 2.0 provides a way for organizations to configure these types of policies. Office 365 customers using identity federation who require these policies can now use client access policy rules to restrict access based on the location of the computer or device that is making the request.

    Today, customers must manually implement Client Access Policy rules within AD FS 2.0, and it can be confusing. The goal of Client Access Policy Builder is to automate the creation of these rules for the most common use case scenarios.

    Microsoft Office 365 Federation Metadata Update Automation Installation Tool - http://gallery.technet.microsoft.com/scriptcenter/Office-365-Federation-27410bdc

    Short description:

    This tool automates regular updates of the Microsoft Office 365 federation metadata, ensuring that changes to the token-signing certificate configured in Active Directory Federation Services 2.0 are replicated to the identity platform automatically.
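
    For context, the manual step this tool automates looks roughly like the cmdlets below. Treat this as a hedged sketch: it assumes the Microsoft Online Services Module for Windows PowerShell, and the server and domain names are placeholders.

    # Hedged sketch of the manual federation metadata update (placeholder names).
    Import-Module MSOnline
    Connect-MsolService                                 # tenant admin credentials
    Set-MsolADFSContext -Computer adfs01.contoso.com    # your AD FS 2.0 server
    Update-MsolFederatedDomain -DomainName contoso.com  # pushes current certificate info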

    Microsoft Office 365: DirSync - Count Total Synchronized Objects - http://gallery.technet.microsoft.com/office/Office-365-DirSync-Count-617d9268

    Short description:

    'DirSync - Count Total Synchronized Objects' shows total counts of users, groups, and contacts, plus a grand total of objects, by extracting the FIM SourceAD Connector Space data to an XML file named SourceAD.xml.

    When an object makes its way past the Office 365 DirSync filters, it becomes a 'synchronized hologram'. This tool parses SourceAD.xml looking for synchronized holograms and counts the objects by object type.

    The Office 365 Deployment Readiness Tool (DRT) assumes that you have not yet deployed Office 365 DirSync, so it reports total object counts without any way to see whether those objects will actually make it past the DirSync filters. 'DirSync - Count Total Synchronized Objects' allows you, now that you are in production with DirSync, to see a more accurate count of the objects that have made it past the filters.

    ---

    Be sure to keep those questions coming. Yes, Ned is going to be busy for the next several weeks, but the entire Directory Services team is available to answer questions and point you to the best available resources.

    Jonathan “What’s a lackey?” Stephens

  • An ADFS Claims Rules Adventure

    [Editor's Note: This is a guest post from Steve Halligan, a Senior Premier Field Engineer.]

    Notes from the Field

    Customers can come up with some fairly complex requirements for access control. It can be a challenge to accommodate these requirements in an Office 365 world. The ADFS claims rule system in ADFS 2.0 UR1 provides some powerful options to implement these controls—and some limitations.

    The requirements:

    1. No one shall access email via Outlook when off the corporate network
    2. Members of a specific security group may not use ActiveSync
    3. Members of a specific security group may not access OWA off the corporate network
    4. All OWA users must log in via a forms based login

    It is important to note that the rule processing system always processes all rules; it is NOT a first-match system. Because of this, the first rule is almost always an "allow everything" rule, followed by additional rules that block some access. Schematically, it could look like this:

    1. Allow everyone
    2. Block members of the group “Bigwigs” from access to OWA
    3. Allow the CEO access to OWA

    When the CEO attempts to log in to OWA: Rule #1 allows him, Rule #2 blocks him, and then Rule #3 allows him. It is the state of the last matching rule that determines his final outcome. If the CIO were to attempt to log in, he would match only rules 1 and 2 and would therefore be blocked.

    Requirement #1: No one shall access email via Outlook when off the corporate network

    This is a fairly common requirement in a lot of enterprise Exchange environments. Security folks get a bit freaked out by the thought that a user could set up Outlook on a home machine. Meeting this challenge is pretty straightforward with ADFS 2.0 claims rules.

    First, let’s review a bit how ADFS claims work in an Office 365 deployment. There are two flavors of ADFS claims requests: Active and Passive. When a claims request is created and submitted by the service (O365) it is an “active” request. This seems kind of counter-intuitive since you (the client) don’t have to do anything for these requests. The active/passive nature of the request refers to the participation of the _service_, not the client. Examples of O365 services that use active claims requests are Outlook 2007/2010 (RPC + HTTPS, EWS), Outlook 2011 for Mac (EWS), ActiveSync, Autodiscover and Lync Online.

    A passive claims request is when the service sends you off to get the claim yourself. The main examples of this in O365 are OWA and SharePoint Online. When you try to log in to OWA in O365 (and you are using federated identity), your browser is redirected to your ADFS endpoint. There you provide your credentials (via a web form, a basic authentication pop-up, or integrated auth), get your token, and then return to OWA with your token in hand.

    Back to the issue at hand: blocking Outlook when the client is not connected to the corporate network. To translate that into ADFS speak: we need to block active ADFS claims for the RPC+HTTPS and EWS services if the client IP address doesn't match a known set of corporate addresses.

    Logically, the rule will look like this:

    If {
        (ClientApplication is RPC
         OR
         ClientApplication is EWS)
        AND
        ClaimType is Active
        AND
        ClientIPAddress is not in <corporate IP address list>
    }
    THEN DENY THE CLAIM

    Now let’s translate that to ADFS Claim Rule language:

    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-proxy"]) &&
    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-endpoint-absolute-path", Value == "/adfs/services/trust/2005/usernamemixed"]) &&
    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-client-application", Value == "Microsoft.Exchange.RPC|Microsoft.Exchange.WebServices"]) &&
    NOT exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-forwarded-client-ip", Value =~ "<public NAT addresses>"])

    => issue(Type = "http://schemas.microsoft.com/authorization/claims/deny", Value = "true");

    Makes perfect sense, right? Oh…it doesn’t? Allow me to break it down:

    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-proxy"])

    The ‘Type’ x-ms-proxy exists. This simply means that the claim came through an ADFS Proxy server (or other compatible proxy). Note: we are not checking the ‘value’ for this type, just that the type exists.

    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-endpoint-absolute-path", Value == "/adfs/services/trust/2005/usernamemixed"])

    The type x-ms-endpoint-absolute-path exists and has a value of usernamemixed. This is the name of the endpoint for _Active_ ADFS Claims. Summary: This is an active ADFS claim.

    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-client-application", Value == "Microsoft.Exchange.RPC|Microsoft.Exchange.WebServices"])

    ClientApplication is RPC or WebServices. We can use this ‘or’ (using the ‘|’ character) syntax to check the value field.

    NOT exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-forwarded-client-ip", Value =~ "\b192\.168\.4\.([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9])\b|\b10\.3\.4\.5\b"])

    The type x-ms-forwarded-client-ip has a value that DOES NOT MATCH the regular expression "<public NAT addresses>". That brings up two important questions:

    1. Where does that “x-ms-forwarded-client-ip” come from and what values should I expect to see there?
    2. What does the format of the regular expression look like?

    According to TechNet:

    “This AD FS claim represents a “best attempt” at ascertaining the IP address of the user (for example, the Outlook client) making the request. This claim can contain multiple IP addresses, including the address of every proxy that forwarded the request. This claim is populated from an HTTP header that is currently only set by Exchange Online, which populates the header when passing the authentication request to AD FS.”

    (http://technet.microsoft.com/en-us/library/hh526961(v=ws.10).aspx)

    So, the value is going to be the border IP address that Exchange Online (EXO) sees for the client. That would be either the border firewall doing NAT/PAT or the border proxy server. Exchange Online adds this IP to the ADFS claim request. Perfect for our Outlook scenario: Outlook attempts to connect to EXO, EXO builds up a claims request that includes the client IP, and EXO then heads out to the ADFS endpoint to submit the request.

    The second question is a bit easier (or perhaps a bit harder; regular expressions can get complicated) because the format follows the general rules for regular expressions. The Internet is full of regular expression examples for filtering IP addresses. For example, let's say that your network has one block of addresses in use in a NAT pool: 192.168.4.0-192.168.4.255. You also have one satellite office with a single public IP address: 10.3.4.5. An expression you might use is:

    “\b192\.168\.4\.([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9])\b|\b10\.3\.4\.5\b”

    To break that down:

    \b192\.168\.4\.([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9])\b applies to the 192.168.4.0-255 network.

    \b192\.168\.4\. matches 192.168.4.

    [1-9] matches addresses ending in 1-9

    [1-9][0-9] matches 10-99

    1[0-9][0-9] matches 100-199

    2[0-5][0-9] matches 200-259 (yeah…I know a few more than needed)

    The '|' represents "or"

    \b10\.3\.4\.5\b applies to the 10.3.4.5 address.

    These can get tricky. I recommend you use a regular expression verification tool and test.
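
    If you'd rather do a quick sanity check from PowerShell than install a separate tool, something like this works (a minimal sketch; the sample addresses are made up):

    # Test candidate IP addresses against the claim rule regular expression.
    $pattern = '\b192\.168\.4\.([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9])\b|\b10\.3\.4\.5\b'
    "192.168.4.27", "10.3.4.5", "172.16.0.9" | ForEach-Object {
        "{0} -> {1}" -f $_, ($_ -match $pattern)  # True = treated as a corporate address
    }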

    Finally, if ALL of these conditions are true:

    => issue(Type = "http://schemas.microsoft.com/authorization/claims/deny", Value = "true");

    We deny the claim.

    If any one of the elements of the rule evaluates to false, the entire rule is skipped. So, if the client IS coming from one of the addresses that match the regular expression, the client does not match this rule.

    Requirement #2: Members of a specific security group may not use ActiveSync

    The previous example illustrated how to allow or block users based upon where they are. This is a simple example of how to block users based upon who they are.

    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-proxy"]) &&
    exists([Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid", Value =~ "Group SID of the AD group to be blocked"]) &&
    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-client-application", Value == "Microsoft.Exchange.ActiveSync"])

    => issue(Type = "http://schemas.microsoft.com/authorization/claims/deny", Value = "true");

    The new element here from the previous example is the “groupsid” type. Yeah, you need to dive into AD and hunt down the SID of the group in question. As is hinted by the “=~” operator, you could create a regular expression that would match more than one group. You could also use the “==” operator and the “|” to do a multiple “or” match.
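
    If you need to hunt down that SID quickly, here is one hedged way to do it from PowerShell that doesn't require any AD cmdlets (the domain and group names below are placeholders):

    # Translate a group name to its SID; CONTOSO\NoActiveSync is hypothetical.
    $group = New-Object System.Security.Principal.NTAccount("CONTOSO", "NoActiveSync")
    $group.Translate([System.Security.Principal.SecurityIdentifier]).Value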

    That one was easy—sets us up well for the next. Which gets a bit…complicated.

    Requirement #3: Members of a specific group may only use OWA on the corporate network

    We built a rule above that did something very similar for Outlook, couldn’t we just add on or slightly alter that rule? Nope, we can’t. OWA login uses a passive ADFS claim, so the behavior is different. With Outlook, or other active claims, all requests come from O365 and land on the external ADFS proxy. To determine if a user is internal or external, we have to examine the “x-ms-forwarded-client-ip” value. With a passive claim request, like OWA, the client’s browser will be connecting directly to the ADFS endpoint (sts.contoso.com for example). So, we can control who gets in based upon where they are asking. If they ask the external proxies (and they are in the specified group) we say “no”. If they ask the internal ADFS servers we say “yes”.

    [Diagram: internal clients hit the ADFS servers directly, while external clients land on the ADFS proxies]

    The rule to accomplish this would look something like this:

    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-proxy"])
    && exists([Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid", Value =~ "S-1-5-21-299502267-1364589140-1177238915-114465"])
    && exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-endpoint-absolute-path", Value == "/adfs/ls/"])

    => issue(Type = "http://schemas.microsoft.com/authorization/claims/deny", Value = "true");

    Line 1: Request went through a proxy

    Line 2: User is a member of the specified group

    Line 3: This is a passive claim to the /adfs/ls/ endpoint

    Line 4: Deny

    If they hit the internal ADFS servers directly, line 1 would be false and they would be allowed in.

    Requirement #4: Oh…your solution to #3 doesn’t work because we want everyone to use forms-based authentication

    Well…shoot. Easy enough to fix, right? Send everyone to the ADFS proxy and add in a line to the above rule that specifies which client IP addresses are allowed. Something like:

    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-proxy"])
    && exists([Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid", Value =~ "S-1-5-21-299502267-1364589140-1177238915-114465"])
    && exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-endpoint-absolute-path", Value == "/adfs/ls/"])
    && NOT exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-forwarded-client-ip", Value =~ "\b192\.168\.4\.([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9])\b|\b10\.3\.4\.5\b"])

    => issue(Type = "http://schemas.microsoft.com/authorization/claims/deny", Value = "true");

    To be clear: this rule will not work. Why not? Scroll up a bit to the section on blocking Outlook based on IP address. Notice what fills in that x-ms-forwarded-client-ip? EXO. In this example we are dealing with _passive_ ADFS claims. EXO is not creating this claim; the user is hitting the ADFS login page directly. If you turn on this rule, everyone in the specified group will be blocked no matter where they are coming from: the x-ms-forwarded-client-ip type does not exist at all, so that line will evaluate to true. To make it false (and thereby prevent the deny rule from firing on someone), the element would need to exist AND its value would need to match the regular expression.

    If we can’t use the client’s source IP as a filter, how can we solve this problem?

    The answer lies in an ADFS claim element that we have been checking all along, but not checking fully. Let's look at the debug log (for more on turning on debugging for ADFS, see http://technet.microsoft.com/en-us/library/adfs2-troubleshooting-configuring-computers(v=WS.10).aspx).
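
    As a quick aside, enabling that trace channel can be scripted; a minimal sketch, assuming the standard AD FS 2.0 Tracing/Debug channel name:

    # Enable the AD FS 2.0 Tracing debug channel, then disable it when done.
    wevtutil.exe sl "AD FS 2.0 Tracing/Debug" /e:true
    # ...reproduce the issue, then:
    wevtutil.exe sl "AD FS 2.0 Tracing/Debug" /e:false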

    Event ID 151 AD FS 2.0 Tracing:

    <Claims>
    ClaimType http://schemas.microsoft.com/ws/2008/06/identity/claims/windowsaccountname Value CONTOSO\USER1 ValueType http://www.w3.org/2001/XMLSchema#string Issuer AD AUTHORITY OriginalIssuer AD AUTHORITY
    ClaimType http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name Value CONTOSO\USER1 ValueType http://www.w3.org/2001/XMLSchema#string Issuer AD AUTHORITY OriginalIssuer AD AUTHORITY
    ClaimType http://schemas.microsoft.com/ws/2008/06/identity/claims/primarysid Value S-1-5-21-3640651473-4051545122-2937135913-1136 ValueType http://www.w3.org/2001/XMLSchema#string Issuer AD AUTHORITY OriginalIssuer AD AUTHORITY
    ClaimType http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid Value S-1-5-15 ValueType http://www.w3.org/2001/XMLSchema#string Issuer AD AUTHORITY OriginalIssuer AD AUTHORITY Value S-1-5-11 Value S-1-5-2 Value S-1-5-32-545 Value S-1-1-0 Value S-1-5-21-3640651473-4051545122-2937135913-513
    ClaimType http://schemas.microsoft.com/ws/2008/06/identity/claims/primarygroupsid Value S-1-5-21-3640651473-4051545122-2937135913-513 ValueType http://www.w3.org/2001/XMLSchema#string Issuer AD AUTHORITY OriginalIssuer AD AUTHORITY
    ClaimType http://schemas.microsoft.com/ws/2008/06/identity/claims/authenticationmethod Value
    http://schemas.microsoft.com/ws/2008/06/identity/authenticationmethod/password ValueType http://www.w3.org/2001/XMLSchema#string Issuer LOCAL AUTHORITY OriginalIssuer LOCAL AUTHORITY
    ClaimType http://schemas.microsoft.com/ws/2008/06/identity/claims/authenticationinstant Value 2012-04-19T17:32:41.459Z ValueType http://www.w3.org/2001/XMLSchema#dateTime Issuer AD AUTHORITY OriginalIssuer AD AUTHORITY
    ClaimType http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-proxy Value adfs01p ValueType http://www.w3.org/2001/XMLSchema#string Issuer CLIENT CONTEXT OriginalIssuer CLIENT CONTEXT
    </Claims>

    We have been checking for the existence of the x-ms-proxy element, but we haven’t looked into its value. The value identifies the name of the proxy server that the request passed through. What if we could tell if a user was internal or external based upon which proxy server they came through?

    [Diagram: separate internal and external ADFS proxy servers, so internal and external OWA users land on differently named proxies]

    With this change, internal OWA users will land on internal ADFS Proxy servers and external OWA users will land on external ADFS Proxy servers. That will allow us to add a rule like this:

    exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-proxy", Value =~ "\badfsp[0-9][0-9]\b"])
    && exists([Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid", Value =~ "S-1-5-21-299502267-1364589140-1177238915-114465"])
    && exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-endpoint-absolute-path", Value == "/adfs/ls/"])
    => issue(Type = "http://schemas.microsoft.com/authorization/claims/deny", Value = "true");

    Line 1: User is coming through an ADFS Proxy and that proxy has a name that matches ADFSP##

    Line 2: User is in the specified group

    Line 3: User is hitting the passive endpoint

    Line 4: Deny the claim

    If a user in the specified group presents a claim to ADFS from outside the network, all elements of this rule will be true and the claim will be denied. If the same user is inside the network and is using one of the internal proxies, line 1 will be false (the proxy name will not match ADFSP##) and the claim will be allowed.

    For illustration purposes, we can express the same thing in a slightly different way:

    NOT exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-proxy", Value =~ "\badfspi[0-9][0-9]\b"])
    && exists([Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid", Value =~ "S-1-5-21-299502267-1364589140-1177238915-114465"])
    && exists([Type == "http://schemas.microsoft.com/2012/01/requestcontext/claims/x-ms-endpoint-absolute-path", Value == "/adfs/ls/"])
    => issue(Type = "http://schemas.microsoft.com/authorization/claims/deny", Value = "true");

    Line 1: The user is NOT coming through an ADFS Proxy that matches ADFSPI##

    While you may not need to meet access control requirements this complex, I hope that these notes provide some enlightenment into the ADFS claim rule language.

    Steve Halligan, PFE

  • The Mouse Will Play

    Hey all, Ned here. Mike and I start teaching Windows Server 2012 and Windows 8 DS internals this month in the US and UK and won’t be back until July. Until then, Jonathan is – I can’t believe I’m saying this – in charge of AskDS. He’ll field your questions and publish… stuff. We’ll make sure he takes his medication before replying.

    If you’re in Reading, England June 10-22, first round is on me.

    [Image]
    I didn’t say what the first round was though.

    Ned “crikey” Pyle

  • What's Causing that DFSR Change Storm?

    [This is another guest post from our pal Mark in Oz. Even if you don’t care about DFSR, I highly recommend this post; it teaches some very clever log analysis techniques, useful in a variety of troubleshooting scenarios – The Neditor]

    Hi there! It’s Mark Renoden – Premier Field Engineer in Sydney, Australia – here again. Today I’m going to talk about an issue where a customer's DFSR environment lost file updates and they’d see regular alerts in SCOM about replication backlog. While I was on site working with them, I came up with a few creative ideas about how to use the DFSR debug logs that led us to the root cause.

    The problem at hand was that a large DFS replication backlog would accumulate from time to time. Finding the root cause meant understanding the trend in changes to files in the replica. To do this, we needed to use the debug logs as our data source: to manipulate them so that they would tell the story.

    With the aid of some custom scripts and tools, and a lab environment, I’m going to simulate their experience and talk through the investigation.

    Test Lab Setup

    The test lab I’ve used for this post is pretty simple: I’ve got two file servers, DPS1 and DPS2 configured with a single replication group called RG1 that replicates C:\ReplicatedFolder between the servers. There are 100,000 files in C:\ReplicatedFolder.
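
    If you want to build a similar lab, seeding the replicated folder is quick. A rough sketch (the file names are chosen to match the ones you'll see in the log excerpts below):

    # Create 100,000 small test files in the replicated folder.
    1..100000 | ForEach-Object {
        Set-Content -Path ("C:\ReplicatedFolder\file{0:d10}.txt" -f $_) -Value "test data"
    }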

    Prepare for Battle

    In Ned’s previous post Understanding DFSR Debug Logging (Part 1: Logging Levels, Log Format, GUID’s), the various options for debug log verbosity are discussed. For this scenario, the only one I’ll change is the number of debug log files. 1000 is generally a good number to choose for troubleshooting purposes:

    [Screenshot: raising the DFSR debug log file count to 1000]
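
    If you'd rather script that change than click through it, here is a hedged sketch using the DFSR WMI provider (verify the class and property names on your own build before relying on this):

    # Raise the number of retained DFSR debug log files to 1000.
    $config = Get-WmiObject -Namespace "root\microsoftdfs" -Class DfsrMachineConfig
    $config.MaxDebugLogFiles = 1000
    $config.Put()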

    Harvest the Data

    After reproducing the symptoms, we want to harvest the logs from the server that has pending replication. When on site at the customer, I just copied DFSR*.log.gz from the %windir%\debug folder, but the best possible practice would be to stop DFSR, copy the logs and then start the service again. This would prevent log rollover while you harvest the logs.

    After you copy the logs for investigation, they need to be un-gzipped. Use your favourite gzip-aware decompression utility for that.
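
    A minimal sketch of that harvest, assuming a destination folder of C:\Temp\DfsrLogs (the decompression itself still happens with your tool of choice):

    # Stop DFSR so the logs can't roll over, copy them off, then restart.
    Stop-Service DFSR
    New-Item -ItemType Directory -Path C:\Temp\DfsrLogs -Force | Out-Null
    Copy-Item "$env:windir\debug\DFSR*.log*" -Destination C:\Temp\DfsrLogs
    Start-Service DFSR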

    Understand the Log Format

    Before we can mine the debug logs for interesting information, we need to look at what we're dealing with. Opening up one of the log files, I want to find a change and understand the log format:

    20120522 12:39:57.764 2840 USNC  2450 UsnConsumer::UpdateIdRecord LDB Updating ID Record:
    +       fid                             0x2000000014429
    +       usn                             0x693e0ef8
    +       uidVisible                      1
    +       filtered                        0
    +       journalWrapped                  0
    +       slowRecoverCheck                0
    +       pendingTombstone                0
    +       internalUpdate                  0
    +       dirtyShutdownMismatch           0
    +       meetInstallUpdate               0
    +       meetReanimated                  0
    +       recUpdateTime                   20120521 01:04:21.513 GMT
    +       present                         1
    +       nameConflict                    0
    +       attributes                      0x20
    +       ghostedHeader                   0
    +       data                            0
    +       gvsn                            {5442ADD7-04C7-486B-B665-2CB036997A67}-v937024
    +       uid                             {5442ADD7-04C7-486B-B665-2CB036997A67}-v615973
    +       parent                          {8A6CF487-2D5A-456C-A235-09F312D631C8}-v1
    +       fence                           Default (3)
    +       clockDecrementedInDirtyShutdown 0
    +       clock                           20120522 02:39:57.764 GMT (0x1cd37c42d5a9268)
    +       createTime                      20120516 00:41:05.011 GMT
    +       csId                            {8A6CF487-2D5A-456C-A235-09F312D631C8}
    +       hash                            00000000-00000000-00000000-00000000
    +       similarity                      00000000-00000000-00000000-00000000
    +       name                            file0000021380.txt
    +      
    20120522 12:39:59.326 2840 USNC  2453 UsnConsumer::UpdateIdRecord ID record updated from USN_RECORD:
    +       USN_RECORD:
    +       RecordLength:        96
    +       MajorVersion:        2
    +       MinorVersion:        0
    +       FileRefNumber:       0x2000000014429
    +       ParentFileRefNumber: 0x4000000004678
    +       USN:                 0x693e0ef8
    +       TimeStamp:           20120522 12:39:57.764 AUS Eastern Standard Time
    +       Reason:              Close Security Change
    +       SourceInfo:          0x0
    +       SecurityId:          0x0
    +       FileAttributes:      0x20
    +       FileNameLength:      36
    +       FileNameOffset:      60
    +       FileName:            file0000021380.txt

    What I can see here is a local database update followed by the USN record update that triggered it. If I can gather together all of the date stamps for USN record updates, perhaps I can profile the change behaviour on the file server…

    [Screenshot: the command used to extract the USN_RECORD lines into USN.csv]

    The command above finds every line in every log file that contains USN_RECORD: and then excludes lines that contain a + (thereby eliminating occurrences of + USN_RECORD: as seen in the log excerpt above). Finally, it directs that output into USN.csv.
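
    If the screenshot isn't legible where you're reading this, a rough PowerShell equivalent of that extraction (run from the folder holding the unzipped logs) would be:

    # Pull every non-continuation USN_RECORD line out of the logs into USN.csv.
    Select-String -Path .\DFSR*.log -Pattern "USN_RECORD:" -SimpleMatch |
        Where-Object { $_.Line -notlike "*+*" } |
        ForEach-Object { $_.Line } |
        Out-File .\USN.csv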

    Let’s open our CSV file in Excel and see what we can do with it.

    Graph the Data

    I'd ideally like to graph the data that I've got, to make it easy to spot trends. The data I have right now isn't super-easy to work with, so I'm going to sanitize it a bit, and then make a Pivot Table and chart from the sanitized data.

    Here is a single column of USN_RECORD: timestamps:

    [Screenshot: a single column of raw USN_RECORD timestamp lines in Excel]

    I'd like to figure out the rate of change on the file system for files in the replicated folder, so I'll use Excel's Text to Columns feature. I'm using a fixed-width conversion, I'm going to split out my timestamp to the minute (so I can see how many changes per minute I have), and I'll split USN_RECORD: off the end of the line so that I have something to count:

    [Screenshot: the Text to Columns fixed-width conversion]

    Now I’ve got columns like this:

    [Screenshot: the resulting columns after the conversion]

    I delete the columns I don’t need (A and C). My result is a column of timestamps down to the minute and a column of identical values (column B) which I can count to understand the rate of change:

    [Screenshot: a timestamp-to-the-minute column and a column of identical values to count]

    To do this, I insert a pivot table. I simply select columns A and B and then choose PivotTable from the Insert menu in Excel.

    [Screenshot: selecting columns A and B and inserting a PivotTable]

    Now I configure my PivotTable Field List as follows:

    [Screenshot: the PivotTable Field List configuration]

    After configuring the PivotTable, it looks like this

    [Screenshot: the configured PivotTable]

    All that’s left for me to do is to click on one of the row labels and to select a chart from the Insert menu. The resulting chart tells us quite a lot:

    [Chart: changes per minute across the log window]

    Here I can see that there are constant changes at roughly 230 per minute, and that for a two-hour window, the changes increase to about 1500 per minute.
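
    Incidentally, you can get the same per-minute counts without Excel. A hedged sketch that leans on the fixed-width timestamp at the start of each line:

    # Count USN_RECORD entries per minute; characters 0-13 are "yyyymmdd hh:mm".
    Get-Content .\USN.csv |
        Where-Object { $_.Length -ge 14 } |
        Group-Object { $_.Substring(0, 14) } |
        Sort-Object Name |
        ForEach-Object { "{0},{1}" -f $_.Name, $_.Count }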

    Conclusions so far

    For the entire duration of the logs, a roughly consistent level of change was occurring. However, for a two-hour window, lots of change was occurring. There are two possibilities here: either the cause of change has become more aggressive during this time or this chart represents two different activities.

    We need more investigation …

    Back to the Debug Logs

    I start by skimming a log that contains the timestamps from the two-hour window where we see many changes, and look at the USN record updates. Skimming through, I can see two different types of change:

    20120523 10:54:41.249 2840 USNC  2453 UsnConsumer::UpdateIdRecord ID record updated from USN_RECORD:
    +       USN_RECORD:
    +       RecordLength:        96
    +       MajorVersion:        2
    +       MinorVersion:        0
    +       FileRefNumber:       0x20000000175FA
    +       ParentFileRefNumber: 0x4000000004678
    +       USN:                 0xbf0ad430
    +       TimeStamp:           20120523 10:54:39.827 AUS Eastern Standard Time
    +       Reason:              Close Security Change
    +       SourceInfo:          0x0
    +       SecurityId:          0x0
    +       FileAttributes:      0x20
    +       FileNameLength:      36
    +       FileNameOffset:      60
    +       FileName:            file0000031231.txt

    And:

    20120523 10:54:41.249 2840 USNC  2085 UsnConsumer::UpdateUsnOnly USN-only update from USN_RECORD:
    +    USN_RECORD:
    +    RecordLength:        96
    +    MajorVersion:        2
    +    MinorVersion:        0
    +    FileRefNumber:       0x2000000019AD4
    +    ParentFileRefNumber: 0x4000000004678
    +    USN:                 0xbf0ad4f0
    +    TimeStamp:           20120523 10:54:39.843 AUS Eastern Standard Time
    +    Reason:              Basic Info Change Close
    +    SourceInfo:          0x0
    +    SecurityId:          0x0
    +    FileAttributes:      0x20
    +    FileNameLength:      36
    +    FileNameOffset:      60
    +    FileName:            file0000038828.txt

    Skimming a log that covers a timeframe with a low rate of change, I can only seem to find:

    20120522 23:28:54.953 2840 USNC  2453 UsnConsumer::UpdateIdRecord ID record updated from USN_RECORD:
    +    USN_RECORD:
    +    RecordLength:        96
    +    MajorVersion:        2
    +    MinorVersion:        0
    +    FileRefNumber:       0x2000000022440
    +    ParentFileRefNumber: 0x4000000004678
    +    USN:                 0x7e7f5188
    +    TimeStamp:           20120522 23:28:52.984 AUS Eastern Standard Time
    +    Reason:              Close Security Change
    +    SourceInfo:          0x0
    +    SecurityId:          0x0
    +    FileAttributes:      0x20
    +    FileNameLength:      36
    +    FileNameOffset:      60
    +    FileName:            file0000072204.txt

    Now I have a theory – Basic Info Change Close events only occur during the two-hour window where there are many changes and there’s an underlying and ongoing security change the rest of the time. I can prove this if I extract the timestamps for Basic Info Change Close changes and similarly, extract timestamps for Close Security Change changes.

    Looking back at the log entries, I can see I have a time stamp followed by a series of lines that start with a +. I need to parse the log with something (I chose PowerShell) that takes note of the timestamp line and when a Basic Info Change Close or Close Security Change follows soon after, return the timestamp.

    Here’s my PS script:

    $files = Get-ChildItem *.log

    $processingBlock = $False
    $usnBlock = $False

    foreach ($file in $files)
    {
        $content = Get-Content $file
        foreach ($line in $content)
        {
            # Timestamp lines don't start with '+'; remember the most recent one
            if (!($line.ToString().Contains("+")))
            {
                $outLine = $line
                $processingBlock = $True
            }
            # We're inside a record; watch for the start of a USN_RECORD block
            if ($processingBlock)
            {
                if ($line.ToString().Contains("+    USN_RECORD:"))
                {
                    $usnBlock = $True
                }
            }
            # When the record's Reason matches, emit the remembered timestamp line
            if ($usnBlock)
            {
                if ($line.ToString().Contains("+    Reason:              Basic Info Change Close"))
                {
                    $outLine.ToString()
                    $processingBlock = $False
                    $usnBlock = $False
                }
            }
        }
    }

    And the same script, with the Reason string changed to match Close Security Change:

    $files = Get-ChildItem *.log

    $processingBlock = $False
    $usnBlock = $False

    foreach ($file in $files)
    {
        $content = Get-Content $file
        foreach ($line in $content)
        {
            if (!($line.ToString().Contains("+")))
            {
                $outLine = $line
                $processingBlock = $True
            }
            if ($processingBlock)
            {
                if ($line.ToString().Contains("+    USN_RECORD:"))
                {
                    $usnBlock = $True
                }
            }
            if ($usnBlock)
            {
                if ($line.ToString().Contains("+    Reason:              Close Security Change"))
                {
                    $outLine.ToString()
                    $processingBlock = $False
                    $usnBlock = $False
                }
            }
        }
    }

    I run each of these (they take a while) against the debug log files and then chart the results in exactly the same way as I’ve done above.


    First, Basic Info Change Close (look at the time range covered and number plotted):

    [Chart: Basic Info Change Close updates, confined to the two-hour window]

    And Close Security Change, below:

    [Chart: Close Security Change updates, ongoing across the entire log range]

    This confirms the theory – Basic Info Change Close takes place in the two hours where there’s a high rate of change and Close Security Change is ongoing.

    Root Cause Discovery

    If this is an ongoing pattern where the high rate of change occurs during the same two hours each day, I can capture both activities using Process Monitor.

    Once I have a trace, it’s time to filter it and see what’s happening:

    [Screenshot: Process Monitor filter set to Operation is SetBasicInformationFile, then Include]

    Here I’ve reset the filter and added Operation is SetBasicInformationFile then Include. I chose SetBasicInformationFile because it looks like a good fit for the USN record updates labelled Basic Info Change Close. After clicking OK, my filtered trace has the answer…

    [Screenshot: the filtered trace, showing the process making the SetBasicInformationFile calls]

    As it turns out, the backup window matches nicely with the storm of Basic Info Change Close updates.

    Clearly, in my lab this is my own little application replicating the behaviour, but in the case of my customer it was actually their backup application causing this change. They were able to talk to their vendor and configure their backup solution so that it wouldn't manipulate file attributes during backups.
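
    For the curious, a toy stand-in for that kind of attribute pass might look like the snippet below. This is purely illustrative (clearing the archive bit is one classic example of a basic-info-only change); it is not the customer's backup product.

    # Touch only the basic info of every file (no data change): clear the archive bit.
    Get-ChildItem C:\ReplicatedFolder -Filter *.txt | ForEach-Object {
        $_.Attributes = $_.Attributes -band (-bnot [System.IO.FileAttributes]::Archive)
    }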

    Now all we need to do is identify the source of Close Security Change updates. Once again, I reset the filter and look for an operation that sounds like a good match. SetSecurityFile looks good.

    [Screenshot: Process Monitor filter set to Operation is SetSecurityFile]

    What I found this time is that no entries show up in Process Monitor:

    [Screenshot: an empty Process Monitor results pane]

    What explains this? Either I chose the wrong operation or the filter is broken in some other way. I can’t see any other sensible operation values to filter with so I’ll consider other options. Looking at the filter, I realize that perhaps System is responsible for the change and right now, Procmon filters that activity out. I remove the exclusion of System activity from my filter and see what happens:

    [Screenshot: the filter with the default exclusion of System activity removed]

    Aha! Now I’ve got something:

    [Screenshot: SetSecurityFile operations performed by System]

    Now I need to understand what System is doing with these files. I right click the path for one of these entries and select “Include C:\ReplicatedFolder\file…”:

    [Screenshot: right-clicking the path and choosing Include]

    I also need to remove the filter for SetSecurityFile:

    [Screenshot: the SetSecurityFile filter removed]

    In summary, I’m interested in everything that happened to file0000033459.txt:

    [Screenshot: all operations against file0000033459.txt]

    If I look at operations on the file that took place prior to SetSecurityFile, I can see a CreateFile operation. This is where System obtained a handle to the file. Looking at this entry, adding the Details column to Process Monitor and examining the fine print I find:

    [Screenshot: the CreateFile operation details, showing the change made in the context of CONTOSO\ACLingApp]

    System is making this change in the context of the account CONTOSO\ACLingApp that just happens to be the service account of an application used to change permissions on resources in the environment.

    Conclusion

    The process I've described today is a good example of the need to "Understand the System" from my earlier post. The Event Logs - and even the debug logs - won't always tell you the answer straight away. Know what you're trying to achieve, know how to use the tools in your arsenal, and know how they can be made to produce the outcome you need.

    Knowing what I know now, I might have found the root cause by starting with Process Monitor but there’s a chance I’d have missed Close Security Change updates (considering that System is excluded from Process Monitor by default). I may have also missed the Basic Info Change Close updates if the tracing interval wasn’t aligned with the backup window. By mining the debug logs, I was able to establish there were two separate behaviours and the appropriate times to gather Process Monitor logs.

    - Mark “Spyglass” Renoden

  • Important Information about Remote Desktop Licensing and Security Advisory 2718704

    Hi folks, Jonathan here. Dave and I wanted to share some important information with you.

    By now you’ve all been made aware of the Microsoft Security Advisory that was published this past Sunday.  If you are a Terminal Services or Remote Desktop Services administrator then we have some information of which you should be aware.  These are just some extra administrative steps you’ll need to follow the next time you have to obtain license key packs, transfer license key packs, or any other task that requires your Windows Server license information to be processed by the Microsoft Product Activation Clearinghouse.  Since there’s a high probability that you’ll have to do that at some point in the future we’re doing our part to help spread the word.  Our colleagues over at the Remote Desktop Services (Terminal Services) Team blog have posted all the pertinent information. Take a look.

    Follow-up to Microsoft Security Advisory 2718704: Why and How to Reactivate License Servers in Terminal Services and Remote Desktop Services

    If you have any questions, feel free to post them over in the Remote Desktop Services forum.

    Jonathan Stephens