Hi dudes and dudettes! Adrian Corona here. I'm a transactional PFE at Microsoft, and I'm really excited because this is my first blog post ever. It's a doozy, so I hope you're not limited on your data plan.
I'll be adding to the MCM (Microsoft Certified Master) blog post series with a topic that doesn't get as much attention as other Active Directory aspects such as replication, even though it is used every day: indexing! I will only cover indexing on currently supported versions of Windows, Windows Server 2003 or higher. You have already upgraded your AD infrastructure, right?
Note: this applies to Active Directory Lightweight Directory Services (AD LDS) and its predecessor, Active Directory Application Mode (ADAM), as well.
First of all, let's assume you get a call from the development team claiming that users are complaining about slow application responses (they see this a lot!). The application is tightly tied to AD, but nobody knows exactly what it is doing, probably because it "was here long before any of the current IT members, and neither you nor the developers have any documentation" (any similarity to reality is mere coincidence). So, faster than a bit can flip, you take out your keyboard, log on to your domain controller and fire up Performance Monitor.
Note: if you have SCOM you could also benefit from this (http://technet.microsoft.com/en-us/library/dd262067.aspx)
You start the fantastic, built-in Active Directory data collector set (available on Windows Server 2008+).
Then, request the users to do their job and wait to see the results:
After some time you get the results and see something like this:
Interesting. Now let's see what this is all about:
Hmm! I think this has to do with a query. Click on Active Directory – Search to see more info:
After seeing the top queries, you can turn to your developer and say with all pride: "You're doing an expensive medial search, fix it!" and continue working. But then he reminds you that there is no documentation, that nothing can be done about it as far as he knows, and asks if anything can be done on your side.
Your answer is… you got it! Indexing!
If your developer CAN make changes to the application, I'd recommend trying client-side optimization first (details below).
Indexing, in database terms, is creating a data structure, a "precompiled list" of possible results, for fast retrieval when a query is performed on a database table. This essentially means expediting searches on specific attributes.
Since I don't want to overwhelm you with too much information, I will discuss schema classes, attributes and options in a separate post. That will leave you with something to look forward to.
Active Directory, as my colleague David Gregory explained in his previous MCM blog post, is a database. When a server is promoted to a Domain Controller, the database is created and many data structures are created as well. Selected attributes in the Active Directory database are indexed to improve search performance (LDAP or otherwise).
Below is a table of some of the attributes that are indexed by default in the Active Directory database. (For the full list of attributes that are indexed by default, refer to this TechNet article.)
Some attributes in this list are also part of the global catalog's Partial Attribute Set (PAS), and some are also configured for Ambiguous Name Resolution (ANR), which I will explain later in this post. Attributes in the ANR and PAS sets are those commonly used to identify an object, such as displayName and name.
One of the most frequent tasks performed by Active Directory is queries. Queries are performed against the database thousands of times per day (or thousands of times per second, in some cases) for many purposes: authentication, application information requirements, Group Policies, the Outlook address book and virtually every AD-aware application.
Your application performance can be severely degraded when queries to Active Directory are returned slowly. Speeding up the response times of AD queries can have a huge (positive) impact on your application.
Let's assume my application is trying to retrieve AD objects whose descriptions meet specific criteria. For example, if the application can't target an exact description value, it might query with wildcards (this is called a medial search, where any characters can exist before and after the specified term):
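To make the two filter shapes concrete, here's a tiny Python sketch (using the standard library's fnmatch purely for illustration; this is not how the DC evaluates LDAP filters) contrasting a medial match with an initial-substring match:

```python
from fnmatch import fnmatch

# Hypothetical description values, for illustration only.
descriptions = ["test server", "my test box", "contested asset", "prod server"]

# Medial search (description=*test*): 'test' may appear anywhere in the value.
medial = [d for d in descriptions if fnmatch(d, "*test*")]

# Initial-substring search (description=test*): value must start with 'test'.
initial = [d for d in descriptions if fnmatch(d, "test*")]

print(medial)   # three values contain 'test' somewhere
print(initial)  # only one starts with it
```

The medial form matches more values, but as we'll see, it is also far more expensive for an unindexed attribute because the DC cannot seek into an ordinary index to satisfy it.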
Let's test this type of query with my favorite ldap tool: ldp.exe
If you don't have ldp installed, you need to add the RSAT tools for "Domain Services" if you're using Windows 7 or Windows 8.
Note: ldp is automatically installed when you add the domain services role and/or promote a domain controller.
Open ldp and make a connection to the domain controller on the global catalog port (3268). (Don't forget to bind.)
Since I want to have detailed information on the query results, I choose Options-> Controls, and then choose Search Stats from the Load Predefined drop-down:
Now we construct our query by specifying a filter – (description=*test*). We will leave the Base DN blank, meaning we will start at the root of the directory:
For this to work correctly I need to modify the Search Options with the following settings:
By selecting the Extended call type, I specify that I want to submit an extended operation along with my query. This attaches an object identifier (OID) to the query; in this case, the OID has already been specified by the Search Stats control, above. If we forget to select Extended in the Search Options, the controls we specified will be ignored. When I choose Run from the Search dialog, I see the following:
Let's analyze the additional information that was returned because we configured LDP to return search stats:
Call time: The time the query took to execute. It appears this is the amount of processor time converted to milliseconds, not wall-clock time; if the server is busy doing something else, this value could differ significantly from actual elapsed time.
*Entries returned: The number of objects found that meet our criteria.
*Entries visited: Database hits, or entries that were actually "touched" by the query. If the ratio between visited and returned entries is large, this is our first clue that the query might be inefficient. To make queries more efficient, we can try strategies such as reducing the scope (specifying a deeper search base), looking for a different attribute, or not using wildcards. (See Query Optimization, below.)
Used filter: The filter I used on my query.
Used indexes: indicates which database indexes were used by the DC to perform the search.
*Pages referenced: How many database pages the DC had to look at to return results.
*Pages read from disk: Of all referenced pages, how many were retrieved from disk.
*Pages pre-read from disk: Of all referenced pages, how many were read ahead from disk in anticipation of use.
*Clean pages modified: How many clean pages (unmodified database pages loaded into memory) the DC modified while retrieving these results.
*Dirty pages modified: The number of already-dirty pages that were modified again by the DC while processing the search; in other words, "double-dirtied" pages.
*Log records generated: The number of transaction log records created by this query.
*Log record bytes generated: Size in bytes of the log records created by this query.
*For these controls to work correctly on a 2008, 2008 R2 or 2012 DC, the account used to run LDP must have the SeDebugPrivilege (Debug Programs) user right set in the Default Domain Controllers Policy. By default, members of the Administrators group have that privilege; otherwise, a 0 will be returned for these statistics.
You can test by running "whoami /priv" on the DC:
If you want to grant this privilege to a user, edit the Default Domain Controllers GPO:
Browse to Computer Configuration \ Windows Settings \ Security Settings \ Local Policies \ User Rights Assignment \ Debug Programs and add the desired user or group to the list.
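A quick way to act on the visited-versus-returned clue from the stats above is to compute the ratio yourself. This is just a back-of-the-envelope helper I'm sketching here, not anything LDP provides:

```python
def query_efficiency(entries_visited, entries_returned):
    """Rough efficiency check: a ratio near 1.0 means the DC only
    touched objects it actually returned; a large ratio means the
    query scanned far more of the database than it needed to."""
    if entries_returned == 0:
        return float("inf") if entries_visited else 1.0
    return entries_visited / entries_returned

# Numbers in the spirit of an unindexed medial search:
print(query_efficiency(16449, 18))  # huge ratio -> inefficient query
# After indexing, visited is close to returned:
print(query_efficiency(20, 18))     # ratio near 1 -> efficient query
```

You'll see this ratio collapse dramatically once we add the right index later in the post.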
Client-side query optimization
Let's keep the same query criteria/filter, but assume I'm looking only for users within one domain. Repeat the same query, but target the LDAP port (389) instead of the GC port (3268) and specify the domain partition in the Base DN. Keep all other settings the same.
With this change the call time, pages referenced and entries visited were reduced by approximately 20%. It is important to mention that the number of pages read might change in subsequent queries since they may be cached from previous queries.
Now that we know all this, let's see how we can improve efficiency by modifying our query parameters. For example, change the Base DN to a specific OU where our objects of interest should reside and change the Scope from Subtree to One Level. Also, don't return all attributes, but only the "name" attribute:
We see that by implementing those basic changes, the call time and pages referenced were reduced dramatically. By selecting One Level we reduce the number of entries visited, and by specifying the attribute we limit the retrieved data; thus we see a 61% reduction in call time and a 13% reduction in pages referenced.
If client-side optimization isn't fast enough, or we need a specific type of search that can be accelerated by an index (i.e., medial searches), we can perform server-side optimization by indexing. We have three types of indexes:
Attribute Index: Indexes the value of a specified attribute. The data structure of the index contains all values of the attribute throughout the database. Therefore, an attribute index can take some time to create.
Containerized Index (PDNT): Indexes the value of the attribute relative to the name of the container. (An OU is an example of a container). Since the index is container-based, its size will be smaller and probably faster.
Tuple Index: This index is optimized for medial searches; indexes are created with variations of the value. Our filter *test* is a medial search, since we are looking for strings that contain the characters "test" anywhere in the string. Note: this index is only used when the search string is longer than 3 characters. Tuple indexes are the largest type of index, but since they contain a large set of results, they can optimize queries dramatically.
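To see why tuple indexes are the largest type, consider roughly what gets indexed. The sketch below is a conceptual illustration of substring-key generation (the length caps are assumptions for the example, not the exact ESE on-disk format):

```python
def tuple_index_keys(value, min_len=3, max_len=10):
    """Conceptual sketch: a key is generated for the substring starting
    at every position of the value (lengths capped at max_len here).
    One key per starting position means many keys per value, which is
    what lets the DC seek directly to any medial match like *test*."""
    value = value.lower()
    return [value[i:i + max_len] for i in range(len(value) - min_len + 1)]

print(tuple_index_keys("mytestserver"))
```

A 12-character value already produces 10 keys; multiply that across every object with the attribute populated and the size of a tuple index becomes obvious.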
ANR: Ambiguous Name Resolution. This is not an index, but it is a very important search tool. If we add an attribute to the ANR "list", searches using ANR will evaluate the queried value against all ANR-defined attributes. One of the biggest consumers of ANR queries is Exchange Server: yes, those global address list lookups when you type partial information resolve via ANR queries against a global catalog. If you want to know which attributes are selected for ANR, you can query the schema for all attribute objects whose searchFlags value has the ANR bit set.
This is what it looks like:
Getting 17 entries:
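Conceptually, the DC rewrites an ANR filter into an OR of initial-substring matches over the ANR attribute set. The sketch below uses a hypothetical subset of that set (check your own schema for the real list) and omits the real expansion's "first last" name-splitting heuristic:

```python
# Hypothetical subset of the ANR attribute set, for illustration only;
# the authoritative set is whatever has the ANR bit set in your schema.
ANR_ATTRIBUTES = [
    "displayName", "givenName", "sn", "sAMAccountName",
    "legacyExchangeDN", "proxyAddresses", "name",
]

def expand_anr(term):
    """Sketch of how (anr=term) becomes an OR of initial-substring
    matches against every ANR-flagged attribute."""
    clauses = "".join(f"({attr}={term}*)" for attr in ANR_ATTRIBUTES)
    return f"(|{clauses})"

print(expand_anr("smith"))
```

This is why ANR lookups are cheap despite touching many attributes: each clause is an initial-substring match that an ordinary attribute index can satisfy.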
There are other types of index flags that only specify "hints" for creating the index:
Subtree Index: This type of index prepares the DC for performing a Virtual List View (VLV). A VLV is not an index but an LDAP search operation that lets the client request a sorted search with a specific number of results before and after a target result. You'll see a GUI drawn dynamically with a small subset of records, and the user or application can "browse" the results with different methods (scrolling, PAGE UP/DOWN, etc.). This is a very expensive operation for Active Directory and should be avoided where possible; nevertheless, you can ease some of the burden by creating an attribute index on the "VLVed" attribute. (Exchange 2010 requires VLV to be enabled for some operations: http://technet.microsoft.com/en-us/library/dd638130.aspx)
You can turn indexes on or off by modifying the searchFlags attribute on the attribute's definition in the schema. searchFlags is a signed integer, which means it supports a range from -2,147,483,648 through 2,147,483,647 (go ahead, try any number in that range and see what you get); however, it is best to treat it as a 9-bit array in which every bit has a different purpose when turned on.
Back to our sample search. How can we leverage indexes to optimize searches on the description attribute? More specifically, let's assume we're making medial searches (description=*test*) as opposed to initial-substring searches (description=test*). So we need an index that is optimized for both cases: an attribute-level, medial search.
Using the information we discussed above, and the values in the above table, we need a tuple index (32) and an attribute index (1), so we specify a searchFlags value of 33 (32 + 1).
We need to set this value on the description attribute in the schema. You could try the Schema Management MMC, but it only lets you configure attribute and/or container indexes.
Reminder: You'll need schema admin credentials and a connection to the schema master role owner in the MMC; otherwise you'll get this friendly warning:
Bummer… let’s try to use ADSIedit instead, and connect to the schema partition. You can bring it up from the Partitions container in the configuration partition:
Now that I've made sure every requirement is met, I run ADSI Edit, connect to the Schema partition, and find the object that represents the description attribute. Edit the object and look for searchFlags. Set the value to 33 decimal (0x21 hex, as shown here). What I like about ADSI Edit is that it translates the input number into the actual value that will be set upon clicking OK:
Once you accept, AD will create the index. You can confirm the index creation in the Directory Service event log; take note of the index name. The error -1404 (JET_errIndexNotFound: No Such Index) is expected.
Now we wait for event 1137, which confirms the index is complete. Depending on the amount of data that has to be indexed, this can take anywhere from seconds to hours.
Since this index modification is made on one domain controller, the change has to replicate, and the other domain controllers will then rebuild the index on their own schedule. The overall time required to consider the index fully deployed therefore depends on your AD topology as well; my recommendation is to monitor for the above events in the event viewer on each DC.
Let's see how our new index affects our query. If I execute the client-optimized query again I see the newly created Index is used:
In the Used Indexes entry you'll notice a new Index was used with 3 pieces of information:
If you look carefully, the number of entries visited was reduced by at least 90% versus the previous execution (16449), and the number of entries visited is now very close, if not equal, to the number returned, which means a much more efficient query. Call time was also reduced by 25%.
"With great indexes comes great responsibility." Your Active Directory database will grow in proportion to the amount of indexed data and the index type. Fortunately, there is a way to find the size of your indexes using NTDSUTIL: you'll need to stop the AD service (NTDS) if using 2008 R2, or boot into DSRM if using 2003 (http://technet.microsoft.com/en-us/library/cc816897(v=WS.10).aspx).
Once you've finished that step, use NTDSUTIL as follows:
ntdsutil: activate instance ntds
Active instance set to "ntds".
file maintenance: Space Usage
This will give you a database dump like the following:
The Owned column shows the number of pages a particular index occupies. The Active Directory implementation of ESENT uses 8-KB pages; therefore, our new index consumed a colossal 1208 KB of disk space (8 KB * 151 owned pages).
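The arithmetic from the space usage dump can be sketched as:

```python
PAGE_SIZE_KB = 8  # AD's ESENT database uses 8-KB pages

def index_size_kb(owned_pages):
    """Convert the 'Owned' column from ntdsutil's space usage dump
    into kilobytes of disk space consumed by that index."""
    return owned_pages * PAGE_SIZE_KB

print(index_size_kb(151))  # 1208 KB, matching the figure above
```

Run this against the Owned value of each index in the dump to rank your indexes by disk cost.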
It is important to mention that these indexes consume domain controller resources when invoked, just like the rest of the AD database, and if the data in the indexed attributes changes rapidly, your DC performance will be impacted. I can't tell you the exact impact you'll see, because it depends on many variables; however, I strongly recommend you do two things:
Both registry keys are located at: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics
Instructions on how to modify these registry keys can be found at http://support.microsoft.com/kb/314980.
To summarize, these are the improvements we noted in this exercise:
Client Side Optimization
Using Domain Only
Fine grained client parameters
Server Side Optimization
Server Side Index
Indexes are there to make your life easier, but before you run off and create dozens of new indexes, be careful: we've already discussed the potential impact on database size, disk space and memory utilization. Plus, your AD slowness problems might instead be resolved by adding more resources, deploying additional DCs or even just fixing a faulty application. I'm not trying to discourage you, but to make you think about the potential impact and benefit of creating new indexes.
Remember: Test and Benchmark! Only then can you understand the impact of indexing on DB growth and DC performance.
Thanks for reading… stay tuned.
Adrian "I only had a" Corona
Great first blog post. Good job!!!
Awesome article. Thanks a lot!
Fantastic post! The only suggestion I have is to keep all the posts in this "MCM: Active Directory" series linked together so that each post in the series contains links to all the other posts. :)
This was advanced, nice post
Thank you Rafa! Keep tuned for the next post on the series.
Thank you for reading (and hopefully not falling asleep), Miroslav.
That is a great suggestion. I added a link to the first post, and we'll do that as a footnote shortly as we get more posts up in the series. Thank you very much for stopping by!
Thank you very much!
Good Work! :)
just one question...
From looking at the dit dump (from NTDSUTIL) how can you tell what the other indexes are used for?
Very good post Adrian! Do you all enable field engineering logging at every customer you visit?
WOW, Awesome article. Please write more articles :)
Thanks a lot!
Nice write-up Adrian. Thanks
Great Article. Thanks !
Hi Mike, thank you for your comments!
Since these settings bring up additional information in your logs, we only enable them when we need to, for example, when a query is slow or we are doing performance analysis.
We don't want to flood your Eventlog and your monitoring systems :).
Thank you Patris!
I'll try to write these types of posts more often.