SQL Server SQLOS team blog

Collective blog for SQLOS team

  • How to configure Soft-NUMA on a system with > 32 processors?

     

    I recently answered this customer question and was looking forward to post the solution here. So here you go!

    There is already a lot of information about how Soft NUMA works and how to configure Soft NUMA on your system. So without delving into the basics, I am jumping right ahead to the topic.

     

    Soft Affinity Setting

    SQL Server 2000 and above supports the affinity option for greater than 32 processors (33 to 64) only on 64-bit operating system.

    If you are running SQL Server 2000 or above on a system with greater than 32 processors, then you can use sp_configure ‘affinity mask’ (32 bit mask) option for the first 32 CPUs and sp_configure ‘affinity64 mask’ (32 bit mask) option for the CPUs 33 to 64.

    Ex: Consider a machine with 33 CPUs. Setting the ‘affinity mask’ to 0x0000ffff and ‘affinity64 mask’ to 0x1 hard affinitizes CPU 1 to 16 and CPU 33 to ONLINE.

     

     

    sp_configure 'show advanced options', 1;

    RECONFIGURE;

    GO

    sp_configure 'affinity mask', 65535;

    RECONFIGURE;

    GO

    sp_configure ‘affinity64 mask’, 1;

    RECONFIGURE;

    GO

     

    Registry Settings  

    To map Soft NUMA nodes to greater than 32 CPUs, use a QWORD registry value to represent CPUs 1 to 64. Use the Registry Editor program (regedit.exe) to add the following registry keys to map soft-NUMA to CPUs as follows.

    Ex: Mapping Soft NUMA nodes to CPUs as follows:

     Node 0: CPUs 1 to 16

     Node 1: CPUs 17 to 32

     Node 2: CPUs 33 and 34

    SQL Server 2005

    Type

    Value name

    Value data

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\NodeConfiguration\Node0

    QWORD

    CPUMask

    0x 0000 0000 0000 ffff

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\NodeConfiguration\Node1

    QWORD

    CPUMask

    0x0000 0000 ffff 0000

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\NodeConfiguration\Node2

    QWORD

    CPUMask

    0x0000 0002 0000 0000

     

    SQL Server 2008

    Type

    Value name

    Value data

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node0

    QWORD

    CPUMask

    0x 0000 0000 0000 ffff

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node1

    QWORD

    CPUMask

    0x0000 0000 ffff 0000

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node2

    QWORD

    CPUMask

    0x0000 0003 0000 0000

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

    Rest of the details about using Soft NUMA and mapping the registry key remains the same.

    If the registry is having faulty settings like overlapping NUMA node masks to CPU mapping or Soft NUMA mapping crossing hardware NUMA, then SQL Server ignores the Soft NUMA configuration.

     

    Good news is that the new version of SQL Server (after SQL Server 2008) will be supporting greater than 64 logical processors and will be using a more convenient way for setting affinity!

     

    -Harshitha Amit

    SQLOS Test Team

  • Debugging slow response times in SQL Server 2008

    Introduction

    The other day I came across a nice article written by Mario Broodbakker called "SQL Server Wait Events: Taking the Guesswork out of Performance Profiling" (see references) on how to troubleshoot slow response times in SQL Server 2000 and 2005 using the YAPP Method. According to the author, one of the biggest challenges to apply this method in SQL Server is that fact that the wait statistics are collected and exposed at the server level (server wide):

    ... "This is simply not enough, unless you have the luxury of testing your application and SQL statements on an isolated server" ..." But, of course, it's often hard or impossible to isolate and replicate production problems on an isolated test server." ...

    I decided to write this because in SQL Server 2008 the wait statistics are available at virtually any level, and (at the time of writing) I couldn't find any articles on the new sqlos.wait_info and sqlos.wait_info_external extended events.

    YAPP Method

    The YAPP Method: "Yet Another Performance Profiling Method" is based on a paper written by Anjo Kolk (see references). The basic idea of the method can be expressed with the following formula:

    (1)   Response time = service time + wait time

    This method for performance troubleshooting has been around for some time and it is a fairly standard practice when debugging slow response times.

    You can get the response time and the service time of a query by looking at sys.dm_exec_sessions, columns total_elapsed_time and cpu_time. Alternatively, you can use SQL trace or the more lightweight extended events in SQL Server 2008. Knowing response time and service time and applying formula (1), gives you wait time. So, we know how much we waited. However, in order to get the individual wait types we need the new extended events.

    Per session wait statistics in SQL Server 2008

    In SQL Server 2008 you can get wait statistics for a given session or statement by leveraging the new extended events. Optionally, you can limit the collection to a small set of wait types or collect only waits exceeding certain duration. If your event session is written carefully, you can use this technique on production systems with very little or no performance impact. In fact, SQL Server 2008 ships with a built-in event session, which is collecting wait statistics, and is running by default out of the box.

    Here is how to create an event session to collect wait types for session 51. The events will go to the specified trace file:

    create event session session_waits on server

    add event sqlos.wait_info

    (WHERE sqlserver.session_id=51 and duration>0)

    , add event sqlos.wait_info_external

    (WHERE sqlserver.session_id=51 and duration>0)

    add target package0.asynchronous_file_target

          (SET filename=N'c:\wait_stats.xel', metadatafile=N'c:\wait_stats.xem');

     

    Here is how to start the event session, run your workload, and then stop the event session:

    alter event session session_waits on server state = start;

    exec sp_workload;

    alter event session session_waits on server state = stop;

     

    Here is how to read the trace file:

    select * from from sys.fn_xe_file_target_read_file

          ('c:\wait_stats*.xel', 'c:\wait_stats*.xem', null, null)

     

    The result is a list of wait events, representing individual waits, having wait type and duration column. In order to make better sense out of this we need to group the events by wait type and calculate the total duration per wait type. The following code can do this:

    create view dbo.read_xe_file as

    select object_name as event, CONVERT(xml, event_data) as data

    from sys.fn_xe_file_target_read_file

    ('c:\wait_stats*.xel', 'c:\wait_stats*.xem', null, null)

    go

     

    create view dbo.xe_file_table as

    select

          event

          , data.value('(/event/data/text)[1]','nvarchar(50)') as 'wait_type'

          , data.value('(/event/data/value)[3]','int') as 'duration'

          , data.value('(/event/data/value)[6]','int') as 'signal_duration'

    from dbo.read_xe_file

    go

     

    select

          wait_type

          , sum(duration) as 'total_duration'

          , sum(signal_duration) as 'total_signal_duration'

    from dbo.xe_file_table

    group by wait_type

    order by sum(duration) desc

    go

     

    Example

    Hopefully this example will demonstrate the power of sqlos.wait_info and sqlos.wait_info_external events. I did some measurements by executing the following workload from two connections:

    create proc #sp_workload

          as

    begin

          SET nocount ON

          CREATE TABLE #t1000k (c2 char(1020));

          BEGIN TRANSACTION

          DECLARE @rows INT = 50000;

          DECLARE @row INT = 0;

          DECLARE @count INT = 0;

     

          WHILE @row < @rows

          BEGIN

                INSERT INTO #t1000k (c2) VALUES (REPLICATE('a',1000));

     

                SELECT @row = @row + 1

     

                IF @count > 10

                BEGIN

                      COMMIT

                      BEGIN TRANSACTION

                      SELECT @count=0

                END

               

                SELECT @count=@count+1

          END

         

          COMMIT

     

    END

    go

    While running the workload I had IOMeter (see references) so that I can add some disk pressure.

    Results for session 51: Apparently, most of the time was spent waiting for space in the log buffer to store log records - wait type LOGBUFFER. For complete list of all wait types with their descriptions search for sys.dm_os_wait_stats in MSDN or Books Online (see references).

    Response time Service time Wait time
    32619 1828 30791
    wait_type total_duration total_signal_duration
    LOGBUFFER 9662 66
    PREEMPTIVE_OS_FILEOPS 6442 276
    PREEMPTIVE_OS_FLUSHFILEBUFFERS 3833 150
    PREEMPTIVE_OS_WRITEFILEGATHER 3015 66
    SOS_SCHEDULER_YIELD 2481 2480
    WRITE_COMPLETION 1971 0
    IO_COMPLETION 1183 0
    PAGEIOLATCH_UP 551 0
    WRITELOG 417 0
    PAGELATCH_UP 30 27
    PAGELATCH_SH 10 10
    PREEMPTIVE_XE_CALLBACKEXECUTE 4 200

     

    Results for session 52: The result for this session is very similar. We can tell that most of the time was spent waiting. After accumulating the durations from all waits we get 30726, which is pretty close to the total wait time calculated using sys.dm_exec_sessions (31484).

    Response time Service time Wait time
    33890 2406 31484
    wait_type total_duration total_signal_duration
    LATCH_EX 16555 32
    LOGBUFFER 10796 28
    SOS_SCHEDULER_YIELD 2739 2729
    WRITELOG 605 0
    PAGELATCH_UP 26 25
    PAGELATCH_SH 5 5

     

    For the sake of completeness I also collected server-wide wait statistics, which are also available in previous version of SQL Server. The server-wide statistics clearly shows that we have IO bottleneck however you don't know exactly which session or query was affected and by how much:

     

    wait_type wait_time_ms signal_wait_time_ms
    LOGBUFFER 21666 127
    WRITELOG 17118 4
    LATCH_EX 16561 34
    PAGEIOLATCH_UP 7891 0
    SOS_SCHEDULER_YIELD 7058 7038
    PREEMPTIVE_OS_FILEOPS 6443 0
    WRITE_COMPLETION 5896 4
    SLEEP_BPOOL_FLUSH 3916 278
    PREEMPTIVE_OS_FLUSHFILEBUFFERS 3833 0
    PREEMPTIVE_OS_WRITEFILEGATHER 3016 0
    IO_COMPLETION 1680 0
    PREEMPTIVE_OS_REPORTEVENT 141 0
    PAGELATCH_UP 139 128
    PREEMPTIVE_OS_PIPEOPS 128 0
    PREEMPTIVE_XE_TARGETINIT 85 0
    XE_BUFFERMGR_ALLPROCESSED_EVENT 70 0
    PAGELATCH_SH 23 23
    PREEMPTIVE_OS_QUERYREGISTRY 16 0
    PREEMPTIVE_XE_CALLBACKEXECUTE 8 0
    PAGELATCH_EX 7 7
    PREEMPTIVE_XE_SESSIONCOMMIT 5 0
    LATCH_SH 1 0
    PREEMPTIVE_OS_GETDISKFREESPACE 1 0

     

    Complete test code

    Here is the complete source code of the test:

    use tempdb

    go

     

    --

    -- Enable xp_cmdshell to delete trace files

    --

    exec sp_configure 'show advanced options', 1

    reconfigure

    exec sp_configure 'xp_cmdshell', 1

    reconfigure

    go

     

    BEGIN TRY

          drop proc #sp_workload

    END TRY BEGIN CATCH END CATCH

    go

     

    create proc #sp_workload

          as

    begin

          SET nocount ON

          CREATE TABLE #t1000k (c2 char(1020));

          BEGIN TRANSACTION

          DECLARE @rows INT = 50000;

          DECLARE @row INT = 0;

          DECLARE @count INT = 0;

     

          WHILE @row < @rows

          BEGIN

                INSERT INTO #t1000k (c2) VALUES (REPLICATE('a',1000));

     

                SELECT @row = @row + 1

     

                IF @count > 10

                BEGIN

                      COMMIT

                      BEGIN TRANSACTION

                      SELECT @count=0

                END

               

                SELECT @count=@count+1

          END

         

          COMMIT

     

    END

    go

     

    BEGIN TRY

          drop event session session_waits on server

    END TRY BEGIN CATCH END CATCH

    go

     

    exec xp_cmdshell 'del c:\wait_stats*.xel', no_output;

    exec xp_cmdshell 'del c:\wait_stats*.xem', no_output;

    go

     

    declare @cmd nvarchar(max);

    set @cmd = N'create event session session_waits on server ' +

          N'add event sqlos.wait_info (WHERE sqlserver.session_id=' + convert(nvarchar(10), @@spid) + N' and duration>0) ' +

          N', add event sqlos.wait_info_external (WHERE sqlserver.session_id=' + convert(nvarchar(10), @@spid) + N' and duration>0) ' +

          N'add target package0.asynchronous_file_target (SET filename=N''c:\wait_stats.xel'', metadatafile=N''c:\wait_stats.xem'')';

    exec(@cmd);

    go

     

    BEGIN TRY

          drop table #snapshot

    END TRY BEGIN CATCH END CATCH

    go

     

    select total_elapsed_time, cpu_time into #snapshot from sys.dm_exec_sessions where session_id=@@SPID;

    go

     

    alter event session session_waits on server state = start;

    exec #sp_workload;

    alter event session session_waits on server state = stop;

    go

     

    select

          (es.total_elapsed_time - ss.total_elapsed_time) as 'Response time',

          (es.cpu_time - ss.cpu_time) as 'Service time',

          ((es.total_elapsed_time - ss.total_elapsed_time) - (es.cpu_time - ss.cpu_time)) as 'Wait time'

          from sys.dm_exec_sessions es, #snapshot ss where session_id=@@SPID;

    go

     

    drop event session session_waits on server

    go

     

    BEGIN TRY

          drop view dbo.read_xe_file

    END TRY BEGIN CATCH END CATCH

    go

    create view dbo.read_xe_file as

    select object_name as event, CONVERT(xml, event_data) as data

    from sys.fn_xe_file_target_read_file('c:\wait_stats*.xel', 'c:\wait_stats*.xem', null, null)

    go

     

    BEGIN TRY

          drop view dbo.xe_file_table

    END TRY BEGIN CATCH END CATCH

    go

    create view dbo.xe_file_table as

    select

          event

          , data.value('(/event/data/text)[1]','nvarchar(50)') as 'wait_type'

          , data.value('(/event/data/value)[3]','int') as 'duration'

          , data.value('(/event/data/value)[6]','int') as 'signal_duration'

    from dbo.read_xe_file

    go

     

    select

          wait_type

          , sum(duration) as 'total_duration'

          , sum(signal_duration) as 'total_signal_duration'

    from dbo.xe_file_table

    group by wait_type

    order by sum(duration) desc

    go

     

    --

    -- CLEANUP

    --

    exec xp_cmdshell 'del c:\wait_stats*.xel', no_output;

    exec xp_cmdshell 'del c:\wait_stats*.xem', no_output;

    drop view dbo.xe_file_table;

    drop view dbo.read_xe_file;

    drop proc #sp_workload;

     

    References

    Sys.dm_os_wait_stats

    The following MSDN article shows all wait types with their description: http://msdn.microsoft.com/en-us/library/ms179984.aspx

    IOMeter

    IOMeter is an IO stress and measurement tool developed by Intel. It is now an Open Source tool, and freely downloadable from:

    http://www.iometer.org

    SQL Server Wait Events: Taking the Guesswork out of Performance Profiling

    A very nice article by Mario Broodbakker on how to troubleshoot slow response times in SQL Server 2000 and SQL Server 2005:

    http://www.simple-talk.com/sql/performance/sql-server-wait-events-taking-the-guesswork-out-of-performance-profiling/

    Troubleshooting Performance Problems in SQL Server 2005

    Here also has a very good presentation on the subject and describes a lot of SQL Server 2005 wait events here:

    http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx

  • Part 2: Resource Governor CPU Demo on multiple CPUs

    In some cases CPU demo described in the previous post may behave not as expected.

    One such case is doing the demo on multiple CPUs. Even though it appears as Resource Governor is "not working", it is "by design". Let's take a look at what happens in this case.

    What is our promise about CPU bandwidth distribution?

    When you specify percentage on the pool we will ensure average CPU bandwidth distribution on per scheduler basis for CPU bound workloads. Note, emphasis: average, per scheduler, CPU bound. What does this mean?

    For example, you have a 2 CPU machine, and SQL Server runs on both of the CPUs. If you haven't set affinity mask as was indicated in the Setup step of the previous post,

    select scheduler_id, cpu_id, status, is_online from sys.dm_os_schedulers

    will get you something like:

    scheduler_id cpu_id      status                                                       is_online

    ------------ ----------- ------------------------------------------------------------ ---------

    0            0           VISIBLE ONLINE                                               1

    1            1           VISIBLE ONLINE                                               1

    257          0           HIDDEN ONLINE                                                1

    255          0           VISIBLE ONLINE (DAC)                                         1

    Important item to note here is that we have 2 online schedulers to service normal user requests (scheduler_id < 255). You can compare it with the case when affinity mask is set to 1. You will still have 2 schedulers (with scheduler_id < 255), but one of them will be offline.

    Now, for simplicity, let's configure 2 groups and 2 pools as follows:

    CREATE RESOURCE POOL pSlow

    WITH (MAX_CPU_PERCENT = 30)

     

    CREATE RESOURCE POOL pFast

    WITH (MAX_CPU_PERCENT = 70)

     

    CREATE WORKLOAD GROUP gSlow

    USING pSlow

     

    CREATE WORKLOAD GROUP gFast

    USING pFast

    GO

    And corresponding classifier function as

    CREATE FUNCTION f1()

    RETURNS SYSNAME WITH SCHEMABINDING

    BEGIN

          DECLARE @val sysname

          if 'UserSlow' = SUSER_SNAME()

                SET @val = 'gSlow';

          else if 'UserFast' = SUSER_SNAME()

                SET @val = 'gFast';

          return @val;

    END

    GO

    Create logins, assign function and make changes effective 

    CREATE LOGIN UserFast WITH PASSWORD = 'UserFastPwd', CHECK_POLICY = OFF

    CREATE LOGIN UserSlow WITH PASSWORD = 'UserSlowPwd', CHECK_POLICY = OFF

    GO

     

    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.f1)

    GO

     

    ALTER RESOURCE GOVERNOR RECONFIGURE

    Add group counters for CPU Usage % and run 1 CPU intensive query UserSlow, then 1 with UserFast. You will observe a picture similar to this:

    But how can explain that the resource governor does not obey 70:30 distribution we specified for the resource pools? The answer is quite simple: the queries landed on different schedulers (= different CPUs) and thus, there is nothing to share between them and each of them is taking a full CPU. How can you verify that?

    First of all, verify the classification 

    -- verify the classification

    select

          sess.session_id, sess.group_id, CONVERT(NCHAR(20), grps.name) as group_name

    from

          sys.dm_exec_sessions as sess join

          sys.dm_resource_governor_workload_groups as grps on sess.group_id = grps.group_id

    where

          session_id > 50

    GO

    Where you will see something like

    session_id group_id    group_name

    ---------- ----------- --------------------

    52         2           default            

    51         265         gSlow              

    54         266         gFast              

    Everything is normal here. Now, let's take a look at how currently executing requests landed on schedulers:

    select

          r.session_id,

          CONVERT(NCHAR(20), wg.name) as group_name,

          t.scheduler_id,

          r.status

    from sys.dm_exec_requests r

          join sys.dm_os_tasks t on r.task_address = t.task_address

          join sys.dm_resource_governor_workload_groups wg on r.group_id = wg.group_id

    where

          r.session_id > 50

    GO

    Which will get

    session_id group_name           scheduler_id status

    ---------- -------------------- ------------ ------------------------------

    54         gFast                0            runnable

    52         default              0            running

    51         gSlow                1            running

    What you can see here is that gFast and gSlow is working on different schedulers, thus confirming our earlier statement. Thus Red and Blue line corresponds to utilization of 1 full CPU or 50%, since it is normalized to number of CPUs available on the box.

    Is this a problem with Resource Governor in general? Certainly not if you have more executing requests than you have CPUs or if queries are running in parallel. But you should keep this in mind if you are trying to give 90% of CPU bandwidth to a single query running serially compared to a parallel query or number of serial queries in another group/pool.

    A simple way of thinking about this is: is my workload capable of taking as much CPU as I told it to? Remember, when you set MAX_CPU_PERCENT on the pool, you allowing workloads belonging to the pool to take this much bandwidth on each scheduler. In case of the demo, you are working with a single query workload which produces this initially unexpected result.  

  • Part 1: Anatomy of SQL Server 2008 Resource Governor CPU Demo

    First, a quick introduction. My name is Boris Baryshnikov and I'm a program manager in SQL Server Database Engine group. As of today, Resource Governor is the most recent large project I worked on. Not surprisingly, I will talk about the Resource Governor here.

    While Resource Governor spans multiple components of SQL Server Engine, it seems logical to discuss the feature in this blog as the resource management in general is so close to the scope of SQLOS.

    Without further delay, let's get to point. As we have recently released a public CTP (Community Technology Preview) of SQL Server 2008 (aka SQL Server 2008 November CTP), which has Resource Governor functionality, and a number of presentations with Resource Governor demo on various conferences, I started getting a lot of questions about the demo itself which made me write this post.

    We created this demo when Resource Governor was only a prototype to illustrate the concept, but probably its simplicity made it so attractive that it was included in a number of talks and demos by different people all over the world.

    Background

    If you are looking for general concepts about Resource Governor, check out the Books Online which comes as a separate download or read it directly on MSDN Library.

    The Demo in a Nutshell

    In the demo we will create two workload groups which share a common resource pool and one workload group which has its own dedicated resource pool. Each group gets a CPU intensive workload. Using the Resource Governor you will be able to affect CPU distribution between these competing workloads. The effects of the resource management are observed using performance counters specific to Resource Governor.

    Step 1: Initial Demo Setup

    On a newly installed server (i.e. no prior Resource Governor configuration) you may need to run these setup steps. I normally doing my demo on a dual core laptop and for the sake of simplicity I'm using a single CPU for SQL Server. To do so, I adjust CPU affinity mask as follows:

    -- first enable advanced options in sp_configure

    sp_configure 'show advanced', 1

    GO

     

    RECONFIGURE

    GO

     

    -- use only 1 CPU on demo machine

    sp_configure 'affinity mask', 1

    GO

     

    RECONFIGURE

    GO

     

    Using 1 CPU for SQL Server on a dual proc machine has an interesting side-effect: we normalize "CPU usage %" counter to number of CPUs on the box and thus, the values will hover around 50% as maximum and not 100% as you might expect. I will illustrate this below.

    Of course you can use both CPUs but this complicates the demo. This is a subject for whole another post - what happens when you have multiple CPUs.

    In addition to that for demo purposes I will set min/max server memory to a fixed value, since it will improve predictability of the demo on the laptop.

    -- 512 Mb is suitable for laptop demo

    sp_configure 'min server', 512

    GO

    sp_configure 'max server', 512

    GO

     

    RECONFIGURE

    GO

    Step 2: Workload groups and Resource Pools

    Now we will be setting up the following hierarchy of workload groups and resource pools:

    Resource Governor Group and Pool hierarchy

    Each corresponding workload group contains queries of the corresponding class or department (i.e. Marketing, Adhoc, and VP). Note that, Marketing and Adhoc queries share the same resource pool, while workload group VP has its own similarly named pool. The reason of such separation will become clear as we see how we adjust Resource Governor controls.

    You will start building the above configuration in a bottom up manner (i.e. starting from pools and going up)

    To do so, we execute the following T-SQL:

    -- create user pools

    -- note that we are using all default parameters

    CREATE RESOURCE POOL PoolMarketingAdhoc

    CREATE RESOURCE POOL PoolVP

     

    -- create user groups

    -- also note that all groups created with default parameters

    -- only pointing to the corresponding pools (and not 'default')

    CREATE WORKLOAD GROUP GroupMarketing

    USING PoolMarketingAdhoc

     

    CREATE WORKLOAD GROUP GroupAdhoc

    USING PoolMarketingAdhoc

     

    CREATE WORKLOAD GROUP GroupVP

    USING PoolVP

    GO

    Step 3: Classification

    Now, what you have just done is created hierarchy of the groups and pools, however, how does the server know about which query goes where? This is where classification comes in. The above picture becomes:

    Group and Pool hierarchy with classification

    There is a couple of things:

    1. To do the classification you will need to create a user-defined function that will be executed for every new connection and it will place these new connections in the corresponding workload groups.
    2. How will we separate different connections? For demo purposes we will use 3 separate login names which we will check and use inside of the function

    To implement the above 2 steps we will run the following:

    -- classifier function should be created in master database

    -- switch to master unless you are there already

    USE master

    GO

     

    -- create logins to separate users into different groups

    -- note that we disabled strong password checking for demo purposes

    -- but this is against any best practice

    CREATE LOGIN UserMarketing WITH PASSWORD = 'UserMarketingPwd', CHECK_POLICY = OFF

    CREATE LOGIN UserAdhoc WITH PASSWORD = 'UserAdhocPwd', CHECK_POLICY = OFF

    CREATE LOGIN UserVP WITH PASSWORD = 'UserVPPwd', CHECK_POLICY = OFF

    GO

     

     

    -- now create the classifier function

    IF OBJECT_ID('DBO.CLASSIFIER_V1','FN') IS NOT NULL

           DROP FUNCTION DBO.CLASSIFIER_V1

    GO

     

    -- note that this is just a regular function

    CREATE FUNCTION CLASSIFIER_V1 ()

    RETURNS SYSNAME WITH SCHEMABINDING

    BEGIN

           DECLARE @val varchar(32)

           SET @val = 'default';

           if  'UserVP' = SUSER_SNAME()

                  SET @val = 'GroupVP';

           else if 'UserMarketing' = SUSER_SNAME()

                  SET @val = 'GroupMarketing';

           else if 'UserAdhoc' = SUSER_SNAME()

                  SET @val = 'GroupAdhoc';

           return @val;

    END

    GO

    Step 4: Are we there yet?

    After all this work, can we start workloads and see what happens? The answer, as you have guessed by the question is - no. What's left? Again, a couple of steps:

    1. We need to tell Resource Governor to use the function that we just created
    2. Make all the changes effective

    First step is done by

    -- make function known to the Resource Governor

    ALTER RESOURCE GOVERNOR

    WITH (CLASSIFIER_FUNCTION = dbo.CLASSIFIER_V1)

    GO

    For the second step, let's compare output of catalog views with in-memory information (note difference in names of catalog views and dynamic management views (DMVs) which are prefixed with dm_:

    -- metadata information

    SELECT * FROM sys.resource_governor_workload_groups

    SELECT * FROM sys.resource_governor_resource_pools

    SELECT * FROM sys.resource_governor_configuration

     

    -- in-memory information

    SELECT * FROM sys.dm_resource_governor_workload_groups

    SELECT * FROM sys.dm_resource_governor_resource_pools

    SELECT * FROM sys.dm_resource_governor_configuration

    Now transfer changes from metadata to memory by running the following statement. Also, do not confuse it with already existing RECONFIGURE command:

    -- make the changes effective

    ALTER RESOURCE GOVERNOR RECONFIGURE

    GO

    And rerun the above query on metadata and DMVs and you should see that new groups, pools and classifier function ID are present in corresponding DMVs.

    Step 5: Running the workloads

    The easiest way to simulate a CPU intensive workload is to run the following in a loop:

    set nocount on

    declare @i int

    declare @s varchar(100)

     

    set @i = 100000000

     

    while @i > 0

    begin

           select @s = @@version;

           set @i = @i - 1;

    end

    Also, instead of running this query from the Management Studio, consider saving it in a file and running from a command prompt by using a script similar to the below. Note that we are using 3 different user names to connect to the server.

    echo "Press any key to start Marketing workload"

    pausestart sqlcmd -S <your_server_name> -U UserMarketing -P UserMarketingPwd -i "CPU intensive loop.sql"

    echo "Press any key to start VP workload"

    pause

    start sqlcmd -S <your_server_name> -U UserVP -P UserVPPwd -i "CPU intensive loop.sql"

    echo "Press any key to start Adhoc workload"

    pause

    start sqlcmd -S <your_server_name>" -U UserAdhoc -P UserAdhocPwd -i "CPU intensive loop.sql"

    To observe the effects of the load, add the following performance counters in the perfmon:

    1. We will monitor CPU usage per group in the 1st instance of perfmon; add "SQLServer:Workload Group Stats object", "CPU usage %" counter for "GroupMarketing", "GroupAdhoc" and "GroupVP" instances
    2. We will monitor CPU usage per pool 2nd instance of perfmon, add "SQLServer:Resource Pool Stats object", "CPU usage %" counter for "PoolMarketingAdhoc" and "GroupVP" instances

    Before you start the next workload, observe the counters for pools and groups for a number of seconds, you should see approximately the following:

    For groups (click on the image to open in a new window):

    CPU usage % counters for groups

    For pools:

    CPU usage % for pools

    Few things to note:

    1. Remember I set affinity mask to 1? This is why the maximum on the figures is only 50% (it is for a single CPU while usage is normalized to all CPUs on the machine, 2 in my case)
    2. At point A, I started Marketing workload, CPU usage went to the maximum on the CPU for a single workload, 50% in this case
    3. At point B, I started VP workload, which, as you remember, belongs to a separate resource pool and it made Marketing group share half of the CPU with VP group. The same is true for the pools (because we have 1 to 1 match of active requests in groups to pools at this point)
    4. At point C, I started Adhoc workload, which has its own group but shares the pool with Marketing workload. What happened here on the groups is that all 3 of them are now sharing the CPU getting approximately 1/3 of it or roughly 17%. On the pool side, however, we can see that MarketingAdhoc pool which shows aggregate resource usage by all groups inside of it, has 2/3 or the CPU while PoolVP has only 1/3. This behavior is exactly what you get on SQL Server 2005. Resources are distributed as they are being requested and whoever needs more CPU simply gets it. This was one of the goals: to make behavior as close as possible to SQL Server 2005 when you do not use the Resource Governor or adjust any parameters of it. Note, however, by creating the groups and pools you are already slightly altering the behavior of SQL Server 2008, but more on this in the next few posts.
    Step 6: Management Actions

    Now we came to the point where we want to apply action to change the above picture. Specifically, we want our VP workload to proceed faster and thus, limit CPU usage by Marketing and Adhoc workloads to 50% of the CPU.

    To do this, we alter the PoolMarketingAdhoc using the following syntax (remember, we created the pool using all default parameters):

    -- adjust PoolMarketingAdhoc to not consume more than 50% of CPU

    ALTER RESOURCE POOL PoolMarketingAdhoc

    WITH (MAX_CPU_PERCENT = 50)

    Remember to make changes effective:

    -- make the changes effective

    ALTER RESOURCE GOVERNOR RECONFIGURE

    Now, let's look at the counters:

    For groups:

    CPU usage % for groups after management actions

    For pools:

    CPU usage % for pools after management actions

    What you will see is what happened at point D: Pool usage by PoolMarketingAdhoc went back to half of the CPU while PoolVP took the remaining part of it. Note that, on group side GroupVP usage went up to use half of the CPU (it has a single workload) while groups Marketing and Adhoc divide another half equally (12.5% each).

    Further step is to alter IMPORTANCE parameters of the Marketing and Adhoc groups. IMPORTANCE affects CPU distribution when two groups share the same pool. Value of importance has a numeric meaning for CPU bandwidth distribution. Ratio of Low:Medium:High IMPORTANCE is equal to 1:3:9.

    To illustrate this, let's run the following:

    -- alter importance of adhoc group

    ALTER WORKLOAD GROUP GroupAdhoc

    WITH (IMPORTANCE = Low)

     

    -- alter importance of marketing group

    ALTER WORKLOAD GROUP GroupMarketing

    WITH (IMPORTANCE = High)

     

    -- make the changes effective

    ALTER RESOURCE GOVERNOR RECONFIGURE

    When changes become effective corresponds to the point E on the above graphs. Note that distribution between pools does not change, amount of CPU available to GroupVP also does not change (there are no other groups in this pool), but GroupMarketing takes about 10 times of the bandwidth available to GroupAdhoc (remember the ratio of Low:High = 1:9, but together they stay within PoolMarketingAdhoc limits.

     

    That's enough for the first post. In the upcoming post I will try to go over the cases when the demo does not seem to work or shows "unusual" behavior and possible explanations of it.


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker