When performing batch jobs where you wish to run processes in parallel it is important to understand how the scheduler works.
You can get a description of what the scheduler does by looking under the title ‘SQL Server Task Scheduling’ in Books on Line. However what you need to know is not at all obvious from reading this description. What you need to know are these five facts.
When an application talks to SQL Server it gets assigned a System Process ID (SPID)
1) SPID gets assigned to a UMS
2) There is one UMS per processor
3) A UMS schedules across Processors
4) A SPID stays on UMS for life of thread
5) Two busy SPIDS may fight on the same UMS
You now have the five essential facts, but do you have a useful conclusion? I’m not going to tease you, I shall merely illustrate the implied gotcha.
You have a 4 processor box and a bcp job to run. The job takes 10 processor minutes to run – how long does it take 4 bcp processes (each bound to a separate UMS) to process the job? Pretty easy isn’t it - 2.5 minutes (were obviously in an ideal world here).
So now imagine you’re a very enthusiastic DBA and realize the processors aren’t working to the max. You decide next time you’ll set everything up so that you can run five bcp processes and hopefully make better use of your processes. How long do you think the job will take with five bcp processes? Two minutes? Nope, more like 4 minutes.
Why? Well two of the bcp processes are going to get permanently stuck on one processor. Thus if the work is divided between each spid this will give two minutes of work to each spid. The two spids sharing one processor are now only going to get half the processor time, therefore it will take them twice as long to run – four minutes. QED, five spids thus take considerably longer to run than four
To get a deep understanding of the UMS have a look at the following:
Let me ask you a stupid question:
Suppose that I have a m/c with only one CPU. I open up 2 instances of Query Analyzer and log to SQL Server. This means that I have 2 connections with 2 SPIDS. Correct? In each of these 2 query analyzer windows, I run the following query:
declare @intVar as int, @intIndex as int
SELECT @intIndex = 0
SELECT @intVar = 11000000
WHILE( @intIndex < @intVar)
print 'Looping : ' + CAST(@intIndex as varchar(10))
SELECT @intIndex = @intIndex + 1
Does it mean that, after the first loop is done, the second loop will start? Or they will be executed simultaneously?
Both connections will execute simultaneously. Each connection will have a thread of execution that will be swapped on and off the processor continuously. The processor will 'time slice' or multiplex its work evenly between the two threads until they are both done. The time slicng is cntrolled by the OS scheduler.
What part does SQL UMS play in this scenario?
Its primary function is to keep as much of the SQL Server scheduling process as possible in user mode.