Here are some notes on “SQL Server 2008 SQLOS” I took while attending an advanced class on SQL Server taught by Gert Drapers (from http://dbproj.com/ and http://blogs.msdn.com/gertd/).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


Processes

  • Processes (context, access token, virtual address space descriptors, handle table)
  • SQL is a user mode process
  • Threads (unit of execution, access token, shares process address space, can synchronize)
  • Threads can have the same access token as process, or another one (impersonation)
  • Every process starts with one thread - “main”
  • Context switches
  • Fibers (lightweight thread, need a thread to schedule fibers, fibers share same thread context)

Scheduling

Hardware

  • Defining: Processor, CPU, Socket, Core, Logical Processor, NUMA, NUMA node
  • Example: Notebook, dual-core, hyper-threaded = 2 cores, 4 logical processor
  • Windows has 64 threads of execution in Windows Server 2008 – R2 will change that.
  • NUMA node – group of logical processors and cache that are “near each other”
  • SMP – Symmetric Multi-Processing, Front Bus to access memory, contention
  • SMP layout – CPU, Northbridge (memory, graphics), Shouthbridge (PCI)

NUMA

  • NUMA = Non-Uniform Memory Access / Architecture, ccNUMA
  • NUMA nodes – contain own RAM, CPUs, memory controller, talk to other nodes
  • Goal: minimize/eliminate front-bus contention
  • Performance penalty to access memory in other nodes (foreign node)
  • Applications need to be NUMA-aware to take advantage
  • Soft-NUMA – try out how things behave – only performs on NUMA hardware
  • Support for NUMA – BIOS setting: Node interleaving disabled = NUMA
  • How does the OS knows? ACPI and SRAT – Static Resource Affinity Table
  • If Multiple nodes > 1 – You have NUMA, Number of procs – actual sockets populated
  • Thread scheduling with NUMA (how “ideal processor” is chosen)
  • Important for the app to know the nodes and available memory on each
  • http://www.microsoft.com/whdc/archive/numa_isv.mspx

Windows Server 2008 and NUMA

Windows Server 2008 R2

SQL Server Scheduling

  • SQL Server 6.x scheduling – multi-threaded – worker thread pool – OS pre-empts threads
  • SQL Server 7/2000 – User mode schedulers, 1 per LP – UMS work queues – fewer content switches
  • - Users sessions assigned to UMS scheduler, round robin,  no re-balancing
  • - Bulk Insert is single threaded, not in parallel within the same scheduler
  • - Memory: memtoleave, threads, bpool – simple calculations to divide the memory
  • SQL Server 2005/2008 – SQLOS

SQLOS

SQLOS Architecture

  • SOS_OS – Singleton – Manages nodes, system level information
  • SOS_Node – 1 per NUMA node (or 1 in SMP) – Memory node, Scheduler Management
  • SOS_Scheduler – CPU abstraction, binds tasks and worker threads, manages affinity
  • Affinity.  See http://msdn.microsoft.com/en-us/library/ms187104.aspx
  • SOS_Tasks – Executes request from user, executes on a worker, abortable, timer tasks

Scheduler

  • Queue : Runnable / Pending / Timer / IO
  • Worker Pool, Current Work, Idle Worker, Wake Event, Abort Tasks List
  • States – New, Pending, Runnable, Running, Suspended, Preemptive, Done, Monitor
  • Non-preemptive – task runs until – yield, wait on sync. object, quantum expires
  • Preemptive – threads not in SQLOS control – external code – wait stat skewed
  • I/O completion – Worker on I/O completion port
  • See http://msdn.microsoft.com/en-us/library/aa365198.aspx
  • Scheduler Monitor – Preemptive/long running tasks, non-yielding, deadlock, maintenance
  • Dedicated Admin Connection – DAC – Only one, reserved memory at startup, own scheduler
  • DAC: Connection using sqlcmd.exe –a or server name = admin:servername
  • Backup– also has its scheduler
  • Ring buffer – Key construct for communication between multiple producers/consumers

DMVs

  • SELECT * FROM sys.dm_os_sys_info
  • SELECT * FROM sys.dm_os_schedulers
  • SELECT * FROM sys.dm_os_workers
  • SELECT * FROM sys.dm_os_tasks
  • SELECT * FROM sys.dm_os_waiting_tasks
  • SELECT * FROM sys.dm_os_wait_stats
  • - Look at wait_time_ms – signal_wait_time_ms, not just wait_time_ms
  • - Clear waits with dbcc sqlperf('sys.dm_os_wait_stats',clear)
  • - More detailed wait stats – In SQL Server 2008 with XEvents
  • SELECT * FROM sys.dm_os_threads
  • SELECT * FROM sys.dm_io_pending_io_requests
  • See http://msdn.microsoft.com/en-us/library/ms176083.aspx

OS Memory Management

  • Windows provide virtual memory services (except for AWE and PAE)
  • 32-bit gives you up to 4GB – 64-bit, in theory, up to 16 EB
  • Three buckets: committed memory, reserved memory (not physically used) and free memory
  • Committed: working set (RAM), paged out (pagefile) and mapped (shared components, DLLs)
  • If you try to access reserved or free memory: exception

Memory Concepts

SQL Server Memory

SQL Server Memory Management

  • Memory Cycle: Memory broker / resource monitor / cache / heap / memory allocation
  • Components: SQLSO / Memory Node / SOS_Node / Memory Clerk / Memory Object
  • Components: Buffer Pool / Resource Monitor / Caching Framework / Memory Pools and Brokers
  • Allocators: Single Page / Multi-Page / Large Page / …
  • See http://support.microsoft.com/kb/907877
  • Buffer pool: shared, every node has a piece of it.
  • Buffer pool: reserved up front, committed/mapped on demand. Has a clerk, but not an allocator
  • Stolen pages: piece of buffer pool “borrowed” for other purposes, like procedure cache
  • Memory clerk: The actual page allocator, the one that actually gives you the memory
  • Allows us to know where the memory is going, caches are memory clerks too
  • Memory objects: memory allocators for arbitrary size
  • Caching framework – common way to build caches
  • Cache Store – User Store – Clock Algorithm – Clock Hands – Clock Entry Info

DMVs

Resource Monitor

Memory Broker

Books of interest

Bonus topic: Extended Events