• Troubleshooting System Center 2012 Virtual Machine Manager Object Locking Issues

    Welcome back to the System Center: Virtual Machine Manager Engineering Blog. Lately, a few customers have called wanting to know why some jobs are failing in System Center 2012 Virtual Machine Manager (VMM 2012). Those of us who have worked with the product for a while know that jobs can fail for a variety of reasons. One of the more perplexing failures, however, is associated with object locking, as reported in this sample error message:

    Error (2602)

    "Unable to perform the job because one or more of the selected objects are locked by another job"

     

    Recommended Action

    To find out which job is locking the object, in the Jobs view, group by Status and then find the running or canceling job for the object. When the job is complete, try again.

    Object locking errors can be reported as part of a large job that completes overall but still contains errors. An example of this would be a System-level (background job) VM Refresher job executed against a managed host supporting a large number of virtual machines. A job of this magnitude could report one or more failures but the overall job status can be 'Completed w/Info.' The error message in this case could be Error (23234) indicating a refresh of one or more virtual machines running on a host could not be completed due to object locking. An example is shown below.

    clip_image004

    Multiple error messages of this type may be listed in the job Summary tab:

    clip_image006

    In this article we’ll discuss a little about what is going on behind the scenes with the hope it provides some additional understanding.

    System Center Virtual Machine Manager Jobs

    Most activities executed in Virtual Machine Manager are tracked collectively as 'Jobs' in the Virtual Machine Manager database. In general, VMM jobs fall into two groups: system jobs and user-initiated jobs. System jobs include the default refreshers such as a Virtual Machine Refresh or a Host Cluster Refresh. User-initiated jobs are started manually as part of the activity of a VMM-defined role (e.g. Administrator, Fabric Administrator, Tenant Administrator, etc.).

    A job is a PowerShell script executed by the Virtual Machine Manager server using WinRM\WMI calls to a VMM agent running on a remote target to complete an operation or series of operations. For the most part, a VMM job is a combination of tasks, and potentially subtasks, executed asynchronously (i.e. in parallel).

    Virtual Machine Manager jobs either verify existing information (e.g. Virtual Machine periodic refresher), update existing information (e.g. adding a virtual hard disk to an existing virtual machine), or add new information to the SCVMM database (Add a Hyper-V Host, establish a connection with System Center Operations Manager server, etc.).

    Virtual Machine Manager jobs may also require access to information that already exists in the VMM database. The pieces of information contained within the various tables that make up the database are loosely referred to as 'objects.' To gain access to these objects, VMM creates tasks and subtasks (parts of an executing job) which place locks on the objects they need as part of task\subtask execution. In general, these locks are Read, Write, Delete, NoDelete or NoLock locks. Here is a table of lock compatibility within VMM:

    Lock Type    Lock Compatibility
    NoDelete     NoDelete, Read, Write
    Read         NoDelete, Read
    Write        NoDelete
    Delete       No other lock type
    NoLock       All other locks
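
    The compatibility table can be expressed as a small lookup, which makes it easier to see why a second job might be refused. This is only a sketch: the COMPATIBLE dictionary and the can_acquire() helper are hypothetical names used for illustration and are not VMM's internal API.

```python
# Lock compatibility transcribed from the table above.
# COMPATIBLE and can_acquire() are illustrative names, not part of VMM.
COMPATIBLE = {
    "NoDelete": {"NoDelete", "Read", "Write"},
    "Read": {"NoDelete", "Read"},
    "Write": {"NoDelete"},
    "Delete": set(),  # compatible with no other lock type
}


def can_acquire(requested, held):
    """Return True if `requested` is compatible with every lock in `held`."""
    if requested == "NoLock":
        return True  # NoLock is compatible with all other locks
    for lock in held:
        if lock == "NoLock":
            continue  # a held NoLock never blocks anything
        if lock not in COMPATIBLE[requested]:
            return False
    return True


# A task asking for a Write lock while another task holds Read is refused.
print(can_acquire("Write", ["Read"]))
# Two readers can share the same object.
print(can_acquire("Read", ["Read", "NoDelete"]))
```

    Under this reading, a refresher holding a Write lock on an object is enough to block a user job that needs a Read, Write or Delete lock on the same object.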

    Here is an example of a Task executed as part of a Refresh a Service Instance job releasing all locks on a specific Service Instance identified by its ObjectID in the database:

    Line 122354: 121452,20:15:51.165 08-20-2014,0x1478,0x2BFC,4,CarmineObjectLock.cs,719,0x00000000,CarmineObjectLock - Release All; Task 8b7c6aa1-b269-4926-b4c6-4f53be5e2726(Refresher:Refresh a ServiceInstance) Releasing Write lock on ServiceInstance:f99308f7-2aef-4d92-a206-2f649b179432,{00000000-0000-0000-0000-000000000000},4,

    Addressing Job Failures due to Object Locking

    When you experience job failures due to object locking, you usually just want to understand the failure. The recommendation in the failed job is to search the jobs list for any running jobs, but usually no running job is found that might account for or explain the error. I admit I would be confused too. Others, perhaps more familiar with Virtual Machine Manager, who have experienced job failures due to object locking will typically just right-click the ‘Failed’ job and restart it. Most times the job will then complete because the lock on the required object no longer exists (perhaps a competing job completed and released the lock).

    To gain a better understanding of this behavior, one needs to understand that not all VMM Jobs are visible in the Jobs View. As previously mentioned, there are jobs executed in the background by the VMM service itself (System Jobs) on pre-determined schedules. These are often referred to as ‘Refreshers’ and here are some examples:

    Refresher                  Default Setting
    Host Refresher             30 minutes
    Full VM Refresher          24 hours
    Cluster Refresher          30 minutes
    Library Refresher          User Configurable (Default = 1 hour)
    Storage Light Refresher    2 hours

    Every job is an independent entity and is executed asynchronously. If a system-generated refresh job is executing in the background and a user manually executes a job, the potential for job failure due to object locking increases. If, however, a user manually executes a job and a system refresher job then executes, the system job will fail silently if it is unable to lock the objects it needs and will try again at the next scheduled interval. User-initiated jobs, on the other hand, follow a 'try and wait' path. Eventually, such a job either completes, times out, or fails, perhaps due to locked objects. A failed job is considered finished, and the result is recorded in the database.
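
    The difference between the two behaviors can be sketched as follows. This is a simplified model under stated assumptions, not VMM's implementation: the function names, the retry count and the wait hook are all hypothetical, and the article does not state VMM's actual timeouts.

```python
# Simplified model of the two behaviors described above.
# try_lock is any callable that returns True when the lock was obtained.


def run_system_job(try_lock):
    """Background refresher: one attempt, then quietly wait for the next interval."""
    if try_lock():
        return "completed"
    return "skipped until next scheduled interval"


def run_user_job(try_lock, max_attempts=3, wait=lambda: None):
    """User-initiated job: 'try and wait', then fail if the object stays locked.

    max_attempts and wait are hypothetical knobs chosen for this sketch.
    """
    for _ in range(max_attempts):
        if try_lock():
            return "completed"
        wait()  # in a real system this would pause before retrying
    return "failed: object locked by another job"


# The lock frees up on the third attempt, so the user job eventually completes.
attempts = iter([False, False, True])
print(run_user_job(lambda: next(attempts)))
```

    In this model, restarting a failed user job simply runs the same loop again, which matches the observation that a restart usually succeeds once the competing job has released its lock.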

    While there are different reasons for a job to fail, I hope the above information was useful in understanding how a job failure due to object locking might be due to multiple activities attempting to access the same database objects. In most cases, waiting a brief period will allow the conflicting activities to complete and you can restart the failed job. If a job continues to fail due to object locking, more in-depth troubleshooting is required beginning with the collection of a VMM trace. See the knowledge base article below for more information.

    2913445 - How to enable debug logging in Virtual Machine Manager (http://support.microsoft.com/kb/2913445)

    Thanks, and come back again soon.

    Chuck Timon | Senior Support Escalation Engineer | Microsoft Enterprise Platforms Support
    Dewitt Hurst | Senior Support Escalation Engineer | Microsoft Enterprise Platforms Support

    Get the latest System Center news on Facebook and Twitter:

    System Center All Up: http://blogs.technet.com/b/systemcenter/

    Configuration Manager Support Team blog: http://blogs.technet.com/configurationmgr/ 
    Data Protection Manager Team blog: http://blogs.technet.com/dpm/ 
    Orchestrator Support Team blog: http://blogs.technet.com/b/orchestrator/ 
    Operations Manager Team blog: http://blogs.technet.com/momteam/ 
    Service Manager Team blog: http://blogs.technet.com/b/servicemanager 
    Virtual Machine Manager Team blog: http://blogs.technet.com/scvmm

    Microsoft Intune: http://blogs.technet.com/b/microsoftintune/
    WSUS Support Team blog: http://blogs.technet.com/sus/
    The RMS blog: http://blogs.technet.com/b/rms/
    App-V Team blog: http://blogs.technet.com/appv/
    MED-V Team blog: http://blogs.technet.com/medv/
    Server App-V Team blog: http://blogs.technet.com/b/serverappv
    The Surface Team blog: http://blogs.technet.com/b/surface/
    The Application Proxy blog: http://blogs.technet.com/b/applicationproxyblog/

    The Forefront Endpoint Protection blog : http://blogs.technet.com/b/clientsecurity/
    The Forefront Identity Manager blog : http://blogs.msdn.com/b/ms-identity-support/
    The Forefront TMG blog: http://blogs.technet.com/b/isablog/
    The Forefront UAG blog: http://blogs.technet.com/b/edgeaccessblog/

  • Taking a closer look at the Virtual Machine Manager cluster overcommit algorithm

    ~ Hilton Lange | Software Engineer

    One common question we get here on the support team for System Center 2012 Virtual Machine Manager concerns the ‘over-committed’ status and why it might be displayed. For example, you might see this when attempting to migrate a VM to a particular host, but the UI doesn’t elaborate on why this status was triggered or what you should do about it. In this article we explain the algorithm VMM 2012 uses in the hope that you’ll have a better understanding of how this status is determined and what you can do if you see it.

    Overview of the Approach

    The SCVMM 2012 cluster overcommit check attempts to ascertain if there is any possibility that VMs will not be able to be restarted in the event of a simultaneous failure of R nodes, where R is the cluster reserve. The cluster is assumed to be overcommitted until proven otherwise. There are four different approaches tried, and if any one of them can show that the cluster is not overcommitted, then the cluster state is set to “OK”. Otherwise the cluster state is set to “Overcommitted”.

    The four approaches can be visualized in a table like this:

    image

    Proof Method

    This method works by measuring whether there are enough VMs to fill up all the hosts to a point where the largest VM will just barely fail to start on any of them. It considers the worst case where the largest VM is the last to be failed over, and again the worst case where every host has 1 byte too little memory to start that VM.

    Slot Method

    This method works by assigning each VM on a failed host to a single standard size slot equal to the size of the largest VM on all failed hosts. It then counts the number of available slots on each of the other hosts and checks that there are enough free slots to place all the VMs currently on failed hosts.

    Simple check

    This approach does not consider a specific set of hosts to fail, but rather makes worst case cluster wide assumptions. The largest VM size is chosen as the largest VM in the entire cluster. The failing-over VM sizes are not chosen from a specific set of hosts, but rather simply the theoretical highest sum we can achieve from R failing hosts. Likewise, the amount of memory or slots available on other hosts is the sum across the lowest N-R hosts (where N = cluster size).

    Full Complexity Check

    This approach iterates over every possible set of R failing hosts. It recalculates the slot size, largest VM size, target host memory sizes and slot count based on each possible combination of failing hosts. The number of sets that have to be considered is Choose(N,R), so the check can become prohibitively slow for large values of N and R. Because this is roughly proportional to N^R, this check is only run if N^R < 5000. In practical terms, this means that the full complexity check is only done in the following cases:

    image
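
    As a rough cross-check of the N^R < 5000 rule, the largest qualifying cluster size for a given cluster reserve can be computed directly. This is only a sketch: the function name is made up for illustration, and it applies the stated rule literally without any other limits VMM may impose.

```python
import math


def largest_n_for_full_check(r, limit=5000):
    """Largest cluster size N for which N**r < limit (0 if there is none)."""
    n = 0
    while (n + 1) ** r < limit:
        n += 1
    return n


for r in range(2, 5):
    n = largest_n_for_full_check(r)
    # math.comb(n, r) is Choose(N,R): the number of failing-host sets to iterate.
    print(f"R={r}: N <= {n}, Choose(N,R) = {math.comb(n, r)}")
```

    For R = 2, 3 and 4 this gives N <= 70, N <= 17 and N <= 8 respectively, so the exhaustive check quickly becomes limited to small clusters as the reserve grows.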

    It should be noted that the full complexity check is only a marginal refinement over the simple check; falling back on the simple proof check offers very similar results.

    Precalculations and Definitions

    Cluster Values

    image

    Host Values

    The following values are precalculated for each host. When a value is calculated with respect to LargestClusterVMMB or SlotSizeMB, it is recalculated in each iteration of full complexity checks.

    image

    NOTES:

    1. A 64MB buffer is added to each VM’s memory to account for Hypervisor overhead.

    2. Stopped, saved state, paused and running VMs are all counted. A tenant user starting a stopped VM should be accounted for when calculating overcommit.

    3. If dynamic memory VMs are present in the cluster, their current memory demand is used.

    Algorithms

    Slot Simple

    - SlotSize = Largest HA VM in the cluster.
    - Calculate AvailableSlots, UsedSlots and TotalSlots for each host.
    - TotalSlotsRemaining = Sum of smallest N-R values of TotalSlots (N = cluster size).
    - If Sum(UsedSlots) <= TotalSlotsRemaining, cluster is NOT overcommitted.

    Slot full

    Iterate over each set of R failing hosts.

    - SlotSize = Largest HA VM on the R failing hosts.
    - Calculate AvailableSlots, UsedSlots and TotalSlots for each host.
    - TotalSlotsRemaining = Sum of TotalSlots on all non-failing hosts.
    - If Sum(UsedSlots) > TotalSlotsRemaining, cluster may be overcommitted.
    - If Sum(UsedSlots) <= TotalSlotsRemaining for every set of failing hosts, cluster is NOT overcommitted.

    Proof Simple

    - LargestClusterVM = Largest HA VM in the cluster.
    - Calculate AdditionalMemory, HAVMs for all hosts.
    - TotalAdditionalSpace = Sum of smallest N-R values of AdditionalMemory (N = cluster size).
    - TotalOrphanedVMs = (Sum of largest R values of HAVMs) – LargestClusterVM.
    - If TotalOrphanedVMs <= TotalAdditionalSpace, cluster is NOT overcommitted.

    Special case: If TotalOrphanedVMs is 0, LargestClusterVM > 0 and TotalAdditionalSpace = 0, then cluster may be overcommitted.
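
    The Proof Simple steps can be sketched in a few lines. This is an illustration under stated assumptions rather than VMM's code: the per-host AdditionalMemory and HAVMs values are taken as already precalculated inputs, the number of surviving hosts is assumed to be N-R (which matches the worked example later in this article), and the function name is hypothetical.

```python
def proof_simple(additional_memory, ha_vms, largest_cluster_vm, r):
    """Return True if the Proof Simple steps show the cluster is NOT overcommitted.

    additional_memory, ha_vms: one precalculated value per host, same units.
    r: cluster reserve (number of simultaneously failing hosts).
    """
    n = len(ha_vms)
    # Worst case: the surviving hosts are the ones with the least spare room.
    total_additional_space = sum(sorted(additional_memory)[: n - r])
    # Worst case: the failing hosts are the ones carrying the most HA VM memory.
    total_orphaned_vms = sum(sorted(ha_vms, reverse=True)[:r]) - largest_cluster_vm
    # Special case from the text: nothing proven if there is no room at all.
    if total_orphaned_vms == 0 and largest_cluster_vm > 0 and total_additional_space == 0:
        return False
    return total_orphaned_vms <= total_additional_space


# Values in GB. The two smallest AdditionalMemory values (0, 5) and the two
# largest HAVMs values (8, 8) come from the example in this article; the
# remaining per-host numbers are hypothetical fillers.
print(proof_simple([0, 5, 6, 7], [8, 8, 6, 4], largest_cluster_vm=8, r=2))
```

    With these inputs TotalOrphanedVMs is 8GB and TotalAdditionalSpace is 5GB, so the function returns False, meaning this method alone cannot show that the cluster is OK.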

    Proof Full

    Iterate over each set of R failing hosts.

    - LargestClusterVM = Largest HA VM on the R failing hosts.
    - Calculate AdditionalMemory, HAVMs for all hosts.
    - TotalAdditionalSpace = Sum of AdditionalMemory on non-failing hosts.
    - TotalOrphanedVMs = (Sum of HAVMs on the R failing hosts) – LargestClusterVM.
    - If TotalOrphanedVMs > TotalAdditionalSpace, cluster may be overcommitted.
    - If TotalOrphanedVMs = 0, LargestClusterVM > 0 and TotalAdditionalSpace = 0, cluster may be overcommitted.

    If TotalOrphanedVMs <= TotalAdditionalSpace for every set of failing hosts, cluster is NOT overcommitted.

    Combining the Methods

    Note that none of the methods attempt to show overcommitment. They can only show the reverse, that the cluster is not overcommitted. If none of the methods we use can show that we are not overcommitted, we are forced to flag the cluster as overcommitted. If even a single method shows that we are not overcommitted, we can flag the cluster as “OK” and cease calculations immediately.

    This is the opposite of the internals for the full complexity analysis. If even a single set of R failing hosts shows that the cluster may be overcommitted, that method is immediately done, having failed to show that the cluster is “OK”.
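
    The two short-circuit rules can be sketched as an "any" across methods and an "all" across failing-host sets. This is a minimal illustration, not VMM's code: full_check, cluster_state and the callables passed to them are hypothetical names.

```python
from itertools import combinations


def full_check(hosts, r, set_is_ok):
    """Full-complexity method: succeeds only if every set of r failing hosts passes.

    all() stops at the first failing set, so the method is done immediately.
    """
    return all(set_is_ok(failing) for failing in combinations(hosts, r))


def cluster_state(methods):
    """The cluster is OK as soon as any one method proves it; otherwise Overcommitted.

    any() stops at the first method that succeeds, so later methods never run.
    """
    return "OK" if any(method() for method in methods) else "Overcommitted"


# Hypothetical stand-ins for the four methods: only the last one succeeds.
methods = [lambda: False, lambda: False, lambda: False, lambda: True]
print(cluster_state(methods))
```

    Ordering the cheaper simple methods before the full-complexity ones means the expensive iteration only runs when nothing simpler has already proven the cluster OK.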

    Example

    This example is specifically designed to be a borderline case. Only one method (Proof Full) manages to show that the cluster is not overcommitted.

    Cluster has 4x 32GB hosts. Host memory reserve is set to 9GB. 64MB buffer is not added to VM size in this example, just to keep the numbers simpler. Cluster reserve (R) is set to 2.

    image

    Slot Simple Example

    - Slot size = 8GB

    image

    - TotalSlotsRemaining = 2 smallest values of TotalSlots = (1+3) = 4
    - TotalUsedSlots = 7

    Since TotalUsedSlots > TotalSlotsRemaining, the method has failed.

    Slot Full Example

    - TotalUsedSlots = 7, regardless of which hosts fail

    image

    Since some sets of failing hosts led to TotalUsedSlots > TotalSlotsRemaining, the method has failed.

    Proof Simple Example

    - LargestClusterVM = 8GB

    image

    - TotalAdditionalSpace = 2 smallest values of AdditionalMemory = 0GB + 5GB = 5GB.
    - TotalOrphanedVMs = (8GB + 8GB) – 8GB = 8GB.

    Since TotalOrphanedVMs > TotalAdditionalSpace, the method has failed.

    Proof Full Example

    image

    Since every set of failing hosts led to Orphaned – LargestVM <= AdditionalMemory, the method has succeeded, and the entire cluster can be marked as “OK”.

    Hilton Lange | Software Engineer | Microsoft

  • KB: Error ID 1602 occurs and the Microsoft System Center 2012 Virtual Machine Manager Console does not start

    When you try to start the Microsoft System Center 2012 Virtual Machine Manager (VMM 2012) console, the console does not start and you receive the following error message:

    Unable to connect to the VMM Management server server_name. The Virtual Machine Manager service on that server did not respond. Verify that Virtual Machine Manager has been installed on the server and that the Virtual Machine Manager service is running. Then try to connect again. If the problem persists, restart the Virtual Machine Manager service.
    ID: 1602

    You may also notice that the System Center Virtual Machine Manager service is stopped. When you try to start the service, you receive the following error message:

    Windows could not start the System Center Virtual Machine Manager service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error. If the problem persists, contact your system administrator.

    Additionally, an error that resembles the following is logged in the Application log on the Virtual Machine Manager server:

    Log Name: Application
    Source: .NET Runtime
    Date:
    Event ID: 1026
    Task Category: None
    Level: Error
    Keywords: Classic
    User: N/A
    Computer:
    Description:
    Application: vmmservice.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: System.FormatException
    Stack:
    at System.DateTime.Parse(System.String, System.IFormatProvider)
    at System.Convert.ToDateTime(System.String)
    at Microsoft.VirtualManager.DB.ServerGlobalSettings.ReadServerData(System.Guid)
    at Microsoft.VirtualManager.DB.ServerGlobalSettings.get_Instance()
    at Microsoft.VirtualManager.Engine.VirtualManagerService.StartSQL()
    at Microsoft.VirtualManager.Engine.VirtualManagerService.ExecuteRealEngineStartup()
    at Microsoft.VirtualManager.Engine.VirtualManagerService.TryStart(System.Object)
    at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
    at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
    at System.Threading.TimerQueueTimer.CallCallback()
    at System.Threading.TimerQueueTimer.Fire()
    at System.Threading.TimerQueue.FireNextTimers()

    For all the details and a resolution, please see the following:

    KB3020448 - Error ID 1602 occurs and the Microsoft System Center 2012 Virtual Machine Manager Console does not start (http://support.microsoft.com/kb/3020448)

    J.C. Hornbeck | Solution Asset PM | Microsoft GBS Management and Security Division

  • Help us get better - Join the VMM customer panel

    The System Center engineering team is looking for Virtualization Management customers who can provide feedback on pain points, preferences, and usage behavior.

    We are starting a customer panel for VMM and Hyper-V customers to help influence the future of the product.

    What are my commitments as a panel member?

    • 1 hour meeting once a week for 4 weeks

    • Share your views actively and constructively on the conference calls

    • Complete questionnaires or take part in surveys

    What do I get out of it?

    • Ability to influence the future of VMM and System Center

    • Direct access to the System Center engineering team

    • Improve a product you love

    The goal is to hear customer feedback frequently as development of features progresses.

    If you are interested, please fill out the information here: https://www.surveymonkey.com/s/8WPG56N

    Thank you,

    Satya Vel