As part of the Windows HPC Server 2008 R2 (SP1) Green IT offering, we have added a power management solution to the Windows HPC Server 2008 R2 (SP1) Monitoring Management Pack. It provides two configurable rules: the “Calendar-based Power Management Rule” and the “Consumption-based Power Management Rule”.
· Calendar-based Power Management Rule
With the calendar-based power management rule, you can define a time period each day during which a specified portion of the compute nodes hibernates to save power. You can also restrict this policy to certain days of the week.
· Consumption-based Power Management Rule
With the consumption-based power management rule, the cluster utilization over a time period is evaluated together with the number of queued jobs to decide whether a portion of the compute nodes should hibernate to save power.
We have defined three levels of cluster capacity. Each time the hibernate condition is met, the cluster changes from its current capacity level to a lower level; conversely, each time the wake-up condition is met, the cluster changes from its current capacity level to a higher level.
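The level changes described above can be pictured as a small state machine. The following Python sketch is only an illustration, not the management pack's implementation. The level names, the `step` function, and the choice to move exactly one level per condition are assumptions made for the example.

```python
from enum import IntEnum
from typing import Optional


class CapacityLevel(IntEnum):
    """Hypothetical names for the three cluster capacity levels."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def step(level: CapacityLevel, condition: Optional[str]) -> CapacityLevel:
    """Return the next capacity level for a met condition.

    `condition` is "hibernate", "wake_up", or None (an assumed encoding).
    The level is clamped so it never leaves the LOW..HIGH range.
    """
    if condition == "hibernate":
        return CapacityLevel(max(level - 1, CapacityLevel.LOW))
    if condition == "wake_up":
        return CapacityLevel(min(level + 1, CapacityLevel.HIGH))
    return level  # no condition met: stay at the current level
```

For example, `step(CapacityLevel.HIGH, "hibernate")` yields `CapacityLevel.MEDIUM`, while a further hibernate condition at `LOW` leaves the level unchanged.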
Configure the Rules
Both rules are disabled by default. An administrator can enable and configure them after importing the Windows HPC Server 2008 R2 Monitoring Management Pack into the SCOM server.
Open “System Center Operations Manager”, go to the “Authoring” wunderbar, select “Rules”, and search for the keywords “Power Management” to find the two rules.
The following are the important settings you can override for the two rules, with their default values:
The rule is disabled by default
The time each day when power-saving mode for the compute nodes starts.
The time each day when power-saving mode for the compute nodes ends.
A list of days each week on which compute nodes are excluded from entering power-saving mode. The “exclude days” format is, for example: “Saturday, Sunday”
Power On Percentage
The percentage of compute nodes that remain powered on during power-saving mode
The percentage that defines the high compute node capacity level
The percentage that defines the medium compute node capacity level
The percentage that defines the low compute node capacity level
The length of the job queue above which the rule can cause the compute nodes to reach a higher capacity level
The length of the job queue below which the rule can cause the compute nodes to reach a lower capacity level
The compute node consumption percentage below which the rule can cause the compute nodes to reach a lower capacity level
Number of Samples
The number of samples used to identify the LowConsumption condition, which can push the compute nodes to a lower capacity level. The sampling interval is set by “interval seconds”.
The sampling interval. The default is 300 seconds.
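To show how the consumption-based settings might fit together, here is a hedged Python sketch of one evaluation pass. The post does not state how the thresholds are combined, so the logic below is an assumption: a wake-up is signaled when the queue is longer than the upper queue length, and a hibernate is signaled when the last N consumption samples are all below the low-consumption threshold and the queue is shorter than the lower queue length. The function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def evaluate(consumption_samples: Sequence[float],
             queue_length: int,
             low_consumption: float,
             lower_queue_length: int,
             upper_queue_length: int,
             number_of_samples: int) -> Optional[str]:
    """Return "wake_up", "hibernate", or None for one evaluation pass.

    All thresholds are caller-supplied; none of the rule's real defaults
    are assumed here.
    """
    if number_of_samples < 1:
        raise ValueError("number_of_samples must be at least 1")

    # Assumed: a long queue asks for more capacity regardless of consumption.
    if queue_length > upper_queue_length:
        return "wake_up"

    # Look only at the most recent N samples (one per sampling interval).
    recent = consumption_samples[-number_of_samples:]

    # Assumed: every recent sample must be below the threshold, and the
    # queue must be short, before capacity is reduced.
    if (len(recent) == number_of_samples
            and all(sample < low_consumption for sample in recent)
            and queue_length < lower_queue_length):
        return "hibernate"

    return None
```

Requiring a full window of samples before hibernating is a design choice in this sketch, so that a single quiet sample does not reduce capacity.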
Power Saving Evaluation
To evaluate the power-saving efficiency and the impact on job throughput, we conducted the following experiment:
(1) Set up an HPC cluster with 1 head node, 1 broker node, and 4 compute nodes.
(2) Simulate job submission over one typical working day, as follows:
The job length is distributed as follows:
(3) Compare the power-saving efficiency and the impact on job throughput for the following three scenarios:
a. Disable the power management rules;
b. Enable only the calendar-based power management rule;
c. Enable only the consumption-based power management rule.
We slightly adjusted the configuration of both rules for the experiment:
· Calendar-based rule:
o Set StartTime to 22:00, EndTime to 7:00, PowerOnPercentage to 60%.
· Consumption-based rule:
o Set UpperQueueLength to 2.
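As an illustration of how the calendar-based settings used here could be interpreted, the following Python sketch checks whether a given moment falls inside a daily window that wraps past midnight, and computes how many nodes stay powered on. It is a sketch under stated assumptions, not documented behavior of the management pack: the function names are hypothetical, excluded days are matched against the calendar day of the current time, and the node count is rounded down.

```python
from datetime import datetime, time

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def in_power_saving_window(now: datetime, start: time, end: time,
                           exclude_days: frozenset = frozenset()) -> bool:
    """True if `now` is inside the daily power-saving window."""
    # Assumed: exclusion is decided by the weekday of `now` itself.
    if DAY_NAMES[now.weekday()] in exclude_days:
        return False
    current = now.time()
    if start <= end:
        return start <= current < end
    # The window wraps past midnight, e.g. 22:00 to 07:00.
    return current >= start or current < end


def nodes_kept_on(total_nodes: int, power_on_percentage: int) -> int:
    """Nodes that stay powered on; rounding down is an assumption."""
    return total_nodes * power_on_percentage // 100
```

With StartTime 22:00, EndTime 7:00, PowerOnPercentage 60%, and 4 compute nodes, this sketch keeps 2 nodes on and hibernates the other 2. That matches the 2 hibernated nodes reported in the results, although it does not confirm the actual rounding rule.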
Here are the experimental results:
· Power saving efficiency.
We evaluate power-saving efficiency as the number of hibernated nodes multiplied by the length of time they are hibernated. With the calendar-based rule, 2 nodes were hibernated from 22:00 to 7:00; with the consumption-based rule, 2 nodes were hibernated from 21:00 to 9:00. Both rules saved power for the cluster, and the consumption-based rule saved more than the calendar-based rule (12 hours versus 9 hours of hibernation for the same 2 nodes).
· Utilization on available cores
The consumption-based power management rule achieved the highest utilization of available cores (49.1%), followed by the calendar-based rule (47.1%) and no rule enabled (42.7%).
· Impact on job throughput
Job throughput measures the average number of jobs completed per hour over a day. The throughput was the same in all three scenarios.
· Impact on job turnaround
Job turnaround measures how long a job waits compared to how long it runs. Job turnaround increased very slightly under the consumption-based rule (from 0.436 to 0.437); it was the same for the calendar-based rule as for the scenario with no rule enabled.
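The power-saving metric used above (hibernated nodes multiplied by hibernation time) can be worked through for the two reported windows. The Python sketch below uses hypothetical helper names, expresses the metric in node-hours, and assumes each window is shorter than 24 hours.

```python
def window_hours(start_hour: int, end_hour: int) -> int:
    """Length in hours of a daily window that may wrap past midnight."""
    return (end_hour - start_hour) % 24


def node_hours(hibernated_nodes: int, start_hour: int, end_hour: int) -> int:
    """Hibernated nodes multiplied by the hours they stay hibernated."""
    return hibernated_nodes * window_hours(start_hour, end_hour)


# Calendar-based rule: 2 nodes hibernated from 22:00 to 7:00 (9 hours).
calendar = node_hours(2, 22, 7)
# Consumption-based rule: 2 nodes hibernated from 21:00 to 9:00 (12 hours).
consumption = node_hours(2, 21, 9)

print(calendar, consumption)  # prints "18 24"
```

By this measure the consumption-based rule saved 24 node-hours against 18 node-hours for the calendar-based rule in the experiment.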
Based on the above evaluation, the power management rules save power effectively for the cluster without a noticeable impact on job throughput or job turnaround. They can help you achieve the Green IT goal for your cluster.