I'll be there in 2 microseconds!

Fantastic news! Mellanox has released the beta 2 version of their WinIB 1.4 stack, which works with HPC Server 2008 beta 2 and has Network Direct providers for their latest ConnectX cards. The results announced at ISC 08 are outstanding:

- 2 microseconds' latency

- 2 GB/s throughput

Another outstanding result for HPC Server 2008 is the Umea cluster, at No. 39 on the Top500 list:

- 46.04 TFlops

- 85.6% efficiency

Hats off to the Network Direct team. Now Windows HPC Server 2008 plays with the big boys :-)

Powered by Qumana

TechEd session on HPC in top 20!

Phil Pennington and I presented a session on cluster performance optimization at TechEd 2008 in Orlando. It made the top 20 list by customer satisfaction!!!

To all those who were there and voted for us: Thank you!!!

To all those who were not there but would still like to know about it: leave a comment!


How fast is this thing, really?!

Microsoft entered the HPC market a couple of years ago with a value proposition based on ease of integration, use and management. The comment we received most often sounded more or less like this: "This is all well and good, but how fast is this thing, really?". Well, here are a couple of impressive answers:

1. The NCSA cluster, running Windows HPC Server 2008 CTP on 1184 nodes (9472 cores), achieved 64.48 TFlops and 77.7% efficiency. This places it at No. 23 on the June 2008 Top500 list.

2. The Aachen cluster, running the same build on 262 nodes, achieved 18.81 TFlops and 76.5% efficiency, which places it at No. 100 on that list.
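
For those who like to check the arithmetic: Top500 efficiency is the measured Linpack result (Rmax) divided by the theoretical peak (Rpeak), so the peak implied by the figures above can be recovered with a division. The sketch below is only that back-of-the-envelope calculation; the function name is mine, and because the published percentages are rounded, the peaks it prints are approximate.

```python
# Efficiency on the Top500 list is Rmax / Rpeak, so Rpeak = Rmax / efficiency.
# The (Rmax in TFlops, efficiency) pairs are the figures quoted in the post.
clusters = {
    "NCSA": (64.48, 0.777),
    "Aachen": (18.81, 0.765),
}


def implied_peak_tflops(rmax_tflops, efficiency):
    """Theoretical peak implied by a measured result and its efficiency."""
    return rmax_tflops / efficiency


for name, (rmax, eff) in clusters.items():
    peak = implied_peak_tflops(rmax, eff)
    print(f"{name}: Rmax {rmax} TFlops, implied Rpeak about {peak:.1f} TFlops")
```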

Happy now? ;-) If you want to read more about the details, have a look at http://www.microsoft.com/hpc


A Hybrid OS Cluster Solution

Thomas Varlet, of Microsoft France, and Dr. Patrice Calegari, of BULL SAS, have written an excellent paper on how to build hybrid clusters, i.e. clusters where 2 or more operating systems can be run at the same time. It is recommended reading, in my opinion, for those of us who use both Linux and Windows HPC solutions. You'll find the paper here.

UK HPC User Group Meeting

The UK HPC user group is meeting in London on June 26th, for what promises to be an interesting day at the Imperial War Museum.

This meeting is intended for customers, partners and developers to “meet and mingle”, compare notes and provide Microsoft with direct input into our product and offerings.
On the day the attendees will hear:
- The latest news on Windows HPC 2008
- New software solutions in Finance, Engineering and Defence
- The latest in MS enabling technology like Microsoft ESP & MS Robotics
- The winners of the UK HPC student competition
- Customer stories.
Attendees will also have the opportunity to tour the Imperial War Museum.

Please register here.  

The event is being organised by the UK Microsoft HPC User Group, chaired by Professor Simon Cox, School of Engineering Sciences of the University of Southampton.


Webcasts Again!

Hey all,

I'm doing another series of webcasts. Two of them have already aired; two more will follow shortly. Here are the topics:

- deployment and management

- high availability

- new scheduler features

- HPC Server 2008 and Linux

You'll find a link to register and summaries on:

http://www.microsoft.com/hpc/events.aspx

Affinity

Did you notice that the latest CTP has introduced a new option for mpiexec? Using mpiexec -affinity you can affinitize each MPI rank to the core where it is started, so the scheduler no longer migrates it from core to core. Whether you actually benefit from affinitization depends on your application: some show a good performance improvement, some do not. In particular, if you have an MPI application that is also multi-threaded, the affinity option may backfire, because the affinity mask that you set for the process is inherited by default by all its threads, which may then all be stuck on 1 core. Windows offers other API calls, such as SetThreadAffinityMask, to set affinity per thread.

"Traditional", non multi-threaded MPI applications may be more straightforward. One important factor to take into account when deciding whether to affinitize the process is the compute node architecture: is it NUMA or not? If it is, have you got enough RAM in the memory bank local to the core where the process will run? If not, you may incur frequent (and lengthy) accesses to memory attached to another NUMA node. In this case, it may be best to rely on the OS scheduler to determine the ideal NUMA node for the thread.

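To make the mask inheritance problem concrete, here is a small sketch of the bit arithmetic behind affinity masks. It does not call any Windows API; it only shows the values one would hand to calls such as SetProcessAffinityMask or SetThreadAffinityMask. The helper names and the 4-cores-per-NUMA-node layout are assumptions of mine for illustration.

```python
def affinity_mask(cores):
    """Return a bitmask with one bit set per core index (bit 0 = core 0)."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def cores_in_mask(mask):
    """List the core indices whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# A rank pinned to the core where it started (core 2 in this example).
rank_mask = affinity_mask([2])

# Threads inherit the process mask by default, so four worker threads
# would all be restricted to that single core.
thread_masks = [rank_mask] * 4

# Alternative for multi-threaded ranks: a mask spanning every core of one
# NUMA node (hypothetical layout: cores 0-3 belong to node 0).
node_mask = affinity_mask(range(4))

print(f"rank mask {rank_mask:#06b} allows cores {cores_in_mask(rank_mask)}")
print(f"node mask {node_mask:#06b} allows cores {cores_in_mask(node_mask)}")
```

With the wider mask the threads can spread over the node's cores while still staying close to its local memory, which is often a reasonable compromise for hybrid MPI plus threads codes.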

4x4 and other quick tips

I've recently been involved in a simple benchmarking exercise. Here are a few quick "rules of thumb" that have helped me:

- 4x4

A PCIe 4x slot is supposed to have 4 lanes capable of 250 MB/s each, for a total of 1 GB/s. An Infiniband SDR 4x card has 4 channels clocked at 2.5 Gb/s, i.e. 10 Gb/s of raw signalling, which after 8b/10b encoding leaves 8 Gb/s, or 1 GB/s, of data. So a simple rule of thumb is: put an Infiniband card in the PCIe slot with the same number of channels. This is not a coincidence: Intel was part of the original Infiniband group.

BUT

Be aware that not all motherboards are equal, although in theory most of them use the same chipsets. In our case, we found out that the motherboard was not able to sustain more than about 600 MB/s on the PCIe 4x slot. We had to move the Infiniband cards to the 8x slots, where we could reach the expected 900 MB/s transfer rate of the card. The 8x slot on those motherboards is probably not capable of reaching its top speed either, but it is sufficient for the SDR 4x card. 
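
The numbers in this tip can be checked with a few lines of arithmetic. The sketch below assumes first-generation PCIe (250 MB/s per lane, as quoted above) and the 8b/10b line encoding that SDR Infiniband uses; the function names are mine, and the 600 and 900 MB/s figures are simply the ones we measured.

```python
PCIE_LANE_MB_S = 250          # per lane, as quoted in the text
IB_SDR_CHANNEL_GBIT_S = 2.5   # signalling rate per channel, as quoted in the text
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits carried per 10 bits on the wire


def pcie_bandwidth_mb_s(lanes):
    """Nominal bandwidth of a PCIe slot with the given number of lanes."""
    return lanes * PCIE_LANE_MB_S


def ib_sdr_data_rate_mb_s(channels):
    """Usable data rate of an SDR Infiniband link, in MB/s."""
    raw_gbit_s = channels * IB_SDR_CHANNEL_GBIT_S
    data_gbit_s = raw_gbit_s * ENCODING_EFFICIENCY
    return data_gbit_s * 1000 / 8  # Gb/s to MB/s


slot_4x = pcie_bandwidth_mb_s(4)
slot_8x = pcie_bandwidth_mb_s(8)
card_4x = ib_sdr_data_rate_mb_s(4)

print(f"PCIe 4x slot: {slot_4x} MB/s, PCIe 8x slot: {slot_8x} MB/s")
print(f"Infiniband SDR 4x card: {card_4x:.0f} MB/s")
print(f"Measured on the 4x slot: {600 / card_4x:.0%} of the card's data rate")
print(f"Measured on the 8x slot: {900 / card_4x:.0%} of the card's data rate")
```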

- Snoop Filters

A snoop filter is a mechanism to reduce traffic between different memory bus segments. It is particularly useful in multi-CPU, multi-core machines. Applications generally benefit from it, but there are some cases where latency-bound applications are adversely affected. If you see erratic behaviour in your latency tests (e.g. "random" high latencies in an otherwise consistent benchmark) and you have quad-core machines (especially early Clovertowns), try disabling the snoop filter in the BIOS. It may (or may not) help. Again, motherboards affect the results, as different manufacturers used different components (with or without a snoop filter).

New quad-core machines (Harpertown) have a snoop filter, but do not seem to show the symptoms mentioned above (at least those I've seen).

- Dynamic Power Management

It is generally NOT a good idea when you're trying to squeeze the last FLOP out of the CPUs. Disable it in the BIOS.

- MPI traffic

You may want to make absolutely sure that your MPI applications are using Infiniband; or you may want to run them once on Ethernet and another time on Infiniband, then compare the results. In any case, you can specify the network where MPI traffic will go at run time:

mpiexec -env MPICH_NETMASK <address>/<mask> <other parameters> <exe>

You may also want to make absolutely sure that your MPI traffic uses Network Direct, not Winsock. You can:

- remove the Winsock provider. Coarse, but effective:

clusrun /<nodes> installsp -r

- run your application with

 mpiexec -env MPICH_DISABLE_SOCK 1 <other parameters> <exe>

Incidentally, you can install the Network Direct provider with

clusrun /<nodes> ndinstall -i
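
If you run the same comparison often, it may help to script the command lines. The sketch below only assembles the mpiexec arguments shown above; it does not launch anything. The helper names, the sample address and the executable name are hypothetical, and I am assuming that MPICH_NETMASK wants the network address of the subnet in the <address>/<mask> form.

```python
import ipaddress


def netmask_spec(interface_cidr):
    """Turn an interface address such as '10.0.1.17/24' into '<address>/<mask>'."""
    network = ipaddress.ip_interface(interface_cidr).network
    return f"{network.network_address}/{network.netmask}"


def mpiexec_args(exe, netmask=None, disable_sock=False, extra=()):
    """Build the mpiexec argument list using the -env settings from the text."""
    args = ["mpiexec"]
    if netmask is not None:
        args += ["-env", "MPICH_NETMASK", netmask]
    if disable_sock:
        args += ["-env", "MPICH_DISABLE_SOCK", "1"]
    args += list(extra)   # any other mpiexec parameters
    args.append(exe)
    return args


# Hypothetical Infiniband subnet and application name.
ib_net = netmask_spec("10.0.1.17/24")
print(" ".join(mpiexec_args("myapp.exe", netmask=ib_net, disable_sock=True)))
```

Keeping the arguments as a list makes it easy to pass them to a process launcher later, or to print both the Ethernet and the Infiniband variants side by side.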

HPC & Movies

In my last post I investigated how HPC can be used to build UFOs. This time, I've learned to my surprise that HPC can be used to make movies!

Digital media production follows a complex workflow, from initial sketch to wireframe model, to rendered 3D images, to movies. HPC is typically used in rendering, encoding or transcoding. I've done some research on the matter and posted the results here.

Let me know what you think of it.

Interested in HPC? How about UFOs?

To many people HPC is like UFOs: we know there's somebody somewhere, but we don't really know what they're doing and where they fit in the grand scheme of things.

Here's my attempt at explaining them (UFOs AND HPC). Happy reading.

If you think I've smoked one too many - please leave a comment. Equally, please let me know if the article makes sense to you.

Seriously, TechNet Edge is for professionals!

We do like our jobs and are serious professionals, you know!!


Special Announcement for TechNet Edge Visitors
Infiniband on HPC Server 2008 - again

After I published a document about the installation of Mellanox Infiniband on Server 2008, I received some good feedback that deserves sharing. Note that:

The WinIB 1.4 beta available for download today on Mellanox's web site does not work with the HPC Server 2008 March CTP. We are working with Mellanox to fix that.

The procedure I illustrated in that document uses a "trick": Deploy the Mellanox package with msiexec first, add the Infiniband network to the cluster configuration later. This works both with our deployment tools and with 3rd parties'. However, one can exploit the built-in HPC Server 2008 tools better. Here's how:

1. Install the Mellanox WinIB package (currently WinIB_x86_1_4_0_2094.msi) on the head node. Set up the cluster network configuration to include Infiniband as the MPI network.

2. Create an OS image and deployment template for the compute nodes.

3. In the Admin Console, right-click on the image and select Manage Drivers. You need 3 drivers for the card to be visible in the admin console, hence configurable:

  • ib_bus.inf: Mellanox InfiniBand Fabric driver
  • mthca.inf: InfiniBand Host Channel Adapter driver
  • netipoib.inf: Mellanox IP over Infiniband protocol driver

You will find the first 2 files in C:\Program Files\Mellanox\WinIB\Drivers on the head node after installation of the WinIB package. The last one will be in C:\Program Files\Mellanox\WinIB\IPoIB.

4. Copy the WinIB_x86_1_4_0_2094.msi package to %CCP_DATA%\InstallShare on the head node.

5. Edit the compute nodes deployment template and add an Installation->Unicast Copy from z:\<winib package>.msi to c:\<winib package>.msi; move the copy operation before the "Install CCP" task.

6. In the same template, add an Installation->Execute OS command: msiexec /i c:\<winib package>.msi /qn ADDLOCAL=ALL.

7. Deploy the compute nodes.

8. When they are deployed, you can install the Network Direct provider with clusrun /nodes:<list of nodes> "%WinIB_HOME%\IPoIB\NDI\ndinstall.exe" -i
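
The package name appears in several of the steps above, so a tiny helper can keep the strings consistent when a new WinIB build comes out. This is just a sketch that fills in the commands from steps 5, 6 and 8; it runs nothing. The function name and node names are hypothetical, and I am assuming that clusrun takes the node list comma-separated.

```python
def winib_deploy_commands(package, nodes):
    """Fill in the copy paths and commands from steps 5, 6 and 8 for one package."""
    local_msi = "c:\\" + package
    node_list = ",".join(nodes)  # assumption: comma-separated list for /nodes:
    return {
        "copy_from": "z:\\" + package,
        "copy_to": local_msi,
        "install": f"msiexec /i {local_msi} /qn ADDLOCAL=ALL",
        "nd_provider": (
            f"clusrun /nodes:{node_list} "
            '"%WinIB_HOME%\\IPoIB\\NDI\\ndinstall.exe" -i'
        ),
    }


# Package name from step 1; the node names are placeholders.
commands = winib_deploy_commands("WinIB_x86_1_4_0_2094.msi", ["node01", "node02"])
for step, text in commands.items():
    print(f"{step}: {text}")
```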
 

HPC Server 2008 webcasts - again!

Hello again,

I'll be doing a series of webcasts soon, hopefully on the feature-complete beta 2 of HPC Server 2008. They will be mostly demonstrations, with a few slides for those concepts that are not evident in the software. Here is the schedule (all times are PST):

- 5/9/2008 08:00 AM: Windows HPC Server 2008: Management and Diagnostics in High-Performance Computing (Level 200) 

- 5/23/2008 08:00 AM: Windows HPC Server 2008: High Availability and Diagnostics for High-Performance Computing (Level 200)

- 5/30/2008 09:30 AM: Windows HPC Server 2008: Job Scheduler and SOA in High-Performance Computing (Level 200)

Phil Pennington will also present on current efforts to develop a unified parallel programming model.

- 5/2/2008 08:00 AM: Future of Multi/Many-Core and the Convergence of Client and Cluster in Parallel Computing (Level 300)

Please click on the links to register.

Free Software from Microsoft? I cannot believe it!

But it is true! We have about 20 not-for-resale copies of Windows Compute Cluster Server 2003 SP1. If you want one, please register as a community member on windowshpc.net and send a note to volker.will(at)microsoft.com. The license terms will allow you to install 1 head node and up to 10 compute nodes; you cannot resell the software, but it is otherwise fully functional.

If instead you'd like to try Windows HPC Server 2008, please register for the public beta on the Microsoft Connect site.

Mellanox Infiniband on HPC Server 2008

One would assume that Infiniband on Windows is going to be just as easy as any other plug and play device installation. Well, in some cases it is. When you have some old cards and an old switch, no documentation, and all of it out of support and in an unknown state, it may not be! I am sure this experience is rather common, so I’ve decided to document what I did to make them work. In this case, I will focus on the Mellanox Infiniband stack, as it is the only one I could find that has public beta support for HPC Server 2008. Besides, Mellanox is the dominant provider of Infiniband hardware, even if it is then re-branded and re-sold by others.

The document is a bit long for a blog post, so I have put it on windowshpc.net.

DISCLAIMER: This is not official guidance, just my notes. No guarantee is offered or implied. Your experience may differ, mileage may vary etc… etc… Also, I do not claim to be an Infiniband expert in any way. If you have comments, corrections, suggestions to make, please do so by responding to this post. I will update the document accordingly.


