posted Saturday, February 17, 2007 2:53 PM by dongarra
FT-MPI is a full implementation of the MPI 1.2 specification that provides process-level fault tolerance at the MPI API level. FT-MPI was developed within the HARNESS (Heterogeneous Adaptive Reconfigurable Networked SyStem) project, with the goal of giving end users a communication library with an MPI API that benefits from the fault tolerance already built into the HARNESS system.
FT-MPI currently compiles under Cygwin, Windows Subsystem for UNIX Applications (SUA), and native Windows. There is presently no way to start the daemons automatically, as the only supported startup method (SSH) is not natively available on Windows. However, once the daemons are started manually, we have been able to spawn as many applications as necessary; and because the daemons are started by hand, security is provided by the Windows user log-on. Communication is currently limited to BSD-style TCP, i.e. using read and write on socket descriptors; there is as yet no support for direct WinSock2 functions.
Most of our future work will focus on Open MPI. We plan to tighten security when starting applications, provide full support for the XML format used by the Windows batch scheduler, add memory and processor affinity, support the Windows registry, and build completely dynamic MPI libraries and internal modules. Moreover, we believe performance can be improved by at least another 20%.
Additional information about FT-MPI can be found on the website – http://icl.cs.utk.edu/ftmpi/.
• Open MPI
Open MPI is an open-source implementation of both the MPI-1 and MPI-2 specifications. It combines technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build the best MPI library available.
As with FT-MPI, we have compiled Open MPI under Cygwin, Windows Subsystem for UNIX Applications (SUA), and native Windows; native Windows has been the most used and most thoroughly tested build environment. We provide solutions and project files for Visual C Express, allowing Open MPI to be compiled as either a static or a dynamic library. Support for C++, Fortran 77, and Fortran 90 is built automatically. We can start daemons locally using native Windows functionality (spawn and/or CreateProcess), and we can start jobs on the cluster with CCS (using submit). So far the only available communication framework runs on top of WinSock2, but work on Direct Socket support is in progress. The Visual C compiler (VC) is used as the backend for mpicc, which lets users compile their applications in a normal environment.
Integration with the parallel debugger is in progress; however, the lack of comprehensive documentation makes this task difficult. We face the same problem in accessing the high-performance socket interface: the sparse documentation available on MSDN and the Web does not provide enough insight for a smooth transition.
Performance comparisons with Microsoft MPI have shown Open MPI to be faster over both shared memory and TCP, by roughly 10%. No application-level benchmark has yet been run to compare these two MPI implementations further.
Once support for Direct Socket is complete, we will benchmark again, and we expect a larger performance gap between these two MPI libraries. We still need to define the behavior of MPI in the event a failure occurs at the process level.
For more information about Open MPI, visit the website at http://icl.cs.utk.edu/open-mpi/.