posted Thursday, March 29, 2007 7:54 PM by dongarra
The Epilog tracing library (part of the KOJAK toolset [1,2]) has been ported to the Microsoft Compute Cluster platform. The library can capture traces from MPI applications written in either Fortran or C/C++. After conversion to VTF format, the traces can be visualized in Vampir (Intel Trace Analyzer) to analyze the application's message-passing behavior. They can also be searched automatically for patterns of inefficient execution with KOJAK/Scalasca.
1. Download the package from here:
2. The package contains the Epilog tracing library (epilog.dll) and its import library (epilog.lib), each in both 32-bit and 64-bit versions. It also includes a utility, elg_merge.exe, that merges the process-local tracefiles into a single coherent tracefile.
3. Link your target application against the appropriate (32-bit or 64-bit) version of the Epilog library instead of the Microsoft MPI library (msmpi.lib), and adjust your project's include paths accordingly.
4. When executing the target application, make sure the Epilog tracing library (epilog.dll) can be found when the program loads, i.e., place it in a system directory or in the same folder as the target application executable. Also make sure that elg_merge.exe is either in a standard executable directory (such as %SYSTEMROOT%) or in the same folder as the target application executable.
5. Execute the target application as a normal MPI job, e.g., mpiexec -n 2 myapp.exe.
6. Each MPI process writes process-local tracefiles. Upon program termination, elg_merge.exe is invoked automatically to merge them into a single tracefile named a.elg, located in the same directory as the target application.
7. The a.elg file is a binary tracefile. Use the tools of the KOJAK [1,2] or Scalasca toolset on a *nix machine to analyze it:
   - Use elg2vtf to convert the tracefile to a VTF-compatible tracefile that can be visualized with Intel Trace Analyzer (Vampir).
   - Use Expert to automatically search for patterns of inefficient execution, such as late sender or wait at barrier.
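Step 4 is a common source of failed runs, so a small pre-flight check can help. The sketch below is hypothetical and not part of the Epilog package. It looks for epilog.dll and elg_merge.exe first in the application's folder and then in each directory on PATH. This is a simplified approximation: the real Windows loader search order has more rules than "application folder, then PATH". On a cluster, a check like this would need to run wherever the MPI processes are launched.

```python
import os


def locate(name, app_dir, search_path=None):
    """Return the first full path at which `name` exists, or None.

    Simplified search: the application's own folder first, then each
    directory listed in `search_path` (defaults to the PATH variable).
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    # Skip empty entries, e.g. from a trailing path separator.
    dirs = [app_dir] + [d for d in search_path.split(os.pathsep) if d]
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def preflight(app_dir, search_path=None):
    """Map each file the post says must be findable to its location (or None)."""
    required = ("epilog.dll", "elg_merge.exe")
    return {name: locate(name, app_dir, search_path) for name in required}
```

For example, listing the names for which `preflight(app_dir)` returns None shows what still has to be copied before running mpiexec.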
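Step 6 describes merging process-local tracefiles into one coherent trace. Conceptually, this involves interleaving per-process event streams in time order. The sketch below shows only that idea as a k-way merge. The `(timestamp, rank, event_name)` record is a hypothetical stand-in, not the EPILOG file format. The sketch also assumes all timestamps already share a common time base. A real merger such as elg_merge likely has to do more, for example reconciling per-process metadata and clock differences between nodes.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical event record: (timestamp_seconds, rank, event_name).
# This does NOT model the real EPILOG binary format.
Event = Tuple[float, int, str]


def merge_local_traces(local_traces: Iterable[List[Event]]) -> List[Event]:
    """Merge per-process event lists, each already sorted by timestamp,
    into one globally time-ordered list (a k-way merge)."""
    # heapq.merge lazily interleaves already-sorted inputs; tuples compare
    # by timestamp first, then rank, so ties are broken deterministically.
    return list(heapq.merge(*local_traces))
```

Because each local list is already sorted, the merge needs only one pass over all events rather than a full re-sort.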
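The two patterns named in step 7 each correspond to a simple quantity of wasted time. The sketch below is simplified and assumes the enter timestamps of the matching operations have already been extracted from the trace. It is not Expert's implementation, which analyzes the full event trace. In a late sender, the receive is posted before the matching send starts, so the receiver idles for the difference. In wait at barrier, each process waits until the last process enters the barrier.

```python
from typing import List


def late_sender_wait(recv_enter: float, send_enter: float) -> float:
    """Time the receiver idles because the matching send started later.

    Zero when the send had already started before the receive was posted.
    """
    return max(0.0, send_enter - recv_enter)


def barrier_waits(enter_times: List[float]) -> List[float]:
    """Per-process time spent in a barrier waiting for the last arrival."""
    last = max(enter_times)
    return [last - t for t in enter_times]
```

For instance, if a receive is posted at t = 1.0 s and the matching send starts at t = 3.5 s, the late-sender waiting time is 2.5 s.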