posted Friday, October 20, 2006 4:56 PM by DennisCr

High Performance Compute Clustering with Windows

University of Tennessee

Innovative Computing Laboratory

Computer Science Department

Jack Dongarra

Windows Cluster Project

 

People

Jack Dongarra

George Bosilca

Dave Cronk

Julien Langou

Piotr Luszczek

 

Projects:

1.     Numerical Linear Algebra Algorithms and Software

a.     LAPACK, ScaLAPACK, ATLAS

b.    Self Adapting Numerical Algorithms (SANS) Effort

c.     Generic Code Optimization

d.    LAPACK For Clusters – easy access to clusters

2.     Heterogeneous Distributed Computing

a.     NetSolve, FT-MPI, Open MPI

3.     Performance Evaluation

a.     PAPI, HPC Challenge, Top500

4.     Software Repositories

a.     Netlib

 

LAPACK

1.     Used by Matlab, Mathematica, Numeric Python,…

2.     Tuned versions provided by vendors (AMD, Apple, Compaq, Cray, Fujitsu, Hewlett-Packard, Hitachi, IBM, Intel, MathWorks, NAG, NEC, PGI, Sun, Visual Numerics), by Microsoft, and by most Linux distributions and similar environments (Fedora, Debian, Cygwin, ...).

3.     Ongoing work: performance, accuracy, extended precision, ease of use (a minimal call sketch follows this list)
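
To make the ease-of-use point concrete, below is a minimal C sketch of a direct LAPACK call: solving Ax = b with dgesv. It assumes a Fortran-interface LAPACK (underscore-suffixed symbols, column-major arrays; link with -llapack -lblas), which is the common convention but not guaranteed on every platform.

/* Solve the 2x2 system Ax = b with LAPACK's dgesv. */
#include <stdio.h>

/* Fortran-interface prototype; underscore suffix is an assumption. */
extern void dgesv_(int *n, int *nrhs, double *a, int *lda,
                   int *ipiv, double *b, int *ldb, int *info);

int main(void)
{
    int n = 2, nrhs = 1, lda = 2, ldb = 2, ipiv[2], info;
    /* A stored column-major: A = [4 2; 1 3] */
    double a[4] = { 4.0, 1.0, 2.0, 3.0 };
    double b[2] = { 10.0, 7.0 };   /* right-hand side, overwritten with x */

    dgesv_(&n, &nrhs, a, &lda, ipiv, b, &ldb, &info);
    if (info == 0)
        printf("x = (%g, %g)\n", b[0], b[1]);   /* expect (1.6, 1.8) */
    return 0;
}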

 

ScaLAPACK

1.     Parallel implementation of LAPACK, scaling on parallel hardware from tens to hundreds to thousands of processors

2.     Ongoing work: match the functionality of current LAPACK

3.     Ongoing work: target new architectures and new parallel environments, for example a port to the Microsoft HPC cluster solution (a sketch of the common grid setup follows this list)
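
As referenced above, here is a minimal C sketch of the setup step every ScaLAPACK program shares: arranging the processes in a 2-D BLACS grid before any distributed routine (such as pdgesv) is called. It assumes the C interface to BLACS (the Cblacs_* routines) that ships with ScaLAPACK; the exact libraries to link against vary with the MPI and BLAS in use.

/* Set up and tear down a 2x2 BLACS process grid.
   Launch with 4 processes, e.g.: mpirun -np 4 ./a.out */
#include <stdio.h>

extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern void Cblacs_exit(int cont);

int main(void)
{
    int iam, nprocs, ctxt, nprow = 2, npcol = 2, myrow, mycol;

    Cblacs_pinfo(&iam, &nprocs);                 /* my rank, total processes */
    Cblacs_get(-1, 0, &ctxt);                    /* default system context   */
    Cblacs_gridinit(&ctxt, "Row", nprow, npcol); /* row-major 2x2 grid       */
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    printf("process %d of %d at grid position (%d,%d)\n",
           iam, nprocs, myrow, mycol);

    /* ...distributed ScaLAPACK calls would go here... */

    Cblacs_gridexit(ctxt);
    Cblacs_exit(0);
    return 0;
}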

 

LAPACK for Clusters (LFC)

1.     Provides most of ScaLAPACK's functionality to serial clients (Matlab, Python, Mathematica)

 

FT-MPI and Open MPI

1.        Defines the behavior of MPI in the event a failure occurs at the process level.

2.        FT-MPI is based on MPI 1.2 (plus some MPI-2 features), with a fault-tolerance model similar to what was done in PVM.

3.        Complete reimplementation, not based on other implementations.

a.     Gives the application the possibility of recovering from a process failure (see the sketch after this list).

b.    A regular, non-fault-tolerant MPI program will run under FT-MPI.

c.     What FT-MPI does not do:

i.     Recover user data (e.g., automatic checkpointing)

ii.    Provide transparent fault tolerance
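
The sketch below illustrates only the standard-MPI half of this idea: replacing the default abort-on-error handler with MPI_ERRORS_RETURN, so a failure surfaces as a return code the application can act on. FT-MPI's own communicator recovery modes sit on top of this pattern; their specific API is not shown here.

/* Ask MPI to return error codes instead of aborting the job. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, rc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Default handler is MPI_ERRORS_ARE_FATAL; switch to return codes. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    rc = MPI_Barrier(MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        /* Under FT-MPI, this is the point where the application could
           choose to rebuild or shrink the communicator and continue. */
        fprintf(stderr, "rank %d: barrier failed, rc = %d\n", rank, rc);
    }

    MPI_Finalize();
    return 0;
}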

 

Performance Application Programming Interface (PAPI)

1.     A portable library for accessing the hardware performance counters found on most processors

2.     Provides a standardized list of performance metrics (a minimal usage sketch follows this list)
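
A minimal C sketch of PAPI's high-level counter interface follows: counting total cycles and floating-point operations around a loop. It assumes PAPI is installed (link with -lpapi) and that the CPU exposes the PAPI_TOT_CYC and PAPI_FP_OPS preset events; availability differs by processor.

/* Count cycles and floating-point operations with PAPI. */
#include <stdio.h>
#include <papi.h>

int main(void)
{
    int events[2] = { PAPI_TOT_CYC, PAPI_FP_OPS };
    long long counts[2];
    double s = 0.0;
    int i;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;                        /* version mismatch or init failure */

    if (PAPI_start_counters(events, 2) != PAPI_OK)
        return 1;                        /* events not available on this CPU */

    for (i = 1; i <= 1000000; i++)
        s += 1.0 / i;                    /* some floating-point work */

    PAPI_stop_counters(counts, 2);       /* read counters and stop */
    printf("cycles = %lld, flops = %lld (s = %f)\n", counts[0], counts[1], s);
    return 0;
}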

 

KOJAK (Joint with Felix Wolf)

1.     Software package for the automatic performance analysis of parallel applications

2.     Message passing and multi-threading (MPI and/or OpenMP)

3.     Parallel performance

4.     CPU and memory performance


Posters for Related Projects

·         FT-MPI

·         HPCC

·         KOJAK

·         LAPACK / ScaLAPACK

·         NetSolve / ActiveSheets

·         NetSolve / .NET

·         Open MPI

·         PAPI

·         Top500

 

Hardware Configuration

Vendor: Team HPC

System: Team HPC Turnkey Beowulf-Class Supercomputer, dual-core AMD Opterons with 4GB per node

Configuration: 26 AMD Opteron dual-core compute nodes, 1 head node

CPU manufacturer: AMD

CPU model: Opteron 265

CPU speed: 1.8 GHz

Number of nodes: 26

Cores per CPU: 2

Interconnects: InfiniBand, Myrinet, GigE

 

Item Description (QTY)

26 Compute Nodes

Supermicro H8DCE Motherboard (26)

3U Chassis w/ 350W PS with PCI-E Riser & Slide Rails (26)

AMD Opteron 265 1.8 GHz with Heatsink (52)

PC3200 Registered/ECC DDR, 1GB DIMMs x 4 = 4GB per node (104)

80GB 7200rpm SATA HDD, 8MB cache (26)

ATI Rage on board (26)

Dual Gigabit Ethernet integrated on board (26)

One-Year Standard Warranty (26)

Opteron Linux Installed and Tested (26)

Built, Tested & Configured (26)

Torque, Kick-Start Utility & Web-Based Monitoring Software

Head Node (4GB per node)

Supermicro H8DCE Motherboard (1)

3U Chassis w/ PS and Slide Rails (1)

AMD Opteron 265 1.8 GHz with Heatsink and Fan (2)

PC3200 Registered/ECC DDR, 1GB DIMMs x 4 = 4GB total (4)

DVD Combo Drive (1)

ATI Rage on board (1)

Dual Gigabit Ethernet integrated on board (1)

42U APC Rack Enclosure with perforated doors, sides and levelers (2)

APC MasterSwitch, 3-phase 208V (2)

Wiring Harness (1)

1U All-in-One Keyboard, Video and Mouse (KVM) (1)