
TerraWulf II/III - Technical Specifications


The innards of Terrawulf: 158 blade nodes.

  • Introduction

Terrawulf is the high-performance computing (HPC) cluster in the Earth Physics area. Its role is to solve large, complex computational problems in the Earth Sciences using parallel processing techniques.

  • Hardware

The TerraWulf cluster consists of 62 IBM x3550 compute nodes (Terrawulf III) and 96 IBM x3455 compute nodes (Terrawulf II) linked to an IBM x3650 head node and a separate data storage server. The cluster has a total of 1128 compute cores with 15TB local storage, 16TB shared storage and up to 8GB RAM/core. A separate testing/staging node is available for software compatibility testing and development.
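
(Breaking the core total down from the node specifications below: 62 TIII nodes x 2 x 6 cores = 744 cores, plus 96 TII nodes x 2 x 2 cores = 384 cores, giving 1128 cores in total.)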

  • Computational Elements

    • Compute Nodes

        TIII
        50 Terrawulf III nodes are IBM System x3550M3 servers with:
      • Dual Intel Xeon X5650 6-core 2.66 GHz processors
      • 600GB SATA Hard Disk
      • 96GB ECC PC3-10600 DDR3 Memory
      • Integrated dual Gigabit Ethernet

        12 Terrawulf III nodes are IBM System x3550M4 servers with:
      • Dual Intel Xeon E5-2620v2 6-core 2.1 GHz processors
      • 1TB SATA Hard Disk
      • 128GB ECC PC3L-12800 DDR3 Memory
      • Integrated dual Gigabit Ethernet

        TII
        Each Terrawulf II node is an IBM System x3455 with:
      • Dual AMD Opteron 2220 Dual-core 2.8 GHz processors
      • 160GB SATA Hard Disk
      • 9GB (or 17GB for 24 nodes) ECC PC2-5300 DDR2 Memory
      • Integrated dual Gigabit Ethernet
      • Voltaire Infiniband card on PCI-E slot (for 48 nodes)

    • Head Node

        The cluster head node is an IBM System x3650M4 with:
      • Dual Intel Xeon E5-2620v2 6-core 2.1 GHz Processors
      • 48GB ECC PC3L-12800 DDR3 Memory
      • 3x500GB & 1x250GB SATA disks
      • Integrated dual Gigabit Ethernet with TCP/IP Offload Engine
      • Voltaire Infiniband card on PCI-E slot
      • 22TB IBM DS3200/EXP3000 nearline storage system
        - 12x 2TB + 12x 750GB SATA in RAID 5 with dual SAS controllers and dual paths to the server.

    • Data Server

        The data management server is an IBM System x3550M3 with:
      • Dual Intel Xeon E5640 Quad-core 2.66 GHz Processors
      • 48GB ECC PC3-10600 DDR3 Memory
      • 2x 250GB SAS disks
      • 2x 100GB Intel 710-series SSD write cache
      • 2x 120GB Intel 510-series SSD read cache
      • Integrated dual Gigabit Ethernet
      • 15TB JBOD online storage system: 12x 2TB SATA in RAID 5.

    • Network

    • All nodes are interconnected through four UPS-supported HP 2920 Gigabit Ethernet switches running jumbo frames, with 10Gbit links to the head node and data server. The nodes' management ports are connected separately through a dedicated management LAN. In addition, half (48) of the TII cluster nodes are also interconnected via three 24-port Voltaire ISR9024S Infiniband switches providing 10Gbit inter-process communication.

    • Software

    • The head node and all compute nodes run openSUSE Leap 42.3.
      The resource manager is TORQUE and the parallel shell is PDSH. The primary MPI environment is OpenMPI, but MPICH2 and VLTMPI are also available. Installed programming software includes Intel Fortran and C, GNU GFortran/GCC, and Python v2.7.13 and v3.4.6. The cluster is monitored with Ganglia.
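
      As a quick orientation to this tool chain, the sketch below is a minimal MPI "hello world" in C. It is a generic example rather than cluster-specific documentation: the compiler wrapper (mpicc), launcher (mpirun) and TORQUE submission command (qsub) named in the comments are the standard ones shipped with OpenMPI and TORQUE, and exact paths, environment setup and queue names on Terrawulf may differ.

      /* hello_mpi.c - minimal MPI example.
       *
       * Compile with the OpenMPI wrapper:   mpicc hello_mpi.c -o hello_mpi
       * Quick interactive test:             mpirun -np 4 ./hello_mpi
       * Production runs are normally submitted to the compute nodes through
       * the TORQUE resource manager (qsub) rather than run on the head node.
       */
      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char **argv)
      {
          int rank, size, name_len;
          char name[MPI_MAX_PROCESSOR_NAME];

          MPI_Init(&argc, &argv);                   /* start the MPI runtime                */
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* rank of this process                 */
          MPI_Comm_size(MPI_COMM_WORLD, &size);     /* total number of MPI processes        */
          MPI_Get_processor_name(name, &name_len);  /* compute node this rank is running on */

          printf("Rank %d of %d on %s\n", rank, size, name);

          MPI_Finalize();                           /* shut down the MPI runtime            */
          return 0;
      }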

    • Help & support: online resources

    • OpenMPI : http://www.open-mpi.org
      MPICH2 : http://www.mcs.anl.gov/research/projects/mpich2/
      VLTMPI: http://mvapich.cse.ohio-state.edu/
      TORQUE: http://www.adaptivecomputing.com/products/open-source/torque/
      PDSH: https://code.google.com/p/pdsh/
      Ganglia: http://ganglia.sourceforge.net/

       

      Hardware Diagram


    Specifications for the original Terrawulf (TI) are documented separately.
