Fram resides in 11 cabinets or “racks”, each about two feet wide, three feet deep, and six feet high. Each rack holds up to 42U of equipment. The heaviest racks weigh approximately 1750 pounds apiece.
The racks are located in a specially built facility in the Division of Geological and Planetary Sciences at Caltech. They stand side by side in a long row, with one two-foot-wide walkway about halfway down the row. Altogether, then, the cluster occupies a floor area 23 feet long and 3 feet wide and weighs about 20,000 pounds.
There are 314 compute nodes in active service, each with two processors (CPUs), and each CPU has 6 cores. The compute nodes act together as a unit, so that a single computational job could easily use 3768 cores, depending on the application software.
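For reference, that total follows directly from the node count: 314 nodes × 2 CPUs per node × 6 cores per CPU = 3768 cores.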
The compute nodes are HP SL390s G7s, and 252 of them share an identical configuration. Each of these has:
- two Intel 6-core Westmere processors
- 64-bit architecture (Nocona, EM64T)
- 48GB of RAM
- one hard disk
- one Mellanox QDR InfiniBand card
  - model MT26438
  - fiber interconnect
- one Gigabit Ethernet port
  - Intel controller
  - second port available
- iLO for remote management
The remaining 60 compute nodes have specifications identical to the others, but each also contains 3 NVIDIA M2090 GPGPUs. These cards are very good at certain types of highly parallel calculations.
Each card has:
- 512 cores
- 6 GB ECC Memory
- 1.3 TFlops peak single precision
- 640 GFlops peak double precision
Three HP DL380 G7s serve as user login servers. These are where users perform their interactive work on the cluster. Typical user activities include editing program files, compiling applications, and launching and monitoring computational runs.
The login nodes, like the other infrastructure servers described in the following sections, are slightly more capable than the compute nodes. Each server has 48GB of RAM, redundant power supplies, and two hard drives connected to a hardware RAID 1 (mirroring) controller.
A single HP DL380 G7 serves as a central “master node” for the cluster. Several elements of cluster infrastructure run on or originate from this server. These include:
- centralized OS & software reinstalls for the compute nodes and login nodes
- PBS and Maui master
- external gateway
The Lustre parallel file system (described below) is served by dedicated storage servers:
- 4 HP DL360 G7s as Object Storage Servers
- 2 HP DL360 G7s as Metadata Servers
Each of these servers has:
- 2.67 GHz processors
- 64-bit architecture
- 24GB of RAM
- two hard disks
- one Fibre Channel adapter card
- 2 Mellanox QDR InfiniBand cards
- iLO for remote management
DataDirect Networks disk array
- two DDN SAN controllers
  - model SFA10000
  - in dual mode for failover
- 5 disk chassis
  - 240 3TB drives
High Speed Interconnect
The HPC network is used for message passing among the nodes and for access to the parallel file system. We use Mellanox QDR InfiniBand hardware running the OFED software stack, connected through a single 314-port Voltaire switch.
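To illustrate the kind of message passing this fabric carries, here is a minimal MPI sketch in C. It is a generic example rather than code from Fram's actual software stack, and the mpicc compile wrapper mentioned in the comment is the usual MPI convention, not a confirmed detail of this cluster.

    /* Minimal MPI message-passing sketch: rank 1 sends one value to rank 0.
     * Generic example; typically compiled with an MPI wrapper such as mpicc. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size >= 2) {
            if (rank == 1) {
                double payload = 3.14;
                /* When the ranks run on different nodes, this message crosses
                 * the InfiniBand fabric via the OFED stack. */
                MPI_Send(&payload, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            } else if (rank == 0) {
                double payload;
                MPI_Recv(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 0 received %f from rank 1\n", payload);
            }
        }

        MPI_Finalize();
        return 0;
    }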
We currently use 10 Gigabit Ethernet for the connection out to the campus network.
We currently use Gigabit Ethernet on the internal network. This network is used only for provisioning nodes and initially launching jobs.
We currently use Gigabit Ethernet on the management network. This network allows us to connect to and control computers even when the system is not up.
Operating System Environment
The cluster software environment that we use, Rocks, is founded on RHEL. Rocks is software created by a community-based, open-source project, and the Rocks project receives funding from the National Science Foundation.
Rocks bundles the operating system and a number of separate cluster tools within a framework of cluster-management software and databases designed specifically for Rocks.
Lustre is a fully integrated, enterprise-class, scalable, POSIX-compliant global parallel file system. It currently powers about 70% of the top supercomputers in the world.
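Because Lustre presents a POSIX interface, applications read and write files on it with ordinary system calls and no special libraries. The short C sketch below shows this; the mount point /lustre/scratch is a hypothetical path chosen for illustration, not the cluster's actual one.

    /* POSIX file I/O on a Lustre mount works like any local file system.
     * /lustre/scratch is a hypothetical mount point used for illustration. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *f = fopen("/lustre/scratch/example.txt", "w");
        if (f == NULL) {
            perror("fopen");
            return EXIT_FAILURE;
        }
        /* All nodes share the same global namespace, so this file is
         * immediately visible cluster-wide. */
        fprintf(f, "hello from a Lustre client\n");
        fclose(f);
        return EXIT_SUCCESS;
    }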
We use the open-source Torque resource manager and the Maui scheduler.