Hardware


Fram resides in 11 cabinets or “racks”, each about two feet wide, three feet deep, and six feet high. Each rack holds up to 42U of equipment. The heaviest racks weigh approximately 1750 pounds apiece.

The racks are located in a specially built facility in the Division of Geological and Planetary Sciences at Caltech. They stand side by side in a long row, with one two-foot-wide walkway about halfway down the row. Altogether, then, the cluster occupies a floor area 23 feet long and 3 feet wide and weighs about 20,000 pounds.

Computational Elements


There are 314 compute nodes in active service, each with two processors (CPUs), each with six cores. The compute nodes act together as a unit, so that a single computational job could easily use 3768 cores, depending on the application software.
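
As a minimal sketch (not taken from the cluster's own documentation), the C program below shows how a computational job sees the cores it has been allocated: each MPI rank typically maps to one core. The MPI library and the mpicc compiler wrapper assumed here are not named anywhere in this document.

    /* hello_mpi.c - minimal sketch of an MPI program whose ranks span
     * many cores and nodes; assumes an MPI library and its compiler
     * wrapper (e.g. mpicc) are available, which this document does not
     * specify. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start the MPI runtime   */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank     */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks   */
        printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and launched through the batch system with one rank per core, such a program would report as many ranks as cores were requested.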

Compute Nodes

The compute nodes are HP ProLiant SL390s G7 servers. Of these, 252 are identical. Each of these has:

GPU Nodes

The remaining 60 compute nodes have identical specifications to the others, but each also contains three NVIDIA M2090 GPGPUs. These cards are very good at certain types of calculations.

Each card has:

Login Nodes

Three HP DL380 G7s serve as user login servers. These are where users perform their interactive work on the cluster. Typical user activities include editing program files, compiling applications, and launching and monitoring computational runs.

The login nodes, like the other infrastructure servers described in the following sections, are slightly more capable than the compute nodes. Each server has 48GB of RAM, redundant power supplies, and two hard drives connected to a hardware RAID 1 (mirroring) controller.

Master Node

A single HP DL380 G7 serves as a central “master node” for the cluster. Several elements of cluster infrastructure run on or originate from this server. These include:

Shared Filesystem Elements


Lustre servers

DataDirect Networks disk array

Networks


High Speed interconnect


The HPC network is used specifically for message passing among the nodes and for connecting them to the shared filesystem. We use Mellanox QDR InfiniBand hardware running the OFED software stack. We use a single 314-port Voltaire switch.

Commodity Network


Public Network

We currently use 10 Gigabit Ethernet out to the campus network.

Private Network

We currently use Gigabit Ethernet on the internal network. This network is used only for provisioning nodes and for initially launching jobs.

Management Network

We currently use Gigabit Ethernet on the management network. This network allows us to connect to and control computers even when the system is not up.


Software


Operating System Environment


CITerra runs Linux, specifically Red Hat Enterprise Linux (RHEL). All nodes in the cluster run the same release, currently RHEL 5.

The cluster software environment that we use, Rocks, is founded on RHEL. Rocks is software created by a community-based, open-source project. The Rocks project receives funding from the National Science Foundation.

Rocks bundles the operating system and a number of separate cluster tools, within a framework of cluster management software and databases designed specifically for Rocks.

Filesystem

Lustre is a fully integrated, enterprise-class, scalable, POSIX-compliant global parallel file system. It currently powers about 70% of the top supercomputers in the world.
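
Because Lustre is POSIX-compliant, applications need no special API to use it; ordinary file I/O works unchanged. The fragment below is a minimal sketch in C, and the path under /lustre is hypothetical, standing in for wherever the filesystem is actually mounted on this cluster.

    /* lustre_posix.c - minimal sketch: ordinary POSIX calls work on a
     * Lustre mount. The path below is hypothetical; substitute the
     * cluster's actual mount point. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/lustre/scratch/example.dat";  /* hypothetical path */
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        const char msg[] = "data written with ordinary POSIX I/O\n";
        if (write(fd, msg, strlen(msg)) < 0)
            perror("write");
        close(fd);
        return 0;
    }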

Resource Management


We use the open-source Torque resource manager and the Maui scheduler.
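
For illustration only, the following is a minimal sketch of a Torque job script. The job name, walltime, and program are hypothetical; the ppn=12 request simply reflects the twelve cores per compute node described earlier.

    #!/bin/bash
    #PBS -N example_job              # job name (hypothetical)
    #PBS -l nodes=2:ppn=12           # request 2 nodes, 12 cores per node
    #PBS -l walltime=01:00:00        # one hour of wall-clock time
    #PBS -j oe                       # merge stdout and stderr into one file

    cd $PBS_O_WORKDIR                # run from the submission directory
    mpirun -np 24 ./hello_mpi        # hypothetical MPI program, one rank per core

Such a script would be submitted with qsub and monitored with qstat, while Maui decides when and where the requested resources become available.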


Facility