High-Performance Computing Cluster Apollo
Performance: 1.04 TFLOP/s (theoretical peak)
Manufactured by DELL in 2008
Let's see what it is made of...
Configuration: 50 compute nodes (200 processor cores), one master
node, and one NFS server, connected by two 1 Gigabit Ethernet
switches. Each compute node has dual 2.6 GHz Dual Core AMD Opteron™
processors, 4 GB of RAM, and a 160 GB local scratch disk. The master
node offers 584 GB (RAID 5) user space, dual 2.4 GHz Dual Core AMD
Opteron™ processors, and 4 GB of RAM. The NFS server provides 2 TB
(RAID 5) cluster scratch disk, and utilizes two 1.6 GHz Quad Core
Intel Xeon™ processors and 2 GB of RAM.
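The theoretical peak quoted above can be reproduced from the compute node configuration. The following is a minimal sketch: the function name peak_tflops is hypothetical, and the value of 2 floating-point operations per core per clock cycle is an assumption that is not stated on this page; it is chosen because it is consistent with the 1.04 TFLOP/s figure.

```python
def peak_tflops(nodes, sockets_per_node, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFLOP/s: total cores x clock rate x FLOPs per cycle."""
    cores = nodes * sockets_per_node * cores_per_socket
    gflops = cores * clock_ghz * flops_per_cycle
    return gflops / 1000.0


# 50 compute nodes, dual sockets, dual-core CPUs at 2.6 GHz (from the text above).
# flops_per_cycle=2 is an assumption, not a value given in this document.
apollo_peak = peak_tflops(nodes=50, sockets_per_node=2, cores_per_socket=2,
                          clock_ghz=2.6, flops_per_cycle=2)
print(f"{apollo_peak:.2f} TFLOP/s")  # prints "1.04 TFLOP/s"
```

The master node and the NFS server are left out of the calculation because they do not run jobs.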
Software: Red Hat Linux, Platform OCS, Lava
Platform OCS
We use the Platform OCS cluster software distribution, which
provides straightforward installation and uncomplicated
maintenance. Here is a
link to the Platform OCS User Guide.
Lava
For job scheduling and cluster resource management, we use
Platform Lava, which is based on Platform LSF. It is easy to use and
allows students to become familiar with one of the popular workload
management systems. Please follow this
link for a description of how to use Lava.
Architecture and Purpose
As mentioned above, the main components of cluster Apollo
are the master node (the front end node), 50 compute nodes, one NFS
server, and the 1Gb Ethernet network provided by 2 switches.
The front-end (master) node is connected to the CCB
network and is used for user login and user directories. It also
runs the job scheduling and cluster management software. One can
perform all work on the cluster from the master node. The front-end
also provides a Web-based interface, which is accessible from the
CCB network using this
link. This
website offers current cluster resource allocation information,
the Lava GUI, and cluster documentation.
The NFS server provides network-attached
storage to all compute nodes, mounted under /apollo-io/scratch.
Please do not run any computing or data processing on this node.
All 50 compute nodes are used for running jobs as
scheduled by the Lava batch system. The local /scratch partition on
the compute nodes is used for job data. Please remove the data after
the job is done.
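One way to make sure job data is removed from the local scratch disk is to create a per-job directory and delete it when the job finishes, even if the job fails. The following is a minimal sketch, not a site-provided tool: the helper name job_scratch is hypothetical, and reading the job ID from the LSB_JOBID environment variable is an assumption based on Lava's LSF heritage (the code falls back to a placeholder if the variable is not set).

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def job_scratch(root="/scratch"):
    """Create a private directory under the local scratch partition and
    remove it, with all its contents, when the block exits."""
    # LSB_JOBID is assumed to be set by the batch system; "nojob" is a fallback.
    job_id = os.environ.get("LSB_JOBID", "nojob")
    path = tempfile.mkdtemp(prefix=f"job{job_id}_", dir=root)
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions, so no data is left behind.
        shutil.rmtree(path, ignore_errors=True)
```

A job script would wrap its work in "with job_scratch() as workdir:", write intermediate files under workdir, and copy any results it needs to keep to the user space before the block ends.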
Though the 1 Gb/s network is supported by 2 Ethernet
switches, all 50 nodes form a single pool of computing resources
available to the job scheduling system. The stacking link between
the switches provides data bandwidth of 12 Gb/s. This cluster is
designed for loosely coupled calculations, which require only a
moderate data bandwidth between compute nodes.
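A rough estimate shows why the network suits loosely coupled jobs. The sketch below computes the bandwidth each node gets in the worst case, when every node on one switch sends data across the stacking link at the same time. The function name cross_switch_share_gbps is hypothetical, and the even split of 25 compute nodes per switch is an assumption; this page does not say how the nodes are distributed over the two switches.

```python
def cross_switch_share_gbps(nodes_per_switch, link_gbps, stacking_gbps):
    """Per-node bandwidth when all nodes on one switch talk to the other switch."""
    demand = nodes_per_switch * link_gbps
    if demand <= stacking_gbps:
        # The stacking link is not the bottleneck; each node keeps its full link.
        return link_gbps
    # Otherwise the stacking link is shared equally among the senders.
    return stacking_gbps / nodes_per_switch


# 1 Gb/s node links and a 12 Gb/s stacking link are from the text above.
# nodes_per_switch=25 is an assumption (50 nodes split evenly over 2 switches).
share = cross_switch_share_gbps(nodes_per_switch=25, link_gbps=1.0, stacking_gbps=12.0)
print(f"{share:.2f} Gb/s per node")  # prints "0.48 Gb/s per node"
```

Under these assumptions, heavy all-to-all traffic would leave each node with roughly half of its link speed across switches, which is acceptable for jobs that exchange data only occasionally.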