The LoBoS supercomputer is a Beowulf class supercomputer managed by the Laboratory of Computational Biology in the National Heart, Lung, and Blood Institute at the National Institutes of Health campus. Researchers may use parallel computing to explore advanced problems in biophysical chemistry including molecular bonding, protein folding, and solvation reactions. These tasks require large amounts of CPU power. For example, a one nanosecond simulation of a mid-size protein in explicit water may require as many as a trillion additions and multiplications.
Recent increases in the power of commodity microprocessors have made Beowulfs viable research tools. Given the nature of biochemical simulations, however, increases in CPU power alone, no matter how impressive, are insufficient if significant progress is to be made in computational biology. Fortunately, high-speed, low-latency network technology has also been developing rapidly, and the LoBoS cluster uses InfiniBand interconnects for this purpose. The coupling of high-speed networking with powerful commodity processors provides three main research benefits:
Improved Sampling: The time scale of simulations can be extended and simulations can be run multiple times to get a better idea of statistical significance of results.
Increased System Size: The ability to add more atoms to simulations allows for the tackling of more complex problems.
More Accurate Theory: In practice, most methodological improvements result in an increase in computational cost. However, some of this cost can be offset by parallelization and efficient network management. For example, the inclusion of dynamic electron correlation in the quantum mechanical portion of a QM/MM calculation can increase the scaling of computations by a factor of three.
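The sampling benefit can be made concrete with a short calculation. The sketch below is only an illustration, not part of any LoBoS workflow: it assumes each independent run yields one value of some observable, and it computes the mean and its standard error, which shrinks roughly as one over the square root of the number of runs. The function name and the four sample values are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(samples):
    """Return the sample mean and its standard error for independent runs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two independent runs")
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, sem


# Hypothetical observable values from four independent runs (illustrative only).
replica_values = [1.0, 2.0, 3.0, 4.0]
mean, sem = mean_and_standard_error(replica_values)
print(f"mean = {mean:.3f}, standard error = {sem:.3f}")
```

Because the standard error scales with the inverse square root of the number of runs, quadrupling the number of independent simulations roughly halves the uncertainty, provided the spread between runs stays similar.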
The LoBoS business model is to purchase many machines equipped with commodity-priced processors rather than investing in expensive supercomputers. This has achieved a tenfold reduction in the computing costs of the research that the laboratory conducts. This plan also affords greater flexibility, as researchers can use small programs which only use one cluster node or achieve true parallel computing using many nodes. Finally, it is a very efficient use of funds because when LoBoS cluster nodes are updated to take advantage of new technology, which generally happens every 18-24 months, the old nodes can easily be converted into general purpose server machines, thus increasing their service life.
LoBoS makes a wide variety of modern hardware available to participating researchers. The current list of hardware is:
- GPU nodes
- 20 nodes with 2xA100 NVIDIA GPUs, eight-core AMD Epyc 7232P CPU, 64 GB of RAM
- 25 nodes with 2xV100 NVIDIA GPUs, six-core Intel Xeon Bronze CPU, 48 GB of RAM
- 2 nodes with 4xP100 NVIDIA GPUs, 2x twelve-core Intel Broadwell CPUs, 64 GB of RAM, FDR Infiniband
- 4 nodes with 4xTitanXp NVIDIA GPUs, 2x twelve-core Intel Broadwell CPUs, 64 GB of RAM, FDR Infiniband
- 24 nodes with 2xK40m NVIDIA GPUs, 2x ten-core Intel Broadwell CPUs, 64 GB of RAM, FDR Infiniband
- 48 nodes with 1xK20Xm NVIDIA GPU, 2x six-core Intel Ivybridge CPUs, 32 GB of RAM, FDR Infiniband
- CPU nodes
- 36 nodes with 2x twelve-core 2.1 GHz, 9.6 GT/s QPI Intel Silver (Skylake) CPUs and 48 GB of RAM, FDR Infiniband
- 72 nodes with 2x eight-core 2.4 GHz, 7.2 GT/s QPI Intel Xeon (Haswell) CPUs and 32 GB of RAM, FDR Infiniband
- 80 nodes with 2x six-core 2.6 GHz, 7.2 GT/s QPI Intel Xeon (Ivy Bridge) CPUs and 16 GB of RAM, QDR Infiniband
- 8 "Pods", each with 32 nodes with 2x six-core 2.3 GHz, 7.2 GT/s QPI Intel Xeon (Sandy Bridge) CPUs and 16 GB of RAM, QDR Infiniband
- 5 "Pods", each with 32 nodes with 2x eight-core 2.4 GHz, 7.2 GT/s QPI Intel Xeon (Haswell) CPUs and 32 GB of RAM, FDR Infiniband
- 1 Intel Xeon Phi 7210 (Knights Landing), 64 cores (256 threads) and 128 GB of RAM
- 1 node with 2x 14 core Intel Skylake CPUs, 384 GB of RAM, 400 GB of fast SSD storage mounted under /scratch, and a 10 Gbps NIC
- 2 nodes with 2x 12 core Intel Haswell CPUs, 256 GB RAM, 800 GB SSD /scratch, 40 Gb NICs
- 2 nodes with 2x 12 core Intel Ivybridge CPUs, 256 GB of RAM, 450 GB of fast SSD storage mounted under /scratch, and a 10 Gbps NIC
- GPU analysis nodes
- One with 4x Tesla M2090
- Two with 2x K20m
- 768 TB of primary storage provided by a clustered NetApp FAS8300 system
- 300 TB of global scratch space provided by a VAST Data flash storage system
- 1.3 PB of archive storage provided by a FreeBSD based ZFS filesystem
- 500 TB of offsite backup storage provided by a FreeBSD based ZFS filesystem
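As a quick cross-check of the GPU node list above, the entries can be tallied programmatically. This is a minimal sketch that copies the node counts and GPUs per node from the list and deliberately leaves out the separate GPU analysis nodes; the variable names are illustrative.

```python
# (node count, GPUs per node, GPU model) copied from the GPU node list above.
gpu_nodes = [
    (20, 2, "A100"),
    (25, 2, "V100"),
    (2, 4, "P100"),
    (4, 4, "TitanXp"),
    (24, 2, "K40m"),
    (48, 1, "K20Xm"),
]

total_nodes = sum(count for count, _, _ in gpu_nodes)
total_gpus = sum(count * per_node for count, per_node, _ in gpu_nodes)

for count, per_node, model in gpu_nodes:
    print(f"{model:>8}: {count:3d} nodes x {per_node} = {count * per_node:3d} GPUs")
print(f"total: {total_nodes} nodes, {total_gpus} GPUs")
```

With the figures as listed, this comes to 123 GPU nodes carrying 210 GPUs.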
Researchers who use LoBoS require a wide array of software to conduct molecular modeling and simulations, as well as support software to keep everything running.
- CHARMM. The primary modeling tool used on the cluster is the CHARMM (Chemistry at HARvard Molecular Mechanics) software package. Dr. Bernard Brooks, the head of the LCB, is one of the primary developers of CHARMM. The CHARMM Development Project involves a network of developers in the United States and elsewhere working with Professor Karplus and his group at Harvard to develop and maintain the package.
- AMBER. AMBER (Assisted Model Building with Energy Refinement) is designed with particular emphasis on studying the dynamics of biomolecules. It consists of freely available tools for molecular dynamics (MD) simulation and analysis, distributed as AmberTools (including the MD engine SANDER and the parallelized MD analysis program CPPTRAJ, which can process data from AMBER, CHARMM, GROMACS, and NAMD), as well as the highly optimized, GPU-enabled MD engine PMEMD.
- GROMACS. A versatile package for performing classical molecular dynamics simulations. It comes with a wide range of tools for analysis, and has been parallelized using MPI.
- NAMD. A parallel molecular dynamics program. Its main feature is its extreme scalability. It can be run on hundreds of processors to efficiently characterize the dynamics of very large systems. It is compatible with input files of other software such as CHARMM, AMBER, and X-PLOR.
- Tinker. Molecular modeling software that is being used in the laboratory. It is a complete and general package for molecular mechanics and dynamics, with some special features for biopolymers.
- OpenMM. A freely-available toolkit for molecular simulation. Can be used either as a stand-alone application for running simulations, or as a library that can be called from other codes. Both CHARMM and AMBER can make use of OpenMM functionality.
- GAMESS. Another ab initio package we have available is GAMESS-US. This software package provides a means of performing QM/MM calculations via a CHARMM QM/MM interface.
- Gaussian. Another piece of commercial software that performs quantum calculations is Gaussian, which also performs ab initio analyses of complex molecular systems. It contains many advanced features such as the ONIOM method of analyzing the electronic structure of large molecules and the Polarizable Continuum Model for studying molecules in solution.
- Psi4. An open-source suite of ab initio quantum chemistry programs designed for efficient, high-accuracy simulations of a variety of molecular properties.
- VMD. A multiplatform molecular visualization program. It supports many different coordinate and structure file formats, including those used by CHARMM, PQR, and AMBER. It also is able to read GAMESS log files. We use it to create beautifully rendered images of the molecular systems which we study (some of which can be found on this Web site).
- PyMOL. A flexible molecular graphics and modelling package which can also be used to generate animated sequences.
- Molden. A program for visualizing molecular and electronic structure.
Compilers and Parallel Execution Tools
- GNU Compiler Collection. We use these open-source compilers for multiple applications.
- Intel Compiler Collection. Most of our compiling is done with these compilers.
- PGI Compilers. Includes Fortran 95, Fortran 90, Fortran 77, C, and C++ compilers.
- OpenMPI. A portable implementation of the MPI parallel computing standard. The software was developed by a consortium of commercial and academic institutions.
- MVAPICH. An open-source MPI software that is optimized to take advantage of the novel features and mechanisms of high-performance networking technologies such as the Infiniband interconnect.
- Intel MPI. An MPI library that focuses on making applications perform better on Intel architecture-based clusters.
Administrative and Monitoring Software
- SLURM Workload Manager. An open source, fault-tolerant, and highly scalable cluster management and job scheduling system. This will eventually be used to manage the entire LoBoS cluster.
- NAGIOS. We have found Nagios useful for general cluster monitoring and supervising important systems such as our RAID arrays. Nagios can automatically notify an administrator by e-mail when it detects a problem.
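To illustrate how a job might be described to the SLURM Workload Manager listed above, the following sketch renders a batch script as text. It is an assumption-laden example rather than the LoBoS configuration: `--job-name`, `--nodes`, `--ntasks-per-node`, `--time`, and `--gres` are standard `sbatch` options and `srun` is SLURM's launcher, but the job name, resource values, and the program `./my_md_program` are hypothetical placeholders.

```python
def render_batch_script(job_name, nodes, tasks_per_node, gpus_per_node, walltime, command):
    """Build the text of a SLURM batch script from a few job parameters."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
    ]
    if gpus_per_node > 0:
        # Request generic GPU resources only when the job needs them.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    lines.append("")
    # srun launches the tasks inside the allocation made by sbatch.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


# Hypothetical CPU-only job: 2 nodes, 12 tasks per node, 4 hours of wall time.
script = render_batch_script(
    job_name="md_run",
    nodes=2,
    tasks_per_node=12,
    gpus_per_node=0,
    walltime="04:00:00",
    command="./my_md_program input.inp",
)
print(script)
```

The generated text could be saved to a file and submitted with `sbatch`; site-specific settings such as partition names are left out because they are not given here.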
This is a detailed journey through how LoBoS came to be.
The Early Years
The first Beowulf cluster was developed by a group including Donald Becker, Thomas Sterling, Jim Fischer, and others at NASA's Goddard Space Flight Center in 1993-1994. The aim of the Beowulf project was to provide supercomputing performance at a fraction of the traditional cost. This was made possible by two then-recent developments in technology: first, the introduction of cheap Intel and Intel clone microprocessors that could perform respectably compared to DEC's Alpha CPU, Sun's SPARC and UltraSPARC lines, and other high performance CPUs, and second, the availability of capable open-source operating systems, most notably Linux. The Beowulf project was a success and spawned a variety of imitators at research institutions that wanted supercomputing power without paying the price.
The original iteration of LoBoS was conceived by Bernard Brooks and Eric Billings in the mid 1990s, in an attempt to use the architecture developed by the NASA group to advance the cost effectiveness of molecular modeling. The first LoBoS cluster was constructed between January and April of 1997, and remained in use until March, 2000. This cluster used hardware that was state of the art at the time. The network topology was a ring, with each node having three NICs (see LoBoS 1 on the LoBoS versions page for more details), that was joined to the NIH campus network by a pair of high-speed interconnects. This cluster was able to take advantage of the recent parallelization of computational chemistry software such as CHARMM. It was made available to collaborating researchers at NIH and other institutions.
LoBoS through the Years
The LoBoS cluster, like the original Beowulf, proved to be a success. Researchers at NIH and collaborating institutions used it to develop large-scale parallel molecular modeling experiments and simulations. By 1998, however, the original cluster, whose nodes contained dual 200 MHz Pentium Pro processors, was becoming obsolete. A second cluster, LoBoS 2, was therefore constructed consisting of nodes with dual 450 MHz Intel Pentium II processors. This cluster also abandoned the ring network topology for a standard ethernet bus. The cluster had both fast and gigabit ethernet connections, a rarity for the late 1990s. As this was happening, the original LoBoS cluster was converted for desktop use. This represented another advantage of the LoBoS business model, as machines could be converted for other uses when newer technology became available for the cluster.
With the second incarnation of LoBoS, demand for cluster use continued to increase. To provide NIH and collaborating researchers with a top of the line cluster environment, the Cluster 2000 committee was chartered to build a combined LoBoS 3/Biowulf cluster. This committee evaluated several different options for processors, network interconnections, and other technologies.
Despite the existence of LoBoS 3/Biowulf, the CBS staff decided to construct a LoBoS 4 cluster. This cluster used nodes with a dual AMD Athlon MP configuration. LoBoS 4 also added Myricom's proprietary high speed, low latency fiber network technology, called Myrinet. Myrinet gave a significant performance improvement to parallel applications. With this cluster, the CBS staff ran into power and reliability problems. Although they were mostly fixed, most of the nodes in LoBoS 4 were returned to their vendor as trade-ins for LoBoS 5 nodes. LoBoS 5, which was completed in December, 2004, was an evolutionary development from LoBoS 4, featuring nodes with dual Xeon processors and expanded use of Myrinet technology.
As LoBoS 5 began to age, plans were made for the construction of LoBoS 6, the first version of LoBoS to use 64 bit CPUs. The first batch of nodes, 52 dual dual-core Opteron systems, were brought online in late summer of 2006. The next batch of systems are 76 dual quad core Intel Clovertown nodes, which are currently being brought on-line. The Opteron nodes are connected with high-speed single data rate (SDR; 10 Gbps) and the Clovertown nodes use double data rate (DDR; 20 Gbps) InfiniBand interconnects.
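The SDR and DDR figures quoted above follow from simple link arithmetic. The sketch below assumes the standard 4x InfiniBand link width and the per-lane signaling rates of 2.5 Gbps (SDR) and 5 Gbps (DDR); it also shows the usable data rate after the 8b/10b line encoding that these generations use. The variable names are illustrative.

```python
LANES = 4  # assumption: standard 4x InfiniBand link width

# Per-lane signaling rates in Gbps for the two generations mentioned above.
per_lane_signaling_gbps = {"SDR": 2.5, "DDR": 5.0}

link_rates = {}
for name, lane_rate in per_lane_signaling_gbps.items():
    raw = lane_rate * LANES
    # 8b/10b encoding carries 8 data bits in every 10 bits on the wire.
    usable = raw * 8 / 10
    link_rates[name] = (raw, usable)
    print(f"{name}: {raw:.0f} Gbps signaling, {usable:.0f} Gbps usable data rate")
```

The signaling rates reproduce the 10 Gbps and 20 Gbps figures in the text, while the payload bandwidth available to applications is 8 Gbps and 16 Gbps respectively.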
LoBoS 7 came about through the gradual replacement of the compute nodes and InfiniBand switches with newer versions, and it represented a major expansion of LoBoS.
Thanks to the construction of a new data center, LoBoS has substantial capacity for expansion. The Laboratory of Computational Biology is considering several options for new nodes.
LoBoS Previous Versions
LoBoS 7 is a previous, 64-bit version of LoBoS:
|96 compute nodes||
Sandy Bridge Nodes
|72 compute nodes||
|156 compute nodes||
|1 master node||
LoBoS 6 is another previous, 64-bit version of LoBoS:
|52 compute nodes||
|1 master node||
|76 compute nodes||
|228 compute nodes||
With the unreliability of the motherboards in LoBoS 4, the Computational Biophysics Section staff decided to switch back to Intel CPUs with Supermicro motherboards. As the Pentium 4 processor was not designed for multiprocessor operations, LoBoS 5 nodes were delivered with dual Intel Xeon processors. Approval for the cluster was granted in June, 2003 and LoBoS 5 was installed in stages between March and December of 2004. The first batch of 88 nodes were equipped with 2.66 GHz Xeons. All subsequent nodes have dual 3.06 GHz processors.
With 190 nodes now in operation, the LoBoS cluster has reached the limits of the physical space available to it on campus. A new, off-campus computer room has been constructed and work is underway to build out LoBoS 6.
|190 compute nodes||
|2 master nodes||
|Myrinet switching hardware||
With the combination of LoBoS-3 and Biowulf, there was a need to upgrade the LoBoS 2 cluster which was, by 2001, showing its age. The new cluster was designed with 70 compute nodes. In a departure from previous clusters, these nodes would use AMD CPUs instead of Intel chips. LoBoS 4 also saw the deployment of Myrinet, a proprietary high-bandwidth, low-latency data link layer network technology in the LoBoS cluster.
Unfortunately, these nodes had reliability problems. In particular, the motherboards proved problematic, requiring frequent reboots, which put undue stress on the power supplies, which in turn failed at a much higher than expected rate. In addition there were problems with ensuring sufficient power and cooling for the nodes, although these issues were finally resolved. However, because of the component failures, the LoBoS 4 nodes were returned to their vendor in exchange for a discount on some of the nodes that would eventually become LoBoS-5.
|70 Compute Nodes||
|4 master nodes||
|Myrinet switching hardware||
With the success of the first two incarnations of LoBoS, there was an increase in interest, both at NIH and other institutions, in using Beowulf clusters to conduct biochemical research. To meet the challenges posed by this new avenue of research and the increasing demand for cluster resources, CIT and NHLBI at NIH decided to combine forces to design a new cluster. The Cluster 2000 committee was therefore chartered to conduct the design work. The result of this collaboration was a combined LoBoS 3 and Biowulf cluster.
Biowulf itself is maintained by the same organization as the NIH Helix systems.
After the success of the initial LoBoS implementation, the Computational Biophysics Section decided to create a second cluster using more modern (at the time) hardware. Eric Billings once again took the lead of designing and implementing the cluster. Of particular note is the abandonment of the relatively inefficient ring topology for a standard fast ethernet bus topology. A gigabit uplink provided high speed networking outside the cluster's immediate environment. LoBoS 2 was built in July and August of 1998, and was used from October 1998 to January, 2001. It was completely converted to desktop use by June, 2001.
|100 compute nodes||
|4 Master Nodes||
Note: These master nodes were upgraded for use with LoBoS 4.
The original LoBoS was used between June of 1997 and March of 2000. Its nodes were handed back for use as desktop machines between September of 1999 and March, 2000.
|47 compute nodes||
|4 Master Nodes||