High Performance Computing Section

The High Performance Computing Section of the Laboratory of Computational Biology is part of the Biochemistry and Biophysics Center at NHLBI. The Section is a group of researchers who support and maintain the LoBoS cluster, which allows scientists in the laboratory to use high-performance computing to investigate biological systems with molecular dynamics. The Section is led by John Legato.

Overview of LoBoS

The LoBoS supercomputer is a Beowulf-class supercomputer managed by the Laboratory of Computational Biology in the National Heart, Lung, and Blood Institute on the campus of the National Institutes of Health. Researchers may use parallel computing to explore advanced problems in biophysical chemistry, including molecular bonding, protein folding, and solvation reactions. These tasks require large amounts of CPU power. For example, a one-nanosecond simulation of a mid-size protein in explicit water may require as many as a trillion additions and multiplications.

Recent increases in the power of commodity microprocessors have made Beowulfs viable research tools. Given the nature of biochemical simulations, however, increases in CPU power alone, no matter how impressive, are insufficient for significant progress in computational biology. Fortunately, high-speed, low-latency network technology has also been developing rapidly, and the LoBoS cluster uses high-speed InfiniBand for this purpose. The coupling of high-speed networking with powerful commodity processors provides three main research benefits:

Improved Sampling: The time scale of simulations can be extended, and simulations can be run multiple times to better assess the statistical significance of the results.
Increased System Size: The ability to add more atoms to simulations makes it possible to tackle more complex problems.
More Accurate Theory: In practice, most methodological improvements increase computational cost. However, some of this cost can be offset by parallelization and efficient network management. For example, the inclusion of dynamic electron correlation in the quantum mechanical portion of a QM/MM calculation can increase the scaling of computations by a factor of three.
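The "Improved Sampling" point rests on a standard idea: run the same simulation several times independently, average an observable over the runs, and report its standard error. A minimal Python sketch, assuming each run yields one scalar estimate of the same observable and that the runs are independent; the function name and the sample values are hypothetical illustrations, not LoBoS results.

```python
import math
import statistics


def replica_summary(estimates):
    """Mean and standard error of the mean over independent simulation runs.

    Assumes each entry is one run's estimate of the same observable and
    that the runs are statistically independent.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two independent runs")
    mean = statistics.fmean(estimates)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean, sem


# Hypothetical per-run values of some observable (arbitrary units).
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, sem = replica_summary(runs)
print(f"mean = {mean:.3f}, standard error = {sem:.3f}")  # mean = 3.000, standard error = 0.707
```

Because the standard error of independent runs shrinks roughly as one over the square root of the number of runs, quadrupling the number of runs roughly halves it, which is one reason abundant parallel capacity helps sampling.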

The LoBoS business model is to purchase many machines equipped with commodity-priced processors rather than investing in expensive supercomputers. This has achieved a tenfold reduction in the computing costs of the laboratory's research. The approach also affords greater flexibility, as researchers can run small programs on a single cluster node or achieve true parallel computing across many nodes. Finally, it is a very efficient use of funds: when LoBoS cluster nodes are updated to take advantage of new technology, which generally happens every 18-24 months, the old nodes can easily be converted into general-purpose server machines, extending their service life.

Hardware

LoBoS makes a wide variety of modern hardware available to participating researchers. The current list of hardware is:

Compute Nodes

GPU nodes
- 20 nodes with 2xA100 NVIDIA GPUs, eight-core AMD Epyc 7232P CPU, 64 GB of RAM
- 25 nodes with 2xV100 NVIDIA GPUs, six-core Intel Xeon Bronze CPU, 48 GB of RAM
- 2 nodes with 4xP100 NVIDIA GPUs, 2x twelve-core Intel Broadwell CPUs, 64 GB of RAM, FDR InfiniBand
- 4 nodes with 4xTitanXp NVIDIA GPUs, 2x twelve-core Intel Broadwell CPUs, 64 GB of RAM, FDR InfiniBand
- 24 nodes with 2xK40m NVIDIA GPUs, 2x ten-core Intel Broadwell CPUs, 64 GB of RAM, FDR InfiniBand
- 48 nodes with 1xK20Xm NVIDIA GPU, 2x six-core Intel Ivy Bridge CPUs, 32 GB of RAM, FDR InfiniBand
CPU nodes
- 36 nodes with 2x twelve-core 2.1 GHz, 9.6 GT/s UPI Intel Xeon Silver (Skylake) CPUs and 48 GB of RAM, FDR InfiniBand
- 72 nodes with 2x eight-core 2.4 GHz, 7.2 GT/s QPI Intel Xeon (Haswell) CPUs and 32 GB of RAM, FDR InfiniBand
- 80 nodes with 2x six-core 2.6 GHz, 7.2 GT/s QPI Intel Xeon (Ivy Bridge) CPUs and 16 GB of RAM, QDR InfiniBand
- 8 "Pods", each containing 32 nodes with 2x six-core 2.3 GHz, 7.2 GT/s QPI Intel Xeon (Sandy Bridge) CPUs and 16 GB of RAM, QDR InfiniBand
- 5 "Pods", each containing 32 nodes with 2x eight-core 2.4 GHz, 7.2 GT/s QPI Intel Xeon (Haswell) CPUs and 32 GB of RAM, FDR InfiniBand
Miscellaneous
- 1 Intel Xeon Phi 7210 (Knights Landing), 64 cores (256 threads) and 128 GB of RAM
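The CPU-node entries above can be tallied into a total physical core count. A small Python sketch, assuming each "Pod" holds 32 dual-socket nodes as the list implies and counting physical cores only (no hyperthreading); the data structure is a hypothetical transcription of the list, not a LoBoS inventory file.

```python
# Hypothetical transcription of the CPU-node list:
# (label, nodes, sockets per node, cores per socket).
CPU_NODE_GROUPS = [
    ("Skylake (Xeon Silver)", 36, 2, 12),
    ("Haswell", 72, 2, 8),
    ("Ivy Bridge", 80, 2, 6),
    ("Sandy Bridge pods", 8 * 32, 2, 6),  # 8 pods x 32 nodes (assumed)
    ("Haswell pods", 5 * 32, 2, 8),       # 5 pods x 32 nodes (assumed)
]


def total_physical_cores(groups):
    """Sum nodes x sockets x cores-per-socket over all node groups."""
    return sum(nodes * sockets * cores for _, nodes, sockets, cores in groups)


for label, nodes, sockets, cores in CPU_NODE_GROUPS:
    print(f"{label:24s} {nodes:4d} nodes  {nodes * sockets * cores:5d} cores")
print("total:", total_physical_cores(CPU_NODE_GROUPS))  # total: 8608
```

Under these assumptions the CPU-only nodes provide 8,608 physical cores; the host CPUs of the GPU nodes and the Xeon Phi are not included.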

Analysis Nodes

1 node with 2x 14-core Intel Skylake CPUs, 384 GB of RAM, 400 GB of fast SSD storage mounted under /scratch, and a 10 Gbps NIC
2 nodes with 2x 12-core Intel Haswell CPUs, 256 GB of RAM, 800 GB of SSD storage mounted under /scratch, and 40 Gbps NICs
2 nodes with 2x 12-core Intel Ivy Bridge CPUs, 256 GB of RAM, 450 GB of fast SSD storage mounted under /scratch, and a 10 Gbps NIC
GPU analysis nodes
- One with 4x Tesla M2090
- Two with 2x K20m

Storage

768 TB of primary storage, provided by a clustered NetApp FAS8300 system
300 TB of global scratch space, provided by a VAST Data flash storage system
1.3 PB of archive storage, provided by a FreeBSD-based ZFS filesystem
500 TB of offsite backup storage, provided by a FreeBSD-based ZFS filesystem

Software

Researchers who use LoBoS require a wide array of software to conduct molecular modeling and simulations, as well as support software to keep everything running.

Molecular Dynamics

CHARMM. The primary modeling tool used on the cluster is the CHARMM (Chemistry at HARvard Molecular Mechanics) software package. Dr. Bernard Brooks, the head of the LCB, is one of the primary developers of CHARMM. The CHARMM Development Project involves a network of developers in the United States and elsewhere working with Professor Karplus and his group at Harvard to develop and maintain the package.
AMBER. Assisted Model Building with Energy Refinement (AMBER) is designed with particular emphasis on studying the dynamics of biomolecules. AMBER consists of freely available tools for molecular dynamics (MD) simulation and analysis via AmberTools (including the MD engine SANDER and the parallelized MD analysis program CPPTRAJ, which can process data from AMBER, CHARMM, Gromacs, and NAMD), as well as the highly optimized, GPU-enabled MD engine PMEMD.
GROMACS. A versatile package for performing classical molecular dynamics simulations. It comes with a wide range of tools for analysis, and has been parallelized using MPI.
NAMD. A parallel molecular dynamics program. Its main feature is its extreme scalability. It can be run on hundreds of processors to efficiently characterize the dynamics of very large systems. It is compatible with input files of other software such as CHARMM, AMBER, and X-PLOR.
Tinker. Molecular modeling software used in the laboratory. It is a complete and general package for molecular mechanics and dynamics, with some special features for biopolymers.
OpenMM. A freely available toolkit for molecular simulation. It can be used either as a stand-alone application for running simulations or as a library called from other codes. Both CHARMM and AMBER can make use of OpenMM functionality.

Quantum Mechanics

GAMESS. One ab initio package available on the cluster is GAMESS-US. It provides a means of performing QM/MM calculations via the CHARMM QM/MM interface.
Gaussian. Gaussian is a commercial package that performs ab initio quantum calculations on complex molecular systems. It contains many advanced features, such as the ONIOM method for studying the electronic structure of large molecules and the Polarizable Continuum Model for studying molecules in solution.
Psi4. An open-source suite of ab initio quantum chemistry programs designed for efficient, high-accuracy simulations of a variety of molecular properties.

Molecular Visualization

VMD. A multiplatform molecular visualization program. It supports many coordinate and structure file formats, including the CHARMM, AMBER, and PQR formats, and it can also read GAMESS log files. We use it to create beautifully rendered images of the molecular systems we study (some of which can be found on this Web site).
PyMOL. A flexible molecular graphics and modeling package that can also be used to generate animated sequences.
Molden. A program for visualizing molecular and electronic structure.

Compilers and Parallel Execution Tools

GNU Compiler Collection. We use these open-source compilers for multiple applications.
Intel Compiler Collection. Most of our compiling is done with these compilers.
PGI Compilers. A compiler suite containing Fortran 95, Fortran 90, Fortran 77, C, and C++ compilers.
OpenMPI. A portable implementation of the MPI parallel computing standard. The software was developed by a consortium of commercial and academic institutions.
MVAPICH. An open-source MPI implementation optimized to take advantage of the novel features and mechanisms of high-performance networking technologies such as the InfiniBand interconnect.
Intel MPI. An MPI library that focuses on making applications perform better on Intel architecture-based clusters.
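MPI programs running on the cluster must divide their work among ranks. As a simple illustration, here is a minimal Python sketch of a contiguous block partition of N atoms over P ranks; this scheme is an assumption chosen for clarity, and production engines such as NAMD and GROMACS use more sophisticated spatial decompositions. The function name is hypothetical.

```python
def block_partition(n_items, n_ranks):
    """Split range(n_items) into n_ranks contiguous (start, stop) blocks.

    The first n_items % n_ranks ranks get one extra item, so block sizes
    differ by at most one and every item is assigned exactly once.
    """
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    base, extra = divmod(n_items, n_ranks)
    blocks = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


# Rank r would process atoms block_partition(n_atoms, n_ranks)[r].
print(block_partition(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Keeping block sizes within one item of each other balances the per-rank load when every item costs about the same; uneven atom densities are one reason real MD codes partition space instead.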

Administrative and Monitoring Software

SLURM Workload Manager. An open source, fault-tolerant, and highly scalable cluster management and job scheduling system. This will eventually be used to manage the entire LoBoS cluster.
Nagios. We have found Nagios useful for general cluster monitoring and for supervising important systems such as our RAID arrays. Nagios can automatically notify an administrator by e-mail when it detects a problem.
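For illustration, a SLURM batch script for a multi-node MPI run can be generated programmatically. A minimal Python sketch, assuming a generic SLURM setup and an MPI library built with SLURM support so that `srun` starts one rank per task; the partition name, program name, and resource values are hypothetical placeholders rather than LoBoS settings, while the `#SBATCH --job-name`, `--partition`, `--nodes`, `--ntasks-per-node`, `--time`, and `--gres` options are standard SLURM directives.

```python
def make_sbatch_script(job_name, partition, nodes, tasks_per_node,
                       walltime, command, gpus_per_node=0):
    """Return the text of a SLURM batch script for an MPI job.

    All values are caller-supplied; nothing here is specific to LoBoS.
    srun launches one process per SLURM task (one MPI rank each, assuming
    the MPI library integrates with SLURM).
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
    ]
    if gpus_per_node:
        # Generic GPU resource request; GPU type names are site-specific.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


# Hypothetical example: 4 nodes x 24 ranks on a partition named "cpu".
script = make_sbatch_script("md_run", "cpu", 4, 24, "24:00:00",
                            "./md_engine input.inp")
print(script)
```

The resulting text would be saved to a file and submitted with `sbatch`; generating it from code makes it easy to sweep node counts when testing how a simulation scales.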

History

This is a detailed journey through how LoBoS came to be.

The Early Years

The first Beowulf cluster was developed by a group including Donald Becker, Thomas Sterling, Jim Fischer, and others at NASA's Goddard Space Flight Center in 1993-1994. The aim of the Beowulf project was to provide supercomputing performance at a fraction of the traditional cost. This was made possible by two recent developments in technology: first, the introduction of inexpensive Intel and Intel-compatible microprocessors that performed respectably compared with DEC's Alpha CPU, Sun's SPARC and UltraSPARC lines, and other high-performance CPUs; and second, the availability of capable open-source operating systems, most notably Linux. The Beowulf project was a success and spawned a variety of imitators at research institutions that wanted supercomputing power without paying the price.

The original iteration of LoBoS was conceived by Bernard Brooks and Eric Billings in the mid-1990s as a way to use the architecture developed by the NASA group to improve the cost effectiveness of molecular modeling. The first LoBoS cluster was constructed between January and April of 1997 and remained in use until March 2000. It used state-of-the-art (at the time) hardware. The network topology was a ring (each node had three NICs; see LoBoS 1 under LoBoS Previous Versions for more details) that was joined to the NIH campus network by a pair of high-speed interconnects. This cluster was able to take advantage of the recent parallelization of computational chemistry software such as CHARMM. It was made available to collaborating researchers at NIH and other institutions.

LoBoS through the Years

The LoBoS cluster, like the original Beowulf, proved to be a success. Researchers at NIH and collaborating institutions used it to develop large-scale parallel molecular modeling experiments and simulations. By 1998, however, the original cluster, whose nodes contained dual 200 MHz Pentium Pro processors, was becoming obsolete. A second cluster, LoBoS 2, was therefore constructed consisting of nodes with dual 450 MHz Intel Pentium II processors. This cluster also abandoned the ring network topology for a standard ethernet bus. The cluster had both fast and gigabit ethernet connections, a rarity for the late 1990s. As this was happening, the original LoBoS cluster was converted for desktop use. This represented another advantage of the LoBoS business model, as machines could be converted for other uses when newer technology became available for the cluster.

With the second incarnation of LoBoS, demand for cluster use continued to increase. To provide NIH and collaborating researchers with a top-of-the-line cluster environment, the Cluster 2000 committee was chartered to build a combined LoBoS 3/Biowulf cluster. This committee evaluated several different options for processors, network interconnections, and other technologies.

Despite the existence of LoBoS 3/Biowulf, the Computational Biophysics Section (CBS) staff decided to construct a LoBoS 4 cluster. This cluster used nodes with a dual AMD Athlon MP configuration. LoBoS 4 also added Myrinet, Myricom's proprietary high-speed, low-latency fiber network technology, which gave a significant performance improvement to parallel applications. With this cluster, the CBS staff ran into power and reliability problems. Although these problems were mostly fixed, most of the LoBoS 4 nodes were returned to their vendor as trade-ins for LoBoS 5 nodes. LoBoS 5, which was completed in December 2004, was an evolutionary development from LoBoS 4, featuring nodes with dual Xeon processors and expanded use of Myrinet technology.

As LoBoS 5 began to age, plans were made for the construction of LoBoS 6, the first version of LoBoS to use 64-bit CPUs. The first batch of nodes, 52 dual dual-core Opteron systems, was brought online in late summer of 2006. The next batch consisted of 76 dual quad-core Intel Clovertown nodes. The Opteron nodes were connected with single data rate (SDR; 10 Gbps) InfiniBand, and the Clovertown nodes used double data rate (DDR; 20 Gbps) InfiniBand interconnects.

LoBoS 7 emerged from a gradual replacement of the compute nodes and InfiniBand switches with newer hardware, and it represented a major expansion of LoBoS.

The EonStor RAID arrays provide disk space for the LoBoS 5 cluster.

The Future

Thanks to the construction of a new data center, LoBoS has substantial capacity for expansion. The Laboratory of Computational Biology is considering several options for new nodes.

LoBoS Previous Versions

LoBoS 7

LoBoS 7 is a previous, 64-bit version of LoBoS:

Westmere Nodes

Equipment	Notes
96 compute nodes	Dual 2.40 GHz Intel Xeon E5645; 12 GB DDR3 SDRAM; 750 GB 7200 RPM SATA hard drive; onboard gigabit ethernet network interface card; InfiniBand interface (see below)
InfiniBand hardware	Mellanox QDR InfiniBand HCA (1 per node); QLogic 12800-120 QDR InfiniBand switch

Sandy Bridge Nodes

Equipment	Notes
72 compute nodes	Dual 2.30 GHz Intel Xeon E5-2630; 16 GB DDR3 SDRAM; 500 GB 7200 RPM SATA hard drive; onboard gigabit ethernet network interface card; InfiniBand interface (see below)
InfiniBand hardware	Mellanox QDR InfiniBand HCA (1 per node); 6 x 36-port Mellanox QDR InfiniBand switches in a fat-tree topology

Nehalem Nodes

Equipment	Notes
156 compute nodes	Dual 2.27 GHz Intel Xeon E5520; 12 GB DDR3 SDRAM; 500 GB 7200 RPM SATA hard drive; onboard gigabit ethernet network interface card; InfiniBand interface (see below)
1 master node	SuperMicro motherboard; dual 2.67 GHz Intel Xeon X5650 (six-core); 1 TB SATA hard drive; over 200 TB of shared file storage
InfiniBand hardware	Mellanox QDR InfiniBand HCA (1 per node); QLogic DDR InfiniBand switch from LoBoS 6

LoBoS 6

LoBoS 6 is another previous, 64-bit version of LoBoS:

Opteron Nodes

Equipment	Notes
52 compute nodes	Dual 2.2 GHz AMD Opteron 275; 4096 MB DDR SDRAM and 2000 MB of swap space; 250 GB 7200 RPM SATA hard drive; onboard gigabit ethernet network interface card; InfiniBand interface (see below)
1 master node	SuperMicro motherboard; dual 2.2 GHz AMD Opteron 275 (dual-core); 3 x 750 GB SATA hard drives; over 20 TB of shared file storage
InfiniBand hardware	QLogic SDR InfiniPath HCA (1 per node); Voltaire ISR9096 SDR InfiniBand switch

Clovertown Nodes

Equipment	Notes
76 compute nodes	Dual 2.33 GHz quad-core Intel Clovertown Xeon; 8192 MB DDR SDRAM and 2000 MB of swap space; 750 GB 7200 RPM SATA hard drive; onboard gigabit ethernet network interface card; InfiniBand interface (see below)
InfiniBand hardware	InfiniBand HCA (1 per node); QLogic 288-port DDR InfiniBand switch

Harpertown Nodes

Equipment	Notes
228 compute nodes	Dual 2.5 GHz quad-core Intel Harpertown Xeon; 8192 MB DDR SDRAM and 2000 MB of swap space; 500 GB 7200 RPM SATA hard drive; onboard gigabit ethernet network interface card; InfiniBand interface (see below)
InfiniBand hardware	DDR InfiniPath HCA; QLogic 288-port DDR InfiniBand switch

LoBoS 5

Because the motherboards in LoBoS 4 had proved unreliable, the Computational Biophysics Section staff decided to switch back to Intel CPUs with Supermicro motherboards. As the Pentium 4 processor was not designed for multiprocessor operation, LoBoS 5 nodes were delivered with dual Intel Xeon processors. Approval for the cluster was granted in June 2003, and LoBoS 5 was installed in stages between March and December of 2004. The first batch of 88 nodes was equipped with 2.66 GHz Xeons; all subsequent nodes have dual 3.06 GHz processors.

With 190 nodes in operation, the LoBoS cluster reached the limits of the physical space available to it on campus. A new, off-campus computer room was therefore constructed to house LoBoS 6.

Equipment	Notes
190 compute nodes	SuperMicro motherboard; CPU: 88 nodes with dual 2.66 GHz Xeon (512 KB L2 cache), 102 nodes with dual 3.06 GHz Xeon (512 KB L2 cache); 2048 MB PC-2100 DDR SDRAM and 1024 MB of swap space; 120 GB 7200 RPM EIDE hard drive; onboard gigabit ethernet network interface card; Myrinet interface: 88 nodes with Myrinet E-card fiber network interface card, 40 nodes with no Myrinet connectivity, 62 nodes with Myrinet C-card fiber network interface card from LoBoS 4
2 master nodes	SuperMicro X5DPA motherboard; dual 2.8 GHz Intel Xeon CPUs with 512 KB L2 cache; 2 x 120 GB EIDE hard drives in a RAID1 (mirroring) configuration; 4.8 TB of RAID storage from EonStor-equipped file servers (available throughout the LoBoS network)
Myrinet switching hardware	The Myrinet switching hardware from LoBoS 4 was retained. In addition, a second, identical switch and line cards were added.

LoBoS 4

With the combination of LoBoS 3 and Biowulf, there was a need to upgrade the LoBoS 2 cluster, which by 2001 was showing its age. The new cluster was designed with 70 compute nodes. In a departure from previous clusters, these nodes would use AMD CPUs instead of Intel chips. LoBoS 4 also saw the deployment of Myrinet, a proprietary high-bandwidth, low-latency data link layer network technology, in the LoBoS cluster.

Unfortunately, these nodes had reliability problems. In particular, the motherboards proved problematic, requiring frequent reboots that put undue stress on the power supplies, which in turn failed at a much higher rate than expected. In addition, there were problems with ensuring sufficient power and cooling for the nodes, although these issues were eventually resolved. However, because of the component failures, the LoBoS 4 nodes were returned to their vendor in exchange for a discount on some of the nodes that would eventually become LoBoS 5.

Equipment	Notes
70 Compute Nodes	Tyan MPX motherboard; dual AMD Athlon MP 2000+ with a 256 KB L2 cache; 2048 MB PC-2100 DDR SDRAM with 2048 MB of swap space; 266 MHz system bus; 20 GB 7200 RPM EIDE hard drive; 3Com 10/100 network interface card; Myrinet C-Card fiber-optic network interface card
4 master nodes	Supermicro motherboard; dual 450 MHz Intel Pentium II CPUs with 512 KB L2 cache; 9 GB EIDE hard drive; 1.2 TB RAID5 storage with RAIDZONE SmartCans
Myrinet switching hardware	M3-E128 9U switch enclosure; 8 M3-SW16 line cards

LoBoS 3

With the success of the first two incarnations of LoBoS, there was an increase in interest, both at NIH and other institutions, in using Beowulf clusters to conduct biochemical research. To meet the challenges posed by this new avenue of research and the increasing demand for cluster resources, CIT and NHLBI at NIH decided to combine forces to design a new cluster. The Cluster 2000 committee was therefore chartered to conduct the design work. The result of this collaboration was a combined LoBoS 3 and Biowulf cluster.

Biowulf itself is maintained by the same organization as the NIH Helix systems.

LoBoS nodes installed as part of Biowulf.

LoBoS 2

After the success of the initial LoBoS implementation, the Computational Biophysics Section decided to create a second cluster using more modern (at the time) hardware. Eric Billings once again took the lead in designing and implementing the cluster. Of particular note is the abandonment of the relatively inefficient ring topology in favor of a standard fast ethernet bus topology. A gigabit uplink provided high-speed networking outside the cluster's immediate environment. LoBoS 2 was built in July and August of 1998 and was used from October 1998 to January 2001. It was completely converted to desktop use by June 2001.

Equipment	Notes
100 compute nodes	Supermicro P6DBE motherboard; dual 450 MHz Pentium II processors with 512 KB L2 cache; 256 MB SDRAM + 512 MB of swap space; Packet Engines GNIC-II gigabit ethernet; Linux operating system
4 Master Nodes	American Megatrends MerlinDP motherboard; dual 200 MHz Pentium Pro with 256 KB L2 cache; 1.2 GB EIDE hard drives. Note: these master nodes were upgraded for use with LoBoS 4.

LoBoS 1

The original LoBoS was used between June 1997 and March 2000. Its nodes were handed back for use as desktop machines between September 1999 and March 2000.

Equipment	Notes
47 compute nodes	American Megatrends Merlin DP motherboards; dual 200 MHz Pentium Pro processors with 256 KB L2 cache; 128 MB SDRAM + 256 MB of swap space; 1.2 GB EIDE hard drives; 3 D-Link DFE500 fast ethernet NICs (2 for ring topology, 1 for user network); Linux operating system
4 Master Nodes	Shared with LoBoS 2

A diagram of the original LoBoS ring topology.