A Brief History of FAH: From Tinker to Gromacs and the power of the GPU
Introduction
Since 2000, Folding@home (FAH) has led to a major jump in the capabilities of molecular simulation. By joining together hundreds of thousands of PCs throughout the world, calculations that were previously considered impossible have now become routine. FAH has targeted the study of protein folding and protein folding diseases, and numerous scientific advances have come from the project.
In 2006, we began looking forward to another major advance in capabilities. This advance utilizes the new, high-performance Graphics Processing Units (GPUs) from ATI to achieve performance previously only possible on supercomputers. With this new technology, as well as the new Cell processor in Sony's PlayStation 3, we sought to attain performance on the scale of 100 gigaflops per computer. With this new software and hardware, we pushed Folding@home a major step forward.
Our goal is to apply new technology to dramatically advance the capabilities of Folding@home, applying our simulations to further study of protein folding and related diseases, including Alzheimer's disease, Huntington's disease, and certain forms of cancer. With your help, coupled with new simulation methodologies to harness the new techniques, we will be able to address questions previously considered impossible to tackle computationally, and make an even greater impact on our knowledge of folding and folding-related diseases.
Folding@home debuts with the Tinker core (October 2000)
In October 2000, Folding@home was officially released. The main software core engine was the Tinker molecular dynamics (MD) code. Tinker was chosen as the first scientific core because of its versatility and well-organized software design. In particular, Tinker was the only code to support a wide variety of MD force fields and solvent models. With the Tinker core, we were able to make several advances, including the first folding of a small protein starting purely from sequence (subsequently published in Nature).
A major step forward: the Gromacs core (May 2003)
After many months of testing, Folding@home officially rolled out a new core based on the Gromacs MD code in May 2003. Gromacs is the fastest MD code available, and likely one of the most optimized scientific codes in the world. By using hand-tuned assembly code and the SSE instructions, new hardware available in many PCs and Intel-based Macs, Gromacs was faster than most MD codes by a factor of about 10, and approximately 20x to 30x faster than Tinker (which was written for flexibility and functionality, but not for speed).
In 2003, Gromacs had limits to what it could do, and did not support many implicit solvent models, which played a key role in our folding simulations with Tinker. Thus, while Gromacs significantly sped up certain calculations, it was not a replacement for Tinker, and so the Tinker core continued to play an important role in the science of Folding@home. For these reasons, points for Gromacs work units (WUs) were set to be consistent with points for Tinker WUs. Moreover, we switched the benchmark machine from a 500 MHz Celeron to a 2.8 GHz Pentium 4 so that we could fairly benchmark both types of WUs (the benchmark machine needed to have hardware support for SSE).
The next major step forward: Streaming Processor cores (September 2006)
Just as the Gromacs core gave Folding@home a 20x to 30x speed increase by making new use of hardware (SSE) in PCs, in 2006 we developed a new streaming processor core to utilize another new generation of hardware: GPUs with programmable floating-point capability. By writing highly optimized, hand-tuned code to run on ATI X1900 class GPUs, the science of Folding@home will see another 20x to 30x speed increase over its previous software (Gromacs) for certain applications. This great speed increase is achieved by running essentially the complete molecular dynamics calculation on the GPU; while this is a challenging software development task, it appears to be the way to achieve the highest speed improvement on GPUs.
In addition, through collaboration with the Pande Group, Sony has developed an analogous core for the PS3's Cell processor (another streaming processor), which should also give the science a significant speed increase over the types of calculations we could previously do on an x86/SSE Gromacs core. Following what we did with the introduction of Gromacs, we will now switch benchmark machines and include an ATI X1900XT GPU in order to be able to benchmark streaming WUs (which cannot be run on non-GPU machines). This machine will also benchmark CPU units (which continue to be of value since GPUs work only for certain simulations) without using its GPU.
The second-generation GPU core, aka GPU2, for ATI hardware (April 2008)
After running the original GPU core for quite some time and analyzing its results, we have learned a lot about running GPGPU software. For example, it has become clear that a GPGPU approach via DirectX (DX) is not sufficiently reliable for what we need to do. We have also learned a great deal about GPU algorithms and how to improve them. One of the really exciting aspects of GPUs is that they can not only accelerate existing algorithms significantly, but also open doors to new algorithms that we would never think to run on CPUs at all (because they are very slow on CPUs, but not on GPUs).
After much effort, we took all we learned about GPUs from the first-generation client and produced a second-generation client, GPU2. This core was much more technically sophisticated than the original, and it was faster, more reliable, easier to use, and capable of a much wider range of scientific calculations. The results from it were very exciting.
The second-generation GPU core for NVIDIA (June 2008)
In collaboration with NVIDIA, we released a GPU2 core for NVIDIA hardware.