Folding@home high performance client FAQ
- Folding on GPUs
- What are GPUs and how can they help FAH?
- Which GPUs will be supported?
- Credits.
- Updates.
- Folding on SMP or Multi-cored computers
- Folding on the Sony Playstation 3 (PS3)
- GROMACS on Clearspeed
- FAH on BOINC
Introduction
The performance of the Folding@home (FAH) client is critical to the success of the Folding@home project. While many of our calculations can be accelerated by adding more CPUs to FAH, others cannot: in many calculations, a single simulation trajectory must reach a distant point in simulated time before any of the results are useful. Because each step of a trajectory depends on the one before it, such a trajectory cannot simply be spread over more machines. What is needed are clients that can compute a single trajectory faster.
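To make this constraint concrete, here is a minimal sketch with hypothetical numbers (a 10 ns/day client and a 10,000 ns target; neither is an FAH figure). Because trajectory steps are sequential, the wall-clock time to reach the target depends only on the speed of the client computing that trajectory: adding clients multiplies total throughput but does not bring any single trajectory to the target sooner. The function names are illustrative and are not part of any FAH code.

```python
# Minimal sketch: why more clients do not shorten one trajectory.
# All rates below are hypothetical illustration values, not FAH measurements.


def days_to_reach(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days for ONE trajectory to reach target_ns of simulated time.

    MD steps are sequential, so only the per-client rate appears here;
    the number of clients does not.
    """
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    return target_ns / ns_per_day


def aggregate_ns_per_day(n_clients: int, ns_per_day: float) -> float:
    """Total simulated time produced per day by independent trajectories."""
    return n_clients * ns_per_day


if __name__ == "__main__":
    base_rate = 10.0   # hypothetical ns/day for a single CPU client
    target = 10_000.0  # hypothetical goal: 10 microseconds = 10,000 ns

    # Adding clients multiplies aggregate throughput...
    for n in (1, 1_000, 100_000):
        print(f"{n:>7} clients: {aggregate_ns_per_day(n, base_rate):>12,.0f} ns/day in total, "
              f"but one trajectory still needs {days_to_reach(target, base_rate):,.0f} days")

    # ...while only a faster client shortens a single trajectory.
    for speedup in (20, 40):
        days = days_to_reach(target, base_rate * speedup)
        print(f"{speedup}x faster client: {days:,.0f} days")
```

Run as a script, it reports the same 1,000-day wait whether there is one client or 100,000, and 50 or 25 days for hypothetical 20x or 40x faster clients.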
We have been working since early 2003 to give the FAH client advanced features that drive performance for this class of calculation, and we have been trying several different approaches. We are doing this work through collaborations with other groups because there are limits to what the FAH team can do alone; collaborations also let us take advantage of expertise not found within our group.
Currently, all of our attention has been directed towards our main core, GROMACS. This work is still very experimental, and because of its research nature it is unclear when these new methods will move out of the lab and into FAH. Below, we detail some of the approaches and their current status.
Also, to respect our collaborators' wishes (and their requested confidentiality in certain cases), we will not be able to give regular updates in all instances. Moreover, experimental work means there are big hills to climb and many different approaches to test; thus, something that appeared to work at one point may no longer work after later updates.
We are very excited about these collaborations and hope the projects will be released in FAH soon (ideally 6-12 month timescale). However, we reserve the right to delay these projects if we feel that it's too early. Clearly, a buggy or inefficient "high performance client" would not help the FAH project.
Folding on GPUs
What are GPUs and how can they help FAH?
GPUs are Graphics Processing Units: chips used in today's PCs to speed up high-performance graphics, such as 3D games or 3D scientific visualization. GPUs can perform an enormous number of floating-point operations (FLOPs). However, they achieve this performance by sacrificing generality: only certain types of calculations are well suited to GPUs. After much work, we have been able to write a highly optimized molecular dynamics code for GPUs that achieves a 20x to 40x speed increase over comparable CPU code for certain types of calculations in FAH. This lets us make an enormous advance over what we could do only a few years ago.
Which GPUs will be supported?
We support CUDA-class GPUs from NVIDIA (primarily G80 and later) and R6xx or later GPUs from ATI.
Credits.
There have been several incarnations of our GPU code for MD. The earliest version came from our collaborators Prof. Pat Hanrahan (Stanford University, Computer Science Dept) and Prof. Eric Darve (Stanford University, Mechanical Engineering Dept) and their groups; this version was never released on Folding@home. The next version was written and optimized by V. Vishal and released as GPU1 on Folding@home. More recently, we've released our second-generation GPU code (GPU2) on Folding@home; it was written by V. Vishal, Mark Friedrichs (general code and ATI), Scott LeGrand (NVIDIA port), and Mike Houston (ATI optimizations).
Updates.
We will periodically make updates below (and will keep the old news for posterity). These updates serve as an overview of major changes. For more up to the minute details about client developments, please read the project News page (Vijay's blog).
July 2005 We have a working version of GROMACS on Brook, but are tuning performance.
August 2005 Vishal has made great progress in rewriting the GROMACS inner loops in order to take advantage of more memory. Now, we're tweaking for performance.
September 2005 Vishal has found that the new changes aren't quite there yet. We're also benchmarking/testing/evaluating new hardware that has come out.
October 2005 With the release of the R520 card, it looks like we're getting close. Our unoptimized code is already looking good and we are working to optimize it.
December 2005 Vishal is working on optimizing the newest code. We are seeking help from GPU manufacturers to tune our code on their hardware.
January 2006 We have had some one-on-one discussions with hardware vendors (sorry, can't say who yet, but hopefully soon) to try to help us port and optimize our code. I hope this will lead to something exciting -- stay tuned.
February 2006 Vishal is trying out a new type of code to run on GPUs to see if we can get an even greater speed increase. Depending on how this looks, we may go with the new version or stick with the previous one to release on FAH.
March 2006 We have been investigating using ATI boards. The newest ones (with the R520 or R580 GPU, e.g. the 1800XT or 1900XT boards) support 32-bit floating-point operations, so they now work for folding calculations.
April 2006 We and our collaborators have submitted our first papers on folding with GPUs to the Supercomputing 2006 meeting. The results will be publicly available at a later date. We are now working to refine the algorithms as well as make the code suitable for running on FAH. One of the big challenges for FAH deployment will be the differences in drivers on donors' machines. However, we now have a plan for this, so we are starting to plan the port to FAH. We note that the port may take a while to debug (and we will want to do a great deal of internal testing before releasing such a different type of port).
May 2006 We have started building a GPU cluster to test the GPU port of FAH/Gromacs and to get some initial results to make sure that everything is working well. Before putting this into FAH, we need to make sure that both the science is validated (we get the right answers, only quicker!) and that the GPUs are up to being run 24x7 on FAH. I'm most concerned about the latter point and so this "burn in" test will be important. Assuming all goes well, we'll start the porting to FAH, with the hope (but not an official release date) to release this by the end of the 2006 calendar year.
June 2006 Vishal has been testing the code to make sure it gives the right numbers for FAH WUs (work units). We've found some surprises, so there has been a fair bit of code development to make fixes.
July 2006 Vishal has made a new library to make it a lot easier to make new codes into FAH cores. The "corification" of the GPU code should follow shortly. For now, we've been mainly running code tests by hand on the cluster.
August 2006 We're running on the cluster and starting to ramp up. Vishal's new method to build cores should yield a FAH core soon, but it will take a lot of QA (GPU folding is much more sensitive to machine configuration), so it will still take some time before we release the core. It's pretty exciting for us to see the calculation racing away on the GPU cluster! Prof. Pande also formally announced FAH's GPU client at his award lecture at the Protein Society meeting, with a great response for the GPU work. We will be making a more formal announcement of the GPU client and some other surprises relatively soon. Note the update above regarding GPUs: we currently plan to support only recent-generation ATI GPUs.
September 2006 We have been alpha testing the code with people outside Stanford. So far so good, although the driver version seems to be an important consideration (more info later). We are beta testing the ATI GPU client software internally at the moment and will likely announce an open beta in four to five weeks (end of September).
October 2006 The open beta has begun. More info can be found on our GPU FAQ. We have had early problems with EUEs (early unit ends) due to some bad interactions between the core and certain drivers, but that appears to have been worked out. There are still a few cosmetic bugs (in particular, the output of how much has been completed), but for the most part, it looks like folding is working on GPUs. We are moving to support additional types of cards as well as multiple boards in a single computer. We hope that after a few weeks, all the major issues will be ironed out and we can move to a full release.
November 2006 We have been making lots of updates to the GPU core. Some are to help the code run better (e.g. on multiple GPUs) and some are science-side updates.
December 2006 We have been looking into running Folding@home on NVIDIA G80s and have found a bug in the G80 driver. We have informed NVIDIA about it and are waiting for them to resolve it.
March 2007 We are nearing completion of our next major revisions to the GPU core. We hope to have it out in a month or two (but GPU coding has been notoriously complex).
April 2008 We have released our second generation GPU2 code for ATI clients.
June 2008 We have released our second generation GPU2 code for NVIDIA clients.
We have also added a new FAQ for the PS3 client.
Folding on SMP or Multi-cored computers
SMP machines are already fairly common, and with the release of dual-core CPUs we expect them to become even more common. Running a single FAH calculation across all the processors of an SMP machine poses several challenges. One of the most prominent is the scalability of the GROMACS core: for most codes (GROMACS included), 4 processors do not give a 4x speed increase, because parallel execution carries overhead and not every part of the calculation parallelizes. There are also unique server-side challenges, since GROMACS requires load balancing done in code that currently runs server side in FAH. Our primary collaborator here is Prof. Charlie Peck (Earlham College, Computer Science Dept), who has been working on ways to easily run GROMACS on clusters and SMP machines in conjunction with Folding@home.
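The scaling limit can be illustrated with Amdahl's law, a standard textbook model that we are substituting here for illustration; it is not necessarily how GROMACS scaling was analyzed for FAH. If a fraction p of the single-processor work parallelizes perfectly, the speedup on n processors is 1 / ((1 - p) + p / n). For a hypothetical p = 0.95, four processors give about 3.5x rather than 4x, and real codes usually fall further short because the model ignores communication and load-balancing costs.

```python
# Minimal sketch of Amdahl's law, a standard model of parallel speedup.
# The parallel fractions used below are hypothetical, not measured GROMACS values.


def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Ideal speedup on n_procs processors under Amdahl's law.

    parallel_fraction is the share of single-processor run time that
    parallelizes perfectly; the rest, (1 - parallel_fraction), stays serial.
    Communication and load-imbalance costs are ignored, so real codes
    typically do worse than this bound.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)


if __name__ == "__main__":
    # Print the speedup table for a few hypothetical parallel fractions.
    for p in (0.90, 0.95, 0.99):
        row = ", ".join(
            f"{n} procs: {amdahl_speedup(p, n):.2f}x" for n in (1, 2, 4, 8)
        )
        print(f"parallel fraction {p:.2f} -> {row}")
```

Even at a 99% parallel fraction, eight processors fall short of an 8x speedup under this model, which is why the serial and communication-bound parts of the core matter so much.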
July 2005 We have two approaches to solving the SMP problem and are experimenting with them in parallel. Both are pretty early stage, but appear to work reasonably well in the lab. The biggest challenge right now appears to be server side.
August 2005 Abhay has been making steady progress here. We are trying two approaches: a pure SMP approach as well as an approach which would also work on computer clusters. There are pros and cons of each and having both programs allows us to take the best version.
September 2005 Abhay has found that we need to change the Gromacs code base used. We're working on the switch with Prof. Peck's group.
November 2005 We have had some snags with the code. Prof. Peck will be coming out to Stanford in early 2006 and that should help push this through.
January 2006 We have been talking with the Gromacs developers about a threads-based solution as well, which would have many benefits over an MPI solution for multi-core CPUs/SMP.
July 2006 Discussions with Gromacs developers suggest that their code development is going well, but a bit delayed. The good news is that the delay is due to added functionality, so when it does ship, we should be in good shape.
October 2006 We have had some good success with a new direction for the SMP client. We are optimistic that this will be reasonable to release. Once the GPU client is a little further along, we will put more attention into this direction.
November 2006 The SMP client is now looking good enough that we are starting a broader beta test outside of Stanford. If that looks good, we will move to a completely open beta test of this new client. The SMP client supports OSX/Intel natively (which means a major points boost for OSX donors) as well as 64-bit Linux (with 32-bit Linux hopefully to come soon). Windows support will come much later, as porting to Windows is very different from porting to OSX and Linux.
March 2007 We have released a version of the SMP client for 32-bit Windows. There are some unique quirks due to the nature of running MPI on Windows, but it's a natural choice for donors who need or want to run Windows.
April 2008 We have continued to update our SMP core, most notably with our A2 core, which scales much better.
Folding on the Sony Playstation 3 (PS3)
The PS3 is a powerful system for scientific calculation, largely because of its Cell processor. In many ways, the Cell (and therefore the PS3) sits between GPUs and multi-core CPUs: it is more flexible than GPUs but less flexible than CPUs in the calculations it can perform, and correspondingly more powerful than CPUs but less powerful than GPUs. In the balance between flexibility and speed, the Cell takes a natural middle path. Other benefits are the uniformity of PS3s (all have the same processor, GPU, RAM, etc.) and the ability to stream data quickly to the GPU, allowing real-time visualization.
March 2007 In partnership with Sony, we have launched Folding@home on the PS3.
April 2008 Version 1.3 was released, including new science code (GB/SA).
GROMACS on Clearspeed
Clearspeed offers a processor architecture that, like GPUs, has many parallel FPUs, but goes beyond GPUs with a more flexible design and other scientifically useful properties, such as double-precision FPUs. Clearspeed has been porting GROMACS and has given some public demos. Once the port is complete, we will start testing whether it is feasible to use in FAH.
July 2005 GROMACS is running on Clearspeed hardware with a significant speed increase over CPUs. However, we are still waiting to get final code from Clearspeed.
August 2005 No specific news to update. We are looking forward to the launch of the new cards, hopefully in the fall.
September 2005 No specific news to update. There have been rumors about pricing for these cards, but nothing official yet.
December 2005 The CS hardware appears set to roll out in the Spring of 2006. We do not have updates on the code which we can publicly discuss at the moment.
FAH on BOINC
BOINC is an infrastructure for distributed computing. It is produced by the same group at UC Berkeley that runs SETI@home, and it has become a useful tool for distributed computing. We have been exploring BOINC as an additional client alongside the existing FAH clients to give donors a choice (especially those who already run BOINC).
January 2006 We have an initial release client that we alpha tested in a small group. This has led to some issues that have been difficult to resolve, and we are delaying our launch until they are resolved to our satisfaction.
April 2006 We have updated much of the code, but are now in the midst of dealing with staff turnover in the BOINC part of the development team, which has slowed development.
Last Updated on August 14, 2008, at 03:51 PM