Second generation GPU client on NVIDIA hardware (GPU2) FAQ
Table of Contents
- Introduction
- Folding@home debuts with the Tinker core (October 2000)
- A major step forward: the Gromacs core (May 2003)
- The next major step forward: Streaming Processor cores (September 2006)
- The second generation GPU core, aka GPU2, for ATI hardware (April 2008)
- The second generation GPU core for NVIDIA (June 2008)
- General instructions
- Frequently Asked Questions (common to both ATI and NVIDIA GPU2 clients)
- My points per day (PPD) varies significantly from project to project
- What about visualization?
- I'm having trouble getting visualization to work on Vista (it just crashes immediately).
- What OSs does the new client/core support?
- Can I run the GPU client as a service?
- Can I use my CPU to do calculations too?
- How do I use flags with the SysTray client?
- How do I make a new shortcut?
- What about multi-gpu support?
- How do you decide the credit value of GPU work units?
- Why is the new GPU client important?
- What's different between the GPU1 (first generation) and the GPU2 (second generation) client?
- Can I still use my GPU when the client is running?
- Troubleshooting
- The client was working, but now all I'm getting are Early Unit Ends (EUEs). How can I fix this?
- My client gives an UNSTABLE_MACHINE error and is going to sleep for 24 hours! What should I do?
- Hey, where did all of the GPU client data files go?
- The GPU2 client isn't working on Windows Server OSs (e.g., Server 2003).
- Beta client warning
- Issues specific to the GPU2/NV Clients
- What hardware does the new client/core support?
- The core can't find the DLLs!
- The client displays an error saying that I do not have a supported GPU, but I do!!!
- A DLL error dialog box is popping up -- what's up with that?
- Who did all of this anyway?

A Brief History of FAH: From Tinker to Gromacs to GPU to GPU2
Introduction
Since 2000, Folding@home (FAH) has led to a major jump in the capabilities of molecular simulation. By joining together hundreds of thousands of PCs throughout the world, calculations that were previously considered impossible have now become routine. FAH has targeted the study of protein folding and protein folding diseases, and numerous scientific advances have come from the project.
In 2006, we began looking forward to another major advance in capabilities. This advance uses the new, high-performance Graphics Processing Units (GPUs) from ATI to achieve performance previously only possible on supercomputers. With this new technology, as well as the new Cell processor in Sony's PlayStation 3, we will soon be able to attain performance on the 100 gigaflop scale per computer. With this new software and hardware, we will be able to push Folding@home a major step forward.
Now in 2008, we have developed a second generation GPU core (GPU2). This core is much more sophisticated than the original, with higher reliability, greater ease of use, and much greater scientific capabilities.
Our goal is to apply this new technology to dramatically advance the capabilities of Folding@home, applying our simulations to further study of protein folding and related diseases, including Alzheimer's Disease, Huntington's Disease, and certain forms of cancer. With these computational advances, coupled with new simulation methodologies to harness the new techniques, we will be able to address questions previously considered impossible to tackle computationally, and make even greater impacts on our knowledge of folding and folding-related diseases.
Folding@home debuts with the Tinker core (October 2000)
In October 2000, Folding@home was officially released. The main software core engine was the Tinker molecular dynamics (MD) code. Tinker was chosen as the first scientific core due to its versatility and well laid out software design. In particular, Tinker was the only code to support a wide variety of MD force fields and solvent models. With the Tinker core, we were able to make several advances, including the first folding of a small protein starting purely from sequence (subsequently published in Nature).
A major step forward: the Gromacs core (May 2003)
After many months of testing, Folding@home officially rolled out a new core based on the Gromacs MD code in May 2003. Gromacs is the fastest MD code available, and likely one of the most optimized scientific codes in the world. By using hand-tuned assembly code and taking advantage of new hardware in many PCs and Intel-based Macs (the SSE instructions), Gromacs was about 10x faster than most MD codes, and roughly 20x to 30x faster than Tinker (which was written for flexibility and functionality, but not for speed).
However, while Gromacs is faster than Tinker, it has limits to what it can do; for example, it does not support many implicit solvent models, which play a key role in our folding simulations with Tinker. Thus, while Gromacs significantly sped up certain calculations, it was not a replacement for Tinker, and so the Tinker core will continue to play an important role in Folding@home (including a recent paper in Science). For these reasons, points for Gromacs WUs were set to be consistent with points for Tinker WUs, as both play an important role in the science of FAH. Moreover, we switched the benchmark machine to a 2.8 GHz Pentium 4 (from a 500 MHz Celeron) in order to fairly benchmark these types of WUs (as the benchmark machine needed to have hardware support for SSE).
The next major step forward: Streaming Processor cores (September 2006)
Much like the Gromacs core greatly enhanced Folding@home with a 20x to 30x speed increase through a new use of hardware (SSE) in PCs, in 2006 Folding@home developed a new streaming processor core to use another new generation of hardware: GPUs with programmable floating-point capability. By writing highly optimized, hand-tuned code to run on ATI X1900 class GPUs, the science of Folding@home will see another 20x to 30x speed increase over its previous software (Gromacs) for certain applications. This great speed increase is achieved by running essentially the complete molecular dynamics calculation on the GPU; while this is a challenging software development task, it appears to be the way to achieve the highest speed improvement on GPUs.
In addition, through collaboration with Pande Group, Sony has developed an analogous core for the PS3's Cell processor (another streaming processor), which should see a significant speed increase for the science over the types of calculations we could previously do on an x86/SSE Gromacs core as well. Following what we did with the introduction of Gromacs, we will now switch benchmark machines and include an ATI X1900XT GPU in order to be able to benchmark streaming WUs (which cannot be run on non-GPU machines). This machine will also benchmark CPU units (which continue to be of value since GPUs work only for certain simulations) without using its GPU.
The second generation GPU core, aka GPU2, for ATI hardware (April 2008)
After running the original GPU core for quite some time and analyzing its results, we have learned a lot about running GPGPU software. For example, it has become clear that a GPGPU approach via DirectX (DX) is not sufficiently reliable for what we need to do. We've also learned a great deal about GPU algorithms and improvements. One of the really exciting aspects of GPUs is that they not only accelerate existing algorithms significantly, but can also open doors to new algorithms that we would never think to run on CPUs at all (because such algorithms are very slow on CPUs, but not on GPUs).
After much effort, we have taken all we've learned about GPUs from the first generation client and produced a second generation client. This new client appears to be faster, more reliable, and has more scientific functionality. The preliminary results so far from it look very exciting, and we're excited to now open up the client for FAH donors to run.
The second generation GPU core for NVIDIA (June 2008)
In collaboration with NVIDIA, we have released a GPU2 core for NVIDIA hardware.
General instructions
This web page will serve as the FAQ and Release Notes for this new client, and we will update this page as more information becomes available.
The FAH GPU Client installer should do everything one needs. It installs the new v6.x SysTray style client, as well as DLL files used by this new client. Download the client from the High Performance Client Download Page for folding experts. The Windows GPU Guide can help you install the GPU2 client.
Basic Requirements:
- A GeForce, Quadro, or Tesla card that supports CUDA (G80 or later for the most part)
- A CUDA-capable driver: version 174.55 is recommended, or 177.35 for GTX 2xx cards (you can download the driver for Win XP, Win XP 64-bit, Vista, and Vista 64-bit)
- Windows operating system (32 or 64 bit), XP or newer (better Vista and 64-bit support coming soon)
While the GPU2 client is not beta, the cores are still beta releases and we expect there will be bugs, flaws, problems, etc. To minimize problems, we have been testing the cores extensively in house, and they run well there. However, in our experience, running in the controlled setup in our lab and running "out in the wild" are very different situations.
As with any beta software, please make sure to back up your hard drive, and do not run this client on any machine which cannot tolerate even the slightest instability or problems.
Frequently Asked Questions (common to both ATI and NVIDIA GPU2 clients)
My points per day (PPD) varies significantly from project to project
There are lots of differences between GPUs, and this leads to big swings in PPD when proteins of different sizes are simulated. Because each project is benchmarked on a specific machine, we can ensure that machines similar to the benchmark machine see no fluctuation in PPD. For machines that are very different from the benchmark machine, there can be big swings (33% is not unheard of, given the large hardware differences from GPU to GPU, such as the number of shaders). This is particularly true for NVIDIA cards, which do very well on small proteins compared to the benchmark machine, but not nearly as well on larger proteins.
What about visualization?
We are working to add visualization, and the visualization code is currently in limited testing. The picture on the right shows what the visualization looks like.
Like the Folding@home PS3 client visualization, the GPU2 client shows a real-time view of the protein during the simulation. Since the GPU2 client is quite fast, the protein moves around a fair bit (in some cases about 1000x more than in, say, the classic client). A movie of the ATI version is also available (warning: 10 MB download). Thanks to ATI and NVIDIA for their help with this visualization, especially for the look, and to Adam Beberg for the main engine behind it.
I'm having trouble getting visualization to work on Vista (it just crashes immediately).
On Vista, you need to make sure that User Account Control (UAC) allows the viewer to be run as administrator. Alternatively, right-click the viewer binary and choose to run it as an administrator. We are not officially supporting viewer behavior on Vista just yet (due to these issues), but this should make it work for many people, especially if the viewer crashes immediately on your machine.
What OSs does the new client/core support?
The client runs on Windows XP and Vista for now. (Linux and OS X may be a possibility in the future.)
Can I run the GPU client as a service?
The service installation is not currently supported. It may be possible to run the GPU console client as a service in Windows XP, but it may never be possible in Windows Vista, due to the different video driver architecture. Vista does not present the driver interface to the service, so it would take a significant effort to make that work.
However, with the SysTray client, you can set the client to start with Windows by putting the shortcut in the Startup folder. This works fine in both XP and Vista.
Can I use my CPU to do calculations too?
For now, the GPU2 core uses some CPU time in addition to heavy use of the GPU. However, we hope to offload the calculation completely to the GPU in the future.
How do I use flags with the SysTray client?
Starting with the v6.12beta8 client, flags can be added in the client configuration panel under the Advanced Tab. One can also create a new shortcut and add the command line switches in the shortcut properties, but this is not recommended since it is easy for this to cause problems.
How do I make a new shortcut?
If you must use a shortcut, please follow the instructions below carefully. After you make a new shortcut in Windows Explorer, you need to set the properties correctly, as noted below.
In Windows XP:
Target: "C:\Program Files\Folding@home\Folding@home-gpu\Folding@home.exe" -verbosity 9
(or whatever flags you use instead of -verbosity 9)
Start in: "C:\Documents and Settings\<your_windows_username>\Application Data\Folding@home-gpu\"
In Windows Vista:
Target: "C:\Program Files (x86)\Folding@home\Folding@home-gpu\Folding@home.exe" -verbosity 9
(or whatever flags you use instead of -verbosity 9)
Start in: "C:\Users\<your_windows_username>\AppData\Roaming\Folding@home-gpu\"
NOTE: The "Start in" path is not the same as the "Target" path! Do NOT enter the literal text <your_windows_username>; it only represents the account name you use to log in to Windows. Enter your actual account name instead. When using a new shortcut, be sure to remove the original FAH GPU shortcut from the Start/Programs/Startup folder. Also, if you update or reinstall the client later, the installer will recreate that original shortcut. Remove the original shortcut again to avoid corrupting the work unit data.
More details can be found in the Windows GPU Guide.
What about multi-gpu support?
Yes. Add the "-gpu N" flag to the extra parameters on the Advanced page of the SysTray client, or in the advanced settings of the console client. N starts at 0, not 1, so your primary display is 0, the next is 1, and so on. If you run more than one client, each one needs a different -gpu value, a different machine ID, and a different working directory, so follow the instructions for running multiple clients.
For Tesla and other non-desktop cards, we have the "-forcegpu" flag, which makes the client ignore what it thinks the GPU is; however, the core will not work if the card is unsupported, so use it with caution. The flag takes a GPU core type to use as an override. Currently these are ati_r600, ati_r700, and nvidia_g80. This flag may also help with headless multi-GPU setups in which the client refuses to acknowledge the presence of your GPUs. The -forcegpu flag can be used in conjunction with the -gpu flag to force the client to try to use the given GPU ID. For example:
Folding@home.exe -gpu 2 -forcegpu nvidia_g80
Would force the client to try to run the NVIDIA core on the 3rd (2 + 1) CUDA-enabled device in the system. Similarly:
Folding@home.exe -gpu 3 -forcegpu ati_r600
Would force the client to try to run the ATI core on the 4th (3 + 1) CAL-enabled device in the system.
We will update our main FAQ in time with these details, but we are also working on ways to handle this from an installer directly.
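To make the multi-client rules above concrete, here is a minimal Python sketch that plans one client per GPU. It only builds the command-line arguments (-gpu and the optional -forcegpu, both described above) and checks that each client gets its own GPU index, machine ID, and working directory. The ClientConfig and plan_clients names, the 1-based machine ID numbering, and the "-gpuN" directory naming are hypothetical choices for illustration, not part of the client.

```python
from dataclasses import dataclass
from typing import List, Optional

# Core types accepted by -forcegpu, as listed in this FAQ.
FORCEGPU_TYPES = {"ati_r600", "ati_r700", "nvidia_g80"}


@dataclass
class ClientConfig:
    gpu_index: int     # value passed to -gpu (0-based)
    machine_id: int    # must differ between clients (numbering is hypothetical)
    working_dir: str   # each client needs its own directory
    args: List[str]    # extra command-line parameters for this client


def plan_clients(num_gpus: int, base_dir: str,
                 forcegpu: Optional[str] = None) -> List[ClientConfig]:
    """Return one hypothetical client configuration per GPU."""
    if forcegpu is not None and forcegpu not in FORCEGPU_TYPES:
        raise ValueError(f"unknown -forcegpu type: {forcegpu}")
    configs = []
    for index in range(num_gpus):  # -gpu starts at 0, not 1
        args = ["-gpu", str(index)]
        if forcegpu is not None:
            args += ["-forcegpu", forcegpu]
        configs.append(ClientConfig(
            gpu_index=index,
            machine_id=index + 1,                  # hypothetical: IDs numbered from 1
            working_dir=f"{base_dir}-gpu{index}",  # hypothetical naming scheme
            args=args,
        ))
    return configs
```

For three GPUs with nvidia_g80 forced, the third entry's arguments come out as -gpu 2 -forcegpu nvidia_g80, which matches the first example above (the 3rd CUDA-enabled device).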
How do you decide the credit value of GPU work units?
Points are determined by the performance of a given machine relative to a benchmark machine, similar to the CPU client benchmark process. Before releasing any new project (series of work units), we benchmark it on a dedicated computer with an ATI Radeon HD 3850 GPU (512 MB, 320 stream processors), running in a Dell Inspiron 531 with a 2.16 GHz dual-core AMD Athlon 64 X2 4000+.
We plug the results of this benchmark test into the following formula:
Points = 1500 * (DaysPerWU)
where DaysPerWU is the number of days it took the benchmark hardware to complete the work unit, so the benchmark machine earns 1500 points per day (PPD). Note that the GPU client still relies on a fast CPU, so the CPU is an important part of this. The 1500 PPD figure assumes that the CPU is heavily used, and is set higher to compensate for the use of that CPU.
Please note that the very concept of a reference machine means that the performance of some WUs on the reference machine will differ from their performance on your machine. Even between various GPU models, there are significant differences in architectures and memory speeds. Moreover, there are variations between WUs within a given project which can lead to speed differences.
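The arithmetic behind the formula can be sketched in a few lines of Python. This is an illustration, not the project's benchmarking code: wu_points and donor_ppd are hypothetical helper names, and the only inputs taken from this FAQ are the formula and its 1500 multiplier.

```python
# Sketch of the GPU2 points formula: Points = 1500 * DaysPerWU,
# where DaysPerWU is how long (in days) the benchmark machine took for the WU.

BENCHMARK_PPD = 1500  # points per day earned by the benchmark machine


def wu_points(benchmark_days_per_wu: float) -> float:
    """Credit assigned to a work unit from its benchmark run time in days."""
    if benchmark_days_per_wu <= 0:
        raise ValueError("benchmark time must be positive")
    return BENCHMARK_PPD * benchmark_days_per_wu


def donor_ppd(points: float, donor_days_per_wu: float) -> float:
    """Points per day earned by a machine that needs donor_days_per_wu per WU."""
    if donor_days_per_wu <= 0:
        raise ValueError("donor time must be positive")
    return points / donor_days_per_wu
```

For example, a WU that takes the benchmark machine 6 hours (0.25 days) is worth 375 points; a machine that finishes it in 0.2 days earns about 1875 PPD, while the benchmark machine itself always earns 1500 PPD, whatever the size of the WU.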
Our goal is consistency within a given definition of a reference machine setup (described above), but beyond that, the natural variation from machine to machine and WU to WU will never allow any point system to perfectly predict what you get on your machine.
Why is the new GPU client important?
The purpose of the GPU client is twofold: to take advantage of the high-performance capabilities of Stream Processing, and to help develop a simulation architecture that will become one of the dominant FAH computing paradigms as multi-processor GPUs become an industry standard over the next several years. High-performance clients enable us to run types of calculations that would be impractical on our standard architecture--calculations that enhance our scientific capabilities, and your scientific contributions, significantly.
High-performance clients often require more computing resources. GPU clients typically run on dedicated systems, 24 hours a day, and use more processing power, more disk space, more network resources, more system memory, etc. Also, a major part of the scientific benefit depends on rapid turnaround of work units; hence we assign short deadlines for GPU work units. To reward contributors for donating resources beyond the typical CPU client, for completing these work units very quickly within the short deadlines, and for contributing to the development of our next-generation capabilities, we currently set the benchmark value in proportion to the demands of these GPU work units. Without the GPU clients and your additional contributions, we would not be able to complete many important projects.
What's different between the GPU1 (first generation) and the GPU2 (second generation) client?
Scientifically, GPU2 introduces several new advances that make it much more useful. It matches the advanced water models in the PS3 client and adds a new one (which will likely appear in a future PS3 client). These more advanced water models make this new GPU client very useful to us.
There are also many changes under the hood. The previous generation client proved to be problematic due to GPU-specific issues, and we've fixed all of them (as far as we can tell) in this second generation client. An important part of these fixes is using CAL (on ATI) and CUDA (on NVIDIA) instead of DirectX (the previous generation GPU client highlighted several issues with using DirectX). A major upside to using CAL and CUDA is that DirectX context switches no longer affect the client. Actions such as fast user switching or locking your computer have no effect on GPU processing. Remote Desktop does still affect the GPU client and will cause the FahCore to fail when a connection is initiated; VNC does not have the same problem and can be used as an alternative (this still needs to be tested on NVIDIA hardware).
Initially, this new client will be a SysTray style client only. A console version may follow later.
Can I still use my GPU when the client is running?
Yes. Unlike the original GPU client, which interfered with many operations that used the GPU, the new GPU2 client does not. Playing videos and games either only slows down the GPU client's processing or causes a temporary suspension of folding. The new client automatically backs off whenever an application requests exclusive DirectX mode, although this is not reported in the client log file. DirectX programs that do not request exclusive mode will cause the GPU client to slow down, and may in some instances reduce the application's performance. Full-screen video is unaffected by the GPU client.
Troubleshooting
The client was working, but now all I'm getting are Early Unit Ends (EUEs). How can I fix this?
We've seen cases where playing GPU intensive games can leave the GPU in a weird state, leading to consistent EUE's (Early Unit End error messages). Restarting the computer has worked to resolve this problem. We are looking into a better solution.
My client gives an UNSTABLE_MACHINE error and is going to sleep for 24 hours! What should I do?
This occurs after 5 EUEs. Rapidly EUE-ing machines are a sign that the client needs some donor intervention. Please check the FAQ below as well as the forum (http://foldingforum.org) for details about how to fix a misconfigured client. This error typically results from a driver problem; please see the instructions above for which drivers to use with your hardware. Unfortunately, the client cannot give more information, since all it knows is that it can't run CUDA, and there are many possible reasons why (and there's currently no way for the core to detect them).
If your client has worked before, try restarting your machine, as that has also been shown to help. Restarting the client resets the EUE counter.
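The EUE rule described above behaves like a simple counter. The Python sketch below is a toy model built only from the two facts stated here (5 EUEs trigger UNSTABLE_MACHINE with a 24-hour sleep, and restarting the client resets the counter). The EueTracker class and its state strings are hypothetical and do not reflect the client's actual code or log messages.

```python
from dataclasses import dataclass

# Values taken from this FAQ: 5 EUEs trigger UNSTABLE_MACHINE and a 24-hour sleep.
EUE_LIMIT = 5
SLEEP_HOURS = 24


@dataclass
class EueTracker:
    """Toy model of the client's EUE counter (hypothetical, for illustration)."""
    count: int = 0

    def record_eue(self) -> str:
        """Record one Early Unit End and return the resulting (hypothetical) state."""
        self.count += 1
        if self.count >= EUE_LIMIT:
            return f"UNSTABLE_MACHINE: sleeping {SLEEP_HOURS} hours"
        return "retrying"

    def restart_client(self) -> None:
        """Restarting the client resets the EUE counter."""
        self.count = 0
```

In this model, four EUEs in a row still leave the client retrying, the fifth puts it to sleep, and a restart starts the count over from zero.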
Hey, where did all of the GPU client data files go?
The new GPU client is a SysTray client, similar to the new v6.x CPU SysTray clients, and follows a standardized Windows installation procedure. This client type is similar in nature to the previous GUI-style client, but with notable changes and a separate visualization module (to come later).
The client executable can be installed to any directory you select.
The default locations are:
In Windows XP:
Executable: "C:\Program Files\Folding@home\Folding@home-gpu\Folding@home.exe"
Data Files: "C:\Documents and Settings\<your_windows_username>\Application Data\Folding@home-gpu\"
In Windows Vista:
Executable: "C:\Program Files (x86)\Folding@home\Folding@home-gpu\Folding@home.exe"
Data Files: "C:\Users\<your_windows_username>\AppData\Roaming\Folding@home-gpu\"
Note: The client installer creates a shortcut to the data files in the Folding@home Programs folder. The installer also creates a program shortcut in the Startup folder that launches the GPU client. The Startup shortcut points to the specific file locations in the Target: and Start In: fields. That shortcut cannot be edited to add client switches. To add a client switch, follow the instructions for creating a new shortcut above.
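If you need to locate the data files from a script, a minimal Python sketch follows. It assumes the default data location and relies on the standard Windows APPDATA environment variable, which points to the user's Application Data folder on XP and to AppData\Roaming on Vista, matching the default paths listed above. gpu_data_dir is a hypothetical helper name.

```python
import ntpath
import os
from typing import Optional

# Folder name the GPU client uses for its data files (default install).
DATA_FOLDER = "Folding@home-gpu"


def gpu_data_dir(appdata: Optional[str] = None) -> str:
    """Return the default GPU2 client data directory (assumes default install).

    appdata defaults to the APPDATA environment variable: the user's
    Application Data folder on XP, or the AppData Roaming folder on Vista.
    ntpath is used so Windows-style paths are joined correctly on any OS.
    """
    if appdata is None:
        appdata = os.environ["APPDATA"]
    return ntpath.join(appdata, DATA_FOLDER)
```

Called with no argument on the donor's Windows account, it should return the same folder the installer's data-file shortcut points to, unless the client was set up differently.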
The GPU2 client isn't working on Windows Server OSs (e.g., Server 2003).
Please make sure hardware acceleration is enabled in the advanced graphics settings (it is disabled by default on Windows Server versions).
Beta client warning
We often release clients early for donors to beta test. These beta versions likely have some rough edges, but we expect that they should work reasonably well for all donors. See the respective installation instructions for more details of known bugs for each of the beta versions.
As with any beta software, please make sure to back up your hard drive before installing. DO NOT run a beta client if you or your machines cannot tolerate even the slightest instability or problems. The performance of beta clients and servers may vary significantly from that of standard FAH clients during the development process, including but not limited to work unit shortages, server downtime for upgrades, short notice for client upgrades, and Points Per Day that differ a little or a lot from the developmental benchmark level.
Finally, note that while the points per day for these clients are higher than the classic client, they can require a lot more maintenance due to their experimental or beta nature. If you would prefer to have a client which runs as smoothly as possible, we suggest you run our main client, not a high performance client. If you run a high performance client, expect a much more complex experience and much more work to keep the client running (which is compensated by extra points per day).
Issues specific to the GPU2/NV Clients
What hardware does the new client/core support?
The client runs on
- GeForce 8xxx
- GeForce 9xxx
- Quadro FX 360, 370, 570, 1600, 1700, 3600, 3700, 4600, 5600
- Quadro NVS 130, 135, 140, 290, 320
- Tesla C870*
- MCP77/78*
- NVIDIA GeForce G*
This covers most of the hardware supported by NVIDIA's CUDA.
The core can't find the DLLs!
We've been seeing some unusual behavior with virus scanners. We are looking into this. For now, reboot and give it a second try and it should work.
The client displays an error saying that I do not have a supported GPU, but I do!!!
Please make sure your driver is CUDA-capable (for NVIDIA cards) and that your GPU board is the primary display board. CUDA-capable drivers are version 177.35 and above (the original 174.55 driver is also CUDA-capable).
A DLL error dialog box is popping up -- what's up with that?
If the DLL error pops up, go to the installed location, C:\Program Files\Folding@home\Folding@home-gpu by default, and make sure the cudart.dll ended up there with the FahCore_11.exe file. If not, do a file search (including hidden and system files/folders) and copy the file to the needed location.
Who did all of this anyway?
In alphabetical order:
- Adam Beberg (Pande Lab): client modifications, GPU APIs under the hood
- Dan Ensign (Pande Lab): server setup, science, testing
- Mark Friedrichs (Pande Lab, Simbios): core science code updates, testing
- Simon Green (NVIDIA): visualizer for NVIDIA hardware
- Mike Houston (AMD): testing, problem solving, GPU tuning on the GPU and GPU2 code; core bug fixing
- Scott LeGrand (NVIDIA): Port of GPU2 code to CUDA, performance enhancements, visualization enhancements
- Vijay Pande (Pande Lab): Project management, fitting square pegs through round holes, etc
- We would also like to thank the Folding@home Community Forum moderators for their help with this FAQ and some early beta testing of the software.
Last Updated on September 17, 2008, at 12:41 PM
