Things aren't looking so good for "vasp_gpu". Their current version only compiles under cuda/7.5, because under cuda/8.0, the compilation runs into an error:
kernels.h(182): error: function "atomicAdd(double *, double)" has already been defined
because cuda/8.0 predefines it, apparently.
But it turns out that Nvidia's device drivers are tightly coupled with their cuda versions, so when I tried to run yesterday's vasp_gpu compiled with cuda/7.5, it generated this error:
CUDA driver version is insufficient for CUDA runtime version
In fact, any of cuda/7.5's utilities fail this way on compute-1-14, because its current Nvidia driver is associated with cuda/8.0. I thought at first that the driver was old and installed the newest one, but it's the same problem.
If I "module load cuda/8.0", then at least cuda/8.0's utilities will run on compute-1-14.
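When comparing what the driver and a given cuda module report on a node, it helps to know how CUDA encodes its version numbers. The runtime API calls cudaDriverGetVersion() and cudaRuntimeGetVersion() each return a single integer of the form 1000 * major + 10 * minor. Below is a minimal host-side sketch that decodes such an integer; the helper names (cuda_major, cuda_minor, format_cuda_version) are made up for illustration and are not part of CUDA.

```cpp
#include <string>

// CUDA encodes a version as 1000 * major + 10 * minor,
// so 7050 means 7.5 and 8000 means 8.0.
constexpr int cuda_major(int encoded) { return encoded / 1000; }
constexpr int cuda_minor(int encoded) { return (encoded % 1000) / 10; }

// Hypothetical helper (not a CUDA function): render an encoded
// version as "major.minor" for printing in a diagnostic.
std::string format_cuda_version(int encoded) {
    return std::to_string(cuda_major(encoded)) + "." +
           std::to_string(cuda_minor(encoded));
}
```

In a program built with nvcc, the two integers would come from cudaDriverGetVersion() and cudaRuntimeGetVersion(); that part is left out here because it needs the toolkit to compile. Printing both values on compute-1-14 under each cuda module would show exactly which pair the error message is complaining about.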
So there are only a couple of ways around this:
1) Install the older Nvidia driver associated with cuda/7.5 on compute-1-14 -- but then it won't work with any programs compiled against cuda/8.0.
or 2) Modify vasp's source code so it doesn't try to have its own "atomicAdd()" function.
or 3) Throw in the towel. vasp_gpu might be something better suited to individual servers, rather than a cluster.
It would be better for the VASP folks to fix their source code to accommodate cuda/8.0 (or else propose a supported work-around) than for us to patch it ourselves. They probably haven't thought much about cuda/8.0 yet, if VASP mostly runs on vanilla Ubuntu and Red Hat servers that come with older versions of cuda.
Regarding the source code "fix" (in case you want to forward this to the vasp folks), I knew that vasp was defining the conflicting "atomicAdd(double *, double)" function in their file "src/CUDA/kernels.h". Meanwhile, I found a useful web posting for a different program with a similar problem with CUDA 8.0:
The issue is that the CUDA folks didn't define the "double" variant of "atomicAdd()" in versions of CUDA before 8.0, but they did publish sample code showing how programmers should define it in their own code. Then in version 8.0, they decided to include it within CUDA after all, so now all of these "pre-8.0" programs run into this atomicAdd() conflict, since it's defined both in their own source code and in CUDA's headers. Following the web page suggestion above, I modified the "atomicAdd" section of "src/CUDA/kernels.h" as follows:
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
// atomicAdd for "double" is already provided by CUDA 8.0 in this case; no need to define it here.
#else
<... vasp's old pre-cuda/8.0 atomicAdd definition code placed here...>
#endif
Then I ran "make gpu", and it finally created a version of "vasp_gpu" that runs under cuda/8.0. Woo!
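For anyone curious what kind of definition is being guarded above: the workaround NVIDIA suggested before 8.0 builds a double-precision add out of a 64-bit compare-and-swap loop. The sketch below is a host-side C++ analogue of that idea using std::atomic, so it can be compiled and tried without a GPU. It is not VASP's code and not CUDA device code, and the names (double_to_bits, bits_to_double, atomic_add_double) are made up for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a double as a 64-bit integer, and back.
static std::uint64_t double_to_bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

static double bits_to_double(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Hypothetical helper: atomically add `val` to the double whose raw bits
// are stored in `cell`. Returns the previous value, as atomicAdd() does.
double atomic_add_double(std::atomic<std::uint64_t>& cell, double val) {
    std::uint64_t assumed = cell.load();
    // If another thread changed `cell` in the meantime, the exchange fails,
    // `assumed` is refreshed with the current bits, and the sum is recomputed.
    while (!cell.compare_exchange_weak(
               assumed, double_to_bits(bits_to_double(assumed) + val))) {
    }
    return bits_to_double(assumed);
}
```

The point of the loop is that the hardware only offers an atomic swap on the integer bit pattern, so the add is retried until no other thread has modified the value between the read and the write. Under CUDA 8.0 this emulation is no longer needed wherever the guard's condition holds, which is why the old definition has to be compiled out there.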
Give it a try, and let us (HPC Support) know how it goes and, if it works, whether it's much of an improvement over regular vasp.