VASP

License Restrictions

VASP is licensed software. Only users in the vasp group are authorized to use it. To check your group membership run the command groups.

If you’re not a member and attempt to load a VASP module, you’ll see an error like:

Lmod has detected the following error:  Access denied: user '<username>' is not in group 'vasp'
While processing the following module(s):
    Module fullname                   Module Filename
    ---------------                   ---------------
    vasp/6.5.1-nvhpc-25.5-mkl-2025.2  /mnt/apps/modulefiles/manual/vasp/6.5.1-nvhpc-25.5-mkl-2025.2.lua

Running VASP

Available versions of VASP
Version	MPI support	CPU support	GPU support	Compilers and Libraries	Binaries	Modulefile
6.5.1	yes (openmpi)	yes	yes (preferred)	`nvhpc`, `intel-mkl`	`vasp_gam`, `vasp_ncl`, and `vasp_std`	`vasp/6.5.1-nvhpc-25.5-mkl-2025.2`
6.5.1	yes (openmpi)	yes	no	`amd-aocc`, `amd-aocl`	`vasp_gam`, `vasp_ncl`, and `vasp_std`	`vasp/6.5.1-aocc-5.0.0-aocl-5.1`

VASP on GPU nodes

VASP is designed to run with one MPI process per GPU. Each process spawns multiple OpenMP threads, ideally bound to separate CPU cores. For example, on a node with 2 GPUs and 32 CPU cores, use:

2 MPI processes (1 per GPU)
16 OpenMP threads per process (32 cores / 2 processes)

This ensures optimal utilization of both CPU and GPU resources.

VASP on CPU-only nodes

MPI Ranks per Socket

A common recommendation is to launch 2 MPI ranks per socket, especially when each socket has a large number of cores (e.g., 32 or 64). This setup helps balance communication overhead and computational efficiency, particularly for Gamma-point-only calculations.

Our CPU-only nodes have 2 sockets with 16 cores each, so this setting is appropriate.

Use of NCORE

NCORE defines the number of cores used per MPI rank for FFT parallelization. A good heuristic is to set NCORE to match the number of cores per socket or divide the total cores per node by the number of MPI ranks. For example, on our hgxh200 nodes with 56 cores per socket (112 total), using 2 MPI ranks per socket (4 total) would suggest NCORE = 28.

Avoid Over-Parallelization

Using more MPI ranks than necessary can lead to diminishing returns due to increased communication overhead. It’s generally advised not to exceed the number of physical cores per socket with MPI ranks.

Benchmarking

The best configuration often depends on the specific system and calculation type. Use the LOOP: cpu time entry in the OUTCAR file to compare performance across different configurations.

Example Slurm Job Script

Warning

If you plan to use the scripts provided below, please keep the following in mind:

Case sensitivity matters: Unlike Windows, Linux treats uppercase and lowercase letters as different characters. For example, $vasp_exec and $VASP_EXEC refer to two distinct variables.
Line continuation requires spacing: In bash, the \ character is used to continue a command onto the next line. However, you must include a space before the backslash to prevent command-line options from being accidentally merged.

Example (correct):
```
mpirun -np 4 \
vasp_std
```
Example (incorrect):
```
mpirun -np 4\
vasp_std
```
The shell treats the correct example as mpirun -np 4 vasp_std and the incorrect example as mpirun -np 4vasp_std.

Example VASP on GPU with Openmpi

#!/bin/bash
#SBATCH --job-name=vasp_gpu
#SBATCH --partition=gpu2h100
#SBATCH --nodes=1
#SBATCH --gpus-per-node=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=16
#SBATCH --gpus-per-task=1
#SBATCH --time=02:00:00
#SBATCH --output=vasp_%j.out
#SBATCH --error=vasp_%j.err

ulimit -l unlimited

module load vasp/6.5.1-nvhpc-25.5-mkl-2025.2

# Path to your VASP executable. You can use either of:
# vasp_gam, vasp_ncl, or vasp_std
# NOTE: The VASP_ROOT variable is set by the VASP module above.
VASP_EXEC=$VASP_ROOT/bin/vasp_gam

mpirun -np $SLURM_NTASKS --map-by node:PE=$SLURM_CPUS_PER_TASK --bind-to core \
    -x OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK \
    -x OMP_STACKSIZE=512m \
    -x OMP_PLACES=cores \
    -x OMP_PROC_BIND=close \
    $VASP_EXEC ...

Example VASP on CPU-only with Openmpi and AMD libraries

#!/bin/bash
#SBATCH --job-name=vasp_cpu_amd
#SBATCH --partition=cpu384g
#SBATCH --nodes=1
#SBATCH --ntasks-per-socket=2
#SBATCH --ntasks=4                # 2 MPI ranks per socket × 2 sockets
#SBATCH --cpus-per-task=16        # 16 cores per MPI rank
#SBATCH --time=24:00:00
#SBATCH --output=vasp_output.log
#SBATCH --error=vasp_error.log

ulimit -l unlimited

module load vasp/6.5.1-aocc-5.0.0-aocl-5.1

# Path to your VASP executable. You can use either of:
# vasp_gam, vasp_ncl, or vasp_std
# NOTE: The VASP_ROOT variable is set by the VASP module above.
VASP_EXEC=$VASP_ROOT/bin/vasp_gam

mpirun -np $SLURM_NTASKS \
    --map-by ppr:$SLURM_NTASKS_PER_SOCKET:socket:PE=$SLURM_CPUS_PER_TASK \
    --bind-to core \
    -x OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK \
    -x OMP_STACKSIZE=512m \
    -x OMP_PLACES=cores \
    -x OMP_PROC_BIND=close \
    $VASP_EXEC ...