VASP Job Hangs

Posted: Thu Jan 16, 2025 9:23 pm
by franklin_goldsmith1

A VASP job that runs on 32 cores of a single node sometimes hangs. With the options "--get-stack-traces --report-state-on-timeout" passed to mpirun, the following error is reported:
Rank: 17 Node: node1648 PID: 3344693 State: WAITPID FIRED ExitCode 0

The attached zip file contains all the files needed to reproduce the issue, including the job script "submit.sh".


Re: VASP Job Hangs

Posted: Fri Jan 17, 2025 11:01 am
by ahampel

Hi,

thank you for reaching out to us on the official VASP forum.

From what I can see in your output files, one of the MPI ranks seems not to have returned. It looks like this happened not in the middle of a calculation but right at the beginning of one? In your output log I can see that the last calculation finished:

Code:

Total vdW correction in eV:     55.6945537                       
   1 F= -.10914094E+03 E0= -.10907376E+03  d E =-.201543E+00     
 writing wavefunctions                                           

and that the following iteration in your Slurm job has not produced any output yet, correct? Or does the hang also occur during an electronic SCF calculation?

Can you give me some details on how VASP was compiled (compilers, libraries, makefile.include, etc.), please? Otherwise it will be hard for me to test this. Such crashes can be very compiler-specific. I am not aware of any known bug that sounds close to what you report. Once I have this information, I will try to reproduce the problem.
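
In case it helps, here is a minimal sketch of how the build details could be collected in one go. It is only an illustration under some assumptions: the executable name "vasp_std", the location of makefile.include in the current directory, and the helper names run and collect_build_info are all placeholders, and ldd is assumed to be available (Linux). Run it in the same environment (modules loaded) as the job.

```python
import shutil
import subprocess
from pathlib import Path


def run(cmd):
    """Return the combined stdout/stderr of cmd, or a short note if it cannot be run."""
    if shutil.which(cmd[0]) is None:
        return f"(not found: {cmd[0]})"
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"(failed: {exc})"
    return (result.stdout + result.stderr).strip()


def collect_build_info(vasp_binary="vasp_std", makefile="makefile.include"):
    """Gather MPI version, linked libraries, and makefile.include into a dict.

    vasp_binary and makefile are assumptions; adjust them to the actual build.
    """
    info = {}

    # Which MPI launcher (and version) is picked up in this environment.
    info["mpirun --version"] = run(["mpirun", "--version"])

    # Shared libraries the VASP executable is linked against (Linux only).
    vasp_path = shutil.which(vasp_binary)
    if vasp_path is None:
        info[f"ldd {vasp_binary}"] = "(not found on PATH)"
    else:
        info[f"ldd {vasp_binary}"] = run(["ldd", vasp_path])

    # The makefile.include used for the build, if it is still around.
    mk = Path(makefile)
    info["makefile.include"] = mk.read_text() if mk.is_file() else "(not found)"

    return info


if __name__ == "__main__":
    for title, text in collect_build_info().items():
        print(f"== {title} ==\n{text}\n")
```

The printed sections can simply be pasted into a reply or attached as a text file.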

Best regards,
Alex H.