I am running into a persistent error (possibly a memory leak?) when attempting to calculate Wannier projections using the latest version of VASP 6.2.1 with Wannier90-3.1.0. The error occurs only when NCORE>1; for NCORE=1, everything works as expected.
I am attaching a bug report here, using Si as an example. The bug is reproducible using either vasp_gam or vasp_std (for a 1x1x1 k-point mesh) and with vasp_std (for larger k-point meshes). I am using the Intel parallel_studio_xe 2020.2 compilers for the tests attached here, but I am also able to reproduce the error with GCC/GFortran 9.3.0.
Crash when computing MMN with NCORE>1
-
- Newbie
- Posts: 8
- Joined: Sat Nov 16, 2019 8:58 pm
Crash when computing MMN with NCORE>1
-
- Global Moderator
- Posts: 506
- Joined: Mon Nov 04, 2019 12:41 pm
Re: Crash when computing MMN with NCORE>1
Thanks for reporting the issue.
This is a known limitation that has been present since older versions of VASP (I checked 5.4.4).
The workaround for the moment is to always use NCORE=1.
The issue appears because, when NCORE/=1 is used, the wavefunction (WF) components of each band are distributed over NCORE MPI ranks. To compute the MMN and AMN matrices for Wannier90, we need to generate the WFs in the full Brillouin zone. This implies rotating them from one k-point to another, which in turn requires transferring data among the CPUs that treat the same band. This is not implemented yet.
Note that the default data distribution in VASP, i.e. NCORE=1, means that each band is treated on one CPU, so the code works without problems.
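To make the data distribution concrete, here is a small Python sketch of how ranks could be grouped per band for a given NCORE. The function name `band_groups` and the contiguous rank layout are assumptions made for illustration only; the actual rank-to-band mapping inside VASP may differ.

```python
def band_groups(n_ranks: int, ncore: int) -> list[list[int]]:
    """Group MPI ranks so that each group of `ncore` ranks shares one band.

    Illustrative only: the contiguous layout is an assumption, not
    necessarily the mapping VASP uses internally.
    """
    if n_ranks < 1 or ncore < 1 or n_ranks % ncore != 0:
        raise ValueError("n_ranks must be a positive multiple of ncore")
    return [list(range(start, start + ncore))
            for start in range(0, n_ranks, ncore)]


# NCORE=1: every band lives entirely on one rank, so rotating a WF to
# another k-point needs no data from other ranks.
print(band_groups(4, 1))

# NCORE=2: each band is split over two ranks, so rotating a WF would
# require exchanging components within each pair.
print(band_groups(4, 2))
```

With NCORE=1 every group has a single member, which is the case that works today.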
-
- Newbie
- Posts: 8
- Joined: Sat Nov 16, 2019 8:58 pm
Re: Crash when computing MMN with NCORE>1
Thanks, Henrique. It might be useful to add a one-line check to terminate the calculation at the very outset if LWANNIER90 = .TRUE. and NCORE/=1 so that the calculation does not crash after wasting compute time on self-consistency cycles.
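Until such a check exists in the code, something along these lines could be run on the INCAR before submitting the job. This is only a rough user-side sketch in Python, not anything that ships with VASP: the helper names (`parse_incar`, `is_true`, `check_wannier_ncore`) are made up here, the INCAR parsing is deliberately simplified, and it only looks at NCORE (not NPAR).

```python
import re


def parse_incar(text: str) -> dict[str, str]:
    """Very simplified INCAR parser: TAG = value, '#' or '!' comments,
    ';' as statement separator. Later settings override earlier ones."""
    tags: dict[str, str] = {}
    for raw in text.splitlines():
        line = re.split(r"[#!]", raw, maxsplit=1)[0]  # drop comments
        for stmt in line.split(";"):
            if "=" in stmt:
                key, value = stmt.split("=", 1)
                tags[key.strip().upper()] = value.strip()
    return tags


def is_true(value: str) -> bool:
    """Treat .TRUE., .T., T, true, ... as true (simplified)."""
    return value.strip().upper().lstrip(".").startswith("T")


def check_wannier_ncore(text: str) -> None:
    """Raise ValueError if LWANNIER90 is enabled together with NCORE != 1."""
    tags = parse_incar(text)
    wannier = is_true(tags.get("LWANNIER90", ".FALSE."))
    ncore_field = tags.get("NCORE", "1").split()
    ncore = int(ncore_field[0]) if ncore_field else 1
    if wannier and ncore != 1:
        raise ValueError(
            f"LWANNIER90 = .TRUE. requires NCORE = 1, but NCORE = {ncore}"
        )


if __name__ == "__main__":
    example = "LWANNIER90 = .TRUE.\nNCORE = 4  ! this combination crashes\n"
    try:
        check_wannier_ncore(example)
    except ValueError as err:
        print(f"Refusing to submit: {err}")
```

In a job script one would read the real INCAR (e.g. `open("INCAR").read()`) and abort the submission on error, so no time is spent on the self-consistency cycles.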
-
- Global Moderator
- Posts: 506
- Joined: Mon Nov 04, 2019 12:41 pm
Re: Crash when computing MMN with NCORE>1
Yes, this is a good point.
We will make this change in a future release.