Machine learning force field calculations: Basics: Difference between revisions

From VASP Wiki
(17 intermediate revisions by 2 users not shown)
The machine-learned force fields (MLFF) feature of {{VASP}} allows you to generate, improve, modify and apply force fields based on machine learning techniques for your system of interest. Although there are many tunable parameters, i.e. MLFF-related {{FILE|INCAR}} tags, the default values have been carefully selected to simplify the initial creation of a machine-learned force field. Hence, we hope that only minimal additional effort is required to get started with this feature. Nevertheless, because machine learning involves multiple steps, e.g., at a minimum separate training and application stages, this page tries to explain the basic tags controlling the corresponding modes of operation. If you are already familiar with the basic usage of the MLFF feature, you may want to have a closer look at the [[Best practices for machine-learned force fields|best practices page]] which offers more in-depth advice for tuning MLFF settings. If you need more information about the theory and algorithms please visit the [[Machine learning force field: Theory|ML theory page]].
The machine-learned force fields (MLFF) feature of {{VASP}} allows you to generate, improve, modify and apply force fields based on machine learning techniques for your system of interest. Although there are many tunable parameters, i.e. MLFF-related {{FILE|INCAR}} tags, the default values have been carefully selected to simplify the initial creation of an MLFF. Hence, we hope that only minimal additional effort is required to get started with this feature. Nevertheless, because machine learning involves multiple steps, e.g., at a minimum separate training and application stages, this page tries to explain the basic tags controlling the corresponding modes of operation. If you are already familiar with the basic usage of the MLFF feature, you may want to have a closer look at the [[Best practices for machine-learned force fields|best practices page]] which offers more in-depth advice for tuning MLFF settings. If you need more information about the theory and algorithms please visit the [[Machine learning force field: Theory|MLFF theory page]].


== Important general remarks ==
== Step-by-step instructions ==


On-the-fly learning can be significantly more involved than, e.g., a single-point electronic calculation, because it combines multiple features of {{VASP}}. Each part requires a proper setup via the available {{FILE|INCAR}} tags. A misconfiguration corresponding to one part of the calculation may have severe effects on the quality of the resulting machine-learned force field. In the worst case, successful training may even be impossible. To be more specific, on-the-fly learning requires control over the following aspects:
On-the-fly training is based on molecular-dynamics (MD) simulations to sample training structures. Piece by piece a data set is automatically assembled and used to generate an MLFF whenever feasible. Conversely, at each time step the current force field predicts energy, forces and the corresponding Bayesian error estimations. Simply put, if the error is above a certain threshold another ab-initio calculation is performed and the reference energy and forces are added to the training data set. In the opposite case, the ab-initio step is omitted and the system is propagated via MLFF predictions. As the force field gets better along the trajectory many ab-initio steps can be avoided and the MD simulation is significantly accelerated. Ultimately, the on-the-fly training results in an MLFF which is ready for production, i.e., running an MD simulation in prediction-only mode. The following steps outline the path from start to production run:
* '''Consistent convergence'''
:It is required that all ab initio reference data collected via on-the-fly training is consistent and well-converged with respect to the [[:Category:Electronic minimization|single-point electronic calculation setup]]. Mind different temperatures and densities targeted in MD runs. A machine-learned force field can only reproduce a single potential energy landscape!
* '''Correct setup of [[:Category:Molecular dynamics|molecular dynamics simulations]]'''
:Consider the choice of thermodynamic ensembles, thermostat and barostat settings and an appropriate time step.
* '''Proper setup of machine-learned force field parameters'''
:Mind system-dependent parameters like the cutoff radius or atomic environment descriptor resolution.
* '''Control over data set generation via on-the-fly learning'''
:Monitor and control how much ab initio reference data is harvested via automatic Bayesian threshold determination and sparsification.
* '''Quality control'''
:Establish reasonable expectations regarding residual training errors. Benchmark the quality of resulting force fields by comparison of predictions with known quantities (from ab initio).
{{NB|warning|It is essential to validate the setup of each of these parts. Before trying to generate a machine-learned force field from scratch always familiarize with the pure ab initio calculations first. If this step is fully understood and convergence is under control, then initially try to run a (short) MD simulation without the aid of machine learning. Verify the resulting trajectory meets expectations with respect to conserved quantities, etc. Only then, start with [[Machine learning force field calculations: Basics|machine-learned force field generation]]. An incorrect setting in the ab initio or molecular dynamics settings can lead to a complete failure of the force field generation. Consider the following example: an ab initio setup which is not converged with respect to k-points is used as a basis to create a machine-learned force field. Then, even when the MD part is configured correctly the training structures obtained from on-the-fly learning will contain noisy reference forces and an inconsistent potential energy landscape. Eventually, even after collecting a lot of training data the remaining training errors will be significantly higher than what could be achieved with a converged k-points setup. Ultimately, this will result in poor predictive quality of the generated force field.}}


== Parallelization ==
''' Step 1: Prepare a molecular dynamics run '''


Currently {{VASP}} offers only MPI-parallelization for the MLFF feature. Hence, all operation modes based solely on MLFF code, i.e., prediction-only MD simulations ({{TAG|ML_MODE}} = run) and local reference configuration reselection ({{TAG|ML_MODE}} = select) cannot benefit from other parallelization techniques, e.g., [[OpenACC GPU port of VASP|OpenACC offloading to GPUs]] or [[Combining MPI and OpenMP|MPI/OpenMP-hybrid parallelization]]. However, a typical on-the-fly training performs also ab-initio calculations.
Prepare an [[:Category:Calculation setup|ab-initio]] [[:Category:Molecular dynamics|MD]] run with your desired {{FILE|POSCAR}} starting configuration and an appropriate setup in the {{FILE|INCAR}}, {{FILE|KPOINTS}} and {{FILE|POTCAR}} files.


''' Step 2: Start on-the-fly training from scratch '''


In general, to perform a machine learning force field (MLFF) calculation, you need to set
The MLFF method can be configured with many [[:Category:Machine-learned force fields|INCAR tags]], which are easily recognized by their prefix <code>ML_</code>. In general, to enable any MLFF feature, the following {{FILE|INCAR}} tag needs to be set:
  {{TAGBL|ML_LMLFF}} = .TRUE.
  {{TAGBL|ML_LMLFF}} = .TRUE.
in the {{FILE|INCAR}} file. Then depending on the particular calculation, you need to set the values of additional {{FILE|INCAR}} tags.
If this tag is not set to <code>.TRUE.</code> other MLFF-related {{FILE|INCAR}} tags are completely ignored and {{VASP}} will perform regular ab-initio calculations. Furthermore, to start on-the-fly training we additionally need to set the {{TAG|ML_MODE}} "super"-tag:
In the first few sections, we list the tags that a user may typically encounter.
{{TAGBL|ML_MODE}} = train
Most of the [[#Other important input tags|other input]] are set to defaults and should be only changed by experienced users in cases where the changes are essential.  
When executed in this <code>train</code> mode {{VASP}} will automatically perform ab-initio calculation whenever necessary and otherwise rely on the predictions of the MLFF. The usual output files, e.g., {{FILE|OUTCAR}}, {{FILE|XDATCAR}}, will be created along the MD trajectory. In addition, MLFF-related files will be written to disk, the most important ones being:
* {{FILE|ML_LOGFILE}} The log file for all MLFF-related details; training status, current errors and other important quantities can be extracted from here.
* {{FILE|ML_ABN}} This file contains the collected training structures and a list of selected local reference configurations.
* {{FILE|ML_FFN}} A binary file containing the current MLFF.
All three files are repeatedly updated during the MD simulation. After {{TAG|NSW}} time steps are carried out the {{FILE|ML_ABN}} and {{FILE|ML_FFN}} file contain the complete training data set and the final MLFF, respectively. Training errors can be found in {{FILE|ML_LOGFILE}} by searching for lines starting with <code>ERR</code>.


In the following, most of the tags are only shown for the angular descriptor (tags containing a 2 in it). Almost every tag has an analogous tag for the radial descriptor (tags containing 1 in it). The usage of these tags is the same for both descriptors.
''' Step 3 (optional): Continue on-the-fly training from existing training database '''


== Type of machine learning calculation ==
In principle, step 2 above may yield a force field ready for further processing and application. However, most of the time additional on-the-fly training iterations are necessary. For example, to extend the training database with structures at higher temperatures or different densities. Or, a force field is required to capture different atom type compositions or phases, e.g., a liquid and multiple solid phases. This can be achieved by on-the-fly continuation runs: at the beginning a force field is generated from the previous training data and - if applicable - used for predictions in the MD run. Like in step 2, the force field is trained along the trajectory. However, it also retains its applicability to the structures of the previous on-the-fly run. Finally, the continuation training will result in an MLFF capable of predicting structures of both runs. To continue on-the-fly training first set up your new starting {{FILE|POSCAR}} structure, e.g., by copying from the {{FILE|CONTCAR}} file. The new structure may share some atom types with the previous run but this is not a requirement. It is also possible to continue training with completely different atom types in the {{FILE|POSCAR}} file (remember to set up your {{FILE|POTCAR}} accordingly). The only other action required is to copy the existing database to the {{FILE|ML_AB}} file:
cp ML_ABN ML_AB
Leave <code>{{TAG|ML_MODE}} = train</code> unchanged and restart {{VASP}}. The log file will contain a section describing the existing data set and after initial generation of a force field the regular on-the-fly procedure continues. In the end, the resulting {{FILE|ML_ABN}} will contain the training structures from both on-the-fly runs. Similarly, the {{FILE|ML_FFN}} file is a combined force field. In the presence of an {{FILE|ML_AB}} file the <code>train</code> mode will always perform a continuation run. If you would like to start from scratch just remove the {{FILE|ML_AB}} file from the execution directory. {{NB|tip|Apply this strategy repeatedly in order to systematically improve your MLFF, e.g., first train on water only, then on sodium chloride and finally, train on the combination of both.}}


In this section, we describe the modes in which machine learning calculations can be done in VASP and show exemplary INCAR settings.
''' Step 4: Refit for fast prediction mode '''
A typical example showing these modes in action is the machine learning of a force field on a material with two phases A and B.
Initially, we have no force field of the material, so we choose a small to medium-sized supercell of phase A to [[#On-the-fly force field generation from scratch|generate a new force field from scratch]]. In this step, ab initio calculations are performed whenever necessary improving the force field in this phase until it is sufficiently accurate.
When applied to phase B, the force field learned in phase A might contain useful information about the local configurations. [[#Continuing on-the-fly learning from already existing force fields|Hence one would run a continuation run]] and the machine will automatically collect the necessary structure datasets from phase B to refine the force field. In many cases, only a few such structure datasets are required, but it is still necessary to verify this for every case.
After the force field is sufficiently trained, one can use it to describe much larger cell sizes. Hence, one can [[#Force field calculations without learning|switch off learning]] on larger cells and use only the force field. This is then orders of magnitude faster than the ab initio calculation. If the sampled atomic environments are similar to the structure datasets used for learning, the force field is transferable for the same constituting elements, but it should be still cautiously judged whether the force field can describe rare events in the larger cell.


=== On-the-fly force field generation from scratch ===
When on-the-fly training succeeded and the result matches your expectations with respect to applicability and residual errors there is one final step required before the force field should be applied in prediction-only MD runs: refitting for fast prediction mode. Copy once again the final data set to {{FILE|ML_AB}}:
cp ML_ABN ML_AB
Also, set in the {{FILE|INCAR}} file:
{{TAGBL|ML_MODE}} = refit
Running {{VASP}} will create a new {{FILE|ML_FFN}}, which can finally be used for production. {{NB|important|Although it is technically possible to continue directly with step 5 given an {{FILE|ML_FFN}} file from steps 2 or 3, this is strongly discouraged. Without the refitting step, {{VASP}} cannot enable the fast prediction mode, which comes with a speedup factor of approximately 20 to 100. You can check the {{FILE|ML_FFN}} ASCII header to see whether the contained force field supports fast prediction.}}


To generate a new force field, one does not need any special input files. First, one sets up a molecular dynamics calculation as usual ([[:Category:Molecular dynamics|see molecular dynamics]]) adding the machine-learning related ones to the {{TAG|INCAR}} file. To start from scratch add
''' Step 5: Applying the machine-learned force field in production runs '''
{{TAGBL|ML_ISTART}} = 0
Running the calculation will result in generating the main output files {{TAG|ML_LOGFILE}}, {{TAG|ML_ABN}} and {{TAG|ML_FFN}} files. The latter two are required for restarting from an existing force field.


=== Continuing on-the-fly learning from already existing force-fields ===
The MLFF obtained from step 4 is now ready to be applied in the prediction-only mode. First, copy the {{FILE|ML_FFN}} file:
To continue from a previous run, copy the following files
  cp {{FILE|ML_FFN}} {{FILE|ML_FF}}
  cp {{TAGBL|ML_ABN}} {{TAGBL|ML_AB}}
In the {{FILE|INCAR}} file set
The file {{TAG|ML_AB}} contains the ab initio reference data.  
{{TAGBL|ML_MODE}} = run
With this choice {{VASP}} will only use the predictions from the MLFF, no ab-initio calculations are performed. The execution time per time step will be orders of magnitude lower if compared with corresponding ab-initio runs. {{NB|tip|The MLFF can be transferred to larger system sizes, i.e., you may duplicate your simulation box to benefit from improved statistics. Because the method scales linearly with the number of atoms you can easily estimate the impact on computational demand.}} {{NB|warning|Please have a look at a [[Known issues|known issue]] (from date 2023-05-17) if you intend to use triclinic geometries with small or large lattice angles.}}
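The linear-scaling remark in the tip above can be turned into a quick back-of-the-envelope estimate. The following is a minimal sketch, not part of VASP: the function name <code>estimate_time_per_step</code> and all numbers are hypothetical, and it assumes that the time per step of a prediction-only run is strictly proportional to the number of atoms.

```python
def estimate_time_per_step(t_ref, n_atoms_ref, n_atoms_new):
    """Scale a measured time per MD step linearly with the atom count.

    t_ref is the time per step measured for n_atoms_ref atoms; the result
    is the expected time per step for n_atoms_new atoms, assuming linear
    scaling of the prediction-only mode.
    """
    if n_atoms_ref <= 0 or n_atoms_new <= 0:
        raise ValueError("atom counts must be positive")
    return t_ref * n_atoms_new / n_atoms_ref


# Hypothetical numbers: 0.05 s per step measured for a 64-atom cell,
# extrapolated to a 2x2x2 supercell with 512 atoms.
estimate = estimate_time_per_step(0.05, 64, 512)
print(f"{estimate:.2f} s per step")
```

Such an estimate only covers the MLFF prediction itself; measure a short test run on the larger cell before committing to a long production run.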


Next either copy your last structure
== Advice ==
cp {{TAGBL|CONTCAR}} {{TAGBL|POSCAR}}
or start from a completely new {{TAG|POSCAR}} file. This new {{TAG|POSCAR}} file is allowed to have completely different elements and a number of atoms for each element.


To proceed with learning and obtain an improved force field set
On-the-fly learning can be significantly more involved than, e.g., a single-point electronic calculation, because it combines multiple features of {{VASP}}. Each part requires a proper setup via the available {{FILE|INCAR}} tags. A misconfiguration corresponding to one part of the calculation may have severe effects on the quality of the resulting MLFF. In the worst case, successful training may even be impossible. To be more specific, on-the-fly learning requires control over the following aspects:
{{TAGBL|ML_ISTART}} = 1
* '''Consistent convergence'''
in the {{TAG|INCAR}} file.
:It is required that all ab-initio reference data collected via on-the-fly training are consistent and well-converged with respect to the [[:Category:Electronic minimization|single-point electronic calculation setup]]. Mind the different temperatures and densities targeted in MD runs. An MLFF can only reproduce a single potential energy landscape!
* '''Correct setup of [[:Category:Molecular dynamics|molecular dynamics simulations]]'''
:Consider the choice of thermodynamic ensembles, thermostat and barostat settings and an appropriate time step.
* '''Proper setup of machine-learned force field parameters'''
:Mind system-dependent parameters like the cutoff radius or atomic environment descriptor resolution.
* '''Control over data set generation via on-the-fly learning'''
:Monitor and control how much ab-initio reference data is harvested via automatic Bayesian threshold determination and sparsification.
* '''Quality control'''
:Establish reasonable expectations regarding residual training errors. Benchmark the quality of resulting force fields by comparison of predictions with known quantities (from ab-initio).
{{NB|tip|Begin by thoroughly familiarizing yourself with pure ab-initio calculations for your system before attempting to generate an MLFF from scratch. Once you are confident in controlling the convergence, proceed to run a brief MD simulation without machine learning assistance. Validate that the results meet expectations with respect to conserved quantities and so forth. Only then, move forward with the machine learning aspects of the calculation.}}


=== Force field calculations without learning ===
== Parallelization ==
Once a sufficiently accurate force field has been generated, one can use it to predict properties. Copy the force field information
cp {{TAGBL|ML_FFN}} {{TAGBL|ML_FF}}
The file {{TAG|ML_FFN}} holds the force field parameters. One can also use different {{TAG|POSCAR}} files, e.g., a larger supercell.
In the {{TAG|INCAR}} file, select only force-field based calculations by setting
{{TAGBL|ML_ISTART}} = 2
 
== Reference total energies ==
To obtain the force field, one needs a reference for total energy.
For {{TAG|ML_ISCALE_TOTEN}}=2 this reference energy is set to the average of the total energy of the training data. This is the default setting and we advise one to use this setting if not needed otherwise.
 
If needed, reference atomic calculations can be performed (see {{TAG|Calculation of atoms}}). One can then specify to use the atomic energy and give reference energies for all atoms by setting the following variables in the {{TAG|INCAR}} file
{{TAGBL|ML_ISCALE_TOTEN}}=1
{{TAGBL|ML_EATOM_REF}} = E_at1 E_at2 ...
If the tag {{TAG|ML_EATOM_REF}} is not specified, default values of 0.0 eV/atom are assumed.
 
== Cut-off radius ==
 
The following tags define the cut-off radii for each central atom
{{TAGBL|ML_RCUT1}}
{{TAGBL|ML_RCUT2}}
 
The defaults should be only changed if the structural distances require it, e.g., two atoms on a surface that are further away than the default. In that case, the cut-off radii should be set slightly larger than the distance between these two atoms.
 
== Weighting of energy, forces, and stress ==
 
In most cases, the force field can be optimized to reproduce one of the target observables more accurately by weighting the desired quantity more strongly. Of course, at the same time, other observables are less well reproduced. Empirically up to a certain weight ratio (typically 10-20), the improvement of the more strongly weighted observable was much greater than the accuracy loss of the other observables. The optimum ratio depends on the material and the parameters of the force field, hence it has to be determined for each case. The weights of the energies, forces and stresses can be changed using {{TAG|ML_WTOTEN}}, {{TAG|ML_WTIFOR}} and {{TAG|ML_WTSIF}}, respectively. The default value is 1.0 for each. Since the input tags define the ratio of the weights, it suffices to raise the value of only one observable.
 
'''We advise to use {{TAG|ML_WTOTEN}}<math> \ge </math>10 whenever energies are important. '''
 
Note, however, that increasing {{TAG|ML_WTOTEN}} tends to lead to overfitting, that is, seemingly the errors for the energy decrease for the training set, but not always does this improvement carry over
to an independent test set. It is therefore advised to also evaluate the performance of the MLFF on an independent test set, and to increase {{TAG|ML_WTOTEN}} only up to the point where the improvements carry over to the test set. Finally, note that changing the ratios is best done in post-processing, by refitting a pre-existing {{TAG|ML_AB}} file.
 
== Caution: number of structures and basis functions ==
 
The maximum number of structure datasets {{TAG|ML_MCONF}} and basis functions (local reference configurations) {{TAG|ML_MB}} potentially constitutes a memory bottleneck, because the required arrays are allocated statically at the beginning of the calculation. It is advised not to use too large numbers initially. For {{TAG|ML_ISTART}}=0, the defaults are {{TAG|ML_MCONF}}=1500 and {{TAG|ML_MB}}=1500. For {{TAG|ML_ISTART}}=1 and 3, the values are set to the number of entries read from the ML_AB file plus a small overhead. If at any point during the training either the number of structure datasets or the size of the basis set exceeds its respective maximum number, the calculation stops with an error message.  Since the {{TAG|ML_ABN}} is continuously written during on-the-fly learning, not all is lost though.
Simply copy the  {{TAG|ML_ABN}} to {{TAG|ML_AB}} and {{TAG|CONTCAR}} to {{TAG|POSCAR}},  increase {{TAG|ML_MCONF}} or {{TAG|ML_MB}}, and continue training (see [[#Continuing on-the-fly learning from already existing force-fields|restart the calculation]]).
 
== VASP output ==
When the machine learning force field code is switched on some of the output files in VASP can be modified or augmented.
=== OUTCAR ===
Additional lines for the predicted energies, forces, and stresses appear in the [[OUTCAR]] file. They are analogous to the ab initio output except that they contain the ''ML'' keyword.
 
=== OSZICAR ===
The entries in the {{FILE|OSZICAR}} file are also written if a force field only is employed. In this case, the ''E0'' entry contains the same as the ''F'' entry since the MLFF does not predict the entropy. The rest is analogous to the ab initio output.
 
== Example input for silicon ==
This is an example {{FILE|INCAR}} file for a very basic machine learning calculation of silicon.
 
# machine learning
{{TAGBL|ML_LMLFF}}        = .TRUE.
{{TAGBL|ML_ISTART}}        = 0                            # 0 or 1, depends whether there is an existing ML_AB file
# electronic degrees of freedom
{{TAGBL|ENCUT}}            = 325                          # increase ENCUT by 30 % for accurate stress
{{TAGBL|PREC}}            = Normal                        # for ML, it is not advised to use Fast to avoid error propagation
{{TAGBL|ALGO}}            = Fast                          # Fast is usually robust, if it fails move back to ALGO = Normal
{{TAGBL|LREAL}}            = A                            # real space projection is usually faster for large systems
{{TAGBL|ISMEAR}}          = 0 ;  {{TAGBL|SIGMA}} = 0.1            # whatever you desire, for insulators never use ISMEAR>0
{{TAGBL|EDIFF}}            = 1E-05                        # it is preferred to make the calculations accurate
# technical flags, update when changing number of cores
{{TAGBL|NELM}}            = 100 ; {{TAGBL|NELMIN}} = 6              # set the minimum number of iterations to 6 to avoid early break
{{TAGBL|LCHARG}}          = .FALSE.                      # avoid writing files
{{TAGBL|LWAVE}}            = .FALSE.
{{TAGBL|NWRITE}}          = 0                            # write less to avoid huge OUTCAR
# MD
{{TAGBL|ISYM}}            = 0                            # no symmetry, we are at finite temperature
{{TAGBL|POTIM}}            = 2                            # This is very system dependent, for H to not exceed 0.5 fs, B-F about 1 fs, Na-Cl 2 fs, etc.
{{TAGBL|IBRION}}          = 0                            # MD
{{TAGBL|ISIF}}            = 3                            # you really want volume changes during MD in the training to get elastic tensors right
{{TAGBL|NSW}}              = 20000                        # honestly the larger the merrier, 50000 is strongly recommended
{{TAGBL|TEBEG}}            = 100 ; {{TAGBL|TEEND}} = 800 K          # max desired temperature times 1.3
{{TAGBL|MDALGO}}          = 3                            # use Langevin thermostat
{{TAGBL|LANGEVIN_GAMMA}}  = 10.0                          # friction coef. for atomic DoFs for each species
{{TAGBL|LANGEVIN_GAMMA_L}} = 10.0                          # friction coef. for the lattice DoFs
{{TAGBL|NBLOCK}}          = 10 ;  {{TAGBL|KBLOCK}} = 1000          # pair correlation function every 10 steps; write files every 10 x 1000 steps
 
----


At present, {{VASP}} provides only MPI-based parallelization for the MLFF feature. Therefore, any operational mode relying exclusively on MLFF code - such as predictive MD simulations ({{TAG|ML_MODE}} = <code>run</code>) and local reference configuration selection ({{TAG|ML_MODE}} = <code>select</code>) - cannot leverage alternative forms of parallelization like [[OpenACC GPU port of VASP|OpenACC offloading to GPUs]] or [[Combining MPI and OpenMP|an MPI/OpenMP hybrid approach]]. Conversely, a usual on-the-fly training involves both MLFF generation and ab-initio computations. When the latter component predominates in terms of computational demand, utilizing non-MPI parallelization remains practical.


[[Category:Machine-learned force fields]]
[[Category:Howto]]

Latest revision as of 13:18, 17 May 2023

The machine-learned force fields (MLFF) feature of VASP allows you to generate, improve, modify and apply force fields based on machine learning techniques for your system of interest. Although there are many tunable parameters, i.e., MLFF-related INCAR tags, the default values have been carefully selected to simplify the initial creation of an MLFF, so only minimal additional effort should be required to get started with this feature. Nevertheless, because machine learning involves multiple steps (at a minimum, separate training and application stages), this page explains the basic tags controlling the corresponding modes of operation. If you are already familiar with the basic usage of the MLFF feature, you may want to have a closer look at the best practices page, which offers more in-depth advice for tuning MLFF settings. If you need more information about the theory and algorithms, please visit the MLFF theory page.

Step-by-step instructions

On-the-fly training uses molecular-dynamics (MD) simulations to sample training structures. Piece by piece, a data set is automatically assembled and used to generate an MLFF whenever feasible. At the same time, at each time step the current force field predicts the energy, the forces and the corresponding Bayesian error estimates. Simply put, if the estimated error is above a certain threshold, another ab-initio calculation is performed and the reference energy and forces are added to the training data set. Otherwise, the ab-initio step is omitted and the system is propagated via MLFF predictions. As the force field improves along the trajectory, many ab-initio steps can be avoided and the MD simulation is significantly accelerated. Ultimately, the on-the-fly training results in an MLFF that is ready for production, i.e., for running an MD simulation in prediction-only mode. The following steps outline the path from start to production run:
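The decision logic described above can be illustrated with a small toy model. This is only a sketch of the idea, not VASP's implementation: the class name OnTheFlyTrainer, the fixed threshold and the error values are all hypothetical, whereas VASP obtains the error from its Bayesian estimate and handles the threshold internally.

```python
from dataclasses import dataclass, field


@dataclass
class OnTheFlyTrainer:
    """Toy model of the on-the-fly decision logic (illustration only)."""

    threshold: float  # hypothetical, fixed error threshold
    training_set: list = field(default_factory=list)

    def step(self, structure, estimated_error):
        """Decide how a single MD step is carried out."""
        if estimated_error > self.threshold:
            # Error too large: do an ab-initio calculation and keep the
            # structure as new reference data for the force field.
            self.training_set.append(structure)
            return "ab-initio"
        # Error acceptable: propagate with the force field prediction.
        return "mlff"


trainer = OnTheFlyTrainer(threshold=0.05)

# Hypothetical error estimates that tend to shrink as training proceeds.
errors = [0.30, 0.12, 0.04, 0.07, 0.02, 0.01]
decisions = [trainer.step(f"structure_{i}", err) for i, err in enumerate(errors)]

print(decisions)
print(len(trainer.training_set), "structures collected")
```

In this toy run only the steps with a large estimated error trigger an ab-initio calculation, which mirrors how the fraction of expensive steps drops as the force field improves.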

Step 1: Prepare a molecular dynamics run

Prepare an ab-initio MD run with your desired POSCAR starting configuration and an appropriate setup in INCAR, KPOINTS and POTCAR files.

Step 2: Start on-the-fly training from scratch

The MLFF method can be configured with many INCAR tags, which are easily recognized by their prefix ML_. In general, to enable any MLFF feature, the following INCAR tag needs to be set:

ML_LMLFF = .TRUE.

If this tag is not set to .TRUE., all other MLFF-related INCAR tags are completely ignored and VASP will perform regular ab-initio calculations. Furthermore, to start on-the-fly training, the ML_MODE "super"-tag needs to be set as well:

ML_MODE = train
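If you script your workflow, the two tags above can be added to an existing INCAR programmatically. The following is a minimal sketch with a hypothetical helper, set_incar_tags; it assumes one "TAG = value" pair per line (no ";"-separated tags) and upper-case tag names, and the base INCAR content is only a placeholder.

```python
def set_incar_tags(incar_text, tags):
    """Return INCAR text with the given tags replaced or appended.

    Assumes one 'TAG = value' pair per line; comment lines and unrelated
    tags are passed through unchanged.
    """
    remaining = dict(tags)
    lines = []
    for line in incar_text.splitlines():
        key = line.split("=", 1)[0].strip().upper() if "=" in line else None
        if key in remaining:
            # Tag already present: overwrite its value.
            lines.append(f"{key} = {remaining.pop(key)}")
        else:
            lines.append(line)
    # Tags not found in the file are appended at the end.
    lines.extend(f"{key} = {value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


# Placeholder INCAR of a plain MD run (step 1); the values are examples only.
base = "IBRION = 0\nNSW = 10000\n"
updated = set_incar_tags(base, {"ML_LMLFF": ".TRUE.", "ML_MODE": "train"})
print(updated)
```

The same helper can later switch ML_MODE to another value without touching the rest of the file.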

When executed in this train mode, VASP will automatically perform an ab-initio calculation whenever necessary and otherwise rely on the predictions of the MLFF. The usual output files, e.g., OUTCAR and XDATCAR, will be created along the MD trajectory. In addition, MLFF-related files will be written to disk, the most important ones being:

  • ML_LOGFILE: The log file for all MLFF-related details; training status, current errors and other important quantities can be extracted from here.
  • ML_ABN: This file contains the collected training structures and a list of selected local reference configurations.
  • ML_FFN: A binary file containing the current MLFF.

All three files are repeatedly updated during the MD simulation. After NSW time steps have been carried out, the ML_ABN and ML_FFN files contain the complete training data set and the final MLFF, respectively. Training errors can be found in ML_LOGFILE by searching for lines starting with ERR.
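Extracting the error lines can be automated with a few lines of Python. This is a minimal sketch with a hypothetical helper, read_error_lines; it only filters lines by the ERR prefix mentioned above and makes no assumption about the columns they contain. The demonstration file holds placeholder text, not real ML_LOGFILE content.

```python
import tempfile
from pathlib import Path


def read_error_lines(logfile):
    """Return all lines of an ML_LOGFILE that start with 'ERR'."""
    with open(logfile, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if line.startswith("ERR")]


# Demonstration with placeholder content; a real ML_LOGFILE looks different.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "ML_LOGFILE"
    demo.write_text(
        "# placeholder header\n"
        "ERR placeholder entry 1\n"
        "OTHER placeholder entry\n"
        "ERR placeholder entry 2\n"
    )
    err_lines = read_error_lines(demo)

print(len(err_lines), "ERR lines found")
```

For a real run, call read_error_lines("ML_LOGFILE") in the execution directory and consult the ML_LOGFILE documentation for the meaning of the individual columns.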

Step 3 (optional): Continue on-the-fly training from existing training database

In principle, step 2 may already yield a force field that is ready for further processing and application. Most of the time, however, additional on-the-fly training iterations are necessary, for example, to extend the training database with structures at higher temperatures or different densities, or because the force field is required to capture different atom type compositions or phases, e.g., a liquid and multiple solid phases. This can be achieved by on-the-fly continuation runs: at the beginning, a force field is generated from the previous training data and, if applicable, used for predictions in the MD run. As in step 2, the force field is trained along the trajectory, but it also retains its applicability to the structures of the previous on-the-fly run. The continuation training thus results in an MLFF capable of predicting structures of both runs. To continue on-the-fly training, first set up your new starting POSCAR structure, e.g., by copying from the CONTCAR file. The new structure may share some atom types with the previous run, but this is not a requirement: it is also possible to continue training with completely different atom types in the POSCAR file (remember to set up your POTCAR accordingly). The only other action required is to copy the existing database to the ML_AB file:

cp ML_ABN ML_AB

Leave ML_MODE = train unchanged and restart VASP. The log file will contain a section describing the existing data set and after initial generation of a force field the regular on-the-fly procedure continues. In the end, the resulting ML_ABN will contain the training structures from both on-the-fly runs. Similarly, the ML_FFN file is a combined force field. In the presence of an ML_AB file the train mode will always perform a continuation run. If you would like to start from scratch just remove the ML_AB file from the execution directory.
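The preparation of a continuation run can be sketched as a guarded shell function. This is only an illustration under stated assumptions: the function name prepare_continuation is hypothetical, and it assumes that all files are located in the current directory and that a matching POTCAR is set up separately.

```shell
# Hypothetical helper (not part of VASP): prepare the current directory for an
# on-the-fly continuation run.
# Usage: prepare_continuation [yes]   ("yes" also copies CONTCAR to POSCAR)
prepare_continuation() {
    use_contcar="${1:-no}"
    # Check all inputs before touching any file.
    if [ ! -s ML_ABN ]; then
        echo "error: ML_ABN is missing or empty" >&2
        return 1
    fi
    if [ "$use_contcar" = "yes" ] && [ ! -s CONTCAR ]; then
        echo "error: CONTCAR is missing or empty" >&2
        return 1
    fi
    # The presence of ML_AB makes the train mode perform a continuation run.
    cp ML_ABN ML_AB || return 1
    if [ "$use_contcar" = "yes" ]; then
        # Continue from the last structure of the previous run.
        cp CONTCAR POSCAR || return 1
    fi
}
```

Calling prepare_continuation without arguments leaves an already prepared POSCAR untouched, which suits the case where training continues with a different structure.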

Tip: Apply this strategy repeatedly in order to systematically improve your MLFF, e.g., first train on water only, then on sodium chloride and finally, train on the combination of both.

Step 4: Refit for fast prediction mode

Once on-the-fly training has succeeded and the result matches your expectations with respect to applicability and residual errors, one final step is required before the force field should be applied in prediction-only MD runs: refitting for fast prediction mode. Copy the final data set to ML_AB once again:

cp ML_ABN ML_AB

Also, set in the INCAR file:

ML_MODE = refit

Running VASP will create a new ML_FFN, which can then be used for production.
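Switching the mode of operation by hand is easy to forget when several stages are scripted. The following is a minimal sketch of a helper that sets ML_MODE in an INCAR file. The function name set_ml_mode is hypothetical, and the sketch assumes one tag per line with the tag written in upper case; INCAR files that place several tags on one line are not handled.

```shell
# Hypothetical helper (not part of VASP): set ML_MODE in an INCAR file,
# replacing an existing assignment or appending one if none is present.
# Assumes one tag per line and an upper-case tag name.
# Usage: set_ml_mode MODE [incar]   (incar defaults to INCAR)
set_ml_mode() {
    mode="${1:-}"
    incar="${2:-INCAR}"
    if [ -z "$mode" ] || [ ! -f "$incar" ]; then
        echo "usage: set_ml_mode MODE [incar]" >&2
        return 1
    fi
    tmpfile="$incar.tmp.$$"
    awk -v mode="$mode" '
        # Replace the first ML_MODE assignment and drop any later duplicates.
        /^[ \t]*ML_MODE[ \t]*=/ { if (!done) print "ML_MODE = " mode; done = 1; next }
        { print }
        # Append the tag if the file did not contain it.
        END { if (!done) print "ML_MODE = " mode }
    ' "$incar" > "$tmpfile" && mv "$tmpfile" "$incar"
}
```

For the refit step, one would copy ML_ABN to ML_AB as shown above and then call set_ml_mode refit before starting VASP.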

Important: Although it is technically possible to continue directly with step 5 given an ML_FFN file from steps 2 or 3, it is strongly discouraged. Without the refitting step VASP cannot enable the fast prediction mode, which comes with a speedup factor of approximately 20 to 100. You can check the ML_FFN ASCII header to be sure whether the contained force field supports fast prediction.
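Because ML_FFN is a binary file with an ASCII header, printing it directly can garble the terminal. One possible way to look at the header is sketched below. The function name show_ffn_header is hypothetical, the size of the header is not assumed, and the default of 1024 bytes is an arbitrary choice; the sketch makes no assumption about the wording of the header itself.

```shell
# Hypothetical helper (not part of VASP): print the first bytes of a force
# field file, replacing non-printable characters with dots.
# Usage: show_ffn_header [file] [nbytes]   (defaults: ML_FFN, 1024)
show_ffn_header() {
    file="${1:-ML_FFN}"
    nbytes="${2:-1024}"   # arbitrary default, not the documented header size
    if [ ! -f "$file" ]; then
        echo "error: $file not found" >&2
        return 1
    fi
    # Keep printable characters and newlines; mask everything else.
    head -c "$nbytes" "$file" | LC_ALL=C tr -c '[:print:]\n' '.'
}
```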

Step 5: Applying the machine-learned force field in production runs

The MLFF obtained from step 4 is now ready to be applied in the prediction-only mode. First, copy the ML_FFN file:

cp ML_FFN ML_FF

In the INCAR file set

ML_MODE = run

With this choice VASP will only use the predictions from the MLFF; no ab-initio calculations are performed. The execution time per time step will be orders of magnitude lower than in corresponding ab-initio runs.
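Before submitting a long production run, it can help to verify that both prerequisites from this step are in place. The following is a minimal sketch of such a check. The function name check_production_setup is hypothetical, and the sketch assumes that ML_MODE is written in upper case with the value run in lower case on a line of its own.

```shell
# Hypothetical helper (not part of VASP): check that a directory is prepared
# for a prediction-only run.
# Usage: check_production_setup [dir]   (defaults to the current directory)
check_production_setup() {
    dir="${1:-.}"
    status=0
    # The force field must have been copied from ML_FFN to ML_FF.
    if [ ! -s "$dir/ML_FF" ]; then
        echo "missing or empty: $dir/ML_FF (copy ML_FFN to ML_FF first)" >&2
        status=1
    fi
    # The INCAR file must select the prediction-only mode.
    if ! grep -Eq '^[[:space:]]*ML_MODE[[:space:]]*=[[:space:]]*run([[:space:]]|$)' "$dir/INCAR" 2>/dev/null; then
        echo "INCAR in $dir does not set ML_MODE = run" >&2
        status=1
    fi
    return $status
}
```

The function reports every problem it finds and returns a non-zero status if any check fails, so it can be placed in front of the VASP call in a job script.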

Tip: The MLFF can be transferred to larger system sizes, i.e., you may duplicate your simulation box to benefit from improved statistics. Because the method scales linearly with the number of atoms you can easily estimate the impact on computational demand.
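The linear scaling mentioned in the tip above translates into simple arithmetic: replicating the box nx, ny and nz times multiplies the number of atoms, and hence the estimated time per step, by nx * ny * nz. The sketch below illustrates this. The function name estimate_step_time is hypothetical, the reference time must come from your own measurement, and fixed overheads and changes in parallel efficiency are ignored.

```shell
# Hypothetical helper: estimate the time per MD step for a replicated box,
# assuming the cost is proportional to the number of atoms.
# Usage: estimate_step_time T_REF NX NY NZ
#   T_REF       measured time per step of the original box (any unit)
#   NX NY NZ    number of replications along each lattice direction
estimate_step_time() {
    awk -v t="$1" -v nx="$2" -v ny="$3" -v nz="$4" 'BEGIN {
        factor = nx * ny * nz          # the atom count grows by this factor
        printf "%.3f\n", t * factor    # same unit as T_REF
    }'
}
```

For example, with a measured 0.05 s per step, a 2 x 2 x 2 supercell contains eight times as many atoms, so estimate_step_time 0.05 2 2 2 prints 0.400.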
Warning: Please have a look at a known issue (dated 2023-05-17) if you intend to use triclinic geometries with small or large lattice angles.

Advice

On-the-fly learning can be significantly more involved than, e.g., a single-point electronic calculation, because it combines multiple features of VASP. Each part requires a proper setup via the available INCAR tags. A misconfiguration corresponding to one part of the calculation may have severe effects on the quality of the resulting MLFF. In the worst case, successful training may even be impossible. To be more specific, on-the-fly learning requires control over the following aspects:

  • Consistent convergence
It is required that all ab-initio reference data collected via on-the-fly training is consistent and well-converged with respect to the single-point electronic calculation setup. Mind different temperatures and densities targeted in MD runs. An MLFF can only reproduce a single potential energy landscape!
Consider the choice of thermodynamic ensembles, thermostat and barostat settings and an appropriate time step.
  • Proper setup of machine-learned force field parameters
Mind system-dependent parameters like the cutoff radius or atomic environment descriptor resolution.
  • Control over data set generation via on-the-fly learning
Monitor and control how much ab-initio reference data is harvested via automatic Bayesian threshold determination and sparsification.
  • Quality control
Establish reasonable expectations regarding residual training errors. Benchmark the quality of resulting force fields by comparison of predictions with known quantities (from ab-initio).
Tip: Begin by thoroughly familiarizing yourself with pure ab-initio calculations for your system before attempting to generate an MLFF from scratch. Once you are confident in controlling the convergence, proceed to run a brief MD simulation without machine learning assistance. Validate whether the results align with expected values, for example with respect to conservation principles. Only then move forward with the machine learning aspects of the calculation.

Parallelization

At present, VASP provides only MPI-based parallelization for the MLFF feature. Therefore, any operational mode relying exclusively on MLFF code, such as predictive MD simulations (ML_MODE = run) and local reference configuration selection (ML_MODE = select), cannot leverage alternative forms of parallelization like OpenACC offloading to GPUs or an MPI/OpenMP hybrid approach. Conversely, a typical on-the-fly training run involves both MLFF generation and ab-initio computations. When the latter component dominates the computational demand, utilizing non-MPI parallelization remains practical.