ML IWEIGHT: Difference between revisions

From VASP Wiki
 
(24 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{TAGDEF|ML_FF_IWEIGHT|[integer]|3}}
{{DISPLAYTITLE:ML_IWEIGHT}}
{{TAGDEF|ML_IWEIGHT|[integer]|3}}


Description: Flag to control the weighting of training data in the machine learning force field method.
Description: This tag controls which procedure is used for normalizing and weighting the energies, forces, and stresses in the machine learning force field method.
----
----
The following cases for {{TAG|ML_FF_IWEIGHT}} are possible:
To achieve optimal training, it is important to normalize the available data. Furthermore, it may sometimes be desirable to emphasize some training quantities over others, e.g., one might want excellent force predictions even at the cost of some energy and stress accuracy. How normalization and weighting are performed is controlled by {{TAG|ML_IWEIGHT}} together with the weighting parameters {{TAG|ML_WTOTEN}}, {{TAG|ML_WTIFOR}}, and {{TAG|ML_WTSIF}} for energies, forces, and stresses, respectively. The following procedures can be selected via {{TAG|ML_IWEIGHT}}:
*{{TAG|ML_FF_IWEIGHT}}=1: Unnormalized energy, force and stress training data are divided by the weights determined by the flags {{TAG|ML_FF_WTOTEN}} (eV/atom), {{TAG|ML_FF_WTIFOR}} (eV/Angstrom) and {{TAG|ML_FF_WTSIF}} (kBar), respectively.
*{{TAG|ML_FF_IWEIGHT}}=2: The training data are normalized by using their standard deviations. The averaging is done over whole training data. The normalized energy, forces and stress tensor are multiplied by {{TAG|ML_FF_WTOTEN}}, {{TAG|ML_FF_WTIFOR}} and {{TAG|ML_FF_WTSIF}}, respectively. In this case the flags {{TAG|ML_FF_WTOTEN}}, {{TAG|ML_FF_WTIFOR}} and {{TAG|ML_FF_WTSIF}} are unitless quantities.
*{{TAG|ML_FF_IWEIGHT}}=3: Same as {{TAG|ML_FF_IWEIGHT}}=2 but training data is divided into individual systems. The energy, forces and stress tensor for a system are normalized using the average of the standard deviations of each system in the training data.


'''Mind''': For {{TAG|ML_FF_IWEIGHT}}=2 and 3 the weights are unitless quantities used to multiply the data, whereas for {{TAG|ML_FF_IWEIGHT}}=1 they have a unit and are used to divide the data by them. All three methods provide unitless energies, forces and stress tensors.
*{{TAG|ML_IWEIGHT}} = 1: Manual control over normalization/weighting: the unnormalized energies, forces, and stress tensor training data are divided by the weights determined by the flags {{TAG|ML_WTOTEN}} (eV/atom), {{TAG|ML_WTIFOR}} (eV/<math>\AA</math>) and {{TAG|ML_WTSIF}} (kBar), respectively.


== Related Tags and Sections ==
*{{TAG|ML_IWEIGHT}} = 2: Normalization via global standard deviations: The energies, forces, and stresses are normalized by their respective standard deviation over the entire training data. Then, the normalized quantities are weighted by {{TAG|ML_WTOTEN}}, {{TAG|ML_WTIFOR}} and {{TAG|ML_WTSIF}} when they are processed for learning in the design matrix <math>\mathbf{\Phi}</math> (see [[Machine learning force field: Theory#Matrix_vector_form_of_linear_equations|this section]]). In this case the values of {{TAG|ML_WTOTEN}}, {{TAG|ML_WTIFOR}} and {{TAG|ML_WTSIF}} are unitless quantities.
{{TAG|ML_FF_LMLFF}}, {{TAG|ML_FF_WTOTEN}}, {{TAG|ML_FF_WTIFOR}}, {{TAG|ML_FF_WTSIF}}


{{sc|ML_FF_IWEIGHT|Examples|Examples that use this tag}}
*{{TAG|ML_IWEIGHT}} = 3: Normalization via averages over subset standard deviations: Same as {{TAG|ML_IWEIGHT}} = 2, but the training data is divided into individual subsets. For each subset, the standard deviations are calculated separately. Then, the energies, forces, and stresses are normalized using the average of the standard deviations of all subsets. Finally, as for {{TAG|ML_IWEIGHT}} = 2, the normalized quantities are multiplied by {{TAG|ML_WTOTEN}}, {{TAG|ML_WTIFOR}}, and {{TAG|ML_WTSIF}} for learning purposes. By default ({{TAG|ML_LUSE_NAMES}}=''.FALSE.''), the division into subsets is based on the atom types and the number of atoms per type: two systems that contain the same atom types and the same number of atoms per type are considered to be in the same subset. To divide such systems further into subsets, set {{TAG|ML_LUSE_NAMES}}=''.TRUE.'' and choose different system names in the first line of the {{TAG|POSCAR}} file. This can be useful if training is performed for widely different materials, for instance, different phases with widely different energies. Without the finer subset assignment, the overall energy standard deviation might become large, which would reduce the weight of the energies of some subsets too much.
 
For {{TAG|ML_IWEIGHT}} = 2 and 3, the weights are unitless quantities used to multiply the data, whereas for {{TAG|ML_IWEIGHT}} = 1 they carry a unit and the data are divided by them. All three methods provide unitless energies, forces, and stress tensors, which are then passed to the learning algorithm. Although the defaults are usually sensible, it can be useful to explore different weights. For instance, if vibrational frequencies need to be reproduced accurately, we found it helpful to increase {{TAG|ML_WTIFOR}} to 10-100. On the other hand, if the energy difference between different phases needs to be described accurately by the force field, it might be useful to increase {{TAG|ML_WTOTEN}} to around 10-100.
{{NB|tip|On-the-fly learning implies that training structures accumulate along the running MD trajectory. Hence, the standard deviations of energies, forces, and stresses also change over time and are recalculated whenever a learning step is triggered. We highly recommend using {{TAG|ML_IWEIGHT}} {{=}} 3 because this ensures that learning is performed on an adequately normalized set at any time.}}
 
== Related tags and articles ==
{{TAG|ML_LMLFF}}, {{TAG|ML_WTOTEN}}, {{TAG|ML_WTIFOR}}, {{TAG|ML_WTSIF}}, {{TAG|ML_LUSE_NAMES}}
 
{{sc|ML_IWEIGHT|Examples|Examples that use this tag}}
----
----


[[Category:INCAR]][[Category:Machine Learning]][[Category:Machine Learned Force Fields]][[Category:VASP6]]
[[Category:INCAR tag]][[Category:Machine-learned force fields]]

Latest revision as of 15:04, 29 August 2024

ML_IWEIGHT = [integer]
Default: ML_IWEIGHT = 3 

Description: This tag controls which procedure is used for normalizing and weighting the energies, forces, and stresses in the machine learning force field method.


To achieve optimal training, it is important to normalize the available data. Furthermore, it may sometimes be desirable to emphasize some training quantities over others, e.g., one might want excellent force predictions even at the cost of some energy and stress accuracy. How normalization and weighting are performed is controlled by ML_IWEIGHT together with the weighting parameters ML_WTOTEN, ML_WTIFOR, and ML_WTSIF for energies, forces, and stresses, respectively. The following procedures can be selected via ML_IWEIGHT:

  • ML_IWEIGHT = 1: Manual control over normalization/weighting: the unnormalized energies, forces, and stress tensor training data are divided by the weights determined by the flags ML_WTOTEN (eV/atom), ML_WTIFOR (eV/Å) and ML_WTSIF (kBar), respectively.
  • ML_IWEIGHT = 2: Normalization via global standard deviations: The energies, forces, and stresses are normalized by their respective standard deviation over the entire training data. Then, the normalized quantities are weighted by ML_WTOTEN, ML_WTIFOR and ML_WTSIF when they are processed for learning in the design matrix Φ (see the section "Matrix vector form of linear equations" in "Machine learning force field: Theory"). In this case the values of ML_WTOTEN, ML_WTIFOR and ML_WTSIF are unitless quantities.
  • ML_IWEIGHT = 3: Normalization via averages over subset standard deviations: Same as ML_IWEIGHT = 2, but the training data is divided into individual subsets. For each subset, the standard deviations are calculated separately. Then, the energies, forces, and stresses are normalized using the average of the standard deviations of all subsets. Finally, as for ML_IWEIGHT = 2, the normalized quantities are multiplied by ML_WTOTEN, ML_WTIFOR, and ML_WTSIF for learning purposes. By default (ML_LUSE_NAMES=.FALSE.), the division into subsets is based on the atom types and the number of atoms per type: two systems that contain the same atom types and the same number of atoms per type are considered to be in the same subset. To divide such systems further into subsets, set ML_LUSE_NAMES=.TRUE. and choose different system names in the first line of the POSCAR file. This can be useful if training is performed for widely different materials, for instance, different phases with widely different energies. Without the finer subset assignment, the overall energy standard deviation might become large, which would reduce the weight of the energies of some subsets too much.
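
The three procedures can be illustrated with a minimal Python sketch. It is not VASP's implementation: the helper names `subset_key` and `scale_factor`, the use of the population standard deviation, the energy values, and the `Si` composition are all assumptions made for illustration only. The sketch returns the factor by which a raw training quantity would be multiplied before learning.

```python
import numpy as np

def subset_key(species_counts, system_name, use_names=False):
    """Hypothetical subset label: atom types and counts per type,
    optionally extended by the system name (mimicking ML_LUSE_NAMES)."""
    key = tuple(sorted(species_counts.items()))
    return key + (system_name,) if use_names else key

def scale_factor(values, keys, iweight, weight):
    """Factor applied to raw values for a given ML_IWEIGHT-like mode.

    Assumption: population standard deviation (ddof=0); the convention
    actually used by VASP is not stated in the text above.
    """
    values = np.asarray(values, dtype=float)
    if iweight == 1:
        # Manual mode: the weight carries the unit, data are divided by it.
        return 1.0 / weight
    if iweight == 2:
        # Global mode: one standard deviation over all training data.
        return weight / float(np.std(values))
    if iweight == 3:
        # Subset mode: average of the per-subset standard deviations.
        groups = {}
        for value, key in zip(values, keys):
            groups.setdefault(key, []).append(value)
        mean_std = float(np.mean([np.std(g) for g in groups.values()]))
        return weight / mean_std
    raise ValueError("iweight must be 1, 2, or 3")

# Made-up energies (eV/atom) of two phases with the same composition.
energies = [-5.0, -5.1, -4.9, -3.0, -3.1, -2.9]
names = ["phase A"] * 3 + ["phase B"] * 3
counts = {"Si": 8}  # illustrative composition

coarse = [subset_key(counts, n) for n in names]                  # one subset
fine = [subset_key(counts, n, use_names=True) for n in names]    # two subsets

manual_scale = scale_factor(energies, coarse, 1, 0.5)
global_scale = scale_factor(energies, coarse, 2, 1.0)
coarse_scale = scale_factor(energies, coarse, 3, 1.0)
fine_scale = scale_factor(energies, fine, 3, 1.0)
```

In this toy data set, both phases fall into a single subset unless names are used, so `coarse_scale` equals `global_scale` and is dominated by the energy offset between the phases. With name-based subsets, `fine_scale` is much larger, i.e., the energies retain more weight in the fit.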

For ML_IWEIGHT = 2 and 3, the weights are unitless quantities used to multiply the data, whereas for ML_IWEIGHT = 1 they carry a unit and the data are divided by them. All three methods provide unitless energies, forces, and stress tensors, which are then passed to the learning algorithm. Although the defaults are usually sensible, it can be useful to explore different weights. For instance, if vibrational frequencies need to be reproduced accurately, we found it helpful to increase ML_WTIFOR to 10-100. On the other hand, if the energy difference between different phases needs to be described accurately by the force field, it might be useful to increase ML_WTOTEN to around 10-100.
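
Why a larger weight pulls the fit toward one quantity can be seen in a toy weighted least-squares problem. This is only a sketch under simplifying assumptions: it uses plain least squares via `numpy.linalg.lstsq` instead of the regression actually used for the force field, the helper `weighted_fit` is hypothetical, and the one-parameter "design matrix" with two conflicting rows is made up.

```python
import numpy as np

def weighted_fit(phi, y, row_weights):
    """Solve min_w || diag(row_weights) (phi w - y) ||^2 by least squares."""
    phi = np.asarray(phi, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(row_weights, dtype=float)
    coeffs, *_ = np.linalg.lstsq(phi * w[:, None], y * w, rcond=None)
    return coeffs

# Row 0 plays the role of a normalized energy equation, row 1 of a
# normalized force equation. They disagree: one wants w = 1, the other w = 2.
phi = [[1.0], [1.0]]
y = [1.0, 2.0]

equal = weighted_fit(phi, y, [1.0, 1.0])         # both rows weighted equally
force_heavy = weighted_fit(phi, y, [1.0, 10.0])  # "force" row weighted by 10

force_error_equal = abs(equal[0] - y[1])
force_error_heavy = abs(force_heavy[0] - y[1])
```

With equal weights the solution sits halfway between the two targets. With the second row weighted by 10 it moves close to the "force" target, at the expense of a larger "energy" residual, which is the trade-off described above.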

Tip: On-the-fly learning implies that training structures accumulate along the running MD trajectory. Hence, the standard deviations of energies, forces, and stresses also change over time and are recalculated whenever a learning step is triggered. We highly recommend using ML_IWEIGHT = 3 because this ensures that learning is performed on an adequately normalized set at any time.
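
The effect of an accumulating training set on the normalization can be sketched as follows. The class `TrainingSet` and the force values are hypothetical, and a single subset is assumed, so the global and the subset-averaged standard deviations coincide. It only shows that the scale factor has to be recomputed each time new structures are added.

```python
import numpy as np

class TrainingSet:
    """Hypothetical container that collects force components over time."""

    def __init__(self):
        self.forces = []

    def add_structure(self, force_components):
        # New reference data collected along the trajectory.
        self.forces.extend(float(f) for f in force_components)

    def force_scale(self, weight=1.0):
        # Recomputed from all data gathered so far (population std assumed).
        return weight / float(np.std(self.forces))

training = TrainingSet()
training.add_structure([0.1, -0.1])   # early, near-equilibrium structure
early_scale = training.force_scale()

training.add_structure([1.0, -1.0])   # later structure with larger forces
late_scale = training.force_scale()
```

Because the later structure widens the force distribution, `late_scale` is smaller than `early_scale`. A normalization fixed at the start of the run would no longer match the data.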

Related tags and articles

ML_LMLFF, ML_WTOTEN, ML_WTIFOR, ML_WTSIF, ML_LUSE_NAMES

Examples that use this tag