ML IWEIGHT: Difference between revisions

From VASP Wiki

Latest revision as of 15:04, 29 August 2024

ML_IWEIGHT = [integer]
Default: ML_IWEIGHT = 3 

Description: This tag controls which procedure is used for normalizing and weighting the energies, forces, and stresses in the machine learning force field method.


To achieve optimal training, it is important to normalize the available data. Furthermore, it is sometimes desirable to emphasize some training quantities over others, e.g., one might want excellent force predictions, even at the cost of sacrificing some energy and stress accuracy. How normalizing and weighting are performed is controlled by ML_IWEIGHT together with the weighting parameters ML_WTOTEN, ML_WTIFOR, and ML_WTSIF for energies, forces, and stresses, respectively. The following procedures can be selected via ML_IWEIGHT:

  • ML_IWEIGHT = 1: Manual control over normalization/weighting: the unnormalized energies, forces, and stress tensor training data are divided by the weights determined by the flags ML_WTOTEN (eV/atom), ML_WTIFOR (eV/Å) and ML_WTSIF (kBar), respectively.
  • ML_IWEIGHT = 2: Normalization via global standard deviations: The energies, forces, and stresses are normalized by their respective standard deviations over the entire training data. Then, the normalized quantities are weighted by ML_WTOTEN, ML_WTIFOR, and ML_WTSIF when they are processed for learning in the design matrix Φ (see the section "Matrix vector form of linear equations" in Machine learning force field: Theory). In this case, the values of ML_WTOTEN, ML_WTIFOR, and ML_WTSIF are unitless quantities.
  • ML_IWEIGHT = 3: Normalization via averages over subset standard deviations: Same as ML_IWEIGHT = 2, but the training data are divided into individual subsets. For each subset, the standard deviations are calculated separately. Then, the energies, forces, and stresses are normalized using the average of the standard deviations of all subsets. Finally, as for ML_IWEIGHT = 2, the normalized quantities are multiplied by ML_WTOTEN, ML_WTIFOR, and ML_WTSIF for learning purposes. By default (ML_LUSE_NAMES=.FALSE.) the division into subsets is based on the atom types and the number of atoms per type: two systems that contain the same atom types and the same number of atoms per type are considered to be in the same subset. To divide them further into subsets, set ML_LUSE_NAMES=.TRUE. and choose different system names in the first line of the POSCAR file. This can be useful if training is performed for widely different materials, for instance, different phases with widely different energies. Without the finer subset assignment, the overall energy standard deviation might become large, which would reduce the weight of the energies of the individual subsets too much.
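
The three procedures above can be sketched in a few lines of Python. This is only an illustrative sketch, not the VASP implementation: the function names subset_key and scale_factor, the use of the population standard deviation, and the unweighted average over subsets are all assumptions made for the example. The sample energies are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def subset_key(atom_counts, system_name=None, use_names=False):
    """Hypothetical subset label for one training structure.

    atom_counts: dict mapping element symbol to number of atoms, e.g. {"Si": 64}.
    use_names=False uses only the composition (as described for
    ML_LUSE_NAMES=.FALSE.); use_names=True also includes the system name.
    """
    key = tuple(sorted(atom_counts.items()))
    return key + (system_name,) if use_names else key


def scale_factor(values, keys, iweight, weight):
    """Factor by which raw training values are multiplied before learning.

    values: raw data of one kind (e.g. energies per atom), one entry per structure.
    keys:   subset label per entry (only used for iweight == 3).
    """
    if iweight == 1:
        # Manual control: divide by the weight, which carries the unit of the data.
        return 1.0 / weight
    if iweight == 2:
        # Global standard deviation over all data, then a unitless weight.
        return weight / pstdev(values)
    if iweight == 3:
        # Standard deviation per subset, then a plain average over subsets
        # (assumption: the average is not weighted by subset size).
        groups = defaultdict(list)
        for value, key in zip(values, keys):
            groups[key].append(value)
        sigma = mean(pstdev(group) for group in groups.values())
        return weight / sigma
    raise ValueError("iweight must be 1, 2, or 3")


# Invented example: two phases with the same composition but different names.
energies = [-5.02, -5.00, -4.98, -3.03, -3.00, -2.97]  # eV/atom
names = ["phase A"] * 3 + ["phase B"] * 3
keys = [subset_key({"Si": 64}, name, use_names=True) for name in names]

scale_global = scale_factor(energies, keys, iweight=2, weight=1.0)
scale_subsets = scale_factor(energies, keys, iweight=3, weight=1.0)
print(scale_global, scale_subsets)
```

In this toy data set the global standard deviation is dominated by the energy offset between the two phases, so the energies receive a much smaller scale factor with procedure 2 than with procedure 3. With use_names=False both phases would fall into the same subset, and procedures 2 and 3 would give the same factor.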

For ML_IWEIGHT = 2 and 3 the weights are unitless quantities used to multiply the data, whereas for ML_IWEIGHT = 1 they have a unit. All three methods provide unitless energies, forces, and stress tensors, which are then passed to the learning algorithm. Although the defaults are usually rather sensible, it can be useful to explore different weights. For instance, if vibrational frequencies are supposed to be reproduced accurately, we found it helpful to increase ML_WTIFOR to 10-100. On the other hand, if energy differences between phases need to be described accurately by the force field, it might be useful to increase ML_WTOTEN to around 10-100.

Tip: On-the-fly learning implies that training structures accumulate along the running MD trajectory. Hence, the standard deviations of energies, forces, and stresses also change over time and are recalculated whenever a learning step is triggered. We highly recommend using ML_IWEIGHT = 3 because this ensures that learning is performed on an adequately normalized set at any time.
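
As a toy illustration of how the standard deviations drift while structures accumulate, the following Python sketch recomputes them at a few pretend learning steps. It is not the VASP implementation: the trajectory, the subset labels, the step sizes, and the helper subset_averaged_sigma are invented, and the labels assume that the two phases end up in different subsets (e.g. through different compositions, or different system names with ML_LUSE_NAMES=.TRUE.).

```python
from collections import defaultdict
from statistics import mean, pstdev


def subset_averaged_sigma(values, labels):
    """Average of per-subset population standard deviations (assumed unweighted)."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return mean(pstdev(group) for group in groups.values())


# Invented trajectory: (energy per atom in eV, subset label).
trajectory = [
    (-5.02, "solid"), (-5.00, "solid"), (-4.98, "solid"), (-5.01, "solid"),
    (-3.03, "liquid"), (-2.97, "liquid"), (-3.00, "liquid"), (-3.02, "liquid"),
]

history = []
for n in (4, 6, 8):  # pretend a learning step is triggered at these sizes
    values = [energy for energy, _ in trajectory[:n]]
    labels = [label for _, label in trajectory[:n]]
    history.append((n, pstdev(values), subset_averaged_sigma(values, labels)))

for n, sigma_global, sigma_subsets in history:
    print(f"{n} structures: global = {sigma_global:.3f} eV/atom, "
          f"subset-averaged = {sigma_subsets:.3f} eV/atom")
```

In this example the global standard deviation jumps as soon as structures of the second phase enter the training set, whereas the subset-averaged value stays on the scale of the fluctuations within each phase.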

Related tags and articles

ML_LMLFF, ML_WTOTEN, ML_WTIFOR, ML_WTSIF, ML_LUSE_NAMES

Examples that use this tag