ML AB: Difference between revisions

From VASP Wiki

Revision as of 11:50, 12 October 2022

This file is used within the machine-learned force field method. It contains the ab initio data from previous calculations: Bravais matrices, atom positions, energies, forces and stress tensors (the charge is also written out, but its use is optional). It is read in continuation runs (ML_ISTART=1 or ML_ISTART=2). The updated data is written to ML_ABN. The ML_AB and ML_ABN files are essentially the same; for continuation runs the ML_ABN file is simply copied to ML_AB.

Here is a sample of how this file should look:

1.0 Version
**************************************************
     The number of configurations
--------------------------------------------------
         10
**************************************************
     The maximum number of atom type
--------------------------------------------------
       1
**************************************************
     The atom types in the data file
--------------------------------------------------
     Si
**************************************************
     The maximum number of atoms per system
--------------------------------------------------
             64
**************************************************
     The maximum number of atoms per atom type
--------------------------------------------------
             64
**************************************************
     Reference atomic energy (eV)
--------------------------------------------------
 -0.785951000000000
**************************************************
     Atomic mass
--------------------------------------------------
   28.0850000000000
**************************************************
     The numbers of basis sets per atom type
--------------------------------------------------
        10
**************************************************
     Basis set for Si
--------------------------------------------------
          1     53
          2      3
          3     19
          4     62
          5     51
          6     41
          7     49
          8      3
          9     64
         10     56
**************************************************
     Configuration num.      1
==================================================
     System name
--------------------------------------------------
     Si_liquid
==================================================
     The number of atom types
--------------------------------------------------
       1
==================================================
     The number of atoms
--------------------------------------------------
         64
**************************************************
     Atom types and atom numbers
--------------------------------------------------
     Si     64
==================================================
     Primitive lattice vectors (ang.)
--------------------------------------------------
  11.0072130000000        0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  11.0072130000000        0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  10.1908520000000
==================================================
     Wycoff positions (Cartesian)
--------------------------------------------------
   0.69872000000000        2.47436000000000        7.82749000000000
   6.37825000000000        1.01296000000000        3.70012000000000
   5.47749000000000        3.63097000000000        7.54054000000000
   4.52326000000000        10.2439400000000        5.06801000000000
   ...                     ...                     ...
   ...                     ...                     ...
==================================================
     Total energy (eV)
--------------------------------------------------
  -302.800146000000
==================================================
     Forces (eV ang.^-1)
--------------------------------------------------
  0.256099000000000      -0.510102000000000       0.652442000000000
 -0.538669000000000       7.069000000000000E-002  3.899200000000000E-002
  0.189456000000000       0.566218000000000       2.230000000000000E-004
 -1.485015000000000       0.755044000000000       0.261758000000000
 -0.285376000000000      -0.341509000000000       -1.00031200000000
  ...                     ...                    ...
  ...                     ...                    ...
==================================================
     Stress (kbar)
--------------------------------------------------
     XX YY ZZ
--------------------------------------------------
  -26.3822000000000       -7.00984000000000       -31.6619300000000
--------------------------------------------------
     XY YZ ZX
--------------------------------------------------
   4.95694000000000        2.44523000000000        6.77740000000000
**************************************************
     Configuration num.      2
==================================================
...
...
...


Some general remarks:

  • All element-type-dependent information is limited to 3 entries per line. For more than 3 types, the entries continue on additional lines, again with at most 3 entries per line.
  • All element dependent quantities must follow the order of the element entries of the line The atom types in the data file.
  • The order of the entries for the header and also the data is fixed.
  • The ledger (separator) lines cannot be omitted: "*****" and "-----" lines in the header; "*****", "-----" and "=====" lines in the data.
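
The three-entries-per-line rule can be illustrated with a minimal Python sketch. The helper name chunk_entries and the column spacing in the printed output are assumptions made for illustration; they are not taken from the VASP source.

```python
def chunk_entries(entries, per_line=3):
    """Split per-element entries into rows of at most `per_line` items."""
    return [entries[i:i + per_line] for i in range(0, len(entries), per_line)]


# Five element types need two lines: three entries, then two.
rows = chunk_entries(["Si", "O", "H", "C", "N"])
for row in rows:
    # The spacing used here is illustrative only.
    print("     " + " ".join(f"{name:<2}" for name in row))
```

The same splitting would apply to every element-dependent header entry, always in the element order given under The atom types in the data file.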

Header

  • The number of configurations: Total number of training configurations.
  • The maximum number of atom type: Number of distinct atom types in the union of the types of all configurations.
  • The atom types in the data file: Listing of all atom types (two characters for each type as in VASP) appearing in all structures. Multiple lines for more than 3 element types. Maximum 3 entries per line.
  • The maximum number of atoms per system: The largest number of atoms within one structure among all training structures.
  • The maximum number of atoms per atom type: The largest number of atoms per element within one structure among all elements within all training structures.
  • Reference atomic energy (eV): Reference atomic energies used in the calculation for each element type. Multiple lines for more than 3 element types. Maximum 3 entries per line. This entry is only important for ML_ISCALE_TOTEN=1.
  • Atomic mass: Atomic mass of each element type. Multiple lines for more than 3 element types. Maximum 3 entries per line.
  • The numbers of basis sets per atom type: Number of local reference configurations for each type. Multiple lines for more than 3 element types. Maximum 3 entries per line.
  • Basis set for X: List of local reference configurations for each type. This line is followed by a block with two columns. The first column shows from which training structure the local reference configuration is taken. The second column shows the number of the atom in that training structure that is chosen as a local reference configuration. This whole block (together with the title line) is repeated for each element type in the force field.
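
As a rough illustration of the header layout described above, the following Python sketch splits the header into sections using the "*****" and "-----" ledger lines. The function parse_header is a hypothetical helper, not part of VASP, and the sample text is a shortened version of the example file shown earlier.

```python
def parse_header(text):
    """Return {section title: list of tokenized value lines} for the header."""
    sections = {}
    title = None
    expect_title = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("*****"):
            expect_title = True          # next non-ledger line is a title
            continue
        if expect_title:
            if line.startswith("Configuration num."):
                break                    # the training structure data starts here
            title = line
            sections[title] = []
            expect_title = False
            continue
        if line.startswith("-----") or title is None or not line:
            continue                     # skip ledger lines and the version line
        sections[title].append(line.split())
    return sections


sample = """ 1.0 Version
**************************************************
     The number of configurations
--------------------------------------------------
         10
**************************************************
     The atom types in the data file
--------------------------------------------------
     Si
**************************************************
     Basis set for Si
--------------------------------------------------
          1     53
          2      3
**************************************************
     Configuration num.      1
==================================================
     System name
"""

header = parse_header(sample)
print(header["Basis set for Si"])
```

Each row of a Basis set for X block comes back as a pair: the training structure number and the atom number within that structure.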

Training structure data

  • Configuration num. n: The data is stored for each configuration of the training data. The training structures have to be numbered consecutively starting with 1.
  • System name: Name of the structure. The length of the system name is limited to 40 characters (same as for the structure names in the POSCAR file).
  • The number of atom types: The number of atom types in the structure. The types must be a subset of the element types listed under The atom types in the data file in the header; at most, a structure contains all element types of the header.
  • The number of atoms: Number of atoms in the structure.
  • Atom types and atom numbers: Atom types and number of atoms per type in the structure. Each type is written on a separate line.
  • Optional CTIFOR: Value of ML_CTIFOR used for the sampling of the structure. This entry is optional and may not occur in your file. Either all training structures must contain this entry or none of them; mixed entries are not permitted.
  • Primitive lattice vectors (ang.): Bravais matrix of the structure. The units are in Angstrom.
  • Wycoff positions (Cartesian): Ionic positions in Cartesian coordinates. The units are in Angstrom.
  • Total energy (eV): Total energy (in eV) of the structure.
  • Forces (eV ang.^-1): Forces (in eV/Angstrom) for each atom in the structure.
  • Stress (kbar): 6 entries for the stress tensor (in kbar) of the structure, written as XX YY ZZ on one line and XY YZ ZX on another.
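
The numbering requirement for Configuration num. n can be checked with a short Python sketch. The helper names below are hypothetical, and the regular expression assumes the title line looks like the one in the sample file above.

```python
import re

# Matches lines such as "     Configuration num.      1".
CONFIG_RE = re.compile(r"^\s*Configuration num\.\s+(\d+)\s*$")


def configuration_numbers(text):
    """Collect the configuration numbers in the order they appear."""
    numbers = []
    for line in text.splitlines():
        match = CONFIG_RE.match(line)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def is_consecutive_from_one(numbers):
    """True if the numbers are exactly 1, 2, ..., n."""
    return numbers == list(range(1, len(numbers) + 1))


good = "     Configuration num.      1\n...\n     Configuration num.      2\n"
bad = "     Configuration num.      1\n...\n     Configuration num.      3\n"
print(is_consecutive_from_one(configuration_numbers(good)))
print(is_consecutive_from_one(configuration_numbers(bad)))
```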

How to merge different ML_AB files

  • The training structure data can simply be concatenated, but the structures must be renumbered so that the numbering runs consecutively from 1 to the new total number of structures.
  • We strongly advise grouping structures with the same number of elements and atoms per element together in the training data. Otherwise the code automatically reorders the data so that such structures are adjacent, which causes problems in the diff of an ML_AB file and its corresponding ML_ABN file.
  • Adjust the header if needed (element types, maximum number of atoms, maximum number of atoms per element type, etc.).
  • The local reference configurations need to be recalculated, since they were only calculated for the separate sets of structures. To do this, first set The numbers of basis sets per atom type to one for each species. Then set the block Basis set for X to the dummy value 1 1 for each species. After that, run the code using ML_ISTART=3. This selects new local reference configurations from scratch for the combined training data.
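
The renumbering step can be sketched in Python as follows. This is only an illustration under stated assumptions: renumber_configurations is a hypothetical helper, it operates on the concatenated training structure data only (no header), and the field width of 7 for the number is inferred from the sample file above rather than from the VASP source.

```python
import re

# Captures the title part of lines such as "     Configuration num.      1".
CONFIG_RE = re.compile(r"^(\s*Configuration num\.)\s*\d+\s*$")


def renumber_configurations(data_text):
    """Renumber all configuration title lines consecutively, starting at 1.

    Returns the rewritten text and the total number of configurations.
    """
    counter = 0
    out = []
    for line in data_text.splitlines():
        match = CONFIG_RE.match(line)
        if match:
            counter += 1
            # Width 7 is an assumption based on the sample file layout.
            line = f"{match.group(1)}{counter:7d}"
        out.append(line)
    return "\n".join(out), counter


# Two files concatenated: numbering 1, 2 followed by 1 again.
merged = (
    "     Configuration num.      1\nA\n"
    "     Configuration num.      2\nB\n"
    "     Configuration num.      1\nC"
)
renumbered, total = renumber_configurations(merged)
print(renumbered)
```

The returned total is the value to enter under The number of configurations. The remaining header entries and the basis set blocks still have to be adjusted by hand as described in the list above.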


Important: The maximum number of training structures, ML_MCONF, and the maximum number of local reference configurations, ML_MB, in the INCAR file have to be set larger than the entries The number of configurations and The numbers of basis sets per atom type in the ML_AB file.
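
This condition can be checked before starting a run with a small Python sketch. The function check_limits is a hypothetical helper; it assumes the INCAR values and the header entries have already been read into plain integers, and it compares ML_MB against the largest per-type entry.

```python
def check_limits(ml_mconf, ml_mb, n_configurations, basis_sets_per_type):
    """Return a list of problems; an empty list means both limits are large enough."""
    problems = []
    if ml_mconf <= n_configurations:
        problems.append(
            f"ML_MCONF={ml_mconf} must be larger than {n_configurations} configurations"
        )
    largest = max(basis_sets_per_type)
    if ml_mb <= largest:
        problems.append(
            f"ML_MB={ml_mb} must be larger than {largest} basis sets per atom type"
        )
    return problems


# Values from the sample file above: 10 configurations, 10 basis sets for Si.
# The INCAR values 1500 and 8 are made up for this example.
print(check_limits(1500, 8, 10, [10]))
```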