ML_MODE

From VASP Wiki
Latest revision as of 14:43, 20 August 2024

ML_MODE = train | select | refit | refitbayesian | run | none
Default: ML_MODE = none 

Description: String-based tag selecting operation mode for machine learning force fields.

Mind: This tag is only available as of VASP.6.4.0.

This tag acts as a "super tag": it selects the operation mode by setting the defaults for all other tags. Every tag affected by this "super tag" can still be overridden by setting that tag explicitly. The following options are available for this tag:

  • ML_MODE = train: On-the-fly training

    Force predictions from the machine learning force field are used to drive the molecular dynamics (MD) simulation. However, if the error estimate computed in each time step indicates a high force error, an ab initio calculation is performed instead, and the resulting energy, forces, and stress are used to improve the machine learning force field. There are two possible cases, depending on whether an ML_AB file is present in the calculation folder:

    1. No ML_AB file found:

      On-the-fly training starts from scratch. Note that at the beginning of the MD run, when no force field is available or it is still poorly trained, ab initio calculations happen frequently. For VASP versions prior to 6.4.0 this corresponds to ML_ISTART = 0.

    2. ML_AB file present:

      Restart on-the-fly training from an existing training database. Before the MD run starts, the ML_AB file (a copy of the ML_ABN file from a previous training run!) is read, and the ab initio data (energies, forces, and stresses) and local reference configurations it contains are used to generate an initial force field. Subsequently, the on-the-fly training MD is started. For VASP versions prior to 6.4.0 this corresponds to ML_ISTART = 1.

      Tip: None of the structures in the ML_AB file need to match the POSCAR file for the current MD training run in terms of the simulation box, elements, or number of atoms. However, if the same elements appear, the initial force field is used for predictions in the current MD run.

      The training data contained in the ML_AB file is included in the final machine learning force field, i.e., the ML_FFN file will define a force field applicable both to the structures in the ML_AB file and to the current MD simulation. This means that by restarting repeatedly with ML_MODE = train, and copying the ML_ABN file from the previous run to ML_AB(!), it is possible to iteratively extend the applicability of the machine learning force field, e.g., by exploring different temperature ranges or element compositions.


  • ML_MODE = select: Re-selection of local reference configurations

    A new machine learning force field is generated from the ab initio data provided in the ML_AB file. The structures are read and processed one by one as if harvested in an MD simulation. In other words, the same steps are performed as in on-the-fly training, but the source of the data is not actual ab initio calculations in an MD run but the series of structures available in the ML_AB file. The list of local reference configurations in the ML_AB file is ignored. Instead, a new collection of local reference configurations is determined and written to the resulting ML_ABN file.

    Important: This operation mode makes it possible to generate a VASP machine learning force field from pre-computed or external ab initio data sets. In contrast to ML_MODE = refit, the local reference configurations are also selected from the entire data set. For example, an ML_AB file created manually by extracting ab initio data (energies, forces, stresses) from OUTCAR files (or even other external sources) can be processed in this mode without prior knowledge of local reference configurations (only a dummy section must be added to the ML_AB file, see its documentation). Similar to on-the-fly training, this mode generates ML_FFN and ML_ABN files with local reference configurations.

    Iterating through the training structures anew can trigger frequent updates of the force field, which is quite time-consuming. Hence, for this mode the default value of ML_CDOUB is automatically increased from 2 to 4, which results in much less frequent updates of the force field. This makes the calculations much more efficient while leaving the results practically unchanged.

    Tip: If calculations for ML_MODE = select are too time-consuming, it is useful to increase ML_MCONF_NEW to values around 10-16. Together with ML_CDOUB = 4, this often accelerates the calculations by a factor of 2-4.

    The ML_AB file may contain a CTIFOR value for each training structure. These are the thresholds that were used to sample that structure in the previous training. The thresholds found in the ML_AB file are re-used unless a threshold is explicitly specified in the INCAR file by means of the ML_CTIFOR tag. In the latter case the thresholds from the ML_AB file are ignored. If the ML_AB file contains no CTIFOR information and no threshold is specified in the INCAR file, the default value for ML_CTIFOR is used.

    This mode automatically sets NSW = 1 and ML_CDOUB = 4. For VASP versions prior to 6.4.0 this corresponds to ML_ISTART = 3.

    Warning: ML_MODE = select ignores the structure in the POSCAR and hence no error, force and stress predictions are made at the end of this calculation (instead zeros are written to stdout, OSZICAR and OUTCAR).


  • ML_MODE = refit: Refit a force field for "fast" evaluation

    Similar to ML_MODE = select, refitting is done based on an existing ML_AB file, but the number of local reference configurations for each species is taken from the ML_AB file. Sparsification is performed on the local reference configurations, so the resulting ML_ABN file will contain the same number or fewer local reference configurations than the ML_AB file.

    By default the resulting force field is geared towards "fast" evaluation to speed up production runs (ML_LFAST = .TRUE.). This comes at the cost of not being able to evaluate Bayesian error estimates.

    Warning: We strongly advise using ML_MODE = refit if no Bayesian error estimates are required during production runs.

    This mode automatically sets ML_LFAST = .TRUE., NSW = 1, ML_IALGO_LINREG = 4, ML_SIGW0 = 1E-7, ML_SIGV0 = 1 and ML_EPS_LOW = 1E-11. For VASP versions prior to 6.4.0 this corresponds to ML_ISTART = 4.


  • ML_MODE = refitbayesian: Refit a force field with Bayesian regression (deprecated)

    Same as ML_MODE = refit, but Bayesian regression is employed. This results in lower accuracy and much slower force fields than using ML_MODE = refit and should be used with caution. On the other hand, this mode allows the generation of ML_FFN files that can calculate Bayesian error estimates in addition to predictions.

    This mode sets NSW = 1, ML_IALGO_LINREG = 1 and ML_LFAST = .FALSE. For VASP versions prior to 6.4.0 this corresponds to ML_ISTART = 4.


  • ML_MODE = run: Perform only force field predictions

    A previously trained machine learning force field is read from the ML_FF file, and the MD simulation is driven with predictions from the force field only. No ab initio calculations are performed and no learning is executed. This setting is typically used when the machine learning force field is considered mature and ready for production runs.

    Optionally, if the force field was refitted using ML_MODE = refitbayesian, the Bayesian error estimates of the energies, forces, and stress can be computed and logged in the ML_LOGFILE. The output frequency of the Bayesian errors is set via the ML_IERR tag; the default is 0.

    For VASP versions prior to 6.4.0 this corresponds to ML_ISTART = 2.


  • ML_MODE = none: The tag is ignored
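
The file requirements and the pre-6.4.0 ML_ISTART equivalents listed for the modes above can be collected in a small pre-flight check. The following is only a sketch in Python, not part of VASP: the names LEGACY_ISTART, REQUIRED_FILE and preflight are hypothetical, and the tables simply transcribe what the descriptions above state.

```python
from pathlib import Path

# Pre-6.4.0 ML_ISTART value per mode, as stated in the descriptions above.
# "train" depends on whether an ML_AB file exists, so it is handled separately.
LEGACY_ISTART = {"select": 3, "refit": 4, "refitbayesian": 4, "run": 2}

# Input file each mode reads according to the text; None means no file is mandatory.
REQUIRED_FILE = {
    "train": None,
    "select": "ML_AB",
    "refit": "ML_AB",
    "refitbayesian": "ML_AB",
    "run": "ML_FF",
}


def preflight(mode, workdir="."):
    """Hypothetical check for the five active modes (everything except "none").

    Returns the legacy ML_ISTART value, or raises if a required input is missing.
    """
    folder = Path(workdir)
    needed = REQUIRED_FILE[mode]
    if needed is not None and not (folder / needed).is_file():
        raise FileNotFoundError(f"ML_MODE = {mode} needs {needed} in {folder}")
    if mode == "train":
        # ML_AB present: restart from existing data (1); absent: start from scratch (0).
        return 1 if (folder / "ML_AB").is_file() else 0
    return LEGACY_ISTART[mode]
```

Calling preflight("train", "my_run") on a hypothetical folder without an ML_AB file returns 0, i.e., training from scratch; once the ML_ABN file of a previous run has been copied to ML_AB in that folder, the same call returns 1.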


Warning: If any option other than the above is chosen or the value is misspelled, the code will exit with an error. Be careful to write the value either entirely in upper case or entirely in lower case letters.
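
The spelling rule in the warning above can be checked before a job is submitted. The following Python sketch is not part of VASP; check_ml_mode and VALID_MODES are hypothetical names, and the sketch assumes, as the warning suggests, that all-upper-case and all-lower-case spellings are accepted while mixed-case spellings are not.

```python
# Allowed values, taken from the tag definition at the top of this page.
VALID_MODES = ("train", "select", "refit", "refitbayesian", "run", "none")


def check_ml_mode(value):
    """Return the lower-case mode name or raise ValueError.

    Assumption: a value must be written entirely in upper case or
    entirely in lower case; mixed case is treated as an error.
    """
    text = value.strip()
    if text != text.lower() and text != text.upper():
        raise ValueError(f"ML_MODE = {text!r} mixes upper and lower case")
    if text.lower() not in VALID_MODES:
        raise ValueError(f"ML_MODE = {text!r} is not one of {VALID_MODES}")
    return text.lower()
```

With this sketch, check_ml_mode("TRAIN") and check_ml_mode("train") both return "train", whereas "Train" and "trian" raise a ValueError.
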
Tip: Some choices of ML_MODE will automatically set other machine-learned force field tags. However, it is still possible to overwrite the defaults by specifying the corresponding tags in the INCAR file.
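
The override behaviour described in the tip can be pictured as a dictionary merge in which explicit INCAR entries win over the defaults implied by the mode. The Python sketch below is illustrative only: MODE_DEFAULTS and effective_tags are hypothetical names, the table covers only the tags this page lists for each mode, and VASP may set further tags internally.

```python
# Defaults implied by each ML_MODE value, transcribed from the descriptions above.
# Values are kept as INCAR-style strings; modes without listed defaults map to {}.
MODE_DEFAULTS = {
    "train": {},
    "select": {"NSW": "1", "ML_CDOUB": "4"},
    "refit": {
        "ML_LFAST": ".TRUE.",
        "NSW": "1",
        "ML_IALGO_LINREG": "4",
        "ML_SIGW0": "1E-7",
        "ML_SIGV0": "1",
        "ML_EPS_LOW": "1E-11",
    },
    "refitbayesian": {"NSW": "1", "ML_IALGO_LINREG": "1", "ML_LFAST": ".FALSE."},
    "run": {},
    "none": {},
}


def effective_tags(mode, user_tags):
    """Hypothetical helper: start from the mode defaults, then let explicit INCAR values win."""
    merged = dict(MODE_DEFAULTS[mode.lower()])
    merged.update(user_tags)
    return merged
```

For example, effective_tags("select", {"ML_CDOUB": "2"}) keeps NSW = 1 from the mode but uses the user's ML_CDOUB = 2 instead of the mode default of 4.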

Related tags and articles

ML_LMLFF, ML_ISTART, ML_LFAST, ML_IERR, ML_OUTBLOCK, ML_OUTPUT_MODE, ML_IALGO_LINREG, ML_MCONF_NEW, ML_CDOUB, ML_CTIFOR