Part 1: Error analysis and hyperparameter optimization of machine-learned force fields

This tutorial will explain error analysis and hyperparameter optimization of machine-learned force fields (MLFF). In the first part of the tutorial, you will learn how to examine the training-set and test-set errors. The second part of the tutorial will teach you about hyperparameter optimization. By careful hyperparameter optimization, it is possible to improve the accuracy and the performance of the MLFF simultaneously.

Disclaimer: The force field and test set used in this tutorial are tweaked. They are useful to demonstrate the workflow within reasonable time and computational effort, but not for research applications.

1 Training-set and test-set error analysis

1.1 Task

Compute the training-set and test-set errors of an MLFF model for given reference data.

Error estimation of an MLFF model gives insight into the accuracy of the trained model and into how well the model generalizes to out-of-training-set structures. In machine-learning approaches, one generally differentiates between two types of errors: the training-set error and the test-set error. The training-set error is evaluated on the training set. It is computed as the error between the values of the density-functional-theory (DFT) calculations used for fitting and the MLFF predictions.

The test-set error is evaluated using an external test set that was not utilized during the training process. It is considered good practice to sample the configurations in the test set under the same conditions as those employed in the MLFF production runs. For example, the structures in the test set should possess the same number of atoms as the production run, even if the initial training was conducted on structures with fewer atoms. Furthermore, the test set must be collected in the same thermodynamic phase as the one in which the production run will take place. When test-set error analysis is conducted in this manner, it provides insights into how well the MLFF model generalizes from the training set to the production conditions.
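Both errors are typically reported as root-mean-square errors (RMSE) of the energies, forces, and stress. As a concrete illustration of these definitions, here is a minimal numpy sketch that evaluates the energy-per-atom and force RMSEs for a set of configurations; the arrays are random placeholders standing in for DFT reference data and MLFF predictions.


import numpy as np

# Random placeholders standing in for DFT reference data and MLFF predictions:
# energies (n_configs,) in eV and forces (n_configs, n_atoms, 3) in eV/Å.
n_configs, n_atoms = 50, 8
rng = np.random.default_rng(0)
e_dft = rng.normal(size=n_configs)
e_mlff = e_dft + rng.normal(scale=1e-3, size=n_configs)
f_dft = rng.normal(size=(n_configs, n_atoms, 3))
f_mlff = f_dft + rng.normal(scale=1e-2, size=(n_configs, n_atoms, 3))

# RMSE of the energy per atom (eV/atom) and over all force components (eV/Å)
rmse_energy = np.sqrt(np.mean(((e_dft - e_mlff) / n_atoms) ** 2))
rmse_force = np.sqrt(np.mean((f_dft - f_mlff) ** 2))
print(rmse_energy, rmse_force)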

In general, there are three cases which can be distinguished for the comparison of training-set and test-set errors:

  • A low training-set error and a high test-set error indicate overfitting: the force field extrapolates poorly to structures outside of the training set. Such a force field needs to be improved, either by including more training structures or by tuning hyperparameters.

  • A training-set error and a test-set error that are roughly of the same size indicate that your force field will work as long as both errors are low enough for the desired application.

  • A high training-set error and a low test-set error indicate that your test set is biased, i.e., not general enough.

For further information about errors in machine-learning approaches study section IV in the paper by A. M. Tokita and J. Behler (DOI: 10.1063/5.0160326) about MLFF validation.

In this tutorial, we are going to refit an MLFF model based on existing ab-initio data stored in an ML_AB file. We assume that the on-the-fly learning has already been completed. With the ML_AB data, we will analyze the training-set error and the test-set error of the obtained MLFF, i.e., we will evaluate this force field.

1.2 Input

To fit an MLFF model based on existing ab-initio data, supply an ML_AB file that contains the ab-initio data in addition to the standard VASP input files. During this tutorial, we will not perform any DFT calculation but rely on previously computed data. Check out the input at $TUTORIALS/MLFF/e01_error_analysis:

INCAR.refit:


ML_LMLFF = TRUE
ML_MODE  = refit

INCAR.run:


IBRION   = -1   # no ionic update (default)
ML_LMLFF = TRUE
ML_MODE  = run

POSCAR:


Li H
   1.00000000000000
     4.0271000861999999    0.0000000000000000    0.0000000000000000
     0.0000000000000000    4.0271000861999999    0.0000000000000000
     0.0000000000000000    0.0000000000000000    4.0271000861999999
   Li   H
     4     4
Direct
  0.0021931998366077  0.9377313783581447  0.0399062356234574
  0.9873787948250990  0.4401410952913111  0.5508732802249812
  0.4655028962322062  0.0412653312320183  0.4666666187329044
  0.5673178577048460  0.5946132074966266  0.9329094887162737
  0.4620390886414084  0.4849929008378770  0.4952311142297970
  0.4556333788011961  0.0088289694138672  0.0177579698840076
  0.9786514408600437  0.4389104900251807  0.0101037431973456
  0.9467029288350567  0.9708730319095862  0.5445141862785567

KPOINTS:


 1x1x1
  0
 G
  1 1 1
  0 0 0

ML_AB


Collection of ab-initio data from previous calculations for LiH: Lattice vectors, atomic positions, energies, forces, and stress tensors.


POTCAR


Pseudopotentials for two atom types.


1.2.1 Refitting input

To do a refitting, the INCAR file only contains two tags: ML_LMLFF to switch on the machine-learning mode and ML_MODE to select the refitting mode. In general, refitting should always be done after an ML_AB file has been obtained during on-the-fly training. This generally results in a more accurate MLFF because the model weights are determined via linear regression using singular-value decomposition instead of the evidence approximation. It also speeds up production runs (ML_MODE = run), because the more performant prediction algorithms require a refitted force field.
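To make the phrase "linear regression using singular-value decomposition" more tangible, the following minimal numpy sketch solves an overdetermined least-squares problem via an SVD; it is only a conceptual analogue of the refit step, not VASP's actual implementation, and the design matrix and target vector are random placeholders.


import numpy as np

# Random placeholders: design matrix (observations x basis functions) and
# target vector (reference energies/forces).
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 40))
y = rng.normal(size=200)

# Least-squares solution via singular-value decomposition
U, s, Vt = np.linalg.svd(A, full_matrices=False)
w = Vt.T @ ((U.T @ y) / s)

# Cross-check against numpy's built-in (also SVD-based) solver
w_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(w, w_ref))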

Why does the POSCAR file not influence the refitting?

Click to see the answer!

The POSCAR file does not influence the refitting procedure because the refit is based solely on the ab-initio data stored in the ML_AB file. The structure defined in the POSCAR file is only used for a single prediction step, so the energies and forces written to the OUTCAR file belong to this structure. These results are, however, irrelevant for this exercise on analysing an MLFF.


We use a gamma-only KPOINTS file.

We have to supply a POTCAR file with the proper number of atom types for refitting, but it is only read and not used. The machine-learning algorithms determine the atom types from the ML_AB file, independently of the atom types defined in the POTCAR file.

1.2.2 Error-analysis input

The test set can be found in the folder $TUTORIALS/MLFF/test_set. It contains POSCAR files in the folder $TUTORIALS/MLFF/test_set/structures and the corresponding DFT data in $TUTORIALS/MLFF/test_set/DFTdata. The test set consists of 50 independent structures. In practical applications, the test set should be larger, roughly around 500 structures, depending on the final application of the force field.

1.3 Calculation

Determine the training-set error by executing the following commands in the terminal

cd $TUTORIALS/MLFF/e01_*
cp INCAR.refit INCAR
mpirun -np 4 vasp_gam

After refitting, VASP writes the force field to an ML_FFN file and information about the training-set error to the ML_LOGFILE. Check the documentation of the ML_LOGFILE file. How can you grep for the training-set error and print it to stdout? What units do the training-set errors of the energy, forces, and stress tensor have, respectively?

Click to see the answer!

To print the training-set error, run the following command in the bash shell

grep ERR ML_LOGFILE

The last line contains 5 columns. The third column shows the training-set error of the energy in eV per atom, the fourth column is the force error in eV/Å, and the last column is the error in the stress tensor in kbar.

In [2]:
%%bash
grep ERR e01_*/ML_LOGFILE
# ERR ######################################################################
# ERR This line contains the RMSEs of the predictions with respect to ab initio results for the training data.
# ERR 
# ERR nstep ......... MD time step or input structure counter
# ERR rmse_energy ... RMSE of energies (eV atom^-1)
# ERR rmse_force .... RMSE of forces (eV Angst^-1)
# ERR rmse_stress ... RMSE of stress (kB)
# ERR ######################################################################
# ERR               nstep      rmse_energy       rmse_force      rmse_stress
# ERR                   2                3                4                5
# ERR ######################################################################
ERR                     0   1.20525930E-04   8.13245834E-03   6.08059778E-01

Next, we want to analyse the test-set error and compare it to the training-set error. To do this, we have to apply the MLFF model to our test set. Compute the energies, forces, and stress tensor of the test set by executing the following commands:

cp ML_FFN ML_FF
cp INCAR.run INCAR
bash test_set_analysis.sh

This runs the following script

test_set_analysis.sh


#!/usr/bin/env bash

mkdir MLFF_data

for i in {1..50}; do
   echo "test configuration " $i
   cp ../test_set/structures/POSCAR.${i} POSCAR
   mpirun -np 4 vasp_gam
   cp ./vaspout.h5 ./MLFF_data/vaspout_${i}.h5
done

After this calculation has finished, we compute the errors between the DFT data and the MLFF data in the test set with py4vasp by running the following python cell. It will compute the test-set error for every configuration in the test set and plot it.

In [3]:
from py4vasp import MLFFErrorAnalysis
from py4vasp import plot
import numpy as np
# Compute the errors
mlff_error_analysis = MLFFErrorAnalysis.from_files(
    dft_data="./test_set/DFTdata/*.h5",
    mlff_data="./e01_error_analysis/MLFF_data/*.h5"
)
energy_error = mlff_error_analysis.get_energy_error_per_atom()
force_error = mlff_error_analysis.get_force_rmse()
stress_error = mlff_error_analysis.get_stress_rmse()
x = np.arange(len(energy_error))
In [4]:
plot(x, energy_error, ylabel="Energy Error per atom [eV]", xlabel="Configuration Number")
In [5]:
plot(x, force_error, ylabel="Force RMSE [eV/Å]", xlabel="Configuration Number")
In [6]:
plot(x, stress_error, ylabel="Stress RMSE [kbar]", xlabel="Configuration Number")

To compute the total test-set error averaged over the configurations in the test set, execute:

In [7]:
from py4vasp import MLFFErrorAnalysis
mlff_error_analysis = MLFFErrorAnalysis.from_files(
    dft_data="./test_set/DFTdata/*.h5",
    mlff_data="./e01_error_analysis/MLFF_data/*.h5",
)

energy_error = mlff_error_analysis.get_energy_error_per_atom(normalize_by_configurations=True)
force_error = mlff_error_analysis.get_force_rmse(normalize_by_configurations=True)
stress_error = mlff_error_analysis.get_stress_rmse(normalize_by_configurations=True)

print(energy_error,force_error,stress_error)
0.00012588334025769753 0.008501265747264132 0.3881138246666107

What do you conclude from the comparison of the test-set error and the training-set error in the forces?

Click to see the answer!

The training-set error in the forces is slightly lower than the test-set error. Since both errors are small and below 10 meV/Å, the MLFF model seems to be reliable. The test-set error is of the same order as the training-set error, hence the MLFF model generalizes well to structures outside of the training set.

1.4 Questions

  1. What does it indicate if the error analysis yields a low training-set error and a high test-set error?
  2. Is the POSCAR file required when ML_MODE=refit?
  3. How is the linear-regression problem addressed when ML_MODE=refit?

2 Optimize the accuracy by hyperparameter tuning

2.1 Task

Hyperparameters are user-defined parameters of the MLFF model that are not optimized during training of the MLFF. Hyperparameter optimization can improve the accuracy and, at the same time, the performance of the force field. Examples of such parameters are the cutoff radii (ML_RCUT1 and ML_RCUT2) within which the descriptors of the local atomic environments are computed. For further information about hyperparameters and their optimization, we recommend Hyperparameter Optimization by M. Feurer and F. Hutter. In our experience, the hyperparameters listed in the following table are the most important to consider in order to optimize the accuracy and performance:

parameter         function                                                          class
ML_RCUT1          cutoff radius                                                     two-body descriptor
ML_RCUT2          cutoff radius                                                     three-body descriptor
ML_WTIFOR         weight factor for the atomic forces                               fitting
ML_WTOTEN         weight factor for the total energies                              fitting
ML_WTSIF          weight factor for the stress tensor                               fitting
ML_EPS_LOW        sparsification threshold for the local reference configurations   sparsification
ML_RDES_SPARSDES  sparsification factor for the descriptors                         sparsification

This is only a small subset of the hyperparameters available in VASP. For more information about these hyperparameters and for a full list of the available parameters, please check the VASP Wiki.

In this exercise, you are going to scan the ML_RCUT1 parameter and try to improve your force field.

Cross-validation is a method to judge the quality of the applied hyperparameters. As a preparatory step, make sure that there is no overfitting by checking that the training-set error is not significantly smaller than the test-set error. Next, optimize the hyperparameters by varying them systematically; randomly changing hyperparameters is unlikely to improve the quality of the force field. Then, repeat the comparison of the training-set and test-set errors for the different hyperparameter values, as shown in the previous exercise.

For instance, to find the optimal value of ML_RCUT1, we compute the training-set errors (ML_LOGFILE file) and force fields (ML_FF file) for several values of ML_RCUT1. The ML_FF files allow performing an error analysis on the test set using py4vasp as in the previous exercise. To reduce computational time and disk usage, it is sufficient to consider only the training-set error during the parameter scan and then to analyze the test set only for the MLFF model with the lowest training-set error.
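As an illustration of how such a scan can be evaluated automatically, the following minimal Python sketch collects the last ERR data line of each ML_LOGFILE_<rcut> file (the naming follows the scan script used in this exercise) and reports the cutoff with the lowest force RMSE.


import glob

# Collect the last ERR data line of every ML_LOGFILE_<rcut> file
# (the file names follow the scan script of this exercise).
results = {}
for path in glob.glob("ML_LOGFILE_*"):
    rcut = float(path.rsplit("_", 1)[-1])
    with open(path) as f:
        err_lines = [line for line in f if line.startswith("ERR")]
    # ERR data line: counter, rmse_energy (eV/atom), rmse_force (eV/Å), rmse_stress (kbar)
    _, _, rmse_energy, rmse_force, rmse_stress = err_lines[-1].split()
    results[rcut] = (float(rmse_energy), float(rmse_force), float(rmse_stress))

best_rcut = min(results, key=lambda r: results[r][1])  # lowest force RMSE
print("best ML_RCUT1:", best_rcut, results[best_rcut])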

2.2 Input

Check out the input at $TUTORIALS/MLFF/e02_hyperparameter_scan.

INCAR.refit:


ML_LMLFF = TRUE
ML_MODE  = refit
ML_RCUT1 = #vary this value
KSPACING = 300

INCAR.run:


IBRION   = -1   # no ionic update (default)
ML_LMLFF = TRUE
ML_MODE  = run
KSPACING = 300

POSCAR:


Click to see the structure!
Li H
   1.00000000000000
     4.0271000861999999    0.0000000000000000    0.0000000000000000
     0.0000000000000000    4.0271000861999999    0.0000000000000000
     0.0000000000000000    0.0000000000000000    4.0271000861999999
   Li   H
     4     4
Direct
  0.0021931998366077  0.9377313783581447  0.0399062356234574
  0.9873787948250990  0.4401410952913111  0.5508732802249812
  0.4655028962322062  0.0412653312320183  0.4666666187329044
  0.5673178577048460  0.5946132074966266  0.9329094887162737
  0.4620390886414084  0.4849929008378770  0.4952311142297970
  0.4556333788011961  0.0088289694138672  0.0177579698840076
  0.9786514408600437  0.4389104900251807  0.0101037431973456
  0.9467029288350567  0.9708730319095862  0.5445141862785567

ML_AB (will be copied from exercise 1)


Collection of ab-initio data from previous calculations for LiH: Lattice vectors, atomic positions, energies, forces, and stress tensors.


POTCAR


Pseudopotentials for two atom types.


The two central input files are the ML_AB file and the INCAR file. While VASP requires the POSCAR and the POTCAR files, they do not affect the hyperparameter scan. We avoid using a KPOINTS file by including the KSPACING tag in the INCAR file. In any case, the reciprocal lattice is irrelevant when refitting a force field.

We are using the ML_AB file from the previous exercise. Here, the ab-initio data contained in the ML_AB file will be refitted several times with different cutoff radii for the two-body descriptor. Create the following script and run it using bash run_hyperparam_scan.sh; it generates an INCAR file for each value of ML_RCUT1 and runs VASP:

run_hyperparam_scan.sh


#!/usr/bin/env bash

for rcut in 12.0 13.0 14.0
do
  echo "RCUT1" $rcut
  # make INCAR file on-the-fly
  cat << EOF > INCAR
ML_LMLFF  = TRUE
ML_MODE   = refit
ML_RCUT1  = ${rcut}
KSPACING  = 300
EOF
  mpirun -np 4 vasp_gam
  cp ML_LOGFILE ML_LOGFILE_${rcut}
  cp ML_FFN ML_FF_${rcut}
done

2.3 Calculation

To execute the parameter scan for ML_RCUT1, execute the run_hyperparam_scan.sh script.

cd $TUTORIALS/MLFF/e02_*
ln -s ../e01_*/ML_AB .
bash run_hyperparam_scan.sh

After running the calculation, plot the training-set error over the proposed cutoff radii by entering

bash extract_training_error.sh

This will run the following script to extract the training-set errors from the ML_LOGFILE files.

extract_training_error.sh


#!/usr/bin/env bash

rm training_error.dat
# write header to file
echo "# rcut  energy        force         stress " > training_error.dat
for rcut in 12.0 13.0 14.0
do
   # extract energy, force, stress training-set 
   # errors from ML_LOGFILE and 
   # write to output file
   grep ERR ML_LOGFILE_${rcut} | tail -n 1 | awk -v r=${rcut} '{print r, $3, $4, $5}' 
done >> training_error.dat

In [8]:
%%bash
cat e02_*/training_error.dat
# rcut  energy        force         stress 
12.0 9.47049971E-05 7.68544454E-03 4.93079194E-01
13.0 9.42368836E-05 7.64456267E-03 4.83259618E-01
14.0 9.85727612E-05 7.65661277E-03 5.12837039E-01

Now, there is a file named training_error.dat that you can plot by executing the code provided below. In addition to the three data points that were computed, there is also a file called training_error_store.dat which contains more data points. The tutorial is reduced to just three data points in order to save time.

In [9]:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

data = np.loadtxt("./e02_hyperparameter_scan/training_error.dat")
data_store = np.loadtxt("./e02_hyperparameter_scan/training_error_store.dat")
In [10]:
fig1 = px.scatter(x=data_store[:,0], y=data_store[:,1], labels={"x":"ML_RCUT1 [Å]", "y":"RMSE Energy [eV]"},color_discrete_sequence=['blue'])
fig2 = px.line(x=data_store[:,0], y=data_store[:,1],color_discrete_sequence=['blue'])
# plotting your data
fig3 = px.scatter(x=data[:,0], y=data[:,1], color_discrete_sequence=['red'] )
fig4 = px.line(x=data[:,0], y=data[:,1],color_discrete_sequence=['red'])

fig5 = go.Figure(data=fig1.data + fig2.data + fig3.data + fig4.data, layout = fig1.layout)
fig5.show()
In [11]:
fig1 = px.scatter(x=data_store[:,0], y=data_store[:,2], labels={"x":"ML_RCUT1 [Å]", "y":"RMSE Force [eV/Å]"},color_discrete_sequence=['blue'])
fig2 = px.line(x=data_store[:,0], y=data_store[:,2],color_discrete_sequence=['blue'])
fig3 = px.scatter(x=data[:,0], y=data[:,2], color_discrete_sequence=['red'] )
fig4 = px.line(x=data[:,0], y=data[:,2],color_discrete_sequence=['red'] )
fig5 = go.Figure(data=fig1.data + fig2.data + fig3.data + fig4.data, layout = fig1.layout)
fig5.show()
In [12]:
fig1 = px.scatter(x=data_store[:,0], y=data_store[:,3], labels={"x":"ML_RCUT1 [Å]", "y":"RMSE Stress [kBar]"},color_discrete_sequence=['blue'])
fig2 = px.line(x=data_store[:,0], y=data_store[:,3],color_discrete_sequence=['blue'])
fig3 = px.scatter(x=data[:,0], y=data[:,3],color_discrete_sequence=['red'] )
fig4 = px.line(x=data[:,0], y=data[:,3],color_discrete_sequence=['red'])
fig5 = go.Figure(data=fig1.data + fig2.data + fig3.data + fig4.data, layout = fig1.layout)
fig5.show()

The plots show three different errors: the energy error, the force error, and the error of the stress tensor. The red part of each plot shows the data points computed during this tutorial. In principle, it is always good to have all three errors as small as possible, but there might also be cases where the user has to pay special attention to one of them. For example, for the computation of defect-formation energies, the MLFF model should be optimized to minimize the energy error. In contrast, phonon frequencies require an MLFF model with minimal errors in the forces. To pay special attention to one of the errors, weights for the quantity of interest can be set during fitting: the importance of the total energy is increased with ML_WTOTEN, the importance of the forces is set by ML_WTIFOR, and the weight factor for the stress tensor by ML_WTSIF. Focus on the stress tensor if you are interested in an accurate prediction of the volume.

From the plot we see that the model with ML_RCUT1 = 13.0 is the most accurate. Thus, we keep ML_FF_13.0 for further analysis. In particular, we repeat the steps of the test-set analysis. Execute the following commands in the bash shell:

ln -s ML_FF_13.0 ML_FF
cp INCAR.run INCAR
bash test_set_analysis.sh

This will run the script test_set_analysis.sh


#!/usr/bin/env bash

mkdir MLFF_data

for i in {1..50}
do
   echo "test configuration " $i
   cp ../test_set/structures/POSCAR.${i} POSCAR
   mpirun -np 4 vasp_gam
   cp ./vaspout.h5 ./MLFF_data/vaspout_${i}.h5
done

In [13]:
from py4vasp import MLFFErrorAnalysis
mlff_error_analysis = MLFFErrorAnalysis.from_files(
    dft_data="./test_set/DFTdata/*.h5",
    mlff_data="./e02_hyperparameter_scan/MLFF_data/*.h5",
    )
energy_error = mlff_error_analysis.get_energy_error_per_atom(normalize_by_configurations=True)
force_error = mlff_error_analysis.get_force_rmse(normalize_by_configurations=True)
stress_error = mlff_error_analysis.get_stress_rmse(normalize_by_configurations=True)
print(energy_error,force_error,stress_error)
7.56458395150883e-05 0.007860020568993053 0.32776810061376616

In principle, one should repeat this procedure for every hyperparameter available in VASP to obtain the best MLFF model; in practice, however, it is not possible to scan all hyperparameters. Usually, VASP has reasonable default choices for these parameters because they were optimized on a selected set of bulk materials and on a molecular database. Depending on the intended amount of production calculations for your force field, it might be worth considering the parameters in the table above.

2.4 Questions

  1. What are hyperparameters?
  2. In this exercise, we optimized the hyperparameters. Compared to the previous exercise, how much did this optimization improve the error in the forces?
  3. Can hyperparameter optimization increase the performance of the force field?

3 Optimization of MLFF performance

3.1 Task

Optimize the performance of a force field based on an existing ML_AB file.

Efficient sparsification increases the performance of a force field. For instance, VASP can reduce the number of angular descriptors of the MLFF model using the CUR algorithm (ML_LSPARSDES). The descriptor-covariance matrix $A$ is decomposed into a product of three matrices, $C$, $U$, and $R$, where $C$ consists of a small number of actual columns of $A$, $R$ consists of a small number of actual rows of $A$, and $U$ is a small matrix constructed such that the product $CUR$ is close to $A$. While the original CUR algorithm was developed to efficiently select a few significant columns of the matrix $A$, we use it to search for insignificant angular descriptors and remove them. This decision is based on the leverage scores and on ML_RDES_SPARSDES.

For detailed information on the CUR algorithm, check out Mahoney and Drineas, Proc. Natl. Acad. Sci. 106(3), 697-702 (2009) and Jinnouchi et al., J. Chem. Phys. 152, 234102 (2020).
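To illustrate the idea behind leverage-score-based selection, here is a small self-contained numpy sketch of a simplified column-selection (CX) variant of CUR applied to a random matrix; it is only meant to convey the concept and is not the algorithm as implemented in VASP.


import numpy as np

# Random stand-in for the descriptor matrix
rng = np.random.default_rng(2)
A = rng.normal(size=(60, 40))

k = 10      # rank used for the leverage scores
keep = 20   # number of columns (descriptors) to keep

# Leverage scores from the top-k right singular vectors
_, _, Vt = np.linalg.svd(A, full_matrices=False)
leverage = np.sum(Vt[:k] ** 2, axis=0) / k

# Keep the columns with the largest leverage scores, drop the rest
cols = np.sort(np.argsort(leverage)[-keep:])
C = A[:, cols]

# U chosen by least squares such that C @ U approximates A
U = np.linalg.pinv(C) @ A
print("relative approximation error:", np.linalg.norm(A - C @ U) / np.linalg.norm(A))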

Removing descriptors increases the performance of the force field at the cost of its accuracy. To find an optimal tradeoff between accuracy and performance, one can construct a Pareto front with the timing of the code on the x-axis and the test-set accuracy on the y-axis. From the Pareto front you can select the optimal value of ML_RDES_SPARSDES. This requires the following steps for each value of ML_RDES_SPARSDES:

  • refit the MLFF model with a certain ML_RDES_SPARSDES value

  • determine the test-set accuracy of the MLFF model (as in exercise 1)

  • determine the timing of the MLFF model

To obtain meaningful timings, you need to average the timing of the force-evaluation routine over several MD steps. Moreover, writing to files is usually very slow, and hence file output should be suppressed as far as possible during a timing run. This is achieved by setting ML_OUTBLOCK and ML_OUTPUT_MODE.
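Once you have collected a timing and a test-set error for each candidate value of ML_RDES_SPARSDES, a few lines of Python suffice to pick out the Pareto-optimal models, i.e., those for which no other model is both faster and more accurate. The numbers below are made up for illustration (the entries for 1.0 and 0.5 loosely echo the values obtained later in this exercise).


# Made-up (timing in s per MD step, force RMSE in eV/Å) pairs, one per
# ML_RDES_SPARSDES value; replace them with your measured numbers.
candidates = {
    1.0: (0.0127, 0.0079),
    0.8: (0.0125, 0.0090),
    0.7: (0.0118, 0.0080),
    0.5: (0.0111, 0.0082),
    0.3: (0.0105, 0.0095),
}

def dominated(key):
    # A model is dominated if another model is at least as fast and as accurate
    # (and strictly better in at least one of the two).
    t, e = candidates[key]
    return any(t2 <= t and e2 <= e and (t2, e2) != (t, e)
               for t2, e2 in candidates.values())

pareto = sorted(k for k in candidates if not dominated(k))
print("Pareto-optimal ML_RDES_SPARSDES values:", pareto)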

3.2 Input

Check out the input at $TUTORIALS/MLFF/e03_sparsification!

INCAR.sparse:


ML_LMLFF  = True
ML_MODE   = refit
# the previously determined cutoff is kept
ML_RCUT1  = 13.0

ML_LSPARSDES     = True
ML_RDES_SPARSDES = 0.5

KSPACING  = 300

INCAR.run:


IBRION   = -1   # no ionic update (default)
ML_LMLFF = TRUE
ML_MODE  = run
KSPACING = 300

INCAR.timing:


# molecular dynamics (MD)
IBRION = 0     # switch on MD
NSW    = 100   # number of MD steps
POTIM  = 2.0   # time step in fs

# NVT ensemble
MDALGO = 3               # Langevin thermostat
TEBEG  = 300             # temperature
LANGEVIN_GAMMA = 3.0 3.0 # friction coefficients
ISIF   = 2               # const. volume

RANDOM_SEED    = 10 0 0

# production run
ML_LMLFF = .TRUE.
ML_MODE  = RUN

ML_OUTBLOCK    = 100
ML_OUTPUT_MODE = 0

KSPACING  = 300

POSCAR:


Click to see the structure!
Li H
   1.00000000000000
     4.0271000861999999    0.0000000000000000    0.0000000000000000
     0.0000000000000000    4.0271000861999999    0.0000000000000000
     0.0000000000000000    0.0000000000000000    4.0271000861999999
   Li   H
     4     4
Direct
  0.0021931998366077  0.9377313783581447  0.0399062356234574
  0.9873787948250990  0.4401410952913111  0.5508732802249812
  0.4655028962322062  0.0412653312320183  0.4666666187329044
  0.5673178577048460  0.5946132074966266  0.9329094887162737
  0.4620390886414084  0.4849929008378770  0.4952311142297970
  0.4556333788011961  0.0088289694138672  0.0177579698840076
  0.9786514408600437  0.4389104900251807  0.0101037431973456
  0.9467029288350567  0.9708730319095862  0.5445141862785567

ML_AB


Collection of ab-initio data from previous calculations for LiH: Lattice vectors, atomic positions, energies, forces, and stress tensors.


POTCAR


Pseudopotentials for two atom types.


In this exercise, we use two different INCAR files:

  • The INCAR.sparse file sparsifies the three-body descriptors and refits the force field with the reduced number of descriptors. We set ML_RCUT1 = 13.0 because the last exercise determined it to be the optimal value.
  • The INCAR.run file computes the energy, forces, and stress tensor for a given POSCAR file.

We are reusing the ML_AB file from the previous exercises to have access to ab-initio data.

The POSCAR file defines the initial structure for the MD run and is not used during the sparsification. The POTCAR file is not used in either case.

3.3 Calculation

Perform the sparsification by entering the following into the terminal:

cd $TUTORIALS/MLFF/e03_*
ln -s ../e01_*/ML_AB
cp INCAR.sparse INCAR
mpirun -np 4 vasp_gam
cp ML_LOGFILE ML_LOGFILE.sparse
cp ML_FFN ML_FF.sparse

Now, we can compare the number of descriptors of the accurate model from the previous exercise with the sparsified model. The number of descriptors can be found in the ML_LOGFILE with the keyword NDESC. Enter the grep command as follows:

grep "NDESC" ML_LOGFILE.sparse
In [14]:
%%bash
grep "NDESC" e03_*/ML_LOGFILE.sparse
# NDESC #####################################################
# NDESC This line shows the number of radial and angular descriptors for each element.
# NDESC 
# NDESC nstep ....... MD time step or input structure counter
# NDESC descrad ..... Number of radial descriptors
# NDESC el .......... Element symbol
# NDESC descang ..... Number of angular descriptors for a given element type
# NDESC #####################################################
# NDESC             nstep   descrad el   descang el   descang
# NDESC                 2         3  4         5  6         7
# NDESC #####################################################
# NDESC_SIC #######################################
# NDESC_SIC This line shows the number of self-interaction  
# NDESC_SIC correction (SIC) terms for the angular descriptor.
# NDESC_SIC 
# NDESC_SIC nstep ....... MD time step or input structure counter
# NDESC_SIC el .......... Element symbol
# NDESC_SIC descsic ..... Number of SIC descriptors for a given element type
# NDESC_SIC #######################################
# NDESC_SIC         nstep el   descsic el   descsic
# NDESC_SIC             2  3         4  5         6
# NDESC_SIC #######################################
NDESC                   0        24 Li       225 H        225

The number of descriptors can be found in the last line of the output. Compare the number of descriptors before and after sparsification by executing the following in the bash shell:

grep NDESC ../e02_*/ML_LOGFILE_13.0 | tail -n 1
grep NDESC ML_LOGFILE.sparse | tail -n 1
In [15]:
%%bash
grep NDESC e02_*/ML_LOGFILE_13.0 | tail -n 1
grep NDESC e03_*/ML_LOGFILE.sparse | tail -n 1
NDESC                   0        24 Li       452 H        452
NDESC                   0        24 Li       225 H        225

From the output we can deduce that the number of radial descriptors is the same in both cases, namely 24. The number of angular descriptors is roughly halved, from 452 to 225, for both atom types, consistent with ML_RDES_SPARSDES = 0.5.

Next, we analyse the obtained force field in the same way as in exercise 1. We have to loop over the structures of our test set and store the vaspout.h5 files. Then, we execute the error analysis in py4vasp.

cp INCAR.run INCAR
ln -s  ML_FF.sparse ML_FF
bash test_set_analysis.sh
In [16]:
from py4vasp import MLFFErrorAnalysis
mlff_error_analysis = MLFFErrorAnalysis.from_files(
    dft_data="./test_set/DFTdata/*.h5",
    mlff_data="./e03_sparsification/MLFF_data/*.h5",
)
energy_error = mlff_error_analysis.get_energy_error_per_atom(normalize_by_configurations=True)
force_error = mlff_error_analysis.get_force_rmse(normalize_by_configurations=True)
stress_error = mlff_error_analysis.get_stress_rmse(normalize_by_configurations=True)
print(energy_error,force_error,stress_error)
7.942174938917468e-05 0.008208206344436218 0.37269617625407725

The following table shows the test-set errors in the forces for the three force fields you have trained so far:

                       standard MLFF   MLFF after ML_RCUT1 scan   sparsified MLFF
force test-set error   8.5E-3 eV/Å     7.9E-3 eV/Å                8.2E-3 eV/Å

What can you conclude from the test-set errors in the table?

Click to see the answer!

The sparsification increases the test-set error slightly but reduces the number of angular descriptors by half for every atom type.

The next step is to evaluate the performance of the MLFF models. The measured timings are strongly affected by the number of jobs running simultaneously on the compute node. Nevertheless, let us discuss how you could analyze the performance of the MLFF models.

  • First, reduce the I/O of your calculation. The INCAR.timing shows how to reduce the output written during a timed MD run. To run the calculation, you would copy INCAR.timing to INCAR and execute VASP.

  • Run calculations for the different MLFF models that you want to compare and save the OUTCAR files.

  • Finally, average the timing of an MD step by grepping the lines containing LOOP+ from the OUTCAR file, which contain the CPU time, and averaging over them. The following bash lines achieve this for OUTCAR.sparse_timing and OUTCAR.no-sparse_timing, which we have prepared for you in advance. Enter the following into the terminal:

grep LOOP+ OUTCAR.sparse_timing | awk '{sum+=$7;n++} END {print sum/n}'
grep LOOP+ OUTCAR.no-sparse_timing | awk '{sum+=$7;n++} END {print sum/n}'

Note that here we are using a rather small database of ab-initio data (ML_AB file). The larger the database, the more the sparsification impacts the timing. The maximum achievable speed-up factor is approximately 1/ML_RDES_SPARSDES.

In [17]:
%%bash
grep LOOP+ e03_*/OUTCAR.sparse_timing | awk '{sum+=$7;n++} END {print sum/n}'
grep LOOP+ e03_*/OUTCAR.no-sparse_timing | awk '{sum+=$7;n++} END {print sum/n}'
0.0110761
0.0127074
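From the two averaged loop times above, the measured speed-up follows directly; for this small database it stays well below the bound of roughly 1/ML_RDES_SPARSDES = 2 mentioned above:


t_sparse = 0.0110761      # averaged LOOP+ time with sparsification (s)
t_no_sparse = 0.0127074   # averaged LOOP+ time without sparsification (s)
print(f"speed-up from sparsification: {t_no_sparse / t_sparse:.2f}x")  # about 1.15x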

3.4 Questions

  1. How can you find an optimal tradeoff between performance and accuracy?
  2. What does ML_RDES_SPARSDES specify?
  3. What output is written if ML_OUTPUT_MODE=1?

4 Using the optimized MLFF model for production runs

4.1 Task

In the last part of this exercise, the obtained sparse MLFF model is used for a small production run. Since the model is now properly trained and all its parameters are stored in the ML_FF file, there is no longer any need to set them in the INCAR file. The only tags that have to be set in the INCAR file for the finished MLFF model are ML_LMLFF = TRUE and ML_MODE = run. For reference, the hyperparameters used during the creation of the MLFF model are stored in the first line of the ML_FF file; the rest of the file is stored in a binary format. To check the hyperparameters of the ML_FF model, execute the following commands

cd $TUTORIALS/MLFF/e04_*
ln -s ../e03_*/ML_FF.sparse ML_FF
head -n 1 ML_FF

which should produce the following output:

ML_FF 0.2.1 binary { "date" : "2023-09-22T14:43:08.294", "ML_LFAST" : True, "ML_DESC_TYPE" :   0, "types" : [ "Li", "H" ],
"training_structures" : 122, "local_reference_cfgs" : [ 737, 752 ], "descriptors" : [ 249, 249 ], "ML_IALGO_LINREG" : 4,
"ML_RCUT1" :  1.3000E+01, "ML_RCUT2" :  5.0000E+00, "ML_W1" :  1.0000E-01, "ML_SION1" :  5.0000E-01, "ML_SION2" :  5.0000E-01,
"ML_LMAX2" : 3, "ML_MRB1" : 12, "ML_MRB2" : 8, "ML_IWEIGHT" : 3, "ML_WTOTEN" :  1.0000E+00, "ML_WTIFOR" :  1.0000E+00,
"ML_WTSIF" :  1.0000E+00 } 

You see that we optimized ML_RCUT1 to a value of 13.0 and that the number of descriptors is [249, 249] for the atom types Li and H, respectively.
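Since the first line of the ML_FF file is plain text, you can also extract individual settings programmatically. The following small sketch reads only that line and pulls out ML_RCUT1 with a regular expression; the file name and tag are the ones used in this exercise.


import re

# Read only the human-readable first line of the (otherwise binary) ML_FF file.
with open("ML_FF", "rb") as f:
    header = f.readline().decode(errors="ignore")

# Extract a single tag, e.g. ML_RCUT1, from the header line
match = re.search(r'"ML_RCUT1"\s*:\s*([0-9.Ee+-]+)', header)
if match:
    print("ML_RCUT1 =", float(match.group(1)))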

To use our MLFF model for a production run, we are going to run a molecular-dynamics (MD) simulation and compute the pair-distribution function. The pair-distribution function is related to the static structure factor by means of a Fourier transform. The static structure factor is an interesting quantity because it can be obtained directly from scattering experiments such as X-ray or neutron diffraction. Therefore, the pair-distribution function allows us to validate our model against experimental data. For further information on the structure factor, see L. Van Hove, Phys. Rev. 95, 249 (1954).
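To make the connection between the pair-distribution function g(r) and the static structure factor S(q) explicit, the following sketch evaluates the standard isotropic relation S(q) = 1 + 4πρ ∫ r² [g(r) − 1] sin(qr)/(qr) dr numerically. The number density and g(r) used here are made-up placeholders; in practice you would insert the density of your cell and the g(r) obtained from the MD run.


import numpy as np

# Made-up number density (atoms/Å^3) and placeholder g(r); in practice use the
# density of your cell and the pair-distribution function from the MD run.
rho = 0.06
r = np.linspace(0.01, 10.0, 2000)                  # Å
dr = r[1] - r[0]
g = 1.0 + np.exp(-(r - 2.0) ** 2) * np.cos(3.0 * (r - 2.0))

# S(q) = 1 + 4*pi*rho * integral of r^2 [g(r) - 1] sin(qr)/(qr) dr
q = np.linspace(0.5, 12.0, 200)                    # Å^-1
kernel = np.sinc(np.outer(q, r) / np.pi)           # sin(qr)/(qr)
S = 1.0 + 4.0 * np.pi * rho * (kernel * r**2 * (g - 1.0)).sum(axis=1) * dr
print(S[:5])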

In this part of the tutorial, we will determine the pair-distribution function of LiH at a temperature of 450 K and a pressure of 1 bar. To maintain the temperature, we use a Langevin thermostat.

4.2 Input

Check out the input at $TUTORIALS/MLFF/e04_production.

INCAR


# molecular dynamics (MD)
IBRION = 0     # switch on MD
NSW    = 2000  # number of MD steps
POTIM  = 2.0   # time step in fs

# NVT ensemble
MDALGO = 3               # Langevin thermostat
TEBEG  = 450             # temperature
LANGEVIN_GAMMA = 0.1 0.1 # friction coefficients
ISIF   = 2               # const. volume

RANDOM_SEED = 10 0 0

# production run
ML_LMLFF = .TRUE.
ML_MODE  = RUN
ML_OUTPUT_MODE = 1 # Ensures that PCDAT is printed

# Misc
KSPACING  = 300
ISYM = -1

POSCAR


LiH
   1.0000000000000000
     7.8287847029081057    0.0184022147739740    0.0436744842177140
    -0.0000000000059867    7.8831855213703577    0.0109444497488786
    -0.0000000000095877   -0.0000000000034072    7.6332104165900363
  Li             H
              32              32
Direct
   1.9817054671128955 -0.4615590061054727 -1.4210973981086219
   1.9908342020561749 -0.4245884677547543 -1.9403613271831182
   2.0291987979331716  0.0764675410482946 -1.4473190803670182
   2.0003819712963882  0.0434600233223213 -1.9372585128028623
   1.5123565198129114 -0.4631422998001105 -1.4223489601879342
   1.5009761278969092 -0.4453976364291092 -1.9415619597553522
   1.4994078683320884  0.0453111416662948 -1.4395625528728533
   1.4920787358153047  0.0749759754227328 -1.9554193908340134
   1.4771192200171763 -0.6956853853527750 -1.1740875527008734
   1.4857876526841061 -0.6917686740837428 -1.6720984387748790
   1.4982238953402405 -0.2052814837011414 -1.2032635147179722
   1.5076711504068163 -0.1525020681581678 -1.6625242247821244
   2.0237745834200087 -0.6865011313240997 -1.1880793891256338
   1.9619016330665227 -0.6920859965869365 -1.7098418060597396
   1.9848255490801441 -0.1805732277990417 -1.1730681923310977
   1.9930394513513248 -0.1840355873006903 -1.6854182262485986
   1.2413563659888442  0.0625339636088172 -1.1946327025499690
   1.2718302311426040  0.0484902467170686 -1.6864281291080219
   1.2314950738872033 -0.4199584043288853 -1.1989048647882197
   1.2661536447359654 -0.4379143690275620 -1.6987405959330355
   1.7628840424723418  0.0709537191751443 -1.2065539892018160
   1.7525527647373844  0.0770286007123581 -1.6762495528836525
   1.7485055244828722 -0.4215982120264667 -1.1845971178647334
   1.7448584151990505 -0.4067962476727136 -1.6789000393433424
   1.2150187769175835 -0.6663000573428698 -1.9216020378849861
   1.2524959087735665 -0.6812575935175010 -1.4462894477063559
   1.2531731761067684 -0.1867855213621671 -1.9291100549777591
   1.2259867851441322 -0.1856967875773643 -1.4252590344016063
   1.7493982411143223 -0.6973680885139401 -1.9354784312219775
   1.7260537783199135 -0.7018034121966544 -1.4161537454728923
   1.7208441337659361 -0.1685614662605335 -1.9438107029069940
   1.7333301563888721 -0.1708630537252926 -1.4381194728742182
   1.2649257565892971 -0.6556014343128211 -1.1817810013660568
   1.2585150056419852 -0.6950521445902047 -1.7022316528699457
   1.2366899715948825 -0.2016212867097995 -1.1889179623891479
   1.2435972373518811 -0.1674805421763512 -1.6746199753397368
   1.7360565085394983 -0.7014677640229965 -1.1806409696057392
   1.7359989851552748 -0.7073444407917674 -1.6708498403301331
   1.7474656103131458 -0.1777402172080236 -1.1826824484643708
   1.7624647683600680 -0.1757906844274497 -1.6801347751005680
   1.2674652456927549  0.0503884660816078 -1.4356097484617691
   1.2310466249001411  0.0327844150204245 -1.9403971909709119
   1.2098635737050121 -0.4263929003423095 -1.4182292509337020
   1.2265347871913885 -0.4279538904443739 -1.9537757614594222
   1.7283483994297175  0.0558563680517394 -1.4483120337520408
   1.7756017110881346  0.0672337699055441 -1.9278288436644531
   1.7750253655838191 -0.4648904046255498 -1.4549195521056619
   1.7295923315567368 -0.4219817905587360 -1.9470964548760867   
   1.4770209675407064 -0.6938438306751112 -1.4335017836486030
   1.4723646353634945 -0.6770195558022604 -1.8958872528692552
   1.4723304891734692 -0.2127593172465675 -1.4473881729001288
   1.4891149694386838 -0.1626854203039640 -1.9603811580254049
   1.9926313874455659 -0.6908264960781072 -1.4497987024242049
   1.9867827816577046 -0.6980021465438837 -1.9437561167974060
   1.9931966385787605 -0.1915982553314239 -1.4347752030340506
   1.9783410196089279 -0.1914865493283802 -1.9326203981603027
   1.4764840292260042 -0.4461711698868597 -1.1775341299154893
   1.4875834918595106 -0.4154993260957937 -1.6775044859519459
   1.4810701055683364  0.0320617038904891 -1.2116883860657408
   1.5095460900056492  0.0791435657918060 -1.6881031427292668
   1.9825393542246055 -0.4577940687402631 -1.1998173435832227
   1.9904498339985650 -0.4649735127116422 -1.6891634696973543
   2.0269030501276393  0.0453801235753682 -1.1771020717088660
   1.9888483400537196  0.0679206981683836 -1.6933672606587671

ML_FF


Force field for LiH.


POTCAR


Pseudopotentials for two atom types.


Check the meaning of all INCAR tags by referring to the VASP Wiki!

The POSCAR used during this exercise was already pre-equilibrated and contains starting velocities.

We want to use the force field created in the previous exercise. To avoid duplicating a large file, link it with the following commands:

cd $TUTORIALS/MLFF/e04_*
ln -s  ../e03_*/ML_FF.sparse ML_FF

The POTCAR file is again a dummy file because we do a pure MLFF model run. It has to be present but its content is of minor importance.

4.3 Calculation

To execute the production run with the performant MLFF model, enter the following commands

cd $TUTORIALS/MLFF/e04_*
mpirun -np 4 vasp_std

First, plot the energy and temperature to confirm that the energy remains stable during the MD run and that the temperature fluctuates around the desired value.

In [18]:
from py4vasp import Calculation

my_calc = Calculation.from_path("./e04_production")

my_calc.energy[:].plot("TOTEN temperature")

If both quantities fluctuate around a constant average value, we can conclude that the MD run was executed properly.

Next, plot the pair-distribution function!

In [19]:
my_calc.pair_correlation.plot()

How could you check the energy and temperature without py4vasp?

Click to see the answer!

You could extract the energy and temperature from the OSZICAR file and then plot them with your favorite tool. For instance, first enter the following commands into the terminal:

grep T= OSZICAR | awk '{print $7}' > energy.out
grep T= OSZICAR | awk '{print $3}' > temperature.out

Then, open gnuplot and run

p 'energy.out' w lp
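If you prefer numbers over a visual check, a short numpy sketch can quantify the temperature fluctuations from the temperature.out file created by the grep command above:


import numpy as np

# temperature.out was created above with: grep T= OSZICAR | awk '{print $3}'
T = np.loadtxt("temperature.out")
print(f"mean temperature: {T.mean():.1f} K, standard deviation: {T.std():.1f} K")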

How could you check the pair-distribution function without py4vasp?

Click to see the answer!

Compute the pair-distribution function with the following bash script:

extract_PCDAT.sh


#!/usr/bin/env bash
file=PCDAT
awk <$file >PCDAT.xy '
NR==8 { pcskal=$1}
NR==9 { pcfein=$1}
NR==7 { npaco=$1}
NR>=13 {  
  line=line+1
  if (line==1) s=s+1
  if (line==(npaco+1))  {
     print " "
     line=0
  }
  else  {
     a1[line]=  a1[line] + $1
     a2[line]=  a2[line] + $2
     a3[line]=  a3[line] + $3
     a4[line]=  a4[line] + $4
     print (line-0.5)*pcfein/pcskal,$1,$2, $3, $4, $5
  }
}'

The script will create a file called PCDAT.xy, which you can plot, e.g., using gnuplot:

set xlabel "r [Å]"
set ylabel "g(r)"
p 'PCDAT.xy' w l

Good job! You have finished Part 1!

After doing this tutorial you have accomplished the following tasks:

  • computing training-set and test-set errors and interpreting them
  • fine-tuning the hyperparameters of your model to increase the accuracy of your MLFF model
  • timing the force evaluation of the MLFF
  • optimizing the performance of the MLFF model by sparsifying the angular descriptors
  • using the produced MLFF model to compute real quantities that are comparable to experiment