mace

gdp writes ./_train.xyz and ./_test.xyz into the training directory based on the dataset section, and generates a command line based on the trainer section.

Note that gdp overrides some parameters based on the dataset and trainer sections. The trainer.config section is converted to a command line such as python ./run_train.py --name='MACE_model' …, which is the training command currently supported by MACE.
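As an illustration, the conversion of trainer.config into a command line could look like the minimal sketch below. The helper name config_to_command is hypothetical and is not gdp's actual code. The handling of booleans is also an assumption: true values become bare flags such as --swa and false values are dropped.

```python
import shlex


def config_to_command(command: str, config: dict) -> str:
    """Turn a trainer.config mapping into a MACE command line.

    Hypothetical helper; gdp's own conversion may differ, especially for booleans.
    """
    parts = shlex.split(command)
    for key, value in config.items():
        if isinstance(value, bool):
            # Assumption: true booleans become bare flags (e.g. --swa), false ones are dropped.
            if value:
                parts.append(f"--{key}")
            continue
        parts.append(f"--{key}={value}")
    # shlex.join quotes any argument containing spaces or shell-special characters.
    return shlex.join(parts)


cmd = config_to_command(
    "python ./run_train.py",
    {"name": "MACE_model", "r_max": 4.0, "swa": True, "hidden_irreps": "128x0e + 128x1o"},
)
print(cmd)
# python ./run_train.py --name=MACE_model --r_max=4.0 --swa '--hidden_irreps=128x0e + 128x1o'
```

Quoting through shlex.join keeps values such as hidden_irreps, which contain spaces, intact as a single argument when the command is run by a shell.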

seed: Overridden by trainer.seed.
max_num_epochs: Overridden by trainer.train_epochs.
batch_size: Overridden by dataset.batchsize.
train_file: Overridden as ./_train.xyz.
valid_file: Overridden as ./_test.xyz.
valid_fraction: Always 0.
device: Automatically detected (either cpu or cuda). Apple Silicon is not supported.
config_type_weights: Must be a string instead of a dictionary, e.g. '{"Default": 1.0}'.
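The overrides listed above can be sketched as a single function that copies the user's config and replaces the affected keys. This is a minimal sketch, not gdp's implementation: apply_gdp_overrides and SUPPORTED_DEVICES are hypothetical names, and the device is passed in as a string rather than auto-detected.

```python
import copy

# Apple Silicon (PyTorch's "mps" device) is not supported.
SUPPORTED_DEVICES = ("cpu", "cuda")


def apply_gdp_overrides(config, *, seed, train_epochs, batch_size, device):
    """Return a copy of trainer.config with the parameters gdp overrides.

    Hypothetical helper mirroring the list above, not gdp's actual code.
    """
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {device!r}")
    weights = config.get("config_type_weights")
    if weights is not None and not isinstance(weights, str):
        raise TypeError("config_type_weights must be a string, e.g. '{\"Default\": 1.0}'")

    params = copy.deepcopy(config)  # leave the user's config untouched
    params.update(
        seed=seed,
        max_num_epochs=train_epochs,
        batch_size=batch_size,
        train_file="./_train.xyz",
        valid_file="./_test.xyz",
        valid_fraction=0,  # validation data comes from valid_file instead
        device=device,
    )
    return params


user_config = {"name": "MACE_model", "valid_fraction": 0.05, "config_type_weights": '{"Default": 1.0}'}
params = apply_gdp_overrides(user_config, seed=1112, train_epochs=10, batch_size=16, device="cpu")
print(params["valid_fraction"], params["max_num_epochs"])  # 0 10
```

This is why a valid_fraction set in trainer.config has no effect: it is always replaced by 0.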

Note

The training set is the data used to optimise the model parameters. The validation set is data that helps us monitor training progress and decide at which epoch to save the model. The test set is data that is neither trained on nor used in any decision about the model. Some training setups simplify these concepts and use a single test set for both validation and testing.
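Since valid_file is overridden as ./_test.xyz, gdp follows the simplified scheme: one held-out set serves as both validation and test set. The sketch below shows how the dataset keys train_ratio and random_seed could produce such a split. The function split_indices is hypothetical, and gdp's actual shuffling and rounding may differ.

```python
import random


def split_indices(n_frames: int, train_ratio: float, seed: int) -> tuple[list[int], list[int]]:
    """Shuffle frame indices reproducibly and split them into train and test sets.

    Hypothetical sketch of how train_ratio and random_seed could be used.
    """
    indices = list(range(n_frames))
    random.Random(seed).shuffle(indices)  # same seed gives the same split
    n_train = int(n_frames * train_ratio)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(100, train_ratio=0.9, seed=1112)
# The test frames would go to ./_test.xyz, which MACE also receives as valid_file,
# so this one set is used for both validation and testing.
print(len(train_idx), len(test_idx))  # 90 10
```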

See the MACE documentation for more information about configuration parameters. Example configuration:

dataset:
  name: xyz
  dataset_path: ./dataset
  train_ratio: 0.9
  batchsize: 16
  random_seed: 1112
trainer:
  name: mace
  command: python ./run_train.py
  config: # This section can be put into a separate file e.g. `./config.yaml`
    name: MACE_model
    valid_fraction: 0.05
    config_type_weights: '{"Default": 1.0}'
    E0s: {1: -12.6261, 8: -428.5812}
    model: MACE
    default_dtype: float32
    hidden_irreps: "128x0e + 128x1o"
    r_max: 4.0
    swa: true
    start_swa: 10
    ema: true
    ema_decay: 0.99
    amsgrad: true
    restart_latest: true
  type_list: ["H", "O"]
  train_epochs: 10
  random_seed: 1112

Warning

If swa is enabled, gdp does not check whether start_swa is smaller than max_num_epochs (set by trainer.train_epochs). If start_swa is larger than max_num_epochs, an error occurs when saving the model.
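Because gdp skips this check, one option is a small pre-flight check before launching training. The function check_swa below is a hypothetical sketch, not part of gdp or MACE. It conservatively requires start_swa to be strictly smaller than max_num_epochs and skips the check when start_swa is unset.

```python
def check_swa(config: dict, max_num_epochs: int) -> None:
    """Fail before training if SWA would start too late.

    Hypothetical pre-flight check; gdp does not perform it.
    """
    if not config.get("swa"):
        return
    start_swa = config.get("start_swa")
    # Conservatively require start_swa to be strictly smaller than max_num_epochs.
    if start_swa is not None and start_swa >= max_num_epochs:
        raise ValueError(
            f"start_swa ({start_swa}) must be smaller than max_num_epochs ({max_num_epochs})"
        )


check_swa({"swa": True, "start_swa": 5}, max_num_epochs=10)  # passes silently
```

Running such a check on the merged config costs nothing, whereas the error described above only surfaces after all epochs have finished.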