Train

Training is run through a train operation. This operation accepts four input variables and forwards a potter (AbstractPotentialManager) object.

The input variables are:

  • potter:

    The potential manager. See Potentials for more details.

  • dataset:

    The dataset. See Trainers for more details.

  • trainer:

    The trainer configuration that defines the commands and the model configuration.

  • scheduler:

    Any scheduler. Training generally needs a GPU scheduler.

Note

The name set in potter and the name set in trainer should be the same (both are deepmd in the session configuration below).
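
The note above can be checked before a job is submitted. The following is only a minimal sketch: names_match is a hypothetical helper written for illustration, not a function of the tool itself, and it assumes the variables section has already been parsed into a plain dictionary.

```python
def names_match(variables: dict) -> bool:
    """Return True when the potter and trainer entries share the same name."""
    return variables["potter"]["name"] == variables["trainer"]["name"]


# Hand-written stand-in for the parsed `variables` section of a session file.
variables = {
    "potter": {"type": "potter", "name": "deepmd"},
    "trainer": {"type": "trainer", "name": "deepmd"},
}

if not names_match(variables):
    raise ValueError("potter and trainer must use the same name")
```

Running the check on the parsed file before submission avoids spending scheduler time on a job that is misconfigured from the start.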

Extra parameters:

  • size:

    Number of models trained at the same time. This is useful when a committee of models is needed later for uncertainty estimation.

  • init_models:

    A list of model checkpoints used to initialise the model parameters. The number of checkpoints should be the same as size.
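
The relation between size and init_models can be expressed as a small validation step. This is a sketch under stated assumptions: check_init_models is a hypothetical helper, not part of the tool, and the checkpoint paths are invented purely for illustration.

```python
def check_init_models(size: int, init_models: list) -> None:
    """Raise ValueError unless there is exactly one checkpoint per model."""
    if size < 1:
        raise ValueError(f"size must be a positive integer, got {size}")
    if len(init_models) != size:
        raise ValueError(
            f"expected {size} checkpoints in init_models, got {len(init_models)}"
        )


# Hypothetical checkpoint paths, one per committee member.
checkpoints = [f"./m{i}/model.ckpt" for i in range(4)]
check_init_models(size=4, init_models=checkpoints)  # passes silently
```

A list with fewer or more entries than size raises a ValueError, which makes a mismatch visible before any model is trained.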

Session Configuration

variables:
  dataset:
    type: dataset
    name: xyz
    dataset_path: ./dataset
    train_ratio: 0.9
    batchsize: 16
    # random_seed: 1112 # Set this if one wants to reproduce results
  potter:
    type: potter
    name: deepmd
    params:
      backend: lammps
      command: "lmp -in in.lammps > lmp.out 2>&1"
      type_list: ["H", "O"]
  trainer:
    type: trainer
    name: deepmd
    command: dp
    config: ${json:./config.json}
    train_epochs: 500
    # random_seed: 1112 # Set this if one wants to reproduce results
  scheduler_gpu:
    type: scheduler
    backend: slurm
    partition: k2-gpu
    time: "6:00:00"
    ntasks: 1
    cpus-per-task: 4
    mem-per-cpu: 4G
    gres: gpu:1
    environs: "conda activate deepmd\n"
operations:
  train:
    type: train
    potter: ${vx:potter}
    dataset: ${vx:dataset}
    trainer: ${vx:trainer}
    scheduler: ${vx:scheduler_gpu}
    size: 4
    init_models: # the number of checkpoints should be the same as size
      - ./model.ckpt
sessions:
  _train: train