Getting Started
Here, we would introduce several basic components of GDPy, namely potential, driver, worker. This section demonstrates how to use gdp to computate a number of structures.
The related commands are
# gdp -h for more info
$ gdp -h
# --- run simulations on local nodes or submitted to job queues
$ gdp -p ./worker.yaml compute ./structures.xyz
# - if -d option is used, results would be written to the folder `./results`
$ gdp -d ./results -p ./worker.yaml compute ./structures.xyz
An example input file (worker.yaml) is organised as follows:
potential:
... # define the backend, the model path and the specific parameters
driver:
... # define the init and the run parameters of a simulation
scheduler:
... # define a scheduler
Units
We use the following units through all input files:
Time fs, Length AA, Energy eV, Force eV/AA.
Potential
We have supported several MLIP formulations based on an AbstractPotentialManager
class to access driver, expedition, and training through workflows.
The example below shows how to define a deepmd potential using the ase backend in a yaml file:
# -- ase interface
potential:
name: deepmd # name of the potential
params: # potential-specifc params
backend: ase # ase or lammps
model: ./graph.pb
See Potentials section for more details.
Driver
After potential is defined, we need to further specify what simulation would be
perfomed in the driver section. A driver (AbstractDriver) is the basic
unit with an attacthed ase calculators for basic dynamics tasks, namely,
minimisation, molecular dynamics and transition-state search. Through a driver,
we can reuse the input file to perform the same simulation with several different
backends.
The example below shows how to define a driver in a yaml file:
driver:
backend: external # this means using the same backend as the calc
task: md # molecular dynamics (md) or minimisation (min)
init:
md_style: nvt # thermostat NVT
temp: 600 # temperature, Kelvin
timestep: 1.0 # fs
run:
steps: 100
See Driver section for more details.
Scheduler
With potential and driver defined, we can run simulations on local machines (directly in the command line). However, simulations, under most circumstances, would be really heavy even by MLIPs (imagine a 10 ns molecular dynamics). The simulations would ideally be dispatched to high performace clusters (HPCs).
The example below shows how to define a scheduler in a yaml file:
scheduler:
# -- currently, we only have slurm :(
backend: slurm
# -- scheduler script parameters
partition: k2-hipri
ntasks: 1
time: "0:10:00"
# -- environment settings
environs: "conda activate py37\n"
Worker
Worker that combines the above components is what we use throughout various workflows to deal with computations.
The example below shows how to define a worker in a yaml file:
potential:
name: deepmd # name of the potential
backend: ase # ase or lammps
params: # potential-specifc params
model: ./graph.pb
driver:
backend: external
task: md # molecular dynamics (md) or minimisation (min)
init:
md_style: nvt # thermostat NVT
temp: 600 # temperature, Kelvin
timestep: 1.0 # fs
run:
steps: 100
scheduler:
backend: slurm
partition: k2-hipri
ntasks: 1
time: "0:10:00"
environs: "conda activate py37\n"
to run a nvt simulation with given structures by deepmd on a slurm machine
# -- submit jobs...
# one structure for one job
$ gdp -p ./worker.yaml compute ./frames.xyz
nframes: 2
@@@DriverBasedWorker+run
cand100 JOBID: 10206151
cand96 JOBID: 10206152
@@@DriverBasedWorker+inspect
cand100 is running...
cand96 is running...
@@@DriverBasedWorker+inspect
cand100 is running...
cand96 is running...
@@@DriverBasedWorker+retrieve
# -- wait a few minutes...
# if jobs are not finished, run the command would retrieve nothing
$ gdp -p ./worker.yaml worker ./frames.xyz
nframes: 2
@@@DriverBasedWorker+run
@@@DriverBasedWorker+inspect
cand100 is running...
cand96 is running...
@@@DriverBasedWorker+inspect
cand100 is running...
cand96 is running...
@@@DriverBasedWorker+retrieve
# -- retrieve results...
$ gdp -p ./worker.yaml worker ./frames.xyz
nframes: 2
@@@DriverBasedWorker+run
@@@DriverBasedWorker+inspect
cand100 is finished...
cand96 is finished...
@@@DriverBasedWorker+inspect
@@@DriverBasedWorker+retrieve
*** read-results time: 0.0280 ***
new_frames: 2 energy of the first: -92.219757
nframes: 2
statistics of total energies: min -108.5682 max -92.2198 avg -100.3940
Note
If scheduler is not set in the yaml file, the default
LocalScheduler would be used. In other words, the simulations would be
directly run in the command line.