Compute+Select
==============

Basic computations are run with the `compute` operation. It accepts two input 
variables and one parameter, and forwards a list of workers that hold the 
computation results (for example, several MD/MIN trajectories). 

The two input variables are:

- builder: A node (variable or operation) that forwards structures.
- worker: Any `computer` variable.

The parameter is:

- batchsize: How to allocate simulations into jobs.

The `computer` variable is almost identical to the definition in `worker.yaml` shown 
in :ref:`computations`. The difference is in the key names, for example `potential` 
becomes `potter`.

Given the output of `compute`, the `extract` operation retrieves the trajectories 
and the `select` operation selects certain structures from them.

.. =====

.. |graph| image:: ../../images/gdpflow.png
    :width: 800

The workflow defined in the configuration below looks like this:

    |graph|

Session Configuration
----------------------

This configuration runs MD simulations, selects some structures for DFT single-point 
calculations, and transfers them to the dataset.

`read`, the `read_stru` operation, reads structures from a file. It is basically a wrapper 
around the `ase.io.read` function. Here, `./candidates.xyz` contains five structures.

`scan`, the `compute` operation, accepts **${op:read}** as input structures and runs 
the simulations defined in **${vx:dpmd_computation}**. In fact, `builder` can be any 
variable or operation that forwards structures, for instance the builders in 
Section :ref:`Builders` or the extract/select operations of other explorations. 
Because `temp` in the `driver` variable is `[150, 300, 600, 1200, 1500]`, there 
will be five workers, each running MD simulations at a different temperature.

`extract`, the `extract` operation, reads the trajectories produced by `scan` and forwards 
an **AtomsArray** with a shape of (5, 5, 1000). The dimensions are the number of workers, 
the number of input structures, and the length of each trajectory.
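
As a quick sanity check, the shape quoted above can be recomputed from the values 
in the configuration. This is only a sketch: it assumes one worker per entry in 
`temp` and one stored frame per `dump_period` steps, and it does not model whether 
the initial frame is stored as well.

.. code-block:: python

    # Values copied from the configuration on this page.
    temperatures = [150, 300, 600, 1200, 1500]  # `temp` in the nvtmd driver
    n_structures = 5                            # structures in ./candidates.xyz
    steps = 10000                               # `steps` in the nvtmd driver
    dump_period = 10                            # `dump_period` in the nvtmd driver

    # Assumption: one worker per temperature.
    n_workers = len(temperatures)
    # Assumption: one frame per dump, initial frame not counted.
    n_frames = steps // dump_period

    shape = (n_workers, n_structures, n_frames)
    print(shape)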

`select_devi`, the first `select` operation, uses a `property` selector to select structures 
with `max_devi_f` in the range of [0.08, 0.64] eV/Ang. (NOTE: The potential used should 
support uncertainty quantification.)
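
The range criterion can be illustrated with a few lines of plain Python. This is a 
minimal sketch, not the selector's implementation: `select_by_range` is a 
hypothetical helper, the deviation values are made up, the bounds are assumed to be 
inclusive, and the `nbins` and `sparsify` settings are not modelled.

.. code-block:: python

    def select_by_range(values, lower, upper):
        """Return the indices of entries with lower <= value <= upper."""
        return [i for i, v in enumerate(values) if lower <= v <= upper]

    # Made-up maximum force deviations (eV/Ang) for six frames.
    max_devi_f = [0.02, 0.08, 0.30, 0.64, 0.90, 0.15]

    # Frames that are neither well described (too small a deviation)
    # nor likely unphysical (too large a deviation) are kept.
    selected = select_by_range(max_devi_f, 0.08, 0.64)
    print(selected)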

`select_soap`, the second `select` operation, uses a `descriptor` selector to select structures 
using `fps` (farthest-point sampling) in the `soap`-based feature space. The selection is 
performed along dimension (axis) 0, which means structures from different temperatures 
are selected separately. Each group yields 64 structures, so 320 (64*5) structures are 
selected in total.
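
The per-group selection can be sketched with a greedy farthest-point sampling in 
plain Python. This is an illustration under stated assumptions, not the package's 
implementation: `farthest_point_sampling` is a hypothetical helper, random 
three-component vectors stand in for the SOAP features, and the `min_distance` 
setting is not modelled.

.. code-block:: python

    import math
    import random

    def farthest_point_sampling(points, number, start=0):
        """Greedily pick `number` indices, each farthest from the picks so far."""
        number = min(number, len(points))
        if number == 0:
            return []
        selected = [start]
        # Distance from every point to its nearest selected point.
        min_dist = [math.dist(p, points[start]) for p in points]
        while len(selected) < number:
            nxt = max(range(len(points)), key=min_dist.__getitem__)
            selected.append(nxt)
            min_dist = [min(d, math.dist(p, points[nxt]))
                        for d, p in zip(min_dist, points)]
        return selected

    rng = random.Random(0)
    # Toy stand-in for the feature array: 5 groups (axis 0), 200 frames each.
    groups = [[[rng.random() for _ in range(3)] for _ in range(200)]
              for _ in range(5)]

    # Selecting along axis 0 means each temperature group is sampled on its own.
    picked = [farthest_point_sampling(group, 64) for group in groups]
    total = sum(len(indices) for indices in picked)
    print(total)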

`run_vasp`, another `compute` operation, takes the output of `select_soap` and performs 
the single-point DFT calculations. `extract_dft`, another `extract` operation, then 
reads the results of `run_vasp`.

`transfer`, the `transfer` operation, transfers the structures calculated by DFT to the 
file `./dataset/${SESSION_NAME}-${COMPOSITION}-${SYSTEM}/${VERSION}.xyz`. If the 
input structures from `./candidates.xyz` all have a composition of Cu16, 
the stored xyz file should be `./dataset/md-Cu16-surf/dpmd.xyz`.
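
The way the placeholders resolve can be reproduced with Python's `string.Template`. 
This only illustrates the substitution with the values used on this page (session 
`md`, system `surf`, version `dpmd`). It is not necessarily how the path is built 
internally.

.. code-block:: python

    from string import Template

    pattern = "./dataset/${SESSION_NAME}-${COMPOSITION}-${SYSTEM}/${VERSION}.xyz"

    path = Template(pattern).substitute(
        SESSION_NAME="md",    # key under `sessions`
        COMPOSITION="Cu16",   # composition of the input structures
        SYSTEM="surf",        # `system` of the transfer operation
        VERSION="dpmd",       # `version` of the transfer operation
    )
    print(path)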

.. code-block:: yaml
    :emphasize-lines: 28

    variables:
      dataset:
        type: dataset
        name: xyz
        dataset_path: ./dataset
      # --- computers (workers)
      dpmd_computation:
        type: computer
        potter: ${vx:dpmd}
        driver: ${vx:nvtmd}
        scheduler: ${vx:scheduler_gpu1_dpmd}
      dpmd:
        type: potter
        name: deepmd
        params:
          backend: lammps
          command: lmp -in in.lammps 2>&1 > lmp.out
          type_list: ["Al", "Cu", "O"]
          model:
            - ./graph-0.pb
            - ./graph-1.pb
      nvtmd:
        type: driver
        task: md
        init:
          md_style: nvt
          timestep: 2.0
          temp: [150, 300, 600, 1200, 1500]
          dump_period: 10
          neighbor: "2.0 bin"
          neigh_modify: "every 10 check yes"
        run:
          steps: 10000
          constraint: "lowest 120"
      scheduler_gpu1_dpmd:
        type: scheduler
        backend: slurm
        ntasks: 1
        cpus-per-task: 1
        gres: gpu:1
        mem-per-cpu: 8G
        time: "0:30:00"
        environs: "export OMP_NUM_THREADS=1\nexport KMP_WARNINGS=0\nconda activate deepmd\n"
      vasp_computation:
        type: computer
        potter: ${vx:vasp_gam}
        driver: ${vx:driver_spc}
        scheduler: ${vx:scheduler_cpu64_vasp}
      vasp_gam:
        type: potter
        name: vasp
        params:
          backend: vasp
          command: srun vasp_gam 2>&1 > vasp.out
          incar: ./INCAR_LABEL_NoMAG
          kpts: 25
          pp_path: /home/apps/vasp/potpaw/recommend
          vdw_path: /home/apps/vasp/potpaw
      driver_spc:
        type: driver
        ignore_convergence: true
      scheduler_cpu64_vasp:
        type: scheduler
        backend: slurm
        ntasks: 64
        cpus-per-task: 1
        mem-per-cpu: 256M
        time: "24:00:00"
        environs: "export OMP_NUM_THREADS=1\nmodule purge\nmodule load intel/2021.1.2 intel-mpi/intel/2021.1.1\nconda activate deepmd\n"
      # --- selectors
      sift_desc:
        type: selector
        selection:
          - method: descriptor
            axis: 0
            descriptor:
              name: soap
              species: ["Al", "Cu", "O"]
              r_cut : 6.0
              n_max : 12
              l_max : 8
              sigma : 0.2
              average : inner
              periodic : true
            sparsify:
              method: fps
              min_distance: 0.1
            number: [64, 1.0]
      sift_devi:
        type: selector
        selection:
          - method: property
            properties:
              max_devi_f:
                range: [0.08, 0.64]
                nbins: 20
                sparsify: filter
    operations:
      read:
        type: read_stru
        fname: ./candidates.xyz
      scan:
        type: compute
        builder: ${op:read}
        worker: ${vx:dpmd_computation}
        batchsize: 256
      extract:
        type: extract
        compute: ${op:scan}
      select_devi:
        type: select
        structures: ${op:extract}
        selector: ${vx:sift_devi}
      select_soap:
        type: select
        structures: ${op:select_devi}
        selector: ${vx:sift_desc}
      run_vasp:
        type: compute
        builder: ${op:select_soap}
        worker: ${vx:vasp_computation}
        batchsize: 512
      extract_dft:
        type: extract
        compute: ${op:run_vasp}
      transfer:
        type: transfer
        structures: ${op:extract_dft}
        dataset: ${vx:dataset}
        version: dpmd
        system: surf
    sessions:
      md: transfer
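
The `${op:...}` references in the `operations` section above form a chain that ends 
at `transfer`, the target of the `md` session. The sketch below walks that chain to 
list the operations in dependency order. `execution_order` is a hypothetical helper 
written for this page, not part of the package, and it assumes the references 
contain no cycles.

.. code-block:: python

    import re

    # Operation inputs copied from the configuration (only the ${op:...} links).
    operations = {
        "read": [],
        "scan": ["${op:read}"],
        "extract": ["${op:scan}"],
        "select_devi": ["${op:extract}"],
        "select_soap": ["${op:select_devi}"],
        "run_vasp": ["${op:select_soap}"],
        "extract_dft": ["${op:run_vasp}"],
        "transfer": ["${op:extract_dft}"],
    }

    OP_REF = re.compile(r"\$\{op:(\w+)\}")

    def execution_order(target, ops):
        """Return the operations needed for `target`, dependencies first."""
        order = []

        def visit(name):
            if name in order:
                return
            for ref in ops[name]:
                match = OP_REF.fullmatch(ref)
                if match:
                    visit(match.group(1))
            order.append(name)

        visit(target)
        return order

    order = execution_order("transfer", operations)
    print(" -> ".join(order))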

.. warning::

    If the installed **dscribe** version is < 2.0.0, you need to change the parameters 
    `r_cut`, `n_max`, and `l_max` to `rcut`, `nmax`, and `lmax`.
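
The renaming described in the warning can be expressed as a small mapping. The 
sketch below is a hypothetical helper that works on a plain dictionary of 
descriptor settings. It does not import **dscribe**, and the version string is 
passed in by hand as an example value.

.. code-block:: python

    # New-style key -> old-style key, as listed in the warning above.
    RENAMES = {"r_cut": "rcut", "n_max": "nmax", "l_max": "lmax"}

    def adapt_soap_params(params, dscribe_version):
        """Return a copy of `params` with key names suited to the given version."""
        major = int(dscribe_version.split(".")[0])
        if major >= 2:
            return dict(params)
        return {RENAMES.get(key, key): value for key, value in params.items()}

    soap = {"name": "soap", "r_cut": 6.0, "n_max": 12, "l_max": 8, "sigma": 0.2}

    # Example version string below 2.0.0; the old key names are used.
    old_style = adapt_soap_params(soap, "1.2.0")
    print(old_style)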