Version: Next

NEP Walkthrough

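This walkthrough uses the example in <MatPL source root>/example/HfO2/nep_demo (see the source of the HfO2 training set) to demonstrate training, testing, LAMMPS simulation, and other features of the NEP model. The example directory is laid out as follows.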

HfO2/
├── atom.config
├── pwdata/
└── nep_demo/
    ├── nep_test.json
    ├── nep_train.json
    ├── train.job
    └── nep_lmps/
        ├── in.lammps
        ├── lmp.config
        ├── nep_to_lmps.txt
        ├── runcpu.job
        └── rungpu.job

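pwdata is the training data directory.
nep_train.json is the input parameter file for training the NEP force field.
nep_test.json is the input parameter file for testing the NEP force field.
train.job is an example Slurm script for submitting a training job.
nep_lmps contains a LAMMPS MD example for the NEP force field:
- force-field file: nep_to_lmps.txt
- initial structure: lmp.config
- control file: in.lammps
- example Slurm scripts: runcpu.job and rungpu.job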

train: Training

Run the following command in the nep_demo directory to start training:

MatPL train nep_train.json
# Or, after adjusting the environment variables, submit the training job through Slurm: sbatch train.job

Input file explained

The contents of nep_train.json are shown below. For an explanation of the NEP parameters, see the NEP parameter manual:

{
    "model_type": "NEP",
    "atom_type": [
        8, 72
    ],
    "optimizer": {
        "optimizer": "ADAM",
        "epochs": 30, 
        "batch_size": 1,
        "print_freq": 10,
        "train_energy": true,
        "train_force": true,
        "train_virial": true
    },

    "format": "pwmlff/npy",
    "train_data": [
        "../pwdata/init_000_50/", "../pwdata/init_002_50/", 
        "../pwdata/init_004_50/", "../pwdata/init_006_50/", 
        "../pwdata/init_008_50/", "../pwdata/init_010_50/", 
        "../pwdata/init_012_50/", "../pwdata/init_014_50/", 
        "../pwdata/init_016_50/", "../pwdata/init_018_50/", 
        "../pwdata/init_020_20/", "../pwdata/init_022_20/", 
        "../pwdata/init_024_20/", "../pwdata/init_026_20/", 
        "../pwdata/init_001_50/", "../pwdata/init_003_50/", 
        "../pwdata/init_005_50/", "../pwdata/init_007_50/", 
        "../pwdata/init_009_50/", "../pwdata/init_011_50/", 
        "../pwdata/init_013_50/", "../pwdata/init_015_30/", 
        "../pwdata/init_017_50/", "../pwdata/init_019_50/", 
        "../pwdata/init_021_20/", "../pwdata/init_023_20/", 
        "../pwdata/init_025_20/", "../pwdata/init_027_20/"
    ],
    "valid_data":[
        "../pwdata/init_000_50/", "../pwdata/init_004_50/", 
        "../pwdata/init_008_50/"       
    ]
}

For the layout of the force-field output directory produced by training, see the detailed description of model_record.
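Because nep_train.json lists many relative data directories, a quick pre-flight check can catch a mistyped path before a job is queued. The following is a minimal sketch, not part of MatPL: the helper names find_missing_data_dirs and load_config are hypothetical, and the only assumption about the input file is that train_data and valid_data are lists of directory paths, as in the example above.

```python
import json
from pathlib import Path


def load_config(json_path):
    """Load a MatPL-style JSON input file into a dict."""
    with open(json_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def find_missing_data_dirs(config, base_dir="."):
    """Return (key, entry) pairs whose path is not an existing directory.

    Relative entries are resolved against base_dir, i.e. the directory
    from which the training command would be run (hypothetical helper).
    """
    base = Path(base_dir)
    missing = []
    for key in ("train_data", "valid_data"):
        for entry in config.get(key, []):
            if not (base / entry).is_dir():
                missing.append((key, entry))
    return missing
```

Called as find_missing_data_dirs(load_config("nep_train.json")) from the nep_demo directory, an empty list means every listed dataset directory was found.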

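Multi-node, multi-GPU training

The directory structure for multi-node, multi-GPU training is the same as above. For an example, see <MatPL source root>/example/parallelnep (see the source of the HfO2 training set).

That directory provides three reference launch scripts: 1node-1g-run.job (single node, one GPU), 1node-4g-run.job (single node, multiple GPUs), and 2node-8g-run.job (multiple nodes, multiple GPUs). These scripts are written for mcloud users. Users of the online installation should load the MatPL-2026.3 environment as shown in the file env.sh.

Launching multi-node, multi-GPU training requires the address of the master node and a free port. We recommend obtaining both automatically with the following shell commands: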

MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
# Dynamically pick a free port
function get_free_port() {
    python -c 'import socket; s = socket.socket(socket.AF_INET, socket.SOCK_STREAM); s.bind(("", 0)); print(s.getsockname()[1]); s.close()'
}
MASTER_PORT=$(get_free_port)

export MASTER_ADDR=$MASTER_ADDR
export MASTER_PORT=$MASTER_PORT

echo "addrs: $MASTER_ADDR"
echo "port:  $MASTER_PORT"
echo "tasks: $SLURM_NTASKS"

srun MATPL train train.json 
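The Python one-liner inside get_free_port above is dense, so here is the same idea as a small readable sketch. It relies only on the standard socket module: binding to port 0 asks the operating system to choose an unused port, which is then read back with getsockname. The function name mirrors the shell helper; the host parameter is an addition for illustration.

```python
import socket


def get_free_port(host=""):
    """Ask the OS for a currently unused TCP port and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the kernel to pick any free ephemeral port.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an AF_INET socket.
        return sock.getsockname()[1]
```

The port is released as soon as the socket closes, so another process could in principle claim it before training starts. In practice the window is short, but a job that fails to bind can simply be resubmitted to obtain a new port.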

Warning

Note that NEP multi-GPU training supports only the ADAM optimizer; the LKF and GKF optimizers are not supported.

test: Testing

The test command accepts a MatPL nep_model.ckpt force-field file as well as the nep5.txt and nep4.txt format files used in LAMMPS or GPUMD.

MatPL test nep_test.json

The contents of nep_test.json are shown below. For an explanation of the parameters, see the parameter manual:

{
    "model_type": "NEP",
    "format": "pwmlff/npy",
    "model_load_file": "./model_record/nep_model.ckpt",
    "test_data": [
        "../init_000_50", "../init_004_50", "../init_008_50", 
        "../init_012_50", "../init_016_50", "../init_020_20", 
        "../init_024_20", "../init_001_50", "../init_005_50", 
        "../init_009_50", "../init_013_50", "../init_017_50", 
        "../init_021_20", "../init_025_20", "../init_002_50", 
        "../init_006_50", "../init_010_50", "../init_014_50", 
        "../init_018_50", "../init_022_20", "../init_026_20", 
        "../init_003_50", "../init_007_50", "../init_011_50", 
        "../init_015_30", "../init_019_50", "../init_023_20", 
        "../init_027_20"
    ]
}

For the layout of the output directory produced by testing, see the detailed description of test_result.
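Typing a long test_data list by hand is error-prone. As a rough sketch, the list can be generated from whatever dataset directories exist on disk. The helper build_test_config below is hypothetical and not part of MatPL. It assumes every subdirectory of data_root is one dataset in pwmlff/npy format, and it copies the remaining keys from the example above.

```python
from pathlib import Path


def build_test_config(data_root, model_file="./model_record/nep_model.ckpt"):
    """Build a nep_test.json-style dict listing every subdirectory of data_root.

    Hypothetical helper: assumes each subdirectory is one pwmlff/npy dataset.
    """
    root = Path(data_root)
    # Sort so the generated file is stable between runs.
    data_dirs = sorted(p.as_posix() for p in root.iterdir() if p.is_dir())
    return {
        "model_type": "NEP",
        "format": "pwmlff/npy",
        "model_load_file": model_file,
        "test_data": data_dirs,
    }
```

The returned dict can be written to disk with json.dump from the standard library and then passed to the test command.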

infer: Inference on a single structure

The infer command accepts a MatPL nep_model.ckpt force-field file, a GPUMD nep4.txt file, and the nep5.txt format file shared by LAMMPS and GPUMD.

MatPL infer nep_model.ckpt atom.config pwmat/config
MatPL infer gpumd_nep.txt 0.lammpstrj lammps/dump Hf O
# Hf O are the element names for the lammps/dump structure: Hf is atom type 1 and O is atom type 2 in the structure

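On success, the total energy, per-atom energies, per-atom forces, and virial are printed to the terminal.

totxt: Convert a ckpt training file to nep5.txt

This command converts a nep_model.ckpt file trained by MatPL into a text-format nep5.txt file, which can be used for molecular dynamics simulations in GPUMD or lammps-MatPL.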

MatPL totxt nep_model.ckpt

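On success, a file named nep5.txt is written to the directory from which the command was run.

LAMMPS MD

Step 1. Prepare the force-field file

To use the nep_model.ckpt force-field file produced by training in a LAMMPS simulation, first extract the force field with the following command: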

MatPL totxt nep_model.ckpt

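After a successful conversion you will have a force-field file named nep5.txt.

If training finished normally, a nep5.txt file already exists in the model_record directory and can be used directly.

GPUMD NEP5 and NEP4 force-field files are also supported.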
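The choice described in this step (reuse an existing nep5.txt, otherwise convert the checkpoint) can be expressed as a small decision helper. This is a sketch under stated assumptions: force_field_plan is a hypothetical name, the file names are the ones used in this walkthrough, and the returned command simply mirrors the totxt call shown above without running it.

```python
from pathlib import Path


def force_field_plan(model_dir="model_record"):
    """Decide how to obtain nep5.txt for LAMMPS (hypothetical helper).

    Returns ("use", path) when nep5.txt already exists in model_dir,
    otherwise ("convert", command) with the totxt command to run.
    """
    model_dir = Path(model_dir)
    txt_file = model_dir / "nep5.txt"
    if txt_file.is_file():
        return ("use", txt_file.as_posix())
    ckpt_file = model_dir / "nep_model.ckpt"
    # Same call as in the text above; the output nep5.txt is written
    # to the directory from which the command is executed.
    return ("convert", ["MatPL", "totxt", ckpt_file.as_posix()])
```

When the result is ("convert", command), the command list can be handed to subprocess.run in an environment where MatPL is loaded.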

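Step 2. Prepare the input control file

Set the force field in the LAMMPS input control file as shown below, using HfO2 as the example (HfO2/nep_demo/nep_lmps).

For the Kokkos-accelerated version of LAMMPS NEP:

# LAMMPS 2024 requires neigh half (LAMMPS 2023 accepts either half or full)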
package kokkos neigh half comm device
newton on

pair_style   matpl/nep/kk   <path to force-field file>
pair_coeff   * *     O Hf

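LAMMPS 2024 requires neigh half; with LAMMPS 2023, either half or full works.
pair_style sets the path to the force-field file. The style name matpl/nep/kk is fixed and selects MatPL's Kokkos GPU-accelerated NEP implementation; matpl/nep runs on the CPU only. For a DP model, use matpl/dp, which automatically uses a GPU for acceleration when one is available and otherwise runs on the CPU only.

Multi-model deviation output is also supported; it is typically used during sampling in active learning. You can list several models: the first model drives the MD run, and the remaining models are used only to compute the deviation. In that case, set pair_style as follows: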
```
pair_style   matpl/nep/kk   0_nep.txt 1_nep.txt 2_nep.txt 3_nep.txt  out_freq DUMP_FREQ_VALUE out_file model_devi.out 
```
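To make "deviation" concrete, the sketch below computes the maximum per-atom force deviation across an ensemble, using the definition common in DP-GEN-style active learning: for each atom, the root of the mean squared distance between each model's force and the ensemble-mean force. Whether model_devi.out uses exactly this formula, and which columns it contains, is an assumption here and not taken from the MatPL documentation; the function name is hypothetical.

```python
import math


def max_force_deviation(forces_per_model):
    """Return the largest per-atom force deviation across several models.

    forces_per_model[m][i] is the (fx, fy, fz) force on atom i from model m.
    Per atom: sqrt( mean over models of |F_m - mean(F)|^2 ).
    Assumed definition, not confirmed against MatPL's output.
    """
    n_models = len(forces_per_model)
    n_atoms = len(forces_per_model[0])
    worst = 0.0
    for i in range(n_atoms):
        # Ensemble-mean force on atom i, component by component.
        mean = [
            sum(forces_per_model[m][i][k] for m in range(n_models)) / n_models
            for k in range(3)
        ]
        # Mean squared distance of each model's force from that mean.
        spread = sum(
            sum((forces_per_model[m][i][k] - mean[k]) ** 2 for k in range(3))
            for m in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(spread))
    return worst
```

In an active-learning loop, configurations whose deviation falls between a lower and an upper threshold are the usual candidates for relabeling; the thresholds themselves are problem specific.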
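pair_coeff maps each atom type in the simulated structure to an element. For example, if type 1 is O and type 2 is Hf in your structure, set pair_coeff * * 8 72. Either atomic numbers or element names may be used, as long as their order matches the atom types in the input structure file.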
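The ordering rule for pair_coeff can be illustrated with a few lines of Python that build the line from a list of elements ordered by LAMMPS atom type. This is only an illustration: pair_coeff_line is a hypothetical helper, and the lookup table covers just the two elements of this example.

```python
# Atomic numbers for the two elements of the HfO2 example only.
ATOMIC_NUMBERS = {"O": 8, "Hf": 72}


def pair_coeff_line(elements_by_type, use_names=False):
    """Build a pair_coeff line from elements listed in atom-type order.

    elements_by_type[0] is the element of type 1, elements_by_type[1]
    the element of type 2, and so on (hypothetical helper).
    """
    if use_names:
        tokens = list(elements_by_type)
    else:
        tokens = [str(ATOMIC_NUMBERS[name]) for name in elements_by_type]
    return "pair_coeff * * " + " ".join(tokens)
```

With the structure from the text, pair_coeff_line(["O", "Hf"]) gives the atomic-number form and pair_coeff_line(["O", "Hf"], use_names=True) gives the element-name form; swapping the list order would silently assign the wrong element to each type, which is why the order must follow the structure file.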

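Step 3. Launch the LAMMPS simulation

# Load the LAMMPS environment file env.sh; after a correct installation it is located in the LAMMPS source root
source /the/path/of/lammps/env.sh

# Run LAMMPS
# NEP force fields are Kokkos-accelerated; with pair_style matpl/nep/kk, launch as follows
# Single node, multiple GPUs (one node with 4 GPUs shown here)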
mpirun -np 4 --bind-to numa lmp -k on g 4 -sf kk -pk kokkos -in kkin.lmp

# Multiple nodes, multiple GPUs (2 nodes with 4 GPUs each shown here)
mpirun -np 8 --map-by ppr:4:node lmp -k on g 4 -sf kk -pk kokkos -in kkin.lmp

# The following form is for launching the CPU version matpl/nep or matpl/dp
mpirun -np N lmp -in in.lammps
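The two Kokkos launch lines above differ only in the rank count and the placement option, which the following sketch makes explicit. The helper kokkos_launch_command is hypothetical; it reproduces the flags exactly as they appear in the commands above (one MPI rank per GPU, --bind-to numa on a single node, --map-by ppr:G:node across nodes) and makes no claim about other MPI implementations or schedulers.

```python
def kokkos_launch_command(nodes, gpus_per_node, input_file, lmp="lmp"):
    """Assemble the mpirun command for a matpl/nep/kk run (hypothetical helper).

    Uses one MPI rank per GPU, matching the example commands above.
    """
    total_ranks = nodes * gpus_per_node
    if nodes == 1:
        placement = ["--bind-to", "numa"]
    else:
        # Place gpus_per_node ranks on every node.
        placement = ["--map-by", f"ppr:{gpus_per_node}:node"]
    return (
        ["mpirun", "-np", str(total_ranks)]
        + placement
        + [lmp, "-k", "on", "g", str(gpus_per_node),
           "-sf", "kk", "-pk", "kokkos", "-in", input_file]
    )
```

Joining the returned list with spaces yields a line that can be pasted into a job script; passing the list itself to subprocess.run avoids shell quoting issues.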

ASE interface

The NEP model provides an ASE interface. Its usage is shown in the example script below, which is also available on gitee or github.

from src.ase.calculate import MatPL_calculator

# model_file accepts either a nep_model.ckpt or a nep.txt force-field file
calc = MatPL_calculator(model_file='nep_model.ckpt')
atoms = ...  # placeholder: create an ase.atoms.Atoms object here
atoms.calc = calc  # older ASE versions: atoms.set_calculator(calc)
energy = atoms.get_potential_energy()
forces = atoms.get_forces()
stress = atoms.get_stress()

Note: make sure the MatPL environment variables are loaded before using this ASE interface.