!449 Add Baichuan-7B/Baichuan2-7B adaptor

Merge pull request !449 from xiongliangcheng/modellink
xiongliangcheng 2024-01-26 11:52:49 +00:00 committed by i-robot
parent daa835112c
commit 2a9af85917
23 changed files with 851 additions and 0 deletions

8
OWNERS

@@ -30,3 +30,11 @@ reviewers:
- wenjiang2357
- leizhenzhen
- liuyanghan
- Ares_Lzk
- flying-artillery
- xiong-liangcheng_admin
- gitee-code-template
- yaojia2021
- chantcalf
- kongfuziyue
- yuhui69

166
examples/baichuan/README.md Normal file

@@ -0,0 +1,166 @@
# BaiChuan
<p align="left">
<b>简体中文</b> |
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan/README_en.md">English</a> </b>
</p>
# Contents
- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
  - [Performance](#performance)
    - [Throughput](#throughput)
    - [Accuracy](#accuracy)
# Baichuan-7B
## Training
The hardware configuration used to train Baichuan-7B:
| Hardware | Configuration  |
|:--------:|:---------------:|
|   NPU    | 8 x Ascend NPUs |
### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```
2. Build the environment
```bash
# python3.8
conda create -n test python=3.8
conda activate test
# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl
# modify the path according to your own ascend-toolkit install
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# install the AscendSpeed acceleration library
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..
# install the remaining dependencies
pip install -r requirements.txt
```
3. (Optional) Prepare pretrained weights
Download the pretrained weights from [huggingface](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):
```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
Then convert the HF-format weights into a form that AscendSpeed can load:
```shell
mkdir baichuan-7B-mt
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
--input-model-dir ./baichuan-7B-hf \
--output-model-dir ./baichuan-7B-mt \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--type 7B \
--pse \
--merge-mlp
```
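A quick, hypothetical sanity check of the conversion output; it assumes the converter writes a Megatron-style `release/mp_rank_*` shard directory per tensor-parallel rank, which may differ in your version:
```shell
# expect one shard per TP rank (8 here, since TP=8 and PP=1); the
# release/mp_rank_* layout is an assumption based on Megatron conventions
ls ./baichuan-7B-mt/release
```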
4. Prepare the dataset
Download the Baichuan-7B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):
```shell
# download the dataset
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# preprocess the dataset
python ./tools/preprocess_data.py \
--input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan-7B-hf \
--output-prefix ./dataset_baichuan7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
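The preprocessor writes an indexed binary dataset named after `--output-prefix`; `DATA_PATH` in step 5 points at exactly this prefix, so a quick listing confirms the step worked:
```shell
# expect a .bin/.idx pair whose name matches DATA_PATH in step 5
ls ./dataset_baichuan7B/alpaca_text_document.*
# alpaca_text_document.bin  alpaca_text_document.idx
```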
5. Configure the Baichuan-7B pre-training script examples/baichuan/pretrain_baichuan_ptd_7B.sh:
```shell
# modify the path according to your own ascend-toolkit install
source /usr/local/Ascend/ascend-toolkit/set_env.sh
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan-7B-mt"
```
6. Launch the Baichuan-7B pre-training script examples/baichuan/pretrain_baichuan_ptd_7B.sh:
```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh
```
### Performance
#### Throughput
Comparison of Baichuan-7B performance on **Ascend NPUs** and the **reference** hardware:
| Device | Model | Iterations | Throughput (samples/s) | Throughput (tokens/s/p) | Step time (s/step) |
|:----:|:-----------:|:----:|:----:|:-------:|:-----:|
| NPUs | Baichuan-7B | 1000 | 4.78 | 2448.76 | 6.688 |
| Reference | Baichuan-7B | 1000 | 5.45 | 2792.56 | 5.863 |
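The columns are mutually consistent: samples/s is the global batch size divided by the step time, and tokens/s/p scales that by sequence length over device count. A minimal cross-check, using global-batch-size 32, seq-length 4096, and 8 NPUs from pretrain_baichuan_ptd_7B.sh:
```shell
awk 'BEGIN { gbs=32; t=6.688; seq=4096; npus=8;
printf "samples/s=%.2f tokens/s/p=%.1f\n", gbs/t, gbs/t*seq/npus }'
# samples/s=4.78 tokens/s/p=2449.8, in line with the NPU row above
# (the small gap vs. 2448.76 comes from rounding in the logged step time)
```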
#### Accuracy
NPU vs. reference loss:
![NPU-LOSS](../../sources/images/baichuan/baichuan7B-loss-compare.png)
Relative error of NPU vs. reference loss:
![NPU-Relative-Error](../../sources/images/baichuan/baichuan7B-loss-relative-error.png)

165
examples/baichuan/README_en.md Normal file

@@ -0,0 +1,165 @@
# BaiChuan
<p align="left">
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan/README.md">简体中文</a></b> |
<b>English</b>
</p>
# Contents
- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
  - [Performance](#performance)
    - [Machine performance](#machine-performance)
    - [Accuracy of the loss](#accuracy-of-the-loss)
# Baichuan-7B
## Training
Here's a hardware summary of pre-training Baichuan-7B:
| Hardware | Value |
| :------: | :---------------------------------------------: |
| NPU | 8 x Ascend NPUs |
### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```
2. Build environment
```bash
# python3.8
conda create -n test python=3.8
conda activate test
# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl
# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# install AscendSpeed
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..
# install other packages
pip install -r requirements.txt
```
3. Prepare pretrained weights
Download the Baichuan-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main)
```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
To adapt the weights to this repository, convert the HF-format pre-training weights with the following script (the output directory matches CKPT_LOAD_DIR in step 5):
```shell
mkdir baichuan-7B-mt
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
--input-model-dir ./baichuan-7B-hf \
--output-model-dir ./baichuan-7B-mt \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--type 7B \
--pse \
--merge-mlp
```
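As an optional, hypothetical sanity check of the conversion, the listing below assumes a Megatron-style `release/mp_rank_*` shard directory per tensor-parallel rank; adjust it to whatever layout your converter actually writes:
```shell
# expect 8 shards for TP=8, PP=1; the release/mp_rank_* layout is an
# assumption based on Megatron conventions
ls ./baichuan-7B-mt/release
```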
4. Prepare dataset
Download the Baichuan-7B datasets from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet)
```shell
# download datasets
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# process datasets
python ./tools/preprocess_data.py \
--input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan-7B-hf \
--output-prefix ./dataset_baichuan7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
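The preprocessor emits an indexed binary dataset named after `--output-prefix`; `DATA_PATH` in step 5 points at exactly this prefix:
```shell
# expect a .bin/.idx pair matching DATA_PATH in step 5
ls ./dataset_baichuan7B/alpaca_text_document.*
# alpaca_text_document.bin  alpaca_text_document.idx
```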
5. Configure the Baichuan-7B pre-training script examples/baichuan/pretrain_baichuan_ptd_7B.sh:
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan-7B-mt"
```
6. Launch Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh
```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh
```
### Performance
#### Machine performance
The performance of Baichuan-7B on **Ascend NPUs** and the **reference** hardware:
| Device | Model | Total iterations | Throughput (samples/s) | Throughput (tokens/s/p) | Step time (s/step) |
|:----:|:-----------:|:----:|:----:|:-------:|:-----:|
| NPUs | Baichuan-7B | 1000 | 4.78 | 2448.76 | 6.688 |
| Reference | Baichuan-7B | 1000 | 5.45 | 2792.56 | 5.863 |
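These figures are internally consistent: samples/s equals the global batch size over the step time, and tokens/s/p scales that by sequence length over device count (global-batch-size 32, seq-length 4096, 8 NPUs in the training script):
```shell
awk 'BEGIN { gbs=32; t=6.688; seq=4096; npus=8;
printf "samples/s=%.2f tokens/s/p=%.1f\n", gbs/t, gbs/t*seq/npus }'
# samples/s=4.78 tokens/s/p=2449.8, in line with the NPU row above
```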
#### Accuracy of the loss
NPU vs. reference loss:
![NPU-LOSS](../../sources/images/baichuan/baichuan7B-loss-compare.png)
Relative error of NPU vs. reference loss:
![NPU-Relative-Error](../../sources/images/baichuan/baichuan7B-loss-relative-error.png)

89
examples/baichuan/pretrain_baichuan_ptd_7B.sh Normal file

@@ -0,0 +1,89 @@
#!/bin/bash
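# A single hardware work queue is required for the communication/compute
# overlap that --sequence-parallel relies on.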
export CUDA_DEVICE_MAX_CONNECTIONS=1
GPUS_PER_NODE=8
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))
CKPT_SAVE_DIR="your model save ckpt path"
DATA_PATH="your data path"
TOKENIZER_MODEL="your tokenizer model path"
CKPT_LOAD_DIR="your model load ckpt path"
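# Parallel layout: 8-way tensor parallel, single pipeline stage; with 8
# devices per node this leaves a data-parallel size of 1.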
TP=8
PP=1
DISTRIBUTED_ARGS="
--nproc_per_node $GPUS_PER_NODE \
--nnodes $NNODES \
--node_rank $NODE_RANK \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT
"
GPT_ARGS="
--tensor-model-parallel-size ${TP} \
--pipeline-model-parallel-size ${PP} \
--sequence-parallel \
--num-layers 32 \
--hidden-size 4096 \
--ffn-hidden-size 11008 \
--num-attention-heads 32 \
--tokenizer-type Llama2Tokenizer \
--tokenizer-model ${TOKENIZER_MODEL} \
--load ${CKPT_LOAD_DIR} \
--seq-length 4096 \
--max-position-embeddings 4096 \
--micro-batch-size 4 \
--global-batch-size 32 \
--make-vocab-size-divisible-by 128 \
--lr 1e-5 \
--train-iters 5000 \
--lr-decay-style cosine \
--untie-embeddings-and-output-weights \
--disable-bias-linear \
--attention-dropout 0.0 \
--init-method-std 0.01 \
--hidden-dropout 0.0 \
--position-embedding-type rope \
--normalization RMSNorm \
--use-fused-rmsnorm \
--use-flash-attn \
--swiglu \
--no-masked-softmax-fusion \
--attention-softmax-in-fp32 \
--min-lr 1e-6 \
--weight-decay 1e-2 \
--lr-warmup-fraction 0.1 \
--clip-grad 1.0 \
--adam-beta1 0.9 \
--initial-loss-scale 8188.0 \
--adam-beta2 0.95 \
--no-gradient-accumulation-fusion \
--no-load-optim \
--no-load-rng \
--fp16
"
DATA_ARGS="
--data-path $DATA_PATH \
--split 949,50,1
"
OUTPUT_ARGS="
--log-interval 1 \
--save-interval 1000 \
--eval-interval 1000 \
--eval-iters 1 \
"
torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
$GPT_ARGS \
$DATA_ARGS \
$OUTPUT_ARGS \
--distributed-backend nccl \
--save ${CKPT_SAVE_DIR}

166
examples/baichuan2/README.md Normal file

@@ -0,0 +1,166 @@
# BaiChuan2
<p align="left">
<b>简体中文</b> |
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan2/README_en.md">English</a> </b>
</p>
# Contents
- [Baichuan2-7B](#baichuan2-7b)
  - [Training](#training)
    - [Script](#script)
  - [Performance](#performance)
    - [Throughput](#throughput)
    - [Accuracy](#accuracy)
# Baichuan2-7B
## Training
The hardware configuration used to train Baichuan2-7B:
| Hardware | Configuration  |
|:--------:|:---------------:|
|   NPU    | 8 x Ascend NPUs |
### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```
2. Build the environment
```bash
# python3.8
conda create -n test python=3.8
conda activate test
# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_XXX.whl
# modify the path according to your own ascend-toolkit install
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# install the AscendSpeed acceleration library
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..
# install the remaining dependencies
pip install -r requirements.txt
```
3. (Optional) Prepare pretrained weights
Download the pretrained weights from [huggingface](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/tree/main):
```shell
mkdir baichuan2-7B-hf
cd ./baichuan2-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/generation_utils.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00001-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00002-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer_config.json
cd ..
```
Then convert the HF-format weights into a form that AscendSpeed can load:
```shell
mkdir baichuan2-7B-mt
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
# for ptd
python $SCRIPT_PATH \
--input-model-dir ./baichuan2-7B-hf \
--output-model-dir ./baichuan2-7B-mt \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--type 7B \
--merge-mlp \
--pse
```
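As with Baichuan-7B, a hypothetical check of the conversion output, assuming a Megatron-style `release/mp_rank_*` shard directory per tensor-parallel rank:
```shell
# expect 8 shards for TP=8, PP=1; the release/mp_rank_* layout is an
# assumption based on Megatron conventions
ls ./baichuan2-7B-mt/release
```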
4. Prepare the dataset
Download the Baichuan2-7B-Base dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):
```shell
# download the dataset
mkdir dataset_baichuan2-7B
cd ./dataset_baichuan2-7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# preprocess the dataset
python ./tools/preprocess_data.py \
--input ./dataset_baichuan2-7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan2-7B-hf \
--output-prefix ./dataset_baichuan2-7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
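As before, the preprocessor writes a `.bin`/`.idx` pair named after `--output-prefix`, which is what `DATA_PATH` in step 5 expects:
```shell
# expect a .bin/.idx pair matching DATA_PATH in step 5
ls ./dataset_baichuan2-7B/alpaca_text_document.*
```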
5. Configure the Baichuan2-7B pre-training script examples/baichuan2/pretrain_baichuan2_ptd_7B.sh:
```shell
# modify the path according to your own ascend-toolkit install
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# modify the dataset, weight, and tokenizer paths to your own
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan2-7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan2-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan2-7B-mt"
```
6. Launch the Baichuan2-7B pre-training script examples/baichuan2/pretrain_baichuan2_ptd_7B.sh:
```shell
bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```
### Performance
#### Throughput
Comparison of Baichuan2-7B performance on **Ascend NPUs** and the **reference** hardware:
| Device | Model | Iterations | Throughput (samples/s) | Throughput (tokens/s/p) | Step time (s/step) |
|:----:|:------------:|:----:|:----:|:----:|:-----:|
| NPUs | Baichuan2-7B | 1000 | 4.59 | 2349 | 6.973 |
| Reference | Baichuan2-7B | 1000 | 5.40 | 2769 | 5.915 |
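The same cross-check as for Baichuan-7B applies, using global-batch-size 32, seq-length 4096, and 8 NPUs from pretrain_baichuan2_ptd_7B.sh:
```shell
awk 'BEGIN { gbs=32; t=6.973; seq=4096; npus=8;
printf "samples/s=%.2f tokens/s/p=%.1f\n", gbs/t, gbs/t*seq/npus }'
# samples/s=4.59 tokens/s/p=2349.6, matching the NPU row above
```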
#### Accuracy
NPU vs. reference loss:
![NPU-LOSS](../../sources/images/baichuan2/baichuan2-7B-loss-compare.png)
Relative error of NPU vs. reference loss:
![NPU-Relative-Error](../../sources/images/baichuan2/baichuan2-7B-loss-relative-error.png)

168
examples/baichuan2/README_en.md Normal file

@@ -0,0 +1,168 @@
# BaiChuan2
<p align="left">
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan2/README.md">简体中文</a></b> |
<b>English</b>
</p>
# Contents
- [Baichuan2-7B](#baichuan2-7b)
  - [Training](#training)
    - [Script](#script)
  - [Performance](#performance)
    - [Machine performance](#machine-performance)
    - [Accuracy of the loss](#accuracy-of-the-loss)
# Baichuan2-7B
## Training
Here's a hardware summary of pre-training Baichuan2-7B:
| Hardware | Value |
| :------: | :---------------------------------------------: |
| NPU | 8 x Ascend NPUs |
### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout -b modellink origin/modellink
mkdir logs
mkdir ckpt
```
2. Build environment
```bash
# python3.8
conda create -n test python=3.8
conda activate test
# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_XXX.whl
# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# install AscendSpeed
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..
# install other packages
pip install -r requirements.txt
```
3. Prepare pretrained weights
Download the Baichuan2-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/tree/main)
```shell
mkdir baichuan2-7B-hf
cd ./baichuan2-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/generation_utils.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00001-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00002-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer_config.json
cd ..
```
To adapt the weights to this repository, convert the HF-format pre-training weights with the following script (the output directory matches CKPT_LOAD_DIR in step 5):
```shell
mkdir baichuan2-7B-mt
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
# for ptd
python $SCRIPT_PATH \
--input-model-dir ./baichuan2-7B-hf \
--output-model-dir ./baichuan2-7B-mt \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--type 7B \
--merge-mlp \
--pse
```
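An optional, hypothetical check of the conversion output, assuming the converter follows the Megatron `release/mp_rank_*` shard layout (one directory per tensor-parallel rank):
```shell
# expect 8 shards for TP=8, PP=1; layout is an assumption, adjust as needed
ls ./baichuan2-7B-mt/release
```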
4. Prepare dataset
Download the Baichuan2-7B-Base datasets from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet)
```shell
# download datasets
mkdir dataset_baichuan2-7B
cd ./dataset_baichuan2-7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# process datasets
python ./tools/preprocess_data.py \
--input ./dataset_baichuan2-7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan2-7B-hf \
--output-prefix ./dataset_baichuan2-7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
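The preprocessor emits an indexed binary dataset named after `--output-prefix`; `DATA_PATH` in step 5 points at exactly this prefix:
```shell
# expect a .bin/.idx pair matching DATA_PATH in step 5
ls ./dataset_baichuan2-7B/alpaca_text_document.*
```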
5. Configure the Baichuan2-7B pre-training script examples/baichuan2/pretrain_baichuan2_ptd_7B.sh:
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# modify the dataset, weight, and tokenizer paths according to your own
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan2-7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan2-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan2-7B-mt"
```
6. Launch Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```shell
bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```
### Performance
#### Machine performance
The performance of Baichuan2-7B on **Ascend NPUs** and the **reference** hardware:
| Device | Model | Total iterations | Throughput (samples/s) | Throughput (tokens/s/p) | Step time (s/step) |
|:----:|:------------:|:----:|:----:|:----:|:-----:|
| NPUs | Baichuan2-7B | 1000 | 4.59 | 2349 | 6.973 |
| Reference | Baichuan2-7B | 1000 | 5.40 | 2769 | 5.915 |
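A minimal consistency check of these figures against the training script's global-batch-size 32, seq-length 4096, and 8 NPUs:
```shell
awk 'BEGIN { gbs=32; t=6.973; seq=4096; npus=8;
printf "samples/s=%.2f tokens/s/p=%.1f\n", gbs/t, gbs/t*seq/npus }'
# samples/s=4.59 tokens/s/p=2349.6, matching the NPU row above
```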
#### Accuracy of the loss
NPU vs. reference loss:
![NPU-LOSS](../../sources/images/baichuan2/baichuan2-7B-loss-compare.png)
Relative error of NPU vs. reference loss:
![NPU-Relative-Error](../../sources/images/baichuan2/baichuan2-7B-loss-relative-error.png)

89
examples/baichuan2/pretrain_baichuan2_ptd_7B.sh Normal file

@@ -0,0 +1,89 @@
#!/bin/bash
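# Serialize device work queues: needed for the communication/compute
# overlap used with --sequence-parallel.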
export CUDA_DEVICE_MAX_CONNECTIONS=1
GPUS_PER_NODE=8
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))
CKPT_SAVE_DIR="your model save ckpt path"
DATA_PATH="your data path"
TOKENIZER_MODEL="your tokenizer model path"
CKPT_LOAD_DIR="your model load ckpt path"
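# 8-way tensor parallel, one pipeline stage; TP x PP spans all 8 NPUs,
# so data parallelism is 1.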
TP=8
PP=1
DISTRIBUTED_ARGS="
--nproc_per_node $GPUS_PER_NODE \
--nnodes $NNODES \
--node_rank $NODE_RANK \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT
"
GPT_ARGS="
--tensor-model-parallel-size ${TP} \
--pipeline-model-parallel-size ${PP} \
--sequence-parallel \
--num-layers 32 \
--hidden-size 4096 \
--ffn-hidden-size 11008 \
--num-attention-heads 32 \
--tokenizer-type Llama2Tokenizer \
--tokenizer-model ${TOKENIZER_MODEL} \
--load ${CKPT_LOAD_DIR} \
--seq-length 4096 \
--max-position-embeddings 4096 \
--micro-batch-size 4 \
--global-batch-size 32 \
--make-vocab-size-divisible-by 128 \
--lr 1e-6 \
--train-iters 5000 \
--lr-decay-style cosine \
--untie-embeddings-and-output-weights \
--disable-bias-linear \
--attention-dropout 0.0 \
--init-method-std 0.01 \
--hidden-dropout 0.0 \
--position-embedding-type rope \
--normalization RMSNorm \
--use-fused-rmsnorm \
--use-flash-attn \
--swiglu \
--no-masked-softmax-fusion \
--attention-softmax-in-fp32 \
--min-lr 1e-8 \
--weight-decay 1e-2 \
--lr-warmup-fraction 0.1 \
--clip-grad 1.0 \
--adam-beta1 0.9 \
--initial-loss-scale 8188.0 \
--adam-beta2 0.95 \
--no-gradient-accumulation-fusion \
--no-load-optim \
--no-load-rng \
--bf16
"
DATA_ARGS="
--data-path $DATA_PATH \
--split 949,50,1
"
OUTPUT_ARGS="
--log-interval 1 \
--save-interval 1000 \
--eval-interval 1000 \
--eval-iters 1 \
"
torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
$GPT_ARGS \
$DATA_ARGS \
$OUTPUT_ARGS \
--distributed-backend nccl \
--save ${CKPT_SAVE_DIR}

16 binary image files changed (previews not shown): 12 'Before' entries and 4 'After' entries; the 'After' files correspond to the four Baichuan/Baichuan2 loss plots referenced in the READMEs above.