add baichuan chinese readme

This commit is contained in:
fengliangjun 2023-11-17 13:58:19 +08:00
parent dda7ac859b
commit eea45231c2
3 changed files with 495 additions and 122 deletions


@ -1,2 +0,0 @@
## Recipes and Scripts


@ -1,50 +1,55 @@
# BaiChuan
<p align="left">
<b>Simplified Chinese</b> |
<b><a href="https://gitee.com/ascend/AscendSpeed/blob/master/examples/baichuan/README_en.md">English</a> </b>
</p>
# Contents

- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)
- [Baichuan-13B](#baichuan-13b)
  - [Training](#training-1)
    - [Script](#script-1)
    - [Performance](#performance-1)
      - [Machine performance](#machine-performance-1)
      - [Accuracy of the loss](#accuracy-of-the-loss-1)
# Baichuan-7B
## Training

Here's a hardware summary of pre-training Baichuan-7B:

| Hardware | Value |
| :------: | :---------------: |
| NPU | 8 x Ascend NPUs |

Here's a software summary of pre-training Baichuan-7B:

| Software | Version |
| :-----------------------: | :--------------------------------------------------------------------------------------------------------------------------: |
| Python | 3.7.16 |
| driver | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| firmware | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| CANN | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| binary arithmetic package | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| torch | 1.11.0 |
| torch_npu | [package](https://gitee.com/ascend/pytorch/releases) |
| apex | [package](https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v1.11.0/20230915.2/pytorch_v1.11.0_py37.tar.gz) |
### Script

1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
@ -52,21 +57,21 @@ mkdir logs
mkdir ckpt
```
2. Build environment
```bash
# python3.7
conda create -n test python=3.7
conda activate test
# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
pip install torch_npu-1.11.0.XXX-cp37-cp37m-linux_XXX.whl
# install megatron-core
pip3 install --no-use-pep517 -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
@ -74,25 +79,29 @@ pip3 install -e ./
cd ..
# install other packages
pip install -r requirements.txt
```
*Note: if you want to train with the weights from huggingface, first fix a DeepSpeed checkpoint-loading bug by changing `if zero_sd_list is None` to `if zero_sd_list is None or len(zero_sd_list) == 0` in the `_load_zero_checkpoint` function of `<deepspeed-installed-path>/runtime/engine.py`:*
```text
# original deepspeed/runtime/engine.py, around lines 2746-2748
zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
if zero_sd_list is None:
    return False

# modified
zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
if zero_sd_list is None or len(zero_sd_list) == 0:
    return False
```
3. (Optional) Prepare pretrained weights

Download the Baichuan-7B checkpoint from [huggingface](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):
```shell
mkdir baichuan-7B-hf
@ -109,7 +118,8 @@ wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.mode
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
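As an optional sanity check (an addition here, not one of the original steps), you can confirm the downloaded files are intact by loading the tokenizer with `transformers`; this assumes the `transformers` package is installed, and `trust_remote_code=True` is required because Baichuan ships custom tokenizer code:
```python
# Hypothetical sanity check -- assumes the `transformers` package is installed.
from transformers import AutoTokenizer

# Baichuan uses custom tokenizer code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("./baichuan-7B-hf", trust_remote_code=True)
print(tokenizer.tokenize("Ascend NPU pre-training"))  # should print subword tokens
```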
To adapt the weights to the Baichuan-7B model, use the following script to convert them from huggingface format into a form AscendSpeed can load:
```shell
mkdir weight
@ -127,18 +137,18 @@ python $SCRIPT_PATH \
```
4. Prepare dataset

Download the Baichuan-7B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):
```shell
# download datasets
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# process datasets
python ./tools/preprocess_data.py \
--input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan-7B-hf \
@ -149,47 +159,44 @@ python ./tools/preprocess_data.py \
```
5. Configure the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_zero_7B.sh
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the tokenizer, dataset, and weight paths according to your own setup
TOKENIZER_PATH=./baichuan-7B-hf/  # tokenizer path
DATA_PATH=./dataset_baichuan7B/alpaca_text_document  # processed dataset

# to load pretrained weights, add the parameter `--load ./weight`
```
6. Launch the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_zero_7B.sh
```shell
bash examples/baichuan/pretrain_baichuan_zero_7B.sh
```
*Note: to train with the huggingface weights, add the parameter `--load ./weight` to the launch arguments in `pretrain_baichuan_zero_7B.sh` (lines 74-107) and rerun it.*
### Performance
#### Machine performance

The performance of Baichuan-7B on **Ascend NPUs** vs. the **Reference**:

| Device | Model | total iterations | throughput rate (samples/s/p) | throughput rate (tokens/s/p) | single-step time (s/step) | floating point operations (TFLOPs/s) |
| ------ | ----------- | ---------------- | ----------------------------- | ---------------------------- | ------------------------- | ------------------------------------ |
| NPUs | Baichuan-7B | 1024 | 3.722 | 1905 | 2.14 | 102.69 |
| Reference | Baichuan-7B | 1024 | 3.978 | 2036 | 1.98 | 125.66 |
#### Accuracy of the loss

NPU vs. Reference loss:

The NPU training runs smoothly with stable resource usage and no mid-run errors; the loss trends downward and converges at the expected rate. The relative error of the average loss is 0.01093 (less than 2%), the maximum relative error is 0.1243, and the maximum absolute error is 0.4859. The precision meets the requirements.
![NPU-LOSS](../../sources/images/baichuan/7B_loss_compare.png)
NPU vs. Reference loss relative error:
![NPU-Relative-Error](../../sources/images/baichuan/7B_relative_error.png)
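For readers who want to reproduce these statistics, here is a minimal sketch (the file names and the plain-text export format are assumptions, not part of the training scripts):
```python
# Minimal sketch -- assumes per-step losses were exported to two text files.
import numpy as np

npu = np.loadtxt("npu_loss.txt")  # hypothetical NPU loss export
ref = np.loadtxt("ref_loss.txt")  # hypothetical reference loss export

abs_err = np.abs(npu - ref)
rel_err = abs_err / np.abs(ref)

print(f"mean relative error: {rel_err.mean():.5f}")
print(f"max relative error:  {rel_err.max():.5f}")
print(f"max absolute error:  {abs_err.max():.5f}")
```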
@ -197,20 +204,19 @@ NPU vs Reference loss relative error.
# Baichuan-13B
## Training

Here's a hardware summary of pre-training Baichuan-13B:

| Hardware | Value |
| :------: | :---------------: |
| NPU | 8 x Ascend NPUs |

Here's a software summary of pre-training Baichuan-13B:

| Software | Version |
| :-----------------------: | :-----------: |
| Python | 3.7.16 |
| driver | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| firmware | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| CANN | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
@ -221,46 +227,45 @@ Here's a software summary of pre-training Baichuan-13B:
### Script

1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```
2. Build environment
```bash
# python3.7
conda create -n test python=3.7
conda activate test
# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
pip install torch_npu-1.11.0.XXX-cp37-cp37m-linux_XXX.whl
# install megatron
git clone https://github.com/NVIDIA/Megatron-LM.git -b 23.05
cd Megatron-LM
pip3 install -e ./
cd ..
# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..
# install other packages
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
3. (Optional) Prepare pretrained weights

Download the Baichuan-13B checkpoint from [huggingface](https://huggingface.co/baichuan-inc/Baichuan-13B-Base/tree/main):
```shell
mkdir baichuan-13B-hf
cd ./baichuan-13B-hf
@ -280,7 +285,7 @@ wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenize
cd ..
```
To adapt the weights to the Baichuan-13B model, use the following script to convert them from huggingface format to the AscendSpeed format:
```shell
mkdir weight
@ -295,8 +300,9 @@ python $SCRIPT_PATH \
--pse
```
4. Prepare dataset

Download the Baichuan-13B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):
```shell
mkdir dataset_baichuan13B
@ -304,7 +310,6 @@ cd ./dataset_baichuan13B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
python ./tools/preprocess_data.py \
--input ./dataset_baichuan13B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan-13B-hf \
@ -315,57 +320,49 @@ python ./tools/preprocess_data.py \
```
5. Configure the Baichuan-13B pre-training script: /examples/baichuan/pretrain_baichuan_ptd_13B.sh
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the tokenizer and dataset paths according to your own setup
TOKENIZER_PATH=./baichuan-13B-hf
DATA_PATH=./dataset_baichuan13B/alpaca_text_document
```
6. Launch the Baichuan-13B pre-training script: /examples/baichuan/pretrain_baichuan_ptd_13B.sh
```bash
bash examples/baichuan/pretrain_baichuan_ptd_13B.sh
```
An hourly pulse-check script runs alongside training to verify that the job is either running or scheduled.
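That pulse-check script is not included in this repository; a minimal sketch of the idea, assuming the trainer can be identified by the script name in its command line, could look like:
```python
# Hypothetical pulse check -- the actual script is not shipped with the repo.
import subprocess
import time

while True:
    # pgrep -f returns 0 when any process command line matches the pattern
    alive = subprocess.run(["pgrep", "-f", "pretrain_baichuan"],
                           capture_output=True).returncode == 0
    print(f"[pulse] training is {'running' if alive else 'NOT running'}")
    time.sleep(3600)  # check once per hour
```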
### Performance
#### Machine performance

The performance of Baichuan-13B on **Ascend NPUs** vs. the **Reference**:

| Device | Model | total iterations | throughput rate (samples/s/p) | throughput rate (tokens/s/p) | single-step time (s/step) | floating point operations (TFLOPs/s) |
| :----: | :----------: | :--------------: | :---------------------------: | :--------------------------: | :-----------------------: | :----------------------------------: |
| NPUs | Baichuan-13B | 1000 | 1.928 | 1024 | 16.067 | 89.37 |
| Reference | Baichuan-13B | 1000 | 1.535 | 862 | 19.852 | 72.39 |

#### Accuracy of the loss

NPU vs. Reference loss:

The NPU training runs smoothly with stable resource usage and no mid-run errors; the loss trends downward and converges at the expected rate. The relative error of the average loss is 0.00725 (less than 2%), the maximum relative error is 0.01978, and the maximum absolute error is 0.10811. The precision meets the requirements.

![NPU-LOSS](../../sources/images/baichuan/13B-loss-compare.png)

NPU vs. Reference loss relative error:

The relative error between the NPU and Reference loss stays below 0.02 throughout, as expected.

![NPU-Relative-Error](../../sources/images/baichuan/baichuan13B-loss-relative-error.png)
<font size=1>If downloading a file with `wget` fails, you can download it manually from the source page, after confirming the site is trustworthy.</font>


@ -0,0 +1,378 @@
# BaiChuan
<p align="left">
<b><a href="https://gitee.com/ascend/AscendSpeed/blob/master/examples/baichuan/README.md">Simplified Chinese</a></b> |
<b>English</b>
</p>
# Contents

- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)
- [Baichuan-13B](#baichuan-13b)
  - [Training](#training-1)
    - [Script](#script-1)
    - [Performance](#performance-1)
      - [Machine performance](#machine-performance-1)
      - [Accuracy of the loss](#accuracy-of-the-loss-1)
# Baichuan-7B
## Training
Here's a hardware summary of pre-training Baichuan-7B:
| Hardware | Value |
| :------: | :---------------------------------------------: |
| NPU | 8 x Ascend NPUs |
Here's a software summary of pre-training Baichuan-7B:
| Software | Version |
| :-----------------------: |:-----------:|
| Python | 3.7.16 |
| driver | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| firmware | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| CANN | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| binary arithmetic package | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| torch | 1.11.0 |
| torch_npu | [package](https://gitee.com/ascend/pytorch/releases) |
| apex | [package](https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v1.11.0/20230915.2/pytorch_v1.11.0_py37.tar.gz) |
### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```
2. Build environment
```bash
# python3.7
conda create -n test python=3.7
conda activate test
# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
pip install torch_npu-1.11.0.XXX-cp37-cp37m-linux_XXX.whl
# install megatron-core
pip3 install --no-use-pep517 -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..
# install other packages
pip install -r requirements.txt
```
*Note: if you want to train with the weights from huggingface, first fix a DeepSpeed checkpoint-loading bug by changing `if zero_sd_list is None` to `if zero_sd_list is None or len(zero_sd_list) == 0` in the `_load_zero_checkpoint` function of `<deepspeed-installed-path>/runtime/engine.py`:*
```text
# original deepspeed/runtime/engine.py, around lines 2746-2748
zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
if zero_sd_list is None:
    return False

# modified
zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
if zero_sd_list is None or len(zero_sd_list) == 0:
    return False
```
3. Prepare pretrained weights
Download the Baichuan-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main)
```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
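As an optional sanity check (an addition, not one of the original steps), the download can be verified by loading the tokenizer with `transformers`; `trust_remote_code=True` is required because Baichuan ships custom tokenizer code:
```python
# Hypothetical sanity check -- assumes the `transformers` package is installed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./baichuan-7B-hf", trust_remote_code=True)
print(len(tokenizer))  # vocabulary size; confirms tokenizer.model loaded
```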
To adapt the weights to the Baichuan-7B model, use the following script to convert the pre-training weights:
```shell
mkdir weight
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
--input-model-dir ./baichuan-7B-hf \
--output-model-dir ./weight \
--tensor-model-parallel-size 1 \
--pipeline-model-parallel-size 1 \
--type 7B \
--pse \
--deepspeed \
--use_wpack_rotray \
--load_weight_map
```
4. Prepare dataset
Download the Baichuan-7B datasets from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet)
```shell
# download datasets
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# process datasets
python ./tools/preprocess_data.py \
--input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan-7B-hf \
--output-prefix ./dataset_baichuan7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
5. Configure the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_zero_7B.sh
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# modify the tokenizer and dataset paths according to your own setup
TOKENIZER_PATH=./baichuan-7B-hf/ #tokenizer path
DATA_PATH=./dataset_baichuan7B/alpaca_text_document #processed dataset
```
6. Launch Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_zero_7B.sh
```shell
bash examples/baichuan/pretrain_baichuan_zero_7B.sh
```
*Note: to train with the huggingface weights, add the parameter `--load ./weight` to the launch arguments in `pretrain_baichuan_zero_7B.sh` (lines 74-107) and rerun it.*
### Performance
#### Machine performance
The performance of Baichuan-7B on **Ascend NPUs** vs. the **Reference**:

| Device | Model | total iterations | throughput rate (samples/s/p) | throughput rate (tokens/s/p) | single-step time (s/step) | floating point operations (TFLOPs/s) |
| ------ | ----------- | ---------------- | ----------------------------- | ---------------------------- | ------------------------- | ----------------------------------- |
| NPUs | Baichuan-7B | 1024 | 3.722 | 1905 | 2.14 | 102.69 |
| Reference | Baichuan-7B | 1024 | 3.978 | 2036 | 1.98 | 125.66 |
#### Accuracy of the loss
NPU vs. Reference loss:

The NPU training runs smoothly with stable resource usage and no mid-run errors; the loss trends downward and converges at the expected rate. The relative error of the average loss is 0.01093 (less than 2%), the maximum relative error is 0.1243, and the maximum absolute error is 0.4859. The precision meets the requirements.
![NPU-LOSS](../../sources/images/baichuan/7B_loss_compare.png)
NPU vs. Reference loss relative error:
![NPU-Relative-Error](../../sources/images/baichuan/7B_relative_error.png)
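As an aside, these error statistics can be recomputed from exported loss curves; here is a minimal sketch under the assumption that both runs dumped per-step losses to plain-text files (the file names are hypothetical):
```python
# Minimal sketch -- file names and export format are assumptions.
import numpy as np

npu = np.loadtxt("npu_loss.txt")  # hypothetical NPU loss export
ref = np.loadtxt("ref_loss.txt")  # hypothetical reference loss export

rel_err = np.abs(npu - ref) / np.abs(ref)
print(rel_err.mean(), rel_err.max(), np.abs(npu - ref).max())
```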
# Baichuan-13B
## Training
Here's a hardware summary of pre-training Baichuan-13B:
| Hardware | Value |
| :------: | :---------------------------------------------: |
| NPU | 8 x Ascend NPUs |
Here's a software summary of pre-training Baichuan-13B:
| Software | Version |
| :-----------------------: |:-----------:|
| Python | 3.7.16 |
| driver | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| firmware | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| CANN | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| binary arithmetic package | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| torch | 1.11.0 |
| torch_npu | [package](https://gitee.com/ascend/pytorch/releases) |
### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```
2. Build environment
```bash
# python3.7
conda create -n test python=3.7
conda activate test
# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
pip install torch_npu-1.11.0.XXX-cp37-cp37m-linux_XXX.whl
#install megatron
git clone https://github.com/NVIDIA/Megatron-LM.git -b 23.05
cd Megatron-LM
pip3 install -e ./
cd ..
# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..
# install other packages
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
3. Prepare pretrained weights
Download the Baichuan-13B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan-13B-Base/tree/main)
```shell
mkdir baichuan-13B-hf
cd ./baichuan-13B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00001-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00002-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00003-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenizer_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenizer.model
cd ..
```
To adapt the weights to the Baichuan-13B model, use the following script to convert the pre-training weights:
```shell
mkdir weight
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
--input-model-dir ./baichuan-13B-hf \
--output-model-dir ./weight \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--make-vocab-size-divisible-by 8 \
--type 13B \
--pse
```
4. Prepare dataset
Download the Baichuan-13B datasets from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet)
```shell
mkdir dataset_baichuan13B
cd ./dataset_baichuan13B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
python ./tools/preprocess_data.py \
--input ./dataset_baichuan13B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan-13B-hf \
--output-prefix ./dataset_baichuan13B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
5. Configure the Baichuan-13B pre-training script: /examples/baichuan/pretrain_baichuan_ptd_13B.sh
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# modify the tokenizer and dataset paths according to your own setup
TOKENIZER_PATH=./baichuan-13B-hf
DATA_PATH=./dataset_baichuan13B/alpaca_text_document
```
6. Launch Baichuan-13B pre-training script: /examples/baichuan/pretrain_baichuan_ptd_13B.sh
```bash
bash examples/baichuan/pretrain_baichuan_ptd_13B.sh
```
An hourly pulse-check script runs alongside training to verify that the job is either running or scheduled.
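The pulse-check script itself is not part of this repository; one minimal sketch of the idea, assuming the trainer is identifiable by its script name, follows:
```python
# Hypothetical hourly pulse check (the real script is not shipped here).
import subprocess
import time

while True:
    alive = subprocess.run(["pgrep", "-f", "pretrain_baichuan_ptd_13B"],
                           capture_output=True).returncode == 0
    print("training running" if alive else "training NOT running")
    time.sleep(3600)  # check once per hour
```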
### Performance
#### Machine performance
The performance of Baichuan-13B on **Ascend NPUs** vs. the **Reference**:

| Device | Model | total iterations | throughput rate (samples/s/p) | throughput rate (tokens/s/p) | single-step time (s/step) | floating point operations (TFLOPs/s) |
| :----: | :----------: | :--------------: | :---------------------------: | :--------------------------: | :-----------------------: | :---------------------------------: |
| NPUs | Baichuan-13B | 1000 | 1.928 | 1024 | 16.067 | 89.37 |
| Reference | Baichuan-13B | 1000 | 1.535 | 862 | 19.852 | 72.39 |
#### Accuracy of the loss
NPU vs. Reference loss:

The NPU training runs smoothly with stable resource usage and no mid-run errors; the loss trends downward and converges at the expected rate. The relative error of the average loss is 0.00725 (less than 2%), the maximum relative error is 0.01978, and the maximum absolute error is 0.10811. The precision meets the requirements.
![NPU-LOSS](../../sources/images/baichuan/13B-loss-compare.png)
NPU vs. Reference loss relative error:
The relative error between NPU and Reference Loss is less than 0.02 throughout, as expected.
![NPU-Relative-Error](../../sources/images/baichuan/baichuan13B-loss-relative-error.png)
<font size=1>If downloading a file with `wget` fails, you can download it manually from the source page, after confirming the site is trustworthy.</font>