mirror of https://gitee.com/ascend/ModelLink.git, synced 2024-12-05 05:17:40 +08:00

Commit eea45231c2 (parent dda7ac859b): add baichuan chinese readme
examples/baichuan/README.md (modified)
# BaiChuan
<p align="left">
    <b>Simplified Chinese</b> |
    <b><a href="https://gitee.com/ascend/AscendSpeed/blob/master/examples/baichuan/README_en.md">English</a></b>
</p>
# Contents

- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)
- [Baichuan-13B](#baichuan-13b)
  - [Training](#training-1)
    - [Script](#script-1)
    - [Performance](#performance-1)
      - [Machine performance](#machine-performance-1)
      - [Accuracy of the loss](#accuracy-of-the-loss-1)
# Baichuan-7B
## Training

Here's a hardware summary of pre-training Baichuan-7B:

| Hardware | Value           |
| :------: | :-------------: |
|   NPU    | 8 x Ascend NPUs |

Here's a software summary of pre-training Baichuan-7B:

| Software                  | Version |
| :-----------------------: | :-----: |
| Python                    | 3.7.16  |
| driver                    | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| firmware                  | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| CANN                      | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| binary arithmetic package | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| torch                     | 1.11.0  |
| torch_npu                 | [package](https://gitee.com/ascend/pytorch/releases) |
| apex                      | [package](https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v1.11.0/20230915.2/pytorch_v1.11.0_py37.tar.gz) |
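Once the environment from step 2 below is in place, the installed versions can be checked against this table; a minimal sketch, assuming the packages were installed into the active conda environment:

```shell
# verify core versions against the software summary above
python --version                                            # expect Python 3.7.16
python -c "import torch; print(torch.__version__)"          # expect 1.11.0
python -c "import deepspeed; print(deepspeed.__version__)"  # expect 0.9.2
```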
### Script

1. Clone the repository to your local server:

```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```
2. Build environment

```bash
# python3.7
conda create -n test python=3.7
conda activate test

# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
pip install torch_npu-1.11.0.XXX-cp37-cp37m-linux_XXX.whl

# install megatron-core
pip3 install --no-use-pep517 -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core

# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..

# install other packages
pip install -r requirements.txt
```
*Note: to train with weights from Hugging Face, first fix a DeepSpeed checkpoint-loading bug by changing `if zero_sd_list is None` to `if zero_sd_list is None or len(zero_sd_list) == 0` in the `_load_zero_checkpoint` function of `<deepspeed-installed-path>/runtime/engine.py`:*

```text
# original deepspeed/runtime/engine.py, around lines 2746-2748
zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
if zero_sd_list is None:
    return False

# modified
zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
if zero_sd_list is None or len(zero_sd_list) == 0:
    return False
```
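The same edit can also be applied non-interactively; a convenience sketch, assuming a standard pip installation of deepspeed (check the file before and after running it):

```shell
# locate the installed engine.py and widen the zero-checkpoint guard
ENGINE_PY=$(python -c "import deepspeed.runtime.engine as e; print(e.__file__)")
sed -i 's/if zero_sd_list is None:/if zero_sd_list is None or len(zero_sd_list) == 0:/' "$ENGINE_PY"
grep -n "zero_sd_list is None" "$ENGINE_PY"  # confirm the modified guard
```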
3. (Optional) Prepare pretrained weights

Download the Baichuan-7B checkpoint from [Hugging Face](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):

```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
Then convert the weights from Hugging Face format into a form AscendSpeed can load:

```shell
mkdir weight

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./baichuan-7B-hf \
    --output-model-dir ./weight \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --pse \
    --deepspeed \
    --use_wpack_rotray \
    --load_weight_map
```
4. Prepare dataset

Download the Baichuan-7B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
# download datasets
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process datasets
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan-7B-hf \
    --output-prefix ./dataset_baichuan7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
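If preprocessing succeeds, the `--output-prefix` above should produce a Megatron-style indexed dataset; the file names below are an assumption inferred from the `DATA_PATH` used in step 5:

```shell
ls ./dataset_baichuan7B/
# expected to include: alpaca_text_document.bin  alpaca_text_document.idx
```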
5. Config the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_zero_7B.sh

```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the dataset, weight and tokenizer paths according to your own setup
TOKENIZER_PATH=./baichuan-7B-hf/  #tokenizer path
DATA_PATH=./dataset_baichuan7B/alpaca_text_document  #processed dataset
# to load pretrained weights, add the argument `--load ./weight`
```
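Before launching, it can help to confirm that the configured paths actually resolve; a minimal sketch using the names from this step:

```shell
# fail early if the tokenizer or the processed dataset is missing
test -f ./baichuan-7B-hf/tokenizer.model || echo "tokenizer.model not found"
test -f ./dataset_baichuan7B/alpaca_text_document.bin || echo "processed dataset not found"
```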
6. Launch the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_zero_7B.sh

```shell
bash examples/baichuan/pretrain_baichuan_zero_7B.sh
```

*Note: to train with weights from Hugging Face, add `--load ./weight` to the training arguments in `pretrain_baichuan_zero_7B.sh` (the argument block at lines 74-107) and rerun it.*
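For long runs it can also help to keep a copy of the console output under the `logs` directory created in step 1; a sketch (the log file name is arbitrary):

```shell
bash examples/baichuan/pretrain_baichuan_zero_7B.sh 2>&1 | tee logs/pretrain_baichuan_7B.log
```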
### Performance

#### Machine performance

The performance of Baichuan-7B on **Ascend NPU** vs. the **Reference**:

| Device    | Model       | total iterations | throughput rate (samples/s/p) | throughput rate (tokens/s/p) | single-step time (s/step) | floating-point operations (TFLOPs/s) |
| :-------: | :---------: | :--------------: | :---------------------------: | :--------------------------: | :-----------------------: | :----------------------------------: |
| NPUs      | Baichuan-7B | 1024             | 3.722                         | 1905                         | 2.14                      | 102.69                               |
| Reference | Baichuan-7B | 1024             | 3.978                         | 2036                         | 1.98                      | 125.66                               |

#### Accuracy of the loss

NPU vs. Reference loss.

The NPU run is smooth, resource usage is stable, no errors are reported during training, the loss trends downward, and the convergence speed is as expected. The relative error of the average loss is 0.01093 (less than 2%), the maximum relative error is 0.1243, and the maximum absolute error is 0.4859. The precision meets requirements.
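Here "relative error" presumably refers to the step-wise quantity

```math
\mathrm{rel\_err}_i = \frac{\lvert L^{\mathrm{NPU}}_i - L^{\mathrm{Ref}}_i \rvert}{\lvert L^{\mathrm{Ref}}_i \rvert}
```

averaged over all logged steps; the README does not spell out the exact definition, so treat this as an assumption.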
![NPU-LOSS](../../sources/images/baichuan/7B_loss_compare.png)

NPU vs. Reference loss relative error.

![NPU-Relative-Error](../../sources/images/baichuan/7B_relative_error.png)
# Baichuan-13B

## Training

Here's a hardware summary of pre-training Baichuan-13B:

| Hardware | Value           |
| :------: | :-------------: |
|   NPU    | 8 x Ascend NPUs |

Here's a software summary of pre-training Baichuan-13B:

| Software                  | Version |
| :-----------------------: | :-----: |
| Python                    | 3.7.16  |
| driver                    | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| firmware                  | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| CANN                      | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| binary arithmetic package | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| torch                     | 1.11.0  |
| torch_npu                 | [package](https://gitee.com/ascend/pytorch/releases) |
### Script

1. Clone the repository to your local server:

```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```
2. Build environment

```bash
# python3.7
conda create -n test python=3.7
conda activate test

# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
pip install torch_npu-1.11.0.XXX-cp37-cp37m-linux_XXX.whl

# install megatron
git clone https://github.com/NVIDIA/Megatron-LM.git -b 23.05
cd Megatron-LM
pip3 install -e ./
cd ..

# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..

# install other packages
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
3. (Optional) Prepare pretrained weights

Download the Baichuan-13B checkpoint from [Hugging Face](https://huggingface.co/baichuan-inc/Baichuan-13B-Base/tree/main):

```shell
mkdir baichuan-13B-hf
cd ./baichuan-13B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00001-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00002-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00003-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenizer_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenizer.model
cd ..
```
Then convert the Baichuan-13B weights from Hugging Face format into AscendSpeed format:

```shell
mkdir weight

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./baichuan-13B-hf \
    --output-model-dir ./weight \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --make-vocab-size-divisible-by 8 \
    --type 13B \
    --pse
```
4. Prepare dataset

Download the Baichuan-13B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
mkdir dataset_baichuan13B
cd ./dataset_baichuan13B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

python ./tools/preprocess_data.py \
    --input ./dataset_baichuan13B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan-13B-hf \
    --output-prefix ./dataset_baichuan13B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
5. Config the Baichuan-13B pre-training script: examples/baichuan/pretrain_baichuan_ptd_13B.sh

```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the tokenizer and dataset paths according to your own setup
TOKENIZER_PATH=./baichuan-13B-hf
DATA_PATH=./dataset_baichuan13B/alpaca_text_document
```

6. Launch the Baichuan-13B pre-training script: examples/baichuan/pretrain_baichuan_ptd_13B.sh

```bash
bash examples/baichuan/pretrain_baichuan_ptd_13B.sh
```
An hourly pulse-check script runs alongside training to verify that the job is either running or scheduled.
### Performance

#### Machine performance

The performance of Baichuan-13B on **Ascend NPU** vs. the **Reference**:

| Device    | Model        | total iterations | throughput rate (samples/s/p) | throughput rate (tokens/s/p) | single-step time (s/step) | floating-point operations (TFLOPs/s) |
| :-------: | :----------: | :--------------: | :---------------------------: | :--------------------------: | :-----------------------: | :----------------------------------: |
| NPUs      | Baichuan-13B | 1000             | 1.928                         | 1024                         | 16.067                    | 89.37                                |
| Reference | Baichuan-13B | 1000             | 1.535                         | 862                          | 19.852                    | 72.39                                |

#### Accuracy of the loss

NPU vs. Reference loss.

The NPU run is smooth, resource usage is stable, no errors are reported during training, the loss trends downward, and the convergence speed is as expected. The relative error of the average loss is 0.00725 (less than 2%), the maximum relative error is 0.01978, and the maximum absolute error is 0.10811. The precision meets requirements.

![NPU-LOSS](../../sources/images/baichuan/13B-loss-compare.png)

NPU vs. Reference loss relative error.

The relative error between NPU and Reference loss is less than 0.02 throughout, as expected.

![NPU-Relative-Error](../../sources/images/baichuan/baichuan13B-loss-relative-error.png)

<font size=1>If downloading a file with `wget` fails, you can download it manually from the corresponding page, making sure the source is trustworthy.</font>
examples/baichuan/README_en.md (new file, 378 lines)
# BaiChuan
<p align="left">
    <b><a href="https://gitee.com/ascend/AscendSpeed/blob/master/examples/baichuan/README.md">Simplified Chinese</a></b> |
    <b>English</b>
</p>
# Contents

- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)
- [Baichuan-13B](#baichuan-13b)
  - [Training](#training-1)
    - [Script](#script-1)
    - [Performance](#performance-1)
      - [Machine performance](#machine-performance-1)
      - [Accuracy of the loss](#accuracy-of-the-loss-1)
# Baichuan-7B

## Training

Here's a hardware summary of pre-training Baichuan-7B:

| Hardware | Value           |
| :------: | :-------------: |
|   NPU    | 8 x Ascend NPUs |

Here's a software summary of pre-training Baichuan-7B:

| Software                  | Version |
| :-----------------------: | :-----: |
| Python                    | 3.7.16  |
| driver                    | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| firmware                  | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| CANN                      | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| binary arithmetic package | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| torch                     | 1.11.0  |
| torch_npu                 | [package](https://gitee.com/ascend/pytorch/releases) |
| apex                      | [package](https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v1.11.0/20230915.2/pytorch_v1.11.0_py37.tar.gz) |
### Script

1. Clone the repository to your local server:

```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```
2. Build environment

```bash
# python3.7
conda create -n test python=3.7
conda activate test

# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
pip install torch_npu-1.11.0.XXX-cp37-cp37m-linux_XXX.whl

# install megatron-core
pip3 install --no-use-pep517 -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core

# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..

# install other packages
pip install -r requirements.txt
```
*Note: to train with weights from Hugging Face, first fix a DeepSpeed checkpoint-loading bug by changing `if zero_sd_list is None` to `if zero_sd_list is None or len(zero_sd_list) == 0` in the `_load_zero_checkpoint` function of `<deepspeed-installed-path>/runtime/engine.py`:*

```text
# original deepspeed/runtime/engine.py, around lines 2746-2748
zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
if zero_sd_list is None:
    return False

# modified
zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
if zero_sd_list is None or len(zero_sd_list) == 0:
    return False
```
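To confirm the patch took effect in the installed copy, a hedged one-liner (it assumes deepspeed was installed with pip as in step 2):

```shell
grep -n "zero_sd_list is None or len(zero_sd_list) == 0" \
  "$(python -c "import deepspeed.runtime.engine as e; print(e.__file__)")"
```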
3. Prepare pretrained weights

Download the Baichuan-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):

```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
To adapt the weights to the Baichuan-7B model, use the following script to convert the pre-training weights:

```shell
mkdir weight

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./baichuan-7B-hf \
    --output-model-dir ./weight \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --pse \
    --deepspeed \
    --use_wpack_rotray \
    --load_weight_map
```
4. Prepare dataset

Download the Baichuan-7B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
# download datasets
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process datasets
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan-7B-hf \
    --output-prefix ./dataset_baichuan7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
5. Config the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_zero_7B.sh

```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the dataset path according to your own setup
TOKENIZER_PATH=./baichuan-7B-hf/  #tokenizer path
DATA_PATH=./dataset_baichuan7B/alpaca_text_document  #processed dataset
```
6. Launch the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_zero_7B.sh

```shell
bash examples/baichuan/pretrain_baichuan_zero_7B.sh
```

*Note: to train with weights from Hugging Face, add `--load ./weight` to the training arguments in `pretrain_baichuan_zero_7B.sh` (the argument block at lines 74-107) and rerun it.*
### Performance

#### Machine performance

The performance of Baichuan-7B on **Ascend NPU** vs. the **Reference**:

| Device    | Model       | total iterations | throughput rate (samples/s/p) | throughput rate (tokens/s/p) | single-step time (s/step) | floating-point operations (TFLOPs/s) |
| :-------: | :---------: | :--------------: | :---------------------------: | :--------------------------: | :-----------------------: | :----------------------------------: |
| NPUs      | Baichuan-7B | 1024             | 3.722                         | 1905                         | 2.14                      | 102.69                               |
| Reference | Baichuan-7B | 1024             | 3.978                         | 2036                         | 1.98                      | 125.66                               |

#### Accuracy of the loss

NPU vs. Reference loss.

The NPU run is smooth, resource usage is stable, no errors are reported during training, the loss trends downward, and the convergence speed is as expected. The relative error of the average loss is 0.01093 (less than 2%), the maximum relative error is 0.1243, and the maximum absolute error is 0.4859. The precision meets requirements.

![NPU-LOSS](../../sources/images/baichuan/7B_loss_compare.png)

NPU vs. Reference loss relative error.

![NPU-Relative-Error](../../sources/images/baichuan/7B_relative_error.png)
# Baichuan-13B

## Training

Here's a hardware summary of pre-training Baichuan-13B:

| Hardware | Value           |
| :------: | :-------------: |
|   NPU    | 8 x Ascend NPUs |

Here's a software summary of pre-training Baichuan-13B:

| Software                  | Version |
| :-----------------------: | :-----: |
| Python                    | 3.7.16  |
| driver                    | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| firmware                  | [package](https://support.huawei.com/enterprise/zh/ascend-computing/atlas-900-pod-a2-pid-254184911/software) |
| CANN                      | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| binary arithmetic package | [package](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373/software) |
| torch                     | 1.11.0  |
| torch_npu                 | [package](https://gitee.com/ascend/pytorch/releases) |
### Script

1. Clone the repository to your local server:

```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```
2. Build environment

```bash
# python3.7
conda create -n test python=3.7
conda activate test

# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
pip install torch_npu-1.11.0.XXX-cp37-cp37m-linux_XXX.whl

# install megatron
git clone https://github.com/NVIDIA/Megatron-LM.git -b 23.05
cd Megatron-LM
pip3 install -e ./
cd ..

# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..

# install other packages
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
3. Prepare pretrained weights

Download the Baichuan-13B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan-13B-Base/tree/main):

```shell
mkdir baichuan-13B-hf
cd ./baichuan-13B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00001-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00002-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model-00003-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenizer_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Base/resolve/main/tokenizer.model
cd ..
```
To adapt the weights to the Baichuan-13B model, use the following script to convert the pre-training weights:

```shell
mkdir weight

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./baichuan-13B-hf \
    --output-model-dir ./weight \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --make-vocab-size-divisible-by 8 \
    --type 13B \
    --pse
```
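With `--tensor-model-parallel-size 8` the converted checkpoint is split into eight tensor-parallel shards. The exact directory layout depends on the converter version, so as a hedged check simply confirm that output was written:

```shell
ls -R ./weight | head -n 20
```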
4. Prepare dataset

Download the Baichuan-13B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
mkdir dataset_baichuan13B
cd ./dataset_baichuan13B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

python ./tools/preprocess_data.py \
    --input ./dataset_baichuan13B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan-13B-hf \
    --output-prefix ./dataset_baichuan13B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
5. Config the Baichuan-13B pre-training script: examples/baichuan/pretrain_baichuan_ptd_13B.sh

```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the dataset path according to your own setup
TOKENIZER_PATH=./baichuan-13B-hf
DATA_PATH=./dataset_baichuan13B/alpaca_text_document
```
6. Launch the Baichuan-13B pre-training script: examples/baichuan/pretrain_baichuan_ptd_13B.sh

```bash
bash examples/baichuan/pretrain_baichuan_ptd_13B.sh
```

An hourly pulse-check script runs alongside training to verify that the job is either running or scheduled.
### Performance

#### Machine performance

The performance of Baichuan-13B on **Ascend NPU** vs. the **Reference**:

| Device    | Model        | total iterations | throughput rate (samples/s/p) | throughput rate (tokens/s/p) | single-step time (s/step) | floating-point operations (TFLOPs/s) |
| :-------: | :----------: | :--------------: | :---------------------------: | :--------------------------: | :-----------------------: | :----------------------------------: |
| NPUs      | Baichuan-13B | 1000             | 1.928                         | 1024                         | 16.067                    | 89.37                                |
| Reference | Baichuan-13B | 1000             | 1.535                         | 862                          | 19.852                    | 72.39                                |

#### Accuracy of the loss

NPU vs. Reference loss.

The NPU run is smooth, resource usage is stable, no errors are reported during training, the loss trends downward, and the convergence speed is as expected. The relative error of the average loss is 0.00725 (less than 2%), the maximum relative error is 0.01978, and the maximum absolute error is 0.10811. The precision meets requirements.

![NPU-LOSS](../../sources/images/baichuan/13B-loss-compare.png)

NPU vs. Reference loss relative error.

The relative error between NPU and Reference loss is less than 0.02 throughout, as expected.

![NPU-Relative-Error](../../sources/images/baichuan/baichuan13B-loss-relative-error.png)

<font size=1>If downloading a file with `wget` fails, you can download it manually from the corresponding page, making sure the source is trustworthy.</font>