!449 Add baichuan7B/baichuan2-7B adaptor

Merge pull request !449 from xiongliangcheng/modellink
xiongliangcheng 2024-01-26 11:52:49 +00:00 committed by i-robot
parent daa835112c
commit 2a9af85917
23 changed files with 851 additions and 0 deletions

OWNERS

@ -30,3 +30,11 @@ reviewers:
- wenjiang2357
- leizhenzhen
- liuyanghan
- Ares_Lzk
- flying-artillery
- xiong-liangcheng_admin
- gitee-code-template
- yaojia2021
- chantcalf
- kongfuziyue
- yuhui69

examples/baichuan/README.md Normal file

@ -0,0 +1,166 @@
# BaiChuan
<p align="left">
<b>简体中文</b> |
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan/README_en.md">English</a> </b>
</p>
# Contents
- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Throughput](#throughput)
      - [Accuracy](#accuracy)
# Baichuan-7B
## Training
Here is the hardware configuration used for Baichuan-7B training:
| Hardware | Configuration |
|:---:|:---------------:|
| NPU | 8 x Ascend NPUs |
### Script
1. Clone the repository to your server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```
2. Build the environment
```bash
# python3.8
conda create -n test python=3.8
conda activate test
# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl
# set the ascend-toolkit path for your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# install the AscendSpeed acceleration library
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..
# install the remaining dependencies
pip install -r requirements.txt
```
3. (Optional) Prepare the pretrained weights
Download the pretrained weights from [huggingface](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):
```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
Then convert the HF-format weights into a form AscendSpeed can load:
```shell
mkdir baichuan-7B-mt
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
--input-model-dir ./baichuan-7B-hf \
--output-model-dir ./baichuan-7B-mt \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--type 7B \
--pse \
--merge-mlp
```
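As a quick sanity check, you can verify that one weight shard was produced per tensor-parallel rank. The `release/mp_rank_*` layout below is an assumption based on the usual Megatron-style checkpoint format, not something documented here:
```shell
# assumed Megatron-style layout: eight shards for --tensor-model-parallel-size 8
ls ./baichuan-7B-mt/release
# expected something like: mp_rank_00  mp_rank_01 ... mp_rank_07
```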
4. Prepare the dataset
Download the Baichuan-7B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):
```shell
# download the dataset
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# preprocess the dataset
python ./tools/preprocess_data.py \
--input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan-7B-hf \
--output-prefix ./dataset_baichuan7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
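Step 5 points `DATA_PATH` at the `alpaca_text_document` prefix, so the preprocessed index and binary files should now exist alongside the raw parquet:
```shell
ls ./dataset_baichuan7B
# alpaca_text_document.bin  alpaca_text_document.idx  train-00000-of-00001-a09b74b3ef9c3b56.parquet
```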
5. Configure the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh
```shell
# set the ascend-toolkit path for your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan-7B-mt"
```
6. Launch the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh
```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh
```
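The launcher streams every rank's output to stdout; an optional pattern is to capture it under the `logs/` directory created in step 1:
```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh > logs/pretrain_baichuan_7B.log 2>&1 &
tail -f logs/pretrain_baichuan_7B.log
```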
### Performance
#### Throughput
Performance comparison of Baichuan-7B on **Ascend NPUs** and the **reference device**:
| Device | Model | Iterations | Sample throughput (samples/s) | Token throughput (tokens/s/p) | Time per step (s/step) |
|:----:|:---------:|:----:|:---------------------:|:---------------:|:----------------:|
| NPUs | Baichuan-7B | 1000 | 4.78 | 2448.76 | 6.688|
| Reference | Baichuan-7B | 1000 | 5.45 | 2792.56 | 5.863 |
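These columns are mutually consistent with the script's settings (global batch size 32, sequence length 4096, 8 devices): tokens/s/p ≈ samples/s × 4096 / 8 and s/step ≈ 32 / samples/s. Checking the NPU row:
```shell
# 4.78 * 4096 / 8 ≈ 2447.4 tokens/s/p and 32 / 4.78 ≈ 6.69 s/step,
# matching the 2448.76 and 6.688 reported above up to rounding
awk 'BEGIN { printf "%.1f tokens/s/p, %.2f s/step\n", 4.78 * 4096 / 8, 32 / 4.78 }'
```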
#### Accuracy
NPU vs. reference loss.
![NPU-LOSS](../../sources/images/baichuan/baichuan7B-loss-compare.png)
Relative error of NPU vs. reference loss.
![NPU-Relative-Error](../../sources/images/baichuan/baichuan7B-loss-relative-error.png)
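The relative error plotted above is presumably the per-step |loss_NPU - loss_ref| / loss_ref; given two hypothetical one-value-per-line loss dumps, it could be recomputed with:
```shell
# npu_loss.txt / ref_loss.txt are hypothetical per-step loss dumps, one value per line
paste npu_loss.txt ref_loss.txt | awk '{ d = $1 - $2; if (d < 0) d = -d; print d / $2 }'
```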

examples/baichuan/README_en.md Normal file

@ -0,0 +1,165 @@
# BaiChuan
<p align="left">
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan/README.md">简体中文</a></b> |
<b>English</b>
</p>
# Contents
- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)
# Baichuan-7B
## Training
Here's a hardware summary of pre-training Baichuan-7B:
| Hardware | Value |
| :------: | :---------------------------------------------: |
| NPU | 8 x Ascend NPUs |
### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```
2. Build environment
```bash
# python3.8
conda create -n test python=3.8
conda activate test
# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl
# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# install AscendSpeed
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..
# install other packages
pip install -r requirements.txt
```
3. Prepare pretrained weights
Download the Baichuan-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main)
```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
To adapt them to the Baichuan-7B model layout, convert the pre-trained weights with the following script:
```shell
mkdir baichuan-7B-mt
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
--input-model-dir ./baichuan-7B-hf \
--output-model-dir ./baichuan-7B-mt \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--type 7B \
--pse \
--merge-mlp
```
4. Prepare dataset
Download the Baichuan-7B datasets from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet)
```shell
# download datasets
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# process datasets
python ./tools/preprocess_data.py \
--input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan-7B-hf \
--output-prefix ./dataset_baichuan7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
5. Configure the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan-7B-mt"
```
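Instead of editing the script by hand, the four variables can also be patched in place; a sketch using `sed`, assuming the script keeps the quoted `VAR="..."` assignments shown above:
```shell
sed -i \
  -e 's|^CKPT_SAVE_DIR=.*|CKPT_SAVE_DIR="./ckpt"|' \
  -e 's|^DATA_PATH=.*|DATA_PATH="./dataset_baichuan7B/alpaca_text_document"|' \
  -e 's|^TOKENIZER_MODEL=.*|TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"|' \
  -e 's|^CKPT_LOAD_DIR=.*|CKPT_LOAD_DIR="./baichuan-7B-mt"|' \
  examples/baichuan/pretrain_baichuan_ptd_7B.sh
```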
6. Launch Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh
```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh
```
### Performance
#### Machine performance
The performance of Baichuan-7B on **Ascend NPUs** compared with the **reference device**:
| Device | Model | Total iterations | Sample throughput (samples/s) | Token throughput (tokens/s/p) | Time per step (s/step) |
|:----:|:---------:|:----:|:---------------------:|:---------------:|:----------------:|
| NPUs | Baichuan-7B | 1000 | 4.78 | 2448.76 | 6.688|
| Reference | Baichuan-7B | 1000 | 5.45 | 2792.56 | 5.863 |
#### Accuracy of the loss
NPU vs Reference loss.
![NPU-LOSS](../../sources/images/baichuan/baichuan7B-loss-compare.png)
NPU vs Reference loss relative error.
![NPU-Relative-Error](../../sources/images/baichuan/baichuan7B-loss-relative-error.png)

examples/baichuan/pretrain_baichuan_ptd_7B.sh Normal file

@ -0,0 +1,89 @@
#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
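# single-node launch topology: 8 NPUs on this host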
GPUS_PER_NODE=8
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))
CKPT_SAVE_DIR="your model save ckpt path"
DATA_PATH="your data path"
TOKENIZER_MODEL="your tokenizer model path"
CKPT_LOAD_DIR="your model load ckpt path"
TP=8
PP=1
DISTRIBUTED_ARGS="
--nproc_per_node $GPUS_PER_NODE \
--nnodes $NNODES \
--node_rank $NODE_RANK \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT
"
GPT_ARGS="
--tensor-model-parallel-size ${TP} \
--pipeline-model-parallel-size ${PP} \
--sequence-parallel \
--num-layers 32 \
--hidden-size 4096 \
--ffn-hidden-size 11008 \
--num-attention-heads 32 \
--tokenizer-type Llama2Tokenizer \
--tokenizer-model ${TOKENIZER_MODEL} \
--load ${CKPT_LOAD_DIR} \
--seq-length 4096 \
--max-position-embeddings 4096 \
--micro-batch-size 4 \
--global-batch-size 32 \
--make-vocab-size-divisible-by 128 \
--lr 1e-5 \
--train-iters 5000 \
--lr-decay-style cosine \
--untie-embeddings-and-output-weights \
--disable-bias-linear \
--attention-dropout 0.0 \
--init-method-std 0.01 \
--hidden-dropout 0.0 \
--position-embedding-type rope \
--normalization RMSNorm \
--use-fused-rmsnorm \
--use-flash-attn \
--swiglu \
--no-masked-softmax-fusion \
--attention-softmax-in-fp32 \
--min-lr 1e-6 \
--weight-decay 1e-2 \
--lr-warmup-fraction 0.1 \
--clip-grad 1.0 \
--adam-beta1 0.9 \
--initial-loss-scale 8188.0 \
--adam-beta2 0.95 \
--no-gradient-accumulation-fusion \
--no-load-optim \
--no-load-rng \
--fp16
"
DATA_ARGS="
--data-path $DATA_PATH \
--split 949,50,1
"
OUTPUT_ARGS="
--log-interval 1 \
--save-interval 1000 \
--eval-interval 1000 \
--eval-iters 1 \
"
torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
$GPT_ARGS \
$DATA_ARGS \
$OUTPUT_ARGS \
--distributed-backend nccl \
--save ${CKPT_SAVE_DIR}

examples/baichuan2/README.md Normal file

@ -0,0 +1,166 @@
# BaiChuan2
<p align="left">
<b>简体中文</b> |
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan2/README_en.md">English</a> </b>
</p>
# Contents
- [Baichuan2-7B](#baichuan2-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Throughput](#throughput)
      - [Accuracy](#accuracy)
# Baichuan2-7B
## Training
Here is the hardware configuration used for Baichuan2-7B training:
| Hardware | Configuration |
|:---:|:---------------:|
| NPU | 8 x Ascend NPUs |
### Script
1. Clone the repository to your server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```
2. Build the environment
```bash
# python3.8
conda create -n test python=3.8
conda activate test
# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_XXX.whl
# set the ascend-toolkit path for your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# install the AscendSpeed acceleration library
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..
# install the remaining dependencies
pip install -r requirements.txt
```
3. (Optional) Prepare the pretrained weights
Download the pretrained weights from [huggingface](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/tree/main):
```shell
mkdir baichuan2-7B-hf
cd ./baichuan2-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/generation_utils.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00001-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00002-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer_config.json
cd ..
```
Then convert the HF-format weights into a form AscendSpeed can load:
```shell
mkdir baichuan2-7B-mt
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
# for ptd
python $SCRIPT_PATH \
--input-model-dir ./baichuan2-7B-hf \
--output-model-dir ./baichuan2-7B-mt \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--type 7B \
--merge-mlp \
--pse
```
4. Prepare the dataset
Download the Baichuan2-7B-Base dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):
```shell
# download the dataset
mkdir dataset_baichuan2-7B
cd ./dataset_baichuan2-7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# preprocess the dataset
python ./tools/preprocess_data.py \
--input ./dataset_baichuan2-7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan2-7B-hf \
--output-prefix ./dataset_baichuan2-7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
5. Configure the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```shell
# set the ascend-toolkit path for your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# set the dataset, checkpoint, and tokenizer paths
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan2-7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan2-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan2-7B-mt"
```
6. Launch the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```shell
bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```
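The script as shipped targets a single node (`NNODES=1`). For multi-machine training, the launcher variables at the top of the script would change along these lines (a sketch for a hypothetical two-node, 16-NPU setup; run the script once per node with its own `NODE_RANK`):
```shell
# sketch only: a hypothetical 2-node x 8-NPU job; run on every node with its own NODE_RANK
MASTER_ADDR=192.0.2.10   # example placeholder: address of the rank-0 node
MASTER_PORT=6000
NNODES=2
NODE_RANK=0              # 0 on the first node, 1 on the second
GPUS_PER_NODE=8
```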
### Performance
#### Throughput
Performance comparison of Baichuan2-7B on **Ascend NPUs** and the **reference device**:
| Device | Model | Iterations | Sample throughput (samples/s) | Token throughput (tokens/s/p) | Time per step (s/step) |
|:----:|:---------:|:----:|:---------------------:|:---------------:|:----------------:|
| NPUs | Baichuan2-7B | 1000 | 4.59 | 2349 | 6.973|
| Reference | Baichuan2-7B | 1000 | 5.40 | 2769 | 5.915 |
#### Accuracy
NPU vs. reference loss.
![NPU-LOSS](../../sources/images/baichuan2/baichuan2-7B-loss-compare.png)
Relative error of NPU vs. reference loss.
![NPU-Relative-Error](../../sources/images/baichuan2/baichuan2-7B-loss-relative-error.png)

examples/baichuan2/README_en.md Normal file

@ -0,0 +1,168 @@
# BaiChuan2
<p align="left">
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan2/README.md">简体中文</a></b> |
<b>English</b>
</p>
# Contents
- [Baichuan2-7B](#baichuan2-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)
# Baichuan2-7B
## Training
Here's a hardware summary of pre-training Baichuan2-7B:
| Hardware | Value |
| :------: | :---------------------------------------------: |
| NPU | 8 x Ascend NPUs |
### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout -b modellink origin/modellink
mkdir logs
mkdir ckpt
```
2. Build environment
```bash
# python3.8
conda create -n test python=3.8
conda activate test
# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_XXX.whl
# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# install AscendSpeed
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..
# install other packages
pip install -r requirements.txt
```
3. Prepare pretrained weights
Download the Baichuan2-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/tree/main)
```shell
mkdir baichuan2-7B-hf
cd ./baichuan2-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/generation_utils.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00001-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00002-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer_config.json
cd ..
```
To adapt them to the Baichuan2-7B model layout, convert the pre-trained weights with the following script:
```shell
mkdir baichuan2-7B-mt
SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
# for ptd
python $SCRIPT_PATH \
--input-model-dir ./baichuan2-7B-hf \
--output-model-dir ./baichuan2-7B-mt \
--tensor-model-parallel-size 8 \
--pipeline-model-parallel-size 1 \
--type 7B \
--merge-mlp \
--pse
```
4. Prepare dataset
Download the Baichuan2-7B-Base datasets from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet)
```shell
# download datasets
mkdir dataset_baichuan2-7B
cd ./dataset_baichuan2-7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
# process datasets
python ./tools/preprocess_data.py \
--input ./dataset_baichuan2-7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
--tokenizer-name-or-path ./baichuan2-7B-hf \
--output-prefix ./dataset_baichuan2-7B/alpaca \
--workers 4 \
--log-interval 1000 \
--tokenizer-type PretrainedFromHF
```
5. Configure the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# set the dataset, checkpoint, and tokenizer paths according to your setup
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan2-7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan2-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan2-7B-mt"
```
6. Launch Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```shell
bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```
### Performance
#### Machine performance
The performance of Baichuan2-7B on **Ascend NPUs** compared with the **reference device**:
| Device | Model | Total iterations | Sample throughput (samples/s) | Token throughput (tokens/s/p) | Time per step (s/step) |
|:----:|:---------:|:----:|:---------------------:|:---------------:|:----------------:|
| NPUs | Baichuan2-7B | 1000 | 4.59 | 2349 | 6.973|
| Reference | Baichuan2-7B | 1000 | 5.40 | 2769 | 5.915 |
#### Accuracy of the loss
NPU vs Reference loss.
![NPU-LOSS](../../sources/images/baichuan2/baichuan2-7B-loss-compare.png)
NPU vs Reference loss relative error.
![NPU-Relative-Error](../../sources/images/baichuan2/baichuan2-7B-loss-relative-error.png)

examples/baichuan2/pretrain_baichuan2_ptd_7B.sh Normal file

@ -0,0 +1,89 @@
#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
GPUS_PER_NODE=8
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))
CKPT_SAVE_DIR="your model save ckpt path"
DATA_PATH="your data path"
TOKENIZER_MODEL="your tokenizer model path"
CKPT_LOAD_DIR="your model load ckpt path"
TP=8
PP=1
DISTRIBUTED_ARGS="
--nproc_per_node $GPUS_PER_NODE \
--nnodes $NNODES \
--node_rank $NODE_RANK \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT
"
GPT_ARGS="
--tensor-model-parallel-size ${TP} \
--pipeline-model-parallel-size ${PP} \
--sequence-parallel \
--num-layers 32 \
--hidden-size 4096 \
--ffn-hidden-size 11008 \
--num-attention-heads 32 \
--tokenizer-type Llama2Tokenizer \
--tokenizer-model ${TOKENIZER_MODEL} \
--load ${CKPT_LOAD_DIR} \
--seq-length 4096 \
--max-position-embeddings 4096 \
--micro-batch-size 4 \
--global-batch-size 32 \
--make-vocab-size-divisible-by 128 \
--lr 1e-6 \
--train-iters 5000 \
--lr-decay-style cosine \
--untie-embeddings-and-output-weights \
--disable-bias-linear \
--attention-dropout 0.0 \
--init-method-std 0.01 \
--hidden-dropout 0.0 \
--position-embedding-type rope \
--normalization RMSNorm \
--use-fused-rmsnorm \
--use-flash-attn \
--swiglu \
--no-masked-softmax-fusion \
--attention-softmax-in-fp32 \
--min-lr 1e-8 \
--weight-decay 1e-2 \
--lr-warmup-fraction 0.1 \
--clip-grad 1.0 \
--adam-beta1 0.9 \
--initial-loss-scale 8188.0 \
--adam-beta2 0.95 \
--no-gradient-accumulation-fusion \
--no-load-optim \
--no-load-rng \
--bf16
"
DATA_ARGS="
--data-path $DATA_PATH \
--split 949,50,1
"
OUTPUT_ARGS="
--log-interval 1 \
--save-interval 1000 \
--eval-interval 1000 \
--eval-iters 1 \
"
torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
$GPT_ARGS \
$DATA_ARGS \
$OUTPUT_ARGS \
--distributed-backend nccl \
--save ${CKPT_SAVE_DIR}

16 binary image files changed (not shown).