!449 Add Baichuan-7B / Baichuan2-7B adaptor
Merge pull request !449 from xiongliangcheng/modellink
8
OWNERS
@@ -30,3 +30,11 @@ reviewers:
- wenjiang2357
- leizhenzhen
- liuyanghan
- Ares_Lzk
- flying-artillery
- xiong-liangcheng_admin
- gitee-code-template
- yaojia2021
- chantcalf
- kongfuziyue
- yuhui69
166
examples/baichuan/README.md
Normal file
@@ -0,0 +1,166 @@
# BaiChuan

<p align="left">
<b>简体中文</b> |
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan/README_en.md">English</a></b>
</p>

# Contents

- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Throughput](#throughput)
      - [Accuracy](#accuracy)
# Baichuan-7B

## Training

The hardware configuration for training Baichuan-7B is as follows:

| Hardware | Configuration   |
|:--------:|:---------------:|
| NPU      | 8 x Ascend NPUs |

### Script

1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```

2. Build the environment

```bash
# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl

# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install the AscendSpeed acceleration library
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..

# install other dependencies
pip install -r requirements.txt
```
3. (Optional) Prepare pretrained weights

Download the pretrained weights from [huggingface](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):

```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```

Then convert the weights from HuggingFace format into a format that AscendSpeed can load:
```shell
mkdir baichuan-7B-mt

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./baichuan-7B-hf \
    --output-model-dir ./baichuan-7B-mt \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --pse \
    --merge-mlp
```
4. Prepare the dataset

Download the dataset for Baichuan-7B from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
# download the dataset
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process the dataset
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan-7B-hf \
    --output-prefix ./dataset_baichuan7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
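
A quick way to confirm that preprocessing succeeded is to list the output directory; with the `--output-prefix` above it should contain the indexed dataset files that `DATA_PATH` points at in step 5 (exact file names can vary with preprocessing options, so treat this only as a sanity check):

```shell
# expect files such as alpaca_text_document.bin and alpaca_text_document.idx
ls -lh ./dataset_baichuan7B/
```
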
5. Configure the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh

```shell
# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan-7B-mt"
```

6. Launch the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh

```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh
```

### Performance

#### Throughput

Performance comparison of Baichuan-7B on **Ascend NPUs** and a **reference device**:

| Device    | Model       | Iterations | Sample throughput (samples/s) | Token throughput (tokens/s/p) | Single-step time (s/step) |
|:---------:|:-----------:|:----------:|:-----------------------------:|:-----------------------------:|:-------------------------:|
| NPUs      | Baichuan-7B | 1000       | 4.78                          | 2448.76                       | 6.688                     |
| Reference | Baichuan-7B | 1000       | 5.45                          | 2792.56                       | 5.863                     |
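
The columns above are roughly self-consistent, assuming tokens/s/p ≈ samples/s × seq-length ÷ number of NPUs and s/step ≈ global-batch-size ÷ samples/s (4096, 8 and 32 respectively in the shipped training script):

```shell
# rough cross-check of the NPU row (assumes seq-length 4096, 8 NPUs, global-batch-size 32)
python3 -c "print(4.78 * 4096 / 8)"   # ~2447 tokens/s/p, close to 2448.76
python3 -c "print(32 / 4.78)"         # ~6.69 s/step, close to 6.688
```
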
#### Accuracy

NPU vs Reference loss.

![NPU-LOSS](../../sources/images/baichuan/baichuan7B-loss-compare.png)

NPU vs Reference loss relative error.

![NPU-Relative-Error](../../sources/images/baichuan/baichuan7B-loss-relative-error.png)
165
examples/baichuan/README_en.md
Normal file
@@ -0,0 +1,165 @@
# BaiChuan

<p align="left">
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan/README.md">简体中文</a></b> |
<b>English</b>
</p>

# Contents

- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)
# Baichuan-7B

## Training

Here's a hardware summary of pre-training Baichuan-7B:

| Hardware | Value           |
| :------: | :-------------: |
| NPU      | 8 x Ascend NPUs |

### Script

1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```

2. Build environment

```bash
# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl

# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install AscendSpeed
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..

# install other packages
pip install -r requirements.txt
```
3. Prepare pretrained weights

Download the Baichuan-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):

```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
Then convert the pretrained weights from HuggingFace format into the format expected by this repository (the output directory matches the CKPT_LOAD_DIR used in step 5):
```shell
mkdir baichuan-7B-mt

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./baichuan-7B-hf \
    --output-model-dir ./baichuan-7B-mt \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --pse \
    --merge-mlp
```
4. Prepare dataset

Download the Baichuan-7B datasets from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet)

```shell
# download datasets
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process datasets
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan-7B-hf \
    --output-prefix ./dataset_baichuan7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
5. Configure the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh

```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan-7B-mt"
```
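
For reference, the parallel layout implied by the shipped script follows from its own settings (8 NPUs, TP=8, PP=1, micro-batch-size 4, global-batch-size 32); a sketch of the usual Megatron-style arithmetic, shown here only as a sanity check:

```shell
# data-parallel size    = world size / (TP * PP)               = 8 / (8 * 1) = 1
# gradient-accum steps  = global batch / (micro batch * DP)    = 32 / (4 * 1) = 8
python3 -c "dp = 8 // (8 * 1); print(dp, 32 // (4 * dp))"
```
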
6. Launch Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh

```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh
```
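
The script defaults to a single node with 8 NPUs. To scale out, the torchrun-related variables at the top of examples/baichuan/pretrain_baichuan_ptd_7B.sh can be edited per node; a hypothetical two-node layout (the node address below is a placeholder) might look like:

```shell
GPUS_PER_NODE=8
NNODES=2                   # total number of nodes
NODE_RANK=0                # 0 on the first node, 1 on the second
MASTER_ADDR=<node0-ip>     # address of the rank-0 node, reachable from all nodes
MASTER_PORT=6000
```
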
### Performance

#### Machine performance

The performance of Baichuan-7B on **Ascend NPU** and **Reference**:

| Device    | Model       | Total iterations | Throughput rate (samples/s) | Throughput rate (tokens/s/p) | Single-step time (s/step) |
|:---------:|:-----------:|:----------------:|:---------------------------:|:----------------------------:|:-------------------------:|
| NPUs      | Baichuan-7B | 1000             | 4.78                        | 2448.76                      | 6.688                     |
| Reference | Baichuan-7B | 1000             | 5.45                        | 2792.56                      | 5.863                     |

#### Accuracy of the loss

NPU vs Reference loss.

![NPU-LOSS](../../sources/images/baichuan/baichuan7B-loss-compare.png)

NPU vs Reference loss relative error.

![NPU-Relative-Error](../../sources/images/baichuan/baichuan7B-loss-relative-error.png)
89
examples/baichuan/pretrain_baichuan_ptd_7B.sh
Normal file
@@ -0,0 +1,89 @@
#!/bin/bash

export CUDA_DEVICE_MAX_CONNECTIONS=1

GPUS_PER_NODE=8
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

CKPT_SAVE_DIR="your model save ckpt path"
DATA_PATH="your data path"
TOKENIZER_MODEL="your tokenizer model path"
CKPT_LOAD_DIR="your model load ckpt path"

TP=8
PP=1

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

GPT_ARGS="
    --tensor-model-parallel-size ${TP} \
    --pipeline-model-parallel-size ${PP} \
    --sequence-parallel \
    --num-layers 32 \
    --hidden-size 4096 \
    --ffn-hidden-size 11008 \
    --num-attention-heads 32 \
    --tokenizer-type Llama2Tokenizer \
    --tokenizer-model ${TOKENIZER_MODEL} \
    --load ${CKPT_LOAD_DIR} \
    --seq-length 4096 \
    --max-position-embeddings 4096 \
    --micro-batch-size 4 \
    --global-batch-size 32 \
    --make-vocab-size-divisible-by 128 \
    --lr 1e-5 \
    --train-iters 5000 \
    --lr-decay-style cosine \
    --untie-embeddings-and-output-weights \
    --disable-bias-linear \
    --attention-dropout 0.0 \
    --init-method-std 0.01 \
    --hidden-dropout 0.0 \
    --position-embedding-type rope \
    --normalization RMSNorm \
    --use-fused-rmsnorm \
    --use-flash-attn \
    --swiglu \
    --no-masked-softmax-fusion \
    --attention-softmax-in-fp32 \
    --min-lr 1e-6 \
    --weight-decay 1e-2 \
    --lr-warmup-fraction 0.1 \
    --clip-grad 1.0 \
    --adam-beta1 0.9 \
    --initial-loss-scale 8188.0 \
    --adam-beta2 0.95 \
    --no-gradient-accumulation-fusion \
    --no-load-optim \
    --no-load-rng \
    --fp16
"

DATA_ARGS="
    --data-path $DATA_PATH \
    --split 949,50,1
"

OUTPUT_ARGS="
    --log-interval 1 \
    --save-interval 1000 \
    --eval-interval 1000 \
    --eval-iters 1 \
"

torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    $GPT_ARGS \
    $DATA_ARGS \
    $OUTPUT_ARGS \
    --distributed-backend nccl \
    --save ${CKPT_SAVE_DIR}
166
examples/baichuan2/README.md
Normal file
@@ -0,0 +1,166 @@
# BaiChuan2

<p align="left">
<b>简体中文</b> |
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan2/README_en.md">English</a></b>
</p>

# Contents

- [Baichuan2-7B](#baichuan2-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Throughput](#throughput)
      - [Accuracy](#accuracy)

# Baichuan2-7B

## Training

The hardware configuration for training Baichuan2-7B is as follows:

| Hardware | Configuration   |
|:--------:|:---------------:|
| NPU      | 8 x Ascend NPUs |

### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```

2. Build the environment

```bash
# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_XXX.whl

# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install the AscendSpeed acceleration library
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..

# install other dependencies
pip install -r requirements.txt
```

3. (Optional) Prepare pretrained weights

Download the pretrained weights from [huggingface](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/tree/main):

```shell
mkdir baichuan2-7B-hf
cd ./baichuan2-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/generation_utils.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00001-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00002-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer_config.json
cd ..
```
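
If downloading file by file is inconvenient, the same checkpoint can usually be fetched in one step with git-lfs (assuming git-lfs is installed; the target directory name simply matches the one used above):

```shell
git lfs install
git clone https://huggingface.co/baichuan-inc/Baichuan2-7B-Base baichuan2-7B-hf
```
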
Then convert the weights from HuggingFace format into a format that AscendSpeed can load:
```shell
mkdir baichuan2-7B-mt

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
# for ptd
python $SCRIPT_PATH \
    --input-model-dir ./baichuan2-7B-hf \
    --output-model-dir ./baichuan2-7B-mt \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --merge-mlp \
    --pse
```

4. Prepare the dataset

Download the dataset for Baichuan2-7B-Base from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
# download the dataset
mkdir dataset_baichuan2-7B
cd ./dataset_baichuan2-7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process the dataset
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan2-7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan2-7B-hf \
    --output-prefix ./dataset_baichuan2-7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```

5. Configure the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh

```shell
# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the dataset, weight and tokenizer paths
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan2-7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan2-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan2-7B-mt"
```

6. Launch the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh

```shell
bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```
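
For long runs it is often convenient to detach the job and keep the console output in the logs directory created in step 1 (the log file name below is arbitrary):

```shell
nohup bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh > logs/train_baichuan2_7B.log 2>&1 &
tail -f logs/train_baichuan2_7B.log
```
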
### Performance

#### Throughput

Performance comparison of Baichuan2-7B on **Ascend NPUs** and a **reference device**:

| Device    | Model        | Iterations | Sample throughput (samples/s) | Token throughput (tokens/s/p) | Single-step time (s/step) |
|:---------:|:------------:|:----------:|:-----------------------------:|:-----------------------------:|:-------------------------:|
| NPUs      | Baichuan2-7B | 1000       | 4.59                          | 2349                          | 6.973                     |
| Reference | Baichuan2-7B | 1000       | 5.40                          | 2769                          | 5.915                     |

#### Accuracy

NPU vs Reference loss.

![NPU-LOSS](../../sources/images/baichuan2/baichuan2-7B-loss-compare.png)

NPU vs Reference loss relative error.

![NPU-Relative-Error](../../sources/images/baichuan2/baichuan2-7B-loss-relative-error.png)
168
examples/baichuan2/README_en.md
Normal file
@@ -0,0 +1,168 @@
# BaiChuan2

<p align="left">
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan2/README.md">简体中文</a></b> |
<b>English</b>
</p>

# Contents

- [Baichuan2-7B](#baichuan2-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)

# Baichuan2-7B

## Training

Here's a hardware summary of pre-training Baichuan2-7B:

| Hardware | Value           |
| :------: | :-------------: |
| NPU      | 8 x Ascend NPUs |

### Script
1. Clone the repository to your local server:
```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout -b modellink origin/modellink
mkdir logs
mkdir ckpt
```

2. Build environment

```bash
# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_XXX.whl

# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install AscendSpeed
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..

# install other packages
pip install -r requirements.txt
```
3. Prepare pretrained weights

Download the Baichuan2-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/tree/main):

```shell
mkdir baichuan2-7B-hf
cd ./baichuan2-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/generation_utils.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00001-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00002-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer_config.json
cd ..
```

Then convert the pretrained weights from HuggingFace format into the format expected by this repository (the output directory matches the CKPT_LOAD_DIR used in step 5):
```shell
mkdir baichuan2-7B-mt

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
# for ptd
python $SCRIPT_PATH \
    --input-model-dir ./baichuan2-7B-hf \
    --output-model-dir ./baichuan2-7B-mt \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --merge-mlp \
    --pse
```
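
The exact on-disk layout of the converted checkpoint depends on the conversion tool, so the simplest sanity check is just to confirm that the output directory has been populated (with 8-way tensor parallelism there should be one shard per rank):

```shell
# list the converted checkpoint; expect per-rank shards under the output directory
ls -R ./baichuan2-7B-mt | head -n 20
```
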
4. Prepare dataset

Download the Baichuan2-7B-Base datasets from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet)

```shell
# download datasets
mkdir dataset_baichuan2-7B
cd ./dataset_baichuan2-7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process datasets
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan2-7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan2-7B-hf \
    --output-prefix ./dataset_baichuan2-7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```

5. Configure the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh

```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the dataset, checkpoint and tokenizer paths according to your own setup
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan2-7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan2-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan2-7B-mt"
```

6. Launch Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh

```shell
bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```

### Performance

#### Machine performance

The performance of Baichuan2-7B on **Ascend NPU** and **Reference**:

| Device    | Model        | Total iterations | Throughput rate (samples/s) | Throughput rate (tokens/s/p) | Single-step time (s/step) |
|:---------:|:------------:|:----------------:|:---------------------------:|:----------------------------:|:-------------------------:|
| NPUs      | Baichuan2-7B | 1000             | 4.59                        | 2349                         | 6.973                     |
| Reference | Baichuan2-7B | 1000             | 5.40                        | 2769                         | 5.915                     |
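
The columns are roughly self-consistent, assuming tokens/s/p ≈ samples/s × seq-length ÷ NPUs and s/step ≈ global-batch-size ÷ samples/s (4096, 8 and 32 respectively in the shipped script):

```shell
python3 -c "print(4.59 * 4096 / 8, 32 / 4.59)"   # ~2350 tokens/s/p and ~6.97 s/step
```
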
#### Accuracy of the loss

NPU vs Reference loss.

![NPU-LOSS](../../sources/images/baichuan2/baichuan2-7B-loss-compare.png)

NPU vs Reference loss relative error.

![NPU-Relative-Error](../../sources/images/baichuan2/baichuan2-7B-loss-relative-error.png)
89
examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
Normal file
@@ -0,0 +1,89 @@
#!/bin/bash

export CUDA_DEVICE_MAX_CONNECTIONS=1

GPUS_PER_NODE=8
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

CKPT_SAVE_DIR="your model save ckpt path"
DATA_PATH="your data path"
TOKENIZER_MODEL="your tokenizer model path"
CKPT_LOAD_DIR="your model load ckpt path"

TP=8
PP=1

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

GPT_ARGS="
    --tensor-model-parallel-size ${TP} \
    --pipeline-model-parallel-size ${PP} \
    --sequence-parallel \
    --num-layers 32 \
    --hidden-size 4096 \
    --ffn-hidden-size 11008 \
    --num-attention-heads 32 \
    --tokenizer-type Llama2Tokenizer \
    --tokenizer-model ${TOKENIZER_MODEL} \
    --load ${CKPT_LOAD_DIR} \
    --seq-length 4096 \
    --max-position-embeddings 4096 \
    --micro-batch-size 4 \
    --global-batch-size 32 \
    --make-vocab-size-divisible-by 128 \
    --lr 1e-6 \
    --train-iters 5000 \
    --lr-decay-style cosine \
    --untie-embeddings-and-output-weights \
    --disable-bias-linear \
    --attention-dropout 0.0 \
    --init-method-std 0.01 \
    --hidden-dropout 0.0 \
    --position-embedding-type rope \
    --normalization RMSNorm \
    --use-fused-rmsnorm \
    --use-flash-attn \
    --swiglu \
    --no-masked-softmax-fusion \
    --attention-softmax-in-fp32 \
    --min-lr 1e-8 \
    --weight-decay 1e-2 \
    --lr-warmup-fraction 0.1 \
    --clip-grad 1.0 \
    --adam-beta1 0.9 \
    --initial-loss-scale 8188.0 \
    --adam-beta2 0.95 \
    --no-gradient-accumulation-fusion \
    --no-load-optim \
    --no-load-rng \
    --bf16
"

DATA_ARGS="
    --data-path $DATA_PATH \
    --split 949,50,1
"

OUTPUT_ARGS="
    --log-interval 1 \
    --save-interval 1000 \
    --eval-interval 1000 \
    --eval-iters 1 \
"

torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    $GPT_ARGS \
    $DATA_ARGS \
    $OUTPUT_ARGS \
    --distributed-backend nccl \
    --save ${CKPT_SAVE_DIR}
BIN
sources/images/baichuan/baichuan7B-loss-compare.png
Normal file
BIN
sources/images/baichuan/baichuan7B-loss-relative-error.png
Normal file
BIN
sources/images/baichuan2/baichuan2-7B-loss-compare.png
Normal file
BIN
sources/images/baichuan2/baichuan2-7B-loss-relative-error.png
Normal file