!449 Add Baichuan-7B/Baichuan2-7B adaptor
Merge pull request !449 from xiongliangcheng/modellink
8
OWNERS
@@ -30,3 +30,11 @@ reviewers:
- wenjiang2357
- leizhenzhen
- liuyanghan
- Ares_Lzk
- flying-artillery
- xiong-liangcheng_admin
- gitee-code-template
- yaojia2021
- chantcalf
- kongfuziyue
- yuhui69
166
examples/baichuan/README.md
Normal file
@@ -0,0 +1,166 @@
# BaiChuan

<p align="left">
<b>简体中文</b> |
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan/README_en.md">English</a></b>
</p>

# Contents

- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Throughput](#throughput)
      - [Accuracy](#accuracy)

# Baichuan-7B

## Training

The hardware configuration for training Baichuan-7B is as follows:

| Hardware |  Configuration  |
|:--------:|:---------------:|
|   NPU    | 8 x Ascend NPUs |

### Script

1. Clone the repository to your local server:

```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```

2. Build the environment:

```bash
# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl

# set the ascend-toolkit path according to your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install the AscendSpeed acceleration library
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..

# install the remaining dependencies
pip install -r requirements.txt
```
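
A quick sanity check that the environment is usable (a minimal sketch; `torch.npu` is the device namespace patched in by torch_npu, and `is_available()` is assumed to report whether an NPU is visible on this host):

```shell
python -c "import torch, torch_npu; print(torch.npu.is_available())"
```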

3. (Optional) Prepare the pretrained weights

Download the pretrained weights from [huggingface](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):

```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```

Then convert the HF-format weights into a form that AscendSpeed can load:

```shell
mkdir baichuan-7B-mt

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./baichuan-7B-hf \
    --output-model-dir ./baichuan-7B-mt \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --pse \
    --merge-mlp
```
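
The conversion writes one shard per tensor-parallel rank (TP=8 here); listing the output directory is a quick way to confirm it succeeded (the exact file layout is specific to the conversion tool):

```shell
ls ./baichuan-7B-mt
```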

4. Prepare the dataset

Download the dataset for Baichuan-7B from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
# download the dataset
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# preprocess the dataset
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan-7B-hf \
    --output-prefix ./dataset_baichuan7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
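
Preprocessing emits an indexed binary dataset under the given `--output-prefix`; the `_text_document` suffix below is inferred from the `DATA_PATH` used in step 5:

```shell
ls -lh ./dataset_baichuan7B/alpaca_text_document.bin \
       ./dataset_baichuan7B/alpaca_text_document.idx
```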

5. Configure the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh

```shell
# set the ascend-toolkit path according to your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh

CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan-7B-mt"
```

6. Launch the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh

```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh
```

### Performance

#### Throughput

Performance comparison of Baichuan-7B on **Ascend NPUs** and the **reference device**:

|  Device   |    Model    | Iterations | Sample throughput (samples/s) | Token throughput (tokens/s/p) | Step time (s/step) |
|:---------:|:-----------:|:----------:|:-----------------------------:|:-----------------------------:|:------------------:|
|   NPUs    | Baichuan-7B |    1000    |             4.78              |            2448.76            |       6.688        |
| Reference | Baichuan-7B |    1000    |             5.45              |            2792.56            |       5.863        |
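
The three throughput columns agree to within rounding, given the configuration in pretrain_baichuan_ptd_7B.sh (global batch size 32, sequence length 4096, 8 NPUs); a quick cross-check:

```shell
# samples/s  = global batch size / step time : 32 / 6.688      -> ~4.78
# tokens/s/p = samples/s * seq len / devices : 4.78 * 4096 / 8 -> ~2447 (table: 2448.76)
python -c "print(32 / 6.688, 4.78 * 4096 / 8)"
```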

#### Accuracy

NPU vs. reference loss:

![NPU-LOSS](../../sources/images/baichuan/baichuan7B-loss-compare.png)

NPU vs. reference loss relative error:

![NPU-Relative-Error](../../sources/images/baichuan/baichuan7B-loss-relative-error.png)
165
examples/baichuan/README_en.md
Normal file
@@ -0,0 +1,165 @@
# BaiChuan

<p align="left">
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan/README.md">简体中文</a></b> |
<b>English</b>
</p>

# Contents

- [Baichuan-7B](#baichuan-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)

# Baichuan-7B

## Training

Here's a hardware summary of pre-training Baichuan-7B:

| Hardware |      Value      |
| :------: | :-------------: |
|   NPU    | 8 x Ascend NPUs |

### Script

1. Clone the repository to your local server:

```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```

2. Build the environment:

```bash
# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl

# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install AscendSpeed
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..

# install other packages
pip install -r requirements.txt
```

3. Prepare pretrained weights

Download the Baichuan-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main):

```shell
mkdir baichuan-7B-hf
cd ./baichuan-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/handler.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/pytorch_model.bin
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```

To adapt the weights to the Baichuan-7B model, convert the pre-training weights with the following script (the output directory matches the CKPT_LOAD_DIR used in step 5):

```shell
mkdir baichuan-7B-mt

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./baichuan-7B-hf \
    --output-model-dir ./baichuan-7B-mt \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --pse \
    --merge-mlp
```
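
The `--tensor-model-parallel-size` and `--pipeline-model-parallel-size` used for conversion need to match the `TP`/`PP` values in the training script (TP=8, PP=1 here), since the checkpoint is sharded accordingly; a quick way to compare them:

```shell
grep -E "^(TP|PP)=" examples/baichuan/pretrain_baichuan_ptd_7B.sh
```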

4. Prepare dataset

Download the Baichuan-7B dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
# download datasets
mkdir dataset_baichuan7B
cd ./dataset_baichuan7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process datasets
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan-7B-hf \
    --output-prefix ./dataset_baichuan7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```

5. Configure the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh

```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan-7B-mt"
```

6. Launch the Baichuan-7B pre-training script: examples/baichuan/pretrain_baichuan_ptd_7B.sh

```shell
bash examples/baichuan/pretrain_baichuan_ptd_7B.sh
```

### Performance

#### Machine performance

The performance of Baichuan-7B on **Ascend NPUs** and the **reference device**:

|  Device   |    Model    | Total iterations | Throughput (samples/s) | Throughput (tokens/s/p) | Single-step time (s/step) |
|:---------:|:-----------:|:----------------:|:----------------------:|:-----------------------:|:-------------------------:|
|   NPUs    | Baichuan-7B |       1000       |          4.78          |         2448.76         |           6.688           |
| Reference | Baichuan-7B |       1000       |          5.45          |         2792.56         |           5.863           |

#### Accuracy of the loss

NPU vs. Reference loss:

![NPU-LOSS](../../sources/images/baichuan/baichuan7B-loss-compare.png)

NPU vs. Reference loss relative error:

![NPU-Relative-Error](../../sources/images/baichuan/baichuan7B-loss-relative-error.png)
89
examples/baichuan/pretrain_baichuan_ptd_7B.sh
Normal file
@@ -0,0 +1,89 @@
#!/bin/bash

export CUDA_DEVICE_MAX_CONNECTIONS=1

GPUS_PER_NODE=8
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))
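
# Multi-node note (an illustrative sketch, not required for the single-node default):
# set NNODES to the machine count, give each machine a unique NODE_RANK in 0..NNODES-1,
# and point MASTER_ADDR at the rank-0 machine; WORLD_SIZE then scales automatically.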

CKPT_SAVE_DIR="your model save ckpt path"
DATA_PATH="your data path"
TOKENIZER_MODEL="your tokenizer model path"
CKPT_LOAD_DIR="your model load ckpt path"

TP=8
PP=1

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

GPT_ARGS="
    --tensor-model-parallel-size ${TP} \
    --pipeline-model-parallel-size ${PP} \
    --sequence-parallel \
    --num-layers 32 \
    --hidden-size 4096 \
    --ffn-hidden-size 11008 \
    --num-attention-heads 32 \
    --tokenizer-type Llama2Tokenizer \
    --tokenizer-model ${TOKENIZER_MODEL} \
    --load ${CKPT_LOAD_DIR} \
    --seq-length 4096 \
    --max-position-embeddings 4096 \
    --micro-batch-size 4 \
    --global-batch-size 32 \
    --make-vocab-size-divisible-by 128 \
    --lr 1e-5 \
    --train-iters 5000 \
    --lr-decay-style cosine \
    --untie-embeddings-and-output-weights \
    --disable-bias-linear \
    --attention-dropout 0.0 \
    --init-method-std 0.01 \
    --hidden-dropout 0.0 \
    --position-embedding-type rope \
    --normalization RMSNorm \
    --use-fused-rmsnorm \
    --use-flash-attn \
    --swiglu \
    --no-masked-softmax-fusion \
    --attention-softmax-in-fp32 \
    --min-lr 1e-6 \
    --weight-decay 1e-2 \
    --lr-warmup-fraction 0.1 \
    --clip-grad 1.0 \
    --adam-beta1 0.9 \
    --initial-loss-scale 8188.0 \
    --adam-beta2 0.95 \
    --no-gradient-accumulation-fusion \
    --no-load-optim \
    --no-load-rng \
    --fp16
"

DATA_ARGS="
    --data-path $DATA_PATH \
    --split 949,50,1
"

OUTPUT_ARGS="
    --log-interval 1 \
    --save-interval 1000 \
    --eval-interval 1000 \
    --eval-iters 1 \
"

torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    $GPT_ARGS \
    $DATA_ARGS \
    $OUTPUT_ARGS \
    --distributed-backend nccl \
    --save ${CKPT_SAVE_DIR}
166
examples/baichuan2/README.md
Normal file
@@ -0,0 +1,166 @@
# BaiChuan2

<p align="left">
<b>简体中文</b> |
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan2/README_en.md">English</a></b>
</p>

# Contents

- [Baichuan2-7B](#baichuan2-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Throughput](#throughput)
      - [Accuracy](#accuracy)

# Baichuan2-7B

## Training

The hardware configuration for training Baichuan2-7B is as follows:

| Hardware |  Configuration  |
|:--------:|:---------------:|
|   NPU    | 8 x Ascend NPUs |

### Script

1. Clone the repository to your local server:

```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout modellink
mkdir logs
mkdir ckpt
```

2. Build the environment:

```bash
# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_XXX.whl

# set the ascend-toolkit path according to your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install the AscendSpeed acceleration library
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..

# install the remaining dependencies
pip install -r requirements.txt
```

3. (Optional) Prepare the pretrained weights

Download the pretrained weights from [huggingface](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/tree/main):

```shell
mkdir baichuan2-7B-hf
cd ./baichuan2-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/generation_utils.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00001-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00002-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer_config.json
cd ..
```

Then convert the HF-format weights into a form that AscendSpeed can load:

```shell
mkdir baichuan2-7B-mt

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
# for ptd
python $SCRIPT_PATH \
    --input-model-dir ./baichuan2-7B-hf \
    --output-model-dir ./baichuan2-7B-mt \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --merge-mlp \
    --pse
```

4. Prepare the dataset

Download the dataset for Baichuan2-7B-Base from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
# download the dataset
mkdir dataset_baichuan2-7B
cd ./dataset_baichuan2-7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# preprocess the dataset
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan2-7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan2-7B-hf \
    --output-prefix ./dataset_baichuan2-7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```

5. Configure the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh

```shell
# set the ascend-toolkit path according to your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the dataset, checkpoint, and tokenizer paths as needed
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan2-7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan2-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan2-7B-mt"
```

6. Launch the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh

```shell
bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```

### Performance

#### Throughput

Performance comparison of Baichuan2-7B on **Ascend NPUs** and the **reference device**:

|  Device   |    Model     | Iterations | Sample throughput (samples/s) | Token throughput (tokens/s/p) | Step time (s/step) |
|:---------:|:------------:|:----------:|:-----------------------------:|:-----------------------------:|:------------------:|
|   NPUs    | Baichuan2-7B |    1000    |             4.59              |             2349              |       6.973        |
| Reference | Baichuan2-7B |    1000    |             5.40              |             2769              |       5.915        |
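
As with Baichuan-7B, the columns agree to within rounding, given the configuration in pretrain_baichuan2_ptd_7B.sh (global batch size 32, sequence length 4096, 8 NPUs):

```shell
# samples/s  = global batch size / step time : 32 / 6.973      -> ~4.59
# tokens/s/p = samples/s * seq len / devices : 4.59 * 4096 / 8 -> ~2350 (table: 2349)
python -c "print(32 / 6.973, 4.59 * 4096 / 8)"
```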

#### Accuracy

NPU vs. reference loss:

![NPU-LOSS](../../sources/images/baichuan2/baichuan2-7B-loss-compare.png)

NPU vs. reference loss relative error:

![NPU-Relative-Error](../../sources/images/baichuan2/baichuan2-7B-loss-relative-error.png)
168
examples/baichuan2/README_en.md
Normal file
@@ -0,0 +1,168 @@
# BaiChuan2

<p align="left">
<b><a href="https://gitee.com/ascend/ModelLink/blob/modellink/examples/baichuan2/README.md">简体中文</a></b> |
<b>English</b>
</p>

# Contents

- [Baichuan2-7B](#baichuan2-7b)
  - [Training](#training)
    - [Script](#script)
    - [Performance](#performance)
      - [Machine performance](#machine-performance)
      - [Accuracy of the loss](#accuracy-of-the-loss)

# Baichuan2-7B

## Training

Here's a hardware summary of pre-training Baichuan2-7B:

| Hardware |      Value      |
| :------: | :-------------: |
|   NPU    | 8 x Ascend NPUs |

### Script

1. Clone the repository to your local server:

```shell
git clone https://gitee.com/ascend/ModelLink.git
cd ModelLink
git checkout -b modellink origin/modellink
mkdir logs
mkdir ckpt
```

2. Build the environment:

```bash
# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_XXX.whl

# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install AscendSpeed
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
pip install -r requirements.txt
pip3 install -e .
cd ..

# install other packages
pip install -r requirements.txt
```

3. Prepare pretrained weights

Download the Baichuan2-7B checkpoint from [here](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/tree/main):

```shell
mkdir baichuan2-7B-hf
cd ./baichuan2-7B-hf
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/generation_utils.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00001-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model-00002-of-00002.bin
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/tokenizer_config.json
cd ..
```

To adapt the weights to the Baichuan2-7B model, convert the pre-training weights with the following script (the output directory matches the CKPT_LOAD_DIR used in step 5):

```shell
mkdir baichuan2-7B-mt

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
# for ptd
python $SCRIPT_PATH \
    --input-model-dir ./baichuan2-7B-hf \
    --output-model-dir ./baichuan2-7B-mt \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --type 7B \
    --merge-mlp \
    --pse
```
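
The converted directory is what `CKPT_LOAD_DIR` points to in step 5; listing it confirms the tensor-parallel shards were written (the exact file layout is specific to the conversion tool):

```shell
ls ./baichuan2-7B-mt
```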

4. Prepare dataset

Download the Baichuan2-7B-Base dataset from [here](https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet):

```shell
# download datasets
mkdir dataset_baichuan2-7B
cd ./dataset_baichuan2-7B
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process datasets
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan2-7B/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./baichuan2-7B-hf \
    --output-prefix ./dataset_baichuan2-7B/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```

5. Configure the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh

```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the dataset, checkpoint, and tokenizer paths according to your own setup
CKPT_SAVE_DIR="./ckpt"
DATA_PATH="./dataset_baichuan2-7B/alpaca_text_document"
TOKENIZER_MODEL="./baichuan2-7B-hf/tokenizer.model"
CKPT_LOAD_DIR="./baichuan2-7B-mt"
```

6. Launch the Baichuan2-7B pre-training script: examples/baichuan2/pretrain_baichuan2_ptd_7B.sh

```shell
bash examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
```

### Performance

#### Machine performance

The performance of Baichuan2-7B on **Ascend NPUs** and the **reference device**:

|  Device   |    Model     | Total iterations | Throughput (samples/s) | Throughput (tokens/s/p) | Single-step time (s/step) |
|:---------:|:------------:|:----------------:|:----------------------:|:-----------------------:|:-------------------------:|
|   NPUs    | Baichuan2-7B |       1000       |          4.59          |          2349           |           6.973           |
| Reference | Baichuan2-7B |       1000       |          5.40          |          2769           |           5.915           |

#### Accuracy of the loss

NPU vs. Reference loss:

![NPU-LOSS](../../sources/images/baichuan2/baichuan2-7B-loss-compare.png)

NPU vs. Reference loss relative error:

![NPU-Relative-Error](../../sources/images/baichuan2/baichuan2-7B-loss-relative-error.png)
89
examples/baichuan2/pretrain_baichuan2_ptd_7B.sh
Normal file
@@ -0,0 +1,89 @@
#!/bin/bash

export CUDA_DEVICE_MAX_CONNECTIONS=1

GPUS_PER_NODE=8
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

CKPT_SAVE_DIR="your model save ckpt path"
DATA_PATH="your data path"
TOKENIZER_MODEL="your tokenizer model path"
CKPT_LOAD_DIR="your model load ckpt path"

TP=8
PP=1

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

GPT_ARGS="
    --tensor-model-parallel-size ${TP} \
    --pipeline-model-parallel-size ${PP} \
    --sequence-parallel \
    --num-layers 32 \
    --hidden-size 4096 \
    --ffn-hidden-size 11008 \
    --num-attention-heads 32 \
    --tokenizer-type Llama2Tokenizer \
    --tokenizer-model ${TOKENIZER_MODEL} \
    --load ${CKPT_LOAD_DIR} \
    --seq-length 4096 \
    --max-position-embeddings 4096 \
    --micro-batch-size 4 \
    --global-batch-size 32 \
    --make-vocab-size-divisible-by 128 \
    --lr 1e-6 \
    --train-iters 5000 \
    --lr-decay-style cosine \
    --untie-embeddings-and-output-weights \
    --disable-bias-linear \
    --attention-dropout 0.0 \
    --init-method-std 0.01 \
    --hidden-dropout 0.0 \
    --position-embedding-type rope \
    --normalization RMSNorm \
    --use-fused-rmsnorm \
    --use-flash-attn \
    --swiglu \
    --no-masked-softmax-fusion \
    --attention-softmax-in-fp32 \
    --min-lr 1e-8 \
    --weight-decay 1e-2 \
    --lr-warmup-fraction 0.1 \
    --clip-grad 1.0 \
    --adam-beta1 0.9 \
    --initial-loss-scale 8188.0 \
    --adam-beta2 0.95 \
    --no-gradient-accumulation-fusion \
    --no-load-optim \
    --no-load-rng \
    --bf16
"

DATA_ARGS="
    --data-path $DATA_PATH \
    --split 949,50,1
"

OUTPUT_ARGS="
    --log-interval 1 \
    --save-interval 1000 \
    --eval-interval 1000 \
    --eval-iters 1 \
"

torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    $GPT_ARGS \
    $DATA_ARGS \
    $OUTPUT_ARGS \
    --distributed-backend nccl \
    --save ${CKPT_SAVE_DIR}
BIN  sources/images/baichuan/baichuan7B-loss-compare.png  (new file, 59 KiB)
BIN  sources/images/baichuan/baichuan7B-loss-relative-error.png  (new file, 54 KiB)
BIN  sources/images/baichuan2/baichuan2-7B-loss-compare.png  (new file, 65 KiB)
BIN  sources/images/baichuan2/baichuan2-7B-loss-relative-error.png  (new file, 41 KiB)