ChatGLM

简体中文 | English

Contents

ChatGLM3-6B

Training

Here's a hardware summary of pre-training ChatGLM3-6B:

| Hardware |      Value      |
| :------: | :-------------: |
|   NPU    | 8 x Ascend NPUs |

Script

  1. Clone the repository to your local server:

    git clone https://gitee.com/ascend/ModelLink.git 
    git clone https://github.com/NVIDIA/Megatron-LM.git
    cd Megatron-LM
    git checkout core_r0.6.0
    cp -r megatron ../ModelLink/
    cd .. 
    cd ModelLink
    mkdir logs
    mkdir model_from_hf
    mkdir dataset
    mkdir ckpt
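    # optional sanity check (our addition, not part of the original steps): confirm the
    # Megatron package was copied in and the working directories exist
    test -d megatron && echo "megatron package copied" || echo "megatron/ is missing"
    ls -d logs model_from_hf dataset ckpt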
    
  2. Build environment

    # python3.8
    conda create -n test python=3.8
    conda activate test
    
    # install torch and torch_npu
    pip install torch-2.1.0-cp38-cp38m-manylinux2014_aarch64.whl
    pip install torch_npu-2.1.0*-cp38-cp38m-linux_aarch64.whl
    pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl
    
    # modify ascend-toolkit path
    source /usr/local/Ascend/ascend-toolkit/set_env.sh 
    
    # install MindSpeed
    git clone https://gitee.com/ascend/MindSpeed.git
    cd MindSpeed
    git checkout 2b0edd2
    pip install -r requirements.txt 
    pip install -e .
    cd ..
    
    # install other packages
    pip install -r requirements.txt 
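    
    # optional (our addition): a minimal check that torch and torch_npu import cleanly
    # and that an Ascend NPU is visible before moving on
    python -c "import torch, torch_npu; print(torch.__version__, torch_npu.npu.is_available())"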
    
  3. Prepare the pretrained weights and tokenizer. Download the ChatGLM3-6B checkpoint from here:

    #!/bin/bash
    mkdir ./model_from_hf/chatglm3_6b_hf/
    cd ./model_from_hf/chatglm3_6b_hf/
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/config.json
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/configuration_chatglm.py
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/modeling_chatglm.py
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/pytorch_model-00001-of-00007.bin
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/pytorch_model-00002-of-00007.bin
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/pytorch_model-00003-of-00007.bin
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/pytorch_model-00004-of-00007.bin
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/pytorch_model-00005-of-00007.bin
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/pytorch_model-00006-of-00007.bin
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/pytorch_model-00007-of-00007.bin
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/pytorch_model.bin.index.json
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/quantization.py
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/tokenization_chatglm.py
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/tokenizer.model
    wget https://huggingface.co/THUDM/chatglm3-6b/resolve/main/tokenizer_config.json
    cd ../../
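    # optional (our addition): confirm all seven weight shards were downloaded
    ls ./model_from_hf/chatglm3_6b_hf/pytorch_model-0000?-of-00007.bin | wc -l   # expect 7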
    
  4. Weight conversion in PTD mode

    4.1 Convert weights from HuggingFace format to Megatron format. This scenario is generally used to enable the open-source HuggingFace model to be trained on Megatron.

    # modify the script according to your own ascend-toolkit path
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    
    # convert to ptd weights
    python tools/checkpoint/convert_ckpt.py \
        --model-type GPT \
        --loader chatglm3_hf \
        --saver megatron \
        --target-tensor-parallel-size 1 \
        --target-pipeline-parallel-size 2 \
        --load-dir ./model_from_hf/chatglm3_6b_hf/ \
        --save-dir ./model_weights/chatglm3_6b_tp1pp2/ \
        --tokenizer-model ./model_from_hf/chatglm3_6b_hf/tokenizer.model \
        --add-qkv-bias
    

    Note: The valid --target-tensor-parallel-size values for chatglm3 depend on the multi-query attention configuration in config.json; the multi-query group number there is set to 2, so the tensor parallel size must divide 2 (i.e. 1 or 2).
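
    The group number that constrains this setting can be read directly from the downloaded HuggingFace config (a quick check; the key name is taken from THUDM/chatglm3-6b's config.json):

    # prints the multi-query group count (2 for chatglm3-6b)
    python -c "import json; print(json.load(open('./model_from_hf/chatglm3_6b_hf/config.json'))['multi_query_group_num'])"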

    4.2 Convert weights from Megatron format back to HuggingFace format

    # Modify the ascend-toolkit path
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    python tools/checkpoint/convert_ckpt.py \
        --model-type GPT \
        --loader megatron \
        --saver megatron \
        --save-model-type save_huggingface_chatglm3 \
        --load-dir ./model_weights/chatglm3_6b_tp1pp2/ \
        --target-tensor-parallel-size 1 \
        --target-pipeline-parallel-size 1 \
        --add-qkv-bias \
        --save-dir ./model_from_hf/chatglm3_6b_hf/     # <-- Fill in the original HF model path here, new weights will be saved in ./model_from_hf/chatglm3_6b_hf/mg2hg/
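    
    # optional (our addition): the back-converted HuggingFace weights are written to the
    # mg2hg/ subfolder of --save-dir noted above
    ls ./model_from_hf/chatglm3_6b_hf/mg2hg/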
    
  5. Pre-training

    5.1 Prepare dataset

    Download the ChatGLM3-6B datasets from here

    # download datasets
    cd ./dataset
    wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
    cd ..
    
    # process datasets  
    mkdir ./dataset/chatglm3_6b_hf/
    python ./tools/preprocess_data.py \
        --input ./dataset/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
        --tokenizer-name-or-path ./model_from_hf/chatglm3_6b_hf/ \
        --output-prefix ./dataset/chatglm3_6b_hf/alpaca \
        --workers 4 \
        --log-interval 1000 \
        --tokenizer-type PretrainedFromHF
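    
    # the preprocessor writes an indexed dataset; with the prefix above this produces
    # alpaca_text_document.bin / .idx, which DATA_PATH points to in the next step
    ls ./dataset/chatglm3_6b_hf/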
    

    5.2 Pre-training using PTD mode. Configure the ChatGLM3-6B pre-training script examples/chatglm3/pretrain_chatglm3_6B_8K.sh:

    # modify the script according to your own ascend-toolkit path
    source /usr/local/Ascend/ascend-toolkit/set_env.sh 
    
    # modify config according to your own actual situation
    LOAD_CHECKPOINT_PATH="./model_weights/chatglm3_6b_tp1pp2/"
    SAVE_CHECKPOINT_PATH="./ckpt/chatglm3_6b_hf/"
    TOKENIZER_PATH="./model_from_hf/chatglm3_6b_hf/"  #tokenizer path
    DATA_PATH="./dataset/chatglm3_6b_hf/alpaca_text_document"  #processed dataset
    

    Multi-machine training requires adding the parameter --overlap-grad-reduce.

    Launch ChatGLM3-6B pre-training script: examples/chatglm3/pretrain_chatglm3_6B_8K.sh

    bash examples/chatglm3/pretrain_chatglm3_6B_8K.sh
    

    Note: For multi-machine training without shared storage across the machines, add the parameter --no-shared-storage. With this parameter, non-master nodes decide whether they need to load data based on the distributed parameters, and the corresponding cache and generated data are checked.
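
    For a two-machine run, the distributed settings in pretrain_chatglm3_6B_8K.sh would look roughly like the sketch below; the variable names follow common ModelLink script conventions and are assumptions here, so check the script itself:

    # illustrative 2-node settings; use NODE_RANK=1 and the same MASTER_ADDR on the second machine
    NPUS_PER_NODE=8
    MASTER_ADDR=<master node IP>    # placeholder
    MASTER_PORT=6000
    NNODES=2
    NODE_RANK=0
    WORLD_SIZE=$(($NPUS_PER_NODE*$NNODES))
    # and add the flags discussed above: --overlap-grad-reduce (plus --no-shared-storage without shared storage)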

  6. Fine-tuning

    6.1 Prepare the fine-tuning dataset. Download the alpaca dataset from here:

    # download datasets
    mkdir finetune_dataset
    cd ./finetune_dataset
    wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
    cd ..
    
    # process datasets  
    mkdir ./finetune_dataset/chatglm3-6b-hf/
    python ./tools/preprocess_data.py \
        --input ./finetune_dataset/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
        --tokenizer-name-or-path ./model_from_hf/chatglm3_6b_hf/ \
        --output-prefix ./finetune_dataset/chatglm3-6b-hf/alpaca \
        --workers 4 \
        --log-interval 1000 \
        --tokenizer-type PretrainedFromHF \
        --handler-name GeneralInstructionHandler \
        --append-eod
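    
    # the instruction handler writes several indexed files under this prefix; the prefix
    # itself (without suffixes) is what DATA_PATH points to in step 6.2
    ls ./finetune_dataset/chatglm3-6b-hf/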
    

    6.2 Full Parameters Fine-Tuning. The configuration script for full-parameter fine-tuning is basically the same as pretrain_chatglm3_6B_8K.sh; the difference is that the dataset path and the training parameter --is-instruction-dataset are added.

    Add the fine-tuning parameter --finetune so that fine-tuning starts from the first step. Use --tokenizer-padding-side left.

    DATA_PATH="./finetune_dataset/chatglm3-6b-hf/alpaca"
    TOKENIZER_PATH="./model_from_hf/chatglm3_6b_hf/"
    CKPT_LOAD_DIR="./model_weights/chatglm3_6b_tp1pp2/"
        --load ${CKPT_LOAD_DIR} \
        --finetune \
        --is-instruction-dataset \
        --tokenizer-padding-side left \
        --tokenizer-type PretrainedFromHF \
        --tokenizer-not-use-fast \
    

    Launch ChatGLM3-6B finetune script: examples/chatglm3/tune_chatglm3_6B_8K.sh

    bash examples/chatglm3/tune_chatglm3_6B_8K.sh
    

Performance

Machine performance

The performance of ChatGLM3-6B on Ascend NPUs compared with the reference:

|  Device   |    Model    | Sequence length | Throughput rate (tokens/s/p) |
| :-------: | :---------: | :-------------: | :--------------------------: |
|   NPUs    | ChatGLM3-6B |      8192       |             4297             |
| Reference | ChatGLM3-6B |      8192       |             4269             |

Inference

We support inference for text generation with ChatGLM3-6B. Inference differs from pre-training in that, for example, we need to load the pre-trained checkpoint and set the length of the output samples:

Config ChatGLM3-6B inference script: examples/chatglm3/generate_chatglm3_6B.sh

# modify the model weight path and tokenizer path
CHECKPOINT="./model_weights/chatglm3_6b_tp1pp2/"
TOKENIZER_PATH="./model_from_hf/chatglm3_6b_hf/"
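
# as with the other scripts in this guide, the Ascend toolkit environment is assumed to be
# sourced before launching (adjust the path to your own install)
source /usr/local/Ascend/ascend-toolkit/set_env.sh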

Launch ChatGLM3-6B inference script.

bash ./examples/chatglm3/generate_chatglm3_6B.sh

Some inference samples are as follows: ChatGLM3-6B-generate.png

Evaluation

Use the MMLU benchmark to evaluate our model. Download the MMLU benchmark from here.
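
After unpacking the benchmark, the layout expected by DATA_PATH below is mmlu/data/{dev,val,test}, with one CSV per subject under test/. A quick check, assuming that layout:

# should list 57 subject files such as abstract_algebra_test.csv
ls ./mmlu/data/test/ | head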

Config chatglm3-6b evaluation script: examples/chatglm3/evaluate_chatglm3_6B.sh

# ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# Modify the model parameter path and vocabulary path
TOKENIZER_PATH="./model_from_hf/chatglm3_6b_hf/"  # vocabulary path
CHECKPOINT="./model_weights/chatglm3_6b_tp1pp2/"  # parameter path

# Configure the task type and dataset path
DATA_PATH="./mmlu/data/test/"
TASK="mmlu"

Launch chatglm3-6b evaluation

bash examples/chatglm3/evaluate_chatglm3_6B.sh

| Task | Subset | Question | OpenSource | NPU  |
| :--: | :----: | :------: | :--------: | :--: |
| MMLU |   57   |  14042   |    61.4    | 61.5 |