

简体中文 | English

ModelLink provides end-to-end solutions for large language models on Ascend chips, including models, algorithms and tasks.


Supported features

Features currently supported by ModelLink for large model usage:

More novel and useful features for LLM training on Ascend are under development ...

Supported Models

ModelLink currently supports pre-training and fine-tuning for the following models:

| Model | Parameters | Sequence length | Pretrain | Inference | LoRA | SFT | Chat | Evaluation | Contributor |
|-------|------------|-----------------|----------|-----------|------|-----|------|------------|-------------|
| Aquila | 7B | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Aquila2 | 7B | 2K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| | 34B | 4K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Baichuan | 7B | 4K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| | 13B | 4K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Baichuan2 | 7B | 4K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| | 13B | 4K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Bloom | 7B1 | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| | 176B | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| ChatGLM3 | 6B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| CodeLlama | 34B | 4K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| InternLM | 7B | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| | 65B | 2K | pretrain | -- | -- | -- | -- | -- | 【Ascend】 |
| LLaMA | 7B | 2K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| | 13B | 2K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| | 33B | 2K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| | 65B | 2K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA2 | 7B | 4K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| | 13B | 4K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| | 34B | 4K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| | 70B | 4K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA3 | 8B | 8K | pretrain | generate | -- | -- | chat | eval | 【Ascend】 |
| | 70B | 8K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Qwen | 7B | 8K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| | 14B | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| | 72B | 8K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Qwen1.5 | 0.5B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| | 1.8B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| | 4B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| | 7B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| | 14B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| | 32B | 8K | pretrain | generate | lora | -- | -- | eval | 【Community】 |
| | 72B | 8K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| Yi | 34B | 4K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Mixtral | 8x7B | 32K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Mistral | 7B | 32K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Gemma | 2B | 8K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| | 7B | 8K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| GPT3 | 175B | 2K | pretrain | -- | -- | -- | -- | -- | 【Community】 |

Script Naming Rules

| Script | Rule |
|--------|------|
| pretrain_xxx.sh | Pre-training Script |
| tune_xxx.sh | Fine-tuning Script |
| generate_xxx.sh | Inference Script |
| xxx_chat_xxx.sh | Chat Script |
| evaluation_xxx.sh | Evaluation Script |

Model Usage Guide and Version Notes

For the supported models listed above, we provide training scripts and README instructions in the examples folder, which describe the detailed processes for model training, inference, and evaluation.
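As an illustration of the naming rules above, a run is started by executing the corresponding script. The paths here are hypothetical; check each model's directory under examples for the script names that actually ship with it.

```shell
# Hypothetical paths for illustration only; see the examples folder for the
# scripts provided with each model.
bash examples/llama2/pretrain_llama2_7b_ptd.sh      # pre-training
bash examples/llama2/generate_llama2_7b_ptd.sh      # inference
bash examples/llama2/evaluation_llama2_7b_ptd.sh    # evaluation
```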

【Please note the environment versions required for model usage, listed below】

| Software | Version |
|----------|---------|
| Python | 3.8 |
| driver | under development version |
| firmware | under development version |
| CANN | under development version |
| torch | 2.1.0, 2.2.0 |
| torch_npu | under development version |

【Based on the current version of Megatron, the performance statistics from our testing are as follows (hardware: Atlas 900 A2 PODc)】

| Model | Parameters | Sequence length | Cluster Scale | Precision Mode | Performance | Reference Performance |
|-------|------------|-----------------|---------------|----------------|-------------|-----------------------|
| Aquila | 7B | 2K | 1x8 | BF16 | 2849 | 2874 |
| Aquila2 | 7B | 2K | 1x8 | FP16 | 3323 | 2673 |
| | 34B | 4K | 2x8 | BF16 | 854 | 732 |
| Baichuan | 7B | 4K | 1x8 | FP16 | 2685 | 2036 |
| | 13B | 4K | 1x8 | FP16 | 1213 | 862 |
| Baichuan2 | 7B | 4K | 1x8 | BF16 | 2664 | 3969 |
| | 13B | 4K | 1x8 | BF16 | 1668 | 2062 |
| Bloom | 7B1 | 2K | 1x8 | FP16 | 2034 | 2525 |
| | 176B | 2K | 12x8 | BF16 | 100 | 107 |
| ChatGLM3 | 6B | 8K | 1x8 | FP16 | 4297 | 4267 |
| CodeLlama | 34B | 4K | 2x8 | BF16 | 837 | 762 |
| InternLM | 7B | 2K | 1x8 | BF16 | 2776 | 2854 |
| | 65B | 2K | 4x8 | BF16 | 341 | 414 |
| LLaMA | 7B | 2K | 1x8 | FP16 | 3600 | 3804 |
| | 13B | 2K | 1x8 | FP16 | 1895 | 2012 |
| | 33B | 2K | 4x8 | FP16 | 621 | 776 |
| | 65B | 2K | 4x8 | BF16 | 348 | 426 |
| LLaMA2 | 7B | 4K | 1x8 | BF16 | 4200 | 3850 |
| | 13B | 4K | 1x8 | BF16 | 1990 | 1920 |
| | 34B | 4K | 2x8 | BF16 | 749 | 796 |
| | 70B | 4K | 4x8 | BF16 | 420 | 430 |
| LLaMA3 | 8B | 8K | 1x8 | BF16 | 2483 | 2674 |
| | 70B | 8K | 8x8 | BF16 | 283 | 355 |
| Qwen | 7B | 8K | 1x8 | BF16 | 2499 | 2867 |
| | 14B | 2K | 1x8 | BF16 | 1560 | 1578 |
| | 72B | 8K | 16x8 | BF16 | 285 | 345 |
| Qwen1.5 | 0.5B | 8K | 1x8 | BF16 | 22834 | 25306 |
| | 1.8B | 8K | 1x8 | BF16 | 13029 | 12181 |
| | 4B | 8K | 1x8 | BF16 | 5033 | 5328 |
| | 7B | 8K | 1x8 | BF16 | 2862 | 2621 |
| | 14B | 8K | 1x8 | BF16 | 1717 | 1702 |
| | 32B | 8K | 4x8 | BF16 | 751 | 708 |
| | 72B | 8K | 8x8 | BF16 | 301 | 317 |
| Yi | 34B | 4K | 2x8 | BF16 | 809 | 730 |
| Mixtral | 8x7B | 32K | 2x8 | BF16 | 487 | 610 |
| Mistral | 7B | 32K | 1x8 | BF16 | 2806 | 2734 |
| Gemma | 2B | 8K | 1x8 | BF16 | 6821 | 7602 |
| | 7B | 8K | 1x8 | BF16 | 2938 | 2607 |
| GPT3 | 175B | 2K | 16x8 | FP16 | 153 | -- |

Acceleration Features

ModelLink supports various acceleration algorithms such as tensor parallelism, pipeline parallelism, sequence parallelism, recomputation, the distributed optimizer, and more. The table below lists the switch that enables each acceleration feature:

| Acceleration Feature | Enable Parameter |
|----------------------|------------------|
| Tensor Parallel | --tensor-model-parallel-size |
| Pipeline Parallel | --pipeline-model-parallel-size |
| Dynamic division for PP | --num-layer-list |
| Sequence Parallel | --sequence-parallel |
| Recomputation | --recompute-granularity |
| Distributed Optimizer | --use-distributed-optimizer |
| Overlap DDP allreduce | --overlap-grad-reduce |
| Flash attention | --use-flash-attn |
| Fused rmsnorm | --use-fused-rmsnorm |
| Fused swiglu | --use-fused-swiglu |
| mc2 | --use-mc2 |
| Fused rotary position embedding | --use-fused-rotary-pos-emb |
| Sliding window attention | --sliding-window |
For example, these switches can be combined in a single training launch command:

```shell
torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    --tensor-model-parallel-size ${TP} \
    --pipeline-model-parallel-size ${PP} \
    --num-layer-list 1,2,2,2,1 \
    --sequence-parallel \
    --recompute-granularity full \
    --recompute-method block \
    --recompute-num-layers 72 \
    --use-distributed-optimizer \
    --use-flash-attn \
    --use-fused-rmsnorm \
    --use-fused-swiglu \
    --overlap-grad-reduce \
    --use-fused-rotary-pos-emb \
    --use-mc2 \
    --sliding-window 4096 \
    ... \
    ...
```
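A quick sanity check when picking parallel sizes (a sketch with assumed shell variable names, not part of the ModelLink scripts): the product of the tensor- and pipeline-parallel sizes must divide the total device count, and the remaining factor becomes the data-parallel size.

```shell
# Sketch only; variable names are illustrative and not defined by ModelLink.
NPUS_PER_NODE=8
NNODES=2
WORLD_SIZE=$((NPUS_PER_NODE * NNODES))   # 16 devices in a 2x8 cluster
TP=8                                     # --tensor-model-parallel-size
PP=2                                     # --pipeline-model-parallel-size
DP=$((WORLD_SIZE / (TP * PP)))           # implied data-parallel size, here 1
echo "TP=${TP} PP=${PP} DP=${DP}"
```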

Analyze profiling data on Ascend chips

ModelLink supports collecting profiling data on Ascend chips, which is useful for performance modelling and analysis:

```shell
--profile                             # enable profiling
--profile-step-start 5                # start step
--profile-step-end 6                  # end step
--profile-ranks 0 1 2 3 4             # ranks to profile
--profile-level level2                # profiling level: level0, level1, level2
--profile-with-cpu                    # profile CPU information
--profile-with-stack                  # profile stack information
--profile-with-memory                 # profile memory information
--profile-record-shapes               # profile shape information
--profile-save-path ./profile_dir     # path to save profiling data
```
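For illustration (assuming the same torchrun launch used in the acceleration example above), the profiling switches are appended to the pre-training arguments like any other flag:

```shell
torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    ... \
    --profile \
    --profile-step-start 5 \
    --profile-step-end 6 \
    --profile-ranks 0 1 2 3 4 \
    --profile-level level2 \
    --profile-with-cpu \
    --profile-record-shapes \
    --profile-save-path ./profile_dir
```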

Enable deterministic computing on Ascend chips

  • Add the option to the training script:
```shell
--use-deter-comp
```
  • Set the environment variable:
```shell
export HCCL_DETERMINISTIC=True
```
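Putting both together, a sketch reusing the launch command from the examples above:

```shell
# Set the environment variable before launching, then pass the flag to the script.
export HCCL_DETERMINISTIC=True

torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    --use-deter-comp \
    ...
```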

Acknowledgments


ModelLink is jointly contributed by the following departments of Huawei Corporation:

  • Ascend Computing Product Unit
  • Algorithm Unit of Computing Product Unit
  • Research Unit of Computing Product Unit
  • Open Computing Kit of Computing Product Unit
  • General Development Department
  • Global Technical Service Department

We appreciate every PR from the community and welcome contributions to ModelLink.

Appendix