ModelLink2/examples/llama_task

Contents

Evaluation and Tasks

Datasets

First, download the evaluation datasets for the BoolQ, PIQA, and HellaSwag tasks.
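
After downloading, it can help to confirm that each file is in the expected line-delimited JSON form before launching an evaluation. Below is a minimal sanity check, assuming the standard BoolQ dev-split layout; the path boolq/dev.jsonl and the field names are assumptions, so adjust them to match your copy of the data.

import json

# Read the first record of the dev split (placeholder path).
with open("boolq/dev.jsonl") as f:
    record = json.loads(f.readline())

# The standard BoolQ release has one JSON object per line with
# "question", "title", "answer" (boolean), and "passage" fields.
print(sorted(record))                 # e.g. ['answer', 'passage', 'question', 'title']
print(record["question"], "->", record["answer"])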

LLaMA Evaluation

We include example zero-shot evaluation scripts for LLaMA that measure accuracy on BoolQ, PIQA, and HellaSwag.

For example, you can use the following command to run the BoolQ zero-shot task on a LLaMA-7B model.

# Run on 8 devices of a single node.
WORLD_SIZE=8

DISTRIBUTED_ARGS="--nproc_per_node $WORLD_SIZE \
                  --nnodes 1 \
                  --node_rank 0 \
                  --master_addr localhost \
                  --master_port 6000"

TASK="BoolQ"
VALID_DATA=<boolq dev data path>.jsonl
CHECKPOINT=<checkpoint path>

# The model arguments below (layers, hidden size, heads, etc.) must match
# the architecture of the loaded LLaMA-7B checkpoint.
python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/main.py \
               --task $TASK \
               --valid-data $VALID_DATA \
               --tokenizer-type PretrainedFromHF \
               --tokenizer-name-or-path ./dataset/llama/  \
               --tokenizer-not-use-fast \
               --load $CHECKPOINT \
               --tensor-model-parallel-size 8 \
               --pipeline-model-parallel-size 1 \
               --num-layers 32 \
               --hidden-size 4096 \
               --ffn-hidden-size 11008 \
               --num-attention-heads 32 \
               --micro-batch-size 8 \
               --seq-length 2048 \
               --max-position-embeddings 2048 \
               --log-interval 1 \
               --layernorm-epsilon 1e-6 \
               --fp16 \
               --no-load-optim \
               --no-load-rng
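
Zero-shot evaluation of this kind is likelihood scoring: the model sees no labeled examples; each candidate answer is scored by its log-likelihood as a continuation of the prompt, and the highest-scoring candidate is taken as the prediction. Below is a minimal sketch of that idea using the Hugging Face transformers API rather than the repo's Megatron-based ./tasks/main.py; the model path, prompt template, and answer strings are assumptions for illustration only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "./dataset/llama/"   # placeholder: any causal-LM directory works
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR).eval()

def answer_loglikelihood(prompt: str, answer: str) -> float:
    """Sum of token log-probabilities of `answer` as a continuation of `prompt`.

    Note: indexing by prompt length is approximate if the tokenizer merges
    tokens across the prompt/answer boundary.
    """
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits[0, :-1], dim=-1)
    # Position t predicts token t+1, so the answer tokens are predicted
    # by positions prompt_len-1 .. len-2.
    return sum(log_probs[t, full_ids[0, t + 1]].item()
               for t in range(prompt_len - 1, full_ids.shape[1] - 1))

# BoolQ example: the prediction is the higher-likelihood continuation.
passage = "Boolean logic admits exactly two truth values."
question = "does boolean logic have two truth values"
prompt = f"{passage}\nQuestion: {question}?\nAnswer:"
prediction = max([" yes", " no"], key=lambda a: answer_loglikelihood(prompt, a))

Task accuracy is then simply the fraction of dev-set examples whose highest-likelihood candidate matches the gold label.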

Zeroshot Task

The following table compares the accuracy achieved on the NPU with the accuracy reported in the LLaMA paper for each zero-shot task.

| Model Size | BoolQ (NPU / paper) | PIQA (NPU / paper) | HellaSwag (NPU / paper) |
|------------|---------------------|--------------------|-------------------------|
| 7B         | 74.7% / 76.5%       | 78.6% / 79.8%      | 73.9% / 79.8%           |
| 13B        | 79.5% / 78.1%       | 80.4% / 80.1%      | 77.3% / 80.1%           |
| 33B        | 83.1% / 83.1%       | 81.7% / 82.3%      | 83.0% / 82.3%           |
| 65B        | 85.5% / 85.3%       | 81.2% / 82.8%      | 82.3% / 82.8%           |