1. Move the ./examples/bloom/images folder to ./sources/images/bloom
2. Move the ./examples/baichuan/images folder to ./sources/images/baichuan
3. Fix a bug in the evaluation code
4. Remove redundant steps from the README
This commit is contained in:
liuyanghan 2023-10-19 15:36:39 +08:00
parent 07b36f72e7
commit c703b968e3
15 changed files with 16 additions and 20 deletions

View File

@@ -71,7 +71,7 @@ The plan for more tasks, like RLHF and RM, is under way ...
 <td> FP16 </td>
 <td> 1914 tokens/p/s </td>
 <td> 2068 tokens/p/s </td>
-<td> <a href="examples/baichuan/images/7B_loss_compare.png">Loss</a> </td>
+<td> <a href="./sources/images/baichuan/7B_loss_compare.png">Loss</a> </td>
 <td> <a href="examples/baichuan/pretrain_baichuan_zero_7B.sh">Train</a> </td>
 </tr>
 <tr>
@@ -80,7 +80,7 @@ The plan for more tasks, like RLHF and RM, is under way ...
 <td> FP16 </td>
 <td> 1024 tokens/p/s </td>
 <td> 824 tokens/p/s </td>
-<td> <a href="examples/baichuan/images/13B_loss_compare.png">Loss</a> </td>
+<td> <a href="./sources/images/baichuan/13B-loss-compare.png">Loss</a> </td>
 <td> <a href="examples/baichuan/pretrain_baichuan_ptd_13B.sh">Train</a> </td>
 </tr>
 <tr>

View File

@@ -60,7 +60,6 @@ conda activate test
 # install torch and torch_npu
 pip install torch-1.11.0-cp37-cp37m-manylinux2014_aarch64.whl
 pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
-pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl
 # install megatron-core
 pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
@@ -152,11 +151,11 @@ NPU vs Reference loss.
 The NPU runs smoothly, the resource usage is stable, no errors are reported in the middle of the process, the Loss is on a decreasing trend, and the convergence speed is as expected. The relative error of the average loss is 0.01093, less than 2%, the maximum relative error is 0.1243, and the maximum absolute error is 0.4859. The precision meets the requirements.
-![NPU-LOSS](./images/7B_loss_compare.png)
+![NPU-LOSS](../../sources/images/baichuan/7B_loss_compare.png)
 NPU vs Reference loss relative error.
-![NPU-Relative-Error](./images/7B_relative_error.png)
+![NPU-Relative-Error](../../sources/images/baichuan/7B_relative_error.png)
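The precision figures quoted in the hunk above (relative error of the average loss, maximum relative error, maximum absolute error) can be reproduced with a few lines of plain Python; the loss values below are hypothetical stand-ins for the NPU and reference curves, not the actual training data.

```python
# Hypothetical per-step loss values for the NPU run and the reference run.
npu_loss = [2.10, 1.95, 1.80, 1.72]
ref_loss = [2.08, 1.97, 1.78, 1.70]

# Relative error of the average loss.
avg_npu = sum(npu_loss) / len(npu_loss)
avg_ref = sum(ref_loss) / len(ref_loss)
avg_rel_err = abs(avg_npu - avg_ref) / avg_ref

# Per-step absolute and relative errors; the report quotes their maxima.
abs_errs = [abs(a - b) for a, b in zip(npu_loss, ref_loss)]
rel_errs = [e / b for e, b in zip(abs_errs, ref_loss)]

print(avg_rel_err, max(rel_errs), max(abs_errs))
```

The "less than 2%" acceptance criterion in the text corresponds to checking `avg_rel_err < 0.02`.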
@@ -204,7 +203,6 @@ conda activate test
 # install torch and torch_npu
 pip install torch-1.11.0-cp37-cp37m-manylinux2014_aarch64.whl
 pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
-pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl
 #install megatron
 git clone https://github.com/NVIDIA/Megatron-LM.git -b 23.05
@@ -328,13 +326,13 @@ NPU vs Reference loss.
 The NPU runs smoothly, the resource usage is stable, no errors are reported in the middle of the process, the Loss is on a decreasing trend, and the convergence speed is as expected. The relative error of the average loss is 0.00725, less than 2%, the maximum relative error is 0.01978, and the maximum absolute error is 0.10811. The precision meets the requirements.
-![NPU-LOSS](./images/13B-loss-compare.png)
+![NPU-LOSS](../../sources/images/baichuan/13B-loss-compare.png)
 NPU vs Reference loss relative error.
 The relative error between NPU and Reference Loss is less than 0.02 throughout, as expected.
-![NPU-Relative-Error](./images/baichuan13B-loss-relative-error.png)
+![NPU-Relative-Error](../../sources/images/baichuan/baichuan13B-loss-relative-error.png)

View File

@@ -60,7 +60,6 @@ conda activate bloom7b
 # install torch and torch_npu and apex
 pip install torch-2.0.1-cp38-cp38-manylinux2014_aarch64.whl
 pip install torch_npu-2.0.1rc1.postxxxxxxxx-cp38-cp38-linux_aarch64.whl
-pip install apex-0.1_ascend_xxxxxxxx-cp38-cp38-linux_aarch64.whl
 # install megatron-core
 pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
@@ -356,12 +355,12 @@ The performance of Bloom-176B in **Ascend NPU** and **Reference**:
 NPU vs GPU loss. The loss curves of GPUs and NPUs basically coincide.
-![bloom176b_lm_loss_compare](./images/bloom176b_lm_loss_compare.PNG)
+![bloom176b_lm_loss_compare](../../sources/images/bloom/bloom176b_lm_loss_compare.PNG)
 We reduce the number of layers of the model to six, the following figure shows the loss comparison between the NPU
 and GPU on a single-node system. The average relative error is 0.1%, less than 2%, and the proportion of relative error less than 2% reaches 99.9%. The average absolute error is 0.04. The precision meets the requirements.
-![bloom176b_1node_lm_loss_compare](./images/bloom176b_lm_loss_1node_compare.PNG)
+![bloom176b_1node_lm_loss_compare](../../sources/images/bloom/bloom176b_lm_loss_1node_compare.PNG)
 ## Inference

View File

@@ -51,7 +51,6 @@ conda activate test
 # install torch and torch_npu
 pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
 pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
-pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl
 # install megatron-core
 pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
 # install deepspeed and deepspeed_npu

View File

@@ -56,7 +56,6 @@ conda activate test
 # install torch and torch_npu
 pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
 pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
-pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl
 # install megatron-core
 pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
 # install deepspeed and deepspeed_npu

View File

(8 binary image files moved; Before/After sizes unchanged: 55 KiB, 46 KiB, 80 KiB, 76 KiB, 43 KiB, 116 KiB, 44 KiB, 34 KiB)

View File

@@ -17,9 +17,9 @@ import logging
 import json
 import pandas as pd
 import tqdm
-from tasks.task_eval.eval_api.dataset_eval import DatasetEval
-from tasks.task_eval.eval_api.llm_chat import LlmChat
-from tasks.task_eval.eval_impl.template import CEVAL_TEMPLATE_DIR
+from tasks.evaluation.eval_api.dataset_eval import DatasetEval
+from tasks.evaluation.eval_api.llm_chat import LlmChat
+from tasks.evaluation.eval_impl.template import CEVAL_TEMPLATE_DIR
 logger = logging.getLogger(__name__)

View File

@@ -15,6 +15,8 @@
 import json
 import os
 import logging
+import stat
+import re
 import subprocess
 from typing import Iterable, Dict
@@ -23,9 +25,8 @@ from tasks.evaluation.eval_api.dataset_eval import DatasetEval
 from tasks.evaluation.eval_api.llm_chat import LlmChat
 from tasks.evaluation.eval_impl.template import CODE_TEST_LOG_DIR
 from ascendspeed.error_utils import check_divisible_by_zero
+from ascendspeed.utils import WRITE_FILE_DEFAULT_FLAGS, WRITE_FILE_DEFAULT_MODES
 logger = logging.getLogger(__name__)
-flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
-modes = stat.S_IWUSR | stat.S_IRUSR
 def extract_answer_code(answer, task: dict):
@@ -45,7 +46,7 @@ def extract_answer_code(answer, task: dict):
     if not os.path.exists(CODE_TEST_LOG_DIR):
         os.makedirs(CODE_TEST_LOG_DIR)
     test_code_path = "{}{}".format(CODE_TEST_LOG_DIR, save_file)
-    with os.fdopen(os.open(test_code_path, flags, modes), 'w') as f:
+    with os.fdopen(os.open(test_code_path, WRITE_FILE_DEFAULT_FLAGS, WRITE_FILE_DEFAULT_MODES), 'w') as f:
         f.write("from typing import List\n")
         f.write("import math\n")
         for i, line in enumerate(code_lines):
@@ -90,7 +91,7 @@ class HumanEval(DatasetEval):
         """
         for file in os.listdir(test_dir):
             file_path = os.path.join(self.test_dir, file)
-            with os.fdopen(os.open(test_code_path, flags, modes)) as fp::
+            with os.fdopen(os.open(test_code_path, WRITE_FILE_DEFAULT_FLAGS, WRITE_FILE_DEFAULT_MODES)) as fp:
                 for line in fp:
                     if any(not x.isspace() for x in line):
                         yield json.loads(line)
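The generator in the final hunk reads JSONL test files, yielding one parsed object per non-blank line. A minimal runnable sketch of the same pattern (the file name and records below are hypothetical):

```python
import json
import os
import tempfile

# Write a small JSONL file with a blank line in the middle.
path = os.path.join(tempfile.mkdtemp(), "samples.jsonl")
with open(path, "w") as f:
    f.write('{"task_id": 1}\n\n{"task_id": 2}\n')

def iter_jsonl(p):
    # Yield one parsed object per non-blank line, as in the HumanEval
    # loader above; the isspace() check skips blank separator lines.
    with open(p) as fp:
        for line in fp:
            if any(not x.isspace() for x in line):
                yield json.loads(line)

records = list(iter_jsonl(path))
```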