1. Move the ./examples/bloom/images folder to ./sources/images/bloom
2. Move the ./examples/baichuan/images folder to ./sources/images/baichuan
3. Fix a bug in the evaluation code
4. Remove redundant setup steps from the README
@@ -71,7 +71,7 @@ The plan for more tasks, like RLHF and RM, is under way ...
       <td> FP16 </td>
       <td> 1914 tokens/p/s </td>
       <td> 2068 tokens/p/s </td>
-      <td> <a href="examples/baichuan/images/7B_loss_compare.png">Loss</a> </td>
+      <td> <a href="./sources/images/baichuan/7B_loss_compare.png">Loss</a> </td>
       <td> <a href="examples/baichuan/pretrain_baichuan_zero_7B.sh">Train</a> </td>
     </tr>
     <tr>
@@ -80,7 +80,7 @@ The plan for more tasks, like RLHF and RM, is under way ...
       <td> FP16 </td>
       <td> 1024 tokens/p/s </td>
       <td> 824 tokens/p/s </td>
-      <td> <a href="examples/baichuan/images/13B_loss_compare.png">Loss</a> </td>
+      <td> <a href="./sources/images/baichuan/13B-loss-compare.png">Loss</a> </td>
       <td> <a href="examples/baichuan/pretrain_baichuan_ptd_13B.sh">Train</a> </td>
     </tr>
     <tr>
@@ -60,7 +60,6 @@ conda activate test
 # install torch and torch_npu
 pip install torch-1.11.0-cp37-cp37m-manylinux2014_aarch64.whl
 pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
 pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl

 # install megatron-core
 pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
@@ -152,11 +151,11 @@ NPU vs Reference loss.

 The NPU runs smoothly, the resource usage is stable, no errors are reported in the middle of the process, the Loss is on a decreasing trend, and the convergence speed is as expected. The relative error of the average loss is 0.01093, less than 2%, the maximum relative error is 0.1243, and the maximum absolute error is 0.4859. The precision meets the requirements.

-![NPU-LOSS](./images/7B_loss_compare.png)
+![NPU-LOSS](../../sources/images/baichuan/7B_loss_compare.png)

 NPU vs Reference loss relative error.

-![NPU-Relative-Error](./images/7B_relative_error.png)
+![NPU-Relative-Error](../../sources/images/baichuan/7B_relative_error.png)

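The precision check quoted above (relative error of the average loss, maximum relative error, maximum absolute error) can be sketched as follows. The loss values here are made up for illustration; the repository's actual comparison tooling may differ.

```python
# Hypothetical per-step loss curves; in practice these would be parsed
# from the NPU and reference training logs.
npu_loss = [10.8, 9.6, 8.7, 8.1, 7.6]
ref_loss = [10.9, 9.5, 8.9, 8.0, 7.65]

abs_err = [abs(n - r) for n, r in zip(npu_loss, ref_loss)]  # per-step absolute error
rel_err = [a / abs(r) for a, r in zip(abs_err, ref_loss)]   # per-step relative error

npu_mean = sum(npu_loss) / len(npu_loss)
ref_mean = sum(ref_loss) / len(ref_loss)

# The three figures quoted in the text: relative error of the average
# loss, maximum relative error, maximum absolute error.
mean_rel_err = abs(npu_mean - ref_mean) / ref_mean
max_rel_err = max(rel_err)
max_abs_err = max(abs_err)

# Acceptance criterion used above: average-loss relative error below 2%.
assert mean_rel_err < 0.02
```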
@@ -204,7 +203,6 @@ conda activate test
 # install torch and torch_npu
 pip install torch-1.11.0-cp37-cp37m-manylinux2014_aarch64.whl
 pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
 pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl

 # install megatron
 git clone https://github.com/NVIDIA/Megatron-LM.git -b 23.05
@@ -328,13 +326,13 @@ NPU vs Reference loss.

 The NPU runs smoothly, the resource usage is stable, no errors are reported in the middle of the process, the Loss is on a decreasing trend, and the convergence speed is as expected. The relative error of the average loss is 0.00725, less than 2%, the maximum relative error is 0.01978, and the maximum absolute error is 0.10811. The precision meets the requirements.

-![NPU-LOSS](./images/13B-loss-compare.png)
+![NPU-LOSS](../../sources/images/baichuan/13B-loss-compare.png)

 NPU vs Reference loss relative error.

 The relative error between NPU and Reference Loss is less than 0.02 throughout, as expected.

-![NPU-Relative-Error](./images/baichuan13B-loss-relative-error.png)
+![NPU-Relative-Error](../../sources/images/baichuan/baichuan13B-loss-relative-error.png)

|
@ -60,7 +60,6 @@ conda activate bloom7b
|
||||
# install torch and torch_npu and apex
|
||||
pip install torch-2.0.1-cp38-cp38-manylinux2014_aarch64.whl
|
||||
pip install torch_npu-2.0.1rc1.postxxxxxxxx-cp38-cp38-linux_aarch64.whl
|
||||
pip install apex-0.1_ascend_xxxxxxxx-cp38-cp38-linux_aarch64.whl
|
||||
|
||||
# install megatron-core
|
||||
pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
|
||||
@@ -356,12 +355,12 @@ The performance of Bloom-176B in **Ascend NPU** and **Reference**:

 NPU vs GPU loss. The loss curves of GPUs and NPUs basically coincide.

-![bloom176b_lm_loss_compare](./images/bloom176b_lm_loss_compare.PNG)
+![bloom176b_lm_loss_compare](../../sources/images/bloom/bloom176b_lm_loss_compare.PNG)

 We reduce the number of layers of the model to six; the following figure shows the loss comparison between the NPU and GPU on a single-node system. The average relative error is 0.1%, less than 2%, and the proportion of relative errors below 2% reaches 99.9%. The average absolute error is 0.04. The precision meets the requirements.

-![bloom176b_1node_lm_loss_compare](./images/bloom176b_lm_loss_1node_compare.PNG)
+![bloom176b_1node_lm_loss_compare](../../sources/images/bloom/bloom176b_lm_loss_1node_compare.PNG)

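The "proportion of relative error less than 2%" figure can be computed along the same lines. The values below are made up for illustration; the real logs contain thousands of steps.

```python
# Hypothetical per-step relative errors between NPU and GPU loss.
rel_err = [0.001, 0.004, 0.019, 0.002, 0.0005,
           0.003, 0.025, 0.001, 0.002, 0.0015]

# Percentage of steps whose relative error stays below the 2% threshold.
within_2pct = 100.0 * sum(1 for e in rel_err if e < 0.02) / len(rel_err)

# Average relative error, expressed as a percentage.
avg_rel_err = 100.0 * sum(rel_err) / len(rel_err)
```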
## Inference
@@ -51,7 +51,6 @@ conda activate test
 # install torch and torch_npu
 pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
 pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
 pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl
 # install megatron-core
 pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
 # install deepspeed and deepspeed_npu
@@ -56,7 +56,6 @@ conda activate test
 # install torch and torch_npu
 pip install torch-1.11.0-cp37-cp37m-linux_aarch64.whl
 pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
 pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl
 # install megatron-core
 pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core
 # install deepspeed and deepspeed_npu
Before Width: | Height: | Size: 55 KiB After Width: | Height: | Size: 55 KiB |
Before Width: | Height: | Size: 46 KiB After Width: | Height: | Size: 46 KiB |
Before Width: | Height: | Size: 80 KiB After Width: | Height: | Size: 80 KiB |
Before Width: | Height: | Size: 76 KiB After Width: | Height: | Size: 76 KiB |
Before Width: | Height: | Size: 43 KiB After Width: | Height: | Size: 43 KiB |
Before Width: | Height: | Size: 116 KiB After Width: | Height: | Size: 116 KiB |
Before Width: | Height: | Size: 44 KiB After Width: | Height: | Size: 44 KiB |
Before Width: | Height: | Size: 34 KiB After Width: | Height: | Size: 34 KiB |
@@ -17,9 +17,9 @@ import logging
 import json
 import pandas as pd
 import tqdm
-from tasks.task_eval.eval_api.dataset_eval import DatasetEval
-from tasks.task_eval.eval_api.llm_chat import LlmChat
-from tasks.task_eval.eval_impl.template import CEVAL_TEMPLATE_DIR
+from tasks.evaluation.eval_api.dataset_eval import DatasetEval
+from tasks.evaluation.eval_api.llm_chat import LlmChat
+from tasks.evaluation.eval_impl.template import CEVAL_TEMPLATE_DIR
 logger = logging.getLogger(__name__)

@@ -15,6 +15,8 @@

 import json
+import os
 import logging
+import stat
 import re
 import subprocess
 from typing import Iterable, Dict
@@ -23,9 +25,8 @@ from tasks.evaluation.eval_api.dataset_eval import DatasetEval
 from tasks.evaluation.eval_api.llm_chat import LlmChat
 from tasks.evaluation.eval_impl.template import CODE_TEST_LOG_DIR
 from ascendspeed.error_utils import check_divisible_by_zero
+from ascendspeed.utils import WRITE_FILE_DEFAULT_FLAGS, WRITE_FILE_DEFAULT_MODES
 logger = logging.getLogger(__name__)
-flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
-modes = stat.S_IWUSR | stat.S_IRUSR


 def extract_answer_code(answer, task: dict):
@@ -45,7 +46,7 @@ def extract_answer_code(answer, task: dict):
     if not os.path.exists(CODE_TEST_LOG_DIR):
         os.makedirs(CODE_TEST_LOG_DIR)
     test_code_path = "{}{}".format(CODE_TEST_LOG_DIR, save_file)
-    with os.fdopen(os.open(test_code_path, flags, modes), 'w') as f:
+    with os.fdopen(os.open(test_code_path, WRITE_FILE_DEFAULT_FLAGS, WRITE_FILE_DEFAULT_MODES), 'w') as f:
         f.write("from typing import List\n")
         f.write("import math\n")
         for i, line in enumerate(code_lines):
@@ -90,7 +91,7 @@ class HumanEval(DatasetEval):
         """
         for file in os.listdir(test_dir):
             file_path = os.path.join(self.test_dir, file)
-            with os.fdopen(os.open(test_code_path, flags, modes)) as fp:
+            with os.fdopen(os.open(test_code_path, WRITE_FILE_DEFAULT_FLAGS, WRITE_FILE_DEFAULT_MODES)) as fp:
                 for line in fp:
                     if any(not x.isspace() for x in line):
                         yield json.loads(line)
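The write/read pattern this change centralizes can be sketched as below. The `WRITE_FILE_DEFAULT_FLAGS` / `WRITE_FILE_DEFAULT_MODES` names come from the diff; their values here are assumed from the constants the old code defined inline, and the temporary path is purely illustrative.

```python
import json
import os
import stat
import tempfile

# Assumed values, mirroring the inline constants the old code used.
WRITE_FILE_DEFAULT_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_EXCL
WRITE_FILE_DEFAULT_MODES = stat.S_IWUSR | stat.S_IRUSR

path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")

# O_EXCL makes the open fail if the file already exists, and the mode
# restricts the new file to owner read/write.
with os.fdopen(os.open(path, WRITE_FILE_DEFAULT_FLAGS, WRITE_FILE_DEFAULT_MODES), "w") as f:
    f.write(json.dumps({"task_id": "demo/0"}) + "\n")
    f.write("\n")  # a blank line the reader below skips

# Read back, skipping blank lines before json.loads, as HumanEval does.
records = []
with open(path) as fp:
    for line in fp:
        if any(not x.isspace() for x in line):
            records.append(json.loads(line))
```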