511 Commits

Author SHA1 Message Date
fengliangjun
791677c135 !1246 Update baichuan2-13B performance to 1668
Merge pull request !1246 from fengliangjun/master
2024-04-26 01:47:52 +00:00
Liuchang
4109f95dfd !1242 Add Llama3-8B and 70B models
Merge pull request !1242 from Liuchang/master
2024-04-25 01:24:31 +00:00
guhangsong
d17e1da6b5 !1244 Fix errors raised after applying the patch
Merge pull request !1244 from guhangsong/bugfix
2024-04-25 01:16:09 +00:00
guhangsong
39d6fd7336 !1218 Migrate the megatron patch
Merge pull request !1218 from guhangsong/patch
2024-04-23 01:57:03 +00:00
fengliangjun
464131283f !1239 Remove redundant shape-transformation operations in the FA adaptation to improve performance
Merge pull request !1239 from fengliangjun/master
2024-04-18 01:42:50 +00:00
glhyy
75a81f58f9 !1233 Update known issues in README
Merge pull request !1233 from glhyy/master
2024-04-16 02:22:33 +00:00
LeiZhenzhen
5ad4ceddd4 !1231 Add partial_rope support for chatglm3
Merge pull request !1231 from LeiZhenzhen/master
2024-04-15 13:11:56 +00:00
liuyanghan
760c3c42cb !1236 Add padding feature to weight conversion (bug fixed)
Merge pull request !1236 from liuyanghan/master
2024-04-15 09:49:15 +00:00
liuyanghan
a5fe9c9f9e !1230 Add padding feature to weight conversion
Merge pull request !1230 from liuyanghan/master
2024-04-15 03:33:05 +00:00
LeiZhenzhen
ab22271e13 !1227 Add chatglm3 pretraining, inference, and evaluation baselines
Merge pull request !1227 from LeiZhenzhen/master
2024-04-11 03:23:33 +00:00
guoxinjie
2f32c76be2 !1224 Remove megatron from under ModelLink and document the change in the readme
Merge pull request !1224 from guoxinjie/remove_megatron
2024-04-09 07:44:00 +00:00
LeiZhenzhen
8524ea2735 !1225 Add weight conversion for chatglm3
Merge pull request !1225 from LeiZhenzhen/master
2024-04-09 06:05:25 +00:00
zhangshengdong29
721ce18db6 !1223 Change the peft import to lazy loading
Merge pull request !1223 from zhangshengdong29/master
2024-04-08 11:04:46 +00:00
guoxinjie
3ee4b9fa94 !1213 Rewrite the unittest tests in the CI gate as pytest so that test cases can be added to the gate later
Merge pull request !1213 from guoxinjie/ut_pytest
2024-04-03 02:14:09 +00:00
黄宇豪
e23e1e354b !1215 fix: Align Mixtral-README with the pretraining template
Merge pull request !1215 from 黄宇豪/master
2024-04-03 02:08:55 +00:00
guoxinjie
cc74ac5e76 !1209 Fix imports in the pipeline file
Merge pull request !1209 from guoxinjie/pipeline_fix
2024-04-03 02:01:11 +00:00
glhyy
cfb76e6257 !1207 Fix broken links in the readme
Merge pull request !1207 from glhyy/master
2024-04-01 08:42:36 +00:00
guoxinjie
a273842158 !1148 Add st for ModelLink
Merge pull request !1148 from guoxinjie/ci_st
2024-04-01 02:22:40 +00:00
fengliangjun
0df09cd187 !1202 Add profiling feature
* add profiling
2024-03-30 08:58:41 +00:00
黄宇豪
62c39ddb9b !1201 fix: Correct the weight save path and dataset path; format the README
Merge pull request !1201 from 黄宇豪/master
2024-03-30 06:32:18 +00:00
shishaoyu
ce01706c93 !1199 [DTS2024032814829] Temporary workaround for loss becoming NaN when processes are repeatedly killed and restarted during stress testing
Merge pull request !1199 from shishaoyu/master
2024-03-29 06:09:28 +00:00
黄宇豪
e8ae798db4 !1186 Unify weight paths and README style
Merge pull request !1186 from 黄宇豪/master
2024-03-28 03:43:17 +00:00
liuyanghan
aa6d2662cc !1177 Document the data-loading issue in multi-node training
Merge pull request !1177 from liuyanghan/master
2024-03-28 01:07:00 +00:00
wwzhuo
24c423201b !1152 Update the llama2 readme, correct the tokenizer description
* Revise the note on tokenizer changes for fine-tuning in the llama2 readme
2024-03-27 08:25:03 +00:00
fengliangjun
f7af425efb !1169 Reorganize the tasks directory and expose evaluation and inference.py
* provide inference and evaluation
2024-03-27 07:55:22 +00:00
黄宇豪
1cd3206f58 !1147 Fix: add the bf16-dtype field to avoid affecting training precision
Merge pull request !1147 from 黄宇豪/master
2024-03-26 01:09:59 +00:00
guhangsong
8cc8b1e919 !1151 Fix occasional hang during chat generation
Merge pull request !1151 from guhangsong/bugfix
2024-03-25 11:43:52 +00:00
huangyiming
a2e9699361 !1146 Remove public-network information from the bloom readme
Merge pull request !1146 from huangyiming/master
2024-03-25 06:51:34 +00:00
guhangsong
ddefd6151c !1143 Update the llama2 README
Merge pull request !1143 from guhangsong/readme
2024-03-25 03:18:27 +00:00
shishaoyu
b12727f3cd !1122 [DTS2024032202178] Fix variable names in the evaluation script that were inconsistent with the readme
Merge pull request !1122 from shishaoyu/master
2024-03-25 01:32:07 +00:00
liuyanghan
21a8f1aa97 !1139 Fix process synchronization in the data-processing stage of multi-node training
Merge pull request !1139 from liuyanghan/master
2024-03-23 12:37:58 +00:00
guoxinjie
69202f5250 !1134 Fix the import of NoopTransformer
Merge pull request !1134 from guoxinjie/NoopTransformer
2024-03-23 12:34:35 +00:00
yuhui
17fcedcf86 !1098 Update the Qwen model readme
* Update the qwen model readme
2024-03-22 01:05:02 +00:00
guoxinjie
e9d19b2f87 !1105 Fix garbled inference output and correct the llama2 readme
* fix infer bug for baichuan and modify llama2 readme
2024-03-21 09:53:10 +00:00
shengjy
d5e1353c0a !1095 Add multi-node training parameter notes for llama2 7B/13B
* add llama2 multi-machine training param
2024-03-20 09:15:56 +00:00
wwzhuo
a46f5ed5ad !1088 Change the llama 13b precision mode to match the performance metrics
* Change the precision mode
2024-03-20 08:36:03 +00:00
LeiZhenzhen
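0030cc9bd1 !1092 Add log saving to the inference, evaluation, and fine-tuning scripts
* Add log saving to the inference, evaluation, and fine-tuning scripts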
2024-03-20 07:11:26 +00:00
guoxinjie
11fbfdce01 !1082 Add environment variables to the llama2-70B script
* fix llama2-70B script bug
2024-03-19 13:13:40 +00:00
xiongliangcheng
a03487b01a !1051 Remove the baichuan13B fine-tuning script
* Remove the baichuan13B fine-tuning script
2024-03-19 12:15:25 +00:00
LeiZhenzhen
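bf6456e04c !1074 Remove the apex dependency from requirements.txt; standardize the model training scripts and add log archiving
* Remove the apex dependency from requirements.txt; standardize the model training scripts and add log archiving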
2024-03-19 10:55:11 +00:00
liuyanghan
670cad5dfe !1069 Fix helpers.so not being compiled automatically in multi-node training
* Fix helpers.so not being compiled automatically during training
2024-03-19 08:41:44 +00:00
fengliangjun
0f8a1851fe !1062 Fix the bug where the distributed optimizer became unusable after the moe code was merged
* fix bug for alibi and distributed opti
2024-03-19 02:32:12 +00:00
llXll
6a232b8f1f !1058 Remove the tokenizer parameter from model evaluation
* Remove the tokenizer parameter from model evaluation
2024-03-18 08:59:17 +00:00
fengliangjun
8998057f4d !1049 Adapt FA for baichuan2-13B
* add FA for baichuan2-13B
2024-03-18 08:06:32 +00:00
zhangbin
4b459852b9 !1053 Update the intern_7B readme
Merge pull request !1053 from zhangbin/master
2024-03-18 06:59:10 +00:00
fengliangjun
c4714245ed
Revert "add fa for baichuan2"
This reverts commit 9ff89b8765.
2024-03-17 08:08:41 +00:00
fengliangjun
9ff89b8765 add fa for baichuan2
2024-03-17 16:07:06 +08:00
liuyanghan
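560554a5e0 !1046 Fix worker nodes failing to generate data when training in a multi-node environment
* Fix worker nodes failing to generate data when training in a multi-node environment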
2024-03-16 09:47:38 +00:00
wwzhuo
8215e0e689 !1036 Correct file-name capitalization in the llama/llama2 readme
Merge pull request !1036 from wwzhuo/master
2024-03-15 08:42:22 +00:00
liuyanghan
12bce62426 !1029 Fix the megatron-to-huggingface conversion bug
Merge pull request !1029 from liuyanghan/master
2024-03-15 08:29:17 +00:00