Name | Last commit | Last updated
baichuan2 | !1893 optim: improve SFT fine-tuning performance | 2024-11-15 04:34:08 +00:00
chatglm3 | !1893 optim: improve SFT fine-tuning performance | 2024-11-15 04:34:08 +00:00
codellama | !1814 refactor trainer | 2024-11-06 10:53:02 +00:00
deepseek2 | !1844 [mcore-llm] adapt the MLA structure and group_limited_greedy to the standard CP workflow | 2024-11-05 07:18:18 +00:00
deepseek2_coder | !1511 refactor: support Deepseek Specification | 2024-10-21 07:57:37 +00:00
deepseek2_lite | !1814 refactor trainer | 2024-11-06 10:53:02 +00:00
gemma | !1745 add baichuan2 full-parameter fine-tuning script and corresponding template | 2024-11-04 13:03:12 +00:00
gemma2 | !1814 refactor trainer | 2024-11-06 10:53:02 +00:00
glm4 | !1814 refactor trainer | 2024-11-06 10:53:02 +00:00
gpt4 | !1601 add gpt4 MoE dropless | 2024-09-04 01:13:13 +00:00
grok1 | !1619 add glm4 model adaptation | 2024-09-05 13:05:30 +00:00
internlm2 | !1917 add branch and tag descriptions | 2024-11-25 10:37:48 +00:00
internlm25 | !1915 fix issues in the InternLM pretraining script and the user guide | 2024-11-22 06:29:09 +00:00
llama2 | !1911 add LoRA weight conversion scripts for llama2 and mixtral | 2024-11-25 01:23:41 +00:00
llama3 | !1858 DPO/SimPO feature support: VPP, DPP, EP, CP, checkpoint resumption, etc. | 2024-11-21 03:31:39 +00:00
llama31 | !1858 DPO/SimPO feature support: VPP, DPP, EP, CP, checkpoint resumption, etc. | 2024-11-21 03:31:39 +00:00
llama32 | !1744 add llama3.2-1b model adaptation | 2024-10-08 06:40:24 +00:00
minicpm | !1814 refactor trainer | 2024-11-06 10:53:02 +00:00
mistral | !1814 refactor trainer | 2024-11-06 10:53:02 +00:00
mixtral | !1911 add LoRA weight conversion scripts for llama2 and mixtral | 2024-11-25 01:23:41 +00:00
qwen2 | !1904 add Qwen2.5-72B model | 2024-11-18 08:59:00 +00:00
qwen2_moe | !1707 add new model Qwen2-57B-A14B | 2024-09-24 14:40:52 +00:00
qwen15 | !1806 optim: improve pretraining performance for llama3 and qwen series models | 2024-11-18 08:29:28 +00:00
qwen25 | !1904 add Qwen2.5-72B model | 2024-11-18 08:59:00 +00:00
qwen25_coder | !1830 Qwen2.5 code model adaptation | 2024-11-19 03:42:05 +00:00
qwen25_math | !1913 Qwen2.5-Math series model adaptation | 2024-11-22 06:13:09 +00:00
yi | !1676 adapt legacy model Qwen1.5-32B to mcore | 2024-09-14 01:01:27 +00:00
yi15 | !1916 update yi1.5-6b best performance | 2024-11-22 02:39:13 +00:00