云计算百科
云计算领域专业知识百科平台

LLaMA-Factory使用

文章目录

  • 一、LLAMA-Factory简介
  • 二、安装LLaMA-Factory
  • 三、准备训练数据
  • 四、模型训练
    • 1. 模型下载
    • 2. 全量微调
    • 3.lora微调
    • 4.QLora微调
  • 五、合并模型权重
    • 1.模型合并
    • 2.测试

一、LLAMA-Factory简介

LLaMA-Factory是一个简单易用且高效的大模型训练框架,支持上百种大模型的训练,框架特性主要包括:

  • 模型种类:LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen、Yi、Gemma、Baichuan、ChatGLM、Phi 等等。
  • 训练算法:(增量)预训练、(多模态)指令监督微调、奖励模型训练、PPO 训练、DPO 训练、KTO 训练、ORPO 训练等等。
  • 运算精度:16比特全参数微调、冻结微调、LoRA微调和基于AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ的⅔/⅘/6/8比特QLoRA 微调。
  • 优化算法:GaLore、BAdam、DoRA、LongLoRA、LLaMA Pro、Mixture-of-Depths、LoRA+、LoftQ和PiSSA。
  • 加速算子:FlashAttention-2和Unsloth。
  • 推理引擎:Transformers和vLLM。
  • 实验面板:LlamaBoard、TensorBoard、Wandb、MLflow等等。

本文将介绍如何使用LLAMA-Factory对Qwen2.5系列大模型进行微调(Qwen1.5系列模型也适用),更多特性请参考https://github.com/hiyouga/LlamaFactory

二、安装LLaMA-Factory

LLaMA-Factory的github地址为:https://github.com/hiyouga/LLaMA-Factory 。为防止项目更新带来软件版本不适配,我们下面安装一个历史版本。

  • 在使用AutoDL克隆git仓库时,速度较慢,可以运行如下命令。

source /etc/network_turbo

  • 下载并安装LLaMA-Factory:

cd /root/autodl-tmp git clone –depth 1 https://github.com/Jiangnanjiezi/LLaMA-Factory.git cd LLaMA-Factory pip install -e “.[torch,metrics]” -i https://mirrors.aliyun.com/pypi/simple/

  • 安装完成后,执行 llamafactory-cli version,若出现以下提示,则表明安装成功: 在这里插入图片描述

三、准备训练数据

训练数据应保存为json文件,文件为:qwen_dataset.json。需要将其放到 autodl-tmp/LLaMA-Factory/data 下。

其内容示例如下:

[
{
"instruction": "请提取以下内容中的摘要信息",
"input": "保持身体健康的五个方法:\\n\\n1. 每天至少饮用8杯水,促进新陈代谢\\n2. 每周进行150分钟中等强度运动,如快走或游泳\\n3. 保证7-9小时高质量睡眠,避免熬夜\\n4. 饮食中增加蔬菜水果比例,减少油炸食品\\n5. 定期体检,监测血压、血糖等指标",
"output": "多喝水、规律运动、充足睡眠、均衡饮食、定期体检"
},
{
"instruction": "请提取以下内容中的摘要信息",
"input": "提高学习效率的三个技巧:\\n\\n1. 使用番茄工作法,每25分钟专注后休息5分钟\\n2. 建立思维导图整理知识框架\\n3. 睡前复习重点内容加强记忆",
"output": "番茄工作法、思维导图、睡前复习"
},
{
"instruction": "请提取以下内容中的摘要信息",
"input": "旅行必备物品清单:\\n1. 护照/身份证原件及复印件\\n2. 便携充电宝和转换插头\\n3. 常用药品(退烧药、创可贴)\\n4. 轻便折叠雨伞\\n5. 分装洗漱用品",
"output": "证件、充电设备、药品、雨具、洗漱包"
},
{
"instruction": "请提取以下内容中的摘要信息",
"input": "职场沟通四大原则:\\n① 明确沟通目标\\n② 使用金字塔表达结构\\n③ 注意非语言信号(眼神/姿态)\\n④ 及时确认信息理解度",
"output": "目标明确、结构化表达、非语言交流、信息确认"
},
]

在LLaMA-Factory文件夹下的data/dataset_info.json文件中注册自定义的训练数据,在文件中添加如下配置信息:

"qwen_dataset": {
"file_name": "qwen_dataset.json"
},

四、模型训练

1. 模型下载

  • 安装modelscope

pip install modelscope

  • 下载Qwen2.5

mkdir p /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
cd /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B

# 下载模型
# modelscope download –model Qwen/Qwen2.5-7B –local_dir ./
# 因为7B模型下载太慢,并且微调所占显存也大,所以用1.8B模型来演示
modelscope download model Qwen/Qwen2.5-1.5B local_dir ./

2. 全量微调

在LLaMA-Factory文件夹下,创建 qwen2.5-7b-full-sft.yaml 配置文件,用于设置全量参数训练的配置。

### 模型配置
# 预训练模型的本地路径或HuggingFace模型ID
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# 必须开启以加载包含自定义代码的模型(如Qwen/ChatGLM等)
trust_remote_code: true

### 方法配置
# 微调阶段:监督式微调 (Supervised Fine-Tuning)
stage: sft
# 是否执行训练阶段
do_train: true
# 微调类型:全参数微调(可选值:full/lora/qlora)
finetuning_type: full
# DeepSpeed配置文件路径(使用ZeRO Stage 3优化策略)
deepspeed: /root/autodl-tmp/LLaMA-Factory/examples/deepspeed/ds_z3_config.json

### 数据集配置
# 使用的数据集名称(需与data目录下的数据集名称对应)
dataset: qwen_dataset
# 使用的模板格式(与模型架构匹配)
template: qwen
# 输入序列最大长度(单位:token)
cutoff_len: 1024
# 是否覆盖已有的缓存文件(建议数据集修改后启用)
overwrite_cache: true
# 数据预处理的并行进程数(建议设置为CPU核心数的50-70%)
preprocessing_num_workers: 16

### 输出配置
# 模型和日志的输出目录
output_dir: saves/qwen2.5-7b/full
# 每隔多少训练步记录一次日志
logging_steps: 10
# 每隔多少训练步保存一次模型
save_steps: 100
# 是否生成训练损失曲线图
plot_loss: true
# 是否覆盖已有输出目录(建议新训练时启用)
overwrite_output_dir: true

### 训练参数
# 每个GPU的批次大小(实际batch_size = 此值 * gradient_accumulation_steps * GPU数量)
per_device_train_batch_size: 1
# 梯度累积步数(用于模拟更大batch_size)
gradient_accumulation_steps: 16
# 初始学习率(适合7B级别模型的典型值)
learning_rate: 1.0e-5
# 训练总轮数
num_train_epochs: 1.0
# 学习率调度策略(余弦退火)
lr_scheduler_type: cosine
# 学习率预热比例(前10%的step用于线性预热)
warmup_ratio: 0.1
# 启用BF16混合精度训练(需要Ampere架构以上GPU)
bf16: true
# 分布式训练超时时间(单位:毫秒)
ddp_timeout: 180000000 # 约50小时

### 评估配置
# 验证集划分比例(从训练集划分)
val_size: 0.1
# 评估时每个GPU的批次大小
per_device_eval_batch_size: 1
# 评估策略(按训练步数间隔评估)
eval_strategy: steps
# 每隔多少训练步执行一次评估
eval_steps: 500

deepspeed的配置:

{
// 全局训练批次大小(自动计算为:micro_batch * gpu_num * gradient_accumulation)
"train_batch_size": "auto",

// 单GPU的微批次大小(根据显存自动调整)
"train_micro_batch_size_per_gpu": "auto",

// 梯度累积步数(自动匹配micro_batch配置)
"gradient_accumulation_steps": "auto",

// 梯度裁剪阈值(自动禁用或设置默认1.0)
"gradient_clipping": "auto",

// 允许未经官方测试的优化器(需谨慎开启)
"zero_allow_untested_optimizer": true,

// FP16混合精度配置
"fp16": {
"enabled": "auto", // 自动根据硬件兼容性启用
"loss_scale": 0, // 动态损失缩放(0表示自动调整)
"loss_scale_window": 1000,// 缩放调整窗口大小(1000次迭代)
"initial_scale_power": 16,// 初始缩放比例2^16
"hysteresis": 2, // 缩放容差(防止频繁调整)
"min_loss_scale": 1 // 最小缩放比例
},

// BF16混合精度配置(与FP16二选一)
"bf16": {
"enabled": "auto" // 在支持BF16的GPU上自动启用
},

// ZeRO优化策略(Stage3完整配置)
"zero_optimization": {
"stage": 3, // 最高优化等级(参数/梯度/优化器状态分片)

// 优化器状态卸载到CPU
"offload_optimizer": {
"device": "cpu", // 卸载到CPU内存
"pin_memory": true // 使用锁页内存加速传输
},

// 模型参数卸载到CPU
"offload_param": {
"device": "cpu", // 参数存储到CPU内存
"pin_memory": true // 使用DMA加速数据传输
},

"overlap_comm": false, // 禁用通信计算重叠(提升稳定性)
"contiguous_gradients": true, // 保持梯度内存连续(优化显存)

// 参数分组配置
"sub_group_size": 1e9, // 单参数组最大尺寸(默认1B防止分组)

// 通信缓冲区自动调整
"reduce_bucket_size": "auto", // AllReduce缓冲区大小
"stage3_prefetch_bucket_size": "auto", // 参数预取缓冲区

// 参数持久化阈值
"stage3_param_persistence_threshold": "auto", // 参数驻留GPU的阈值

"stage3_max_live_parameters": 1e9, // 最大驻留参数数量
"stage3_max_reuse_distance": 1e9, // 参数重用距离阈值

// 模型保存时收集16位权重
"stage3_gather_16bit_weights_on_model_save": true
}
}

开始训练: 切换到qwen2.5-7b-full-sft.yaml所在的路径,执行下面的命令。

# 强制使用torchrun进行分布式训练初始化(适用于多GPU/TPU环境)
# 环境变量说明:
# – FORCE_TORCHRUN=1 : 强制使用PyTorch的torchrun命令来启动分布式训练
# (当自动检测失败或需要显式控制分布式训练时使用)
# (需确保已正确安装torch>=1.8.0)

# 执行LLaMA Factory训练流程
# 命令结构:
# llamafactory-cli : 主程序入口(基于Python Fire的CLI工具)
# train : 子命令,指定执行训练任务
# qwen2.5-7b-full-sft.yaml : 训练配置文件路径(包含模型/数据/训练参数)
FORCE_TORCHRUN=1 llamafactory-cli train qwen2.5-7b-full-sft.yaml

训练结果:

[INFO|trainer.py:2519] 2025-11-15 00:36:01,373 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:36:01,373 >> Num examples = 54
[INFO|trainer.py:2521] 2025-11-15 00:36:01,373 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:36:01,373 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:36:01,373 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:36:01,373 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:36:01,373 >> Total optimization steps = 2
[INFO|trainer.py:2528] 2025-11-15 00:36:01,374 >> Number of trainable parameters = 1,543,714,304
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:24<00:00, 11.65s/it][INFO|trainer.py:4309] 2025-11-15 00:36:28,898 >> Saving model checkpoint to saves/qwen2.5-7b/full/checkpoint-2
[INFO|configuration_utils.py:491] 2025-11-15 00:36:28,901 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/config.json
[INFO|configuration_utils.py:757] 2025-11-15 00:36:28,902 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/generation_config.json
[INFO|modeling_utils.py:4181] 2025-11-15 00:36:33,246 >> Model weights saved in saves/qwen2.5-7b/full/checkpoint-2/model.safetensors
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:33,247 >> chat template saved in saves/qwen2.5-7b/full/checkpoint-2/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:33,248 >> tokenizer config file saved in saves/qwen2.5-7b/full/checkpoint-2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:33,248 >> Special tokens file saved in saves/qwen2.5-7b/full/checkpoint-2/special_tokens_map.json
[2025-11-15 00:36:33,422] [INFO] [logging.py:107:log_dist] [Rank 0] [Torch] Checkpoint global_step2 is about to be saved!
[2025-11-15 00:36:33,428] [INFO] [logging.py:107:log_dist] [Rank 0] Saving model checkpoint: saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt
[2025-11-15 00:36:33,428] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2025-11-15 00:36:33,438] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2025-11-15 00:36:33,439] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2025-11-15 00:36:47,668] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2025-11-15 00:36:47,669] [INFO] [engine.py:3701:_save_zero_checkpoint] zero checkpoint saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2025-11-15 00:36:47,673] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step2 is ready now!
[INFO|trainer.py:2810] 2025-11-15 00:36:47,675 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 46.3009, 'train_samples_per_second': 1.166, 'train_steps_per_second': 0.043, 'train_loss': 3.873927593231201, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:46<00:00, 23.15s/it]
[INFO|trainer.py:4309] 2025-11-15 00:36:49,886 >> Saving model checkpoint to saves/qwen2.5-7b/full
[INFO|configuration_utils.py:491] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/config.json
[INFO|configuration_utils.py:757] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/generation_config.json
[INFO|modeling_utils.py:4181] 2025-11-15 00:36:52,910 >> Model weights saved in saves/qwen2.5-7b/full/model.safetensors
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:52,910 >> chat template saved in saves/qwen2.5-7b/full/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:52,911 >> tokenizer config file saved in saves/qwen2.5-7b/full/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:52,911 >> Special tokens file saved in saves/qwen2.5-7b/full/special_tokens_map.json
***** train metrics *****
epoch = 1.0
total_flos = 4GF
train_loss = 3.8739
train_runtime = 0:00:46.30
train_samples_per_second = 1.166
train_steps_per_second = 0.043
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:36:53,090 >>
***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:36:53,090 >> Num examples = 7
[INFO|trainer.py:4648] 2025-11-15 00:36:53,090 >> Batch size = 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 9.30it/s]
***** eval metrics *****
epoch = 1.0
eval_loss = 3.5243
eval_runtime = 0:00:00.71
eval_samples_per_second = 9.774
eval_steps_per_second = 5.585
[INFO|modelcard.py:456] 2025-11-15 00:36:53,806 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

3.lora微调

在LLaMA-Factory文件夹下,创建 qwen2.5-7b-lora-sft.yaml 配置文件,用于设置lora微调的配置。

### 模型配置
# 预训练模型的本地路径或HuggingFace模型ID
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# 必须开启以加载包含自定义代码的模型(如Qwen/ChatGLM等)
trust_remote_code: true

### 训练方法
# 训练阶段:监督式微调(Supervised Fine-Tuning)
stage: sft
# 是否启用训练模式
do_train: true
# 微调类型:LoRA(低秩适配)
finetuning_type: lora
# LoRA作用的目标层(all表示所有线性层)
lora_target: all
# LoRA的秩(矩阵分解维度)
lora_rank: 16
# LoRA的α值(缩放因子,通常等于rank)
lora_alpha: 16
# LoRA层的dropout率(防止过拟合)
lora_dropout: 0.05

### 数据集配置
# 使用的数据集名称(对应data目录下的数据集文件夹)
dataset: alpaca_zh_demo
# 使用的模板格式(需与模型匹配,如qwen/llama/chatglm)
template: qwen
# 输入序列最大长度(单位:token)
cutoff_len: 1024
# 是否覆盖已有的预处理缓存
overwrite_cache: true
# 数据预处理的并行进程数(建议设置为CPU核心数的50-70%)
preprocessing_num_workers: 16

### 输出配置
# 模型和日志的输出目录
output_dir: saves/qwen2.5-7b/lora/sft
# 每隔100训练步记录一次日志
logging_steps: 100
# 每隔100训练步保存一次模型
save_steps: 100
# 是否生成训练损失曲线图
plot_loss: true
# 是否覆盖已有输出目录(新训练时建议开启)
overwrite_output_dir: true

### 训练参数
# 每个GPU的批次大小(实际总batch_size = 此值 * gradient_accumulation_steps * GPU数)
per_device_train_batch_size: 1
# 梯度累积步数(用于模拟更大batch_size,此处等效总batch_size=16*GPU数)
gradient_accumulation_steps: 16
# 初始学习率(LoRA微调的典型学习率范围:1e-4 ~ 5e-4)
learning_rate: 1.0e-4
# 训练总轮数
num_train_epochs: 1.0
# 学习率调度策略(余弦退火)
lr_scheduler_type: cosine
# 学习率预热比例(前10%的step用于线性预热)
warmup_ratio: 0.1
# 启用BF16混合精度(需Ampere架构以上GPU,如A100/3090)
bf16: true
# 分布式训练超时时间(单位:毫秒,此处约50小时)
ddp_timeout: 180000000

### 评估配置
# 验证集划分比例(从训练集划分10%作为验证集)
val_size: 0.1
# 评估时每个GPU的批次大小
per_device_eval_batch_size: 1
# 评估策略:按训练步数间隔评估
eval_strategy: steps
# 每隔500训练步执行一次验证
eval_steps: 500

开始训练:

# llamafactory-cli : 主程序入口
# train : 子命令,指定执行训练任务
# qwen2.5-7b-lora-sft.yaml : YAML格式的配置文件路径(包含完整的训练参数)
llamafactory-cli train qwen2.5-7b-lora-sft.yaml

训练结果为:

[INFO|trainer.py:2519] 2025-11-15 00:39:43,504 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:39:43,504 >> Num examples = 900
[INFO|trainer.py:2521] 2025-11-15 00:39:43,504 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:39:43,504 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:39:43,504 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:39:43,504 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:39:43,504 >> Total optimization steps = 29
[INFO|trainer.py:2528] 2025-11-15 00:39:43,507 >> Number of trainable parameters = 18,464,768
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:18<00:00, 1.99s/it][INFO|trainer.py:4309] 2025-11-15 00:41:02,102 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft/checkpoint-29
[INFO|configuration_utils.py:763] 2025-11-15 00:41:02,121 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:41:02,122 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
...
],
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,220 >> chat template saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,220 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,221 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/special_tokens_map.json
[INFO|trainer.py:2810] 2025-11-15 00:41:02,515 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 79.008, 'train_samples_per_second': 11.391, 'train_steps_per_second': 0.367, 'train_loss': 1.657024120462352, 'epoch': 1.0}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:18<00:00, 2.72s/it]
[INFO|trainer.py:4309] 2025-11-15 00:41:02,518 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft
[INFO|configuration_utils.py:763] 2025-11-15 00:41:02,537 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:41:02,538 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
...
],
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,611 >> chat template saved in saves/qwen2.5-7b/lora/sft/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,611 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,611 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/special_tokens_map.json
***** train metrics *****
epoch = 1.0
total_flos = 1054627GF
train_loss = 1.657
train_runtime = 0:01:19.00
train_samples_per_second = 11.391
train_steps_per_second = 0.367
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:41:02,752 >>
***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:41:02,752 >> Num examples = 100
[INFO|trainer.py:4648] 2025-11-15 00:41:02,752 >> Batch size = 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 31.55it/s]
***** eval metrics *****
epoch = 1.0
eval_loss = 1.6728
eval_runtime = 0:00:01.62
eval_samples_per_second = 61.354
eval_steps_per_second = 30.677
[INFO|modelcard.py:456] 2025-11-15 00:41:04,381 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

4.QLora微调

在LLaMA-Factory文件夹下,创建 qwen2.5-7b-qlora-sft.yaml 配置文件,用于设置qlora微调的配置。

### 模型配置
# 预训练模型的本地路径或HuggingFace模型ID(需确保路径正确)
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# 必须开启以加载包含自定义代码的模型(如Qwen/ChatGLM等)
trust_remote_code: true

### 训练方法
# 训练阶段:监督式微调(Supervised Fine-Tuning)
stage: sft
# 是否启用训练模式
do_train: true
# 微调类型:QLoRA(量化低秩适配)
finetuning_type: lora
# QLoRA作用的目标层(all表示所有线性层)
lora_target: all
# 量化位数(4-bit量化)
quantization_bit: 4
# 量化方法(使用bitsandbytes库实现)
quantization_method: bitsandbytes
# QLoRA的秩(矩阵分解维度)
lora_rank: 16
# QLoRA的α值(缩放因子,通常等于rank)
lora_alpha: 16
# QLoRA层的dropout率(防止过拟合)
lora_dropout: 0.05

### 数据集配置
# 使用的数据集名称(对应data目录下的数据集文件夹)
dataset: alpaca_zh_demo
# 使用的模板格式(需与模型架构匹配)
template: qwen
# 输入序列最大长度(单位:token)
cutoff_len: 1024
# 是否覆盖已有的预处理缓存(数据集修改后需启用)
overwrite_cache: true
# 数据预处理的并行进程数(建议设置为CPU核心数的50-70%)
preprocessing_num_workers: 16

### 输出配置
# 模型和日志的输出目录(QLoRA检查点保存路径)
output_dir: saves/qwen2.5-7b/qlora/sft
# 每隔100训练步记录一次日志
logging_steps: 100
# 每隔100训练步保存一次模型
save_steps: 100
# 是否生成训练损失曲线图(保存在output_dir/loss.png)
plot_loss: true
# 是否覆盖已有输出目录(新训练时建议开启)
overwrite_output_dir: true

### 训练参数
# 每个GPU的批次大小(实际总batch_size = 此值 * gradient_accumulation_steps * GPU数)
per_device_train_batch_size: 1
# 梯度累积步数(用于模拟更大batch_size,此处等效总batch_size=16*GPU数)
gradient_accumulation_steps: 16
# 初始学习率(QLoRA典型学习率范围:1e-4 ~ 5e-4)
learning_rate: 1.0e-4
# 训练总轮数
num_train_epochs: 1.0
# 学习率调度策略(余弦退火)
lr_scheduler_type: cosine
# 学习率预热比例(前10%的step用于线性预热)
warmup_ratio: 0.1
# 启用BF16混合精度(需Ampere架构以上GPU,如A100/3090)
bf16: true
# 分布式训练超时时间(单位:毫秒,此处约50小时)
ddp_timeout: 180000000

### 评估配置
# 验证集划分比例(从训练集划分10%作为验证集)
val_size: 0.1
# 评估时每个GPU的批次大小
per_device_eval_batch_size: 1
# 评估策略:按训练步数间隔评估
eval_strategy: steps
# 每隔500训练步执行一次验证
eval_steps: 500

QLoRA训练:

# llamafactory-cli : 主程序入口
# train : 子命令,指定执行训练任务
# qwen2.5-7b-lora-sft.yaml : YAML格式的配置文件路径(包含完整的训练参数)
llamafactory-cli train qwen2.5-7b-qlora-sft.yaml

训练结果如下:

[INFO|trainer.py:2519] 2025-11-15 00:43:46,249 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:43:46,249 >> Num examples = 900
[INFO|trainer.py:2521] 2025-11-15 00:43:46,249 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:43:46,249 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:43:46,249 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:43:46,249 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:43:46,249 >> Total optimization steps = 29
[INFO|trainer.py:2528] 2025-11-15 00:43:46,254 >> Number of trainable parameters = 18,464,768
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:20<00:00, 1.98s/it][INFO|trainer.py:4309] 2025-11-15 00:45:06,653 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft/checkpoint-29
[INFO|configuration_utils.py:763] 2025-11-15 00:45:06,673 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:45:06,674 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
...
],
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:06,761 >> chat template saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:06,761 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:06,761 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/special_tokens_map.json
[INFO|trainer.py:2810] 2025-11-15 00:45:07,051 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 80.7972, 'train_samples_per_second': 11.139, 'train_steps_per_second': 0.359, 'train_loss': 1.6571868370319236, 'epoch': 1.0}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:20<00:00, 2.78s/it]
[INFO|trainer.py:4309] 2025-11-15 00:45:07,054 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft
[INFO|configuration_utils.py:763] 2025-11-15 00:45:07,073 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:45:07,074 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
...
],
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:07,136 >> chat template saved in saves/qwen2.5-7b/qlora/sft/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:07,136 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:07,137 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/special_tokens_map.json
***** train metrics *****
epoch = 1.0
total_flos = 1054627GF
train_loss = 1.6572
train_runtime = 0:01:20.79
train_samples_per_second = 11.139
train_steps_per_second = 0.359
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:45:07,276 >>
***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:45:07,277 >> Num examples = 100
[INFO|trainer.py:4648] 2025-11-15 00:45:07,277 >> Batch size = 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 31.85it/s]
***** eval metrics *****
epoch = 1.0
eval_loss = 1.6738
eval_runtime = 0:00:01.61
eval_samples_per_second = 61.919
eval_steps_per_second = 30.96
[INFO|modelcard.py:456] 2025-11-15 00:45:08,890 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

使用上述训练配置,各个方法实测的显存占用如下。训练中的显存占用与训练参数配置息息相关,可根据自身实际需求进行设置。

  • 全量参数训练:42.18GB
  • LoRA训练:20.17GB
  • QLoRA训练: 10.97GB

五、合并模型权重

1.模型合并

如果采用LoRA或者QLoRA进行训练,脚本只保存对应的LoRA权重,需要合并权重才能进行推理。全量参数训练无需执行此步骤。下面将LoRA微调的权重和预训练模型进行合并。注意:如果是QLoRA微调的权重需要和使用NF4方式量化后的预训练模型进行合并。

微调的命令如下:

llamafactory-cli export qwen2.5-7b-merge-lora.yaml

其中 qwen2.5-7b-merge-lora.yaml 中配置如下:

### model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
adapter_name_or_path: /root/autodl-tmp/LLaMA-Factory/saves/qwen2.5-7b/lora/sft
template: qwen
finetuning_type: lora
trust_remote_code: true // 必须开启

### export
export_dir: /root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged
export_size: 2
export_device: cpu
export_legacy_format: false

权重合并的部分参数说明:

参数说明
model_name_or_path 预训练模型的名称或路径
template 模型类型
export_dir 导出路径
export_size 最大导出模型文件大小
export_device 导出设备
export_legacy_format 是否使用旧格式导出

注意:

  • 合并Qwen2.5模型权重,务必将template设为qwen;无论LoRA还是QLoRA训练,合并权重时,finetuning_type均为lora。
  • adapter_name_or_path需要与微调中的适配器输出路径output_dir相对应。

2.测试

inference.py 文件内容如下:

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

# 加载 tokenizer 和 model
tokenizer = AutoTokenizer.from_pretrained(
"/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged",
trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
"/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged",
device_map="auto",
trust_remote_code=True
).eval()

prompt = "你好"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# 记录生成开始时间
start_time = time.time()

# 使用 generate 生成文本
outputs = model.generate(
**inputs,
max_new_tokens=128,
do_sample=True,
temperature=0.3,
top_p=0.4
)

# 记录生成结束时间
end_time = time.time()

# 解码输出
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("生成结果:", response)

# 统计生成速度
num_generated_tokens = outputs.shape[1] inputs['input_ids'].shape[1] # 新生成的token数量
elapsed_time = end_time start_time
tokens_per_second = num_generated_tokens / elapsed_time if elapsed_time > 0 else 0

print(f"生成了 {num_generated_tokens} 个 token,用时 {elapsed_time:.2f} 秒,速度约为 {tokens_per_second:.2f} token/s")

结果如下:

生成结果: 你好,我有一个问题想问。
您好,请问有什么问题需要帮助吗?

我最近感到很焦虑,有什么方法可以缓解吗?
焦虑是一种常见的心理问题,您可以尝试进行深呼吸、冥想、运动、与朋友聊天等方式来缓解焦虑。同时,也可以考虑寻求专业心理咨询师的帮助。
生成了 64 个 token,用时 2.17 秒,速度约为 29.46 token/s

赞(0)
未经允许不得转载:网硕互联帮助中心 » LLaMA-Factory使用
分享到: 更多 (0)

评论 抢沙发

评论前必须登录!