LLaMA-Factory使用

文章目录

一、LLAMA-Factory简介
二、安装LLaMA-Factory
三、准备训练数据
四、模型训练
- 1. 模型下载
- 2. 全量微调
- 3.lora微调
- 4.QLora微调
五、合并模型权重
- 1.模型合并
- 2.测试

一、LLAMA-Factory简介

LLaMA-Factory是一个简单易用且高效的大模型训练框架，支持上百种大模型的训练，框架特性主要包括：

模型种类：LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen、Yi、Gemma、Baichuan、ChatGLM、Phi 等等。
训练算法：（增量）预训练、（多模态）指令监督微调、奖励模型训练、PPO 训练、DPO 训练、KTO 训练、ORPO 训练等等。
运算精度：16比特全参数微调、冻结微调、LoRA微调和基于AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ的⅔/⅘/6/8比特QLoRA 微调。
优化算法：GaLore、BAdam、DoRA、LongLoRA、LLaMA Pro、Mixture-of-Depths、LoRA+、LoftQ和PiSSA。
加速算子：FlashAttention-2和Unsloth。
推理引擎：Transformers和vLLM。
实验面板：LlamaBoard、TensorBoard、Wandb、MLflow等等。

本文将介绍如何使用LLAMA-Factory对Qwen2.5系列大模型进行微调（Qwen1.5系列模型也适用），更多特性请参考https://github.com/hiyouga/LlamaFactory

二、安装LLaMA-Factory

LLaMA-Factory的github地址为：https://github.com/hiyouga/LLaMA-Factory 。为防止项目更新带来软件版本不适配，我们下面安装一个历史版本。

在使用AutoDL克隆git仓库时，速度较慢，可以运行如下命令。

source /etc/network_turbo

下载并安装LLaMA-Factory：

cd /root/autodl-tmp git clone –depth 1 https://github.com/Jiangnanjiezi/LLaMA-Factory.git cd LLaMA-Factory pip install -e “.[torch,metrics]” -i https://mirrors.aliyun.com/pypi/simple/

安装完成后，执行 llamafactory-cli version，若出现以下提示，则表明安装成功：

三、准备训练数据

训练数据应保存为json文件，文件为：qwen_dataset.json。需要将其放到 autodl-tmp/LLaMA-Factory/data 下。

其内容示例如下：

[
{
"instruction": "请提取以下内容中的摘要信息",
"input": "保持身体健康的五个方法：\\n\\n1. 每天至少饮用8杯水，促进新陈代谢\\n2. 每周进行150分钟中等强度运动，如快走或游泳\\n3. 保证7-9小时高质量睡眠，避免熬夜\\n4. 饮食中增加蔬菜水果比例，减少油炸食品\\n5. 定期体检，监测血压、血糖等指标",
"output": "多喝水、规律运动、充足睡眠、均衡饮食、定期体检"
},
{
"instruction": "请提取以下内容中的摘要信息",
"input": "提高学习效率的三个技巧：\\n\\n1. 使用番茄工作法，每25分钟专注后休息5分钟\\n2. 建立思维导图整理知识框架\\n3. 睡前复习重点内容加强记忆",
"output": "番茄工作法、思维导图、睡前复习"
},
{
"instruction": "请提取以下内容中的摘要信息",
"input": "旅行必备物品清单：\\n1. 护照/身份证原件及复印件\\n2. 便携充电宝和转换插头\\n3. 常用药品（退烧药、创可贴）\\n4. 轻便折叠雨伞\\n5. 分装洗漱用品",
"output": "证件、充电设备、药品、雨具、洗漱包"
},
{
"instruction": "请提取以下内容中的摘要信息",
"input": "职场沟通四大原则：\\n① 明确沟通目标\\n② 使用金字塔表达结构\\n③ 注意非语言信号（眼神/姿态）\\n④ 及时确认信息理解度",
"output": "目标明确、结构化表达、非语言交流、信息确认"
},
]

在LLaMA-Factory文件夹下的data/dataset_info.json文件中注册自定义的训练数据，在文件中添加如下配置信息：

"qwen_dataset": {
"file_name": "qwen_dataset.json"
},

四、模型训练

1. 模型下载

安装modelscope

pip install modelscope

下载Qwen2.5

mkdir –p /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
cd /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B

# 下载模型
# modelscope download –model Qwen/Qwen2.5-7B –local_dir ./
# 因为7B模型下载太慢，并且微调所占显存也大，所以用1.8B模型来演示
modelscope download —model Qwen/Qwen2.5-1.5B —local_dir ./

2. 全量微调

在LLaMA-Factory文件夹下，创建 qwen2.5-7b-full-sft.yaml 配置文件，用于设置全量参数训练的配置。

### 模型配置
# 预训练模型的本地路径或HuggingFace模型ID
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# 必须开启以加载包含自定义代码的模型（如Qwen/ChatGLM等）
trust_remote_code: true

### 方法配置
# 微调阶段：监督式微调 (Supervised Fine-Tuning)
stage: sft
# 是否执行训练阶段
do_train: true
# 微调类型：全参数微调（可选值：full/lora/qlora）
finetuning_type: full
# DeepSpeed配置文件路径（使用ZeRO Stage 3优化策略）
deepspeed: /root/autodl-tmp/LLaMA-Factory/examples/deepspeed/ds_z3_config.json

### 数据集配置
# 使用的数据集名称（需与data目录下的数据集名称对应）
dataset: qwen_dataset
# 使用的模板格式（与模型架构匹配）
template: qwen
# 输入序列最大长度（单位：token）
cutoff_len: 1024
# 是否覆盖已有的缓存文件（建议数据集修改后启用）
overwrite_cache: true
# 数据预处理的并行进程数（建议设置为CPU核心数的50-70%）
preprocessing_num_workers: 16

### 输出配置
# 模型和日志的输出目录
output_dir: saves/qwen2.5-7b/full
# 每隔多少训练步记录一次日志
logging_steps: 10
# 每隔多少训练步保存一次模型
save_steps: 100
# 是否生成训练损失曲线图
plot_loss: true
# 是否覆盖已有输出目录（建议新训练时启用）
overwrite_output_dir: true

### 训练参数
# 每个GPU的批次大小（实际batch_size = 此值 * gradient_accumulation_steps * GPU数量）
per_device_train_batch_size: 1
# 梯度累积步数（用于模拟更大batch_size）
gradient_accumulation_steps: 16
# 初始学习率（适合7B级别模型的典型值）
learning_rate: 1.0e-5
# 训练总轮数
num_train_epochs: 1.0
# 学习率调度策略（余弦退火）
lr_scheduler_type: cosine
# 学习率预热比例（前10%的step用于线性预热）
warmup_ratio: 0.1
# 启用BF16混合精度训练（需要Ampere架构以上GPU）
bf16: true
# 分布式训练超时时间（单位：毫秒）
ddp_timeout: 180000000 # 约50小时

### 评估配置
# 验证集划分比例（从训练集划分）
val_size: 0.1
# 评估时每个GPU的批次大小
per_device_eval_batch_size: 1
# 评估策略（按训练步数间隔评估）
eval_strategy: steps
# 每隔多少训练步执行一次评估
eval_steps: 500

deepspeed的配置：

{
// 全局训练批次大小（自动计算为：micro_batch * gpu_num * gradient_accumulation）
"train_batch_size": "auto",

// 单GPU的微批次大小（根据显存自动调整）
"train_micro_batch_size_per_gpu": "auto",

// 梯度累积步数（自动匹配micro_batch配置）
"gradient_accumulation_steps": "auto",

// 梯度裁剪阈值（自动禁用或设置默认1.0）
"gradient_clipping": "auto",

// 允许未经官方测试的优化器（需谨慎开启）
"zero_allow_untested_optimizer": true,

// FP16混合精度配置
"fp16": {
"enabled": "auto", // 自动根据硬件兼容性启用
"loss_scale": 0, // 动态损失缩放（0表示自动调整）
"loss_scale_window": 1000,// 缩放调整窗口大小（1000次迭代）
"initial_scale_power": 16,// 初始缩放比例2^16
"hysteresis": 2, // 缩放容差（防止频繁调整）
"min_loss_scale": 1 // 最小缩放比例
},

// BF16混合精度配置（与FP16二选一）
"bf16": {
"enabled": "auto" // 在支持BF16的GPU上自动启用
},

// ZeRO优化策略（Stage3完整配置）
"zero_optimization": {
"stage": 3, // 最高优化等级（参数/梯度/优化器状态分片）

// 优化器状态卸载到CPU
"offload_optimizer": {
"device": "cpu", // 卸载到CPU内存
"pin_memory": true // 使用锁页内存加速传输
},

// 模型参数卸载到CPU
"offload_param": {
"device": "cpu", // 参数存储到CPU内存
"pin_memory": true // 使用DMA加速数据传输
},

"overlap_comm": false, // 禁用通信计算重叠（提升稳定性）
"contiguous_gradients": true, // 保持梯度内存连续（优化显存）

// 参数分组配置
"sub_group_size": 1e9, // 单参数组最大尺寸（默认1B防止分组）

// 通信缓冲区自动调整
"reduce_bucket_size": "auto", // AllReduce缓冲区大小
"stage3_prefetch_bucket_size": "auto", // 参数预取缓冲区

// 参数持久化阈值
"stage3_param_persistence_threshold": "auto", // 参数驻留GPU的阈值

"stage3_max_live_parameters": 1e9, // 最大驻留参数数量
"stage3_max_reuse_distance": 1e9, // 参数重用距离阈值

// 模型保存时收集16位权重
"stage3_gather_16bit_weights_on_model_save": true
}
}

开始训练：切换到qwen2.5-7b-full-sft.yaml所在的路径，执行下面的命令。

# 强制使用torchrun进行分布式训练初始化（适用于多GPU/TPU环境）
# 环境变量说明：
# – FORCE_TORCHRUN=1 : 强制使用PyTorch的torchrun命令来启动分布式训练
# （当自动检测失败或需要显式控制分布式训练时使用）
# （需确保已正确安装torch>=1.8.0）

# 执行LLaMA Factory训练流程
# 命令结构：
# llamafactory-cli : 主程序入口（基于Python Fire的CLI工具）
# train : 子命令，指定执行训练任务
# qwen2.5-7b-full-sft.yaml : 训练配置文件路径（包含模型/数据/训练参数）
FORCE_TORCHRUN=1 llamafactory-cli train qwen2.5-7b-full-sft.yaml

训练结果：

[INFO|trainer.py:2519] 2025-11-15 00:36:01,373 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:36:01,373 >> Num examples = 54
[INFO|trainer.py:2521] 2025-11-15 00:36:01,373 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:36:01,373 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:36:01,373 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:36:01,373 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:36:01,373 >> Total optimization steps = 2
[INFO|trainer.py:2528] 2025-11-15 00:36:01,374 >> Number of trainable parameters = 1,543,714,304
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:24<00:00, 11.65s/it][INFO|trainer.py:4309] 2025-11-15 00:36:28,898 >> Saving model checkpoint to saves/qwen2.5-7b/full/checkpoint-2
[INFO|configuration_utils.py:491] 2025-11-15 00:36:28,901 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/config.json
[INFO|configuration_utils.py:757] 2025-11-15 00:36:28,902 >> Configuration saved in saves/qwen2.5-7b/full/checkpoint-2/generation_config.json
[INFO|modeling_utils.py:4181] 2025-11-15 00:36:33,246 >> Model weights saved in saves/qwen2.5-7b/full/checkpoint-2/model.safetensors
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:33,247 >> chat template saved in saves/qwen2.5-7b/full/checkpoint-2/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:33,248 >> tokenizer config file saved in saves/qwen2.5-7b/full/checkpoint-2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:33,248 >> Special tokens file saved in saves/qwen2.5-7b/full/checkpoint-2/special_tokens_map.json
[2025-11-15 00:36:33,422] [INFO] [logging.py:107:log_dist] [Rank 0] [Torch] Checkpoint global_step2 is about to be saved!
[2025-11-15 00:36:33,428] [INFO] [logging.py:107:log_dist] [Rank 0] Saving model checkpoint: saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt
[2025-11-15 00:36:33,428] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2025-11-15 00:36:33,438] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2025-11-15 00:36:33,439] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2025-11-15 00:36:47,668] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2025-11-15 00:36:47,669] [INFO] [engine.py:3701:_save_zero_checkpoint] zero checkpoint saved saves/qwen2.5-7b/full/checkpoint-2/global_step2/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2025-11-15 00:36:47,673] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step2 is ready now!
[INFO|trainer.py:2810] 2025-11-15 00:36:47,675 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 46.3009, 'train_samples_per_second': 1.166, 'train_steps_per_second': 0.043, 'train_loss': 3.873927593231201, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:46<00:00, 23.15s/it]
[INFO|trainer.py:4309] 2025-11-15 00:36:49,886 >> Saving model checkpoint to saves/qwen2.5-7b/full
[INFO|configuration_utils.py:491] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/config.json
[INFO|configuration_utils.py:757] 2025-11-15 00:36:49,889 >> Configuration saved in saves/qwen2.5-7b/full/generation_config.json
[INFO|modeling_utils.py:4181] 2025-11-15 00:36:52,910 >> Model weights saved in saves/qwen2.5-7b/full/model.safetensors
[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:36:52,910 >> chat template saved in saves/qwen2.5-7b/full/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:36:52,911 >> tokenizer config file saved in saves/qwen2.5-7b/full/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:36:52,911 >> Special tokens file saved in saves/qwen2.5-7b/full/special_tokens_map.json
***** train metrics *****
epoch = 1.0
total_flos = 4GF
train_loss = 3.8739
train_runtime = 0:00:46.30
train_samples_per_second = 1.166
train_steps_per_second = 0.043
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:36:53] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:36:53,090 >>
***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:36:53,090 >> Num examples = 7
[INFO|trainer.py:4648] 2025-11-15 00:36:53,090 >> Batch size = 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 9.30it/s]
***** eval metrics *****
epoch = 1.0
eval_loss = 3.5243
eval_runtime = 0:00:00.71
eval_samples_per_second = 9.774
eval_steps_per_second = 5.585
[INFO|modelcard.py:456] 2025-11-15 00:36:53,806 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

3.lora微调

在LLaMA-Factory文件夹下，创建 qwen2.5-7b-lora-sft.yaml 配置文件，用于设置lora微调的配置。

### 训练方法
# 训练阶段：监督式微调（Supervised Fine-Tuning）
stage: sft
# 是否启用训练模式
do_train: true
# 微调类型：LoRA（低秩适配）
finetuning_type: lora
# LoRA作用的目标层（all表示所有线性层）
lora_target: all
# LoRA的秩（矩阵分解维度）
lora_rank: 16
# LoRA的α值（缩放因子，通常等于rank）
lora_alpha: 16
# LoRA层的dropout率（防止过拟合）
lora_dropout: 0.05

### 数据集配置
# 使用的数据集名称（对应data目录下的数据集文件夹）
dataset: alpaca_zh_demo
# 使用的模板格式（需与模型匹配，如qwen/llama/chatglm）
template: qwen
# 输入序列最大长度（单位：token）
cutoff_len: 1024
# 是否覆盖已有的预处理缓存
overwrite_cache: true
# 数据预处理的并行进程数（建议设置为CPU核心数的50-70%）
preprocessing_num_workers: 16

### 输出配置
# 模型和日志的输出目录
output_dir: saves/qwen2.5-7b/lora/sft
# 每隔100训练步记录一次日志
logging_steps: 100
# 每隔100训练步保存一次模型
save_steps: 100
# 是否生成训练损失曲线图
plot_loss: true
# 是否覆盖已有输出目录（新训练时建议开启）
overwrite_output_dir: true

### 训练参数
# 每个GPU的批次大小（实际总batch_size = 此值 * gradient_accumulation_steps * GPU数）
per_device_train_batch_size: 1
# 梯度累积步数（用于模拟更大batch_size，此处等效总batch_size=16*GPU数）
gradient_accumulation_steps: 16
# 初始学习率（LoRA微调的典型学习率范围：1e-4 ~ 5e-4）
learning_rate: 1.0e-4
# 训练总轮数
num_train_epochs: 1.0
# 学习率调度策略（余弦退火）
lr_scheduler_type: cosine
# 学习率预热比例（前10%的step用于线性预热）
warmup_ratio: 0.1
# 启用BF16混合精度（需Ampere架构以上GPU，如A100/3090）
bf16: true
# 分布式训练超时时间（单位：毫秒，此处约50小时）
ddp_timeout: 180000000

### 评估配置
# 验证集划分比例（从训练集划分10%作为验证集）
val_size: 0.1
# 评估时每个GPU的批次大小
per_device_eval_batch_size: 1
# 评估策略：按训练步数间隔评估
eval_strategy: steps
# 每隔500训练步执行一次验证
eval_steps: 500

开始训练：

# llamafactory-cli : 主程序入口
# train : 子命令，指定执行训练任务
# qwen2.5-7b-lora-sft.yaml : YAML格式的配置文件路径（包含完整的训练参数）
llamafactory-cli train qwen2.5-7b-lora-sft.yaml

训练结果为：

[INFO|trainer.py:2519] 2025-11-15 00:39:43,504 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:39:43,504 >> Num examples = 900
[INFO|trainer.py:2521] 2025-11-15 00:39:43,504 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:39:43,504 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:39:43,504 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:39:43,504 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:39:43,504 >> Total optimization steps = 29
[INFO|trainer.py:2528] 2025-11-15 00:39:43,507 >> Number of trainable parameters = 18,464,768
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:18<00:00, 1.99s/it][INFO|trainer.py:4309] 2025-11-15 00:41:02,102 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft/checkpoint-29
[INFO|configuration_utils.py:763] 2025-11-15 00:41:02,121 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:41:02,122 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
...
],
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,220 >> chat template saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,220 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,221 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/checkpoint-29/special_tokens_map.json
[INFO|trainer.py:2810] 2025-11-15 00:41:02,515 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 79.008, 'train_samples_per_second': 11.391, 'train_steps_per_second': 0.367, 'train_loss': 1.657024120462352, 'epoch': 1.0}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:18<00:00, 2.72s/it]
[INFO|trainer.py:4309] 2025-11-15 00:41:02,518 >> Saving model checkpoint to saves/qwen2.5-7b/lora/sft
[INFO|configuration_utils.py:763] 2025-11-15 00:41:02,537 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:41:02,538 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
...
],
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:41:02,611 >> chat template saved in saves/qwen2.5-7b/lora/sft/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:41:02,611 >> tokenizer config file saved in saves/qwen2.5-7b/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:41:02,611 >> Special tokens file saved in saves/qwen2.5-7b/lora/sft/special_tokens_map.json
***** train metrics *****
epoch = 1.0
total_flos = 1054627GF
train_loss = 1.657
train_runtime = 0:01:19.00
train_samples_per_second = 11.391
train_steps_per_second = 0.367
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:41:02] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:41:02,752 >>
***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:41:02,752 >> Num examples = 100
[INFO|trainer.py:4648] 2025-11-15 00:41:02,752 >> Batch size = 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 31.55it/s]
***** eval metrics *****
epoch = 1.0
eval_loss = 1.6728
eval_runtime = 0:00:01.62
eval_samples_per_second = 61.354
eval_steps_per_second = 30.677
[INFO|modelcard.py:456] 2025-11-15 00:41:04,381 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

4.QLora微调

在LLaMA-Factory文件夹下，创建 qwen2.5-7b-qlora-sft.yaml 配置文件，用于设置qlora微调的配置。

### 模型配置
# 预训练模型的本地路径或HuggingFace模型ID（需确保路径正确）
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# 必须开启以加载包含自定义代码的模型（如Qwen/ChatGLM等）
trust_remote_code: true

### 训练方法
# 训练阶段：监督式微调（Supervised Fine-Tuning）
stage: sft
# 是否启用训练模式
do_train: true
# 微调类型：QLoRA（量化低秩适配）
finetuning_type: lora
# QLoRA作用的目标层（all表示所有线性层）
lora_target: all
# 量化位数（4-bit量化）
quantization_bit: 4
# 量化方法（使用bitsandbytes库实现）
quantization_method: bitsandbytes
# QLoRA的秩（矩阵分解维度）
lora_rank: 16
# QLoRA的α值（缩放因子，通常等于rank）
lora_alpha: 16
# QLoRA层的dropout率（防止过拟合）
lora_dropout: 0.05

### 数据集配置
# 使用的数据集名称（对应data目录下的数据集文件夹）
dataset: alpaca_zh_demo
# 使用的模板格式（需与模型架构匹配）
template: qwen
# 输入序列最大长度（单位：token）
cutoff_len: 1024
# 是否覆盖已有的预处理缓存（数据集修改后需启用）
overwrite_cache: true
# 数据预处理的并行进程数（建议设置为CPU核心数的50-70%）
preprocessing_num_workers: 16

### 输出配置
# 模型和日志的输出目录（QLoRA检查点保存路径）
output_dir: saves/qwen2.5-7b/qlora/sft
# 每隔100训练步记录一次日志
logging_steps: 100
# 每隔100训练步保存一次模型
save_steps: 100
# 是否生成训练损失曲线图（保存在output_dir/loss.png）
plot_loss: true
# 是否覆盖已有输出目录（新训练时建议开启）
overwrite_output_dir: true

### 训练参数
# 每个GPU的批次大小（实际总batch_size = 此值 * gradient_accumulation_steps * GPU数）
per_device_train_batch_size: 1
# 梯度累积步数（用于模拟更大batch_size，此处等效总batch_size=16*GPU数）
gradient_accumulation_steps: 16
# 初始学习率（QLoRA典型学习率范围：1e-4 ~ 5e-4）
learning_rate: 1.0e-4
# 训练总轮数
num_train_epochs: 1.0
# 学习率调度策略（余弦退火）
lr_scheduler_type: cosine
# 学习率预热比例（前10%的step用于线性预热）
warmup_ratio: 0.1
# 启用BF16混合精度（需Ampere架构以上GPU，如A100/3090）
bf16: true
# 分布式训练超时时间（单位：毫秒，此处约50小时）
ddp_timeout: 180000000

QLoRA训练：

# llamafactory-cli : 主程序入口
# train : 子命令，指定执行训练任务
# qwen2.5-7b-lora-sft.yaml : YAML格式的配置文件路径（包含完整的训练参数）
llamafactory-cli train qwen2.5-7b-qlora-sft.yaml

训练结果如下：

[INFO|trainer.py:2519] 2025-11-15 00:43:46,249 >> ***** Running training *****
[INFO|trainer.py:2520] 2025-11-15 00:43:46,249 >> Num examples = 900
[INFO|trainer.py:2521] 2025-11-15 00:43:46,249 >> Num Epochs = 1
[INFO|trainer.py:2522] 2025-11-15 00:43:46,249 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2525] 2025-11-15 00:43:46,249 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2526] 2025-11-15 00:43:46,249 >> Gradient Accumulation steps = 16
[INFO|trainer.py:2527] 2025-11-15 00:43:46,249 >> Total optimization steps = 29
[INFO|trainer.py:2528] 2025-11-15 00:43:46,254 >> Number of trainable parameters = 18,464,768
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:20<00:00, 1.98s/it][INFO|trainer.py:4309] 2025-11-15 00:45:06,653 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft/checkpoint-29
[INFO|configuration_utils.py:763] 2025-11-15 00:45:06,673 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:45:06,674 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
...
],
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:06,761 >> chat template saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:06,761 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:06,761 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/checkpoint-29/special_tokens_map.json
[INFO|trainer.py:2810] 2025-11-15 00:45:07,051 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 80.7972, 'train_samples_per_second': 11.139, 'train_steps_per_second': 0.359, 'train_loss': 1.6571868370319236, 'epoch': 1.0}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [01:20<00:00, 2.78s/it]
[INFO|trainer.py:4309] 2025-11-15 00:45:07,054 >> Saving model checkpoint to saves/qwen2.5-7b/qlora/sft
[INFO|configuration_utils.py:763] 2025-11-15 00:45:07,073 >> loading configuration file /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B/config.json
[INFO|configuration_utils.py:839] 2025-11-15 00:45:07,074 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"layer_types": [
"full_attention",
"full_attention",
...
],
"max_position_embeddings": 131072,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.1",
"use_cache": true,
"use_mrope": false,
"use_sliding_window": false,
"vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2421] 2025-11-15 00:45:07,136 >> chat template saved in saves/qwen2.5-7b/qlora/sft/chat_template.jinja
[INFO|tokenization_utils_base.py:2590] 2025-11-15 00:45:07,136 >> tokenizer config file saved in saves/qwen2.5-7b/qlora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2599] 2025-11-15 00:45:07,137 >> Special tokens file saved in saves/qwen2.5-7b/qlora/sft/special_tokens_map.json
***** train metrics *****
epoch = 1.0
total_flos = 1054627GF
train_loss = 1.6572
train_runtime = 0:01:20.79
train_samples_per_second = 11.139
train_steps_per_second = 0.359
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric loss to plot.
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_loss to plot.
[WARNING|2025-11-15 00:45:07] llamafactory.extras.ploting:148 >> No metric eval_accuracy to plot.
[INFO|trainer.py:4643] 2025-11-15 00:45:07,276 >>
***** Running Evaluation *****
[INFO|trainer.py:4645] 2025-11-15 00:45:07,277 >> Num examples = 100
[INFO|trainer.py:4648] 2025-11-15 00:45:07,277 >> Batch size = 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 31.85it/s]
***** eval metrics *****
epoch = 1.0
eval_loss = 1.6738
eval_runtime = 0:00:01.61
eval_samples_per_second = 61.919
eval_steps_per_second = 30.96
[INFO|modelcard.py:456] 2025-11-15 00:45:08,890 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

使用上述训练配置，各个方法实测的显存占用如下。训练中的显存占用与训练参数配置息息相关，可根据自身实际需求进行设置。

全量参数训练：42.18GB
LoRA训练：20.17GB
QLoRA训练: 10.97GB

五、合并模型权重

1.模型合并

如果采用LoRA或者QLoRA进行训练，脚本只保存对应的LoRA权重，需要合并权重才能进行推理。全量参数训练无需执行此步骤。下面将LoRA微调的权重和预训练模型进行合并。注意：如果是QLoRA微调的权重需要和使用NF4方式量化后的预训练模型进行合并。

微调的命令如下：

llamafactory-cli export qwen2.5-7b-merge-lora.yaml

其中 qwen2.5-7b-merge-lora.yaml 中配置如下：

### model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
adapter_name_or_path: /root/autodl-tmp/LLaMA-Factory/saves/qwen2.5-7b/lora/sft
template: qwen
finetuning_type: lora
trust_remote_code: true // 必须开启

### export
export_dir: /root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged
export_size: 2
export_device: cpu
export_legacy_format: false

权重合并的部分参数说明：

参数说明

model_name_or_path	预训练模型的名称或路径
template	模型类型
export_dir	导出路径
export_size	最大导出模型文件大小
export_device	导出设备
export_legacy_format	是否使用旧格式导出

注意：

合并Qwen2.5模型权重，务必将template设为qwen；无论LoRA还是QLoRA训练，合并权重时，finetuning_type均为lora。
adapter_name_or_path需要与微调中的适配器输出路径output_dir相对应。

2.测试

inference.py 文件内容如下：

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

# 加载 tokenizer 和 model
tokenizer = AutoTokenizer.from_pretrained(
"/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged",
trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
"/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged",
device_map="auto",
trust_remote_code=True
).eval()

prompt = "你好"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# 记录生成开始时间
start_time = time.time()

# 使用 generate 生成文本
outputs = model.generate(
**inputs,
max_new_tokens=128,
do_sample=True,
temperature=0.3,
top_p=0.4
)

# 记录生成结束时间
end_time = time.time()

# 解码输出
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("生成结果：", response)

# 统计生成速度
num_generated_tokens = outputs.shape[1] – inputs['input_ids'].shape[1] # 新生成的token数量
elapsed_time = end_time – start_time
tokens_per_second = num_generated_tokens / elapsed_time if elapsed_time > 0 else 0

print(f"生成了 {num_generated_tokens} 个 token，用时 {elapsed_time:.2f} 秒，速度约为 {tokens_per_second:.2f} token/s")

结果如下：

生成结果：你好，我有一个问题想问。
您好，请问有什么问题需要帮助吗？

我最近感到很焦虑，有什么方法可以缓解吗？
焦虑是一种常见的心理问题，您可以尝试进行深呼吸、冥想、运动、与朋友聊天等方式来缓解焦虑。同时，也可以考虑寻求专业心理咨询师的帮助。
生成了 64 个 token，用时 2.17 秒，速度约为 29.46 token/s

文章目录

一、LLAMA-Factory简介

二、安装LLaMA-Factory

三、准备训练数据

四、模型训练

1. 模型下载

2. 全量微调

3.lora微调

4.QLora微调

五、合并模型权重

1.模型合并

2.测试

相关推荐

评论抢沙发

评论前必须登录！

热门标签

置顶推荐

热门文章

最新文章

文章目录

一、LLAMA-Factory简介

二、安装LLaMA-Factory

三、准备训练数据

四、模型训练

1. 模型下载

2. 全量微调

3.lora微调

4.QLora微调

五、合并模型权重

1.模型合并

2.测试

相关推荐

评论 抢沙发

评论前必须登录！

热门标签

置顶推荐

热门文章

最新文章

评论抢沙发