Fine-Tuning DeepSeek-R1-Distill-Llama-8B for a Health Management Assistant

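The main task of this innovation training project is to take an open-source model released by DeepSeek and fine-tune it into an AI assistant focused on health management and medical consultation. This post documents in detail how we fine-tuned DeepSeek-R1-Distill-Llama-8B to meet the needs of healthcare applications.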

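Why DeepSeek-R1-Distill-Llama-8B?

We chose DeepSeek-R1-Distill-Llama-8B mainly for the following reasons:

Appropriate model size: at 8B parameters, the model can be trained efficiently even with limited GPU resources.
Strong Chinese comprehension: well suited to medical consultation questions, with clear and professional phrasing.
Open and flexible: the open-source nature of the DeepSeek series lets us fine-tune and deploy the model freely.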

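Dataset Selection and Overview

We use the FreedomIntelligence medical reasoning dataset. It focuses on medical reasoning tasks, and each record consists of three parts:

Question: a description of the medical problem.
Complex_CoT: the detailed medical reasoning process.
Response: the medical advice or treatment plan.

Example record (translated from the Chinese subset):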

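{
    "Question": "A patient with acute appendicitis has been ill for 5 days; the abdominal pain has eased slightly but the fever persists…",
    "Complex_CoT": "Given the long course of illness, the appendix may already have formed an abscess and needs further management…",
    "Response": "Conservative treatment is recommended first, with surgical intervention considered if necessary."
}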

How LoRA Fine-Tuning Works

LoRA (Low-Rank Adaptation) is an efficient fine-tuning technique: it freezes the original model parameters and adapts to the new task only through low-rank matrices. Specifically, on top of the original weight matrix $W_0 \in \mathbb{R}^{d \times d}$, LoRA adds two low-rank matrices $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{d \times r}$ to adjust the weights:

$\Delta W = BA \quad (r \ll d)$

The updated weights are then:

$W = W_0 + BA$

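The main LoRA parameters are:

r (rank): controls the capacity and precision of the fine-tuning; typical values range from 8 to 64.
lora_alpha: a scaling factor that adjusts how strongly the LoRA update is applied; usually set to a value close to r.

With LoRA, training cost and GPU memory usage drop substantially, because only a small number of parameters need to be trained for effective fine-tuning.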
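To make the savings concrete, here is a minimal NumPy sketch of the low-rank update. The dimension d = 1024 and the initialization (zeros for B, small random values for A) are illustrative assumptions rather than values from our training run, and the lora_alpha / r scaling follows the common LoRA formulation rather than anything specific to this project.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the actual model)
d, r, lora_alpha = 1024, 16, 16

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d, d)).astype(np.float32)        # frozen pretrained weight
A = (rng.standard_normal((r, d)) * 0.01).astype(np.float32)  # trainable, shape r x d
B = np.zeros((d, r), dtype=np.float32)                      # trainable, shape d x r, zero init

scale = lora_alpha / r          # common LoRA scaling of the update
delta_W = scale * (B @ A)       # low-rank update, shape d x d
W = W0 + delta_W                # effective weight used in the forward pass

full_params = W0.size           # parameters updated by full fine-tuning
lora_params = A.size + B.size   # parameters updated by LoRA
ratio = lora_params / full_params

print(f"full: {full_params:,}  lora: {lora_params:,}  ratio: {ratio:.4f}")
```

Because B starts at zero in this sketch, the update is zero before training and W equals W_0, so the adapted model initially behaves exactly like the base model.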

Fine-Tuning Implementation

Environment Setup

!pip install unsloth bitsandbytes transformers datasets trl

Model Loading and Quantization

We load the model efficiently with Unsloth, using 4-bit quantization:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

Dataset Processing

We build a prompt template suited to fine-tuning:

from datasets import load_dataset

# Appended to every sample so the model learns where an answer ends
EOS = tokenizer.eos_token

def formatting_prompts_func(examples):
    # Called with batched=True, so each column is a list of values
    texts = []
    for q, cot, ans in zip(examples["Question"], examples["Complex_CoT"], examples["Response"]):
        text = f"""Below is an instruction…
### Question:
{q}

### Response:
<think>
{cot}
</think>
{ans}{EOS}"""
        texts.append(text)
    return {"text": texts}

# Load the first 500 records of the Chinese ("zh") subset and add a "text" column
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT",
    "zh",
    split="train[:500]",
).map(formatting_prompts_func, batched=True)
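To see what the batched map produces without downloading the dataset or loading a tokenizer, the sketch below applies the same question/reasoning/answer layout to a one-record toy batch. The EOS string "<|eos|>", the function name format_batch, and the sample record are placeholders invented for illustration; the real EOS token comes from tokenizer.eos_token, and the instruction header is left out here because it is truncated in the template above.

```python
EOS = "<|eos|>"  # placeholder; the real value is tokenizer.eos_token

def format_batch(examples):
    """Turn a batch of columns into one training string per record."""
    texts = []
    for q, cot, ans in zip(examples["Question"], examples["Complex_CoT"], examples["Response"]):
        texts.append(
            "### Question:\n"
            f"{q}\n\n"
            "### Response:\n"
            "<think>\n"
            f"{cot}\n"
            "</think>\n"
            f"{ans}{EOS}"
        )
    return {"text": texts}

# A made-up record, shaped like one batch passed by datasets.map(batched=True)
toy_batch = {
    "Question": ["Example question"],
    "Complex_CoT": ["Example reasoning"],
    "Response": ["Example answer"],
}

formatted = format_batch(toy_batch)
print(formatted["text"][0])
```

Each record becomes a single string in which the reasoning sits between the think tags and the final answer is followed by the EOS marker.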

LoRA Fine-Tuning Configuration

model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # rank of the low-rank matrices
    lora_alpha=16,  # LoRA scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
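As a rough sanity check on how small the trainable part is, the following sketch counts the LoRA parameters this configuration would add. The layer dimensions (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers) are assumptions taken from the published Llama 3.1 8B configuration, which we assume carries over to the distilled model; the helper name lora_params is hypothetical.

```python
r = 16  # rank from the configuration above

# Assumed Llama 3.1 8B dimensions (not read from the loaded model)
hidden, intermediate, kv_dim, num_layers = 4096, 14336, 1024, 32

# (in_features, out_features) of each targeted linear layer
module_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

def lora_params(in_features, out_features, rank):
    # A is rank x in_features, B is out_features x rank
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, r) for i, o in module_shapes.values())
total = per_layer * num_layers

print(f"{per_layer:,} per layer, {total:,} in total")
```

Under these assumptions the adapters add about 42 million trainable parameters, roughly 0.5% of the 8B base model.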

Training

We train with SFTTrainer from the TRL library:

from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # column produced by formatting_prompts_func
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 2 x 4 = 8 per device
        max_steps=60,
        learning_rate=2e-4,
        fp16=True,
        output_dir="outputs",
        logging_steps=1,
    ),
)

trainer.train()
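The training arguments above imply a fairly short run. A small arithmetic sketch, assuming a single GPU (num_devices = 1 is our assumption here) and the 500-record subset loaded earlier:

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60
num_examples = 500  # size of the train[:500] slice
num_devices = 1     # assumption: single-GPU training

# Samples contributing to each optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Total samples processed over the whole run
samples_seen = effective_batch * max_steps

# Fraction of the dataset covered
epochs = samples_seen / num_examples

print(f"effective batch: {effective_batch}, samples seen: {samples_seen}, epochs: {epochs:.2f}")
```

In other words, 60 steps cover just under one pass over the 500 records, so this run is better read as a quick pipeline check than as a fully converged fine-tune.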

Quick Inference Check After Fine-Tuning

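FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

# English gloss: "My sleep quality has been poor lately; I wake up easily at night
# and feel tired during the day. How should I manage this?"
question = "最近感觉睡眠质量差，晚上容易醒来，白天精神也不好，应该如何调理？"
inputs = tokenizer([question], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1024,
)

# The decoded text contains the prompt followed by the generated answer
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)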
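Since the model was trained on text in a fixed question/response layout, it may respond more consistently if the inference prompt follows the same layout and stops right after the opening think tag. The sketch below shows one way to build such a prompt; the helper name build_inference_prompt is hypothetical, and the header string is kept exactly as truncated in the training template rather than filled in.

```python
# Header is truncated in the training template; kept as-is, not reconstructed
PROMPT_HEADER = "Below is an instruction…"

def build_inference_prompt(question: str) -> str:
    """Format a question like the training samples, leaving the answer open."""
    return (
        f"{PROMPT_HEADER}\n"
        "### Question:\n"
        f"{question}\n\n"
        "### Response:\n"
        "<think>\n"
    )

prompt = build_inference_prompt(
    "最近感觉睡眠质量差，晚上容易醒来，白天精神也不好，应该如何调理？"
)
print(prompt)
```

The resulting string could then be passed to the tokenizer in place of the raw question, so that generation continues from the reasoning section.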

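From this quick check, the fine-tuned model's answers to medical questions appeared noticeably more logical and professional than those of the base model, which offers preliminary support for the LoRA approach and our overall fine-tuning pipeline.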

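Summary

This post described how our project fine-tunes DeepSeek-R1-Distill-Llama-8B efficiently using LoRA and medical reasoning data. This lays an important foundation for the later development and deployment of the "health management assistant" agent.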
