A Detailed Walkthrough of the Llama-factory Source Code

Date: 2024-07-10 22:12:34
init_adapter

Next, let's look at the details of the LoRA fine-tuning code:

from peft import LoraConfig, TaskType, get_peft_model

# peft_kwargs carries the LoRA hyper-parameters built from finetuning_args
# (e.g. r, lora_alpha, lora_dropout, target_modules)
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    modules_to_save=finetuning_args.additional_target,
    use_dora=finetuning_args.use_dora,
    **peft_kwargs,
)
model = get_peft_model(model, lora_config)
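
After wrapping, a quick sanity check is to print the trainable-parameter summary via PeftModel's print_trainable_parameters method (the numbers below are illustrative only):

# Only the LoRA matrices (plus any modules_to_save) should remain trainable
model.print_trainable_parameters()
# e.g. trainable params: 6,709,248 || all params: 1,843,537,920 || trainable%: 0.36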

Stepping in, the relevant class hierarchy is PeftModelForCausalLM(PeftModel), PeftModel(PushToHubMixin, torch.nn.Module), and LoraModel(BaseTuner).

Moving on to BaseTuner.__init__, it calls the inject_adapter function, which creates the adapter layers and substitutes them for the target modules. It is invoked automatically under the hood when peft.mapping.get_peft_model is called with a non-prompt-tuning adapter class.
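
Conceptually, inject_adapter boils down to the loop below (a simplified sketch that reuses the helpers shown later in this section; error handling, modules_to_save, and the trainable-parameter bookkeeping are omitted). The rest of this section steps through each part.

# Simplified sketch of what BaseTuner.inject_adapter does (illustrative, not the PEFT source verbatim)
def inject_adapter_sketch(lora_model, model, lora_config, adapter_name):
    key_list = [key for key, _ in model.named_modules()]
    for key in key_list:
        # skip modules whose names do not match lora_config.target_modules
        if not any(key.endswith(suffix) for suffix in lora_config.target_modules):
            continue
        # locate the parent module, the module to replace, and its attribute name
        parent, target, target_name = _get_submodules(model, key)
        # build the LoRA-wrapped replacement and swap it into the parent
        new_module = lora_model._create_new_module(lora_config, adapter_name, target)
        lora_model._replace_module(parent, target_name, new_module, target)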

key_list = [key for key, _ in model.named_modules()]
# key_list holds the names of the model's submodules, for example:
"""
'transformer.wte', 'transformer.drop', 'transformer.rotary_emb', 'transformer.h', 'transformer.h.0', 'transformer.h.0.ln_1', 'transformer.h.0.attn', 'transformer.h.0.attn.c_attn', 'transformer.h.0.attn.c_proj', 'transformer.h.0.attn...tn_dropout', 'transformer.h.0.ln_2', 'transformer.h.0.mlp'
"""

If target_modules in peft_config is set to 'c_attn', every block in the model whose name ends with the 'c_attn' suffix gets replaced.
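
A minimal illustration of that suffix match (hand-rolled here for clarity; PEFT's own check also supports passing a single regex string as target_modules):

# Illustrative suffix match for target_modules given as a list
target_modules = ["c_attn"]
keys = [
    "transformer.h.0.attn.c_attn",
    "transformer.h.0.attn.c_proj",
    "transformer.h.0.mlp.c_proj",
]
matched = [k for k in keys if any(k == t or k.endswith("." + t) for t in target_modules)]
print(matched)  # ['transformer.h.0.attn.c_attn']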

parent, target, target_name = _get_submodules(model, key)

print(parent)
QWenAttention(
  (c_attn): Linear(in_features=2048, out_features=6144, bias=True)
  (c_proj): Linear(in_features=2048, out_features=2048, bias=False)
  (attn_dropout): Dropout(p=0.0, inplace=False)
)

print(target)
Linear(in_features=2048, out_features=6144, bias=True)

print(target_name)
c_attn
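
_get_submodules itself is a tiny helper; a sketch of its behavior (paraphrased, built on nn.Module.get_submodule):

def _get_submodules_sketch(model, key):
    # parent: the module that owns the target, e.g. transformer.h.0.attn
    parent = model.get_submodule(".".join(key.split(".")[:-1]))
    # target_name: the attribute name on the parent, e.g. "c_attn"
    target_name = key.split(".")[-1]
    # target: the module that will be replaced, e.g. the c_attn Linear itself
    target = model.get_submodule(key)
    return parent, target, target_name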

The core of the next step is to create the new module and swap out the old one:

new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
self._replace_module(parent, target_name, new_module, target)

The new module that gets created looks like this:

# peft/tuners/lora/layer.py
lora.Linear(
  (base_layer): Linear(in_features=5120, out_features=15360, bias=True)
  (lora_dropout): ModuleDict(
    (default): Identity()
  )
  (lora_A): ModuleDict(
    (default): Linear(in_features=5120, out_features=16, bias=False)
  )
  (lora_B): ModuleDict(
    (default): Linear(in_features=16, out_features=15360, bias=False)
  )
  (lora_embedding_A): ParameterDict()
  (lora_embedding_B): ParameterDict()
)

For the concrete implementation details, see the LoraLayer class and its update_layer method.
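
In essence, update_layer registers the pieces seen in the printout above for a given adapter name. A hedged sketch (device placement and the rank-stabilized/DoRA variants are omitted, and the exact code differs across peft versions):

import math
import torch.nn as nn

# Sketch of what LoraLayer.update_layer sets up for one adapter (simplified)
def update_layer_sketch(layer, adapter_name, r, lora_alpha, lora_dropout):
    # dropout on the input of the low-rank path; Identity when p == 0 (as in the printout)
    layer.lora_dropout[adapter_name] = nn.Dropout(p=lora_dropout) if lora_dropout > 0.0 else nn.Identity()
    # A: in_features -> r, B: r -> out_features
    layer.lora_A[adapter_name] = nn.Linear(layer.in_features, r, bias=False)
    layer.lora_B[adapter_name] = nn.Linear(r, layer.out_features, bias=False)
    # the low-rank update is scaled by lora_alpha / r
    layer.scaling[adapter_name] = lora_alpha / r
    # A gets a Kaiming init while B starts at zero, so the adapter is a no-op at step 0
    nn.init.kaiming_uniform_(layer.lora_A[adapter_name].weight, a=math.sqrt(5))
    nn.init.zeros_(layer.lora_B[adapter_name].weight)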

The new module is then swapped into the parent module by _replace_module:

# peft/tuners/lora/model.py, LoraModel._replace_module
    def _replace_module(self, parent, child_name, new_module, child):
        setattr(parent, child_name, new_module)
        # the child layer may wrap the original module; unpack it
        if hasattr(child, "base_layer"):
            child = child.base_layer

        if not hasattr(new_module, "base_layer"):
            # plain replacement: carry over the original weight and bias
            new_module.weight = child.weight
            if hasattr(child, "bias"):
                new_module.bias = child.bias

        if getattr(child, "state", None) is not None:
            if hasattr(new_module, "base_layer"):
                new_module.base_layer.state = child.state
            else:
                new_module.state = child.state
            new_module.to(child.weight.device)

        # dispatch the LoRA sub-modules to the same device as the original weight
        for name, module in new_module.named_modules():
            if (self.prefix in name) or ("ranknum" in name):
                weight = child.qweight if hasattr(child, "qweight") else child.weight
                module.to(weight.device)

This code shows the module-replacement process. The line setattr(parent, child_name, new_module) assigns the new module new_module to the child_name attribute of the parent module parent, thereby replacing the original child module child.

child = child.base_layer: if child is itself a wrapper, this line re-points child at its base layer, i.e. the original module.

weight = child.qweight if hasattr(child, "qweight") else child.weight: this line checks whether the original module has a qweight attribute, which usually means its weights have been quantized; whichever weight exists determines the device onto which the LoRA sub-modules are moved.
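
For completeness, here is roughly what the replaced lora.Linear computes at runtime: the frozen base layer's output plus the scaled low-rank update (a simplified sketch for a single active adapter; the real forward also handles merging, multiple adapters, and DoRA):

# Simplified forward pass of a lora.Linear with one active adapter (illustrative)
def lora_linear_forward_sketch(layer, x):
    result = layer.base_layer(x)  # frozen pretrained weight: W x + b
    lora_A = layer.lora_A["default"]
    lora_B = layer.lora_B["default"]
    dropout = layer.lora_dropout["default"]
    # low-rank update: B(A(dropout(x))) scaled by lora_alpha / r
    result = result + lora_B(lora_A(dropout(x))) * layer.scaling["default"]
    return result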