init_adapter
Next, let's walk through the LoRA fine-tuning code in detail:
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    modules_to_save=finetuning_args.additional_target,
    use_dora=finetuning_args.use_dora,
    **peft_kwargs,
)
model = get_peft_model(model, lora_config)
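Here `peft_kwargs` is assembled from `finetuning_args` earlier in `init_adapter`. As a rough illustration (the concrete values below are assumptions for the sake of the example, not taken from the code above), it typically carries the rank, alpha, dropout, and target modules:

# Hypothetical example of what peft_kwargs might contain; the real values
# come from finetuning_args (lora_rank, lora_alpha, lora_dropout, lora_target).
peft_kwargs = {
    "r": 16,                       # LoRA rank
    "lora_alpha": 32,              # scaling numerator (scaling = lora_alpha / r)
    "lora_dropout": 0.05,          # dropout applied to the LoRA branch input
    "target_modules": ["c_attn"],  # module-name suffixes to wrap with LoRA
}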
Stepping in: PeftModelForCausalLM(PeftModel) → PeftModel(PushToHubMixin, torch.nn.Module) → LoraModel(BaseTuner). In BaseTuner's __init__ method, the inject_adapter function is executed. It creates the adapter layers and swaps them in for the target modules, and it is called automatically under the hood whenever peft.mapping.get_peft_model is invoked with a non-prompt-tuning adapter class.
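The overall loop inside inject_adapter can be summarized with a simplified sketch (the name inject_adapter_sketch is just for illustration, and error handling, modules_to_save, and tied-weight handling are omitted). The rest of this section walks through each of these steps:

# Simplified sketch of what BaseTuner.inject_adapter does (illustration only).
def inject_adapter_sketch(self, model, adapter_name, peft_config):
    key_list = [key for key, _ in model.named_modules()]
    for key in key_list:
        # skip modules whose names do not match peft_config.target_modules
        if not self._check_target_module_exists(peft_config, key):
            continue
        # locate the parent module, the target layer, and its attribute name
        parent, target, target_name = _get_submodules(model, key)
        # wrap the target in a LoRA layer ...
        new_module = self._create_new_module(peft_config, adapter_name, target)
        # ... and swap it in for the original layer
        self._replace_module(parent, target_name, new_module, target)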
key_list = [key for key, _ in model.named_modules()]
# key_list holds the names of every module in the model, for example:
"""
'transformer.wte', 'transformer.drop', 'transformer.rotary_emb', 'transformer.h', 'transformer.h.0', 'transformer.h.0.ln_1', 'transformer.h.0.attn', 'transformer.h.0.attn.c_attn', 'transformer.h.0.attn.c_proj', 'transformer.h.0.attn...tn_dropout', 'transformer.h.0.ln_2', 'transformer.h.0.mlp'
"""
If target_modules in the peft_config is set to 'c_attn', every module in the model whose name ends with the suffix 'c_attn' is replaced.
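The matching is essentially a suffix (or regex) check against each key in key_list. A minimal sketch of that logic, simplified from PEFT's check_target_module_exists (the function name matches_target is made up here, and layer-index filtering via layers_to_transform is ignored):

import re

# Minimal sketch of how a module key is matched against target_modules.
def matches_target(target_modules, key):
    if isinstance(target_modules, str):
        # a string is treated as a regex and must match the full key
        return re.fullmatch(target_modules, key) is not None
    # a list matches on the exact name or on the ".<name>" suffix
    return any(key == t or key.endswith(f".{t}") for t in target_modules)

matches_target(["c_attn"], "transformer.h.0.attn.c_attn")  # True
matches_target(["c_attn"], "transformer.h.0.attn.c_proj")  # False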
parent, target, target_name = _get_submodules(model, key)

print(parent)
# QWenAttention(
#   (c_attn): Linear(in_features=2048, out_features=6144, bias=True)
#   (c_proj): Linear(in_features=2048, out_features=2048, bias=False)
#   (attn_dropout): Dropout(p=0.0, inplace=False)
# )
print(target)
# Linear(in_features=2048, out_features=6144, bias=True)
print(target_name)
# c_attn
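_get_submodules is a small helper from peft.utils; its logic is essentially the following sketch, which splits the dotted key and resolves the parent and target with torch.nn.Module.get_submodule:

def _get_submodules(model, key):
    # parent is everything before the last dot, e.g. 'transformer.h.0.attn'
    parent = model.get_submodule(".".join(key.split(".")[:-1]))
    # target_name is the final component, e.g. 'c_attn'
    target_name = key.split(".")[-1]
    # target is the module itself, e.g. the nn.Linear that will be wrapped
    target = model.get_submodule(key)
    return parent, target, target_name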
The core of the next step is creating the new module and replacing the old one:
new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
self._replace_module(parent, target_name, new_module, target)
The new module that gets created:
# peft/tuners/lora/layer.py
lora.Linear(
  (base_layer): Linear(in_features=5120, out_features=15360, bias=True)
  (lora_dropout): ModuleDict(
    (default): Identity()
  )
  (lora_A): ModuleDict(
    (default): Linear(in_features=5120, out_features=16, bias=False)
  )
  (lora_B): ModuleDict(
    (default): Linear(in_features=16, out_features=15360, bias=False)
  )
  (lora_embedding_A): ParameterDict()
  (lora_embedding_B): ParameterDict()
)
For the concrete implementation details, see the LoraLayer class and its self.update_layer method.
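Roughly speaking, update_layer registers the rank, alpha, dropout, and the low-rank lora_A/lora_B pair under the adapter name. A conceptual sketch (the name update_layer_sketch is illustrative; weight initialisation, use_rslora, DoRA, and device placement are left out):

import torch.nn as nn

# Conceptual sketch of what LoraLayer.update_layer does for a Linear layer.
def update_layer_sketch(self, adapter_name, r, lora_alpha, lora_dropout):
    self.r[adapter_name] = r
    self.lora_alpha[adapter_name] = lora_alpha
    # dropout on the LoRA branch input (Identity when lora_dropout == 0)
    self.lora_dropout[adapter_name] = (
        nn.Dropout(p=lora_dropout) if lora_dropout > 0.0 else nn.Identity()
    )
    # the low-rank pair: A projects down to rank r, B projects back up
    self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False)
    self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=False)
    # the LoRA branch output is scaled by lora_alpha / r
    self.scaling[adapter_name] = lora_alpha / r

The actual swap is then performed by _replace_module: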
def _replace_module(self, parent, child_name, new_module, child):
    # swap the new LoRA-wrapped module in for the original child
    setattr(parent, child_name, new_module)

    # if child is itself a wrapper, unwrap it to the underlying base layer
    if hasattr(child, "base_layer"):
        child = child.base_layer

    # if the new module does not wrap the base layer, copy the weights over
    if not hasattr(new_module, "base_layer"):
        new_module.weight = child.weight
        if hasattr(child, "bias"):
            new_module.bias = child.bias

    # carry over any quantization/offload state and keep devices consistent
    if getattr(child, "state", None) is not None:
        if hasattr(new_module, "base_layer"):
            new_module.base_layer.state = child.state
        else:
            new_module.state = child.state
        new_module.to(child.weight.device)

    # move the LoRA sub-modules onto the same device as the original weights
    for name, module in new_module.named_modules():
        if (self.prefix in name) or ("ranknum" in name):
            weight = child.qweight if hasattr(child, "qweight") else child.weight
            module.to(weight.device)
This code shows the module-replacement process. The line setattr(parent, child_name, new_module) assigns the new module new_module to the attribute child_name of the parent module parent, thereby replacing the original child module child.
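As a standalone illustration (toy modules, not PEFT code), replacing a submodule via setattr is all it takes for the parent to route its forward pass through the new layer:

import torch
import torch.nn as nn

# Toy example: swapping a child module on a parent via setattr.
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.c_attn = nn.Linear(8, 8)

    def forward(self, x):
        return self.c_attn(x)

block = Block()
new_layer = nn.Linear(8, 8)          # stand-in for the LoRA-wrapped module
setattr(block, "c_attn", new_layer)  # block.c_attn now points at the new layer

x = torch.randn(1, 8)
assert block(x).shape == (1, 8)      # forward now runs through new_layer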
child = child.base_layer
If child is a wrapper, this line resets child to its base layer (the original module).
weight = child.qweight if hasattr(child, "qweight") else child.weight
This line checks whether the original module has a qweight attribute, which usually indicates that its weights have already been quantized; the device placement is then taken from the quantized weight.
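After the replacement, every forward pass through the wrapped layer adds the low-rank update on top of the frozen base projection. A conceptual sketch of what lora.Linear.forward computes (the name lora_forward_sketch is illustrative; merged weights, multi-adapter bookkeeping, and DoRA are ignored):

import torch

# Conceptual sketch of the LoRA forward pass after replacement.
def lora_forward_sketch(self, x: torch.Tensor) -> torch.Tensor:
    result = self.base_layer(x)  # frozen pretrained projection, e.g. c_attn
    for adapter in self.active_adapters:
        lora_A = self.lora_A[adapter]
        lora_B = self.lora_B[adapter]
        dropout = self.lora_dropout[adapter]
        scaling = self.scaling[adapter]
        # low-rank update: W x + B A x * (lora_alpha / r)
        result = result + lora_B(lora_A(dropout(x))) * scaling
    return result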