Our fine-tuned model consists of two parts: the base model and the LoRA adapter. The two need to be converted separately and then merged at the end.
3.1 Base model conversion
First, convert the base model with the convert_hf_to_gguf.py tool:
Note: convert_hf_to_gguf.py is a utility script provided by llama.cpp (located in its installation directory). It converts safetensors models downloaded from Hugging Face into GGUF files.
!python /data2/downloads/llama.cpp/convert_hf_to_gguf.py \
--outtype bf16 \
--outfile /data2/anti_fraud/models/anti_fraud_v11/qwen2_bf16.gguf \
--model-name qwen2 \
/data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct
Parameter descriptions:
- outtype: output precision of the model weights; bf16 means 16-bit bfloat16 (brain floating point);
- outfile: path of the output model file;
- model-name: name of the model.
INFO:hf-to-gguf:Loading model: Qwen2-1___5B-Instruct
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> BF16, shape = {1536, 151936}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> BF16, shape = {8960, 1536}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> BF16, shape = {1536, 8960}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_k.bias, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> BF16, shape = {1536, 256}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> BF16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.0.attn_q.bias, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> BF16, shape = {1536, 1536}
INFO:hf-to-gguf:blk.0.attn_v.bias, torch.bfloat16 --> F32, shape = {256}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> BF16, shape = {1536, 256}
……
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 32768
INFO:hf-to-gguf:gguf: embedding length = 1536
INFO:hf-to-gguf:gguf: feed forward length = 8960
INFO:hf-to-gguf:gguf: head count = 12
INFO:hf-to-gguf:gguf: key-value head count = 2
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 151387 merge(s).
INFO:gguf.vocab:Setting special token type eos to 151645
INFO:gguf.vocab:Setting special token type pad to 151643
INFO:gguf.vocab:Setting special token type bos to 151643
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful assistant.<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/data2/anti_fraud/models/anti_fraud_v11/qwen2_bf16.gguf: n_tensors = 338, total_size = 3.1G
Writing: 100%|██████████████████████████| 3.09G/3.09G [00:39<00:00, 77.9Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /data2/anti_fraud/models/anti_fraud_v11/qwen2_bf16.gguf
Once this finishes, we have a GGUF file for the base model: qwen2_bf16.gguf.
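Optionally, you can sanity-check the exported file with the gguf Python package that ships with llama.cpp (gguf-py, also available via pip install gguf). The sketch below is an assumption about that reader API rather than part of the original workflow; attribute names may differ slightly between llama.cpp versions.
# Optional sanity check of the exported GGUF file (assumes the gguf package is installed).
from gguf import GGUFReader

reader = GGUFReader("/data2/anti_fraud/models/anti_fraud_v11/qwen2_bf16.gguf")

# Metadata keys written by convert_hf_to_gguf.py (architecture, context length, tokenizer, ...).
print("metadata keys:", len(reader.fields))
for key in list(reader.fields)[:8]:
    print(" ", key)

# Tensor list; the conversion log above reported n_tensors = 338.
print("tensors:", len(reader.tensors))
for t in reader.tensors[:3]:
    print(" ", t.name, list(t.shape), t.tensor_type.name)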
3.2 LoRA adapter conversion
Next, use the convert_lora_to_gguf.py script to convert the LoRA adapter.
!python /data2/downloads/llama.cpp/convert_lora_to_gguf.py \
--base /data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct \
--outfile /data2/anti_fraud/models/anti_fraud_v11/lora_0913_4_bf16.gguf \
/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0913_4/checkpoint-5454 \
--outtype bf16 --verbose
- base: path of the base model, needed so that the converted LoRA adapter can later be merged with it correctly;
- outfile: output file for the converted adapter;
- outtype: output precision, bf16 to match the base model;
- checkpoint-5454 is the directory of the LoRA adapter to convert.
INFO:lora-to-gguf:Loading base model: Qwen2-1___5B-Instruct
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:lora-to-gguf:Exporting model...
INFO:hf-to-gguf:blk.0.ffn_down.weight.lora_a, torch.float32 --> BF16, shape = {8960, 16}
INFO:hf-to-gguf:blk.0.ffn_down.weight.lora_b, torch.float32 --> BF16, shape = {16, 1536}
INFO:hf-to-gguf:blk.0.ffn_gate.weight.lora_a, torch.float32 --> BF16, shape = {1536, 16}
INFO:hf-to-gguf:blk.0.ffn_gate.weight.lora_b, torch.float32 --> BF16, shape = {16, 8960}
INFO:hf-to-gguf:blk.0.ffn_up.weight.lora_a, torch.float32 --> BF16, shape = {1536, 16}
INFO:hf-to-gguf:blk.0.ffn_up.weight.lora_b, torch.float32 --> BF16, shape = {16, 8960}
……
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/data2/anti_fraud/models/anti_fraud_v11/lora_0913_4_bf16.gguf: n_tensors = 392, total_size = 36.9M
Writing: 100%|██████████████████████████| 36.9M/36.9M [00:01<00:00, 21.4Mbyte/s]
INFO:lora-to-gguf:Model successfully exported to /data2/anti_fraud/models/anti_fraud_v11/lora_0913_4_bf16.gguf
Once this finishes, we have a GGUF file for the LoRA adapter: lora_0913_4_bf16.gguf.
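Before merging, it is worth confirming the adapter's rank and alpha, because they determine the scale that llama-export-lora applies when folding the adapter into the base weights (see the calculated_scale line in the merge log below). A minimal sketch, assuming the checkpoint directory contains a standard PEFT adapter_config.json:
# Read the LoRA hyperparameters from the PEFT adapter config (standard PEFT key names assumed).
import json

cfg_path = "/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0913_4/checkpoint-5454/adapter_config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

r = cfg["r"]                  # LoRA rank
alpha = cfg["lora_alpha"]     # LoRA scaling numerator
print("rank =", r, "alpha =", alpha, "scale =", alpha / r)
print("target_modules =", cfg["target_modules"])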
3.3 Merge
Use the llama-export-lora tool to merge the base model and the LoRA adapter into a single GGUF file.
!/data2/downloads/llama.cpp/llama-export-lora \
-m /data2/anti_fraud/models/anti_fraud_v11/qwen2_bf16.gguf \
-o /data2/anti_fraud/models/anti_fraud_v11/model_bf16.gguf \
--lora /data2/anti_fraud/models/anti_fraud_v11/lora_0913_4_bf16.gguf
- -m: the base model's GGUF file;
- --lora: the LoRA adapter's GGUF file;
- -o: the merged output model file.
file_input: loaded gguf from /data2/anti_fraud/models/anti_fraud_v11/qwen2_bf16.gguf
file_input: loaded gguf from /data2/anti_fraud/models/anti_fraud_v11/lora_0913_4_bf16.gguf
copy_tensor : blk.0.attn_k.bias [256, 1, 1, 1]
merge_tensor : blk.0.attn_k.weight [1536, 256, 1, 1]
merge_tensor : + dequantize base tensor from bf16 to F32
merge_tensor : + merging from adapter[0] type=bf16
merge_tensor : input_scale=1.000000 calculated_scale=2.000000 rank=16
merge_tensor : + output type is f16
copy_tensor : blk.0.attn_norm.weight [1536, 1, 1, 1]
merge_tensor : blk.0.attn_output.weight [1536, 1536, 1, 1]
……
copy_tensor : output_norm.weight [1536, 1, 1, 1]
copy_tensor : token_embd.weight [1536, 151936, 1, 1]
run_merge : merged 196 tensors with lora adapters
run_merge : wrote 338 tensors to output file
done, output file is /data2/anti_fraud/models/anti_fraud_v11/model_bf16.gguf
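The merge_tensor lines show what the merge does numerically: for each weight that has a LoRA pair, the two low-rank matrices are multiplied together, scaled by alpha/rank (the calculated_scale=2.000000 above; with rank 16 that would correspond to lora_alpha = 32), and added to the base weight. Below is an illustrative NumPy sketch of the same arithmetic, not llama.cpp's actual implementation; the shapes follow blk.0.ffn_down from the logs (GGUF prints dimensions in the reverse order of the PyTorch convention used here), and the alpha value is an assumption inferred from the log.
# Illustrative reproduction of merge_tensor's arithmetic for one weight; values are random placeholders.
import numpy as np

out_features, in_features, rank = 1536, 8960, 16
alpha = 2.0 * rank                            # assumption: calculated_scale = alpha / rank = 2.0

W_base = np.random.randn(out_features, in_features).astype(np.float32)   # base ffn_down weight
lora_a = np.random.randn(rank, in_features).astype(np.float32)           # lora_a: rank x in_features
lora_b = np.random.randn(out_features, rank).astype(np.float32)          # lora_b: out_features x rank

scale = alpha / rank
W_merged = W_base + scale * (lora_b @ lora_a)    # same shape as the base weight
print(W_merged.shape)                            # (1536, 8960)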
Check the exported files:
-rw-rw-r-- 1 42885408 Nov 9 14:57 lora_0913_4_bf16.gguf
-rw-rw-r-- 1 3093666720 Nov 9 14:58 model_bf16.gguf
-rw-rw-r-- 1 3093666720 Nov 9 14:56 qwen2_bf16.gguf
With these three steps, the safetensors base model and LoRA adapter have been exported as a single GGUF model file, model_bf16.gguf. The file size has not changed at this point; it is still about 3 GB.
Use the llama-cli command to verify that this model file works.
llama-cli is a command-line interface that lets you start a model and run inference against it with a single command; it is handy for quick testing and debugging.
!/data2/downloads/llama.cpp/llama-cli --log-disable \
-m /data2/anti_fraud/models/anti_fraud_v11/model_bf16.gguf \
-p "我是一个来自太行山下小村庄家的孩子" \
-n 100
我是一个来自太行山下小村庄家的孩子,我叫李丽丽。我是一个很平凡的女孩,我平凡得像一颗小草,平凡得像一滴水,平凡得像一粒沙。但我有一颗不平凡的心,我有我独特的个性,我有我灿烂的微笑。
- -m: path of the model file to use;
- -p: the starting prompt for text generation;
- -n: maximum number of tokens to generate;
- --log-disable: suppress extra logging and print only the generated text.
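As an alternative to llama-cli, the same merged GGUF file can be loaded from Python through the llama-cpp-python bindings. This is a separate pip package and not part of the workflow above; a minimal sketch, assuming it is installed:
# Load the merged GGUF model from Python (assumes: pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="/data2/anti_fraud/models/anti_fraud_v11/model_bf16.gguf",
    n_ctx=2048,       # context window for this session
    verbose=False,    # suppress load-time logging
)

out = llm("我是一个来自太行山下小村庄家的孩子", max_tokens=100)
print(out["choices"][0]["text"])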