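1. Each time you send a request that loads the model, set the keep_alive parameter to specify how long the model should stay loaded in memory.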
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"stream": false,
"keep_alive": "24h"
}'
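The same request can be sent from Python. The following is a minimal sketch, assuming a local Ollama server at the URL used in the curl example; the helper names build_payload and generate are hypothetical and chosen only for illustration. It uses only the standard library so that it runs without extra packages.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, keep_alive="24h"):
    """Build the JSON body for /api/generate with an explicit keep_alive."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,           # ask for a single JSON response
        "keep_alive": keep_alive,  # how long the model stays loaded afterwards
    }


def generate(model, prompt, keep_alive="24h", url=OLLAMA_URL, timeout=300):
    """POST the request and return the decoded JSON response as a dict."""
    body = json.dumps(build_payload(model, prompt, keep_alive)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (requires a running Ollama server with the model pulled):
# result = generate("llama2", "Why is the sky blue?")
# print(result.get("response"))
```

Keeping the payload construction separate from the network call makes it easy to check the request body without a running server.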
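2. Alternatively, send a load request every 280 seconds. By default Ollama unloads an idle model after five minutes, and a load request for a model that is already in memory takes only about 1 ms, so this approach is also workable: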
import requests
import time
from datetime import datetime
import pytz

def get_bj_time():
    # Current time in the Beijing time zone, formatted for logging
    beijing_tz = pytz.timezone('Asia/Shanghai')
    return datetime.now(beijing_tz).strftime("%Y-%m-%d %H:%M:%S")

while True:
    # A request without a prompt only loads the model and resets its keep_alive timer
    data = {"model": "qwen:7b", "keep_alive": "5m"}
    headers = {'Content-Type': 'application/json'}
    high_precision_time = time.perf_counter()
    response = requests.post('http://localhost:11434/api/generate', json=data, headers=headers)
    high_precision_time_end = time.perf_counter()
    time1 = high_precision_time_end - high_precision_time
    print(f"Elapsed time in ms (microsecond precision): {time1*1000:.6f}")
    jsonResponse = response.content.decode('utf-8')  # decode bytes to str for printing
    print(jsonResponse)
    print(f"Current Beijing time: {get_bj_time()}")
    time.sleep(280)  # wait 280 seconds before the next request
'''
7b: first load time 3.867187177 s, second load time 0.766666 ms
14b: first load time 5.180146173 s, second load time 0.753414 ms
72b: first load time 16.991763358 s, second load time 1.358505 ms