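ollama is a convenient tool for demoing large models, but sometimes we need to change its configuration (for example, how long a model stays resident on the GPU). First, run: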
ollama serve -h
The help output lists the environment variables that can be set:
Environment Variables:
  OLLAMA_DEBUG              Show additional debug information (e.g. OLLAMA_DEBUG=1)
  OLLAMA_HOST               IP Address for the ollama server (default 127.0.0.1:11434)
  OLLAMA_KEEP_ALIVE         The duration that models stay loaded in memory (default "5m")
  OLLAMA_MAX_LOADED_MODELS  Maximum number of loaded models (default 1)
  OLLAMA_MAX_QUEUE          Maximum number of queued requests
  OLLAMA_MODELS             The path to the models directory
  OLLAMA_NUM_PARALLEL       Maximum number of parallel requests (default 1)
  OLLAMA_NOPRUNE            Do not prune model blobs on startup
  OLLAMA_ORIGINS            A comma separated list of allowed origins
  OLLAMA_TMPDIR             Location for temporary files
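To see which of these settings the current environment would hand to a newly started server, a small Python sketch like the one below may help. The names HELP_DEFAULTS and effective_settings are made up for this note, and the defaults are simply copied from the help text above, so they may differ in other ollama versions.

```python
import os
from typing import Dict, Mapping

# Defaults copied from the `ollama serve -h` output quoted in this post.
# Assumption: other ollama versions may print different defaults.
HELP_DEFAULTS = {
    "OLLAMA_HOST": "127.0.0.1:11434",
    "OLLAMA_KEEP_ALIVE": "5m",
    "OLLAMA_MAX_LOADED_MODELS": "1",
    "OLLAMA_NUM_PARALLEL": "1",
}


def effective_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Overlay any non-empty OLLAMA_* variables from env on the documented defaults."""
    settings = dict(HELP_DEFAULTS)
    for name, value in env.items():
        if name.startswith("OLLAMA_") and value != "":
            settings[name] = value
    return settings


if __name__ == "__main__":
    # Print what this process sees; an already running server may have
    # been started with a different environment.
    for name, value in sorted(effective_settings(os.environ).items()):
        print(f"{name}={value}")
```

This only reports what the current process sees, so it is a quick sanity check rather than proof of what a running server is using.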
To change how long a model stays resident, set OLLAMA_KEEP_ALIVE. But what unit does this environment variable use? See this page: /ollama/ollama/blob/main/docs/#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately
As the page shows, the duration can be specified in several ways:
- a duration string (such as "10m" or "24h")
- a number in seconds (such as 3600)
- any negative number which will keep the model loaded in memory (e.g. -1 or "-1m")
- '0' which will unload the model immediately after generating a response
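The four rules above can be sketched in Python as follows. This is only an illustration of how the values read, not ollama's own parser: the function name interpret_keep_alive is invented here, and the sketch assumes duration strings use only the h, m and s units.

```python
import re
from datetime import timedelta
from typing import Optional, Union

# Assumption: only hours, minutes and seconds appear in duration strings.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(h|m|s)")


def interpret_keep_alive(value: Union[str, int, float]) -> Optional[timedelta]:
    """Return None for "keep loaded indefinitely", else the time to stay loaded.

    A zero timedelta means "unload right after generating a response".
    """
    text = str(value).strip()
    try:
        # A bare number is a count of seconds (such as 3600).
        seconds = float(text)
    except ValueError:
        # Otherwise treat it as a duration string such as "10m", "24h" or "-1m".
        if text.startswith("-"):
            sign, body = -1.0, text[1:]
        else:
            sign, body = 1.0, text
        parts = _PART.findall(body)
        if not parts or "".join(n + u for n, u in parts) != body:
            raise ValueError(f"unrecognised keep-alive value: {value!r}")
        seconds = sign * sum(float(n) * _UNIT_SECONDS[u] for n, u in parts)
    if seconds < 0:
        # Any negative value keeps the model loaded in memory.
        return None
    return timedelta(seconds=seconds)
```

For instance, interpret_keep_alive("10m") gives a ten-minute timedelta, interpret_keep_alive(-1) gives None, and interpret_keep_alive("0") gives a zero timedelta.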
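For example, in the Windows environment variables we can set OLLAMA_KEEP_ALIVE to 1h and OLLAMA_NUM_PARALLEL to 2. The server then accepts two concurrent requests and keeps the model resident for 1h (ollama ps will show 59 minutes, presumably because the countdown has already started). That is all for this quick note.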
One more thing: I found that on Windows these environment variables only really take effect after the system has been rebooted.
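If changing the system-wide variables is inconvenient, one alternative is to start the server from a script and pass the two variables to that process only. Here is a minimal Python sketch. The helper names build_ollama_env and start_server are invented for this note, and it assumes the ollama executable is on PATH and no other ollama server (for example the tray app) already holds the port.

```python
import os
import subprocess
from typing import Dict, Mapping


def build_ollama_env(
    base_env: Mapping[str, str],
    keep_alive: str = "1h",
    num_parallel: int = 2,
) -> Dict[str, str]:
    """Copy base_env and set the two variables discussed in this post."""
    env = dict(base_env)  # copy, so the caller's mapping is left untouched
    env["OLLAMA_KEEP_ALIVE"] = keep_alive
    env["OLLAMA_NUM_PARALLEL"] = str(num_parallel)
    return env


def start_server(keep_alive: str = "1h", num_parallel: int = 2) -> subprocess.Popen:
    """Launch `ollama serve` with the overrides applied to that process only.

    Assumption: `ollama` is on PATH and the default port is free.
    """
    env = build_ollama_env(os.environ, keep_alive, num_parallel)
    return subprocess.Popen(["ollama", "serve"], env=env)


# Usage (not run automatically):
#   server = start_server("1h", 2)
#   ...
#   server.terminate()
```

A child process receives the environment it is given at launch, so a server started this way should see the new values without relying on the system settings. I have not compared this against the reboot behaviour described above.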