PyPI is slow. How do I run my own server?

Date: 2022-01-15 07:11:38

When a new developer joins the team, or Jenkins runs a complete build, I need to create a fresh virtualenv. Setting up a virtualenv with Pip and a large number of requirements (more than 10) often takes a very long time, because everything is installed from PyPI. Often it fails altogether with:

Downloading/unpacking Django==1.4.5 (from -r requirements.pip (line 1))
Exception:
Traceback (most recent call last):
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/basecommand.py", line 107, in main
    status = self.run(options, args)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/commands/install.py", line 256, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/req.py", line 1018, in prepare_files
    self.unpack_url(url, location, self.is_download)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/req.py", line 1142, in unpack_url
    retval = unpack_http_url(link, location, self.download_cache, self.download_dir)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/download.py", line 463, in unpack_http_url
    download_hash = _download_url(resp, link, temp_location)
  File "/var/lib/jenkins/jobs/hermes-web/workspace/web/.venv/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg/pip/download.py", line 380, in _download_url
    chunk = resp.read(4096)
  File "/usr/lib64/python2.6/socket.py", line 353, in read
    data = self._sock.recv(left)
  File "/usr/lib64/python2.6/httplib.py", line 538, in read
    s = self.fp.read(amt)
  File "/usr/lib64/python2.6/socket.py", line 353, in read
    data = self._sock.recv(left)
timeout: timed out

I'm aware of Pip's --use-mirrors flag, and people on my team have sometimes worked around the problem by using --index-url http://f.pypi.python.org/simple (or another mirror) until they find a mirror that responds in a timely fashion. We're in the UK, but there's a PyPI mirror in Germany, and we don't have issues downloading data from other sites.

So, I'm looking at ways to mirror PyPI internally for our team.

The options I've looked at are:

  1. Running my own PyPI instance. There's the official PyPI implementation, CheeseShop, as well as several third-party implementations such as djangopypi and pypiserver (see footnotes).

    The problem with this approach is that I'm not interested in full PyPI functionality with file upload; I just want to mirror the content it provides.

  2. Running a PyPI mirror with pep381client or pypi-mirror.

    This looks like it could work, but it requires my mirror to download everything from PyPI first. I've set up a test instance of pep381client, but my download speed varies between 5 Kb/s and 200 Kb/s (bits, not bytes). Unless there's a copy of the full PyPI archive somewhere, it will take me weeks to have a useful mirror.

  3. Using a PyPI round-robin proxy such as yopypi.

    This is irrelevant now that http://pypi.python.org itself consists of several geographically distinct servers.

  4. Copying a virtualenv around between developers, or hosting a folder of the current project's dependencies.

    This doesn't scale: we have several different Python projects whose dependencies change (slowly) over time. As soon as the dependencies of any project change, the central folder must be updated to add the new dependencies. Copying the virtualenv is worse than copying the packages, though, since any Python packages with C modules need to be compiled for the target system. Our team has both Linux and OS X users.

    (This still looks like the best option of a bad bunch.)

  5. Using an intelligent PyPI caching proxy: collective.eggproxy

    This seems like it would be a very good solution, but the last version on PyPI is dated 2009 and discusses mod_python.

What do other large Python teams do? What's the best way to quickly install the same set of Python packages?

Footnotes:

  • I've seen the question "How to roll my own PyPI?", but that question relates to hosting private code.
  • The Python wiki lists alternative PyPI implementations.
  • I've also recently discovered Crate.io, but I don't believe that helps me when using Pip.
  • There's a website monitoring PyPI mirror status.
  • Some packages on PyPI have their files hosted elsewhere, so even a perfect mirror won't help with all dependencies.

5 Answers

#1


27  

Do you have a shared filesystem?

If so, I would use pip's download cache setting. It's pretty simple: make a folder called pip-cache, for example in /mnt.

mkdir /mnt/pip-cache

Then each developer would put the following lines into their pip config (Unix: $HOME/.pip/pip.conf, Windows: %HOME%\pip\pip.ini):

[global]
download-cache = /mnt/pip-cache

pip still checks PyPI and looks for the latest version, then checks whether that version is in the cache. If so, it installs it from there. If not, it downloads it, stores it in the cache, and installs it. So each package would only be downloaded once per new version.
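
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the idea, not pip's actual code: cached_fetch and the download callable are hypothetical names, and keying the cache file on the URL-quoted download URL is an assumption made for the sketch.

```python
import os
import tempfile
from urllib.parse import quote


def cached_fetch(url, cache_dir, download):
    """Return the bytes for url, downloading only on a cache miss.

    `download` is a hypothetical callable (url -> bytes) standing in for
    the real HTTP request.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Derive a filesystem-safe file name from the full URL.
    cache_path = os.path.join(cache_dir, quote(url, safe=""))
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return fh.read()  # cache hit: no network traffic
    data = download(url)  # cache miss: fetch once...
    with open(cache_path, "wb") as fh:
        fh.write(data)  # ...and store it for the next caller
    return data


if __name__ == "__main__":
    calls = []

    def fake_download(url):
        # Stand-in for a real download; records how often it is called.
        calls.append(url)
        return b"package bytes"

    cache = tempfile.mkdtemp()
    url = "https://example.invalid/packages/Django-1.4.5.tar.gz"
    cached_fetch(url, cache, fake_download)
    cached_fetch(url, cache, fake_download)
    print(len(calls))  # the second call is served from the cache
```

On a folder shared by many machines, two developers could write the same file at once; writing to a temporary name and renaming it into place would be a reasonable refinement.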

#2


9  

While it doesn't solve your PyPI problem, handing built virtualenvs to developers (or deployments) can be done with Terrarium.

Use terrarium to package up, compress, and save virtualenvs. You can store them locally or even store them on S3. From the documentation on GitHub:

$ pip install terrarium
$ terrarium --target testenv --storage-dir /mnt/storage install requirements.txt

After building a fresh environment, terrarium will archive and compress the environment, and then copy it to the location specified by storage-dir.

On subsequent installs for the same requirement set that specify the same storage-dir, terrarium will copy and extract the compressed archive from /mnt/storage.

To display exactly how terrarium will name the archive, you can run the following command:

$ terrarium key requirements.txt more_requirements.txt
x86_64-2.6-c33a239222ddb1f47fcff08f3ea1b5e1
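
The key shown above looks like architecture, Python version, and a 32-character hex digest. As a rough sketch of how such a key could be derived (terrarium's real algorithm may well differ; environment_key is a hypothetical helper, and hashing the sorted requirement lines with MD5 is an assumption):

```python
import hashlib
import platform
import sys


def environment_key(requirement_texts):
    """Build an 'arch-pyversion-digest' key for a set of requirements.

    Hypothetical helper: terrarium's real algorithm may differ.
    `requirement_texts` is a list of requirements-file contents.
    """
    lines = []
    for text in requirement_texts:
        for line in text.splitlines():
            line = line.strip()
            # Ignore blank lines and comments so they don't change the key.
            if line and not line.startswith("#"):
                lines.append(line)
    # Sort so the key does not depend on file or line order.
    joined = "\n".join(sorted(lines)).encode("utf-8")
    digest = hashlib.md5(joined).hexdigest()
    version = "%d.%d" % sys.version_info[:2]
    return "-".join([platform.machine(), version, digest])


if __name__ == "__main__":
    print(environment_key(["Django==1.4.5\n", "requests==1.0\n"]))
```

The point of a key like this is that two machines with the same architecture, Python version, and requirement set land on the same archive name, so the second one can reuse what the first one built.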

#3


7  

I recently installed devpi into my development team's Vagrant configuration such that its package cache lives on the host's file system. This allows each VM to have its own devpi-server daemon that it uses as the index-url for virtualenv/pip. When the VMs are destroyed and reprovisioned, the packages don't have to be downloaded over and over. Each developer downloads them one time to build their local cache, and it lasts for as long as the files live on the host's file system.

We also have an internal PyPI index for our private packages that's currently just a directory being served by Apache. Ultimately, I'm going to convert that to a devpi proxy server as well, so our build server will also maintain a package cache for our Python dependencies in addition to hosting our private libraries. This will create an additional buffer between our development environment, production deployments and the public PyPI.

This seems to be the most robust solution I've found to these requirements to date.
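
For illustration, here is a small Python sketch that generates the pip config text pointing pip at such an index. The function name is hypothetical, and the URL is an assumption (devpi-server's default port and its root/pypi mirror index on localhost); substitute whatever your daemon actually serves.

```python
import configparser
import io


def pip_config_for_index(index_url):
    """Return pip.conf text whose [global] section points at index_url."""
    # interpolation=None keeps any '%' in the URL from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Assumed URL: devpi-server's default port and root/pypi mirror index.
    print(pip_config_for_index("http://localhost:3141/root/pypi/+simple/"))
```

Writing the result to the VM's pip config during provisioning would mean every pip install inside the VM goes through the local cache without anyone passing --index-url by hand.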

#4


3  

Take a look at David Wolever's pip2pi. You can just set up a cron job to keep a company- or team-wide mirror of the packages you need, and then point your pip installs at your internal mirror.
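
To give a feel for what a mirror of this kind amounts to, here is a minimal sketch that turns a folder of downloaded archives into a static "simple" index that pip can read. It is not pip2pi's code: build_simple_index is a hypothetical name, and the sketch assumes sdist-style file names such as Django-1.4.5.tar.gz (wheel file names would need different parsing).

```python
import html
import os
import re


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def build_simple_index(package_dir):
    """Write package_dir/simple/<project>/index.html for each archive.

    Assumes sdist-style file names such as 'Django-1.4.5.tar.gz', where
    the project name is everything before the last '-'.
    Returns a dict mapping normalized project names to file names.
    """
    projects = {}
    for filename in sorted(os.listdir(package_dir)):
        path = os.path.join(package_dir, filename)
        # Skip sub-directories (including 'simple' itself) and odd names.
        if not os.path.isfile(path) or "-" not in filename:
            continue
        project = normalize(filename.rsplit("-", 1)[0])
        projects.setdefault(project, []).append(filename)

    simple_dir = os.path.join(package_dir, "simple")
    for project, filenames in projects.items():
        project_dir = os.path.join(simple_dir, project)
        os.makedirs(project_dir, exist_ok=True)
        # Links are relative: simple/<project>/ -> package_dir/<file>.
        links = "\n".join(
            '<a href="../../%s">%s</a><br/>' % (html.escape(f), html.escape(f))
            for f in filenames
        )
        with open(os.path.join(project_dir, "index.html"), "w") as fh:
            fh.write("<html><body>\n%s\n</body></html>\n" % links)
    return projects
```

Serve package_dir with any static web server and pass the URL of its simple/ directory to pip's --index-url option; the cron job then only has to download new archives into the folder and rerun the index step.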

#5


-1  

Set up your local server, then modify the local computer's hosts file so that the actual hostname points to the local server instead, bypassing the standard DNS lookup. Delete the line from the hosts file when you are done.

Or I suppose you could find the URL in pip and modify that.
