Retrieving the list of external C/Fortran dependencies of Python libraries in a virtualenv

Time: 2021-10-22 00:51:30

I'm maintaining a project that uses a number of Python libraries such as numpy, pandas, and netcdf4, which in turn have external dependencies such as libhdf5, ATLAS, LAPACK, etc. I installed those system libraries via my system package manager before installing the Python packages with pip. Now I need to list all of the required dependencies, including the C/Fortran ones. (The Python side is easy with pip freeze and pipdeptree, of course.) Is there any way to show which linked C/Fortran libraries are being used? Failing that, is there any way to show the build options of the Python libraries that have C dependencies?

EDIT: this answer details how to do this for numpy and perhaps other libraries with C dependencies through ldd. What's the recommended approach across the board?

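For reference, here is a minimal sketch of the ldd approach from that answer, applied to every compiled extension module installed in a virtualenv (the .venv path is an assumption; adjust it to your layout):

# List the resolved shared-library dependencies of all compiled extension modules
find .venv -name '*.so' | xargs -r ldd 2>/dev/null | awk '/=> \// {print $3}' | sort | uniq
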
1 Answer

#1


On Linux systems you can get the dynamic linker to dump all sorts of debug information, from which this kind of dependency data can be gathered (see ld.so(8)). For example, I have a Python program called plot_all, and if I invoke it as:

LD_DEBUG=libs plot_all 2> ld-libs-output

then the dynamic linker will write all of its library-loading information into the file ld-libs-output. This encompasses every dynamic library that has to be loaded for the program to run. If further processed, e.g.:

grep "calling init" ld-libs-output | cut -f3 -d: | sort | uniq > LDLibs

you will get a sorted list of all the unique libraries loaded in the course of executing the Python script. If you want to turn this into dependency information, you can use your distribution's tools for mapping files to packages. On Gentoo, I can query the packages that own these libraries with a command like:

# LDLibs already holds the de-duplicated library paths from the previous step
equery -q b -n $(cat LDLibs) | sort | uniq

The output of this command is a sorted list of all the packages that own at least one of the libraries dynamically loaded during my script execution:

app-arch/bzip2
app-arch/lz4
app-arch/xz-utils
dev-lang/python
dev-libs/expat
dev-libs/glib
dev-libs/icu
dev-libs/libffi
dev-libs/libpcre
dev-libs/libxml2
dev-libs/openssl
dev-python/h5py
dev-python/matplotlib
dev-python/mpi4py
dev-python/numpy
dev-python/pillow
dev-python/PyQt5
dev-python/sip
dev-qt/qtcore
dev-qt/qtdbus
dev-qt/qtgui
dev-qt/qtsvg
dev-qt/qtwidgets
media-gfx/graphite2
media-libs/fontconfig
media-libs/freetype
media-libs/harfbuzz
media-libs/jpeg
media-libs/libpng
media-libs/openjpeg
media-libs/tiff
sci-libs/blas-reference
sci-libs/cblas-reference
sci-libs/hdf5
sci-libs/lapack-reference
sci-libs/szip
sys-apps/attr
sys-apps/dbus
sys-apps/hwloc
sys-apps/systemd
sys-apps/util-linux
sys-cluster/openmpi
sys-devel/gcc
sys-libs/glibc
sys-libs/libcap
sys-libs/zlib
sys-process/numactl
x11-drivers/nvidia-drivers
x11-libs/libICE
x11-libs/libpciaccess
x11-libs/libSM
x11-libs/libX11
x11-libs/libXau
x11-libs/libxcb
x11-libs/libXcursor
x11-libs/libXdmcp
x11-libs/libXext
x11-libs/libXfixes
x11-libs/libXi
x11-libs/libxkbcommon
x11-libs/libXrender
x11-libs/xcb-util
x11-libs/xcb-util-image
x11-libs/xcb-util-keysyms
x11-libs/xcb-util-renderutil
x11-libs/xcb-util-wm

This list is quite verbose, and you can see it reaches quite deep into the dependency tree, well past what we really need, and includes some entries that are environment dependent (e.g. the dependency on nvidia-drivers would not apply to someone without an NVIDIA graphics card). To turn this into a useful list you would have to look at the dependency graph and depend only on the top-level packages, as those will implicitly pull in the ones below them. Analyzing the first-level dependencies of these packages, all of them can be pulled in with the following minimal list (a rough way to automate this pruning is sketched after the list):

dev-python/h5py
dev-python/matplotlib
dev-python/pillow
sys-libs/glibc
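
A rough way to automate that pruning on Gentoo is sketched below (it assumes gentoolkit's equery and that the full package list above was saved to a file named AllPkgs; it treats ubiquitous base packages such as glibc like any other dependency, so the result still needs a manual review):

# Drop every package that some other package in the list already depends on
while read -r pkg; do
  equery -q depends "$pkg" | grep -qFf AllPkgs || echo "$pkg"
done < AllPkgs > TopLevelPkgs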

I would then repeat this for any other scripts in my python package and consolidate all of the information into a master dependency list for my package.
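
For instance, the per-script gathering step could be wrapped in a loop along these lines (the script names are illustrative):

# Merge the libraries loaded by every entry-point script into one list
for script in plot_all make_report; do
    LD_DEBUG=libs "$script" 2>&1 >/dev/null | grep "calling init" | cut -f3 -d:
done | sort | uniq > AllLDLibs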

This should give you an idea of a general workflow for discovering the distro packages that a Python script depends on. In my case all C/Fortran dependencies external to Python are brought in by the distro's Python packages, but this process would have discovered any other top-level packages needed. The workflow will need to be adapted to your distro's tools for matching files to packages and analyzing dependencies.
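
For example, on Debian- or RPM-based systems the file-to-package mapping step might look roughly like this (untested sketches; both read the LDLibs file produced above):

# Debian/Ubuntu: map each library path to the package that owns it
xargs -a LDLibs dpkg -S 2>/dev/null | cut -d: -f1 | sort | uniq

# Fedora/openSUSE and other RPM systems: the same idea with rpm
xargs -a LDLibs rpm -qf --queryformat '%{NAME}\n' 2>/dev/null | sort | uniq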
