Finding the pathname from a dlopen handle on OSX

Time: 2022-09-04 11:20:33

I have dlopen()'ed a library, and I want to map the handle it returned back to the full pathname of the shared library. On Linux and friends, I know that I can use dlinfo() to get the link map and iterate through those structures, but I can't seem to find an analogue on OSX. The closest I can get is to either:

  • Use _dyld_image_count() and _dyld_get_image_name() to iterate over all the currently loaded libraries, and hope I can guess which one corresponds to my handle

  • Somehow find a symbol that lives inside of the handle I have, and pass that to dladdr().

If I have a priori knowledge of a symbol name inside the library I just opened, I can dlsym() that and then use dladdr(). That works fine. But in the general case, where I have no idea what is inside this shared library, I would need to enumerate its symbols first, and I don't know how to do that either.
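
The known-symbol case described above can be sketched as follows. This is only an illustration: the helper name path_from_known_symbol is hypothetical, and the _GNU_SOURCE define is there only so the same sketch also builds against glibc, where dladdr() is otherwise hidden.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc hides dladdr() and Dl_info without this; harmless on OS X */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns the path of the image that defines
 * `symbol_name` as resolved through `handle`, or NULL on failure.
 * The returned string belongs to the dynamic loader; do not free it. */
static const char *path_from_known_symbol(void *handle, const char *symbol_name)
{
    assert(symbol_name != NULL);

    /* Step 1: turn the symbol name into an address inside a loaded image. */
    void *address = dlsym(handle, symbol_name);
    if (address == NULL)
        return NULL;

    /* Step 2: ask the loader which image contains that address. */
    Dl_info info;
    if (dladdr(address, &info) == 0)
        return NULL;

    return info.dli_fname;
}
```

Note that dlsym() on an ordinary handle may also search the libraries that the opened library depends on, so the path returned here can belong to a dependency rather than to the library that was opened.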

So any tips on how to look up the pathname of a library from its dlopen handle would be very much appreciated. Thanks!

2 Answers

#1


16  

Here is how you can get the absolute path of a handle returned by dlopen.

  1. To get the absolute path, call the dladdr function and read the Dl_info.dli_fname field.

  2. To call dladdr, you need to give it an address.

  3. To get an address from a handle, call the dlsym function with a symbol name.

  4. To get a symbol name out of a loaded library, parse the library to find its symbol table and iterate over the symbols. The symbol must be external, because dlsym only searches external symbols.

Put it all together and you get this:

#include <dlfcn.h>
#include <mach-o/dyld.h>
#include <mach-o/loader.h>
#include <mach-o/nlist.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#ifdef __LP64__
typedef struct mach_header_64 mach_header_t;
typedef struct segment_command_64 segment_command_t;
typedef struct nlist_64 nlist_t;
#else
typedef struct mach_header mach_header_t;
typedef struct segment_command segment_command_t;
typedef struct nlist nlist_t;
#endif

static const char * first_external_symbol_for_image(const mach_header_t *header)
{
    Dl_info info;
    if (dladdr(header, &info) == 0)
        return NULL;

    segment_command_t *seg_linkedit = NULL;
    segment_command_t *seg_text = NULL;
    struct symtab_command *symtab = NULL;

    struct load_command *cmd = (struct load_command *)((intptr_t)header + sizeof(mach_header_t));
    for (uint32_t i = 0; i < header->ncmds; i++, cmd = (struct load_command *)((intptr_t)cmd + cmd->cmdsize))
    {
        switch(cmd->cmd)
        {
            case LC_SEGMENT:
            case LC_SEGMENT_64:
                if (!strcmp(((segment_command_t *)cmd)->segname, SEG_TEXT))
                    seg_text = (segment_command_t *)cmd;
                else if (!strcmp(((segment_command_t *)cmd)->segname, SEG_LINKEDIT))
                    seg_linkedit = (segment_command_t *)cmd;
                break;

            case LC_SYMTAB:
                symtab = (struct symtab_command *)cmd;
                break;
        }
    }

    if ((seg_text == NULL) || (seg_linkedit == NULL) || (symtab == NULL))
        return NULL;

    intptr_t file_slide = ((intptr_t)seg_linkedit->vmaddr - (intptr_t)seg_text->vmaddr) - seg_linkedit->fileoff;
    intptr_t strings = (intptr_t)header + (symtab->stroff + file_slide);
    nlist_t *sym = (nlist_t *)((intptr_t)header + (symtab->symoff + file_slide));

    for (uint32_t i = 0; i < symtab->nsyms; i++, sym++)
    {
        if ((sym->n_type & N_EXT) != N_EXT || !sym->n_value)
            continue;

        return (const char *)strings + sym->n_un.n_strx;
    }

    return NULL;
}

const char * pathname_for_handle(void *handle)
{
    // valid image indices run from 0 to _dyld_image_count() - 1
    for (int32_t i = (int32_t)_dyld_image_count() - 1; i >= 0; i--)
    {
        const char *first_symbol = first_external_symbol_for_image((const mach_header_t *)_dyld_get_image_header(i));
        if (first_symbol && strlen(first_symbol) > 1)
        {
            handle = (void *)((intptr_t)handle | 1); // in order to trigger findExportedSymbol instead of findExportedSymbolInImageOrDependentImages. See `dlsym` implementation at http://opensource.apple.com/source/dyld/dyld-239.3/src/dyldAPIs.cpp
            first_symbol++; // in order to remove the leading underscore
            void *address = dlsym(handle, first_symbol);
            Dl_info info;
            if (dladdr(address, &info))
                return info.dli_fname;
        }
    }
    return NULL;
}

int main(int argc, const char * argv[])
{
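    void *libxml2 = dlopen("libxml2.dylib", RTLD_LAZY);
    if (libxml2 == NULL)
    {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    const char *path = pathname_for_handle(libxml2);
    printf("libxml2 path: %s\n", path ? path : "(not found)"); // never pass NULL to %s
    dlclose(libxml2);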
    return 0;
}

If you run this code, it will yield the expected result: libxml2 path: /usr/lib/libxml2.2.dylib

#2


6  

After about a year of using the solution provided by 0xced, we discovered an alternative method that is simpler and avoids one (rather rare) failure mode. 0xced's code snippet iterates through each dylib currently loaded, finds its first exported symbol, attempts to resolve that symbol in the dylib being sought, and reports a match if the symbol is found there. This means you can get false positives if the first exported symbol from an arbitrary library happens to be present inside the dylib you're searching for.

My solution was to use _dyld_get_image_name(i) to get the absolute path of each image loaded, dlopen() that image, and compare the handle (after masking out any mode bits set by dlopen() due to usage of things like RTLD_FIRST) to ensure that this dylib is actually the same file as the handle passed into my function.

The complete function can be seen here, as a part of the Julia Language, with the relevant portion copied below:

// Iterate through all images currently in memory
for (int32_t i = (int32_t)_dyld_image_count() - 1; i >= 0; i--) {
    // dlopen() each image, check handle
    const char *image_name = _dyld_get_image_name(i);
    uv_lib_t *probe_lib = jl_load_dynamic_library(image_name, JL_RTLD_DEFAULT);
    void *probe_handle = probe_lib->handle;
    uv_dlclose(probe_lib);

    // If the handle is the same as what was passed in (modulo mode bits), return this image name
    if (((intptr_t)handle & (-4)) == ((intptr_t)probe_handle & (-4)))
        return image_name;
}

Note that functions such as jl_load_dynamic_library() are wrappers around dlopen() that return libuv types, but the spirit of the code remains the same.
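
For readers without Julia's wrappers, the per-image check can be sketched with plain dlopen(). The helper names handles_match and path_if_same_image are hypothetical, and the use of RTLD_NOLOAD is an assumption of this sketch rather than something the code above does. The _GNU_SOURCE define only matters when building against glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* only needed on glibc; harmless on OS X */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Compare two dlopen() handles while ignoring the two low bits, which the
 * snippet above treats as mode bits (the same `& (-4)` mask). */
static int handles_match(void *a, void *b)
{
    const uintptr_t mask = ~(uintptr_t)3;
    return ((uintptr_t)a & mask) == ((uintptr_t)b & mask);
}

/* Hypothetical helper: returns candidate_path if re-opening it yields the
 * same image as `handle`, otherwise NULL. */
static const char *path_if_same_image(void *handle, const char *candidate_path)
{
    assert(candidate_path != NULL);

    /* Assumption of this sketch: RTLD_NOLOAD makes dlopen() succeed only for
     * images that are already loaded, so probing never loads anything new. */
    void *probe = dlopen(candidate_path, RTLD_LAZY | RTLD_NOLOAD);
    if (probe == NULL)
        return NULL;

    int same = handles_match(handle, probe);
    dlclose(probe); /* drop the extra reference taken by the probe */
    return same ? candidate_path : NULL;
}
```

On OSX this would be called once per image, passing _dyld_get_image_name(i) for each index from 0 to _dyld_image_count() - 1, and returning the first non-NULL result.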

