Implementing thread-local storage in a custom libc

Date: 2022-10-31 15:13:30

I'm implementing a small subset of libc for very small and statically linked programs, and I figured that adding TLS support would be a good learning experience. I use Ulrich Drepper's TLS document as a reference.

I have two strings set up to try this out:

static __thread const char msg1[] = "TLS (1).\n"; /* 10 bytes */
static __thread const char msg2[] = "TLS (2).\n"; /* 10 bytes */

And the compiler generates the following instructions to access them:

mov    rbx, QWORD PTR fs:0x0 ; Load TLS.
lea    rsi, [rbx-0x14]       ; Get a pointer to 'msg1'. 20 byte offset.
lea    rsi, [rbx-0xa]        ; Get a pointer to 'msg2'. 10 byte offset.

Let's assume I place the TCB somewhere on the stack:

struct tcb {
    void* self; /* Points to self. I read that this was necessary somewhere. */
    int errno;  /* Per-thread errno variable. */
    int padding;
};

And then place the TLS area just next to it at tls = &tcb - tls_size. Then I set the FS register to point at fs = tls + tls_size, and copy the TLS initialization image to tls.
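
The layout described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not a diagnosis of the problem: `layout_tls`, its parameters, and the `err` field are hypothetical names, the image is assumed to cover the whole TLS block (no zero-initialised tail), and nothing here actually touches the FS register. Two details are worth noting. The compiler's `mov rbx, QWORD PTR fs:0x0` loads the word stored at the FS base, so `self` has to hold the thread pointer itself. And if `tls = &tcb - tls_size` is literal C on a `struct tcb *`, the subtraction is scaled by `sizeof(struct tcb)`, so the sketch does its arithmetic in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the TCB in the question; `err` stands in for errno. */
struct tcb {
    void *self;   /* must equal the thread pointer: `mov rbx, fs:0x0` loads it */
    int err;
    int padding;
};

/*
 * Place "TLS block, then TCB" at the top of buf and return the thread
 * pointer (the value FS would be set to), or NULL if buf is too small.
 * Assumes tls_image holds tls_size bytes and tls_align is a power of two.
 */
struct tcb *layout_tls(unsigned char *buf, size_t buf_size,
                       const void *tls_image, size_t tls_size,
                       size_t tls_align)
{
    assert(tls_align != 0 && (tls_align & (tls_align - 1)) == 0);

    /* The linker's negative offsets are relative to the size rounded up. */
    size_t block = (tls_size + tls_align - 1) & ~(tls_align - 1);

    /* The thread pointer must satisfy both the TLS and the TCB alignment. */
    size_t tp_align = tls_align > _Alignof(struct tcb)
                    ? tls_align : _Alignof(struct tcb);

    if (buf_size < sizeof(struct tcb))
        return NULL;
    uintptr_t tp = ((uintptr_t)buf + buf_size - sizeof(struct tcb))
                 & ~(uintptr_t)(tp_align - 1);
    if (tp < (uintptr_t)buf + block)
        return NULL;

    /* Byte arithmetic: the block ends exactly where the TCB begins. */
    unsigned char *tls = (unsigned char *)tp - block;
    memset(tls, 0, block);
    memcpy(tls, tls_image, tls_size);

    struct tcb *tcb = (struct tcb *)tp;
    tcb->self = tcb;
    tcb->err = 0;
    tcb->padding = 0;
    return tcb;
}
```

With the 20-byte image from the question and an alignment of 1, `msg1` should end up at the returned pointer minus 0x14 and `msg2` at minus 0xa, matching the two `lea` instructions. The returned pointer is what would then be passed to `arch_prctl(ARCH_SET_FS, ...)`.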

However, this doesn't work. I have verified that I locate the TLS initialization image properly by writing the 20 bytes at tls_image to stdout. This leads me to believe either that I place the TCB and/or TLS area incorrectly, or that I'm otherwise not conforming to the ABI.

  • I set the FS register using arch_prctl(2). Do I need to use set_thread_area(2) somehow?

  • I don't have a dtv. I'm assuming this isn't necessary since I am linking statically.

Any ideas as to what I'm doing wrong? Thanks a lot!

1 answer

I'm implementing a small subset of libc for very small and statically linked programs, and I figured that adding TLS support would be a good learning experience.

Awesome idea! I had to implement my own TLS in a project because I could not use any common thread library like pthread. I do not have a complete solution for your problem, but sharing my experience could be useful.

Also take a look at this link; it may be useful.

I set the FS register using arch_prctl(2). Do I need to use set_thread_area(2) somehow?

The answer depends on the architecture you are using. On x86-64, you should use arch_prctl exclusively to point the FS register at the memory area you want to use as TLS (it lets you address memory above 4 GB). On x86-32, you must use set_thread_area instead, as it is the only system call the kernel supports for this.
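
To see what FS points at on x86-64, a small sketch along these lines may help. It only reads state and never changes FS, so it should be safe to run under an existing libc. The function names are hypothetical, and the expectation that the two values match is an assumption about the libc the program runs under (glibc and musl both store the thread pointer in the first word of the TCB).

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>   /* ARCH_GET_FS */
#endif

/* FS base as reported by the kernel, or 0 if it cannot be queried. */
uintptr_t fs_base_from_kernel(void)
{
#if defined(__x86_64__) && defined(__linux__)
    unsigned long base = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return (uintptr_t)base;
#else
    return 0;   /* not x86-64 Linux: nothing to query */
#endif
}

/* Word stored at fs:0, the same load the compiler emits before a TLS access. */
uintptr_t fs_self_word(void)
{
#if defined(__x86_64__) && defined(__linux__)
    uintptr_t self;
    __asm__ volatile ("movq %%fs:0, %0" : "=r"(self));
    return self;
#else
    return 0;   /* not x86-64 Linux: nothing to read */
#endif
}
```

If the two functions return different values in a program that sets up its own TCB, the self pointer in the first word of the TCB is a likely place to look.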

The idea behind my implementation is to allocate a private memory area for each thread and save its address in the %GS register. It is a rather simple method but, in my case, it worked quite well. Each time you want to access a thread's private area, you use the value saved in %GS as the base address, plus an offset that identifies a memory location. I usually allocate one memory page (4096 bytes) for each thread and divide it into 8-byte blocks. This gives each thread 512 private memory slots, which can be accessed like an array with indexes from 0 to 511.

This is the code I use:

#define _GNU_SOURCE 1

#include "tls.h"
#include <asm/ldt.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <asm/prctl.h>
#include <sys/syscall.h> 
#include <unistd.h> 

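/* Map one private page and point the GS base at it. Returns NULL on failure. */
void *install_tls(void) {
    void *addr = mmap(0, 4096, PROT_READ|PROT_WRITE,
                      MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return NULL;
    if (syscall(SYS_arch_prctl, ARCH_SET_GS, addr) < 0) {
        munmap(addr, 4096);
        return NULL;
    }
    return addr;
}

/* Unmap the page that the GS base currently points at. */
void freeTLS(void) {
    void *addr;
    if (syscall(SYS_arch_prctl, ARCH_GET_GS, &addr) == 0)
        munmap(addr, 4096);
}

/* Store val in slot idx (0..511); each slot is 8 bytes wide. */
bool set_tls_value(int idx, unsigned long val) {
    if (idx < 0 || idx >= 4096/8) {
        return false;
    }
    asm volatile(
        "movq %0, %%gs:(%1)\n"
        :
        : "r"(val), "r"(8ll * idx)
        : "memory");
    return true;
}

/* Load the value in slot idx; returns 0 if idx is out of range. */
unsigned long get_tls_value(int idx) {
    unsigned long rc;
    if (idx < 0 || idx >= 4096/8) {
        return 0;
    }
    asm volatile(
        "movq %%gs:(%1), %0\n"
        : "=r"(rc)
        : "r"(8ll * idx)
        : "memory");
    return rc;
}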

This is the header with some macros:

#ifndef TLS_H
#define TLS_H

#include <stdbool.h>

void *install_tls(void);
void freeTLS(void);
bool set_tls_value(int, unsigned long);
unsigned long get_tls_value(int);

/*
 * Macros used to set and retrieve the values from the TLS area.
 * The TLS_* constants are slot indexes: set_tls_value and
 * get_tls_value multiply them by 8 to get the byte offset.
 */

#define TLS_TID 0x0
#define TLS_FD  0x8
#define TLS_MONITORED 0x10

#define set_local_tid(_x) \
    set_tls_value(TLS_TID, (unsigned long)(_x))

#define set_local_fd(_x) \
    set_tls_value(TLS_FD, (unsigned long)(_x))

#define set_local_monitored(_x) \
    set_tls_value(TLS_MONITORED, (unsigned long)(_x))

#define get_local_tid() \
    get_tls_value(TLS_TID)

#define get_local_fd() \
    get_tls_value(TLS_FD)

#define get_local_monitored() \
    get_tls_value(TLS_MONITORED)



#endif /* end of include guard: TLS_H */

The first thing each thread must do is install the TLS memory area. Once the TLS area has been initialised, the thread can start using it as private TLS.
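
As a rough illustration of that order of operations, the sketch below compresses install, use, and release into one function. `thread_body`, `AREA_SIZE`, and `SLOT_TID` are hypothetical names, and the sketch assumes x86-64 Linux, where the GS base is normally left free for user code. It is a minimal, single-function variant of the approach, not a replacement for the code above.

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>   /* ARCH_SET_GS */
#endif

enum { AREA_SIZE = 4096, SLOT_TID = 0 };

/* Hypothetical thread body: install the area first, use it, release it last. */
unsigned long thread_body(unsigned long tid)
{
#if defined(__x86_64__) && defined(__linux__)
    /* 1. Install: map a private page and point the GS base at it. */
    void *area = mmap(NULL, AREA_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        return 0;
    if (syscall(SYS_arch_prctl, ARCH_SET_GS, area) != 0) {
        munmap(area, AREA_SIZE);
        return 0;
    }

    /* 2. Use: store tid in slot SLOT_TID, then read it back through GS. */
    unsigned long seen;
    __asm__ volatile ("movq %1, %%gs:(%2)\n\t"
                      "movq %%gs:(%2), %0"
                      : "=&r"(seen)
                      : "r"(tid), "r"((unsigned long)(8 * SLOT_TID))
                      : "memory");

    /* 3. Release: reset the GS base before unmapping the page. */
    syscall(SYS_arch_prctl, ARCH_SET_GS, 0UL);
    munmap(area, AREA_SIZE);
    return seen;
#else
    return tid;   /* not x86-64 Linux: nothing to demonstrate */
#endif
}
```

In a real program each thread would run step 1 once at start-up and step 3 once on exit, with any number of slot accesses in between.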
