Inspired by a recent question, I'd like to know if anyone knows how to get gcc to generate the x86-64 bts instruction (bit test and set) on Linux x86-64 platforms, without resorting to inline assembly or to nonstandard compiler intrinsics.
Related questions:
- Why doesn't gcc do this for a simple |= operation where the right-hand side has exactly 1 bit set?
- How to get bts using compiler intrinsics or the asm directive
Portability is more important to me than bts, so I won't use an asm directive, and if there's another solution, I prefer not to use compiler intrinsics.
EDIT: The C source language does not support atomic operations, so I'm not particularly interested in getting atomic test-and-set (even though that's the original reason for test-and-set to exist in the first place). If I want something atomic I know I have no chance of doing it with standard C source: it has to be an intrinsic, a library function, or inline assembly. (I have implemented atomic operations in compilers that support multiple threads.)
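For concreteness, this is roughly the kind of portable C I have in mind. It is only a sketch: test_and_set_bit is a name made up for illustration, the operation is not atomic, and whether gcc compiles it to bts depends on the compiler version and flags (nothing in the standard requires it).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (name chosen for illustration): set bit `bit`
 * of *word and report whether it was already set.
 * Plain standard C, NOT atomic. */
static bool test_and_set_bit(unsigned long *word, unsigned bit)
{
    /* Shifting by the type's width or more is undefined behaviour. */
    assert(bit < sizeof *word * CHAR_BIT);

    unsigned long mask = 1UL << bit;
    bool was_set = (*word & mask) != 0;   /* the "test" half */
    *word |= mask;                        /* the "set" half  */
    return was_set;
}
```

Compiling with gcc -O2 -S and reading the assembly is the simplest way to see which instructions a given gcc actually picks.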
3 answers
#1
3
The first answer to the first linked question already covers this: how much does it matter in the grand scheme of things? The only places where you test bits are:
- Low-level drivers. However, if you are writing one you probably know assembly, the code is already sufficiently tied to the system, and most delays are probably in I/O anyway.
- Testing for flags. This usually happens either at initialisation (once, at the beginning) or in some shared computation (which takes much more time).
The overall impact on the performance of applications and macrobenchmarks is likely to be minimal even if microbenchmarks show an improvement.
To the Edit part: using bts alone does not guarantee the atomicity of the operation. All it guarantees is that it will be atomic on this core (as is an or done on memory). On multi-processor units (uncommon) or multi-core units (very common) you still have to synchronize with other processors.
As synchronization is much more expensive, I believe that the difference between:
asm("lock bts %1, %0" : "+m" (*array) : "r" (bit));
and
asm("lock or %1, %0" : "+m" (*array) : "r" (1 << bit));
is minimal. And the second form:
- Can set several flags at once
- Has a nice __sync_fetch_and_or (array, 1 << bit) form (works with gcc and the Intel compiler, as far as I remember).
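As a rough sketch of that builtin form: the wrapper below uses gcc's __sync_fetch_and_or to set a bit atomically and report its previous state. The name atomic_test_and_set_bit and the unsigned long word type are my own choices for illustration, not something from the question.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical wrapper (name chosen for illustration) around the gcc
 * builtin __sync_fetch_and_or: atomically OR a one-bit mask into
 * *array and return whether that bit was set beforehand. */
static bool atomic_test_and_set_bit(unsigned long *array, unsigned bit)
{
    assert(bit < sizeof *array * CHAR_BIT);   /* avoid an undefined shift */

    unsigned long mask = 1UL << bit;
    /* The builtin returns the value *array held before the OR. */
    unsigned long old = __sync_fetch_and_or(array, mask);
    return (old & mask) != 0;
}
```

Passing a mask with more than one bit set to __sync_fetch_and_or directly sets several flags in one atomic step, which bts cannot do.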
#2
1
I use the gcc atomic builtins such as __sync_lock_test_and_set ( http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html ). Changing the -march flag will directly affect what is generated. I'm using it with i686 right now, but http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options shows all the possibilities.
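A minimal sketch of how these builtins are typically used, assuming a simple try-lock built on an int flag. The names try_lock and unlock are made up for illustration. Note that __sync_lock_test_and_set is an atomic exchange on a whole word, not a single-bit test-and-set.

```c
#include <assert.h>

/* Hypothetical try-lock (names chosen for illustration).
 * Lock word convention: 0 = free, 1 = held. */
static int try_lock(int *lock)
{
    /* Atomically store 1 and get the previous value back;
     * a previous value of 0 means we acquired the lock. */
    return __sync_lock_test_and_set(lock, 1) == 0;
}

static void unlock(int *lock)
{
    assert(*lock == 1);          /* releasing a free lock is a caller bug */
    __sync_lock_release(lock);   /* stores 0 with release semantics */
}
```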
I realize it's not exactly what you are asking for, but I found those two web pages very useful when I was looking for mechanisms like that.
#3
0
I believe (but am not certain) that neither the C++ nor the C standard has any mechanism for this kind of synchronization yet. Support for higher-level synchronization mechanisms is in various states of standardization, but I don't think even one of those would give you access to the kind of primitive you're after.
Are you programming lock-free data structures where locks are insufficient?
You probably want to just go ahead and use gcc's non-standard extensions and/or synchronization primitives provided by the operating system or a library. I would bet there's a library that might provide the type of portability you're looking for if you're concerned about using compiler intrinsics. (Though really, I think most people just bite the bullet and use gcc-specific code when they need it. Not ideal, but the standards haven't really been keeping up.)
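For readers on newer toolchains: the C11 standard did later add <stdatomic.h>, so an atomic fetch-or can be written without compiler-specific builtins. The sketch below assumes a C11 compiler that ships that header; c11_test_and_set_bit is a made-up name, and which instruction the compiler emits (bts, or, or something else) is still entirely up to it.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper (name chosen for illustration): atomically set
 * one bit of a C11 atomic word and report whether it was already set. */
static bool c11_test_and_set_bit(atomic_ulong *word, unsigned bit)
{
    assert(bit < sizeof(unsigned long) * CHAR_BIT);   /* avoid an undefined shift */

    unsigned long mask = 1UL << bit;
    /* atomic_fetch_or returns the value held before the OR
     * (sequentially consistent ordering by default). */
    unsigned long old = atomic_fetch_or(word, mask);
    return (old & mask) != 0;
}
```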