Getting the integer value of a string with the C preprocessor

Date: 2022-11-25 10:11:27

How would I create a C macro to get the integer value of a string? The specific use case follows on from a question here. I want to change code like this:

enum insn {
    sysenter = (uint64_t)'r' << 56 | (uint64_t)'e' << 48 |
               (uint64_t)'t' << 40 | (uint64_t)'n' << 32 |
               (uint64_t)'e' << 24 | (uint64_t)'s' << 16 |
               (uint64_t)'y' << 8  | (uint64_t)'s',
    mov = (uint64_t)'v' << 16 | (uint64_t)'o' << 8 |
          (uint64_t)'m'
};

To this:

enum insn {
    sysenter = INSN_TO_ENUM("sysenter"),
    mov      = INSN_TO_ENUM("mov")
};

where INSN_TO_ENUM expands to the same code. Performance would be identical, but readability would improve a lot.

I suspect that this form might not be possible, because the C preprocessor cannot process strings, so this would also be a less preferred but acceptable solution (a variadic macro):

enum insn {
    sysenter = INSN_TO_ENUM('s','y','s','e','n','t','e','r'),
    mov      = INSN_TO_ENUM('m','o','v')
};

4 answers

#1

Here's a compile-time, pure C solution, which you indicated as acceptable. You may need to extend it for longer mnemonics. I'll keep on thinking about the desired one (i.e. INSN_TO_ENUM("sysenter")). Interesting question :)

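#include <stdio.h>

/* head/tail rely on GCC/Clang's named variadic macro extension (t...). */
#define head(h, t...) h
#define tail(h, t...) t

/* Each step shifts its first character into place and passes the rest on. */
#define A(n, c...) (((unsigned long long) (unsigned char) (head(c))) << (n)) | B(n + 8, tail(c))
#define B(n, c...) (((unsigned long long) (unsigned char) (head(c))) << (n)) | C(n + 8, tail(c))
#define C(n, c...) (((unsigned long long) (unsigned char) (head(c))) << (n)) | D(n + 8, tail(c))
#define D(n, c...) (((unsigned long long) (unsigned char) (head(c))) << (n)) | E(n + 8, tail(c))
#define E(n, c...) (((unsigned long long) (unsigned char) (head(c))) << (n)) | F(n + 8, tail(c))
#define F(n, c...) (((unsigned long long) (unsigned char) (head(c))) << (n)) | G(n + 8, tail(c))
#define G(n, c...) (((unsigned long long) (unsigned char) (head(c))) << (n)) | H(n + 8, tail(c))
#define H(n, c...) (((unsigned long long) (unsigned char) (head(c))) << (n)) /* extend here */

/* The trailing zeros pad short mnemonics so that every step has a character. */
#define INSN_TO_ENUM(c...) (A(0, c, 0, 0, 0, 0, 0, 0, 0))

/* Enumerators wider than int are a GCC/Clang extension before C23. */
enum insn {
    sysenter = INSN_TO_ENUM('s','y','s','e','n','t','e','r'),
    mov      = INSN_TO_ENUM('m','o','v')
};

int main(void)
{
    /* Cast so that both arguments match %llx. */
    printf("sysenter = %llx\nmov = %llx\n",
           (unsigned long long) sysenter, (unsigned long long) mov);
    return 0;
}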

#2

EDIT: This answer may still be helpful, so I'm not deleting it, but it doesn't answer the question as asked. It DOES convert strings to numbers, but the result cannot be used as an enum initializer: in C, indexing a string literal is not an integer constant expression, so the value is not a compile-time constant.

Since your integers are 64 bits wide, only the first 8 characters of any string matter. You can therefore spell out the byte extraction 8 times, taking care not to read past the end of the string:

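#include <stdint.h>

/* Byte n of x shifted into position n, or 0 once n is past the end of the array x. */
#define GET_NTH_BYTE(x, n)   (sizeof(x) <= (n) ? 0 : ((uint64_t)(unsigned char)(x)[n] << ((n) * 8)))
#define INSN_TO_ENUM(x)      (GET_NTH_BYTE(x, 0)\
                            |GET_NTH_BYTE(x, 1)\
                            |GET_NTH_BYTE(x, 2)\
                            |GET_NTH_BYTE(x, 3)\
                            |GET_NTH_BYTE(x, 4)\
                            |GET_NTH_BYTE(x, 5)\
                            |GET_NTH_BYTE(x, 6)\
                            |GET_NTH_BYTE(x, 7))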

For each byte position, it checks whether the position lies within the string and, if so, yields the corresponding byte shifted into place.

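Note that this only works on string literals (or arrays): for a char * pointer, sizeof gives the size of the pointer, not the length of the string.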

To convert any string, including one reached through a pointer, pass its length as well:

/* Byte n of x shifted into position n, or 0 once n reaches the length l. */
#define GET_NTH_BYTE(x, n, l)   ((l) <= (n) ? 0 : ((uint64_t)(unsigned char)(x)[n] << ((n) * 8)))
#define INSN_TO_ENUM(x, l)      (GET_NTH_BYTE(x, 0, l)\
                               |GET_NTH_BYTE(x, 1, l)\
                               |GET_NTH_BYTE(x, 2, l)\
                               |GET_NTH_BYTE(x, 3, l)\
                               |GET_NTH_BYTE(x, 4, l)\
                               |GET_NTH_BYTE(x, 5, l)\
                               |GET_NTH_BYTE(x, 6, l)\
                               |GET_NTH_BYTE(x, 7, l))

So for example:

size_t length = strlen(your_string);
uint64_t num = INSN_TO_ENUM(your_string, length);

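Finally, there is a way to avoid passing the length, but it relies on the compiler evaluating the terms of INSN_TO_ENUM from left to right. Standard C does not guarantee this: the operands of | are unsequenced, so writing and reading the same flag across them is undefined behavior: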

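/* WARNING: relies on the | operands being evaluated left to right, which C does
   not guarantee; reading and writing nul_seen across them is undefined behavior. */
static int nul_seen;
#define GET_NTH_BYTE(x, n)  ((nul_seen || (x)[n] == '\0') ? (nul_seen = 1) & 0 : ((uint64_t)(unsigned char)(x)[n] << ((n) * 8)))
#define INSN_TO_ENUM(x)     ((nul_seen = 0) |\
                              (GET_NTH_BYTE(x, 0)\
                              |GET_NTH_BYTE(x, 1)\
                              |GET_NTH_BYTE(x, 2)\
                              |GET_NTH_BYTE(x, 3)\
                              |GET_NTH_BYTE(x, 4)\
                              |GET_NTH_BYTE(x, 5)\
                              |GET_NTH_BYTE(x, 6)\
                              |GET_NTH_BYTE(x, 7)))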

#3

If you can use C++11 on a recent compiler

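#include <cstdint>

// Packs the characters of x into an integer, first character in the lowest byte.
constexpr std::uint64_t insn_to_enum(const char* x) {
    return *x ? static_cast<unsigned char>(*x) + (insn_to_enum(x + 1) << 8) : 0;
}

enum insn : std::uint64_t { sysenter = insn_to_enum("sysenter") };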

will work and calculate the constant during compile time.

#4

Some recursive template magic may do the trick. When the string is known at compile time, the compiler can fold the whole computation into a constant.

If you use it heavily, though, you may want to keep an eye on your build times.

#include <cstdint>
#include <cstdio>

// The main recursive template magic.
template <int N>
struct CharShift
{
    static constexpr std::uint64_t charShift(const char* string)
    {
        // Earlier characters end up in the higher bytes.
        return static_cast<unsigned char>(string[N - 1]) | (CharShift<N - 1>::charShift(string) << 8);
    }
};

// Need to provide a specialisation for 0, as this is where the recursion must stop.
template <>
struct CharShift<0>
{
    static constexpr std::uint64_t charShift(const char*)
    {
        return 0;
    }
};

// Template stuff is all a bit hairy to look at, so try to improve that with some macro wrapping!
#define CT_IFROMS(_string_) CharShift<sizeof _string_ - 1>::charShift(_string_)

int main()
{
    std::uint64_t hash0 = CT_IFROMS("abcdefgh");

    std::printf("%016llX\n", static_cast<unsigned long long>(hash0));
    return 0;
}