What should sizeof in C actually return?

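I've found that this interview question trips up quite a few people. In short, sizeof(x) gives the size, in bytes, of the storage that x occupies. The catch is working out what this "x" really is.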

#include <stdio.h>
#include <string.h>
#define HELLO_STR "hello"
#define HELLO_STR2 "hello\0"

/* An array parameter is adjusted to a pointer: input really has type int (*)[3] */
void hello(int input[3][3])
{
    printf("the sizeof(input) is: %zu\n", sizeof(input));
}

int main(int argc, char *argv[])
{
    int i = 0;
    int j[10];
    int m[3][3] = {{0}};   /* matches the parameter type of hello() */
    char x[3][3];
    char str[] = "aBcDefghijklmn";

    /* sizeof and strlen both yield size_t, which is printed with %zu, not %d */
    printf("the sizeof(1) is: %zu\n", sizeof(1));
    printf("the sizeof(i) is: %zu\n", sizeof(i));
    printf("the sizeof(&i) is: %zu\n", sizeof(&i));
    printf("the sizeof(j) is: %zu\n", sizeof(j));
    printf("the sizeof(&j) is: %zu\n", sizeof(&j));
    printf("the sizeof(x) is: %zu\n", sizeof(x));
    printf("the sizeof(str[]) is:%zu\n", sizeof(str));
    printf("the strlen(str[]) is:%zu\n", strlen(str));
    printf("the sizeof(&str[]) is:%zu\n", sizeof(&str));
    printf("the sizeof(HELLO_STR) is: %zu\n", sizeof(HELLO_STR));
    printf("the sizeof(HELLO_STR2) is: %zu\n", sizeof(HELLO_STR2));
    printf("the strlen(HELLO_STR) is: %zu\n", strlen(HELLO_STR));
    printf("the strlen(HELLO_STR2) is: %zu\n", strlen(HELLO_STR2));
    hello(m);   /* passing j (an int[10], which decays to int *) would not match int (*)[3] */
    return 0;
}

Take the code above: how many people can correctly write down every line of its output?

On my Mac, the output is:

the sizeof(1) is: 4
the sizeof(i) is: 4
the sizeof(&i) is: 8
the sizeof(j) is: 40
the sizeof(&j) is: 8
the sizeof(x) is: 9
the sizeof(str[]) is:15
the strlen(str[]) is:14
the sizeof(&str[]) is:8
the sizeof(HELLO_STR) is: 6
the sizeof(HELLO_STR2) is: 7
the strlen(HELLO_STR) is: 5
the strlen(HELLO_STR2) is: 5
the sizeof(input) is: 8

1 is an int constant, so sizeof(1) is sizeof(int), which is 4 here. Easy enough.

i is an int, so it is also 4. Also easy.

&i is a pointer (of type int *). My Mac is 64-bit, so a pointer is 8 bytes. No surprise there either.

j is an array of 10 ints, occupying 4 * 10 bytes, so the result is 40.

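&j is also a pointer, so it is 8, the same as &i. Its type, though, is int (*)[10] (a pointer to an array of 10 ints), not int *.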

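HELLO_STR is a bit more interesting. The macro expands to the string literal "hello", and sizeof returns 6 rather than 5, because a string literal always has a terminating '\0' appended: "hello" has type char[6]. This is not a compiler optimization; the C standard requires it. strlen(HELLO_STR), which counts the characters before the first '\0', returns 5.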

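HELLO_STR2 works the same way: "hello\0" already ends with an explicit '\0', and the compiler still appends its own, so the literal is char[7] and sizeof returns 7, while strlen stops at the first '\0' and returns 5. Since the array type of a string literal is fixed by the standard, this does not depend on the compiler: a conforming compiler, embedded or not, must return 6 and 7 here, and one that returned the strlen value would be non-conforming.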

x has size 9, also straightforward: it is a 3*3 array of char, and a char is 1 byte, so 9 bytes.

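As for str, some people online say it is a "dynamic array" and that sizeof should return the size of a pointer, but when I actually tested it, sizeof returned the size of the array data: 15. That is expected, since char str[] = "..." declares an ordinary array whose length is deduced from the initializer (14 characters plus the '\0'), not a pointer. Only a declaration like char *str = "..." would make sizeof return the pointer size.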

strlen(str) is 14, as you would expect: it counts the characters before the '\0'.

&str is, again, simply a pointer (of type char (*)[15]), so 8 here.

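Finally, input is the interesting one: why 8? Although the parameter is declared as input[3][3], C adjusts any function parameter declared with an array type into a pointer to the array's first element. So input really has type int (*)[3], a pointer to an array of 3 ints (not a plain int pointer), and sizeof(input) returns the size of a pointer. Only the inner [3] survives, as part of the pointed-to type.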

Digging deeper: compiler optimization

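As everyone knows, C is statically typed, so (apart from C99 variable length arrays) sizeof can be computed entirely at compile time. I did a quick disassembly of the program on Linux (I never got used to the assembly format on the Mac...), and every printf call looks like this: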

   0x00000000004006c1 <+312>:	mov    $0x40092c,%eax
   0x00000000004006c6 <+317>:	mov    $0x6,%esi
   0x00000000004006cb <+322>:	mov    %rax,%rdi
   0x00000000004006ce <+325>:	mov    $0x0,%eax
   0x00000000004006d3 <+330>:	callq  0x400460 <printf@plt>

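Here the address of the format string is loaded into eax and then moved into rdi, and esi holds 6, which is the sizeof result, already computed by the compiler. (Under the x86-64 System V calling convention, rdi and rsi carry the first two integer arguments, and eax is zeroed before calling a variadic function such as printf because al tells it how many vector registers hold arguments.) Take a look yourself if you're interested.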

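Looking over the whole disassembly, though, the only callq instructions are calls to printf; strlen is never called. Looking closer, it turns out that some of the strlen results were also computed ahead of time: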

   0x0000000000400706 <+381>:	mov    $0x400970,%eax
   0x000000000040070b <+386>:	mov    $0x5,%esi
   0x0000000000400710 <+391>:	mov    %rax,%rdi
   0x0000000000400713 <+394>:	mov    $0x0,%eax
   0x0000000000400718 <+399>:	callq  0x400460 <printf@plt>

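So for constant string literals the compiler goes a step further and precomputes strlen (GCC treats strlen as a built-in function). The interesting case is the str array: it is a local array whose contents could change at run time, so is its strlen optimized too? Following the strlen call for str shows that it was not folded into a constant, but it was not a library call either: the compiler expanded it inline using a repnz scas instruction. Here is the code: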

   0x000000000040066e <+229>:	lea    -0x20(%rbp),%rax
   0x0000000000400672 <+233>:	movq   $0xffffffffffffffff,-0x78(%rbp)
   0x000000000040067a <+241>:	mov    %rax,%rdx
   0x000000000040067d <+244>:	mov    $0x0,%eax
   0x0000000000400682 <+249>:	mov    -0x78(%rbp),%rcx
   0x0000000000400686 <+253>:	mov    %rdx,%rdi
   0x0000000000400689 <+256>:	repnz scas %es:(%rdi),%al
   0x000000000040068b <+258>:	mov    %rcx,%rax
   0x000000000040068e <+261>:	not    %rax
   0x0000000000400691 <+264>:	lea    -0x1(%rax),%rdx
   0x0000000000400695 <+268>:	mov    $0x4008f9,%eax
   0x000000000040069a <+273>:	mov    %rdx,%rsi
   0x000000000040069d <+276>:	mov    %rax,%rdi
   0x00000000004006a0 <+279>:	mov    $0x0,%eax
   0x00000000004006a5 <+284>:	callq  0x400460 <printf@plt>

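SCAS stands for "scan string". Here al is 0 and rcx starts at -1 (0xffffffffffffffff); repnz scas compares al with the byte at rdi, advances rdi and decrements rcx, repeating until it reaches the '\0'. Because the terminator is scanned too, rcx ends up as -(len + 2), so not gives len + 1 and lea -0x1 removes the extra 1, leaving the string length in rdx, which is then passed to printf in rsi. For more details, see this question if you're interested.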

So even something as simple-looking as sizeof and strlen hides plenty of interesting details.

秒客网
