Garbage values in C strings when printing to stdout and writing to a file

Time: 2021-09-13 21:14:24

I'm playing around with C strings and streams to get a better understanding of them. I have this test program to read a fixed size block of data from an input file to a buffer, store the buffer contents in an intermediate storage (in this case, I want the storage to be able to store three different "reads") and then write the read string and one of the strings in intermediate storage to an output file.

A note on this: In each iteration I just use the two first positions of the intermediate storage and just write the second "stored string" to the file.

THE CODE:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 3
#define BUFFER_SIZE 5

int main(int argc, char** argv) {
  FILE* local_stream_test = fopen("LOCAL_INPUT_FILE","r");
  FILE* local_output_test = fopen("LOCAL_OUTPUT_TEST","w");

  if(!local_stream_test) {
    puts("!INPUT FILE");
    return EXIT_FAILURE;
  }
  if(!local_output_test) {
    puts("!OUTPUT FILE");
    return EXIT_FAILURE;
  }
  char my_buffer[BUFFER_SIZE];
  char test[SIZE];
  char* test2[SIZE];
  memset(my_buffer,0,sizeof(my_buffer));
  memset(test,0,sizeof(test));
  memset(test2,0,sizeof(test2));

  int read = fread( my_buffer, sizeof(my_buffer[0]), sizeof(my_buffer)/sizeof(my_buffer[0]), local_stream_test );

   printf("FIRST READ TEST: %d\n",read);
   printf("\tMY_BUFFER, SIZEOF: %lu, STRLEN: %lu\n",sizeof(my_buffer),strlen(my_buffer));

   fwrite(my_buffer,sizeof(my_buffer[0]),/*strlen(aux)*/ read,local_output_test);
   char* aux_test = strdup(my_buffer);
   printf("\tAUX_TEST STRLEN: %lu, ## %s\n",strlen(aux_test), aux_test);
   free(aux_test);
   aux_test = NULL;

   while(read > 0) {
     if(feof(local_stream_test)) {
       puts("BYE");
       break;
     }
     read = fread( my_buffer, sizeof(my_buffer[0]), sizeof(my_buffer)/sizeof(my_buffer[0]), local_stream_test );
     aux_test = strdup(my_buffer);

     if(!aux_test) {
       puts("!AUX_TEST");
       break;
     }


     printf("READ TEST: %d\n",read);
     printf("\tMY_BUFFER, SIZEOF: %lu, STRLEN: %lu\n",sizeof(my_buffer),strlen(my_buffer));
     printf("\tAUX_TEST, SIZEOF: %lu, STRLEN: %lu ** SIZEOF *AUX_TEST: %lu, SIZEOF AUX_TEST[0]: %lu\n",sizeof(aux_test),strlen(aux_test),sizeof(*aux_test),sizeof(aux_test[0]));

     fwrite(aux_test,sizeof(aux_test[0]),/*strlen(aux)*/ read,local_output_test);

     printf("** AUX_TEST: %s\n",aux_test);
     test2[0] = aux_test;
     test2[1] = aux_test;
     test2[1][3] = toupper(test2[1][3]);

     fwrite(test2[1],sizeof(test2[1][0]),read,local_output_test);

     printf("\n** TEST2[0] SIZEOF: %lu, STRLEN: %lu, TEST2[0]: %s\n",sizeof(test2[0]),strlen(test2[0]),test2[0]);
     printf("\n** TEST2[1] SIZEOF: %lu, STRLEN: %lu, TEST2[1]: %s\n",sizeof(test2[1]),strlen(test2[1]),test2[1]);

     strcpy(test2[1],aux_test);
     printf("** COPIED TEST2[1]: %s\n",test2[1]);
     free(aux_test);
     aux_test = NULL;
     puts("*******************************************");
  }
  return EXIT_SUCCESS;
}

THE INPUT FILE:

converts a byte string to a floating point value
converts a byte string to an integer value
converts a byte string to an integer value

When printing the strings, I get extra junk values at the end of them after the second read. Here's the output in stdout for the first, second and third reads from the file:

FIRST READ TEST: 5
    MY_BUFFER, SIZEOF: 5, STRLEN: 5
    AUX_TEST STRLEN: 5, ## conve
READ TEST: 5
    MY_BUFFER, SIZEOF: 5, STRLEN: 5
    AUX_TEST, SIZEOF: 4, STRLEN: 5 ** SIZEOF *AUX_TEST: 1, SIZEOF AUX_TEST[0]: 1

** AUX_TEST: rts a

** TEST2[0] SIZEOF: 4, STRLEN: 5, TEST2[0]: rts a

** TEST2[1] SIZEOF: 4, STRLEN: 5, TEST2[1]: rts a
** COPIED TEST2[1]: rts a

*******************************************
READ TEST: 5
    MY_BUFFER, SIZEOF: 5, STRLEN: 13
    AUX_TEST, SIZEOF: 4, STRLEN: 13 ** SIZEOF *AUX_TEST: 1, SIZEOF AUX_TEST[0]: 1

** AUX_TEST:  byte▒▒▒▒

** TEST2[0] SIZEOF: 4, STRLEN: 13, TEST2[0]:  byTe▒▒▒▒


** TEST2[1] SIZEOF: 4, STRLEN: 13, TEST2[1]:  byTe▒▒▒▒

** COPIED TEST2[1]:  byTe▒▒▒▒

What troubles me is the fact that when the junk values start to appear, the length of the string is greater than the read bytes from the file: 13 versus 5. I have played around with the BUFFER_SIZE but I always get the junk values when printing to stdout unless the size is big enough to read the file in one go.

For example, with BUFFER_SIZE equal to 500, this is the output in stdout:

FIRST READ TEST: 135
    MY_BUFFER, SIZEOF: 300, STRLEN: 135
    AUX_TEST STRLEN: 135, ## converts a byte string to a floating point value
       converts a byte string to an integer value
        converts a byte string to an integer value

 BYE

And the output files generated:

BUFFER_SIZE = 5

converts arts a byte byTe stri stRing tong To a fl a FloatinoatIng poig pOint vant Value
clue
converonvErts a ts A byte bytE strinstrIng to g tO an inan IntegertegEr valu vaLue
cone
cOnvertsverTs a by a Byte stte String rinG to anto An inte inTeger vger value
aluE

BUFFER_SIZE = 500: The same as the input file.

So, I'm accessing out of bounds memory, right? But, where? I can't find the source of this problem (and most likely I have a misunderstanding in how to work with C strings).

PS:

I read here that maybe my problem is that I forgot to add the NULL mark at the end of the string. Doing:

 test2[0] = aux_test;
 test2[0][ strlen(aux_test)+1 ] = '\0';

 /* OR THIS */
 test2[0][read+1] = '\0';

produces the same result.

2 solutions

#1


4  

Part of your problem is that you are reading outside the bounds of your arrays, and fread() certainly doesn't null terminate anything.

For example:

printf("\tMY_BUFFER, SIZEOF: %lu, STRLEN: %lu\n",sizeof(my_buffer),strlen(my_buffer));

You read 5 bytes of data into an array of size 5 bytes. The strlen() reports 5; you were lucky that the first byte beyond the end of the array happened to be a zero byte, but since it was outside the array, you invoked undefined behaviour at that point (even though you got the answer you were expecting).

In the loop, in the first iteration, the toupper() case-converts a blank, which doesn't change it. test2[0] and test2[1] both point to the same string, so if the toupper() did anything, it would affect the value pointed at by both those pointers.
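The aliasing described above can be shown in isolation. This is only a sketch: the helper names upcase_through_alias and copy_string are hypothetical and do not appear in the question's program. It shows that a change made through one pointer is visible through the other, and that a separate heap copy is needed if the stored strings are meant to be independent.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: upper-case the character at `index` through `alias`
 * and return the character now visible through `original`. When both
 * pointers refer to the same string, the change shows up in both. */
char upcase_through_alias(char *original, char *alias, size_t index) {
    alias[index] = (char)toupper((unsigned char)alias[index]);
    return original[index];
}

/* Hypothetical helper: make an independent heap copy of a properly
 * null-terminated string. The caller must free() the result. */
char *copy_string(const char *s) {
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len + 1); /* copies the terminator too */
    }
    return copy;
}
```

If test2[0] and test2[1] are supposed to hold different strings, each slot would need its own copy rather than the same aux_test pointer.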

When the junk values 'appear', you've put non-zero bytes into the data after the end of my_buffer, and the strlen() reads through those non-zero bytes until it reaches a zero byte. So, the problem is all due to not ensuring that your character buffers are null terminated within the allocated length. When you invoke undefined behaviour, weird stuff can happen.
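One way to keep the terminator inside the allocated length is to give the buffer one extra byte and write the '\0' at the position fread() reports. The following is a minimal sketch under that assumption; read_chunk is a hypothetical helper name, not part of the original program.

```c
#include <stdio.h>

#define BUFFER_SIZE 5

/* Hypothetical helper: read up to BUFFER_SIZE bytes from `stream` into
 * `buffer`, which must have room for BUFFER_SIZE + 1 bytes, and null
 * terminate right after the bytes actually read.
 * Returns the number of bytes read (0 at end of file or on error). */
size_t read_chunk(FILE *stream, char buffer[BUFFER_SIZE + 1]) {
    size_t n = fread(buffer, 1, BUFFER_SIZE, stream);
    buffer[n] = '\0'; /* n <= BUFFER_SIZE, so this stays inside the array */
    return n;
}
```

With the buffer declared as char my_buffer[BUFFER_SIZE + 1], strlen(), strdup() and "%s" all see a properly terminated string after each call.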

Note that if you use printf("<<%.*s>>\n", read, my_buffer); you will only print the bytes of data that were read.
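As a sketch of the "%.*s" technique, the hypothetical helper format_counted below formats into a caller-supplied array with snprintf() so the result can be inspected; the same format string works with printf(). The precision argument must be an int, hence the cast.

```c
#include <stdio.h>

/* Hypothetical helper: format exactly `count` bytes of `data` into `out`.
 * `data` does not need to be null terminated, because the precision
 * limits how many bytes "%s" is allowed to read.
 * Returns what snprintf() returns: the length of the formatted text. */
int format_counted(char *out, size_t out_size, const char *data, size_t count) {
    /* The '*' precision consumes an int argument, so cast the count. */
    return snprintf(out, out_size, "<<%.*s>>", (int)count, data);
}
```

This is handy for debugging output, but it only limits printing; strlen() and strdup() on the same unterminated buffer would still read past its end.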

You ask about:

test2[0] = aux_test;
test2[0][ strlen(aux_test)+1 ] = '\0';
/* OR THIS */
test2[0][read+1] = '\0';

You are accessing one byte beyond the end of what was provided. By definition, strlen(str) returns the first number len such that str[len] == '\0'. When you write test2[0][strlen(aux_test)+1] = '\0'; therefore, you are writing one byte beyond the first null in the string. The test2[0][read+1] = '\0'; assignment, assuming you've just read 5 bytes, overwrites test2[0][6], but the last byte of data that was read is in test2[0][4], so you've not changed test2[0][5] (and it isn't clear whether you're allowed to).

test2[0][strlen(aux_test)] = '\0';  // No-op, but safe
test2[0][read] = '\0';              // If you left enough space, null terminates the input
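To "leave enough space" for the copy as well, the stored string can be allocated from the byte count instead of being produced by strdup() on an unterminated buffer. This is a sketch only; dup_counted is a hypothetical helper that does not appear in the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy exactly `count` bytes of `data` into freshly
 * allocated storage and null terminate the copy. `data` itself does not
 * need to be terminated. Returns NULL if allocation fails; the caller
 * must free() the result. */
char *dup_counted(const char *data, size_t count) {
    char *copy = malloc(count + 1); /* one extra byte for the '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, data, count);
    copy[count] = '\0';
    return copy;
}
```

Calling it as dup_counted(my_buffer, read) in place of strdup(my_buffer) would never look beyond the bytes that fread() actually stored.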

#2


1  

In every case, the garbage starts after the 5th byte, as should be expected since #define BUFFER_SIZE 5. Suppose that, after you read in the value, a '\0' were used to null terminate the string within its legal length (5), like this:

my_buffer[strlen(my_buffer)-1]=0; //or since you know its length, my_buffer[4]=0;

That would make the contents of my_buffer a legal string, at the cost of overwriting its last data byte. To actually fix the problem, create my_buffer with more space in the first place, then always terminate with '\0'.
