How do I declare a sufficiently large buffer in C?

Time: 2021-05-10 21:30:14

I want to know how to declare exactly the right amount of storage in C. Whether I use an array or allocate memory with malloc, I have to decide the size in advance. In this situation I declare a very large size to prevent overflow, but an overflow can still happen.

For example:

If I want to split a text file into words, I need to declare a char ** to store the word strings, but I can't know in advance how many words there will be.

If I want to read a file's contents into an array, I need to declare a large buffer to store them:

buffer = malloc(sizeof(char)*1000);

Are there any better or more correct solutions? Thanks.

#include <stdio.h>
#include <stdlib.h>

void read_chars(char * file_name ,char * buffer);

int main(int argc ,char * argv[])
{
    char * buffer ;
    buffer = malloc(sizeof(char)*1000);
    read_chars(argv[1],buffer);
    printf("%s",buffer);
}

void read_chars(char * file_name ,char * buffer)
{
    FILE * input_file ;
    input_file = fopen(file_name,"r");
    int i = 0;
    char ch;
    while((ch = fgetc(input_file)) != EOF)
    {
        *(buffer+i) = ch;
        i++;
    }
    *(buffer+i) = '\0';
    fclose(input_file);
}

2 solutions

#1


4  

The point of a buffer is (usually) to be a fixed size and allow you to read data in chunks. If you are reading a file then you shouldn't hold it all in memory unless you know the size of the file and it's not too big.
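
For the case where the data really does have to sit in memory and its size is not known in advance, one common approach is to start small and grow the buffer with realloc as it fills. The sketch below is only an illustration of that idea: read_all, the starting capacity of 1024 and the doubling policy are assumptions of this example, not something from the question, and it does not guard against the capacity overflowing size_t on enormous inputs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the rest of fp into a heap buffer that doubles whenever it fills.
 * Returns a NUL-terminated buffer the caller must free(), or NULL on
 * allocation failure or read error. *out_len receives the byte count.
 * read_all is a hypothetical helper name. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 1024, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always keep one byte spare for the terminator. */
        size_t got = fread(buf + len, 1, cap - len - 1, fp);
        len += got;
        if (got == 0)
            break;                       /* end of file or read error */
        if (len == cap - 1) {            /* buffer is full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

A caller would open the file, pass the FILE * to read_all, use the returned string, and free it afterwards. Doubling keeps the number of realloc calls logarithmic in the file size, at the cost of up to half the buffer being unused.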

Declare a buffer size, traditionally a power of two, like 2048, and read the file into it in chunks, then run your logic on the chunk each time you read a block. You then use constant memory, can read any size file, and don't have to guess.
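
As a rough illustration of the chunked approach, the sketch below counts newline characters while never holding more than one chunk in memory. The function name count_lines_chunked and the choice of counting lines as the "logic" are assumptions made for the example; only the 2048-byte chunk size comes from the text above.

```c
#include <stdio.h>

#define CHUNK_SIZE 2048   /* a power of two, as suggested above */

/* Counts newline characters in a file while holding at most CHUNK_SIZE
 * bytes in memory. Returns 0 on success, -1 if the file cannot be
 * opened or a read error occurs. Hypothetical helper for illustration. */
int count_lines_chunked(const char *path, unsigned long *lines)
{
    unsigned char chunk[CHUNK_SIZE];
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;

    *lines = 0;
    /* fread returns how many bytes it actually stored; 0 means EOF or error. */
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (chunk[i] == '\n')
                (*lines)++;
        }
    }

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : 0;
}
```

Memory use stays at CHUNK_SIZE bytes no matter how large the file is, which is the property the answer is after.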

A downside is that you may have issues working with items that overlap the boundaries of buffers. You may have to work harder to get your logic to work in these cases.
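
One way to cope with items that straddle a chunk boundary is to carry a little state from one chunk to the next. The sketch below counts whitespace-separated words and keeps an in_word flag alive across reads, so a word split between two chunks is counted once. count_words and the 2048-byte CHUNK_SIZE are assumptions of this example.

```c
#include <ctype.h>
#include <stdio.h>

#define CHUNK_SIZE 2048

/* Counts whitespace-separated words in an open stream, reading it in
 * fixed-size chunks. Hypothetical helper for illustration. */
unsigned long count_words(FILE *fp)
{
    unsigned char chunk[CHUNK_SIZE];
    unsigned long words = 0;
    int in_word = 0;          /* survives from one chunk to the next */
    size_t got;

    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (isspace(chunk[i])) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;  /* first byte of a new word */
                words++;
            }
        }
    }
    return words;
}
```

If the words themselves had to be stored rather than counted, the partial word at the end of a chunk would need to be copied aside and joined with the start of the next chunk, which is the extra work the answer warns about.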

Alternatively, look at mmap to map the whole file into virtual memory (you still have to know how big it is, though, but you can get the file's size up front).
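
A minimal POSIX-only sketch of the mmap route follows. It gets the size with fstat, maps the file read-only, and scans the mapping like an ordinary array. The helper name count_byte_mmap and the byte-counting task are assumptions chosen just to have something to do with the mapped data.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts occurrences of `target` in the file at `path` by mapping it
 * read-only. Returns 0 on success, -1 on failure. POSIX only;
 * count_byte_mmap is a hypothetical helper name. */
int count_byte_mmap(const char *path, unsigned char target, size_t *count)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {      /* get the file size up front */
        close(fd);
        return -1;
    }

    *count = 0;
    if (st.st_size == 0) {          /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < len; i++) {
        if (data[i] == target)
            (*count)++;
    }

    munmap(data, len);
    return 0;
}
```

The operating system pages the file in on demand, so the program does not allocate a buffer of the file's size itself.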

#2


1  

An answer after an accepted answer:

1) A classic attack on systems today is the buffer overrun. If your system can handle 1000 bytes, someone will try 1001. So rather than a solution that can deal with an arbitrarily large buffer, define an upper limit geared to the task. If one is looking for a "name", 1024 bytes should work. See long name. This size should be easy to adjust should the code need rework. Longer values are likely attacks and need not be handled normally. They should be detected and rejected as invalid input instead.
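
A sketch of what "detect and reject" could look like for line-based input is shown below. The helper read_bounded_line and its return convention are assumptions of this example; the caller supplies the buffer and its capacity (1024 for the "name" case above), and the capacity is assumed to fit in an int because fgets takes one.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (capacity cap, including the terminator).
 * Returns 1 on success, 0 at end of input, -1 if the line is too long
 * or cap is unusable. An oversized line is discarded entirely so the
 * next call starts at the following line. Hypothetical helper. */
int read_bounded_line(FILE *in, char *buf, size_t cap)
{
    if (cap < 2)
        return -1;
    if (fgets(buf, (int)cap, in) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        /* complete line: strip the newline */
        return 1;
    }
    if (len < cap - 1)
        return 1;                   /* last line of input, no newline */

    /* The buffer filled up before a newline was seen. */
    int ch = fgetc(in);             /* int, so EOF stays distinguishable */
    if (ch == '\n' || ch == EOF)
        return 1;                   /* the line fit exactly */

    while ((ch = fgetc(in)) != '\n' && ch != EOF)
        ;                           /* discard the rest of the long line */
    buf[0] = '\0';
    return -1;
}
```

Treating an over-long line as an error, instead of silently truncating it or growing the buffer, keeps the limit in one adjustable place and gives the caller a clear signal to reject the input.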

2) Don't miss the forest for the trees. I found it interesting that the OP's code has a classic error. Should fgetc() return the legal value 255 and that value be assigned to ch, which is a char, then on a platform where char is signed ch may compare equal to EOF and stop the loop early. In all this discussion about buffer size, the size of ch was too small.

// char ch;
int ch;
while((ch = fgetc(input_file)) != EOF)

3) read_chars() should have had the buffer size passed to it so the function could use that information: read_chars(argv[1], buffer, 1000).
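
Putting points 2) and 3) together, a possible shape for the revised function is sketched below. The return type, the error convention, and the decision to truncate silently when the file is larger than the buffer are assumptions of this sketch; a caller that must reject oversized files would need an extra check.

```c
#include <stdio.h>

/* Reads at most size - 1 bytes from file_name into buffer and
 * NUL-terminates it. Returns the number of bytes stored, or -1 if the
 * file cannot be opened or size is 0. Input beyond the buffer's
 * capacity is left unread (an assumption of this sketch). */
long read_chars(const char *file_name, char *buffer, size_t size)
{
    if (size == 0)
        return -1;

    FILE *input_file = fopen(file_name, "r");
    if (input_file == NULL)
        return -1;

    size_t i = 0;
    int ch;                                  /* int, not char */
    while (i + 1 < size && (ch = fgetc(input_file)) != EOF)
        buffer[i++] = (char)ch;
    buffer[i] = '\0';

    fclose(input_file);
    return (long)i;
}
```

With this signature the call from main becomes read_chars(argv[1], buffer, 1000), and the loop can no longer write past the end of the allocation.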
