C - scanf() vs gets() vs fgets()

Time: 2021-12-25 01:50:58

I've been writing a fairly simple program that converts a string of characters (assuming only digits are entered) to an integer.

After I was done, I noticed some very peculiar "bugs" that I can't explain, mostly because of my limited knowledge of how the scanf(), gets() and fgets() functions work. (I did read a lot of literature, though.)

So, without writing too much text, here's the code of the program:

#include <stdio.h>

#define MAX 100

int CharToInt(const char *);

int main()
{
    char str[MAX];

    printf(" Enter some numbers (no spaces): ");
    gets(str);
//  fgets(str, sizeof(str), stdin);
//  scanf("%s", str);

    printf(" Entered number is: %d\n", CharToInt(str));

    return 0;
}

int CharToInt(const char *s)
{
    int i, result, temp;

    result = 0;
    i = 0;

    while(*(s+i) != '\0')
    {
        temp = *(s+i) & 15;
        result = (temp + result) * 10;
        i++;
    }

    return result / 10;
}

So here's the problem I've been having. First, when using the gets() function, the program works perfectly.

Second, when using fgets(), the result is slightly wrong, apparently because fgets() also stores the trailing newline character (ASCII value 10) in the buffer, and the conversion loop then treats it as one more digit.

Third, when using the scanf() function, the result is completely wrong because the first character apparently has an ASCII value of -52. For this, I have no explanation.

Now, I know that using gets() is discouraged, so I would like to know whether I can use fgets() here in such a way that it doesn't read (or that I can ignore) the newline character. Also, what's the deal with the scanf() function in this program?

7 Answers

#1


25  

  • Never use gets. It offers no protection against buffer overflow (that is, you cannot tell it how big the buffer you pass to it is, so it cannot prevent a user from entering a line larger than the buffer and clobbering memory).

  • Avoid using scanf. If not used carefully, it can have the same buffer overflow problems as gets. Even ignoring that, it has other problems that make it hard to use correctly.

  • Generally you should use fgets instead, although it's sometimes inconvenient (you have to strip the newline, you must determine a buffer size ahead of time, and then you must figure out what to do with lines that are too long: do you keep the part you read and discard the excess, discard the whole thing, dynamically grow the buffer and try again, etc.). There are some non-standard functions available that do this dynamic allocation for you (e.g. getline on POSIX systems, Chuck Falconer's public domain ggets function). Note that ggets has gets-like semantics in that it strips a trailing newline for you.
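
The chores listed in the last bullet (strip the newline, decide what to do with an over-long line) can be sketched as one small wrapper around fgets. This is only an illustration of the "keep the part you read and discard the excess" policy; read_line() and its return codes are hypothetical names, not something from the answers here, and the FILE * parameter is there only so that the function also works on streams other than stdin.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (capacity `size`,
 * at least 2), strips the trailing newline, and discards whatever part of
 * an over-long line did not fit.
 * Returns 1 if the whole line fit, 0 if it was truncated, -1 on EOF/error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return -1;                      /* nothing read: EOF or I/O error */

    size_t len = strcspn(buf, "\n");    /* index of '\n', or of '\0' if none */
    if (buf[len] == '\n') {
        buf[len] = '\0';                /* complete line: drop the newline */
        return 1;
    }

    /* No newline was stored: the line filled the buffer, or the input
     * ended without one. Look at the next character to tell which. */
    int c = getc(in);
    if (c == EOF || c == '\n')
        return 1;                       /* the text fit; only the newline did not */

    while ((c = getc(in)) != '\n' && c != EOF)
        ;                               /* skip the rest of the long line */
    return 0;
}
```

In the question's program this would be called as read_line(stdin, str, sizeof str), after which str can be passed to the conversion function without a newline in it.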

#2


18  

Yes, you want to avoid gets. fgets will always read the new-line if the buffer is big enough to hold it (which lets you know when the buffer was too small and there's more of the line waiting to be read). If you want something like fgets that won't read the new-line (losing that indication of a too-small buffer), you can use fscanf with a scan-set conversion like "%N[^\n]", where the 'N' is replaced by the buffer size - 1.
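
A minimal sketch of the scan-set conversion, assuming a 100-byte buffer so that N becomes 99. The names read_with_scanset() and LINE_CAP are made up for this example. Two details are worth seeing in code: the conversion leaves the new-line in the stream, and it fails (returns 0) on an empty line, so the buffer is cleared first.

```c
#include <stdio.h>

#define LINE_CAP 100   /* illustrative buffer size */

/* Reads up to LINE_CAP-1 characters of the current line into buf.
 * Returns 1 if text was read, 0 for an empty line, -1 on EOF/error. */
int read_with_scanset(FILE *in, char buf[LINE_CAP])
{
    buf[0] = '\0';                          /* stays empty if nothing matches */
    int n = fscanf(in, "%99[^\n]", buf);    /* 99 == LINE_CAP - 1 */
    if (n == EOF)
        return -1;

    /* The scan-set stops in front of the new-line; consume it here. If the
     * next character is something else, the line was longer than the
     * buffer, so put that character back for the next read. */
    int c = getc(in);
    if (c != '\n' && c != EOF)
        ungetc(c, in);

    return n == 1 ? 1 : 0;
}
```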

One easy (if strange) way to remove the trailing new-line from a buffer after reading with fgets is: strtok(buffer, "\n"); This isn't how strtok is intended to be used, but I've used it this way more often than in the intended fashion (which I generally avoid). Note that it leaves the buffer unchanged when the line consists of nothing but the new-line.
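
For comparison, here is a sketch of another common idiom for the same job, built on strcspn from <string.h>. It is not the technique this answer describes; the helper name strip_newline() is invented for the example. Unlike the strtok call, it also handles a buffer that contains only "\n".

```c
#include <string.h>

/* Hypothetical helper: cuts the string at the first new-line, if any.
 * strcspn returns the length of the prefix that contains no '\n', so the
 * assignment lands either on the new-line or on the existing '\0'. */
void strip_newline(char *line)
{
    line[strcspn(line, "\n")] = '\0';
}
```

Typical use would be strip_newline(str); right after a successful fgets(str, sizeof str, stdin).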

#3


8  

There are numerous problems with this code. We'll fix the badly named variables and functions and investigate the problems:

  • First, CharToInt() should be renamed to the proper StringToInt(), since it operates on a string, not a single character.

  • The function CharToInt() [sic] is unsafe. It doesn't check whether the caller accidentally passes in a NULL pointer.

  • It doesn't validate input or, more correctly, skip invalid input. If the user enters a non-digit, the result will contain a bogus value. E.g. if you enter N, the code *(s+i) & 15 will produce 14 !?

  • Next, the nondescript temp in CharToInt() [sic] should be called digit, since that is what it really is.

  • Also, the kludge return result / 10; is just that -- a bad hack to work around a buggy implementation.

  • Likewise, MAX is badly named, since it may appear to conflict with the common macro of the same name, i.e. #define MAX(x,y) (((x)>(y))?(x):(y))

  • The verbose *(s+i) is not as readable as simply *s. There is no need to clutter up the code with yet another temporary index i.
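
The masking problem from the list above can be reproduced in a few lines. This is only a sketch: digit_value() and masked_value() are hypothetical helper names, and the values 14 and 10 assume an ASCII character set.

```c
#include <ctype.h>

/* What the original loop computes: just the low four bits of the character.
 * In ASCII, 'N' (0x4E) gives 14 and '\n' (0x0A) gives 10. */
int masked_value(char c)
{
    return c & 15;
}

/* Hypothetical validating helper: returns the numeric value of a decimal
 * digit, or -1 for any other character (letters, '\n', punctuation, ...). */
int digit_value(char c)
{
    if (!isdigit((unsigned char)c))   /* cast avoids undefined behaviour for negative char values */
        return -1;
    return c - '0';                   /* '0'..'9' are guaranteed to be contiguous */
}
```

The 10 produced for '\n' is the stray "digit" that shows up in the question's fgets() result.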

gets()

This is bad because it can overflow the input string buffer. For example, if the buffer size is 2 and you enter 16 characters, you will overflow str.

scanf()

With a bare "%s" this is equally bad, because it can overflow the input string buffer.

You mention "when using scanf() function, the result is completely wrong because first character apparently has a -52 ASCII value."

That is due to an incorrect usage of scanf(). I was not able to duplicate this bug.
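
One way to keep scanf() from overflowing the buffer is to give %s a maximum field width and to check the return value. A minimal sketch, assuming a 100-byte buffer as in the question; read_token() and TOKEN_CAP are illustrative names, and the FILE * parameter is used only so that the function can be exercised without a keyboard.

```c
#include <stdio.h>

#define TOKEN_CAP 100   /* assumed buffer size, matching MAX in the question */

/* Reads one whitespace-delimited token of at most TOKEN_CAP-1 characters.
 * Returns 1 on success and 0 if nothing could be read (EOF or error),
 * in which case buf holds the empty string rather than stale bytes. */
int read_token(FILE *in, char buf[TOKEN_CAP])
{
    buf[0] = '\0';
    /* The width 99 has to be written literally and must equal
     * TOKEN_CAP - 1, leaving one byte for the terminating '\0'. */
    return fscanf(in, "%99s", buf) == 1;
}
```

With the question's code this would be read_token(stdin, str), and the conversion would only run when the call returns 1.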

fgets()

This is safe because you can guarantee you never overflow the input string buffer by passing in the buffer size (which includes room for the terminating null character).

getline()

A few people have suggested the POSIX function getline() as a replacement. Unfortunately, this is not a practical portable solution, as Microsoft does not implement a C version, only the standard C++ string template function, as the answers to SO question #27755191 explain. Microsoft's C++ getline() has been available at least as far back as Visual Studio 6, but since the OP is strictly asking about C and not C++, this isn't an option.

Misc.

Lastly, this implementation is buggy in that it doesn't detect integer overflow. If the user enters too large a number, the number may become negative! E.g. 9876543210 will become -18815698?! Let's fix that too.

This is trivial to fix for an unsigned int. If the current partial number is less than the previous partial number, then we have overflowed and we return the previous partial number.

For a signed int this is a little more work. In assembly we could inspect the carry-flag, but in C there is no standard built-in way to detect overflow with signed int math. Fortunately, since we are multiplying by a constant, * 10, we can easily detect this if we use an equivalent equation:

n = x*10 = x*8 + x*2

If x*8 overflows, then logically x*10 will as well. For a 32-bit int, overflow will happen when x*8 = 0x100000000, thus all we need to do is detect when x >= 0x20000000. Since we don't want to assume how many bits an int has, we only need to test whether any of the top 3 msb's (Most Significant Bits) are set.

Additionally, a second overflow test is needed. If the msb (sign bit) is set after the digit concatenation, then we also know the number overflowed.

Code

Here is a fixed safe version along with code that you can play with to detect overflow in the unsafe versions. I've also included both signed and unsigned versions, selected via #define SIGNED 1

#include <stdio.h>
#include <ctype.h> // isdigit()

// 1 fgets
// 2 gets
// 3 scanf
#define INPUT 1

#define SIGNED 1

// re-implementation of atoi()
// Test Case: 2147483647 -- valid    32-bit
// Test Case: 2147483648 -- overflow 32-bit
int StringToInt( const char * s )
{
    int result = 0, prev, msb = (sizeof(int)*8)-1, overflow;

    if( !s )
        return result;

    while( *s )
    {
        if( isdigit( (unsigned char)*s ) ) // Alt.: if ((*s >= '0') && (*s <= '9'))
        {
            prev     = result;
            overflow = result >> (msb-2); // test if top 3 MSBs will overflow on x*8
            result  *= 10;
            result  += *s++ & 0xF;// OPTIMIZATION: *s - '0'

            if( (result < prev) || overflow ) // check if would overflow
                return prev;
        }
        else
            break; // you decide SKIP or BREAK on invalid digits
    }

    return result;
}

// Test case: 4294967295 -- valid    32-bit
// Test case: 4294967296 -- overflow 32-bit
unsigned int StringToUnsignedInt( const char * s )
{
    unsigned int result = 0, prev;

    if( !s )
        return result;

    while( *s )
    {
        if( isdigit( (unsigned char)*s ) ) // Alt.: if (*s >= '0' && *s <= '9')
        {
            prev    = result;
            result *= 10;
            result += *s++ & 0xF; // OPTIMIZATION: += (*s - '0')

            if( result < prev ) // check if would overflow
                return prev;
        }
        else
            break; // you decide SKIP or BREAK on invalid digits
    }

    return result;
}

int main()
{
    int  detect_buffer_overrun = 0;

    #define   BUFFER_SIZE 2    // set to small size to easily test overflow
    char str[ BUFFER_SIZE+1 ]; // C idiom is to reserve space for the null terminator

    printf(" Enter some numbers (no spaces): ");

#if   INPUT == 1
    fgets(str, sizeof(str), stdin);
#elif INPUT == 2
    gets(str); // can overflow; gets() was removed from the language in C11
#elif INPUT == 3
    scanf("%s", str); // can also overflow
#endif

#if SIGNED
    printf(" Entered number is: %d\n", StringToInt(str));
#else
    printf(" Entered number is: %u\n", StringToUnsignedInt(str) );
#endif
    if( detect_buffer_overrun )
        printf( "Input buffer overflow!\n" );

    return 0;
}
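
As an alternative sketch (not the approach used in the listing above), the overflow test can be done before the multiplication by comparing against INT_MAX from <limits.h>. This avoids relying on how a signed int wraps, which the C standard leaves undefined. On a typical build with a 32-bit int, an input such as 5000000000 appears to slip past both tests in the listing (the wrapped value 705032704 is larger than the previous partial result 500000000), and that is the kind of case the comparison below is meant to catch. parse_int_checked() and its out parameter are hypothetical names chosen for this illustration.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: converts the leading decimal digits of s.
 * Stores the value in *out and returns 1 on success; returns 0 if s or out
 * is NULL, if s does not start with a digit, or if the value would not fit
 * in an int. Parsing stops at the first non-digit (for example the '\n'
 * that fgets() leaves in the buffer). */
int parse_int_checked(const char *s, int *out)
{
    if (s == NULL || out == NULL)
        return 0;
    if (!isdigit((unsigned char)*s))
        return 0;

    int result = 0;
    for (; isdigit((unsigned char)*s); ++s) {
        int digit = *s - '0';
        /* result * 10 + digit <= INT_MAX  is equivalent to
         * result <= (INT_MAX - digit) / 10, which cannot itself overflow. */
        if (result > (INT_MAX - digit) / 10)
            return 0;
        result = result * 10 + digit;
    }

    *out = result;
    return 1;
}
```

Reporting failure through the return value, instead of returning the previous partial number, lets the caller tell a rejected input apart from a legitimately small one.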

#4


4  

You're correct that you should never use gets. If you want to use fgets, you can simply overwrite the newline:

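char *result = fgets(str, sizeof(str), stdin);
size_t len = (result != NULL) ? strlen(str) : 0; // strlen() needs <string.h>
if(len > 0 && str[len - 1] == '\n')
{
  str[len - 1] = '\0';
}
else
{
  // handle error: read failed, or no newline was stored (line too long or EOF)
}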

This does assume there are no embedded null characters. Another option is POSIX getline:

char *line = NULL;
size_t len = 0;
ssize_t count = getline(&line, &len, stdin);
if(count >= 1 && line[count - 1] == '\n')
{
  line[count - 1] = '\0';
}
else
{
  // Handle error
}

The advantage of getline is that it does allocation and reallocation for you, it handles possible embedded null characters, and it returns the count so you don't have to waste time with strlen. Note that you can't use an array with getline. The pointer must be NULL or point to memory obtained from malloc, and you should free it when you are done.

I'm not sure what issue you're having with scanf.

#5


3  

Never use gets(); it can lead to unpredictable overflows. If your string array is of size 1000 and I enter 1001 characters, I can overflow your program's buffer.

#6


1  

Try using fgets() with this modified version of your CharToInt() (it needs #include <ctype.h> for isdigit()):

int CharToInt(const char *s)
{
    int i, result, temp;

    result = 0;
    i = 0;

    while(*(s+i) != '\0')
    {
        if (isdigit((unsigned char)*(s+i)))
        {
            temp = *(s+i) & 15;
            result = (temp + result) * 10;
        }
        i++;
    }

    return result / 10;
}

It essentially keeps the digits and ignores everything else, including the trailing newline that fgets() stores. This is very crude, so modify it to taste.

#7


-2  

So I am not much of a programmer, but let me try to answer your question about scanf(). I think scanf() is pretty fine, and I use it for almost everything without any issues. The usual structure is:

char str[MAX];
printf("Enter some text: ");
scanf("%s", str);

For a scalar variable, the "&" in front of the variable is important: it tells scanf() where (in which variable) to save the scanned value. An array name such as str already converts to a pointer to its first element, so it is passed without "&". Calling fflush(stdin) to clear leftover input works on some compilers, but the C standard only defines fflush() for output streams, so it is not portable, and it does nothing to prevent a buffer overflow.

The difference between scanf("%s") and gets()/fgets() is that scanf("%s") only scans until the first whitespace character, while gets() and fgets() read the whole line (fgets() up to the buffer size you give it). Whatever scanf() did not consume stays in the input stream for the next read.
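
A portable way to drop leftover input, instead of fflush(stdin) (whose effect on input streams the C standard leaves undefined), is to read and discard characters up to the end of the line. A minimal sketch; discard_line() is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: reads and discards characters up to and including
 * the next newline (or up to end of input). Returns the number of
 * characters discarded, not counting the newline itself. */
int discard_line(FILE *in)
{
    int c, dropped = 0;
    while ((c = getc(in)) != '\n' && c != EOF)
        dropped++;
    return dropped;
}
```

Calling discard_line(stdin) after a scanf("%s", ...) leaves the stream positioned at the start of the next line, so a following fgets() does not pick up the remainder of the current one.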
