Removing special characters from an fscanf string in C

Time: 2022-01-22 17:06:21

I'm currently using the following code to scan each word in a text file, put it into a variable, and then do some manipulations with it before moving on to the next word. This works fine, but I'm trying to remove all characters that don't fall within A-Z / a-z; e.g. if "he5llo" was entered, I want the output to be "hello". If I can't modify fscanf to do it, is there a way of doing it to the variable once scanned? Thanks.

while (fscanf(inputFile, "%s", x) == 1)

5 answers

#1


3  

You can pass x to a function like this. First, a simple version for the sake of understanding:

// header needed for isalpha()
#include <ctype.h>

void condense_alpha_str(char *str) {
  int source = 0; // index of copy source
  int dest = 0; // index of copy destination

  // loop until original end of str reached
  while (str[source] != '\0') {
    // cast to unsigned char: isalpha() is undefined for negative char values
    if (isalpha((unsigned char)str[source])) {
      // keep only chars matching isalpha()
      str[dest] = str[source];
      ++dest;
    }
    ++source; // advance source always, whether char was copied or not
  }
  str[dest] = '\0'; // add new terminating 0 byte, in case string got shorter
}

It goes through the string in place, copying chars which pass the isalpha() test and skipping, and thus removing, those which do not. To understand the code, it's important to realize that C strings are just char arrays, with a byte value of 0 marking the end of the string. Another important detail is that in C, arrays and pointers behave the same in many (not all!) ways, so a pointer can be indexed just like an array. Also, this simple version rewrites every kept byte in the string, even when the string doesn't actually change.
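
Since a pointer can stand in for an index, the same in-place copy can also be written with two pointers. This is only a sketch of that idea, not part of the answer's code; the name strip_non_alpha and the returned length are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: keeps only alphabetic chars in str, in place,
   and returns the length of the shortened string. */
size_t strip_non_alpha(char *str) {
    assert(str != NULL);
    char *dest = str;                        /* next position to write to */
    for (const char *src = str; *src != '\0'; ++src) {
        /* cast: isalpha() is undefined for negative char values */
        if (isalpha((unsigned char)*src)) {
            *dest++ = *src;                  /* keep this char */
        }
    }
    *dest = '\0';                            /* terminate the shortened string */
    return (size_t)(dest - str);
}
```

Called on a buffer holding "he5llo", this leaves "hello" in the buffer and returns 5.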

Then a more full-featured version, which takes the filter function as a parameter, only writes to memory if str changes, and returns a pointer to str like most library string functions do:

char *condense_str(char *str, int (*filter)(int)) {

  int source = 0; // index of character to copy

  // optimization: skip initial matching chars
  // (the '\0' check stops filters that accept '\0' from running past the end)
  while (str[source] && filter((unsigned char)str[source])) {
    ++source; 
  }
  // source is now index of first non-matching char or end-of-string

  // optimization: only do condense loop if not at end of str yet
  if (str[source]) { // '\0' is same as false in C

    // start condensing the string from first non-matching char
    int dest = source; // index of copy destination
    do {
      if (filter((unsigned char)str[source])) {
        // keep only chars matching given filter function
        str[dest] = str[source];
        ++dest;
      }
      ++source; // advance source always, whether char was copied or not
    } while (str[source]);
    str[dest] = '\0'; // add terminating 0 byte to match condensed string

  }

  // follow convention of strcpy, strcat etc, and return the string
  return str;
}

Example filter function:

// takes int so that it matches the int (*)(int) filter parameter
int isNotAlpha(int ch) {
    return !isalpha(ch);
}

Example calls:

char sample[] = "1234abc";
condense_str(sample, isalpha); // use a library function from ctype.h
// note: return value ignored, it's just convenience not needed here
// sample is now "abc"
condense_str(sample, isNotAlpha); // use custom function
// sample is now "", empty

// fscanf code from question, with buffer overrun prevention
char x[100];
while (fscanf(inputFile, "%99s", x) == 1) {
  condense_str(x, isalpha); // x modified in-place
  ...
}
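
Because the filter is just an int (*)(int) function, the rule for what counts as "keep" can be swapped without touching the loop. The sketch below assumes a made-up rule (letters plus apostrophes, so "don't" survives) and uses a compact stand-in called keep_if rather than the condense_str above; both names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical filter: accepts letters and the apostrophe. */
int is_word_char(int ch) {
    return isalpha(ch) || ch == '\'';
}

/* Hypothetical stand-in for a condense-style function:
   keeps only the chars for which keep() returns nonzero. */
char *keep_if(char *str, int (*keep)(int)) {
    assert(str != NULL && keep != NULL);
    size_t dest = 0;
    for (size_t src = 0; str[src] != '\0'; ++src) {
        /* convert to unsigned char before handing the value to a ctype-style filter */
        unsigned char ch = (unsigned char)str[src];
        if (keep(ch)) {
            str[dest++] = str[src];
        }
    }
    str[dest] = '\0';
    return str;
}
```

With this filter, a buffer holding "don't5 stop!" ends up as "don'tstop"; passing isalpha instead turns "he5llo" into "hello".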

Reference, from the manual for int isalpha ( int c ):

Checks whether c is an alphabetic letter.
Return Value:
A value different from zero (i.e., true) if indeed c is an alphabetic letter. Zero (i.e., false) otherwise
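
Two details of that description matter in practice: the result for a letter is only guaranteed to be nonzero, not 1, and the argument must be representable as unsigned char (or be EOF). A small sketch under those assumptions, with count_alpha as a made-up helper name:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the alphabetic characters in s. */
size_t count_alpha(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Compare against 0 rather than 1: isalpha() may return any
           nonzero value for a letter. The cast avoids undefined
           behaviour when plain char is signed and the value is negative. */
        if (isalpha((unsigned char)*s) != 0) {
            ++n;
        }
    }
    return n;
}
```

For "he5llo" this counts 5 letters; for "1234" it counts 0.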

#2


1  

luser droog's answer will work, but in my opinion it is more complicated than necessary.

For your simple example you could try this:

while (fscanf(inputFile, "%[A-Za-z]", x) == 1) {   // read letters until a non-alpha character is found
   fscanf(inputFile, "%*[^A-Za-z]");  // discard non-alpha characters and continue
}
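
One thing to be aware of with this loop: each pass yields one run of letters, so "he5llo" arrives as "he" and then "llo" rather than as "hello". The sketch below shows that behaviour with sscanf on a string, so it can be tried without a file. The function name, TOKEN_SZ and the %n bookkeeping are assumptions for illustration, and the A-Z range syntax inside a scanset is implementation-defined in standard C, although common libraries support it.

```c
#include <assert.h>
#include <stdio.h>

enum { TOKEN_SZ = 32 };  /* assumed maximum token size, including '\0' */

/* Hypothetical demo: collects runs of letters from line into tokens[],
   using the same two scansets as the fscanf loop. Returns the token count. */
int split_alpha_runs(const char *line, char tokens[][TOKEN_SZ], int max_tokens) {
    assert(line != NULL && tokens != NULL);
    const char *p = line;
    int count = 0;
    int n = 0;

    /* Skip leading non-letters; n stays 0 if nothing matches. */
    sscanf(p, "%*[^A-Za-z]%n", &n);
    p += n;

    while (count < max_tokens) {
        n = 0;
        /* Width 31 leaves room for the '\0' in a 32-byte token. */
        if (sscanf(p, "%31[A-Za-z]%n", tokens[count], &n) != 1) {
            break;                       /* no more letters */
        }
        p += n;
        ++count;

        n = 0;
        sscanf(p, "%*[^A-Za-z]%n", &n);  /* discard the non-letters that follow */
        p += n;
    }
    return count;
}
```

For the input "he5llo world" this produces three tokens: "he", "llo" and "world". If the pieces of one word need to be joined, the filtering approach from the first answer is the better fit.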

#3


0  

You can use the isalpha() function to check each character contained in the string.

#4


0  

I'm working on a similar project, so you're in good hands! Strip the word down into separate parts.

Blank spaces aren't an issue when you read each word with cin. You can use a check like this:

 if( !ispunct((unsigned char)x[i]) )

Increase the index by 1, and add that new string to a temporary string holder. You can select characters in a string like an array, so finding those non-alpha characters and storing the new string is easy.

 string x = "hell5o"     // loop through until you find a non-alpha & mark that pos
 for( i = 0; i <= pos-1; i++ )
                                    // store the different parts of the string
 string tempLeft = ...    // make loops up to and after the position of non-alpha character
 string tempRight = ... 

#5


0  

The scanf family of functions won't do this. You'll have to loop over the string and use isalpha to check each character, then "remove" each unwanted character with memmove by copying the end of the string forward.
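
A minimal sketch of that memmove approach, assuming a NUL-terminated, writable buffer; the function name is made up for illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: deletes every non-alphabetic char from str by
   shifting the rest of the string one position to the left. */
void remove_non_alpha_memmove(char *str) {
    assert(str != NULL);
    size_t i = 0;
    while (str[i] != '\0') {
        if (isalpha((unsigned char)str[i])) {
            ++i;                          /* keep this char, move on */
        } else {
            /* Copy the tail, including its '\0', over position i.
               memmove is used because source and destination overlap.
               i is not advanced: a new char now sits at position i. */
            memmove(&str[i], &str[i + 1], strlen(&str[i + 1]) + 1);
        }
    }
}
```

This shifts the tail once per removed character, so it does more copying than the two-index loop from the first answer when many characters have to go.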

Maybe scanf can do it after all. Under most circumstances, scanf and friends will push any non-whitespace characters back onto the input stream if they fail to match.

This example uses scanf as a regex-like filter on the stream. Using the * conversion modifier (assignment suppression) means there's no storage destination for the negated pattern; it just gets eaten.

#include <stdio.h>
#include <string.h>

int main(void){
    enum { BUF_SZ = 80 };   // buffer size in one place
    char buf[BUF_SZ] = "";
    char fmtfmt[] = "%%%d[A-Za-z]";  // format string for the format string
    char fmt[sizeof(fmtfmt) + 3];    // storage for the real format string
    char nfmt[] = "%*[^A-Za-z]";     // negated pattern

    char *p = buf;                               // initialize the pointer
    int room = BUF_SZ - 1;                       // free chars, keeping one byte for '\0'
    while( room > 0 ){                           // a field width of 0 would be invalid
        sprintf(fmt, fmtfmt, room);              // (re)build the format string
        //printf("%s",fmt);
        if( scanf(fmt,p) == EOF ) break;         // scan for format into buffer via pointer
        room -= (int)strlen(p);                  // account for the letters just stored
        p += strlen(p);                          // adjust pointer
        if( scanf(nfmt) == EOF ) break;          // scan for negated format
    }
    printf("%s\n",buf);
    return 0;
}
