Reading from an ifstream won't read whitespace

Date: 2023-02-06 12:25:52

I'm implementing a custom lexer in C++ and when attempting to read in whitespace, the ifstream won't read it out. I'm reading character by character using >>, and all the whitespace is gone. Is there any way to make the ifstream keep all the whitespace and read it out to me? I know that when reading whole strings, the read will stop at whitespace, but I was hoping that by reading character by character, I would avoid this behaviour.

Attempted: .get(), recommended by many answers, but it has the same effect as std::noskipws, that is, I get all the spaces now, but not the new-line character that I need to lex some constructs.

Here's the offending code (extended comments truncated)

while(input >> current) {
    always_next_struct val = always_next_struct(next);
    if (current == L' ' || current == L'\n' || current == L'\t' || current == L'\r') {
        continue;
    }
    if (current == L'/') {
        input >> current;
        if (current == L'/') {
            // explicitly empty while loop
            while(input.get(current) && current != L'\n');
            continue;
        }

I'm breaking on the while line and looking at every value of current as it comes in, and \r or \n are definitely not among them; the input just skips to the next line in the input file.

8 answers

#1


15  

There is a manipulator to disable the whitespace skipping behavior:

stream >> std::noskipws;
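
To make that concrete, here is a minimal sketch of reading character by character with operator>> once skipping is turned off. The helper name read_all_chars is made up for illustration, and it uses narrow char streams rather than the wide streams from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the question): collects every character
// that operator>> yields once whitespace skipping has been switched off.
std::string read_all_chars(std::istream& in) {
    std::string out;
    char c;
    in >> std::noskipws;   // keep spaces, tabs and newlines
    while (in >> c) {
        out += c;
    }
    return out;
}
```

Keep in mind that std::noskipws sets a flag on the stream itself, so it stays in effect for later extractions until you apply std::skipws again.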

#2


7  

The operator>> eats whitespace (space, tab, newline). Use yourstream.get() to read each character.

Edit:

Beware: platforms (Windows, Un*x, Mac) differ in how they encode a newline. It can be '\n', '\r', or both ('\r\n'). What you see also depends on how you open the file stream (text or binary).
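
If the lexer may see files from several platforms, one option is to normalize line endings before lexing. The following is only a sketch under that assumption; normalize_newlines is a hypothetical helper and it works on a narrow std::string that has already been read from the file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites "\r\n" and a lone "\r" as "\n",
// so the lexer only ever has to recognise '\n'.
std::string normalize_newlines(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            out += '\n';
            // Swallow the '\n' half of a "\r\n" pair.
            if (i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
        } else {
            out += text[i];
        }
    }
    return out;
}
```

This matters mostly when the file is opened in binary mode or was written on a different platform; in text mode the runtime may already translate the native line ending to '\n'.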

Edit (analyzing code):

After

  while(input.get(current) && current != L'\n');
  continue;

there will be a \n in current, unless end of file was reached. You then continue with the outermost while loop, where the first character on the next line is read into current. Is that not what you wanted?

I tried to reproduce your problem (using char and cin instead of wchar_t and wifstream):

//: get.cpp : compile, then run: get < get.cpp

#include <iostream>

int main()
{
  char c;

  while (std::cin.get(c))
  {
    if (c == '/') 
    { 
      char last = c; 
      if (std::cin.get(c) && c == '/')
      {
        // std::cout << "Read to EOL\n";
        while(std::cin.get(c) && c != '\n'); // this comment will be skipped
        // std::cout << "go to next line\n";
        std::cin.putback(c);
        continue;
      }
      else { std::cin.putback(c); c = last; }
    }
    std::cout << c;
  }
  return 0;
}

This program, applied to itself, eliminates all C++ line comments in its output. The inner while loop doesn't eat up all text to the end of file. Please note the putback(c) statement. Without that the newline would not appear.
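
A variation on the same idea, as a rough sketch: instead of consuming the newline and putting it back, look ahead with peek() so the newline is never consumed by the comment-skipping loop. The function name strip_line_comments is hypothetical, and the sketch takes any std::istream and std::ostream so it is not tied to cin and cout.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out`, dropping "//" line comments
// but keeping the newline that ends each comment.
void strip_line_comments(std::istream& in, std::ostream& out) {
    const int eof = std::char_traits<char>::eof();
    char c;
    while (in.get(c)) {
        if (c == '/' && in.peek() == '/') {
            // Skip up to, but not including, the newline (or end of input).
            while (in.peek() != eof && in.peek() != '\n') {
                in.get(c);
            }
            continue;
        }
        out << c;
    }
}
```

Like the program above, this does not know about string literals, so a "//" inside a quoted string would also be treated as the start of a comment.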

If it doesn't work the same way for wifstream, that would be very strange, except for one possible reason: the opened text file is not actually saved with 16-bit characters, so the \n ends up in the wrong byte...

#3


4  

Wrap the stream (or its buffer, specifically) in a std::istreambuf_iterator? That should ignore all formatting, and also give you a nice iterator interface.
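
As an illustration of walking the buffer one character at a time, here is a small sketch using std::istreambuf_iterator. The helper count_newlines is made up for this example and uses narrow characters; the point is only that the newlines show up during the traversal.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>

// Hypothetical helper: counts '\n' characters by iterating over the
// stream buffer directly, which bypasses whitespace skipping entirely.
std::size_t count_newlines(std::istream& in) {
    std::size_t n = 0;
    std::istreambuf_iterator<char> it(in), end;
    for (; it != end; ++it) {
        if (*it == '\n') {
            ++n;
        }
    }
    return n;
}
```

A lexer could use the same loop shape and dispatch on *it instead of counting.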

Alternatively, a much more efficient and foolproof approach might be to just use the Win32 API (or Boost) to memory-map the file. Then you can traverse it using plain pointers, and you're guaranteed that nothing will be skipped or converted by the runtime.

#4


2  

The stream extractors (operator>>) all behave the same way and skip whitespace.

If you want to read every byte, you can use the unformatted input functions, like stream.get(c).

#5


2  

Why not simply use getline?

You will get all the whitespace, and while you won't get the end-of-line characters, you will still know where they are :)
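
A possible way to apply this, sketched with made-up names (Piece and split_keeping_newlines are not from the question): read line by line and record an explicit newline marker after each line that was actually terminated by one.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token type for illustration: a line of text or a newline marker.
struct Piece {
    bool is_newline;
    std::string text;
};

std::vector<Piece> split_keeping_newlines(std::istream& in) {
    std::vector<Piece> pieces;
    std::string line;
    while (std::getline(in, line)) {
        pieces.push_back({false, line});
        // getline sets eofbit when input ran out before a '\n' was found,
        // so an unterminated final line gets no newline marker.
        if (!in.eof()) {
            pieces.push_back({true, "\n"});
        }
    }
    return pieces;
}
```

All spaces and tabs inside each line are preserved, since getline only removes the delimiter.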

#6


2  

You could open the stream in binary mode:

std::wifstream stream(filename, std::ios::binary);

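Binary mode turns off the platform's newline translation, so characters such as '\r' reach you exactly as they are stored in the file. Note that operator>> still skips whitespace in binary mode unless you also use std::noskipws or an unformatted read such as get().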

The other option is to read the entire stream into a string and then process the string:

std::wostringstream ss;
ss << filestream.rdbuf();

Of course, getting the string from the ostringstream requires an additional copy of the string, so you could consider changing this at some point to use a custom stream if you feel adventurous. EDIT: someone else mentioned istreambuf_iterator, which is probably a better way of doing it than reading the whole stream into a string.
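
For completeness, a small sketch of the read-everything-into-a-string route. The helper name slurp is hypothetical, and it uses narrow char streams; the wide-stream version from the snippet above would follow the same pattern with std::wostringstream and std::wstring.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads everything left in `in` into one string.
std::string slurp(std::istream& in) {
    std::ostringstream ss;
    ss << in.rdbuf();   // copies the buffer contents; no whitespace skipping
    return ss.str();    // this is the extra copy mentioned above
}
```

After that the lexer can index into the string directly, and every space, tab and newline that was in the stream is still there.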

#7


0  

You could just wrap the stream in a std::istreambuf_iterator to get the data with all whitespace and newlines, like this:

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

int main() {
    // Open the stream in default (text) mode.
    std::ifstream myfile("myfile.txt");

    if (myfile.good()) {
        // Read every character, whitespace included, via stream-buffer iterators.
        std::vector<char> buf((std::istreambuf_iterator<char>(myfile)),
                              std::istreambuf_iterator<char>());

        // str_buf holds all the data, including whitespace and newlines.
        std::string str_buf(buf.begin(), buf.end());

        myfile.close();
    }
}

#8


-3  

I ended up just cracking open the Windows API and using it to read the whole file into a buffer first, and then reading that buffer character by character. Thanks guys.
