Seeking and reading large files in a Linux C++ application

Date: 2022-01-01 01:50:43

I am running into integer overflow using the standard ftell and fseek functions in G++. I assumed 64-bit variants would solve this, but I guess I was mistaken, because it seems that ftell64 and fseek64 are not available. I have been searching, and many websites reference using lseek with the off64_t datatype, but I have not found any examples of something equivalent to fseek. Right now the files I am reading are 16GB+ CSV files, and I expect them to at least double in size.

Without any external libraries, what is the most straightforward way to get something similar to the fseek/ftell pair? My application currently builds with the standard GCC/G++ 4.x libraries.

5 Answers

#1


24  

The 64-bit seek functions are C functions; glibc names them fseeko64 and ftello64 rather than fseek64 and ftell64. To get 64-bit offsets, define _FILE_OFFSET_BITS=64 before including the system headers. That will more or less define fseeko to actually be fseeko64, while plain fseek still takes a long. Or do it in the compiler arguments, e.g. gcc -D_FILE_OFFSET_BITS=64 ....

http://www.suse.de/~aj/linux_lfs.html has a great overview of large file support on Linux:

  • Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all file access calls to use the 64 bit variants. Several types change also, e.g. off_t becomes off64_t. It's therefore important to always use the correct types and to not use e.g. int instead of off_t. For portability with other platforms you should use getconf LFS_CFLAGS which will return -D_FILE_OFFSET_BITS=64 on Linux platforms but might return something else on e.g. Solaris. For linking, you should use the link flags that are reported via getconf LFS_LDFLAGS. On Linux systems, you do not need special link flags.
  • Define _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE. With these defines you can use the LFS functions like open64 directly.
  • Use the O_LARGEFILE flag with open to operate on large files.
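
To illustrate the first bullet, here is a minimal sketch, assuming a Linux system with glibc and a writable /tmp. The helper name seek_in_scratch_file is hypothetical and not part of any library; it only shows that, with _FILE_OFFSET_BITS set to 64, off_t is 8 bytes wide and lseek() accepts offsets past 2 GB.

```cpp
// Must come before any system header so that off_t is 64 bits wide.
// Equivalent to passing -D_FILE_OFFSET_BITS=64 on the command line.
#define _FILE_OFFSET_BITS 64

#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

static_assert(sizeof(off_t) == 8,
              "off_t should be 64 bits when _FILE_OFFSET_BITS is 64");

// Hypothetical helper (assumption, not a library function): seeks to
// `target` in an empty scratch file and returns the offset reported by
// lseek(), or -1 on failure. Nothing is written, so no disk space is used.
off_t seek_in_scratch_file(off_t target) {
    assert(target >= 0);
    char path[] = "/tmp/lfs_demo_XXXXXX";  // assumes /tmp is writable
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                             // file vanishes once fd is closed
    off_t pos = lseek(fd, target, SEEK_SET);  // seeking past EOF is allowed
    close(fd);
    return pos;
}
```

Calling seek_in_scratch_file with an offset of 5 GiB should hand back the same offset. If the define were dropped on a 32-bit build, the static_assert would be expected to fail at compile time, which is a cheap way to catch a missing flag.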

#2


9  

If you want to stick to ISO C standard interfaces, use fgetpos() and fsetpos(). However, these functions are only useful for saving a file position and going back to the same position later. They represent the position using the type fpos_t, which is not required to be an integer data type. For example, on a record-based system it could be a struct containing a record number and offset within the record. This may be too limiting.
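
A minimal sketch of that save-and-restore pattern, using only ISO C calls through <cstdio>. The helper name reread_same_char is hypothetical; it exists only to show that fpos_t is treated as an opaque bookmark and never used in arithmetic.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (assumption, not a library function): bookmarks the
// current position, reads one character, jumps back to the bookmark, and
// reads again. Returns true when both reads yield the same character.
bool reread_same_char(std::FILE* fp) {
    assert(fp != nullptr);
    std::fpos_t saved;
    if (std::fgetpos(fp, &saved) != 0) return false;  // remember where we are
    int first = std::fgetc(fp);
    if (std::fsetpos(fp, &saved) != 0) return false;  // go back to the bookmark
    int second = std::fgetc(fp);
    return first != EOF && first == second;
}
```

Note that the code never inspects or modifies saved; that is exactly the limitation mentioned above.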

POSIX defines the functions ftello() and fseeko(), which represent the position using the off_t type. This is required to be an integer type, and the value is a byte offset from the beginning of the file. You can perform arithmetic on it, and can use fseeko() to perform relative seeks. This will work on Linux and other POSIX systems.
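
As a rough sketch of a relative seek with these two functions, assuming glibc on Linux: skip_forward is a hypothetical helper name, and the define at the top stands in for the -D_FILE_OFFSET_BITS=64 compiler flag.

```cpp
// Stands in for -D_FILE_OFFSET_BITS=64; must precede the system headers.
#define _FILE_OFFSET_BITS 64

#include <cassert>
#include <stdio.h>      // fseeko() and ftello() are POSIX additions to stdio
#include <sys/types.h>  // off_t

// Hypothetical helper (assumption, not a library function): moves `delta`
// bytes relative to the current position and returns the new absolute
// byte offset, or -1 if the seek fails.
off_t skip_forward(FILE* fp, off_t delta) {
    assert(fp != nullptr);
    if (fseeko(fp, delta, SEEK_CUR) != 0) return -1;
    return ftello(fp);
}
```

Because off_t is an ordinary integer type, the returned offsets can be added, subtracted, and stored in an index, unlike fpos_t values.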

In addition, compile with -D_FILE_OFFSET_BITS=64 (Linux/Solaris). This defines off_t as a 64-bit type (i.e. off64_t) instead of long, and redefines the functions that use file offsets to be the versions that take 64-bit offsets. When you compile for a 64-bit target, off_t is already 64 bits wide, so the flag is not needed in that case.

#3


5  

fseek64() isn't standard; the compiler docs should tell you where to find it.

Have you tried fgetpos and fsetpos? They're designed for large files and the implementation typically uses a 64-bit type as the base for fpos_t.

#4


3  

Have you tried fseeko() with the _FILE_OFFSET_BITS preprocessor symbol set to 64?

This will give you an fseek()-like interface but with an offset parameter of type off_t instead of long. Setting _FILE_OFFSET_BITS=64 will make off_t a 64-bit type.

The same goes for ftello().
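
For the CSV use case in the question, one possible sketch (not the only way to do it) is to record where each line starts with ftello() and jump back later with fseeko(). The helper names index_line_starts and read_line_at are hypothetical, and the code assumes glibc on Linux.

```cpp
// Stands in for -D_FILE_OFFSET_BITS=64; must precede the system headers.
#define _FILE_OFFSET_BITS 64

#include <cassert>
#include <stdio.h>
#include <string>
#include <sys/types.h>
#include <vector>

// Hypothetical helper (assumption): returns the byte offset at which each
// line of the file starts.
std::vector<off_t> index_line_starts(FILE* fp) {
    assert(fp != nullptr);
    std::vector<off_t> starts;
    rewind(fp);
    for (;;) {
        off_t start = ftello(fp);  // offset of the line about to be read
        int c = fgetc(fp);
        if (c == EOF || start == -1) break;
        starts.push_back(start);
        while (c != '\n' && c != EOF) c = fgetc(fp);  // consume rest of line
        if (c == EOF) break;
    }
    return starts;
}

// Hypothetical helper (assumption): seeks to a recorded offset and returns
// that line without its trailing newline.
std::string read_line_at(FILE* fp, off_t start) {
    std::string line;
    if (fseeko(fp, start, SEEK_SET) != 0) return line;
    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        line.push_back(static_cast<char>(c));
    }
    return line;
}
```

Reading byte by byte keeps the sketch short; for 16GB+ files a buffered line reader would likely be preferable, with the same ftello()/fseeko() calls around it.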

#5


2  

Use fsetpos(3) and fgetpos(3). They use the fpos_t datatype, which I believe is guaranteed to be able to hold at least 64 bits.
