How do I split a large file?

Date: 2023-01-14 22:30:02

I have a large CSV file (7.3 GB, 16,300,000 lines). How can I split it into two files?

2 answers

#1

Have you looked at the split command? See its man page, split(1), for more information.

This page contains an example use of this command.
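As a minimal sketch of that approach for the two-file case in the question, the function below counts the lines and then calls split -l with half of them, rounded up. The name split_in_two is hypothetical, not a standard command; the sketch assumes POSIX split and wc and an input file that ends with a newline.

```shell
# Hypothetical helper: split a file into two pieces with (nearly) equal line counts.
# Usage: split_in_two INPUT PREFIX   -> creates PREFIXaa and PREFIXab
# Assumes INPUT ends with a newline, so `wc -l` counts every line.
split_in_two() {
    in_file=$1
    out_prefix=$2
    line_total=$(wc -l < "$in_file")                  # total number of lines
    lines_per_part=$(( (line_total + 1) / 2 ))        # round up: never a third piece
    [ "$lines_per_part" -gt 0 ] || lines_per_part=1   # split rejects -l 0
    # -l N puts at most N whole lines in each output file
    split -l "$lines_per_part" -- "$in_file" "$out_prefix"
}
```

For the 16,300,000-line file in the question, half is 8,150,000 lines, so this amounts to running split -l 8150000 filename.csv part_, which writes part_aa and part_ab. Splitting on lines rather than bytes (-b) keeps every CSV row intact, as long as no quoted field contains an embedded newline.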

Aside:

The man -k command is quite useful for finding Unix/Linux commands when you aren't sure what the exact command is. Give man -k a keyword and the system will list related commands. For example,

% man -k split

will yield:

csplit (1)           - split a file into sections determined by context lines
dirsplit (1)         - splits directory into multiple with equal size
dpkg-split (1)       - Debian package archive split/join tool
gpgsplit (1)         - Split an OpenPGP message into packets
pnmsplit (1)         - split a multi-image portable anymap into multiple single-image files
ppmtoyuvsplit (1)    - convert a portable pixmap into 3 subsampled raw YUV files
split (1)            - split a file into pieces
splitdiff (1)        - separate out incremental patches
splitfont (1)        - extract characters from an ISO-type font.
URI::Split (3pm)     - Parse and compose URI strings
wcstok (3)           - split wide-character string into tokens
yuvsplittoppm (1)    - convert a Y- and a U- and a V-file into a portable pixmap
zipsplit (1)         - split a zipfile into smaller zipfiles

#2

split -d -n l/N filename.csv tempfile.part.

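This splits the file into N files without breaking any line; -d gives the pieces numeric suffixes (tempfile.part.00, tempfile.part.01, ...), and N=2 gives the two files asked for. As mentioned in the comments above, the header is not repeated in each file: only the first piece contains it.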
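If every piece needs its own header, one option is to split only the data rows and prepend the header line to each piece. The function below is a hypothetical helper (split_csv_keep_header is not a standard command), a sketch assuming GNU coreutils (for split -d) and an input file that ends with a newline. It computes a per-piece row count and uses split -l instead of -n l/N, so pieces have almost equal row counts rather than almost equal byte sizes.

```shell
# Hypothetical helper: split a CSV into N pieces, repeating the header in each.
# Usage: split_csv_keep_header INPUT N PREFIX  -> PREFIX00, PREFIX01, ...
split_csv_keep_header() {
    csv_in=$1        # input CSV; line 1 is the header
    csv_parts=$2     # number of output pieces (N)
    csv_prefix=$3    # prefix for output file names
    # Data rows = all lines except the header (assumes a trailing newline).
    csv_rows=$(( $(wc -l < "$csv_in") - 1 ))
    # Rows per piece, rounded up so that at most N pieces are produced.
    csv_per=$(( (csv_rows + csv_parts - 1) / csv_parts ))
    [ "$csv_per" -gt 0 ] || csv_per=1   # split rejects -l 0
    # Split the data rows only; -d gives numeric suffixes (00, 01, ...).
    tail -n +2 -- "$csv_in" | split -d -l "$csv_per" - "${csv_prefix}body."
    # Prepend the header to each temporary body file, then delete it.
    for csv_body in "${csv_prefix}body."*; do
        [ -e "$csv_body" ] || continue  # glob matched nothing (no data rows)
        csv_out="${csv_prefix}${csv_body#"${csv_prefix}body."}"
        { head -n 1 -- "$csv_in"; cat -- "$csv_body"; } > "$csv_out"
        rm -- "$csv_body"
    done
}
```

For the file in the question, split_csv_keep_header filename.csv 2 tempfile.part. would write tempfile.part.00 and tempfile.part.01, each starting with the header. The input is read twice (once by wc, once by tail), so on a 7.3 GB file this takes noticeably longer than a single split call.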
