How do I pull the data I want out of a huge data dump?

Time: 2022-01-14 09:21:24

I've got a 3.55 GB .txt file, which is too big to get into Access. It's got about 5 million records in it and I only need a small portion of those. I need a way to parse out the lines of data that I need and get rid of the bulk of the data. Each line of text is 651 characters, but fortunately we can sort them by the first three characters. If I can delete any line that doesn't begin with 044, 067, 122, or 107, I'll have the file down to a size that I'll be able to load into Access. I've loaded both cygwin and mysql onto the machine, and now I'm staring at the command prompts wondering what to do next.

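Before committing to a filter, it can help to see which three-character prefixes actually occur in the dump and how often. A minimal sketch from the cygwin prompt (file.txt is a stand-in for the actual dump's name):

cut -c1-3 file.txt | sort | uniq -c | sort -rn

The first column of the output is the line count for each prefix, so you can estimate how many records the kept portion will contain before writing anything out.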

1 solution

#1



If you've got cygwin, then something like


grep -E '^(044|067|122|107)' file.txt > newfile.txt

would do the trick. Note the -E: it enables grep's extended regular expressions, which the alternation syntax needs; without it, plain grep treats the parentheses and pipes as literal characters and the command matches nothing.

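If the extended-regex flag is unavailable on an older cygwin grep, the same filter can be written with basic regular expressions by giving one -e pattern per prefix; the patterns are OR'd together (file.txt and newfile.txt are assumed names, matching the question's .txt dump):

grep -e '^044' -e '^067' -e '^122' -e '^107' file.txt > newfile.txt

Either way, a quick wc -l newfile.txt afterwards will confirm the record count is down to something Access can handle before you attempt the import.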
