Memory error when reading a large file

Time: 2022-04-25 00:08:33

I am trying to read a 59 GB file and split it into a number of new files according to an ID found at the beginning of each line. The code below fails with a memory error after producing about 45 GB of output files. System memory usage stays very low the whole time, and then the error appears suddenly after the code has been running for about 2 hours. I have 16 GB of RAM. Am I using buffering wrongly? Any ideas?

import os

outputFile = '/home/.../folder1'
directory = '/home/.../folder2/'

with open(directory + 'aldk_tab_1mn.csv', 'r', buffering=50000000) as fin: 
    firstLine = fin.readline()
    print(firstLine)

    for line in fin:
        testChar = line[0:4]
        if testChar[0] == 'A' :
            if not os.path.exists(outputFile + '/A/' + testChar+'.csv'):   # first time open a file             
                with open(outputFile + '/A/' + testChar+'.csv', 'a') as foutA:
                    print('file', testChar, 'created')                    
                    foutA.write(firstLine)
                    foutA.write(line)          
            else: 
                with open(outputFile + '/A/' + testChar+'.csv', 'a') as foutA:
                    foutA.write(line)          
        else:
            if not os.path.exists(outputFile + '/B/' + testChar+'.csv'):   # first time open a file             
                with open(outputFile + '/B/' + testChar+'.csv', 'a') as foutB:
                    print('file', testChar, 'created')
                    foutB.write(firstLine)
                    foutB.write(line) 
            else: 
                with open(outputFile + '/B/' + testChar+'.csv', 'a') as foutB:
                    foutB.write(line)   

The error produced is:

MemoryError                               
Traceback (most recent call last)
 <ipython-input-17-761f2fcce982> in <module>()
  6 
----> 7     for line in fin:
  8         testChar = line[0:4]
  9         if testChar[0] == 'A' :

MemoryError: 

1 Solution

#1


Thanks for the responses. It appears that the file contained empty characters after a certain point, which made the line variable explode, as @Martijn suggested!

So I did slice my file! Thanks guys!
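
For anyone hitting the same symptom, here is a rough sketch of one way to locate such a runaway "line" without loading it into memory. It assumes the problem region is simply a very long stretch with no newline in it, which is my reading of the answer above and not something confirmed in detail. The function name find_long_lines and the parameters limit and chunk_size are made up for illustration.

```python
def find_long_lines(path, limit=1_000_000, chunk_size=1 << 20):
    """Return (byte_offset, length) for every line longer than `limit` bytes.

    The file is read in fixed-size binary chunks, so memory use stays
    bounded by `chunk_size` even if a single "line" spans gigabytes.
    Hypothetical helper: the name and defaults are illustrative only.
    """
    long_lines = []
    line_start = 0  # absolute byte offset where the current line begins
    position = 0    # absolute byte offset of the start of the current chunk

    with open(path, 'rb') as fin:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            search_from = 0
            while True:
                newline = chunk.find(b'\n', search_from)
                if newline == -1:
                    break  # current line continues into the next chunk
                line_end = position + newline  # offset of the '\n' itself
                length = line_end - line_start
                if length > limit:
                    long_lines.append((line_start, length))
                line_start = line_end + 1
                search_from = newline + 1
            position += len(chunk)

    # A final line with no trailing newline still counts.
    if position - line_start > limit:
        long_lines.append((line_start, position - line_start))
    return long_lines
```

Running this on the input file should report the byte offset where the oversized line starts, which gives a point at which to slice the file, as was done here. The iteration `for line in fin` has no such bound: it keeps reading until it finds a newline, which would be consistent with the MemoryError being raised on that line of the traceback.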
