How do you unzip very large files in Python?

Date: 2022-10-30 13:11:17

Using Python 2.4 and the built-in zipfile module, I cannot read very large zip files (larger than 1 or 2 GB) because it wants to hold the entire uncompressed contents of a file in memory. Is there another way to do this (either with a third-party library or some other hack), or must I "shell out" and unzip it that way (which obviously isn't as cross-platform)?

2 answers

#1


17  

Here's an outline for decompressing large files by hand: locate each member's data through its local file header, then inflate it with zlib.

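import struct
import zipfile
import zlib

doc = "archive.zip"  # placeholder: path to the zip file to extract

# Outline only: assumes every member is a deflate-compressed file that can
# be written to the current directory (no stored entries, no subfolders).
src = open(doc, "rb")
zf = zipfile.ZipFile(src)
for m in zf.infolist():

    # Examine the central-directory entry
    print("%s %d %d %r %r" % (m.filename, m.header_offset,
                              m.compress_size, m.extra, m.comment))

    # Skip the local file header: 30 fixed bytes, then the name and the
    # extra field. Take both lengths from the local header itself, because
    # the local extra field can differ from the central-directory copy and
    # the local header carries no comment.
    src.seek(m.header_offset)
    header = src.read(30)
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    src.seek(name_len + extra_len, 1)

    # Build a decompression object (negative wbits = raw deflate stream)
    decomp = zlib.decompressobj(-15)

    # This can be done with a loop reading blocks
    out = open(m.filename, "wb")
    result = decomp.decompress(src.read(m.compress_size))
    out.write(result)
    result = decomp.flush()
    out.write(result)
    # end of the loop
    out.close()

zf.close()
src.close()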
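
The outline's comment notes that the single large read can be replaced by a loop over blocks. A minimal sketch of that loop follows. The names `inflate_stream` and `CHUNK_SIZE` are hypothetical, and the sketch assumes `src` is already positioned at the start of the member's compressed data and that the member uses deflate compression.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical block size; tune as needed


def inflate_stream(src, out, compressed_size, chunk_size=CHUNK_SIZE):
    """Inflate compressed_size bytes of raw deflate data from src into out.

    Only chunk_size compressed bytes are read per iteration, so the whole
    compressed member is never held in memory at once.
    """
    # Negative wbits selects a raw deflate stream (no zlib header/trailer),
    # which is how deflated zip members are stored.
    decomp = zlib.decompressobj(-15)
    remaining = compressed_size
    while remaining > 0:
        block = src.read(min(chunk_size, remaining))
        if not block:
            raise IOError("unexpected end of compressed data")
        remaining -= len(block)
        out.write(decomp.decompress(block))
    # Emit whatever the decompressor is still buffering
    out.write(decomp.flush())
```

In the outline above, a call such as `inflate_stream(src, out, m.compress_size)` would stand in for the single `decomp.decompress(src.read(m.compress_size))` step. A highly compressible block can still expand to a large buffer in one call, so a smaller `chunk_size` may be preferable for such data.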

#2


11  

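As of Python 2.6, you can use ZipFile.open() to get a file-like object for an archive member and copy its contents to a target file of your choosing without holding the whole member in memory (the combined with statement used here needs Python 2.7 or later):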

import errno
import os
import shutil
import zipfile

TARGETDIR = '/foo/bar/baz'
doc = 'archive.zip'  # placeholder: path to the zip file to extract


def ensure_dir(path):
    # Create path and any missing parents; makedirs raises if the folder
    # already exists, so ignore that one error
    try:
        os.makedirs(path)
    except (OSError, IOError) as err:
        if err.errno != errno.EEXIST:
            raise


with open(doc, "rb") as zipsrc:
    zfile = zipfile.ZipFile(zipsrc)
    for member in zfile.infolist():
        target_path = os.path.join(TARGETDIR, member.filename)
        if target_path.endswith('/'):  # folder entry, create
            ensure_dir(target_path)
            continue
        # An archive need not list a folder before the files inside it
        ensure_dir(os.path.dirname(target_path))
        with open(target_path, 'wb') as outfile, zfile.open(member) as infile:
            shutil.copyfileobj(infile, outfile)

This uses shutil.copyfileobj() to read data from the open zipfile object in fixed-size chunks and copy it to the output file, so memory use stays bounded regardless of the member's size.
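
As a minimal sketch of the same idea for a single member, the chunk size can be set explicitly through the optional third argument of shutil.copyfileobj(). The function name `extract_member` and its parameters are hypothetical, and nested with statements are used so the sketch does not rely on the combined form.

```python
import shutil
import zipfile


def extract_member(zip_path, member_name, target_path, bufsize=1024 * 1024):
    """Stream one archive member to target_path, bufsize bytes at a time."""
    with zipfile.ZipFile(zip_path) as zf:
        # zf.open() returns a file-like object that decompresses on demand
        with zf.open(member_name) as infile:
            with open(target_path, "wb") as outfile:
                # The third argument is the copy buffer length in bytes
                shutil.copyfileobj(infile, outfile, bufsize)
```

A call might look like `extract_member("archive.zip", "big.bin", "/tmp/big.bin")`, where all three values are placeholders. A larger `bufsize` means fewer read calls at the cost of a larger buffer in memory.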
