Sporadic corruption of files being uploaded

Time: 2021-06-01 16:01:48

The code below takes a file sent in an HTTP request (Ajax) and saves it to the server. Someone else wrote it, but I recently had to modify it to add a unique identifier to the file name so that existing files with the same name don't get overwritten. Essentially, I added these lines:

# uid is a GUID
if os.path.isfile(destination):
    destination = os.path.splitext(destination)[0] + str(uid) + os.path.splitext(destination)[1]
    name = os.path.splitext(name)[0] + str(uid) + os.path.splitext(name)[1]

The problem I am seeing now is that files for which the UID is added to the file name sometimes end up corrupted. It doesn't always happen: most of the time the files are saved correctly, but in at least 4 cases out of 11 in the last 7 days the files were corrupted, and it only happened to files that had the UID added to their name before being saved to the file system. Is there anything wrong with this code that could cause file corruption?

Here's the full context of the method being used:

    if form.is_valid():
        id = request.REQUEST.get('id','')
        file = request.FILES['file']
        chunk = request.REQUEST.get('chunk','0')
        chunks = request.REQUEST.get('chunks','0')
        name = request.REQUEST.get('name','')
        destination = settings.MEDIA_ROOT+'/files/%s' % name
        # If the code goes into the below IF, the file MAY get corrupted.
        if os.path.isfile(destination):
            destination = os.path.splitext(destination)[0] + str(uid) + os.path.splitext(destination)[1]
            name  = os.path.splitext(name)[0] + str(uid) + os.path.splitext(name)[1]
        with open(destination, ('wb' if chunk == '0' else 'ab')) as f:  
            for content in file.chunks():  
                f.write(content)  
        if int(chunk) + 1 >= int(chunks):
            if not Attachment.objects.filter(uuid=uid,user=username,name=name):
                form.save(name,username,uid,id)

    response = HttpResponse(json.dumps({"jsonrpc" : "2.0", "result" : None, "id" : "id"}), mimetype='text/plain; charset=UTF-8')  
    return response

1 solution

#1


It looks like the problem is the lifetime of the file uid. This is not a problem for a single file upload, but becomes a problem when the chunked uploading functionality of the code is used.

Because uid is generated per request and each file chunk is uploaded in a separate request, each chunk receives a different uid. This in turn causes chunk 1 to go to file 1 and chunk 2 to file 2, resulting in corruption.
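
To make the failure mode concrete, here is a small simulation that does not depend on Django. It is only a sketch under assumptions: `save_chunk` and `upload` are hypothetical helpers, not the code from the question, and for simplicity the uid is always appended to the name rather than only when the file already exists.

```python
import os
import tempfile
import uuid


def save_chunk(directory, name, uid, chunk_index, data):
    """Write one chunk to '<stem><uid><ext>' (hypothetical, simplified helper)."""
    stem, ext = os.path.splitext(name)
    destination = os.path.join(directory, stem + str(uid) + ext)
    # First chunk truncates the file, later chunks append, as in the question.
    mode = "wb" if chunk_index == 0 else "ab"
    with open(destination, mode) as f:
        f.write(data)
    return destination


def upload(directory, name, chunks, uid_for_request):
    """Simulate one request per chunk; uid_for_request() runs once per request."""
    return [
        save_chunk(directory, name, uid_for_request(), index, data)
        for index, data in enumerate(chunks)
    ]


chunks = [b"first-", b"second-", b"third"]

# Case 1: a fresh uid per request scatters the chunks over separate files.
with tempfile.TemporaryDirectory() as tmp:
    upload(tmp, "report.pdf", chunks, uuid.uuid4)
    per_request_file_count = len(os.listdir(tmp))

# Case 2: one uid shared by every request of the upload keeps the file whole.
stable_uid = uuid.uuid4()
with tempfile.TemporaryDirectory() as tmp:
    stable_paths = upload(tmp, "report.pdf", chunks, lambda: stable_uid)
    stable_file_count = len(os.listdir(tmp))
    with open(stable_paths[0], "rb") as f:
        stable_content = f.read()

print(per_request_file_count)  # 3
print(stable_file_count)       # 1
print(stable_content)          # b'first-second-third'
```

With a per-request uid, none of the three files contains the complete upload, which matches the corruption described in the question.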

One workaround is to base the uid on the session key, available via request.session.session_key. Because of the cryptographic properties of the session key, it should also be "reasonably unique" for this purpose.
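
As a rough sketch of this workaround, the helper below builds the suffixed name from whatever id is passed in. `add_suffix` is a hypothetical name and the session key shown is a made-up value; in the view it would presumably be called with `request.session.session_key`, which may be `None` for a session that has not been saved yet, so the sketch refuses an empty id.

```python
import os


def add_suffix(name, uid):
    """Return name with uid inserted before the extension (hypothetical helper)."""
    if not uid:
        # Assumption: an unsaved session has no key yet, so fail loudly.
        raise ValueError("no id available for this upload")
    stem, ext = os.path.splitext(name)
    return stem + str(uid) + ext


# Made-up session key, for illustration only.
session_key = "hypotheticalsessionkey0123456789"
suffixed_name = add_suffix("report.pdf", session_key)
print(suffixed_name)  # reporthypotheticalsessionkey0123456789.pdf
```

Because the session key stays the same across the requests of one session, every chunk of an upload resolves to the same file name.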

Note, however, that there is a potential security risk if the file path is exposed to the web, or even if /media/ can be directory-listed, because you would be exposing the session key to the web (the session key is the only thing that protects access to an active session).
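
If the session key is used anyway, one way to avoid writing it into file names is to derive a keyed hash from it. This is not part of the original suggestion, only a sketch under assumptions: `public_upload_id` is a hypothetical helper, and `secret` stands in for a server-side secret that is never exposed to clients.

```python
import hashlib
import hmac


def public_upload_id(session_key, secret):
    """Derive a stable id from the session key without revealing the key itself."""
    digest = hmac.new(
        secret.encode("utf-8"),
        session_key.encode("utf-8"),
        hashlib.sha256,
    )
    # Truncated hex digest: same input gives the same id on every request.
    return digest.hexdigest()[:32]


# Made-up values, for illustration only.
secret = "hypothetical-server-side-secret"
first = public_upload_id("hypothetical-session-key", secret)
again = public_upload_id("hypothetical-session-key", secret)
other = public_upload_id("another-session-key", secret)
```

The id is identical for every request in a session, so chunks still land in one file, but the file name no longer contains the session key.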

Another, more secure method is to assign a unique UUID to each session via a session variable. This is probably best done in middleware:

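import uuid

class SessionUUIDMiddleware(object):
    def process_request(self, request):
        # Assign the UUID once per session and reuse it on every later request.
        session_uuid = request.session.get('uuid', None)
        if not session_uuid:
            # Store a string: UUID objects are not JSON-serializable by default.
            session_uuid = str(uuid.uuid1())
            request.session['uuid'] = session_uuid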

This disconnects the unique id from the session key.
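
On the view side, the per-session UUID could then be used roughly as follows. This is a sketch, not the original view: `destination_for` and `write_chunk` are hypothetical helpers, the `files` subdirectory mirrors the path in the question, and stripping directory components with `os.path.basename` is an extra precaution that the original code does not take.

```python
import os
import tempfile
import uuid


def destination_for(media_root, name, session_uuid):
    """Build the target path; the UUID is always part of the file name."""
    stem, ext = os.path.splitext(os.path.basename(name))
    return os.path.join(media_root, "files", stem + "_" + session_uuid + ext)


def write_chunk(destination, chunk_index, data):
    """Create the file on the first chunk and append on later ones."""
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    with open(destination, "wb" if chunk_index == 0 else "ab") as f:
        f.write(data)


# Stand-in for request.session['uuid'] as set by the middleware.
session_uuid = str(uuid.uuid4())

with tempfile.TemporaryDirectory() as media_root:
    # Each loop iteration plays the role of one upload request.
    for index, data in enumerate([b"part one, ", b"part two"]):
        path = destination_for(media_root, "photo.jpg", session_uuid)
        write_chunk(path, index, data)
    with open(path, "rb") as f:
        assembled = f.read()
    files_written = os.listdir(os.path.join(media_root, "files"))

print(assembled)  # b'part one, part two'
```

The suffix is added unconditionally here. With the conditional `os.path.isfile(destination)` check from the question, the check would become true as soon as the first chunk has created the file, so later chunks of the same upload could be redirected to a different name even when the uid is stable.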
