How to encode a UTF-8 filename for HTTP headers? (Python, Django)

Time: 2023-01-04 19:57:25

I have a problem with HTTP headers: they're encoded in ASCII, and I want to provide a view for downloading files whose names can be non-ASCII.

response['Content-Disposition'] = 'attachment; filename="%s"' % (vo.filename.encode("ASCII","replace"), )

I don't want to fall back on static file serving for non-ASCII file names, because then there would be a problem with the file system and its file name encoding. (I don't know the target OS.)

I've already tried urllib.quote(), but it raises a KeyError exception.

Possibly I'm doing something wrong, or maybe it's impossible.
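
A note on the KeyError: on Python 2, urllib.quote() was known to raise KeyError when handed a unicode object containing non-ASCII characters, and encoding the name to UTF-8 bytes first was the usual workaround. A minimal Python 3 sketch of the same percent-encoding step, where the sample file name is made up for illustration:

```python
from urllib.parse import quote

name = "Résumé.pdf"  # hypothetical file name

# quote() encodes str input as UTF-8 before percent-escaping each byte.
encoded = quote(name, safe="")
print(encoded)

# Passing the UTF-8 bytes explicitly gives the same result.
assert quote(name.encode("utf-8"), safe="") == encoded
```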

5 answers

#1


34  

This is a FAQ.

There is no interoperable way to do this. Some browsers implement proprietary extensions (IE, Chrome), others implement RFC 2231 (Firefox, Opera).

See test cases at http://greenbytes.de/tech/tc2231/.

Update: as of November 2012, all current desktop browsers support the encoding defined in RFC 6266 and RFC 5987 (Safari >= 6, IE >= 9, Chrome, Firefox, Opera, Konqueror).

#2


30  

Don't send a filename in Content-Disposition. There is no way to make non-ASCII header parameters work cross-browser(*).

Instead, send just “Content-Disposition: attachment”, and leave the filename as a URL-encoded UTF-8 string in the trailing (PATH_INFO) part of your URL, for the browser to pick up and use by default. UTF-8 URLs are handled much more reliably by browsers than anything to do with Content-Disposition.

(*: actually, there's not even a current standard that says how it should be done as the relationships between RFCs 2616, 2231 and 2047 are pretty dysfunctional, something that Julian is trying to get cleared up at a spec level. Consistent browser support is in the distant future.)
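
The URL-based approach above can be sketched roughly as follows. This is only an illustration: download_url, attachment_headers and the /files/42/ path are hypothetical names, not part of Django or any library.

```python
from urllib.parse import quote, unquote


def download_url(base_path, filename):
    """Append the file name to a download path as percent-encoded UTF-8."""
    # safe="" also escapes "/" so the name stays a single path segment.
    return base_path.rstrip("/") + "/" + quote(filename, safe="")


def attachment_headers():
    """Headers for the download response, with no filename parameter."""
    # The browser is left to derive the name from the last URL segment.
    return {"Content-Disposition": "attachment"}


url = download_url("/files/42/", "My Résumé.pdf")
print(url)
print(attachment_headers())
```

The view serving such a URL would ignore the trailing segment when looking up the file; the segment exists only so the browser has a name to suggest.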

#3


27  

Note that in 2011, RFC 6266 (especially Appendix D) weighed in on this issue and has specific recommendations to follow.

Namely, you can issue a filename with only ASCII characters, followed by filename* with an RFC 5987-formatted filename for those agents that understand it.

Typically this will look like filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf, where the Unicode filename ("My Résumé.pdf") is encoded into UTF-8 and then percent-encoded (note, do NOT use + for spaces).

Please do actually read RFC 6266 and RFC 5987 (or use a robust and tested library that abstracts this for you), as my summary here is lacking in important detail.
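
As a rough illustration of the filename / filename* pairing described above, here is a minimal Python 3 sketch using only the standard library. The function name content_disposition is made up for this example, and the ASCII fallback strategy (strip accents, drop whatever is left) is just one assumption among several possible ones; it does not cover every detail of the RFCs.

```python
import unicodedata
from urllib.parse import quote


def content_disposition(filename):
    """Build an attachment header with an ASCII fallback plus filename*."""
    # Fallback: strip accents, drop remaining non-ASCII characters, and
    # replace characters that would break the quoted-string.
    ascii_name = (
        unicodedata.normalize("NFKD", filename)
        .encode("ascii", "ignore")
        .decode("ascii")
        .replace("\\", "_")
        .replace('"', "_")
    )
    header = 'attachment; filename="{}"'.format(ascii_name)
    if ascii_name != filename:
        # safe="" percent-encodes everything except letters, digits and
        # "_.-~", so spaces become %20 rather than "+".
        header += "; filename*=UTF-8''" + quote(filename, safe="")
    return header


print(content_disposition("My Résumé.pdf"))
```

Agents that understand filename* use the percent-encoded UTF-8 name; the others fall back to the plain ASCII filename.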

#4


2  

I can say that I've had success using the newer (RFC 5987) format of specifying a header encoded with the e-mail form (RFC 2231). I came up with the following solution which is based on code from the django-sendfile project.

import unicodedata
from urllib.parse import quote  # django.utils.http.urlquote was removed in Django 4.0

def rfc5987_content_disposition(file_name):
    # ASCII fallback: decompose accented characters, then drop anything non-ASCII.
    ascii_name = unicodedata.normalize('NFKD', file_name).encode('ascii', 'ignore').decode()
    header = 'attachment; filename="{}"'.format(ascii_name)
    if ascii_name != file_name:
        # RFC 5987 form: UTF-8 bytes, percent-encoded (spaces become %20, not +).
        quoted_name = quote(file_name)
        header += "; filename*=UTF-8''{}".format(quoted_name)

    return header

# e.g.
# response['Content-Disposition'] = rfc5987_content_disposition(file_name)

I have only tested my code on Python 3.4 with Django 1.8, so the similar solution in django-sendfile may suit you better.

There's a long-standing ticket in Django's tracker which acknowledges this, but no patches have been proposed yet as far as I can tell. So unfortunately this is as close to using a robust, tested library as I could find. Please let me know if there's a better solution.

#5


0  

A hack:

if (Request.UserAgent.Contains("IE"))
{
  // IE will accept URL encoding, but spaces don't need to be, and since they're so common..
  filename = filename.Replace("%", "%25").Replace(";", "%3B").Replace("#", "%23").Replace("&", "%26");
}
