How do I restrict file types in a CGI file upload in Perl?

Time: 2021-12-26 16:04:02

I am using CGI to allow the user to upload some files. I just want the user to be able to upload .txt or .csv files. If the user uploads a file in any other format, I want to be able to show an error message.

I saw that this can be done with JavaScript: http://www.codestore.net/store.nsf/unid/DOMM-4Q8H9E

But is there a better way to achieve this? Is there some functionality in Perl that allows this?

4 answers

#1


The disclaimer on the site you linked to is important:

Note: This is not entirely foolproof as people can easily change the extension of a file before uploading it, or do some other trickery, as in the case of the "LoveBug" virus.

If you really want to do this right, let the user upload the file, and then use something like File::MimeInfo::Magic (or file(1), the UNIX utility) to guess the actual file type. If you don't like the file type, delete the file and give the user an error message.
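
A minimal sketch of the upload-then-check approach, assuming the upload has already been saved to a server-side path and that the CPAN module File::MimeInfo::Magic (and the shared-mime-info database it reads) is installed. The sub names mime_allowed and check_saved_upload and the allow-list itself are hypothetical.

```perl
use strict;
use warnings;

# MIME types we are willing to keep (assumption: a CSV file saved without a
# .csv extension is usually guessed as text/plain, so both are listed).
my %ALLOWED_MIME = map { $_ => 1 } qw(text/plain text/csv);

# Pure allow-list test, kept separate from the detection step.
sub mime_allowed {
    my ($mime) = @_;
    return defined $mime && $ALLOWED_MIME{$mime} ? 1 : 0;
}

# Guess the type of an upload already saved at $path. If it is not allowed,
# delete the file and return an error message; return undef if it is fine.
sub check_saved_upload {
    my ($path) = @_;
    require File::MimeInfo::Magic;    # CPAN module, loaded only when used
    my $mime = File::MimeInfo::Magic::mimetype($path);
    return undef if mime_allowed($mime);
    unlink $path or warn "Could not delete $path: $!\n";
    return 'Unsupported file type: ' . ($mime // 'unknown');
}
```

A CGI script would call check_saved_upload($path) after saving the file and, if it gets a message back, show that message to the user as the error.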

#2


I just want the user to be able to upload .txt or .csv files.

Sounds easy, doesn't it? It's not. And then some.

The simple approach is just to test that the filename ends in ‘.txt’ or ‘.csv’ before storing the file on the filesystem. This should be part of a much more in-depth validation of what the filename is allowed to contain before you let a user-submitted filename anywhere near the filesystem.
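
The extension test can be sketched like this, assuming the client-supplied filename is already in a Perl string. has_allowed_extension is a hypothetical helper; it is only a first-pass filter and says nothing about the actual contents.

```perl
use strict;
use warnings;

# First-pass filter on the client-supplied filename: true only if it ends in
# .txt or .csv (any case). This says nothing about the real contents.
sub has_allowed_extension {
    my ($name) = @_;
    return 0 unless defined $name;
    # \z anchors at the very end of the string, unlike $, which would also
    # match before a trailing newline such as in "notes.txt\n".
    return $name =~ /\.(?:txt|csv)\z/i ? 1 : 0;
}
```

Names such as ‘evil.csv.exe’ or ‘report.txt.html’ fail this test because only the final extension counts.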

Because the rules about what can go in a filename are complex on some platforms (especially Windows), it's usually best to create your own filename independently, with a known-good name and extension.
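
One way to build your own name, sketched with the core File::Temp module: the stem is random and the extension comes from a fixed table keyed by the type you have decided the file is, never from the client's filename. The sub name create_upload_file, the ‘upload’ prefix and the table are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Extensions come from this fixed table, never from the client's filename.
my %EXT_FOR_KIND = (txt => '.txt', csv => '.csv');

# Create a new, empty file in $dir with a random server-chosen name such as
# "uploadk3J9_xQ2.csv" and return ($fh, $path) for writing the upload into.
sub create_upload_file {
    my ($dir, $kind) = @_;
    my $suffix = $EXT_FOR_KIND{ $kind // '' }
        or die "Unsupported upload kind\n";
    # tempfile replaces the X's with random characters and creates the file
    # exclusively, so two uploads cannot end up with the same name.
    my ($fh, $path) = tempfile('uploadXXXXXXXX',
        DIR => $dir, SUFFIX => $suffix, UNLINK => 0);
    binmode $fh;
    return ($fh, $path);
}
```

The client's original filename can still be kept in a database for display purposes, but it never touches the filesystem.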

In any case, there is no guarantee that the browser will send you a usable filename at all, and even if it does, there is no guarantee that the name will end in ‘.txt’ or ‘.csv’, even if it is a text or CSV file. (Some platforms simply do not use extensions to indicate file type.)

Whilst you can try to sniff the contents of the file to see what type it might be, this is highly unreliable. For example:

<html>,<body>,</body>,</html>

could be plain text, CSV, HTML, XML, or a variety of other formats. Better to give the user an explicit control to say what file type they're uploading (or use one file upload field per type).
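
A sketch of treating the user's explicit choice as nothing more than a key into a server-side allow-list, assuming a hypothetical form field (for example a select menu named ‘filetype’) whose submitted value is passed in. The table contents and the sub name type_info_for_choice are assumptions.

```perl
use strict;
use warnings;

# Types the user may pick on the upload form (hypothetical "filetype" field).
# The submitted value is only ever used as a key into this table.
my %UPLOAD_TYPES = (
    txt => { ext => '.txt', mime => 'text/plain' },
    csv => { ext => '.csv', mime => 'text/csv' },
);

# Return the settings for the chosen type, or undef if the value is
# missing or not on the list.
sub type_info_for_choice {
    my ($choice) = @_;
    return undef unless defined $choice && exists $UPLOAD_TYPES{$choice};
    return $UPLOAD_TYPES{$choice};
}
```

Because the value is matched exactly, anything a tampered form sends (‘html’, ‘txt ’ with a trailing space, nothing at all) is simply rejected.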

Now here's where it gets really nasty. Say you've accepted the upload and stored it as /data/mygoodfilename.txt, and the web server is correctly serving it as the Content-Type ‘text/plain’. What do you think the browser interprets it as? Plain text? You should be so lucky.

The problem is that browsers (primarily IE) don't trust your Content-Type header and instead sniff the contents of the file to see whether it looks like something else. Serve the above snippet as plain text, and IE will happily treat it as HTML. This can be a huge problem, because HTML can include client-side scripts that take over the user's access to the site (a cross-site scripting attack).

At this point you might be tempted to sniff the file on the server-side, for example using the ‘file’ command, to check it doesn't contain ‘<html>’. But this is doomed to failure. The ‘file’ command does not sniff for all the same HTML tags as IE does, and other browsers sniff differently anyway. It's quite easy to prepare a file that ‘file’ will claim is not HTML, but that IE will nevertheless treat as if it is (with security-disaster implications).

Content-sniffing approaches such as ‘file’ give you only a false sense of security. They are convenience tools for loosely guessing file types, not an effective security measure.

At this point your last desperate possibilities are things like:

  • serving all user-uploaded files from a separate hostname, so that a script injection attack can't purloin the credentials of your main site;

  • serving all user-uploaded files through a CGI wrapper, adding the header ‘Content-Disposition: attachment’ so that browsers won't attempt to display them directly;

  • only accepting uploads from trusted users.
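
The CGI-wrapper option might look roughly like this: a hypothetical serve_upload sub that only serves names matching the server's own naming scheme (assumed here to be ‘upload’ plus eight word characters plus .txt or .csv) and sends ‘Content-Disposition: attachment’ so browsers offer a download instead of rendering the file. It writes the headers by hand rather than through CGI.pm to stay self-contained.

```perl
use strict;
use warnings;

# Hypothetical download wrapper: copy a stored upload from $dir to $out
# (STDOUT by default) behind a "Content-Disposition: attachment" header.
# Returns 1 on success, 0 if the name is not acceptable or cannot be opened.
my %MIME_FOR_EXT = (txt => 'text/plain', csv => 'text/csv');

sub serve_upload {
    my ($dir, $name, $out) = @_;
    $out //= \*STDOUT;

    # Accept only names in the server's own (assumed) format, which also
    # rules out path separators such as "../".
    my ($ext) = ($name // '') =~ /\Aupload[A-Za-z0-9_]{8}\.(txt|csv)\z/
        or return 0;

    open my $in, '<:raw', "$dir/$name" or return 0;
    binmode $out;
    print {$out} "Content-Type: $MIME_FOR_EXT{$ext}\r\n",
        "Content-Disposition: attachment; filename=\"$name\"\r\n",
        "\r\n";    # an empty line ends the CGI headers

    local $/ = \65536;    # copy the body in 64 KiB records
    while (my $chunk = <$in>) {
        print {$out} $chunk;
    }
    close $in;
    return 1;
}
```

In a real CGI script the name would come from a query parameter; when serve_upload returns 0, the script would send an error response instead of the file.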

#3


On Unix, the easiest way is to do as JRockway suggested. If you're not on Unix, your options are limited: you can examine the file extension, and you can examine the contents to verify it. I'm assuming that in your specific case you only want "*-separated value" text files, so one of the Text::CSV::* modules may be useful in verifying that the file is the type you asked for.
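
A sketch of the Text::CSV idea, assuming the CPAN module Text::CSV is installed. looks_like_csv is a hypothetical helper, and requiring every row to have the same number of fields is an extra assumption, not a rule of the CSV format.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN; uses Text::CSV_XS automatically if it is installed

# Heuristic structural check: the file parses as CSV, has at least one row,
# and every row has the same number of fields (an assumption).
sub looks_like_csv {
    my ($path) = @_;
    my $csv = Text::CSV->new({ binary => 1 })
        or die 'Cannot create Text::CSV: ' . Text::CSV->error_diag;
    open my $fh, '<', $path or return 0;
    my ($rows, $cols) = (0, undef);
    while (my $row = $csv->getline($fh)) {
        $cols //= scalar @$row;
        return 0 if @$row != $cols;
        $rows++;
    }
    # getline returns undef both at end of file and on a parse error;
    # checking eof afterwards is the documented way to tell them apart.
    return 0 unless $csv->eof;
    return $rows > 0 ? 1 : 0;
}
```

This only checks structure: a line such as <html>,<body>,</body>,</html> parses as perfectly valid CSV, which is the same ambiguity described in the previous answer.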

Security for this operation is a whole other ball of wax.

#4


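Try this (it relies on the Unix file(1) utility):

sub is_text_file {
    my ($file_name) = @_;
    # List-form open runs file(1) directly, without a shell, so a hostile
    # filename cannot inject commands; -b leaves the filename out of the output.
    open(my $fh, '-|', 'file', '-b', '--', $file_name) or die "Cannot run file: $!";
    my $file_type = do { local $/; <$fh> } // '';
    close $fh;
    # Loose check only: HTML is also reported as "... text" by file(1).
    return $file_type =~ /(ASCII|text)/i ? 1 : 0;
}

print is_text_file("file.txt") ? "text\n" : "not text\n";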
