How can I programmatically determine a file's true extension/type?

Time: 2021-02-18 04:14:55

I am working on a script that processes user uploads to the server, and as an added layer of security I'd like to know:

Is there a way to detect a file's true extension/file type, and to ensure that it is not another file type masked with a different extension?

Is there a byte stamp or some unique identifier for each type/extension?

I'd like to be able to detect when someone has applied a different extension to the file they are uploading.

Thank you,

11 answers

#1


Not really, no.

You will need to read the first few bytes of each file and interpret them as a header for a finite set of known file types. Most file formats have a distinct header: some sort of metadata in the first few bytes, or in the first few kilobytes in the case of MP3.

Your program will simply have to try parsing the file as each of your accepted file types.
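
The header check described above can be sketched roughly as follows. This is only an illustration, not a complete detector: the function name sniffFileType and the tiny signature table are assumptions made for the example, and a real table would cover exactly the formats you accept.

```php
<?php
// Hypothetical helper: compares the first bytes of a file against a few
// well-known signatures and returns a short type label, or null if none match.
function sniffFileType(string $path): ?string
{
    // [signature bytes, label] pairs; deliberately small, extend as needed.
    $signatures = [
        ["\x89PNG\r\n\x1a\n", 'png'],
        ["\xFF\xD8\xFF",      'jpeg'],
        ["GIF87a",            'gif'],
        ["GIF89a",            'gif'],
        ["%PDF-",             'pdf'],
        ["PK\x03\x04",        'zip'],
    ];

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $head = fread($handle, 16); // longest signature above is 8 bytes
    fclose($handle);
    if ($head === false) {
        return null;
    }

    foreach ($signatures as [$magic, $label]) {
        // strncmp is binary safe; a too-short file simply fails to match
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $label;
        }
    }
    return null; // unknown header: reject rather than guess
}
```

A match only says that the file starts like that format. It does not show that the rest of the file is well formed or harmless, so it complements the parsing step rather than replacing it.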

For my program, I send the uploaded image to ImageMagick inside a try-catch block, and if it blows up, then I assume it was a bad image. This should be considered insecure, because I am loading arbitrary (user-supplied) binary data into an external program, which is generally an attack vector. Here, I am trusting ImageMagick not to do anything to my system.

I recommend writing your own handlers for the significant file types you intend to use, to avoid any attack vectors.

Edit: I see that PHP has some tools to do this for you.

Also, MIME types are what the user's browser claims the file to be. It is handy and useful to read those and act on them in your code, but it is not a secure method, because anyone sending you bad files can easily fake the MIME headers. It's sort of a front-line defense to keep code that expects a JPEG from barfing on a PNG, but if someone embedded a virus in a .exe and named it JPEG, there's no reason they wouldn't have spoofed the MIME type as well.

#2


PHP has a couple of ways of reading a file's contents to determine its MIME type, depending on which version of PHP you are using:

Have a look at the Fileinfo functions if you're running PHP 5.3+:

// FILEINFO_MIME returns the MIME type together with the MIME encoding (charset)
$finfo = finfo_open(FILEINFO_MIME);
$type = finfo_file($finfo, $filepath);
finfo_close($finfo);

Alternatively, check out mime_content_type for older versions:

$type = mime_content_type($filepath);

Note that just validating the file type isn't enough if you want to be truly secure. Someone could, for example, upload a valid JPEG file which exploits a vulnerability in a common renderer. To guard against this, you would need a well-maintained virus scanner.

#3


PHP has a superglobal $_FILES that holds information like size and file type. It looks like the type is taken from some sort of header, not the extension, but I may be wrong.

There is an example of it on the w3schools site.

I am going to test whether it can be tricked when I get a chance.

UPDATE:

Everyone else probably knew this, but $_FILES can be tricked. I was able to determine the real type this way:

$arg = escapeshellarg( $_FILES["file"]["tmp_name"] );
// exec() returns the last line of the command's output;
// the second argument of system() only receives the exit status.
$type = exec( "file -b $arg" );
echo "Real type:  " . $type;

It basically uses Unix's file command. There are probably better ways, but I haven't used PHP in a while. I usually avoid using system commands if possible.

#4


That could still be forged. I would ensure that a file uploaded to the server cannot be (or is not) run automatically.

I would also have a virus/spyware scanner, and let it do the work for you.
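
One way to reduce the chance of an upload being run is to let the server choose the stored file name, so the client-supplied name and extension are never used. A minimal sketch under that assumption follows; the function name storedNameFor and the MIME-to-extension table are made up for the example.

```php
<?php
// Hypothetical helper: builds a server-chosen file name for an upload whose
// type has already been detected from its contents.
function storedNameFor(string $mime): ?string
{
    // Assumed allowlist; the extension is derived from the detected type only.
    $extensions = [
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
        'image/gif'  => 'gif',
    ];
    if (!isset($extensions[$mime])) {
        return null; // type not accepted
    }
    // 16 random bytes give a 32-character hexadecimal name
    return bin2hex(random_bytes(16)) . '.' . $extensions[$mime];
}
```

The returned name could then be passed to move_uploaded_file() together with a target directory that the web server does not execute scripts from. That directory is an assumption about your setup, not something the snippet enforces.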

#5


You can use the code below, which gives you the MIME type even if the extension has been changed:

$finfo = finfo_open(FILEINFO_MIME_TYPE);
echo $mime = finfo_file($finfo, $_FILES['userfile']['tmp_name']);
finfo_close($finfo);

Windows users: just edit php.ini and uncomment this line:

extension=php_fileinfo.dll

Remember to restart Apache for the new php.ini to take effect.

#6


In *nix, the first few bytes of the file tell you (see "magic number"). In Windows, this will only sometimes be true ("header info"). It is, ultimately, OS dependent.

#7


Executables in general have a "signature" in the first bytes; I find it hard, though, to ascertain what the file type really is.

#8


What file types do you expect? Maybe you could check that it conforms to what you expect and reject everything else.
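
A rough sketch of that "accept only what you expect" approach, using the Fileinfo extension mentioned in other answers. The function name detectAllowedMime and the example allowlist are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the content-based MIME type of a file if it is
// on the caller's allowlist, otherwise null.
function detectAllowedMime(string $path, array $allowed): ?string
{
    // finfo inspects the file contents, not the name or the browser's claim
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);

    if ($mime === false || !in_array($mime, $allowed, true)) {
        return null; // unreadable, or not a type we expect
    }
    return $mime;
}
```

For an upload, the path would typically be $_FILES['userfile']['tmp_name'], and anything that comes back as null is rejected.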

#9


Others have already mentioned Fileinfo, which I think is the correct solution, but I'll add this just in case you can't use it for some reason. Most (all?) *nix distros include a command called file that, when run on a file, outputs its type. It has a switch to choose between human-readable output (the default) and the MIME type. You could have your script invoke this program on the uploaded file and read the result. Again, this is not the preferred approach. If you're on Windows, this utility is available through Cygwin.

#10


Is simply checking the MIME type enough? I am assuming that changing the extension on a file doesn't change its MIME type?

Is the MIME type a strong enough indicator to go by here?

Thanks for all of the responses thus far.

#11


Is simply checking the MIME type enough? I am assuming that changing the extension on a file doesn't change its MIME type? Is the MIME type a strong enough indicator to go by here?

It really depends on how it's used.

  • If you only provide uploads and downloads, then it matters little, since the file is never executed.

  • If it's handled by the web server, then it depends on how the web server is configured, though most of the other comments here still apply.

  • If it's an image, it will either display, or not, or be the target of image library exploits. But only those.

  • Something like a PDF file may not affect your server, but rather the computer of the person accessing the file.

  • If it's going to be passed to a function like "system()", then we're back to the OS behavior, as if it had been "double-clicked", and the file extension might even be considered.
