How to extract the file extension from a byte array

I have a byte array stored in a database.

How can I determine the file extension (MIME type) from a byte array in Java?

3 solutions

#1

If this is for storing a file that was uploaded:

- create a column for the filename extension
- create a column for the MIME type as sent by the browser

If you don't have the original file and only have the bytes, there are a couple of good options.

If you're able to use a library, look at using mime-util to inspect the bytes:

http://technopaper.blogspot.com/2009/03/identifying-mime-using-mime-util.html

If you have to build your own byte detector, here are the starting bytes ("magic numbers") of many popular formats, written as a Ruby hash:

"BC" => bitcode,
"BM" => bitmap,
"BZ" => bzip,
"SIMPLE"=> fits,
"GIF8" => gif,
"GKSM" => gks,
[0x01,0xDA].pack('c*') => iris_rgb,
[0xF1,0x00,0x40,0xBB].pack('c*') => itc,
[0xFF,0xD8].pack('c*') => jpeg,
"IIN1" => niff,
"MThd" => midi,
"%PDF" => pdf,
"VIEW" => pm,
[0x89].pack('c*') + "PNG" => png,
"%!" => postscript,
"Y" + [0xA6].pack('c*') + "j" + [0x95].pack('c*') => sun_rasterfile,
"MM*" + [0x00].pack('c*') => tiff,
"II*" + [0x00].pack('c*') => tiff,
"gimp xcf" => gimp_xcf,
"#FIG" => xfig,
"/* XPM */" => xpm,
[0x23,0x21].pack('c*') => shebang,
[0x1F,0x9D].pack('c*') => compress,
[0x1F,0x8B].pack('c*') => gzip,
"PK" + [0x03,0x04].pack('c*') => pkzip,
"MZ" => dos_os2_windows_executable,
[0x7F].pack('c*') + "ELF" => unix_elf,
[0x99,0x00].pack('c*') => pgp_public_ring,
[0x95,0x01].pack('c*') => pgp_security_ring,
[0x95,0x00].pack('c*') => pgp_security_ring,
[0xA6,0x00].pack('c*') => pgp_encrypted_data,
[0xD0,0xCF,0x11,0xE0].pack('c*') => docfile
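
Since the question asks about Java, here is a minimal sketch of how a few entries from the table above could be checked against a byte array. The class name MagicNumberDetector and its methods are made up for illustration, and only a handful of signatures are included; a real detector would need the full table and longer signatures for formats that share a short prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: matches the leading bytes of an array against known signatures.
final class MagicNumberDetector {

    // Signature -> label, checked in insertion order.
    private static final Map<byte[], String> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put(new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "png");
        SIGNATURES.put(new byte[] {(byte) 0xFF, (byte) 0xD8}, "jpeg");
        SIGNATURES.put(ascii("GIF8"), "gif");
        SIGNATURES.put(ascii("%PDF"), "pdf");
        SIGNATURES.put(new byte[] {'P', 'K', 0x03, 0x04}, "pkzip");
        SIGNATURES.put(new byte[] {0x1F, (byte) 0x8B}, "gzip");
        SIGNATURES.put(ascii("BM"), "bitmap");
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Returns the label of the first signature that prefixes the content, if any.
    static Optional<String> detect(byte[] content) {
        for (Map.Entry<byte[], String> entry : SIGNATURES.entrySet()) {
            if (startsWith(content, entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    private static boolean startsWith(byte[] content, byte[] prefix) {
        if (content == null || content.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (content[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling MagicNumberDetector.detect(bytes) yields an empty Optional when nothing matches, which makes the "unknown type" case explicit for the caller.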

#2

It turns out that the JDK's URLConnection class has a decent method for this; see the answer to "Getting A File's Mime Type In Java".

To guess the MIME type from a byte array rather than from a file, use java.io.ByteArrayInputStream (which reads from a byte array) instead of java.io.FileInputStream (which reads from a file), as in the following example:

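byte[] content = ...; // the bytes read from the database
try (InputStream is = new ByteArrayInputStream(content)) {
    // Returns null when the leading bytes match no signature the JDK knows.
    String mimeType = URLConnection.guessContentTypeFromStream(is);
}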
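
For reference, the snippet above can be wrapped into a small self-contained class like the sketch below. The class name StreamMimeGuess and the method guess are invented for this example; the only JDK call that does the detection is URLConnection.guessContentTypeFromStream, which recognizes a limited set of formats and returns null for everything else.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical wrapper around the JDK's built-in content sniffing.
final class StreamMimeGuess {

    // Returns the guessed MIME type, or null when the JDK recognizes nothing.
    static String guess(byte[] content) {
        try (InputStream is = new ByteArrayInputStream(content)) {
            return URLConnection.guessContentTypeFromStream(is);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for convenience.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gifHeader = {'G', 'I', 'F', '8', '9', 'a'};
        byte[] pngHeader = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(guess(gifHeader));
        System.out.println(guess(pngHeader));
    }
}
```

Note that the result is a MIME type, not a file extension; mapping one to the other is a separate lookup.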

Hope this helps...


#3

"Maybe I need to save an additional column in my DB for the file extension."

That is a better solution than attempting to deduce a MIME type from the database content, for (at least) the following reasons:

- If you have a MIME type from the document source, you can store and use that.
- You could (potentially) ask the user to specify a MIME type when they lodge the document.
- If you have to use some heuristic-based scheme for figuring out a MIME type:
  - you can do the work once before creating the table row, rather than N times after extracting it, and
  - you can report cases where the heuristic gives no good answer, and maybe ask the user to say what the file type really is.
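
A rough sketch of the "decide once, before the row is created" idea follows. Everything here is hypothetical: DocumentRow stands in for whatever object gets written to the table, and the heuristic is just the JDK's URLConnection.guessContentTypeFromStream; no actual database access is shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical value object holding what would be stored in one table row.
final class DocumentRow {
    final byte[] content;
    final String mimeType; // null means the type could not be determined

    private DocumentRow(byte[] content, String mimeType) {
        this.content = content;
        this.mimeType = mimeType;
    }

    // Prefer the MIME type supplied by the source; fall back to a heuristic once.
    static DocumentRow prepare(byte[] content, String declaredMimeType) {
        String mimeType = declaredMimeType;
        if (mimeType == null || mimeType.isEmpty()) {
            try (InputStream is = new ByteArrayInputStream(content)) {
                mimeType = URLConnection.guessContentTypeFromStream(is);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new DocumentRow(content, mimeType);
    }

    // True when the heuristic gave no answer and the user should be asked.
    boolean needsUserInput() {
        return mimeType == null;
    }
}
```

The caller would check needsUserInput() before inserting the row, so the question "what is this file?" is asked at upload time rather than on every read.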

(I'm making some assumptions that may not be warranted, but the question doesn't give any clues on how the larger system is intended to work.)

