Determining the MS Excel File Type Using Apache POI

Date: 2021-12-26 20:24:07

Is there a way to determine the MS Office Excel file type in Apache POI? I need to know which format the Excel file is in: Excel '97(-2007) (.xls) or Excel 2007 OOXML (.xlsx).

I suppose I could do something like this:

int type = PoiTypeHelper.getType(file);
switch (type) {
case PoiType.EXCEL_1997_2007:
   ...
   break;
case PoiType.EXCEL_2007:
   ...
   break;
default:
   ...
}

Thanks.

4 Answers

#1


36  

Promoting a comment to an answer...

If you're going to be doing something special with the files, then rjokelai's answer is the way to do it.

However, if you're just going to be using the HSSF / XSSF / Common SS usermodel, then it's much simpler to have POI do it for you, and use WorkbookFactory to have the type detected and opened for you. You'd do something like:

 Workbook wb = WorkbookFactory.create(new File("something.xls"));

or

 Workbook wb = WorkbookFactory.create(request.getInputStream());

Then, if you need to do something special, test whether the result is an HSSFWorkbook or an XSSFWorkbook. When opening the file, use a File rather than an InputStream if possible, to speed things up and save memory.

If you don't know what your file is at all, use Apache Tika to do the detection - it can detect a huge number of different file formats for you.

#2


22  

You can use:

// For .xlsx
POIXMLDocument.hasOOXMLHeader(new BufferedInputStream( new FileInputStream(file) ));

// For .xls
POIFSFileSystem.hasPOIFSHeader(new BufferedInputStream( new FileInputStream(file) ));

These are essentially the methods that WorkbookFactory#create(InputStream) uses to determine the type.
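
As a rough illustration of what such header checks boil down to, here is a minimal sketch in plain Java that compares the leading bytes of a file against two well-known signatures: the 8-byte OLE2 compound document signature and the 4-byte ZIP local file header signature. The class ExcelSignature and its Kind enum are hypothetical names, not part of POI, and this is not POI's actual implementation. Keep in mind that a match only identifies the container: OLE2 is also used by .doc and .ppt, and every ZIP archive starts with "PK", so this alone does not prove the file is a spreadsheet.

```java
/** Hypothetical helper (not part of POI): classifies a file by its leading bytes. */
final class ExcelSignature {
    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature, used by .xls (and also .doc, .ppt, ...).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4", used by .xlsx (and any other ZIP).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Returns the container kind suggested by the first bytes of a file. */
    static Kind classify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Passing the first 8 bytes of a file to ExcelSignature.classify is enough for both checks; shorter or null input simply yields UNKNOWN.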

Note that both hasOOXMLHeader and hasPOIFSHeader accept only streams that support the "mark" feature (or a PushbackInputStream), so a plain FileInputStream will not work; wrap it in a BufferedInputStream. Because of this, you can simply reuse the stream after detection, since it will have been reset to its starting point.
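
To make the mark/reset behaviour concrete, here is a minimal sketch using only the JDK. It marks the stream, reads a few header bytes, and then resets, so the caller can hand the same stream on to a parser afterwards. HeaderPeek and peek are hypothetical names introduced for illustration; they are not POI APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): reads a header without consuming the stream. */
final class HeaderPeek {
    /** Returns up to n leading bytes and rewinds the stream to where it started. */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            // A plain FileInputStream ends up here; wrap it in a BufferedInputStream.
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        in.mark(n); // remember the current position for up to n bytes of read-ahead
        byte[] buf = new byte[n];
        int total = 0;
        try {
            while (total < n) {
                int read = in.read(buf, total, n - total);
                if (read < 0) break; // stream is shorter than n bytes
                total += read;
            }
        } finally {
            in.reset(); // rewind so the caller can reuse the stream
        }
        return Arrays.copyOf(buf, total);
    }
}
```

A typical call would wrap the file stream first, for example HeaderPeek.peek(new BufferedInputStream(new FileInputStream(file)), 8), keeping a reference to the wrapped stream so it can be read again from the beginning.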

#3


1  

Based on the lib implementation of org.apache.poi.ss.usermodel.WorkbookFactory#create(java.io.InputStream)

We can mimic the WorkbookFactory's logic, remove irrelevant bits and return file type instead.

public static TYPE fileType(File file) {
    try (
            InputStream inp = new FileInputStream(file)
    ) {
        if (!(inp).markSupported()) {
            return getNotMarkSupportFileType(file);
        }
        return getType(inp);
    } catch (IOException e) {
        LOGGER.error("Analyse FileType Problem.", e);
        return TYPE.INVALID;
    }
}

private static TYPE getNotMarkSupportFileType(File file) throws IOException {
    try (
            InputStream inp = new PushbackInputStream(new FileInputStream(file), 8)
    ) {
        return getType(inp);
    }
}

private static TYPE getType(InputStream inp) throws IOException {
    byte[] header8 = IOUtils.peekFirst8Bytes(inp);
    if (NPOIFSFileSystem.hasPOIFSHeader(header8)) {
        NPOIFSFileSystem fs = new NPOIFSFileSystem(inp);
        return fileType(fs);
    } else if (DocumentFactoryHelper.hasOOXMLHeader(inp)) {
        return TYPE.XSSF_WORKBOOK;
    }
    return TYPE.INVALID;
}

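private static TYPE fileType(NPOIFSFileSystem fs) {
    DirectoryNode root = fs.getRoot();
    // An OLE2 container holding an "EncryptedPackage" entry is a
    // password-protected OOXML (.xlsx) file, not a binary .xls workbook.
    if (root.hasEntry("EncryptedPackage")) {
        return TYPE.XSSF_WORKBOOK;
    }
    return TYPE.HSSF_WORKBOOK;
}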

public enum TYPE {
    HSSF_WORKBOOK, XSSF_WORKBOOK, INVALID
}

#4


0  

This can be done using the FileMagic class. See the JavaDoc: https://poi.apache.org/apidocs/org/apache/poi/poifs/filesystem/FileMagic.html

Sample code snippet:

FileMagic.valueOf(inputStream).equals(FileMagic.OOXML) // XLSX
