Why do we need to create a Workbook before reading an Excel file with Apache POI?

Please find a code snippet below:

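import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.EncryptedDocumentException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class DataDriven_GetDataExcel {

    public static void main(String[] args) throws IOException, EncryptedDocumentException, InvalidFormatException {

        //1 Getting control over the file (closed automatically by try-with-resources)
        try (FileInputStream fis = new FileInputStream("C:\\Users\\bewosaurabh\\Documents\\GetDataFile.xlsx");
             //2 Creating a Workbook object in memory from the file's content
             Workbook wb = WorkbookFactory.create(fis)) {

            //3 Getting control over the sheet
            Sheet sh = wb.getSheet("Sheet1");
            // ... (rest of the code omitted in the question)
        }
    }
}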

What I don't understand is why we need to create a Workbook before reading the Excel file. An Excel file is itself called a Workbook (as we can see in the picture below).

When we create an Excel file, we are creating a Workbook. From there, we access the sheets, then the rows and columns.

I don't understand why we write WorkbookFactory.create(fis); when we already have a Workbook. Shouldn't there be a method to get the Workbook we have already created, just as there are methods for rows (getRow), sheets (getSheet) and cells (getCell)?

Can you help me understand POI?

#1

What Workbook wb = WorkbookFactory.create(fis); does is:

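It instantiates a Java object that implements the Workbook interface from the file content read through the FileInputStream. Workbook itself is only an interface: WorkbookFactory inspects the content and creates the matching implementation, HSSFWorkbook for the binary *.xls format or XSSFWorkbook for *.xlsx. After that, the Workbook object lives in memory, and only once we have access to this object can we call its methods such as getSheet. The file on disk is just bytes; there is no Workbook object to "get" until something parses those bytes, which is why POI offers a factory method rather than a getWorkbook method.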
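As a minimal sketch of that idea, assuming Apache POI with the poi-ooxml module on the classpath: the code below builds a tiny .xlsx in memory (so no real file path is needed), passes its bytes to WorkbookFactory.create as an InputStream, and then navigates Workbook, Sheet, Row, Cell. The class WorkbookFromStreamDemo and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class WorkbookFromStreamDemo {

    // Hypothetical helper: builds a one-cell .xlsx entirely in memory and returns its raw bytes.
    static byte[] buildSampleXlsx() throws Exception {
        try (Workbook wb = new XSSFWorkbook(); ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            wb.createSheet("Sheet1").createRow(0).createCell(0).setCellValue("hello");
            wb.write(out);
            return out.toByteArray();
        }
    }

    // Hypothetical helper: the bytes alone are just a ZIP archive; WorkbookFactory parses them
    // into an in-memory Workbook, and only then are getSheet/getRow/getCell available.
    // The caller owns (and closes) the InputStream.
    static String readCell(InputStream in, String sheetName, int rowIdx, int colIdx) throws Exception {
        try (Workbook wb = WorkbookFactory.create(in)) {
            Sheet sh = wb.getSheet(sheetName);
            Row row = sh.getRow(rowIdx);
            Cell cell = row.getCell(colIdx);
            return new DataFormatter().formatCellValue(cell);
        }
    }

    // Hypothetical helper: reports which concrete Workbook implementation the factory chose.
    static String workbookClassName(byte[] bytes) throws Exception {
        try (Workbook wb = WorkbookFactory.create(new ByteArrayInputStream(bytes))) {
            return wb.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = buildSampleXlsx();
        System.out.println(workbookClassName(bytes));
        System.out.println(readCell(new ByteArrayInputStream(bytes), "Sheet1", 0, 0));
    }
}
```

For an .xlsx input the factory returns an XSSFWorkbook; for a binary .xls it would return an HSSFWorkbook, which is why the rest of the code is written against the Workbook interface.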

If we used Workbook wb = WorkbookFactory.create(file);, that is, a File instead of an InputStream, then WorkbookFactory would create the Workbook object directly from the file. The advantage is that the whole file content does not have to be read into memory, so the memory footprint is lower. The disadvantage is that the file opened for reading cannot be written to at the same time, so we cannot write changes made through the Workbook methods back into the same file we read the Workbook from.
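A sketch of both variants, again assuming Apache POI (poi-ooxml) is available; FileVersusStreamDemo, readFromFile and updateInPlace are hypothetical names. readFromFile uses the File-based overload WorkbookFactory.create(File, String, boolean) with no password and readOnly set to true, so POI never writes to the file. updateInPlace uses the InputStream overload, closes the input stream once the workbook is in memory, and only then opens a FileOutputStream on the same path.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class FileVersusStreamDemo {

    // Hypothetical helper: File-based and read-only. POI does not need to buffer the whole
    // file, but the file stays open until the workbook is closed.
    static String readFromFile(File file, String sheetName) throws Exception {
        try (Workbook wb = WorkbookFactory.create(file, null, true)) {
            return new DataFormatter().formatCellValue(wb.getSheet(sheetName).getRow(0).getCell(0));
        }
    }

    // Hypothetical helper: InputStream-based. The whole workbook is read into memory, and the
    // input stream is closed before an output stream is opened on the same path.
    static void updateInPlace(File file, String sheetName, String newValue) throws Exception {
        Workbook wb;
        try (FileInputStream fis = new FileInputStream(file)) {
            wb = WorkbookFactory.create(fis);
        }
        try {
            // Assumes cell A1 already exists on the given sheet.
            wb.getSheet(sheetName).getRow(0).getCell(0).setCellValue(newValue);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                wb.write(fos);
            }
        } finally {
            wb.close();
        }
    }
}
```

Exact exception signatures differ between POI releases (older 3.x versions also declare InvalidFormatException), so the sketch simply declares throws Exception.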

If memory footprint is a bigger issue, then for XSSF (*.xlsx) we can get at the underlying XML data and process it using XSSF and SAX (Event API). This way we do not need to instantiate a Workbook object at all. Instead, we read and parse the XML directly from the OPCPackage, which for XSSF (*.xlsx) is a ZipPackage, since a *.xlsx file is simply a ZIP archive containing a directory structure of XML files and other files.
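A rough sketch of the event-API approach, assuming Apache POI's poi-ooxml module; the class SaxCellDump and method readAllCells are hypothetical. It opens the OPCPackage read-only, wraps it in XSSFReader, and streams each sheet's XML through XSSFSheetXMLHandler with a JDK SAX parser, so no Workbook object is ever created. The callback only collects formatted cell values into a list; a real program would process each row as it arrives instead of accumulating everything.

```java
import java.io.File;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxCellDump {

    // Hypothetical helper: collects every formatted cell value of every sheet, in sheet order,
    // without building a Workbook; each sheet's XML is streamed through a SAX parser.
    static List<String> readAllCells(File xlsx) throws Exception {
        List<String> values = new ArrayList<>();
        try (OPCPackage pkg = OPCPackage.open(xlsx, PackageAccess.READ)) {
            XSSFReader reader = new XSSFReader(pkg);
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
            StylesTable styles = reader.getStylesTable();

            // Callback invoked by XSSFSheetXMLHandler as cells are encountered.
            SheetContentsHandler collector = new SheetContentsHandler() {
                @Override public void startRow(int rowNum) { }
                @Override public void endRow(int rowNum) { }
                @Override public void cell(String cellReference, String formattedValue, XSSFComment comment) {
                    values.add(formattedValue);
                }
                @Override public void headerFooter(String text, boolean isHeader, String tagName) { }
            };

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // the handler matches elements by the SpreadsheetML namespace
            Iterator<InputStream> sheets = reader.getSheetsData();
            while (sheets.hasNext()) {
                try (InputStream sheet = sheets.next()) {
                    XMLReader parser = factory.newSAXParser().getXMLReader();
                    parser.setContentHandler(new XSSFSheetXMLHandler(styles, strings, collector, false));
                    parser.parse(new InputSource(sheet));
                }
            }
        }
        return values;
    }
}
```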

Since a *.xlsx file is simply a ZIP archive, we could also open it as a FileSystem obtained from FileSystems and then process its content completely independently of third-party libraries. But this is the most challenging approach.
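A minimal sketch using only the JDK (no POI) to show what that approach looks like; the class ZipFsSheetNames and method sheetNames are hypothetical. It mounts the .xlsx as a zip FileSystem, reads /xl/workbook.xml and lists the sheet names. It assumes the workbook part sits at /xl/workbook.xml, which is where Excel normally puts it; a fully general reader would look it up through the package relationships in /_rels/.rels, and reading cell values would additionally require parsing the sheet parts and the shared strings, which is where the real effort lies.

```java
import java.io.InputStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ZipFsSheetNames {

    // SpreadsheetML main namespace used by the elements in workbook.xml.
    private static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Hypothetical helper: lists sheet names with the JDK only.
    // Assumption: the workbook part is at /xl/workbook.xml (not resolved via /_rels/.rels).
    static List<String> sheetNames(Path xlsx) throws Exception {
        // Cast null to ClassLoader to avoid ambiguity with the Map overload added in JDK 13.
        try (FileSystem zip = FileSystems.newFileSystem(xlsx, (ClassLoader) null);
             InputStream in = Files.newInputStream(zip.getPath("/xl/workbook.xml"))) {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(in);
            NodeList sheets = doc.getElementsByTagNameNS(NS, "sheet");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < sheets.getLength(); i++) {
                names.add(((Element) sheets.item(i)).getAttribute("name"));
            }
            return names;
        }
    }
}
```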
