I'm developing a Java application that reads an Excel .xlsb file using Apache POI, but I get an exception while reading it. My code is as follows:
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.Package;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
import java.util.Iterator;
public class Prueba {

    public static void main (String [] args){
        String direccion = "C:/Documents and Settings/RSalasL/My Documents/New Folder/masstigeoct12.xlsb";
        Package pkg;
        try {
            pkg = Package.open(direccion);
            XSSFReader r = new XSSFReader(pkg);
            SharedStringsTable sst = r.getSharedStringsTable();
            XMLReader parser = fetchSheetParser(sst);
            Iterator<InputStream> sheets = r.getSheetsData();
            while(sheets.hasNext()) {
                System.out.println("Processing new sheet:\n");
                InputStream sheet = sheets.next();
                InputSource sheetSource = new InputSource(sheet);
                parser.parse(sheetSource);
                sheet.close();
                System.out.println("");
            }
        } catch (InvalidFormatException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (OpenXML4JException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public void processAllSheets(String filename) throws Exception {
        Package pkg = Package.open(filename);
        XSSFReader r = new XSSFReader( pkg );
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);
        Iterator<InputStream> sheets = r.getSheetsData();
        while(sheets.hasNext()) {
            System.out.println("Processing new sheet:\n");
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            sheet.close();
            System.out.println("");
        }
    }

    public static XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser =
            XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser"
            );
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    private static class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        public void startElement(String uri, String localName, String name,
                Attributes attributes) throws SAXException {
            // c => cell
            if(name.equals("c")) {
                // Print the cell reference
                System.out.print(attributes.getValue("r") + " - ");
                // Figure out if the value is an index in the SST
                String cellType = attributes.getValue("t");
                if(cellType != null && cellType.equals("s")) {
                    nextIsString = true;
                } else {
                    nextIsString = false;
                }
            }
            // Clear contents cache
            lastContents = "";
        }

        public void endElement(String uri, String localName, String name)
                throws SAXException {
            // Process the last contents as required.
            // Do now, as characters() may be called more than once
            if(nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                nextIsString = false;
            }
            // v => contents of a cell
            // Output after we've seen the string contents
            if(name.equals("v")) {
                System.out.println(lastContents);
            }
        }

        public void characters(char[] ch, int start, int length)
                throws SAXException {
            lastContents += new String(ch, start, length);
        }
    }
}
And the exception is this:
java.io.CharConversionException: Characters larger than 4 bytes are not supported: byte 0x83 implies a length of more than 4 bytes
at org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)
at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader$FastStreamDecoder.read(XMLStreamReader.java:762)
at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader.read(XMLStreamReader.java:162)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yy_refill(PiccoloLexer.java:3474)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:3958)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorkbookDocument$Factory.parse(Unknown Source)
at org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.<init>(XSSFReader.java:207)
at org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.<init>(XSSFReader.java:166)
at org.apache.poi.xssf.eventusermodel.XSSFReader.getSheetsData(XSSFReader.java:160)
at EDManager.Prueba.main(Prueba.java:36)
The file has two sheets, one with 329 rows and 3 columns and the other with 566 rows and 3 columns. I just want to read the file to find out whether a value is in the second sheet.
2 answers
#1
10
Apache POI doesn't support the .xlsb file format for anything other than text extraction. Apache POI will happily provide full read or write support for .xls files (via HSSF) and .xlsx files (via XSSF), or both (via the common SS UserModel interface).
However, the .xlsb format is not supported for general operations. It's a very odd hybrid between the two, and the large amount of work involved has meant no-one has been willing to volunteer/sponsor the work required.
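To make the "hybrid" point concrete: an .xlsb file is a ZIP package like .xlsx, but its workbook part is binary (`xl/workbook.bin`) rather than XML (`xl/workbook.xml`). The 0x83 byte in the exception above is consistent with an XML parser being handed such a binary part. The sketch below uses only the JDK to tell the two apart by part name; the class and method names (`WorkbookKind`, `detect`, `zipWith`) are made up for illustration, and the in-memory ZIP is a stand-in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Apache POI.
final class WorkbookKind {

    /** Returns "xlsb", "xlsx", or "unknown" depending on which workbook part the ZIP contains. */
    static String detect(InputStream in) {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("xl/workbook.bin")) return "xlsb"; // binary workbook part
                if (name.equals("xl/workbook.xml")) return "xlsx"; // XML workbook part
            }
            return "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a tiny in-memory ZIP with a single entry, only for demonstration. */
    static byte[] zipWith(String entryName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(0x83); // placeholder payload, not a valid workbook
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(detect(new ByteArrayInputStream(zipWith("xl/workbook.bin"))));
        System.out.println(detect(new ByteArrayInputStream(zipWith("xl/workbook.xml"))));
    }
}
```

For a real file you would pass a `FileInputStream` for the workbook path instead of the in-memory bytes. A check like this lets the application reject or convert .xlsb input before it reaches XSSFReader.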
What Apache POI does offer for .xlsb, as of Apache POI 3.15 beta3 / 3.16, is a text extractor for .xlsb files: XSSFBEventBasedExcelExtractor. You can use that to get the text out of your file, or with a few tweaks convert it to something like CSV.
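Once the extractor has produced the text (something like `new XSSFBEventBasedExcelExtractor(path).getText()`; check the Javadoc for your POI version), the asker's goal of checking whether a value is present reduces to string handling. The sketch below is JDK-only and assumes the extracted text has one row per line with tab-separated cells. That layout is an assumption, and `ExtractedTextSearch` and `containsCell` are hypothetical names.

```java
// Hypothetical helper that searches already-extracted text; it does not call Apache POI.
final class ExtractedTextSearch {

    /**
     * Returns true if some tab-separated cell on some line equals the wanted value exactly.
     * Assumes one row per line and tab-separated cells.
     */
    static boolean containsCell(String extractedText, String wanted) {
        for (String row : extractedText.split("\\R")) {      // split on any line break
            for (String cell : row.split("\t", -1)) {        // keep empty trailing cells
                if (cell.equals(wanted)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for extractor output.
        String text = "Sheet1\nid\tname\n1\talpha\nSheet2\nid\tname\n2\tbeta\n";
        System.out.println(containsCell(text, "beta"));
        System.out.println(containsCell(text, "bet"));
    }
}
```

This searches every sheet, and a line holding only a sheet name is treated as a one-cell row. Restricting the search to the second sheet would need the sheet names to locate its section of the text.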
For full read/write support, you'll need to convert your file to either .xls (if it doesn't have very large numbers of rows/columns), or .xlsx (if it does). If you're really keen to help though, you could review the source code for XSSFBEventBasedExcelExtractor, then have a go at contributing patches to add full support to POI for it!
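For a sense of what "very large" means here: a .xls worksheet holds at most 65,536 rows and 256 columns, while .xlsx allows 1,048,576 rows and 16,384 columns. A minimal sketch of that decision, with hypothetical names (`TargetFormat`, `choose`):

```java
// Hypothetical helper illustrating the .xls size limits; not part of any library.
final class TargetFormat {
    static final int XLS_MAX_ROWS = 65_536; // .xls (BIFF8) row limit per sheet
    static final int XLS_MAX_COLS = 256;    // .xls (BIFF8) column limit per sheet

    /** Picks "xls" when the sheet fits within the .xls limits, otherwise "xlsx". */
    static String choose(int rows, int cols) {
        return (rows <= XLS_MAX_ROWS && cols <= XLS_MAX_COLS) ? "xls" : "xlsx";
    }

    public static void main(String[] args) {
        // The larger sheet from the question: 566 rows, 3 columns.
        System.out.println(choose(566, 3));
        System.out.println(choose(70_000, 3));
    }
}
```

Both sheets in the question are far below the .xls limits, so either target format would work for that file.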
(Additionally, I think from the exception that your particular .xlsb file is partly corrupt, but even if it weren't, it still wouldn't be supported by Apache POI for anything other than text extraction, sorry.)
#2
0
I have an implementation using SmartXLS: my code first converts the .xlsb to .xlsx, after which Apache POI can be used. The following method receives a java.io.File, checks whether its extension is xlsb, converts it to .xlsx, and replaces the file with the new one. This works for me.
private void processXLSBFile(File file) {
    WorkBook workBook = new WorkBook();
    String filePath = file.getAbsolutePath();
    // Only convert files whose extension is xlsb (case-insensitive)
    if (FilenameUtils.getExtension(filePath).equalsIgnoreCase((Static.XLSB_EXT))) {
        try {
            // Load the .xlsb via SmartXLS
            workBook.readXLSB(new java.io.FileInputStream(filePath));
            // Derive the .xlsx path from the .xlsb path
            filePath = filePath.replaceAll("(?i)".concat(Static.XLSB),
                    Static.XLSX_EXT.toLowerCase());
            workBook.writeXLSX(new java.io.FileOutputStream(filePath));
            final File xlsb = new File(filePath);
            // Note: this rebinds only the local parameter; the caller's File reference is unchanged
            file = xlsb;
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
            MensajesJSFUtil
                    .mostrarMensajeNegocio(new GTMException(e, ClaveMensaje.COMANDAS_ADJUNTAR_XLSBFILE_READERROR));
        }
    }
}
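One detail in the method above worth a second look is how the new path is built. `replaceAll` applies a regular expression to the whole path, so depending on what `Static.XLSB` holds (its value is not shown), a directory named `xlsb` could be rewritten as well. A minimal JDK-only sketch that swaps only a trailing `.xlsb` extension; `XlsbPaths` and `toXlsxPath` are hypothetical names, not part of SmartXLS or POI:

```java
// Hypothetical helper: replaces only a trailing ".xlsb" (any case) with ".xlsx".
final class XlsbPaths {
    private static final String XLSB_SUFFIX = ".xlsb";

    static String toXlsxPath(String path) {
        int start = path.length() - XLSB_SUFFIX.length();
        // regionMatches with ignoreCase=true compares only the last five characters;
        // it returns false when start is negative (path shorter than the suffix).
        if (path.regionMatches(true, start, XLSB_SUFFIX, 0, XLSB_SUFFIX.length())) {
            return path.substring(0, start) + ".xlsx";
        }
        return path; // not an .xlsb path: leave unchanged
    }

    public static void main(String[] args) {
        System.out.println(toXlsxPath("C:/xlsb/report.XLSB"));
        System.out.println(toXlsxPath("C:/data/report.xlsx"));
    }
}
```

Returning the new path (or a new `File`) from the conversion method would also let the caller pick up the converted file, which assigning to the `file` parameter does not.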