I am reading an Excel sheet using POI's XSSF and SAX (Event API). The Excel sheet has thousands of rows of user information like user name, email, address, age, department etc.
我正在使用POI的XSSF和SAX(事件API)读取Excel表。Excel表格包含数千行用户信息,如用户名、电子邮件、地址、年龄、部门等。
I need to read each row from Excel, convert it into a User object and add this User object to a List of User objects.
我需要从Excel中读取每一行,将其转换为用户对象,并将此用户对象添加到用户对象列表中。
I can read the Excel sheet successfully, but I am not sure at what point while reading I should create an instance of the User object and populate it with the data from the Excel sheet.
我可以成功地读取Excel表,但是我不确定在读取时什么时候应该创建User对象的实例并使用来自Excel表的数据填充它。
Below is my entire working code.
下面是我的全部工作代码。
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.BuiltinFormats;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class ExcelSheetParser {
enum xssfDataType {
BOOL, ERROR, FORMULA, INLINESTR, SSTINDEX, NUMBER,
}
int countrows = 0;
class XSSFSheetHandler extends DefaultHandler {
/**
* Table with styles
*/
private StylesTable stylesTable;
/**
* Table with unique strings
*/
private ReadOnlySharedStringsTable sharedStringsTable;
/**
* Destination for data
*/
private final PrintStream output;
private List<?> list = new ArrayList();
private Class clazz;
/**
* Number of columns to read starting with leftmost
*/
private final int minColumnCount;
// Set when V start element is seen
private boolean vIsOpen;
// Set when cell start element is seen;
// used when cell close element is seen.
private xssfDataType nextDataType;
// Used to format numeric cell values.
private short formatIndex;
private String formatString;
private final DataFormatter formatter;
private int thisColumn = -1;
// The last column printed to the output stream
private int lastColumnNumber = -1;
// Gathers characters as they are seen.
private StringBuffer value;
/**
* Accepts objects needed while parsing.
*
* @param styles
* Table of styles
* @param strings
* Table of shared strings
* @param cols
* Minimum number of columns to show
* @param target
* Sink for output
*/
public XSSFSheetHandler(StylesTable styles,
ReadOnlySharedStringsTable strings, int cols, PrintStream target, Class clazz) {
this.stylesTable = styles;
this.sharedStringsTable = strings;
this.minColumnCount = cols;
this.output = target;
this.value = new StringBuffer();
this.nextDataType = xssfDataType.NUMBER;
this.formatter = new DataFormatter();
this.clazz = clazz;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
if ("inlineStr".equals(name) || "v".equals(name)) {
vIsOpen = true;
// Clear contents cache
value.setLength(0);
}
// c => cell
else if ("c".equals(name)) {
// Get the cell reference
String r = attributes.getValue("r");
int firstDigit = -1;
for (int c = 0; c < r.length(); ++c) {
if (Character.isDigit(r.charAt(c))) {
firstDigit = c;
break;
}
}
thisColumn = nameToColumn(r.substring(0, firstDigit));
// Set up defaults.
this.nextDataType = xssfDataType.NUMBER;
this.formatIndex = -1;
this.formatString = null;
String cellType = attributes.getValue("t");
String cellStyleStr = attributes.getValue("s");
if ("b".equals(cellType))
nextDataType = xssfDataType.BOOL;
else if ("e".equals(cellType))
nextDataType = xssfDataType.ERROR;
else if ("inlineStr".equals(cellType))
nextDataType = xssfDataType.INLINESTR;
else if ("s".equals(cellType))
nextDataType = xssfDataType.SSTINDEX;
else if ("str".equals(cellType))
nextDataType = xssfDataType.FORMULA;
else if (cellStyleStr != null) {
// It's a number, but almost certainly one
// with a special style or format
int styleIndex = Integer.parseInt(cellStyleStr);
XSSFCellStyle style = stylesTable.getStyleAt(styleIndex);
this.formatIndex = style.getDataFormat();
this.formatString = style.getDataFormatString();
if (this.formatString == null)
this.formatString = BuiltinFormats
.getBuiltinFormat(this.formatIndex);
}
}
}
public void endElement(String uri, String localName, String name)
throws SAXException {
String thisStr = null;
// v => contents of a cell
if ("v".equals(name)) {
// Process the value contents as required.
// Do now, as characters() may be called more than once
switch (nextDataType) {
case BOOL:
char first = value.charAt(0);
thisStr = first == '0' ? "FALSE" : "TRUE";
break;
case ERROR:
thisStr = "\"ERROR:" + value.toString() + '"';
break;
case FORMULA:
// A formula could result in a string value,
// so always add double-quote characters.
thisStr = '"' + value.toString() + '"';
break;
case INLINESTR:
// TODO: have seen an example of this, so it's untested.
XSSFRichTextString rtsi = new XSSFRichTextString(value
.toString());
thisStr = '"' + rtsi.toString() + '"';
break;
case SSTINDEX:
String sstIndex = value.toString();
try {
int idx = Integer.parseInt(sstIndex);
XSSFRichTextString rtss = new XSSFRichTextString(
sharedStringsTable.getEntryAt(idx));
thisStr = '"' + rtss.toString() + '"';
} catch (NumberFormatException ex) {
output.println("Failed to parse SST index '" + sstIndex
+ "': " + ex.toString());
}
break;
case NUMBER:
String n = value.toString();
if (this.formatString != null)
thisStr = formatter.formatRawCellContents(Double
.parseDouble(n), this.formatIndex,
this.formatString);
else
thisStr = n;
break;
default:
thisStr = "(TODO: Unexpected type: " + nextDataType + ")";
break;
}
// Output after we've seen the string contents
// Emit commas for any fields that were missing on this row
if (lastColumnNumber == -1) {
lastColumnNumber = 0;
}
for (int i = lastColumnNumber; i < thisColumn; ++i)
output.print(',');
// Might be the empty string.
output.print(thisColumn +" : "+thisStr);
// Update column
if (thisColumn > -1)
lastColumnNumber = thisColumn;
} else if ("row".equals(name)) {
// Print out any missing commas if needed
if (minColumns > 0) {
// Columns are 0 based
if (lastColumnNumber == -1) {
lastColumnNumber = 0;
}
for (int i = lastColumnNumber; i < (this.minColumnCount); i++) {
output.print(',');
}
}
// We're onto a new row
output.println();
output.println(countrows++);
lastColumnNumber = -1;
}
}
/**
* Captures characters only if a suitable element is open. Originally
* was just "v"; extended for inlineStr also.
*/
public void characters(char[] ch, int start, int length)
throws SAXException {
if (vIsOpen)
value.append(ch, start, length);
}
/**
* Converts an Excel column name like "C" to a zero-based index.
*
* @param name
* @return Index corresponding to the specified name
*/
private int nameToColumn(String name) {
int column = -1;
for (int i = 0; i < name.length(); ++i) {
int c = name.charAt(i);
column = (column + 1) * 26 + c - 'A';
}
return column;
}
}
// /////////////////////////////////////
private OPCPackage xlsxPackage;
private int minColumns;
private PrintStream output;
private Class clazz;
/**
* Creates a new XLSX -> CSV converter
*
* @param pkg
* The XLSX package to process
* @param output
* The PrintStream to output the CSV to
* @param minColumns
* The minimum number of columns to output, or -1 for no minimum
*/
public ExcelSheetParser(OPCPackage pkg, PrintStream output, int minColumns, Class clazz) {
this.xlsxPackage = pkg;
this.output = output;
this.minColumns = minColumns;
this.clazz = clazz;
}
/**
* Parses and shows the content of one sheet using the specified styles and
* shared-strings tables.
*
* @param styles
* @param strings
* @param sheetInputStream
*/
public void processSheet(StylesTable styles,
ReadOnlySharedStringsTable strings, InputStream sheetInputStream)
throws IOException, ParserConfigurationException, SAXException {
InputSource sheetSource = new InputSource(sheetInputStream);
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader sheetParser = saxParser.getXMLReader();
ContentHandler handler = new XSSFSheetHandler(styles, strings,
this.minColumns, this.output, this.clazz);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
}
/**
* Initiates the processing of the XLS workbook file to CSV.
*
* @throws IOException
* @throws OpenXML4JException
* @throws ParserConfigurationException
* @throws SAXException
*/
public void process() throws IOException, OpenXML4JException,
ParserConfigurationException, SAXException {
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(
this.xlsxPackage);
XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader
.getSheetsData();
int index = 0;
while (iter.hasNext()) {
InputStream stream = iter.next();
String sheetName = iter.getSheetName();
this.output.println(sheetName + " [index=" + index + "]:");
processSheet(styles, strings, stream);
stream.close();
++index;
}
}
}
4 个解决方案
#1
1
What I'd probably do is start building the User object when the row starts. As you hit the cells in the row, you populate your User object. When the row ends, validate the User object, and if it's fine add it then. Because you're doing SAX parsing, you'll get the start and events for all of these, so you can attach your logic there.
我可能会在行开始时开始构建User对象。当您点击行中的单元格时,您将填充用户对象。当行结束时,验证User对象,如果可以,那么添加它。因为您正在进行SAX解析,所以您将获得所有这些的开始和事件,因此您可以在那里附加逻辑。
I'd suggest you take a look at XLSX2CSV in the Apache POI Examples. It shows how to go about handling the different kinds of cell contents (which you'll need for populating your user object), how to do something when you reach the end of the row, as well as handling missing cells etc.
我建议您在Apache POI示例中查看XLSX2CSV。它展示了如何处理不同类型的单元格内容(填充用户对象所需的内容),如何在到达行尾时执行一些操作,以及如何处理丢失的单元格等等。
#2
0
I think you can create a user object at following location in your code:
我认为您可以在以下代码中创建一个用户对象:
// We're onto a new row
output.println();
// Convert output to a new user object
// ....
// ....
#3
0
First of all where you are saving value in thisStr variable, if this is a valid value then put this value in Map.
首先,在thisStr变量中保存值,如果这个值是有效的,那么把这个值放到Map中。
You should create USer object in endElement() method in
您应该在endElement()方法中创建USer对象
else if ("row".equals(name)) {
// use map create USER object here
}
and You can add Users object in global list and if you want to persist it then you can persist it sheet by sheet OR all data at a time.
你可以在全局列表中添加Users对象如果你想要持久化它你可以将它一张张地保存或者同时保存所有数据。
while (iter.hasNext()) {
InputStream stream = iter.next();
String sheetName = iter.getSheetName();
this.output.println(sheetName + " [index=" + index + "]:");
processSheet(styles, strings, stream);
stream.close();
++index;
//for persisting USERS data sheet by sheet write your code here.........
}
// for persisting complete data of all sheets write your code here...
This is working for me.
这对我很有效。
#4
0
For one of my projects I have created a basic utility that uses Apache POI and OpenCSV and can read both xlsx, xls and csv files.
在我的一个项目中,我创建了一个基本实用程序,它使用Apache POI和OpenCSV,可以读取xlsx、xls和csv文件。
Given a converter it can convert rows to objects, like this:
给定一个转换器,它可以将行转换为对象,如下所示:
RowConverter<Country> converter = (row) -> new Country(row[0], row[1]);
ExcelReader<Country> reader = ExcelReader.builder(Country.class)
.converter(converter)
.withHeader()
.csvDelimiter(';')
.sheets(1)
.build();
List<Country> list;
list = reader.read("CountryCodes.xlsx");
list = reader.read("CountryCodes.xls");
list = reader.read("CountryCodes.csv");
You may find the project on github.
你可以在github上找到这个项目。
#1
1
What I'd probably do is start building the User object when the row starts. As you hit the cells in the row, you populate your User object. When the row ends, validate the User object, and if it's fine add it then. Because you're doing SAX parsing, you'll get the start and events for all of these, so you can attach your logic there.
我可能会在行开始时开始构建User对象。当您点击行中的单元格时,您将填充用户对象。当行结束时,验证User对象,如果可以,那么添加它。因为您正在进行SAX解析,所以您将获得所有这些的开始和事件,因此您可以在那里附加逻辑。
I'd suggest you take a look at XLSX2CSV in the Apache POI Examples. It shows how to go about handling the different kinds of cell contents (which you'll need for populating your user object), how to do something when you reach the end of the row, as well as handling missing cells etc.
我建议您在Apache POI示例中查看XLSX2CSV。它展示了如何处理不同类型的单元格内容(填充用户对象所需的内容),如何在到达行尾时执行一些操作,以及如何处理丢失的单元格等等。
#2
0
I think you can create a user object at following location in your code:
我认为您可以在以下代码中创建一个用户对象:
// We're onto a new row
output.println();
// Convert output to a new user object
// ....
// ....
#3
0
First of all where you are saving value in thisStr variable, if this is a valid value then put this value in Map.
首先,在thisStr变量中保存值,如果这个值是有效的,那么把这个值放到Map中。
You should create USer object in endElement() method in
您应该在endElement()方法中创建USer对象
else if ("row".equals(name)) {
// use map create USER object here
}
and You can add Users object in global list and if you want to persist it then you can persist it sheet by sheet OR all data at a time.
你可以在全局列表中添加Users对象如果你想要持久化它你可以将它一张张地保存或者同时保存所有数据。
while (iter.hasNext()) {
InputStream stream = iter.next();
String sheetName = iter.getSheetName();
this.output.println(sheetName + " [index=" + index + "]:");
processSheet(styles, strings, stream);
stream.close();
++index;
//for persisting USERS data sheet by sheet write your code here.........
}
// for persisting complete data of all sheets write your code here...
This is working for me.
这对我很有效。
#4
0
For one of my projects I have created a basic utility that uses Apache POI and OpenCSV and can read both xlsx, xls and csv files.
在我的一个项目中,我创建了一个基本实用程序,它使用Apache POI和OpenCSV,可以读取xlsx、xls和csv文件。
Given a converter it can convert rows to objects, like this:
给定一个转换器,它可以将行转换为对象,如下所示:
RowConverter<Country> converter = (row) -> new Country(row[0], row[1]);
ExcelReader<Country> reader = ExcelReader.builder(Country.class)
.converter(converter)
.withHeader()
.csvDelimiter(';')
.sheets(1)
.build();
List<Country> list;
list = reader.read("CountryCodes.xlsx");
list = reader.read("CountryCodes.xls");
list = reader.read("CountryCodes.csv");
You may find the project on github.
你可以在github上找到这个项目。