Poi解析对比excel表格

时间:2021-06-16 06:42:10

前言

这次不是Android的技术分享,是java的,当然把poi的代码放到Android中也可以用,毕竟同源嘛

为啥会有这个文章呢,因为我老婆是会计嘛,她有时候会让我帮忙对账,两个excel文件,顺序也不同,需要我来对比出哪里有问题,也就是数不太对应,我想了一下,如果好几百个甚至几千个数字来对账,那我岂不是眼睛都花了,这样我哪里还有时间去happy愉快的撸代码了?
作为一个程序员,我们要解放自己的眼睛,去做一些有意义的事情!

开发环境

Intellij Idea+maven

pom文件
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

<modelVersion>4.0.0</modelVersion>

<groupId>com.kikt</groupId>
<artifactId>ExcelDemo</artifactId>
<version>1.0-SNAPSHOT</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
</plugins>
</build>

<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.15-beta2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.15-beta2</version>
</dependency>
</dependencies>

</project>

引入了poi的解析库的两个文件

结构

首先是poi对于excel的结构分析
WorkBook->Sheet->Row->Cell
放在wps/excel中看,WorkBook对应的是工作簿,Sheet是表,Row顾名思义是行,Cell是单元格

有了这个基础,我们继续去看

获取数据

得到sheet表

要想拿到数据,首先需要先把工作簿拿到,然后拿到Sheet

首先拿到WorkBook

        File file = new File(path);
FileInputStream is = new FileInputStream(file);
Workbook sheets = WorkbookFactory.create(is);

这里path是文件对应的路径

我们这里建一个Utils文件用于操作这样的重复数据
ExcelUtils.java

public class ExcelUtils {
private ExcelUtils() {
}
public static Sheet getSheet(String path, int sheetPosition) throws IOException, InvalidFormatException {
File file = new File(path);
FileInputStream is = new FileInputStream(file);
Workbook sheets = WorkbookFactory.create(is);
return sheets.getSheetAt(sheetPosition);
}

public static Sheet getSheet(String path, String sheetName) throws IOException, InvalidFormatException {
File file = new File(path);
FileInputStream is = new FileInputStream(file);
Workbook sheets = WorkbookFactory.create(is);
return sheets.getSheet(sheetName);
}
}

两个方法分别使用表格的名字/序号获取
position从0开始,这里为了处理有可能数十个sheet的情况,所以增加了一个用名称获取的方法

Sheet的声明

public interface Sheet extends Iterable<Row> 

Sheet是一个接口,继承Iterable,所以可以知道这里的实现类一定实现了Iterable接口
可以用foreach循环来遍历Sheet得到Row

得到Cell

public interface Row extends Iterable<Cell>

Row同样如此,可以通过foreach循环得到Cell,这样可以每个单元格的遍历

 Cell getCell(int var1);

Row中有一个方法,根据一个int值得到对应的Cell
这个方法从0开始,这里就涉及到一个问题,Excel的列标是字母形式,而不是数字,需要转化一下
这里写了一个小算法

    private static Map<Integer, Integer> columnMap = new HashMap<>();

private static int getColumnLength(int length) {
Integer columnLength = columnMap.get(length);
if (columnLength == null) {
columnMap.put(length, (int) Math.pow(26, length));
} else {
return columnLength;
}
return getColumnLength(length);
}

/**
* @param columnLetter 列的字母
* @return 列对应的数字
*/

public static int getColumnNumber(String columnLetter) {
if (columnLetter == null) {
throw new RuntimeException("列号不能为空");
}
columnLetter = columnLetter.toLowerCase();
int letterLength = columnLetter.length();
if (letterLength == 1) {
char letter = columnLetter.charAt(0);
return letter - 97;
} else {
Integer length =getColumnLength(letterLength - 1);
return (getColumnNumber(columnLetter.charAt(0) + "")+1)*length+getColumnNumber(columnLetter.substring(1));
}
}

可以将AA、CA之类的列号转为对应的数字
PS:题外话,这里推荐下Sedgewick的《算法》一书,最近重新研读了下,虽然都是基础,但是基础的牢靠对于算法有很大的帮助

正式开始编码的准备工作

这里是对应的两个表的截图,这里我给隐私部位打了些马赛克
Poi解析对比excel表格
Poi解析对比excel表格
金额之类的可以看到
我们要对比的就是图1的F列和图2的H列

String recordFilePath = "H:\\1.xls";
Sheet recordSheet = ExcelUtils.getSheet(recordFilePath, 0);
List<RecordBean> recordBeanList = getRecordList(recordSheet, "a", "f");

String invoiceFilePath = "2.xls";
Sheet invoiceSheet = ExcelUtils.getSheet(invoiceFilePath, "外地预交增值税及附加税");
List<InvoiceBean> invoiceBeanList = getInvoiceList(invoiceSheet, "a", "i");

这里我首先通过util的方法获取到了sheet表,然后将需要解析的列号写入方法内
然后获取到了对应的List集合

bean实体

package excel.bean;

/**
* Created by kikt on 2017/2/26.
* 记账信息
*/

public class RecordBean extends NumberBean{
private int index;
private double number;

public int getIndex() {
return index;
}

public void setIndex(int index) {
this.index = index;
}

public double getNumber() {
return number;
}

public void setNumber(double number) {
this.number = number;
}

@Override
public String toString() {
return "RecordBean{" +
"index=" + index +
", number=" + number +
'}';
}
}
package excel.bean;

/**
* Created by kikt on 2017/2/26.
*/

public class NumberBean {
private int numberIndex;

public int getNumberIndex() {
return numberIndex;
}

public void setNumberIndex(int numberIndex) {
this.numberIndex = numberIndex;
}
}

获取list的方法

private static List<RecordBean> getRecordList(Sheet recordSheet, String indexLetter, String numberLetter) {
List<RecordBean> list = new ArrayList<>();
for (Row cells : recordSheet) {
RecordBean bean = new RecordBean();
Cell indexCell = cells.getCell(ExcelUtils.getColumnNumber(indexLetter));
if (indexCell == null || indexCell.getCellType() != Cell.CELL_TYPE_NUMERIC) {
continue;
}
double numericCellValue = indexCell.getNumericCellValue();
bean.setIndex((int) numericCellValue);
int columnNumber = ExcelUtils.getColumnNumber(numberLetter);
bean.setNumberIndex(columnNumber);
bean.setNumber(cells.getCell(columnNumber).getNumericCellValue());
list.add(bean);
}

return list;
}

另一个大致相同,这里不贴了

然后通过一个compare方法比较一下

private static List<InvoiceBean> compareList(List<RecordBean> recordBeanList, List<InvoiceBean> invoiceBeanList) {
List<InvoiceBean> unMarkBeanList = new ArrayList<>();

for (int i = recordBeanList.size() - 1; i >= 0; i--) {
RecordBean recordBean = recordBeanList.get(i);
for (int j = 0; j < invoiceBeanList.size(); j++) {
InvoiceBean invoiceBean = invoiceBeanList.get(j);
if (recordBean.getNumber() == invoiceBean.getNumber()) {
invoiceBeanList.remove(invoiceBean);
recordBeanList.remove(recordBean);
break;
}
}
}

unMarkBeanList.addAll(invoiceBeanList);

return unMarkBeanList;
}

将相同的移除掉,剩余的就是不同的

保存结果

这里光有比对结果不行,还需要修改表格,将不同的标记出来,以备人工查账

保存sheet的方法

 public static void saveWorkbook(String path, Workbook workbook) throws IOException {
File file = new File(path);
workbook.write(new FileOutputStream(file));
}

public static void backupSheet(String path, Workbook workbook) throws IOException {
File file = new File(path);
String name = file.getName();
String newPath = file.getParentFile().getAbsolutePath() + "\\backup\\";
String newName = newPath + name + "_" + TimeUtils.getTimeString() + ".bak";
File newFile = new File(newName);
newFile.getParentFile().mkdirs();
newFile.createNewFile();
workbook.write(new FileOutputStream(newFile));
}

public static void saveSheet(Sheet sheet, String path) throws IOException {
Workbook workbook = sheet.getWorkbook();
saveWorkbook(path, workbook);
}
package excel.utils;

import java.text.SimpleDateFormat;
import java.util.Date;

/**
* Created by kikt on 2017/2/26.
*/

public class TimeUtils {

}public static String getTimeString() {
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd_HHmmss");
return sdf.format(new Date());
}

核心保存的方法是workbook.write(OutputStream)方法,简单封装了一下,saveSheet()也是封装,参数不同,这里还有一个备份的方法,可以大概看看,简单的说就是修改文件名,加时间戳.bak后缀,保存成文件

修改样式

保存和备份文件说完了,这里还需要修改下样式,不然谁知道你最后查出了什么

 private static void setStyle(Sheet invoiceSheet, int index, int numberIndex) {
for (Row cells : invoiceSheet) {
Cell cell = cells.getCell(ExcelUtils.getColumnNumber("a"));
if (cell != null && cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
if (index == cell.getNumericCellValue()) {
Cell numberCell = cells.getCell(numberIndex);
CellStyle cellStyle = invoiceSheet.getWorkbook().createCellStyle();
cellStyle.setFillPattern(HSSFCellStyle.SOLID_FOREGROUND);
cellStyle.setFillForegroundColor(HSSFColor.RED.index);
numberCell.setCellStyle(cellStyle);
}
}
}
}

这里没有封装,只是简单的修改了下
核心代码是

CellStyle cellStyle = invoiceSheet.getWorkbook().createCellStyle();//创建一个新单元格样式
cellStyle.setFillPattern(HSSFCellStyle.SOLID_FOREGROUND);//填充方式是前景色
cellStyle.setFillForegroundColor(HSSFColor.RED.index);//设置前景色为红色
numberCell.setCellStyle(cellStyle);//将单元格的样式改为新建的样式

到这里简单的修改样式就结束了,只要在这之后保存workbook就可以了

结语

这篇文章主要是解析和简单的修改,后面可能会涉及到生成文件,到时候再写篇文章吧