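I recently had to compress a large number of small files straight into a single zip archive. Every entry in a zip is independent and never needs to be appended to (one entry, one file's content),
so I went straight to multithreading. It blew up, and the code reported the following error: write beyond end of stream!
Here is the code that reproduces the scenario: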
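import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.junit.Test; // assumes JUnit 4; the original does not show its imports

public class MultiThreadWriteZipFile {

    private static ExecutorService executorService = Executors.newFixedThreadPool(50);

    private static CountDownLatch countDownLatch = new CountDownLatch(50);

    @Test
    public void multiThreadWriteZip() throws IOException, InterruptedException {
        File file = new File("D:\\Gis开发\\数据\\影像数据\\china_tms\\2\\6\\2.jpeg");
        // Create the zip
        ZipOutputStream zipOutputStream =
                new ZipOutputStream(new FileOutputStream(new File("E:\\java\\test\\test.zip")));

        for (int i = 0; i < 50; i++) {
            // Zip entry names use '/' as the separator on every platform
            String entryName = i + "/" + i + "/" + i + ".jpeg";
            executorService.submit(() -> {
                // Deliberately unsafe: all 50 threads share one ZipOutputStream
                try (InputStream in = new FileInputStream(file)) {
                    writeSource2ZipFile(in, entryName, zipOutputStream);
                } catch (IOException e) {
                    // Print the failure instead of silently discarding it
                    e.printStackTrace();
                } finally {
                    // Count down even on failure so the main thread cannot hang
                    countDownLatch.countDown();
                }
            });
        }
        // Block the main thread until every task is done
        countDownLatch.await();
        executorService.shutdown();
        // Close the stream
        zipOutputStream.close();
    }

    public void writeSource2ZipFile(InputStream inputStream,
                                    String zipEntryName,
                                    ZipOutputStream zipOutputStream) throws IOException {
        // Start a new entry
        zipOutputStream.putNextEntry(new ZipEntry(zipEntryName));
        byte[] buf = new byte[1024];
        int len;
        // Write the data into the entry
        while ((len = inputStream.read(buf)) != -1) {
            // Write only the bytes actually read, not the whole buffer
            zipOutputStream.write(buf, 0, len);
        }
        zipOutputStream.closeEntry();
        zipOutputStream.flush();
    }
}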
Running the code above as-is fails with: write beyond end of stream
Change private static ExecutorService executorService = Executors.newFixedThreadPool(50);
to
private static ExecutorService executorService = Executors.newSingleThreadExecutor();
and the code runs fine!
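As for the cause, tracing through the code makes it clear. Let's start with where the exception is thrown:
It is at line 201 of DeflaterOutputStream in the java.util.zip package (JDK 1.8; other versions may differ). Here is the code: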
public void write(byte[] b, int off, int len) throws IOException {
    if (def.finished()) {
        throw new IOException("write beyond end of stream");
    }
    if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
        throw new IndexOutOfBoundsException();
    } else if (len == 0) {
        return;
    }
    if (!def.finished()) {
        def.setInput(b, off, len);
        while (!def.needsInput()) {
            deflate();
        }
    }
}
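The key is the state reported by def.finished(). That state is defined in the Deflater class, a compression utility class that Java implements on top of the ZLIB compression library.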
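The check can be triggered without any threads at all, which makes it easier to see what the message means. The following is a minimal single-threaded sketch, not the failing program itself: the class name WriteAfterFinishDemo and the method writeAfterFinish are made up for illustration, and the stream writes to memory rather than to a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical demo class: writes to a DeflaterOutputStream after finish().
class WriteAfterFinishDemo {

    // Returns the message of the IOException raised by the second write,
    // or null if no exception was thrown.
    static String writeAfterFinish() {
        try (DeflaterOutputStream out =
                     new DeflaterOutputStream(new ByteArrayOutputStream())) {
            out.write(new byte[] {1, 2, 3});
            out.finish();                 // drives the Deflater until def.finished() is true
            out.write(new byte[] {4, 5}); // rejected by the def.finished() check
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterFinish());
    }
}
```

In the multithreaded case the same check fires, except that the deflater is finished by a different thread from the one that is still writing.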
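The following code is what changes that state: it sets the finish flag, and the deflate calls that follow then drive finished to true.
public void finish() {
    synchronized (zsRef) {
        finish = true;
    }
}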
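The difference between the finish flag and the finished state can be observed directly on a standalone Deflater. This is a small sketch under my own naming (DeflaterStateDemo and observeFinished are hypothetical). It only uses the public Deflater methods setInput, finish, deflate, finished, reset and end.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo class: samples Deflater.finished() at three points.
class DeflaterStateDemo {

    // Returns finished() as observed:
    // [0] right after finish(), [1] after draining deflate(), [2] after reset().
    static boolean[] observeFinished() {
        Deflater def = new Deflater();
        boolean[] seen = new boolean[3];
        try {
            def.setInput("hello zip".getBytes(StandardCharsets.UTF_8));
            def.finish();             // only records that no more input will follow
            seen[0] = def.finished();

            byte[] out = new byte[64];
            while (!def.finished()) { // deflate() is what actually reaches end of stream
                def.deflate(out);
            }
            seen[1] = def.finished();

            def.reset();              // makes the deflater reusable for new input
            seen[2] = def.finished();
        } finally {
            def.end();                // release the native zlib resources
        }
        return seen;
    }

    public static void main(String[] args) {
        boolean[] seen = observeFinished();
        System.out.println(seen[0] + " " + seen[1] + " " + seen[2]);
    }
}
```

Between the moment finished() becomes true and the following reset(), any write that goes through the same Deflater is rejected. That is the window a second thread can fall into.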
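The call to finish() ultimately originates from the line zipOutputStream.putNextEntry(new ZipEntry(zipEntryName)); in our code above.
The idea is that each time a new entry is added, the previous entry has to be closed first, and that close is what triggers this condition. This state is not private to a thread, as the field declarations of Deflater show: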
public class Deflater {
    private final ZStreamRef zsRef;
    private byte[] buf = new byte[0];
    private int off, len;
    private int level, strategy;
    private boolean setParams;
    private boolean finish, finished;
    private long bytesRead;
    private long bytesWritten;
    // ... (rest of the class omitted)
}
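So under multithreading this state is clearly not thread-safe: every thread shares the single Deflater inside the ZipOutputStream, so one thread's putNextEntry can finish the deflater while another thread is still writing its own entry.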
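One possible way to keep a thread pool and still produce a valid archive is to make each entry an atomic unit: take a lock around putNextEntry, write and closeEntry together. The sketch below is an assumption of mine rather than something from the original code. SynchronizedZipWriter, writeEntry, buildZip and countEntries are hypothetical names. It writes to memory, and it assumes each file is small enough to hand over as a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: serializes whole entries on a shared ZipOutputStream.
class SynchronizedZipWriter {

    // Writes one complete entry while holding the lock on the stream, so the
    // putNextEntry / write / closeEntry calls of different threads never interleave.
    static void writeEntry(ZipOutputStream zos, String name, byte[] data) throws IOException {
        synchronized (zos) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(data, 0, data.length);
            zos.closeEntry();
        }
    }

    // Builds a zip in memory from `count` concurrent tasks and returns its bytes.
    static byte[] buildZip(int count, byte[] payload) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                String name = i + "/" + i + "/" + i + ".jpeg";
                futures.add(pool.submit(() -> {
                    writeEntry(zos, name, payload);
                    return null; // returning a value makes this a Callable, so IOException may propagate
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any failure from a task
            }
        } finally {
            pool.shutdown();
        }
        return bytes.toByteArray();
    }

    // Reads the archive back and counts its entries.
    static int countEntries(byte[] zip) throws IOException {
        int n = 0;
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            while (zis.getNextEntry() != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countEntries(buildZip(50, new byte[] {1, 2, 3})));
    }
}
```

Note that the lock serializes the compression itself, so with this approach the threads only help with work done outside the lock, such as reading the source files. In effect the writing behaves like the single-thread executor that already worked.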
That's all for this look at the error you get when writing a zip from multiple threads!