Writing and reading the same file simultaneously and streaming it through Jersey

Date: 2022-08-21 22:46:04

I am currently working on a solution for streaming huge files from EMC Documentum to the client through Jersey. The Documentum API lets you either get the file as a ByteArrayInputStream or save it to a disk area. Using the ByteArrayInputStream is out of the question, because it holds the whole file in memory, which is not acceptable for 20 GB files.

Therefore the only option is to save the file to a disk area (using internal classes and functions is also out of the question). To make this faster, I want Documentum to write the data to a file while, at the same time, another thread reads from that file and streams it to the client through Jersey (by returning an InputStream from the Jersey resource).

The problem is that if the reading thread is faster than the writing thread, it will reach the end of the file and get -1 back, which means there is no more data in the stream. When that happens, Jersey will most likely stop streaming, because it thinks the file is complete.
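
One workaround for the early -1 is an InputStream that does not return -1 when it reaches the current end of the file. Instead it waits and retries. Below is a minimal sketch. It assumes that the writer appends to the file sequentially and sets a shared flag only after it has closed the file. GrowingFileInputStream, the writerDone flag and the polling interval are hypothetical names, not Jersey or Documentum API. The file must already exist when the stream is opened.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * InputStream over a file that another thread is still appending to.
 * At the current end of file it sleeps and retries instead of returning -1,
 * and only reports -1 once writerDone is true and no bytes are left.
 */
class GrowingFileInputStream extends InputStream {
    private final RandomAccessFile file;
    private final AtomicBoolean writerDone;
    private final long pollMillis;

    GrowingFileInputStream(Path path, AtomicBoolean writerDone, long pollMillis)
            throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r"); // file must already exist
        this.writerDone = writerDone;
        this.pollMillis = pollMillis;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            // Sample the flag *before* reading: if the writer was already done
            // and we still hit EOF, the file really is complete.
            boolean done = writerDone.get();
            int n = file.read(b, off, len);
            if (n > 0) {
                return n;
            }
            if (done) {
                return -1;
            }
            try {
                Thread.sleep(pollMillis); // writer is behind: wait for more bytes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for data", e);
            }
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The flag is sampled before each read. As a result, Jersey only sees -1 after the writer has finished and every byte has been read. If the writer can fail without setting the flag, this reader would wait forever. Real code would therefore also need a timeout or an error signal.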

Are there any best practices or libraries for this kind of problem? I have searched the internet and have some workarounds in mind. Still, this feels like a common problem, and there may already be a solution in the Jersey API (or in some other library) that I have missed. Is there a class in Jersey that you can return and that lets you explicitly signal when the end of the stream has been reached?
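
JAX-RS, which Jersey implements, has an interface that fits this question: StreamingOutput. You implement its single method, write(OutputStream), and the response ends when that method returns. The sketch below shows what the body of write could do here. It is plain Java so that it runs without Jersey. TailCopy, copyWhileWriting and the Future that tracks the Documentum writing task are hypothetical names.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/**
 * Sketch of the body of a JAX-RS StreamingOutput.write(OutputStream):
 * copies a file to the client while another task is still writing it, and
 * returns (ending the response) only after that task is done and the file is drained.
 */
final class TailCopy {
    private TailCopy() {}

    static long copyWhileWriting(Path file, Future<?> writer, OutputStream out, long pollMillis)
            throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        // FileInputStream does not remember EOF: a later read sees bytes appended since.
        try (InputStream in = new FileInputStream(file.toFile())) {
            while (true) {
                boolean done = writer.isDone(); // sample completion *before* reading
                int n = in.read(buffer);
                if (n > 0) {
                    out.write(buffer, 0, n);
                    total += n;
                } else if (done) {
                    break; // writer had finished and nothing is left to read
                } else {
                    out.flush(); // send what we have, then wait for the writer
                    Thread.sleep(pollMillis);
                }
            }
            writer.get(); // rethrows if the writing task failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while streaming", e);
        } catch (ExecutionException e) {
            throw new IOException("writer failed; the response is incomplete", e.getCause());
        }
        out.flush();
        return total;
    }
}
```

In a resource method, this could be returned as Response.ok((StreamingOutput) out -> TailCopy.copyWhileWriting(file, writerFuture, out, 50)).build(). Here file and writerFuture are placeholders for the download target and the task running the Documentum export. Compared with returning an InputStream, your own loop decides when the response ends. A failed writer also surfaces as an IOException rather than a silently truncated file.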

2 answers

#1


The API for Documentum allows either to get the file as a ByteArrayInputStream or to save it down to a disk area

Actually, DFC provides two other options for transferring content from the content server:

  1. The getCollectionForContent() method (poorly documented, but present in the public API):

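    IDfCollection collection = null;
    try {
        collection = object.getCollectionForContent(null, 0);
        long total = 0;
        byte[] buffer = new byte[64 * 1024];
        while (collection.next()) {
            // each row holds one chunk of content (about 64K)
            ByteArrayInputStream chunk = collection.getBytesBuffer(null,
                    null, null, 0);
            int n;
            while ((n = chunk.read(buffer)) != -1) {
                out.write(buffer, 0, n); // out: the client-facing OutputStream (assumed)
                total += n;
            }
        }
    } finally {
        if (collection != null) {
            collection.close();
        }
    }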
    
  2. The getStream() method in the ISysObjectInternal interface (not part of the public API, but widely used by EMC applications):

    // try-with-resources closes the stream (if it was obtained) when the block ends
    try (InputStream stream = ((ISysObjectInternal) object).getStream(null, 0, null, false)) {
    
        // some logic here
    
    }
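
With the first option, the chunks can be handed to Jersey as an InputStream directly, without a temporary file. The stream fetches the next chunk only when the current one is used up, so roughly one chunk (64K, per the comment in the snippet) is held in memory at a time. It returns -1 only after the last row. ChunkCursor and ChunkedContentInputStream are hypothetical. In real code, ChunkCursor would delegate to IDfCollection.next() and getBytesBuffer(null, null, null, 0) as in the snippet. This sketch does not call DFC, so it runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for IDfCollection: one content chunk per row. */
interface ChunkCursor extends AutoCloseable {
    boolean next() throws Exception;               // in DFC: collection.next()
    ByteArrayInputStream chunk() throws Exception; // in DFC: collection.getBytesBuffer(null, null, null, 0)
}

/** InputStream that pulls the next chunk only when the current one is used up. */
final class ChunkedContentInputStream extends InputStream {
    private final ChunkCursor cursor;
    private InputStream current = new ByteArrayInputStream(new byte[0]);
    private boolean exhausted;

    ChunkedContentInputStream(ChunkCursor cursor) {
        this.cursor = cursor;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (!exhausted) {
            int n = current.read(b, off, len);
            if (n > 0) {
                return n;
            }
            // Current chunk drained (or empty): move to the next row, if any.
            try {
                if (cursor.next()) {
                    current = cursor.chunk();
                } else {
                    exhausted = true; // only now does the caller see -1
                }
            } catch (Exception e) {
                throw new IOException("failed to fetch the next content chunk", e);
            }
        }
        return -1;
    }

    @Override
    public void close() throws IOException {
        try {
            cursor.close(); // releases the underlying collection
        } catch (Exception e) {
            throw new IOException("failed to close the chunk cursor", e);
        }
    }
}
```

Because reading and fetching happen on the same thread, this avoids the reader-overtakes-writer race entirely. The trade-off is that download speed is bounded by how fast the content server delivers chunks.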
    

#2


EMC Documentum is a DMS (document management system). I am sure that you cannot use the same repository object to read and write the same version of that object concurrently.

If you really need to stick with Documentum, you could try accessing the actual content at the filestore location, whichever filestore type you are using. Again, with this approach you need to reconsider security and similar issues.