How do I download a GZip file from S3?

Time: 2023-02-06 00:05:34

I have looked at both "AWS S3 Java SDK - Download file help" and "Working with Zip and GZip files in Java".

While they provide ways to download and deal with files from S3 and GZipped files respectively, these do not help in dealing with a GZipped file located in S3. How would I do this?

Currently I have:

try {
    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));
    BufferedReader fileIn = new BufferedReader(new InputStreamReader(
            fileObj.getObjectContent()));
    String fileContent = "";
    String line = fileIn.readLine();
    while (line != null){
        fileContent += line + "\n";
        line = fileIn.readLine();
    }
    fileObj.close();
    return fileContent;
} catch (IOException e) {
    e.printStackTrace();
    return "ERROR IOEXCEPTION";
}

Clearly, I am not handling the compressed nature of the file, and my output is:

����sU�3204�50�5010�20�24��L,(���O�V�M-.NLOU�R�U�����<s��<#�^�.wߐX�%w���������}C=�%�J3��.�����둚�S�ᜑ���ZQ�T�e��#sr�cdN#瘐:&�
S�BǔJ����P�<��

However, I cannot use the example from the second question above, because the file is not stored locally; it has to be downloaded from S3.

What should I do?

3 solutions

#1


5  

I solved the issue using a Scanner instead of an InputStream.

The scanner takes the GZIPInputStream and reads the unzipped file line by line:

fileObj = s3Client.getObject(new GetObjectRequest(oSummary.getBucketName(), oSummary.getKey()));
fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()));
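
For anyone who wants something self-contained to try, here is a minimal sketch of the same idea. It is only an illustration under a few assumptions: the class name GzipScannerSketch and the helpers readLines and gzip are made up for this example, the content is assumed to be UTF-8 text, and a ByteArrayInputStream stands in for fileObj.getObjectContent() so that it runs without an AWS account.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipScannerSketch {

    // Reads every line from a gzip-compressed stream.
    // In the S3 case, "compressed" would be fileObj.getObjectContent().
    static List<String> readLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(new GZIPInputStream(compressed), "UTF-8")) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
            // Scanner hides read errors, so surface the last one explicitly.
            if (in.ioException() != null) {
                throw in.ioException();
            }
        }
        return lines;
    }

    // Hypothetical helper: builds gzip bytes in memory to simulate the S3 object.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream fakeS3Content =
                new ByteArrayInputStream(gzip("first line\nsecond line\n"));
        for (String line : readLines(fakeS3Content)) {
            System.out.println(line);
        }
    }
}
```

Closing the Scanner also closes the wrapped stream, which is why the sketch uses try-with-resources instead of a separate close call.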

#2


1  

You have to use GZIPInputStream to read a GZIP file.

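    // Needs java.io.*, java.util.zip.* and, from the AWS SDK for Java v1,
    // com.amazonaws.services.s3.AmazonS3, com.amazonaws.services.s3.AmazonS3Client,
    // com.amazonaws.services.s3.model.S3Object and
    // com.amazonaws.auth.profile.ProfileCredentialsProvider.
    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));

    byte[] buffer = new byte[1024];
    int n;
    FileOutputStream fileOutputStream = new FileOutputStream("temp.gz");
    // Wrapping the S3 content in a GZIPInputStream decompresses it as it is read.
    BufferedInputStream bufferedInputStream = new BufferedInputStream(new GZIPInputStream(fileObj.getObjectContent()));

    // The decompressed bytes are compressed again on their way into temp.gz.
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream(fileOutputStream);
    while ((n = bufferedInputStream.read(buffer)) != -1) {
        // Write only the n bytes actually read, not the whole buffer.
        gzipOutputStream.write(buffer, 0, n);
    }
    gzipOutputStream.flush();
    gzipOutputStream.close();
    bufferedInputStream.close();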

Try this approach to download a GZip file from S3.
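
If what you actually want on disk is the decompressed content, the copy loop can probably be replaced by Files.copy. The following is only a sketch under assumptions: GzipToFileSketch, saveDecompressed and gzip are names invented for this example, and a ByteArrayInputStream plays the role of fileObj.getObjectContent().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipToFileSketch {

    // Decompresses a gzip stream and stores the plain bytes in "target".
    // In the S3 case, "compressed" would be fileObj.getObjectContent().
    static void saveDecompressed(InputStream compressed, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(compressed)) {
            // REPLACE_EXISTING lets the copy overwrite a file that is already there.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Hypothetical helper: builds gzip bytes in memory to simulate the S3 object.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("s3-object", ".txt");
        saveDecompressed(new ByteArrayInputStream(gzip("hello from S3\n")), target);
        System.out.print(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```

The try-with-resources block closes the GZIPInputStream and, through it, the underlying stream, even if the copy fails.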

#3


-1  

I wasn't quite looking for this issue but I did feel like improving the quality of this thread by actually explaining why the already provided solution works.

No, it's not because of the Scanner, as was suggested. It works because wrapping fileObj.getObjectContent() in a GZIPInputStream decompresses the stream as it is read.

Remove the scanner but keep the GZIPInputStream and things will still work.
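
To make that concrete, here is a rough sketch of the question's own read loop with the GZIPInputStream added and no Scanner involved. Treat it as an illustration only: GzipReaderSketch, readAll and gzip are invented names, UTF-8 is an assumed encoding, and the in-memory stream is a stand-in for fileObj.getObjectContent().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipReaderSketch {

    // Same loop as in the question; the only real change is the GZIPInputStream wrapper.
    // In the S3 case, "compressed" would be fileObj.getObjectContent().
    static String readAll(InputStream compressed) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader fileIn = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line = fileIn.readLine();
            while (line != null) {
                content.append(line).append("\n");
                line = fileIn.readLine();
            }
        }
        return content.toString();
    }

    // Hypothetical helper: builds gzip bytes in memory to simulate the S3 object.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new ByteArrayInputStream(gzip("no scanner\nneeded\n")));
        System.out.print(text);
    }
}
```

A StringBuilder replaces the repeated string concatenation from the question, which matters once the file has more than a handful of lines.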
