Android: writing to disk is unreliable - written file.length() != expected length

Date: 2022-11-07 07:46:32

I have a write method that writes a byte[] to disk. On a very small number of devices I'm running into a strange problem: after an apparently successful write, file.length() != byte[].length.

Code and Problem

The code to write a file to disk

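import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Wrapper class, GiantWtfException and the two helpers at the bottom are assumed/illustrative so the snippet compiles on its own.
public class ByteFileWriter {
    static class GiantWtfException extends IOException {
        GiantWtfException(String message) { super(message); }
    }

    private static boolean writeByteFile(File file, byte[] byteData) throws IOException {
        if (!file.exists()) {
            boolean fileCreated = file.createNewFile();
            if (!fileCreated) {
                return false;
            }
        }

        FileOutputStream fos = new FileOutputStream(file);
        try {
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            bos.write(byteData);
            bos.flush(); // push the buffered bytes into fos before syncing
            fos.getFD().sync(); // sync to disk as recommended: http://android-developers.blogspot.com/2010/12/saving-data-safely.html
        } finally {
            fos.close(); // close the descriptor even if write() or sync() throws
        }

        if (file.length() != byteData.length) {
            MessageDigest md = newMd5();
            final byte[] originalMD5Hash = md.digest(byteData); // digest() also resets md

            InputStream is = new BufferedInputStream(new FileInputStream(file));
            try {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = is.read(buffer)) > -1) {
                    md.update(buffer, 0, read); // hash only the bytes actually read, not the whole buffer
                }
            } finally {
                is.close();
            }

            final byte[] writtenFileMD5Hash = md.digest();

            if (!Arrays.equals(originalMD5Hash, writtenFileMD5Hash)) {
                String message = String.format(
                        "After an fsync, the file's length is not equal to the number of bytes we wrote!\npath=%s, expected=%d, actual=%d.  >>  " +
                        "Original MD5 Hash: %s, written file MD5 hash: %s",
                        file.getAbsolutePath(), byteData.length, file.length(),
                        digestToHex(originalMD5Hash), digestToHex(writtenFileMD5Hash));
                throw new GiantWtfException(message);
            }
        }

        return true;
    }

    private static MessageDigest newMd5() throws IOException {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IOException(e);
        }
    }

    private static String digestToHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}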

On a few devices, execution enters the if-statement that compares the file length. One example output:

After an fsync, the file's length is not equal to the number of bytes we wrote! path=/mnt/sdcard/.folder/filename, expected=233510, actual=229376 >> Original MD5 Hash: f1d298c0484672c52d9c26d04a3a21dc, written file MD5 hash: ab30660bd2b476d9551c15b340207a8a

I currently see this problem on 5 devices as I slowly roll out the code. Device data is listed under "More stats and observations" below.

Question

Is there anything else I can do or improve?

More stats and observations

Current system version

  • 2.3.5
  • 2.3.5
  • 2.3.6
  • 2.3.6

Model

  • N860 (LG)
  • GT-I9100G (Samsung)
  • GT-S5300 (Samsung)
  • GT-S7500 (Samsung)
  • LG-VS410PP (LG)

Other stats

According to the general crash analytics (from Crittercism), there is always more than enough free disk space when the problem happens. Still, some (not all) of these devices have thrown IOExceptions about insufficient free disk space at other points in time.

As usual, I have never been able to reproduce the problem on any of my test phones.

Assumptions / Observations:

Generally I would expect an IOException when the disk is full. Yet in every failure I catch, fewer bytes were written than should have been.

Interestingly, the number of bytes actually written to disk is always a multiple of 2^15 (32768).
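As a quick check against the example output: 229376 = 7 × 32768, and the expected 233510 rounded down to the previous 32 KiB boundary is exactly 229376, so in that one sample the file stops at the last full 32 KiB chunk. A minimal sketch of that arithmetic (the class and method names are hypothetical; only the two lengths come from the report):

```java
public class ChunkBoundaryCheck {
    // 2^15 = 32768 bytes (32 KiB): the granularity seen in the truncated lengths
    static final long CHUNK = 1L << 15;

    // True if a length ends exactly on a 32 KiB boundary.
    static boolean isChunkAligned(long length) {
        return length % CHUNK == 0;
    }

    // Largest multiple of 32 KiB that is <= length.
    static long floorToChunk(long length) {
        return (length / CHUNK) * CHUNK;
    }

    public static void main(String[] args) {
        long expected = 233510L; // values taken from the example output
        long actual = 229376L;
        System.out.println("actual is 32 KiB aligned: " + isChunkAligned(actual));           // true
        System.out.println("full 32 KiB chunks: " + actual / CHUNK);                         // 7
        System.out.println("expected rounded down to 32 KiB: " + floorToChunk(expected));    // 229376
        System.out.println("missing bytes: " + (expected - actual));                         // 4134
    }
}
```

Running the same check over all logged failures would show whether every truncation lands on the chunk boundary just below the expected length, or only this one.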

EDIT: I added an MD5 checksum validation, which also fails, and simplified the example code a little for readability. It still fails in the wild, with differing MD5 hashes.

1 Answer

philipp, file.length() is the file size as reported by the OS. It might be the space the file takes up on disk or the number of bytes in the file.

If the number returned is the size on disk, it depends on the number of clusters that hold the file. For example, NTFS generally uses 4 KB clusters. If you save a text document containing 3 ASCII-encoded characters on an NTFS-formatted volume, the size of the file is 3 bytes, but the size of the file on disk is 4096 bytes. On NTFS with 4 KB clusters, every file occupies a multiple of 4096 bytes on disk. See http://en.wikipedia.org/wiki/Data_cluster for more.
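To make the size-on-disk arithmetic concrete, the allocated size is the byte length rounded up to a whole number of clusters. A small sketch, assuming a fixed 4096-byte cluster as in the NTFS example (the class name is hypothetical):

```java
public class ClusterSize {
    // Round a logical byte length up to a whole number of clusters (ceiling division).
    // Ignores file-system special cases such as tiny files stored inside metadata.
    static long sizeOnDisk(long length, long clusterSize) {
        long clusters = (length + clusterSize - 1) / clusterSize;
        return clusters * clusterSize;
    }

    public static void main(String[] args) {
        long cluster = 4096L; // assumed 4 KB cluster, as in the NTFS example
        System.out.println(sizeOnDisk(3L, cluster));      // 4096
        System.out.println(sizeOnDisk(233510L, cluster)); // 237568
    }
}
```

Rounding up in this way can only produce a figure equal to or larger than the byte count, whereas the reported length in the question (229376) is smaller than the expected 233510.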

If the number returned is the length of the file in bytes (from the underlying file system's metadata), then it should exactly match the number of bytes you wrote, though I wouldn't bet my life on it.

Android uses YAFFS or EXT4, if that helps at all.

I strongly agree with admdrew: use a hash. MD5 would work well; SHA or even a CRC should also work fine for this task. As you write bytes to the disk, feed the same bytes to your hash algorithm. Once the file is written, read it back, feed that to a second hasher, and compare the results. If you want to be sure the data is intact, file size alone is not enough.
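A minimal Java sketch of that approach, using the standard java.security.DigestOutputStream to hash the bytes on their way to the file and a second MessageDigest to hash what is read back. The class and method names are hypothetical, and MD5 is used only because the question already uses it:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Random;

public class VerifiedWrite {
    // Writes data while hashing it, then re-reads the file and hashes what is
    // actually stored. Returns true if both digests match.
    static boolean writeAndVerify(File file, byte[] data)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest writeDigest = MessageDigest.getInstance("MD5");
        FileOutputStream fos = new FileOutputStream(file);
        try {
            DigestOutputStream dos = new DigestOutputStream(fos, writeDigest);
            dos.write(data); // bytes pass through the digest on their way to fos
            dos.flush();
            fos.getFD().sync(); // ask the OS to push the data to the storage device
        } finally {
            fos.close();
        }
        byte[] written = writeDigest.digest();

        MessageDigest readDigest = MessageDigest.getInstance("MD5");
        InputStream in = new FileInputStream(file);
        try {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                readDigest.update(buffer, 0, n); // hash only the bytes actually read
            }
        } finally {
            in.close();
        }
        return Arrays.equals(written, readDigest.digest());
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("verified", ".bin");
        tmp.deleteOnExit();
        byte[] payload = new byte[233510]; // same size as the failing example
        new Random(42).nextBytes(payload);
        System.out.println("verified: " + writeAndVerify(tmp, payload));
    }
}
```

Switching to SHA-256 only changes the algorithm name passed to MessageDigest.getInstance; a CRC would use java.util.zip.CRC32 instead of a MessageDigest.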
