How to optimize data loading from a binary file

Time: 2021-11-22 05:27:59

I have a little-endian binary file containing ~250,000 values of var1 followed by the same number of values of var2. I need to write a method that reads the file and returns a DataSet with those values in the columns var1 and var2.

I am using the MiscUtil library, which has been mentioned on SO multiple times; for details, see also the question "Will there be an update on MiscUtil for .Net 4?"

Thanks a lot to Jon Skeet for making it available. :)

The following code works, but I am interested in better ideas for reducing the number of loops needed to read the file and populate the DataTable. Any suggestions?

using System;
using System.Data;
using System.Diagnostics;
using System.IO;

private static DataSet parseBinaryFile(string filePath)
{
    var result = new DataSet();

    var table = result.Tables.Add("Data");

    table.Columns.Add("Index", typeof(int));
    table.Columns.Add("rain", typeof(float));
    table.Columns.Add("gnum", typeof(float));

    const int samplesCount = 259200; // 720 * 360

    float[] vRain = new float[samplesCount];
    float[] vStations = new float[samplesCount];

    try
    {
        if (string.IsNullOrWhiteSpace(filePath) || !File.Exists(filePath))
        {
            throw new ArgumentException(string.Format("Unable to open the file: '{0}'", filePath));
        }

        // at this point filePath is valid and the file exists; open it read-only
        using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            // We are using the library found here: http://www.yoda.arachsys.com/csharp/miscutil/
            var reader = new MiscUtil.IO.EndianBinaryReader(MiscUtil.Conversion.LittleEndianBitConverter.Little, fs);

            int i = 0;

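            // stop at the end of the file or once both arrays are full,
            // so an oversized file cannot index past the end of vStations
            while (i < 2 * samplesCount && reader.BaseStream.Position < reader.BaseStream.Length)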
            {
                // Read Data

                float buffer = reader.ReadSingle();

                if (i < samplesCount)
                {
                    vRain[i] = buffer;
                }
                else
                {
                    vStations[i-samplesCount] = buffer;
                }

                ++i;
            }

            // i counts every float read; each variable contributes half of them
            Console.WriteLine("number of samples read per variable: {0}", (i / 2).ToString("N0"));
        }

        for (int j = 0; j < samplesCount; ++j)
        {
            table.Rows.Add(new object[] { j + 1, vRain[j], vStations[j] });
        }
    }
    catch (Exception exc)
    {
        Debug.WriteLine(exc.Message);
    }

    return result;
} 

1 solution

#1

Option #1

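Read the entire file into memory (or memory-map it) and loop once: with all the bytes available, sample j of var1 and sample j of var2 can be read in the same iteration, the second one at an offset of samplesCount values.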
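
A minimal sketch of this option, assuming the file holds exactly 2 * samplesCount 4-byte floats. The class name WholeFileLoader and the helpers LoadTable, ParseBytes and ReadSingleLE are hypothetical names chosen for illustration; the column names are taken from the question. Only framework calls are used, so MiscUtil is not needed for this variant.

```csharp
using System;
using System.Data;
using System.IO;

static class WholeFileLoader
{
    // Hypothetical entry point: load all bytes, then fill the table in one pass.
    public static DataTable LoadTable(string filePath, int samplesCount)
    {
        return ParseBytes(File.ReadAllBytes(filePath), samplesCount);
    }

    public static DataTable ParseBytes(byte[] bytes, int samplesCount)
    {
        const int FloatSize = sizeof(float);

        // Assumption: the file is exactly var1 followed by var2, nothing else.
        long expected = 2L * samplesCount * FloatSize;
        if (bytes.Length != expected)
        {
            throw new InvalidDataException(
                string.Format("Expected {0} bytes, found {1}.", expected, bytes.Length));
        }

        var table = new DataTable("Data");
        table.Columns.Add("Index", typeof(int));
        table.Columns.Add("rain", typeof(float));
        table.Columns.Add("gnum", typeof(float));

        int secondBlock = samplesCount * FloatSize; // byte offset where var2 starts

        table.BeginLoadData(); // suspends notifications and constraints during the bulk add
        for (int j = 0; j < samplesCount; ++j)
        {
            float rain = ReadSingleLE(bytes, j * FloatSize);
            float gnum = ReadSingleLE(bytes, secondBlock + j * FloatSize);
            table.Rows.Add(j + 1, rain, gnum);
        }
        table.EndLoadData();

        return table;
    }

    // Reads a little-endian float regardless of the host byte order.
    private static float ReadSingleLE(byte[] data, int offset)
    {
        if (BitConverter.IsLittleEndian)
        {
            return BitConverter.ToSingle(data, offset);
        }

        byte[] swapped = { data[offset + 3], data[offset + 2], data[offset + 1], data[offset] };
        return BitConverter.ToSingle(swapped, 0);
    }
}
```

For the file in the question this would be called as WholeFileLoader.LoadTable(filePath, 259200). The whole file is about 2 MB, so reading it in one call is cheap; for much larger files a memory-mapped view may be the better fit.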

Option #2

Add all the data table rows as you read the var1 section, using a placeholder value for var2. Then fix up the rows as you read the var2 section.
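
This option could look roughly like the sketch below, which avoids the two intermediate arrays. The class name PlaceholderLoader and the choice of float.NaN as the placeholder are assumptions made for illustration. It uses System.IO.BinaryReader, which reads values in little-endian order, instead of the MiscUtil reader.

```csharp
using System;
using System.Data;
using System.IO;

static class PlaceholderLoader
{
    // Hypothetical helper: builds the rows during the var1 section,
    // then patches the var2 column while reading the second section.
    public static DataTable LoadTable(Stream input, int samplesCount)
    {
        var table = new DataTable("Data");
        table.Columns.Add("Index", typeof(int));
        table.Columns.Add("rain", typeof(float));
        table.Columns.Add("gnum", typeof(float));

        // BinaryReader closes the underlying stream when disposed.
        using (var reader = new BinaryReader(input))
        {
            // First section: create every row, with NaN standing in for var2.
            for (int j = 0; j < samplesCount; ++j)
            {
                table.Rows.Add(j + 1, reader.ReadSingle(), float.NaN);
            }

            // Second section: overwrite the placeholder in the existing rows.
            // ReadSingle throws EndOfStreamException if the file is too short.
            for (int j = 0; j < samplesCount; ++j)
            {
                table.Rows[j]["gnum"] = reader.ReadSingle();
            }
        }

        return table;
    }
}
```

A call such as PlaceholderLoader.LoadTable(File.OpenRead(filePath), 259200) would cover the file in the question. There are still two loops, but each value is read and stored exactly once, and no temporary arrays are needed.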
