Reading a binary file into a struct

Date: 2021-10-09 11:07:59

I'm trying to read binary data using C#. I have all the information about the layout of the data in the files I want to read. I'm able to read the data "chunk by chunk", i.e. reading the first 40 bytes and converting them to a string, then reading the next 40 bytes, and so on.

Since there are at least three slightly different versions of the data, I would like to read the data directly into a struct. It just feels so much more right than reading it "line by line".

I have tried the following approach but to no avail:

StructType aStruct;
int count = Marshal.SizeOf(typeof(StructType));
byte[] readBuffer = new byte[count];
BinaryReader reader = new BinaryReader(stream);
readBuffer = reader.ReadBytes(count);
GCHandle handle = GCHandle.Alloc(readBuffer, GCHandleType.Pinned);
aStruct = (StructType) Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(StructType));
handle.Free();

The stream is an open FileStream from which I have already begun to read. I get an AccessViolationException when calling Marshal.PtrToStructure.

The stream contains more information than I'm trying to read since I'm not interested in data at the end of the file.

The struct is defined like:

[StructLayout(LayoutKind.Explicit)]
struct StructType
{
    [FieldOffset(0)]
    public string FileDate;
    [FieldOffset(8)]
    public string FileTime;
    [FieldOffset(16)]
    public int Id1;
    [FieldOffset(20)]
    public string Id2;
}

The example code has been simplified from the original to keep this question short.

How would I read binary data from a file into a struct?

7 answers

#1


23  

The problem is the strings in your struct. I found that marshaling types like byte/short/int is not a problem, but when you need to marshal into a complex type such as a string, your struct has to explicitly mimic an unmanaged type. You can do this with the MarshalAs attribute.

For your example, the following should work:

[StructLayout(LayoutKind.Explicit)]
struct StructType
{
    [FieldOffset(0)]
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
    public string FileDate;

    [FieldOffset(8)]
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
    public string FileTime;

    [FieldOffset(16)]
    public int Id1;

    [FieldOffset(20)]
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 66)] //Or however long Id2 is.
    public string Id2;
}
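
To see the MarshalAs approach working end to end, here is a minimal sketch that fills such a struct from an in-memory byte array. It is not the layout from the question: SampleRecord, MarshalDemo, the field sizes, and the sample values are made up for illustration, and it uses LayoutKind.Sequential with Pack = 1 instead of explicit offsets. As far as I know, ByValTStr counts the terminating null in SizeConst, so the sample strings are kept shorter than their fields.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical record: 8-byte date, 8-byte time, 4-byte id, 8-byte tag (28 bytes).
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
struct SampleRecord
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
    public string FileDate;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
    public string FileTime;

    public int Id1;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
    public string Id2;
}

static class MarshalDemo
{
    // Copies the start of 'data' into a SampleRecord via the interop marshaler.
    public static SampleRecord FromBytes(byte[] data)
    {
        if (data.Length < Marshal.SizeOf(typeof(SampleRecord)))
            throw new ArgumentException("Buffer is smaller than the record.");

        // Pin the array so its address stays valid while it is being read.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            return (SampleRecord)Marshal.PtrToStructure(
                handle.AddrOfPinnedObject(), typeof(SampleRecord));
        }
        finally
        {
            handle.Free();
        }
    }

    public static void Main()
    {
        // Build a zero-padded sample record by hand.
        byte[] data = new byte[28];
        Encoding.ASCII.GetBytes("211009").CopyTo(data, 0);
        Encoding.ASCII.GetBytes("110759").CopyTo(data, 8);
        BitConverter.GetBytes(42).CopyTo(data, 16);
        Encoding.ASCII.GetBytes("ABC").CopyTo(data, 20);

        SampleRecord r = FromBytes(data);
        Console.WriteLine($"{r.FileDate} {r.FileTime} {r.Id1} {r.Id2}");
    }
}
```

If the fields in the real file are completely filled with characters and have no terminator, marshaling them as byte arrays and decoding them afterwards may be the safer choice.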

#2


9  

Here is what I am using.
This worked for me when reading the Portable Executable format.
It's a generic function, so T is your struct type.

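public static T ByteToType<T>(BinaryReader reader)
{
    // Read as many bytes as the marshaled size of T (fewer come back at end of stream).
    byte[] bytes = reader.ReadBytes(Marshal.SizeOf(typeof(T)));

    // Pin the buffer so the GC cannot move it while its address is in use.
    GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
    try
    {
        return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
    }
    finally
    {
        // Always unpin, even if marshaling throws.
        handle.Free();
    }
}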
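
One thing the helper above does not guard against is a short read: BinaryReader.ReadBytes returns fewer bytes than requested at the end of the stream, and PtrToStructure would then read past the pinned buffer. The following is a rough sketch of a variant that checks for this. PointRecord, StructReader, and TryRead are hypothetical names, and the record layout is invented for the example.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical 10-byte record: two ints and a short, with no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PointRecord
{
    public int X;
    public int Y;
    public short Flags;
}

static class StructReader
{
    // Returns false instead of marshaling when fewer than sizeof(T) bytes are left.
    public static bool TryRead<T>(BinaryReader reader, out T value) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        byte[] bytes = reader.ReadBytes(size);
        if (bytes.Length < size)
        {
            value = default(T);
            return false;
        }

        GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try
        {
            value = (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
            return true;
        }
        finally
        {
            handle.Free();
        }
    }

    public static void Main()
    {
        // One complete record followed by 5 stray bytes.
        byte[] data = new byte[15];
        BitConverter.GetBytes(3).CopyTo(data, 0);
        BitConverter.GetBytes(4).CopyTo(data, 4);
        BitConverter.GetBytes((short)1).CopyTo(data, 8);

        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            PointRecord p;
            while (TryRead(reader, out p))
                Console.WriteLine($"{p.X},{p.Y},{p.Flags}");
        }
    }
}
```

The loop in Main also shows how the same helper could be used to read a sequence of records until the data runs out.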

#3


5  

As Ronnie said, I'd use BinaryReader and read each field individually. I can't find the link to the article with this info, but it's been observed that using BinaryReader to read each individual field can be faster than Marshal.PtrToStructure if the struct contains fewer than 30-40 or so fields. I'll post the link to the article when I find it.

The article's link is at: http://www.codeproject.com/Articles/10750/Fast-Binary-File-Reading-with-C

When marshaling an array of structs, PtrToStructure gains the upper hand more quickly, because the effective field count is fields * array length.
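
For reference, reading each field individually could look roughly like the sketch below. FileHeader, FieldReader, and the field sizes (two 8-byte ASCII fields and a 4-byte integer) are assumptions loosely based on the question, not the real file format.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical header; a plain managed struct with no marshaling attributes.
struct FileHeader
{
    public string FileDate; // 8 ASCII bytes in the file
    public string FileTime; // 8 ASCII bytes in the file
    public int Id1;         // 4 bytes, little-endian
}

static class FieldReader
{
    // Reads the fields one by one, in file order.
    public static FileHeader ReadHeader(BinaryReader reader)
    {
        FileHeader h = new FileHeader();
        h.FileDate = ReadFixedString(reader, 8);
        h.FileTime = ReadFixedString(reader, 8);
        h.Id1 = reader.ReadInt32(); // BinaryReader always reads little-endian
        return h;
    }

    // Decodes a fixed-width ASCII field and drops trailing padding.
    static string ReadFixedString(BinaryReader reader, int length)
    {
        byte[] raw = reader.ReadBytes(length);
        return Encoding.ASCII.GetString(raw).TrimEnd('\0', ' ');
    }

    public static void Main()
    {
        // Build a small sample "file" in memory.
        var ms = new MemoryStream();
        var writer = new BinaryWriter(ms);
        writer.Write(Encoding.ASCII.GetBytes("20211009"));
        writer.Write(Encoding.ASCII.GetBytes("110759  "));
        writer.Write(42);
        writer.Flush();
        ms.Position = 0;

        using (var reader = new BinaryReader(ms))
        {
            FileHeader h = ReadHeader(reader);
            Console.WriteLine($"{h.FileDate} {h.FileTime} {h.Id1}");
        }
    }
}
```

With this style, supporting a slightly different file version usually means adding a branch inside ReadHeader rather than defining another struct layout.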

#4


3  

I had no luck using BinaryFormatter; I guess I would need a complete struct that matches the content of the file exactly. I realised that in the end I wasn't interested in very much of the file content anyway, so I went with the solution of reading part of the stream into a byte buffer and then converting it using Encoding.ASCII.GetString() for strings and BitConverter.ToInt32() for the integers.

I will need to be able to parse more of the file later on but for this version I got away with just a couple of lines of code.
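
A minimal sketch of that approach, for anyone who wants to try it: the 20-byte header (two 8-byte ASCII fields and one 4-byte integer), the BufferParser class, and the ReadFully helper are assumptions for illustration, not the actual file layout.

```csharp
using System;
using System.IO;
using System.Text;

static class BufferParser
{
    // Hypothetical header size: 8 + 8 ASCII bytes followed by a 4-byte integer.
    const int HeaderSize = 20;

    // Stream.Read may return fewer bytes than requested, so loop until done.
    public static byte[] ReadFully(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream is shorter than the header.");
            offset += read;
        }
        return buffer;
    }

    public static void Main()
    {
        // Stand-in for a real file; the bytes after the header are ignored.
        byte[] file = new byte[64];
        Encoding.ASCII.GetBytes("20211009").CopyTo(file, 0);
        Encoding.ASCII.GetBytes("11:07:59").CopyTo(file, 8);
        BitConverter.GetBytes(7).CopyTo(file, 16);

        byte[] header = ReadFully(new MemoryStream(file), HeaderSize);

        string fileDate = Encoding.ASCII.GetString(header, 0, 8);
        string fileTime = Encoding.ASCII.GetString(header, 8, 8);
        int id1 = BitConverter.ToInt32(header, 16); // uses the machine's byte order

        Console.WriteLine($"{fileDate} {fileTime} {id1}");
    }
}
```

Because only the first HeaderSize bytes are read, extra data at the end of the file does not matter here.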

#5


1  

I don't see any problem with your code.

Just off the top of my head: what if you try to do it manually? Does it work?

BinaryReader reader = new BinaryReader(stream);
StructType o = new StructType();
o.FileDate = Encoding.ASCII.GetString(reader.ReadBytes(8));
o.FileTime = Encoding.ASCII.GetString(reader.ReadBytes(8));
...
...
...

Also try:

StructType o = new StructType();
byte[] buffer = new byte[Marshal.SizeOf(typeof(StructType))];
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
Marshal.StructureToPtr(o, handle.AddrOfPinnedObject(), false);
handle.Free();

Then read from this buffer with your BinaryReader instead of reading data from the FileStream, to see whether you still get the AccessViolationException.

"I had no luck using the BinaryFormatter, I guess I have to have a complete struct that matches the content of the file exactly."

That makes sense: BinaryFormatter has its own data format, which is completely incompatible with yours.

#6


0  

Try this:

using (FileStream stream = new FileStream(fileName, FileMode.Open))
{
    BinaryFormatter formatter = new BinaryFormatter();
    StructType aStruct = (StructType)formatter.Deserialize(stream);
}

#7


0  

Reading straight into structs is evil: many a C program has fallen over because of different byte orderings, different compiler implementations of fields, packing, word size...

You are best off serialising and deserialising byte by byte. Use the built-in facilities if you want, or just get used to BinaryReader.
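
To illustrate the byte-ordering point, here is a small sketch that assembles a 32-bit integer from individual bytes with the byte order spelled out, so the result does not depend on the machine reading the file. EndianReader and its two methods are hypothetical helpers written for this example.

```csharp
using System;

static class EndianReader
{
    // Most significant byte first (often called network order).
    public static int ReadInt32BigEndian(byte[] buffer, int offset)
    {
        return (buffer[offset] << 24)
             | (buffer[offset + 1] << 16)
             | (buffer[offset + 2] << 8)
             | buffer[offset + 3];
    }

    // Least significant byte first (what BinaryReader.ReadInt32 expects).
    public static int ReadInt32LittleEndian(byte[] buffer, int offset)
    {
        return buffer[offset]
             | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16)
             | (buffer[offset + 3] << 24);
    }

    public static void Main()
    {
        byte[] raw = { 0x00, 0x00, 0x01, 0x02 };

        // The same four bytes give different values depending on byte order.
        Console.WriteLine(ReadInt32BigEndian(raw, 0));    // 258
        Console.WriteLine(ReadInt32LittleEndian(raw, 0)); // 33619968
    }
}
```

Which of the two applies depends on how the file was written, so that has to come from the format description.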

