Accessing array contents from a .mat file loaded using scipy.io.loadmat (Python)

Time: 2022-03-13 19:47:10

UPDATE: This is a long question that boils down to, can someone explain the numpy array class to me? I answered my own question below.

I am working on a project to import data from MATLAB into a MySQL database whose contents will be made available through a Django website. I want to use scipy.io.loadmat to get the information from MATLAB into a form I can use in Python, so that I can enter the data into the database with the Django API.

My problem is that I cannot work with the data imported by scipy.io.loadmat. It is loaded in the form of several nested arrays and some of the variable names seem to be lost.

Here is the matlab code for a test structure that I have created for a trial:

sensors.time = [0:1:10].';
sensors.sensor1 = {};
sensors.sensor1.source_type = 'flight';                          
sensors.sensor1.source_name = 'flight-2';                       
sensors.sensor1.channels = {};
sensors.sensor1.channels.channel1.name = '1';                    
sensors.sensor1.channels.channel1.local_ori = 'lateral';         
sensors.sensor1.channels.channel1.vehicle_ori = 'axial';         
sensors.sensor1.channels.channel1.signals = {};
sensors.sensor1.channels.channel1.signals.signal1.filtered = 'N';
sensors.sensor1.channels.channel1.signals.signal1.filtered_description = 'none'; 
sensors.sensor1.channels.channel1.signals.signal1.data = sin(sensors.time)+0.1*rand(11,1); 

>> sensors
      time: [11x1 double]
      sensor1: [1x1 struct]
>> sensors.sensor1
      source_type: 'flight'
      source_name: 'flight-2'
      channels: [1x1 struct]
>> sensors.sensor1.channels
      channel1: [1x1 struct]
>> sensors.sensor1.channels.channel1
      name: '1'
      local_ori: 'lateral'
      vehicle_ori: 'axial'
      signals: [1x1 struct]
>> sensors.sensor1.channels.channel1.signals
      signal1: [1x1 struct]
>> sensors.sensor1.channels.channel1.signals.signal1
      filtered: 'N'
      filtered_description: 'none'
      data: [11x1 double]

I can easily visualize this structure as a python dictionary, so it does not seem like this should be such a complicated exercise.

Here is the python code I used to read the file in (eventually I want to read in multiple files):

import glob
import os

import scipy.io

# Raw string, so the backslashes in the Windows path are not read as escapes
path = r'C:\Users\c\Desktop\import'
for f in glob.glob(os.path.join(path, '*.mat')):
    matfile = scipy.io.loadmat(f, struct_as_record=True)

This is the resulting dictionary from loadmat:

>>> matfile
{'sensors': array([[ ([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]],[[(array([u'flight'], 
      dtype='<U6'), array([u'flight-2'], 
      dtype='<U8'), array([[ ([[(array([u'1'], 
      dtype='<U1'), array([u'lateral'], 
      dtype='<U7'), array([u'axial'], 
      dtype='<U5'), array([[ ([[(array([u'N'], 
      dtype='<U1'), array([u'none'], 
      dtype='<U4'), array([[ 0.06273465],[ 0.84363597],[ 1.00035443],[ 0.22117587],[-0.68221775],[-0.87761299],[-0.24108487],[ 0.71871452],[ 1.04690773],[ 0.46512366],[-0.51651414]]))]],)]],
      dtype=[('signal1', '|O4')]))]],)]], 
      dtype=[('channel1', '|O4')]))]])]], 
      dtype=[('time', '|O4'), ('sensor1', '|O4')]), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Tue Jun 07 18:38:32 2011', '__globals__': []}

The data is all there, but I don't know how to access these class objects. I would like to be able to loop over the contents so that I can process multiple sensors, then multiple channels for each sensor, etc.

Any explanations to help me simplify this data structure or suggested changes to make this easier would be greatly appreciated.

Update: based on Nick's suggestion, here are repr(matfile) and dir(matfile):

>>> repr(matfile)
"{'sensors': array([[ ([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]], [[(array([u'flight'], \n      dtype='<U6'), array([u'flight-2'], \n      dtype='<U8'), array([[ ([[(array([u'1'], \n      dtype='<U1'), array([u'lateral'], \n      dtype='<U7'), array([u'axial'], \n      dtype='<U5'), array([[ ([[(array([u'N'], \n      dtype='<U1'), array([u'none'], \n      dtype='<U4'), array([[ 0.0248629 ],\n       [ 0.88663486],\n       [ 0.93206871],\n       [ 0.22156497],\n       [-0.65819207],\n       [-0.95592508],\n       [-0.22584908],\n       [ 0.66569432],\n       [ 1.06956739],\n       [ 0.51103298],\n       [-0.53732649]]))]], [[(array([u'Y'], \n      dtype='<U1'), array([u'1. 5 Hz High Pass, 2. remove offset'], \n      dtype='<U35'), array([[ 0.        ],\n       [ 0.84147098],\n       [ 0.90929743],\n       [ 0.14112001],\n       [-0.7568025 ],\n       [-0.95892427],\n       [-0.2794155 ],\n       [ 0.6569866 ],\n       [ 0.98935825],\n       [ 0.41211849],\n       [-0.54402111]]))]])]], \n      dtype=[('signal1', '|O4'), ('signal2', '|O4')]))]],)]], \n      dtype=[('channel1', '|O4')]))]])]], \n      dtype=[('time', '|O4'), ('sensor1', '|O4')]), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Wed Jun 08 10:58:19 2011', '__globals__': []}"

>>> dir(matfile)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']

>>> dir(matfile['sensors'])
['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_wrap__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__delslice__', '__div__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__hex__', '__iadd__', '__iand__', '__idiv__', '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__index__', '__init__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setslice__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']

Obviously I need to learn a bit about objects and classes. How can I pull bits of the array out and put them into variables? For example:

time = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
source_type = 'flight'
etc.   

2 Answers

#1


19  

I've run into a similar issue with a fairly complex mat file at our company. I'm still getting my head wrapped around the scipy IO module, but here is what we found.

When the file is loaded with squeeze_me=True and struct_as_record=False, matfile['sensors'] returns a scipy.io.matlab.mio5_params.mat_struct object, which we can use to access the contents below it. When you print it, it looks like a flat array, but each MATLAB field is still available as an attribute of the object. So you could run something like this to start accessing the components:

from scipy.io import loadmat
matfile = loadmat('myfile.mat', squeeze_me=True, struct_as_record=False)
matfile['sensors'].sensor1.channels.channel1.name

In your case you want to be able to iterate over the elements in the structure, which you can do if you access the _fieldnames property of the mat_struct object. From there you can just loop over the field names and access them with getattr:

for field in matfile['sensors']._fieldnames:
    # getattr returns the value stored under the given field name
    print(getattr(matfile['sensors'], field))

This at least lets us access the deeply nested elements without having to alter our mat files.
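
The _fieldnames loop extends naturally to the whole tree. Below is a minimal sketch of that idea, not something from the original answer: the helper name to_python and the small stand-in file written with savemat are assumptions made so the example is self-contained. It relies on the same loadmat options as above (squeeze_me=True, struct_as_record=False).

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def to_python(obj):
    """Recursively turn loadmat output into plain dicts and lists (hypothetical helper)."""
    # mat_struct objects list their MATLAB field names in _fieldnames.
    if hasattr(obj, "_fieldnames"):
        return {name: to_python(getattr(obj, name)) for name in obj._fieldnames}
    # Object arrays (struct arrays, cell arrays) contain further nested items.
    if isinstance(obj, np.ndarray) and obj.dtype == object:
        return [to_python(item) for item in obj]
    # Numeric arrays and strings are returned unchanged.
    return obj


# Stand-in data that mirrors part of the question's layout (an assumption, not the real file).
demo = {"sensors": {"time": np.arange(11.0).reshape(-1, 1),
                    "sensor1": {"source_type": "flight",
                                "channels": {"channel1": {"name": "1"}}}}}
path = os.path.join(tempfile.mkdtemp(), "demo.mat")
savemat(path, demo)

matfile = loadmat(path, squeeze_me=True, struct_as_record=False)
nested = to_python(matfile["sensors"])

print(sorted(nested))
print(nested["sensor1"]["channels"]["channel1"]["name"])
```

Once the data is in nested dictionaries, looping over sensors and then channels is ordinary dict iteration, e.g. for name, channel in nested["sensor1"]["channels"].items().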

#2


0  

The solution I resorted to was to simplify the MATLAB structure. I eliminated nested structures. Each data set resides in a single file and I used python to loop over all the files of a particular type in the specified directory. (http://bogdan.org.ua/2007/08/12/python-iterate-and-read-all-files-in-a-directory-folder.html, if you would like to see an example of that.)

Importing the flat matlab structure results in a dictionary where the matlab variable names are the keys. Strings come in as arrays of shape (1,) --> [ string ], and numbers come in as arrays of shape (N, M) --> [[ numbers ]].
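
As a rough illustration of those shapes, here is a small sketch. The file name flat_demo.mat and the two variables in it are made up for the example and written with savemat; they are not the real data set.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a flat stand-in file: one string and one 11x1 column of numbers.
path = os.path.join(tempfile.mkdtemp(), "flat_demo.mat")
savemat(path, {"source_type": "flight",
               "time": np.arange(11.0).reshape(-1, 1)})

raw = loadmat(path)
# Drop the bookkeeping keys loadmat adds (__header__, __version__, __globals__).
data = {k: v for k, v in raw.items() if not k.startswith("__")}

for key, value in data.items():
    print(key, value.shape, value.dtype)

# Unwrap the arrays into ordinary Python values.
source_type = str(data["source_type"][0])   # shape (1,) string array -> str
time = data["time"].ravel().tolist()        # shape (11, 1) array -> flat list
```

Passing squeeze_me=True to loadmat should remove most of this manual unwrapping, at the cost of shapes that then depend on the data (a 1x1 value comes back as a scalar rather than an array).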

I still have to learn a bit more about the numpy arrays.
