Print out the data contained in an hdf file (whether to a file or as a chart)

Date: 2021-01-31 14:53:24

If I'm given a *.hdf file, how can I print out all the data it contains?

>>> import h5py
>>> f = h5py.File('my_file.hdf', 'r')
>>> # What's next?

All the existing questions here describe how to either create an hdf file or read one, without printing out the data it contains. So please don't mark this as a duplicate.

2 answers

#1


You might want to use the visititems method.

Recursively visit all objects in this group and subgroups. Like Group.visit(), except your callable should have the signature: callable(name, object) -> None or return value. In this case object will be a Group or Dataset instance.
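To make the quoted signature concrete, here is a minimal sketch of two callbacks. The first returns None, so the traversal continues over every object. The second returns a value, which stops the traversal and becomes the result of visititems. The helper names and the in-memory demo file (with its "measurements/temperature" layout) are made up for illustration.

```python
import h5py
import numpy as np


def collect_names(group):
    """Return (name, kind) for every group and dataset below `group`."""
    names = []

    def _visit(name, obj):
        kind = "Dataset" if isinstance(obj, h5py.Dataset) else "Group"
        names.append((name, kind))
        # Implicitly returns None, so visititems keeps going.

    group.visititems(_visit)
    return names


def first_dataset_name(group):
    """Return the name of the first dataset found, or None if there is none."""
    def _visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            return name  # a non-None value stops the traversal
        return None

    return group.visititems(_visit)


# Hypothetical demo file kept in memory only (nothing is written to disk).
with h5py.File("demo_visit.h5", "w", driver="core", backing_store=False) as f:
    f.create_group("measurements").create_dataset("temperature", data=np.arange(3))
    names = collect_names(f)
    first = first_dataset_name(f)

print(names)
print(first)
```

Note that the names passed to the callback are paths relative to the group on which visititems was called.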

So the idea is to write a function that takes the name of the visited group (or dataset) and the group (or dataset) instance as arguments and logs them, and then to pass this function to the visititems method of the opened file.

Here is a simple example implementation:

import h5py


def log_hdf_file(hdf_file):
    """
    Print the groups, attributes and datasets contained in the given HDF file handler to stdout.

    :param h5py.File hdf_file: HDF file handler to log to stdout.
    """
    def _print_item(name, item):
        """Print to stdout the name and attributes or value of the visited item."""
        print(name)
        # Format item attributes if any
        if item.attrs:
            print('\tattributes:')
            for key, value in item.attrs.items():
                print('\t\t{}: {}'.format(key, str(value).replace('\n', '\n\t\t')))

        # Format Dataset value (item[()] reads the whole dataset into memory;
        # it replaces the older item.value attribute)
        if isinstance(item, h5py.Dataset):
            print('\tValue:')
            print('\t\t' + str(item[()]).replace('\n', '\n\t\t'))

    # Here we first print the file attributes as they are not accessible from File.visititems()
    _print_item(hdf_file.filename, hdf_file)
    # Print the content of the file
    hdf_file.visititems(_print_item)


# Open read-only so the file cannot be modified by accident
with h5py.File('my_file.h5', 'r') as hdf_file:
    log_hdf_file(hdf_file)
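
Printing every value can flood the terminal when datasets are large. As a possible variation (not part of the original answer), the sketch below lists only each object's path, kind, shape and dtype. The function name summarize and the in-memory demo file are assumptions made for illustration.

```python
import h5py
import numpy as np


def summarize(hdf_file):
    """Return one line per object: path, kind and, for datasets, shape and dtype."""
    lines = []

    def _visit(name, item):
        if isinstance(item, h5py.Dataset):
            lines.append("{}  Dataset  shape={}  dtype={}".format(
                name, item.shape, item.dtype))
        else:
            lines.append("{}  Group".format(name))

    hdf_file.visititems(_visit)
    return lines


# Hypothetical demo file kept in memory only (nothing is written to disk).
with h5py.File("demo_summary.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups such as "grid" are created automatically.
    f.create_dataset("grid/values", data=np.zeros((2, 3), dtype="float32"))
    summary = summarize(f)

for line in summary:
    print(line)
```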

#2


This is not a proper answer to this question, but the one other answer is a bit unsatisfactory.

To have a look at what's inside an .hdf file, I usually use NASA's Panoply software. It can be downloaded here: http://www.giss.nasa.gov/tools/panoply/. It lets you open, explore and plot data in all sorts of geo-referenced formats, including netCDF and hdf.

Then I can find out the name of the subdataset I'm interested in and open it in my Python script.
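
For an HDF5 file, that last step might look like the sketch below. It assumes the file can be read with h5py (which handles HDF5 but not HDF4), and the file path and the dataset name "swath/brightness" are placeholders standing in for whatever name was found in Panoply. The sketch first writes a small file so that it can be run on its own.

```python
import os
import tempfile

import h5py
import numpy as np

# Placeholder path and dataset name, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
dataset_name = "swath/brightness"

# Create a small file so the example is self-contained.
with h5py.File(path, "w") as f:
    f.create_dataset(dataset_name, data=np.array([1.0, 2.0, 3.0]))

# Open read-only and load the dataset by its full name.
with h5py.File(path, "r") as f:
    values = f[dataset_name][()]  # reads the whole dataset into a NumPy array

print(values)
```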

Hope this will be a helpful tip for some people looking up this question!
