More efficient way to retrieve specific data from a .json file?

Time: 2020-12-27 16:37:45

I have the following .json file, in which some elements have lists as values:

{
  "paciente": [
    {
      "id": 1234,
      "nombre": "Pablo",
      "sesion": [
        {
          "id": 12345,
          "juego": [
            {
              "nombre": "bonzo",
              "nivel": [
                {
                  "id": 1234,
                  "nombre": "caida libre"
                }
              ],
              "___léeme___": "El array 'iteraciones' contiene las vitorias o derrotas con el tiempo en segundos de cada iteración",
              "iteraciones": [
                {
                  "victoria": true,
                  "tiempo": 120
                },
                {
                  "victoria": false,
                  "tiempo": 232
                }
              ]
            }
          ],
          "segmento": [
            {
              "id": 12345,
              "nombre": "Hombro",
              "movimiento": [
                {
                  "id": 12,
                  "nombre": "flexion",
                  "metricas": [
                    {
                      "min": 12,
                      "max": 34,
                      "media": 23,
                      "moda": 20
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "id": 156,
      "nombre": "Bernardo",
      "sesion": [
        {
          "id": 456,
          "juego": [
            {
              "nombre": "Rita",
              "nivel": [
                {
                  "id": 1,
                  "nombre": "NAVEGANDO"
                }
              ],
              "___léeme___": "El array 'iteraciones' contiene las vitorias o derrotas con el tiempo en segundos de cada iteración",
              "iteraciones": [
                {
                  "victoria": true,
                  "tiempo": 120
                },
                {
                  "victoria": false,
                  "tiempo": 232
                }
              ]
            }
          ],
          "segmento": [
            {
              "id": 12345,
              "nombre": "Escapula",
              "movimiento": [
                {
                  "id": 12,
                  "nombre": "Protracción",
                  "metricas": [
                    {
                      "min": 12,
                      "max": 34,
                      "media": 23,
                      "moda": 20
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

From my script, I want to go through its different nested elements to get specific information:

import json

with open('myfile.json') as data_file:
    data = json.loads(data_file.read())


    patient_id = data["paciente"][0]["id"]

    patient_name = data["paciente"][0]["nombre"]

    id_session = data["paciente"][0]["sesion"][0]["id"]

    game_session = data["paciente"][0]["sesion"][0]["juego"][0]["nombre"]

    level_game = data["paciente"][0]["sesion"][0]["juego"][0]["nivel"][0]["nombre"]

    iterations = data["paciente"][0]["sesion"][0]["juego"][0]["iteraciones"]

    iterations_victory = data["paciente"][0]["sesion"][0]["juego"][0]["iteraciones"][0]["victoria"]

    iterations_time = data["paciente"][0]["sesion"][0]["juego"][0]["iteraciones"][0]["tiempo"]

    iterations_victory1 = data["paciente"][0]["sesion"][0]["juego"][0]["iteraciones"][1]["victoria"]

    iterations_time1 = data["paciente"][0]["sesion"][0]["juego"][0]["iteraciones"][1]["tiempo"]

    segment = data["paciente"][0]["sesion"][0]["segmento"][0]["nombre"]

    movement = data["paciente"][0]["sesion"][0]["segmento"][0]["movimiento"][0]["nombre"]

    #metrics = data["paciente"][0]["sesion"][0]["segmento"][0]["movimiento"][0]["metricas"]

    metric_min = data["paciente"][0]["sesion"][0]["segmento"][0]["movimiento"][0]["metricas"][0]["min"]

    metric_max = data["paciente"][0]["sesion"][0]["segmento"][0]["movimiento"][0]["metricas"][0]["max"]

    metric_average = data["paciente"][0]["sesion"][0]["segmento"][0]["movimiento"][0]["metricas"][0]["media"]

    metric_moda = data["paciente"][0]["sesion"][0]["segmento"][0]["movimiento"][0]["metricas"][0]["moda"]

    print(
        'Patient ID:', patient_id,'\n',
        'Patient Name:', patient_name, '\n',
        'Session:','\n',
        '  Id Session:',id_session,'\n',
        '  Game:', game_session, '\n',
        '  Level:', level_game, '\n',
        '  Iterations:', len(iterations),'\n',
        '    Victory:', iterations_victory, '\n',
        '    Time:', iterations_time, '\n',
        '    Victory:', iterations_victory1, '\n',
        '    Time:', iterations_time1, '\n',
        '  Affected Segment:', segment, '\n',
        '    Movement:', movement, '\n',
        '       Metrics:','\n',
        '          Minimum:', metric_min, '\n'
        '          Maximum:', metric_max, '\n'
        '          Average:', metric_average, '\n'
        '          Moda/Trend:', metric_moda, '\n'

        )

This is my output:

Patient ID: 1234
 Patient Name: Pablo
 Session:
   Id Session: 12345
   Game: bonzo
   Level: caida libre
   Iterations: 2
     Victory: True
     Time: 120
     Victory: False
     Time: 232
   Affected Segment: Hombro
     Movement: flexion
        Metrics:
           Minimum: 12
          Maximum: 34
          Average: 23
          Moda/Trend: 20

[Finished in 0.0s]

Is it possible to optimize this code? How can I make it shorter or more readable?

I am especially interested in the case where I have to query more than one element (when they exist) in lists/arrays such as segments, movements, iterations, games, etc.

Any guidance is welcome.

2 Answers

#1


Note that you are omitting the second patient record in your data (Bernardo), and that you assume there are always exactly two iterations. This might not always be true.

In terms of speed, your code is close to the best you can get, but for the reasons above you would do well to add some checks and loops so that you cover all of the data, and nothing more.
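As an illustration of that point, here is a minimal sketch that loops over every patient, session, game and iteration instead of hard-coding index 0 and exactly two iterations. The inline sample and the `describe` helper are assumptions made for the example; they are not part of the original script, and only a few of the fields are shown.

```python
import json

# Small inline sample with the same shape as the question's file
# (an assumption, so the sketch runs without 'myfile.json').
raw = '''
{"paciente": [
  {"id": 1234, "nombre": "Pablo", "sesion": [
    {"id": 12345, "juego": [
      {"nombre": "bonzo", "iteraciones": [
        {"victoria": true, "tiempo": 120},
        {"victoria": false, "tiempo": 232}]}]}]},
  {"id": 156, "nombre": "Bernardo", "sesion": []}
]}
'''
data = json.loads(raw)


def describe(data):
    """Return one text line per patient, session, game and iteration."""
    lines = []
    # .get(..., []) makes a missing list behave like an empty one
    for patient in data.get("paciente", []):
        lines.append("Patient ID: {} ({})".format(patient["id"], patient["nombre"]))
        for session in patient.get("sesion", []):
            lines.append("  Id Session: {}".format(session["id"]))
            for game in session.get("juego", []):
                iterations = game.get("iteraciones", [])
                lines.append("    Game: {} ({} iterations)".format(
                    game["nombre"], len(iterations)))
                for iteration in iterations:
                    lines.append("      Victory: {}, Time: {}".format(
                        iteration["victoria"], iteration["tiempo"]))
    return lines


print("\n".join(describe(data)))
```

The same pattern extends to "segmento", "movimiento" and "metricas" by adding one nested loop per list.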

Here is a function you could use to print the data in your format, based on a template you pass to it. The template maps each key whose value you want printed to the label to print for it. To avoid ambiguity, each template entry needs both the key and the parent key of the element of interest.

As the function needs to visit the keys in order, OrderedDict is used instead of dict:

import json
from collections import OrderedDict

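# Load the file, keeping the keys in document order
with open('myfile.json', encoding='utf-8') as data_file:
    data = json.load(data_file, object_pairs_hook=OrderedDict)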

def pretty(template, item, parentName='', name='', indent=0):
    label = template.get(parentName + '/' + name)
    if label:
        label = '  ' * indent + label + ': '
        if isinstance(item, list):
            label += str(len(item))
        elif not isinstance(item, OrderedDict):
            label += str(item)
        print(label)
    if isinstance(item, list):
        for value in item:
            pretty(template, value, parentName + '[]', name, indent)
    elif isinstance(item, OrderedDict):
        for key, value in item.items():
            pretty(template, value, name, key, indent+1)


template = {
    "paciente/id": "Patient ID",
    "paciente/nombre": "Patient Name",
    "paciente/sesion": "Sessions",
    "sesion/id": "Id Session",
    "juego/nombre": "Game",
    "nivel/nombre": "Level",
    "juego/iteraciones": "Iterations",
    "iteraciones/victoria": "Victory",
    "iteraciones/tiempo": "Time",
    "segmento/nombre": "Affected Segment",
    "movimiento/nombre": "Movement",
    "movimiento/metricas": "Metrics",
    "metricas/min": "Minimum",
    "metricas/max": "Maximum",
    "metricas/media": "Average",
    "metricas/moda": "Moda/Trend"
}

pretty(template, data)

The output is:

    Patient ID: 1234
    Patient Name: Pablo
    Sessions: 1
      Id Session: 12345
        Game: bonzo
          Level: caida libre
        Iterations: 2
          Victory: True
          Time: 120
          Victory: False
          Time: 232
        Affected Segment: Hombro
          Movement: flexion
          Metrics: 1
            Minimum: 12
            Maximum: 34
            Average: 23
            Moda/Trend: 20
    Patient ID: 156
    Patient Name: Bernardo
    Sessions: 1
      Id Session: 456
        Game: Rita
          Level: NAVEGANDO
        Iterations: 2
          Victory: True
          Time: 120
          Victory: False
          Time: 232
        Affected Segment: Escapula
          Movement: Protracción
          Metrics: 1
            Minimum: 12
            Maximum: 34
            Average: 23
            Moda/Trend: 20

#2


Depending on what else your program is doing, speeding this code up may or may not matter. You should use the profile or cProfile module to find out where your script is spending its time, and work on those parts.
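For instance, a minimal profiling sketch with cProfile and pstats could look like the following. The `load_and_extract` function and the inline `sample` string are stand-ins invented for the example; in practice you would profile your own script's functions.

```python
import cProfile
import io
import json
import pstats


def load_and_extract(text):
    """Stand-in for the real work: parse the JSON and pull out one value."""
    data = json.loads(text)
    return data["paciente"][0]["nombre"]


sample = '{"paciente": [{"id": 1234, "nombre": "Pablo"}]}'

# Record every function call made between enable() and disable()
profiler = cProfile.Profile()
profiler.enable()
name = load_and_extract(sample)
profiler.disable()

# Write the five most expensive entries (by cumulative time) to a string
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

For a whole script, running `python -m cProfile -s cumulative myscript.py` from the command line gives a similar report without changing the code.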

Regardless, you could save some processing time by removing the redundant indexing operations, using temporary variables to hold the intermediate results. You can think of this simply as factoring out common prefixes. It's relatively easy if you've got a good code editor.

Although the result may not be shorter or more readable, it will likely execute faster (even though the extra assignments add some overhead of their own).

Here's what I'm describing:

import json

with open('myfile.json') as data_file:
    data = json.loads(data_file.read())

    patient0_data = data["paciente"][0]

    patient_id = patient0_data["id"]
    patient_name = patient0_data["nombre"]

    patient0_data_sesion0 = patient0_data["sesion"][0]

    id_session = patient0_data_sesion0["id"]

    patient0_data_sesion0_juego0 = patient0_data_sesion0["juego"][0]

    game_session = patient0_data_sesion0_juego0["nombre"]
    level_game = patient0_data_sesion0_juego0["nivel"][0]["nombre"]
    iterations = patient0_data_sesion0_juego0["iteraciones"]

    patient0_data_sesion0_juego0_iteraciones = patient0_data_sesion0_juego0["iteraciones"]

    iterations_victory = patient0_data_sesion0_juego0_iteraciones[0]["victoria"]
    iterations_time = patient0_data_sesion0_juego0_iteraciones[0]["tiempo"]
    iterations_victory1 = patient0_data_sesion0_juego0_iteraciones[1]["victoria"]
    iterations_time1 = patient0_data_sesion0_juego0_iteraciones[1]["tiempo"]

    patient0_data_sesion0_segmento0 = patient0_data_sesion0["segmento"][0]

    segment = patient0_data_sesion0_segmento0["nombre"]

    patient0_data_sesion0_segmento0_movimiento0 = (
                                    patient0_data_sesion0_segmento0["movimiento"][0])

    movement = patient0_data_sesion0_segmento0_movimiento0["nombre"]
    #metrics = patient0_data_sesion0_segmento0_movimiento0["metricas"]

    # Reuse the temporary from above instead of re-indexing from the segment
    patient0_data_sesion0_segmento0_movimiento0_metricas0 = (
                        patient0_data_sesion0_segmento0_movimiento0["metricas"][0])

    metric_min = patient0_data_sesion0_segmento0_movimiento0_metricas0["min"]
    metric_max = patient0_data_sesion0_segmento0_movimiento0_metricas0["max"]
    metric_average = patient0_data_sesion0_segmento0_movimiento0_metricas0["media"]
    metric_moda = patient0_data_sesion0_segmento0_movimiento0_metricas0["moda"]

    print(
        'Patient ID:', patient_id,'\n',
        'Patient Name:', patient_name, '\n',
        'Session:','\n',
        '  Id Session:',id_session,'\n',
        '  Game:', game_session, '\n',
        '  Level:', level_game, '\n',
        '  Iterations:', len(iterations),'\n',
        '    Victory:', iterations_victory, '\n',
        '    Time:', iterations_time, '\n',
        '    Victory:', iterations_victory1, '\n',
        '    Time:', iterations_time1, '\n',
        '  Affected Segment:', segment, '\n',
        '    Movement:', movement, '\n',
        '       Metrics:','\n',
        '          Minimum:', metric_min, '\n'
        '          Maximum:', metric_max, '\n'
        '          Average:', metric_average, '\n'
        '          Moda/Trend:', metric_moda, '\n'

        )
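If readability matters more than the last bit of speed, the repeated prefixes can also be hidden behind a small helper. This is only a sketch: `dig` is a hypothetical function written for this example (it is not part of the json module), the inline `data` is a trimmed copy of the question's structure, and the helper does not make the lookups any faster.

```python
from functools import reduce


def dig(container, *path):
    """Follow a sequence of dict keys and list indexes into nested data."""
    return reduce(lambda node, step: node[step], path, container)


# Trimmed copy of the question's structure (assumption for the example)
data = {
    "paciente": [{
        "id": 1234,
        "sesion": [{
            "id": 12345,
            "segmento": [{
                "nombre": "Hombro",
                "movimiento": [{
                    "nombre": "flexion",
                    "metricas": [{"min": 12, "max": 34}],
                }],
            }],
        }],
    }]
}

# One lookup for the shared prefix, then plain key access on the result
metricas = dig(data, "paciente", 0, "sesion", 0,
               "segmento", 0, "movimiento", 0, "metricas", 0)
print(metricas["min"], metricas["max"])
```

A missing key or index still raises KeyError or IndexError, exactly as the chained indexing would.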
