Iterating over a dictionary and extracting values

Time: 2022-09-01 22:16:34

I have a dictionary (result_dict) that looks like this:

{'11333216@N05': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '1',
   'iconfarm': 3,
   'iconserver': '2214',
   'id': '11333216@N05',
   'ispro': 0,
   'location': {'_content': ''},
   'mbox_sha1sum': {'_content': '8eb2e248cbad94e2b4a5aae75eb653c7e061a90c'},
   'mobileurl': {'_content': 'https://m.flickr.com/photostream.gne?id=11327876'},
   'nsid': '11333216@N05',
   'path_alias': 'kishansamarasinghe',
   'photos': {'count': {'_content': 442},
    'firstdate': {'_content': '1193073180'},
    'firstdatetaken': {'_content': '2000-01-01 00:49:17'}},
   'photosurl': {'_content': 'https://www.flickr.com/photos/kishansamarasinghe/'},
   'profileurl': {'_content': 'https://www.flickr.com/people/kishansamarasinghe/'},
   'realname': {'_content': 'Kishan Samarasinghe'},
   'timezone': {'label': 'Sri Jayawardenepura',
    'offset': '+06:00',
    'timezone_id': 'Asia/Colombo'},
   'username': {'_content': 'Three Sixty Five Degrees'}},
  'stat': 'ok'},
 '117692977@N08': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '0',
   'iconfarm': 1,
   'iconserver': '404',
   'id': '117692977@N08',
   'ispro': 0,
   'location': {'_content': 'Almere, The Nederlands'},
   'mobileurl': {'_content': 'https://m.flickr.com/photostream.gne?id=117600164'},
   'nsid': '117692977@N08',
   'path_alias': 'meijsvo',
   'photos': {'count': {'_content': 3237},
    'firstdate': {'_content': '1392469161'},
    'firstdatetaken': {'_content': '2013-06-23 14:39:30'}},
   'photosurl': {'_content': 'https://www.flickr.com/photos/meijsvo/'},
   'profileurl': {'_content': 'https://www.flickr.com/people/meijsvo/'},
   'realname': {'_content': 'Markéta Eijsvogelová'},
   'timezone': {'label': 'Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna',
    'offset': '+01:00',
    'timezone_id': 'Europe/Amsterdam'},
   'username': {'_content': 'meijsvo'}},
  'stat': 'ok'},
 '21539776@N02': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '1',
   'iconfarm': 0,
   'iconserver': '0',

It contains more than 150 usernames (e.g. 11333216@N05). I want to extract the 'mobileurl' for each user and create a DataFrame with username and mobileurl columns. I couldn't find a way to iterate over the users and extract each one's mobileurl, because the dictionary cannot be indexed by position. However, I have extracted the mobileurl for one of the users as follows.

result_dict['76617062@N08']["person"]["mobileurl"]['_content']

'https://m.flickr.com/photostream.gne?id=76524249'

I would be grateful if someone could help, as I'm fairly new to Python.

3 answers

#1


Iterate through the dictionary's keys, which in this case are the usernames, then use each key to access the corresponding top-level dict and drill down through the nested layers to the exact value you need (the mobileurl in your example).

Once you have these two variables, add them to your DataFrame.

# Iterate through list of users
for user in result_dict.keys():

    # use each username to find the mobileurl you need within
    mobileurl = result_dict[user]["person"]["mobileurl"]["_content"]

    # Add the variables 'user' and 'mobileurl' to dataframe as you see fit
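The last step of the loop above (adding 'user' and 'mobileurl' to a DataFrame) could be sketched as follows. This is a minimal example, not the answerer's own code: the two-user `result_dict` is a trimmed sample with the same nesting as in the question, and the `rows` list and `df` name are assumptions made for illustration.

```python
import pandas as pd

# Trimmed sample with the same nesting as result_dict in the question
# (only the keys needed here are kept).
result_dict = {
    "11333216@N05": {
        "person": {"mobileurl": {"_content": "https://m.flickr.com/photostream.gne?id=11327876"}},
        "stat": "ok",
    },
    "117692977@N08": {
        "person": {"mobileurl": {"_content": "https://m.flickr.com/photostream.gne?id=117600164"}},
        "stat": "ok",
    },
}

# Collect one plain dict per user, then build the DataFrame once at the end.
rows = []
for user in result_dict:
    mobileurl = result_dict[user]["person"]["mobileurl"]["_content"]
    rows.append({"username": user, "mobileurl": mobileurl})

df = pd.DataFrame(rows, columns=["username", "mobileurl"])
print(df)
```

Building a list of rows first and constructing the DataFrame once tends to be simpler and cheaper than appending to a DataFrame inside the loop.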

#2


result_dict = {'11333216@N05': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '1',
   'iconfarm': 3,
   'iconserver': '2214',
   'id': '11333216@N05',
   'ispro': 0,
   'location': {'_content': ''},
   'mbox_sha1sum': {'_content': '8eb2e248cbad94e2b4a5aae75eb653c7e061a90c'},
   'mobileurl': {'_content': 'https://m.flickr.com/photostream.gne?id=11327876'},
   'nsid': '11333216@N05',
   'path_alias': 'kishansamarasinghe',
   'photos': {'count': {'_content': 442},
    'firstdate': {'_content': '1193073180'},
    'firstdatetaken': {'_content': '2000-01-01 00:49:17'}},
   'photosurl': {'_content': 'https://www.flickr.com/photos/kishansamarasinghe/'},
   'profileurl': {'_content': 'https://www.flickr.com/people/kishansamarasinghe/'},
   'realname': {'_content': 'Kishan Samarasinghe'},
   'timezone': {'label': 'Sri Jayawardenepura',
    'offset': '+06:00',
    'timezone_id': 'Asia/Colombo'},
   'username': {'_content': 'Three Sixty Five Degrees'}},
  'stat': 'ok'},
 '117692977@N08': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '0',
   'iconfarm': 1,
   'iconserver': '404',
   'id': '117692977@N08',
   'ispro': 0,
   'location': {'_content': 'Almere, The Nederlands'},
   'mobileurl': {'_content': 'https://m.flickr.com/photostream.gne?id=117600164'},
   'nsid': '117692977@N08',
   'path_alias': 'meijsvo',
   'photos': {'count': {'_content': 3237},
    'firstdate': {'_content': '1392469161'},
    'firstdatetaken': {'_content': '2013-06-23 14:39:30'}},
   'photosurl': {'_content': 'https://www.flickr.com/photos/meijsvo/'},
   'profileurl': {'_content': 'https://www.flickr.com/people/meijsvo/'},
   'realname': {'_content': 'Markéta Eijsvogelová'},
   'timezone': {'label': 'Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna',
    'offset': '+01:00',
    'timezone_id': 'Europe/Amsterdam'},
   'username': {'_content': 'meijsvo'}},
  'stat': 'ok'},
 '21539776@N02': {'person': {'can_buy_pro': 0,
   'description': {'_content': ''},
   'has_stats': '1',
   'iconfarm': 0,
   'iconserver': '0'}
}
}

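For your use case it is better to iterate over key/value pairs with the dictionary's items() method (iteritems() in Python 2). Chaining .get() with an empty-dict default avoids a KeyError for entries such as '21539776@N02' that have no 'mobileurl':

for key, value in result_dict.items():
    print(value.get("person", {}).get("mobileurl", {}).get("_content"))

OUTPUT (Python 3.7+ keeps insertion order; the last entry has no mobileurl, so None is printed for it)

https://m.flickr.com/photostream.gne?id=11327876
https://m.flickr.com/photostream.gne?id=117600164
None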
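The chained .get() calls above can also be wrapped in a small helper so that users without a mobileurl are skipped explicitly. This is only a sketch: `dig` is a hypothetical helper name, not a built-in, and the three-user `result_dict` is a trimmed sample in which the last user deliberately lacks a mobileurl, as in the truncated data above.

```python
def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts; return default if any level is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed sample: the third user has no 'mobileurl' entry.
result_dict = {
    "11333216@N05": {"person": {"mobileurl": {"_content": "https://m.flickr.com/photostream.gne?id=11327876"}}},
    "117692977@N08": {"person": {"mobileurl": {"_content": "https://m.flickr.com/photostream.gne?id=117600164"}}},
    "21539776@N02": {"person": {"iconserver": "0"}},
}

# Keep only the users for which a URL was actually found.
found = {}
for user, value in result_dict.items():
    url = dig(value, "person", "mobileurl", "_content")
    if url is not None:
        found[user] = url

print(found)
```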

#3


I think you could also do this in a more pandas-oriented way instead of pure dictionary iteration. It is not necessarily the fastest, but given that you are new to Python and pandas, it is good to know that pandas can handle this well.

I am assuming result_dict is a pandas DataFrame, not just a dictionary. You could easily achieve the same result without converting your JSON to a DataFrame: the other answers use plain Python dictionary syntax, so they work even if you are not using a DataFrame.

urls = result_dict[result_dict.index=='person'].apply(lambda x: x['mobileurl']['_content'])

Here we select all rows whose index is person and then apply a function (lambda is the anonymous function we are using) to each person. The lambda extracts the URL, and pandas returns the result as a pandas object (a DataFrame or Series) for you to use.
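For reference, here is a minimal self-contained sketch of the pandas route. It assumes the DataFrame is built with pd.DataFrame(result_dict), so the columns are the user ids and the index is ['person', 'stat']; the answer does not say how its DataFrame was created. With a frame built this way, a boolean mask followed by DataFrame.apply hands the lambda a whole column (a Series) rather than a person dict, so this sketch selects the 'person' row with .loc first. The names `df`, `urls` and `out` are illustrative.

```python
import pandas as pd

# Trimmed sample with the same nesting as result_dict in the question.
result_dict = {
    "11333216@N05": {
        "person": {"mobileurl": {"_content": "https://m.flickr.com/photostream.gne?id=11327876"}},
        "stat": "ok",
    },
    "117692977@N08": {
        "person": {"mobileurl": {"_content": "https://m.flickr.com/photostream.gne?id=117600164"}},
        "stat": "ok",
    },
}

# Assumption: columns are user ids, rows are 'person' and 'stat'.
df = pd.DataFrame(result_dict)

# df.loc["person"] is a Series holding one 'person' dict per user id;
# Series.apply calls the lambda once per dict.
urls = df.loc["person"].apply(lambda person: person["mobileurl"]["_content"])

# Turn the Series into the two-column frame asked for in the question.
out = urls.rename_axis("username").reset_index(name="mobileurl")
print(out)
```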

Normally I would also care about how fast the iteration is.

(The following was run in IPython, a handy interactive shell for Python. %%timeit is a magic command provided by IPython that measures how long your code takes to run.)

%%timeit
urls = result_dict[result_dict.index=='person'].apply(lambda x: x['mobileurl']['_content'])

1000 loops, best of 3: 133 us per loop (us = microsecond, 1e-6 s)
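Outside IPython, a comparable measurement can be taken with the standard-library timeit module. The sketch below times the plain-dictionary approach on a trimmed two-user sample; `extract_urls` and `n_runs` are names chosen for illustration, and the timing you see will depend on your machine and data size.

```python
import timeit

# Trimmed sample with the same nesting as result_dict in the question.
result_dict = {
    "11333216@N05": {"person": {"mobileurl": {"_content": "https://m.flickr.com/photostream.gne?id=11327876"}}},
    "117692977@N08": {"person": {"mobileurl": {"_content": "https://m.flickr.com/photostream.gne?id=117600164"}}},
}


def extract_urls(data):
    """Plain-dictionary extraction: one URL per user, in insertion order."""
    return [entry["person"]["mobileurl"]["_content"] for entry in data.values()]


# Run the function many times and report the average time per call.
n_runs = 10_000
total = timeit.timeit(lambda: extract_urls(result_dict), number=n_runs)
print(f"{total / n_runs * 1e6:.3f} us per loop")
```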

I can tell you that @SamC provided the fast solution here. But as I said, you don't need a DataFrame to use his solution; it also works on a plain dictionary.
