I am currently working with these datasets from the Kiva Kaggle competition: https://www.kaggle.com/kiva/data-science-for-good-kiva-crowdfunding/data
For each micro loan, I want to look up the float 'MPI' value (the 'Multidimensional Poverty Index') of the loan's geographical region.
- In one dataset, kiva_mpi_region_locations.csv, each region has a single corresponding MPI value associated with it.
- However, in the dataset kiva_loans.csv, where each loan is given a "region", the data often has multiple values in the same cell separated by commas (,).
[kiva_loans.csv / Loan data example] (Note: different loans can come from the same region, so in this case region is a foreign key but not a primary key):
Loan #: 653338
region: Tanjay, Negros Oriental
[kiva_mpi_region_locations.csv / Regional MPI value example] (Note: every region has only one MPI, as region is a primary key):
region: Badakhshan
MPI: 0.387
My code so far:
RegionMPI = dict(zip(dfLocations.region, dfLocations.MPI))
{'Badakhshan': 0.387,
'Badghis': 0.466,
'Baghlan': 0.3,
'Balkh': 0.301,
'Bamyan': 0.325,
'Daykundi': 0.313,
etc}
LoanRegion = dfLoanTheme['region'].str.split(',').values.tolist()
[['Lahore'],
nan,
['Dar es Salaam'],
['Liloy-Dela Paz'],
['Tanjay', ' Negros Oriental'],
['Ica'],
nan,
['Lahore']]
Any advice on how to cycle through my nested list and look up each region name in my dictionary, so that every occurrence of a key in the list is linked to its corresponding MPI value?
1 Answer
#1
0
You want to do a merge on two dataframes on the region field. The pandas library makes this really easy (and performant). The code looks like this (your CSV files are behind the Kaggle registration wall):
import pandas as pd
loans = pd.read_csv('kiva_loans.csv')
mpi_regions = pd.read_csv('kiva_mpi_region_locations.csv')
df = loans.merge(mpi_regions, on='region')
You really don't want to reinvent the wheel by writing your own join code in base Python; the pandas package already does this for you.
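Since the real CSV files can't be loaded here, this is a minimal sketch of what the merge does, using small made-up frames. The two MPI values come from the question's dictionary; the loan ids and the 'id' column name are hypothetical. It also shows one thing to watch for: merge defaults to an inner join, so loans whose region has no MPI entry are dropped unless you pass how='left'.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (loan ids are made up for illustration).
loans = pd.DataFrame({
    "id": [101, 102, 103],
    "region": ["Badakhshan", "Lahore", "Badghis"],
})
mpi_regions = pd.DataFrame({
    "region": ["Badakhshan", "Badghis"],
    "MPI": [0.387, 0.466],
})

# Default (inner) join: only loans whose region appears in mpi_regions survive.
inner = loans.merge(mpi_regions, on="region")

# Left join: every loan is kept; MPI is NaN where no region matched.
left = loans.merge(mpi_regions, on="region", how="left")

print(inner)
print(left)
```

Whether you want the inner or the left join depends on whether loans without an MPI should stay in your analysis.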
(Note you're assuming region is unique across countries. It might be safer to merge on both columns: on=['country','region'].)
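A plain merge will not match cells such as 'Tanjay, Negros Oriental', because the whole string is compared against single region names. One possible way around that, sketched here on made-up data, is to split each cell on commas, put each piece on its own row with explode, merge, and then collapse back to one MPI per loan. The column names 'id' and 'country', the loan ids, and the 0.25 MPI for Negros Oriental are assumptions for illustration, not values from the real files, and taking the first matching piece is just one possible rule.

```python
import pandas as pd

# Toy stand-ins for the two CSV files; ids and the 0.25 MPI are made up.
loans = pd.DataFrame({
    "id": [1, 2, 3],
    "country": ["Philippines", "Pakistan", "Afghanistan"],
    "region": ["Tanjay, Negros Oriental", None, "Badakhshan"],
})
mpi_regions = pd.DataFrame({
    "country": ["Philippines", "Afghanistan"],
    "region": ["Negros Oriental", "Badakhshan"],
    "MPI": [0.25, 0.387],
})

# One row per (loan, candidate region name); missing regions stay missing.
parts = loans.assign(region=loans["region"].str.split(",")).explode("region")
parts["region"] = parts["region"].str.strip()  # drop the space after each comma

# Left merge keeps every candidate row, matched or not.
matched = parts.merge(mpi_regions, on=["country", "region"], how="left")

# Collapse back to one value per loan: the first non-null MPI, else NaN.
loan_mpi = matched.groupby("id")["MPI"].first()
result = loans.assign(MPI=loans["id"].map(loan_mpi))

print(result)
```

If several pieces of one cell match different regions, you would need to decide which one wins (or average them) instead of relying on first().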