How to convert a list to a pandas DataFrame

Time: 2021-05-14 23:48:38

I have the following code:

rows =[]
for dt in new_info:
    x =  dt['state']
    est = dt['estimates']

    col_R = [val['choice'] for val in est if val['party'] == 'Rep']
    col_D = [val['choice'] for val in est if val['party'] == 'Dem']

    incumb = [val['party'] for val in est if val['incumbent'] == True ]

    rows.append((x, col_R, col_D, incumb))

Now I want to convert my rows list into a pandas data frame. Structure of my rows list is shown below and my list has 32 entries.

When I convert this into a pandas data frame, each entry in the data frame is a list:

pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])  
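
To make the symptom concrete, here is a minimal sketch that reproduces it. The sample rows are borrowed from the first answer below and are assumed to match the shape of the real `rows` list.

```python
import pandas as pd

# Sample rows in the same shape as the question's list
# (values taken from the first answer; assumed representative).
rows = [
    ("KY", ["McConnell"], ["Grimes"], ["Rep"]),
    ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
    ("MI", ["Land"], ["Peters"], []),
]

df = pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])

# Every cell in R, D and incumbent holds a Python list, not a scalar.
first_r_cell = df.loc[0, "R"]
last_incumbent_cell = df.loc[2, "incumbent"]
print(type(first_r_cell), last_incumbent_cell)
```

Note that the last row has an empty list for the incumbent, so any fix also has to decide what an empty list should become.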

But I want my data frame to hold plain values instead of lists.

The new_info variable is a list of dicts, each with a 'state' key and an 'estimates' list.

2 Answers

#1


9  

Since you mind the objects in the columns being lists, I would use a generator to remove the lists wrapping your items:

import pandas as pd
import numpy as np
rows = [(u'KY', [u'McConnell'], [u'Grimes'], [u'Rep']),
        (u'AR', [u'Cotton'], [u'Pryor'], [u'Dem']),
        (u'MI', [u'Land'], [u'Peters'], [])]

def get(r, nth):
    '''helper function to retrieve item from nth list in row r'''
    return r[nth][0] if r[nth] else np.nan

def remove_list_items(list_of_records):
    for r in list_of_records:
        yield r[0], get(r, 1), get(r, 2), get(r, 3)

The generator works similarly to this function, but instead of materializing a list unnecessarily in memory as an intermediate step, it just passes each row that would be in the list to the consumer of the list of rows:

def remove_list_items(list_of_records):
    result = []
    for r in list_of_records:
        result.append((r[0], get(r, 1), get(r, 2), get(r, 3)))
    return result
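
As a small sketch of that difference, the snippet below consumes the generator version one row at a time. It repeats the definitions from above so that it runs on its own; nothing is computed until a consumer asks for the next row.

```python
import types

import numpy as np

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
        ("MI", ["Land"], ["Peters"], [])]

def get(r, nth):
    """Return the first item of the nth list in row r, or NaN if it is empty."""
    return r[nth][0] if r[nth] else np.nan

def remove_list_items(list_of_records):
    # Generator version: yields one unwrapped row at a time.
    for r in list_of_records:
        yield r[0], get(r, 1), get(r, 2), get(r, 3)

records = remove_list_items(rows)   # no rows processed yet
first = next(records)               # processes only the first row
rest = list(records)                # drains the remaining rows
print(first)
```

A generator can only be consumed once, so create a fresh one with another call to remove_list_items(rows) if you need the rows again.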

Then build your DataFrame by passing your data through the generator (or the list version, if you prefer):

>>> df = pd.DataFrame.from_records(
        remove_list_items(rows), 
        columns=["State", "R", "D", "incumbent"])
>>> df
  State          R       D incumbent
0    KY  McConnell  Grimes       Rep
1    AR     Cotton   Pryor       Dem
2    MI       Land  Peters       NaN

Or you could use a list comprehension or a generator expression (shown) to do essentially the same:

>>> df = pd.DataFrame.from_records(
      ((r[0], get(r, 1), get(r, 2), get(r, 3)) for r in rows), 
      columns=["State", "R", "D", "incumbent"])
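
Another option, not part of the original answer, is to avoid creating the lists in the first place by taking the first match inside the question's loop. The sketch below assumes new_info is a list of dicts shaped the way the question's code reads them; the sample data and the helper first_or_nan are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for new_info; only the keys used by the
# question's loop are assumed.
new_info = [
    {"state": "KY", "estimates": [
        {"choice": "McConnell", "party": "Rep", "incumbent": True},
        {"choice": "Grimes", "party": "Dem", "incumbent": False}]},
    {"state": "MI", "estimates": [
        {"choice": "Land", "party": "Rep", "incumbent": False},
        {"choice": "Peters", "party": "Dem", "incumbent": False}]},
]

def first_or_nan(values):
    """Return the first item of an iterable, or NaN if it is empty."""
    return next(iter(values), np.nan)

rows = []
for dt in new_info:
    est = dt["estimates"]
    rows.append((
        dt["state"],
        first_or_nan(val["choice"] for val in est if val["party"] == "Rep"),
        first_or_nan(val["choice"] for val in est if val["party"] == "Dem"),
        first_or_nan(val["party"] for val in est if val["incumbent"]),
    ))

df = pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])
print(df)
```

This silently keeps only the first match per state, so it is only appropriate if each state has at most one candidate per party.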

#2


7  

You can use Python's built-in string and list handling and do something like:

df['col1'] = df['col1'].apply(lambda i: ''.join(i))

which will produce:

    col1 col2
0    a  [d]
1    b  [e]
2    c  [f]

Here col2 was deliberately left unformatted to show the contrast.

Edit

As requested by the OP: to apply the same apply(lambda ...) to all columns, you can either write one line like the one above per column, replacing 'col1' with each column name you want to alter, or loop over the columns.

If you have a data frame like this:

x = [['a'],['b'],['c'],['d']]
y = [['e'],['f'],['g'],['h']]
z = [['i'],['j'],['k'],['l']]

df = pd.DataFrame({'col1':x, 'col2':y, 'col3':z})

then you can loop over the columns

for col in df.columns:
    df[col] = df[col].apply(lambda i: ''.join(i))

which converts a data frame that starts like:

   col1 col2 col3
0  [a]  [e]  [i]
1  [b]  [f]  [j]
2  [c]  [g]  [k]
3  [d]  [h]  [l]

and becomes

    col1 col2 col3
0    a    e    i
1    b    f    j
2    c    g    k
3    d    h    l
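
One caveat with the ''.join approach is worth a quick sketch: an empty list becomes an empty string rather than NaN. The data below is illustrative only. If you want missing values instead, as in the first answer, take the first element conditionally.

```python
import numpy as np
import pandas as pd

# Illustrative column whose last cell is an empty list.
df = pd.DataFrame({"col1": [["a"], ["b"], []]})

# ''.join turns an empty list into an empty string.
joined = df["col1"].apply(lambda i: "".join(i))

# Taking the first element conditionally yields NaN for empty lists.
first = df["col1"].apply(lambda i: i[0] if i else np.nan)

print(joined.tolist())
print(first.tolist())
```

Also note that ''.join only works when every list item is a string; it raises a TypeError for other types.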
