Create a new column with the sum of occurrences from an existing column containing nested lists

Date: 2022-12-30 22:54:29

I have a relatively big dataframe that looks like this:

(I have uploaded the csv file here - ufile.io/526t4)

    value
0   [[1,92,"D"],[93,93,"C"],[94,113,"S"],[114,120,"C"],[121,181,"S"],[182,187,"C"],[188,292,"S"],[319,319,"S"],[320,353,"C"],[354,393,"D"]]
1   [[18,23,"D"],[24,27,"C"],[28,186,"S"],[187,198,"C"],[199,246,"S"]]
2   [[18,23,"D"],[24,27,"C"],[28,186,"S"],[187,198,"C"],[199,246,"S"]]
3   [[20,79,"D"]]
...
12352   [[25,36,"S"],[37,89,"C"],[90,115,"S"]]
12353   [[1,16,"D"],[17,407,"C"],[408,416,"D"]]
12354   [[16,21,"D"],[22,108,"C"],[109,123,"D"],[124,164,"C"],[165,421,"S"]]
12355 rows × 1 columns

I want to create a new column holding, for each row, the sum of end minus start over all "D" entries.

Using the first row as an example:

x = [[1,92,"D"],[93,93,"C"],[94,113,"S"],[114,120,"C"],[121,181,"S"],[182,187,"C"],[188,292,"S"],[319,319,"S"],[320,353,"C"],[354,393,"D"]]
new_colum_D = (sum([y[1]-y[0] for y in x if y[2]=="D"])) # applied for all rows

For the first row, new_colum_D would be 130 (91 + 39).

I have tried the following:

df['Column_D']=df["value"].apply(lambda x:sum([y[1]-y[0] for y in x if y[2]=="D"]))

but I get the following message: IndexError: string index out of range

IndexError                                Traceback (most recent call last)
<ipython-input-7-f7f23d42d4e5> in <module>()
----> 1 df['sum']=df["value"].apply(lambda x:sum([y[1]-y[0] for y in x if y[2]=="D"]))
~\AppData\Local\conda\conda\envs\my_root\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   2549             else:
   2550                 values = self.asobject
-> 2551                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2552 
   2553         if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-7-f7f23d42d4e5> in <lambda>(x)
----> 1 df['sum']=df["value"].apply(lambda x:sum([y[1]-y[0] for y in x if y[2]=="D"]))
<ipython-input-7-f7f23d42d4e5> in <listcomp>(.0)
----> 1 df['sum']=df["value"].apply(lambda x:sum([y[1]-y[0] for y in x if y[2]=="D"]))

IndexError: string index out of range

1 solution

#1


You are very close. You can do the calculation in a list comprehension and then assign the resulting list to a column.

It may feel as though pd.Series.apply vectorises the calculation, but it does not: apply is just a thinly veiled loop with some additional overhead.
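
A rough way to check this claim yourself is sketched below. The data, the helper name sum_d and the repeat counts are made up for illustration, and the timings will vary by machine and pandas version, so treat it as a sanity check rather than a benchmark.

```python
import timeit

import pandas as pd

# Hypothetical test data: the same small row repeated many times.
rows = [[[1, 92, "D"], [93, 93, "C"], [354, 393, "D"]]] * 10_000
s = pd.Series(rows)


def sum_d(x):
    """Sum end - start over the "D" entries of one row."""
    return sum(y[1] - y[0] for y in x if y[2] == "D")


# Both approaches call sum_d once per row and give the same values.
via_apply = s.apply(sum_d).tolist()
via_comprehension = [sum_d(x) for x in s]
assert via_apply == via_comprehension

# Time each approach; no expected numbers are given because they vary.
print("apply:        ", timeit.timeit(lambda: s.apply(sum_d), number=10))
print("comprehension:", timeit.timeit(lambda: [sum_d(x) for x in s], number=10))
```

Either way the Python function runs once per row, so neither form is vectorised in the NumPy sense.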

import pandas as pd

df = pd.DataFrame({'value': [[[1,92,"D"],[93,93,"C"],[94,113,"S"],[114,120,"C"],[121,181,"S"], [182,187,"C"],[188,292,"S"],[319,319,"S"],[320,353,"C"],[354,393,"D"]],
                             [[18,23,"D"],[24,27,"C"],[28,186,"S"],[187,198,"C"],[199,246,"S"]],
                             [[18,23,"D"],[24,27,"C"],[28,186,"S"],[187,198,"C"],[199,246,"S"]]]})

df['value'] = [sum([y[1]-y[0] for y in x if y[2]=="D"]) for x in df['value']]

print(df)

   value
0    130
1      5
2      5
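
One thing worth checking: "IndexError: string index out of range" suggests that each cell in your column is a string rather than a list, which is what you would typically get after reading the file with pd.read_csv. Iterating over a string yields single characters, so y[2] fails. I have not opened your file, so this is an assumption. If it holds, parsing each cell first should help. The sketch below uses json.loads because your values look like valid JSON; the helper name sum_d and the sample cells are made up for illustration.

```python
import json


def sum_d(cell):
    """Sum end - start over all "D" entries in one cell."""
    # Assumption: a cell read from CSV is a JSON-style string; parse it first.
    segments = json.loads(cell) if isinstance(cell, str) else cell
    return sum(end - start for start, end, label in segments if label == "D")


# Hypothetical cells as they might come out of pd.read_csv: strings, not lists.
cells = ['[[1,92,"D"],[93,93,"C"],[354,393,"D"]]',
         '[[18,23,"D"],[24,27,"C"]]',
         '[[25,36,"S"],[37,89,"C"]]']

totals = [sum_d(cell) for cell in cells]
print(totals)  # [130, 5, 0]
```

On your dataframe this would become df['Column_D'] = [sum_d(x) for x in df['value']], which also keeps the original value column intact.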
