How to drop a list of rows from a Pandas DataFrame?

Date: 2021-07-21 22:55:01

I have a dataframe df :

>>> df
                  sales  discount  net_sales    cogs
STK_ID RPT_Date                                     
600141 20060331   2.709       NaN      2.709   2.245
       20060630   6.590       NaN      6.590   5.291
       20060930  10.103       NaN     10.103   7.981
       20061231  15.915       NaN     15.915  12.686
       20070331   3.196       NaN      3.196   2.710
       20070630   7.907       NaN      7.907   6.459

Then I want to drop the rows at certain positions given in a list, say [1,2,4], which would leave:

                  sales  discount  net_sales    cogs
STK_ID RPT_Date                                     
600141 20060331   2.709       NaN      2.709   2.245
       20061231  15.915       NaN     15.915  12.686
       20070630   7.907       NaN      7.907   6.459

How can I do that, and which function should I use?

7 solutions

#1


254  

Use DataFrame.drop and pass it a Series of index labels:

In [65]: df
Out[65]: 
       one  two
one      1    4
two      2    3
three    3    2
four     4    1


In [66]: df.drop(df.index[[1,3]])
Out[66]: 
       one  two
one      1    4
three    3    2
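
Applied to the frame from the question, the same idea might look like the sketch below. The frame is rebuilt by hand from the values shown in the question, and the name positions_to_drop is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values copied from it).
index = pd.MultiIndex.from_product(
    [[600141], [20060331, 20060630, 20060930, 20061231, 20070331, 20070630]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907],
        "discount": [np.nan] * 6,
        "net_sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907],
        "cogs": [2.245, 5.291, 7.981, 12.686, 2.710, 6.459],
    },
    index=index,
)

positions_to_drop = [1, 2, 4]  # row positions, not index labels

# df.index[...] turns positions into labels, which is what drop() expects.
result = df.drop(df.index[positions_to_drop])
print(result)
```

The rows for 20060331, 20061231 and 20070630 remain, which matches the output the question asks for.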

#2


72  

Note that you may need to pass the inplace=True argument when you want the drop to happen in place.

df.drop(df.index[[1,3]], inplace=True)

Because drop returns a new DataFrame and leaves df unchanged unless you assign the result, use this form when you want df itself to be modified. http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.drop.html
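
A small sketch of the difference, using a toy frame like the one in answer #1 (the variable names dropped, rows_before and ret are just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [4, 3, 2, 1]},
    index=["one", "two", "three", "four"],
)

# Without inplace, drop() returns a new frame and df keeps all its rows.
dropped = df.drop(df.index[[1, 3]])
rows_before = len(df)

# With inplace=True, df itself is modified and the call returns None.
ret = df.drop(df.index[[1, 3]], inplace=True)

print(rows_before, len(dropped), len(df), ret)
```

Reassigning, as in df = df.drop(...), gives the same end result as inplace=True.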

#3


32  

You can also pass the label itself to DataFrame.drop (instead of a Series of index labels):

In[17]: df
Out[17]: 
            a         b         c         d         e
one  0.456558 -2.536432  0.216279 -1.305855 -0.121635
two -1.015127 -0.445133  1.867681  2.179392  0.518801

In[18]: df.drop('one')
Out[18]: 
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801

Which is equivalent to:

In[19]: df.drop(df.index[[0]])
Out[19]: 
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801
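
drop also accepts a list of labels, and its errors parameter controls what happens when a label is missing. A minimal sketch on a made-up frame (the label "missing" is deliberately absent):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=["one", "two", "three"])

# Several labels can be dropped in one call.
by_labels = df.drop(["one", "three"])

# By default a label that is not in the index raises KeyError;
# errors="ignore" skips it instead.
safe = df.drop(["one", "missing"], errors="ignore")

print(by_labels)
print(safe)
```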

#4


26  

If the DataFrame is huge and the number of rows to drop is large as well, then a simple drop by index, df.drop(df.index[]), takes too much time.

In my case, I have a multi-indexed DataFrame of floats with 100M rows x 3 cols, and I need to remove 10k rows from it. The fastest method I found is, quite counterintuitively, to take the remaining rows.

Let indexes_to_drop be an array of positional indexes to drop ([1, 2, 4] in the question).

# Positions to keep, sorted so the original row order is preserved.
indexes_to_keep = sorted(set(range(df.shape[0])) - set(indexes_to_drop))
df_sliced = df.take(indexes_to_keep)

In my case this took 20.5s, while the simple df.drop took 5min 27s and consumed a lot of memory. The resulting DataFrame is the same.
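
Another way to "keep the rest" is a boolean mask over row positions. This is only a sketch on a small made-up frame; I have not benchmarked it against the timings above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(10.0)})
indexes_to_drop = [1, 2, 4]  # positional indexes, as in the question

# Start with "keep everything", then switch off the positions to drop.
keep = np.ones(len(df), dtype=bool)
keep[indexes_to_drop] = False

# Boolean indexing keeps the rows where the mask is True, in original order.
df_sliced = df[keep]
print(df_sliced)
```

Because the mask works on positions, it does not depend on the index labels being unique.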

#5


4  

If I want to drop a row whose index label is, say, x, I would do the following:

df = df[df.index != x]

If I wanted to drop multiple rows (say their positions are in the list unwanted_indices), I would do:

desired_indices = [i for i in range(len(df.index)) if i not in unwanted_indices]
desired_df = df.iloc[desired_indices]
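
If the unwanted rows are given as index labels rather than positions, the single-row comparison above generalizes with Index.isin. A sketch on a made-up frame (unwanted_labels is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])
unwanted_labels = ["b", "d"]

# isin() gives a boolean array marking the unwanted labels; ~ inverts it.
desired_df = df[~df.index.isin(unwanted_labels)]
print(desired_df)
```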

#6


4  

I solved this in a simpler way - just in 2 steps.

Step 1: First form a dataframe with unwanted rows/data.

Step 2: Use the index of this unwanted dataframe to drop the rows from the original dataframe.

Example:

Suppose you have a dataframe df which has many columns, including 'Age', which is an integer. Now let's say you want to drop all the rows where 'Age' is negative.

Step 1: df_age_negative = df[ df['Age'] < 0 ]

Step 2: df = df.drop(df_age_negative.index, axis=0)

Hope this is much simpler and helps you.
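
For comparison, the two steps can also be collapsed into a single boolean filter. A sketch on a made-up frame (the column 'Name' and all values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid", "Dee"], "Age": [34, -1, 27, -5]}
)

negative = df["Age"] < 0

# Two-step version from the answer: build the unwanted frame, drop its index.
two_step = df.drop(df[negative].index, axis=0)

# One-step version: keep the rows where the condition is False.
df_clean = df[~negative]

print(df_clean)
```

Both give the same result here. They can differ when the index has duplicate labels, because drop removes every row that shares a dropped label, while the mask only removes the rows that matched the condition.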

#7


2  

In a comment to @theodros-zelleke's answer, @j-jones asked about what to do if the index is not unique. I had to deal with such a situation. What I did was to rename the duplicates in the index before I called drop(), a la:

dropped_indexes = <determine-indexes-to-drop>
df.index = rename_duplicates(df.index)
df.drop(df.index[dropped_indexes], inplace=True)

where rename_duplicates() is a function I defined that went through the elements of index and renamed the duplicates. I used the same renaming pattern as pd.read_csv() uses on columns, i.e., "%s.%d" % (name, count), where name is the name of the row and count is how many times it has occurred previously.
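
The answer does not show rename_duplicates() itself, so the body below is only my guess at what such a helper could look like, following the "%s.%d" pattern described above. The sample frame is made up.

```python
from collections import defaultdict

import pandas as pd


def rename_duplicates(index):
    """Hypothetical helper: the first occurrence keeps its name,
    later ones become "name.1", "name.2", ..."""
    seen = defaultdict(int)  # how many times each label has occurred so far
    new_labels = []
    for name in index:
        count = seen[name]
        new_labels.append(name if count == 0 else "%s.%d" % (name, count))
        seen[name] += 1
    return pd.Index(new_labels)


df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["a", "b", "a", "c"])
dropped_indexes = [0]  # positions to drop

df.index = rename_duplicates(df.index)
result = df.drop(df.index[dropped_indexes])
print(result)
```

Without the rename, dropping position 0 would remove both rows labelled "a", since drop works on labels. Note that this sketch does not guard against a renamed label such as "a.1" colliding with a label that already exists.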
