I have a DataFrame df:
>>> df
                  sales  discount  net_sales    cogs
STK_ID RPT_Date
600141 20060331   2.709       NaN      2.709   2.245
       20060630   6.590       NaN      6.590   5.291
       20060930  10.103       NaN     10.103   7.981
       20061231  15.915       NaN     15.915  12.686
       20070331   3.196       NaN      3.196   2.710
       20070630   7.907       NaN      7.907   6.459
Then I want to drop the rows whose positional numbers are given in a list, say [1, 2, 4], which should leave:
                  sales  discount  net_sales    cogs
STK_ID RPT_Date
600141 20060331   2.709       NaN      2.709   2.245
       20061231  15.915       NaN     15.915  12.686
       20070630   7.907       NaN      7.907   6.459
How can I do that, and which function does it?
7 answers
#1
Use DataFrame.drop and pass it the index labels of the rows to drop. Indexing df.index by position gives you those labels:
In [65]: df
Out[65]:
       one  two
one      1    4
two      2    3
three    3    2
four     4    1
In [66]: df.drop(df.index[[1,3]])
Out[66]:
       one  two
one      1    4
three    3    2
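Applied to the MultiIndexed frame from the question, the same idiom looks like the sketch below. The frame is rebuilt from the values shown in the question; positions_to_drop is an illustrative name. df.index[[1, 2, 4]] yields the (STK_ID, RPT_Date) tuples at those positions, and drop removes the rows with those labels.

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame (values copied from the question)
index = pd.MultiIndex.from_product(
    [[600141], [20060331, 20060630, 20060930, 20061231, 20070331, 20070630]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907],
        "discount": np.nan,  # scalar is broadcast to every row
        "net_sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907],
        "cogs": [2.245, 5.291, 7.981, 12.686, 2.710, 6.459],
    },
    index=index,
)

positions_to_drop = [1, 2, 4]
# Turn positions into index labels, then drop the rows with those labels
result = df.drop(df.index[positions_to_drop])
print(result)
```

Because drop works on labels, this relies on the index being unique; with duplicate labels, other rows sharing a dropped label would also disappear.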
#2
Note that drop returns a new DataFrame and leaves the original unchanged. If you want the drop to modify df itself, pass the inplace argument:
df.drop(df.index[[1,3]], inplace=True)
Because the code in your question does not assign the result to anything, you need inplace=True for df itself to change (reassigning with df = df.drop(...) works too). See http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.drop.html
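A small sketch contrasting the two forms, using the toy frame from answer #1: without inplace the original is untouched and a new frame comes back; with inplace=True the call mutates df and returns None.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4], "two": [4, 3, 2, 1]},
                  index=["one", "two", "three", "four"])

# Without inplace, drop returns a new DataFrame and df is untouched
trimmed = df.drop(df.index[[1, 3]])
print(len(df), len(trimmed))  # 4 2

# With inplace=True, df itself is modified and the call returns None
ret = df.drop(df.index[[1, 3]], inplace=True)
print(ret, list(df.index))  # None ['one', 'three']
```

A common mistake is writing df = df.drop(..., inplace=True), which rebinds df to None.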
#3
You can also pass DataFrame.drop the label itself (instead of an Index of labels):
In[17]: df
Out[17]:
            a         b         c         d         e
one  0.456558 -2.536432  0.216279 -1.305855 -0.121635
two -1.015127 -0.445133  1.867681  2.179392  0.518801
In[18]: df.drop('one')
Out[18]:
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801
Which is equivalent to:
In[19]: df.drop(df.index[[0]])
Out[19]:
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801
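The equivalence can be checked directly. This sketch uses random data in place of the numbers above and also shows the index= keyword form, which assumes pandas 0.21 or later.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((2, 5)),
                  index=["one", "two"], columns=list("abcde"))

by_label = df.drop("one")             # drop a single row by its label
by_position = df.drop(df.index[[0]])  # same row, looked up by position
by_keyword = df.drop(index=["one"])   # keyword form (pandas 0.21+)

# All three forms produce identical frames
pd.testing.assert_frame_equal(by_label, by_position)
pd.testing.assert_frame_equal(by_label, by_keyword)
print(by_label.index.tolist())  # ['two']
```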
#4
If the DataFrame is huge and the number of rows to drop is also large, then simply dropping by index with df.drop(df.index[...]) takes too much time.
In my case, I have a multi-indexed DataFrame of floats with 100M rows x 3 columns, and I need to remove 10k rows from it. The fastest method I found is, quite counterintuitively, to take the remaining rows.
Let indexes_to_drop be an array of positional indexes to drop ([1, 2, 4] in the question).
# sorted() restores the original row order; iterating a set has no guaranteed order
indexes_to_keep = sorted(set(range(df.shape[0])) - set(indexes_to_drop))
df_sliced = df.take(indexes_to_keep)
In my case this took 20.5 s, while the simple df.drop took 5 min 27 s and consumed a lot of memory. The resulting DataFrame is the same.
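A NumPy-based variant of the same take-the-rest idea, offered as a sketch: instead of building two Python sets of up to 100M integers, it marks the rows to drop in a boolean array and selects the rest with iloc. Its speed at the scale above has not been measured here. drop_positions is a hypothetical helper name, and it assumes the positions are valid non-negative integers (negative values would count from the end, as in NumPy indexing).

```python
import numpy as np
import pandas as pd

def drop_positions(df: pd.DataFrame, positions) -> pd.DataFrame:
    """Hypothetical helper: return df without the rows at the given positions."""
    keep = np.ones(len(df), dtype=bool)                 # start by keeping every row
    keep[np.asarray(positions, dtype=np.intp)] = False  # mark rows to drop
    return df.iloc[keep]                                # mask preserves row order

df = pd.DataFrame({"x": range(6)}, index=list("abcdef"))
print(drop_positions(df, [1, 2, 4]).index.tolist())  # ['a', 'd', 'f']
```

Because it never looks at labels, this also behaves correctly when the index contains duplicates.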
#5
To drop the row whose index label is, say, x, I would do the following:
df = df[df.index != x]
To drop multiple rows (say their positions are in the list unwanted_indices), I would do:
unwanted = set(unwanted_indices)  # set makes each membership check O(1)
desired_indices = [i for i in range(len(df.index)) if i not in unwanted]
desired_df = df.iloc[desired_indices]
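The label-based counterpart of df[df.index != x] for several labels uses Index.isin. A minimal sketch with made-up data; unwanted_labels is an illustrative name. Note that every row carrying an unwanted label is removed, duplicates included.

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=["a", "b", "c", "a"])
unwanted_labels = ["a", "c"]

# isin builds a boolean mask over the index; ~ inverts it to keep the rest
desired_df = df[~df.index.isin(unwanted_labels)]
print(desired_df)
```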
#6
I solved this in a simpler way, in just two steps.
Step 1: Build a DataFrame containing only the unwanted rows.
Step 2: Use the index of this unwanted DataFrame to drop those rows from the original DataFrame.
Example:
Suppose you have a DataFrame df with many columns, including an integer column 'Age', and you want to drop all rows where 'Age' is negative.
Step 1: df_age_negative = df[df['Age'] < 0]
Step 2: df = df.drop(df_age_negative.index, axis=0)
I hope this is simpler and helps.
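A runnable version of the two steps, with hypothetical sample data (only 'Age' matters). It also compares the result with a single boolean filter, which gives the same frame here; the two-step form assumes a unique index, since drop removes every row that shares a dropped label.

```python
import pandas as pd

# Hypothetical sample data; only the 'Age' column matters here
df = pd.DataFrame({"Name": ["Ann", "Bob", "Cid", "Dee"],
                   "Age": [34, -1, 27, -5]})

# Two-step approach from the answer
df_age_negative = df[df["Age"] < 0]
two_step = df.drop(df_age_negative.index, axis=0)

# Equivalent single boolean filter: keep rows where the condition is False
one_step = df[~(df["Age"] < 0)]

pd.testing.assert_frame_equal(two_step, one_step)
print(two_step["Name"].tolist())  # ['Ann', 'Cid']
```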
#7
In a comment to @theodros-zelleke's answer, @j-jones asked what to do if the index is not unique. I had to deal with such a situation, and what I did was rename the duplicates in the index before calling drop(), like this:
dropped_indexes = <determine-indexes-to-drop>
df.index = rename_duplicates(df.index)
df.drop(df.index[dropped_indexes], inplace=True)
where rename_duplicates() is a function I defined that went through the elements of the index and renamed the duplicates. I used the same renaming pattern that pd.read_csv() uses on columns, i.e., "%s.%d" % (name, count), where name is the label of the row and count is how many times it has occurred previously.
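The answer does not show rename_duplicates(), so the following is only a hypothetical reconstruction following the described "%s.%d" % (name, count) pattern. Unlike read_csv, it does not guard against a renamed label colliding with one that already exists (for example, an original label "a.1").

```python
import pandas as pd

def rename_duplicates(index: pd.Index) -> pd.Index:
    """Hypothetical helper: suffix repeated labels as 'name.count'.

    The first occurrence keeps its label; the k-th repeat becomes
    "%s.%d" % (name, k), mirroring read_csv's column renaming.
    """
    seen = {}
    new_labels = []
    for name in index:
        count = seen.get(name, 0)  # how many times this label appeared before
        new_labels.append(name if count == 0 else "%s.%d" % (name, count))
        seen[name] = count + 1
    return pd.Index(new_labels, name=index.name)

df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["a", "b", "a", "a"])
df.index = rename_duplicates(df.index)
print(df.index.tolist())  # ['a', 'b', 'a.1', 'a.2']

dropped_indexes = [2]  # positions to drop (illustrative)
df.drop(df.index[dropped_indexes], inplace=True)
print(df["v"].tolist())  # [1, 2, 4]
```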