I'm trying to set the entire column of a dataframe to a specific value.
In [1]: df
Out [1]:
issueid industry
0 001 xxx
1 002 xxx
2 003 xxx
3 004 xxx
4 005 xxx
From what I've seen, .loc is the best practice when replacing values in a dataframe (or is it?):
In [2]: df.loc[:,'industry'] = 'yyy'
However, I still received this much-talked-about warning message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
Any ideas? Working with Python 3.5.2 and pandas 0.18.1.
EDIT
If I do
In [3]: df['industry'] = 'yyy'
I get the same warning message.
4 Answers
#1
3
pandas can do unexpected things when a new DataFrame is derived from an existing one. You stated in a comment above that your dataframe is defined along the lines of df = df_all.loc[df_all['issueid']==specific_id,:]. In this case, pandas still treats df as a slice of df_all rather than as an independent DataFrame, so any later assignment to df triggers the warning, even when it is written with .loc.
To avoid these issues altogether, I often have to remind myself to use the copy module, which explicitly forces objects to be copied in memory so that changes made to the new object are not applied to the source object. I had the same problem as you, and avoided it using the deepcopy function.
In your case, this should get rid of the warning message:
from copy import deepcopy
df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
df['industry'] = 'yyy'
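A lighter-weight variant of the same idea, as a minimal sketch: a DataFrame has its own copy() method, which makes the subset independent of df_all without importing the copy module. The sample rows and the value of specific_id below are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's df_all; the values are invented.
df_all = pd.DataFrame({
    'issueid': ['001', '002', '003', '001'],
    'industry': ['xxx', 'xxx', 'xxx', 'xxx'],
})
specific_id = '001'  # assumed value, for illustration only

# .copy() makes df an independent DataFrame, so pandas no longer
# treats it as a slice of df_all.
df = df_all.loc[df_all['issueid'] == specific_id, :].copy()
df['industry'] = 'yyy'

print(df)
print(df_all)  # the source frame keeps its original 'xxx' values
```

Only the rows selected into df are changed; df_all is left as it was.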
#2
6
You can do:
df['industry'] = 'yyy'
#3
0
Change your .loc line to:
df['industry'] = 'yyy'
Example output
>>> df
issueid industry
0 1 xxx
1 2 xxx
2 3 xxx
3 4 xxx
4 5 xxx
>>> df['industry'] = 'yyy'
>>> df
issueid industry
0 1 yyy
1 2 yyy
2 3 yyy
3 4 yyy
4 5 yyy
#4
0
Assuming your DataFrame is built from something like the data list below, you have to consider whether the values are strings or integers. The two are treated differently, so you need to be specific about which one you want.
import pandas as pd
data = [('001','xxx'), ('002','xxx'), ('003','xxx'), ('004','xxx'), ('005','xxx')]
df = pd.DataFrame(data,columns=['issueid', 'industry'])
print("Old DataFrame")
print(df)
df.loc[:,'industry'] = str('yyy')
print("New DataFrame")
print(df)
Now, if you want to put numbers instead of letters, you can create a list with one value per row (a single scalar such as 1 also works, because pandas broadcasts it to every row):
list_of_ones = [1,1,1,1,1]
df.loc[:,'industry'] = list_of_ones
print(df)
Or, if you are using NumPy:
import numpy as np
n = len(df)
df.loc[:,'industry'] = np.ones(n)
print(df)
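As a small sketch of the point about numbers: a single scalar is broadcast to every row, so a hand-built list of the right length is optional. The three-row frame here is made up for illustration.

```python
import pandas as pd

# Invented sample data, shaped like the frame in the question.
df = pd.DataFrame({
    'issueid': ['001', '002', '003'],
    'industry': ['xxx', 'xxx', 'xxx'],
})

# Assigning a scalar replaces the whole column; every row gets the value.
df['industry'] = 1
print(df)
```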