Pandas: How to change all the values of a column?

I have a DataFrame with a column called "Date" and want every value in this column to be reduced to the same format (the year only). Example:

City     Date
Paris    01/04/2004
Lisbon   01/09/2004
Madrid   2004
Pekin    31/2004

What I want is:

City     Date
Paris    2004
Lisbon   2004
Madrid   2004
Pekin    2004

Here is my code:

import pandas as pd

fr61_70xls = pd.ExcelFile('AMADEUS FRANCE 1961-1970.xlsx')

#Here we import the individual sheets and clean the sheets    
years=(['1961','1962','1963','1964','1965','1966','1967','1968','1969','1970'])

fr={}

header=(['City','Country','NACE','Cons','Last_year','Op_Rev_EUR_Last_avail_yr','BvD_Indep_Indic','GUO_Name','Legal_status','Date_of_incorporation','Legal_status_date'])

for year in years:
    # save every sheet in variable fr['1961'], fr['1962'] and so on
    fr[year]=fr61_70xls.parse(year,header=0,parse_cols=10)
    fr[year].columns=header
    # drop the entire Legal status date column
    fr[year]=fr[year].drop(['Legal_status_date'],axis=1)
    # drop every row where GUO Name is empty
    fr[year]=fr[year].dropna(axis=0,how='all',subset=['GUO_Name'])
    # index by GUO name and date of incorporation (the column must still exist here)
    fr[year]=fr[year].set_index(['GUO_Name','Date_of_incorporation'])

In my DataFrames (for example fr['1961']), the values of Date_of_incorporation can be anything (strings, integers, and so on). Would it be best to drop this column entirely and then attach a new column containing only the year?

2 Answers

#1

As @DSM points out, you can do this more directly using the vectorised string methods:

df['Date'].str[-4:].astype(int)
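
Note that the slice above assumes every value is already a string. In the question's data some cells may be integers (such as 2004), and for those the `.str` accessor typically returns NaN, so the final `astype(int)` fails. A minimal sketch on the sample data from the question, assuming it is acceptable to cast everything to `str` first:

```python
import pandas as pd

# Sample data from the question; the Madrid value is deliberately an int.
df = pd.DataFrame({
    "City": ["Paris", "Lisbon", "Madrid", "Pekin"],
    "Date": ["01/04/2004", "01/09/2004", 2004, "31/2004"],
})

# Cast to str so the .str accessor works on every row,
# then keep the last four characters and convert them to integers.
df["Date"] = df["Date"].astype(str).str[-4:].astype(int)
print(df)
```

This only works if the year really is the last four characters of every value.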

Or using extract (assuming there is only one set of digits of length 4 somewhere in each string):

df['Date'].str.extract(r'(?P<year>\d{4})', expand=False).astype(int)
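
If some rows contain no four-digit run at all, `extract` gives NaN for them and `astype(int)` raises. One possible way around this, sketched here with made-up sample values, is to convert with `pd.to_numeric` and the nullable `Int64` dtype so that missing years stay missing:

```python
import pandas as pd

# Hypothetical values: the last one contains no four-digit year.
dates = pd.Series(["01/04/2004", "2004", "31/2004", "unknown"])

# expand=False returns a Series; rows without a match become NaN.
year = dates.str.extract(r"(\d{4})", expand=False)

# Convert to numbers, keeping unmatched rows as missing instead of raising.
year = pd.to_numeric(year, errors="coerce").astype("Int64")
print(year.tolist())
```

Whether a missing year should be kept, dropped, or filled in is a decision for your data; this sketch only avoids the exception.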

An alternative, slightly more flexible way is to use apply (or, equivalently, map):

df['Date'] = df['Date'].apply(lambda x: int(str(x)[-4:]))
             #  converts the last 4 characters of the string to an integer

The lambda function takes each value from the Date column and converts it to a year.
You could (and perhaps should) write this more verbosely as:

def convert_to_year(date_in_some_format):
    # cast first, so integers such as 2004 are handled too
    date_as_string = str(date_in_some_format)
    year_as_string = date_as_string[-4:] # last four characters
    return int(year_as_string)

df['Date'] = df['Date'].apply(convert_to_year)

Perhaps 'Year' is a better name for this column...
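
A rough sketch of that renaming, using the sample data from the question: the result is written to a new `Year` column and the original mixed-format column is dropped. The function here is a compact variant of the one above.

```python
import pandas as pd

def convert_to_year(value):
    """Return the last four characters of value as an int."""
    return int(str(value)[-4:])

# Sample data from the question; the Madrid value is an int on purpose.
df = pd.DataFrame({
    "City": ["Paris", "Lisbon", "Madrid", "Pekin"],
    "Date": ["01/04/2004", "01/09/2004", 2004, "31/2004"],
})

# Store the result under a clearer name, then drop the original column.
df["Year"] = df["Date"].apply(convert_to_year)
df = df.drop(columns=["Date"])
print(df)
```

This is close to what the question suggests: erase the messy column and attach one that holds only the year.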

#2

You can transform a column by using apply.

Define a clean function that removes dollar signs, commas, and spaces and converts each value to float.

def clean(x):
    x = x.replace("$", "").replace(",", "").replace(" ", "")
    return float(x)

Next, call it on your column like this.

data['Revenue'] = data['Revenue'].apply(clean)
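
To see the function in action, here is a small self-contained sketch. The `Revenue` values are invented for illustration. Note that `clean` expects strings, so it would fail on cells that are already numeric.

```python
import pandas as pd

def clean(x):
    # Strip the dollar sign, thousands separators, and spaces, then convert.
    x = x.replace("$", "").replace(",", "").replace(" ", "")
    return float(x)

# Hypothetical revenue strings in a few different formats.
data = pd.DataFrame({"Revenue": ["$1,200.50", "$ 300", "45"]})

data["Revenue"] = data["Revenue"].apply(clean)
print(data["Revenue"].tolist())
```

The same pattern applies to the Date column in the question: write a small function that normalises one value, then apply it to the whole column.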
