How do I change the year value in a numpy datetime64?

Time: 2022-02-27 15:52:16

I have a pandas DataFrame with dtype=numpy.datetime64. In the data I want to change

or other year. The Timedelta is not known; only the year number to assign is known. This displays the year as an int:

but I can't assign a value to it. Does anyone know how to assign a year to a numpy.datetime64?


3 answers



Consider the following approach:


In [115]: df
Out[115]:
        Date
0 2000-01-01
1 2001-02-02
2 2002-03-03
3 2003-04-04
4 2004-05-05

In [116]: df.loc[:, 'Date'] = df['Date'].apply(lambda x: x.replace(year=1999))

In [117]: df
Out[117]:
        Date
0 1999-01-01
1 1999-02-02
2 1999-03-03
3 1999-04-04
4 1999-05-05
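A self-contained version of the session above, as a sketch that assumes the column is named `Date` (taken from the `apply` call). `Series.apply` hands each element to the lambda as a pandas `Timestamp`, which supports `replace(year=...)`:

```python
import pandas as pd

# Rebuild the example frame; the column name 'Date' comes from the apply call above
df = pd.DataFrame({'Date': pd.to_datetime(['2000-01-01', '2001-02-02', '2002-03-03',
                                           '2003-04-04', '2004-05-05'])})

# Each element passed to the lambda is a pandas Timestamp, so .replace() works per row
df['Date'] = df['Date'].apply(lambda ts: ts.replace(year=1999))
print(df)
```

Like `datetime.replace`, this raises `ValueError` for a 29 February when the target year is not a leap year.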



numpy.datetime64 objects are hard to work with. To update a value, it is normally easier to convert the date to a standard Python datetime object, make the change, and then convert it back to numpy.datetime64:


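import numpy as np
from datetime import datetime, timezone

dt64 = np.datetime64('2011-11-14T00:00:00.000000000')

# convert to a POSIX timestamp (seconds since the epoch); no trailing 'Z',
# since NumPy deprecated timezone-aware datetime strings
ts = (dt64 - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')

# naive UTC datetime from the timestamp
# (datetime.utcfromtimestamp is deprecated since Python 3.12)
dt = datetime.fromtimestamp(ts, tz=timezone.utc).replace(tzinfo=None)

# update year (2010 is only an example target year)
dt = dt.replace(year=2010)

# convert back to numpy.datetime64:
dt64 = np.datetime64(dt)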

There might be simpler ways, but this works, at least.
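One possibly simpler round trip, assuming microsecond resolution is enough: after casting to `datetime64[us]`, `.item()` returns a `datetime.datetime` (for `datetime64[ns]` it returns a plain integer instead). The target year 2010 is again only an example.

```python
import numpy as np

dt64 = np.datetime64('2011-11-14T00:00:00.000000000')

# datetime64[us] -> datetime.datetime via .item(); with [ns] you would get an int
dt = dt64.astype('datetime64[us]').item()

# change the year on the Python object, then convert back (result is datetime64[us])
new_dt64 = np.datetime64(dt.replace(year=2010))
print(new_dt64)
```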




This vectorised solution gives the same result as iterating with pandas and calling x.replace(year=n) on each element, but on large arrays it is at least 10x faster.


Remember that the year you replace with must be a leap year; with a non-leap target, dates after February come out one day late (2012-03-01 becomes 2011-03-02). With the Python datetime library, datetime(2012,2,29).replace(year=2011) raises a ValueError because 2011 has no 29 February. The function replace_year, by contrast, simply moves 2012-02-29 to 2011-03-01.
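For comparison, a scalar sketch of the same leap-day fallback in plain Python. `replace_year_or_march1` is a hypothetical helper, not part of the answer: it calls `datetime.replace` and, when that raises `ValueError` for 29 February, returns 1 March of the target year instead, which matches what the answer says `replace_year` does for 2012-02-29.

```python
from datetime import datetime

def replace_year_or_march1(d, year):
    """Hypothetical helper: like d.replace(year=year), but maps Feb 29 to Mar 1
    when the target year has no leap day."""
    try:
        return d.replace(year=year)
    except ValueError:
        # Only Feb 29 can fail here; keep the time of day and roll to Mar 1
        return d.replace(year=year, month=3, day=1)

print(replace_year_or_march1(datetime(2012, 2, 29), 2011))
print(replace_year_or_march1(datetime(2012, 2, 29), 1992))
```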


I'm using numpy v 1.13.1.


import numpy as np
import pandas as pd

def replace_year(x, year):
    """Replace the year of a datetime64 array; `year` must be a leap year."""
    # Shift x's offset from Jan 1 of its own year onto Jan 1 of the target year
    x_year = np.datetime64(f'{year}-01-01') + (x - x.astype('M8[Y]'))

    # Jan 1 + 59 days is Mar 1 in a non-leap source year (offset 1)
    # but Feb 29 in a leap source year (offset 0).
    # Plain int replaces np.int, which was removed in NumPy 1.24.
    yr_mn = x.astype('M8[Y]') + np.timedelta64(59, 'D')
    leap_day_offset = (yr_mn.astype('M8[M]') - yr_mn.astype('M8[Y]') - 1).astype(int)

    # However, days before Mar 1 in a non-leap year need no correction,
    # so remove the extra day again for them
    non_leap_yr_beforeMarch1 = (x.astype('M8[D]') - x.astype('M8[Y]')).astype(int) < 59
    non_leap_yr_beforeMarch1 = np.logical_and(non_leap_yr_beforeMarch1, leap_day_offset).astype(int)
    day_offset = (leap_day_offset - non_leap_yr_beforeMarch1).astype('m8[D]')

    # Finally, apply the day offset
    x_year = x_year + day_offset
    return x_year

x = np.arange('2012-01-01', '2014-01-01', dtype='datetime64[h]')
x_datetime = pd.to_datetime(x)

x_year = replace_year(x, 1992)
x_datetime = x_datetime.map(lambda d: d.replace(year=1992))

print(np.all(x_datetime.values == x_year))
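A different vectorised sketch, not the answer's method: the hypothetical helper `replace_year_any` rebuilds each value from its month offset, day-of-month offset and time of day instead of a day-of-year offset. Under that assumption the target year need not be a leap year, and 29 February rolls over to 1 March as with `replace_year`. It assumes day or finer resolution (for example `datetime64[D]` or `datetime64[h]`).

```python
import numpy as np

def replace_year_any(x, year):
    """Hypothetical helper: set the year of a datetime64 array via month/day offsets.

    Feb 29 rolls over to Mar 1 when the target year has no leap day.
    """
    x = np.asarray(x)
    months = x.astype('M8[M]') - x.astype('M8[Y]')  # whole months since January (m8[M])
    days = x.astype('M8[D]') - x.astype('M8[M]')    # days since the 1st of the month (m8[D])
    time_of_day = x - x.astype('M8[D]')             # remainder below one day
    month_start = np.datetime64(f'{year}-01', 'M') + months
    return month_start.astype('M8[D]') + days + time_of_day

x = np.arange('2012-01-01', '2014-01-01', dtype='datetime64[h]')
print(replace_year_any(x, 1992)[:3])
print(replace_year_any(np.array(['2012-02-29', '2012-03-01'], dtype='M8[D]'), 2011))
```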


