I have data with a timestamp in UTC. I'd like to convert the timezone of this timestamp to 'US/Pacific' and add it as a hierarchical index to a pandas DataFrame. I've been able to convert the timestamp as an Index, but it loses its timezone when I try to add it back into the DataFrame, either as a column or as an index.
>>> import pandas as pd
>>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)})
>>> dat.dtypes
#datetime object
#label object
#value int64
#dtype: object
Now if I try to convert the Series directly I run into an error.
>>> times = pd.to_datetime(dat['datetime'])
>>> times.tz_localize('UTC')
#Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize
# raise Exception('Cannot tz-localize non-time series')
#Exception: Cannot tz-localize non-time series
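In current pandas releases, `Series.tz_localize` operates on the Series' index rather than its values, which is consistent with the error above (the index here is a plain integer index). A minimal sketch, assuming a recent pandas version, that localizes the values through the `.dt` accessor instead:

```python
import pandas as pd

# Parse the naive timestamp strings; the result is a datetime64 Series
times = pd.to_datetime(pd.Series(['2011-07-19 07:00:00',
                                  '2011-07-19 08:00:00',
                                  '2011-07-19 09:00:00']))

# .dt.tz_localize / .dt.tz_convert act on the values, not on the index
times_pacific = times.dt.tz_localize('UTC').dt.tz_convert('US/Pacific')
print(times_pacific)
```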
If I convert it to an Index, then I can manipulate it as a time series. Notice that the index now has the Pacific timezone.
>>> times_index = pd.Index(times)
>>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific')
>>> times_index_pacific
#<class 'pandas.tseries.index.DatetimeIndex'>
#[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00]
#Length: 6, Freq: None, Timezone: US/Pacific
However, I now run into problems adding the index back to the DataFrame, because it loses its timezone:
>>> dat_index = dat.set_index([dat['label'], times_index_pacific])
>>> dat_index
# datetime label value
#label
#a 2011-07-19 07:00:00 2011-07-19 07:00:00 a 0
# 2011-07-19 08:00:00 2011-07-19 08:00:00 a 1
# 2011-07-19 09:00:00 2011-07-19 09:00:00 a 2
#b 2011-07-19 07:00:00 2011-07-19 07:00:00 b 3
# 2011-07-19 08:00:00 2011-07-19 08:00:00 b 4
# 2011-07-19 09:00:00 2011-07-19 09:00:00 b 5
You'll notice the index is back on the UTC timezone instead of the converted Pacific timezone.
How can I change the timezone and add it as an index to a DataFrame?
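For reference, a minimal sketch of one way to do this in recent pandas versions (not the code from any of the answers below): convert the column to a timezone-aware dtype first, then build the hierarchical index from the columns. It rebuilds the question's `dat` frame.

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00',
                 '2011-07-19 09:00:00'] * 2,
    'value': range(6),
})

# Parse as UTC, then convert; the column becomes tz-aware (US/Pacific)
dat['datetime'] = pd.to_datetime(dat['datetime'], utc=True).dt.tz_convert('US/Pacific')

# Building the MultiIndex from a tz-aware column keeps the timezone on that level
dat_index = dat.set_index(['label', 'datetime'])
print(dat_index)
```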
4 answers
#1 (8 votes)
By now this has been fixed. For example, you can now call:
dataframe.tz_localize('UTC', level=0)
You'll have to call it twice for the given example, though. (I.e., once for each level.)
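A minimal sketch of the level-wise methods, assuming a recent pandas version: `DataFrame.tz_localize` and `DataFrame.tz_convert` both accept a `level` argument (an integer position or a level name) that targets one level of a MultiIndex. In this sketch only the datetime level is localized, since the `label` level holds strings.

```python
import pandas as pd

# A MultiIndex whose second level holds naive (UTC) timestamps
idx = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b'],
     pd.to_datetime(['2011-07-19 07:00:00', '2011-07-19 08:00:00',
                     '2011-07-19 07:00:00'])],
    names=['label', 'datetime'],
)
df = pd.DataFrame({'value': [0, 1, 3]}, index=idx)

# Localize, then convert, only the 'datetime' level of the row index
df = (df.tz_localize('UTC', level='datetime')
        .tz_convert('US/Pacific', level='datetime'))
print(df)
```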
#2 (20 votes)
If you set it as the index, it's automatically converted to a DatetimeIndex:
In [11]: dat.index = pd.to_datetime(dat.pop('datetime'), utc=True)
In [12]: dat
Out[12]:
label value
datetime
2011-07-19 07:00:00 a 0
2011-07-19 08:00:00 a 1
2011-07-19 09:00:00 a 2
2011-07-19 07:00:00 b 3
2011-07-19 08:00:00 b 4
2011-07-19 09:00:00 b 5
Then do the tz_localize:
In [12]: dat.index = dat.index.tz_localize('UTC').tz_convert('US/Pacific')
In [13]: dat
Out[13]:
label value
datetime
2011-07-19 00:00:00-07:00 a 0
2011-07-19 01:00:00-07:00 a 1
2011-07-19 02:00:00-07:00 a 2
2011-07-19 00:00:00-07:00 b 3
2011-07-19 01:00:00-07:00 b 4
2011-07-19 02:00:00-07:00 b 5
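A hedged note for newer pandas versions: there, `pd.to_datetime(..., utc=True)` already returns timezone-aware UTC values, and calling `tz_localize` on an index that is already tz-aware raises a `TypeError`. A minimal sketch of the same steps that therefore uses only `tz_convert`:

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00',
                 '2011-07-19 09:00:00'] * 2,
    'value': range(6),
})

# utc=True yields a tz-aware (UTC) DatetimeIndex once assigned as the index
dat.index = pd.to_datetime(dat.pop('datetime'), utc=True)

# The index is already UTC-aware, so convert it directly
dat.index = dat.index.tz_convert('US/Pacific')
print(dat)
```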
And then you can append the label column to the index. Hmmm, this is definitely a bug: the datetime level loses its timezone:
In [14]: dat.set_index('label', append=True).swaplevel(0, 1)
Out[14]:
value
label datetime
a 2011-07-19 07:00:00 0
2011-07-19 08:00:00 1
2011-07-19 09:00:00 2
b 2011-07-19 07:00:00 3
2011-07-19 08:00:00 4
2011-07-19 09:00:00 5
A hacky workaround is to convert the (datetime) level directly (when it's already a MultiIndex):
In [15]: dat1 = dat.set_index('label', append=True).swaplevel(0, 1)
In [16]: dat1.index.levels[1] = dat1.index.levels[1].tz_localize('UTC').tz_convert('US/Pacific')
In [17]: dat1
Out[17]:
value
label datetime
a 2011-07-19 00:00:00-07:00 0
2011-07-19 01:00:00-07:00 1
2011-07-19 02:00:00-07:00 2
b 2011-07-19 00:00:00-07:00 3
2011-07-19 01:00:00-07:00 4
2011-07-19 02:00:00-07:00 5
#3 (1 vote)
Another workaround, which works in pandas 0.13.1 and avoids the "FrozenList cannot be assigned" problem:
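import pandas
tz = 'US/Pacific'  # target timezone, as in the question
# index: an existing MultiIndex whose two levels both hold naive UTC timestamps
index.levels = pandas.core.base.FrozenList([
    index.levels[0].tz_localize('UTC').tz_convert(tz),
    index.levels[1].tz_localize('UTC').tz_convert(tz)
])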
I'm struggling a lot with this issue; MultiIndex loses tz in many other situations too.
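Where the levels cannot be replaced in place, another option in recent pandas versions is to rebuild the MultiIndex from converted per-level values with `pd.MultiIndex.from_arrays`. A minimal sketch, assuming (as in the snippet above) that both levels hold naive UTC timestamps and that `tz` is the target timezone; the sample index is hypothetical:

```python
import pandas as pd

tz = 'US/Pacific'  # target timezone (assumed for this sketch)

# A hypothetical MultiIndex whose two levels both hold naive UTC timestamps
index = pd.MultiIndex.from_arrays([
    pd.to_datetime(['2011-07-19 07:00:00', '2011-07-19 08:00:00']),
    pd.to_datetime(['2011-07-19 09:00:00', '2011-07-19 10:00:00']),
])

# Convert each level's full values, then build a fresh MultiIndex from them
index = pd.MultiIndex.from_arrays(
    [index.get_level_values(i).tz_localize('UTC').tz_convert(tz)
     for i in range(index.nlevels)],
    names=index.names,
)
print(index)
```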
#4 (0 votes)
The workaround does not seem to work, because the levels of a hierarchical index appear to be immutable (FrozenList is immutable).
Starting with a single index and appending to it also does not work.
Applying a lambda function that casts each member of the Series returned by to_datetime() to a Timestamp and converts it also does not work.
Is there a way to create a timezone-aware Series and then insert it into a DataFrame or make it an index?
joined_event_df = joined_event_df.set_index(['pandasTime'])
# the index has a single level here, so convert it directly
joined_event_df.index = joined_event_df.index.tz_localize('UTC').tz_convert('US/Central')
# we have tz-awareness above this line
joined_event_df = joined_event_df.set_index('sequence', append=True)
# we lose tz-awareness in the index as soon as we add another index
joined_event_df = joined_event_df.swaplevel(0, 1)
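A minimal sketch of the same sequence, run against recent pandas versions, where a timezone-aware level survives `set_index(..., append=True)` and `swaplevel`. The column names `pandasTime` and `sequence` come from the snippet above; the data values are hypothetical.

```python
import pandas as pd

# Hypothetical data using the column names from the snippet above
joined_event_df = pd.DataFrame({
    'pandasTime': pd.to_datetime(['2011-07-19 07:00:00', '2011-07-19 08:00:00']),
    'sequence': [1, 2],
    'value': [10, 20],
})

joined_event_df = joined_event_df.set_index('pandasTime')
# A single-level index: convert it directly (no level lookup needed)
joined_event_df.index = joined_event_df.index.tz_localize('UTC').tz_convert('US/Central')

# Append the second level and move it to the front
joined_event_df = joined_event_df.set_index('sequence', append=True).swaplevel(0, 1)
print(joined_event_df.index.levels[1])
```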