I am working with a large CSV file that I read with Pandas. One of the columns (not the Index) is timestamp data that look like this:
sent>23:56:51.748912
There is the prefix sent> followed by the hour, minute, seconds, and microseconds. I want to modify all of these timestamp entries so that the times are shifted backwards by 11 hours. So the example above would look like this:
sent>12:56:51.748912
I'm expecting/hoping that there is something smart enough with modulo arithmetic so that shifting an entry of sent>09:02:13.245511 backwards by 11 hours will produce sent>22:02:13.245511.
I am having some difficulty because both the NumPy datetime64 and Pandas TimeSeries want the full year, month, and day, but I don't have any of that. The documentation and examples I have seen so far have been rather terse. I've tried storing the data in several different structures (summarized below), but nothing has worked so far.
(Still learning numpy/pandas... please go easy on me...) Here is what I have tried:
import pandas as pd
import numpy as np
import datetime
df = pd.read_csv(filename, header=None, delimiter=' ', skiprows=2,
                 skipfooter=2, names=colnames, index_col=False, engine='python')
senttime_col = np.array(df['sent_time'], dtype='str')
senttime_col = np.char.lstrip(senttime_col, 'sent>')
# this creates np array of strings with elements like: 23:56:51.748585
senttimes_ts = pd.to_datetime(df['sent_time'])
# this creates TimeSeries with elements like: sent>23:56:51.748585
senttimes_ts.tshift(pd.Timedelta('-11 hours'))
# ValueError: Freq was not given and was not set in the index
senttimes_df = pd.DataFrame(senttime_col, index=None)
senttimes_df.shift(periods=-11, freq=pd.Timedelta(hours=1))
# TypeError: unsupported operand type(s) for +: 'numpy.int64' and 'Timedelta'
senttimes = np.datetime64(senttime_col)
# ValueError: Could not convert object to NumPy datetime
senttimes = np.datetime64(senttime_col, 'h:m:s.us')
# TypeError: Invalid datetime unit "h:m:s.us" in metadata
senttimes = np.array(senttime_col, dtype='datetime64[us]')
# ValueError: Error parsing datetime string "00:16:51.748269" at position 2
timelist = [datetime.datetime.strptime(x, '%H:%M:%S.%f') for x in senttime_col]
# ValueError: time data 'None' does not match format '%H:%M:%S.%f'
1 solution
#1
Assuming s is your column as a Series:
s = pd.Series(['sent>12:56:51.748912'] * 10000)
# this removes the 'sent>' string from the beginning
s = s.str[5:]
I'll use this function to look up dates that I've already parsed:
def lookup2(s):
    '''Parse each distinct string once, then map the results onto the Series.'''
    # The 11-hour shift is applied in a later step, so this only parses.
    dates = {date: pd.to_datetime(date) for date in s.unique()}
    return s.map(dates)
Then we save the result back into s. Note: I didn't run into the problem you describe ("I am having some difficulty because both the NumPy datetime64 and Pandas TimeSeries want the full year, month, and day, but I don't have any of that.").
s = lookup2(s)
In [156]: s.head()
Out[156]:
0 2015-05-10 12:56:51.748912
1 2015-05-10 12:56:51.748912
2 2015-05-10 12:56:51.748912
3 2015-05-10 12:56:51.748912
4 2015-05-10 12:56:51.748912
dtype: datetime64[ns]
Moving the time back by 11 hours:
In [154]: t = (s - pd.Timedelta('11 hours')).dt.time
In [155]: t.head()
Out[155]:
0 01:56:51.748912
1 01:56:51.748912
2 01:56:51.748912
3 01:56:51.748912
4 01:56:51.748912
dtype: object
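If you would rather avoid dates altogether, one possible alternative (not what the code above does) is to treat each entry as a duration since midnight and wrap it with modulo arithmetic, as the question suggests. This is a minimal sketch under that assumption. The sample values come from the question, the names offsets, shifted, and result are illustrative, and the 2000-01-01 date is an arbitrary placeholder used only so that strftime can format the time.

```python
import pandas as pd

# Sample data in the same shape as the question's column.
s = pd.Series(['sent>23:56:51.748912', 'sent>09:02:13.245511'])

# Drop the literal 'sent>' prefix and parse the rest as a duration since midnight.
offsets = pd.to_timedelta(s.str[len('sent>'):])

# Shift back 11 hours, then wrap into the range [0, 24h) with modulo arithmetic.
shifted = (offsets - pd.Timedelta(hours=11)) % pd.Timedelta(days=1)

# Rebuild the original text format; the placeholder date is discarded by strftime.
result = 'sent>' + (pd.Timestamp('2000-01-01') + shifted).dt.strftime('%H:%M:%S.%f')
print(result.tolist())
```

The result can be assigned back to the DataFrame column, for example df['sent_time'] = result, if the strings should keep their original format.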
Please let me know if this works for you.