I have a pandas DataFrame with the following columns:
Date Time
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
How do I combine data['Date'] and data['Time'] to get the following? Is there a way of doing it using pd.to_datetime?
Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
6 answers
#1
83
It's worth mentioning that you may have been able to read this in directly, e.g. if you were using read_csv, by passing parse_dates=[['Date', 'Time']].
Assuming these are just strings, you could simply add them together (with a space), allowing you to apply to_datetime:
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
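The output above reads 01-06-2013 as 6 January (month first). If the dates in the question are actually day-month-year, which is only an assumption based on how the sample rows progress, passing an explicit format removes the ambiguity. A minimal sketch on a shortened copy of the data:

```python
import pandas as pd

# Shortened copy of the question's data, kept as plain strings.
df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013", "03-06-2013"],
    "Time": ["23:00:00", "01:00:00", "21:00:00"],
})

# Assumption: the dates are day-month-year. Spelling the format out
# means 01-06-2013 is read as 1 June rather than 6 January.
combined = pd.to_datetime(
    df["Date"] + " " + df["Time"],
    format="%d-%m-%Y %H:%M:%S",
)
print(combined)
```

If the data really is month first, the same call with format="%m-%d-%Y %H:%M:%S" makes that explicit instead.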
Note: surprisingly (for me), this works fine, with NaNs being converted to NaT, but it is worth checking that the conversion did what you expect (perhaps using the errors='raise' argument).
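To make the note about missing and bad values concrete, here is a small sketch using the errors parameter of pd.to_datetime. The sample rows, including the deliberately broken "not a date" entry, are invented for illustration, and the day-month-year format is an assumption:

```python
import numpy as np
import pandas as pd

# Made-up rows: one good, one missing date, one unparseable date.
df = pd.DataFrame({
    "Date": ["01-06-2013", np.nan, "not a date"],
    "Time": ["23:00:00", "01:00:00", "21:00:00"],
})

fmt = "%d-%m-%Y %H:%M:%S"            # assumed day-month-year layout
joined = df["Date"] + " " + df["Time"]  # a missing Date stays missing

# errors="coerce" turns anything unparseable into NaT instead of failing.
coerced = pd.to_datetime(joined, format=fmt, errors="coerce")

# errors="raise" (the default) stops at the first unparseable value.
try:
    pd.to_datetime(joined, format=fmt, errors="raise")
    raised = False
except ValueError:
    raised = True

print(coerced)
print("raised:", raised)
```

With "coerce" it is worth counting the NaT values afterwards, since bad rows are otherwise silently dropped from later calculations.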
#2
21
The accepted answer works for columns that are of datatype string. For completeness: I came across this question when searching for how to do this when the columns are of datatypes date and time.
from datetime import datetime  # the pd.datetime alias was removed in pandas 2.0
df.apply(lambda r: datetime.combine(r['date_column_name'], r['time_column_name']), axis=1)
#3
6
I don't have enough reputation to comment on jka.ne's answer, so:
I had to amend jka.ne's line for it to work:
df.apply(lambda r : pd.datetime.combine(r['date_column_name'],r['time_column_name']).time(),1)
This might help others.
Also, I have tested a different approach, using replace instead of combine:
def combine_date_time(df, datecol, timecol):
    return df.apply(lambda row: row[datecol].replace(
                                hour=row[timecol].hour,
                                minute=row[timecol].minute),
                    axis=1)
which in the OP's case would be:
combine_date_time(df, 'Date', 'Time')
I have timed both approaches on a relatively large dataset (>500.000 rows). They have similar runtimes, but combine is faster (59s for replace vs 50s for combine).
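Both timed approaches call a Python function once per row. A column-wise version avoids that and may be quicker, although it has not been measured against the figures above. This is only a sketch, assuming the columns hold datetime.date and datetime.time objects; the two sample rows are made up:

```python
from datetime import date, time

import pandas as pd

# Made-up sample with Python date and time objects in each column.
df = pd.DataFrame({
    "Date": [date(2013, 6, 1), date(2013, 6, 2)],
    "Time": [time(23, 0), time(1, 0)],
})

# Convert each column as a whole, then add them:
# dates become midnight timestamps, times become offsets from midnight.
dates = pd.to_datetime(df["Date"])
offsets = pd.to_timedelta(df["Time"].astype(str))
combined = dates + offsets
print(combined)
```

Unlike the replace version, which copies only the hour and minute, this keeps any seconds present in the time column.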
#4
4
You can use this to merge the date and time into a single column of the dataframe.
import pandas as pd
data_file = 'data.csv' #path of your file
Reading .csv file with merged columns Date_Time:
data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']])
You can use this line to keep both of the other columns as well.
data.set_index(['Date', 'Time'], drop=False)
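How the nested-list form of parse_dates behaves depends on the pandas version (as far as I know it is deprecated in recent releases), so an alternative is to read both columns as strings and build the merged column afterwards. This sketch does that and keeps the original columns; the inline CSV text, the Value column, and the day-month-year format are assumptions for illustration:

```python
import io

import pandas as pd

# Stand-in for data.csv; the Value column is invented for the example.
csv_text = (
    "Date,Time,Value\n"
    "01-06-2013,23:00:00,1\n"
    "02-06-2013,01:00:00,2\n"
)

# Read Date and Time as plain strings so nothing is parsed implicitly.
data = pd.read_csv(io.StringIO(csv_text), dtype={"Date": str, "Time": str})

# Build the merged column explicitly; Date and Time remain in the frame.
data["Date_Time"] = pd.to_datetime(
    data["Date"] + " " + data["Time"],
    format="%d-%m-%Y %H:%M:%S",  # assumed day-month-year
)
print(data)
```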
#5
1
If the column types differ (datetime, timestamp or str), you can cast both columns to str and then use to_datetime:
df.loc[:,'Date'] = pd.to_datetime(df.Date.astype(str)+' '+df.Time.astype(str))
Result:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
Best,
#6
0
The answer really depends on what your column types are. In my case, I had datetime and timedelta.
> df[['Date','Time']].dtypes
Date datetime64[ns]
Time timedelta64[ns]
If this is your case, then you just need to add the columns:
> df['Date'] + df['Time']
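A self-contained version of that addition, on two made-up rows, might look like this:

```python
import pandas as pd

# Made-up sample: Date is a datetime column, Time is a timedelta column.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "Time": pd.to_timedelta(["23:00:00", "01:00:00"]),
})

print(df[["Date", "Time"]].dtypes)

# datetime + timedelta adds element-wise and gives a datetime column.
combined = df["Date"] + df["Time"]
print(combined)
```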