Combine Date and Time columns using Python pandas

Time: 2021-11-26 04:27:30

I have a pandas DataFrame with the following columns:

Date              Time
01-06-2013      23:00:00
02-06-2013      01:00:00
02-06-2013      21:00:00
02-06-2013      22:00:00
02-06-2013      23:00:00
03-06-2013      01:00:00
03-06-2013      21:00:00
03-06-2013      22:00:00
03-06-2013      23:00:00
04-06-2013      01:00:00

How do I combine data['Date'] & data['Time'] to get the following? Is there a way of doing it using pd.to_datetime?

Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00

6 solutions

#1

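It's worth mentioning that you may have been able to read this in directly, e.g. if you were using read_csv with parse_dates=[['Date', 'Time']] (although combining columns through a nested parse_dates list is deprecated in recent pandas versions).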

Assuming these are just strings, you could simply add them together (with a space), which allows you to apply to_datetime:

In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0    01-06-2013 23:00:00
1    02-06-2013 01:00:00
2    02-06-2013 21:00:00
3    02-06-2013 22:00:00
4    02-06-2013 23:00:00
5    03-06-2013 01:00:00
6    03-06-2013 21:00:00
7    03-06-2013 22:00:00
8    03-06-2013 23:00:00
9    04-06-2013 01:00:00
dtype: object

In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0   2013-01-06 23:00:00
1   2013-02-06 01:00:00
2   2013-02-06 21:00:00
3   2013-02-06 22:00:00
4   2013-02-06 23:00:00
5   2013-03-06 01:00:00
6   2013-03-06 21:00:00
7   2013-03-06 22:00:00
8   2013-03-06 23:00:00
9   2013-04-06 01:00:00
dtype: datetime64[ns]
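The dates in the question look day-first (the sequence 01-06, 02-06, 03-06 suggests 1, 2, 3 June 2013), yet the output above shows to_datetime reading them month-first (2013-01-06). Here is a minimal sketch that passes an explicit format, assuming the strings really are DD-MM-YYYY. Passing dayfirst=True is an alternative, but an explicit format is stricter.

```python
import pandas as pd

# Sample rows mirroring the question; both columns are plain strings.
df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013", "04-06-2013"],
    "Time": ["23:00:00", "01:00:00", "01:00:00"],
})

# An explicit format makes the day/month order unambiguous (day first here).
combined = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d-%m-%Y %H:%M:%S")
print(combined)
```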

Note: surprisingly (for me), this works fine with NaNs, which are converted to NaT. It is still worth checking the conversion, perhaps using errors='raise' (the default in current pandas).
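The following sketch illustrates the NaN behaviour described in the note, using made-up sample data. A missing value in either column propagates through the string concatenation as NaN, and to_datetime turns it into NaT. With errors='raise', a string that does not match the format raises a ValueError. With errors='coerce', it becomes NaT instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-06-2013", np.nan, "03-06-2013"],
    "Time": ["23:00:00", "01:00:00", "not a time"],
})

combined_str = df["Date"] + " " + df["Time"]  # row 1 becomes NaN
fmt = "%d-%m-%Y %H:%M:%S"

# NaN alone is not an error: it simply becomes NaT.
without_bad_row = pd.to_datetime(combined_str.iloc[:2], format=fmt)

# errors='raise' (the default) rejects the malformed third row.
try:
    pd.to_datetime(combined_str, format=fmt, errors="raise")
    raised = False
except ValueError:
    raised = True

# errors='coerce' maps unparseable strings to NaT as well.
coerced = pd.to_datetime(combined_str, format=fmt, errors="coerce")
print(without_bad_row)
print(raised)
print(coerced)
```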

#2

The accepted answer works for columns of datatype string. For completeness: I came across this question when searching for how to do this when the columns are of datatypes date and time.

import datetime
df.apply(lambda r: datetime.datetime.combine(r['date_column_name'], r['time_column_name']), axis=1)
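Here is a self-contained sketch of the line above. It assumes object columns holding datetime.date and datetime.time values, and it keeps the placeholder column names from the answer. It uses the standard-library datetime.datetime because pd.datetime has been removed from recent pandas versions.

```python
import datetime
import pandas as pd

# Sample object columns of datetime.date and datetime.time values.
df = pd.DataFrame({
    "date_column_name": [datetime.date(2013, 6, 1), datetime.date(2013, 6, 2)],
    "time_column_name": [datetime.time(23, 0), datetime.time(1, 0)],
})

# datetime.datetime.combine builds one datetime per row; axis=1 applies row-wise.
combined = df.apply(
    lambda r: datetime.datetime.combine(r["date_column_name"], r["time_column_name"]),
    axis=1,
)
print(combined)
```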

#3

I don't have enough reputation to comment on jka.ne's answer, so:

I had to amend jka.ne's line for it to work:

import datetime
df.apply(lambda r: datetime.datetime.combine(r['date_column_name'], r['time_column_name']).time(), axis=1)

This might help others.

Also, I have tested a different approach, using replace instead of combine:

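def combine_date_time(df, datecol, timecol):
    # datecol must hold datetime/Timestamp values (datetime.date.replace has no hour argument)
    return df.apply(lambda row: row[datecol].replace(
                                hour=row[timecol].hour,
                                minute=row[timecol].minute,
                                second=row[timecol].second),  # keep seconds too
                    axis=1)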

which in the OP's case would be:

combine_date_time(df, 'Date', 'Time')
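Here is a runnable sketch of that call, using made-up sample rows. It assumes Date holds pandas Timestamps, because Timestamp.replace accepts hour, minute and second but datetime.date.replace does not. It also assumes Time holds datetime.time objects.

```python
import datetime
import pandas as pd

def combine_date_time(df, datecol, timecol):
    # Copy hour/minute/second from the time column onto the datetime column, row by row.
    return df.apply(lambda row: row[datecol].replace(
                                hour=row[timecol].hour,
                                minute=row[timecol].minute,
                                second=row[timecol].second),
                    axis=1)

df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "Time": [datetime.time(23, 0, 0), datetime.time(1, 30, 15)],
})
result = combine_date_time(df, "Date", "Time")
print(result)
```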

I have timed both approaches on a relatively large dataset (>500,000 rows). Their runtimes are similar, but combine is faster (59 s for replace vs 50 s for combine).

#4

You can use this to merge date and time into a single column of the DataFrame.

import pandas as pd
data_file = 'data.csv'  # path of your file

Read the .csv file, merging Date and Time into a single Date_Time column:

data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']]) 
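Combining columns through a nested parse_dates list is deprecated in recent pandas versions. One alternative, not the answer's method, is to read the columns as strings and combine them afterwards with to_datetime. This sketch assumes the question's DD-MM-YYYY and HH:MM:SS layout. io.StringIO stands in for data.csv, and the Value column is made up.

```python
import io
import pandas as pd

# Stand-in for data.csv (sample rows only).
csv_text = """Date,Time,Value
01-06-2013,23:00:00,1.5
02-06-2013,01:00:00,2.0
"""

# Read Date and Time as plain strings.
data = pd.read_csv(io.StringIO(csv_text), dtype={"Date": str, "Time": str})

# Build the merged column; the original Date and Time columns are kept.
data["Date_Time"] = pd.to_datetime(data["Date"] + " " + data["Time"], format="%d-%m-%Y %H:%M:%S")
print(data)
```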

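To keep the original Date and Time columns as well, pass keep_date_col=True (a set_index(['Date', 'Time']) call would fail here, because the merged read has already dropped those columns):

data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']], keep_date_col=True)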

#5

If the column types differ (e.g. datetime, Timestamp or str), you can cast both columns to str and then use to_datetime:

df['Date'] = pd.to_datetime(df['Date'].astype(str) + ' ' + df['Time'].astype(str))

Result:

0   2013-01-06 23:00:00
1   2013-02-06 01:00:00
2   2013-02-06 21:00:00
3   2013-02-06 22:00:00
4   2013-02-06 23:00:00
5   2013-03-06 01:00:00
6   2013-03-06 21:00:00
7   2013-03-06 22:00:00
8   2013-03-06 23:00:00
9   2013-04-06 01:00:00
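Here is a sketch of a case where the cast matters. It assumes Date holds datetime.date objects and Time holds datetime.time objects, with made-up sample rows. str() of those values gives ISO strings such as 2013-06-01 and 23:00:00, which to_datetime parses without any day/month ambiguity.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    "Date": [datetime.date(2013, 6, 1), datetime.date(2013, 6, 2)],
    "Time": [datetime.time(23, 0), datetime.time(1, 0)],
})

# astype(str) turns date/time objects into 'YYYY-MM-DD' and 'HH:MM:SS' strings.
df["Date"] = pd.to_datetime(df["Date"].astype(str) + " " + df["Time"].astype(str))
print(df["Date"])
```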

Best,

#6


The answer really depends on what your column types are. In my case, I had datetime and timedelta.

> df[['Date','Time']].dtypes
Date     datetime64[ns]
Time    timedelta64[ns]

If this is your case, then you just need to add the columns:

> df['Date'] + df['Time']
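Here is a minimal sketch of how to reach that case from the question's string columns, using made-up sample rows. It assumes Date is DD-MM-YYYY and Time is HH:MM:SS, which pd.to_timedelta can parse. Once the dtypes match the ones above, adding the columns gives the combined timestamps.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013"],
    "Time": ["23:00:00", "01:00:00"],
})

# Convert to datetime64 (midnight of each day) and timedelta64 (offset since midnight).
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df["Time"] = pd.to_timedelta(df["Time"])

# Adding a timedelta column to a datetime column shifts each date by its time of day.
df["DateTime"] = df["Date"] + df["Time"]
print(df.dtypes)
print(df["DateTime"])
```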