Group by time interval

Time: 2022-07-06 02:52:32

I have a pandas DataFrame that imports with 2 columns (Time, Heart Rate).

The time comes in the format MM:SS.s (minutes:seconds, with a fractional part). I am trying to convert this time into a float of seconds (e.g. 0.6 s or 65.3 s), so that I can later collapse the data into 10 s windows. For example:

import pandas as pd

hr_raw = pd.read_csv('hr_data.csv')
hr_raw.dropna(inplace=True)
print(hr_raw.head())

      Time  HR bpm
0  00:00.6    97.0
1  00:01.0    92.0
2  00:01.3    80.0
3  00:01.6    81.0
4  00:02.0    80.0

Previously (when importing using the standard CSV module) I just split this string, converted to a float and did the math to convert it to seconds:

with open('hr_data.csv', 'rU') as infile:
    hr_data = list(csv.DictReader(infile, delimiter=','))
    for row in hr_data:
        temp = row['Time']
        time.append(float(temp[3:7]) + (float(temp[0:2]) * 60))

Now that I am using a pandas df the code isn't working. I've tried to modify so that I am accessing the 'Time' column (see below), but not having much luck.

import pandas as pd

win_size = 10  # user defined window in seconds

hr_raw = pd.read_csv('hr_data.csv')
hr_raw.dropna(inplace=True)  # remove NaN artifact from import

#### problem code ####
for row in hr_raw.Time:
    hr_raw.Time[row] = float(hr_raw.Time[row][3:]) + float((hr_raw.Time[row][0:2] * 60))

# set time as index
hr_raw.set_index('Time', inplace=True)

# bin data based on user defined window
hr_bin = hr_raw.groupby((hr_raw.index // win_size + 1) * win_size).mean()

The error that comes up is:

Traceback (most recent call last):
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:14010)
TypeError: an integer is required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\mitbl001\Dropbox\CPET_python\import_hr_csv.py", line 11, in <module>
    hr_raw.Time[row] = float(hr_raw.Time[row][3:]) + float((hr_raw.Time[row][0:2] * 60))
  File "C:\Users\mitbl001\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\mitbl001\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4404)
  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4087)
  File "pandas\_libs\index.pyx", line 156, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5210)
KeyError: '00:00.6'

2 solutions

#1

Convert your time column to float using pd.to_timedelta:

df['Time'] = pd.to_timedelta('00:' + df.Time).dt.total_seconds()
df

   Time  HR bpm
0   0.6    97.0
1   1.0    92.0
2   1.3    80.0
3   1.6    81.0
4   2.0    80.0

A groupby should be simple now using the syntax:

df.groupby(df.Time // x * x)

Where x is your desired window of time. Here's an example of grouping in intervals of 0.5 seconds and taking the mean of the heart rate:

df.groupby(df.Time // 0.5 * 0.5)['HR bpm'].mean()

Time
0.5    97.0
1.0    86.0
1.5    81.0
2.0    80.0
Name: HR bpm, dtype: float64

The above outputs a series. If you want to obtain a dataframe, you can call reset_index after the groupby.

df.groupby(df.Time // 0.5 * 0.5)['HR bpm'].mean().reset_index()

   Time  HR bpm
0   0.5    97.0
1   1.0    86.0
2   1.5    81.0
3   2.0    80.0

In your case, you'd do something along the lines of df.groupby(df.Time // 10 * 10).
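To tie the conversion and the 10-second grouping together, here is a minimal end-to-end sketch. The inline CSV text is a made-up stand-in for hr_data.csv (the values are hypothetical, chosen so that several windows appear), and the names csv_text, window_start and hr_bin are only illustrative.

```python
import io

import pandas as pd

# Hypothetical sample standing in for hr_data.csv (values invented for illustration).
csv_text = """Time,HR bpm
00:00.6,97.0
00:09.5,92.0
00:10.0,80.0
00:15.3,81.0
01:05.3,80.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Prefix an hours field so the strings read as HH:MM:SS.s, then convert to float seconds.
df["Time"] = pd.to_timedelta("00:" + df["Time"]).dt.total_seconds()

win_size = 10  # window length in seconds

# Label every row with the start of the window it falls in: 0.0, 10.0, 20.0, ...
window_start = df["Time"] // win_size * win_size

# Mean heart rate per window.
hr_bin = df.groupby(window_start)["HR bpm"].mean()
print(hr_bin)
```

Windows that contain no samples (here, 20 s through 50 s) simply do not appear in the result, since groupby only creates groups for labels that occur in the data.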

#2

I think you need to slice the strings with the str accessor and cast the pieces to float with astype:

hr_raw.Time = hr_raw.Time.str[3:].astype(float) + hr_raw.Time.str[0:2].astype(float) * 60
print (hr_raw)

   Time  HR bpm
0   0.6    97.0
1   1.0    92.0
2   1.3    80.0
3   1.6    81.0
4   2.0    80.0
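The fixed slices above rely on the minutes field always being exactly two characters wide. If that might not hold (an assumption, since the question only shows two-digit minutes), a variant that splits on the colon instead could look like the sketch below. The sample values and the names times, parts and seconds are hypothetical.

```python
import pandas as pd

# Hypothetical values, including one with a three-digit minutes field.
times = pd.Series(["00:00.6", "01:05.3", "100:02.0"])

# Split "MM:SS.s" at the colon into two columns (0 = minutes, 1 = seconds) and cast to float.
parts = times.str.split(":", expand=True).astype(float)

# Total seconds = minutes * 60 + seconds.
seconds = parts[0] * 60 + parts[1]
print(seconds.tolist())
```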

Another solution is to convert with to_timedelta, after first prepending the hours part ('00:') with radd:

hr_raw.Time = pd.to_timedelta(hr_raw.Time.radd('00:')).dt.total_seconds()
print (hr_raw)

   Time  HR bpm
0   0.6    97.0
1   1.0    92.0
2   1.3    80.0
3   1.6    81.0
4   2.0    80.0

After that, set_index is not necessary; group on the Time column directly:

# bin data based on user defined window
hr_bin = hr_raw.groupby((hr_raw.Time // win_size + 1) * win_size).mean()
print (hr_bin)

      Time  HR bpm
Time
10.0   1.3    86.0
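The two answers label the windows differently: `Time // x * x` names each group by the start of its window, while `(Time // x + 1) * x` names it by the end. The rows that land in each group are the same either way. A small sketch with hypothetical data (the frame hr and the names start_label, end_label, by_start and by_end are illustrative only):

```python
import pandas as pd

# Hypothetical data already converted to float seconds.
hr = pd.DataFrame({
    "Time": [0.6, 9.5, 10.0, 15.3],
    "HR bpm": [97.0, 92.0, 80.0, 81.0],
})
win_size = 10  # window length in seconds

# Label by window start (0, 10, ...) versus window end (10, 20, ...).
start_label = hr["Time"] // win_size * win_size
end_label = (hr["Time"] // win_size + 1) * win_size

by_start = hr.groupby(start_label)["HR bpm"].mean()
by_end = hr.groupby(end_label)["HR bpm"].mean()

print(by_start)
print(by_end)
```

Both results hold the same means; only the index values differ by one window length, so the choice comes down to which label is more convenient downstream.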
