I have a pandas DataFrame that imports with 2 columns (Time, Heart Rate).
The time comes in the format MM:SS.s (minutes:seconds.milliseconds). I am trying to convert it to a float number of seconds (e.g. 0.6 s or 65.3 s) so that I can later collapse the data into 10 s windows. For example:
    import pandas as pd

    hr_raw = pd.read_csv('hr_data.csv')
    hr_raw.dropna(inplace=True)
    print(hr_raw.head())

          Time  HR bpm
    0  00:00.6    97.0
    1  00:01.0    92.0
    2  00:01.3    80.0
    3  00:01.6    81.0
    4  00:02.0    80.0
Previously (when importing using the standard CSV module) I just split this string, converted to a float and did the math to convert it to seconds:
    with open('hr_data.csv', 'rU') as infile:
        hr_data = list(csv.DictReader(infile, delimiter=','))

    for row in hr_data:
        temp = row['Time']
        time.append(float(temp[3:7]) + (float(temp[0:2]) * 60))
Now that I am using a pandas DataFrame, the code isn't working. I've tried to modify it so that I access the 'Time' column (see below), but I'm not having much luck.
    import pandas as pd

    win_size = 10  # user defined window in seconds

    hr_raw = pd.read_csv('hr_data.csv')
    hr_raw.dropna(inplace=True)  # remove NaN artifact from import

    #### problem code ####
    for row in hr_raw.Time:
        hr_raw.Time[row] = float(hr_raw.Time[row][3:]) + float((hr_raw.Time[row][0:2] * 60))

    # set time as index
    hr_raw.set_index('Time', inplace=True)

    # bin data based on user defined window
    hr_bin = hr_raw.groupby((hr_raw.index // win_size + 1) * win_size).mean()
The error that comes up is:
    Traceback (most recent call last):
      File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
      File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:14010)
    TypeError: an integer is required

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\mitbl001\Dropbox\CPET_python\import_hr_csv.py", line 11, in <module>
        hr_raw.Time[row] = float(hr_raw.Time[row][3:]) + float((hr_raw.Time[row][0:2] * 60))
      File "C:\Users\mitbl001\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
        result = self.index.get_value(self, key)
      File "C:\Users\mitbl001\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
        tz=getattr(series.dtype, 'tz', None))
      File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4404)
      File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4087)
      File "pandas\_libs\index.pyx", line 156, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5210)
    KeyError: '00:00.6'
2 Answers
#1
Convert your Time column to float seconds using pd.to_timedelta:
    df['Time'] = pd.to_timedelta('00:' + df.Time).dt.total_seconds()
    df

       Time  HR bpm
    0   0.6    97.0
    1   1.0    92.0
    2   1.3    80.0
    3   1.6    81.0
    4   2.0    80.0
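To try the conversion without the CSV file, here is a minimal sketch. The inline sample data is hypothetical and only stands in for hr_data.csv; it includes one value above a minute to show that minutes are carried into the result.

```python
import pandas as pd

# Hypothetical inline sample standing in for hr_data.csv
df = pd.DataFrame({
    "Time": ["00:00.6", "00:01.0", "01:05.3"],
    "HR bpm": [97.0, 92.0, 80.0],
})

# Prefix an hours field so each string reads as HH:MM:SS.s,
# parse it as a timedelta, then express it as float seconds.
df["Time"] = pd.to_timedelta("00:" + df["Time"]).dt.total_seconds()
print(df)
```

The "00:" prefix is needed because the raw strings have no hours field, so each value is completed to the HH:MM:SS.s form before parsing.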
A groupby should be simple now using the syntax:
df.groupby(df.Time // x * x)
Where x is your desired window of time. Here's an example of grouping in intervals of 0.5 seconds and taking the mean of the heart rate:
    df.groupby(df.Time // 0.5 * 0.5)['HR bpm'].mean()

    Time
    0.5    97.0
    1.0    86.0
    1.5    81.0
    2.0    80.0
    Name: HR bpm, dtype: float64
The above outputs a Series. If you want to obtain a DataFrame, you can call reset_index after the groupby.
    df.groupby(df.Time // 0.5 * 0.5)['HR bpm'].mean().reset_index()

       Time  HR bpm
    0   0.5    97.0
    1   1.0    86.0
    2   1.5    81.0
    3   2.0    80.0
In your case, you'd do something along the lines of df.groupby(df.Time // 10 * 10).
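As a rough sketch of the 10-second case, the example below uses made-up times (already converted to float seconds) that span more than one window. The names win_size and window_start are illustrative only.

```python
import pandas as pd

# Hypothetical data: Time is already in float seconds
df = pd.DataFrame({
    "Time": [0.6, 5.0, 9.9, 10.0, 15.2, 65.3],
    "HR bpm": [97.0, 92.0, 81.0, 80.0, 84.0, 90.0],
})

win_size = 10  # window length in seconds

# Floor each time to the start of its window: 0.0, 10.0, 20.0, ...
window_start = df["Time"] // win_size * win_size

# Mean heart rate per window, labelled by the window start
hr_bin = df.groupby(window_start)["HR bpm"].mean()
print(hr_bin)
```

Windows that contain no samples (here 20 s through 50 s) simply do not appear in the result, since groupby only produces groups for keys that occur in the data.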
#2
I think you need indexing with str, followed by a cast to float with astype:
    hr_raw.Time = hr_raw.Time.str[3:].astype(float) + hr_raw.Time.str[0:2].astype(float) * 60
    print (hr_raw)

       Time  HR bpm
    0   0.6    97.0
    1   1.0    92.0
    2   1.3    80.0
    3   1.6    81.0
    4   2.0    80.0
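For context, the loop in the question fails because iterating over hr_raw.Time yields the values themselves (such as '00:00.6'), which are then used as index labels, hence the KeyError. The vectorised .str slices avoid the loop, but they assume a two-digit minutes field. If that assumption might not hold, one possible variant is to split on the colon instead. The sample data below is hypothetical:

```python
import pandas as pd

# Hypothetical sample, including a three-digit minutes value
hr_raw = pd.DataFrame({
    "Time": ["00:00.6", "01:05.3", "100:02.0"],
    "HR bpm": [97.0, 92.0, 80.0],
})

# Split "MM:SS.s" into two columns (0 = minutes, 1 = seconds) and cast to float
parts = hr_raw["Time"].str.split(":", expand=True).astype(float)

# Combine into total seconds
hr_raw["Time"] = parts[0] * 60 + parts[1]
print(hr_raw)
```

This is a sketch rather than a replacement for the answer above; for well-formed MM:SS.s input both approaches should give the same seconds.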
Another solution is converting with to_timedelta, but first prepend the hours to each string with radd:
    hr_raw.Time = pd.to_timedelta(hr_raw.Time.radd('00:')).dt.total_seconds()
    print (hr_raw)

       Time  HR bpm
    0   0.6    97.0
    1   1.0    92.0
    2   1.3    80.0
    3   1.6    81.0
    4   2.0    80.0
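Series.radd(other) computes other + series, so for strings it prepends rather than appends. A small sketch with a made-up Series illustrates that it matches plain concatenation:

```python
import pandas as pd

# Hypothetical time strings in MM:SS.s form
s = pd.Series(["00:00.6", "00:01.0"])

prefixed_radd = s.radd("00:")   # "00:" + each element
prefixed_plus = "00:" + s       # same result with the + operator

print(prefixed_radd.tolist())
```

Either spelling can be passed to pd.to_timedelta; radd is mainly convenient inside a method chain.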
After that, set_index is not necessary; group on the Time column directly:
    # bin data based on user defined window
    hr_bin = hr_raw.groupby((hr_raw.Time // win_size + 1) * win_size).mean()
    print (hr_bin)

          Time  HR bpm
    Time
    10.0   1.3    86.0
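Note that the + 1 labels each group by the end of its window (10.0 for samples in [0, 10)), and that calling .mean() on the whole frame also averages the Time column. The sketch below, using hypothetical data and the illustrative name window_end, keeps the same end-of-window labelling but averages only the heart rate:

```python
import pandas as pd

# Hypothetical data: Time is already in float seconds
hr_raw = pd.DataFrame({
    "Time": [0.6, 9.9, 10.0, 25.0],
    "HR bpm": [97.0, 91.0, 80.0, 84.0],
})

win_size = 10  # window length in seconds

# Label every sample with the END of its window: 10.0, 20.0, 30.0, ...
window_end = ((hr_raw["Time"] // win_size + 1) * win_size).rename("window_end")

# Average only the heart rate, then turn the result back into a DataFrame
hr_bin = hr_raw.groupby(window_end)["HR bpm"].mean().reset_index()
print(hr_bin)
```

Renaming the grouping key avoids a clash with the existing Time column when reset_index is called.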