I have a pandas DataFrame with dates stored as strings. I want to extract the month from each date directly, so I tried this:
import pandas as pd
df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]],columns = ['date','units'])
df['month'] = df['date'].str[5,7]
print(df)
This gives the following output:
         date  units month
0  2015-04-16      5   NaN
1  2014-05-01      6   NaN
The NaN values have dtype float, and I have no idea why. Why doesn't this just create another column containing the substrings?
2 Answers
#1
I think your problem is that your slicing is invalid:
In [7]:
df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]],columns = ['date','units'])
df['date'].str[5,7]
Out[7]:
0 NaN
1 NaN
Name: date, dtype: float64
Compare with the same indexing on a plain Python string:
t='2015-04-16'
t[5,7]
This raises:
TypeError: string indices must be integers
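To see why the comma fails, it can help to look at what Python actually passes as the index. The sketch below uses a small hypothetical helper class, ShowKey (not part of pandas or the original post), that simply returns whatever key it receives:

```python
class ShowKey:
    """Hypothetical helper for illustration: returns the key it is indexed with."""

    def __getitem__(self, key):
        return key


probe = ShowKey()
comma_key = probe[5, 7]  # a comma builds a tuple
colon_key = probe[5:7]   # a colon builds a slice object

print(type(comma_key).__name__, comma_key)  # tuple (5, 7)
print(type(colon_key).__name__, colon_key)  # slice slice(5, 7, None)

t = '2015-04-16'
print(t[colon_key])  # 04

try:
    t[comma_key]  # a str cannot be indexed with a tuple
except TypeError as exc:
    print('TypeError:', exc)
```

So t[5,7] asks the string for the element at "index" (5, 7), which is not a valid string index, while t[5:7] asks for the characters at positions 5 and 6.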
I think you wanted:
In [18]:
df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]],columns = ['date','units'])
df['month'] = df['date'].str[5:7]
df
Out[18]:
         date  units month
0  2015-04-16      5    04
1  2014-05-01      6    05
So because indexing with 5,7 is an invalid operation, pandas returns NaN for each row instead of raising an error. NaN is a float, which is why the resulting column has dtype float64.
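As a side note, the same substring can be requested with the Series.str.slice method, where start and stop are ordinary arguments, so the comma/colon mix-up cannot occur. This is only a minimal sketch on the sample data from the question, not something the original answer proposed:

```python
import pandas as pd

df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]], columns=['date', 'units'])

# str.slice(start, stop) is the method form of .str[start:stop]
df['month'] = df['date'].str.slice(5, 7)

print(df['month'].tolist())  # ['04', '05']
```

Either spelling yields strings such as '04', not integers.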
#2
If you're trying to slice each string to get the substring from 5 to 7, you need a : (colon), not a , (comma):
>>> df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]],columns = ['date','units'])
>>> df['month'] = df['date'].str[5:7]
>>> print(df)
         date  units month
0  2015-04-16      5    04
1  2014-05-01      6    05
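If the strings are not guaranteed to be zero-padded in a fixed layout, slicing by position can pick up the wrong characters. One alternative, offered here as a sketch and not taken from the answers above, is to parse the column with pd.to_datetime and read the month from the .dt accessor. The format string assumes every value looks like YYYY-MM-DD, as in the sample data:

```python
import pandas as pd

df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]], columns=['date', 'units'])

# Assumption: every value follows the YYYY-MM-DD layout of the sample data
parsed = pd.to_datetime(df['date'], format='%Y-%m-%d')

df['month'] = parsed.dt.month              # integer month numbers
df['month_str'] = parsed.dt.strftime('%m')  # zero-padded strings, like the slice

print(df['month'].tolist())      # [4, 5]
print(df['month_str'].tolist())  # ['04', '05']
```

The integer column is handy for grouping or sorting, while the strftime version matches the output of the string slice.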