Why do I get np.NaN values when trying to add a column to a Pandas dataframe?

Time: 2022-05-15 21:24:32

I have a pandas dataframe with date information stored as a string. I want to extract the month from each date directly, so I tried this:

import pandas as pd

df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]],columns = ['date','units'])
df['month'] = df['date'].str[5,7]
print(df)

This gives the following output:

         date  units  month
0  2015-04-16      5    NaN
1  2014-05-01      6    NaN

The dtype of the NaN values is float, and I have no idea why. Why doesn't this just create another column with the substrings?

2 solutions

#1


I think your problem is that your slicing is invalid:

In [7]:

df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]],columns = ['date','units'])
df['date'].str[5,7]
Out[7]:
0   NaN
1   NaN
Name: date, dtype: float64

Compare with this:

t='2015-04-16'
t[5,7]

This raises:

TypeError: string indices must be integers
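
The difference can be sketched in plain Python, with no pandas involved. A comma inside the brackets builds a tuple, so t[5,7] is the same as t[(5, 7)], while a colon builds a slice. The variable names month and error are only for illustration, and the exact wording of the error message varies between Python versions.

```python
t = '2015-04-16'

# A colon inside the brackets builds a slice: characters at index 5 and 6.
month = t[5:7]
print(month)  # 04

# A comma builds a tuple, so t[5, 7] means t[(5, 7)], which str rejects.
error = None
try:
    t[5, 7]
except TypeError as exc:
    error = exc
    print(type(exc).__name__)  # TypeError
```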

I think you wanted:

In [18]:

df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]],columns = ['date','units'])
df['month'] = df['date'].str[5:7]
df
Out[18]:
         date  units month
0  2015-04-16      5    04
1  2014-05-01      6    05

Because indexing with the tuple (5, 7) is an invalid operation on a string, pandas returns NaN for every row instead of raising, which is why the new column comes out as float64.
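
For what it's worth, the bracket slice also has a method form, Series.str.slice, which leaves no room for a comma/colon mix-up. A minimal sketch, reusing the dataframe from the question:

```python
import pandas as pd

df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]], columns=['date', 'units'])

# Series.str.slice(start, stop) is the method form of .str[start:stop].
df['month'] = df['date'].str.slice(5, 7)
print(df['month'].tolist())  # ['04', '05']
```

Either spelling keeps the month as a two-character string, not a number.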

#2


If you're trying to slice each string to get the substring from index 5 up to (but not including) 7, you need a colon (:), not a comma (,):

>>> df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]],columns = ['date','units'])
>>> df['month'] = df['date'].str[5:7]
>>> print(df)
         date  units month
0  2015-04-16      5    04
1  2014-05-01      6    05
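
As an alternative sketch, not taken from either answer: if an integer month is wanted rather than the string '04', one option is to parse the column with pd.to_datetime and read the month through the .dt accessor. This assumes every value in the column is a parseable date.

```python
import pandas as pd

df = pd.DataFrame([['2015-04-16', 5], ['2014-05-01', 6]], columns=['date', 'units'])

# Parse the strings as datetimes, then take the month as an integer.
df['month'] = pd.to_datetime(df['date']).dt.month
print(df['month'].tolist())  # [4, 5]
```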
