Can Pandas plot a histogram of dates?

Time: 2021-04-16 21:21:16

I've taken my Series and coerced it to a datetime column of dtype=datetime64[ns] (though I only need day resolution and am not sure how to change that).

import pandas as pd
df = pd.read_csv('somefile.csv')
column = df['date']
column = pd.to_datetime(column, errors='coerce')

But plotting doesn't work:

ipdb> column.plot(kind='hist')
*** TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('float64')

I'd like to plot a histogram that just shows the count of dates by week, month, or year.

Surely there is a way to do this in pandas?

6 answers

#1

82

Given this df:

        date
0 2001-08-10
1 2002-08-31
2 2003-08-29
3 2006-06-21
4 2002-03-27
5 2003-07-14
6 2004-06-15
7 2003-08-14
8 2003-07-29

and, if the column is not already a datetime:

df["date"] = df["date"].astype("datetime64[ns]")

To show the count of dates by month:

df.groupby(df["date"].dt.month).count().plot(kind="bar")

.dt allows you to access the datetime properties.

Which will give you:

[Figure: bar chart of the number of dates per month]

You can replace month with year, day, etc.

If you want to distinguish year and month, for instance, just do:

df.groupby([df["date"].dt.year, df["date"].dt.month]).count().plot(kind="bar")

Which gives:

[Figure: bar chart of the number of dates per (year, month) pair]

Was it what you wanted? Is this clear?

Hope this helps!
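
As a variation on the year/month grouping above, here is a minimal sketch that labels each date with its calendar month through `dt.to_period("M")`, so year and month stay together in a single chronological index. The sample dates are the ones listed in this answer; the name `per_month` is only illustrative.

```python
import pandas as pd

# Sample dates copied from the answer above.
df = pd.DataFrame({"date": pd.to_datetime([
    "2001-08-10", "2002-08-31", "2003-08-29", "2006-06-21", "2002-03-27",
    "2003-07-14", "2004-06-15", "2003-08-14", "2003-07-29",
])})

# Map every timestamp to its calendar month (year and month together),
# count how often each month occurs, then sort the months chronologically.
per_month = df["date"].dt.to_period("M").value_counts().sort_index()
print(per_month)
```

Chaining `.plot(kind="bar")` onto `per_month` should draw the same kind of bar chart as the groupby version, with one bar per month that actually occurs in the data.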

#2

4

I think resample might be what you are looking for. In your case, do:

df.set_index('date', inplace=True)
# '1M' bins by month, '1W' by week; check the documentation on offset aliases
# the how= argument has been removed from resample, so call an aggregation method;
# size() counts the rows that fall into each bin
df.resample('1M').size()

This only does the counting, not the plotting, so you then have to make your own plots.

See the pandas resample documentation for more details.

I have run into similar problems as you did. Hope this helps.
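
Since this answer stops at the counting step, here is a hedged sketch of how the resampled counts could be turned into a plot. It assumes the sample dates from answer #1 and uses the `"MS"` (month start) alias; `plot_counts` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample dates borrowed from answer #1 (an assumption; any datetime column works).
df = pd.DataFrame({"date": pd.to_datetime([
    "2001-08-10", "2002-08-31", "2003-08-29", "2006-06-21", "2002-03-27",
    "2003-07-14", "2004-06-15", "2003-08-14", "2003-07-29",
])})

# Bin by calendar month; unlike groupby, resample also emits the empty months
# between the first and the last date, with a count of 0.
monthly = df.set_index("date").resample("MS").size()


def plot_counts(counts):
    """Hypothetical helper: draw one bar per bin (requires matplotlib)."""
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar", figsize=(12, 4))
    ax.set_xlabel("month")
    ax.set_ylabel("count")
    plt.tight_layout()
    plt.show()
```

Calling `plot_counts(monthly)` would then show the chart. Keeping the empty months makes gaps in the data visible, which a plain groupby on `dt.month` hides.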

#3

3

Rendered example

Example Code

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Create random datetime object."""

# core modules
from datetime import datetime
import random

# 3rd party modules
import pandas as pd
import matplotlib.pyplot as plt


def visualize(df, column_name='start_date', color='#494949', title=''):
    """
    Visualize a dataframe with a date column.

    Parameters
    ----------
    df : Pandas dataframe
    column_name : str
        Column to visualize
    color : str
    title : str
    """
    plt.figure(figsize=(20, 10))
    ax = (df[column_name].groupby(df[column_name].dt.hour)
                         .count()).plot(kind="bar", color=color)
    ax.set_facecolor('#eeeeee')
    ax.set_xlabel("hour of the day")
    ax.set_ylabel("count")
    ax.set_title(title)
    plt.show()


def create_random_datetime(from_date, to_date, rand_type='uniform'):
    """
    Create random date within timeframe.

    Parameters
    ----------
    from_date : datetime object
    to_date : datetime object
    rand_type : {'uniform'}

    Examples
    --------
    >>> random.seed(28041990)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(1998, 12, 13, 23, 38, 0, 121628)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(2000, 3, 19, 19, 24, 31, 193940)
    """
    delta = to_date - from_date
    if rand_type == 'uniform':
        rand = random.random()
    else:
        raise NotImplementedError('Unknown random mode \'{}\''
                                  .format(rand_type))
    return from_date + rand * delta


def create_df(n=1000):
    """Create a Pandas dataframe with datetime objects."""
    from_date = datetime(1990, 4, 28)
    to_date = datetime(2000, 12, 31)
    sales = [create_random_datetime(from_date, to_date) for _ in range(n)]
    df = pd.DataFrame({'start_date': sales})
    return df


if __name__ == '__main__':
    import doctest
    doctest.testmod()
    df = create_df()
    visualize(df)

#4

1

I was just having trouble with this as well. I imagine that since you're working with dates you want to preserve chronological ordering (like I did).

The workaround then is:

import matplotlib.pyplot as plt
counts = df['date'].value_counts(sort=False)
plt.bar(counts.index, counts)
plt.show()

If anyone knows of a better way, please speak up.

EDIT: for jean above, here's a sample of the data [I randomly sampled from the full dataset, hence the trivial histogram data.]

print dates
type(dates),type(dates[0])
dates.hist()
plt.show()

Output:

0    2001-07-10
1    2002-05-31
2    2003-08-29
3    2006-06-21
4    2002-03-27
5    2003-07-14
6    2004-06-15
7    2002-01-17
Name: Date, dtype: object
<class 'pandas.core.series.Series'> <type 'datetime.date'>

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-f39e334eece0> in <module>()
      2 print dates
      3 print type(dates),type(dates[0])
----> 4 dates.hist()
      5 plt.show()

/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in hist_series(self, by, ax, grid, xlabelsize, xrot, ylabelsize, yrot, figsize, bins, **kwds)
   2570         values = self.dropna().values
   2571 
-> 2572         ax.hist(values, bins=bins, **kwds)
   2573         ax.grid(grid)
   2574         axes = np.array([ax])

/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   5620             for xi in x:
   5621                 if len(xi) > 0:
-> 5622                     xmin = min(xmin, xi.min())
   5623                     xmax = max(xmax, xi.max())
   5624             bin_range = (xmin, xmax)

TypeError: can't compare datetime.date to float
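
The traceback above comes from a Series of dtype object holding `datetime.date` values. One possible way around it, sketched here under the assumption that the data looks like the printed sample, is to convert the column with `pd.to_datetime` first and then count per year. The names `converted` and `per_year` are illustrative only.

```python
import datetime

import pandas as pd

# Rebuild the sample printed above: plain datetime.date objects, dtype object.
dates = pd.Series([
    datetime.date(2001, 7, 10), datetime.date(2002, 5, 31),
    datetime.date(2003, 8, 29), datetime.date(2006, 6, 21),
    datetime.date(2002, 3, 27), datetime.date(2003, 7, 14),
    datetime.date(2004, 6, 15), datetime.date(2002, 1, 17),
], name="Date")
print(dates.dtype)

# Convert to a real datetime64 column so the .dt accessor becomes available.
converted = pd.to_datetime(dates)
print(converted.dtype)

# Count the dates per calendar year; the integer index keeps chronological order.
per_year = converted.groupby(converted.dt.year).count()
print(per_year)
```

From here `per_year.plot(kind="bar")` should give a chronologically ordered bar chart.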

#5

1

I was able to work around this by (1) plotting with matplotlib instead of using the dataframe directly and (2) using the values attribute. See the example:

import matplotlib.pyplot as plt

ax = plt.gca()
ax.hist(column.values)

This doesn't work if I don't use values, but I don't know why it works with them.

#6

0

I think you can solve that problem with this code, which converts the date column to integers and then parses them as timestamps in seconds:

df['date'] = df['date'].astype(int)
df['date'] = pd.to_datetime(df['date'], unit='s')

To get the date only, you can add this code:

pd.DatetimeIndex(df.date).normalize()
df['date'] = pd.DatetimeIndex(df.date).normalize()
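
For the day-resolution part of the question, a minimal sketch of the same idea using the `.dt` accessor: `dt.normalize()` sets the time of day to midnight while keeping the datetime64 dtype, so same-day timestamps can be counted together. The three timestamps below are made up for illustration.

```python
import pandas as pd

# Made-up timestamps: two on the same day, one on the next day.
stamps = pd.Series(pd.to_datetime([
    "2021-04-16 21:21:16",
    "2021-04-16 08:00:00",
    "2021-04-17 00:30:00",
]))

# Drop the time-of-day component; the values stay datetime64, now at midnight.
days = stamps.dt.normalize()

# Count how many timestamps fall on each day, in chronological order.
per_day = days.value_counts().sort_index()
print(per_day)
```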

Tags: pandas python time-series matplotlib
