matplotlib: Drawing lines between points ignoring missing data

Time: 2022-04-13 05:59:46

I have a set of data which I want plotted as a line-graph. For each series, some data is missing (but different for each series). Currently matplotlib does not draw lines which skip missing data: for example

import matplotlib.pyplot as plt

xs = range(8)
series1 = [1, 3, 3, None, None, 5, 8, 9]
series2 = [2, None, 5, None, 4, None, 3, 2]

plt.plot(xs, series1, linestyle='-', marker='o')
plt.plot(xs, series2, linestyle='-', marker='o')

plt.show()

results in a plot with gaps in the lines. How can I tell matplotlib to draw lines through the gaps? (I'd rather not have to interpolate the data).

5 solutions

#1


56  

You can mask the NaN values this way:

import numpy as np
import matplotlib.pyplot as plt

xs = np.arange(8)
series1 = np.array([1, 3, 3, None, None, 5, 8, 9]).astype(np.double)
s1mask = np.isfinite(series1)
series2 = np.array([2, None, 5, None, 4, None, 3, 2]).astype(np.double)
s2mask = np.isfinite(series2)

plt.plot(xs[s1mask], series1[s1mask], linestyle='-', marker='o')
plt.plot(xs[s2mask], series2[s2mask], linestyle='-', marker='o')

plt.show()

This leads to

[image: matplotlib: drawing lines between points ignoring missing data]
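
If you need this for more than a couple of series, the masking can be wrapped in a small helper. This is only a sketch of the same idea: the function name `plot_skipping_missing` is made up for illustration, and it assumes the y-values are numbers or None (passing `dtype=float` to NumPy turns None into NaN).

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_skipping_missing(ax, xs, ys, **plot_kwargs):
    """Plot ys against xs, joining only the finite points with one line.

    Hypothetical helper, not part of matplotlib.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)  # None entries become NaN here
    mask = np.isfinite(ys)            # True where a real value exists
    return ax.plot(xs[mask], ys[mask], **plot_kwargs)


fig, ax = plt.subplots()
(line1,) = plot_skipping_missing(
    ax, range(8), [1, 3, 3, None, None, 5, 8, 9], linestyle='-', marker='o')
(line2,) = plot_skipping_missing(
    ax, range(8), [2, None, 5, None, 4, None, 3, 2], linestyle='-', marker='o')
```

Call `plt.show()` afterwards to display the figure. Because each series gets its own mask, the missing positions do not have to line up between series.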

#2


3  

Quoting @Rutger Kassies (link):

Matplotlib only draws a line between consecutive (valid) data points, and leaves a gap at NaN values.

A solution if you are using Pandas:

# pd.Series
s.dropna().plot()  # masking (as @Thorsten Kranz suggested)

# pd.DataFrame
# ffill() takes no `method` argument; note that it repeats the last valid
# value into each gap, so it fills the data rather than skipping it.
df['a_col_ffill'] = df['a_col'].ffill()
df['b_col_ffill'] = df['b_col'].ffill()  # changed from a to b
df[['a_col_ffill', 'b_col_ffill']].plot()
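
If you want the DataFrame columns handled like the Series case (points skipped, nothing filled in), one option is to drop the NaNs column by column and draw every column on the same axes. A rough sketch, assuming a made-up frame built from the question's two series:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical frame reusing the question's data, with NaN for missing values.
df = pd.DataFrame({
    'series1': [1, 3, 3, np.nan, np.nan, 5, 8, 9],
    'series2': [2, np.nan, 5, np.nan, 4, np.nan, 3, 2],
})

fig, ax = plt.subplots()
for name in df.columns:
    # Drop only this column's missing rows, so each line keeps its own points.
    valid = df[name].dropna()
    ax.plot(valid.index, valid.values, linestyle='-', marker='o', label=name)
ax.legend()
```

Calling `df.dropna()` on the whole frame would not do the same thing: it removes every row where any column is missing, which throws away valid points from the other columns.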

#3


2  

Without interpolation, you'll need to remove the Nones from the data, along with the X-values that correspond to them in each series. Here's an (ugly) one-liner for doing that:

x1Clean, series1Clean = zip(*filter(lambda x: x[1] is not None, zip(xs, series1)))

The lambda returns False for pairs whose value is None, so filter drops those (x, value) pairs. The outer zip(*...) then unzips the remaining pairs back into separate x and series sequences.
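
A slightly longer but arguably more readable version of the same idea, as a sketch (the helper name `drop_none_pairs` is made up). One difference worth knowing: the one-liner raises a ValueError when every value is None, because there is nothing left to unpack, whereas this version returns two empty lists.

```python
def drop_none_pairs(xs, ys):
    """Return (xs, ys) as lists, keeping only pairs whose y is not None.

    Hypothetical helper for illustration.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    # Build the two lists separately so an empty `pairs` gives ([], []).
    return [x for x, _ in pairs], [y for _, y in pairs]


xs = range(8)
series1 = [1, 3, 3, None, None, 5, 8, 9]
x1_clean, series1_clean = drop_none_pairs(xs, series1)
```

The cleaned lists can then be passed straight to `plt.plot(x1_clean, series1_clean)`.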

#4


1  

For what it may be worth, after some trial and error I would like to add one clarification to Thorsten's solution, hopefully saving time for users who looked elsewhere after trying this approach.

I could not get this to work on an identical problem while using

from pyplot import *

and attempting to plot with

plot(abscissa[mask],ordinate[mask])

It seemed I had to use import matplotlib.pyplot as plt to get proper NaN handling, though I cannot say why.

#5


-1  

Perhaps I missed the point, but I believe Pandas now does this automatically. The example below is a little involved, and requires internet access, but the line for China has lots of gaps in the early years, hence the straight line segments.

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

# read data from Maddison project 
url = 'http://www.ggdc.net/maddison/maddison-project/data/mpd_2013-01.xlsx'
mpd = pd.read_excel(url, skiprows=2, index_col=0, na_values=[' ']) 
mpd.columns = [col.rstrip() for col in mpd.columns]  # strip trailing spaces from column names

# select countries 
countries = ['England/GB/UK', 'USA', 'Japan', 'China', 'India', 'Argentina']
mpd = mpd[countries].dropna()
mpd = mpd.rename(columns={'England/GB/UK': 'UK'})
mpd = np.log(mpd)/np.log(2)  # convert to log2 

# plots
ax = mpd.plot(lw=2)
ax.set_title('GDP per person', fontsize=14, loc='left')
ax.set_ylabel('GDP Per Capita (1990 USD, log2 scale)')
ax.legend(loc='upper left', fontsize=10, handlelength=2, labelspacing=0.15)
fig = ax.get_figure()
fig.show() 
