I have a data set in which, when plotted, most points cluster at the left end of the x axis:
import matplotlib.pyplot as plt

# x and y hold the data linked at the end of the question
plt.plot(x, y, marker='o')
plt.title('Original')
plt.show()
ORIGINAL GRAPH
I want to use scipy to interpolate the data and then fit a quadratic curve to it. I am avoiding fitting a quadratic curve directly, without interpolation, because the resulting curve would be biased towards the mass of data at one end of the x axis. I tried this using
import numpy as np
from scipy.interpolate import interp1d

f = interp1d(x, y, kind='quadratic')
# Array with points in between min(x) and max(x) for interpolation
x_interp = np.linspace(min(x), max(x), num=np.size(x))
# Plot graph with interpolation
plt.plot(x_interp, f(x_interp), marker='o')
plt.title('Interpolated')
plt.show()
and got INTERPOLATED GRAPH.
However, what I intend to get is something like this: EXPECTED GRAPH
What am I doing wrong?
My values for x can be found here and values for y here. Thank you!
Solution 1
I'm pretty sure this does what you want. It fits a second-degree (quadratic) polynomial to your data, then evaluates that polynomial on an evenly spaced array of x values ranging from the minimum to the maximum of your original x data.
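# Evenly spaced x values spanning the range of the original data
new_x = np.linspace(min(x), max(x), num=np.size(x))
# Least-squares fit of a degree-2 polynomial; coefficients come highest power first
coefs = np.polyfit(x, y, 2)
# Evaluate the fitted polynomial on the even grid
new_line = np.polyval(coefs, new_x)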
Plotting it returns:
plt.scatter(x,y)
plt.scatter(new_x,new_line,c='g', marker='^', s=5)
plt.xlim(min(x)-0.00001,max(x)+0.00001)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
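For anyone reading without the linked data, here is a minimal self-contained sketch of the same fit. The data is synthetic (an assumption: a noisy quadratic with x values squeezed toward the left end), and the helper names `fit_quadratic` and `fit_quadratic_resampled` are made up for illustration. The second helper is not part of the answer above; it is one possible reading of the "interpolate first, then fit" idea from the question, using plain linear interpolation via `np.interp`, so that the crowded left end no longer contributes more points to the fit than the sparse right end.

```python
import numpy as np


def fit_quadratic(x, y, num=None):
    """Fit y ~ a*x**2 + b*x + c directly and evaluate it on an even grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if num is None:
        num = x.size
    coefs = np.polyfit(x, y, 2)  # highest power first: [a, b, c]
    new_x = np.linspace(x.min(), x.max(), num=num)
    new_line = np.polyval(coefs, new_x)
    return coefs, new_x, new_line


def fit_quadratic_resampled(x, y, num=None):
    """Linearly resample (x, y) onto an even grid, then fit the quadratic."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if num is None:
        num = x.size
    order = np.argsort(x)  # np.interp expects increasing x values
    grid = np.linspace(x.min(), x.max(), num=num)
    y_grid = np.interp(grid, x[order], y[order])
    return np.polyfit(grid, y_grid, 2), grid


# Synthetic stand-in for the question's data: most x values sit near zero.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.0, 1.0, size=200) ** 3
y_demo = 2.0 * x_demo**2 - 3.0 * x_demo + 1.0 + rng.normal(0.0, 0.01, size=200)

coefs, new_x, new_line = fit_quadratic(x_demo, y_demo)
coefs_resampled, grid = fit_quadratic_resampled(x_demo, y_demo)
print("direct fit:   ", coefs)
print("resampled fit:", coefs_resampled)
```

`new_x` and `new_line` can be passed to `plt.scatter` exactly as in the plotting code above. Whether the resampled fit is actually preferable depends on how much you trust the few points on the sparse side, since linear interpolation between them carries their noise into the fit.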
If that wasn't what you meant...
However, from your question, it seems like you might be trying to force all your original y-values onto evenly spaced x-values (if that's not your intention, let me know, and I'll just delete this part).
This is also possible. There are lots of ways to do it, but I've done it here in pandas:
import pandas as pd
xy_df=pd.DataFrame({'x_orig': x, 'y_orig': y})
sorted_x_y=xy_df.sort_values('x_orig')
sorted_x_y['new_x'] = np.linspace(min(x), max(x), np.size(x))
plt.figure(figsize=[5,5])
plt.scatter(sorted_x_y['new_x'], sorted_x_y['y_orig'])
plt.xlim(min(x)-0.00001,max(x)+0.00001)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
Which looks pretty different from your original data... which is why I think it might not be exactly what you're looking for.
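As one of the other ways to do the same remapping, here is a sketch that uses only NumPy. It sorts the points by x and pairs the sorted y values with an evenly spaced grid, which is what the pandas version does. `spread_evenly` is a hypothetical helper name and the sample arrays are made up.

```python
import numpy as np


def spread_evenly(x, y):
    """Pair the y values, ordered by x, with evenly spaced x positions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")  # indices that sort x ascending
    new_x = np.linspace(x.min(), x.max(), x.size)
    return new_x, y[order]


# Made-up sample: four of the five x values sit near the left end.
x_demo = np.array([0.30, 0.01, 0.02, 1.00, 0.05])
y_demo = np.array([3.0, 1.0, 2.0, 5.0, 4.0])

new_x, y_sorted = spread_evenly(x_demo, y_demo)
print(new_x)
print(y_sorted)
```

Only the order of the x values survives this remapping; the distances between them are thrown away, which is why the result no longer resembles the original plot.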