Fast-performance array processing in Numpy / Python

Time: 2021-06-06 21:20:06

I am trying to find out the optimal way (fastest performance) to process coordinate and measurement data stored in several numpy arrays.

I need to calculate the distance from each grid point (lat, lon, alt values, shown in green in the attached image) to each measurement location (lat, lon, alt, and range from target, shown in gray in the attached image). Since there are hundreds of grid points, and thousands of measurement ranges to calculate for each grid point, I would like to iterate through the arrays in the most efficient way possible.

I am trying to decide how to store the LLA (lat, lon, alt) values for the grid and for the measurements, and then what the ideal way is to calculate the mean squared error (MSE) for each point on the grid, based on the difference between the measured range value and the actual range.

Any ideas on how to best store these values, and then iterate across the grid to determine the range from each measurement would be very much appreciated. Thanks!!!

Currently, I am using a 2D meshgrid to store the LLA values for the grid:

import numpy as np

# Create a 2D grid that will be used to store the MSE estimates.
# First, create two 1-D arrays representing the X and Y coordinates of the grid.
# Note: because the stop value is xmax + x_delta, X holds roughly gridsize_x + 1 points.
x_delta = abs(xmax - xmin) / gridsize_x
y_delta = abs(ymax - ymin) / gridsize_y
X = np.arange(xmin, xmax + x_delta, x_delta)
Y = np.arange(ymin, ymax + y_delta, y_delta)

# Next, pass the arrays to meshgrid to get 2-D coordinate matrices from the 1-D coordinate arrays.
# Both results have shape (len(Y), len(X)).
grid_lon, grid_lat = np.meshgrid(X, Y)

I have the LLA points and range values from the measurements stored in a measurement class:

measurement_lon = [measurement.gps.getlon() for measurement in target_measurements]
measurement_lat = [measurement.gps.getlat() for measurement in target_measurements]
measurement_range = [measurement.getrange() for measurement in target_measurements]

Measurement class

class RangeMeasurement:

    def __init__(self, lat, lon, alt, range):
        self.gps = GpsLocation(lat, lon, alt)
        self.range = range

Really bad pseudocode for range calculation (iterative and very slow)

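# distance() stands in for whatever point-to-point distance function is used
for i in range(grid_lon.size):
    range_error = 0.0  # accumulated error for grid point i
    for j in range(len(measurement_lat)):
        range_error += distance(grid_lon.flat[i], grid_lat.flat[i], measurement_lon[j], measurement_lat[j]) - measurement_range[j]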

1 solution

#1


I think the scipy.spatial.distance module will help you out with this problem: http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

You should store your points as 2-d numpy arrays with 2 columns and N rows, where N is the number of points in the array. To convert your grid_lon and grid_lat to this format, use

N1 = grid_lon.size
grid_point_array = np.hstack([grid_lon.reshape((N1,1)), grid_lat.reshape((N1,1))])

This takes all of the values in grid_lon, which are arranged in a rectangular array that is the same shape as the grid, and puts them in an array with one column and N rows. It does the same for grid_lat. The two one-column wide arrays are then combined to create a two column array.
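
As a small illustration of this reshaping step, here is a sketch on a tiny grid. The grid values are made up for the example, and np.column_stack with ravel() is just an alternative spelling that should produce the same N-by-2 array as the hstack/reshape version:

```python
import numpy as np

# Made-up 3 x 2 grid purely for illustration
X = np.array([0.0, 1.0, 2.0])   # longitudes
Y = np.array([10.0, 11.0])      # latitudes
grid_lon, grid_lat = np.meshgrid(X, Y)   # both have shape (2, 3)

N1 = grid_lon.size
via_hstack = np.hstack([grid_lon.reshape((N1, 1)), grid_lat.reshape((N1, 1))])

# Equivalent, slightly shorter spelling
via_column_stack = np.column_stack([grid_lon.ravel(), grid_lat.ravel()])

print(via_hstack.shape)                              # (6, 2)
print(np.array_equal(via_hstack, via_column_stack))  # True
```

Each row of the result is one (lon, lat) pair, in the same row-major order as the flattened grid.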

A similar method can be used to convert your measurement data:

N2 = len(measurement_lon)
measurement_data_array = np.hstack([np.array(measurement_lon).reshape((N2,1)),
    np.array(measurement_lat).reshape((N2,1))])

Once your data is in this format, you can easily find the distances between each pair of points with scipy.spatial.distance:

import scipy.spatial.distance

d = scipy.spatial.distance.cdist(grid_point_array, measurement_data_array, 'euclidean')

d will be an array with N1 rows and N2 columns, and d[i,j] will be the distance between grid point i and measurement point j.
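
To make the layout of d concrete, here is a minimal sketch with three made-up grid points and two made-up measurement points (the coordinates are chosen only so the distances come out as round numbers):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Three made-up grid points and two made-up measurement points, as (x, y) rows
grid_points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
measurement_points = np.array([[0.0, 0.0], [3.0, 4.0]])

d = cdist(grid_points, measurement_points, 'euclidean')

print(d.shape)   # (3, 2): one row per grid point, one column per measurement
print(d[1, 1])   # distance from grid point 1 to measurement point 1: 4.0
```

Note that 'euclidean' treats the two columns as planar coordinates, so if the columns hold longitude and latitude in degrees, the distances come out in degrees rather than in the units of the measured ranges.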

EDIT: Thanks for clarifying the range error. Sounds like an interesting project. This should give you the grid point with the smallest accumulated squared error:

measurement_range_array = np.array(measurement_range)
flat_grid_idx = ((measurement_range_array - d) ** 2).sum(axis=1).argmin()

This takes advantage of broadcasting to get the difference between a point's measured range and its distance from every grid point. All of the errors for a given grid point are then summed, and the resulting 1-D array should be the accumulated error you're looking for. argmin() is called to find the position of the smallest value. To get the x and y grid coordinates from the flattened index, use

n_cols = grid_lon.shape[1]  # number of grid columns, i.e. len(X); this can differ from gridsize_x
grid_x = flat_grid_idx % n_cols
grid_y = flat_grid_idx // n_cols

(The // is integer division.)
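
Putting the pieces together, here is a rough end-to-end sketch on synthetic data. The grid, the sensor positions, and the target are all made up, the ranges are noise-free, and the names grid_points, sensors, and measured_range are only for this illustration. It uses the mean instead of the sum, which gives the mean squared error per grid point without changing which point is the minimum, and np.unravel_index as an alternative to the modulo arithmetic:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Made-up grid: 4 columns (x) by 3 rows (y)
X = np.arange(0.0, 4.0)
Y = np.arange(0.0, 3.0)
grid_lon, grid_lat = np.meshgrid(X, Y)              # shape (3, 4)
grid_points = np.column_stack([grid_lon.ravel(), grid_lat.ravel()])

# Made-up target and sensor positions; ranges are exact (noise-free) here
target = np.array([[2.0, 1.0]])
sensors = np.array([[0.0, 0.0], [3.0, 2.0], [0.0, 2.0], [3.0, 0.0]])
measured_range = cdist(sensors, target).ravel()     # shape (4,)

d = cdist(grid_points, sensors)                     # shape (12, 4)
squared_error = (measured_range - d) ** 2           # broadcasts over rows
mse = squared_error.mean(axis=1).reshape(grid_lon.shape)  # one MSE per grid cell

row, col = np.unravel_index(mse.argmin(), mse.shape)
print(grid_lon[row, col], grid_lat[row, col])       # 2.0 1.0
```

Reshaping the per-point errors back to grid_lon.shape keeps the MSE values aligned with the meshgrid, so the whole error surface can be indexed or plotted the same way as the grid itself.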
