Pandas latitude-longitude: distance between successive rows [duplicate]

Time: 2021-04-01 15:23:19

I have the following in a Pandas DataFrame in Python 2.7:

Ser_Numb        LAT      LONG
       1  74.166061 30.512811
       2  72.249672 33.427724
       3  67.499828 37.937264
       4  84.253715 69.328767
       5  72.104828 33.823462
       6  63.989462 51.918173
       7  80.209112 33.530778
       8  68.954132 35.981256
       9  83.378214 40.619652
      10  68.778571  6.607066

I am looking to calculate the distance between successive rows in the dataframe. The output should look something like this:

Ser_Numb          LAT        LONG   Distance
       1    74.166061   30.512811          0
       2    72.249672   33.427724          d_between_Ser_Numb2 and Ser_Numb1
       3    67.499828   37.937264          d_between_Ser_Numb3 and Ser_Numb2
       4    84.253715   69.328767          d_between_Ser_Numb4 and Ser_Numb3
       5    72.104828   33.823462          d_between_Ser_Numb5 and Ser_Numb4
       6    63.989462   51.918173          d_between_Ser_Numb6 and Ser_Numb5
       7    80.209112   33.530778   .
       8    68.954132   35.981256   .
       9    83.378214   40.619652   .
      10    68.778571    6.607066   .

Attempt

This post looks somewhat similar but it is calculating the distance between fixed points. I need the distance between successive points.

I tried to adapt this as follows:

df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LONG'])
df['dLON'] = df['LON_rad'] - np.radians(df['LON_rad'].shift(1))
df['dLAT'] = df['LAT_rad'] - np.radians(df['LAT_rad'].shift(1))
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))

However, I get the following error:

Traceback (most recent call last):
  File "C:\Python27\test.py", line 115, in <module>
    df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 78, in wrapper
    "{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
[Finished in 2.3s with exit code 1]
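
The TypeError is raised by math.cos, which accepts a single float and cannot take a whole Series. A minimal sketch of the difference, using a small hypothetical Series of latitudes in radians (the name lat_rad is only for illustration); np.cos is one vectorized replacement, though whether that is exactly what the comment suggested is an assumption:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical three-row sample taken from the LAT column, converted to radians
lat_rad = pd.Series(np.radians([74.166061, 72.249672, 67.499828]))

try:
    math.cos(lat_rad)      # math.cos expects one float, not a Series
    raised = False
except TypeError:
    raised = True          # a multi-element Series cannot be coerced to float

# np.cos works element-wise and returns a Series of the same length
cos_lat = np.cos(lat_rad)
```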

This error was fixed by following MaxU's comment. With the fix, the output of this calculation still does not make sense; the distance is nearly 8000 km:

   Ser_Numb        LAT       LONG   LAT_rad   LON_rad      dLON      dLAT     distance
0         1  74.166061  30.512811  1.294442  0.532549       NaN       NaN          NaN
1         2  72.249672  33.427724  1.260995  0.583424  0.574129  1.238402  8010.487211
2         3  67.499828  37.937264  1.178094  0.662130  0.651947  1.156086  7415.364469
3         4  84.253715  69.328767  1.470505  1.210015  1.198459  1.449943  9357.184623
4         5  72.104828  33.823462  1.258467  0.590331  0.569212  1.232802  7992.087820
5         6  63.989462  51.918173  1.116827  0.906143  0.895840  1.094862  7169.812123
6         7  80.209112  33.530778  1.399913  0.585222  0.569407  1.380421  8851.558260
7         8  68.954132  35.981256  1.203477  0.627991  0.617777  1.179044  7559.609520
8         9  83.378214  40.619652  1.455224  0.708947  0.697986  1.434220  9194.371978
9        10  68.778571   6.607066  1.200413  0.115315  0.102942  1.175014          NaN

According to:

  • this online calculator: with Latitude1 = 74.166061, Longitude1 = 30.512811, Latitude2 = 72.249672, Longitude2 = 33.427724, I get 233 km

  • the haversine function found here, called as print haversine(30.512811, 74.166061, 33.427724, 72.249672): I get 232.55 km

The answer should be 233 km, but my approach is giving ~8000 km. I think there is something wrong with how I am trying to iterate between successive rows.

Question: Is there a way to do this in Pandas? Or do I need to loop through the dataframe one row at a time?

Additional Information:

To create the above DF, select it and copy to clipboard. Then:

import pandas as pd
df = pd.read_clipboard()
print(df)  # parenthesized form runs on both Python 2.7 and Python 3
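
If the clipboard is not convenient, the same frame can also be built directly. A minimal sketch, assuming only the three columns shown in the table above:

```python
import pandas as pd

# Same ten rows as the table in the question, typed in by hand
df = pd.DataFrame({
    'Ser_Numb': list(range(1, 11)),
    'LAT': [74.166061, 72.249672, 67.499828, 84.253715, 72.104828,
            63.989462, 80.209112, 68.954132, 83.378214, 68.778571],
    'LONG': [30.512811, 33.427724, 37.937264, 69.328767, 33.823462,
             51.918173, 33.530778, 35.981256, 40.619652, 6.607066],
}, columns=['Ser_Numb', 'LAT', 'LONG'])
```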

1 Answer

#1


19  

You can use this great solution (c) @ballsatballsdotballs (don't forget to upvote it ;-) or this slightly optimized version:

import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length.

    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    # haversine formula
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km is the Earth radius used here
    return km

# previous row (shifted) against the current row; row 0 has no predecessor, so it becomes NaN
df['dist'] = \
    haversine_np(df.LONG.shift(), df.LAT.shift(),
                 df.loc[1:, 'LONG'], df.loc[1:, 'LAT'])

Result:

In [566]: df
Out[566]:
   Ser_Numb        LAT       LONG         dist
0         1  74.166061  30.512811          NaN
1         2  72.249672  33.427724   232.549785
2         3  67.499828  37.937264   554.905446
3         4  84.253715  69.328767  1981.896491
4         5  72.104828  33.823462  1513.397997
5         6  63.989462  51.918173  1164.481327
6         7  80.209112  33.530778  1887.256899
7         8  68.954132  35.981256  1252.531365
8         9  83.378214  40.619652  1606.340727
9        10  68.778571   6.607066  1793.921854
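
The question asked for 0 rather than NaN in the first row. A minimal, self-contained sketch of one way to get that, assuming a compact restatement of haversine_np and only the first three rows of the data; the column name Distance follows the layout in the question:

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 6367 * 2 * np.arcsin(np.sqrt(a))

# First three rows of the sample data
df = pd.DataFrame({
    'Ser_Numb': [1, 2, 3],
    'LAT': [74.166061, 72.249672, 67.499828],
    'LONG': [30.512811, 33.427724, 37.937264],
})

# shift() pairs every row with its predecessor; the first row has none,
# so its NaN is replaced by 0 to match the requested output
df['Distance'] = haversine_np(df['LONG'].shift(), df['LAT'].shift(),
                              df['LONG'], df['LAT']).fillna(0)
```

Whether 0 or NaN is the better marker for "no previous point" depends on what is done with the column afterwards; NaN is skipped by default in sums and means, while 0 is counted as a real distance.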

UPDATE: this will help to understand the logic:

In [573]: pd.concat([df['LAT'].shift(), df.loc[1:, 'LAT']], axis=1, ignore_index=True)
Out[573]:
           0          1
0        NaN        NaN
1  74.166061  72.249672
2  72.249672  67.499828
3  67.499828  84.253715
4  84.253715  72.104828
5  72.104828  63.989462
6  63.989462  80.209112
7  80.209112  68.954132
8  68.954132  83.378214
9  83.378214  68.778571
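
For reference, the ~8000 km values in the question appear to come from two details of the attempted code: the shifted columns, which are already in radians, are passed through np.radians a second time, and the cosine term uses shift(-1) (the next row) instead of shift(1) (the previous row). A minimal sketch of the attempt with both points changed, assuming the same column names and only the first three rows; the variable bad_dlon is hypothetical and only shows that the double conversion reproduces the dLON value from the question's output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'LAT': [74.166061, 72.249672, 67.499828],
    'LONG': [30.512811, 33.427724, 37.937264],
})

lat = np.radians(df['LAT'])
lon = np.radians(df['LONG'])

# Converting the already-converted column again gives the dLON seen in the question
bad_dlon = lon - np.radians(lon.shift(1))

# Differences against the previous row, converting to radians only once
dlat = lat - lat.shift(1)
dlon = lon - lon.shift(1)

# Haversine formula, with the previous row's latitude in the cosine term
a = (np.sin(dlat / 2) ** 2
     + np.cos(lat.shift(1)) * np.cos(lat) * np.sin(dlon / 2) ** 2)
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(a))
```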
