This question already has an answer here:
- Fast Haversine Approximation (Python/Pandas) 4 answers
I have the following in a Pandas DataFrame in Python 2.7:
Ser_Numb LAT LONG
1 74.166061 30.512811
2 72.249672 33.427724
3 67.499828 37.937264
4 84.253715 69.328767
5 72.104828 33.823462
6 63.989462 51.918173
7 80.209112 33.530778
8 68.954132 35.981256
9 83.378214 40.619652
10 68.778571 6.607066
I am looking to calculate the distance between successive rows in the dataframe. The output should look something like this:
Ser_Numb LAT LONG Distance
1 74.166061 30.512811 0
2 72.249672 33.427724 d_between_Ser_Numb2 and Ser_Numb1
3 67.499828 37.937264 d_between_Ser_Numb3 and Ser_Numb2
4 84.253715 69.328767 d_between_Ser_Numb4 and Ser_Numb3
5 72.104828 33.823462 d_between_Ser_Numb5 and Ser_Numb4
6 63.989462 51.918173 d_between_Ser_Numb6 and Ser_Numb5
7 80.209112 33.530778 .
8 68.954132 35.981256 .
9 83.378214 40.619652 .
10 68.778571 6.607066 .
Attempt
This post looks somewhat similar, but it calculates the distance between fixed points. I need the distance between successive points.
I tried to adapt this as follows:
df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LONG'])
df['dLON'] = df['LON_rad'] - np.radians(df['LON_rad'].shift(1))
df['dLAT'] = df['LAT_rad'] - np.radians(df['LAT_rad'].shift(1))
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
However, I get the following error:
Traceback (most recent call last):
File "C:\Python27\test.py", line 115, in <module>
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 78, in wrapper
"{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
[Finished in 2.3s with exit code 1]
MaxU's comment fixed this error. With the fix applied, the output of the calculation still does not make sense; the distances come out at nearly 8000 km:
Ser_Numb LAT LONG LAT_rad LON_rad dLON dLAT distance
0 1 74.166061 30.512811 1.294442 0.532549 NaN NaN NaN
1 2 72.249672 33.427724 1.260995 0.583424 0.574129 1.238402 8010.487211
2 3 67.499828 37.937264 1.178094 0.662130 0.651947 1.156086 7415.364469
3 4 84.253715 69.328767 1.470505 1.210015 1.198459 1.449943 9357.184623
4 5 72.104828 33.823462 1.258467 0.590331 0.569212 1.232802 7992.087820
5 6 63.989462 51.918173 1.116827 0.906143 0.895840 1.094862 7169.812123
6 7 80.209112 33.530778 1.399913 0.585222 0.569407 1.380421 8851.558260
7 8 68.954132 35.981256 1.203477 0.627991 0.617777 1.179044 7559.609520
8 9 83.378214 40.619652 1.455224 0.708947 0.697986 1.434220 9194.371978
9 10 68.778571 6.607066 1.200413 0.115315 0.102942 1.175014 NaN
According to:
- this online calculator: If I use Latitude1 = 74.166061, Longitude1 = 30.512811, Latitude2 = 72.249672, Longitude2 = 33.427724 then I get 233 km
- haversine function found here as:
print haversine(30.512811, 74.166061, 33.427724, 72.249672)
then I get 232.55 km
The answer should be 233 km, but my approach is giving ~8000 km. I think there is something wrong with how I am trying to iterate between successive rows.
Question: Is there a way to do this in Pandas? Or do I need to loop through the dataframe one row at a time?
Additional Information:
To create the above DF, select it and copy to clipboard. Then:
import pandas as pd
df = pd.read_clipboard()
print df
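If the clipboard is not convenient (for example in a script or on a headless machine), the same frame can probably be built from a string instead. This is only a sketch: it assumes Python 3 and a reasonably recent pandas, and the name `RAW` is made up for the illustration.

```python
import io

import pandas as pd

# The sample table from the question, whitespace-separated (RAW is a hypothetical name).
RAW = """Ser_Numb LAT LONG
1 74.166061 30.512811
2 72.249672 33.427724
3 67.499828 37.937264
4 84.253715 69.328767
5 72.104828 33.823462
6 63.989462 51.918173
7 80.209112 33.530778
8 68.954132 35.981256
9 83.378214 40.619652
10 68.778571 6.607066
"""

# sep=r"\s+" splits on any run of whitespace, similar to what read_clipboard does by default.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
print(df)
```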
1 solution
#1
You can use this great solution by @ballsatballsdotballs (don't forget to upvote it ;-) or this slightly optimized version:
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    All args must be of equal length.
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km is the Earth radius used here
    return km

# previous row (shifted) against the current row; pandas aligns on the index
df['dist'] = \
    haversine_np(df.LONG.shift(), df.LAT.shift(),
                 df.loc[1:, 'LONG'], df.loc[1:, 'LAT'])
Result:
In [566]: df
Out[566]:
Ser_Numb LAT LONG dist
0 1 74.166061 30.512811 NaN
1 2 72.249672 33.427724 232.549785
2 3 67.499828 37.937264 554.905446
3 4 84.253715 69.328767 1981.896491
4 5 72.104828 33.823462 1513.397997
5 6 63.989462 51.918173 1164.481327
6 7 80.209112 33.530778 1887.256899
7 8 68.954132 35.981256 1252.531365
8 9 83.378214 40.619652 1606.340727
9 10 68.778571 6.607066 1793.921854
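For reference, this is the haversine formula that `haversine_np` evaluates, written out with the same radius of 6367 km. The symbols are my own notation, not from the original post: \varphi is latitude and \lambda is longitude, both in radians, with index 1 for the previous row and index 2 for the current row.

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right), \quad R = 6367\ \text{km}
```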
UPDATE: this will help to understand the logic:
In [573]: pd.concat([df['LAT'].shift(), df.loc[1:, 'LAT']], axis=1, ignore_index=True)
Out[573]:
0 1
0 NaN NaN
1 74.166061 72.249672
2 72.249672 67.499828
3 67.499828 84.253715
4 84.253715 72.104828
5 72.104828 63.989462
6 63.989462 80.209112
7 80.209112 68.954132
8 68.954132 83.378214
9 83.378214 68.778571
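As for why the attempt in the question gives roughly 8000 km: it appears to call `np.radians` a second time on columns that are already in radians (`LON_rad`, `LAT_rad`), so the shifted value shrinks by a further factor of pi/180 and the "difference" is almost the full coordinate. For row 1, 0.583424 - radians(0.532549) is about 0.574129, which matches the `dLON` column shown in the question. The attempt also uses `shift(-1)` (the next row) inside the cosine term where the previous row is needed, and `math.cos` only accepts scalars, which explains the original `TypeError`. Below is a minimal sketch of the same column-based approach with those points changed. It assumes Python 3, uses only the first three rows, and the names `EARTH_RADIUS_KM` and `dlon_buggy` are mine.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6367  # same radius as used elsewhere on this page

# First three rows of the sample data.
df = pd.DataFrame({
    "LAT":  [74.166061, 72.249672, 67.499828],
    "LONG": [30.512811, 33.427724, 37.937264],
})

lat = np.radians(df["LAT"])
lon = np.radians(df["LONG"])

# Form used in the question: the shifted column is already in radians,
# so wrapping it in np.radians() again shrinks it by pi/180.
dlon_buggy = lon - np.radians(lon.shift(1))

# Changed form: subtract the previous row directly (same as .diff()).
dlat = lat - lat.shift(1)
dlon = lon - lon.shift(1)

# Haversine with the previous row's latitude in the cosine term.
a = (np.sin(dlat / 2) ** 2
     + np.cos(lat.shift(1)) * np.cos(lat) * np.sin(dlon / 2) ** 2)
df["distance"] = EARTH_RADIUS_KM * 2 * np.arcsin(np.sqrt(a))
print(df)
```

The first row has no predecessor, so its distance is NaN; use `fillna(0)` on the column if a 0 is preferred there, as in the desired output of the question.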