Find common columns in two data frames and perform a function

Time: 2021-03-25 22:52:41

I have two CSV files that I want to compare, performing a function/calculation whenever four conditions are satisfied.

file1:

SN  CY  Year    Month   Day Hour    Lat Lon
196101  1   1961    1   14  12  8.3 134.7
196101  1   1961    1   14  18  8.8 133.4
196101  1   1961    1   15  0   9.1 132.5
196101  1   1961    1   15  6   9.3 132.2
196101  1   1961    1   15  12  9.5 132
196101  1   1961    1   15  18  9.9 131.8
196125  1   1961    1   14  12  10.0 136
196125  1   1961    1   14  18  10.5 136.5

file2:

 Year    Month Day RR Hour Lat  Lon
 1961    1   14  0   0   14.0917 121.055
 1961    1   14  0   6   14.0917 121.055
 1961    1   14  0   12  14.0917 121.055
 1961    1   14  0   18  14.0917 121.055
 1961    1   15  0   0   14.0917 121.055
 1961    1   15  0   6   14.0917 121.055

I am trying to calculate the distance between the Lat-Lon points in these two files whenever rows have the same Year, Month, Day, and Hour. Here is my code:

jtwc <-read.csv("file1.csv",header=T,sep=",")
stn  <-read.csv("file2.csv",header=T,sep=",")

dms_to_rad <- function(d, m, s) (d + m / 60 + s / 3600) * pi / 180
great_circle_distance <- function(lat1, long1, lat2, long2) {
   a <- sin(0.5 * (lat2 - lat1))
   b <- sin(0.5 * (long2 - long1))
   12742 * asin(sqrt(a * a + cos(lat1) * cos(lat2) * b * b))
}

jtwc$dist<- great_circle_distance(dms_to_rad(jtwc$Lat,0,0),dms_to_rad(jtwc$Lon,0,0),dms_to_rad(stn$Lat,0,0),dms_to_rad(stn$Lon,0,0))
write.csv(stn,file="dist.csv",row.names=T)
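For reference, the great_circle_distance function above is the haversine formula. Its inputs must be in radians (hence the dms_to_rad calls), and the constant 12742 is twice a mean Earth radius of 6371 km, so the result is in kilometres. With latitudes $\varphi$ and longitudes $\lambda$:

```latex
d = 2R \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right)}\right), \qquad 2R = 12742\ \text{km}
```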

The "SN" column is a unique identifier in file1. What I want to do:

[1] Calculate the distance (jtwc$dist) when rows in the two files have the same Year, Month, Day, and Hour.

[2] If several rows in file1 have the same Year, Month, Day, and Hour but different SN numbers, I want to use the one row in file2 with that Year, Month, Day, and Hour when computing the distance for each of them.

The output should look like this:

SN  CY  Year    Month   Day Hour    Lat Lon dist
196101  1   1961    1   14  12  8.3 134.7  1620.961
196101  1   1961    1   14  18  8.8 133.4  1467.859
196101  1   1961    1   15  0   9.1 132.5  1334.382
196101  1   1961    1   15  6   9.3 132.2  1324.915
196125  1   1961    1   14  12  10.0 136   1687.127
196125  1   1961    1   14  18  10.5 136.5  1724.351
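The original code passes whole columns from both files to great_circle_distance, so R pairs rows by position (recycling the shorter file) rather than by date. A minimal sketch, assuming base R's merge() instead of the tidyverse, is an inner join on the four time columns: a file2 row is reused for every file1 row (whatever its SN) with the same Year, Month, Day, and Hour, which covers requirement [2], and file1 rows with no matching time are dropped. The sample rows are pasted inline with read.table(text = ...) so the block runs on its own; with the real files you would keep the read.csv calls instead.

```r
# Sample rows from the question, pasted inline so the sketch is self-contained.
jtwc <- read.table(header = TRUE, text = "
SN     CY Year Month Day Hour Lat  Lon
196101 1  1961 1     14  12   8.3  134.7
196101 1  1961 1     14  18   8.8  133.4
196101 1  1961 1     15  0    9.1  132.5
196101 1  1961 1     15  6    9.3  132.2
196101 1  1961 1     15  12   9.5  132
196101 1  1961 1     15  18   9.9  131.8
196125 1  1961 1     14  12   10.0 136
196125 1  1961 1     14  18   10.5 136.5
")
stn <- read.table(header = TRUE, text = "
Year Month Day RR Hour Lat     Lon
1961 1     14  0  0    14.0917 121.055
1961 1     14  0  6    14.0917 121.055
1961 1     14  0  12   14.0917 121.055
1961 1     14  0  18   14.0917 121.055
1961 1     15  0  0    14.0917 121.055
1961 1     15  0  6    14.0917 121.055
")

# Same helper functions as in the question.
dms_to_rad <- function(d, m, s) (d + m / 60 + s / 3600) * pi / 180
great_circle_distance <- function(lat1, long1, lat2, long2) {
  a <- sin(0.5 * (lat2 - lat1))
  b <- sin(0.5 * (long2 - long1))
  12742 * asin(sqrt(a * a + cos(lat1) * cos(lat2) * b * b))
}

# Inner join on the four time columns. Several jtwc rows (different SN) can
# match the same stn row; jtwc rows without a matching time are dropped.
res <- merge(jtwc, stn, by = c("Year", "Month", "Day", "Hour"),
             suffixes = c("", ".stn"))

# Lat/Lon come from jtwc, Lat.stn/Lon.stn from stn; convert degrees to radians.
res$dist <- great_circle_distance(
  dms_to_rad(res$Lat, 0, 0), dms_to_rad(res$Lon, 0, 0),
  dms_to_rad(res$Lat.stn, 0, 0), dms_to_rad(res$Lon.stn, 0, 0)
)

# Restore the file1 column layout and row order used in the expected output.
res <- res[order(res$SN, res$Day, res$Hour),
           c("SN", "CY", "Year", "Month", "Day", "Hour", "Lat", "Lon", "dist")]
print(res, row.names = FALSE)
```

On the sample data this should keep six rows (four for 196101, two for 196125), with dist values matching the expected output above.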

Any suggestions on how to do this correctly?

1 answer

#1

If I understand you right, you can try this solution:

library(tidyverse)
#functions
dms_to_rad <- function(d, m, s) (d + m / 60 + s / 3600) * pi / 180
great_circle_distance <- function(lat1, long1, lat2, long2) {
  a <- sin(0.5 * (lat2 - lat1))
  b <- sin(0.5 * (long2 - long1))
  12742 * asin(sqrt(a * a + cos(lat1) * cos(lat2) * b * b))
}

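#read files (placeholders: replace with the paths to file1.csv and file2.csv)
dir1 = 'path_to_file1'
dir2 = 'path_to_file2'
#combine Year, Month, Day and Hour into one 'key' column (e.g. "1961_1_14_12")
jtwc <- read.csv(dir1) %>% 
  unite('key',c('Year','Month','Day','Hour'))
stn <- read.csv(dir2) %>% 
  unite('key',c('Year','Month','Day','Hour'))

#aggregating: every jtwc row, whatever its SN, gets the stn row with the same key;
#rows without a match get NA and are removed by drop_na()
stn <- left_join(jtwc,stn,by = 'key') %>% 
      drop_na() %>% 
      mutate(across(c(Lat.x, Lon.x, Lat.y, Lon.y), ~ dms_to_rad(.x, m = 0, s = 0))) %>% 
      mutate(dist = great_circle_distance(Lat.x,Lon.x, Lat.y,Lon.y))

write.csv(stn,file="dist.csv",row.names=T)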
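The answer's unite() call builds one key string per row (tidyr's default separator is "_"), and the join then looks that key up in stn. As an alternative sketch in base R, not what the answer itself does, the same lookup can be written with paste() and match(); this keeps Lat/Lon in degrees and preserves the file1 row order. make_key is a hypothetical helper introduced here. Note that match() returns only the first stn row for each key, which is enough when file2 has one row per time step, as in the sample data.

```r
# Sample rows from the question (file1 = jtwc, file2 = stn).
jtwc <- read.table(header = TRUE, text = "
SN     CY Year Month Day Hour Lat  Lon
196101 1  1961 1     14  12   8.3  134.7
196101 1  1961 1     14  18   8.8  133.4
196101 1  1961 1     15  0    9.1  132.5
196101 1  1961 1     15  6    9.3  132.2
196101 1  1961 1     15  12   9.5  132
196101 1  1961 1     15  18   9.9  131.8
196125 1  1961 1     14  12   10.0 136
196125 1  1961 1     14  18   10.5 136.5
")
stn <- read.table(header = TRUE, text = "
Year Month Day RR Hour Lat     Lon
1961 1     14  0  0    14.0917 121.055
1961 1     14  0  6    14.0917 121.055
1961 1     14  0  12   14.0917 121.055
1961 1     14  0  18   14.0917 121.055
1961 1     15  0  0    14.0917 121.055
1961 1     15  0  6    14.0917 121.055
")

dms_to_rad <- function(d, m, s) (d + m / 60 + s / 3600) * pi / 180
great_circle_distance <- function(lat1, long1, lat2, long2) {
  a <- sin(0.5 * (lat2 - lat1))
  b <- sin(0.5 * (long2 - long1))
  12742 * asin(sqrt(a * a + cos(lat1) * cos(lat2) * b * b))
}

# Hypothetical helper: the same Year_Month_Day_Hour key that unite() builds.
make_key <- function(df) paste(df$Year, df$Month, df$Day, df$Hour, sep = "_")

# Index of the first stn row with the same key for every jtwc row (NA if none).
idx <- match(make_key(jtwc), make_key(stn))
keep <- !is.na(idx)

# Keep matched jtwc rows in their original order; Lat/Lon stay in degrees.
out <- jtwc[keep, ]
out$dist <- great_circle_distance(
  dms_to_rad(out$Lat, 0, 0), dms_to_rad(out$Lon, 0, 0),
  dms_to_rad(stn$Lat[idx[keep]], 0, 0), dms_to_rad(stn$Lon[idx[keep]], 0, 0)
)
print(out, row.names = FALSE)
```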
