日期时间戳在R中未正确匹配/合并

时间:2023-02-08 22:58:44

I have two data frames:

我有两个数据框:

locA:

Date    Time    Temperature Voltage Longitude   Latitude    Deployment  DateTime
2014-08-08  14:18:25    20.9    1.872   -126.6886   34.95633    A   2014-08-08 14:20:00
2014-08-08  14:38:25    19.8    1.862   -126.6899   34.95675    A   2014-08-08 14:40:00
2014-08-08  14:58:26    19.3    1.856   -126.6911   34.95732    A   2014-08-08 15:00:00
2014-08-08  15:18:26    18.8    1.872   -126.6921   34.95803    A   2014-08-08 15:20:00
2014-08-08  15:38:25    18.6    1.872   -126.6929   34.95872    A   2014-08-08 15:40:00
2014-08-08  15:58:26    18.4    1.859   -126.6937   34.95921    A   2014-08-08 16:00:00
2014-08-08  16:18:25    18.4    1.875   -126.6944   34.95983    A   2014-08-08 16:20:00
2014-08-08  16:38:25    18.4    1.875   -126.6949   34.96021    A   2014-08-08 16:40:00
2014-08-08  16:58:26    18.4    1.872   -126.6956   34.96033    A   2014-08-08 17:00:00
2014-08-08  17:18:26    18.4    1.872   -126.6962   34.96025    A   2014-08-08 17:20:00

and datA:

FileName    Deployment  Start_UTC   Stop_UTC    Duration    sum PrAb    dolPrAb
DASBR1_20140808$140000.wav  A   2014-08-08 14:00:00 2014-08-08 14:02:00 119.6000    0   0   0
DASBR1_20140808$142000.wav  A   2014-08-08 14:20:00 2014-08-08 14:22:00 119.6000    0   0   0
DASBR1_20140808$144000.wav  A   2014-08-08 14:40:00 2014-08-08 14:42:00 119.8000    0   0   0
DASBR1_20140808$150000.wav  A   2014-08-08 15:00:00 2014-08-08 15:02:00 119.6000    0   0   0
DASBR1_20140808$152000.wav  A   2014-08-08 15:20:00 2014-08-08 15:22:00 119.6000    0   0   0
DASBR1_20140808$154000.wav  A   2014-08-08 15:40:00 2014-08-08 15:42:00 119.6000    0   0   0
DASBR1_20140808$160000.wav  A   2014-08-08 16:00:00 2014-08-08 16:02:00 119.6000    1   1   1
DASBR1_20140808$162000.wav  A   2014-08-08 16:20:00 2014-08-08 16:22:00 119.8000    0   0   0
DASBR1_20140808$164000.wav  A   2014-08-08 16:40:00 2014-08-08 16:42:00 119.8000    0   0   0
DASBR1_20140808$170000.wav  A   2014-08-08 17:00:00 2014-08-08 17:02:00 119.6000    0   0   0

Each are actually several thousand rows long. I want to merge them based on the "DateTime" stamp in locA and the "Start_UTC" in datA. Both of those columns have been formatted identically in UTC.

每个实际上都有几千行。我想根据locA中的“DateTime”标记和datA中的“Start_UTC”合并它们。这两个列的格式都与UTC相同。

locA$DateTime=as.POSIXct(paste(locA$Date, time), format="%Y-%m-%d %H:%M:%S", tz="UTC")
datA$Start_UTC = as.POSIXct(files$Start_UTC, format="%Y-%m-%d %H:%M:%S", tz="UTC")

In locA, the "time" variable comes from a sequence where I round the character string in locA$Time to the nearest 20min. "time" is a character string with the adjusted time.

在locA中,“time”变量来自一个序列,其中我将locA $ Time中的字符串舍入到最接近的20min。 “time”是具有调整时间的字符串。

str(time)
chr [1:8845] "14:00:00" "14:20:00" "14:40:00" "15:00:00" "15:20:00" "15:40:00" "16:00:00" "16:20:00" "16:40:00" "17:00:00" ...

When I merge the two data frames using locA$DateTime and datA$Start_UTC, it never works properly. I get something that looks like this, every time.

当我使用locA $ DateTime和datA $ Start_UTC合并两个数据帧时,它永远不会正常工作。我每次都会得到这样的东西。

resA = merge(datA,locA, by.x=c("Start_UTC"), by.y=c("DateTime"))

resA:

Start_UTC   FileName    Deployment.x    Stop_UTC    Duration    sum PrAb    dolPrAb Date    Time    Temperature Voltage Longitude   Latitude    Deployment.y
2014-08-08 14:00:00 DASBR1_20140808$140000.wav  A   2014-08-08 14:02:00 119.6000    0   0   0   2014-08-08  20:58:26    19.1    1.872   -126.7018   34.94994    A
2014-08-08 14:20:00 DASBR1_20140808$142000.wav  A   2014-08-08 14:22:00 119.6000    0   0   0   2014-08-08  21:18:26    19.1    1.872   -126.7027   34.94801    A
2014-08-08 14:40:00 DASBR1_20140808$144000.wav  A   2014-08-08 14:42:00 119.8000    0   0   0   2014-08-08  21:38:26    19.1    1.872   -126.7038   34.94608    A
2014-08-08 15:00:00 DASBR1_20140808$150000.wav  A   2014-08-08 15:02:00 119.6000    0   0   0   2014-08-08  21:58:27    18.9    1.865   -126.7050   34.94394    A
2014-08-08 15:20:00 DASBR1_20140808$152000.wav  A   2014-08-08 15:22:00 119.6000    0   0   0   2014-08-08  22:18:27    18.9    1.856   -126.7062   34.94188    A
2014-08-08 15:40:00 DASBR1_20140808$154000.wav  A   2014-08-08 15:42:00 119.6000    0   0   0   2014-08-08  22:38:26    18.8    1.862   -126.7077   34.93994    A
2014-08-08 16:00:00 DASBR1_20140808$160000.wav  A   2014-08-08 16:02:00 119.6000    1   1   1   2014-08-08  22:58:26    18.8    1.859   -126.7090   34.93796    A
2014-08-08 16:20:00 DASBR1_20140808$162000.wav  A   2014-08-08 16:22:00 119.8000    0   0   0   2014-08-08  23:18:26    18.8    1.865   -126.7105   34.93622    A
2014-08-08 16:40:00 DASBR1_20140808$164000.wav  A   2014-08-08 16:42:00 119.8000    0   0   0   2014-08-08  23:38:27    18.6    1.856   -126.7120   34.93480    A
2014-08-08 17:00:00 DASBR1_20140808$170000.wav  A   2014-08-08 17:02:00 119.6000    0   0   0   2014-08-09  23:58:28    18.8    1.849   -126.8051   34.88381    A

I have tried this so many different ways. I tried keeping the locA$DateTime as is, where the time equals the time stamp in datA$Time, and then tried using this function to match it to the correct datA$Start_UTC. Match the minimum locA$timestamp to the datA$timestamp

我尝试过这么多不同的方法。我尝试将locA $ DateTime保持原样,其中时间等于datA $ Time中的时间戳,然后尝试使用此函数将其与正确的datA $ Start_UTC相匹配。将最小locA $ timestamp与datA $ timestamp匹配

findrow=function(dt,df) {min(which(df>dt))}
rowA=sapply(locA$DateTime, findrow, df=datA$Start_UTC)
resA=cbind(locA,datA[rowA,])

This gave me the same results. I don't know what's going on. I'm sure it's intrinsic to how the datetime stamps are generated, but I don't know how to fix this. If I format them the same, and tell R that they are UTC, shouldn't they be treated the same way?

这给了我相同的结果。我不知道发生了什么事。我确定它是如何生成日期时间戳的固有的,但我不知道如何解决这个问题。如果我将它们格式化为相同,并告诉R它们是UTC,那么它们是否应该以相同的方式处理?

Help, please, all help, comments, or humorous "hang in there" statements would greatly appreciated. Not being able to properly pair my lat/longs with my data timestamps is really causing a headache.

请帮助,所有帮助,评论或幽默“挂在那里”的声明将非常感谢。无法将我的纬度/长度与我的数据时间戳正确配对确实令人头疼。

Cheers.

1 个解决方案

#1


0  

I wonder if doing a merge on chr objects instead of POSIXct objects would work better?

我想知道在chr对象上进行合并而不是POSIXct对象会更好吗?

locA$DateTime_str = format(locA$DateTime)
datA$Start_UTC_str = format(datA$Start_UTC)
resA = merge(datA,locA, by.x=c("Start_UTC_str"), by.y=c("DateTime_str"))

#1


0  

I wonder if doing a merge on chr objects instead of POSIXct objects would work better?

我想知道在chr对象上进行合并而不是POSIXct对象会更好吗?

locA$DateTime_str = format(locA$DateTime)
datA$Start_UTC_str = format(datA$Start_UTC)
resA = merge(datA,locA, by.x=c("Start_UTC_str"), by.y=c("DateTime_str"))