Compare a file (keeping row order) with a "static" file

With the awk call below I was getting good results before. As I understand it, it first checks whether each value in the 1st column of "seafloor" (rounded to the nearest integer) is in the 1st column of "ctd" and, if so, prints the value from the 2nd column of "ctd", always following the row order of "seafloor":

awk 'NR==FNR{A[$1]=$2;next} {i=int($1+.5)} i in A {print A[i]}' ctd seafloor

This means that if the 1st column of "seafloor" had values higher than those in the 1st column of "ctd", nothing was printed for those rows, so they were simply missing from the output.
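
The dropped rows can be reproduced on a tiny sample. This is only a sketch: the file contents below are made-up values, not the real "ctd" and "seafloor" data, and the temporary directory is an assumption for illustration.

```shell
# Hypothetical miniature inputs (not the real data).
tmp=$(mktemp -d)
printf '1 23.69\n2 22.41\n3 21.07\n' > "$tmp/ctd"      # depth, temperature
printf '2.2\n3.7\n1.4\n2.6\n' > "$tmp/seafloor"        # 3.7 rounds to 4, which is not in ctd

# Original call: the "i in A" pattern prints only rows whose rounded depth exists in ctd.
dropped=$(awk 'NR==FNR{A[$1]=$2;next} {i=int($1+.5)} i in A {print A[i]}' "$tmp/ctd" "$tmp/seafloor")
printf '%s\n' "$dropped"

rm -rf "$tmp"
```

Four rows go in and only three come out, because the row for 3.7 has no match and is skipped.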

I presume it worked before because my "static" file, "ctd" (change of seawater temperature with depth: water depth in the 1st column, temperature in the 2nd column), only has data down to 3470 m water depth, and my "variable" file, "seafloor" (water depth), also only went down to 3470 m water depth.

The "static" file has 3470 rows, with the 1st column going from 1 to 3470 and the 2nd column from 1.78 to 23.69:

ctd: N = 3470   <1/3470>        <1.78/23.69>

The thing is that "seafloor" now goes down to 3862 m water depth (i.e. deeper than "ctd", whose 1st column has a maximum of 3470):

seafloor: N = 13544     <1773.39/3862.14>

I realized it wasn't working as expected because this awk call on these two files gave me 9839 records instead of the 13544 records present in "seafloor". I need 13544 records, with zeros (or e.g. NaN) in the rows where the 1st column of "seafloor" goes beyond the range of the 1st column of "ctd" (e.g. for a value of 3471).

I appreciate any hint to solve this, and please let me know if some more clarification is needed, thanks.

PS: ctd file is here: http://pastelink.me/dl/779584, and seafloor file is here: http://pastelink.me/dl/f275bc

EDIT 1:

Thanks to Scrutinizer for the new awk call, it works nicely. I ran it on my old files (i.e. up to 3470 m in both files) and the output is the same as the one from my original awk call. So the problem was basically that the maximum depth in "seafloor" goes beyond the 3470 m in "ctd".

1 solution

#1

Do you mean something like this:

awk 'NR==FNR{A[$1]=$2; next} {i=int($1+.5); print A[i]+0}' ctd seafloor

The explanation is that adding zero to the array element forces it into numeric context: if it already contains a number, nothing changes, and if it is empty, it becomes 0. Also, because the "i in A" condition was removed, output is printed for every line.
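
A small check of this behaviour, as a sketch only: the sample values and the temporary directory are made up for illustration and are not the real data.

```shell
# Hypothetical miniature inputs (not the real data).
tmp=$(mktemp -d)
printf '1 23.69\n2 22.41\n3 21.07\n' > "$tmp/ctd"      # depth, temperature
printf '2.2\n3.7\n1.4\n' > "$tmp/seafloor"             # 3.7 rounds to 4, which is not in ctd

# No pattern in front of the action, so every seafloor row produces one output line;
# a missing A[i] is empty, and empty + 0 evaluates to 0.
filled=$(awk 'NR==FNR{A[$1]=$2; next} {i=int($1+.5); print A[i]+0}' "$tmp/ctd" "$tmp/seafloor")
printf '%s\n' "$filled"

rm -rf "$tmp"
```

One thing to keep in mind: after the "+0" the value is a number, so non-integer values are printed using awk's OFMT (by default "%.6g"). Temperatures with trailing zeros or more than six significant digits may therefore print differently from how they appear in "ctd".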

A side effect is that this uses a little more memory than necessary, because referencing A[i] for a missing key creates an empty element in the array, but it is a little simpler. Should memory usage become an issue, this would be a bit more efficient:

awk 'NR==FNR{A[$1]=$2; next} {i=int($1+.5); print (i in A)?A[i]:0}' ctd seafloor
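
Since the question also mentions NaN as a possible placeholder, the same idea can be sketched with a string in place of 0. The sample values, the temporary directory, and the extra parentheses around the ternary (added here only as a portability precaution) are assumptions for illustration. The two counts at the end show the memory point: the "+0" form adds an element for each missing key, while the "in" test does not.

```shell
# Hypothetical miniature inputs (not the real data).
tmp=$(mktemp -d)
printf '1 23.69\n2 22.41\n3 21.07\n' > "$tmp/ctd"      # depth, temperature
printf '2.2\n3.7\n1.4\n' > "$tmp/seafloor"             # 3.7 rounds to 4, which is not in ctd

# "i in A" tests for the key without creating it; rows without a match print NaN.
nan_out=$(awk 'NR==FNR{A[$1]=$2; next} {i=int($1+.5); print ((i in A) ? A[i] : "NaN")}' "$tmp/ctd" "$tmp/seafloor")
printf '%s\n' "$nan_out"

# Count the array elements left at the end of each approach.
size_plus=$(awk 'NR==FNR{A[$1]=$2; next} {i=int($1+.5); x=A[i]+0} END{n=0; for (k in A) n++; print n}' "$tmp/ctd" "$tmp/seafloor")
size_in=$(awk 'NR==FNR{A[$1]=$2; next} {i=int($1+.5); x=(i in A)?A[i]:0} END{n=0; for (k in A) n++; print n}' "$tmp/ctd" "$tmp/seafloor")
echo "elements with +0: $size_plus, with in: $size_in"

rm -rf "$tmp"
```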