I have two text files that contain space-separated values. I want to combine them on a key column present in both files and write the result to another file.
location.txt:
1 21.5 23
2 24.5 20
3 19.5 19
4 22.5 15
5 24.5 12
6 19.5 12
data.txt, which has millions of rows; I will give just a few sample entries here:
2004-03-31 03:38:15.757551 2 1 122.153 -3.91901 11.04 2.03397
2004-02-28 00:59:16.02785 3 2 19.9884 37.0933 45.08 2.69964
2004-02-28 01:03:16.33393 11 3 19.3024 38.4629 45.08 2.68742
2004-02-28 01:06:16.013453 17 4 19.1652 38.8039 45.08 2.68742
2004-02-28 01:06:46.778088 18 5 19.175 38.8379 45.08 2.69964
2004-02-28 01:08:45.992524 22 6 19.1456 38.9401 45.08 2.68742
What I am trying to do is combine these two files, matching column 1 of location.txt against column 4 of data.txt, and produce output in the format below: every column from data.txt followed by columns 2 and 3 from location.txt.
2004-03-31 03:38:15.757551 2 1 122.153 -3.91901 11.04 2.03397 21.5 23
2004-02-28 00:59:16.02785 3 2 19.9884 37.0933 45.08 2.69964 24.5 20
2004-02-28 01:03:16.33393 11 3 19.3024 38.4629 45.08 2.68742 19.5 19
2004-02-28 01:06:16.013453 17 4 19.1652 38.8039 45.08 2.68742 22.5 15
2004-02-28 01:06:46.778088 18 5 19.175 38.8379 45.08 2.69964 24.5 12
2004-02-28 01:08:45.992524 22 6 19.1456 38.9401 45.08 2.68742 19.5 12
I'm using this awk command:
awk -F' ' "NR==FNR{label[$1]=$1;x[$1]=$2;y[$1]=$3;next}; ($2==label[$2]){print $0 "," x[$2] y[$3]}" location.txt data.txt > result.txt
But I'm not getting the output I expected. Can anyone help me fix this? Also, can we get the result file in CSV format, with the spaces replaced by commas?
2 solutions
#1
In awk:
$ awk '
NR==FNR {               # process location.txt
    a[$1]=$2 OFS $3     # hash using $1 as key
    next                # next record
}
$4 in a {               # process data.txt
    print $0,a[$4]      # output record and related location
}' location.txt data.txt    # mind the file order
2004-03-31 03:38:15.757551 2 1 122.153 -3.91901 11.04 2.03397 21.5 23
2004-02-28 00:59:16.02785 3 2 19.9884 37.0933 45.08 2.69964 24.5 20
2004-02-28 01:03:16.33393 11 3 19.3024 38.4629 45.08 2.68742 19.5 19
2004-02-28 01:06:16.013453 17 4 19.1652 38.8039 45.08 2.68742 22.5 15
2004-02-28 01:06:46.778088 18 5 19.175 38.8379 45.08 2.69964 24.5 12
2004-02-28 01:08:45.992524 22 6 19.1456 38.9401 45.08 2.68742 19.5 12
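The question also asks for comma-separated output, which the command above does not produce. A minimal sketch of one way to get it, assuming fields are separated by blanks and contain no commas themselves; the function name merge_to_csv and the array name loc are made up for this illustration:

```shell
# Hypothetical helper: merge_to_csv LOCATION_FILE DATA_FILE
# Prints every data row that has a matching location, as comma-separated values.
merge_to_csv() {
    awk -v OFS=',' '
    NR==FNR {                  # first file (location.txt)
        loc[$1] = $2 OFS $3    # store columns 2 and 3 under the key in column 1
        next
    }
    $4 in loc {                # second file (data.txt): key is column 4
        $1 = $1                # reassigning a field makes awk rebuild $0 with OFS
        print $0, loc[$4]
    }' "$1" "$2"
}
```

It would be called as merge_to_csv location.txt data.txt > result.csv. The date and time end up in separate CSV columns because they are separated by a space in data.txt. As for the command in the question: it is wrapped in double quotes, so the shell expands $1, $2 and so on before awk ever sees them, and it compares $2 rather than the key in $4.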
#2
With bash and join
join -1 1 -2 4 <(sort -k1,1 -n location.txt) <(sort -k4,4 -n data.txt) -o 2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,1.2,1.3
Output:
2004-03-31 03:38:15.757551 2 1 122.153 -3.91901 11.04 2.03397 21.5 23
2004-02-28 00:59:16.02785 3 2 19.9884 37.0933 45.08 2.69964 24.5 20
2004-02-28 01:03:16.33393 11 3 19.3024 38.4629 45.08 2.68742 19.5 19
2004-02-28 01:06:16.013453 17 4 19.1652 38.8039 45.08 2.68742 22.5 15
2004-02-28 01:06:46.778088 18 5 19.175 38.8379 45.08 2.69964 24.5 12
2004-02-28 01:08:45.992524 22 6 19.1456 38.9401 45.08 2.68742 19.5 12
See: man join
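One caveat when scaling this up: join expects both inputs ordered lexically on the join field, and numeric order stops matching lexical order once keys reach 10, so join may then complain about unsorted input or miss matches. A hedged sketch of a variant that sorts the way the GNU join manual suggests (sort -k 1b,1) and also converts the result to CSV with tr; the function name join_sorted_csv is hypothetical, and mktemp is assumed to be available:

```shell
# Hypothetical helper: join_sorted_csv LOCATION_FILE DATA_FILE
# Joins column 1 of LOCATION_FILE with column 4 of DATA_FILE, output as CSV.
join_sorted_csv() {
    loc_sorted=$(mktemp) || return 1
    # Sort lexically on the join field, ignoring leading blanks in the key
    sort -k 1b,1 "$1" > "$loc_sorted"
    sort -k 4b,4 "$2" |
        join -1 1 -2 4 \
             -o 2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,1.2,1.3 \
             "$loc_sorted" - |
        tr ' ' ','             # join separates output fields with spaces
    rm -f "$loc_sorted"
}
```

It would be called as join_sorted_csv location.txt data.txt > result.csv. Unlike the awk approach, the rows come out in lexical key order (1, 10, 11, 2, ...) rather than in the original order of data.txt.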