How to create a sub-array in awk?

Date: 2021-05-26 16:04:10

Given a list like:

Dog bone
Cat catnip
Human ipad
Dog collar
Dog collar
Cat collar
Human car
Human laptop
Cat catnip
Human ipad

How can I get results like this, using awk:

Dog bone 1
Dog collar 2
Cat catnip 2
Cat collar 1
Human car 1
Human laptop 1
Human ipad 2

Do I need a sub-array? It seems to me like I need an array of "owners" which is populated by arrays of "things."

I'd like to use awk to do this, as this is part of a larger awk program, and for now I'd rather not create a separate program.

By the way, I can already do it using sort and grep -c, and a few other pipes, but I really won't be able to do that on gigantic data files, as it would be too slow. Awk is generally much faster for this kind of thing, I'm told.

 Thanks, 
 Kevin

EDIT: Be aware that the columns are not actually next to each other like this in the real file; they are more like columns $8 and $11. I say this because I suppose if they were next to each other I could use an awk regex like ~/Dog\ Collar/ or something, but I won't have that option. -thanks!

2 answers

#1

awk does not have true multi-dimensional arrays, but you can manage by constructing 2D-ish array keys:

awk '{count[$1 " " $2]++} END {for (key in count) print key, count[key]}' | sort

which, from your input, outputs

Cat catnip 2
Cat collar 1
Dog bone 1
Dog collar 2
Human car 1
Human ipad 2
Human laptop 1
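The for (key in count) loop visits keys in an unspecified order, which is why the output is piped through sort. A minimal self-contained run on the sample list from the question, assuming a POSIX shell; the wrapper function count_pairs is a hypothetical name added only so the one-liner can be reused:

```shell
# count_pairs: count each "owner thing" pair read from stdin.
# awk's for-in order is unspecified, so the result is sorted afterwards.
count_pairs() {
  awk '{count[$1 " " $2]++} END {for (key in count) print key, count[key]}' | sort
}

# The sample list from the question.
count_pairs <<'EOF'
Dog bone
Cat catnip
Human ipad
Dog collar
Dog collar
Cat collar
Human car
Human laptop
Cat catnip
Human ipad
EOF
```

This prints the same seven lines shown above, in sorted order.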

Here, I use a space to separate the key values. If your data contains spaces, you can use some other character that does not appear in your input. I typically use array[$a FS $b] when I have a specific field separator, since that's guaranteed not to appear in the field values.
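Since the real data keeps the two values in columns $8 and $11 rather than side by side, the same idea works with any two field numbers. A minimal sketch, assuming whitespace-separated input; the function name count_fields and the awk variables a and b are hypothetical, and it joins the key with awk's built-in SUBSEP (by default the control character \034, which is unlikely to appear in text) and splits it back apart in END:

```shell
# count_fields A B: count each distinct combination of fields $A and $B
# read from stdin (hypothetical helper, whitespace-separated input assumed).
count_fields() {
  awk -v a="$1" -v b="$2" '
    { count[$a SUBSEP $b]++ }          # compound key joined with SUBSEP
    END {
      for (key in count) {
        split(key, part, SUBSEP)       # recover the two field values
        print part[1], part[2], count[key]
      }
    }' | sort                          # for-in order is unspecified
}
```

For the file described in the question, something like count_fields 8 11 < yourfile.txt would count each owner/thing pair (the file name is a placeholder).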

#2

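GNU Awk, like other awks, has some support for multi-dimensional arrays through the array[i, j] syntax, but it's really just concatenating the keys, separated by the SUBSEP character, to form a sort of compound key. (GNU Awk 4.0 and later also support true arrays of arrays, written array[i][j].)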
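The key concatenation can be seen directly: in portable awk, count[$1, $2] stores its element under the key $1 SUBSEP $2, and split(key, part, SUBSEP) recovers the two subscripts. A minimal sketch using only standard awk features; the function name group_by_owner is hypothetical. It also emulates the "owners containing things" layout from the question by remembering each owner in first-seen order:

```shell
# group_by_owner: read "owner thing" lines from stdin and print, for each
# owner in order of first appearance, the count of each of its things.
group_by_owner() {
  awk '
    {
      count[$1, $2]++                  # stored under the key $1 SUBSEP $2
      if (!($1 in seen)) {             # remember owners in first-seen order
        seen[$1] = 1
        owner[++n] = $1
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        for (key in count) {
          split(key, part, SUBSEP)     # recover the two original subscripts
          if (part[1] == owner[i])
            print part[1], part[2], count[key]
        }
      }
    }'
}
```

On the question's sample list this prints the Dog, Cat, and Human groups in that order. The order of things inside each group is still unspecified, and the END loop scans every key once per owner, which is fine when the number of distinct pairs is modest.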

I'd recommend learning Perl, which will be fairly familiar to you if you like awk, but Perl supports true Lists of Lists. In general, Perl will take you much further than awk.

Re your comment:

I'm not trying to be superior. I understand you asked how to accomplish a task with a specific tool, awk. I did give a link to the documentation for simulating multi-dimensional arrays in awk. But awk doesn't do that task well, and it was effectively replaced by Perl nearly 20 years ago.

If you ask how to cross a lake on a bicycle, and I tell you it'll be easier in a boat, I don't think that's unreasonable. If I tell you it'll be easier to first build a bridge, or first invent a Star Trek transporter, then that would be unreasonable.
