How do I find the file with the highest-numbered filename in each directory?

Time: 2022-08-30 17:29:12

I have a file structure that looks like this

./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin

and I would like to find the file path to the .bin file in each directory that has the highest number as its filename. So the output I am looking for would be

./501.res/1.bin
./503.res/2.bin
./504.res/1.bin

The highest number a file can have is 9.

Question

How do I do that in BASH?

I have come as far as find .|grep bin|sort

6 Answers

#1


1  

What about using awk? You can get the FIRST occurrence really simply:

[ghoti@pc ~]$ cat data1
./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$ awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' data1
./501.res/1.bin
./503.res/1.bin
./504.res/1.bin
[ghoti@pc ~]$ 

To get the last occurrence you could pipe through a couple of sorts:

[ghoti@pc ~]$ sort -r data1 | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort
./501.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$ 

Given that you're using "find" and "grep", you could probably do this:

find . -name \*.bin -type f -print | sort -r | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort

How does this work?

The find command has many useful options, including the ability to select your files by glob, select the type of file, etc. Its output you already know, and that becomes the input to sort -r.

First, we sort our input data in reverse (sort -r). This ensures that within any directory, the highest numbered file will show up first. That result gets fed into awk. FS is the field separator, which makes $2 into things like "/501", "/502", etc. Awk scripts have sections in the form of condition {action} which get evaluated for each line of input. If a condition is missing, the action runs on every line. If "1" is the condition and there is no action, it prints the line. So this script is broken out as follows:

  • a[$2] {next} - If the array a with the subscript $2 (i.e. "/501") exists, just jump to the next line. Otherwise...

  • {a[$2]=1} - set the array a subscript $2 to 1, so that in future the first condition will evaluate as true, then...

  • 1 - print the line.

The output of this awk script will be the data you want, but in reverse order. The final sort puts things back in the order you'd expect.

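As a tiny standalone illustration of the same condition {action} idiom, here is a quick sketch on made-up input (not the question's data) that keeps only the first line seen for each key:

$ printf '%s\n' aa bb aa cc bb | awk 'a[$1] {next} {a[$1]=1} 1'
aa
bb
cc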

Now ... that's a lot of pipes, and sort can be a bit resource hungry when you ask it to deal with millions of lines of input at the same time. This solution will be perfectly sufficient for small numbers of files, but if you're dealing with large quantities of input, let us know, and I can come up with an all-in-one awk solution (that will take longer than 60 seconds to write).

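For the record, a single-pass sketch of such an all-in-one approach (a hypothetical variant, not the promised version) could track the highest filename per directory in an array; the plain string comparison is enough here only because the question guarantees single-digit filenames:

find . -name '*.bin' -type f -print | awk -F/ '
  {
    # $2 is the directory ("501.res"), $3 the filename ("1.bin");
    # remember the lexically largest filename seen per directory
    if (!($2 in best) || $3 > best[$2]) best[$2] = $3
  }
  END { for (d in best) print "./" d "/" best[d] }' | sort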

UPDATE

Per Dennis' sage advice, the awk script I included above could be improved by changing it from

BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1

to

BEGIN{FS="."} $2 in a {next} {a[$2]} 1

While this is functionally identical, the advantage is that you simply define array members rather than assigning values to them, which may save memory or CPU depending on your implementation of awk. At any rate, it's cleaner.

#2


3  

Globs are guaranteed to be expanded in lexical order.

for dir in ./*/
do
    files=("$dir"*)          # create an array ($dir already ends with /)
    echo "${files[@]: -1}"   # access its last member
done
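
If only the .res directories and their .bin files should be considered, and a directory might contain none, a slightly narrower sketch of the same idea (assuming the layout from the question) could be:

for dir in ./*.res/
do
    files=("$dir"*.bin)                      # .bin files only, in lexical order
    [ -e "${files[0]}" ] && echo "${files[@]: -1}"
done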

#3


2  

Tested:

find . -type d -name '*.res' | while read -r dir; do
    find "$dir" -maxdepth 1 | sort -n | tail -n 1
done
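
If directory names could ever contain spaces or other awkward characters, a null-delimited sketch of the same loop (not part of the original answer) is more robust:

find . -type d -name '*.res' -print0 | while IFS= read -r -d '' dir; do
    find "$dir" -maxdepth 1 -name '*.bin' | sort | tail -n 1
done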

#4


0  

I came up with something like this:

for dir in $(find . -mindepth 1 -type d | sort); do
   file=$(ls "$dir" | sort | tail -n 1);
   [ -n "$file" ] && (echo "$dir/$file");
done

Maybe it can be made simpler.

#5


0  

If invoking a shell from within find is an option, try this:

  find * -type d -exec sh -c "echo -n './'; ls -1 {}/*.bin | sort -n -r | head -n 1" \;
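
Embedding {} inside the quoted sh -c string happens to work here, but passing the directories to the shell as arguments is generally safer; a sketch of that variant (not the original answer) might be:

find . -mindepth 1 -type d -exec sh -c '
    for dir in "$@"; do
        ls "$dir"/*.bin 2>/dev/null | sort -r | head -n 1
    done' sh {} +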

#6


0  

And here is a one-liner:

find . -mindepth 1 -type d | sort | sed -e "s/.*/ls & | sort | tail -n 1 | xargs -I{} echo &\/{}/" | bash
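
To preview the commands this builds before executing them, drop the final | bash; with the layout from the question the intermediate output would look roughly like this:

$ find . -mindepth 1 -type d | sort | sed -e "s/.*/ls & | sort | tail -n 1 | xargs -I{} echo &\/{}/"
ls ./501.res | sort | tail -n 1 | xargs -I{} echo ./501.res/{}
ls ./503.res | sort | tail -n 1 | xargs -I{} echo ./503.res/{}
ls ./504.res | sort | tail -n 1 | xargs -I{} echo ./504.res/{}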
