Filter folders whose name is a timestamp - pattern matching vs. regex matching using the find utility

Time: 2022-09-01 23:17:58

I am writing a generic shell script that filters files based on a given regex.

My shell script:

files=$(find $path -name $regex)

In one of the use cases, I want to filter folders inside a directory; the folder names have the following format:

20161128-20:34:33:432813246
YYYYMMDD-HH:MM:SS:NS

I am unable to come up with the correct regex.

I am able to get the path of the files inside the folder using the regex '*data.txt', as I know the name of the file inside it.

But it gives me the full path of the file, something like

/path/20161128-20:34:33:432813246/data.txt

What I want is simply:

/path/20161128-20:34:33:432813246

Please help me identify the correct regex for my requirement.

NOTE:

I know how to process the data after

files=$(find $path -name $regex)

But since the script needs to be generic for many use cases, I only need the correct regex that needs to be passed.

2 solutions

#1

  • Per POSIX, find's -name and -path primaries (tests) use patterns (a.k.a. wildcard expressions, globs) to match filenames and pathnames (while patterns and regular expressions are distantly related, their syntax and capabilities differ significantly; in short: patterns are syntactically simpler, but far less powerful).

    • -name matches the pattern against the basename (mere filename) part of an input path only

    • -path matches the pattern against the whole pathname (the full path)

  • Both GNU and BSD/macOS find implement nonstandard extensions:

    • -iname and -ipath, which work like their standard-compliant counterparts (based on patterns), except that they match case-insensitively.

    • -regex and -iregex, tests for matching pathnames by regex (regular expression).
      • Caveat: Both implementations offer at least 2 regex dialects to choose from (-E activates support for extended regular expressions in BSD find, and GNU find allows selecting from several dialects with -regextype), but no two dialects are exactly the same across the two implementations; see the bottom of this answer for the gory details.
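
A quick way to see the difference between matching the basename (-name) and the whole pathname (-path) is to run both against a throwaway tree. This is a minimal sketch, assuming a POSIX sh with mktemp available; the directory and file names are made up to mirror the question.

```shell
#!/bin/sh
# Sketch: -name vs. -path on a throwaway tree (names mirror the question).
set -eu

root=$(mktemp -d)
mkdir -p "$root/20161128-20:34:33:432813246"
touch "$root/20161128-20:34:33:432813246/data.txt"

# -name only ever sees the last path component ("data.txt" here).
by_name=$(find "$root" -name 'data.txt')

# -path sees the whole pathname, so the pattern must also cover the
# leading directories; '*' in a find pattern matches '/' as well.
by_path=$(find "$root" -path '*/2016*/data.txt')

# Without the leading '*/' the pattern would have to match the full path
# from its first character, so nothing is found.
no_match=$(find "$root" -path '2016*/data.txt')

printf 'name: %s\npath: %s\nnone: [%s]\n' "$by_name" "$by_path" "$no_match"
rm -rf "$root"
```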

Since your folder names follow a fixed-width naming scheme, a pattern will work:

pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
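
Because find -name uses the same POSIX pattern notation as the shell's case statement, a candidate pattern can be tried out without touching the filesystem. A minimal sketch; the helper name is_timestamp_name is made up for illustration:

```shell
#!/bin/sh
# Fixed-width pattern: 8-digit date, '-', HH:MM:SS, then 9 nanosecond digits.
pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

# Hypothetical helper: succeeds if $1 matches $pattern as a whole.
# $pattern is left unquoted on purpose so that case treats it as a pattern.
is_timestamp_name() {
  case $1 in
    $pattern) return 0 ;;
    *) return 1 ;;
  esac
}

for name in '20161128-20:34:33:432813246' '2016112-20:34:33:432813246'; do
  if is_timestamp_name "$name"; then
    echo "match:    $name"
  else
    echo "no match: $name"    # e.g. the 7-digit date in the second name
  fi
done
```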

Of course, you can take a shortcut if you don't expect false positives:

pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

Note how * and ?, unlike in a regex, are not repetition symbols (quantifiers) that refer to the preceding expression; by themselves, they represent any sequence of characters, including none (*), or any single character (?).
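
To see what the shortcut trades away, the sketch below checks one real folder name and one deliberately malformed name against both patterns, again using case as a stand-in for find -name (the glob_match helper is hypothetical). The malformed name slips through the shortcut because each ? accepts any character and the trailing * accepts anything, including nothing.

```shell
#!/bin/sh
strict='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
loose='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

# Hypothetical helper: glob_match PATTERN STRING succeeds on a whole-string match.
glob_match() {
  case $2 in
    $1) return 0 ;;
    *) return 1 ;;
  esac
}

for name in '20161128-20:34:33:432813246' '2016-1x:2y:3z:9'; do
  s=no; l=no
  glob_match "$strict" "$name" && s=yes
  glob_match "$loose" "$name" && l=yes
  echo "$name: strict=$s loose=$l"
done
# 20161128-20:34:33:432813246: strict=yes loose=yes
# 2016-1x:2y:3z:9: strict=no loose=yes
```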

If we put it all together:

files=$(find "$path" -type d -name "$pattern")

  • It's important to double-quote the variable references to protect their values from unwanted shell expansions, notably to preserve any whitespace in the path and to prevent the shell from globbing the value of $pattern prematurely.

  • Note that I've added -type d to limit matching to directories (folders), which improves performance.
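
Putting the pieces into a self-contained run: the sketch below builds a scratch directory (a stand-in for the real $path) containing one matching folder, one non-matching folder, and a regular file whose name also fits the pattern, so the effect of -type d is visible.

```shell
#!/bin/sh
set -eu

path=$(mktemp -d)   # scratch stand-in for the real search root
mkdir -p "$path/20161128-20:34:33:432813246" "$path/logs"
touch "$path/20161128-20:34:33:432813246/data.txt"
# A regular file whose name also fits the pattern; -type d filters it out.
touch "$path/20170101-00:00:00:000000000"

pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

# Quoting keeps the shell from splitting or globbing the values;
# find itself does the pattern matching.
files=$(find "$path" -type d -name "$pattern")
printf '%s\n' "$files"

rm -rf "$path"
```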


Optional background information:

Below is a regex feature matrix as of GNU find v4.6.0 and BSD find as found on macOS 10.12.1:

  • GNU find features are listed by the types supported by the -regextype option, with emacs being the default.

    • Note that several posix-*-named regex types are misnomers in that they support features beyond what POSIX mandates.

  • BSD find features are listed as basic (no regex option, which implies platform-flavored BREs) and extended (option -E, which implies platform-flavored EREs).

For cross-platform use, it is safe to stick with POSIX EREs (extended regular expressions) while using -regextype posix-extended with GNU find and -E with BSD find, but note that not all features you may expect are supported, notably \b, \</\> and character class shortcuts such as \d.
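
One way to apply that advice in a script that must run on both platforms is to pick the flag at run time. In this sketch the helper name find_ere is made up, and GNU find is detected on the assumption that only GNU find accepts --version (BSD find rejects it as an illegal option). Since -regex must match the entire pathname, the example regex starts with .*/.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: find_ere ROOT ERE [more find tests...]
# Runs find with POSIX extended regex syntax on GNU or BSD find.
find_ere() {
  root=$1 ere=$2
  shift 2
  if find --version >/dev/null 2>&1; then
    # GNU find: -regextype must come before the -regex it applies to.
    find "$root" -regextype posix-extended "$@" -regex "$ere"
  else
    # BSD/macOS find: -E is an option placed before the starting path.
    find -E "$root" "$@" -regex "$ere"
  fi
}

# Demo on a scratch tree: only the .txt file's full path matches.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.log"
hits=$(find_ere "$dir" '.*/[a-z]+\.txt' -type f)
printf '%s\n' "$hits"
rm -rf "$dir"
```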

=================== GNU find ===================
== REGEX FEATURE: \{\}
TYPE: awk:                                        -
TYPE: egrep:                                      -
TYPE: ed:                                         ✓
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    -
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                -
TYPE: posix-extended:                             -
TYPE: posix-minimal-basic:                        ✓
TYPE: sed:                                        ✓
== REGEX FEATURE: {}
TYPE: awk:                                        -
TYPE: egrep:                                      ✓
TYPE: ed:                                         -
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       -
TYPE: posix-awk:                                  ✓
TYPE: posix-basic:                                -
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        -
== REGEX FEATURE: \+
TYPE: awk:                                        -
TYPE: egrep:                                      -
TYPE: ed:                                         ✓
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    -
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                -
TYPE: posix-extended:                             -
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        ✓
== REGEX FEATURE: +
TYPE: awk:                                        ✓
TYPE: egrep:                                      ✓
TYPE: ed:                                         -
TYPE: emacs:                                      ✓
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       -
TYPE: posix-awk:                                  ✓
TYPE: posix-basic:                                -
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        -
== REGEX FEATURE: \b
TYPE: awk:                                        -
TYPE: egrep:                                      ✓
TYPE: ed:                                         ✓
TYPE: emacs:                                      ✓
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        ✓
TYPE: sed:                                        ✓
== REGEX FEATURE: \< \>
TYPE: awk:                                        -
TYPE: egrep:                                      ✓
TYPE: ed:                                         ✓
TYPE: emacs:                                      ✓
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        ✓
TYPE: sed:                                        ✓
== REGEX FEATURE: [:digit:]
TYPE: awk:                                        ✓
TYPE: egrep:                                      ✓
TYPE: ed:                                         ✓
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  ✓
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        ✓
TYPE: sed:                                        ✓
== REGEX FEATURE: \d
TYPE: awk:                                        -
TYPE: egrep:                                      -
TYPE: ed:                                         -
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    -
TYPE: grep:                                       -
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                -
TYPE: posix-egrep:                                -
TYPE: posix-extended:                             -
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        -
== REGEX FEATURE: \s
TYPE: awk:                                        ✓
TYPE: egrep:                                      ✓
TYPE: ed:                                         -
TYPE: emacs:                                      ✓
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       -
TYPE: posix-awk:                                  ✓
TYPE: posix-basic:                                -
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        -
=================== BSD find ===================
== REGEX FEATURE: \{\}
TYPE: basic:                                      ✓
TYPE: extended:                                   -
== REGEX FEATURE: {}
TYPE: basic:                                      -
TYPE: extended:                                   ✓
== REGEX FEATURE: \+
TYPE: basic:                                      -
TYPE: extended:                                   -
== REGEX FEATURE: +
TYPE: basic:                                      -
TYPE: extended:                                   ✓
== REGEX FEATURE: \b
TYPE: basic:                                      -
TYPE: extended:                                   -
== REGEX FEATURE: \< \>
TYPE: basic:                                      -
TYPE: extended:                                   -
== REGEX FEATURE: [:digit:]
TYPE: basic:                                      ✓
TYPE: extended:                                   ✓
== REGEX FEATURE: \d
TYPE: basic:                                      -
TYPE: extended:                                   -
== REGEX FEATURE: \s
TYPE: basic:                                      -
TYPE: extended:                                   ✓

#2

When you have the full path of a file, you don't need a regex to extract the directory name.

dirname "/path/20161128-20:34:33:432813246/data.txt" 

will give you

/path/20161128-20:34:33:432813246
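
Applied to the question's setup, that means letting find locate the data.txt files and stripping the last path component from each result. A minimal sketch, assuming the folders of interest are exactly the ones that contain a data.txt:

```shell
#!/bin/sh
set -eu

path=$(mktemp -d)   # scratch stand-in for the real search root
mkdir -p "$path/20161128-20:34:33:432813246"
touch "$path/20161128-20:34:33:432813246/data.txt"

# Run dirname on every data.txt that find reports.
dirs=$(find "$path" -name 'data.txt' -exec dirname {} \;)
printf '%s\n' "$dirs"

rm -rf "$path"
```

With GNU find, -printf '%h\n' prints the leading directories directly, but that primary is not available in BSD find.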

If you really want a regex, try this (written in POSIX ERE syntax, because none of find's regex dialects support \d):

[0-9]{8}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{9}
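
To run a regex along these lines through find itself, it has to go through -regex rather than -name, in a dialect find understands. The sketch below assumes GNU find: it spells digits as [0-9] because none of find's dialects support \d, selects ERE syntax with -regextype posix-extended so that {n} intervals work, and prefixes .*/ because -regex must match the whole path. On BSD/macOS find, the equivalent would be find -E "$path" -type d -regex "$ere".

```shell
#!/bin/sh
set -eu

path=$(mktemp -d)   # scratch stand-in for the real search root
mkdir -p "$path/20161128-20:34:33:432813246" "$path/2016-1x:2y:3z:9"

ere='.*/[0-9]{8}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{9}'

# GNU find only: choose the ERE dialect before -regex.
files=$(find "$path" -regextype posix-extended -type d -regex "$ere")
printf '%s\n' "$files"

rm -rf "$path"
```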
