I am writing a generic shell script that filters files based on a given regex.
My shell script:
files=$(find $path -name $regex)
In one of the cases, I want to match folders inside a directory. The folder names have the following format:
20161128-20:34:33:432813246
YYYYMMDD-HH:MM:SS:NS
I am unable to arrive at the correct regex.
I am able to get the path of the files inside the folder using the regex '*data.txt', as I know the name of the file inside it.
But it gives me the full path of the file, something like:
/path/20161128-20:34:33:432813246/data.txt
What I want is simply:
/path/20161128-20:34:33:432813246
Please help me identify the correct regex for my requirement.
NOTE:
I know how to process the data after
files=$(find $path -name $regex)
But since the script needs to be generic for many use cases, I only need the correct regex to pass in.
2 answers
#1
3
- Per POSIX, find's -name and -path primaries (tests) use patterns (a.k.a. wildcard expressions, globs) to match filenames and pathnames. While patterns and regular expressions are distantly related, their syntax and capabilities differ significantly; in short, patterns are syntactically simpler but far less powerful.
  - -name matches the pattern against the basename (mere filename) part of an input path only.
  - -path matches the pattern against the whole pathname (the full path).
- Both GNU and BSD/macOS find implement nonstandard extensions:
  - -iname and -ipath, which work like their standard-compliant counterparts (based on patterns), except that they match case-insensitively.
  - -regex and -iregex tests for matching pathnames by regex (regular expression).
    - Caveat: Both implementations offer at least 2 regex dialects to choose from (-E activates support for extended regular expressions in BSD find, and GNU find allows selecting from several dialects with -regextype), but no two dialects are exactly the same across the two implementations; see the bottom for the gory details.
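To make the difference between -name and -path concrete, here is a minimal sketch. The scratch directory and the file names in it are made up for the demonstration and are not part of the original question.

```shell
# Hypothetical scratch tree, created only for this demonstration
demo=$(mktemp -d)
mkdir -p "$demo/logs/app"
touch "$demo/logs/app/data.txt" "$demo/data.txt"

# -name compares the pattern with the basename only, so both files match
by_name=$(find "$demo" -type f -name 'data.txt' | sort)

# -path compares the pattern with the whole pathname;
# in GNU and BSD find, * may span slashes here
by_path=$(find "$demo" -type f -path '*/logs/*' | sort)

printf '%s\n' "-name matches:" "$by_name" "-path matches:" "$by_path"
```

Only the file below logs/ is reported by the -path test, while -name reports both.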
With your folder names following a fixed-width naming scheme, a pattern would work:
pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Of course, you can take a shortcut if you don't expect false positives:
pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'
Note how * and ?, unlike in a regex, are not duplication symbols (quantifiers) that refer to the preceding expression, but by themselves represent any sequence of characters (*) or any single character (?).
If we put it all together:
files=$(find "$path" -type d -name "$pattern")
- It's important to double-quote the variable references to protect their values from unwanted shell expansions, notably to preserve any whitespace in the path and to prevent the shell from prematurely globbing the value of $pattern.
- Note that I've added -type d to limit matching to directories (folders), which improves performance.
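The command above can be tried end to end with the sketch below. The scratch directory and the extra folder names are assumptions for the demo, and building the pattern from the helper variables d, d2, d8 and d9 is only a convenience to avoid typing 25 bracket expressions by hand.

```shell
# Hypothetical scratch tree; the folder names are made up for the demo
path=$(mktemp -d)
mkdir "$path/20161128-20:34:33:432813246" "$path/notes" "$path/2016-backup"
touch "$path/20161128-20:34:33:432813246/data.txt"

# Assemble the fixed-width glob from runs of single-digit bracket expressions
d='[0-9]'
d2="$d$d"
d8="$d2$d2$d2$d2"
d9="$d8$d"
pattern="$d8-$d2:$d2:$d2:$d9"   # YYYYMMDD-HH:MM:SS:NS

# Quoted expansions keep the shell from globbing the pattern itself
files=$(find "$path" -type d -name "$pattern")
printf '%s\n' "$files"
```

Only the timestamp-named folder is reported; notes and 2016-backup do not fit the fixed-width pattern.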
Optional background information:
Below is a regex feature matrix as of GNU find v4.6.0 / BSD find as found on macOS 10.12.1:
- GNU find features are listed by the types supported by the -regextype option, with emacs being the default.
  - Note that several posix-*-named regex types are misnomers in that they support features beyond what POSIX mandates.
- BSD find features are listed by basic (using NO regex option, which implies platform-flavored BREs) and extended (using option -E, which implies platform-flavored EREs).
For cross-platform use, sticking with POSIX EREs (extended regular expressions) while using -regextype posix-extended with GNU find and using -E with BSD find is safe, but note that not all features you may expect will be supported, notably \b, \</\> and character class shortcuts such as \d.
=================== GNU find ===================
== REGEX FEATURE: \{\}
TYPE: awk: -
TYPE: egrep: -
TYPE: ed: ✓
TYPE: emacs: -
TYPE: gnu-awk: -
TYPE: grep: ✓
TYPE: posix-awk: -
TYPE: posix-basic: ✓
TYPE: posix-egrep: -
TYPE: posix-extended: -
TYPE: posix-minimal-basic: ✓
TYPE: sed: ✓
== REGEX FEATURE: {}
TYPE: awk: -
TYPE: egrep: ✓
TYPE: ed: -
TYPE: emacs: -
TYPE: gnu-awk: ✓
TYPE: grep: -
TYPE: posix-awk: ✓
TYPE: posix-basic: -
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: -
TYPE: sed: -
== REGEX FEATURE: \+
TYPE: awk: -
TYPE: egrep: -
TYPE: ed: ✓
TYPE: emacs: -
TYPE: gnu-awk: -
TYPE: grep: ✓
TYPE: posix-awk: -
TYPE: posix-basic: ✓
TYPE: posix-egrep: -
TYPE: posix-extended: -
TYPE: posix-minimal-basic: -
TYPE: sed: ✓
== REGEX FEATURE: +
TYPE: awk: ✓
TYPE: egrep: ✓
TYPE: ed: -
TYPE: emacs: ✓
TYPE: gnu-awk: ✓
TYPE: grep: -
TYPE: posix-awk: ✓
TYPE: posix-basic: -
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: -
TYPE: sed: -
== REGEX FEATURE: \b
TYPE: awk: -
TYPE: egrep: ✓
TYPE: ed: ✓
TYPE: emacs: ✓
TYPE: gnu-awk: ✓
TYPE: grep: ✓
TYPE: posix-awk: -
TYPE: posix-basic: ✓
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: ✓
TYPE: sed: ✓
== REGEX FEATURE: \< \>
TYPE: awk: -
TYPE: egrep: ✓
TYPE: ed: ✓
TYPE: emacs: ✓
TYPE: gnu-awk: ✓
TYPE: grep: ✓
TYPE: posix-awk: -
TYPE: posix-basic: ✓
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: ✓
TYPE: sed: ✓
== REGEX FEATURE: [:digit:]
TYPE: awk: ✓
TYPE: egrep: ✓
TYPE: ed: ✓
TYPE: emacs: -
TYPE: gnu-awk: ✓
TYPE: grep: ✓
TYPE: posix-awk: ✓
TYPE: posix-basic: ✓
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: ✓
TYPE: sed: ✓
== REGEX FEATURE: \d
TYPE: awk: -
TYPE: egrep: -
TYPE: ed: -
TYPE: emacs: -
TYPE: gnu-awk: -
TYPE: grep: -
TYPE: posix-awk: -
TYPE: posix-basic: -
TYPE: posix-egrep: -
TYPE: posix-extended: -
TYPE: posix-minimal-basic: -
TYPE: sed: -
== REGEX FEATURE: \s
TYPE: awk: ✓
TYPE: egrep: ✓
TYPE: ed: -
TYPE: emacs: ✓
TYPE: gnu-awk: ✓
TYPE: grep: -
TYPE: posix-awk: ✓
TYPE: posix-basic: -
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: -
TYPE: sed: -
=================== BSD find ===================
== REGEX FEATURE: \{\}
TYPE: basic: ✓
TYPE: extended: -
== REGEX FEATURE: {}
TYPE: basic: -
TYPE: extended: ✓
== REGEX FEATURE: \+
TYPE: basic: -
TYPE: extended: -
== REGEX FEATURE: +
TYPE: basic: -
TYPE: extended: ✓
== REGEX FEATURE: \b
TYPE: basic: -
TYPE: extended: -
== REGEX FEATURE: \< \>
TYPE: basic: -
TYPE: extended: -
== REGEX FEATURE: [:digit:]
TYPE: basic: ✓
TYPE: extended: ✓
== REGEX FEATURE: \d
TYPE: basic: -
TYPE: extended: -
== REGEX FEATURE: \s
TYPE: basic: -
TYPE: extended: ✓
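As an illustration of the cross-platform advice, the sketch below matches the same folder names with a POSIX ERE. It assumes that any find that does not identify itself as GNU via --version is a BSD-style find that accepts -E; other implementations may need different handling. The scratch directory is made up for the demo.

```shell
# Hypothetical scratch tree for the demonstration
path=$(mktemp -d)
mkdir "$path/20161128-20:34:33:432813246" "$path/misc"

# POSIX ERE; -regex has to match the WHOLE pathname, hence the leading .*/
regex='.*/[0-9]{8}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{9}'

if find --version 2>/dev/null | grep -q 'GNU'; then
  # GNU find: select the dialect with -regextype, placed before -regex
  dirs=$(find "$path" -regextype posix-extended -type d -regex "$regex")
else
  # Assumed BSD/macOS find: -E goes before the start path
  dirs=$(find -E "$path" -type d -regex "$regex")
fi
printf '%s\n' "$dirs"
```

The regex uses only {} intervals and bracket expressions, which the matrix above marks as supported by both posix-extended and BSD extended.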
#2
-1
When you have the full path of a file, you don't need a regex to extract the directory name.
dirname "/path/20161128-20:34:33:432813246/data.txt"
will give you
/path/20161128-20:34:33:432813246
If you really want a regex, try this:
\d{8}-\d{2}:\d{2}:\d{2}:\d{9}
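A minimal sketch of the dirname approach combined with the asker's '*data.txt' search follows. It assumes exactly one matching file and uses a made-up scratch directory; with several matches, each line of find's output would need its own dirname call.

```shell
# Hypothetical scratch tree for the demonstration
path=$(mktemp -d)
mkdir "$path/20161128-20:34:33:432813246"
touch "$path/20161128-20:34:33:432813246/data.txt"

# Locate the file by name, as in the question (single match assumed)
file=$(find "$path" -type f -name '*data.txt')

# Strip the last path component to get the containing folder
dir=$(dirname "$file")

# Same result with POSIX parameter expansion, without an external command
dir2=${file%/*}

printf '%s\n' "$dir" "$dir2"
```

Note that the \d shorthand in the regex above is, per the feature matrix in answer #1, not recognized by find's -regex in any listed dialect, so [0-9] would be needed if the expression is passed to find.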