Bash script - split a string using a regular expression delimiter

Time: 2022-04-11 19:26:45

I want to split a string such as 'substring1 substring2 ONCE[0,10s] substring3'. With 'ONCE[0,10s]' as the delimiter, the expected result is:

substring1 substring2
substring3

The problem is that the number (and its unit) in the delimiter varies, for example 'ONCE[0,1s]', 'ONCE[0,3m]', 'ONCE[0,10d]' and so on.

How can I do this in a bash script? Any ideas?

Thank you

3 solutions

#1


3  

The example provided in the OP (as well as the two answers provided by @GlennJackman and @devnull) assumes that the actual question could have been:

In bash, how do I replace the match for a regular expression in a string with a newline.

That's not actually the same as "split a string using a regular expression", unless you add the constraint that the string does not contain any newline characters. And even then, it's not actually "splitting" the string; the presumption is that some other process will use a newline to split the result.

Once the question has been reformulated, the solution is not challenging. You could use any tool which supports regular expressions, such as sed:

sed 's/ *ONCE\[[^]]*] */\n/g' <<<"$variable"

(Remove the g if you only want to replace the first sequence; you may need to adjust the regular expression, since it wasn't quite clear what the desired constraints are.)
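
As a rough sketch of the "some other process" step, the newline-separated output can be read back into a bash array. This assumes GNU sed (which turns \n in the replacement into a newline) and bash 4 or later for mapfile; the array name parts is only illustrative.

```bash
#!/usr/bin/env bash
# Assumptions: GNU sed (\n in the replacement means newline), bash 4+ (mapfile).
variable='substring1 substring2 ONCE[0,10s] substring3'

# Replace every delimiter with a newline, then store one array element per line.
mapfile -t parts < <(sed 's/ *ONCE\[[^]]*] */\n/g' <<<"$variable")

# Print each piece in angle brackets so stray spaces would be visible.
printf '<%s>\n' "${parts[@]}"
```

This only yields the intended pieces when the input itself contains no newline, which is the constraint mentioned above.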

bash itself does not provide a replace-all primitive that uses regular expressions. It does have glob "patterns", though, and if the extglob option is set (it is the default on some distributions; otherwise enable it with shopt -s extglob), those patterns are powerful enough to express this delimiter, so you could use:

echo "${variable//*( )ONCE\[*([^]])]*( )/$'\n'}"

Again, you can make the substitution happen only once by changing // to /, and you may need to change the pattern to meet your precise needs.
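
A small sketch of the difference between // and /, assuming a sample string with two delimiters. It enables extglob explicitly and keeps the newline in a variable (nl, a name chosen here for illustration) so the replacement does not depend on how $'\n' is treated inside double quotes.

```bash
#!/usr/bin/env bash
shopt -s extglob   # needed for the *( ... ) pattern syntax

variable='a ONCE[0,1s] b ONCE[0,3m] c'
nl=$'\n'

# // replaces every delimiter, / replaces only the first one.
all="${variable//*( )ONCE\[*([^]])]*( )/$nl}"
first="${variable/*( )ONCE\[*([^]])]*( )/$nl}"

printf '%s\n---\n%s\n' "$all" "$first"
```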

That leaves open the question of how to actually split a bash variable using a delimiter specified by a regular expression, for some definition of "split". One possible definition is "call a function with the parts of the string as arguments"; that's the one which we use here:

# Usage:
# call_with_split <pattern> <string> <cmd> <args>...
# Splits string according to regular expression pattern and then invokes
# cmd args string-pieces
call_with_split () { 
  if [[ $2 =~ ($1).* ]]; then
    call_with_split "$1" \
                    "${2:$((${#2} - ${#BASH_REMATCH[0]} + ${#BASH_REMATCH[1]}))}" \
                    "${@:3}" \
                    "${2:0:$((${#2} - ${#BASH_REMATCH[0]}))}"
  else
    "${@:3}" "$2"
  fi
}

Example:

$ var="substring1 substring2 ONCE[0,10s] substring3"
$ call_with_split " ONCE\[[^]]*] " "$var" printf "%s\n"
substring1 substring2
substring3
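
Because the command is invoked in the current shell rather than in a subshell, it can also store the pieces instead of printing them. The sketch below repeats the function so it runs on its own; collect and pieces are hypothetical names introduced only for this illustration, and the input has two delimiters.

```bash
#!/usr/bin/env bash
# Same function as above, repeated so this example is self-contained.
call_with_split () {
  if [[ $2 =~ ($1).* ]]; then
    call_with_split "$1" \
                    "${2:$((${#2} - ${#BASH_REMATCH[0]} + ${#BASH_REMATCH[1]}))}" \
                    "${@:3}" \
                    "${2:0:$((${#2} - ${#BASH_REMATCH[0]}))}"
  else
    "${@:3}" "$2"
  fi
}

# Hypothetical helper: saves all of its arguments in the global array "pieces".
collect () { pieces=("$@"); }

var="first ONCE[0,1s] second ONCE[0,3m] third"
call_with_split " ONCE\[[^]]*] " "$var" collect

printf '%d pieces\n' "${#pieces[@]}"
printf '<%s>\n' "${pieces[@]}"
```

Note that a pattern which can match the empty string would make the recursion loop forever, so the delimiter pattern should always consume at least one character.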

#2


2  

bash:

s='substring1 substring2 ONCE[0,10s] substring3'

if [[ $s =~ (.+)" ONCE["[0-9]+,[0-9]+[smhd]"] "(.+) ]]; then
    echo "${BASH_REMATCH[1]}"
    echo "${BASH_REMATCH[2]}"
else 
    echo no match
fi
substring1 substring2
substring3
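
The match above handles one delimiter. With several delimiters, the leading (.+) takes as much as it can, so the split happens at the last one. A possible extension, sketched here with a sample string and variable names (re, parts) that are not from the original, peels pieces off from the right in a loop:

```bash
#!/usr/bin/env bash
s='a ONCE[0,1s] b ONCE[0,3m] c'
# Same delimiter as above, kept in a variable so no quoting is needed inside [[ ]].
re='(.+) ONCE\[[0-9]+,[0-9]+[smhd]\] (.+)'

parts=()
while [[ $s =~ $re ]]; do
  # Group 2 is the text after the last delimiter; prepend it to the result.
  parts=("${BASH_REMATCH[2]}" "${parts[@]}")
  # Continue with everything before that delimiter.
  s=${BASH_REMATCH[1]}
done
# Whatever is left contains no delimiter and is the first piece.
parts=("$s" "${parts[@]}")

printf '<%s>\n' "${parts[@]}"
```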

#3


1  

You could use awk. Specify the field separator as:

'ONCE[[]0,[^]]*[]] *'

For example, using your sample input:

$ awk -F 'ONCE[[]0,[^]]*[]] *' '{for(i=1;i<=NF;i++){print $i}}' <<< "substring1 substring2 ONCE[0,10s] substring3"
substring1 substring2 
substring3
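
The first line of that output keeps a trailing space, because the separator does not consume the spaces before ONCE. A variant, sketched under the assumption that those spaces should be dropped and that bash 4+ is available for mapfile (the array name fields is illustrative), adds " *" in front of ONCE and reads the result into an array:

```bash
#!/usr/bin/env bash
input='substring1 substring2 ONCE[0,10s] substring3'

# The leading " *" absorbs the spaces in front of the delimiter as well.
mapfile -t fields < <(
  awk -F ' *ONCE[[]0,[^]]*[]] *' '{ for (i = 1; i <= NF; i++) print $i }' <<<"$input"
)

printf '<%s>\n' "${fields[@]}"
```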
