How to replace spaces and backslashes in a string in bash?

Time: 2022-02-04 16:50:08

Given the string:

foo='Hello     \    
World! \  
x

we are friends

here we are'

Suppose there are also tab characters mixed with the spaces before or after the \ character. I want to replace the spaces, tabs and the backslash with a single space. I tried:

echo "$foo" | tr "[\s\t]\\\[\s\t]\n\[\s\t]" " " | tr -s " "

Returns:

Hello World! x we are friend here we are 

And the result I need is:

Hello World! x

we are friends

here we are

Any idea, tip or trick to do it? Can I get the result I want with a single command?

9 solutions

#1


3  

The following one-liner gives the desired result:

echo "$foo" | tr '\n' '\r' | sed 's,\s*\\\s*, ,g' | tr '\r' '\n'
Hello World!

we are friends

here we are

Explanation:

tr '\n' '\r' replaces each newline with a carriage return, so sed sees the whole input as a single line and \s can match across the original line breaks.

sed 's,\s*\\\s*, ,g' converts each run of whitespace with an embedded \ into one space (\s is a GNU sed extension).

tr '\r' '\n' turns the remaining carriage returns back into newlines.
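
To watch the three stages work on the question's full input (with a tab thrown in), here is a minimal sketch. It swaps \s for the POSIX class [[:space:]], which is my substitution rather than part of the answer, and the variable name result and the printf construction of foo are only for illustration.

```shell
# Build the sample string with printf so the trailing spaces and the tab are explicit.
foo=$(printf 'Hello     \\    \nWorld! \\  \t\nx\n\nwe are friends\n\nhere we are')

# 1. tr turns every newline into a carriage return: sed now sees one long line.
# 2. sed replaces whitespace + backslash + whitespace (carriage returns included) with one space.
# 3. tr turns the carriage returns that are left back into newlines.
result=$(printf '%s\n' "$foo" |
  tr '\n' '\r' |
  sed 's/[[:space:]]*\\[[:space:]]*/ /g' |
  tr '\r' '\n')

printf '%s\n' "$result"
```

Because the carriage returns count as whitespace here, a backslash that is followed by blank lines would swallow those blank lines as well, so this approach probably suits input shaped like the question's sample best.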

#2


1  

Try as below:

#!/bin/bash

foo="Hello     \
World!"

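# Unquoted $foo lets word splitting collapse the whitespace; sed then removes any backslashes that are left.
echo $foo | sed 's/\\//g'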

#3


1  

If you just want to print the output as given, you just need to:

foo='Hello     \
World!'
bar=$(tr -d '\\' <<<"$foo")
echo $bar    # unquoted!
Hello World!

If you want to squeeze the whitespace as it is stored in the variable, use one of:

bar=$(tr -d '\\' <<<"$foo" | tr -s '[:space:]' " ")
bar=$(perl -0777 -pe 's/\\$//mg; s/\s+/ /g' <<<"$foo")

The advantage of the perl version is that it only removes line continuation backslashes (at the end of the line).

Note that when you use double quotes, the shell takes care of line continuations (proper ones, with no whitespace after the backslash):

$ foo="Hello    \
World"
$ echo "$foo"
Hello    World

So at this point it's too late: the backslash and the newline are already gone by the time the variable is assigned.

If you use single quotes, the shell won't interpret line continuations, and the backslash and newline stay in the variable for you to process:

$ foo='Hello     \
World!

here we are'
$ echo "$foo"
Hello     \
World!

here we are
$ echo "$foo" | perl -0777 -pe 's/(\s*\\\s*\n\s*)/ /sg'
Hello World!

here we are

#4


1  

foo='Hello     \    
World! \  
x

we are friends

here we are'

If you use double quotes then the shell will interpret the \ as a line continuation character. Switching to single quotes preserves the literal backslash.

I've added a backslash after World! to test multiple backslash lines in a row.

sed -r ':s; s/( )? *\\ *$/\1/; Te; N; bs; :e; s/\n *//g' <<< "$foo"

Output:

Hello World! x

we are friends

here we are

What's this doing? In pseudo-code you might read this as:

while (s/( )? *\\ *$/\1/) {  # While there's a backslash to remove, remove it...
    N                        # ...and concatenate the next line.
}

s/\n *//g                    # Remove all the newlines.

In detail, here's what it does:

  1. :s is a branch label, s for "start".
  2. s/( )? *\\ *$/\1/ replaces a backslash and its surrounding spaces. By capturing ( )?, it leaves one space if there was one.
  3. If the previous substitution failed, Te jumps to label e.
  4. N appends the following line, including the newline \n.
  5. bs jumps back to the start, so that multiple consecutive lines with backslashes are handled.
  6. :e is a branch label, e for "end".
  7. s/\n *//g removes all the extra newlines from step #4. It also removes leading spaces from the following line.

Note that T is a GNU extension. If you need this to work in another version of sed, you'll need to use t instead, which will probably take an extra label and branch or two.
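
For a sed without T, one possibility is to sidestep both T and t and let an address decide whether to keep joining lines. This is a swapped-in technique rather than a translation of the script above, sketched under the assumption that only POSIX sed features are available. It uses [[:blank:]] so that tabs are covered too, and the names foo and result are just for illustration.

```shell
# Build the sample string with printf so the trailing spaces and the tab are explicit.
foo=$(printf 'Hello     \\    \nWorld! \\  \t\nx\n\nwe are friends\n\nhere we are')

# While the pattern space ends in a backslash (plus optional blanks) and this is
# not the last input line, append the next line with N and test again.
# Afterwards, every "blanks, backslash, blanks, newline, blanks" run becomes one space.
result=$(printf '%s\n' "$foo" | sed '
:s
/\\[[:blank:]]*$/{
  $!{
    N
    bs
  }
}
s/[[:blank:]]*\\[[:blank:]]*\n[[:blank:]]*/ /g
')

printf '%s\n' "$result"
```

A backslash at the very end of the input is left untouched in this sketch, since there is no following line to join it with.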

#5


1  

You could use a read loop to get the desired output.

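arr=()
i=0
# read without -r treats each backslash as an escape character and drops it
while read line; do
    ((i++))
    # the first three lines belong together (hard-coded for this input);
    # $line is unquoted on purpose so that it is split into words
    [ $i -le 3 ] && arr+=($line)
    if [ $i -eq 3 ]; then
        echo ${arr[@]}    # print the collected words separated by single spaces
    elif [ $i -gt 3 ]; then
        echo $line        # remaining lines are printed one per line
    fi
done <<< "$foo"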

#6


1  

With awk:

$ echo "$foo"
Hello     \
World! \
x

we are friends

here we are

With trailing newline:

$ echo "$foo" | awk '{gsub(/[[:space:]]*\\[[:space:]]*/," ",$0)}1' RS= FS='\n' ORS='\n\n'
Hello World! x

we are friends

here we are
                                                                                              .

Without trailing newline:

$ echo "$foo" | 
awk '{
  gsub(/[[:space:]]*\\[[:space:]]*/," ",$0)
  a[++i] = $0
}
END {
  for(;j<i;) printf "%s%s", a[++j], (ORS = (j < NR) ? "\n\n" : "\n")
}' RS= FS='\n' 
Hello World! x

we are friends

here we are

#7


1  

sed is an excellent tool for simple substitutions on a single line, but for anything else just use awk. This uses GNU awk for a multi-character RS (with other awks, RS='\0' would work for text files that don't contain NUL characters):

$ echo "$foo" | awk -v RS='^$' -v ORS= '{gsub(/\s+\\\s+/," ")}1'
Hello World! x

we are friends

here we are

#8


0  

With bashisms such as extended globbing and parameter expansion, though it's probably just as ugly:

foo='Hello     \    
World!'
shopt -s extglob
echo "${foo/+( )\\*( )$'\n'/ }"
Hello World!
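
The expansion above replaces only the first match and only handles spaces. A hedged variation, assuming bash with extglob enabled, uses // to replace every occurrence and [[:blank:]] to include tabs. The helper variables bs and nl are my own additions to keep the quoting readable.

```shell
#!/usr/bin/env bash
shopt -s extglob

# The question's sample, with a tab added after the second backslash.
foo=$'Hello     \\    \nWorld! \\  \t\nx\n\nwe are friends\n\nhere we are'

bs='\'      # a literal backslash
nl=$'\n'    # a literal newline

# *([[:blank:]]) matches zero or more spaces or tabs; the quoted "$bs" and "$nl"
# are matched literally. The double slash replaces every match, not just the first.
result=${foo//*([[:blank:]])"$bs"*([[:blank:]])"$nl"/ }

printf '%s\n' "$result"
```

Unlike the sed versions, this does not strip leading blanks from the line that follows a continuation, so it is closer to the original one-liner in scope.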

#9


0  

As I understand it, you just want to remove trailing spaces followed by a backslash-escaped newline?

In that case, search with the regex ( ) *\\\n and replace with \1.
