I have a file containing lines like:
I want a lot <*tag 1> more <*tag 2>*cheese *cakes.
I am trying to remove the * inside <> but not outside. The tags can be more complicated than above, for example <*better *tag 1>.
I tried /\bregex\b/s/\*//g, which works for tag 1 but not for tag 2. How can I make it work for tag 2 as well?
Many thanks.
3 solutions
#1
3
Obligatory Perl solution:
perl -pe '$_ = join "",
map +($i++ % 2 == 0 ? $_ : s/\*//gr),
split /(<[^>]+>)/, $_;' FILE
Addendum, a shorter variant (the /r modifier used in both versions needs Perl 5.14 or later):
perl -pe 's/(<[^>]+>)/$1 =~ s(\*)()gr/ge' FILE
#2
3
A simple solution if each tag contains only one asterisk:
sed 's/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g'
If a tag can contain more than one, you can use sed's label and branch commands to loop:
sed ':doagain s/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g; t doagain'
Here doagain is the label that marks the start of the loop, and t doagain is a conditional jump back to that label. From the sed manual:
t label
Branch to label only if there has been a successful substitution since the last
input line was read or conditional branch was taken. The label may be omitted, in
which case the next cycle is started.
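To see the loop in action, here is a minimal sketch wrapped in a shell function. The function name strip_stars_in_tags and the label again are made up for illustration. The script is split across several -e options so that it does not depend on a semicolon terminating the label, which not every sed accepts. Each pass removes one asterisk from a tag, and t jumps back until a pass makes no substitution:

```shell
# Hypothetical helper: delete every * that sits between < and >.
strip_stars_in_tags() {
  # \(<[^>]*\)  the tag text before an asterisk
  # \*          the asterisk to drop
  # \([^>]*>\)  the rest of the tag, up to the closing >
  sed -e ':again' \
      -e 's/\(<[^>]*\)\*\([^>]*>\)/\1\2/' \
      -e 't again' "$@"
}

printf '%s\n' 'I want <*better *tag 1> and *cheese' | strip_stars_in_tags
```

This should print I want <better tag 1> and *cheese. The asterisk outside the tag is untouched because the pattern only matches between a < and the next >.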
#3
1
awk can solve your problem (this needs GNU awk, because the four-argument split() is a gawk extension):
awk '{r="";x=split($0,a,/<[^>]*>/,s);for(i in s)gsub(/\*/,"",s[i]);for(j=1;j<=x;j++)r=r a[j] s[j]; print r}' file
A more readable version:
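awk '{r=""   # reset the result for each input line
x=split($0,a,/<[^>]*>/,s)   # a: text between tags, s: the tags themselves
for(i in s)gsub(/\*/,"",s[i])   # strip * inside each tag
for(j=1;j<=x;j++)r=r a[j] s[j]   # stitch text and tags back together
print r}' file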
Test with your data:
kent$ cat file
I want a lot <*tag 1> more <*tag 2>*cheese *cakes. <*better *tag X*>
kent$ awk '{r="";x=split($0,a,/<[^>]*>/,s);for(i in s)gsub(/\*/,"",s[i]);for(j=1;j<=x;j++)r=r a[j] s[j]; print r}' file
I want a lot <tag 1> more <tag 2>*cheese *cakes. <better tag X>
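If gawk is not available, the same idea can be sketched with only match(), substr() and gsub(), which POSIX awk provides. This is an illustrative variant, not the code from the answer above; the function name strip_tag_stars and the variable names are assumptions:

```shell
# Hypothetical helper: strip * inside <...> without gawk extensions.
strip_tag_stars() {
  awk '{
    out = ""; rest = $0                     # start fresh on every line
    while (match(rest, /<[^>]*>/)) {        # find the next tag
      tag = substr(rest, RSTART, RLENGTH)
      gsub(/\*/, "", tag)                   # remove asterisks in the tag only
      out = out substr(rest, 1, RSTART - 1) tag
      rest = substr(rest, RSTART + RLENGTH) # continue after the tag
    }
    print out rest                          # append the trailing text
  }' "$@"
}

printf '%s\n' 'more <*tag 2>*cheese *cakes.' | strip_tag_stars
```

This should print more <tag 2>*cheese *cakes. The function reads standard input when called without arguments, or the named files otherwise.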