I need to write a quick (by tomorrow) filter script to replace line breaks (LF or CRLF) found within double-quoted strings by the escaped newline \n. The content is a (broken) JavaScript program, so I need to allow for escape sequences like "ab\"cd" and "ab\\"cd"ef" within a string.
I understand that sed is not well suited to the job because it works line by line, so I turned to perl, of which I know nothing :)
I've written this regex: "(((\\.)|[^"\\\n])*\n?)*" and tested it with http://regex.powertoy.org. It does match quoted strings with line breaks; however, perl -p -e 's/"(((\\.)|[^"\\\n])*(\n)?)*"/TEST/g' does not.
So my questions are:
- how do I make perl match line breaks?
- how do I write the "replace-by" part so that it keeps the original string and only replaces the newlines?
There is a similar question with an awk solution, but it is not quite what I need.
NOTE: I usually don't ask "please do this for me" questions, but I really don't feel like learning perl/awk by tomorrow... :)
EDIT: sample data
"abc\"def" - matches as one string
"abc\\"def"xy" - matches "abc\\" and "xy"
"ab
cd
ef" - is replaced by "ab\ncd\nef"
4 solutions
#1
2
Here is a simple Perl solution:
s{
    \G                        # match from the beginning of the string or the end of the last match
    ([^"]*+)                  # everything up to the next quote
    "((?:[^"\\]++|\\.)*+)"    # the whole quoted string, escapes included
}{
    my ($before, $inside) = ($1, $2);   # lexicals, to avoid clobbering sort's $a and $b
    $inside =~ s/\r?\n/\\n/g;           # replace what you want inside the quotes
    "$before\"$inside\"";
}gex;
Here is another solution in case you wouldn't want to use /e and just do it with one regex:
use strict;
$_=<<'_quote_';
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x
_quote_
print "Original:\n", $_, "\n";
s/
    (
        (?:
            # at the beginning of the string match till inside the quotes
            ^(?&outside_quote) "
            # or continue from last match which always stops inside quotes
            | (?!^)\G
        )
        (?&inside_quote)  # eat things up till we find what we want
    )
    x  # the thing we want to replace
    (
        (?&inside_quote)  # eat more possibly till end of quote
        # if going out of quote make sure the match stops inside them
        # or at the end of string
        (?: " (?&outside_quote) (?:"|\z) )?
    )
    (?(DEFINE)
        (?<outside_quote> [^"]*+ )  # just eat everything till quoting starts
        (?<inside_quote> (?:[^"\\x]++|\\.)*+ )  # handle escapes
    )
/$1Y$2/xg;
print "Replaced:\n", $_, "\n";
Output:
Original:
hai xtest "aa xx aax" baix "xx"
x "axa\"x\\" xa "x\\\\\"x" ax
xbai!x
Replaced:
hai xtest "aa YY aaY" baix "YY"
x "aYa\"Y\\" xa "Y\\\\\"Y" ax
xbai!x
To work with line breaks instead of x, just replace it in the regex like so:
s/
    (
        (?:
            # at the beginning of the string match till inside the quotes
            ^(?&outside_quote) "
            # or continue from last match which always stops inside quotes
            | (?!^)\G
        )
        (?&inside_quote)  # eat things up till we find what we want
    )
    \r?\n  # the thing we want to replace
    (
        (?&inside_quote)  # eat more possibly till end of quote
        # if going out of quote make sure the match stops inside them
        # or at the end of string
        (?: " (?&outside_quote) (?:"|\z) )?
    )
    (?(DEFINE)
        (?<outside_quote> [^"]*+ )  # just eat everything till quoting starts
        (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ )  # handle escapes
    )
/$1\\n$2/xg;
#2
1
Until the OP posts some example content to test by, try adding the "m" (and possibly the "s") flag to the end of your regex; from perldoc perlreref (reference):
m Multiline mode - ^ and $ match internal lines
s match as a Single line - . matches \n
For testing you might also find it useful to add the command-line argument "-i.bak", so that perl edits the file in place and keeps a backup of the original (with the extension ".bak").
Note also that if you want to group but not capture something you can use (?:PATTERN) rather than (PATTERN). Once you have captured content, use $1 through $9 to access the stored matches from the matching section.
For more info see the link above as well as perldoc perlretut (tutorial) and perldoc perlre (full-ish documentation).
#3
1
#!/usr/bin/perl
use warnings;
use strict;
use Regexp::Common;
$_ = '"abc\"def"' . '"abc\\\\"def"xy"' . qq("ab\ncd\nef");
print "befor: {{$_}}\n";
s{($RE{quoted})}
{ (my $x=$1) =~ s/\n/\\n/g;
$x
}ge;
print "after: {{$_}}\n";
#4
1
Using Perl 5.14.0 (install with perlbrew) one can do this:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.14.0;
use Regexp::Common qw/delimited/;
my $data = <<'END';
"abc\"def"
"abc\\"def"xy"
"ab
cd
ef"
END
my $output = $data =~ s/$RE{delimited}{-delim=>'"'}{-keep}/$1=~s!\n!\\n!rg/egr;
print $output;
I need 5.14.0 for the /r flag of the inner substitution. If someone knows how to avoid this, please let me know.