How can I use Perl to selectively modify the src attribute of script tags in an HTML document?

Time: 2022-12-06 10:59:21

I need to write a regular expression in Perl that prefixes every script src with [perl]texthere[/perl], like so:

 <script src="[perl]texthere[/perl]/text"></script> 

Any help? Thanks!

4 solutions

#1


2  

Use a proper parser such as HTML::TokeParser::Simple:

#!/usr/bin/env perl

use strict; use warnings;
use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(handle => \*DATA);

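# Walk every token in the document. get_token takes no tag filter,
# so non-script tokens simply fall through and are printed unchanged.
while (my $token = $parser->get_token) {
    if ($token->is_tag('script')
            and defined(my $src = $token->get_attr('src'))) {
        # Leave absolute http(s) URLs alone; prefix everything else.
        $src =~ m{^https?://}
            or $token->set_attr('src', "[perl]texthere[/perl]$src");
    }
    print $token->as_is;
}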

__DATA__
<script src="/js/text.text.js/"></script>

And at the same time, ignore scrs that begin with http, as such:

 <script src="https://websitewebsitewebsite"></script>

Output:

<script src="[perl]texthere[/perl]/js/text.text.js/"></script>

And at the same time, ignore scrs that begin with http, as such:

 <script src="https://websitewebsitewebsite"></script>

#2


1  

Use a negative lookahead (the third line below):

s{
  (<script\s+src\s*=\s*[\'"])
  (?!https?://)
}{$1\[perl]texthere[/perl]}gsx;
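
To try the substitution above on sample input, it can be wrapped in a small script. This is only a sketch: the helper name prefix_script_src and the sample strings are made up for illustration, and the pattern still assumes that src is the first attribute of the tag, directly after "<script".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): returns a copy
# of the HTML with the prefix added to every non-http(s) script src.
sub prefix_script_src {
    my ($html) = @_;
    $html =~ s{
        (<script\s+src\s*=\s*['"])   # tag start up to the opening quote (group 1)
        (?!https?://)                # negative lookahead: skip absolute URLs
    }{$1\[perl]texthere[/perl]}gsx;
    return $html;
}

my @samples = (
    '<script src="/js/text.text.js/"></script>',
    '<script src="https://websitewebsitewebsite"></script>',
);

print prefix_script_src($_), "\n" for @samples;
```

The backslash in "$1\[" keeps Perl from reading "$1[perl]" as an array subscript. A tag such as <script type="text/javascript" src="..."> would not be matched by this pattern, which is one reason the parser-based answer is more robust.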

#3


0  

I am able to match any src=" except those beginning with http or https via: ^<script src="(?!https?:).*$ Let me know if there are any issues and I'll fix it.

Try using: this website as a regex tutorial and this website to test regex.
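
A pattern like this only tests a line; it does not rewrite anything. A minimal sketch of how such a test could be used to pick out the script tags that need a prefix, assuming each tag sits on its own line and starts with <script src=" (the helper name is_relative_script and the sample lines are hypothetical):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the line is a script tag whose src
# does not begin with http: or https:, and 0 otherwise.
sub is_relative_script {
    my ($line) = @_;
    return $line =~ m{^<script src="(?!https?:).*$} ? 1 : 0;
}

my @lines = (
    '<script src="/js/text.text.js/"></script>',
    '<script src="https://websitewebsitewebsite"></script>',
    '<script src="http://websitewebsitewebsite"></script>',
);

for my $line (@lines) {
    my $verdict = is_relative_script($line) ? 'match' : 'skip';
    print "$verdict: $line\n";
}
```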

#4


0  

This should work:

 s{(?<=src=")(?!https?://)}{[perl]texthere[/perl]}

Test:

 my @olnk = ('<script src="/js/text.text.js/"></script>',
             '<script src="https://websitewebsitewebsite"></script>' );
 # Note: s/// inside map also modifies the elements of @olnk in place.
 my @nlnk = map {
                  s{(?<=src=")(?!https?://)}{[perl]texthere[/perl]}; $_
                } @olnk;

Result:

 print join "\n", @nlnk;

 <script src="[perl]texthere[/perl]/js/text.text.js/"></script>
 <script src="https://websitewebsitewebsite"></script>

Regards

rbo
