I need to modify some PHP files (PHTML files, to be exact, but they are still valid PHP files) from a Bash script. My original thought was to use sed or a similar utility with a regex, but after reading replies to other HTML-parsing questions here, it seems that there might be a better solution.
The problem I was facing with the regex was that it could not detect whether the string I wanted to match, (src|href|action)=["']/, was inside <?php ?> tags or not. I need to know that so I can either perform string concatenation if the match is inside PHP tags, or add new PHP tags if it is not. For example:
(1) <img id="icon-loader-small" src="/css/images/loader-small.gif" style="vertical-align:middle; display:none;"/>
(2) <li><span class="name"><?php echo $this->loggedInAs()?></span> | <a href="/Login/logout">Logout</a></li>
(3) <?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?><span><?php echo $watched_dir->getDirectory();?></span></span><span class="ui-icon ui-icon-close"></span>
(EDIT: 4) <form method="post" action="/Preference/stream-setting" enctype="application/x-www-form-urlencoded" onsubmit="return confirm('<?php echo $this->confirm_pypo_restart_text ?>');">
In (1) there is a src="/css, and as it is not in PHP tags I want that to become src="<?php echo $baseUrl?>/css. In (2) there is a PHP tag, but it is not around the href="/Login, so that also becomes href="<?php echo $baseUrl?>/Login. Unfortunately, (3) has src='/css inside the PHP tags (it is an echoed string). The string is also delimited by " in the PHP code, so the modification needs to pick up on that too. The final result would look something like: src='".$baseUrl."/css.
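To make the difficulty concrete, here is a minimal Perl sketch of the context-blind substitution. The helper name `naive_prefix` is made up for illustration. It produces the wanted result for (2) but nests a PHP tag inside PHP code for (3).

```perl
use strict;
use warnings;

# Context-blind rewrite: prefixes every root-relative src/href/action URL.
# naive_prefix is a hypothetical name used only for this sketch.
sub naive_prefix {
    my ($text) = @_;
    # $1 holds the attribute name plus its opening quote; the leading "/" is kept.
    $text =~ s{((?:src|href|action)=["'])/}{$1<?php echo \$baseUrl?>/}g;
    return $text;
}

# Fine for (2): the href is outside any PHP tag.
print naive_prefix('<a href="/Login/logout">Logout</a>'), "\n";

# Broken for (3): the match is inside an echoed PHP string, so the
# substitution puts a <?php ... ?> tag in the middle of PHP code.
print naive_prefix(q{<?php echo "<img src='/css/images/warning-icon.png'>" ?>}), "\n";
```

The second line of output contains `src='<?php echo $baseUrl?>/css/...` inside the existing `<?php ... ?>` block, which is exactly the case that needs string concatenation instead.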
All the other modifications to my HTML and PHP files have been done using a regex (I know, I know...). If regexes could support matching everything except a certain pattern, like [^(<\?php)(\?>)]*, then I would be flying through this part. Unfortunately it seems that this is Type 2 grammar territory. So, what should I use? Ideally it needs to be installed by default with the GNU suite, but other tools like PHP itself or other interpreters are fine too, just not preferred. Of course, if someone could structure a regex that would work on the above examples, then that would be excellent.
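For reference, a bracket expression such as [^(<\?php)(\?>)] is a set of single characters, so it excludes the characters (, <, ?, p, h, >, and ) individually rather than the sequences. In Perl-compatible regexes, "anything except a certain sequence" is usually written with a negative lookahead. The sketch below shows only that building block, with variable names chosen for illustration; it does not solve the nesting problem in (3) or (4).

```perl
use strict;
use warnings;

# "Any run of characters that does not contain <?php or ?>":
# the lookahead rejects each position where either delimiter starts.
my $not_php_delim = qr/(?:(?!<\?php|\?>).)*/s;

my $line = '<li><?php echo $this->loggedInAs()?> | <a href="/Login/logout">Logout</a></li>';

# Capture the leading run of text that contains no PHP delimiter.
my ($before) = $line =~ /\A($not_php_delim)/;
print "$before\n";
```

Here `$before` ends up holding only the text in front of the first `<?php`.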
EDIT: (4) is the nasty match, where most regexes will fail.
1 Answer
The way I solved this problem was by separating my file into sections that were encapsulated by <?php ?> tags. The script keeps track of the 'context' it is currently in, which defaults to html but switches to php when it hits those tags. An operation (not necessarily a regex) is then performed on that section, and the result is appended to the output buffer. When the file is completely processed, the output buffer is written back into the file.
I attempted to do this with sed, but I faced the problem of not being able to control where newlines would be printed. The context-based logic was also hardcoded, meaning it would be tedious to add a new context, such as ASP.NET support. My current solution is written in Perl and mitigates both problems, although I am having a bit of trouble getting my regex to actually do something; this might just be me coding my regex incorrectly.
Script is as follows:
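#!/usr/bin/perl -w
use strict;

# Prototypes
sub readFile(;\$);
sub writeFile($);

# Globals
my $file;
my $outputBuffer;
my $holdBuffer;

# Each context may define an openTag/closeTag pair (regex fragments) and an
# operation: a string of Perl code that is eval'd with the section in $_.
# Regex operations should have the s and g modifiers.
my %contexts = (
    html => {
        operation => ''
    },
    php => {
        openTag => '<\?php |<\? ', closeTag => '\?>', operation => ''
    },
    js => {
        openTag => '<script>', closeTag => '<\/script>', operation => ''
    }
);
my $currentContext = 'html';
my $combinedOpenTags;

# Initialisation: default to '-' (STDIN/STDOUT) and build one alternation
# of every opening tag
unshift(@ARGV, '-') unless @ARGV;
foreach my $key (keys %contexts) {
    if ($contexts{$key}{openTag}) {
        if ($combinedOpenTags) {
            $combinedOpenTags .= "|" . $contexts{$key}{openTag};
        } else {
            $combinedOpenTags = $contexts{$key}{openTag};
        }
    }
}

# Main loop: process each file named on the command line
while (readFile($holdBuffer)) {
    $outputBuffer = '';
    # Test length rather than truthiness so a buffer holding just "0" is not dropped
    while (length $holdBuffer) {
        # Work out which context the buffer currently starts in
        $currentContext = 'html';
        foreach my $key (keys %contexts) {
            next unless $contexts{$key}{openTag};
            if ($holdBuffer =~ /\A(?:$contexts{$key}{openTag})/) {
                $currentContext = $key;
                last;
            }
        }
        if ($currentContext eq 'html') {
            # Take everything up to, but not including, the next opening tag
            $holdBuffer =~ s/\A(.*?)($combinedOpenTags|\z)/$2/s;
            $_ = $1;
        } else {
            # Take everything up to and including the closing tag, or the rest
            # of the buffer if the tag is never closed (the original pattern
            # could not match in that case and looped forever)
            $holdBuffer =~ s/\A(.*?(?:$contexts{$currentContext}{closeTag}|\z))//s;
            $_ = $1;
        }
        eval($contexts{$currentContext}{operation});
        warn $@ if $@;
        $outputBuffer .= $_;
    }
    writeFile($outputBuffer);
}

# readFile: read the next file in @ARGV into the referenced scalar (default $_)
sub readFile(;\$) {
    my $argref = @_ ? shift() : \$_;
    return 0 unless @ARGV;
    $file = shift(@ARGV);
    # Two-argument open is deliberate: it lets '-' mean STDIN
    open(my $in, "<$file") or die("$0: can't open $file for reading ($!)\n");
    local $/;    # slurp the whole file
    $$argref = <$in>;
    close($in);
    return 1;
}

# writeFile: write $_[0] back to the current file
sub writeFile($) {
    # Two-argument open again, so '-' means STDOUT
    open(my $out, ">$file") or die("$0: can't open $file for writing ($!)\n");
    print $out $_[0];
    close($out);
}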
I hope that this can be used and modified by others to suit their needs.
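The script above leaves every `operation` empty. As a hedged sketch of what could go there for the question's examples, the two operation strings below are my own assumptions, not part of the original script. The html one wraps the prefix in new PHP tags. The php one assumes the attribute is single-quoted inside a double-quoted PHP string, as in example (3), and uses concatenation. The `%operations` hash and the loop are only a stand-in for the script's `eval` step.

```perl
use strict;
use warnings;

# Hypothetical operation strings; each is eval'd with the section in $_,
# the same way the script evaluates $contexts{...}{operation}.
my %operations = (
    # Plain HTML: add a new PHP tag in front of the root-relative URL.
    html => q{s{((?:src|href|action)=["'])/}{$1<?php echo \$baseUrl?>/}g},
    # Inside PHP: assumes attr='/...' within a double-quoted PHP string,
    # so the string is closed, $baseUrl concatenated, and the string reopened.
    php  => q{s{((?:src|href|action)=')/}{$1".\$baseUrl."/}g},
);

for my $case (
    [ html => '<a href="/Login/logout">Logout</a>' ],
    [ php  => q{<?php echo "<img src='/css/images/warning-icon.png'>" ?>} ],
) {
    my ($context, $section) = @$case;
    local $_ = $section;
    eval $operations{$context};
    die $@ if $@;
    print "$_\n";
}
```

The php operation is deliberately narrow: it would produce wrong output for single-quoted PHP strings or heredocs, so those would need their own handling. With the operations filled in, the script could be called from Bash as something like `perl rewrite-urls.pl views/*.phtml`, where the script name and path are placeholders; it rewrites each named file in place.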