In my email program I use Tidy to clean up the HTML before I send out the emails. A recurring problem is that when I send a mail whose HTML is fetched from a URL on the web, the document may contain JavaScript.
I want to clean up this HTML document even further by stripping out all JavaScript, whether embedded, referenced or in any other form, so that the mail consists only of HTML.
I want to use PHP's preg_replace() to strip all JavaScript from a mail, and I need some help with the best regex, because regex is not my strongest point, I must confess.
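If a regex is not a hard requirement, an HTML parser is generally more reliable than preg_replace() here, because regexes easily miss unquoted attributes, odd whitespace and malformed markup. Below is a minimal sketch using PHP's DOM extension (DOMDocument and DOMXPath) instead of a regex. strip_js() is a hypothetical helper name, the sketch assumes the dom extension is enabled, and it only covers script elements, on* event handlers and javascript:/data: URLs in href, src and action; CSS tricks such as expression() are not handled. For anything security-critical, a dedicated sanitizer such as HTML Purifier is the safer choice.

```php
<?php
// Hypothetical helper: strips JavaScript using the DOM extension rather than regex.
// Assumes ext-dom is enabled. Not a complete sanitizer (CSS expression() etc. is not handled).
function strip_js(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed: collect parser warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);

    // 1. Remove <script> elements, both inline code and src="..." references.
    foreach (iterator_to_array($xpath->query('//script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    // 2. Remove on* event-handler attributes and javascript:/data: URLs.
    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        $name  = strtolower($attr->nodeName);
        // Drop whitespace so values like " javascript:" are caught too.
        $value = strtolower(preg_replace('/\s+/', '', $attr->nodeValue));
        $isHandler = strpos($name, 'on') === 0;
        $isJsUrl   = in_array($name, array('href', 'src', 'action'), true)
            && (strpos($value, 'javascript:') === 0 || strpos($value, 'data:') === 0);
        if ($isHandler || $isJsUrl) {
            $attr->ownerElement->removeAttribute($attr->nodeName);
        }
    }

    // saveHTML() returns a whole document: fragments get html/body wrappers.
    return $doc->saveHTML();
}

$mail = '<p onclick="alert(1)">Hello <a href=" javascript:alert(2)">link</a></p>'
      . '<script src="evil.js"></script><script>alert(3)</script>';
echo strip_js($mail);
```

Without a charset declaration in the input, loadHTML() assumes ISO-8859-1, so non-ASCII mails would need extra encoding handling.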
5 answers
#2
4
You can use strip_tags, passing in the tags you wish to allow (whitelist) as the second parameter, but that will not remove inline JS, which might be present in onclick attributes and such.
echo strip_tags($html, '<p><a><small>');
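To make the caveat concrete: a whitelisted tag keeps all of its attributes, and strip_tags() removes the script tags themselves but leaves the script's text behind. The input string below is made up for illustration.

```php
<?php
// strip_tags() keeps attributes on whitelisted tags and keeps the
// text that was inside removed <script> elements.
$html  = '<p onclick="alert(1)">Hello</p><script>alert(2)</script><small>bye</small>';
$clean = strip_tags($html, '<p><a><small>');

echo $clean, PHP_EOL;
// prints: <p onclick="alert(1)">Hello</p>alert(2)<small>bye</small>
```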
#3
2
Look at the article "Create a regex to strip javascript from Html", and its Part 2.
#4
2
There's no guarantee with this (see below), but I tried to make my own lightweight solution because HTML Purifier (http://htmlpurifier.org) is a bit too big for my tiny goal. My goal is to prevent XSS and nothing more, so XSS attempts will come out of this code as a lot of dirty, escaped text, BUT I think it will be SAFE:
<?php
// Patterns handled:
//   href="javascript:   (and data:)
//   style="....expression
//   style="....behavior
//   <script
//   on*="
$str = '
asd
<a STyLE="asd; expression" hRef=" javascript:" onx="asd">asd</a>
asd
<code><a href="javascript:">asd</a></code>
<scr<script></script>ipt ... >asd</script>
<a style="hey:good boy;" href="javascript:">asd</a>';

function stripteaser($str, $StripHTMLTags = true, $AllowableTags = NULL) {
    // Pull out <code>...</code> blocks, HTML-escape them and swap in
    // unique placeholders so they survive the filtering below.
    $str = explode('<code>', $str);
    $codes = array();
    if (count($str) > 1) {
        foreach ($str as $idx => $val) {
            $val = explode('</code>', $val);
            if (count($val) > 1) {
                $uid = md5(uniqid(mt_rand(), true));
                $codes[$uid] = htmlentities(array_shift($val), ENT_QUOTES, 'UTF-8');
                $str[$idx] = "##$uid##" . implode('', $val);
            }
        }
    }
    $str = implode('', $str);

    // Neutralise every "<script" by escaping its "<".
    while (stripos($str, '<script') !== false) {
        $str = str_ireplace('<script', '&lt;script', $str);
    }

    // HTML-escape whatever $regexp matches, repeatedly, until nothing matches.
    $rptjob = function(&$str, $regexp) {
        while (preg_match($regexp, $str, $matches)) {
            $str = str_ireplace($matches[0], htmlentities($matches[0], ENT_QUOTES, 'UTF-8'), $str);
        }
    };
    $rptjob($str, '/href[\s\n\t]*=[\s\n\t]*[\"\'][\s\n\t]*(javascript:|data:)/i'); //href = "javascript:
    $rptjob($str, '/style[\s\n\t]*=[\s\n\t]*[\"][^\"]*expression/i'); //style = "...expression
    $rptjob($str, '/style[\s\n\t]*=[\s\n\t]*[\'][^\']*expression/i'); //style = '...expression
    $rptjob($str, '/style[\s\n\t]*=[\s\n\t]*[\"][^\"]*behavior/i'); //style = "...behavior
    $rptjob($str, '/style[\s\n\t]*=[\s\n\t]*[\'][^\']*behavior/i'); //style = '...behavior
    $rptjob($str, '/on\w+[\s\n\t]*=[\s\n\t]*[\"\']/i'); //onasd = "

    if ($StripHTMLTags)
        $str = strip_tags($str, $AllowableTags);

    // Put the escaped <code> blocks back.
    foreach ($codes as $idx => $code) {
        $str = str_replace("##$idx##", $code, $str);
    }
    return $str;
}

echo stripteaser($str);
exit;
?>
:D Dirty code written at home... It's not a great job (the many while loops cost some CPU time), but for my tiny goal it's better than a huge component like HTML Purifier.
RESULT WILL BE:
asd
<a STyLE="asd; expression" hRef=" javascript:" onx="asd">asd</a>
asd
<a href="javascript:">asd</a>
<scri<script></script>pt ... >asd</script>
<a style="hey:good boy;" href="javascript:">asd</a>
I have no experience with CSS expressions, but I know that behavior is used to run JS/VML in IE (for example, for curved corners), so it can be dangerous. AND FINALLY, THERE IS NO GUARANTEE AT ALL.
I hope it is useful to someone ;)
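One thing worth checking with this approach: every pattern expects the attribute value to start with a quote, while HTML also allows unquoted attribute values such as href=javascript:... The sketch below runs the answer's href pattern against a few made-up inputs. With the default $StripHTMLTags = true the tags are stripped anyway, but if you pass $AllowableTags (for example '<a>'), an unquoted value like this would get through.

```php
<?php
// Checks which inputs the href pattern from the answer above flags.
// Note: \s already matches \n and \t, so [\s\n\t] is equivalent to \s.
$hrefPattern = '/href[\s\n\t]*=[\s\n\t]*[\"\'][\s\n\t]*(javascript:|data:)/i';

$samples = array(
    'quoted'   => '<a href="javascript:alert(1)">x</a>',
    'spaced'   => '<a hRef = " javascript:alert(1)">x</a>',
    'unquoted' => '<a href=javascript:alert(1)>x</a>',
);

$flagged = array();
foreach ($samples as $label => $html) {
    $flagged[$label] = preg_match($hrefPattern, $html) === 1;
    printf("%-8s %s\n", $label, $flagged[$label] ? 'flagged' : 'NOT flagged');
}
```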
#5
0
I used this one:
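<?php
// Remove js, css, head, link and object elements.
// cleanElements was a static method in the original; HtmlCleaner is a placeholder class name.
class HtmlCleaner
{
    public static function cleanElements($html)
    {
        $search = array(
            "'<script[^>]*?>.*?</script>'si", // remove js
            "'<style[^>]*?>.*?</style>'si",   // remove css
            "'<head[^>]*?>.*?</head>'si",     // remove head
            "'<link[^>]*?>'si",               // remove link (void element, no closing tag)
            "'<object[^>]*?>.*?</object>'si"  // remove embedded objects
        );
        // A single empty replacement string is applied to every pattern.
        return preg_replace($search, '', $html);
    }
}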
http://allenprogram.blogspot.pt/2012/04/php-remove-js-css-head-obj-elements.html
It removes the script, style, head, link and object elements (contents included) but leaves other tags such as html and body in place, so after using it I run strip_tags.
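A minimal sketch of the workflow described in this answer: remove whole script/style/head/link/object elements first, then run strip_tags() on what is left. clean_elements() is a hypothetical standalone copy of the patterns above, and the sample mail is made up. Note that the head pattern would also match a tag such as header, so treat this as a rough filter rather than a sanitizer.

```php
<?php
// Hypothetical standalone version of cleanElements(): drop whole elements.
function clean_elements(string $html): string
{
    $search = array(
        "'<script[^>]*?>.*?</script>'si", // remove js
        "'<style[^>]*?>.*?</style>'si",   // remove css
        "'<head[^>]*?>.*?</head>'si",     // remove head
        "'<link[^>]*?>'si",               // <link> has no closing tag
        "'<object[^>]*?>.*?</object>'si", // remove embedded objects
    );
    return preg_replace($search, '', $html);
}

$mail = '<html><head><title>T</title></head><body>'
      . '<link rel="stylesheet" href="a.css"><p>Hi</p><script>alert(1)</script>'
      . '</body></html>';

// Step 1 removes the elements with their contents, step 2 removes remaining tags.
$text = strip_tags(clean_elements($mail));
echo $text, PHP_EOL; // prints: Hi
```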