Remove all attributes from HTML tags.

Time: 2021-12-29 00:19:25

I have this HTML code:

<p style="padding:0px;">
<strong style="padding:0;margin:0;">hello</strong>
</p>

But it should become (for all possible HTML tags):

<p>
<strong>hello</strong>
</p>

9 solutions

#1 (score: 123)

Adapted from my answer on a similar question

$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $text);

// <p><strong>hello</strong></p>

The RegExp broken down:

/              # Start Pattern
 <             # Match '<' at beginning of tags
 (             # Start Capture Group $1 - Tag Name
  [a-z]         # Match 'a' through 'z'
  [a-z0-9]*     # Match 'a' through 'z' or '0' through '9' zero or more times
 )             # End Capture Group
 [^>]*?        # Match anything other than '>', Zero or More times, not-greedy (won't eat the /)
 (\/?)         # Capture Group $2 - '/' if it is there
 >             # Match '>'
/i            # End Pattern - Case Insensitive

Add some quoting and use the replacement text <$1$2>: it strips any text after the tag name up to the end of the tag, either /> or just >.

Please note: this isn't necessarily going to work on ALL input, as the anti-HTML-plus-RegExp crowd will tell you. There are a few failure cases, most notably <p style=">"> would end up as <p>">, plus a few other broken cases... I would recommend looking at Zend_Filter_StripTags as a more foolproof tags/attributes filter in PHP.
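
As a quick check of the pattern, the sketch below runs it on the question's markup, on a self-closing tag that carries an attribute, and on the <p style=">"> case mentioned above. strip_attributes_regex() is only a hypothetical wrapper name, not part of the answer; the expected lines in the trailing comments were traced by hand against the pattern.

```php
<?php
// Sketch: the regex from this answer wrapped in a small helper.
// strip_attributes_regex() is a hypothetical name used for illustration.
function strip_attributes_regex(string $html): string
{
    // $1 = tag name, $2 = the optional "/" of a self-closing tag
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $html);
}

$samples = [
    '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>',
    'line<br class="x" />break',
    '<p style=">">text</p>', // a ">" inside an attribute value ends the match early
];

foreach ($samples as $sample) {
    echo strip_attributes_regex($sample), "\n";
}
// <p><strong>hello</strong></p>
// line<br/>break
// <p>">text</p>
```

The last line shows the failure the note describes: the lazy [^>]*? stops at the first ">", even when it sits inside a quoted value.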

#2 (score: 60)

Here is how to do it with native DOM:

$dom = new DOMDocument;                 // init new DOMDocument
$dom->loadHTML($html);                  // load HTML into it
$xpath = new DOMXPath($dom);            // create a new XPath
$nodes = $xpath->query('//*[@style]');  // Find elements with a style attribute
foreach ($nodes as $node) {              // Iterate over found elements
    $node->removeAttribute('style');    // Remove style attribute
}
echo $dom->saveHTML();                  // output cleaned HTML

If you want to remove all possible attributes from all possible tags, do this:

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
    $node->parentNode->removeAttribute($node->nodeName);
}
echo $dom->saveHTML();
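
Both snippets echo a whole document, because loadHTML() wraps a fragment in <html><body> and adds a doctype. If you only want the cleaned fragment back, one option is to supply the body wrapper yourself and serialize just the body's children with saveHTML($node). A minimal sketch, assuming an ASCII fragment (loadHTML() does not assume UTF-8 without a charset hint); strip_all_attributes() is a hypothetical helper name.

```php
<?php
// Sketch: strip every attribute with DOMDocument and return only the fragment,
// without the doctype/<html>/<body> wrapper that loadHTML() adds.
// strip_all_attributes() is a hypothetical helper name.
function strip_all_attributes(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // '//@*' selects every attribute node, as in the answer above
    foreach ($xpath->query('//@*') as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    // serialize only the children of <body>
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo strip_all_attributes('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>'), "\n";
// <p><strong>hello</strong></p>
```

ownerElement is used here instead of parentNode because it is the standard DOMAttr property for the element that owns an attribute.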

#3 (score: 9)

I would avoid regex, since HTML is not a regular language, and use an HTML parser such as Simple HTML DOM instead.

You can get a list of attributes that the object has by using attr. For example:

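// Assumes the Simple HTML DOM library has been included (it provides str_get_html()).
$html = str_get_html('<div id="hello">World</div>');
var_dump($html->find("div", 0)->attr);
/*
array(1) {
  ["id"]=>
  string(5) "hello"
}
*/

// null-valued attributes are skipped when the node is printed
foreach ($html->find("div", 0)->attr as &$value) {
    $value = null;
}
unset($value); // drop the reference left over from the foreach

print $html;
// <div>World</div>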

#4 (score: 4)

$html_text = '<p>Hello <b onclick="alert(123)" style="color: red">world</b>. <i>Its beautiful day.</i></p>';
$strip_text = strip_tags($html_text, '<b>');
$result = preg_replace('/<(\w+)[^>]*>/', '<$1>', $strip_text);
echo $result;

// Result:
// Hello <b>world</b>. Its beautiful day.

#5 (score: 1)

Regexes are too fragile for HTML parsing. For your example, though, the following would strip out the attributes:

echo preg_replace(
    "|<(\w+)([^>/]+)?|",
    "<$1",
    "<p style=\"padding:0px;\">\n<strong style=\"padding:0;margin:0;\">hello</strong>\n</p>\n"
);

Update

Made the second capture optional, and stopped it at '/' so the slash of self-closing tags such as <br/> is not stripped:

|<(\w+)([^>]+)| to |<(\w+)([^>/]+)?|

Demonstration that this regular expression works:

$ phpsh
Starting php
type 'h' or 'help' to see instructions & features
php> $html = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello<br/></strong></p>';
php> echo preg_replace("|<(\w+)([^>/]+)?|", "<$1", $html);
<p><strong>hello<br/></strong></p>
php> $html = '<strong>hello</strong>';
php> echo preg_replace("|<(\w+)([^>/]+)?|", "<$1", $html);
<strong>hello</strong>
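
One concrete example of that fragility, sketched with a made-up link: because the updated pattern stops the attribute run at the first '/', an attribute value that contains a slash (such as a URL) is only partly removed.

```php
<?php
// Sketch: the updated pattern from this answer applied to two inputs
// (the link markup is made up for illustration).
$pattern = '|<(\w+)([^>/]+)?|';

// Question's markup: attributes are removed as intended.
echo preg_replace($pattern, '<$1', '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>'), "\n";
// <p><strong>hello</strong></p>

// [^>/]+ stops at the first "/" of the URL, so only ' href="http:' is removed
// and the rest of the attribute leaks into the output.
echo preg_replace($pattern, '<$1', '<a href="http://example.com/">link</a>'), "\n";
// <a//example.com/">link</a>
```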

#6 (score: 1)

Hope this helps. It may not be the fastest way to do it, especially for large blocks of html. If anyone has any suggestions as to make this faster, let me know.

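// $DATA is assumed to hold the HTML to process.
function StringEx($str, $start, $end)
{
    // search case-insensitively, but cut the result out of the original string
    $str_low = strtolower($str);
    $pos_start = strpos($str_low, $start);
    if ($pos_start === false) return false;
    $pos_end = strpos($str_low, $end, $pos_start + strlen($start));
    if ($pos_end === false) return false;

    $pos1 = $pos_start + strlen($start);
    $RData = substr($str, $pos1, $pos_end - $pos1);
    if ($RData === '') { return true; } // empty pair such as "<>"
    return $RData;
}

$S = '<';
$E = '>';
// Replace each complete "<...>" (tag name included) with a placeholder,
// then turn every placeholder into an empty "<>" pair.
while ($RData = StringEx($DATA, $S, $E)) {
    // strict check: with ==, any non-empty string would also equal true
    if ($RData === true) { $RData = ''; }
    $DATA = str_ireplace($S . $RData . $E, '||||||', $DATA);
}
$DATA = str_ireplace('||||||', $S . $E, $DATA);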

#7 (score: 0)

To do SPECIFICALLY what andufo wants, it's simply:

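// \s after the tag name stops the pattern from backtracking into attribute-less tags such as <strong>
$html = preg_replace("#(<[a-zA-Z0-9]+)\s[^>]*>#", "\\1>", $html);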

That is, he wants to strip anything but the tag name out of the opening tag. It won't work for self-closing tags of course.
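
Why the pattern needs something between the tag name and the attribute run: in #(<[a-zA-Z0-9]+)[^\>]+>#, as originally posted, the + requires at least one character, so on a tag with no attributes the engine backtracks into the name. The sketch compares that pattern with a variant that requires whitespace after the name; the sample HTML is made up for illustration.

```php
<?php
// Sketch: compare the pattern as originally posted in this answer with a
// variant that requires whitespace between the tag name and the attributes.
$as_posted = '#(<[a-zA-Z0-9]+)[^\>]+>#';
$variant   = '#(<[a-zA-Z0-9]+)\s[^>]*>#';

$html = '<p style="padding:0px;"><strong>hello</strong></p>';

// [^\>]+ must match at least one character, so on the attribute-less <strong>
// the engine gives back the last letter of the name and matches that instead.
echo preg_replace($as_posted, '$1>', $html), "\n"; // <p><stron>hello</strong></p>

// Requiring \s right after the name leaves tags without attributes untouched.
echo preg_replace($variant, '$1>', $html), "\n";   // <p><strong>hello</strong></p>
```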

#8 (score: 0)

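<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
// Test paragraph. Other text
echo "\n";

// Allow <p> and <a>; note that strip_tags() keeps the attributes of allowed tags
echo strip_tags($text, '<p><a>');
// <p>Test paragraph.</p> <a href="#fragment">Other text</a>
?>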

#9 (score: 0)

Here's an easy way to get rid of attributes. It handles malformed HTML pretty well.

<?php
  $string = '<p style="padding:0px;">
    <strong style="padding:0;margin:0;">hello</strong>
    </p>';

  // put every tag on a line by itself
  $string_html_on_lines = str_replace(array("<", ">"), array("\n<", ">\n"), $string);

  // find lines that start with '<' plus a tag name (\w already covers letters and digits)
  // followed by whitespace; keep the tag name and throw away the rest of that line
  $string_attribute_free = preg_replace("/\n(<\w+)\s.+/", "\n$1>", $string_html_on_lines);

  echo $string_attribute_free;
?>
