How do I remove HTML-encoded characters from a string?

I have a string that contains HTML-encoded characters, and I want to remove them:

"&lt;div&gt;Hi All,&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt; /&gt;&lt;/div&gt;&lt;div&gt;Starting today we are initiating PoLS.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Please use the following communication protocols:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Task Breakup and allocation - Gravity&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. All mail communications - BC messages&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Reports on PoC / Spikes: Writeboard&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Non story related tasks: BC To-Do&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5. All UI and HTML will communicated to you through BC.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6. For File sharing, we'll be using Dropbox.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll have been given necessary accesses to all these portals. Please start using them judiciously.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All the best!&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;"

4 answers

#1

There are many ways to do this, and it helps to consider why you want to do it. Usually when I want to remove encoded HTML, what I really want is to recover the contents of the HTML. Ruby has some modules that make that easy.

require 'cgi'
require 'nokogiri'

html = "&lt;div&gt;Hi All,&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt; /&gt;&lt;/div&gt;&lt;div&gt;Starting today we are initiating PoLS.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Please use the following communication protocols:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Task Breakup and allocation - Gravity&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. All mail communications - BC messages&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Reports on PoC / Spikes: Writeboard&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Non story related tasks: BC To-Do&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5. All UI and HTML will communicated to you through BC.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6. For File sharing, we'll be using Dropbox.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll have been given necessary accesses to all these portals. Please start using them judiciously.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All the best!&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;"

puts CGI.unescapeHTML(html)

which outputs:

<div>Hi All,</div><div class="paragraph_break">< /></div><div>Starting today we are initiating PoLS.</div><div class="paragraph_break"><br /></div><div>Please use the following communication protocols:<br /></div><div>1. Task Breakup and allocation - Gravity<br /></div><div>2. All mail communications - BC messages<br /></div><div>3. Reports on PoC / Spikes: Writeboard<br /></div><div>4. Non story related tasks: BC To-Do<br /></div><div>5. All UI and HTML will communicated to you through BC.<br /></div><div>6. For File sharing, we'll be using Dropbox.<br /></div><div>7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.</div><div class="paragraph_break"><br /></div><div>You'll have been given necessary accesses to all these portals. Please start using them judiciously.</div><div class="paragraph_break"><br /></div><div>All the best!</div><div class="paragraph_break"><br /></div><div>Thanks,<br /></div><div>Saurav<br /></div>

If I want to take it a step further and remove the tags, retrieving all the text:

puts Nokogiri::HTML(CGI.unescapeHTML(html)).content

which outputs:

Hi All,Starting today we are initiating PoLS.Please use the following communication protocols:1. Task Breakup and allocation - Gravity2. All mail communications - BC messages3. Reports on PoC / Spikes: Writeboard4. Non story related tasks: BC To-Do5. All UI and HTML will communicated to you through BC.6. For File sharing, we'll be using Dropbox.7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.You'll have been given necessary accesses to all these portals. Please start using them judiciously.All the best!Thanks,Saurav

That is usually where I want to end up when I see that sort of string.

Ruby's CGI library makes encoding and decoding HTML easy, and the Nokogiri gem makes it easy to remove the tags.
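To illustrate the encoding and decoding half of that, here is a minimal sketch using only the standard library. The sample string is a shortened, made-up variant of the one in the question, not the original data.

```ruby
require 'cgi'

# Short sample in the same shape as the question's string (illustrative only).
encoded = "&lt;div&gt;Hi All,&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;"

# Decode the entities back into real markup.
decoded = CGI.unescapeHTML(encoded)
puts decoded

# Encode the markup again; for this sample the round trip is lossless.
reencoded = CGI.escapeHTML(decoded)
puts reencoded == encoded
```

Once the string is decoded it is ordinary HTML, so any HTML parser (Nokogiri in the example above) can take it from there.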

#2

I would suggest:

clean = str.gsub(/&lt;.+?&gt;/, '')
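A quick sketch of what that pattern does on a short made-up string. Note that it only strips the encoded tags; any other entities in the text (an encoded ampersand here) are left as they are, so I am assuming you would still want CGI.unescapeHTML afterwards if you need plain text.

```ruby
require 'cgi'

# Made-up sample with one entity inside the text itself.
str = "&lt;p&gt;Tom &amp; Jerry&lt;/p&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;"

# Remove every encoded tag: "&lt;", then as little as possible, then "&gt;".
clean = str.gsub(/&lt;.+?&gt;/, '')
puts clean

# The remaining entities still need decoding.
plain = CGI.unescapeHTML(clean)
puts plain
```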

#3

I think this is the easiest way, assuming you want to use the HTML in the string (raw is the Rails view helper, so this applies inside a Rails view):

raw CGI.unescapeHTML('The string you want to manipulate')

#4

If you have assigned that string to a variable s, is this the result you want?

puts s.gsub(/&lt;[^&]*&gt;/, '')
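One thing to watch with the [^&]* form: it refuses to cross any ampersand, so an encoded tag that itself contains an entity is not removed. A small sketch with a hypothetical string (the link below is invented for illustration) compared against a lazy .+? pattern:

```ruby
# Hypothetical input: the opening tag contains an encoded ampersand.
s = '&lt;a href="x?a=1&amp;b=2"&gt;link&lt;/a&gt;'

# [^&]* stops at the "&" of "&amp;", so the opening tag survives.
negated = s.gsub(/&lt;[^&]*&gt;/, '')
puts negated

# A lazy match runs up to the first "&gt;" and removes both tags.
lazy = s.gsub(/&lt;.+?&gt;/, '')
puts lazy
```

For the string in the question no tag contains an ampersand, so both patterns give the same result there.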
