I have a text blob column in MySQL that contains HTML. I need to change some of the markup, so I figured I'd do it in a Ruby script. Ruby isn't essential here, but it would be nice to see an answer that uses it. The markup looks like the following:
<h5>foo</h5>
<table>
<tbody>
</tbody>
</table>
<h5>bar</h5>
<table>
<tbody>
</tbody>
</table>
<h5>meow</h5>
<table>
<tbody>
</tbody>
</table>
I need to change just the first <h5>foo</h5> block of each text to <h2>something_else</h2> while leaving the rest of the string alone. I can't seem to get the proper PCRE regex, using Ruby.
3 solutions
#1
31
# The regex literal syntax using %r{...} allows / in your regex without escaping
new_str = my_str.sub( %r{<h5>[^<]+</h5>}, '<h2>something_else</h2>' )
Using String#sub instead of String#gsub causes only the first replacement to occur. If you need to dynamically choose what 'foo' is, you can use string interpolation in regex literals (wrap the value in Regexp.escape so that any regex metacharacters in it match literally):
new_str = my_str.sub( %r{<h5>#{Regexp.escape(searchstr)}</h5>}, "<h2>#{replacestr}</h2>" )
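Escaping matters as soon as the heading text contains characters such as "+", "(" or ".". The following is a minimal sketch of the difference; the sample string and the value of searchstr are made up for illustration and are not from the question:

```ruby
# Hypothetical sample data: a heading whose text contains regex metacharacters.
my_str     = "<h5>a+b (v1.0)</h5>\n<table></table>\n<h5>bar</h5>"
searchstr  = "a+b (v1.0)"
replacestr = "something_else"

# Interpolated as-is, "+", "(", ")" and "." are read as regex syntax,
# so the pattern no longer matches the literal heading text.
unescaped = my_str.sub(%r{<h5>#{searchstr}</h5>}, "<h2>#{replacestr}</h2>")

# Regexp.escape backslash-escapes those characters so they match literally.
escaped = my_str.sub(%r{<h5>#{Regexp.escape(searchstr)}</h5>}, "<h2>#{replacestr}</h2>")

puts unescaped == my_str # the unescaped pattern found nothing to replace
puts escaped
```

If searchstr comes from user input or from the database, escaping it is the safer default.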
Then again, if you know what 'foo' is, you don't need a regex:
new_str = my_str.sub( "<h5>#{searchstr}</h5>", "<h2>#{replacestr}</h2>" )
or even:
my_str[ "<h5>#{searchstr}</h5>" ] = "<h2>#{replacestr}</h2>"
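The index-assignment form behaves differently from sub in two ways: it modifies the string in place, and it raises IndexError when the substring is not present, whereas sub simply returns an unchanged copy. A small sketch, using made-up sample strings:

```ruby
# Unary + returns an unfrozen string, so in-place modification is allowed
# even if frozen string literals are enabled.
my_str = +"<h5>foo</h5>\n<h5>foo</h5>"

# Replaces only the first occurrence, and changes my_str itself.
my_str["<h5>foo</h5>"] = "<h2>something_else</h2>"
puts my_str

# A substring that does not occur raises IndexError...
not_found = false
begin
  my_str["<h5>missing</h5>"] = "<h2>x</h2>"
rescue IndexError
  not_found = true
end

# ...while sub quietly returns a copy with nothing replaced.
untouched = my_str.sub("<h5>missing</h5>", "<h2>x</h2>")
```

When some rows may not contain the heading at all, sub is the more forgiving choice.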
If you need to run code to figure out the replacement, you can use the block form of sub:
new_str = my_str.sub %r{<h5>([^<]+)</h5>} do |full_match|
  # The expression returned from this block will be used as the replacement string
  # $1 will be the matched content between the h5 tags.
  "<h2>#{replacestr}</h2>"
end
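As a concrete illustration of the block form, the sketch below builds the replacement from the captured text. Upcasing the heading is only a stand-in for whatever logic you actually need, and the sample string is shortened from the question's markup:

```ruby
my_str = "<h5>foo</h5>\n<table></table>\n<h5>bar</h5>"
pattern = %r{<h5>([^<]+)</h5>}

# The block runs once, for the first match only; $1 holds the heading text.
new_str = my_str.sub(pattern) do |full_match|
  "<h2>#{$1.upcase}</h2>"
end
puts new_str

# In the string form, \1 in the replacement is expanded to the first capture group.
expanded = my_str.sub(pattern, 'x\1y')

# In the block form, the returned string is inserted as-is, backslashes included.
literal = my_str.sub(pattern) { 'x\1y' }
```

The last two lines show a side benefit of the block form: the returned string is not scanned for backreferences, which is useful when the replacement text may itself contain backslashes.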
#2
6
Whenever I have to parse or modify HTML or XML, I reach for a parser. I almost never bother with regexes or substring searches unless the task is an absolute no-brainer.
Here's how to do it using Nokogiri, without any regex:
text = <<EOT
<h5>foo</h5>
<table>
<tbody>
</tbody>
</table>
<h5>bar</h5>
<table>
<tbody>
</tbody>
</table>
<h5>meow</h5>
<table>
<tbody>
</tbody>
</table>
EOT
require 'nokogiri'
fragment = Nokogiri::HTML::DocumentFragment.parse(text)
print fragment.to_html
fragment.css('h5').select { |n| n.text == 'foo' }.each do |n|
  n.name = 'h2'
  n.content = 'something_else'
end
print fragment.to_html
After parsing, this is what Nokogiri returns for the fragment:
# >> <h5>foo</h5>
# >> <table><tbody></tbody></table><h5>bar</h5>
# >> <table><tbody></tbody></table><h5>meow</h5>
# >> <table><tbody></tbody></table>
This is the output after running the replacement:
# >> <h2>something_else</h2>
# >> <table><tbody></tbody></table><h5>bar</h5>
# >> <table><tbody></tbody></table><h5>meow</h5>
# >> <table><tbody></tbody></table>
#3
2
Use String#sub with the regular expression <h5>[^<]+<\/h5> (String#gsub would replace every <h5> block, not just the first):
>> current = "<h5>foo</h5>\n <table>\n <tbody>\n </tbody>\n </table>"
>> updated = current.sub(/<h5>[^<]+<\/h5>/){"<h2>something_else</h2>"}
=> "<h2>something_else</h2>\n <table>\n <tbody>\n </tbody>\n </table>"
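The difference between the two methods only shows up when the string contains more than one heading, which the single-heading example above hides. A minimal sketch, assuming a shortened version of the question's markup:

```ruby
html = "<h5>foo</h5>\n<table></table>\n<h5>bar</h5>\n<table></table>\n<h5>meow</h5>"
pattern = %r{<h5>[^<]+</h5>}

# sub stops after the first match; gsub keeps going to the end of the string.
first_only = html.sub(pattern, "<h2>something_else</h2>")
every_one  = html.gsub(pattern, "<h2>something_else</h2>")

puts first_only
puts every_one
```

Here first_only keeps the bar and meow headings as <h5>, while every_one turns all three into <h2>something_else</h2>.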
Note that you can test Ruby regular expressions comfortably in your browser.